ArrayExpress update--trends in database growth and links to data analysis tools.
Rustici, Gabriella; Kolesnikov, Nikolay; Brandizi, Marco; Burdett, Tony; Dylag, Miroslaw; Emam, Ibrahim; Farne, Anna; Hastings, Emma; Ison, Jon; Keays, Maria; Kurbatova, Natalja; Malone, James; Mani, Roby; Mupo, Annalisa; Pedro Pereira, Rui; Pilicheva, Ekaterina; Rung, Johan; Sharma, Anjan; Tang, Y Amy; Ternent, Tobias; Tikhonov, Andrew; Welter, Danielle; Williams, Eleanor; Brazma, Alvis; Parkinson, Helen; Sarkans, Ugis
2013-01-01
The ArrayExpress Archive of Functional Genomics Data (http://www.ebi.ac.uk/arrayexpress) is one of three international functional genomics public data repositories, alongside the Gene Expression Omnibus at NCBI and the DDBJ Omics Archive, supporting peer-reviewed publications. It accepts data generated by sequencing or array-based technologies and currently contains data from almost a million assays, from over 30 000 experiments. The proportion of sequencing-based submissions has grown significantly over the last 2 years and has reached, in 2012, 15% of all new data. All data are available from ArrayExpress in MAGE-TAB format, which allows robust linking to data analysis and visualization tools, including Bioconductor and GenomeSpace. Additionally, R objects, for microarray data, and binary alignment format files, for sequencing data, have been generated for a significant proportion of ArrayExpress data.
Geoseq: a tool for dissecting deep-sequencing datasets.
Gurtowski, James; Cancio, Anthony; Shah, Hardik; Levovitz, Chaya; George, Ajish; Homann, Robert; Sachidanandam, Ravi
2010-10-12
Datasets generated on deep-sequencing platforms have been deposited in various public repositories such as the Gene Expression Omnibus (GEO), Sequence Read Archive (SRA) hosted by the NCBI, or the DNA Data Bank of Japan (ddbj). Despite being rich data sources, they have not been used much due to the difficulty in locating and analyzing datasets of interest. Geoseq http://geoseq.mssm.edu provides a new method of analyzing short reads from deep sequencing experiments. Instead of mapping the reads to reference genomes or sequences, Geoseq maps a reference sequence against the sequencing data. It is web-based, and holds pre-computed data from public libraries. The analysis reduces the input sequence to tiles and measures the coverage of each tile in a sequence library through the use of suffix arrays. The user can upload custom target sequences or use gene/miRNA names for the search and get back results as plots and spreadsheet files. Geoseq organizes the public sequencing data using a controlled vocabulary, allowing identification of relevant libraries by organism, tissue and type of experiment. Analysis of small sets of sequences against deep-sequencing datasets, as well as identification of public datasets of interest, is simplified by Geoseq. We applied Geoseq to, a) identify differential isoform expression in mRNA-seq datasets, b) identify miRNAs (microRNAs) in libraries, and identify mature and star sequences in miRNAS and c) to identify potentially mis-annotated miRNAs. The ease of using Geoseq for these analyses suggests its utility and uniqueness as an analysis tool.
Li, Shuyu; Li, Yiqun Helen; Wei, Tao; Su, Eric Wen; Duffin, Kevin; Liao, Birong
2006-10-25
The tissue expression pattern of a gene often provides an important clue to its potential role in a biological process. A vast amount of gene expression data have been and are being accumulated in public repository through different technology platforms. However, exploitations of these rich data sources remain limited in part due to issues of technology standardization. Our objective is to test the data comparability between SAGE and microarray technologies, through examining the expression pattern of genes under normal physiological states across variety of tissues. There are 42-54% of genes showing significant correlations in tissue expression patterns between SAGE and GeneChip, with 30-40% of genes whose expression patterns are positively correlated and 10-15% of genes whose expression patterns are negatively correlated at a statistically significant level (p = 0.05). Our analysis suggests that the discrepancy on the expression patterns derived from technology platforms is not likely from the heterogeneity of tissues used in these technologies, or other spurious correlations resulting from microarray probe design, abundance of genes, or gene function. The discrepancy can be partially explained by errors in the original assignment of SAGE tags to genes due to the evolution of sequence databases. In addition, sequence analysis has indicated that many SAGE tags and Affymetrix array probe sets are mapped to different splice variants or different sequence regions although they represent the same gene, which also contributes to the observed discrepancies between SAGE and array expression data. To our knowledge, this is the first report attempting to mine gene expression patterns across tissues using public data from different technology platforms. Unlike previous similar studies that only demonstrated the discrepancies between the two gene expression platforms, we carried out in-depth analysis to further investigate the cause for such discrepancies. Our study shows that the exploitation of rich public expression resource requires extensive knowledge about the technologies, and experiment. Informatic methodologies for better interoperability among platforms still remain a gap. One of the areas that can be improved practically is the accurate sequence mapping of SAGE tags and array probes to full-length genes.
Asamizu, E; Nakamura, Y; Sato, S; Tabata, S
2000-06-30
For comprehensive analysis of genes expressed in the model dicotyledonous plant, Arabidopsis thaliana, expressed sequence tags (ESTs) were accumulated. Normalized and size-selected cDNA libraries were constructed from aboveground organs, flower buds, roots, green siliques and liquid-cultured seedlings, respectively, and a total of 14,026 5'-end ESTs and 39,207 3'-end ESTs were obtained. The 3'-end ESTs could be clustered into 12,028 non-redundant groups. Similarity search of the non-redundant ESTs against the public non-redundant protein database indicated that 4816 groups show similarity to genes of known function, 1864 to hypothetical genes, and the remaining 5348 are novel sequences. Gene coverage by the non-redundant ESTs was analyzed using the annotated genomic sequences of approximately 10 Mb on chromosomes 3 and 5. A total of 923 regions were hit by at least one EST, among which only 499 regions were hit by the ESTs deposited in the public database. The result indicates that the EST source generated in this project complements the EST data in the public database and facilitates new gene discovery.
USDA-ARS?s Scientific Manuscript database
The complement of gamma gliadin genes expressed in the wheat cultivar Butte 86 was evaluated by analyzing publicly available expressed sequence tag (EST) data. Eleven contigs were assembled from 153 Butte 86 ESTs. Nine of the contigs encoded full-length proteins and four of the proteins contained an...
Cosmetology: Scope and Sequence.
ERIC Educational Resources Information Center
Nashville - Davidson County Metropolitan Public Schools, TN.
This scope and sequence guide, developed for a cosmetology vocational education program, represents an initial step in the development of a systemwide articulated curriculum sequence for all vocational programs within the Metropolitan Nashville Public School System. It was developed as a result of needs expressed by teachers, parents, and the…
Aircraft Mechanics: Scope and Sequence.
ERIC Educational Resources Information Center
Nashville - Davidson County Metropolitan Public Schools, TN.
This scope and sequence guide, developed for an aircraft mechanics vocational education program, represents an initial step in the development of a systemwide articulated curriculum sequence for all vocational programs within the Metropolitan Nashville Public School System. It was developed as a result of needs expressed by teachers, parents, and…
Auto Mechanics: Scope and Sequence.
ERIC Educational Resources Information Center
Nashville - Davidson County Metropolitan Public Schools, TN.
This scope and sequence guide, developed for an auto mechanics vocational education program, represents an initial step in the development of a systemwide articulated curriculum sequence for all vocational programs within the Metropolitan Nashville Public School System. It was developed as a result of needs expressed by teachers, parents, and the…
Diesel Mechanics: Scope and Sequence.
ERIC Educational Resources Information Center
Nashville - Davidson County Metropolitan Public Schools, TN.
This scope and sequence guide, developed for a diesel mechanics vocational education program, represents an initial step in the development of a systemwide articulated curriculum sequence for all vocational programs within the Metropolitan Nashville Public School System. It was developed as a result of needs expressed by teachers, parents, and the…
Commercial Art: Scope and Sequence.
ERIC Educational Resources Information Center
Nashville - Davidson County Metropolitan Public Schools, TN.
This scope and sequence guide, developed for a commercial art vocational education program, represents an initial step in the development of a systemwide articulated curriculum sequence for all vocational programs within the Metropolitan Nashville Public School System. It was developed as a result of needs expressed by teachers, parents, and the…
Identification of human chromosome 22 transcribed sequences with ORF expressed sequence tags
de Souza, Sandro J.; Camargo, Anamaria A.; Briones, Marcelo R. S.; Costa, Fernando F.; Nagai, Maria Aparecida; Verjovski-Almeida, Sergio; Zago, Marco A.; Andrade, Luis Eduardo C.; Carrer, Helaine; El-Dorry, Hamza F. A.; Espreafico, Enilza M.; Habr-Gama, Angelita; Giannella-Neto, Daniel; Goldman, Gustavo H.; Gruber, Arthur; Hackel, Christine; Kimura, Edna T.; Maciel, Rui M. B.; Marie, Suely K. N.; Martins, Elizabeth A. L.; Nóbrega, Marina P.; Paçó-Larson, Maria Luisa; Pardini, Maria Inês M. C.; Pereira, Gonçalo G.; Pesquero, João Bosco; Rodrigues, Vanderlei; Rogatto, Silvia R.; da Silva, Ismael D. C. G.; Sogayar, Mari C.; de Fátima Sonati, Maria; Tajara, Eloiza H.; Valentini, Sandro R.; Acencio, Marcio; Alberto, Fernando L.; Amaral, Maria Elisabete J.; Aneas, Ivy; Bengtson, Mário Henrique; Carraro, Dirce M.; Carvalho, Alex F.; Carvalho, Lúcia Helena; Cerutti, Janete M.; Corrêa, Maria Lucia C.; Costa, Maria Cristina R.; Curcio, Cyntia; Gushiken, Tsieko; Ho, Paulo L.; Kimura, Elza; Leite, Luciana C. C.; Maia, Gustavo; Majumder, Paromita; Marins, Mozart; Matsukuma, Adriana; Melo, Analy S. A.; Mestriner, Carlos Alberto; Miracca, Elisabete C.; Miranda, Daniela C.; Nascimento, Ana Lucia T. O.; Nóbrega, Francisco G.; Ojopi, Élida P. B.; Pandolfi, José Rodrigo C.; Pessoa, Luciana Gilbert; Rahal, Paula; Rainho, Claudia A.; da Ro's, Nancy; de Sá, Renata G.; Sales, Magaly M.; da Silva, Neusa P.; Silva, Tereza C.; da Silva, Wilson; Simão, Daniel F.; Sousa, Josane F.; Stecconi, Daniella; Tsukumo, Fernando; Valente, Valéria; Zalcberg, Heloisa; Brentani, Ricardo R.; Reis, Luis F. L.; Dias-Neto, Emmanuel; Simpson, Andrew J. G.
2000-01-01
Transcribed sequences in the human genome can be identified with confidence only by alignment with sequences derived from cDNAs synthesized from naturally occurring mRNAs. We constructed a set of 250,000 cDNAs that represent partial expressed gene sequences and that are biased toward the central coding regions of the resulting transcripts. They are termed ORF expressed sequence tags (ORESTES). The 250,000 ORESTES were assembled into 81,429 contigs. Of these, 1,181 (1.45%) were found to match sequences in chromosome 22 with at least one ORESTES contig for 162 (65.6%) of the 247 known genes, for 67 (44.6%) of the 150 related genes, and for 45 of the 148 (30.4%) EST-predicted genes on this chromosome. Using a set of stringent criteria to validate our sequences, we identified a further 219 previously unannotated transcribed sequences on chromosome 22. Of these, 171 were in fact also defined by EST or full length cDNA sequences available in GenBank but not utilized in the initial annotation of the first human chromosome sequence. Thus despite representing less than 15% of all expressed human sequences in the public databases at the time of the present analysis, ORESTES sequences defined 48 transcribed sequences on chromosome 22 not defined by other sequences. All of the transcribed sequences defined by ORESTES coincided with DNA regions predicted as encoding exons by genscan. (http://genes.mit.edu/GENSCAN.html). PMID:11070084
Vettore, André L.; da Silva, Felipe R.; Kemper, Edson L.; Souza, Glaucia M.; da Silva, Aline M.; Ferro, Maria Inês T.; Henrique-Silva, Flavio; Giglioti, Éder A.; Lemos, Manoel V.F.; Coutinho, Luiz L.; Nobrega, Marina P.; Carrer, Helaine; França, Suzelei C.; Bacci, Maurício; Goldman, Maria Helena S.; Gomes, Suely L.; Nunes, Luiz R.; Camargo, Luis E.A.; Siqueira, Walter J.; Van Sluys, Marie-Anne; Thiemann, Otavio H.; Kuramae, Eiko E.; Santelli, Roberto V.; Marino, Celso L.; Targon, Maria L.P.N.; Ferro, Jesus A.; Silveira, Henrique C.S.; Marini, Danyelle C.; Lemos, Eliana G.M.; Monteiro-Vitorello, Claudia B.; Tambor, José H.M.; Carraro, Dirce M.; Roberto, Patrícia G.; Martins, Vanderlei G.; Goldman, Gustavo H.; de Oliveira, Regina C.; Truffi, Daniela; Colombo, Carlos A.; Rossi, Magdalena; de Araujo, Paula G.; Sculaccio, Susana A.; Angella, Aline; Lima, Marleide M.A.; de Rosa, Vicente E.; Siviero, Fábio; Coscrato, Virginia E.; Machado, Marcos A.; Grivet, Laurent; Di Mauro, Sonia M.Z.; Nobrega, Francisco G.; Menck, Carlos F.M.; Braga, Marilia D.V.; Telles, Guilherme P.; Cara, Frank A.A.; Pedrosa, Guilherme; Meidanis, João; Arruda, Paulo
2003-01-01
To contribute to our understanding of the genome complexity of sugarcane, we undertook a large-scale expressed sequence tag (EST) program. More than 260,000 cDNA clones were partially sequenced from 26 standard cDNA libraries generated from different sugarcane tissues. After the processing of the sequences, 237,954 high-quality ESTs were identified. These ESTs were assembled into 43,141 putative transcripts. Of the assembled sequences, 35.6% presented no matches with existing sequences in public databases. A global analysis of the whole SUCEST data set indicated that 14,409 assembled sequences (33% of the total) contained at least one cDNA clone with a full-length insert. Annotation of the 43,141 assembled sequences associated almost 50% of the putative identified sugarcane genes with protein metabolism, cellular communication/signal transduction, bioenergetics, and stress responses. Inspection of the translated assembled sequences for conserved protein domains revealed 40,821 amino acid sequences with 1415 Pfam domains. Reassembling the consensus sequences of the 43,141 transcripts revealed a 22% redundancy in the first assembling. This indicated that possibly 33,620 unique genes had been identified and indicated that >90% of the sugarcane expressed genes were tagged. PMID:14613979
Zhou, Rongqiong; Xia, Qingyou; Huang, Hancheng; Lai, Min; Wang, Zhenxin
2011-10-01
Toxocara canis is a widespread intestinal nematode parasite of dogs, which can also cause disease in humans. We employed an expressed sequence tag (EST) strategy in order to study gene-expression including development, digestion and reproduction of T. canis. ESTs provided a rapid way to identify genes, particularly in organisms for which we have very little molecular information. In this study, a cDNA library was constructed from a female adult of T. canis and 215 high-quality ESTs from 5'-ends of the cDNA clones representing 79 unigenes were obtained. The titer of the primary cDNA library was 1.83×10(6)pfu/mL with a recombination rate of 99.33%. Most of the sequences ranged from 300 to 900bp with an average length of 656bp. Cluster analysis of these ESTs allowed identification of 79 unique sequences containing 28 contigs and 51 singletons. BLASTX searches revealed that 18 unigenes (22.78% of the total) or 70 ESTs (32.56% of the total) were novel genes that had no significant matches to any protein sequences in the public databases. The rest of the 61 unigenes (77.22% of the total) or 145 ESTs (67.44% of the total) were closely matched to the known genes or sequences deposited in the public databases. These genes were classified into seven groups based on their known or putative biological functions. We also confirmed the gene expression patterns of several immune-related genes using RT-PCR examination. This work will provide a valuable resource for the further investigations in the stage-, sex- and tissue-specific gene transcription or expression. Copyright © 2011. Published by Elsevier Inc.
The bovine lactation genome: Insights into the evolution of mammalian milk
USDA-ARS?s Scientific Manuscript database
The newly assembled Bos Taurus genome sequence enables the linkage of bovine milk and lactation data with other mammalian genomes. Using publicly available milk proteome data and mammary expressed sequence tags, 197 milk protein genes and over 6,000 mammary genes were identified in the bovine genome...
PanGEA: identification of allele specific gene expression using the 454 technology.
Kofler, Robert; Teixeira Torres, Tatiana; Lelley, Tamas; Schlötterer, Christian
2009-05-14
Next generation sequencing technologies hold great potential for many biological questions. While mainly used for genomic sequencing, they are also very promising for gene expression profiling. Sequencing of cDNA does not only provide an estimate of the absolute expression level, it can also be used for the identification of allele specific gene expression. We developed PanGEA, a tool which enables a fast and user-friendly analysis of allele specific gene expression using the 454 technology. PanGEA allows mapping of 454-ESTs to genes or whole genomes, displaying gene expression profiles, identification of SNPs and the quantification of allele specific gene expression. The intuitive GUI of PanGEA facilitates a flexible and interactive analysis of the data. PanGEA additionally implements a modification of the Smith-Waterman algorithm which deals with incorrect estimates of homopolymer length as occuring in the 454 technology To our knowledge, PanGEA is the first tool which facilitates the identification of allele specific gene expression. PanGEA is distributed under the Mozilla Public License and available at: http://www.kofler.or.at/bioinformatics/PanGEA
PanGEA: Identification of allele specific gene expression using the 454 technology
Kofler, Robert; Teixeira Torres, Tatiana; Lelley, Tamas; Schlötterer, Christian
2009-01-01
Background Next generation sequencing technologies hold great potential for many biological questions. While mainly used for genomic sequencing, they are also very promising for gene expression profiling. Sequencing of cDNA does not only provide an estimate of the absolute expression level, it can also be used for the identification of allele specific gene expression. Results We developed PanGEA, a tool which enables a fast and user-friendly analysis of allele specific gene expression using the 454 technology. PanGEA allows mapping of 454-ESTs to genes or whole genomes, displaying gene expression profiles, identification of SNPs and the quantification of allele specific gene expression. The intuitive GUI of PanGEA facilitates a flexible and interactive analysis of the data. PanGEA additionally implements a modification of the Smith-Waterman algorithm which deals with incorrect estimates of homopolymer length as occuring in the 454 technology Conclusion To our knowledge, PanGEA is the first tool which facilitates the identification of allele specific gene expression. PanGEA is distributed under the Mozilla Public License and available at: PMID:19442283
Analysis of xylem formation in pine by cDNA sequencing
NASA Technical Reports Server (NTRS)
Allona, I.; Quinn, M.; Shoop, E.; Swope, K.; St Cyr, S.; Carlis, J.; Riedl, J.; Retzel, E.; Campbell, M. M.; Sederoff, R.;
1998-01-01
Secondary xylem (wood) formation is likely to involve some genes expressed rarely or not at all in herbaceous plants. Moreover, environmental and developmental stimuli influence secondary xylem differentiation, producing morphological and chemical changes in wood. To increase our understanding of xylem formation, and to provide material for comparative analysis of gymnosperm and angiosperm sequences, ESTs were obtained from immature xylem of loblolly pine (Pinus taeda L.). A total of 1,097 single-pass sequences were obtained from 5' ends of cDNAs made from gravistimulated tissue from bent trees. Cluster analysis detected 107 groups of similar sequences, ranging in size from 2 to 20 sequences. A total of 361 sequences fell into these groups, whereas 736 sequences were unique. About 55% of the pine EST sequences show similarity to previously described sequences in public databases. About 10% of the recognized genes encode factors involved in cell wall formation. Sequences similar to cell wall proteins, most known lignin biosynthetic enzymes, and several enzymes of carbohydrate metabolism were found. A number of putative regulatory proteins also are represented. Expression patterns of several of these genes were studied in various tissues and organs of pine. Sequencing novel genes expressed during xylem formation will provide a powerful means of identifying mechanisms controlling this important differentiation pathway.
Rangel, Luiz Thibério; Novaes, Jeniffer; Durham, Alan M.; Madeira, Alda Maria B. N.; Gruber, Arthur
2013-01-01
Parasites of the genus Eimeria infect a wide range of vertebrate hosts, including chickens. We have recently reported a comparative analysis of the transcriptomes of Eimeria acervulina, Eimeria maxima and Eimeria tenella, integrating ORESTES data produced by our group and publicly available Expressed Sequence Tags (ESTs). All cDNA reads have been assembled, and the reconstructed transcripts have been submitted to a comprehensive functional annotation pipeline. Additional studies included orthology assignment across apicomplexan parasites and clustering analyses of gene expression profiles among different developmental stages of the parasites. To make all this body of information publicly available, we constructed the Eimeria Transcript Database (EimeriaTDB), a web repository that provides access to sequence data, annotation and comparative analyses. Here, we describe the web interface, available sequence data sets and query tools implemented on the site. The main goal of this work is to offer a public repository of sequence and functional annotation data of reconstructed transcripts of parasites of the genus Eimeria. We believe that EimeriaTDB will represent a valuable and complementary resource for the Eimeria scientific community and for those researchers interested in comparative genomics of apicomplexan parasites. Database URL: http://www.coccidia.icb.usp.br/eimeriatdb/ PMID:23411718
Impact of sequencing depth and read length on single cell RNA sequencing data of T cells.
Rizzetto, Simone; Eltahla, Auda A; Lin, Peijie; Bull, Rowena; Lloyd, Andrew R; Ho, Joshua W K; Venturi, Vanessa; Luciani, Fabio
2017-10-06
Single cell RNA sequencing (scRNA-seq) provides great potential in measuring the gene expression profiles of heterogeneous cell populations. In immunology, scRNA-seq allowed the characterisation of transcript sequence diversity of functionally relevant T cell subsets, and the identification of the full length T cell receptor (TCRαβ), which defines the specificity against cognate antigens. Several factors, e.g. RNA library capture, cell quality, and sequencing output affect the quality of scRNA-seq data. We studied the effects of read length and sequencing depth on the quality of gene expression profiles, cell type identification, and TCRαβ reconstruction, utilising 1,305 single cells from 8 publically available scRNA-seq datasets, and simulation-based analyses. Gene expression was characterised by an increased number of unique genes identified with short read lengths (<50 bp), but these featured higher technical variability compared to profiles from longer reads. Successful TCRαβ reconstruction was achieved for 6 datasets (81% - 100%) with at least 0.25 millions (PE) reads of length >50 bp, while it failed for datasets with <30 bp reads. Sufficient read length and sequencing depth can control technical noise to enable accurate identification of TCRαβ and gene expression profiles from scRNA-seq data of T cells.
Stajdohar, Miha; Rosengarten, Rafael D; Kokosar, Janez; Jeran, Luka; Blenkus, Domen; Shaulsky, Gad; Zupan, Blaz
2017-06-02
Dictyostelium discoideum, a soil-dwelling social amoeba, is a model for the study of numerous biological processes. Research in the field has benefited mightily from the adoption of next-generation sequencing for genomics and transcriptomics. Dictyostelium biologists now face the widespread challenges of analyzing and exploring high dimensional data sets to generate hypotheses and discovering novel insights. We present dictyExpress (2.0), a web application designed for exploratory analysis of gene expression data, as well as data from related experiments such as Chromatin Immunoprecipitation sequencing (ChIP-Seq). The application features visualization modules that include time course expression profiles, clustering, gene ontology enrichment analysis, differential expression analysis and comparison of experiments. All visualizations are interactive and interconnected, such that the selection of genes in one module propagates instantly to visualizations in other modules. dictyExpress currently stores the data from over 800 Dictyostelium experiments and is embedded within a general-purpose software framework for management of next-generation sequencing data. dictyExpress allows users to explore their data in a broader context by reciprocal linking with dictyBase-a repository of Dictyostelium genomic data. In addition, we introduce a companion application called GenBoard, an intuitive graphic user interface for data management and bioinformatics analysis. dictyExpress and GenBoard enable broad adoption of next generation sequencing based inquiries by the Dictyostelium research community. Labs without the means to undertake deep sequencing projects can mine the data available to the public. The entire information flow, from raw sequence data to hypothesis testing, can be accomplished in an efficient workspace. The software framework is generalizable and represents a useful approach for any research community. To encourage more wide usage, the backend is open-source, available for extension and further development by bioinformaticians and data scientists.
Composite transcriptome assembly of RNA-seq data in a sheep model for delayed bone healing.
Jäger, Marten; Ott, Claus-Eric; Grünhagen, Johannes; Hecht, Jochen; Schell, Hanna; Mundlos, Stefan; Duda, Georg N; Robinson, Peter N; Lienau, Jasmin
2011-03-24
The sheep is an important model organism for many types of medically relevant research, but molecular genetic experiments in the sheep have been limited by the lack of knowledge about ovine gene sequences. Prior to our study, mRNA sequences for only 1,556 partial or complete ovine genes were publicly available. Therefore, we developed a composite de novo transcriptome assembly method for next-generation sequence data to combine known ovine mRNA and EST sequences, mRNA sequences from mouse and cow, and sequences assembled de novo from short read RNA-Seq data into a composite reference transcriptome, and identified transcripts from over 12 thousand previously undescribed ovine genes. Gene expression analysis based on these data revealed substantially different expression profiles in standard versus delayed bone healing in an ovine tibial osteotomy model. Hundreds of transcripts were differentially expressed between standard and delayed healing and between the time points of the standard and delayed healing groups. We used the sheep sequences to design quantitative RT-PCR assays with which we validated the differential expression of 26 genes that had been identified by RNA-seq analysis. A number of clusters of characteristic expression profiles could be identified, some of which showed striking differences between the standard and delayed healing groups. Gene Ontology (GO) analysis showed that the differentially expressed genes were enriched in terms including extracellular matrix, cartilage development, contractile fiber, and chemokine activity. Our results provide a first atlas of gene expression profiles and differentially expressed genes in standard and delayed bone healing in a large-animal model and provide a number of clues as to the shifts in gene expression that underlie delayed bone healing. In the course of our study, we identified transcripts of 13,987 ovine genes, including 12,431 genes for which no sequence information was previously available. This information will provide a basis for future molecular research involving the sheep as a model organism.
Composite transcriptome assembly of RNA-seq data in a sheep model for delayed bone healing
2011-01-01
Background The sheep is an important model organism for many types of medically relevant research, but molecular genetic experiments in the sheep have been limited by the lack of knowledge about ovine gene sequences. Results Prior to our study, mRNA sequences for only 1,556 partial or complete ovine genes were publicly available. Therefore, we developed a composite de novo transcriptome assembly method for next-generation sequence data to combine known ovine mRNA and EST sequences, mRNA sequences from mouse and cow, and sequences assembled de novo from short read RNA-Seq data into a composite reference transcriptome, and identified transcripts from over 12 thousand previously undescribed ovine genes. Gene expression analysis based on these data revealed substantially different expression profiles in standard versus delayed bone healing in an ovine tibial osteotomy model. Hundreds of transcripts were differentially expressed between standard and delayed healing and between the time points of the standard and delayed healing groups. We used the sheep sequences to design quantitative RT-PCR assays with which we validated the differential expression of 26 genes that had been identified by RNA-seq analysis. A number of clusters of characteristic expression profiles could be identified, some of which showed striking differences between the standard and delayed healing groups. Gene Ontology (GO) analysis showed that the differentially expressed genes were enriched in terms including extracellular matrix, cartilage development, contractile fiber, and chemokine activity. Conclusions Our results provide a first atlas of gene expression profiles and differentially expressed genes in standard and delayed bone healing in a large-animal model and provide a number of clues as to the shifts in gene expression that underlie delayed bone healing. In the course of our study, we identified transcripts of 13,987 ovine genes, including 12,431 genes for which no sequence information was previously available. This information will provide a basis for future molecular research involving the sheep as a model organism. PMID:21435219
O'Rourke, Jamie A; Fu, Fengli; Bucciarelli, Bruna; Yang, S Sam; Samac, Deborah A; Lamb, JoAnn F S; Monteros, Maria J; Graham, Michelle A; Gronwald, John W; Krom, Nick; Li, Jun; Dai, Xinbin; Zhao, Patrick X; Vance, Carroll P
2015-07-07
Alfalfa (Medicago sativa L.) is the primary forage legume crop species in the United States and plays essential economic and ecological roles in agricultural systems across the country. Modern alfalfa is the result of hybridization between tetraploid M. sativa ssp. sativa and M. sativa ssp. falcata. Due to its large and complex genome, there are few genomic resources available for alfalfa improvement. A de novo transcriptome assembly from two alfalfa subspecies, M. sativa ssp. sativa (B47) and M. sativa ssp. falcata (F56) was developed using Illumina RNA-seq technology. Transcripts from roots, nitrogen-fixing root nodules, leaves, flowers, elongating stem internodes, and post-elongation stem internodes were assembled into the Medicago sativa Gene Index 1.2 (MSGI 1.2) representing 112,626 unique transcript sequences. Nodule-specific and transcripts involved in cell wall biosynthesis were identified. Statistical analyses identified 20,447 transcripts differentially expressed between the two subspecies. Pair-wise comparisons of each tissue combination identified 58,932 sequences differentially expressed in B47 and 69,143 sequences differentially expressed in F56. Comparing transcript abundance in floral tissues of B47 and F56 identified expression differences in sequences involved in anthocyanin and carotenoid synthesis, which determine flower pigmentation. Single nucleotide polymorphisms (SNPs) unique to each M. sativa subspecies (110,241) were identified. The Medicago sativa Gene Index 1.2 increases the expressed sequence data available for alfalfa by ninefold and can be expanded as additional experiments are performed. The MSGI 1.2 transcriptome sequences, annotations, expression profiles, and SNPs were assembled into the Alfalfa Gene Index and Expression Database (AGED) at http://plantgrn.noble.org/AGED/ , a publicly available genomic resource for alfalfa improvement and legume research.
PipeOnline 2.0: automated EST processing and functional data sorting.
Ayoubi, Patricia; Jin, Xiaojing; Leite, Saul; Liu, Xianghui; Martajaja, Jeson; Abduraham, Abdurashid; Wan, Qiaolan; Yan, Wei; Misawa, Eduardo; Prade, Rolf A
2002-11-01
Expressed sequence tags (ESTs) are generated and deposited in the public domain, as redundant, unannotated, single-pass reactions, with virtually no biological content. PipeOnline automatically analyses and transforms large collections of raw DNA-sequence data from chromatograms or FASTA files by calling the quality of bases, screening and removing vector sequences, assembling and rewriting consensus sequences of redundant input files into a unigene EST data set and finally through translation, amino acid sequence similarity searches, annotation of public databases and functional data. PipeOnline generates an annotated database, retaining the processed unigene sequence, clone/file history, alignments with similar sequences, and proposed functional classification, if available. Functional annotation is automatic and based on a novel method that relies on homology of amino acid sequence multiplicity within GenBank records. Records are examined through a function ordered browser or keyword queries with automated export of results. PipeOnline offers customization for individual projects (MyPipeOnline), automated updating and alert service. PipeOnline is available at http://stress-genomics.org.
Study of cnidarian-algal symbiosis in the "omics" age.
Meyer, Eli; Weis, Virginia M
2012-08-01
The symbiotic associations between cnidarians and dinoflagellate algae (Symbiodinium) support productive and diverse ecosystems in coral reefs. Many aspects of this association, including the mechanistic basis of host-symbiont recognition and metabolic interaction, remain poorly understood. The first completed genome sequence for a symbiotic anthozoan is now available (the coral Acropora digitifera), and extensive expressed sequence tag resources are available for a variety of other symbiotic corals and anemones. These resources make it possible to profile gene expression, protein abundance, and protein localization associated with the symbiotic state. Here we review the history of "omics" studies of cnidarian-algal symbiosis and the current availability of sequence resources for corals and anemones, identifying genes putatively involved in symbiosis across 10 anthozoan species. The public availability of candidate symbiosis-associated genes leaves the field of cnidarian-algal symbiosis poised for in-depth comparative studies of sequence diversity and gene expression and for targeted functional studies of genes associated with symbiosis. Reviewing the progress to date suggests directions for future investigations of cnidarian-algal symbiosis that include (i) sequencing of Symbiodinium, (ii) proteomic analysis of the symbiosome membrane complex, (iii) glycomic analysis of Symbiodinium cell surfaces, and (iv) expression profiling of the gastrodermal cells hosting Symbiodinium.
Na, Dokyun; Lee, Doheon
2010-10-15
RBSDesigner predicts the translation efficiency of existing mRNA sequences and designs synthetic ribosome binding sites (RBSs) for a given coding sequence (CDS) to yield a desired level of protein expression. The program implements the mathematical model for translation initiation described in Na et al. (Mathematical modeling of translation initiation for the estimation of its efficiency to computationally design mRNA sequences with a desired expression level in prokaryotes. BMC Syst. Biol., 4, 71). The program additionally incorporates the effect on translation efficiency of the spacer length between a Shine-Dalgarno (SD) sequence and an AUG codon, which is crucial for the incorporation of fMet-tRNA into the ribosome. RBSDesigner provides a graphical user interface (GUI) for the convenient design of synthetic RBSs. RBSDesigner is written in Python and Microsoft Visual Basic 6.0 and is publicly available as precompiled stand-alone software on the web (http://rbs.kaist.ac.kr). dhlee@kaist.ac.kr
Cloning, expression and purification of d-tagatose 3-epimerase gene from Escherichia coli JM109.
He, Xiaoliang; Zhou, Xiaohui; Yang, Zi; Xu, Le; Yu, Yuxiu; Jia, Lingling; Li, Guoqing
2015-10-01
An unknown d-tagatose 3-epimerase (DTE) containing a IoIE domain was identified and cloned from Escherichia coli. This gene was subcloned into the prokaryotic expression vector pET-15b, and induced by IPTG in E. coli BL21 expression system. Through His-select gel column purification and fast-protein liquid chromatography, highly purified and stable DTE protein was produced. The molecular weight of the DTE protein was estimated to be 29.8kDa. The latest 83 DTE sequences from public database were selected and analyzed by molecular clustering, multi-sequence alignment. DTEs were roughly divided into five categories. Copyright © 2015 Elsevier Inc. All rights reserved.
EXP-PAC: providing comparative analysis and storage of next generation gene expression data.
Church, Philip C; Goscinski, Andrzej; Lefèvre, Christophe
2012-07-01
Microarrays and more recently RNA sequencing has led to an increase in available gene expression data. How to manage and store this data is becoming a key issue. In response we have developed EXP-PAC, a web based software package for storage, management and analysis of gene expression and sequence data. Unique to this package is SQL based querying of gene expression data sets, distributed normalization of raw gene expression data and analysis of gene expression data across experiments and species. This package has been populated with lactation data in the international milk genomic consortium web portal (http://milkgenomics.org/). Source code is also available which can be hosted on a Windows, Linux or Mac APACHE server connected to a private or public network (http://mamsap.it.deakin.edu.au/~pcc/Release/EXP_PAC.html). Copyright © 2012 Elsevier Inc. All rights reserved.
Rudd, Stephen
2005-01-01
The public expressed sequence tag collections are continually being enriched with high-quality sequences that represent an ever-expanding range of taxonomically diverse plant species. While these sequence collections provide biased insight into the populations of expressed genes available within individual species and their associated tissues, the information is conceivably of wider relevance in a comparative context. When we consider the available expressed sequence tag (EST) collections of summer 2004, most of the major plant taxonomic clades are at least superficially represented. Investigation of the five million available plant ESTs provides a wealth of information that has applications in modelling the routes of plant genome evolution and the identification of lineage-specific genes and gene families. Over four million ESTs from over 50 distinct plant species have been collated within an EST analysis pipeline called openSputnik. The ESTs were resolved down into approximately one million unigene sequences. These have been annotated using orthology-based annotation transfer from reference plant genomes and using a variety of contemporary bioinformatics methods to assign peptide, structural and functional attributes. The openSputnik database is available at http://sputnik.btk.fi.
Bürckert, Jean-Philippe; Dubois, Axel R S X; Faison, William J; Farinelle, Sophie; Charpentier, Emilie; Sinner, Regina; Wienecke-Baldacchino, Anke; Muller, Claude P
2017-01-01
The identification and tracking of antigen-specific immunoglobulin (Ig) sequences within total Ig repertoires is central to high-throughput sequencing (HTS) studies of infections or vaccinations. In this context, public Ig sequences shared by different individuals exposed to the same antigen could be valuable markers for tracing back infections, measuring vaccine immunogenicity, and perhaps ultimately allow the reconstruction of the immunological history of an individual. Here, we immunized groups of transgenic rats expressing human Ig against tetanus toxoid (TT), Modified Vaccinia virus Ankara (MVA), measles virus hemagglutinin and fusion proteins expressed on MVA, and the environmental carcinogen benzo[a]pyrene, coupled to TT. We showed that these antigens impose a selective pressure causing the Ig heavy chain (IgH) repertoires of the rats to converge toward the expression of antibodies with highly similar IgH CDR3 amino acid sequences. We present a computational approach, similar to differential gene expression analysis, that selects for clusters of CDR3s with 80% similarity, significantly overrepresented within the different groups of immunized rats. These IgH clusters represent antigen-induced IgH signatures exhibiting stereotypic amino acid patterns including previously described TT- and measles-specific IgH sequences. Our data suggest that with the presented methodology, transgenic Ig rats can be utilized as a model to identify antigen-induced, human IgH signatures to a variety of different antigens.
The FDA's Experience with Emerging Genomics Technologies-Past, Present, and Future.
Xu, Joshua; Thakkar, Shraddha; Gong, Binsheng; Tong, Weida
2016-07-01
The rapid advancement of emerging genomics technologies and their application for assessing safety and efficacy of FDA-regulated products require a high standard of reliability and robustness supporting regulatory decision-making in the FDA. To facilitate the regulatory application, the FDA implemented a novel data submission program, Voluntary Genomics Data Submission (VGDS), and also to engage the stakeholders. As part of the endeavor, for the past 10 years, the FDA has led an international consortium of regulatory agencies, academia, pharmaceutical companies, and genomics platform providers, which was named MicroArray Quality Control Consortium (MAQC), to address issues such as reproducibility, precision, specificity/sensitivity, and data interpretation. Three projects have been completed so far assessing these genomics technologies: gene expression microarrays, whole genome genotyping arrays, and whole transcriptome sequencing (i.e., RNA-seq). The resultant studies provide the basic parameters for fit-for-purpose application of these new data streams in regulatory environments, and the solutions have been made available to the public through peer-reviewed publications. The latest MAQC project is also called the SEquencing Quality Control (SEQC) project focused on next-generation sequencing. Using reference samples with built-in controls, SEQC studies have demonstrated that relative gene expression can be measured accurately and reliably across laboratories and RNA-seq platforms. Besides prediction performance comparable to microarrays in clinical settings and safety assessments, RNA-seq is shown to have better sensitivity for low expression and reveal novel transcriptomic features. Future effort of MAQC will be focused on quality control of whole genome sequencing and targeted sequencing.
The FDA’s Experience with Emerging Genomics Technologies—Past, Present, and Future
Xu, Joshua; Thakkar, Shraddha; Gong, Binsheng; Tong, Weida
2016-01-01
The rapid advancement of emerging genomics technologies and their application for assessing safety and efficacy of FDA-regulated products require a high standard of reliability and robustness supporting regulatory decision-making in the FDA. To facilitate the regulatory application, the FDA implemented a novel data submission program, Voluntary Genomics Data Submission (VGDS), and also to engage the stakeholders. As part of the endeavor, for the past 10 years, the FDA has led an international consortium of regulatory agencies, academia, pharmaceutical companies, and genomics platform providers, which was named MicroArray Quality Control Consortium (MAQC), to address issues such as reproducibility, precision, specificity/sensitivity, and data interpretation. Three projects have been completed so far assessing these genomics technologies: gene expression microarrays, whole genome genotyping arrays, and whole transcriptome sequencing (i.e., RNA-seq). The resultant studies provide the basic parameters for fit-for-purpose application of these new data streams in regulatory environments, and the solutions have been made available to the public through peer-reviewed publications. The latest MAQC project is also called the SEquencing Quality Control (SEQC) project focused on next-generation sequencing. Using reference samples with built-in controls, SEQC studies have demonstrated that relative gene expression can be measured accurately and reliably across laboratories and RNA-seq platforms. Besides prediction performance comparable to microarrays in clinical settings and safety assessments, RNA-seq is shown to have better sensitivity for low expression and reveal novel transcriptomic features. Future effort of MAQC will be focused on quality control of whole genome sequencing and targeted sequencing. PMID:27116022
Differences in expression of retinal pigment epithelium mRNA between normal canines
2004-01-01
Abstract A reference database of differences in mRNA expression in normal healthy canine retinal pigment epithelium (RPE) has been established. This database identifies non-informative differences in mRNA expression that can be used in screening canine RPE for mutations associated with clinical effects on vision. Complementary DNA (cDNA) pools were prepared from mRNA harvested from RPE, amplified by PCR, and used in a subtractive hybridization protocol (representational differential analysis) to identify differences in RPE mRNA expression between canines. The effect of relatedness of the test canines on the frequency of occurrence of differences was evaluated by using 2 unrelated canines for comparison with 2 female sibling canines of blue heeler/bull terrier lineage. Differentially expressed cDNA species were cloned, sequenced, and identified by comparison to public database entries. The most frequently observed differentially expressed sequence from the unrelated canine comparison was cDNA with 21 base pairs (bp) identical to the human epithelial membrane protein 1 gene (present in 8 of 20 clones). Different clones from the same-sex sibling RPE contained repetitions of several short sequence motifs including the human epithelial membrane protein 1 (4 of 25 clones). Other prevalent differences between sibling RPE included sequences similar to a chicken genetic marker sequence motif (5 of 25), and 6 clones with homology to porcine major histocompatibility loci. In addition to identifying several repetitively occurring, noninformative, differentially expressed RPE mRNA species, the findings confirm that fewer differences occurred between siblings, highlighting the importance of using closely related subjects in representational difference analysis studies. PMID:15352545
USDA-ARS?s Scientific Manuscript database
One-hundred-thirty-six expressed sequence tags (ESTs) encoding alpha gliadins from Triticum aestivum cv Butte 86 were identified in public databases and assembled into 19 contigs. Consensus sequences for 12 of the contigs encoded complete alpha gliadin proteins, but only two were identical to protei...
Preparation of highly multiplexed small RNA sequencing libraries.
Persson, Helena; Søkilde, Rolf; Pirona, Anna Chiara; Rovira, Carlos
2017-08-01
MicroRNAs (miRNAs) are ~22-nucleotide-long small non-coding RNAs that regulate the expression of protein-coding genes by base pairing to partially complementary target sites, preferentially located in the 3´ untranslated region (UTR) of target mRNAs. The expression and function of miRNAs have been extensively studied in human disease, as well as the possibility of using these molecules as biomarkers for prognostication and treatment guidance. To identify and validate miRNAs as biomarkers, their expression must be screened in large collections of patient samples. Here, we develop a scalable protocol for the rapid and economical preparation of a large number of small RNA sequencing libraries using dual indexing for multiplexing. Combined with the use of off-the-shelf reagents, more samples can be sequenced simultaneously on large-scale sequencing platforms at a considerably lower cost per sample. Sample preparation is simplified by pooling libraries prior to gel purification, which allows for the selection of a narrow size range while minimizing sample variation. A comparison with publicly available data from benchmarking of miRNA analysis platforms showed that this method captures absolute and differential expression as effectively as commercially available alternatives.
Oduru, Sreedhar; Campbell, Janee L; Karri, SriTulasi; Hendry, William J; Khan, Shafiq A; Williams, Simon C
2003-01-01
Background Complete genome annotation will likely be achieved through a combination of computer-based analysis of available genome sequences combined with direct experimental characterization of expressed regions of individual genomes. We have utilized a comparative genomics approach involving the sequencing of randomly selected hamster testis cDNAs to begin to identify genes not previously annotated on the human, mouse, rat and Fugu (pufferfish) genomes. Results 735 distinct sequences were analyzed for their relatedness to known sequences in public databases. Eight of these sequences were derived from previously unidentified genes and expression of these genes in testis was confirmed by Northern blotting. The genomic locations of each sequence were mapped in human, mouse, rat and pufferfish, where applicable, and the structure of their cognate genes was derived using computer-based predictions, genomic comparisons and analysis of uncharacterized cDNA sequences from human and macaque. Conclusion The use of a comparative genomics approach resulted in the identification of eight cDNAs that correspond to previously uncharacterized genes in the human genome. The proteins encoded by these genes included a new member of the kinesin superfamily, a SET/MYND-domain protein, and six proteins for which no specific function could be predicted. Each gene was expressed primarily in testis, suggesting that they may play roles in the development and/or function of testicular cells. PMID:12783626
Singh, Swati; Gupta, Sanchita; Mani, Ashutosh; Chaturvedi, Anoop
2012-01-01
Humulus lupulus is commonly known as hops, a member of the family moraceae. Currently many projects are underway leading to the accumulation of voluminous genomic and expressed sequence tag sequences in public databases. The genetically characterized domains in these databases are limited due to non-availability of reliable molecular markers. The large data of EST sequences are available in hops. The simple sequence repeat markers extracted from EST data are used as molecular markers for genetic characterization, in the present study. 25,495 EST sequences were examined and assembled to get full-length sequences. Maximum frequency distribution was shown by mononucleotide SSR motifs i.e. 60.44% in contig and 62.16% in singleton where as minimum frequency are observed for hexanucleotide SSR in contig (0.09%) and pentanucleotide SSR in singletons (0.12%). Maximum trinucleotide motifs code for Glutamic acid (GAA) while AT/TA were the most frequent repeat of dinucleotide SSRs. Flanking primer pairs were designed in-silico for the SSR containing sequences. Functional categorization of SSRs containing sequences was done through gene ontology terms like biological process, cellular component and molecular function. PMID:22368382
Huang, Xiaoyun; Zang, Xiaonan; Wu, Fei; Jin, Yuming; Wang, Haitao; Liu, Chang; Ding, Yating; He, Bangxiang; Xiao, Dongfang; Song, Xinwei; Liu, Zhu
2017-01-01
Gracilariopsis lemaneiformis (aka Gracilaria lemaneiformis) is a red macroalga rich in phycoerythrin, which can capture light efficiently and transfer it to photosystemⅡ. However, little is known about the synthesis of optically active phycoerythrinin in G. lemaneiformis at the molecular level. With the advent of high-throughput sequencing technology, analysis of genetic information for G. lemaneiformis by transcriptome sequencing is an effective means to get a deeper insight into the molecular mechanism of phycoerythrin synthesis. Illumina technology was employed to sequence the transcriptome of two strains of G. lemaneiformis- the wild type and a green-pigmented mutant. We obtained a total of 86915 assembled unigenes as a reference gene set, and 42884 unigenes were annotated in at least one public database. Taking the above transcriptome sequencing as a reference gene set, 4041 differentially expressed genes were screened to analyze and compare the gene expression profiles of the wild type and green mutant. By GO and KEGG pathway analysis, we concluded that three factors, including a reduction in the expression level of apo-phycoerythrin, an increase of chlorophyll light-harvesting complex synthesis, and reduction of phycoerythrobilin by competitive inhibition, caused the reduction of optically active phycoerythrin in the green-pigmented mutant.
Comparative transcriptome analysis of microsclerotia development in Nomuraea rileyi.
Song, Zhangyong; Yin, Youping; Jiang, Shasha; Liu, Juanjuan; Chen, Huan; Wang, Zhongkang
2013-06-19
Nomuraea rileyi is used as an environmental-friendly biopesticide. However, mass production and commercialization of this organism are limited due to its fastidious growth and sporulation requirements. When cultured in amended medium, we found that N. rileyi could produce microsclerotia bodies, replacing conidiophores as the infectious agent. However, little is known about the genes involved in microsclerotia development. In the present study, the transcriptomes were analyzed using next-generation sequencing technology to find the genes involved in microsclerotia development. A total of 4.69 Gb of clean nucleotides comprising 32,061 sequences was obtained, and 20,919 sequences were annotated (about 65%). Among the annotated sequences, only 5928 were annotated with 34 gene ontology (GO) functional categories, and 12,778 sequences were mapped to 165 pathways by searching against the Kyoto Encyclopedia of Genes and Genomes pathway (KEGG) database. Furthermore, we assessed the transcriptomic differences between cultures grown in minimal and amended medium. In total, 4808 sequences were found to be differentially expressed; 719 differentially expressed unigenes were assigned to 25 GO classes and 1888 differentially expressed unigenes were assigned to 161 KEGG pathways, including 25 enrichment pathways. Subsequently, we examined the up-regulation or uniquely expressed genes following amended medium treatment, which were also expressed on the enrichment pathway, and found that most of them participated in mediating oxidative stress homeostasis. To elucidate the role of oxidative stress in microsclerotia development, we analyzed the diversification of unigenes using quantitative reverse transcription-PCR (RT-qPCR). Our findings suggest that oxidative stress occurs during microsclerotia development, along with a broad metabolic activity change. Our data provide the most comprehensive sequence resource available for the study of N. rileyi. We believe that the transcriptome datasets will serve as an important public information platform to accelerate studies on N. rileyi microsclerotia.
Alkio, Merianne; Jonas, Uwe; Declercq, Myriam; Van Nocker, Steven; Knoche, Moritz
2014-01-01
The exocarp, or skin, of fleshy fruit is a specialized tissue that protects the fruit, attracts seed dispersing fruit eaters, and has large economical relevance for fruit quality. Development of the exocarp involves regulated activities of many genes. This research analyzed global gene expression in the exocarp of developing sweet cherry (Prunus avium L., ‘Regina’), a fruit crop species with little public genomic resources. A catalog of transcript models (contigs) representing expressed genes was constructed from de novo assembled short complementary DNA (cDNA) sequences generated from developing fruit between flowering and maturity at 14 time points. Expression levels in each sample were estimated for 34 695 contigs from numbers of reads mapping to each contig. Contigs were annotated functionally based on BLAST, gene ontology and InterProScan analyses. Coregulated genes were detected using partitional clustering of expression patterns. The results are discussed with emphasis on genes putatively involved in cuticle deposition, cell wall metabolism and sugar transport. The high temporal resolution of the expression patterns presented here reveals finely tuned developmental specialization of individual members of gene families. Moreover, the de novo assembled sweet cherry fruit transcriptome with 7760 full-length protein coding sequences and over 20 000 other, annotated cDNA sequences together with their developmental expression patterns is expected to accelerate molecular research on this important tree fruit crop. PMID:26504533
Brassica ASTRA: an integrated database for Brassica genomic research.
Love, Christopher G; Robinson, Andrew J; Lim, Geraldine A C; Hopkins, Clare J; Batley, Jacqueline; Barker, Gary; Spangenberg, German C; Edwards, David
2005-01-01
Brassica ASTRA is a public database for genomic information on Brassica species. The database incorporates expressed sequences with Swiss-Prot and GenBank comparative sequence annotation as well as secondary Gene Ontology (GO) annotation derived from the comparison with Arabidopsis TAIR GO annotations. Simple sequence repeat molecular markers are identified within resident sequences and mapped onto the closely related Arabidopsis genome sequence. Bacterial artificial chromosome (BAC) end sequences derived from the Multinational Brassica Genome Project are also mapped onto the Arabidopsis genome sequence enabling users to identify candidate Brassica BACs corresponding to syntenic regions of Arabidopsis. This information is maintained in a MySQL database with a web interface providing the primary means of interrogation. The database is accessible at http://hornbill.cspp.latrobe.edu.au.
A cloud-based workflow to quantify transcript-expression levels in public cancer compendia
Tatlow, PJ; Piccolo, Stephen R.
2016-01-01
Public compendia of sequencing data are now measured in petabytes. Accordingly, it is infeasible for researchers to transfer these data to local computers. Recently, the National Cancer Institute began exploring opportunities to work with molecular data in cloud-computing environments. With this approach, it becomes possible for scientists to take their tools to the data and thereby avoid large data transfers. It also becomes feasible to scale computing resources to the needs of a given analysis. We quantified transcript-expression levels for 12,307 RNA-Sequencing samples from the Cancer Cell Line Encyclopedia and The Cancer Genome Atlas. We used two cloud-based configurations and examined the performance and cost profiles of each configuration. Using preemptible virtual machines, we processed the samples for as little as $0.09 (USD) per sample. As the samples were processed, we collected performance metrics, which helped us track the duration of each processing step and quantified computational resources used at different stages of sample processing. Although the computational demands of reference alignment and expression quantification have decreased considerably, there remains a critical need for researchers to optimize preprocessing steps. We have stored the software, scripts, and processed data in a publicly accessible repository (https://osf.io/gqrz9). PMID:27982081
Monoallelic Gene Expression in Mammals.
Chess, Andrew
2016-11-23
Monoallelic expression not due to cis-regulatory sequence polymorphism poses an intriguing problem in epigenetics because it requires the unequal treatment of two segments of DNA that are present in the same nucleus and that can indeed have absolutely identical sequences. Here, I focus on a few recent developments in the field of monoallelic expression that are of particular interest and raise interesting questions for future work. One development is regarding analyses of imprinted genes, in which recent work suggests the possibility that intriguing networks of imprinted genes exist and are important for genetic and physiological studies. Another issue that has been raised in recent years by a number of publications is the question of how skewed allelic expression should be for it to be designated as monoallelic expression and, further, what methods are appropriate or inappropriate for analyzing genomic data to examine allele-specific expression. Perhaps the most exciting recent development in mammalian monoallelic expression is a clever and carefully executed analysis of genetic diversity of autosomal genes subject to random monoallelic expression (RMAE), which provides compelling evidence for distinct evolutionary forces acting on random monoallelically expressed genes.
Chae, Heejoon; Lee, Sangseon; Seo, Seokjun; Jung, Daekyoung; Chang, Hyeonsook; Nephew, Kenneth P; Kim, Sun
2016-12-01
Measuring gene expression, DNA sequence variation, and DNA methylation status is routinely done using high throughput sequencing technologies. To analyze such multi-omics data and explore relationships, reliable bioinformatics systems are much needed. Existing systems are either for exploring curated data or for processing omics data in the form of a library such as R. Thus scientists have much difficulty in investigating relationships among gene expression, DNA sequence variation, and DNA methylation using multi-omics data. In this study, we report a system called BioVLAB-mCpG-SNP-EXPRESS for the integrated analysis of DNA methylation, sequence variation (SNPs), and gene expression for distinguishing cellular phenotypes at the pairwise and multiple phenotype levels. The system can be deployed on either the Amazon cloud or a publicly available high-performance computing node, and the data analysis and exploration of the analysis result can be conveniently done using a web-based interface. In order to alleviate analysis complexity, all the process are fully automated, and graphical workflow system is integrated to represent real-time analysis progression. The BioVLAB-mCpG-SNP-EXPRESS system works in three stages. First, it processes and analyzes multi-omics data as input in the form of the raw data, i.e., FastQ files. Second, various integrated analyses such as methylation vs. gene expression and mutation vs. methylation are performed. Finally, the analysis result can be explored in a number of ways through a web interface for the multi-level, multi-perspective exploration. Multi-level interpretation can be done by either gene, gene set, pathway or network level and multi-perspective exploration can be explored from either gene expression, DNA methylation, sequence variation, or their relationship perspective. The utility of the system is demonstrated by performing analysis of phenotypically distinct 30 breast cancer cell line data set. BioVLAB-mCpG-SNP-EXPRESS is available at http://biohealth.snu.ac.kr/software/biovlab_mcpg_snp_express/. Copyright © 2016 Elsevier Inc. All rights reserved.
The promises and pitfalls of RNA-interference-based therapeutics
Castanotto, Daniela; Rossi, John J.
2009-01-01
The discovery that gene expression can be controlled by the Watson–Crick base-pairing of small RNAs with messenger RNAs containing complementary sequence — a process known as RNA interference — has markedly advanced our understanding of eukaryotic gene regulation and function. The ability of short RNA sequences to modulate gene expression has provided a powerful tool with which to study gene function and is set to revolutionize the treatment of disease. Remarkably, despite being just one decade from its discovery, the phenomenon is already being used therapeutically in human clinical trials, and biotechnology companies that focus on RNA-interference-based therapeutics are already publicly traded. PMID:19158789
Wheat EST resources for functional genomics of abiotic stress
Houde, Mario; Belcaid, Mahdi; Ouellet, François; Danyluk, Jean; Monroy, Antonio F; Dryanova, Ani; Gulick, Patrick; Bergeron, Anne; Laroche, André; Links, Matthew G; MacCarthy, Luke; Crosby, William L; Sarhan, Fathey
2006-01-01
Background Wheat is an excellent species to study freezing tolerance and other abiotic stresses. However, the sequence of the wheat genome has not been completely characterized due to its complexity and large size. To circumvent this obstacle and identify genes involved in cold acclimation and associated stresses, a large scale EST sequencing approach was undertaken by the Functional Genomics of Abiotic Stress (FGAS) project. Results We generated 73,521 quality-filtered ESTs from eleven cDNA libraries constructed from wheat plants exposed to various abiotic stresses and at different developmental stages. In addition, 196,041 ESTs for which tracefiles were available from the National Science Foundation wheat EST sequencing program and DuPont were also quality-filtered and used in the analysis. Clustering of the combined ESTs with d2_cluster and TGICL yielded a few large clusters containing several thousand ESTs that were refractory to routine clustering techniques. To resolve this problem, the sequence proximity and "bridges" were identified by an e-value distance graph to manually break clusters into smaller groups. Assembly of the resolved ESTs generated a 75,488 unique sequence set (31,580 contigs and 43,908 singletons/singlets). Digital expression analyses indicated that the FGAS dataset is enriched in stress-regulated genes compared to the other public datasets. Over 43% of the unique sequence set was annotated and classified into functional categories according to Gene Ontology. Conclusion We have annotated 29,556 different sequences, an almost 5-fold increase in annotated sequences compared to the available wheat public databases. Digital expression analysis combined with gene annotation helped in the identification of several pathways associated with abiotic stress. The genomic resources and knowledge developed by this project will contribute to a better understanding of the different mechanisms that govern stress tolerance in wheat and other cereals. PMID:16772040
Development of an Expressed Sequence Tag (EST) Resource for Wheat (Triticum aestivum L.)
Lazo, G. R.; Chao, S.; Hummel, D. D.; Edwards, H.; Crossman, C. C.; Lui, N.; Matthews, D. E.; Carollo, V. L.; Hane, D. L.; You, F. M.; Butler, G. E.; Miller, R. E.; Close, T. J.; Peng, J. H.; Lapitan, N. L. V.; Gustafson, J. P.; Qi, L. L.; Echalier, B.; Gill, B. S.; Dilbirligi, M.; Randhawa, H. S.; Gill, K. S.; Greene, R. A.; Sorrells, M. E.; Akhunov, E. D.; Dvořák, J.; Linkiewicz, A. M.; Dubcovsky, J.; Hossain, K. G.; Kalavacharla, V.; Kianian, S. F.; Mahmoud, A. A.; Miftahudin; Ma, X.-F.; Conley, E. J.; Anderson, J. A.; Pathan, M. S.; Nguyen, H. T.; McGuire, P. E.; Qualset, C. O.; Anderson, O. D.
2004-01-01
This report describes the rationale, approaches, organization, and resource development leading to a large-scale deletion bin map of the hexaploid (2n = 6x = 42) wheat genome (Triticum aestivum L.). Accompanying reports in this issue detail results from chromosome bin-mapping of expressed sequence tags (ESTs) representing genes onto the seven homoeologous chromosome groups and a global analysis of the entire mapped wheat EST data set. Among the resources developed were the first extensive public wheat EST collection (113,220 ESTs). Described are protocols for sequencing, sequence processing, EST nomenclature, and the assembly of ESTs into contigs. These contigs plus singletons (unassembled ESTs) were used for selection of distinct sequence motif unigenes. Selected ESTs were rearrayed, validated by 5′ and 3′ sequencing, and amplified for probing a series of wheat aneuploid and deletion stocks. Images and data for all Southern hybridizations were deposited in databases and were used by the coordinators for each of the seven homoeologous chromosome groups to validate the mapping results. Results from this project have established the foundation for future developments in wheat genomics. PMID:15514037
Trapnell, Cole; Roberts, Adam; Goff, Loyal; Pertea, Geo; Kim, Daehwan; Kelley, David R; Pimentel, Harold; Salzberg, Steven L; Rinn, John L; Pachter, Lior
2012-01-01
Recent advances in high-throughput cDNA sequencing (RNA-seq) can reveal new genes and splice variants and quantify expression genome-wide in a single assay. The volume and complexity of data from RNA-seq experiments necessitate scalable, fast and mathematically principled analysis software. TopHat and Cufflinks are free, open-source software tools for gene discovery and comprehensive expression analysis of high-throughput mRNA sequencing (RNA-seq) data. Together, they allow biologists to identify new genes and new splice variants of known ones, as well as compare gene and transcript expression under two or more conditions. This protocol describes in detail how to use TopHat and Cufflinks to perform such analyses. It also covers several accessory tools and utilities that aid in managing data, including CummeRbund, a tool for visualizing RNA-seq analysis results. Although the procedure assumes basic informatics skills, these tools assume little to no background with RNA-seq analysis and are meant for novices and experts alike. The protocol begins with raw sequencing reads and produces a transcriptome assembly, lists of differentially expressed and regulated genes and transcripts, and publication-quality visualizations of analysis results. The protocol's execution time depends on the volume of transcriptome sequencing data and available computing resources but takes less than 1 d of computer time for typical experiments and ~1 h of hands-on time. PMID:22383036
A dehydration-inducible gene in the truffle Tuber borchii identifies a novel group of dehydrins
Abba', Simona; Ghignone, Stefano; Bonfante, Paola
2006-01-01
Background The expressed sequence tag M6G10 was originally isolated from a screening for differentially expressed transcripts during the reproductive stage of the white truffle Tuber borchii. mRNA levels for M6G10 increased dramatically during fruiting body maturation compared to the vegetative mycelial stage. Results Bioinformatics tools, phylogenetic analysis and expression studies were used to support the hypothesis that this sequence, named TbDHN1, is the first dehydrin (DHN)-like coding gene isolated in fungi. Homologs of this gene, all defined as "coding for hypothetical proteins" in public databases, were exclusively found in ascomycetous fungi and in plants. Although complete (or almost complete) fungal genomes and EST collections of some Basidiomycota and Glomeromycota are already available, DHN-like proteins appear to be represented only in Ascomycota. A new and previously uncharacterized conserved signature pattern was identified and proposed to Uniprot database as the main distinguishing feature of this new group of DHNs. Expression studies provide experimental evidence of a transcript induction of TbDHN1 during cellular dehydration. Conclusion Expression pattern and sequence similarities to known plant DHNs indicate that TbDHN1 is the first characterized DHN-like protein in fungi. The high similarity of TbDHN1 with homolog coding sequences implies the existence of a novel fungal/plant group of LEA Class II proteins characterized by a previously undescribed signature pattern. PMID:16512918
Gene expression distribution deconvolution in single-cell RNA sequencing.
Wang, Jingshu; Huang, Mo; Torre, Eduardo; Dueck, Hannah; Shaffer, Sydney; Murray, John; Raj, Arjun; Li, Mingyao; Zhang, Nancy R
2018-06-26
Single-cell RNA sequencing (scRNA-seq) enables the quantification of each gene's expression distribution across cells, thus allowing the assessment of the dispersion, nonzero fraction, and other aspects of its distribution beyond the mean. These statistical characterizations of the gene expression distribution are critical for understanding expression variation and for selecting marker genes for population heterogeneity. However, scRNA-seq data are noisy, with each cell typically sequenced at low coverage, thus making it difficult to infer properties of the gene expression distribution from raw counts. Based on a reexamination of nine public datasets, we propose a simple technical noise model for scRNA-seq data with unique molecular identifiers (UMI). We develop deconvolution of single-cell expression distribution (DESCEND), a method that deconvolves the true cross-cell gene expression distribution from observed scRNA-seq counts, leading to improved estimates of properties of the distribution such as dispersion and nonzero fraction. DESCEND can adjust for cell-level covariates such as cell size, cell cycle, and batch effects. DESCEND's noise model and estimation accuracy are further evaluated through comparisons to RNA FISH data, through data splitting and simulations and through its effectiveness in removing known batch effects. We demonstrate how DESCEND can clarify and improve downstream analyses such as finding differentially expressed genes, identifying cell types, and selecting differentiation markers. Copyright © 2018 the Author(s). Published by PNAS.
Ohyanagi, Hajime; Takano, Tomoyuki; Terashima, Shin; Kobayashi, Masaaki; Kanno, Maasa; Morimoto, Kyoko; Kanegae, Hiromi; Sasaki, Yohei; Saito, Misa; Asano, Satomi; Ozaki, Soichi; Kudo, Toru; Yokoyama, Koji; Aya, Koichiro; Suwabe, Keita; Suzuki, Go; Aoki, Koh; Kubo, Yasutaka; Watanabe, Masao; Matsuoka, Makoto; Yano, Kentaro
2015-01-01
Comprehensive integration of large-scale omics resources such as genomes, transcriptomes and metabolomes will provide deeper insights into broader aspects of molecular biology. For better understanding of plant biology, we aim to construct a next-generation sequencing (NGS)-derived gene expression network (GEN) repository for a broad range of plant species. So far we have incorporated information about 745 high-quality mRNA sequencing (mRNA-Seq) samples from eight plant species (Arabidopsis thaliana, Oryza sativa, Solanum lycopersicum, Sorghum bicolor, Vitis vinifera, Solanum tuberosum, Medicago truncatula and Glycine max) from the public short read archive, digitally profiled the entire set of gene expression profiles, and drawn GENs by using correspondence analysis (CA) to take advantage of gene expression similarities. In order to understand the evolutionary significance of the GENs from multiple species, they were linked according to the orthology of each node (gene) among species. In addition to other gene expression information, functional annotation of the genes will facilitate biological comprehension. Currently we are improving the given gene annotations with natural language processing (NLP) techniques and manual curation. Here we introduce the current status of our analyses and the web database, PODC (Plant Omics Data Center; http://bioinf.mind.meiji.ac.jp/podc/), now open to the public, providing GENs, functional annotations and additional comprehensive omics resources. PMID:25505034
Cárdenas, Leyla; Sánchez, Roland; Gomez, Daniela; Fuenzalida, Gonzalo; Gallardo-Escárate, Cristián; Tanguy, Arnaud
2011-09-01
The marine gastropod Concholepas concholepas, locally known as the "loco", is the main target species of the benthonic Chilean fisheries. Genetic and genomic tools are necessary to study the genome of this species in order to understand the molecular basis of its development, growth, and other key traits to improve the management strategies and to identify local adaptation to prevent loss of biodiversity. Here, we use pyrosequencing technologies to generate the first transcriptomic database from adult specimens of the loco. After trimming, a total of 140,756 Expressed Sequence Tag sequences were achieved. Clustering and assembly analysis identified 19,219 contigs and 105,435 singleton sequences. BlastN analysis showed a significant identity with Expressed Sequence Tags of different gastropod species available in public databases. Similarly, BlastX results showed that only 895 out of the total 124,654 had significant hits and may represent novel genes for marine gastropods. From this database, simple sequence repeat motifs were also identified and a total of 38 primer pairs were designed and tested to assess their potential as informative markers and to investigate their cross-species amplification in different related gastropod species. This dataset represents the first publicly available 454 data for a marine gastropod endemic to the southeastern Pacific coast, providing a valuable transcriptomic resource for future efforts of gene discovery and development of functional markers in other marine gastropods. Copyright © 2011 Elsevier B.V. All rights reserved.
Pirooznia, Mehdi; Gong, Ping; Guan, Xin; Inouye, Laura S; Yang, Kuan; Perkins, Edward J; Deng, Youping
2007-01-01
Background Eisenia fetida, commonly known as red wiggler or compost worm, belongs to the Lumbricidae family of the Annelida phylum. Little is known about its genome sequence although it has been extensively used as a test organism in terrestrial ecotoxicology. In order to understand its gene expression response to environmental contaminants, we cloned 4032 cDNAs or expressed sequence tags (ESTs) from two E. fetida libraries enriched with genes responsive to ten ordnance related compounds using suppressive subtractive hybridization-PCR. Results A total of 3144 good quality ESTs (GenBank dbEST accession number EH669363–EH672369 and EL515444–EL515580) were obtained from the raw clone sequences after cleaning. Clustering analysis yielded 2231 unique sequences including 448 contigs (from 1361 ESTs) and 1783 singletons. Comparative genomic analysis showed that 743 or 33% of the unique sequences shared high similarity with existing genes in the GenBank nr database. Provisional function annotation assigned 830 Gene Ontology terms to 517 unique sequences based on their homology with the annotated genomes of four model organisms Drosophila melanogaster, Mus musculus, Saccharomyces cerevisiae, and Caenorhabditis elegans. Seven percent of the unique sequences were further mapped to 99 Kyoto Encyclopedia of Genes and Genomes pathways based on their matching Enzyme Commission numbers. All the information is stored and retrievable at a highly performed, web-based and user-friendly relational database called EST model database or ESTMD version 2. Conclusion The ESTMD containing the sequence and annotation information of 4032 E. fetida ESTs is publicly accessible at . PMID:18047730
Model-Based Optimal Experimental Design for Complex Physical Systems
2015-12-03
for public release. magnitude reduction in estimator error required to make solving the exact optimal design problem tractable. Instead of using a naive...for designing a sequence of experiments uses suboptimal approaches: batch design that has no feedback, or greedy ( myopic ) design that optimally...approved for public release. Equation 1 is difficult to solve directly, but can be expressed in an equivalent form using the principle of dynamic programming
2011-01-01
Background Abiotic stresses, such as water deficit and soil salinity, result in changes in physiology, nutrient use, and vegetative growth in vines, and ultimately, yield and flavor in berries of wine grape, Vitis vinifera L. Large-scale expressed sequence tags (ESTs) were generated, curated, and analyzed to identify major genetic determinants responsible for stress-adaptive responses. Although roots serve as the first site of perception and/or injury for many types of abiotic stress, EST sequencing in root tissues of wine grape exposed to abiotic stresses has been extremely limited to date. To overcome this limitation, large-scale EST sequencing was conducted from root tissues exposed to multiple abiotic stresses. Results A total of 62,236 expressed sequence tags (ESTs) were generated from leaf, berry, and root tissues from vines subjected to abiotic stresses and compared with 32,286 ESTs sequenced from 20 public cDNA libraries. Curation to correct annotation errors, clustering and assembly of the berry and leaf ESTs with currently available V. vinifera full-length transcripts and ESTs yielded a total of 13,278 unique sequences, with 2302 singletons and 10,976 mapped to V. vinifera gene models. Of these, 739 transcripts were found to have significant differential expression in stressed leaves and berries including 250 genes not described previously as being abiotic stress responsive. In a second analysis of 16,452 ESTs from a normalized root cDNA library derived from roots exposed to multiple, short-term, abiotic stresses, 135 genes with root-enriched expression patterns were identified on the basis of their relative EST abundance in roots relative to other tissues. Conclusions The large-scale analysis of relative EST frequency counts among a diverse collection of 23 different cDNA libraries from leaf, berry, and root tissues of wine grape exposed to a variety of abiotic stress conditions revealed distinct, tissue-specific expression patterns, previously unrecognized stress-induced genes, and many novel genes with root-enriched mRNA expression for improving our understanding of root biology and manipulation of rootstock traits in wine grape. mRNA abundance estimates based on EST library-enriched expression patterns showed only modest correlations between microarray and quantitative, real-time reverse transcription-polymerase chain reaction (qRT-PCR) methods highlighting the need for deep-sequencing expression profiling methods. PMID:21592389
Metagenomic Taxonomy-Guided Database-Searching Strategy for Improving Metaproteomic Analysis.
Xiao, Jinqiu; Tanca, Alessandro; Jia, Ben; Yang, Runqing; Wang, Bo; Zhang, Yu; Li, Jing
2018-04-06
Metaproteomics provides a direct measure of the functional information by investigating all proteins expressed by a microbiota. However, due to the complexity and heterogeneity of microbial communities, it is very hard to construct a sequence database suitable for a metaproteomic study. Using a public database, researchers might not be able to identify proteins from poorly characterized microbial species, while a sequencing-based metagenomic database may not provide adequate coverage for all potentially expressed protein sequences. To address this challenge, we propose a metagenomic taxonomy-guided database-search strategy (MT), in which a merged database is employed, consisting of both taxonomy-guided reference protein sequences from public databases and proteins from metagenome assembly. By applying our MT strategy to a mock microbial mixture, about two times as many peptides were detected as with the metagenomic database only. According to the evaluation of the reliability of taxonomic attribution, the rate of misassignments was comparable to that obtained using an a priori matched database. We also evaluated the MT strategy with a human gut microbial sample, and we found 1.7 times as many peptides as using a standard metagenomic database. In conclusion, our MT strategy allows the construction of databases able to provide high sensitivity and precision in peptide identification in metaproteomic studies, enabling the detection of proteins from poorly characterized species within the microbiota.
2010-01-01
Background Little genomic or trancriptomic information on Ganoderma lucidum (Lingzhi) is known. This study aims to discover the transcripts involved in secondary metabolite biosynthesis and developmental regulation of G. lucidum using an expressed sequence tag (EST) library. Methods A cDNA library was constructed from the G. lucidum fruiting body. Its high-quality ESTs were assembled into unique sequences with contigs and singletons. The unique sequences were annotated according to sequence similarities to genes or proteins available in public databases. The detection of simple sequence repeats (SSRs) was preformed by online analysis. Results A total of 1,023 clones were randomly selected from the G. lucidum library and sequenced, yielding 879 high-quality ESTs. These ESTs showed similarities to a diverse range of genes. The sequences encoding squalene epoxidase (SE) and farnesyl-diphosphate synthase (FPS) were identified in this EST collection. Several candidate genes, such as hydrophobin, MOB2, profilin and PHO84 were detected for the first time in G. lucidum. Thirteen (13) potential SSR-motif microsatellite loci were also identified. Conclusion The present study demonstrates a successful application of EST analysis in the discovery of transcripts involved in the secondary metabolite biosynthesis and the developmental regulation of G. lucidum. PMID:20230644
Comparative transcriptome analysis of microsclerotia development in Nomuraea rileyi
2013-01-01
Background Nomuraea rileyi is used as an environmental-friendly biopesticide. However, mass production and commercialization of this organism are limited due to its fastidious growth and sporulation requirements. When cultured in amended medium, we found that N. rileyi could produce microsclerotia bodies, replacing conidiophores as the infectious agent. However, little is known about the genes involved in microsclerotia development. In the present study, the transcriptomes were analyzed using next-generation sequencing technology to find the genes involved in microsclerotia development. Results A total of 4.69 Gb of clean nucleotides comprising 32,061 sequences was obtained, and 20,919 sequences were annotated (about 65%). Among the annotated sequences, only 5928 were annotated with 34 gene ontology (GO) functional categories, and 12,778 sequences were mapped to 165 pathways by searching against the Kyoto Encyclopedia of Genes and Genomes pathway (KEGG) database. Furthermore, we assessed the transcriptomic differences between cultures grown in minimal and amended medium. In total, 4808 sequences were found to be differentially expressed; 719 differentially expressed unigenes were assigned to 25 GO classes and 1888 differentially expressed unigenes were assigned to 161 KEGG pathways, including 25 enrichment pathways. Subsequently, we examined the up-regulation or uniquely expressed genes following amended medium treatment, which were also expressed on the enrichment pathway, and found that most of them participated in mediating oxidative stress homeostasis. To elucidate the role of oxidative stress in microsclerotia development, we analyzed the diversification of unigenes using quantitative reverse transcription-PCR (RT-qPCR). Conclusion Our findings suggest that oxidative stress occurs during microsclerotia development, along with a broad metabolic activity change. Our data provide the most comprehensive sequence resource available for the study of N. rileyi. We believe that the transcriptome datasets will serve as an important public information platform to accelerate studies on N. rileyi microsclerotia. PMID:23777366
Nakamura, Sayaka; Sato, Hiroaki; Tanaka, Reiko; Yaguchi, Takashi
2016-01-01
We have previously proposed a rapid identification method for bacterial strains based on the profiles of their ribosomal subunit proteins (RSPs), observed using matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS). This method can perform phylogenetic characterization based on the mass of housekeeping RSP biomarkers, ideally calculated from amino acid sequence information registered in public protein databases. With the aim of extending its field of application to medical mycology, this study investigates the actual state of information of RSPs of eukaryotic fungi registered in public protein databases through the characterization of ribosomal protein fractions extracted from genome-sequenced Aspergillus fumigatus strains Af293 and A1163 as a model. In this process, we have found that the public protein databases harbor problems. The RSP names are in confusion, so we have provisionally unified them using the yeast naming system. The most serious problem is that many incorrect sequences are registered in the public protein databases. Surprisingly, more than half of the sequences are incorrect, due chiefly to mis-annotation of exon/intron structures. These errors could be corrected by a combination of in silico inspection by sequence homology analysis and MALDI-TOF MS measurements. We were also able to confirm conserved post-translational modifications in eleven RSPs. After these verifications, the masses of 31 expressed RSPs under 20,000 Da could be accurately confirmed. These RSPs have a potential to be useful biomarkers for identifying clinical isolates of A. fumigatus .
Zhou, Ren-Bin; Lu, Hui-Meng; Liu, Jie; Shi, Jian-Yu; Zhu, Jing; Lu, Qin-Qin; Yin, Da-Chuan
2016-01-01
Recombinant expression of proteins has become an indispensable tool in modern day research. The large yields of recombinantly expressed proteins accelerate the structural and functional characterization of proteins. Nevertheless, there are literature reported that the recombinant proteins show some differences in structure and function as compared with the native ones. Now there have been more than 100,000 structures (from both recombinant and native sources) publicly available in the Protein Data Bank (PDB) archive, which makes it possible to investigate if there exist any proteins in the RCSB PDB archive that have identical sequence but have some difference in structures. In this paper, we present the results of a systematic comparative study of the 3D structures of identical naturally purified versus recombinantly expressed proteins. The structural data and sequence information of the proteins were mined from the RCSB PDB archive. The combinatorial extension (CE), FATCAT-flexible and TM-Align methods were employed to align the protein structures. The root-mean-square distance (RMSD), TM-score, P-value, Z-score, secondary structural elements and hydrogen bonds were used to assess the structure similarity. A thorough analysis of the PDB archive generated five-hundred-seventeen pairs of native and recombinant proteins that have identical sequence. There were no pairs of proteins that had the same sequence and significantly different structural fold, which support the hypothesis that expression in a heterologous host usually could fold correctly into their native forms.
Zhou, Ren-Bin; Lu, Hui-Meng; Liu, Jie; Shi, Jian-Yu; Zhu, Jing; Lu, Qin-Qin; Yin, Da-Chuan
2016-01-01
Recombinant expression of proteins has become an indispensable tool in modern day research. The large yields of recombinantly expressed proteins accelerate the structural and functional characterization of proteins. Nevertheless, there are literature reported that the recombinant proteins show some differences in structure and function as compared with the native ones. Now there have been more than 100,000 structures (from both recombinant and native sources) publicly available in the Protein Data Bank (PDB) archive, which makes it possible to investigate if there exist any proteins in the RCSB PDB archive that have identical sequence but have some difference in structures. In this paper, we present the results of a systematic comparative study of the 3D structures of identical naturally purified versus recombinantly expressed proteins. The structural data and sequence information of the proteins were mined from the RCSB PDB archive. The combinatorial extension (CE), FATCAT-flexible and TM-Align methods were employed to align the protein structures. The root-mean-square distance (RMSD), TM-score, P-value, Z-score, secondary structural elements and hydrogen bonds were used to assess the structure similarity. A thorough analysis of the PDB archive generated five-hundred-seventeen pairs of native and recombinant proteins that have identical sequence. There were no pairs of proteins that had the same sequence and significantly different structural fold, which support the hypothesis that expression in a heterologous host usually could fold correctly into their native forms. PMID:27517583
García-Sastre, Adolfo; Palese, Peter
2005-01-01
In public databases, we identified sequences reported as human genes expressed in kidney mesangial cells. The similarity of these genes to paramyxovirus matrix, fusion, and phosphoprotein genes suggests that they are derived from a novel paramyxovirus. These genes are sufficiently unique to suggest the existence of a novel paramyxovirus genus. PMID:15705331
Knoll-Gellida, Anja; André, Michèle; Gattegno, Tamar; Forgue, Jean; Admon, Arie; Babin, Patrick J
2006-01-01
Background The ability of an oocyte to develop into a viable embryo depends on the accumulation of specific maternal information and molecules, such as RNAs and proteins. A serial analysis of gene expression (SAGE) was carried out in parallel with proteomic analysis on fully-grown ovarian follicles from zebrafish (Danio rerio). The data obtained were compared with ovary/follicle/egg molecular phenotypes of other animals, published or available in public sequence databases. Results Sequencing of 27,486 SAGE tags identified 11,399 different ones, including 3,329 tags with an occurrence superior to one. Fifty-eight genes were expressed at over 0.15% of the total population and represented 17.34% of the mRNA population identified. The three most expressed transcripts were a rhamnose-binding lectin, beta-actin 2, and a transcribed locus similar to the H2B histone family. Comparison with the large-scale expressed sequence tags sequencing approach revealed highly expressed transcripts that were not previously known to be expressed at high levels in fish ovaries, like the short-sized polarized metallothionein 2 transcript. A higher sensitivity for the detection of transcripts with a characterized maternal genetic contribution was also demonstrated compared to large-scale sequencing of cDNA libraries. Ferritin heavy polypeptide 1, heat shock protein 90-beta, lactate dehydrogenase B4, beta-actin isoforms, tubulin beta 2, ATP synthase subunit 9, together with 40 S ribosomal protein S27a, were common highly-expressed transcripts of vertebrate ovary/unfertilized egg. Comparison of transcriptome and proteome data revealed that transcript levels provide little predictive value with respect to the extent of protein abundance. All the proteins identified by proteomic analysis of fully-grown zebrafish follicles had at least one transcript counterpart, with two exceptions: eosinophil chemotactic cytokine and nothepsin. Conclusion This study provides a complete sequence data set of maternal mRNA stored in zebrafish germ cells at the end of oogenesis. This catalogue contains highly-expressed transcripts that are part of a vertebrate ovarian expressed gene signature. Comparison of transcriptome and proteome data identified downregulated transcripts or proteins potentially incorporated in the oocyte by endocytosis. The molecular phenotype described provides groundwork for future experimental approaches aimed at identifying functionally important stored maternal transcripts and proteins involved in oogenesis and early stages of embryo development. PMID:16526958
Kalra, Shikha; Puniya, Bhanwar Lal; Kulshreshtha, Deepika; Kumar, Sunil; Kaur, Jagdeep; Ramachandran, Srinivasan; Singh, Kashmir
2013-01-01
Chlorophytum borivilianum, an endangered medicinal plant species is highly recognized for its aphrodisiac properties provided by saponins present in the plant. The transcriptome information of this species is limited and only few hundred expressed sequence tags (ESTs) are available in the public databases. To gain molecular insight of this plant, high throughput transcriptome sequencing of leaf RNA was carried out using Illumina's HiSeq 2000 sequencing platform. A total of 22,161,444 single end reads were retrieved after quality filtering. Available (e.g., De-Bruijn/Eulerian graph) and in-house developed bioinformatics tools were used for assembly and annotation of transcriptome. A total of 101,141 assembled transcripts were obtained, with coverage size of 22.42 Mb and average length of 221 bp. Guanine-cytosine (GC) content was found to be 44%. Bioinformatics analysis, using non-redundant proteins, gene ontology (GO), enzyme commission (EC) and kyoto encyclopedia of genes and genomes (KEGG) databases, extracted all the known enzymes involved in saponin and flavonoid biosynthesis. Few genes of the alkaloid biosynthesis, along with anticancer and plant defense genes, were also discovered. Additionally, several cytochrome P450 (CYP450) and glycosyltransferase unique sequences were also found. We identified simple sequence repeat motifs in transcripts with an abundance of di-nucleotide simple sequence repeat (SSR; 43.1%) markers. Large scale expression profiling through Reads per Kilobase per Million mapped reads (RPKM) showed major genes involved in different metabolic pathways of the plant. Genes, expressed sequence tags (ESTs) and unique sequences from this study provide an important resource for the scientific community, interested in the molecular genetics and functional genomics of C. borivilianum. PMID:24376689
Kalra, Shikha; Puniya, Bhanwar Lal; Kulshreshtha, Deepika; Kumar, Sunil; Kaur, Jagdeep; Ramachandran, Srinivasan; Singh, Kashmir
2013-01-01
Chlorophytum borivilianum, an endangered medicinal plant species is highly recognized for its aphrodisiac properties provided by saponins present in the plant. The transcriptome information of this species is limited and only few hundred expressed sequence tags (ESTs) are available in the public databases. To gain molecular insight of this plant, high throughput transcriptome sequencing of leaf RNA was carried out using Illumina's HiSeq 2000 sequencing platform. A total of 22,161,444 single end reads were retrieved after quality filtering. Available (e.g., De-Bruijn/Eulerian graph) and in-house developed bioinformatics tools were used for assembly and annotation of transcriptome. A total of 101,141 assembled transcripts were obtained, with coverage size of 22.42 Mb and average length of 221 bp. Guanine-cytosine (GC) content was found to be 44%. Bioinformatics analysis, using non-redundant proteins, gene ontology (GO), enzyme commission (EC) and kyoto encyclopedia of genes and genomes (KEGG) databases, extracted all the known enzymes involved in saponin and flavonoid biosynthesis. Few genes of the alkaloid biosynthesis, along with anticancer and plant defense genes, were also discovered. Additionally, several cytochrome P450 (CYP450) and glycosyltransferase unique sequences were also found. We identified simple sequence repeat motifs in transcripts with an abundance of di-nucleotide simple sequence repeat (SSR; 43.1%) markers. Large scale expression profiling through Reads per Kilobase per Million mapped reads (RPKM) showed major genes involved in different metabolic pathways of the plant. Genes, expressed sequence tags (ESTs) and unique sequences from this study provide an important resource for the scientific community, interested in the molecular genetics and functional genomics of C. borivilianum.
Gorodkin, Jan; Cirera, Susanna; Hedegaard, Jakob; Gilchrist, Michael J; Panitz, Frank; Jørgensen, Claus; Scheibye-Knudsen, Karsten; Arvin, Troels; Lumholdt, Steen; Sawera, Milena; Green, Trine; Nielsen, Bente J; Havgaard, Jakob H; Rosenkilde, Carina; Wang, Jun; Li, Heng; Li, Ruiqiang; Liu, Bin; Hu, Songnian; Dong, Wei; Li, Wei; Yu, Jun; Wang, Jian; Stærfeldt, Hans-Henrik; Wernersson, Rasmus; Madsen, Lone B; Thomsen, Bo; Hornshøj, Henrik; Bujie, Zhan; Wang, Xuegang; Wang, Xuefei; Bolund, Lars; Brunak, Søren; Yang, Huanming; Bendixen, Christian; Fredholm, Merete
2007-01-01
Background Knowledge of the structure of gene expression is essential for mammalian transcriptomics research. We analyzed a collection of more than one million porcine expressed sequence tags (ESTs), of which two-thirds were generated in the Sino-Danish Pig Genome Project and one-third are from public databases. The Sino-Danish ESTs were generated from one normalized and 97 non-normalized cDNA libraries representing 35 different tissues and three developmental stages. Results Using the Distiller package, the ESTs were assembled to roughly 48,000 contigs and 73,000 singletons, of which approximately 25% have a high confidence match to UniProt. Approximately 6,000 new porcine gene clusters were identified. Expression analysis based on the non-normalized libraries resulted in the following findings. The distribution of cluster sizes is scaling invariant. Brain and testes are among the tissues with the greatest number of different expressed genes, whereas tissues with more specialized function, such as developing liver, have fewer expressed genes. There are at least 65 high confidence housekeeping gene candidates and 876 cDNA library-specific gene candidates. We identified differential expression of genes between different tissues, in particular brain/spinal cord, and found patterns of correlation between genes that share expression in pairs of libraries. Finally, there was remarkable agreement in expression between specialized tissues according to Gene Ontology categories. Conclusion This EST collection, the largest to date in pig, represents an essential resource for annotation, comparative genomics, assembly of the pig genome sequence, and further porcine transcription studies. PMID:17407547
Ohyanagi, Hajime; Takano, Tomoyuki; Terashima, Shin; Kobayashi, Masaaki; Kanno, Maasa; Morimoto, Kyoko; Kanegae, Hiromi; Sasaki, Yohei; Saito, Misa; Asano, Satomi; Ozaki, Soichi; Kudo, Toru; Yokoyama, Koji; Aya, Koichiro; Suwabe, Keita; Suzuki, Go; Aoki, Koh; Kubo, Yasutaka; Watanabe, Masao; Matsuoka, Makoto; Yano, Kentaro
2015-01-01
Comprehensive integration of large-scale omics resources such as genomes, transcriptomes and metabolomes will provide deeper insights into broader aspects of molecular biology. For better understanding of plant biology, we aim to construct a next-generation sequencing (NGS)-derived gene expression network (GEN) repository for a broad range of plant species. So far we have incorporated information about 745 high-quality mRNA sequencing (mRNA-Seq) samples from eight plant species (Arabidopsis thaliana, Oryza sativa, Solanum lycopersicum, Sorghum bicolor, Vitis vinifera, Solanum tuberosum, Medicago truncatula and Glycine max) from the public short read archive, digitally profiled the entire set of gene expression profiles, and drawn GENs by using correspondence analysis (CA) to take advantage of gene expression similarities. In order to understand the evolutionary significance of the GENs from multiple species, they were linked according to the orthology of each node (gene) among species. In addition to other gene expression information, functional annotation of the genes will facilitate biological comprehension. Currently we are improving the given gene annotations with natural language processing (NLP) techniques and manual curation. Here we introduce the current status of our analyses and the web database, PODC (Plant Omics Data Center; http://bioinf.mind.meiji.ac.jp/podc/), now open to the public, providing GENs, functional annotations and additional comprehensive omics resources. © The Author 2014. Published by Oxford University Press on behalf of Japanese Society of Plant Physiologists.
De Pittà, Cristiano; Bertolucci, Cristiano; Mazzotta, Gabriella M; Bernante, Filippo; Rizzo, Giorgia; De Nardi, Barbara; Pallavicini, Alberto; Lanfranchi, Gerolamo; Costa, Rodolfo
2008-01-01
Background Little is known about the genome sequences of Euphausiacea (krill) although these crustaceans are abundant components of the pelagic ecosystems in all oceans and used for aquaculture and pharmaceutical industry. This study reports the results of an expressed sequence tag (EST) sequencing project from different tissues of Euphausia superba (the Antarctic krill). Results We have constructed and sequenced five cDNA libraries from different Antarctic krill tissues: head, abdomen, thoracopods and photophores. We have identified 1.770 high-quality ESTs which were assembled into 216 overlapping clusters and 801 singletons resulting in a total of 1.017 non-redundant sequences. Quantitative RT-PCR analysis was performed to quantify and validate the expression levels of ten genes presenting different EST countings in krill tissues. In addition, bioinformatic screening of the non-redundant E. superba sequences identified 69 microsatellite containing ESTs. Clusters, consensuses and related similarity and gene ontology searches were organized in a dedicated E. superba database . Conclusion We defined the first tissue transcriptional signatures of E. superba based on functional categorization among the examined tissues. The analyses of annotated transcripts showed a higher similarity with genes from insects with respect to Malacostraca possibly as an effect of the limited number of Malacostraca sequences in the public databases. Our catalogue provides for the first time a genomic tool to investigate the biology of the Antarctic krill. PMID:18226200
Yang, Aifu; Zhou, Zunchun; Pan, Yongjia; Jiang, Jingwei; Dong, Ying; Guan, Xiaoyan; Sun, Hongjuan; Gao, Shan; Chen, Zhong
2016-06-14
Sea cucumber Apostichopus japonicus is an important economic species in China, which is affected by various diseases; skin ulceration syndrome (SUS) is the most serious. In this study, we characterized the transcriptomes in A. japonicus challenged with Vibrio splendidus to elucidate the changes in gene expression throughout the three stages of SUS progression. RNA sequencing of 21 cDNA libraries from various tissues and developmental stages of SUS-affected A. japonicus yielded 553 million raw reads, of which 542 million high-quality reads were generated by deep-sequencing using the Illumina HiSeq™ 2000 platform. The reference transcriptome comprised a combination of the Illumina reads, 454 sequencing data and Sanger sequences obtained from the public database to generate 93,163 unigenes (average length, 1,052 bp; N50 = 1,575 bp); 33,860 were annotated. Transcriptome comparisons between healthy and SUS-affected A. japonicus revealed greater differences in gene expression profiles in the body walls (BW) than in the intestines (Int), respiratory trees (RT) and coelomocytes (C). Clustering of expression models revealed stable up-regulation as the main pattern occurring in the BW throughout the three stages of SUS progression. Significantly affected pathways were associated with signal transduction, immune system, cellular processes, development and metabolism. Ninety-two differentially expressed genes (DEGs) were divided into four functional categories: attachment/pathogen recognition (17), inflammatory reactions (38), oxidative stress response (7) and apoptosis (30). Using quantitative real-time PCR, twenty representative DEGs were selected to validate the sequencing results. The Pearson's correlation coefficient (R) of the 20 DEGs ranged from 0.811 to 0.999, which confirmed the consistency and accuracy between these two approaches. Dynamic changes in global gene expression occur during SUS progression in A. japonicus. Elucidation of these changes is important in clarifying the molecular mechanisms associated with the development of SUS in sea cucumber.
Nakamura, Sayaka; Sato, Hiroaki; Tanaka, Reiko; Yaguchi, Takashi
2016-01-01
We have previously proposed a rapid identification method for bacterial strains based on the profiles of their ribosomal subunit proteins (RSPs), observed using matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS). This method can perform phylogenetic characterization based on the mass of housekeeping RSP biomarkers, ideally calculated from amino acid sequence information registered in public protein databases. With the aim of extending its field of application to medical mycology, this study investigates the actual state of information of RSPs of eukaryotic fungi registered in public protein databases through the characterization of ribosomal protein fractions extracted from genome-sequenced Aspergillus fumigatus strains Af293 and A1163 as a model. In this process, we have found that the public protein databases harbor problems. The RSP names are in confusion, so we have provisionally unified them using the yeast naming system. The most serious problem is that many incorrect sequences are registered in the public protein databases. Surprisingly, more than half of the sequences are incorrect, due chiefly to mis-annotation of exon/intron structures. These errors could be corrected by a combination of in silico inspection by sequence homology analysis and MALDI-TOF MS measurements. We were also able to confirm conserved post-translational modifications in eleven RSPs. After these verifications, the masses of 31 expressed RSPs under 20,000 Da could be accurately confirmed. These RSPs have a potential to be useful biomarkers for identifying clinical isolates of A. fumigatus. PMID:27843740
Devailly, Guillaume; Mantsoki, Anna; Joshi, Anagha
2016-11-01
Better protocols and decreasing costs have made high-throughput sequencing experiments now accessible even to small experimental laboratories. However, comparing one or few experiments generated by an individual lab to the vast amount of relevant data freely available in the public domain might be limited due to lack of bioinformatics expertise. Though several tools, including genome browsers, allow such comparison at a single gene level, they do not provide a genome-wide view. We developed Heat*seq, a web-tool that allows genome scale comparison of high throughput experiments chromatin immuno-precipitation followed by sequencing, RNA-sequencing and Cap Analysis of Gene Expression) provided by a user, to the data in the public domain. Heat*seq currently contains over 12 000 experiments across diverse tissues and cell types in human, mouse and drosophila. Heat*seq displays interactive correlation heatmaps, with an ability to dynamically subset datasets to contextualize user experiments. High quality figures and tables are produced and can be downloaded in multiple formats. Web application: http://www.heatstarseq.roslin.ed.ac.uk/ Source code: https://github.com/gdevailly CONTACT: Guillaume.Devailly@roslin.ed.ac.uk or Anagha.Joshi@roslin.ed.ac.ukSupplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
mirEX: a platform for comparative exploration of plant pri-miRNA expression data.
Bielewicz, Dawid; Dolata, Jakub; Zielezinski, Andrzej; Alaba, Sylwia; Szarzynska, Bogna; Szczesniak, Michal W; Jarmolowski, Artur; Szweykowska-Kulinska, Zofia; Karlowski, Wojciech M
2012-01-01
mirEX is a comprehensive platform for comparative analysis of primary microRNA expression data. RT-qPCR-based gene expression profiles are stored in a universal and expandable database scheme and wrapped by an intuitive user-friendly interface. A new way of accessing gene expression data in mirEX includes a simple mouse operated querying system and dynamic graphs for data mining analyses. In contrast to other publicly available databases, the mirEX interface allows a simultaneous comparison of expression levels between various microRNA genes in diverse organs and developmental stages. Currently, mirEX integrates information about the expression profile of 190 Arabidopsis thaliana pri-miRNAs in seven different developmental stages: seeds, seedlings and various organs of mature plants. Additionally, by providing RNA structural models, publicly available deep sequencing results, experimental procedure details and careful selection of auxiliary data in the form of web links, mirEX can function as a one-stop solution for Arabidopsis microRNA information. A web-based mirEX interface can be accessed at http://bioinfo.amu.edu.pl/mirex.
USDA-ARS?s Scientific Manuscript database
Genomic and transcriptomic data on kiwifruit (Actinidia chinensis) in public databases are very limited despite its nutritional and economic value. Previously, we have constructed and sequenced nine fruit RNA-Seq libraries of A. chinensis cv. 'Hongyang' at immature, mature, and postharvest ripening...
Imin, Nijat; De Jong, Femke; Mathesius, Ulrike; van Noorden, Giel; Saeed, Nasir A; Wang, Xin-Ding; Rose, Ray J; Rolfe, Barry G
2004-07-01
Using a combination of two-dimensional gel electrophoresis (2-DE) protein mapping and mass spectrometry (MS) analysis, we have established proteome reference maps of Medicago truncatula embryogenic tissue culture cells. The cultures were generated from single protoplasts, which provided a relatively homogeneous cell population. We used these to analyze protein expression at the globular stages of somatic embryogenesis, which is the earliest morphogenetic embryonic stage. Over 3000 proteins could reproducibly be resolved over a pI range of 4-11. Three hundred and twelve protein spots were extracted from colloidal Coomassie Blue-stained 2-DE gels and analyzed by matrix-assisted laser desorption/ionization-time of flight MS analysis and tandem MS sequencing. This enabled the identification of 169 protein spots representing 128 unique gene products using a publicly available expressed sequence tag database and the MASCOT search engine. These reference maps will be valuable for the investigation of the molecular events which occur during somatic embryogenesis in M. truncatula. The proteome reference maps and supplementary materials will be available and updated for public access at http://semele.anu.edu.au/.
Barvkar, Vitthal T; Pardeshi, Varsha C; Kale, Sandip M; Kadoo, Narendra Y; Gupta, Vidya S
2012-05-08
The glycosylation process, catalyzed by ubiquitous glycosyltransferase (GT) family enzymes, is a prevalent modification of plant secondary metabolites that regulates various functions such as hormone homeostasis, detoxification of xenobiotics and biosynthesis and storage of secondary metabolites. Flax (Linum usitatissimum L.) is a commercially grown oilseed crop, important because of its essential fatty acids and health promoting lignans. Identification and characterization of UDP glycosyltransferase (UGT) genes from flax could provide valuable basic information about this important gene family and help to explain the seed specific glycosylated metabolite accumulation and other processes in plants. Plant genome sequencing projects are useful to discover complexity within this gene family and also pave way for the development of functional genomics approaches. Taking advantage of the newly assembled draft genome sequence of flax, we identified 137 UDP glycosyltransferase (UGT) genes from flax using a conserved signature motif. Phylogenetic analysis of these protein sequences clustered them into 14 major groups (A-N). Expression patterns of these genes were investigated using publicly available expressed sequence tag (EST), microarray data and reverse transcription quantitative real time PCR (RT-qPCR). Seventy-three per cent of these genes (100 out of 137) showed expression evidence in 15 tissues examined and indicated varied expression profiles. The RT-qPCR results of 10 selected genes were also coherent with the digital expression analysis. Interestingly, five duplicated UGT genes were identified, which showed differential expression in various tissues. Of the seven intron loss/gain positions detected, two intron positions were conserved among most of the UGTs, although a clear relationship about the evolution of these genes could not be established. Comparison of the flax UGTs with orthologs from four other sequenced dicot genomes indicated that seven UGTs were flax diverged. Flax has a large number of UGT genes including few flax diverged ones. Phylogenetic analysis and expression profiles of these genes identified tissue and condition specific repertoire of UGT genes from this crop. This study would facilitate precise selection of candidate genes and their further characterization of substrate specificities and in planta functions.
2012-01-01
Background The glycosylation process, catalyzed by ubiquitous glycosyltransferase (GT) family enzymes, is a prevalent modification of plant secondary metabolites that regulates various functions such as hormone homeostasis, detoxification of xenobiotics and biosynthesis and storage of secondary metabolites. Flax (Linum usitatissimum L.) is a commercially grown oilseed crop, important because of its essential fatty acids and health promoting lignans. Identification and characterization of UDP glycosyltransferase (UGT) genes from flax could provide valuable basic information about this important gene family and help to explain the seed specific glycosylated metabolite accumulation and other processes in plants. Plant genome sequencing projects are useful to discover complexity within this gene family and also pave way for the development of functional genomics approaches. Results Taking advantage of the newly assembled draft genome sequence of flax, we identified 137 UDP glycosyltransferase (UGT) genes from flax using a conserved signature motif. Phylogenetic analysis of these protein sequences clustered them into 14 major groups (A-N). Expression patterns of these genes were investigated using publicly available expressed sequence tag (EST), microarray data and reverse transcription quantitative real time PCR (RT-qPCR). Seventy-three per cent of these genes (100 out of 137) showed expression evidence in 15 tissues examined and indicated varied expression profiles. The RT-qPCR results of 10 selected genes were also coherent with the digital expression analysis. Interestingly, five duplicated UGT genes were identified, which showed differential expression in various tissues. Of the seven intron loss/gain positions detected, two intron positions were conserved among most of the UGTs, although a clear relationship about the evolution of these genes could not be established. Comparison of the flax UGTs with orthologs from four other sequenced dicot genomes indicated that seven UGTs were flax diverged. Conclusions Flax has a large number of UGT genes including few flax diverged ones. Phylogenetic analysis and expression profiles of these genes identified tissue and condition specific repertoire of UGT genes from this crop. This study would facilitate precise selection of candidate genes and their further characterization of substrate specificities and in planta functions. PMID:22568875
ESTree db: a Tool for Peach Functional Genomics
Lazzari, Barbara; Caprera, Andrea; Vecchietti, Alberto; Stella, Alessandra; Milanesi, Luciano; Pozzi, Carlo
2005-01-01
Background The ESTree db represents a collection of Prunus persica expressed sequenced tags (ESTs) and is intended as a resource for peach functional genomics. A total of 6,155 successful EST sequences were obtained from four in-house prepared cDNA libraries from Prunus persica mesocarps at different developmental stages. Another 12,475 peach EST sequences were downloaded from public databases and added to the ESTree db. An automated pipeline was prepared to process EST sequences using public software integrated by in-house developed Perl scripts and data were collected in a MySQL database. A php-based web interface was developed to query the database. Results The ESTree db version as of April 2005 encompasses 18,630 sequences representing eight libraries. Contig assembly was performed with CAP3. Putative single nucleotide polymorphism (SNP) detection was performed with the AutoSNP program and a search engine was implemented to retrieve results. All the sequences and all the contig consensus sequences were annotated both with blastx against the GenBank nr db and with GOblet against the viridiplantae section of the Gene Ontology db. Links to NiceZyme (Expasy) and to the KEGG metabolic pathways were provided. A local BLAST utility is available. A text search utility allows querying and browsing the database. Statistics were provided on Gene Ontology occurrences to assign sequences to Gene Ontology categories. Conclusion The resulting database is a comprehensive resource of data and links related to peach EST sequences. The Sequence Report and Contig Report pages work as the web interface core structures, giving quick access to data related to each sequence/contig. PMID:16351742
ESTree db: a tool for peach functional genomics.
Lazzari, Barbara; Caprera, Andrea; Vecchietti, Alberto; Stella, Alessandra; Milanesi, Luciano; Pozzi, Carlo
2005-12-01
The ESTree db http://www.itb.cnr.it/estree/ represents a collection of Prunus persica expressed sequenced tags (ESTs) and is intended as a resource for peach functional genomics. A total of 6,155 successful EST sequences were obtained from four in-house prepared cDNA libraries from Prunus persica mesocarps at different developmental stages. Another 12,475 peach EST sequences were downloaded from public databases and added to the ESTree db. An automated pipeline was prepared to process EST sequences using public software integrated by in-house developed Perl scripts and data were collected in a MySQL database. A php-based web interface was developed to query the database. The ESTree db version as of April 2005 encompasses 18,630 sequences representing eight libraries. Contig assembly was performed with CAP3. Putative single nucleotide polymorphism (SNP) detection was performed with the AutoSNP program and a search engine was implemented to retrieve results. All the sequences and all the contig consensus sequences were annotated both with blastx against the GenBank nr db and with GOblet against the viridiplantae section of the Gene Ontology db. Links to NiceZyme (Expasy) and to the KEGG metabolic pathways were provided. A local BLAST utility is available. A text search utility allows querying and browsing the database. Statistics were provided on Gene Ontology occurrences to assign sequences to Gene Ontology categories. The resulting database is a comprehensive resource of data and links related to peach EST sequences. The Sequence Report and Contig Report pages work as the web interface core structures, giving quick access to data related to each sequence/contig.
Samusik, Nikolay; Krukovskaya, Larisa; Meln, Irina; Shilov, Evgeny; Kozlov, Andrey P.
2013-01-01
PBOV1 is a known human protein-coding gene with an uncharacterized function. We have previously found that PBOV1 lacks orthologs in non-primate genomes and is expressed in a wide range of tumor types. Here we report that PBOV1 protein-coding sequence is human-specific and has originated de novo in the primate evolution through a series of frame-shift and stop codon mutations. We profiled PBOV1 expression in multiple cancer and normal tissue samples and found that it was expressed in 19 out of 34 tumors of various origins but completely lacked expression in any of the normal adult or fetal human tissues. We found that, unlike the cancer/testis antigens that are typically controlled by CpG island-containing promoters, PBOV1 was expressed from a GC-poor TATA-containing promoter which was not influenced by CpG demethylation and was inactive in testis. Our analysis of public microarray data suggests that PBOV1 activation in tumors could be dependent on the Hedgehog signaling pathway. Despite the recent de novo origin and the lack of identifiable functional signatures, a missense SNP in the PBOV1 coding sequence has been previously associated with an increased risk of breast cancer. Using publicly available microarray datasets, we found that high levels of PBOV1 expression in breast cancer and glioma samples were significantly associated with a positive outcome of the cancer disease. We also found that PBOV1 was highly expressed in primary but not in recurrent high-grade gliomas, suggesting the presence of a negative selection against PBOV1-expressing cancer cells. Our findings could contribute to the understanding of the mechanisms behind de novo gene origin and the possible role of tumors in this process. PMID:23418531
Puranik, Swati; Kumar, Karunesh; Srivastava, Prem S; Prasad, Manoj
2011-10-01
The NAC (NAM/ATAF1,2/CUC2) proteins are among the largest family of plant transcription factors. Its members have been associated with diverse plant processes and intricately regulate the expression of several genes. Inspite of this immense progress, knowledge of their DNA-binding properties are still limited. In our recent publication,1 we reported isolation of a membrane-associated NAC domain protein from Setaria italica (SiNAC). Transactivation analysis revealed that it was a functionally active transcription factor as it could stimulate expression of reporter genes in vivo. Truncations of the transmembrane region of the protein lead to its nuclear localization. Here we describe expression and purification of SiNAC DNA-binding domain. We further report identification of a novel DNA-binding site, [C/G][A/T][T/A][G/C]TC[C/G][A/T][C/G][G/C] for SiNAC by electrophoretic mobility shift assay. The SiNAC-GST protein could bind to the NAC recognition sequence in vitro as well as to sequences where some bases had been reshuffled. The results presented here contribute to our understanding of the DNA-binding specificity of SiNAC protein.
Puranik, Swati; Kumar, Karunesh; Srivastava, Prem S
2011-01-01
The NAC (NAM/ATAF1,2/CUC2) proteins are among the largest family of plant transcription factors. Its members have been associated with diverse plant processes and intricately regulate the expression of several genes. Inspite of this immense progress, knowledge of their DNA-binding properties are still limited. In our recent publication,1 we reported isolation of a membrane-associated NAC domain protein from Setaria italica (SiNAC). Transactivation analysis revealed that it was a functionally active transcription factor as it could stimulate expression of reporter genes in vivo. Truncation of the transmembrane region of the protein lead to its nuclear localization. Here we describe expression and purification of SiNAC DNA-binding domain. We further report identification of a novel DNA-binding site, [C/G][A/T] [T/A][G/C]TC[C/G][A/T][C/G][G/C] for SiNAC by electrophoretic mobility shift assay. The SiNAC-GST protein could bind to the NAC recognition sequence in vitro as well as to sequences where some bases had been reshuffled. The results presented here contribute to our understanding of the DNA-binding specificity of SiNAC protein. PMID:21918373
RefEx, a reference gene expression dataset as a web tool for the functional analysis of genes.
Ono, Hiromasa; Ogasawara, Osamu; Okubo, Kosaku; Bono, Hidemasa
2017-08-29
Gene expression data are exponentially accumulating; thus, the functional annotation of such sequence data from metadata is urgently required. However, life scientists have difficulty utilizing the available data due to its sheer magnitude and complicated access. We have developed a web tool for browsing reference gene expression pattern of mammalian tissues and cell lines measured using different methods, which should facilitate the reuse of the precious data archived in several public databases. The web tool is called Reference Expression dataset (RefEx), and RefEx allows users to search by the gene name, various types of IDs, chromosomal regions in genetic maps, gene family based on InterPro, gene expression patterns, or biological categories based on Gene Ontology. RefEx also provides information about genes with tissue-specific expression, and the relative gene expression values are shown as choropleth maps on 3D human body images from BodyParts3D. Combined with the newly incorporated Functional Annotation of Mammals (FANTOM) dataset, RefEx provides insight regarding the functional interpretation of unfamiliar genes. RefEx is publicly available at http://refex.dbcls.jp/.
RefEx, a reference gene expression dataset as a web tool for the functional analysis of genes
Ono, Hiromasa; Ogasawara, Osamu; Okubo, Kosaku; Bono, Hidemasa
2017-01-01
Gene expression data are exponentially accumulating; thus, the functional annotation of such sequence data from metadata is urgently required. However, life scientists have difficulty utilizing the available data due to its sheer magnitude and complicated access. We have developed a web tool for browsing reference gene expression pattern of mammalian tissues and cell lines measured using different methods, which should facilitate the reuse of the precious data archived in several public databases. The web tool is called Reference Expression dataset (RefEx), and RefEx allows users to search by the gene name, various types of IDs, chromosomal regions in genetic maps, gene family based on InterPro, gene expression patterns, or biological categories based on Gene Ontology. RefEx also provides information about genes with tissue-specific expression, and the relative gene expression values are shown as choropleth maps on 3D human body images from BodyParts3D. Combined with the newly incorporated Functional Annotation of Mammals (FANTOM) dataset, RefEx provides insight regarding the functional interpretation of unfamiliar genes. RefEx is publicly available at http://refex.dbcls.jp/. PMID:28850115
Sun, Haiyue; Liu, Yushan; Gai, Yuzhuo; Geng, Jinman; Chen, Li; Liu, Hongdi; Kang, Limin; Tian, Youwen; Li, Yadong
2015-09-02
Cranberries (Vaccinium macrocarpon Ait.), renowned for their excellent health benefits, are an important berry crop. Here, we performed transcriptome sequencing of one cranberry cultivar, from fruits at two different developmental stages, on the Illumina HiSeq 2000 platform. Our main goals were to identify putative genes for major metabolic pathways of bioactive compounds and compare the expression patterns between white fruit (W) and red fruit (R) in cranberry. In this study, two cDNA libraries of W and R were constructed. Approximately 119 million raw sequencing reads were generated and assembled de novo, yielding 57,331 high quality unigenes with an average length of 739 bp. Using BLASTx, 38,460 unigenes were identified as putative homologs of annotated sequences in public protein databases, including NCBI NR, NT, Swiss-Prot, KEGG, COG and GO. Of these, 21,898 unigenes mapped to 128 KEGG pathways, with the metabolic pathways, secondary metabolites, glycerophospholipid metabolism, ether lipid metabolism, starch and sucrose metabolism, purine metabolism, and pyrimidine metabolism being well represented. Among them, many candidate genes were involved in flavonoid biosynthesis, transport and regulation. Furthermore, digital gene expression (DEG) analysis identified 3,257 unigenes that were differentially expressed between the two fruit developmental stages. In addition, 14,473 simple sequence repeats (SSRs) were detected. Our results present comprehensive gene expression information about the cranberry fruit transcriptome that could facilitate our understanding of the molecular mechanisms of fruit development in cranberries. Although it will be necessary to validate the functions carried out by these genes, these results could be used to improve the quality of breeding programs for the cranberry and related species.
Identification and characterization of microRNAs in Phaseolus vulgaris by high-throughput sequencing
2012-01-01
Background MicroRNAs (miRNAs) are endogenously encoded small RNAs that post-transcriptionally regulate gene expression. MiRNAs play essential roles in almost all plant biological processes. Currently, few miRNAs have been identified in the model food legume Phaseolus vulgaris (common bean). Recent advances in next generation sequencing technologies have allowed the identification of conserved and novel miRNAs in many plant species. Here, we used Illumina's sequencing by synthesis (SBS) technology to identify and characterize the miRNA population of Phaseolus vulgaris. Results Small RNA libraries were generated from roots, flowers, leaves, and seedlings of P. vulgaris. Based on similarity to previously reported plant miRNAs,114 miRNAs belonging to 33 conserved miRNA families were identified. Stem-loop precursors and target gene sequences for several conserved common bean miRNAs were determined from publicly available databases. Less conserved miRNA families and species-specific common bean miRNA isoforms were also characterized. Moreover, novel miRNAs based on the small RNAs were found and their potential precursors were predicted. In addition, new target candidates for novel and conserved miRNAs were proposed. Finally, we studied organ-specific miRNA family expression levels through miRNA read frequencies. Conclusions This work represents the first massive-scale RNA sequencing study performed in Phaseolus vulgaris to identify and characterize its miRNA population. It significantly increases the number of miRNAs, precursors, and targets identified in this agronomically important species. The miRNA expression analysis provides a foundation for understanding common bean miRNA organ-specific expression patterns. The present study offers an expanded picture of P. vulgaris miRNAs in relation to those of other legumes. PMID:22394504
2012-01-01
Background MicroRNAs (miRNAs) are one of the functional non-coding small RNAs involved in the epigenetic control of the plant genome. Although plants contain both evolutionary conserved miRNAs and species-specific miRNAs within their genomes, computational methods often only identify evolutionary conserved miRNAs. The recent sequencing of the Brassica rapa genome enables us to identify miRNAs and their putative target genes. In this study, we sought to provide a more comprehensive prediction of B. rapa miRNAs based on high throughput small RNA deep sequencing. Results We sequenced small RNAs from five types of tissue: seedlings, roots, petioles, leaves, and flowers. By analyzing 2.75 million unique reads that mapped to the B. rapa genome, we identified 216 novel and 196 conserved miRNAs that were predicted to target approximately 20% of the genome’s protein coding genes. Quantitative analysis of miRNAs from the five types of tissue revealed that novel miRNAs were expressed in diverse tissues but their expression levels were lower than those of the conserved miRNAs. Comparative analysis of the miRNAs between the B. rapa and Arabidopsis thaliana genomes demonstrated that redundant copies of conserved miRNAs in the B. rapa genome may have been deleted after whole genome triplication. Novel miRNA members seemed to have spontaneously arisen from the B. rapa and A. thaliana genomes, suggesting the species-specific expansion of miRNAs. We have made this data publicly available in a miRNA database of B. rapa called BraMRs. The database allows the user to retrieve miRNA sequences, their expression profiles, and a description of their target genes from the five tissue types investigated here. Conclusions This is the first report to identify novel miRNAs from Brassica crops using genome-wide high throughput techniques. The combination of computational methods and small RNA deep sequencing provides robust predictions of miRNAs in the genome. The finding of numerous novel miRNAs, many with few target genes and low expression levels, suggests the rapid evolution of miRNA genes. The development of a miRNA database, BraMRs, enables us to integrate miRNA identification, target prediction, and functional annotation of target genes. BraMRs will represent a valuable public resource with which to study the epigenetic control of B. rapa and other closely related Brassica species. The database is available at the following link: http://bramrs.rna.kr [1]. PMID:23163954
Calduch-Giner, Josep A.; Sitjà-Bobadilla, Ariadna; Pérez-Sánchez, Jaume
2016-01-01
High-quality sequencing reads from the intestine of European sea bass were assembled, annotated by similarity against protein reference databases and combined with nucleotide sequences from public and private databases. After redundancy filtering, 24,906 non-redundant annotated sequences encoding 15,367 different gene descriptions were obtained. These annotated sequences were used to design a custom, high-density oligo-microarray (8 × 15 K) for the transcriptomic profiling of anterior (AI), middle (MI), and posterior (PI) intestinal segments. Similar molecular signatures were found for AI and MI segments, which were combined in a single group (AI-MI) whereas the PI outstood separately, with more than 1900 differentially expressed genes with a fold-change cutoff of 2. Functional analysis revealed that molecular and cellular functions related to feed digestion and nutrient absorption and transport were over-represented in AI-MI segments. By contrast, the initiation and establishment of immune defense mechanisms became especially relevant in PI, although the microarray expression profiling validated by qPCR indicated that these functional changes are gradual from anterior to posterior intestinal segments. This functional divergence occurred in association with spatial transcriptional changes in nutrient transporters and the mucosal chemosensing system via G protein-coupled receptors. These findings contribute to identify key indicators of gut functions and to compare different fish feeding strategies and immune defense mechanisms acquired along the evolution of teleosts. PMID:27610085
Calduch-Giner, Josep A; Sitjà-Bobadilla, Ariadna; Pérez-Sánchez, Jaume
2016-01-01
High-quality sequencing reads from the intestine of European sea bass were assembled, annotated by similarity against protein reference databases and combined with nucleotide sequences from public and private databases. After redundancy filtering, 24,906 non-redundant annotated sequences encoding 15,367 different gene descriptions were obtained. These annotated sequences were used to design a custom, high-density oligo-microarray (8 × 15 K) for the transcriptomic profiling of anterior (AI), middle (MI), and posterior (PI) intestinal segments. Similar molecular signatures were found for AI and MI segments, which were combined in a single group (AI-MI) whereas the PI outstood separately, with more than 1900 differentially expressed genes with a fold-change cutoff of 2. Functional analysis revealed that molecular and cellular functions related to feed digestion and nutrient absorption and transport were over-represented in AI-MI segments. By contrast, the initiation and establishment of immune defense mechanisms became especially relevant in PI, although the microarray expression profiling validated by qPCR indicated that these functional changes are gradual from anterior to posterior intestinal segments. This functional divergence occurred in association with spatial transcriptional changes in nutrient transporters and the mucosal chemosensing system via G protein-coupled receptors. These findings contribute to identify key indicators of gut functions and to compare different fish feeding strategies and immune defense mechanisms acquired along the evolution of teleosts.
2010-01-01
Background Cutaneous mycoses are common human infections among healthy and immunocompromised hosts, and the anthropophilic fungus Trichophyton rubrum is the most prevalent microorganism isolated from such clinical cases worldwide. The aim of this study was to determine the transcriptional profile of T. rubrum exposed to various stimuli in order to obtain insights into the responses of this pathogen to different environmental challenges. Therefore, we generated an expressed sequence tag (EST) collection by constructing one cDNA library and nine suppression subtractive hybridization libraries. Results The 1388 unigenes identified in this study were functionally classified based on the Munich Information Center for Protein Sequences (MIPS) categories. The identified proteins were involved in transcriptional regulation, cellular defense and stress, protein degradation, signaling, transport, and secretion, among other functions. Analysis of these unigenes revealed 575 T. rubrum sequences that had not been previously deposited in public databases. Conclusion In this study, we identified novel T. rubrum genes that will be useful for ORF prediction in genome sequencing and facilitating functional genome analysis. Annotation of these expressed genes revealed metabolic adaptations of T. rubrum to carbon sources, ambient pH shifts, and various antifungal drugs used in medical practice. Furthermore, challenging T. rubrum with cytotoxic drugs and ambient pH shifts extended our understanding of the molecular events possibly involved in the infectious process and resistance to antifungal drugs. PMID:20144196
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gao, Jian; Luo, Mao; Zhu, Ye
2015-03-27
Viola yedoensis Makino is an important Chinese traditional medicine plant adapted to cadmium (Cd) pollution regions. Illumina sequencing technology was used to sequence the transcriptome of V. yedoensis Makino. We sequenced Cd-treated (VIYCd) and untreated (VIYCK) samples of V. yedoensis, and obtained 100,410,834 and 83,587,676 high quality reads, respectively. After de novo assembly and quantitative assessment, 109,800 unigenes were finally generated with an average length of 661 bp. We then obtained functional annotations by aligning unigenes with public protein databases including NR, NT, SwissProt, KEGG and COG. In addition, 892 differentially expressed genes (DEGs) were investigated between the two libraries ofmore » untreated (VIYCK) and Cd-treated (VIYCd) plants. Moreover, 15 randomly selected DEGs were further validated with qRT-PCR and the results were highly accordant with the Solexa analysis. This study firstly generated a successful global analysis of the V. yedoensis transcriptome and it will provide for further studies on gene expression, genomics, and functional genomics in Violaceae. - Highlights: • A de novo assembly generated 109,800 unigenes and 5,4479 of them were annotated. • 31,285 could be classified into 26 COG categories. • 263 biosynthesis pathways were predicted and classified into five categories. • 892 DEGs were detected and 15 of them were validated by qRT-PCR.« less
ESTuber db: an online database for Tuber borchii EST sequences.
Lazzari, Barbara; Caprera, Andrea; Cosentino, Cristian; Stella, Alessandra; Milanesi, Luciano; Viotti, Angelo
2007-03-08
The ESTuber database (http://www.itb.cnr.it/estuber) includes 3,271 Tuber borchii expressed sequence tags (EST). The dataset consists of 2,389 sequences from an in-house prepared cDNA library from truffle vegetative hyphae, and 882 sequences downloaded from GenBank and representing four libraries from white truffle mycelia and ascocarps at different developmental stages. An automated pipeline was prepared to process EST sequences using public software integrated by in-house developed Perl scripts. Data were collected in a MySQL database, which can be queried via a php-based web interface. Sequences included in the ESTuber db were clustered and annotated against three databases: the GenBank nr database, the UniProtKB database and a third in-house prepared database of fungi genomic sequences. An algorithm was implemented to infer statistical classification among Gene Ontology categories from the ontology occurrences deduced from the annotation procedure against the UniProtKB database. Ontologies were also deduced from the annotation of more than 130,000 EST sequences from five filamentous fungi, for intra-species comparison purposes. Further analyses were performed on the ESTuber db dataset, including tandem repeats search and comparison of the putative protein dataset inferred from the EST sequences to the PROSITE database for protein patterns identification. All the analyses were performed both on the complete sequence dataset and on the contig consensus sequences generated by the EST assembly procedure. The resulting web site is a resource of data and links related to truffle expressed genes. The Sequence Report and Contig Report pages are the web interface core structures which, together with the Text search utility and the Blast utility, allow easy access to the data stored in the database.
2012-01-01
Background A detailed knowledge about spatial and temporal gene expression is important for understanding both the function of genes and their evolution. For the vast majority of species, transcriptomes are still largely uncharacterized and even in those where substantial information is available it is often in the form of partially sequenced transcriptomes. With the development of next generation sequencing, a single experiment can now simultaneously identify the transcribed part of a species genome and estimate levels of gene expression. Results mRNA from actively growing needles of Norway spruce (Picea abies) was sequenced using next generation sequencing technology. In total, close to 70 million fragments with a length of 76 bp were sequenced resulting in 5 Gbp of raw data. A de novo assembly of these reads, together with publicly available expressed sequence tag (EST) data from Norway spruce, was used to create a reference transcriptome. Of the 38,419 PUTs (putative unique transcripts) longer than 150 bp in this reference assembly, 83.5% show similarity to ESTs from other spruce species and of the remaining PUTs, 3,704 show similarity to protein sequences from other plant species, leaving 4,167 PUTs with limited similarity to currently available plant proteins. By predicting coding frames and comparing not only the Norway spruce PUTs, but also PUTs from the close relatives Picea glauca and Picea sitchensis to both Pinus taeda and Taxus mairei, we obtained estimates of synonymous and non-synonymous divergence among conifer species. In addition, we detected close to 15,000 SNPs of high quality and estimated gene expression differences between samples collected under dark and light conditions. Conclusions Our study yielded a large number of single nucleotide polymorphisms as well as estimates of gene expression on transcriptome scale. In agreement with a recent study we find that the synonymous substitution rate per year (0.6 × 10−09 and 1.1 × 10−09) is an order of magnitude smaller than values reported for angiosperm herbs. However, if one takes generation time into account, most of this difference disappears. The estimates of the dN/dS ratio (non-synonymous over synonymous divergence) reported here are in general much lower than 1 and only a few genes showed a ratio larger than 1. PMID:23122049
Huang, Kristen M; Geunes-Boyer, Scarlett; Wu, Sufen; Dutra, Amalia; Favor, Jack; Stambolian, Dwight
2004-05-01
Xcat mice display X-linked congenital cataracts and are a mouse model for the human X-linked cataract disease Nance Horan syndrome (NHS). The genetic defect in Xcat mice and NHS patients is not known. We isolated and sequenced a BAC contig representing a portion of the Xcat critical region. We combined our sequencing data with the most recent mouse sequence assemblies from both Celera and public databases. The sequence of the 2.2-Mb Xcat critical region was then analyzed for potential Xcat candidate genes. The coding regions of the seven known genes within this area (Rai2, Rbbp7, Ctps2, Calb3, Grpr, Reps2, and Syap1) were sequenced in Xcat mice and no mutations were detected. The expression of Rai2 was quantitatively identical in wild-type and Xcat mutant eyes. These results indicate that the Xcat mutation is within a novel, undiscovered gene.
Bhore, Subhash J; Kassim, Amelia; Loh, Chye Ying; Shah, Farida H
2010-01-01
It is well known that the nutritional quality of the American oil-palm (Elaeis oleifera) mesocarp oil is superior to that of African oil-palm (Elaeis guineensis Jacq. Tenera) mesocarp oil. Therefore, it is of important to identify the genetic features for its superior value. This could be achieved through the genome sequencing of the oil-palm. However, the genome sequence is not available in the public domain due to commercial secrecy. Hence, we constructed a cDNA library and generated expressed sequence tags (3,205) from the mesocarp tissue of the American oil-palm. We continued to annotate each of these cDNAs after submitting to GenBank/DDBJ/EMBL. A rough analysis turned our attention to the beta-carotene hydroxylase (Chyb) enzyme encoding cDNA. Then, we completed the full sequencing of cDNA clone for its both strands using M13 forward and reverse primers. The full nucleotide and protein sequence was further analyzed and annotated using various Bioinformatics tools. The analysis results showed the presence of fatty acid hydroxylase superfamily domain in the protein sequence. The multiple sequence alignment of selected Chyb amino acid sequences from other plant species and algal members with E. oleifera Chyb using ClustalW and its phylogenetic analysis suggest that Chyb from monocotyledonous plant species, Lilium hubrid, Crocus sativus and Zea mays are the most evolutionary related with E. oleifera Chyb. This study reports the annotation of E. oleifera Chyb. Abbreviations ESTs - expressed sequence tags, EoChyb - Elaeis oleifera beta-carotene hydroxylase, MC - main cluster PMID:21364789
Aya, Koichiro; Kobayashi, Masaaki; Tanaka, Junmu; Ohyanagi, Hajime; Suzuki, Takayuki; Yano, Kenji; Takano, Tomoyuki; Yano, Kentaro; Matsuoka, Makoto
2015-01-01
During plant evolution, ferns originally evolved as a major vascular plant with a distinctive life cycle in which the haploid and diploid generations are completely separated. However, the low level of genetic resources has limited studies of their physiological events, as well as hindering research on the evolutionary history of land plants. In this study, to identify a comprehensive catalog of transcripts and characterize their expression traits in the fern Lygodium japonicum, nine different RNA samples isolated from prothalli, trophophylls, rhizomes and sporophylls were sequenced using Roche 454 GS-FLX and Illumina HiSeq sequencers. The hybrid assembly of the high-quality 454 GS-FLX and Illumina HiSeq reads generated a set of 37,830 isoforms with an average length of 1,444 bp. Using four open reading frame (ORF) predictors, 38,142 representative ORFs were identified from a total of 37,830 transcript isoforms and 95 contigs, which were annotated by searching against several public databases. Furthermore, an orthoMCL analysis using the protein sequences of L. japonicum and five model plants revealed various sets of lineage-specific genes, including those detected among land plant lineages and those detected in only L. japonicum. We have also examined the expression patterns of all contigs/isoforms, along with the life cycle of L. japonicum, and identified the tissue-specific transcripts using statistical expression analyses. Finally, we developed a public web resource, the L. japonicum transcriptome database at http://bioinf.mind.meiji.ac.jp/kanikusa/, which provides important opportunities to accelerate molecular research in ferns. © The Author 2014. Published by Oxford University Press on behalf of Japanese Society of Plant Physiologists. All rights reserved. For permissions, please email: journals.permissions@oup.com.
Guo, Baozhu; Chen, Xiaoping; Dang, Phat; Scully, Brian T; Liang, Xuanqiang; Holbrook, C Corley; Yu, Jiujiang; Culbreath, Albert K
2008-01-01
Background Peanut (Arachis hypogaea L.) is an important crop economically and nutritionally, and is one of the most susceptible host crops to colonization of Aspergillus parasiticus and subsequent aflatoxin contamination. Knowledge from molecular genetic studies could help to devise strategies in alleviating this problem; however, few peanut DNA sequences are available in the public database. In order to understand the molecular basis of host resistance to aflatoxin contamination, a large-scale project was conducted to generate expressed sequence tags (ESTs) from developing seeds to identify resistance-related genes involved in defense response against Aspergillus infection and subsequent aflatoxin contamination. Results We constructed six different cDNA libraries derived from developing peanut seeds at three reproduction stages (R5, R6 and R7) from a resistant and a susceptible cultivated peanut genotypes, 'Tifrunner' (susceptible to Aspergillus infection with higher aflatoxin contamination and resistant to TSWV) and 'GT-C20' (resistant to Aspergillus with reduced aflatoxin contamination and susceptible to TSWV). The developing peanut seed tissues were challenged by A. parasiticus and drought stress in the field. A total of 24,192 randomly selected cDNA clones from six libraries were sequenced. After removing vector sequences and quality trimming, 21,777 high-quality EST sequences were generated. Sequence clustering and assembling resulted in 8,689 unique EST sequences with 1,741 tentative consensus EST sequences (TCs) and 6,948 singleton ESTs. Functional classification was performed according to MIPS functional catalogue criteria. The unique EST sequences were divided into twenty-two categories. A similarity search against the non-redundant protein database available from NCBI indicated that 84.78% of total ESTs showed significant similarity to known proteins, of which 165 genes had been previously reported in peanuts. There were differences in overall expression patterns in different libraries and genotypes. A number of sequences were expressed throughout all of the libraries, representing constitutive expressed sequences. In order to identify resistance-related genes with significantly differential expression, a statistical analysis to estimate the relative abundance (R) was used to compare the relative abundance of each gene transcripts in each cDNA library. Thirty six and forty seven unique EST sequences with threshold of R > 4 from libraries of 'GT-C20' and 'Tifrunner', respectively, were selected for examination of temporal gene expression patterns according to EST frequencies. Nine and eight resistance-related genes with significant up-regulation were obtained in 'GT-C20' and 'Tifrunner' libraries, respectively. Among them, three genes were common in both genotypes. Furthermore, a comparison of our EST sequences with other plant sequences in the TIGR Gene Indices libraries showed that the percentage of peanut EST matched to Arabidopsis thaliana, maize (Zea mays), Medicago truncatula, rapeseed (Brassica napus), rice (Oryza sativa), soybean (Glycine max) and wheat (Triticum aestivum) ESTs ranged from 33.84% to 79.46% with the sequence identity ≥ 80%. These results revealed that peanut ESTs are more closely related to legume species than to cereal crops, and more homologous to dicot than to monocot plant species. Conclusion The developed ESTs can be used to discover novel sequences or genes, to identify resistance-related genes and to detect the differences among alleles or markers between these resistant and susceptible peanut genotypes. Additionally, this large collection of cultivated peanut EST sequences will make it possible to construct microarrays for gene expression studies and for further characterization of host resistance mechanisms. It will be a valuable genomic resource for the peanut community. The 21,777 ESTs have been deposited to the NCBI GenBank database with accession numbers ES702769 to ES724546. PMID:18248674
Song, Junfang; Duc, Céline; Storey, Kate G.; McLean, W. H. Irwin; Brown, Sara J.; Simpson, Gordon G.; Barton, Geoffrey J.
2014-01-01
The reference annotations made for a genome sequence provide the framework for all subsequent analyses of the genome. Correct and complete annotation in addition to the underlying genomic sequence is particularly important when interpreting the results of RNA-seq experiments where short sequence reads are mapped against the genome and assigned to genes according to the annotation. Inconsistencies in annotations between the reference and the experimental system can lead to incorrect interpretation of the effect on RNA expression of an experimental treatment or mutation in the system under study. Until recently, the genome-wide annotation of 3′ untranslated regions received less attention than coding regions and the delineation of intron/exon boundaries. In this paper, data produced for samples in Human, Chicken and A. thaliana by the novel single-molecule, strand-specific, Direct RNA Sequencing technology from Helicos Biosciences which locates 3′ polyadenylation sites to within +/− 2 nt, were combined with archival EST and RNA-Seq data. Nine examples are illustrated where this combination of data allowed: (1) gene and 3′ UTR re-annotation (including extension of one 3′ UTR by 5.9 kb); (2) disentangling of gene expression in complex regions; (3) clearer interpretation of small RNA expression and (4) identification of novel genes. While the specific examples displayed here may become obsolete as genome sequences and their annotations are refined, the principles laid out in this paper will be of general use both to those annotating genomes and those seeking to interpret existing publically available annotations in the context of their own experimental data. PMID:24722185
Makita, Yuko; Kawashima, Mika; Lau, Nyok Sean; Othman, Ahmad Sofiman; Matsui, Minami
2018-01-19
Natural rubber is an economically important material. Currently the Pará rubber tree, Hevea brasiliensis is the main commercial source. Little is known about rubber biosynthesis at the molecular level. Next-generation sequencing (NGS) technologies brought draft genomes of three rubber cultivars and a variety of RNA sequencing (RNA-seq) data. However, no current genome or transcriptome databases (DB) are organized by gene. A gene-oriented database is a valuable support for rubber research. Based on our original draft genome sequence of H. brasiliensis RRIM600, we constructed a rubber tree genome and transcriptome DB. Our DB provides genome information including gene functional annotations and multi-transcriptome data of RNA-seq, full-length cDNAs including PacBio Isoform sequencing (Iso-Seq), ESTs and genome wide transcription start sites (TSSs) derived from CAGE technology. Using our original and publically available RNA-seq data, we calculated co-expressed genes for identifying functionally related gene sets and/or genes regulated by the same transcription factor (TF). Users can access multi-transcriptome data through both a gene-oriented web page and a genome browser. For the gene searching system, we provide keyword search, sequence homology search and gene expression search; users can also select their expression threshold easily. The rubber genome and transcriptome DB provides rubber tree genome sequence and multi-transcriptomics data. This DB is useful for comprehensive understanding of the rubber transcriptome. This will assist both industrial and academic researchers for rubber and economically important close relatives such as R. communis, M. esculenta and J. curcas. The Rubber Transcriptome DB release 2017.03 is accessible at http://matsui-lab.riken.jp/rubber/ .
Cheng, Yunqing; Liu, Jianfeng; Zhang, Huidi; Wang, Ju; Zhao, Yixin; Geng, Wanting
2015-01-01
A high ratio of blank fruit in hazelnut (Corylus heterophylla Fisch) is a very common phenomenon that causes serious yield losses in northeast China. The development of blank fruit in the Corylus genus is known to be associated with embryo abortion. However, little is known about the molecular mechanisms responsible for embryo abortion during the nut development stage. Genomic information for C. heterophylla Fisch is not available; therefore, data related to transcriptome and gene expression profiling of developing and abortive ovules are needed. In this study, de novo transcriptome sequencing and RNA-seq analysis were conducted using short-read sequencing technology (Illumina HiSeq 2000). The results of the transcriptome assembly analysis revealed genetic information that was associated with the fruit development stage. Two digital gene expression libraries were constructed, one for a full (normally developing) ovule and one for an empty (abortive) ovule. Transcriptome sequencing and assembly results revealed 55,353 unigenes, including 18,751 clusters and 36,602 singletons. These results were annotated using the public databases NR, NT, Swiss-Prot, KEGG, COG, and GO. Using digital gene expression profiling, gene expression differences in developing and abortive ovules were identified. A total of 1,637 and 715 unigenes were significantly upregulated and downregulated, respectively, in abortive ovules, compared with developing ovules. Quantitative real-time polymerase chain reaction analysis was used in order to verify the differential expression of some genes. The transcriptome and digital gene expression profiling data of normally developing and abortive ovules in hazelnut provide exhaustive information that will improve our understanding of the molecular mechanisms of abortive ovule formation in hazelnut.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mangelsen, Elke; Kilian, Joachim; Berendzen, Kenneth W.
2008-02-01
WRKY proteins belong to the WRKY-GCM1 superfamily of zinc finger transcription factors that have been subject to a large plant-specific diversification. For the cereal crop barley (Hordeum vulgare), three different WRKY proteins have been characterized so far, as regulators in sucrose signaling, in pathogen defense, and in response to cold and drought, respectively. However, their phylogenetic relationship remained unresolved. In this study, we used the available sequence information to identify a minimum number of 45 barley WRKY transcription factor (HvWRKY) genes. According to their structural features the HvWRKY factors were classified into the previously defined polyphyletic WRKY subgroups 1 tomore » 3. Furthermore, we could assign putative orthologs of the HvWRKY proteins in Arabidopsis and rice. While in most cases clades of orthologous proteins were formed within each group or subgroup, other clades were composed of paralogous proteins for the grasses and Arabidopsis only, which is indicative of specific gene radiation events. To gain insight into their putative functions, we examined expression profiles of WRKY genes from publicly available microarray data resources and found group specific expression patterns. While putative orthologs of the HvWRKY transcription factors have been inferred from phylogenetic sequence analysis, we performed a comparative expression analysis of WRKY genes in Arabidopsis and barley. Indeed, highly correlative expression profiles were found between some of the putative orthologs. HvWRKY genes have not only undergone radiation in monocot or dicot species, but exhibit evolutionary traits specific to grasses. HvWRKY proteins exhibited not only sequence similarities between orthologs with Arabidopsis, but also relatedness in their expression patterns. This correlative expression is indicative for a putative conserved function of related WRKY proteins in mono- and dicot species.« less
MicroRNA-7a regulates Müller glia differentiation by attenuating Notch3 expression.
Baba, Yukihiro; Aihara, Yuko; Watanabe, Sumiko
2015-09-01
miRNA-7a plays critical roles in various biological aspects in health and disease. We aimed to reveal roles of miR-7a in mouse retinal development by loss- and gain-of-function analyses of miR-7a. Plasmids encoding miR-7a or miR-7a-decoy (anti-sense miR-7a) were introduced into mouse retina at P0, and the retina was cultured as explant. Then, proliferation of retinal progenitors and differentiation of retinal subtypes were examined by immunostaining. miR-7a had no apparent effect on the proliferation of retinal progenitor cells. However, the expression of Müller glia marker, cyclin D3, was reduced by miR-7a overexpression and up-regulated by miR-7a decoy, suggesting that miR-7a negatively regulates differentiation of Müller glia. Targets of miR-7a, which were predicted by using a public program miRNA.org, and Notch3 was suggested to be one of candidate genes of miR-7a target. Notch3 3' UTR appeared to contain complementary sequence to the seed sequence of miR-7a. A reporter assay in NIH3T3 cells using a plasmid containing multiple repeats of potential target sequence of 3' Notch UTR showed that miR-7a suppress expression of reporter EGFP through 3'UTR region. Expression of sh-Notch3 and over-expression of NICD3 in retina suggested that miR-7a regulates Müller glia differentiation through attenuation of Notch3 expression. Taken together, we revealed that the miR-7a regulates the differentiation of Müller glia through the suppression of Notch3 expression. Copyright © 2015 Elsevier Ltd. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shi, CY; Yang, H; Wei, CL
Tea is one of the most popular non-alcoholic beverages worldwide. However, the tea plant, Camellia sinensis, is difficult to culture in vitro, to transform, and has a large genome, rendering little genomic information available. Recent advances in large-scale RNA sequencing (RNA-seq) provide a fast, cost-effective, and reliable approach to generate large expression datasets for functional genomic analysis, which is especially suitable for non-model species with un-sequenced genomes. Using high-throughput Illumina RNA-seq, the transcriptome from poly (A){sup +} RNA of C. sinensis was analyzed at an unprecedented depth (2.59 gigabase pairs). Approximate 34.5 million reads were obtained, trimmed, and assembled intomore » 127,094 unigenes, with an average length of 355 bp and an N50 of 506 bp, which consisted of 788 contig clusters and 126,306 singletons. This number of unigenes was 10-fold higher than existing C. sinensis sequences deposited in GenBank (as of August 2010). Sequence similarity analyses against six public databases (Uniprot, NR and COGs at NCBI, Pfam, InterPro and KEGG) found 55,088 unigenes that could be annotated with gene descriptions, conserved protein domains, or gene ontology terms. Some of the unigenes were assigned to putative metabolic pathways. Targeted searches using these annotations identified the majority of genes associated with several primary metabolic pathways and natural product pathways that are important to tea quality, such as flavonoid, theanine and caffeine biosynthesis pathways. Novel candidate genes of these secondary pathways were discovered. Comparisons with four previously prepared cDNA libraries revealed that this transcriptome dataset has both a high degree of consistency with previous EST data and an approximate 20 times increase in coverage. Thirteen unigenes related to theanine and flavonoid synthesis were validated. Their expression patterns in different organs of the tea plant were analyzed by RT-PCR and quantitative real time PCR (qRT-PCR). An extensive transcriptome dataset has been obtained from the deep sequencing of tea plant. The coverage of the transcriptome is comprehensive enough to discover all known genes of several major metabolic pathways. This transcriptome dataset can serve as an important public information platform for gene expression, genomics, and functional genomic studies in C. sinensis.« less
2011-01-01
Background Tea is one of the most popular non-alcoholic beverages worldwide. However, the tea plant, Camellia sinensis, is difficult to culture in vitro, to transform, and has a large genome, rendering little genomic information available. Recent advances in large-scale RNA sequencing (RNA-seq) provide a fast, cost-effective, and reliable approach to generate large expression datasets for functional genomic analysis, which is especially suitable for non-model species with un-sequenced genomes. Results Using high-throughput Illumina RNA-seq, the transcriptome from poly (A)+ RNA of C. sinensis was analyzed at an unprecedented depth (2.59 gigabase pairs). Approximate 34.5 million reads were obtained, trimmed, and assembled into 127,094 unigenes, with an average length of 355 bp and an N50 of 506 bp, which consisted of 788 contig clusters and 126,306 singletons. This number of unigenes was 10-fold higher than existing C. sinensis sequences deposited in GenBank (as of August 2010). Sequence similarity analyses against six public databases (Uniprot, NR and COGs at NCBI, Pfam, InterPro and KEGG) found 55,088 unigenes that could be annotated with gene descriptions, conserved protein domains, or gene ontology terms. Some of the unigenes were assigned to putative metabolic pathways. Targeted searches using these annotations identified the majority of genes associated with several primary metabolic pathways and natural product pathways that are important to tea quality, such as flavonoid, theanine and caffeine biosynthesis pathways. Novel candidate genes of these secondary pathways were discovered. Comparisons with four previously prepared cDNA libraries revealed that this transcriptome dataset has both a high degree of consistency with previous EST data and an approximate 20 times increase in coverage. Thirteen unigenes related to theanine and flavonoid synthesis were validated. Their expression patterns in different organs of the tea plant were analyzed by RT-PCR and quantitative real time PCR (qRT-PCR). Conclusions An extensive transcriptome dataset has been obtained from the deep sequencing of tea plant. The coverage of the transcriptome is comprehensive enough to discover all known genes of several major metabolic pathways. This transcriptome dataset can serve as an important public information platform for gene expression, genomics, and functional genomic studies in C. sinensis. PMID:21356090
Chang, Yaqing; Zhao, Wenming; Du, Zhenlin; Hao, Zhenlin
2015-01-01
Shell color is an important trait that is used in breeding the Japanese scallop Patinopecten yessoensis, the most economically important scallop species in China. We constructed four transcriptome libraries from different shell color lines of P. yessoensis: the left and right shell mantles of ordinary strains of P. yessoensis and the left shell mantles of the ‘Ivory’ and ‘Maple’ strains. These four libraries were paired-end sequenced using the Illumina HiSeq 2000 platform and contained 54,802,692 sequences, 40,798,962 sequences, 74,019,262 sequences, and 44,466,166 sequences, respectively. A total of 214,087,082 expressed sequence tags were assembled into 73,522 unigenes with an average size of 1,163 bp. When the data were compared against the public Nr and Swiss-Prot databases using BlastX, nearly 30.55% (22,458) of the unigenes were significantly matched to known unique proteins. Gene Ontology annotation and pathway mapping analysis using the Kyoto Encyclopedia of Genes and Genomes categorized unigenes according to their diverse biological functions and processes and identified candidate genes that were potentially involved in growth, pigmentation, metal transcription, and immunity. Expression profile analysis was performed on all four libraries and many differentially expressed genes were identified. In addition, 5,772 simple sequence repeats were obtained from the P. yessoensis transcriptomes, and 464,197, 395,646, and 310,649 single nucleotide polymorphisms were revealed in the ordinary strains, the ‘Ivory’ strain, and the ‘Maple’ strain, respectively. These results provide valuable information for future genomic studies on P. yessoensis and improve our understanding of the molecular mechanisms involved in the growth, immunity, shell coloring, and shell biomineralization of this species. These resources also may be used in a variety of applications, such as trait mapping, marker-assisted breeding, studies of population genetics and genomics, and work on functional genomics. PMID:25680107
NABIC: A New Access Portal to Search, Visualize, and Share Agricultural Genomics Data.
Seol, Young-Joo; Lee, Tae-Ho; Park, Dong-Suk; Kim, Chang-Kug
2016-01-01
The National Agricultural Biotechnology Information Center developed an access portal to search, visualize, and share agricultural genomics data with a focus on South Korean information and resources. The portal features an agricultural biotechnology database containing a wide range of omics data from public and proprietary sources. We collected 28.4 TB of data from 162 agricultural organisms, with 10 types of omics data comprising next-generation sequencing sequence read archive, genome, gene, nucleotide, DNA chip, expressed sequence tag, interactome, protein structure, molecular marker, and single-nucleotide polymorphism datasets. Our genomic resources contain information on five animals, seven plants, and one fungus, which is accessed through a genome browser. We also developed a data submission and analysis system as a web service, with easy-to-use functions and cutting-edge algorithms, including those for handling next-generation sequencing data.
Sahu, Dinesh K; Panda, Soumya P; Panda, Sujata; Das, Paramananda; Meher, Prem K; Hazra, Rupenangshu K; Peatman, Eric; Liu, Zhanjiang J; Eknath, Ambekar E; Nandi, Samiran
2013-07-15
Labeo rohita (Ham.) also called rohu is the most important freshwater aquaculture species on the Indian sub continent. Monsoon dependent breeding restricts its seed production beyond season indicating a strong genetic control about which very limited information is available. Additionally, few genomic resources are publicly available for this species. Here we sought to identify reproduction-relevant genes from normalized cDNA libraries of the brain-pituitary-gonad-liver (BPGL-axis) tissues of adult L. rohita collected during post preparatory phase. 6161 random clones sequenced (Sanger-based) from these libraries produced 4642 (75.34%) high-quality sequences. They were assembled into 3631 (78.22%) unique sequences composed of 709 contigs and 2922 singletons. A total of 182 unique sequences were found to be associated with reproduction-related genes, mainly under the GO term categories of reproduction, neuro-peptide hormone activity, hormone and receptor binding, receptor activity, signal transduction, embryonic development, cell-cell signaling, cell death and anti-apoptosis process. Several important reproduction-related genes reported here for the first time in L. rohita are zona pellucida sperm-binding protein 3, aquaporin-12, spermine oxidase, sperm associated antigen 7, testis expressed 261, progesterone receptor membrane component, Neuropeptide Y and Pro-opiomelanocortin. Quantitative RT-PCR-based analyses of 8 known and 8 unknown transcripts during preparatory and post-spawning phase showed increased expression level of most of the transcripts during preparatory phase (except Neuropeptide Y) in comparison to post-spawning phase indicating possible roles in initiation of gonad maturation. Expression of unknown transcripts was also found in prolific breeder common carp and tilapia, but levels of expression were much higher in seasonal breeder rohu. 3631 unique sequences contained 236 (6.49%) putative microsatellites with the AG (28.16%) repeat as the most frequent motif. Twenty loci showed polymorphism in 36 unrelated individuals with allele frequency ranging from 2 to 7 per locus. The observed heterozygosity ranged from 0.096 to 0.774 whereas the expected heterozygosity ranged from 0.109 to 0.801. Identification of 182 important reproduction-related genes and expression pattern of 16 transcripts in preparatory and post-spawning phase along with 20 polymorphic EST-SSRs should be highly useful for the future reproductive molecular studies and selection program in Labeo rohita. Copyright © 2013 Elsevier B.V. All rights reserved.
Heekin, Andrew M; Guerrero, Felix D; Bendele, Kylie G; Saldivar, Leo; Scoles, Glen A; Dowd, Scot E; Gondro, Cedric; Nene, Vishvanath; Djikeng, Appolinaire; Brayton, Kelly A
2013-09-01
As it feeds upon cattle, Rhipicephalus (Boophilus) microplus is capable of transmitting a number of pathogenic organisms, including the apicomplexan hemoparasite Babesia bovis, a causative agent of bovine babesiosis. The R. microplus female gut transcriptome was studied for two cohorts: adult females feeding on a bovine host infected with B. bovis and adult females feeding on an uninfected bovine. RNA was purified and used to generate a subtracted cDNA library from B. bovis-infected female gut, and 4,077 expressed sequence tags (ESTs) were sequenced. Gene expression was also measured by a microarray designed from the publicly available R. microplus gene index: BmiGI Version 2. We compared gene expression in the tick gut from females feeding upon an uninfected bovine to gene expression in tick gut from females feeding upon a splenectomized bovine infected with B. bovis. Thirty-three ESTs represented on the microarray were expressed at a higher level in female gut samples from the ticks feeding upon a B. bovis-infected calf compared to expression levels in female gut samples from ticks feeding on an uninfected calf. Forty-three transcripts were expressed at a lower level in the ticks feeding upon B. bovis-infected female guts compared with expression in female gut samples from ticks feeding on the uninfected calf. These array data were used as initial characterization of gene expression associated with the infection of R. microplus by B. bovis.
Deep sampling of the Palomero maize transcriptome by a high throughput strategy of pyrosequencing.
Vega-Arreguín, Julio C; Ibarra-Laclette, Enrique; Jiménez-Moraila, Beatriz; Martínez, Octavio; Vielle-Calzada, Jean Philippe; Herrera-Estrella, Luis; Herrera-Estrella, Alfredo
2009-07-06
In-depth sequencing analysis has not been able to determine the overall complexity of transcriptional activity of a plant organ or tissue sample. In some cases, deep parallel sequencing of Expressed Sequence Tags (ESTs), although not yet optimized for the sequencing of cDNAs, has represented an efficient procedure for validating gene prediction and estimating overall gene coverage. This approach could be very valuable for complex plant genomes. In addition, little emphasis has been given to efforts aiming at an estimation of the overall transcriptional universe found in a multicellular organism at a specific developmental stage. To explore, in depth, the transcriptional diversity in an ancient maize landrace, we developed a protocol to optimize the sequencing of cDNAs and performed 4 consecutive GS20-454 pyrosequencing runs of a cDNA library obtained from 2 week-old Palomero Toluqueño maize plants. The protocol reported here allowed obtaining over 90% of informative sequences. These GS20-454 runs generated over 1.5 Million reads, representing the largest amount of sequences reported from a single plant cDNA library. A collection of 367,391 quality-filtered reads (30.09 Mb) from a single run was sufficient to identify transcripts corresponding to 34% of public maize ESTs databases; total sequences generated after 4 filtered runs increased this coverage to 50%. Comparisons of all 1.5 Million reads to the Maize Assembled Genomic Islands (MAGIs) provided evidence for the transcriptional activity of 11% of MAGIs. We estimate that 5.67% (86,069 sequences) do not align with public ESTs or annotated genes, potentially representing new maize transcripts. Following the assembly of 74.4% of the reads in 65,493 contigs, real-time PCR of selected genes confirmed a predicted correlation between the abundance of GS20-454 sequences and corresponding levels of gene expression. A protocol was developed that significantly increases the number, length and quality of cDNA reads using massive 454 parallel sequencing. We show that recurrent 454 pyrosequencing of a single cDNA sample is necessary to attain a thorough representation of the transcriptional universe present in maize, that can also be used to estimate transcript abundance of specific genes. This data suggests that the molecular and functional diversity contained in the vast native landraces remains to be explored, and that large-scale transcriptional sequencing of a presumed ancestor of the modern maize varieties represents a valuable approach to characterize the functional diversity of maize for future agricultural and evolutionary studies.
Zhang, Zhijun; Zhang, Pengjun; Li, Weidi; Zhang, Jinming; Huang, Fang; Yang, Jian; Bei, Yawei; Lu, Yaobin
2013-05-01
The western flower thrips (WFT), Frankliniella occidentalis, a world-wide invasive insect, causes agricultural damage by directly feeding and by indirectly vectoring Tospoviruses, such as Tomato spotted wilt virus (TSWV). We characterized the transcriptome of WFT and analyzed global gene expression of WFT response to TSWV infection using Illumina sequencing platform. We compiled 59,932 unigenes, and identified 36,339 unigenes by similarity analysis against public databases, most of which were annotated using gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis. Within these annotated transcripts, we collected 278 sequences related to insecticide resistance. GO and KEGG analysis of different expression genes between TSWV-infected and non-infected WFT population revealed that TSWV can regulate cellular process and immune response, which might lead to low virus titers in thrips cells and no detrimental effects on F. occidentalis. This data-set not only enriches genomic resource for WFT, but also benefits research into its molecular genetics and functional genomics. Copyright © 2013 Elsevier Inc. All rights reserved.
Epigenomics and bolting tolerance in sugar beet genotypes.
Hébrard, Claire; Peterson, Daniel G; Willems, Glenda; Delaunay, Alain; Jesson, Béline; Lefèbvre, Marc; Barnes, Steve; Maury, Stéphane
2016-01-01
In sugar beet (Beta vulgaris altissima), bolting tolerance is an essential agronomic trait reflecting the bolting response of genotypes after vernalization. Genes involved in induction of sugar beet bolting have now been identified, and evidence suggests that epigenetic factors are involved in their control. Indeed, the time course and amplitude of DNA methylation variations in the shoot apical meristem have been shown to be critical in inducing sugar beet bolting, and a few functional targets of DNA methylation during vernalization have been identified. However, molecular mechanisms controlling bolting tolerance levels among genotypes are still poorly understood. Here, gene expression and DNA methylation profiles were compared in shoot apical meristems of three bolting-resistant and three bolting-sensitive genotypes after vernalization. Using Cot fractionation followed by 454 sequencing of the isolated low-copy DNA, 6231 contigs were obtained that were used along with public sugar beet DNA sequences to design custom Agilent microarrays for expression (56k) and methylation (244k) analyses. A total of 169 differentially expressed genes and 111 differentially methylated regions were identified between resistant and sensitive vernalized genotypes. Fourteen sequences were both differentially expressed and differentially methylated, with a negative correlation between their methylation and expression levels. Genes involved in cold perception, phytohormone signalling, and flowering induction were over-represented and collectively represent an integrative gene network from environmental perception to bolting induction. Altogether, the data suggest that the genotype-dependent control of DNA methylation and expression of an integrative gene network participate in bolting tolerance in sugar beet, opening up perspectives for crop improvement. © The Author 2015. Published by Oxford University Press on behalf of the Society for Experimental Biology.
Karakülah, Gökhan
2017-06-28
Novel transcript discovery through RNA sequencing has substantially improved our understanding of the transcriptome dynamics of biological systems. Endogenous target mimicry (eTM) transcripts, a novel class of regulatory molecules, bind to their target microRNAs (miRNAs) by base pairing and block their biological activity. The objective of this study was to provide a computational analysis framework for the prediction of putative eTM sequences in plants, and as an example, to discover previously un-annotated eTMs in Prunus persica (peach) transcriptome. Therefore, two public peach transcriptome libraries downloaded from Sequence Read Archive (SRA) and a previously published set of long non-coding RNAs (lncRNAs) were investigated with multi-step analysis pipeline, and 44 putative eTMs were found. Additionally, an eTM-miRNA-mRNA regulatory network module associated with peach fruit organ development was built via integration of the miRNA target information and predicted eTM-miRNA interactions. My findings suggest that one of the most widely expressed miRNA families among diverse plant species, miR156, might be potentially sponged by seven putative eTMs. Besides, the study indicates eTMs potentially play roles in the regulation of development processes in peach fruit via targeting specific miRNAs. In conclusion, by following the step-by step instructions provided in this study, novel eTMs can be identified and annotated effectively in public plant transcriptome libraries.
Update on Genomic Databases and Resources at the National Center for Biotechnology Information.
Tatusova, Tatiana
2016-01-01
The National Center for Biotechnology Information (NCBI), as a primary public repository of genomic sequence data, collects and maintains enormous amounts of heterogeneous data. Data for genomes, genes, gene expressions, gene variation, gene families, proteins, and protein domains are integrated with the analytical, search, and retrieval resources through the NCBI website, text-based search and retrieval system, provides a fast and easy way to navigate across diverse biological databases.Comparative genome analysis tools lead to further understanding of evolution processes quickening the pace of discovery. Recent technological innovations have ignited an explosion in genome sequencing that has fundamentally changed our understanding of the biology of living organisms. This huge increase in DNA sequence data presents new challenges for the information management system and the visualization tools. New strategies have been designed to bring an order to this genome sequence shockwave and improve the usability of associated data.
NABIC: A New Access Portal to Search, Visualize, and Share Agricultural Genomics Data
Seol, Young-Joo; Lee, Tae-Ho; Park, Dong-Suk; Kim, Chang-Kug
2016-01-01
The National Agricultural Biotechnology Information Center developed an access portal to search, visualize, and share agricultural genomics data with a focus on South Korean information and resources. The portal features an agricultural biotechnology database containing a wide range of omics data from public and proprietary sources. We collected 28.4 TB of data from 162 agricultural organisms, with 10 types of omics data comprising next-generation sequencing sequence read archive, genome, gene, nucleotide, DNA chip, expressed sequence tag, interactome, protein structure, molecular marker, and single-nucleotide polymorphism datasets. Our genomic resources contain information on five animals, seven plants, and one fungus, which is accessed through a genome browser. We also developed a data submission and analysis system as a web service, with easy-to-use functions and cutting-edge algorithms, including those for handling next-generation sequencing data. PMID:26848255
Expression of a polyubiquitin promoter isolated from Gladiolus.
Joung, Young Hee; Kamo, Kathryn
2006-10-01
A polyubiquitin promoter (GUBQ1) including its 5'UTR and intron was isolated from the floral monocot Gladiolus because high levels of expression could not be obtained using publicly available promoters isolated from either cereals or dicots. Sequencing of the promoter revealed highly conserved 5' and 3' intron splicing sites for the 1.234 kb intron. The coding sequence of the first two ubiquitin genes showed the highest homology (87 and 86%, respectively) to the ubiquitin genes of Nicotiana tabacum and Oryza sativa RUBQ2. Transient expression following gene gun bombardment showed that relative levels of GUS activity with the GUBQ1 promoter were comparable to the CaMV 35S promoter in gladiolus, tobacco, rose, rice, and the floral monocot freesia. The highest levels of GUS expression with GUBQ1 were attained with Gladiolus. The full-length GUBQ1 promoter including 5'UTR and intron were necessary for maximum GUS expression in Gladiolus. The relative GUS activity for the promoter only was 9%, and the activity for the promoter with 5'UTR and 399 bp of the full-length 1.234 kb intron was 41%. Arabidopsis plants transformed with uidA under GUBQ1 showed moderate GUS expression throughout young leaves and in the vasculature of older leaves. The highest levels of transient GUS expression in Gladiolus have been achieved using the GUBQ1 promoter. This promoter should be useful for genetic engineering of disease resistance in Gladiolus, rose, and freesia, where high levels of gene expression are important.
MEPD: a Medaka gene expression pattern database
Henrich, Thorsten; Ramialison, Mirana; Quiring, Rebecca; Wittbrodt, Beate; Furutani-Seiki, Makoto; Wittbrodt, Joachim; Kondoh, Hisato
2003-01-01
The Medaka Expression Pattern Database (MEPD) stores and integrates information of gene expression during embryonic development of the small freshwater fish Medaka (Oryzias latipes). Expression patterns of genes identified by ESTs are documented by images and by descriptions through parameters such as staining intensity, category and comments and through a comprehensive, hierarchically organized dictionary of anatomical terms. Sequences of the ESTs are available and searchable through BLAST. ESTs in the database are clustered upon entry and have been blasted against public data-bases. The BLAST results are updated regularly, stored within the database and searchable. The MEPD is a project within the Medaka Genome Initiative (MGI) and entries will be interconnected to integrated genomic map databases. MEPD is accessible through the WWW at http://medaka.dsp.jst.go.jp/MEPD. PMID:12519950
Del Piccolo, Lidia; de Haes, Hanneke; Heaven, Cathy; Jansen, Jesse; Verheul, William; Bensing, Jozien; Bergvik, Svein; Deveugele, Myriam; Eide, Hilde; Fletcher, Ian; Goss, Claudia; Humphris, Gerry; Kim, Young-Mi; Langewitz, Wolf; Mazzi, Maria Angela; Mjaaland, Trond; Moretti, Francesca; Nübling, Matthias; Rimondini, Michela; Salmon, Peter; Sibbern, Tonje; Skre, Ingunn; van Dulmen, Sandra; Wissow, Larry; Young, Bridget; Zandbelt, Linda; Zimmermann, Christa; Finset, Arnstein
2011-02-01
To present a method to classify health provider responses to patient cues and concerns according to the VR-CoDES-CC (Del Piccolo et al. (2009) [2] and Zimmermann et al. (submitted for publication) [3]). The system permits sequence analysis and a detailed description of how providers handle patient's expressions of emotion. The Verona-CoDES-P system has been developed based on consensus views within the "Verona Network of Sequence Analysis". The different phases of the creation process are described in detail. A reliability study has been conducted on 20 interviews from a convenience sample of 104 psychiatric consultations. The VR-CoDES-P has two main classes of provider responses, corresponding to the degree of explicitness (yes/no) and space (yes/no) that is given by the health provider to each cue/concern expressed by the patient. The system can be further subdivided into 17 individual categories. Statistical analyses showed that the VR-CoDES-P is reliable (agreement 92.86%, Cohen's kappa 0.90 (±0.04) p<0.0001). Once validity and reliability are tested in different settings, the system should be applied to investigate the relationship between provider responses to patients' expression of emotions and outcome variables. Research employing the VR-CoDES-P should be applied to develop research-based approaches to maximize appropriate responses to patients' indirect and overt expressions of emotional needs. Copyright © 2010 Elsevier Ireland Ltd. All rights reserved.
Gutierrez-Gonzalez, Juan J; Garvin, David F
2016-11-01
Vitamin E is essential for humans and thus must be a component of a healthy diet. Among the cereal grains, hexaploid oats (Avena sativa L.) have high vitamin E content. To date, no gene sequences in the vitamin E biosynthesis pathway have been reported for oats. Using deep sequencing and orthology-guided assembly, coding sequences of genes for each step in vitamin E synthesis in oats were reconstructed, including resolution of the sequences of homeologs. Three homeologs, presumably representing each of the three oat subgenomes, were identified for the main steps of the pathway. Partial sequences, likely representing pseudogenes, were recovered in some instances as well. Pairwise comparisons among homeologs revealed that two of the three putative subgenome-specific homeologs are almost identical for each gene. Synonymous substitution rates indicate the time of divergence of the two more similar subgenomes from the distinct one at 7.9-8.7 MYA, and a divergence between the similar subgenomes from a common ancestor 1.1 MYA. A new proposed evolutionary model for hexaploid oat formation is discussed. Homeolog-specific gene expression was quantified during oat seed development and compared with vitamin E accumulation. Homeolog expression largely appears to be similar for most of genes; however, for some genes, homoeolog-specific transcriptional bias was observed. The expression of HPPD, as well as certain homoeologs of VTE2 and VTE4, is highly correlated with seed vitamin E accumulation. Our findings expand our understanding of oat genome evolution and will assist efforts to modify vitamin E content and composition in oats. Published 2016. This article is a U.S. Government work and is in the public domain in the USA. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.
Bringing the fathead minnow (Pimephales promelas) into the ...
The fathead minnow (Pimephales promelas) is a well-established ecotoxicological model organism that has been widely used for regulatory ecotoxicity testing and research for over a half century. Throughout this time, a lot of knowledge has been gained about the fathead minnow’s biological responses to various xenobiotics. However, despite its importance as a model organism, the fathead minnow still has few publicly available gene sequences. Recently, Burns et al. (2015; Environ. Toxicol. Chem. 35:212) described the sequencing and de-novo assembly of the fathead minnow genome. Two draft genome assemblies are now publicly available on the GenBank database. However, on their own the draft assemblies remain of limited use to researchers who are primarily interested in the functional units of the genome, i.e. the genes. In the present study, an annotation pipeline, consisting of gene prediction, evidence alignment, and data synthesis, was applied to the fathead minnow SOAPdenovo assembly. Ab initio gene prediction was performed using AUGUSTUS, which provided a starting point of 43,345 gene predictions. Fathead minnow Expressed Sequence Tags (ESTs) and zebrafish protein-coding sequences (CDSs) were then aligned to the assembly using the corresponding spliced alignment methods of the program Exonerate. Of the over 240,000 EST alignments, 73% were successfully aligned with 90% or greater sequence identity and query coverage. Similarly, 39% of nearly 45,000 zebrafish co
Schallmey, Marcus; Koopmeiners, Julia; Wells, Elizabeth; Wardenga, Rainer; Schallmey, Anett
2014-12-01
Halohydrin dehalogenases are very rare enzymes that are naturally involved in the mineralization of halogenated xenobiotics. Due to their catalytic potential and promiscuity, many biocatalytic reactions have been described that have led to several interesting and industrially important applications. Nevertheless, only a few of these enzymes have been made available through recombinant techniques; hence, it is of general interest to expand the repertoire of these enzymes so as to enable novel biocatalytic applications. After the identification of specific sequence motifs, 37 novel enzyme sequences were readily identified in public sequence databases. All enzymes that could be heterologously expressed also catalyzed typical halohydrin dehalogenase reactions. Phylogenetic inference for enzymes of the halohydrin dehalogenase enzyme family confirmed that all enzymes form a distinct monophyletic clade within the short-chain dehydrogenase/reductase superfamily. In addition, the majority of novel enzymes are substantially different from previously known phylogenetic subtypes. Consequently, four additional phylogenetic subtypes were defined, greatly expanding the halohydrin dehalogenase enzyme family. We show that the enormous wealth of environmental and genome sequences present in public databases can be tapped for in silico identification of very rare but biotechnologically important biocatalysts. Our findings help to readily identify halohydrin dehalogenases in ever-growing sequence databases and, as a consequence, make even more members of this interesting enzyme family available to the scientific and industrial community. Copyright © 2014, American Society for Microbiology. All Rights Reserved.
Zhu, Haisheng; Liu, Jianting; Wen, Qingfang; Chen, Mindong; Wang, Bin; Zhang, Qianrong; Xue, Zhuzheng
2017-01-01
Fresh-cut luffa (Luffa cylindrica) fruits commonly undergo browning. However, little is known about the molecular mechanisms regulating this process. We used the RNA-seq technique to analyze the transcriptomic changes occurring during the browning of fresh-cut fruits from luffa cultivar 'Fusi-3'. Over 90 million high-quality reads were assembled into 58,073 Unigenes, and 60.86% of these were annotated based on sequences in four public databases. We detected 35,282 Unigenes with significant hits to sequences in the NCBInr database, and 24,427 Unigenes encoded proteins with sequences that were similar to those of known proteins in the Swiss-Prot database. Additionally, 20,546 and 13,021 Unigenes were similar to existing sequences in the Eukaryotic Orthologous Groups of proteins and Kyoto Encyclopedia of Genes and Genomes databases, respectively. Furthermore, 27,301 Unigenes were differentially expressed during the browning of fresh-cut luffa fruits (i.e., after 1-6 h). Moreover, 11 genes from five gene families (i.e., PPO, PAL, POD, CAT, and SOD) identified as potentially associated with enzymatic browning as well as four WRKY transcription factors were observed to be differentially regulated in fresh-cut luffa fruits. With the assistance of rapid amplification of cDNA ends technology, we obtained the full-length sequences of the 15 Unigenes. We also confirmed these Unigenes were expressed by quantitative real-time polymerase chain reaction analysis. This study provides a comprehensive transcriptome sequence resource, and may facilitate further studies aimed at identifying genes affecting luffa fruit browning for the exploitation of the underlying mechanism.
DeepBase: annotation and discovery of microRNAs and other noncoding RNAs from deep-sequencing data.
Yang, Jian-Hua; Qu, Liang-Hu
2012-01-01
Recent advances in high-throughput deep-sequencing technology have produced large numbers of short and long RNA sequences and enabled the detection and profiling of known and novel microRNAs (miRNAs) and other noncoding RNAs (ncRNAs) at unprecedented sensitivity and depth. In this chapter, we describe the use of deepBase, a database that we have developed to integrate all public deep-sequencing data and to facilitate the comprehensive annotation and discovery of miRNAs and other ncRNAs from these data. deepBase provides an integrative, interactive, and versatile web graphical interface to evaluate miRBase-annotated miRNA genes and other known ncRNAs, explores the expression patterns of miRNAs and other ncRNAs, and discovers novel miRNAs and other ncRNAs from deep-sequencing data. deepBase also provides a deepView genome browser to comparatively analyze these data at multiple levels. deepBase is available at http://deepbase.sysu.edu.cn/.
Cheng, Linzhao; Hansen, Nancy F.; Zhao, Ling; Du, Yutao; Zou, Chunlin; Donovan, Frank X.; Chou, Bin-Kuan; Zhou, Guangyu; Li, Shijie; Dowey, Sarah N.; Ye, Zhaohui; Chandrasekharappa, Settara C.; Yang, Huanming; Mullikin, James C.; Liu, P. Paul
2012-01-01
Summary The utility of induced pluripotent stem cells (iPSCs) as models to study diseases and as sources for cell therapy depends on the integrity of their genomes. Despite recent publications of DNA sequence variations in the iPSCs, the true scope of such changes for the entire genome is not clear. Here we report the whole-genome sequencing of three human iPSC lines derived from two cell types of an adult donor by episomal vectors. The vector sequence was undetectable in the deeply sequenced iPSC lines. We identified 1058–1808 heterozygous single nucleotide variants (SNVs), but no copy number variants, in each iPSC line. Six to twelve of these SNVs were within coding regions in each iPSC line, but ~50% of them are synonymous changes and the remaining are not selectively enriched for known genes associated with cancers. Our data thus suggest that episome-mediated reprogramming is not inherently mutagenic during integration-free iPSC induction. PMID:22385660
Zhang, Xiaodong; Allan, Andrew C.; Li, Caixia; Wang, Yuanzhong; Yao, Qiuyang
2015-01-01
Gentiana rigescens is an important medicinal herb in China. The main validated medicinal component gentiopicroside is synthesized in shoots, but is mainly found in the plant’s roots. The gentiopicroside biosynthetic pathway and its regulatory control remain to be elucidated. Genome resources of gentian are limited. Next-generation sequencing (NGS) technologies can aid in supplying global gene expression profiles. In this study we present sequence and transcript abundance data for the root and leaf transcriptome of G. rigescens, obtained using the Illumina Hiseq2000. Over fifty million clean reads were obtained from leaf and root libraries. This yields 76,717 unigenes with an average length of 753 bp. Among these, 33,855 unigenes were identified as putative homologs of annotated sequences in public protein and nucleotide databases. Digital abundance analysis identified 3306 unigenes differentially enriched between leaf and root. Unigenes found in both tissues were categorized according to their putative functional categories. Of the differentially expressed genes, over 130 were annotated as related to terpenoid biosynthesis. This work is the first study of global transcriptome analyses in gentian. These sequences and putative functional data comprise a resource for future investigation of terpenoid biosynthesis in Gentianaceae species and annotation of the gentiopicroside biosynthetic pathway and its regulatory mechanisms. PMID:26006235
miRNEST database: an integrative approach in microRNA search and annotation
Szcześniak, Michał Wojciech; Deorowicz, Sebastian; Gapski, Jakub; Kaczyński, Łukasz; Makałowska, Izabela
2012-01-01
Despite accumulating data on animal and plant microRNAs and their functions, existing public miRNA resources usually collect miRNAs from a very limited number of species. A lot of microRNAs, including those from model organisms, remain undiscovered. As a result there is a continuous need to search for new microRNAs. We present miRNEST (http://mirnest.amu.edu.pl), a comprehensive database of animal, plant and virus microRNAs. The core part of the database is built from our miRNA predictions conducted on Expressed Sequence Tags of 225 animal and 202 plant species. The miRNA search was performed based on sequence similarity and as many as 10 004 miRNA candidates in 221 animal and 199 plant species were discovered. Out of them only 299 have already been deposited in miRBase. Additionally, miRNEST has been integrated with external miRNA data from literature and 13 databases, which includes miRNA sequences, small RNA sequencing data, expression, polymorphisms and targets data as well as links to external miRNA resources, whenever applicable. All this makes miRNEST a considerable miRNA resource in a sense of number of species (544) that integrates a scattered miRNA data into a uniform format with a user-friendly web interface. PMID:22135287
Jouffe, Vincent; Rowe, Suzanne; Liaubet, Laurence; Buitenhuis, Bart; Hornshøj, Henrik; SanCristobal, Magali; Mormède, Pierre; de Koning, D J
2009-07-16
Microarray studies can supplement QTL studies by suggesting potential candidate genes in the QTL regions, which by themselves are too large to provide a limited selection of candidate genes. Here we provide a case study where we explore ways to integrate QTL data and microarray data for the pig, which has only a partial genome sequence. We outline various procedures to localize differentially expressed genes on the pig genome and link this with information on published QTL. The starting point is a set of 237 differentially expressed cDNA clones in adrenal tissue from two pig breeds, before and after treatment with adrenocorticotropic hormone (ACTH). Different approaches to localize the differentially expressed (DE) genes to the pig genome showed different levels of success and a clear lack of concordance for some genes between the various approaches. For a focused analysis on 12 genes, overlapping QTL from the public domain were presented. Also, differentially expressed genes underlying QTL for ACTH response were described. Using the latest version of the draft sequence, the differentially expressed genes were mapped to the pig genome. This enabled co-location of DE genes and previously studied QTL regions, but the draft genome sequence is still incomplete and will contain many errors. A further step to explore links between DE genes and QTL at the pathway level was largely unsuccessful due to the lack of annotation of the pig genome. This could be improved by further comparative mapping analyses but this would be time consuming. This paper provides a case study for the integration of QTL data and microarray data for a species with limited genome sequence information and annotation. The results illustrate the challenges that must be addressed but also provide a roadmap for future work that is applicable to other non-model species.
Siebert, Stefan; Robinson, Mark D; Tintori, Sophia C; Goetz, Freya; Helm, Rebecca R; Smith, Stephen A; Shaner, Nathan; Haddock, Steven H D; Dunn, Casey W
2011-01-01
We investigated differential gene expression between functionally specialized feeding polyps and swimming medusae in the siphonophore Nanomia bijuga (Cnidaria) with a hybrid long-read/short-read sequencing strategy. We assembled a set of partial gene reference sequences from long-read data (Roche 454), and generated short-read sequences from replicated tissue samples that were mapped to the references to quantify expression. We collected and compared expression data with three short-read expression workflows that differ in sample preparation, sequencing technology, and mapping tools. These workflows were Illumina mRNA-Seq, which generates sequence reads from random locations along each transcript, and two tag-based approaches, SOLiD SAGE and Helicos DGE, which generate reads from particular tag sites. Differences in expression results across workflows were mostly due to the differential impact of missing data in the partial reference sequences. When all 454-derived gene reference sequences were considered, Illumina mRNA-Seq detected more than twice as many differentially expressed (DE) reference sequences as the tag-based workflows. This discrepancy was largely due to missing tag sites in the partial reference that led to false negatives in the tag-based workflows. When only the subset of reference sequences that unambiguously have tag sites was considered, we found broad congruence across workflows, and they all identified a similar set of DE sequences. Our results are promising in several regards for gene expression studies in non-model organisms. First, we demonstrate that a hybrid long-read/short-read sequencing strategy is an effective way to collect gene expression data when an annotated genome sequence is not available. Second, our replicated sampling indicates that expression profiles are highly consistent across field-collected animals in this case. Third, the impacts of partial reference sequences on the ability to detect DE can be mitigated through workflow choice and deeper reference sequencing.
Siebert, Stefan; Robinson, Mark D.; Tintori, Sophia C.; Goetz, Freya; Helm, Rebecca R.; Smith, Stephen A.; Shaner, Nathan; Haddock, Steven H. D.; Dunn, Casey W.
2011-01-01
We investigated differential gene expression between functionally specialized feeding polyps and swimming medusae in the siphonophore Nanomia bijuga (Cnidaria) with a hybrid long-read/short-read sequencing strategy. We assembled a set of partial gene reference sequences from long-read data (Roche 454), and generated short-read sequences from replicated tissue samples that were mapped to the references to quantify expression. We collected and compared expression data with three short-read expression workflows that differ in sample preparation, sequencing technology, and mapping tools. These workflows were Illumina mRNA-Seq, which generates sequence reads from random locations along each transcript, and two tag-based approaches, SOLiD SAGE and Helicos DGE, which generate reads from particular tag sites. Differences in expression results across workflows were mostly due to the differential impact of missing data in the partial reference sequences. When all 454-derived gene reference sequences were considered, Illumina mRNA-Seq detected more than twice as many differentially expressed (DE) reference sequences as the tag-based workflows. This discrepancy was largely due to missing tag sites in the partial reference that led to false negatives in the tag-based workflows. When only the subset of reference sequences that unambiguously have tag sites was considered, we found broad congruence across workflows, and they all identified a similar set of DE sequences. Our results are promising in several regards for gene expression studies in non-model organisms. First, we demonstrate that a hybrid long-read/short-read sequencing strategy is an effective way to collect gene expression data when an annotated genome sequence is not available. Second, our replicated sampling indicates that expression profiles are highly consistent across field-collected animals in this case. Third, the impacts of partial reference sequences on the ability to detect DE can be mitigated through workflow choice and deeper reference sequencing. PMID:21829563
Delimiting regulatory sequences of the Drosophila melanogaster Ddc gene.
Hirsh, J; Morgan, B A; Scholnick, S B
1986-01-01
We delimited sequences necessary for in vivo expression of the Drosophila melanogaster dopa decarboxylase gene Ddc. The expression of in vitro-altered genes was assayed following germ line integration via P-element vectors. Sequences between -209 and -24 were necessary for normally regulated expression, although genes lacking these sequences could be expressed at 10 to 50% of wild-type levels at specific developmental times. These genes showed components of normal developmental expression, which suggests that they retain some regulatory elements. All Ddc genes lacking the normal immediate 5'-flanking sequences were grossly deficient in larval central nervous system expression. Thus, this upstream region must contain at least one element necessary for this expression. A mutated Ddc gene without a normal TATA boxlike sequence used the normal RNA start points, indicating that this sequences is not required for start point specificity. Images PMID:3099170
Least Squares Approximation By G1 Piecewise Parametric Cubes
1993-12-01
ADDRESS(ES) 10.SPONSORING/MONITORING AGENCY REPORT NUMBER 11. SUPPLEMENTARY NOTES The views expressed in this thesis are those of the author and do not...CODE Approved for public release; distribution is unlimited. 13. ABSTRACT (maximum 200 words) Parametric piecewise cubic polynomials are used throughout...piecewise parametric cubic polynomial to a sequence of ordered points in the plane. Cubic Bdzier curves are used as a basis. The parameterization, the
NCBI Epigenomics: a new public resource for exploring epigenomic data sets
Fingerman, Ian M.; McDaniel, Lee; Zhang, Xuan; Ratzat, Walter; Hassan, Tarek; Jiang, Zhifang; Cohen, Robert F.; Schuler, Gregory D.
2011-01-01
The Epigenomics database at the National Center for Biotechnology Information (NCBI) is a new resource that has been created to serve as a comprehensive public resource for whole-genome epigenetic data sets (www.ncbi.nlm.nih.gov/epigenomics). Epigenetics is the study of stable and heritable changes in gene expression that occur independently of the primary DNA sequence. Epigenetic mechanisms include post-translational modifications of histones, DNA methylation, chromatin conformation and non-coding RNAs. It has been observed that misregulation of epigenetic processes has been associated with human disease. We have constructed the new resource by selecting the subset of epigenetics-specific data from general-purpose archives, such as the Gene Expression Omnibus, and Sequence Read Archives, and then subjecting them to further review, annotation and reorganization. Raw data is processed and mapped to genomic coordinates to generate ‘tracks’ that are a visual representation of the data. These data tracks can be viewed using popular genome browsers or downloaded for local analysis. The Epigenomics resource also provides the user with a unique interface that allows for intuitive browsing and searching of data sets based on biological attributes. Currently, there are 69 studies, 337 samples and over 1100 data tracks from five well-studied species that are viewable and downloadable in Epigenomics. PMID:21075792
NCBI Epigenomics: a new public resource for exploring epigenomic data sets.
Fingerman, Ian M; McDaniel, Lee; Zhang, Xuan; Ratzat, Walter; Hassan, Tarek; Jiang, Zhifang; Cohen, Robert F; Schuler, Gregory D
2011-01-01
The Epigenomics database at the National Center for Biotechnology Information (NCBI) is a new resource that has been created to serve as a comprehensive public resource for whole-genome epigenetic data sets (www.ncbi.nlm.nih.gov/epigenomics). Epigenetics is the study of stable and heritable changes in gene expression that occur independently of the primary DNA sequence. Epigenetic mechanisms include post-translational modifications of histones, DNA methylation, chromatin conformation and non-coding RNAs. It has been observed that misregulation of epigenetic processes has been associated with human disease. We have constructed the new resource by selecting the subset of epigenetics-specific data from general-purpose archives, such as the Gene Expression Omnibus, and Sequence Read Archives, and then subjecting them to further review, annotation and reorganization. Raw data is processed and mapped to genomic coordinates to generate 'tracks' that are a visual representation of the data. These data tracks can be viewed using popular genome browsers or downloaded for local analysis. The Epigenomics resource also provides the user with a unique interface that allows for intuitive browsing and searching of data sets based on biological attributes. Currently, there are 69 studies, 337 samples and over 1100 data tracks from five well-studied species that are viewable and downloadable in Epigenomics.
Green, Timothy J; Dixon, Tom J; Devic, Emilie; Adlard, Robert D; Barnes, Andrew C
2009-05-01
Sydney rock oysters (Saccostrea glomerata) selectively bred for disease resistance (R) and wild-caught control oysters (W) were exposed to a field infection of disseminating neoplasia. Cumulative mortality of W oysters (31.7%) was significantly greater than R oysters (0.0%) over the 118 days of the experiment. In an attempt to understand the biochemical and molecular pathways involved in disease resistance, differentially expressed sequence tags (ESTs) between R and W S. glomerata hemocytes were identified using the PCR technique, suppression subtractive hybridisation (SSH). Sequencing of 300 clones from two SSH libraries revealed 183 distinct sequences of which 113 shared high similarity to sequences in the public databases. Putative function could be assigned to 64 of the sequences. Expression of nine ESTs homologous to genes previously shown to be involved in bivalve immunity was further studied using quantitative reverse-transcriptase PCR (qRT-PCR). The base-line expression of an extracellular superoxide dismutase (ecSOD) and a small heat shock protein (sHsP) were significantly increased, whilst peroxiredoxin 6 (Prx6) and interferon inhibiting cytokine factor (IK) were significantly decreased in R oysters. From these results it was hypothesised that R oysters would be able to generate the anti-parasitic compound, hydrogen peroxide (H(2)O(2)) faster and to higher concentrations during respiratory burst due to the differential expression of genes for the two anti-oxidant enzymes of ecSOD and Prx6. To investigate this hypothesis, protein extracts from hemolymph were analysed for oxidative burst enzyme activity. Analysis of the cell free hemolymph proteins separated by native-polyacrylamide gel electrophoresis (PAGE) failed to detect true superoxide dismutase (SOD) activity by assaying dismutation of superoxide anion in zymograms. However, the ecSOD enzyme appears to generate hydrogen peroxide, presumably via another process, which is yet to be elucidated. This corroborates our hypothesis, whilst phylogenetic analysis of the complete coding sequence (CDS) of the S. glomerata ecSOD gene is supportive of the atypical nature of the ecSOD enzyme. Results obtained from this work further the current understanding of the molecular mechanisms involved in resistance to disease in this economically important bivalve, and shed further light on the anomalous oxidative processes involved.
Riviere, Guillaume; Klopp, Christophe; Ibouniyamine, Nabihoudine; Huvet, Arnaud; Boudry, Pierre; Favrel, Pascal
2015-12-02
The Pacific oyster, Crassostrea gigas, is one of the most important aquaculture shellfish resources worldwide. Important efforts have been undertaken towards a better knowledge of its genome and transcriptome, which makes now C. gigas becoming a model organism among lophotrochozoans, the under-described sister clade of ecdysozoans within protostomes. These massive sequencing efforts offer the opportunity to assemble gene expression data and make such resource accessible and exploitable for the scientific community. Therefore, we undertook this assembly into an up-to-date publicly available transcriptome database: the GigaTON (Gigas TranscriptOme pipeliNe) database. We assembled 2204 million sequences obtained from 114 publicly available RNA-seq libraries that were realized using all embryo-larval development stages, adult organs, different environmental stressors including heavy metals, temperature, salinity and exposure to air, which were mostly performed as part of the Crassostrea gigas genome project. This data was analyzed in silico and resulted into 56621 newly assembled contigs that were deposited into a publicly available database, the GigaTON database. This database also provides powerful and user-friendly request tools to browse and retrieve information about annotation, expression level, UTRs, splice and polymorphism, and gene ontology associated to all the contigs into each, and between all libraries. The GigaTON database provides a convenient, potent and versatile interface to browse, retrieve, confront and compare massive transcriptomic information in an extensive range of conditions, tissues and developmental stages in Crassostrea gigas. To our knowledge, the GigaTON database constitutes the most extensive transcriptomic database to date in marine invertebrates, thereby a new reference transcriptome in the oyster, a highly valuable resource to physiologists and evolutionary biologists.
Marques, M Carmen; Alonso-Cantabrana, Hugo; Forment, Javier; Arribas, Raquel; Alamar, Santiago; Conejero, Vicente; Perez-Amador, Miguel A
2009-01-01
Background Interpretation of ever-increasing raw sequence information generated by modern genome sequencing technologies faces multiple challenges, such as gene function analysis and genome annotation. Indeed, nearly 40% of genes in plants encode proteins of unknown function. Functional characterization of these genes is one of the main challenges in modern biology. In this regard, the availability of full-length cDNA clones may fill in the gap created between sequence information and biological knowledge. Full-length cDNA clones facilitate functional analysis of the corresponding genes enabling manipulation of their expression in heterologous systems and the generation of a variety of tagged versions of the native protein. In addition, the development of full-length cDNA sequences has the power to improve the quality of genome annotation. Results We developed an integrated method to generate a new normalized EST collection enriched in full-length and rare transcripts of different citrus species from multiple tissues and developmental stages. We constructed a total of 15 cDNA libraries, from which we isolated 10,898 high-quality ESTs representing 6142 different genes. Percentages of redundancy and proportion of full-length clones range from 8 to 33, and 67 to 85, respectively, indicating good efficiency of the approach employed. The new EST collection adds 2113 new citrus ESTs, representing 1831 unigenes, to the collection of citrus genes available in the public databases. To facilitate functional analysis, cDNAs were introduced in a Gateway-based cloning vector for high-throughput functional analysis of genes in planta. Herein, we describe the technical methods used in the library construction, sequence analysis of clones and the overexpression of CitrSEP, a citrus homolog to the Arabidopsis SEP3 gene, in Arabidopsis as an example of a practical application of the engineered Gateway vector for functional analysis. Conclusion The new EST collection denotes an important step towards the identification of all genes in the citrus genome. Furthermore, public availability of the cDNA clones generated in this study, and not only their sequence, enables testing of the biological function of the genes represented in the collection. Expression of the citrus SEP3 homologue, CitrSEP, in Arabidopsis results in early flowering, along with other phenotypes resembling the over-expression of the Arabidopsis SEPALLATA genes. Our findings suggest that the members of the SEP gene family play similar roles in these quite distant plant species. PMID:19747386
Sakai, Hiroaki; Kanamori, Hiroyuki; Arai-Kichise, Yuko; Shibata-Hatta, Mari; Ebana, Kaworu; Oono, Youko; Kurita, Kanako; Fujisawa, Hiroko; Katagiri, Satoshi; Mukai, Yoshiyuki; Hamada, Masao; Itoh, Takeshi; Matsumoto, Takashi; Katayose, Yuichi; Wakasa, Kyo; Yano, Masahiro; Wu, Jianzhong
2014-01-01
Having a deep genetic structure evolved during its domestication and adaptation, the Asian cultivated rice (Oryza sativa) displays considerable physiological and morphological variations. Here, we describe deep whole-genome sequencing of the aus rice cultivar Kasalath by using the advanced next-generation sequencing (NGS) technologies to gain a better understanding of the sequence and structural changes among highly differentiated cultivars. The de novo assembled Kasalath sequences represented 91.1% (330.55 Mb) of the genome and contained 35 139 expressed loci annotated by RNA-Seq analysis. We detected 2 787 250 single-nucleotide polymorphisms (SNPs) and 7393 large insertion/deletion (indel) sites (>100 bp) between Kasalath and Nipponbare, and 2 216 251 SNPs and 3780 large indels between Kasalath and 93-11. Extensive comparison of the gene contents among these cultivars revealed similar rates of gene gain and loss. We detected at least 7.39 Mb of inserted sequences and 40.75 Mb of unmapped sequences in the Kasalath genome in comparison with the Nipponbare reference genome. Mapping of the publicly available NGS short reads from 50 rice accessions proved the necessity and the value of using the Kasalath whole-genome sequence as an additional reference to capture the sequence polymorphisms that cannot be discovered by using the Nipponbare sequence alone. PMID:24578372
Iandolino, Alberto; Nobuta, Kan; da Silva, Francisco Goes; Cook, Douglas R; Meyers, Blake C
2008-05-12
Vitis vinifera (V. vinifera) is the primary grape species cultivated for wine production, with an industry valued annually in the billions of dollars worldwide. In order to sustain and increase grape production, it is necessary to understand the genetic makeup of grape species. Here we performed mRNA profiling using Massively Parallel Signature Sequencing (MPSS) and combined it with available Expressed Sequence Tag (EST) data. These tag-based technologies, which do not require a priori knowledge of genomic sequence, are well-suited for transcriptional profiling. The sequence depth of MPSS allowed us to capture and quantify almost all the transcripts at a specific stage in the development of the grape berry. The number and relative abundance of transcripts from stage II grape berries was defined using Massively Parallel Signature Sequencing (MPSS). A total of 2,635,293 17-base and 2,259,286 20-base signatures were obtained, representing at least 30,737 and 26,878 distinct sequences. The average normalized abundance per signature was approximately 49 TPM (Transcripts Per Million). Comparisons of the MPSS signatures with available Vitis species' ESTs and a unigene set demonstrated that 6,430 distinct contigs and 2,190 singletons have a perfect match to at least one MPSS signature. Among the matched sequences, ESTs were identified from tissues other than berries or from berries at different developmental stages. Additional MPSS signatures not matching to known grape ESTs can extend our knowledge of the V. vinifera transcriptome, particularly when these data are used to assist in annotation of whole genome sequences from Vitis vinifera. The MPSS data presented here not only achieved a higher level of saturation than previous EST based analyses, but in doing so, expand the known set of transcripts of grape berries during the unique stage in development that immediately precedes the onset of ripening. The MPSS dataset also revealed evidence of antisense expression not previously reported in grapes but comparable to that reported in other plant species. Finally, we developed a novel web-based, public resource for utilization of the grape MPSS data [1].
Young infants' generalization of emotional expressions: effects of familiarity.
Walker-Andrews, Arlene S; Krogh-Jespersen, Sheila; Mayhew, Estelle M Y; Coffield, Caroline N
2011-08-01
From birth, infants are exposed to a wealth of emotional information in their interactions. Much research has been done to investigate the development of emotion perception, and factors influencing that development. The current study investigates the role of familiarity on 3.5-month-old infants' generalization of emotional expressions. Infants were assigned to one of two habituation sequences: in one sequence, infants were visually habituated to parental expressions of happy or sad. At test, infants viewed either a continuation of the habituation sequence, their mother depicting a novel expression, an unfamiliar female depicting the habituated expression, or an unfamiliar female depicting a novel expression. In the second sequence, a new sample of infants was matched to the infants in the first sequence. These infants viewed the same habituation and test sequences, but the actors were unfamiliar to them. Only those infants who viewed their own mothers and fathers during the habituation sequence increased looking. They dishabituated looking to maternal novel expressions, the unfamiliar female's novel expression, and the unfamiliar female depicting the habituated expression, especially when sad parental expressions were followed by an expression change to happy or to a change in person. Infants are guided in their recognition of emotional expressions by the familiarity of their parents, before generalizing to others. 2011 APA, all rights reserved
Sequencing, Analysis, and Annotation of Expressed Sequence Tags for Camelus dromedarius
Al-Swailem, Abdulaziz M.; Shehata, Maher M.; Abu-Duhier, Faisel M.; Al-Yamani, Essam J.; Al-Busadah, Khalid A.; Al-Arawi, Mohammed S.; Al-Khider, Ali Y.; Al-Muhaimeed, Abdullah N.; Al-Qahtani, Fahad H.; Manee, Manee M.; Al-Shomrani, Badr M.; Al-Qhtani, Saad M.; Al-Harthi, Amer S.; Akdemir, Kadir C.; Otu, Hasan H.
2010-01-01
Despite its economical, cultural, and biological importance, there has not been a large scale sequencing project to date for Camelus dromedarius. With the goal of sequencing complete DNA of the organism, we first established and sequenced camel EST libraries, generating 70,272 reads. Following trimming, chimera check, repeat masking, cluster and assembly, we obtained 23,602 putative gene sequences, out of which over 4,500 potentially novel or fast evolving gene sequences do not carry any homology to other available genomes. Functional annotation of sequences with similarities in nucleotide and protein databases has been obtained using Gene Ontology classification. Comparison to available full length cDNA sequences and Open Reading Frame (ORF) analysis of camel sequences that exhibit homology to known genes show more than 80% of the contigs with an ORF>300 bp and ∼40% hits extending to the start codons of full length cDNAs suggesting successful characterization of camel genes. Similarity analyses are done separately for different organisms including human, mouse, bovine, and rat. Accompanying web portal, CAGBASE (http://camel.kacst.edu.sa/), hosts a relational database containing annotated EST sequences and analysis tools with possibility to add sequences from public domain. We anticipate our results to provide a home base for genomic studies of camel and other comparative studies enabling a starting point for whole genome sequencing of the organism. PMID:20502665
Combined hairpin-antisense compositions and methods for modulating expression
Shanklin, John; Nguyen, Tam
2014-08-05
A nucleotide construct comprising a nucleotide sequence that forms a stem and a loop, wherein the loop comprises a nucleotide sequence that modulates expression of a target, wherein the stem comprises a nucleotide sequence that modulates expression of a target, and wherein the target modulated by the nucleotide sequence in the loop and the target modulated by the nucleotide sequence in the stem may be the same or different. Vectors, methods of regulating target expression, methods of providing a cell, and methods of treating conditions comprising the nucleotide sequence are also disclosed.
Combined hairpin-antisense compositions and methods for modulating expression
Shanklin, John; Nguyen, Tam Huu
2015-11-24
A nucleotide construct comprising a nucleotide sequence that forms a stem and a loop, wherein the loop comprises a nucleotide sequence that modulates expression of a target, wherein the stem comprises a nucleotide sequence that modulates expression of a target, and wherein the target modulated by the nucleotide sequence in the loop and the target modulated by the nucleotide sequence in the stem may be the same or different. Vectors, methods of regulating target expression, methods of providing a cell, and methods of treating conditions comprising the nucleotide sequence are also disclosed.
Devlin, Joseph C; Battaglia, Thomas; Blaser, Martin J; Ruggles, Kelly V
2018-06-25
Exploration of large data sets, such as shotgun metagenomic sequence or expression data, by biomedical experts and medical professionals remains as a major bottleneck in the scientific discovery process. Although tools for this purpose exist for 16S ribosomal RNA sequencing analysis, there is a growing but still insufficient number of user-friendly interactive visualization workflows for easy data exploration and figure generation. The development of such platforms for this purpose is necessary to accelerate and streamline microbiome laboratory research. We developed the Workflow Hub for Automated Metagenomic Exploration (WHAM!) as a web-based interactive tool capable of user-directed data visualization and statistical analysis of annotated shotgun metagenomic and metatranscriptomic data sets. WHAM! includes exploratory and hypothesis-based gene and taxa search modules for visualizing differences in microbial taxa and gene family expression across experimental groups, and for creating publication quality figures without the need for command line interface or in-house bioinformatics. WHAM! is an interactive and customizable tool for downstream metagenomic and metatranscriptomic analysis providing a user-friendly interface allowing for easy data exploration by microbiome and ecological experts to facilitate discovery in multi-dimensional and large-scale data sets.
Contreras-López, Orlando; Moyano, Tomás C; Soto, Daniela C; Gutiérrez, Rodrigo A
2018-01-01
The rapid increase in the availability of transcriptomics data generated by RNA sequencing represents both a challenge and an opportunity for biologists without bioinformatics training. The challenge is handling, integrating, and interpreting these data sets. The opportunity is to use this information to generate testable hypothesis to understand molecular mechanisms controlling gene expression and biological processes (Fig. 1). A successful strategy to generate tractable hypotheses from transcriptomics data has been to build undirected network graphs based on patterns of gene co-expression. Many examples of new hypothesis derived from network analyses can be found in the literature, spanning different organisms including plants and specific fields such as root developmental biology.In order to make the process of constructing a gene co-expression network more accessible to biologists, here we provide step-by-step instructions using published RNA-seq experimental data obtained from a public database. Similar strategies have been used in previous studies to advance root developmental biology. This guide includes basic instructions for the operation of widely used open source platforms such as Bio-Linux, R, and Cytoscape. Even though the data we used in this example was obtained from Arabidopsis thaliana, the workflow developed in this guide can be easily adapted to work with RNA-seq data from any organism.
Chen, Mindong; Wang, Bin; Zhang, Qianrong; Xue, Zhuzheng
2017-01-01
Fresh-cut luffa (Luffa cylindrica) fruits commonly undergo browning. However, little is known about the molecular mechanisms regulating this process. We used the RNA-seq technique to analyze the transcriptomic changes occurring during the browning of fresh-cut fruits from luffa cultivar ‘Fusi-3’. Over 90 million high-quality reads were assembled into 58,073 Unigenes, and 60.86% of these were annotated based on sequences in four public databases. We detected 35,282 Unigenes with significant hits to sequences in the NCBInr database, and 24,427 Unigenes encoded proteins with sequences that were similar to those of known proteins in the Swiss-Prot database. Additionally, 20,546 and 13,021 Unigenes were similar to existing sequences in the Eukaryotic Orthologous Groups of proteins and Kyoto Encyclopedia of Genes and Genomes databases, respectively. Furthermore, 27,301 Unigenes were differentially expressed during the browning of fresh-cut luffa fruits (i.e., after 1–6 h). Moreover, 11 genes from five gene families (i.e., PPO, PAL, POD, CAT, and SOD) identified as potentially associated with enzymatic browning as well as four WRKY transcription factors were observed to be differentially regulated in fresh-cut luffa fruits. With the assistance of rapid amplification of cDNA ends technology, we obtained the full-length sequences of the 15 Unigenes. We also confirmed these Unigenes were expressed by quantitative real-time polymerase chain reaction analysis. This study provides a comprehensive transcriptome sequence resource, and may facilitate further studies aimed at identifying genes affecting luffa fruit browning for the exploitation of the underlying mechanism. PMID:29145430
Narad, Priyanka; Kumar, Abhishek; Chakraborty, Amlan; Patni, Pranav; Sengupta, Abhishek; Wadhwa, Gulshan; Upadhyaya, K C
2017-09-01
Transcription factors are trans-acting proteins that interact with specific nucleotide sequences known as transcription factor binding site (TFBS), and these interactions are implicated in regulation of the gene expression. Regulation of transcriptional activation of a gene often involves multiple interactions of transcription factors with various sequence elements. Identification of these sequence elements is the first step in understanding the underlying molecular mechanism(s) that regulate the gene expression. For in silico identification of these sequence elements, we have developed an online computational tool named transcription factor information system (TFIS) for detecting TFBS for the first time using a collection of JAVA programs and is mainly based on TFBS detection using position weight matrix (PWM). The database used for obtaining position frequency matrices (PFM) is JASPAR and HOCOMOCO, which is an open-access database of transcription factor binding profiles. Pseudo-counts are used while converting PFM to PWM, and TFBS detection is carried out on the basis of percent score taken as threshold value. TFIS is equipped with advanced features such as direct sequence retrieving from NCBI database using gene identification number and accession number, detecting binding site for common TF in a batch of gene sequences, and TFBS detection after generating PWM from known raw binding sequences in addition to general detection methods. TFIS can detect the presence of potential TFBSs in both the directions at the same time. This feature increases its efficiency. And the results for this dual detection are presented in different colors specific to the orientation of the binding site. Results obtained by the TFIS are more detailed and specific to the detected TFs as integration of more informative links from various related web servers are added in the result pages like Gene Ontology, PAZAR database and Transcription Factor Encyclopedia in addition to NCBI and UniProt. Common TFs like SP1, AP1 and NF-KB of the Amyloid beta precursor gene is easily detected using TFIS along with multiple binding sites. In another scenario of embryonic developmental process, TFs of the FOX family (FOXL1 and FOXC1) were also identified. TFIS is platform-independent which is publicly available along with its support and documentation at http://tfistool.appspot.com and http://www.bioinfoplus.com/tfis/ . TFIS is licensed under the GNU General Public License, version 3 (GPL-3.0).
In silico identification of conserved microRNAs in large number of diverse plant species
Sunkar, Ramanjulu; Jagadeeswaran, Guru
2008-01-01
Background MicroRNAs (miRNAs) are recently discovered small non-coding RNAs that play pivotal roles in gene expression, specifically at the post-transcriptional level in plants and animals. Identification of miRNAs in large number of diverse plant species is important to understand the evolution of miRNAs and miRNA-targeted gene regulations. Now-a-days, publicly available databases play a central role in the in-silico biology. Because, at least ~21 miRNA families are conserved in higher plants, a homology based search using these databases can help identify orthologs or paralogs in plants. Results We searched all publicly available nucleotide databases of genome survey sequences (GSS), high-throughput genomics sequences (HTGS), expressed sequenced tags (ESTs) and nonredundant (NR) nucleotides and identified 682 miRNAs in 155 diverse plant species. We found more than 15 conserved miRNA families in 11 plant species, 10 to14 families in 10 plant species and 5 to 9 families in 29 plant species. Nineteen conserved miRNA families were identified in important model legumes such as Medicago, Lotus and soybean. Five miRNA families – miR319, miR156/157, miR169, miR165/166 and miR394 – were found in 51, 45, 41, 40 and 40 diverse plant species, respectively. miR403 homologs were found in 16 dicots, whereas miR437 and miR444 homologs, as well as the miR396d/e variant of the miR396 family, were found only in monocots, thus providing large-scale authenticity for the dicot- and monocot-specific miRNAs. Furthermore, we provide computational and/or experimental evidence for the conservation of 6 newly found Arabidopsis miRNA homologs (miR158, miR391, miR824, miR825, miR827 and miR840) and 2 small RNAs (small-85 and small-87) in Brassica spp. Conclusion Using all publicly available nucleotide databases, 682 miRNAs were identified in 155 diverse plant species. By combining the expression analysis with the computational approach, we found that 6 miRNAs and 2 small RNAs that have been identified only in Arabidopsis thus far, are also conserved in Brassica spp. These findings will be useful for tracing the evolution of small RNAs by examining their expression in common ancestors of the Arabidopsis-Brassica lineage. PMID:18416839
Hyun, Tae Kyung; Lee, Sarah; Kumar, Dhinesh; Rim, Yeonggil; Kumar, Ritesh; Lee, Sang Yeol; Lee, Choong Hwan; Kim, Jae-Yean
2014-10-01
Using Illumina sequencing technology, we have generated the large-scale transcriptome sequencing data containing abundant information on genes involved in the metabolic pathways in R. idaeus cv. Nova fruits. Rubus idaeus (Red raspberry) is one of the important economical crops that possess numerous nutrients, micronutrients and phytochemicals with essential health benefits to human. The molecular mechanism underlying the ripening process and phytochemical biosynthesis in red raspberry is attributed to the changes in gene expression, but very limited transcriptomic and genomic information in public databases is available. To address this issue, we generated more than 51 million sequencing reads from R. idaeus cv. Nova fruit using Illumina RNA-Seq technology. After de novo assembly, we obtained 42,604 unigenes with an average length of 812 bp. At the protein level, Nova fruit transcriptome showed 77 and 68 % sequence similarities with Rubus coreanus and Fragaria versa, respectively, indicating the evolutionary relationship between them. In addition, 69 % of assembled unigenes were annotated using public databases including NCBI non-redundant, Cluster of Orthologous Groups and Gene ontology database, suggesting that our transcriptome dataset provides a valuable resource for investigating metabolic processes in red raspberry. To analyze the relationship between several novel transcripts and the amounts of metabolites such as γ-aminobutyric acid and anthocyanins, real-time PCR and target metabolite analysis were performed on two different ripening stages of Nova. This is the first attempt using Illumina sequencing platform for RNA sequencing and de novo assembly of Nova fruit without reference genome. Our data provide the most comprehensive transcriptome resource available for Rubus fruits, and will be useful for understanding the ripening process and for breeding R. idaeus cultivars with improved fruit quality.
MethHC: a database of DNA methylation and gene expression in human cancer.
Huang, Wei-Yun; Hsu, Sheng-Da; Huang, Hsi-Yuan; Sun, Yi-Ming; Chou, Chih-Hung; Weng, Shun-Long; Huang, Hsien-Da
2015-01-01
We present MethHC (http://MethHC.mbc.nctu.edu.tw), a database comprising a systematic integration of a large collection of DNA methylation data and mRNA/microRNA expression profiles in human cancer. DNA methylation is an important epigenetic regulator of gene transcription, and genes with high levels of DNA methylation in their promoter regions are transcriptionally silent. Increasing numbers of DNA methylation and mRNA/microRNA expression profiles are being published in different public repositories. These data can help researchers to identify epigenetic patterns that are important for carcinogenesis. MethHC integrates data such as DNA methylation, mRNA expression, DNA methylation of microRNA gene and microRNA expression to identify correlations between DNA methylation and mRNA/microRNA expression from TCGA (The Cancer Genome Atlas), which includes 18 human cancers in more than 6000 samples, 6548 microarrays and 12 567 RNA sequencing data. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
HTSeq--a Python framework to work with high-throughput sequencing data.
Anders, Simon; Pyl, Paul Theodor; Huber, Wolfgang
2015-01-15
A large choice of tools exists for many standard tasks in the analysis of high-throughput sequencing (HTS) data. However, once a project deviates from standard workflows, custom scripts are needed. We present HTSeq, a Python library to facilitate the rapid development of such scripts. HTSeq offers parsers for many common data formats in HTS projects, as well as classes to represent data, such as genomic coordinates, sequences, sequencing reads, alignments, gene model information and variant calls, and provides data structures that allow for querying via genomic coordinates. We also present htseq-count, a tool developed with HTSeq that preprocesses RNA-Seq data for differential expression analysis by counting the overlap of reads with genes. HTSeq is released as an open-source software under the GNU General Public Licence and available from http://www-huber.embl.de/HTSeq or from the Python Package Index at https://pypi.python.org/pypi/HTSeq. © The Author 2014. Published by Oxford University Press.
NCBI GEO: archive for functional genomics data sets--10 years on.
Barrett, Tanya; Troup, Dennis B; Wilhite, Stephen E; Ledoux, Pierre; Evangelista, Carlos; Kim, Irene F; Tomashevsky, Maxim; Marshall, Kimberly A; Phillippy, Katherine H; Sherman, Patti M; Muertter, Rolf N; Holko, Michelle; Ayanbule, Oluwabukunmi; Yefanov, Andrey; Soboleva, Alexandra
2011-01-01
A decade ago, the Gene Expression Omnibus (GEO) database was established at the National Center for Biotechnology Information (NCBI). The original objective of GEO was to serve as a public repository for high-throughput gene expression data generated mostly by microarray technology. However, the research community quickly applied microarrays to non-gene-expression studies, including examination of genome copy number variation and genome-wide profiling of DNA-binding proteins. Because the GEO database was designed with a flexible structure, it was possible to quickly adapt the repository to store these data types. More recently, as the microarray community switches to next-generation sequencing technologies, GEO has again adapted to host these data sets. Today, GEO stores over 20,000 microarray- and sequence-based functional genomics studies, and continues to handle the majority of direct high-throughput data submissions from the research community. Multiple mechanisms are provided to help users effectively search, browse, download and visualize the data at the level of individual genes or entire studies. This paper describes recent database enhancements, including new search and data representation tools, as well as a brief review of how the community uses GEO data. GEO is freely accessible at http://www.ncbi.nlm.nih.gov/geo/.
Liu, Miaomiao; Zhu, Jinhang; Wu, Shengbing; Wang, Chenkai; Guo, Xingyi; Wu, Jiawen; Zhou, Meiqi
2018-04-11
Artemisia argyi Lev. et Vant. (A. argyi) is widely utilized for moxibustion in Chinese medicine, and the mechanism underlying terpenoid biosynthesis in its leaves is suggested to play an important role in its medicinal use. However, the A. argyi transcriptome has not been sequenced. Herein, we performed RNA sequencing for A. argyi leaf, root and stem tissues to identify as many as possible of the transcribed genes. In total, 99,807 unigenes were assembled by analysing the expression profiles generated from the three tissue types, and 67,446 of those unigenes were annotated in public databases. We further performed differential gene expression analysis to compare leaf tissue with the other two tissue types and identified numerous genes that were specifically expressed or up-regulated in leaf tissue. Specifically, we identified multiple genes encoding significant enzymes or transcription factors related to terpenoid synthesis. This study serves as a valuable resource for transcriptome information, as many transcribed genes related to terpenoid biosynthesis were identified in the A. argyi transcriptome, providing a functional genomic basis for additional studies on molecular mechanisms underlying the medicinal use of A. argyi.
Shen, Li; Shao, Ningyi; Liu, Xiaochuan; Nestler, Eric
2014-04-15
Understanding the relationship between the millions of functional DNA elements and their protein regulators, and how they work in conjunction to manifest diverse phenotypes, is key to advancing our understanding of the mammalian genome. Next-generation sequencing technology is now used widely to probe these protein-DNA interactions and to profile gene expression at a genome-wide scale. As the cost of DNA sequencing continues to fall, the interpretation of the ever increasing amount of data generated represents a considerable challenge. We have developed ngs.plot - a standalone program to visualize enrichment patterns of DNA-interacting proteins at functionally important regions based on next-generation sequencing data. We demonstrate that ngs.plot is not only efficient but also scalable. We use a few examples to demonstrate that ngs.plot is easy to use and yet very powerful to generate figures that are publication ready. We conclude that ngs.plot is a useful tool to help fill the gap between massive datasets and genomic information in this era of big sequencing data.
2014-01-01
Background Understanding the relationship between the millions of functional DNA elements and their protein regulators, and how they work in conjunction to manifest diverse phenotypes, is key to advancing our understanding of the mammalian genome. Next-generation sequencing technology is now used widely to probe these protein-DNA interactions and to profile gene expression at a genome-wide scale. As the cost of DNA sequencing continues to fall, the interpretation of the ever increasing amount of data generated represents a considerable challenge. Results We have developed ngs.plot – a standalone program to visualize enrichment patterns of DNA-interacting proteins at functionally important regions based on next-generation sequencing data. We demonstrate that ngs.plot is not only efficient but also scalable. We use a few examples to demonstrate that ngs.plot is easy to use and yet very powerful to generate figures that are publication ready. Conclusions We conclude that ngs.plot is a useful tool to help fill the gap between massive datasets and genomic information in this era of big sequencing data. PMID:24735413
Pardo, Belén G; Álvarez-Dios, José Antonio; Cao, Asunción; Ramilo, Andrea; Gómez-Tato, Antonio; Planas, Josep V; Villalba, Antonio; Martínez, Paulino
2016-12-01
The flat oyster, Ostrea edulis, is one of the main farmed oysters, not only in Europe but also in the United States and Canada. Bonamiosis due to the parasite Bonamia ostreae has been associated with high mortality episodes in this species. This parasite is an intracellular protozoan that infects haemocytes, the main cells involved in oyster defence. Due to the economical and ecological importance of flat oyster, genomic data are badly needed for genetic improvement of the species, but they are still very scarce. The objective of this study is to develop a sequence database, OedulisDB, with new genomic and transcriptomic resources, providing new data and convenient tools to improve our knowledge of the oyster's immune mechanisms. Transcriptomic and genomic sequences were obtained using 454 pyrosequencing and compiled into an O. edulis database, OedulisDB, consisting of two sets of 10,318 and 7159 unique sequences that represent the oyster's genome (WG) and de novo haemocyte transcriptome (HT), respectively. The flat oyster transcriptome was obtained from two strains (naïve and tolerant) challenged with B. ostreae, and from their corresponding non-challenged controls. Approximately 78.5% of 5619 HT unique sequences were successfully annotated by Blast search using public databases. A total of 984 sequences were identified as being related to immune response and several key immune genes were identified for the first time in flat oyster. Additionally, transcriptome information was used to design and validate the first oligo-microarray in flat oyster enriched with immune sequences from haemocytes. Our transcriptomic and genomic sequencing and subsequent annotation have largely increased the scarce resources available for this economically important species and have enabled us to develop an OedulisDB database and accompanying tools for gene expression analysis. This study represents the first attempt to characterize in depth the O. edulis haemocyte transcriptome in response to B. ostreae through massively sequencing and has aided to improve our knowledge of the immune mechanisms of flat oyster. The validated oligo-microarray and the establishment of a reference transcriptome will be useful for large-scale gene expression studies in this species. Copyright © 2016 Elsevier Ltd. All rights reserved.
In silico Analysis of 2085 Clones from a Normalized Rat Vestibular Periphery 3′ cDNA Library
Roche, Joseph P.; Cioffi, Joseph A.; Kwitek, Anne E.; Erbe, Christy B.; Popper, Paul
2005-01-01
The inserts from 2400 cDNA clones isolated from a normalized Rattus norvegicus vestibular periphery cDNA library were sequenced and characterized. The Wackym-Soares vestibular 3′ cDNA library was constructed from the saccular and utricular maculae, the ampullae of all three semicircular canals and Scarpa's ganglia containing the somata of the primary afferent neurons, microdissected from 104 male and female rats. The inserts from 2400 randomly selected clones were sequenced from the 5′ end. Each sequence was analyzed using the BLAST algorithm compared to the Genbank nonredundant, rat genome, mouse genome and human genome databases to search for high homology alignments. Of the initial 2400 clones, 315 (13%) were found to be of poor quality and did not yield useful information, and therefore were eliminated from the analysis. Of the remaining 2085 sequences, 918 (44%) were found to represent 758 unique genes having useful annotations that were identified in databases within the public domain or in the published literature; these sequences were designated as known characterized sequences. 1141 sequences (55%) aligned with 1011 unique sequences had no useful annotations and were designated as known but uncharacterized sequences. Of the remaining 26 sequences (1%), 24 aligned with rat genomic sequences, but none matched previously described rat expressed sequence tags or mRNAs. No significant alignment to the rat or human genomic sequences could be found for the remaining 2 sequences. Of the 2085 sequences analyzed, 86% were singletons. The known, characterized sequences were analyzed with the FatiGO online data-mining tool (http://fatigo.bioinfo.cnio.es/) to identify level 5 biological process gene ontology (GO) terms for each alignment and to group alignments with similar or identical GO terms. Numerous genes were identified that have not been previously shown to be expressed in the vestibular system. Further characterization of the novel cDNA sequences may lead to the identification of genes with vestibular-specific functions. Continued analysis of the rat vestibular periphery transcriptome should provide new insights into vestibular function and generate new hypotheses. Physiological studies are necessary to further elucidate the roles of the identified genes and novel sequences in vestibular function. PMID:16103642
Prasath, Duraisamy; Karthika, Raveendran; Habeeba, Naduva Thadath; Suraby, Erinjery Jose; Rosana, Ottakandathil Babu; Shaji, Avaroth; Eapen, Santhosh Joseph; Deshpande, Uday; Anandaraj, Muthuswamy
2014-01-01
Bacterial wilt in ginger (Zingiber officinale Rosc.) caused by Ralstonia solanacearum is one of the most important production constraints in tropical, sub-tropical and warm temperature regions of the world. Lack of resistant genotype adds constraints to the crop management. However, mango ginger (Curcuma amada Roxb.), which is resistant to R. solanacearum, is a potential donor, if the exact mechanism of resistance is understood. To identify genes involved in resistance to R. solanacearum, we have sequenced the transcriptome from wilt-sensitive ginger and wilt-resistant mango ginger using Illumina sequencing technology. A total of 26387032 and 22268804 paired-end reads were obtained after quality filtering for C. amada and Z. officinale, respectively. A total of 36359 and 32312 assembled transcript sequences were obtained from both the species. The functions of the unigenes cover a diverse set of molecular functions and biological processes, among which we identified a large number of genes associated with resistance to stresses and response to biotic stimuli. Large scale expression profiling showed that many of the disease resistance related genes were expressed more in C. amada. Comparative analysis also identified genes belonging to different pathways of plant defense against biotic stresses that are differentially expressed in either ginger or mango ginger. The identification of many defense related genes differentially expressed provides many insights to the resistance mechanism to R. solanacearum and for studying potential pathways involved in responses to pathogen. Also, several candidate genes that may underline the difference in resistance to R. solanacearum between ginger and mango ginger were identified. Finally, we have developed a web resource, ginger transcriptome database, which provides public access to the data. Our study is among the first to demonstrate the use of Illumina short read sequencing for de novo transcriptome assembly and comparison in non-model species of Zingiberaceae. PMID:24940878
Prasath, Duraisamy; Karthika, Raveendran; Habeeba, Naduva Thadath; Suraby, Erinjery Jose; Rosana, Ottakandathil Babu; Shaji, Avaroth; Eapen, Santhosh Joseph; Deshpande, Uday; Anandaraj, Muthuswamy
2014-01-01
Bacterial wilt in ginger (Zingiber officinale Rosc.) caused by Ralstonia solanacearum is one of the most important production constraints in tropical, sub-tropical and warm temperature regions of the world. Lack of resistant genotype adds constraints to the crop management. However, mango ginger (Curcuma amada Roxb.), which is resistant to R. solanacearum, is a potential donor, if the exact mechanism of resistance is understood. To identify genes involved in resistance to R. solanacearum, we have sequenced the transcriptome from wilt-sensitive ginger and wilt-resistant mango ginger using Illumina sequencing technology. A total of 26387032 and 22268804 paired-end reads were obtained after quality filtering for C. amada and Z. officinale, respectively. A total of 36359 and 32312 assembled transcript sequences were obtained from both the species. The functions of the unigenes cover a diverse set of molecular functions and biological processes, among which we identified a large number of genes associated with resistance to stresses and response to biotic stimuli. Large scale expression profiling showed that many of the disease resistance related genes were expressed more in C. amada. Comparative analysis also identified genes belonging to different pathways of plant defense against biotic stresses that are differentially expressed in either ginger or mango ginger. The identification of many defense related genes differentially expressed provides many insights to the resistance mechanism to R. solanacearum and for studying potential pathways involved in responses to pathogen. Also, several candidate genes that may underline the difference in resistance to R. solanacearum between ginger and mango ginger were identified. Finally, we have developed a web resource, ginger transcriptome database, which provides public access to the data. Our study is among the first to demonstrate the use of Illumina short read sequencing for de novo transcriptome assembly and comparison in non-model species of Zingiberaceae.
Hamilton, John P; Neeno-Eckwall, Eric C; Adhikari, Bishwo N; Perna, Nicole T; Tisserat, Ned; Leach, Jan E; Lévesque, C André; Buell, C Robin
2011-01-01
The Comprehensive Phytopathogen Genomics Resource (CPGR) provides a web-based portal for plant pathologists and diagnosticians to view the genome and trancriptome sequence status of 806 bacterial, fungal, oomycete, nematode, viral and viroid plant pathogens. Tools are available to search and analyze annotated genome sequences of 74 bacterial, fungal and oomycete pathogens. Oomycete and fungal genomes are obtained directly from GenBank, whereas bacterial genome sequences are downloaded from the A Systematic Annotation Package (ASAP) database that provides curation of genomes using comparative approaches. Curated lists of bacterial genes relevant to pathogenicity and avirulence are also provided. The Plant Pathogen Transcript Assemblies Database provides annotated assemblies of the transcribed regions of 82 eukaryotic genomes from publicly available single pass Expressed Sequence Tags. Data-mining tools are provided along with tools to create candidate diagnostic markers, an emerging use for genomic sequence data in plant pathology. The Plant Pathogen Ribosomal DNA (rDNA) database is a resource for pathogens that lack genome or transcriptome data sets and contains 131 755 rDNA sequences from GenBank for 17 613 species identified as plant pathogens and related genera. Database URL: http://cpgr.plantbiology.msu.edu.
Missing data and technical variability in single-cell RNA-sequencing experiments.
Hicks, Stephanie C; Townes, F William; Teng, Mingxiang; Irizarry, Rafael A
2017-11-06
Until recently, high-throughput gene expression technology, such as RNA-Sequencing (RNA-seq) required hundreds of thousands of cells to produce reliable measurements. Recent technical advances permit genome-wide gene expression measurement at the single-cell level. Single-cell RNA-Seq (scRNA-seq) is the most widely used and numerous publications are based on data produced with this technology. However, RNA-seq and scRNA-seq data are markedly different. In particular, unlike RNA-seq, the majority of reported expression levels in scRNA-seq are zeros, which could be either biologically-driven, genes not expressing RNA at the time of measurement, or technically-driven, genes expressing RNA, but not at a sufficient level to be detected by sequencing technology. Another difference is that the proportion of genes reporting the expression level to be zero varies substantially across single cells compared to RNA-seq samples. However, it remains unclear to what extent this cell-to-cell variation is being driven by technical rather than biological variation. Furthermore, while systematic errors, including batch effects, have been widely reported as a major challenge in high-throughput technologies, these issues have received minimal attention in published studies based on scRNA-seq technology. Here, we use an assessment experiment to examine data from published studies and demonstrate that systematic errors can explain a substantial percentage of observed cell-to-cell expression variability. Specifically, we present evidence that some of these reported zeros are driven by technical variation by demonstrating that scRNA-seq produces more zeros than expected and that this bias is greater for lower expressed genes. In addition, this missing data problem is exacerbated by the fact that this technical variation varies cell-to-cell. Then, we show how this technical cell-to-cell variability can be confused with novel biological results. Finally, we demonstrate and discuss how batch-effects and confounded experiments can intensify the problem. © The Author 2017. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Searching for resistance genes to Bursaphelenchus xylophilus using high throughput screening.
Santos, Carla S; Pinheiro, Miguel; Silva, Ana I; Egas, Conceição; Vasconcelos, Marta W
2012-11-07
Pine wilt disease (PWD), caused by the pinewood nematode (PWN; Bursaphelenchus xylophilus), damages and kills pine trees and is causing serious economic damage worldwide. Although the ecological mechanism of infestation is well described, the plant's molecular response to the pathogen is not well known. This is due mainly to the lack of genomic information and the complexity of the disease. High throughput sequencing is now an efficient approach for detecting the expression of genes in non-model organisms, thus providing valuable information in spite of the lack of the genome sequence. In an attempt to unravel genes potentially involved in the pine defense against the pathogen, we hereby report the high throughput comparative sequence analysis of infested and non-infested stems of Pinus pinaster (very susceptible to PWN) and Pinus pinea (less susceptible to PWN). Four cDNA libraries from infested and non-infested stems of P. pinaster and P. pinea were sequenced in a full 454 GS FLX run, producing a total of 2,083,698 reads. The putative amino acid sequences encoded by the assembled transcripts were annotated according to Gene Ontology, to assign Pinus contigs into Biological Processes, Cellular Components and Molecular Functions categories. Most of the annotated transcripts corresponded to Picea genes-25.4-39.7%, whereas a smaller percentage, matched Pinus genes, 1.8-12.8%, probably a consequence of more public genomic information available for Picea than for Pinus. The comparative transcriptome analysis showed that when P. pinaster was infested with PWN, the genes malate dehydrogenase, ABA, water deficit stress related genes and PAR1 were highly expressed, while in PWN-infested P. pinea, the highly expressed genes were ricin B-related lectin, and genes belonging to the SNARE and high mobility group families. Quantitative PCR experiments confirmed the differential gene expression between the two pine species. Defense-related genes triggered by nematode infestation were detected in both P. pinaster and P. pinea transcriptomes utilizing 454 pyrosequencing technology. P. pinaster showed higher abundance of genes related to transcriptional regulation, terpenoid secondary metabolism (including some with nematicidal activity) and pathogen attack. P. pinea showed higher abundance of genes related to oxidative stress and higher levels of expression in general of stress responsive genes. This study provides essential information about the molecular defense mechanisms utilized by P. pinaster and P. pinea against PWN infestation and contributes to a better understanding of PWD.
Transcriptome Profile Analysis from Different Sex Types of Ginkgo biloba L.
Du, Shuhui; Sang, Yalin; Liu, Xiaojing; Xing, Shiyan; Li, Jihong; Tang, Haixia; Sun, Limin
2016-01-01
In plants, sex determination is a comprehensive process of correlated events, which involves genes that are differentially and/or specifically expressed in distinct developmental phases. Exploring gene expression profiles from different sex types will contribute to fully understanding sex determination in plants. In this study, we conducted RNA-sequencing of female and male buds (FB and MB) as well as ovulate strobilus and staminate strobilus (OS and SS) of Ginkgo biloba to gain insights into the genes potentially related to sex determination in this species. Approximately 60 Gb of clean reads were obtained from eight cDNA libraries. De novo assembly of the clean reads generated 108,307 unigenes with an average length of 796 bp. Among these unigenes, 51,953 (47.97%) had at least one significant match with a gene sequence in the public databases searched. A total of 4709 and 9802 differentially expressed genes (DEGs) were identified in MB vs. FB and SS vs. OS, respectively. Genes involved in plant hormone signal and transduction as well as those encoding DNA methyltransferase were found to be differentially expressed between different sex types. Their potential roles in sex determination of G. biloba were discussed. Pistil-related genes were expressed in male buds while anther-specific genes were identified in female buds, suggesting that dioecism in G. biloba was resulted from the selective arrest of reproductive primordia. High correlation of expression level was found between the RNA-Seq and quantitative real-time PCR results. The transcriptome resources that we generated allowed us to characterize gene expression profiles and examine differential expression profiles, which provided foundations for identifying functional genes associated with sex determination in G. biloba.
Hayward, David C.; Hetherington, Suzannah; Behm, Carolyn A.; Grasso, Lauretta C.; Forêt, Sylvain; Miller, David J.; Ball, Eldon E.
2011-01-01
Background A successful metamorphosis from a planktonic larva to a settled polyp, which under favorable conditions will establish a future colony, is critical for the survival of corals. However, in contrast to the situation in other animals, e.g., frogs and insects, little is known about the molecular basis of coral metamorphosis. We have begun to redress this situation with previous microarray studies, but there is still a great deal to learn. In the present paper we have utilized a different technology, subtractive hybridization, to characterize genes differentially expressed across this developmental transition and to compare the success of this method to microarray. Methodology/Principal Findings Suppressive subtractive hybridization (SSH) was used to identify two pools of transcripts from the coral, Acropora millepora. One is enriched for transcripts expressed at higher levels at the pre-settlement stage, and the other for transcripts expressed at higher levels at the post-settlement stage. Virtual northern blots were used to demonstrate the efficacy of the subtractive hybridization technique. Both pools contain transcripts coding for proteins in various functional classes but transcriptional regulatory proteins were represented more frequently in the post-settlement pool. Approximately 18% of the transcripts showed no significant similarity to any other sequence on the public databases. Transcripts of particular interest were further characterized by in situ hybridization, which showed that many are regulated spatially as well as temporally. Notably, many transcripts exhibit axially restricted expression patterns that correlate with the pool from which they were isolated. Several transcripts are expressed in patterns consistent with a role in calcification. Conclusions We have characterized over 200 transcripts that are differentially expressed between the planula larva and post-settlement polyp of the coral, Acropora millepora. Sequence, putative function, and in some cases temporal and spatial expression are reported. PMID:22065994
Transcriptome Profile Analysis from Different Sex Types of Ginkgo biloba L.
Du, Shuhui; Sang, Yalin; Liu, Xiaojing; Xing, Shiyan; Li, Jihong; Tang, Haixia; Sun, Limin
2016-01-01
In plants, sex determination is a comprehensive process of correlated events, which involves genes that are differentially and/or specifically expressed in distinct developmental phases. Exploring gene expression profiles from different sex types will contribute to fully understanding sex determination in plants. In this study, we conducted RNA-sequencing of female and male buds (FB and MB) as well as ovulate strobilus and staminate strobilus (OS and SS) of Ginkgo biloba to gain insights into the genes potentially related to sex determination in this species. Approximately 60 Gb of clean reads were obtained from eight cDNA libraries. De novo assembly of the clean reads generated 108,307 unigenes with an average length of 796 bp. Among these unigenes, 51,953 (47.97%) had at least one significant match with a gene sequence in the public databases searched. A total of 4709 and 9802 differentially expressed genes (DEGs) were identified in MB vs. FB and SS vs. OS, respectively. Genes involved in plant hormone signal and transduction as well as those encoding DNA methyltransferase were found to be differentially expressed between different sex types. Their potential roles in sex determination of G. biloba were discussed. Pistil-related genes were expressed in male buds while anther-specific genes were identified in female buds, suggesting that dioecism in G. biloba was resulted from the selective arrest of reproductive primordia. High correlation of expression level was found between the RNA-Seq and quantitative real-time PCR results. The transcriptome resources that we generated allowed us to characterize gene expression profiles and examine differential expression profiles, which provided foundations for identifying functional genes associated with sex determination in G. biloba. PMID:27379148
Zhang, Hanyuan; Vieira Resende E Silva, Bruno; Cui, Juan
2018-05-01
Small RNA sequencing is the most widely used tool for microRNA (miRNA) discovery, and shows great potential for the efficient study of miRNA cross-species transport, i.e., by detecting the presence of exogenous miRNA sequences in the host species. Because of the increased appreciation of dietary miRNAs and their far-reaching implication in human health, research interests are currently growing with regard to exogenous miRNAs bioavailability, mechanisms of cross-species transport and miRNA function in cellular biological processes. In this article, we present microRNA Discovery (miRDis), a new small RNA sequencing data analysis pipeline for both endogenous and exogenous miRNA detection. Specifically, we developed and deployed a Web service that supports the annotation and expression profiling data of known host miRNAs and the detection of novel miRNAs, other noncoding RNAs, and the exogenous miRNAs from dietary species. As a proof-of-concept, we analyzed a set of human plasma sequencing data from a milk-feeding study where 225 human miRNAs were detected in the plasma samples and 44 show elevated expression after milk intake. By examining the bovine-specific sequences, data indicate that three bovine miRNAs (bta-miR-378, -181* and -150) are present in human plasma possibly because of the dietary uptake. Further evaluation based on different sets of public data demonstrates that miRDis outperforms other state-of-the-art tools in both detection and quantification of miRNA from either animal or plant sources. The miRDis Web server is available at: http://sbbi.unl.edu/miRDis/index.php.
Liu, Dan; Wang, Qianqian; Ruan, Zengliang; He, Qian; Zhang, Liming
2015-01-01
Background Jellyfish contain diverse toxins and other bioactive components. However, large-scale identification of novel toxins and bioactive components from jellyfish has been hampered by the low efficiency of traditional isolation and purification methods. Results We performed de novo transcriptome sequencing of the tentacle tissue of the jellyfish Cyanea capillata. A total of 51,304,108 reads were obtained and assembled into 50,536 unigenes. Of these, 21,357 unigenes had homologues in public databases, but the remaining unigenes had no significant matches due to the limited sequence information available and species-specific novel sequences. Functional annotation of the unigenes also revealed general gene expression profile characteristics in the tentacle of C. capillata. A primary goal of this study was to identify putative toxin transcripts. As expected, we screened many transcripts encoding proteins similar to several well-known toxin families including phospholipases, metalloproteases, serine proteases and serine protease inhibitors. In addition, some transcripts also resembled molecules with potential toxic activities, including cnidarian CfTX-like toxins with hemolytic activity, plancitoxin-1, venom toxin-like peptide-6, histamine-releasing factor, neprilysin, dipeptidyl peptidase 4, vascular endothelial growth factor A, angiotensin-converting enzyme-like and endothelin-converting enzyme 1-like proteins. Most of these molecules have not been previously reported in jellyfish. Interestingly, we also characterized a number of transcripts with similarities to proteins relevant to several degenerative diseases, including Huntington’s, Alzheimer’s and Parkinson’s diseases. This is the first description of degenerative disease-associated genes in jellyfish. Conclusion We obtained a well-categorized and annotated transcriptome of C. capillata tentacle that will be an important and valuable resource for further understanding of jellyfish at the molecular level and information on the underlying molecular mechanisms of jellyfish stinging. The findings of this study may also be used in comparative studies of gene expression profiling among different jellyfish species. PMID:26551022
Liu, Guoyan; Zhou, Yonghong; Liu, Dan; Wang, Qianqian; Ruan, Zengliang; He, Qian; Zhang, Liming
2015-01-01
Jellyfish contain diverse toxins and other bioactive components. However, large-scale identification of novel toxins and bioactive components from jellyfish has been hampered by the low efficiency of traditional isolation and purification methods. We performed de novo transcriptome sequencing of the tentacle tissue of the jellyfish Cyanea capillata. A total of 51,304,108 reads were obtained and assembled into 50,536 unigenes. Of these, 21,357 unigenes had homologues in public databases, but the remaining unigenes had no significant matches due to the limited sequence information available and species-specific novel sequences. Functional annotation of the unigenes also revealed general gene expression profile characteristics in the tentacle of C. capillata. A primary goal of this study was to identify putative toxin transcripts. As expected, we screened many transcripts encoding proteins similar to several well-known toxin families including phospholipases, metalloproteases, serine proteases and serine protease inhibitors. In addition, some transcripts also resembled molecules with potential toxic activities, including cnidarian CfTX-like toxins with hemolytic activity, plancitoxin-1, venom toxin-like peptide-6, histamine-releasing factor, neprilysin, dipeptidyl peptidase 4, vascular endothelial growth factor A, angiotensin-converting enzyme-like and endothelin-converting enzyme 1-like proteins. Most of these molecules have not been previously reported in jellyfish. Interestingly, we also characterized a number of transcripts with similarities to proteins relevant to several degenerative diseases, including Huntington's, Alzheimer's and Parkinson's diseases. This is the first description of degenerative disease-associated genes in jellyfish. We obtained a well-categorized and annotated transcriptome of C. capillata tentacle that will be an important and valuable resource for further understanding of jellyfish at the molecular level and information on the underlying molecular mechanisms of jellyfish stinging. The findings of this study may also be used in comparative studies of gene expression profiling among different jellyfish species.
2017-01-01
Mapping gene expression as a quantitative trait using whole genome-sequencing and transcriptome analysis allows to discover the functional consequences of genetic variation. We developed a novel method and ultra-fast software Findr for higly accurate causal inference between gene expression traits using cis-regulatory DNA variations as causal anchors, which improves current methods by taking into consideration hidden confounders and weak regulations. Findr outperformed existing methods on the DREAM5 Systems Genetics challenge and on the prediction of microRNA and transcription factor targets in human lymphoblastoid cells, while being nearly a million times faster. Findr is publicly available at https://github.com/lingfeiwang/findr. PMID:28821014
Lindsay, Cameron; Seikaly, Hadi; Biron, Vincent L
2017-01-31
Epigenetic modifications are heritable changes in gene expression that do not directly alter DNA sequence. These modifications include DNA methylation, histone post-translational modifications, small and non-coding RNAs. Alterations in epigenetic profiles cause deregulation of fundamental gene expression pathways associated with carcinogenesis. The role of epigenetics in oropharyngeal squamous cell carcinoma (OPSCC) has recently been recognized, with implications for novel biomarkers, molecular diagnostics and chemotherapeutics. In this review, important epigenetic pathways in human papillomavirus (HPV) positive and negative OPSCC are summarized, as well as the potential clinical utility of this knowledge.This material has never been published and is not currently under evaluation in any other peer-reviewed publication.
Overview: The Impact of Microbial Genomics on Food Safety
NASA Astrophysics Data System (ADS)
Milillo, Sara R.; Wiedmann, Martin; Hoelzer, Karin
The first use of the term "genome" is attributed to Hans Winkler in his 1920 publication Verbeitung und Ursache der Parthenogenesis im Pflanzen und Tierreiche (Winkler, 1920). However, it was not until 1986 that the study of genomic concepts coalesced with the creation of a new journal by the same name (McKusick, 1997). The study of genomics was initially defined as the use or the application of "informatic tools" to study features of a sequenced genome (Strauss and Falkow, 1997). Today the field of genomics is typically considered to encompass efforts to determine the nucleic acid DNA sequence of an organism as well as the expression of genetic information using high-throughput, genome-wide methods, including transcriptomic, proteomic, and metabolomic analyses.
Guo, D; Li, H L; Tang, X; Peng, S Q
2014-12-18
In plants, homeodomain proteins play a critical role in regulating various aspects of plant growth and development. KNOX proteins are members of the homeodomain protein family. The KNOX transcription factors have been reported from Arabidopsis, rice, and other higher plants. The recent publication of the draft genome sequence of cassava (Manihot esculenta Krantz) has allowed a genome-wide search for M. esculenta KNOX (MeKNOX) transcription factors and the comparison of these positively identified proteins with their homologs in model plants. In the present study, we identified 12 MeKNOX genes in the cassava genome and grouped them into two distinct subfamilies based on their domain composition and phylogenetic analysis. Furthermore, semi-quantitative reverse transcription polymerase chain reaction analysis was performed to elucidate the expression profiles of these genes in different tissues and during various stages of root development. The analysis of MeKNOX expression profiles of indicated that 12 MeKNOX genes display differential expressions either in their transcript abundance or expression patterns.
BioVLAB-MMIA-NGS: microRNA-mRNA integrated analysis using high-throughput sequencing data.
Chae, Heejoon; Rhee, Sungmin; Nephew, Kenneth P; Kim, Sun
2015-01-15
It is now well established that microRNAs (miRNAs) play a critical role in regulating gene expression in a sequence-specific manner, and genome-wide efforts are underway to predict known and novel miRNA targets. However, the integrated miRNA-mRNA analysis remains a major computational challenge, requiring powerful informatics systems and bioinformatics expertise. The objective of this study was to modify our widely recognized Web server for the integrated mRNA-miRNA analysis (MMIA) and its subsequent deployment on the Amazon cloud (BioVLAB-MMIA) to be compatible with high-throughput platforms, including next-generation sequencing (NGS) data (e.g. RNA-seq). We developed a new version called the BioVLAB-MMIA-NGS, deployed on both Amazon cloud and on a high-performance publicly available server called MAHA. By using NGS data and integrating various bioinformatics tools and databases, BioVLAB-MMIA-NGS offers several advantages. First, sequencing data is more accurate than array-based methods for determining miRNA expression levels. Second, potential novel miRNAs can be detected by using various computational methods for characterizing miRNAs. Third, because miRNA-mediated gene regulation is due to hybridization of an miRNA to its target mRNA, sequencing data can be used to identify many-to-many relationship between miRNAs and target genes with high accuracy. http://epigenomics.snu.ac.kr/biovlab_mmia_ngs/. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Pereiro, Patricia; Balseiro, Pablo; Romero, Alejandro; Dios, Sonia; Forn-Cuni, Gabriel; Fuste, Berta; Planas, Josep V.; Beltran, Sergi; Novoa, Beatriz; Figueras, Antonio
2012-01-01
Background Turbot (Scophthalmus maximus L.) is an important aquacultural resource both in Europe and Asia. However, there is little information on gene sequences available in public databases. Currently, one of the main problems affecting the culture of this flatfish is mortality due to several pathogens, especially viral diseases which are not treatable. In order to identify new genes involved in immune defense, we conducted 454-pyrosequencing of the turbot transcriptome after different immune stimulations. Methodology/Principal Findings Turbot were injected with viral stimuli to increase the expression level of immune-related genes. High-throughput deep sequencing using 454-pyrosequencing technology yielded 915,256 high-quality reads. These sequences were assembled into 55,404 contigs that were subjected to annotation steps. Intriguingly, 55.16% of the deduced protein was not significantly similar to any sequences in the databases used for the annotation and only 0.85% of the BLASTx top-hits matched S. maximus protein sequences. This relatively low level of annotation is possibly due to the limited information for this specie and other flatfish in the database. These results suggest the identification of a large number of new genes in turbot and in fish in general. A more detailed analysis showed the presence of putative members of several innate and specific immune pathways. Conclusions/Significance To our knowledge, this study is the first transcriptome analysis using 454-pyrosequencing for turbot. Previously, there were only 12,471 EST and less of 1,500 nucleotide sequences for S. maximus in NCBI database. Our results provide a rich source of data (55,404 contigs and 181,845 singletons) for discovering and identifying new genes, which will serve as a basis for microarray construction, gene expression characterization and for identification of genetic markers to be used in several applications. Immune stimulation in turbot was very effective, obtaining an enormous variety of sequences belonging to genes involved in the defense mechanisms. PMID:22629298
Rokyta, Darin R; Wray, Kenneth P; Lemmon, Alan R; Lemmon, Emily Moriarty; Caudle, S Brian
2011-04-01
Despite causing considerable human mortality and morbidity, animal toxins represent a valuable source of pharmacologically active macromolecules, a unique system for studying molecular adaptation, and a powerful framework for examining structure-function relationships in proteins. Snake venoms are particularly useful in the latter regard as they consist primarily of a moderate number of proteins and peptides that have been found to belong to just a handful of protein families. As these proteins and peptides are produced in dedicated glands, transcriptome sequencing has proven to be an effective approach to identifying the expressed toxin genes. We generated a venom-gland transcriptome for the Eastern Diamondback Rattlesnake (Crotalus adamanteus) using Roche 454 sequencing technology. In the current work, we focus on transcripts encoding toxins. We identified 40 unique toxin transcripts, 30 of which have full-length coding sequences, and 10 have only partial coding sequences. These toxins account for 24% of the total sequencing reads. We found toxins from 11 previously described families of snake-venom toxins and have discovered two putative, previously undescribed toxin classes. The most diverse and highly expressed toxin classes in the C. adamanteus venom-gland transcriptome are the serine proteinases, metalloproteinases, and C-type lectins. The serine proteinases are the most abundant class, accounting for 35% of the toxin sequencing reads. Metalloproteinases are the most diverse; 11 different forms have been identified. Using our sequences and those available in public databases, we detected positive selection in seven of the eight toxin families for which sufficient sequences were available for the analysis. We find that the vast majority of the genes that contribute directly to this vertebrate trait show evidence for a role for positive selection in their evolutionary history. Copyright © 2011 Elsevier Ltd. All rights reserved.
Coordinate cytokine regulatory sequences
Frazer, Kelly A.; Rubin, Edward M.; Loots, Gabriela G.
2005-05-10
The present invention provides CNS sequences that regulate the cytokine gene expression, expression cassettes and vectors comprising or lacking the CNS sequences, host cells and non-human transgenic animals comprising the CNS sequences or lacking the CNS sequences. The present invention also provides methods for identifying compounds that modulate the functions of CNS sequences as well as methods for diagnosing defects in the CNS sequences of patients.
PSI:Biology-Materials Repository: A Biologist’s Resource for Protein Expression Plasmids
Cormier, Catherine Y.; Park, Jin G.; Fiacco, Michael; Steel, Jason; Hunter, Preston; Kramer, Jason; Singla, Rajeev; LaBaer, Joshua
2011-01-01
The Protein Structure Initiative:Biology-Materials Repository (PSI:Biology-MR; MR; http://psimr.asu.edu) sequence-verifies, annotates, stores, and distributes the protein expression plasmids and vectors created by the Protein Structure Initiative (PSI). The MR has developed an informatics and sample processing pipeline that manages this process for thousands of samples per month from nearly a dozen PSI centers. DNASU (http://dnasu.asu.edu), a freely searchable database, stores the plasmid annotations, which include the full-length sequence, vector information, and associated publications for over 130,000 plasmids created by our laboratory, by the PSI and other consortia, and by individual laboratories for distribution to researchers worldwide. Each plasmid links to external resources, including the PSI Structural Biology Knowledgebase (http://sbkb.org), which facilitates cross-referencing of a particular plasmid to additional protein annotations and experimental data. To expedite and simplify plasmid requests, the MR uses an expedited material transfer agreement (EP-MTA) network, where researchers from network institutions can order and receive PSI plasmids without institutional delays. Currently over 39,000 protein expression plasmids and 78 empty vectors from the PSI are available upon request from DNASU. Overall, the MR’s repository of expression-ready plasmids, its automated pipeline, and the rapid process for receiving and distributing these plasmids more effectively allows the research community to dissect the biological function of proteins whose structures have been studied by the PSI. PMID:21360289
Expression of a non-coding RNA in ectromelia virus is required for normal plaque formation.
Esteban, David J; Upton, Chris; Bartow-McKenney, Casey; Buller, R Mark L; Chen, Nanhai G; Schriewer, Jill; Lefkowitz, Elliot J; Wang, Chunlin
2014-02-01
Poxviruses are dsDNA viruses with large genomes. Many genes in the genome remain uncharacterized, and recent studies have demonstrated that the poxvirus transcriptome includes numerous so-called anomalous transcripts not associated with open reading frames. Here, we characterize the expression and role of an apparently non-coding RNA in orthopoxviruses, which we call viral hairpin RNA (vhRNA). Using a bioinformatics approach, we predicted expression of a transcript not associated with an open reading frame that is likely to form a stem-loop structure due to the presence of a 21 nt palindromic sequence. Expression of the transcript as early as 2 h post-infection was confirmed by northern blot and analysis of publicly available vaccinia virus infected cell transcriptomes. The transcription start site was determined by RACE PCE and transcriptome analysis, and early and late promoter sequences were identified. Finally, to test the function of the transcript we generated an ectromelia virus knockout, which failed to form plaques in cell culture. The important role of the transcript in viral replication was further demonstrated using siRNA. Although the function of the transcript remains unknown, our work contributes to evidence of an increasingly complex poxvirus transcriptome, suggesting that transcripts such as vhRNA not associated with an annotated open reading frame can play an important role in viral replication.
Zhang, Zongying; Jiang, Shenghui; Wang, Nan; Li, Min; Ji, Xiaohao; Sun, Shasha; Liu, Jingxuan; Wang, Deyun; Xu, Haifeng; Qi, Sumin; Wu, Shujing; Fei, Zhangjun; Feng, Shouqian; Chen, Xuesen
2015-01-01
Apple is one of the most economically important horticultural fruit crops worldwide. It is critical to gain insights into fruit ripening and softening to improve apple fruit quality and extend shelf life. In this study, forward and reverse suppression subtractive hybridization libraries were generated from ‘Taishanzaoxia’ apple fruits sampled around the ethylene climacteric to isolate ripening- and softening-related genes. A set of 648 unigenes were derived from sequence alignment and cluster assembly of 918 expressed sequence tags. According to gene ontology functional classification, 390 out of 443 unigenes (88%) were assigned to the biological process category, 356 unigenes (80%) were classified in the molecular function category, and 381 unigenes (86%) were allocated to the cellular component category. A total of 26 unigenes differentially expressed during fruit development period were analyzed by quantitative RT-PCR. These genes were involved in cell wall modification, anthocyanin biosynthesis, aroma production, stress response, metabolism, transcription, or were non-annotated. Some genes associated with cell wall modification, anthocyanin biosynthesis and aroma production were up-regulated and significantly correlated with ethylene production, suggesting that fruit texture, coloration and aroma may be regulated by ethylene in ‘Taishanzaoxia’. Some of the identified unigenes associated with fruit ripening and softening have not been characterized in public databases. The results contribute to an improved characterization of changes in gene expression during apple fruit ripening and softening. PMID:26719904
Schmouth, Jean-François; Castellarin, Mauro; Laprise, Stéphanie; Banks, Kathleen G; Bonaguro, Russell J; McInerny, Simone C; Borretta, Lisa; Amirabbasi, Mahsa; Korecki, Andrea J; Portales-Casamar, Elodie; Wilson, Gary; Dreolini, Lisa; Jones, Steven J M; Wasserman, Wyeth W; Goldowitz, Daniel; Holt, Robert A; Simpson, Elizabeth M
2013-10-14
The next big challenge in human genetics is understanding the 98% of the genome that comprises non-coding DNA. Hidden in this DNA are sequences critical for gene regulation, and new experimental strategies are needed to understand the functional role of gene-regulation sequences in health and disease. In this study, we build upon our HuGX ('high-throughput human genes on the X chromosome') strategy to expand our understanding of human gene regulation in vivo. In all, ten human genes known to express in therapeutically important brain regions were chosen for study. For eight of these genes, human bacterial artificial chromosome clones were identified, retrofitted with a reporter, knocked single-copy into the Hprt locus in mouse embryonic stem cells, and mouse strains derived. Five of these human genes expressed in mouse, and all expressed in the adult brain region for which they were chosen. This defined the boundaries of the genomic DNA sufficient for brain expression, and refined our knowledge regarding the complexity of gene regulation. We also characterized for the first time the expression of human MAOA and NR2F2, two genes for which the mouse homologs have been extensively studied in the central nervous system (CNS), and AMOTL1 and NOV, for which roles in CNS have been unclear. We have demonstrated the use of the HuGX strategy to functionally delineate non-coding-regulatory regions of therapeutically important human brain genes. Our results also show that a careful investigation, using publicly available resources and bioinformatics, can lead to accurate predictions of gene expression.
Singhal, Dinesh K; Singhal, Raxita; Malik, Hruda N; Kumar, Surender; Kumar, Sudarshan; Mohanty, Ashok K; Kaushik, Jai K; Malakar, Dhruba
2014-01-01
Nanog is a homeodomain containing protein which plays important roles in regulation of signaling pathways for maintenance and induction of pluripotency in stem cells. Because of its unique expression in stem cells it is also regarded as pluripotency marker. In this study goat Nanog (gNanog) gene has been amplified, cloned and characterized at sequence level with successful over-expression in CHO-K1 cell line using a lentiviral based system. gNanog ORF is 903 bp long which codes for Nanog protein of size 300 amino acids (aas). Complete nucleotide sequence shows some evolutionary mutation in goat in comparision to other species. Protein sequence of goat is highly similar to other species. Overall, gNanog nucleotide sequence and predicted protein sequence showed high similarity and minimum divergence with cattle (96 % identity/4 % divergence) and buffalo (94/5 %) while low similarity and high divergence with pig (84/15 %), human (81/23 %) and mouse (69/40 %) indicating evolutionary closeness of gNanog to cattle and buffalo. gNanog lentiviral expression construct was prepared for over-expression of Nanog gene in adult goat fibroblast cells. Lentiviral expression construct of Nanog enabled continuous protein expression for induction and maintenance of pluripotency. Western blotting revealed the expression of Nanog gene at protein level which supported that the lentiviral expression system is highly promising for Nanog protein expression in differentiated goat cell.
Kameishi, Sumako; Umemoto, Terumasa; Matsuzaki, Yu; Fujita, Masako; Okano, Teruo; Kato, Takashi; Yamato, Masayuki
2016-05-06
Corneal epithelial stem cells reside in the limbus, a transitional zone between the cornea and conjunctiva, and are essential for maintaining homeostasis in the corneal epithelium. Although our previous studies demonstrated that rabbit limbal epithelial side population (SP) cells exhibit stem cell-like phenotypes with Hoechst 33342 staining, the different characteristics and/or populations of these cells remain unclear. Therefore, in this study, we determined the gene expression profiles of limbal epithelial SP cells by RNA sequencing using not only present public databases but also contigs that were created by de novo transcriptome assembly as references for mapping. Our transcriptome data indicated that limbal epithelial SP cells exhibited a stem cell-like phenotype compared with non-SP cells. Importantly, gene ontology analysis following RNA sequencing demonstrated that limbal epithelial SP cells exhibited significantly enhanced expression of mesenchymal/endothelial cell markers rather than epithelial cell markers. Furthermore, single-cell quantitative real-time reverse transcription polymerase chain reaction (qRT-PCR) demonstrated that the limbal epithelial SP population consisted of at least two immature cell populations with endothelial- or mesenchymal-like phenotypes. Therefore, our present results may propose the presence of a novel population of corneal epithelial stem cells distinct from conventional epithelial stem cells. Copyright © 2015 Elsevier Inc. All rights reserved.
Massa, Sónia I; Pearson, Gareth A; Aires, Tânia; Kube, Michael; Olsen, Jeanine L; Reinhardt, Richard; Serrão, Ester A; Arnaud-Haond, Sophie
2011-09-01
Predicted global climate change threatens the distributional ranges of species worldwide. We identified genes expressed in the intertidal seagrass Zostera noltii during recovery from a simulated low tide heat-shock exposure. Five Expressed Sequence Tag (EST) libraries were compared, corresponding to four recovery times following sub-lethal temperature stress, and a non-stressed control. We sequenced and analyzed 7009 sequence reads from 30min, 2h, 4h and 24h after the beginning of the heat-shock (AHS), and 1585 from the control library, for a total of 8594 sequence reads. Among 51 Tentative UniGenes (TUGs) exhibiting significantly different expression between libraries, 19 (37.3%) were identified as 'molecular chaperones' and were over-expressed following heat-shock, while 12 (23.5%) were 'photosynthesis TUGs' generally under-expressed in heat-shocked plants. A time course analysis of expression showed a rapid increase in expression of the molecular chaperone class, most of which were heat-shock proteins; which increased from 2 sequence reads in the control library to almost 230 in the 30min AHS library, followed by a slow decrease during further recovery. In contrast, 'photosynthesis TUGs' were under-expressed 30min AHS compared with the control library, and declined progressively with recovery time in the stress libraries, with a total of 29 sequence reads 24h AHS, compared with 125 in the control. A total of 4734 TUGs were screened for EST-Single Sequence Repeats (EST-SSRs) and 86 microsatellites were identified. Copyright © 2011 Elsevier B.V. All rights reserved.
FANTOM5 CAGE profiles of human and mouse reprocessed for GRCh38 and GRCm38 genome assemblies.
Abugessaisa, Imad; Noguchi, Shuhei; Hasegawa, Akira; Harshbarger, Jayson; Kondo, Atsushi; Lizio, Marina; Severin, Jessica; Carninci, Piero; Kawaji, Hideya; Kasukawa, Takeya
2017-08-29
The FANTOM5 consortium described the promoter-level expression atlas of human and mouse by using CAGE (Cap Analysis of Gene Expression) with single molecule sequencing. In the original publications, GRCh37/hg19 and NCBI37/mm9 assemblies were used as the reference genomes of human and mouse respectively; later, the Genome Reference Consortium released newer genome assemblies GRCh38/hg38 and GRCm38/mm10. To increase the utility of the atlas in forthcoming researches, we reprocessed the data to make them available on the recent genome assemblies. The data include observed frequencies of transcription starting sites (TSSs) based on the realignment of CAGE reads, and TSS peaks that are converted from those based on the previous reference. Annotations of the peak names were also updated based on the latest public databases. The reprocessed results enable us to examine frequencies of transcription initiations on the recent genome assemblies and to refer promoters with updated information across the genome assemblies consistently.
SBR-Blood: systems biology repository for hematopoietic cells.
Lichtenberg, Jens; Heuston, Elisabeth F; Mishra, Tejaswini; Keller, Cheryl A; Hardison, Ross C; Bodine, David M
2016-01-04
Extensive research into hematopoiesis (the development of blood cells) over several decades has generated large sets of expression and epigenetic profiles in multiple human and mouse blood cell types. However, there is no single location to analyze how gene regulatory processes lead to different mature blood cells. We have developed a new database framework called hematopoietic Systems Biology Repository (SBR-Blood), available online at http://sbrblood.nhgri.nih.gov, which allows user-initiated analyses for cell type correlations or gene-specific behavior during differentiation using publicly available datasets for array- and sequencing-based platforms from mouse hematopoietic cells. SBR-Blood organizes information by both cell identity and by hematopoietic lineage. The validity and usability of SBR-Blood has been established through the reproduction of workflows relevant to expression data, DNA methylation, histone modifications and transcription factor occupancy profiles. Published by Oxford University Press on behalf of Nucleic Acids Research 2015. This work is written by (a) US Government employee(s) and is in the public domain in the US.
Tartar, Aurélien; Wheeler, Marsha M; Zhou, Xuguo; Coy, Monique R; Boucias, Drion G; Scharf, Michael E
2009-01-01
Background Termite lignocellulose digestion is achieved through a collaboration of host plus prokaryotic and eukaryotic symbionts. In the present work, we took a combined host and symbiont metatranscriptomic approach for investigating the digestive contributions of host and symbiont in the lower termite Reticulitermes flavipes. Our approach consisted of parallel high-throughput sequencing from (i) a host gut cDNA library and (ii) a hindgut symbiont cDNA library. Subsequently, we undertook functional analyses of newly identified phenoloxidases with potential importance as pretreatment enzymes in industrial lignocellulose processing. Results Over 10,000 expressed sequence tags (ESTs) were sequenced from the 2 libraries that aligned into 6,555 putative transcripts, including 171 putative lignocellulase genes. Sequence analyses provided insights in two areas. First, a non-overlapping complement of host and symbiont (prokaryotic plus protist) glycohydrolase gene families known to participate in cellulose, hemicellulose, alpha carbohydrate, and chitin degradation were identified. Of these, cellulases are contributed by host plus symbiont genomes, whereas hemicellulases are contributed exclusively by symbiont genomes. Second, a diverse complement of previously unknown genes that encode proteins with homology to lignase, antioxidant, and detoxification enzymes were identified exclusively from the host library (laccase, catalase, peroxidase, superoxide dismutase, carboxylesterase, cytochrome P450). Subsequently, functional analyses of phenoloxidase activity provided results that were strongly consistent with patterns of laccase gene expression. In particular, phenoloxidase activity and laccase gene expression are mostly restricted to symbiont-free foregut plus salivary gland tissues, and phenoloxidase activity is inducible by lignin feeding. Conclusion To our knowledge, this is the first time that a dual host-symbiont transcriptome sequencing effort has been conducted in a single termite species. This sequence database represents an important new genomic resource for use in further studies of collaborative host-symbiont termite digestion, as well as development of coevolved host and symbiont-derived biocatalysts for use in industrial biomass-to-bioethanol applications. Additionally, this study demonstrates that: (i) phenoloxidase activities are prominent in the R. flavipes gut and are not symbiont derived, (ii) expands the known number of host and symbiont glycosyl hydrolase families in Reticulitermes, and (iii) supports previous models of lignin degradation and host-symbiont collaboration in cellulose/hemicellulose digestion in the termite gut. All sequences in this paper are available publicly with the accession numbers FL634956-FL640828 (Termite Gut library) and FL641015-FL645753 (Symbiont library). PMID:19832970
Methods and compositions for regulating gene expression in plant cells
NASA Technical Reports Server (NTRS)
Dai, Shunhong (Inventor); Beachy, Roger N. (Inventor); Luis, Maria Isabel Ordiz (Inventor)
2010-01-01
Novel chimeric plant promoter sequences are provided, together with plant gene expression cassettes comprising such sequences. In certain preferred embodiments, the chimeric plant promoters comprise the BoxII cis element and/or derivatives thereof. In addition, novel transcription factors are provided, together with nucleic acid sequences encoding such transcription factors and plant gene expression cassettes comprising such nucleic acid sequences. In certain preferred embodiments, the novel transcription factors comprise the acidic domain, or fragments thereof, of the RF2a transcription factor. Methods for using the chimeric plant promoter sequences and novel transcription factors in regulating the expression of at least one gene of interest are provided, together with transgenic plants comprising such chimeric plant promoter sequences and novel transcription factors.
Sequences 5' to translation start regulate expression of petunia rbcS genes.
Dean, C; Favreau, M; Bedbrook, J; Dunsmuir, P
1989-01-01
The promoter sequences that contribute to quantitative differences in expression of the petunia genes (rbcS) encoding the small subunit of ribulose bisphosphate carboxylase have been characterized. The promoter regions of the two most abundantly expressed petunia rbcS genes, SSU301 and SSU611, show sequence similarity not present in other rbcS genes. We investigated the significance of these and other sequences by adding specific regions from the SSU301 promoter (the most strongly expressed gene) to equivalent regions in the SSU911 promoter (the least strongly expressed gene) and assaying the expression of the fusions in transgenic tobacco plants. In this way, we characterized an SSU301 promoter region (either from -285 to -178 or -291 to -204) which, when added to SSU911, in either orientation, increased SSU911 expression 25-fold. This increase was equivalent to that caused by addition of the entire SSU301 5'-flanking region. Replacement of SSU911 promoter sequences between -198 and the start codon with sequences from the equivalent region of SSU301 did not increase SSU911 expression significantly. The -291 to -204 SSU301 promoter fragment contributes significantly to quantitative differences in expression between the petunia rbcS genes. PMID:2535543
Sathyanarayana, N; Pittala, Ranjith Kumar; Tripathi, Pankaj Kumar; Chopra, Ratan; Singh, Heikham Russiachand; Belamkar, Vikas; Bhardwaj, Pardeep Kumar; Doyle, Jeff J; Egan, Ashley N
2017-05-25
The medicinal legume Mucuna pruriens (L.) DC. has attracted attention worldwide as a source of the anti-Parkinson's drug L-Dopa. It is also a popular green manure cover crop that offers many agronomic benefits including high protein content, nitrogen fixation and soil nutrients. The plant currently lacks genomic resources and there is limited knowledge on gene expression, metabolic pathways, and genetics of secondary metabolite production. Here, we present transcriptomic resources for M. pruriens, including a de novo transcriptome assembly and annotation, as well as differential transcript expression analyses between root, leaf, and pod tissues. We also develop microsatellite markers and analyze genetic diversity and population structure within a set of Indian germplasm accessions. One-hundred ninety-one million two hundred thirty-three thousand two hundred forty-two bp cleaned reads were assembled into 67,561 transcripts with mean length of 626 bp and N50 of 987 bp. Assembled sequences were annotated using BLASTX against public databases with over 80% of transcripts annotated. We identified 7,493 simple sequence repeat (SSR) motifs, including 787 polymorphic repeats between the parents of a mapping population. 134 SSRs from expressed sequenced tags (ESTs) were screened against 23 M. pruriens accessions from India, with 52 EST-SSRs retained after quality control. Population structure analysis using a Bayesian framework implemented in fastSTRUCTURE showed nearly similar groupings as with distance-based (neighbor-joining) and principal component analyses, with most of the accessions clustering per geographical origins. Pair-wise comparison of transcript expression in leaves, roots and pods identified 4,387 differentially expressed transcripts with the highest number occurring between roots and leaves. Differentially expressed transcripts were enriched with transcription factors and transcripts annotated as belonging to secondary metabolite pathways. The M. pruriens transcriptomic resources generated in this study provide foundational resources for gene discovery and development of molecular markers. Polymorphic SSRs identified can be used for genetic diversity, marker-trait analyses, and development of functional markers for crop improvement. The results of differential expression studies can be used to investigate genes involved in L-Dopa synthesis and other key metabolic pathways in M. pruriens.
A Graph-Centric Approach for Metagenome-Guided Peptide and Protein Identification in Metaproteomics
Tang, Haixu; Li, Sujun; Ye, Yuzhen
2016-01-01
Metaproteomic studies adopt the common bottom-up proteomics approach to investigate the protein composition and the dynamics of protein expression in microbial communities. When matched metagenomic and/or metatranscriptomic data of the microbial communities are available, metaproteomic data analyses often employ a metagenome-guided approach, in which complete or fragmental protein-coding genes are first directly predicted from metagenomic (and/or metatranscriptomic) sequences or from their assemblies, and the resulting protein sequences are then used as the reference database for peptide/protein identification from MS/MS spectra. This approach is often limited because protein coding genes predicted from metagenomes are incomplete and fragmental. In this paper, we present a graph-centric approach to improving metagenome-guided peptide and protein identification in metaproteomics. Our method exploits the de Bruijn graph structure reported by metagenome assembly algorithms to generate a comprehensive database of protein sequences encoded in the community. We tested our method using several public metaproteomic datasets with matched metagenomic and metatranscriptomic sequencing data acquired from complex microbial communities in a biological wastewater treatment plant. The results showed that many more peptides and proteins can be identified when assembly graphs were utilized, improving the characterization of the proteins expressed in the microbial communities. The additional proteins we identified contribute to the characterization of important pathways such as those involved in degradation of chemical hazards. Our tools are released as open-source software on github at https://github.com/COL-IU/Graph2Pro. PMID:27918579
Li, Lingli; Zhang, Hehua; Liu, Zhongshuai; Cui, Xiaoyue; Zhang, Tong; Li, Yanfang; Zhang, Lingyun
2016-10-12
Blueberry is an economically important fruit crop in Ericaceae family. The substantial quantities of flavonoids in blueberry have been implicated in a broad range of health benefits. However, the information regarding fruit development and flavonoid metabolites based on the transcriptome level is still limited. In the present study, the transcriptome and gene expression profiling over berry development, especially during color development were initiated. A total of approximately 13.67 Gbp of data were obtained and assembled into 186,962 transcripts and 80,836 unigenes from three stages of blueberry fruit and color development. A large number of simple sequence repeats (SSRs) and candidate genes, which are potentially involved in plant development, metabolic and hormone pathways, were identified. A total of 6429 sequences containing 8796 SSRs were characterized from 15,457 unigenes and 1763 unigenes contained more than one SSR. The expression profiles of key genes involved in anthocyanin biosynthesis were also studied. In addition, a comparison between our dataset and other published results was carried out. Our high quality reads produced in this study are an important advancement and provide a new resource for the interpretation of high-throughput data for blueberry species whether regarding sequencing data depth or species extension. The use of this transcriptome data will serve as a valuable public information database for the studies of blueberry genome and would greatly boost the research of fruit and color development, flavonoid metabolisms and regulation and breeding of more healthful blueberries.
Berghoff, Bork A; Karlsson, Torgny; Källman, Thomas; Wagner, E Gerhart H; Grabherr, Manfred G
2017-01-01
Measuring how gene expression changes in the course of an experiment assesses how an organism responds on a molecular level. Sequencing of RNA molecules, and their subsequent quantification, aims to assess global gene expression changes on the RNA level (transcriptome). While advances in high-throughput RNA-sequencing (RNA-seq) technologies allow for inexpensive data generation, accurate post-processing and normalization across samples is required to eliminate any systematic noise introduced by the biochemical and/or technical processes. Existing methods thus either normalize on selected known reference genes that are invariant in expression across the experiment, assume that the majority of genes are invariant, or that the effects of up- and down-regulated genes cancel each other out during the normalization. Here, we present a novel method, moose 2 , which predicts invariant genes in silico through a dynamic programming (DP) scheme and applies a quadratic normalization based on this subset. The method allows for specifying a set of known or experimentally validated invariant genes, which guides the DP. We experimentally verified the predictions of this method in the bacterium Escherichia coli , and show how moose 2 is able to (i) estimate the expression value distances between RNA-seq samples, (ii) reduce the variation of expression values across all samples, and (iii) to subsequently reveal new functional groups of genes during the late stages of DNA damage. We further applied the method to three eukaryotic data sets, on which its performance compares favourably to other methods. The software is implemented in C++ and is publicly available from http://grabherr.github.io/moose2/. The proposed RNA-seq normalization method, moose 2 , is a valuable alternative to existing methods, with two major advantages: (i) in silico prediction of invariant genes provides a list of potential reference genes for downstream analyses, and (ii) non-linear artefacts in RNA-seq data are handled adequately to minimize variations between replicates.
Ong, Wen Dee; Voo, Lok-Yung Christopher; Kumar, Vijay Subbiah
2012-01-01
Pineapple (Ananas comosus var. comosus), is an important tropical non-climacteric fruit with high commercial potential. Understanding the mechanism and processes underlying fruit ripening would enable scientists to enhance the improvement of quality traits such as, flavor, texture, appearance and fruit sweetness. Although, the pineapple is an important fruit, there is insufficient transcriptomic or genomic information that is available in public databases. Application of high throughput transcriptome sequencing to profile the pineapple fruit transcripts is therefore needed. To facilitate this, we have performed transcriptome sequencing of ripe yellow pineapple fruit flesh using Illumina technology. About 4.7 millions Illumina paired-end reads were generated and assembled using the Velvet de novo assembler. The assembly produced 28,728 unique transcripts with a mean length of approximately 200 bp. Sequence similarity search against non-redundant NCBI database identified a total of 16,932 unique transcripts (58.93%) with significant hits. Out of these, 15,507 unique transcripts were assigned to gene ontology terms. Functional annotation against Kyoto Encyclopedia of Genes and Genomes pathway database identified 13,598 unique transcripts (47.33%) which were mapped to 126 pathways. The assembly revealed many transcripts that were previously unknown. The unique transcripts derived from this work have rapidly increased of the number of the pineapple fruit mRNA transcripts as it is now available in public databases. This information can be further utilized in gene expression, genomics and other functional genomics studies in pineapple.
Ong, Wen Dee; Voo, Lok-Yung Christopher; Kumar, Vijay Subbiah
2012-01-01
Background Pineapple (Ananas comosus var. comosus), is an important tropical non-climacteric fruit with high commercial potential. Understanding the mechanism and processes underlying fruit ripening would enable scientists to enhance the improvement of quality traits such as, flavor, texture, appearance and fruit sweetness. Although, the pineapple is an important fruit, there is insufficient transcriptomic or genomic information that is available in public databases. Application of high throughput transcriptome sequencing to profile the pineapple fruit transcripts is therefore needed. Methodology/Principal Findings To facilitate this, we have performed transcriptome sequencing of ripe yellow pineapple fruit flesh using Illumina technology. About 4.7 millions Illumina paired-end reads were generated and assembled using the Velvet de novo assembler. The assembly produced 28,728 unique transcripts with a mean length of approximately 200 bp. Sequence similarity search against non-redundant NCBI database identified a total of 16,932 unique transcripts (58.93%) with significant hits. Out of these, 15,507 unique transcripts were assigned to gene ontology terms. Functional annotation against Kyoto Encyclopedia of Genes and Genomes pathway database identified 13,598 unique transcripts (47.33%) which were mapped to 126 pathways. The assembly revealed many transcripts that were previously unknown. Conclusions The unique transcripts derived from this work have rapidly increased of the number of the pineapple fruit mRNA transcripts as it is now available in public databases. This information can be further utilized in gene expression, genomics and other functional genomics studies in pineapple. PMID:23091603
Cytokinin oxidase/dehydrogenase genes in barley and wheat: cloning and heterologous expression.
Galuszka, Petr; Frébortová, Jitka; Werner, Tomás; Yamada, Mamoru; Strnad, Miroslav; Schmülling, Thomas; Frébort, Ivo
2004-10-01
The cloning of two novel genes that encode cytokinin oxidase/dehydrogenase (CKX) in barley is described in this work. Transformation of both genes into Arabidopsis and tobacco showed that at least one of the genes codes for a functional enzyme, as its expression caused a cytokinin-deficient phenotype in the heterologous host plants. Additional cloning of two gene fragments, and an in silico search in the public expressed sequence tag clone databases, revealed the presence of at least 13 more members of the CKX gene family in barley and wheat. The expression of three selected barley genes was analyzed by RT-PCR and found to be organ-specific with peak expression in mature kernels. One barley CKX (HvCKX2) was characterized in detail after heterologous expression in tobacco. Interestingly, this enzyme shows a pH optimum at 4.5 and a preference for cytokinin ribosides as substrates, which may indicate its vacuolar targeting. Different substrate specificities, and the pH profiles of cytokinin-degrading enzymes extracted from different barley tissues, are also presented.
Dittmar, W James; McIver, Lauren; Michalak, Pawel; Garner, Harold R; Valdez, Gregorio
2014-07-01
The wealth of publicly available gene expression and genomic data provides unique opportunities for computational inference to discover groups of genes that function to control specific cellular processes. Such genes are likely to have co-evolved and be expressed in the same tissues and cells. Unfortunately, the expertise and computational resources required to compare tens of genomes and gene expression data sets make this type of analysis difficult for the average end-user. Here, we describe the implementation of a web server that predicts genes involved in affecting specific cellular processes together with a gene of interest. We termed the server 'EvoCor', to denote that it detects functional relationships among genes through evolutionary analysis and gene expression correlation. This web server integrates profiles of sequence divergence derived by a Hidden Markov Model (HMM) and tissue-wide gene expression patterns to determine putative functional linkages between pairs of genes. This server is easy to use and freely available at http://pilot-hmm.vbi.vt.edu/. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
USDA-ARS?s Scientific Manuscript database
Polymorphic genetic markers were identified and characterized using a partial genomic library of Heliothis virescens enriched for simple sequence repeats (SSR) and nucleotide sequences of expressed sequence tags (EST). Nucleotide sequences of 192 clones from the partial genomic library yielded 147 u...
Halper, Sean M; Cetnar, Daniel P; Salis, Howard M
2018-01-01
Engineering many-enzyme metabolic pathways suffers from the design curse of dimensionality. There are an astronomical number of synonymous DNA sequence choices, though relatively few will express an evolutionary robust, maximally productive pathway without metabolic bottlenecks. To solve this challenge, we have developed an integrated, automated computational-experimental pipeline that identifies a pathway's optimal DNA sequence without high-throughput screening or many cycles of design-build-test. The first step applies our Operon Calculator algorithm to design a host-specific evolutionary robust bacterial operon sequence with maximally tunable enzyme expression levels. The second step applies our RBS Library Calculator algorithm to systematically vary enzyme expression levels with the smallest-sized library. After characterizing a small number of constructed pathway variants, measurements are supplied to our Pathway Map Calculator algorithm, which then parameterizes a kinetic metabolic model that ultimately predicts the pathway's optimal enzyme expression levels and DNA sequences. Altogether, our algorithms provide the ability to efficiently map the pathway's sequence-expression-activity space and predict DNA sequences with desired metabolic fluxes. Here, we provide a step-by-step guide to applying the Pathway Optimization Pipeline on a desired multi-enzyme pathway in a bacterial host.
Control of total GFP expression by alterations to the 3′ region nucleotide sequence
2013-01-01
Background Previously, we distinguished the Escherichia coli type II cytoplasmic membrane translocation pathways of Tat, Yid, and Sec for unfolded and folded soluble target proteins. The translocation of folded protein to the periplasm for soluble expression via the Tat pathway was controlled by an N-terminal hydrophilic leader sequence. In this study, we investigated the effect of the hydrophilic C-terminal end and its nucleotide sequence on total and soluble protein expression. Results The native hydrophilic C-terminal end of GFP was obtained by deleting the C-terminal peptide LeuGlu-6×His, derived from pET22b(+). The corresponding clones induced total and soluble GFP expression that was either slightly increased or dramatically reduced, apparently through reconstruction of the nucleotide sequence around the stop codon in the 3′ region. In the expression-induced clones, the hydrophilic C-terminus showed increased Tat pathway specificity for soluble expression. However, in the expression-reduced clone, after analyzing the role of the 5′ poly(A) coding sequence with a substituted synonymous codon, we proved that the longer 5′ poly(A) coding sequence interacted with the reconstructed 3′ region nucleotide sequence to create a new mRNA tertiary structure between the 5′ and 3′ regions, which resulted in reduced total GFP expression. Further, to recover the reduced expression by changing the 3′ nucleotide sequence, after replacing selected C-terminal 5′ codons and the stop codon in the ORF with synonymous codons, total GFP expression in most of the clones was recovered to the undeleted control level. The insertion of trinucleotides after the stop codon in the 3′-UTR recovered or reduced total GFP expression. RT-PCR revealed that the level of total protein expression was controlled by changes in translational or transcriptional regulation, which were induced or reduced by the substitution or insertion of 3′ region nucleotides. Conclusions We found that the hydrophilic C-terminal end of GFP increased Tat pathway specificity and that the 3′ nucleotide sequence played an important role in total protein expression through translational and transcriptional regulation. These findings may be useful for efficiently producing recombinant proteins as well as for potentially controlling the expression level of specific genes in the body for therapeutic purposes. PMID:23834827
Bar, Ido; Cummins, Scott; Elizur, Abigail
2016-03-10
Controlling and managing the breeding of bluefin tuna (Thunnus spp.) in captivity is an imperative step towards obtaining a sustainable supply of these fish in aquaculture production systems. Germ cell transplantation (GCT) is an innovative technology for the production of inter-species surrogates, by transplanting undifferentiated germ cells derived from a donor species into larvae of a host species. The transplanted surrogates will then grow and mature to produce donor-derived seed, thus providing a simpler alternative to maintaining large-bodied broodstock such as the bluefin tuna. Implementation of GCT for new species requires the development of molecular tools to follow the fate of the transplanted germ cells. These tools are based on key reproductive and germ cell-specific genes. RNA-Sequencing (RNA-Seq) provides a rapid, cost-effective method for high throughput gene identification in non-model species. This study utilized RNA-Seq to identify key genes expressed in the gonads of Southern bluefin tuna (Thunnus maccoyii, SBT) and their specific expression patterns in male and female gonad cells. Key genes involved in the reproductive molecular pathway and specifically, germ cell development in gonads, were identified using analysis of RNA-Seq transcriptomes of male and female SBT gonad cells. Expression profiles of transcripts from ovary and testis cells were compared, as well as testis germ cell-enriched fraction prepared with Percoll gradient, as used in GCT studies. Ovary cells demonstrated over-expression of genes related to stem cell maintenance, while in testis cells, transcripts encoding for reproduction-associated receptors, sex steroids and hormone synthesis and signaling genes were over-expressed. Within the testis cells, the Percoll-enriched fraction showed over-expression of genes that are related to post-meiosis germ cell populations. Gonad development and germ cell related genes were identified from SBT gonads and their expression patterns in ovary and testis cells were determined. These expression patterns correlate with the reproductive developmental stage of the sampled fish. The majority of the genes described in this study were sequenced for the first time in T. maccoyii. The wealth of SBT gonadal and germ cell-related gene sequences made publicly available by this study provides an extensive resource for further GCT and reproductive molecular biology studies of this commercially valuable fish.
HypoxiaDB: a database of hypoxia-regulated proteins
Khurana, Pankaj; Sugadev, Ragumani; Jain, Jaspreet; Singh, Shashi Bala
2013-01-01
There has been intense interest in the cellular response to hypoxia, and a large number of differentially expressed proteins have been identified through various high-throughput experiments. These valuable data are scattered, and there have been no systematic attempts to document the various proteins regulated by hypoxia. Compilation, curation and annotation of these data are important in deciphering their role in hypoxia and hypoxia-related disorders. Therefore, we have compiled HypoxiaDB, a database of hypoxia-regulated proteins. It is a comprehensive, manually-curated, non-redundant catalog of proteins whose expressions are shown experimentally to be altered at different levels and durations of hypoxia. The database currently contains 72 000 manually curated entries taken on 3500 proteins extracted from 73 peer-reviewed publications selected from PubMed. HypoxiaDB is distinctive from other generalized databases: (i) it compiles tissue-specific protein expression changes under different levels and duration of hypoxia. Also, it provides manually curated literature references to support the inclusion of the protein in the database and establish its association with hypoxia. (ii) For each protein, HypoxiaDB integrates data on gene ontology, KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway, protein–protein interactions, protein family (Pfam), OMIM (Online Mendelian Inheritance in Man), PDB (Protein Data Bank) structures and homology to other sequenced genomes. (iii) It also provides pre-compiled information on hypoxia-proteins, which otherwise requires tedious computational analysis. This includes information like chromosomal location, identifiers like Entrez, HGNC, Unigene, Uniprot, Ensembl, Vega, GI numbers and Genbank accession numbers associated with the protein. These are further cross-linked to respective public databases augmenting HypoxiaDB to the external repositories. (iv) In addition, HypoxiaDB provides an online sequence-similarity search tool for users to compare their protein sequences with HypoxiaDB protein database. We hope that HypoxiaDB will enrich our knowledge about hypoxia-related biology and eventually will lead to the development of novel hypothesis and advancements in diagnostic and therapeutic activities. HypoxiaDB is freely accessible for academic and non-profit users via http://www.hypoxiadb.com. Database URL: http://www.hypoxiadb.com PMID:24178989
Allen Brain Atlas: an integrated spatio-temporal portal for exploring the central nervous system
Sunkin, Susan M.; Ng, Lydia; Lau, Chris; Dolbeare, Tim; Gilbert, Terri L.; Thompson, Carol L.; Hawrylycz, Michael; Dang, Chinh
2013-01-01
The Allen Brain Atlas (http://www.brain-map.org) provides a unique online public resource integrating extensive gene expression data, connectivity data and neuroanatomical information with powerful search and viewing tools for the adult and developing brain in mouse, human and non-human primate. Here, we review the resources available at the Allen Brain Atlas, describing each product and data type [such as in situ hybridization (ISH) and supporting histology, microarray, RNA sequencing, reference atlases, projection mapping and magnetic resonance imaging]. In addition, standardized and unique features in the web applications are described that enable users to search and mine the various data sets. Features include both simple and sophisticated methods for gene searches, colorimetric and fluorescent ISH image viewers, graphical displays of ISH, microarray and RNA sequencing data, Brain Explorer software for 3D navigation of anatomy and gene expression, and an interactive reference atlas viewer. In addition, cross data set searches enable users to query multiple Allen Brain Atlas data sets simultaneously. All of the Allen Brain Atlas resources can be accessed through the Allen Brain Atlas data portal. PMID:23193282
Transterm—extended search facilities and improved integration with other databases
Jacobs, Grant H.; Stockwell, Peter A.; Tate, Warren P.; Brown, Chris M.
2006-01-01
Transterm has now been publicly available for >10 years. Major changes have been made since its last description in this database issue in 2002. The current database provides data for key regions of mRNA sequences, a curated database of mRNA motifs and tools to allow users to investigate their own motifs or mRNA sequences. The key mRNA regions database is derived computationally from Genbank. It contains 3′ and 5′ flanking regions, the initiation and termination signal context and coding sequence for annotated CDS features from Genbank and RefSeq. The database is non-redundant, enabling summary files and statistics to be prepared for each species. Advances include providing extended search facilities, the database may now be searched by BLAST in addition to regular expressions (patterns) allowing users to search for motifs such as known miRNA sequences, and the inclusion of RefSeq data. The database contains >40 motifs or structural patterns important for translational control. In this release, patterns from UTRsite and Rfam are also incorporated with cross-referencing. Users may search their sequence data with Transterm or user-defined patterns. The system is accessible at . PMID:16381889
Li, Hui; Li, Defang; Chen, Anguo; Tang, Huijuan; Li, Jianjun; Huang, Siqi
2016-01-01
Kenaf (Hibiscus cannabinus L.) is an economically important natural fiber crop grown worldwide. However, only 20 expressed tag sequences (ESTs) for kenaf are available in public databases. The aim of this study was to develop large-scale simple sequence repeat (SSR) markers to lay a solid foundation for the construction of genetic linkage maps and marker-assisted breeding in kenaf. We used Illumina paired-end sequencing technology to generate new EST-simple sequences and MISA software to mine SSR markers. We identified 71,318 unigenes with an average length of 1143 nt and annotated these unigenes using four different protein databases. Overall, 9324 complementary pairs were designated as EST-SSR markers, and their quality was validated using 100 randomly selected SSR markers. In total, 72 primer pairs reproducibly amplified target amplicons, and 61 of these primer pairs detected significant polymorphism among 28 kenaf accessions. Thus, in this study, we have developed large-scale SSR markers for kenaf, and this new resource will facilitate construction of genetic linkage maps, investigation of fiber growth and development in kenaf, and also be of value to novel gene discovery and functional genomic studies. PMID:26960153
Carrillo-Tapia, Eduardo; García-García, Elizabeth; Herrera-González, Norma Estela; Yamazaki-Nakashimada, Marco Antonio; Staines-Boone, Aidee Tamara; Segura-Mendez, Nora Hilda; Scheffler-Mendoza, Selma Cecilia; O Farrill-Romanillos, Patricia; Gonzalez-Serrano, Maria E; Rodriguez-Alba, Juan Carloa; Santos-Argumedo, Leopoldo; Berron-Ruiz, Laura; Sanchez-Flores, Alejandro; López-Herrera, Gabriela
2018-01-01
X-linked agammaglobulinemia (XLA) is characterized by the absence of immunoglobulin and B cells. Patients suffer from recurrent bacterial infections from early childhood, and require lifelong immunoglobulin replacement therapy. Mutations in BTK (Bruton's Tyrosine Kinase) are associated with this phenotype. Some patients that present XLA do not show typical clinical symptoms, resulting in delayed diagnosis due to the lack of a severe phenotype. This study presents a report of five XLA patients from four different families and attempts to determine a relationship between delayed diagnosis and the occurrence of BTK mutations. Samples from patients with antibody deficiency were analyzed to determine BTK expression, immunophenotyping and mutation analysis. Clinical and laboratory data was analyzed and presented for each patient. Most patients presented here showed atypical clinical and laboratory data for XLA, including normal IgM, IgG, or IgA levels. Most patients expressed detectable BTK protein. Sequencing of BTK showed that these patients harbored missense mutations in the pleckstrin homology and Src-homology-2 domains. When it was compared to public databases, BTK sequencing exhibited a new change, along with three other previously reported changes. Delayed diagnosis and atypical manifestations in XLA might be related to mutation type and BTK expression.
NCBI GEO: archive for functional genomics data sets—10 years on
Barrett, Tanya; Troup, Dennis B.; Wilhite, Stephen E.; Ledoux, Pierre; Evangelista, Carlos; Kim, Irene F.; Tomashevsky, Maxim; Marshall, Kimberly A.; Phillippy, Katherine H.; Sherman, Patti M.; Muertter, Rolf N.; Holko, Michelle; Ayanbule, Oluwabukunmi; Yefanov, Andrey; Soboleva, Alexandra
2011-01-01
A decade ago, the Gene Expression Omnibus (GEO) database was established at the National Center for Biotechnology Information (NCBI). The original objective of GEO was to serve as a public repository for high-throughput gene expression data generated mostly by microarray technology. However, the research community quickly applied microarrays to non-gene-expression studies, including examination of genome copy number variation and genome-wide profiling of DNA-binding proteins. Because the GEO database was designed with a flexible structure, it was possible to quickly adapt the repository to store these data types. More recently, as the microarray community switches to next-generation sequencing technologies, GEO has again adapted to host these data sets. Today, GEO stores over 20 000 microarray- and sequence-based functional genomics studies, and continues to handle the majority of direct high-throughput data submissions from the research community. Multiple mechanisms are provided to help users effectively search, browse, download and visualize the data at the level of individual genes or entire studies. This paper describes recent database enhancements, including new search and data representation tools, as well as a brief review of how the community uses GEO data. GEO is freely accessible at http://www.ncbi.nlm.nih.gov/geo/. PMID:21097893
Exploring root symbiotic programs in the model legume Medicago truncatula using EST analysis.
Journet, Etienne-Pascal; van Tuinen, Diederik; Gouzy, Jérome; Crespeau, Hervé; Carreau, Véronique; Farmer, Mary-Jo; Niebel, Andreas; Schiex, Thomas; Jaillon, Olivier; Chatagnier, Odile; Godiard, Laurence; Micheli, Fabienne; Kahn, Daniel; Gianinazzi-Pearson, Vivienne; Gamas, Pascal
2002-12-15
We report on a large-scale expressed sequence tag (EST) sequencing and analysis program aimed at characterizing the sets of genes expressed in roots of the model legume Medicago truncatula during interactions with either of two microsymbionts, the nitrogen-fixing bacterium Sinorhizobium meliloti or the arbuscular mycorrhizal fungus Glomus intraradices. We have designed specific tools for in silico analysis of EST data, in relation to chimeric cDNA detection, EST clustering, encoded protein prediction, and detection of differential expression. Our 21 473 5'- and 3'-ESTs could be grouped into 6359 EST clusters, corresponding to distinct virtual genes, along with 52 498 other M.truncatula ESTs available in the dbEST (NCBI) database that were recruited in the process. These clusters were manually annotated, using a specifically developed annotation interface. Analysis of EST cluster distribution in various M.truncatula cDNA libraries, supported by a refined R test to evaluate statistical significance and by 'electronic northern' representation, enabled us to identify a large number of novel genes predicted to be up- or down-regulated during either symbiotic root interaction. These in silico analyses provide a first global view of the genetic programs for root symbioses in M.truncatula. A searchable database has been built and can be accessed through a public interface.
Kodama, Yuichi; Mashima, Jun; Kaminuma, Eli; Gojobori, Takashi; Ogasawara, Osamu; Takagi, Toshihisa; Okubo, Kousaku; Nakamura, Yasukazu
2012-01-01
The DNA Data Bank of Japan (DDBJ; http://www.ddbj.nig.ac.jp) maintains and provides archival, retrieval and analytical resources for biological information. The central DDBJ resource consists of public, open-access nucleotide sequence databases including raw sequence reads, assembly information and functional annotation. Database content is exchanged with EBI and NCBI within the framework of the International Nucleotide Sequence Database Collaboration (INSDC). In 2011, DDBJ launched two new resources: the 'DDBJ Omics Archive' (DOR; http://trace.ddbj.nig.ac.jp/dor) and BioProject (http://trace.ddbj.nig.ac.jp/bioproject). DOR is an archival database of functional genomics data generated by microarray and highly parallel new generation sequencers. Data are exchanged between the ArrayExpress at EBI and DOR in the common MAGE-TAB format. BioProject provides an organizational framework to access metadata about research projects and the data from the projects that are deposited into different databases. In this article, we describe major changes and improvements introduced to the DDBJ services, and the launch of two new resources: DOR and BioProject.
Mariani, Luca; Weinand, Kathryn; Vedenko, Anastasia; Barrera, Luis A; Bulyk, Martha L
2017-09-27
Transcription factors (TFs) control cellular processes by binding specific DNA motifs to modulate gene expression. Motif enrichment analysis of regulatory regions can identify direct and indirect TF binding sites. Here, we created a glossary of 108 non-redundant TF-8mer "modules" of shared specificity for 671 metazoan TFs from publicly available and new universal protein binding microarray data. Analysis of 239 ENCODE TF chromatin immunoprecipitation sequencing datasets and associated RNA sequencing profiles suggest the 8mer modules are more precise than position weight matrices in identifying indirect binding motifs and their associated tethering TFs. We also developed GENRE (genomically equivalent negative regions), a tunable tool for construction of matched genomic background sequences for analysis of regulatory regions. GENRE outperformed four state-of-the-art approaches to background sequence construction. We used our TF-8mer glossary and GENRE in the analysis of the indirect binding motifs for the co-occurrence of tethering factors, suggesting novel TF-TF interactions. We anticipate that these tools will aid in elucidating tissue-specific gene-regulatory programs. Copyright © 2017 Elsevier Inc. All rights reserved.
PlantTFDB: a comprehensive plant transcription factor database
Guo, An-Yuan; Chen, Xin; Gao, Ge; Zhang, He; Zhu, Qi-Hui; Liu, Xiao-Chuan; Zhong, Ying-Fu; Gu, Xiaocheng; He, Kun; Luo, Jingchu
2008-01-01
Transcription factors (TFs) play key roles in controlling gene expression. Systematic identification and annotation of TFs, followed by construction of TF databases may serve as useful resources for studying the function and evolution of transcription factors. We developed a comprehensive plant transcription factor database PlantTFDB (http://planttfdb.cbi.pku.edu.cn), which contains 26 402 TFs predicted from 22 species, including five model organisms with available whole genome sequence and 17 plants with available EST sequences. To provide comprehensive information for those putative TFs, we made extensive annotation at both family and gene levels. A brief introduction and key references were presented for each family. Functional domain information and cross-references to various well-known public databases were available for each identified TF. In addition, we predicted putative orthologs of those TFs among the 22 species. PlantTFDB has a simple interface to allow users to search the database by IDs or free texts, to make sequence similarity search against TFs of all or individual species, and to download TF sequences for local analysis. PMID:17933783
Adaptable gene-specific dye bias correction for two-channel DNA microarrays.
Margaritis, Thanasis; Lijnzaad, Philip; van Leenen, Dik; Bouwmeester, Diane; Kemmeren, Patrick; van Hooff, Sander R; Holstege, Frank C P
2009-01-01
DNA microarray technology is a powerful tool for monitoring gene expression or for finding the location of DNA-bound proteins. DNA microarrays can suffer from gene-specific dye bias (GSDB), causing some probes to be affected more by the dye than by the sample. This results in large measurement errors, which vary considerably for different probes and also across different hybridizations. GSDB is not corrected by conventional normalization and has been difficult to address systematically because of its variance. We show that GSDB is influenced by label incorporation efficiency, explaining the variation of GSDB across different hybridizations. A correction method (Gene- And Slide-Specific Correction, GASSCO) is presented, whereby sequence-specific corrections are modulated by the overall bias of individual hybridizations. GASSCO outperforms earlier methods and works well on a variety of publically available datasets covering a range of platforms, organisms and applications, including ChIP on chip. A sequence-based model is also presented, which predicts which probes will suffer most from GSDB, useful for microarray probe design and correction of individual hybridizations. Software implementing the method is publicly available.
Adaptable gene-specific dye bias correction for two-channel DNA microarrays
Margaritis, Thanasis; Lijnzaad, Philip; van Leenen, Dik; Bouwmeester, Diane; Kemmeren, Patrick; van Hooff, Sander R; Holstege, Frank CP
2009-01-01
DNA microarray technology is a powerful tool for monitoring gene expression or for finding the location of DNA-bound proteins. DNA microarrays can suffer from gene-specific dye bias (GSDB), causing some probes to be affected more by the dye than by the sample. This results in large measurement errors, which vary considerably for different probes and also across different hybridizations. GSDB is not corrected by conventional normalization and has been difficult to address systematically because of its variance. We show that GSDB is influenced by label incorporation efficiency, explaining the variation of GSDB across different hybridizations. A correction method (Gene- And Slide-Specific Correction, GASSCO) is presented, whereby sequence-specific corrections are modulated by the overall bias of individual hybridizations. GASSCO outperforms earlier methods and works well on a variety of publically available datasets covering a range of platforms, organisms and applications, including ChIP on chip. A sequence-based model is also presented, which predicts which probes will suffer most from GSDB, useful for microarray probe design and correction of individual hybridizations. Software implementing the method is publicly available. PMID:19401678
Compressing DNA sequence databases with coil.
White, W Timothy J; Hendy, Michael D
2008-05-20
Publicly available DNA sequence databases such as GenBank are large, and are growing at an exponential rate. The sheer volume of data being dealt with presents serious storage and data communications problems. Currently, sequence data is usually kept in large "flat files," which are then compressed using standard Lempel-Ziv (gzip) compression - an approach which rarely achieves good compression ratios. While much research has been done on compressing individual DNA sequences, surprisingly little has focused on the compression of entire databases of such sequences. In this study we introduce the sequence database compression software coil. We have designed and implemented a portable software package, coil, for compressing and decompressing DNA sequence databases based on the idea of edit-tree coding. coil is geared towards achieving high compression ratios at the expense of execution time and memory usage during compression - the compression time represents a "one-off investment" whose cost is quickly amortised if the resulting compressed file is transmitted many times. Decompression requires little memory and is extremely fast. We demonstrate a 5% improvement in compression ratio over state-of-the-art general-purpose compression tools for a large GenBank database file containing Expressed Sequence Tag (EST) data. Finally, coil can efficiently encode incremental additions to a sequence database. coil presents a compelling alternative to conventional compression of flat files for the storage and distribution of DNA sequence databases having a narrow distribution of sequence lengths, such as EST data. Increasing compression levels for databases having a wide distribution of sequence lengths is a direction for future work.
Compressing DNA sequence databases with coil
White, W Timothy J; Hendy, Michael D
2008-01-01
Background Publicly available DNA sequence databases such as GenBank are large, and are growing at an exponential rate. The sheer volume of data being dealt with presents serious storage and data communications problems. Currently, sequence data is usually kept in large "flat files," which are then compressed using standard Lempel-Ziv (gzip) compression – an approach which rarely achieves good compression ratios. While much research has been done on compressing individual DNA sequences, surprisingly little has focused on the compression of entire databases of such sequences. In this study we introduce the sequence database compression software coil. Results We have designed and implemented a portable software package, coil, for compressing and decompressing DNA sequence databases based on the idea of edit-tree coding. coil is geared towards achieving high compression ratios at the expense of execution time and memory usage during compression – the compression time represents a "one-off investment" whose cost is quickly amortised if the resulting compressed file is transmitted many times. Decompression requires little memory and is extremely fast. We demonstrate a 5% improvement in compression ratio over state-of-the-art general-purpose compression tools for a large GenBank database file containing Expressed Sequence Tag (EST) data. Finally, coil can efficiently encode incremental additions to a sequence database. Conclusion coil presents a compelling alternative to conventional compression of flat files for the storage and distribution of DNA sequence databases having a narrow distribution of sequence lengths, such as EST data. Increasing compression levels for databases having a wide distribution of sequence lengths is a direction for future work. PMID:18489794
Genome-wide characterization of differential transcript usage in Arabidopsis thaliana.
Vaneechoutte, Dries; Estrada, April R; Lin, Ying-Chen; Loraine, Ann E; Vandepoele, Klaas
2017-12-01
Alternative splicing and the usage of alternate transcription start- or stop sites allows a single gene to produce multiple transcript isoforms. Most plant genes express certain isoforms at a significantly higher level than others, but under specific conditions this expression dominance can change, resulting in a different set of dominant isoforms. These events of differential transcript usage (DTU) have been observed for thousands of Arabidopsis thaliana, Zea mays and Vitis vinifera genes, and have been linked to development and stress response. However, neither the characteristics of these genes, nor the implications of DTU on their protein coding sequences or functions, are currently well understood. Here we present a dataset of isoform dominance and DTU for all genes in the AtRTD2 reference transcriptome based on a protocol that was benchmarked on simulated data and validated through comparison with a published reverse transciptase-polymerase chain reaction panel. We report DTU events for 8148 genes across 206 public RNA-Seq samples, and find that protein sequences are affected in 22% of the cases. The observed DTU events show high consistency across replicates, and reveal reproducible patterns in response to treatment and development. We also demonstrate that genes with different evolutionary ages, expression breadths and functions show large differences in the frequency at which they undergo DTU, and in the effect that these events have on their protein sequences. Finally, we showcase how the generated dataset can be used to explore DTU events for genes of interest or to find genes with specific DTU in samples of interest. © 2017 The Authors The Plant Journal © 2017 John Wiley & Sons Ltd.
He, Xin; Samee, Md. Abul Hassan; Blatti, Charles; Sinha, Saurabh
2010-01-01
Quantitative models of cis-regulatory activity have the potential to improve our mechanistic understanding of transcriptional regulation. However, the few models available today have been based on simplistic assumptions about the sequences being modeled, or heuristic approximations of the underlying regulatory mechanisms. We have developed a thermodynamics-based model to predict gene expression driven by any DNA sequence, as a function of transcription factor concentrations and their DNA-binding specificities. It uses statistical thermodynamics theory to model not only protein-DNA interaction, but also the effect of DNA-bound activators and repressors on gene expression. In addition, the model incorporates mechanistic features such as synergistic effect of multiple activators, short range repression, and cooperativity in transcription factor-DNA binding, allowing us to systematically evaluate the significance of these features in the context of available expression data. Using this model on segmentation-related enhancers in Drosophila, we find that transcriptional synergy due to simultaneous action of multiple activators helps explain the data beyond what can be explained by cooperative DNA-binding alone. We find clear support for the phenomenon of short-range repression, where repressors do not directly interact with the basal transcriptional machinery. We also find that the binding sites contributing to an enhancer's function may not be conserved during evolution, and a noticeable fraction of these undergo lineage-specific changes. Our implementation of the model, called GEMSTAT, is the first publicly available program for simultaneously modeling the regulatory activities of a given set of sequences. PMID:20862354
Zhao, Ying; Thammannagowda, Shivegowda; Staton, Margaret; Tang, Sha; Xia, Xinli; Yin, Weilun; Liang, Haiying
2013-03-01
The "living fossil" Metasequoia glyptostroboides Hu et Cheng, commonly known as dawn redwood or Chinese redwood, is the only living species in the genus and is valued for its essential oil and crude extracts that have great potential for anti-fungal activity. Despite its paleontological significance and economical value as a rare relict species, genomic resources of Metasequoia are very limited. In order to gain insight into the molecular mechanisms behind the formation of reproductive buds and the transition from vegetative phase to reproductive phase in Metasequoia, we performed sequencing of expressed sequence tags from Metasequoia vegetative buds and female buds. By using the 454 pyrosequencing technology, a total of 1,571,764 high-quality reads were generated, among which 733,128 were from vegetative buds and 775,636 were from female buds. These EST reads were clustered and assembled into 114,124 putative unique transcripts (PUTs) with an average length of 536 bp. The 97,565 PUTs that were at least 100 bp in length were functionally annotated by a similarity search against public databases and assigned with Gene Ontology (GO) terms. A total of 59 known floral gene families and 190 isotigs involved in hormone regulation were captured in the dataset. Furthermore, a set of PUTs differentially expressed in vegetative and reproductive buds, as well as SSR motifs and high confidence SNPs, were identified. This is the first large-scale expressed sequence tags ever generated in Metasequoia and the first evidence for floral genes in this critically endangered deciduous conifer species.
DEXTER: Disease-Expression Relation Extraction from Text.
Gupta, Samir; Dingerdissen, Hayley; Ross, Karen E; Hu, Yu; Wu, Cathy H; Mazumder, Raja; Vijay-Shanker, K
2018-01-01
Gene expression levels affect biological processes and play a key role in many diseases. Characterizing expression profiles is useful for clinical research, and diagnostics and prognostics of diseases. There are currently several high-quality databases that capture gene expression information, obtained mostly from large-scale studies, such as microarray and next-generation sequencing technologies, in the context of disease. The scientific literature is another rich source of information on gene expression-disease relationships that not only have been captured from large-scale studies but have also been observed in thousands of small-scale studies. Expression information obtained from literature through manual curation can extend expression databases. While many of the existing databases include information from literature, they are limited by the time-consuming nature of manual curation and have difficulty keeping up with the explosion of publications in the biomedical field. In this work, we describe an automated text-mining tool, Disease-Expression Relation Extraction from Text (DEXTER) to extract information from literature on gene and microRNA expression in the context of disease. One of the motivations in developing DEXTER was to extend the BioXpress database, a cancer-focused gene expression database that includes data derived from large-scale experiments and manual curation of publications. The literature-based portion of BioXpress lags behind significantly compared to expression information obtained from large-scale studies and can benefit from our text-mined results. We have conducted two different evaluations to measure the accuracy of our text-mining tool and achieved average F-scores of 88.51 and 81.81% for the two evaluations, respectively. Also, to demonstrate the ability to extract rich expression information in different disease-related scenarios, we used DEXTER to extract information on differential expression information for 2024 genes in lung cancer, 115 glycosyltransferases in 62 cancers and 826 microRNA in 171 cancers. All extractions using DEXTER are integrated in the literature-based portion of BioXpress.Database URL: http://biotm.cis.udel.edu/DEXTER.
Strategies to explore functional genomics data sets in NCBI's GEO database.
Wilhite, Stephen E; Barrett, Tanya
2012-01-01
The Gene Expression Omnibus (GEO) database is a major repository that stores high-throughput functional genomics data sets that are generated using both microarray-based and sequence-based technologies. Data sets are submitted to GEO primarily by researchers who are publishing their results in journals that require original data to be made freely available for review and analysis. In addition to serving as a public archive for these data, GEO has a suite of tools that allow users to identify, analyze, and visualize data relevant to their specific interests. These tools include sample comparison applications, gene expression profile charts, data set clusters, genome browser tracks, and a powerful search engine that enables users to construct complex queries.
Strategies to Explore Functional Genomics Data Sets in NCBI’s GEO Database
Wilhite, Stephen E.; Barrett, Tanya
2012-01-01
The Gene Expression Omnibus (GEO) database is a major repository that stores high-throughput functional genomics data sets that are generated using both microarray-based and sequence-based technologies. Data sets are submitted to GEO primarily by researchers who are publishing their results in journals that require original data to be made freely available for review and analysis. In addition to serving as a public archive for these data, GEO has a suite of tools that allow users to identify, analyze and visualize data relevant to their specific interests. These tools include sample comparison applications, gene expression profile charts, data set clusters, genome browser tracks, and a powerful search engine that enables users to construct complex queries. PMID:22130872
Bragalini, Claudia; Ribière, Céline; Parisot, Nicolas; Vallon, Laurent; Prudent, Elsa; Peyretaillade, Eric; Girlanda, Mariangela; Peyret, Pierre; Marmeisse, Roland; Luis, Patricia
2014-01-01
Eukaryotic microbial communities play key functional roles in soil biology and potentially represent a rich source of natural products including biocatalysts. Culture-independent molecular methods are powerful tools to isolate functional genes from uncultured microorganisms. However, none of the methods used in environmental genomics allow for a rapid isolation of numerous functional genes from eukaryotic microbial communities. We developed an original adaptation of the solution hybrid selection (SHS) for an efficient recovery of functional complementary DNAs (cDNAs) synthesized from soil-extracted polyadenylated mRNAs. This protocol was tested on the Glycoside Hydrolase 11 gene family encoding endo-xylanases for which we designed 35 explorative 31-mers capture probes. SHS was implemented on four soil eukaryotic cDNA pools. After two successive rounds of capture, >90% of the resulting cDNAs were GH11 sequences, of which 70% (38 among 53 sequenced genes) were full length. Between 1.5 and 25% of the cloned captured sequences were expressed in Saccharomyces cerevisiae. Sequencing of polymerase chain reaction-amplified GH11 gene fragments from the captured sequences highlighted hundreds of phylogenetically diverse sequences that were not yet described, in public databases. This protocol offers the possibility of performing exhaustive exploration of eukaryotic gene families within microbial communities thriving in any type of environment. PMID:25281543
Toxicogenomics and Cancer Susceptibility: Advances with Next-Generation Sequencing
Ning, Baitang; Su, Zhenqiang; Mei, Nan; Hong, Huixiao; Deng, Helen; Shi, Leming; Fuscoe, James C.; Tolleson, William H.
2017-01-01
The aim of this review is to comprehensively summarize the recent achievements in the field of toxicogenomics and cancer research regarding genetic-environmental interactions in carcinogenesis and detection of genetic aberrations in cancer genomes by next-generation sequencing technology. Cancer is primarily a genetic disease in which genetic factors and environmental stimuli interact to cause genetic and epigenetic aberrations in human cells. Mutations in the germline act as either high-penetrance alleles that strongly increase the risk of cancer development, or as low-penetrance alleles that mildly change an individual’s susceptibility to cancer. Somatic mutations, resulting from either DNA damage induced by exposure to environmental mutagens or from spontaneous errors in DNA replication or repair are involved in the development or progression of the cancer. Induced or spontaneous changes in the epigenome may also drive carcinogenesis. Advances in next-generation sequencing technology provide us opportunities to accurately, economically, and rapidly identify genetic variants, somatic mutations, gene expression profiles, and epigenetic alterations with single-base resolution. Whole genome sequencing, whole exome sequencing, and RNA sequencing of paired cancer and adjacent normal tissue present a comprehensive picture of the cancer genome. These new findings should benefit public health by providing insights in understanding cancer biology, and in improving cancer diagnosis and therapy. PMID:24875441
2011-01-01
Abstract Background Bupleurum chinense DC. is a widely used traditional Chinese medicinal plant. Saikosaponins are the major bioactive constituents of B. chinense, but relatively little is known about saikosaponin biosynthesis. The 454 pyrosequencing technology provides a promising opportunity for finding novel genes that participate in plant metabolism. Consequently, this technology may help to identify the candidate genes involved in the saikosaponin biosynthetic pathway. Results One-quarter of the 454 pyrosequencing runs produced a total of 195, 088 high-quality reads, with an average read length of 356 bases (NCBI SRA accession SRA039388). A de novo assembly generated 24, 037 unique sequences (22, 748 contigs and 1, 289 singletons), 12, 649 (52.6%) of which were annotated against three public protein databases using a basic local alignment search tool (E-value ≤1e-10). All unique sequences were compared with NCBI expressed sequence tags (ESTs) (237) and encoding sequences (44) from the Bupleurum genus, and with a Sanger-sequenced EST dataset (3, 111). The 23, 173 (96.4%) unique sequences obtained in the present study represent novel Bupleurum genes. The ESTs of genes related to saikosaponin biosynthesis were found to encode known enzymes that catalyze the formation of the saikosaponin backbone; 246 cytochrome P450 (P450s) and 102 glycosyltransferases (GTs) unique sequences were also found in the 454 dataset. Full length cDNAs of 7 P450s and 7 uridine diphosphate GTs (UGTs) were verified by reverse transcriptase polymerase chain reaction or by cloning using 5' and/or 3' rapid amplification of cDNA ends. Two P450s and three UGTs were identified as the most likely candidates involved in saikosaponin biosynthesis. This finding was based on the coordinate up-regulation of their expression with β-AS in methyl jasmonate-treated adventitious roots and on their similar expression patterns with β-AS in various B. chinense tissues. Conclusions A collection of high-quality ESTs for B. chinense obtained by 454 pyrosequencing is provided here for the first time. These data should aid further research on the functional genomics of B. chinense and other Bupleurum species. The candidate genes for enzymes involved in saikosaponin biosynthesis, especially the P450s and UGTs, that were revealed provide a substantial foundation for follow-up research on the metabolism and regulation of the saikosaponins. PMID:22047182
Database Resources of the BIG Data Center in 2018
Xu, Xingjian; Hao, Lili; Zhu, Junwei; Tang, Bixia; Zhou, Qing; Song, Fuhai; Chen, Tingting; Zhang, Sisi; Dong, Lili; Lan, Li; Wang, Yanqing; Sang, Jian; Hao, Lili; Liang, Fang; Cao, Jiabao; Liu, Fang; Liu, Lin; Wang, Fan; Ma, Yingke; Xu, Xingjian; Zhang, Lijuan; Chen, Meili; Tian, Dongmei; Li, Cuiping; Dong, Lili; Du, Zhenglin; Yuan, Na; Zeng, Jingyao; Zhang, Zhewen; Wang, Jinyue; Shi, Shuo; Zhang, Yadong; Pan, Mengyu; Tang, Bixia; Zou, Dong; Song, Shuhui; Sang, Jian; Xia, Lin; Wang, Zhennan; Li, Man; Cao, Jiabao; Niu, Guangyi; Zhang, Yang; Sheng, Xin; Lu, Mingming; Wang, Qi; Xiao, Jingfa; Zou, Dong; Wang, Fan; Hao, Lili; Liang, Fang; Li, Mengwei; Sun, Shixiang; Zou, Dong; Li, Rujiao; Yu, Chunlei; Wang, Guangyu; Sang, Jian; Liu, Lin; Li, Mengwei; Li, Man; Niu, Guangyi; Cao, Jiabao; Sun, Shixiang; Xia, Lin; Yin, Hongyan; Zou, Dong; Xu, Xingjian; Ma, Lina; Chen, Huanxin; Sun, Yubin; Yu, Lei; Zhai, Shuang; Sun, Mingyuan; Zhang, Zhang; Zhao, Wenming; Xiao, Jingfa; Bao, Yiming; Song, Shuhui; Hao, Lili; Li, Rujiao; Ma, Lina; Sang, Jian; Wang, Yanqing; Tang, Bixia; Zou, Dong; Wang, Fan
2018-01-01
Abstract The BIG Data Center at Beijing Institute of Genomics (BIG) of the Chinese Academy of Sciences provides freely open access to a suite of database resources in support of worldwide research activities in both academia and industry. With the vast amounts of omics data generated at ever-greater scales and rates, the BIG Data Center is continually expanding, updating and enriching its core database resources through big-data integration and value-added curation, including BioCode (a repository archiving bioinformatics tool codes), BioProject (a biological project library), BioSample (a biological sample library), Genome Sequence Archive (GSA, a data repository for archiving raw sequence reads), Genome Warehouse (GWH, a centralized resource housing genome-scale data), Genome Variation Map (GVM, a public repository of genome variations), Gene Expression Nebulas (GEN, a database of gene expression profiles based on RNA-Seq data), Methylation Bank (MethBank, an integrated databank of DNA methylomes), and Science Wikis (a series of biological knowledge wikis for community annotations). In addition, three featured web services are provided, viz., BIG Search (search as a service; a scalable inter-domain text search engine), BIG SSO (single sign-on as a service; a user access control system to gain access to multiple independent systems with a single ID and password) and Gsub (submission as a service; a unified submission service for all relevant resources). All of these resources are publicly accessible through the home page of the BIG Data Center at http://bigd.big.ac.cn. PMID:29036542
Li, Bing; Shi, Xiao-Yu; Liao, Dai-Xiang; Cao, Bang-Rong; Luo, Cheng-Hua; Cheng, Shu-Jun
2015-01-01
There are still no absolute parameters predicting progression of adenoma into cancer. The present study aimed to characterize functional differences on the multistep carcinogenetic process from the adenoma-carcinoma sequence. All samples were collected and mRNA expression profiling was performed by using Agilent Microarray high-throughput gene-chip technology. Then, the characteristics of mRNA expression profiles of adenoma-carcinoma sequence were described with bioinformatics software, and we analyzed the relationship between gene expression profiles of adenoma-adenocarcinoma sequence and clinical prognosis of colorectal cancer. The mRNA expressions of adenoma-carcinoma sequence were significantly different between high-grade intraepithelial neoplasia group and adenocarcinoma group. The biological process of gene ontology function enrichment analysis on differentially expressed genes between high-grade intraepithelial neoplasia group and adenocarcinoma group showed that genes enriched in the extracellular structure organization, skeletal system development, biological adhesion and itself regulated growth regulation, with the P value after FDR correction of less than 0.05. In addition, IPR-related protein mainly focused on the insulin-like growth factor binding proteins. The variable trends of gene expression profiles for adenoma-carcinoma sequence were mainly concentrated in high-grade intraepithelial neoplasia and adenocarcinoma. The differentially expressed genes are significantly correlated between high-grade intraepithelial neoplasia group and adenocarcinoma group. Bioinformatics analysis is an effective way to study the gene expression profiles in the adenoma-carcinoma sequence, and may provide an effective tool to involve colorectal cancer research strategy into colorectal adenoma or advanced adenoma.
Castresana, C; Garcia-Luque, I; Alonso, E; Malik, V S; Cashmore, A R
1988-01-01
We have analyzed promoter regulatory elements from a photoregulated CAB gene (Cab-E) isolated from Nicotiana plumbaginifolia. These studies have been performed by introducing chimeric gene constructs into tobacco cells via Agrobacterium tumefaciens-mediated transformation. Expression studies on the regenerated transgenic plants have allowed us to characterize three positive and one negative cis-acting elements that influence photoregulated expression of the Cab-E gene. Within the upstream sequences we have identified two positive regulatory elements (PRE1 and PRE2) which confer maximum levels of photoregulated expression. These sequences contain multiple repeated elements related to the sequence-ACCGGCCCACTT-. We have also identified within the upstream region a negative regulatory element (NRE) extremely rich in AT sequences, which reduces the level of gene expression in the light. We have defined a light regulatory element (LRE) within the promoter region extending from -396 to -186 bp which confers photoregulated expression when fused to a constitutive nopaline synthase ('nos') promoter. Within this region there is a 132-bp element, extending from -368 to -234 bp, which on deletion from the Cab-E promoter reduces gene expression from high levels to undetectable levels. Finally, we have demonstrated for a full length Cab-E promoter conferring high levels of photoregulated expression, that sequences proximal to the Cab-E TATA box are not replaceable by corresponding sequences from a 'nos' promoter. This contrasts with the apparent equivalence of these Cab-E and 'nos' TATA box-proximal sequences in truncated promoters conferring low levels of photoregulated expression. Images PMID:2901343
USDA-ARS?s Scientific Manuscript database
Five developmental stages of Chrysoperla rufilabris were tested using nine primer pairs. Three sequences were highly expressed at all life stages and six were differentially expressed. These primer pairs may be used as standards to quantitate functional gene expression associated with physiological ...
Positive Selection Underlies Faster-Z Evolution of Gene Expression in Birds
Dean, Rebecca; Harrison, Peter W.; Wright, Alison E.; Zimmer, Fabian; Mank, Judith E.
2015-01-01
The elevated rate of evolution for genes on sex chromosomes compared with autosomes (Fast-X or Fast-Z evolution) can result either from positive selection in the heterogametic sex or from nonadaptive consequences of reduced relative effective population size. Recent work in birds suggests that Fast-Z of coding sequence is primarily due to relaxed purifying selection resulting from reduced relative effective population size. However, gene sequence and gene expression are often subject to distinct evolutionary pressures; therefore, we tested for Fast-Z in gene expression using next-generation RNA-sequencing data from multiple avian species. Similar to studies of Fast-Z in coding sequence, we recover clear signatures of Fast-Z in gene expression; however, in contrast to coding sequence, our data indicate that Fast-Z in expression is due to positive selection acting primarily in females. In the soma, where gene expression is highly correlated between the sexes, we detected Fast-Z in both sexes, although at a higher rate in females, suggesting that many positively selected expression changes in females are also expressed in males. In the gonad, where intersexual correlations in expression are much lower, we detected Fast-Z for female gene expression, but crucially, not males. This suggests that a large amount of expression variation is sex-specific in its effects within the gonad. Taken together, our results indicate that Fast-Z evolution of gene expression is the product of positive selection acting on recessive beneficial alleles in the heterogametic sex. More broadly, our analysis suggests that the adaptive potential of Z chromosome gene expression may be much greater than that of gene sequence, results which have important implications for the role of sex chromosomes in speciation and sexual selection. PMID:26067773
PeanutDB: an integrated bioinformatics web portal for Arachis hypogaea transcriptomics
2012-01-01
Background The peanut (Arachis hypogaea) is an important crop cultivated worldwide for oil production and food sources. Its complex genetic architecture (e.g., the large and tetraploid genome possibly due to unique cross of wild diploid relatives and subsequent chromosome duplication: 2n = 4x = 40, AABB, 2800 Mb) presents a major challenge for its genome sequencing and makes it a less-studied crop. Without a doubt, transcriptome sequencing is the most effective way to harness the genome structure and gene expression dynamics of this non-model species that has a limited genomic resource. Description With the development of next generation sequencing technologies such as 454 pyro-sequencing and Illumina sequencing by synthesis, the transcriptomics data of peanut is rapidly accumulated in both the public databases and private sectors. Integrating 187,636 Sanger reads (103,685,419 bases), 1,165,168 Roche 454 reads (333,862,593 bases) and 57,135,995 Illumina reads (4,073,740,115 bases), we generated the first release of our peanut transcriptome assembly that contains 32,619 contigs. We provided EC, KEGG and GO functional annotations to these contigs and detected SSRs, SNPs and other genetic polymorphisms for each contig. Based on both open-source and our in-house tools, PeanutDB presents many seamlessly integrated web interfaces that allow users to search, filter, navigate and visualize easily the whole transcript assembly, its annotations and detected polymorphisms and simple sequence repeats. For each contig, sequence alignment is presented in both bird’s-eye view and nucleotide level resolution, with colorfully highlighted regions of mismatches, indels and repeats that facilitate close examination of assembly quality, genetic polymorphisms, sequence repeats and/or sequencing errors. Conclusion As a public genomic database that integrates peanut transcriptome data from different sources, PeanutDB (http://bioinfolab.muohio.edu/txid3818v1) provides the Peanut research community with an easy-to-use web portal that will definitely facilitate genomics research and molecular breeding in this less-studied crop. PMID:22712730
mESAdb: microRNA Expression and Sequence Analysis Database
Kaya, Koray D.; Karakülah, Gökhan; Yakıcıer, Cengiz M.; Acar, Aybar C.; Konu, Özlen
2011-01-01
microRNA expression and sequence analysis database (http://konulab.fen.bilkent.edu.tr/mirna/) (mESAdb) is a regularly updated database for the multivariate analysis of sequences and expression of microRNAs from multiple taxa. mESAdb is modular and has a user interface implemented in PHP and JavaScript and coupled with statistical analysis and visualization packages written for the R language. The database primarily comprises mature microRNA sequences and their target data, along with selected human, mouse and zebrafish expression data sets. mESAdb analysis modules allow (i) mining of microRNA expression data sets for subsets of microRNAs selected manually or by motif; (ii) pair-wise multivariate analysis of expression data sets within and between taxa; and (iii) association of microRNA subsets with annotation databases, HUGE Navigator, KEGG and GO. The use of existing and customized R packages facilitates future addition of data sets and analysis tools. Furthermore, the ability to upload and analyze user-specified data sets makes mESAdb an interactive and expandable analysis tool for microRNA sequence and expression data. PMID:21177657
mESAdb: microRNA expression and sequence analysis database.
Kaya, Koray D; Karakülah, Gökhan; Yakicier, Cengiz M; Acar, Aybar C; Konu, Ozlen
2011-01-01
microRNA expression and sequence analysis database (http://konulab.fen.bilkent.edu.tr/mirna/) (mESAdb) is a regularly updated database for the multivariate analysis of sequences and expression of microRNAs from multiple taxa. mESAdb is modular and has a user interface implemented in PHP and JavaScript and coupled with statistical analysis and visualization packages written for the R language. The database primarily comprises mature microRNA sequences and their target data, along with selected human, mouse and zebrafish expression data sets. mESAdb analysis modules allow (i) mining of microRNA expression data sets for subsets of microRNAs selected manually or by motif; (ii) pair-wise multivariate analysis of expression data sets within and between taxa; and (iii) association of microRNA subsets with annotation databases, HUGE Navigator, KEGG and GO. The use of existing and customized R packages facilitates future addition of data sets and analysis tools. Furthermore, the ability to upload and analyze user-specified data sets makes mESAdb an interactive and expandable analysis tool for microRNA sequence and expression data.
Blixt, Maria K E; Hallböök, Finn
2016-01-01
Combining techniques of episomal vector gene-specific Cre expression and genomic integration using the piggyBac transposon system enables studies of gene expression-specific cell lineage tracing in the chicken retina. In this work, we aimed to target the retinal horizontal cell progenitors. A 208 bp gene regulatory sequence from the chicken retinoid X receptor γ gene (RXRγ208) was used to drive Cre expression. RXRγ is expressed in progenitors and photoreceptors during development. The vector was combined with a piggyBac "donor" vector containing a floxed STOP sequence followed by enhanced green fluorescent protein (EGFP), as well as a piggyBac helper vector for efficient integration into the host cell genome. The vectors were introduced into the embryonic chicken retina with in ovo electroporation. Tissue electroporation targets specific developmental time points and in specific structures. Cells that drove Cre expression from the regulatory RXRγ208 sequence excised the floxed STOP-sequence and expressed GFP. The approach generated a stable lineage with robust expression of GFP in retinal cells that have activated transcription from the RXRγ208 sequence. Furthermore, GFP was expressed in cells that express horizontal or photoreceptor markers when electroporation was performed between developmental stages 22 and 28. Electroporation of a stage 12 optic cup gave multiple cell types in accordance with RXRγ gene expression in the early retina. In this study, we describe an easy, cost-effective, and time-efficient method for testing regulatory sequences in general. More specifically, our results open up the possibility for further studies of the RXRγ-gene regulatory network governing the formation of photoreceptor and horizontal cells. In addition, the method presents approaches to target the expression of effector genes, such as regulators of cell fate or cell cycle progression, to these cells and their progenitor.
Cosart, Ted; Beja-Pereira, Albano; Luikart, Gordon
2014-11-01
The computer program EXONSAMPLER automates the sampling of thousands of exon sequences from publicly available reference genome sequences and gene annotation databases. It was designed to provide exon sequences for the efficient, next-generation gene sequencing method called exon capture. The exon sequences can be sampled by a list of gene name abbreviations (e.g. IFNG, TLR1), or by sampling exons from genes spaced evenly across chromosomes. It provides a list of genomic coordinates (a bed file), as well as a set of sequences in fasta format. User-adjustable parameters for collecting exon sequences include a minimum and maximum acceptable exon length, maximum number of exonic base pairs (bp) to sample per gene, and maximum total bp for the entire collection. It allows for partial sampling of very large exons. It can preferentially sample upstream (5 prime) exons, downstream (3 prime) exons, both external exons, or all internal exons. It is written in the Python programming language using its free libraries. We describe the use of EXONSAMPLER to collect exon sequences from the domestic cow (Bos taurus) genome for the design of an exon-capture microarray to sequence exons from related species, including the zebu cow and wild bison. We collected ~10% of the exome (~3 million bp), including 155 candidate genes, and ~16,000 exons evenly spaced genomewide. We prioritized the collection of 5 prime exons to facilitate discovery and genotyping of SNPs near upstream gene regulatory DNA sequences, which control gene expression and are often under natural selection. © 2014 John Wiley & Sons Ltd.
Expression Atlas: gene and protein expression across multiple studies and organisms
Tang, Y Amy; Bazant, Wojciech; Burke, Melissa; Fuentes, Alfonso Muñoz-Pomer; George, Nancy; Koskinen, Satu; Mohammed, Suhaib; Geniza, Matthew; Preece, Justin; Jarnuczak, Andrew F; Huber, Wolfgang; Stegle, Oliver; Brazma, Alvis; Petryszak, Robert
2018-01-01
Abstract Expression Atlas (http://www.ebi.ac.uk/gxa) is an added value database that provides information about gene and protein expression in different species and contexts, such as tissue, developmental stage, disease or cell type. The available public and controlled access data sets from different sources are curated and re-analysed using standardized, open source pipelines and made available for queries, download and visualization. As of August 2017, Expression Atlas holds data from 3,126 studies across 33 different species, including 731 from plants. Data from large-scale RNA sequencing studies including Blueprint, PCAWG, ENCODE, GTEx and HipSci can be visualized next to each other. In Expression Atlas, users can query genes or gene-sets of interest and explore their expression across or within species, tissues, developmental stages in a constitutive or differential context, representing the effects of diseases, conditions or experimental interventions. All processed data matrices are available for direct download in tab-delimited format or as R-data. In addition to the web interface, data sets can now be searched and downloaded through the Expression Atlas R package. Novel features and visualizations include the on-the-fly analysis of gene set overlaps and the option to view gene co-expression in experiments investigating constitutive gene expression across tissues or other conditions. PMID:29165655
Single-Cell RNA-Sequencing: Assessment of Differential Expression Analysis Methods.
Dal Molin, Alessandra; Baruzzo, Giacomo; Di Camillo, Barbara
2017-01-01
The sequencing of the transcriptomes of single-cells, or single-cell RNA-sequencing, has now become the dominant technology for the identification of novel cell types and for the study of stochastic gene expression. In recent years, various tools for analyzing single-cell RNA-sequencing data have been proposed, many of them with the purpose of performing differentially expression analysis. In this work, we compare four different tools for single-cell RNA-sequencing differential expression, together with two popular methods originally developed for the analysis of bulk RNA-sequencing data, but largely applied to single-cell data. We discuss results obtained on two real and one synthetic dataset, along with considerations about the perspectives of single-cell differential expression analysis. In particular, we explore the methods performance in four different scenarios, mimicking different unimodal or bimodal distributions of the data, as characteristic of single-cell transcriptomics. We observed marked differences between the selected methods in terms of precision and recall, the number of detected differentially expressed genes and the overall performance. Globally, the results obtained in our study suggest that is difficult to identify a best performing tool and that efforts are needed to improve the methodologies for single-cell RNA-sequencing data analysis and gain better accuracy of results.
Searching for resistance genes to Bursaphelenchus xylophilus using high throughput screening
2012-01-01
Background Pine wilt disease (PWD), caused by the pinewood nematode (PWN; Bursaphelenchus xylophilus), damages and kills pine trees and is causing serious economic damage worldwide. Although the ecological mechanism of infestation is well described, the plant’s molecular response to the pathogen is not well known. This is due mainly to the lack of genomic information and the complexity of the disease. High throughput sequencing is now an efficient approach for detecting the expression of genes in non-model organisms, thus providing valuable information in spite of the lack of the genome sequence. In an attempt to unravel genes potentially involved in the pine defense against the pathogen, we hereby report the high throughput comparative sequence analysis of infested and non-infested stems of Pinus pinaster (very susceptible to PWN) and Pinus pinea (less susceptible to PWN). Results Four cDNA libraries from infested and non-infested stems of P. pinaster and P. pinea were sequenced in a full 454 GS FLX run, producing a total of 2,083,698 reads. The putative amino acid sequences encoded by the assembled transcripts were annotated according to Gene Ontology, to assign Pinus contigs into Biological Processes, Cellular Components and Molecular Functions categories. Most of the annotated transcripts corresponded to Picea genes-25.4-39.7%, whereas a smaller percentage, matched Pinus genes, 1.8-12.8%, probably a consequence of more public genomic information available for Picea than for Pinus. The comparative transcriptome analysis showed that when P. pinaster was infested with PWN, the genes malate dehydrogenase, ABA, water deficit stress related genes and PAR1 were highly expressed, while in PWN-infested P. pinea, the highly expressed genes were ricin B-related lectin, and genes belonging to the SNARE and high mobility group families. Quantitative PCR experiments confirmed the differential gene expression between the two pine species. Conclusions Defense-related genes triggered by nematode infestation were detected in both P. pinaster and P. pinea transcriptomes utilizing 454 pyrosequencing technology. P. pinaster showed higher abundance of genes related to transcriptional regulation, terpenoid secondary metabolism (including some with nematicidal activity) and pathogen attack. P. pinea showed higher abundance of genes related to oxidative stress and higher levels of expression in general of stress responsive genes. This study provides essential information about the molecular defense mechanisms utilized by P. pinaster and P. pinea against PWN infestation and contributes to a better understanding of PWD. PMID:23134679
Analysis of codon usage in beta-tubulin sequences of helminths.
von Samson-Himmelstjerna, G; Harder, A; Failing, K; Pape, M; Schnieder, T
2003-07-01
Codon usage bias has been shown to be correlated with gene expression levels in many organisms, including the nematode Caenorhabditis elegans. Here, the codon usage (cu) characteristics for a set of currently available beta-tubulin coding sequences of helminths were assessed by calculating several indices, including the effective codon number (Nc), the intrinsic codon deviation index (ICDI), the P2 value and the mutational response index (MRI). The P2 value gives a measure of translational pressure, which has been shown to be correlated to high gene expression levels in some organisms, but it has not yet been analysed in that respect in helminths. For all but two of the C. elegans beta-tubulin coding sequences investigated, the P2 value was the only index that indicated the presence of codon usage bias. Therefore, we propose that in general the helminth beta-tubulin sequences investigated here are not expressed at high levels. Furthermore, we calculated the correlation coefficients for the cu patterns of the helminth beta-tubulin sequences compared with those of highly expressed genes in organisms such as Escherichia coli and C. elegans. It was found that beta-tubulin cu patterns for all sequences of members of the Strongylida were significantly correlated to those for highly expressed C. elegans genes. This approach provides a new measure for comparing the adaptation of cu of a particular coding sequence with that of highly expressed genes in possible expression systems.Finally, using the cu patterns of the sequences studied, a phylogenetic tree was constructed. The topology of this tree was very much in concordance with that of a phylogeny based on small subunit ribosomal DNA sequence alignments.
Havird, Justin C.; Mitchell, Reed T.; Henry, Raymond P.; Santos, Scott R.
2016-01-01
Decapods represent one of the most ecologically diverse taxonomic groups within crustaceans, making them ideal to study physiological processes like osmoregulation. However, prior studies have failed to consider the entire transcriptomic response of the gill – the primary organ responsible for ion transport – to changing salinity. Moreover, the molecular genetic differences between non-osmoregulatory and osmoregulatory gill types, as well as the hormonal basis of osmoregulation, remain underexplored. Here, we identified and characterized differentially expressed genes (DEGs) via RNA-Seq in anterior (non-osmoregulatory) and posterior (osmoregulatory) gills during high to low salinity transfer in the blue crab Callinectes sapidus, a well-studied model for crustacean osmoregulation. Overall, we confirmed previous expression patterns for individual ion transport genes and identified novel ones with salinity-mediated expression. Notable, novel DEGs among salinities and gill types for C. sapidus included anterior gills having higher expression of structural genes such as actin and cuticle proteins while posterior gills exhibit elevated expression of ion transport and energy-related genes, with the latter likely linked to ion transport. Potential targets among recovered DEGs for hormonal regulation of ion transport between salinities and gill types included neuropeptide Y and a KCTD16-like protein. Using publically available sequence data, constituents for a “core” gill transcriptome among decapods are presented, comprising genes involved in ion transport and energy conversion and consistent with salinity transfer experiments. Lastly, rarefication analyses lead us to recommend a modest number of sequence reads (~10–15 M), but with increased biological replication, be utilized in future DEG analyses of crustaceans. PMID:27337176
Identification and Characterization of Novel MicroRNAs from Schistosoma japonicum
Xue, Xiangyang; Sun, Jun; Zhang, Qingfeng; Wang, Zhangxun; Huang, Yufu; Pan, Weiqing
2008-01-01
Background Schistosomiasis japonica remains a major public health problem in China. Its pathogen, Schistosoma japonicum has a complex life cycle and a unique repertoire of genes expressed at different life cycle stages. Exploring schistosome gene regulation will yield the best prospects for new drug targets and vaccine candidates. MicroRNAs (miRNAs) are a highly conserved class of noncoding RNA that control many biological processes by sequence-specific inhibition of gene expression. Although a large number of miRNAs have been identified from plants to mammals, it remains no experimental proof whether schistosome exist miRNAs. Methodology and Results We have identified novel miRNAs from Schistosoma japonicum by cloning and sequencing a small (18–26 nt) RNA cDNA library from the adult worms. Five novel miRNAs were identified from 227 cloned RNA sequences and verified by Northern blot. Alignments of the miRNAs with corresponding family members indicated that four of them belong to a metazoan miRNA family: let-7, miR-71, bantam and miR-125. The fifth potentially new (non conserved) miRNA appears to belong to a previously undescribed family in the genus Schistosome. The novel miRNAs were designated as sja-let-7, sja-miR-71, sja-bantam, sja-miR-125 and sja-miR-new1, respectively. Expression of sja-let-7, sja-miR-71 and sja-bantam were analyzed in six stages of the life cycle, i.e. egg, miracidium, sporocyst, cercaria, schistosomulum, and adult worm, by a modified stem-loop reverse transcribed polymerase chain reaction (RT-PCR) method developed in our laboratory. The expression patterns of these miRNAs were highly stage-specific. In particular, sja-miR-71 and sja-bantam expression reach their peaks in the cercaria stage and then drop quickly to the nadirs in the schistosomulum stage, following penetration of cercaria into a mammalian host. Conclusions Authentic miRNAs were identified for the first time in S. japonicum, including a new schistosome family member. The different expression patterns of the novel miRNAs over the life stages of S. japonicum suggest that they may mediate important roles in Schistosome growth and development. PMID:19107204
The Histone Database: an integrated resource for histones and histone fold-containing proteins
Mariño-Ramírez, Leonardo; Levine, Kevin M.; Morales, Mario; Zhang, Suiyuan; Moreland, R. Travis; Baxevanis, Andreas D.; Landsman, David
2011-01-01
Eukaryotic chromatin is composed of DNA and protein components—core histones—that act to compactly pack the DNA into nucleosomes, the fundamental building blocks of chromatin. These nucleosomes are connected to adjacent nucleosomes by linker histones. Nucleosomes are highly dynamic and, through various core histone post-translational modifications and incorporation of diverse histone variants, can serve as epigenetic marks to control processes such as gene expression and recombination. The Histone Sequence Database is a curated collection of sequences and structures of histones and non-histone proteins containing histone folds, assembled from major public databases. Here, we report a substantial increase in the number of sequences and taxonomic coverage for histone and histone fold-containing proteins available in the database. Additionally, the database now contains an expanded dataset that includes archaeal histone sequences. The database also provides comprehensive multiple sequence alignments for each of the four core histones (H2A, H2B, H3 and H4), the linker histones (H1/H5) and the archaeal histones. The database also includes current information on solved histone fold-containing structures. The Histone Sequence Database is an inclusive resource for the analysis of chromatin structure and function focused on histones and histone fold-containing proteins. Database URL: The Histone Sequence Database is freely available and can be accessed at http://research.nhgri.nih.gov/histones/. PMID:22025671
Onozawa, Masahiro; Zhang, Zhenhua; Kim, Yoo Jung; Goldberg, Liat; Varga, Tamas; Bergsagel, P Leif; Kuehl, W Michael; Aplan, Peter D
2014-05-27
We used the I-SceI endonuclease to produce DNA double-strand breaks (DSBs) and observed that a fraction of these DSBs were repaired by insertion of sequences, which we termed "templated sequence insertions" (TSIs), derived from distant regions of the genome. These TSIs were derived from genic, retrotransposon, or telomere sequences and were not deleted from the donor site in the genome, leading to the hypothesis that they were derived from reverse-transcribed RNA. Cotransfection of RNA and an I-SceI expression vector demonstrated insertion of RNA-derived sequences at the DNA-DSB site, and TSIs were suppressed by reverse-transcriptase inhibitors. Both observations support the hypothesis that TSIs were derived from RNA templates. In addition, similar insertions were detected at sites of DNA DSBs induced by transcription activator-like effector nuclease proteins. Whole-genome sequencing of myeloma cell lines revealed additional TSIs, demonstrating that repair of DNA DSBs via insertion was not restricted to experimentally produced DNA DSBs. Analysis of publicly available databases revealed that many of these TSIs are polymorphic in the human genome. Taken together, these results indicate that insertional events should be considered as alternatives to gross chromosomal rearrangements in the interpretation of whole-genome sequence data and that this mutagenic form of DNA repair may play a role in genetic disease, exon shuffling, and mammalian evolution.
Dcode.org anthology of comparative genomic tools.
Loots, Gabriela G; Ovcharenko, Ivan
2005-07-01
Comparative genomics provides the means to demarcate functional regions in anonymous DNA sequences. The successful application of this method to identifying novel genes is currently shifting to deciphering the non-coding encryption of gene regulation across genomes. To facilitate the practical application of comparative sequence analysis to genetics and genomics, we have developed several analytical and visualization tools for the analysis of arbitrary sequences and whole genomes. These tools include two alignment tools, zPicture and Mulan; a phylogenetic shadowing tool, eShadow for identifying lineage- and species-specific functional elements; two evolutionary conserved transcription factor analysis tools, rVista and multiTF; a tool for extracting cis-regulatory modules governing the expression of co-regulated genes, Creme 2.0; and a dynamic portal to multiple vertebrate and invertebrate genome alignments, the ECR Browser. Here, we briefly describe each one of these tools and provide specific examples on their practical applications. All the tools are publicly available at the http://www.dcode.org/ website.
Barat, Ana; Ruskin, Heather J; Byrne, Annette T; Prehn, Jochen H M
2015-11-23
Recently, considerable attention has been paid to gene expression-based classifications of colorectal cancers (CRC) and their association with patient prognosis. In addition to changes in gene expression, abnormal DNA-methylation is known to play an important role in cancer onset and development, and colon cancer is no exception to this rule. Large-scale technologies, such as methylation microarray assays and specific sequencing of methylated DNA, have been used to determine whole genome profiles of CpG island methylation in tissue samples. In this article, publicly available microarray-based gene expression and methylation data sets are used to characterize expression subtypes with respect to locus-specific methylation. A major objective was to determine whether integration of these data types improves previously characterized subtypes, or provides evidence for additional subtypes. We used unsupervised clustering techniques to determine methylation-based subgroups, which are subsequently annotated with three published expression-based classifications, comprising from three to six subtypes. Our results showed that, while methylation profiles provide a further basis for segregation of certain (Inflammatory and Goblet-like) finer-grained expression-based subtypes, they also suggest that other finer-grained subtypes are not distinctive and can be considered as a single subtype.
Barat, Ana; Ruskin, Heather J.; Byrne, Annette T.; Prehn, Jochen H. M.
2015-01-01
Recently, considerable attention has been paid to gene expression-based classifications of colorectal cancers (CRC) and their association with patient prognosis. In addition to changes in gene expression, abnormal DNA-methylation is known to play an important role in cancer onset and development, and colon cancer is no exception to this rule. Large-scale technologies, such as methylation microarray assays and specific sequencing of methylated DNA, have been used to determine whole genome profiles of CpG island methylation in tissue samples. In this article, publicly available microarray-based gene expression and methylation data sets are used to characterize expression subtypes with respect to locus-specific methylation. A major objective was to determine whether integration of these data types improves previously characterized subtypes, or provides evidence for additional subtypes. We used unsupervised clustering techniques to determine methylation-based subgroups, which are subsequently annotated with three published expression-based classifications, comprising from three to six subtypes. Our results showed that, while methylation profiles provide a further basis for segregation of certain (Inflammatory and Goblet-like) finer-grained expression-based subtypes, they also suggest that other finer-grained subtypes are not distinctive and can be considered as a single subtype. PMID:27600244
Evolution and Function of the TCR Vgamma9 Chain Repertoire: It’s Good to be Public
Pauza, C. David; Cairo, Cristiana
2015-01-01
Lymphocytes expressing a T cell receptor (TCR) composed of Vgamma9 and Vdelta2 chains represent a minor fraction of human thymocytes. Extrathymic selection throughout post-natal life causes the proportion of cells with a Vgamma9-JP rearrangement to increase and elevates the capacity for responding to non-peptidic phosphoantigens. Extrathymic selection is so powerful that phosphoantigen-reactive cells comprise about 1 in 40 circulating memory T cells from healthy adults and the subset can be expanded rapidly upon infection or in response to malignancy. Skewing of the gamma delta TCR repertoire is accompanied by selection for public gamma chain sequences such that many unrelated individuals overlap extensive in their circulating repertoire. This type of selection implies the presence of a monomorphic antigen-presenting molecule that is an object of current research but remains incompletely defined. While selection on a monomorphic presenting molecule may seem unusual, similar mechanisms shape the alpha beta T cell repertoire including the extreme examples of NKT or mucosal-associated invariant T cells (MAIT) and the less dramatic amplification of public Vbeta chain rearrangements driven by individual MHC molecules and associated with resistance to viral pathogens. Selecting and amplifying public T cell receptors whether alpha beta or gamma delta, are important steps in developing an anticipatory TCR repertoire. Cell clones expressing public TCR can accelerate the kinetics of response to pathogens and impact host survival. PMID:25769734
Nagaraj, Shivashankar H.; Gasser, Robin B.; Ranganathan, Shoba
2008-01-01
Background Parasitic nematodes of humans, other animals and plants continue to impose a significant public health and economic burden worldwide, due to the diseases they cause. Promising antiparasitic drug and vaccine candidates have been discovered from excreted or secreted (ES) proteins released from the parasite and exposed to the immune system of the host. Mining the entire expressed sequence tag (EST) data available from parasitic nematodes represents an approach to discover such ES targets. Methods and Findings In this study, we predicted, using EST2Secretome, a novel, high-throughput, computational workflow system, 4,710 ES proteins from 452,134 ESTs derived from 39 different species of nematodes, parasitic in animals (including humans) or plants. In total, 2,632, 786, and 1,292 ES proteins were predicted for animal-, human-, and plant-parasitic nematodes. Subsequently, we systematically analysed ES proteins using computational methods. Of these 4,710 proteins, 2,490 (52.8%) had orthologues in Caenorhabditis elegans, whereas 621 (13.8%) appeared to be novel, currently having no significant match to any molecule available in public databases. Of the C. elegans homologues, 267 had strong “loss-of-function” phenotypes by RNA interference (RNAi) in this nematode. We could functionally classify 1,948 (41.3%) sequences using the Gene Ontology (GO) terms, establish pathway associations for 573 (12.2%) sequences using Kyoto Encyclopaedia of Genes and Genomes (KEGG), and identify protein interaction partners for 1,774 (37.6%) molecules. We also mapped 758 (16.1%) proteins to protein domains including the nematode-specific protein family “transthyretin-like” and “chromadorea ALT,” considered as vaccine candidates against filariasis in humans. Conclusions We report the large-scale analysis of ES proteins inferred from EST data for a range of parasitic nematodes. This set of ES proteins provides an inventory of known and novel members of ES proteins as a foundation for studies focused on understanding the biology of parasitic nematodes and their interactions with their hosts, as well as for the development of novel drugs or vaccines for parasite intervention and control. PMID:18820748
2012-01-01
Background In the scientific biodiversity community, it is increasingly perceived the need to build a bridge between molecular and traditional biodiversity studies. We believe that the information technology could have a preeminent role in integrating the information generated by these studies with the large amount of molecular data we can find in bioinformatics public databases. This work is primarily aimed at building a bioinformatic infrastructure for the integration of public and private biodiversity data through the development of GIDL, an Intelligent Data Loader coupled with the Molecular Biodiversity Database. The system presented here organizes in an ontological way and locally stores the sequence and annotation data contained in the GenBank primary database. Methods The GIDL architecture consists of a relational database and of an intelligent data loader software. The relational database schema is designed to manage biodiversity information (Molecular Biodiversity Database) and it is organized in four areas: MolecularData, Experiment, Collection and Taxonomy. The MolecularData area is inspired to an established standard in Generic Model Organism Databases, the Chado relational schema. The peculiarity of Chado, and also its strength, is the adoption of an ontological schema which makes use of the Sequence Ontology. The Intelligent Data Loader (IDL) component of GIDL is an Extract, Transform and Load software able to parse data, to discover hidden information in the GenBank entries and to populate the Molecular Biodiversity Database. The IDL is composed by three main modules: the Parser, able to parse GenBank flat files; the Reasoner, which automatically builds CLIPS facts mapping the biological knowledge expressed by the Sequence Ontology; the DBFiller, which translates the CLIPS facts into ordered SQL statements used to populate the database. In GIDL Semantic Web technologies have been adopted due to their advantages in data representation, integration and processing. Results and conclusions Entries coming from Virus (814,122), Plant (1,365,360) and Invertebrate (959,065) divisions of GenBank rel.180 have been loaded in the Molecular Biodiversity Database by GIDL. Our system, combining the Sequence Ontology and the Chado schema, allows a more powerful query expressiveness compared with the most commonly used sequence retrieval systems like Entrez or SRS. PMID:22536971
USDA-ARS?s Scientific Manuscript database
The codling moth, Cydia pomonella, is one of the most important pests of pome fruits in the world, yet the molecular genetics and physiology of this insect remains poorly understood. A combined assembly of 8340 expressed sequence tags (ESTs) was generated from Roche 454 GS-FLX sequencing of 8 tissu...
Jagtap, Soham; Shivaprasad, Padubidri V
2014-12-02
Micro (mi)RNAs are important regulators of plant development. Across plant lineages, Dicer-like 1 (DCL1) proteins process long ds-like structures to produce micro (mi) RNA duplexes in a stepwise manner. These miRNAs are incorporated into Argonaute (AGO) proteins and influence expression of RNAs that have sequence complementarity with miRNAs. Expression levels of AGOs are greatly regulated by plants in order to minimize unwarranted perturbations using miRNAs to target mRNAs coding for AGOs. AGOs may also have high promoter specificity-sometimes expression of AGO can be limited to just a few cells in a plant. Viral pathogens utilize various means to counter antiviral roles of AGOs including hijacking the host encoded miRNAs to target AGOs. Two host encoded miRNAs namely miR168 and miR403 that target AGOs have been described in the model plant Arabidopsis and such a mechanism is thought to be well conserved across plants because AGO sequences are well conserved. We show that the interaction between AGO mRNAs and miRNAs is species-specific due to the diversity in sequences of two miRNAs that target AGOs, sequence diversity among corresponding target regions in AGO mRNAs and variable expression levels of these miRNAs among vascular plants. We used miRNA sequences from 68 plant species representing 31 plant families for this analysis. Sequences of miR168 and miR403 are not conserved among plant lineages, but surprisingly they differ drastically in their sequence diversity and expression levels even among closely related plants. Variation in miR168 expression among plants correlates well with secondary structures/length of loop sequences of their precursors. Our data indicates a complex AGO targeting interaction among plant lineages due to miRNA sequence diversity and sequences of miRNA targeting regions among AGO mRNAs, thus leading to the assumption that the perturbations by viruses that use host miRNAs to target antiviral AGOs can only be species-specific. We also show that rapid evolution and likely loss of expression of miR168 isoforms in tobacco is related to the insertion of MITE-like transposons between miRNA and miRNA* sequences, a possible mechanism showing how miRNAs are lost in few plant lineages even though other close relatives have abundantly expressing miRNAs.
Lewis, Jo E.; Brameld, John M.; Hill, Phil; Barrett, Perry; Ebling, Francis J.P.; Jethwa, Preeti H.
2015-01-01
Introduction The viral 2A sequence has become an attractive alternative to the traditional internal ribosomal entry site (IRES) for simultaneous over-expression of two genes and in combination with recombinant adeno-associated viruses (rAAV) has been used to manipulate gene expression in vitro. New method To develop a rAAV construct in combination with the viral 2A sequence to allow long-term over-expression of the vgf gene and fluorescent marker gene for tracking of the transfected neurones in vivo. Results Transient transfection of the AAV plasmid containing the vgf gene, viral 2A sequence and eGFP into SH-SY5Y cells resulted in eGFP fluorescence comparable to a commercially available reporter construct. This increase in fluorescent cells was accompanied by an increase in VGF mRNA expression. Infusion of the rAAV vector containing the vgf gene, viral 2A sequence and eGFP resulted in eGFP fluorescence in the hypothalamus of both mice and Siberian hamsters, 32 weeks post infusion. In situ hybridisation confirmed that the location of VGF mRNA expression in the hypothalamus corresponded to the eGFP pattern of fluorescence. Comparison with old method The viral 2A sequence is much smaller than the traditional IRES and therefore allowed over-expression of the vgf gene with fluorescent tracking without compromising viral capacity. Conclusion The use of the viral 2A sequence in the AAV plasmid allowed the simultaneous expression of both genes in vitro. When used in combination with rAAV it resulted in long-term over-expression of both genes at equivalent locations in the hypothalamus of both Siberian hamsters and mice, without any adverse effects. PMID:26300182
Rojas-Cartagena, Carmencita; Ortíz-Pineda, Pablo; Ramírez-Gómez, Francisco; Suárez-Castillo, Edna C.; Matos-Cruz, Vanessa; Rodríguez, Carlos; Ortíz-Zuazaga, Humberto; García-Arrarás, José E.
2010-01-01
Repair and regeneration are key processes for tissue maintenance, and their disruption may lead to disease states. Little is known about the molecular mechanisms that underline the repair and regeneration of the digestive tract. The sea cucumber Holothuria glaberrima represents an excellent model to dissect and characterize the molecular events during intestinal regeneration. To study the gene expression profile, cDNA libraries were constructed from normal, 3-day, and 7-day regenerating intestines of H. glaberrima. Clones were randomly sequenced and queried against the nonredundant protein database at the National Center for Biotechnology Information. RT-PCR analyses were made of several genes to determine their expression profile during intestinal regeneration. A total of 5,173 sequences from three cDNA libraries were obtained. About 46.2, 35.6, and 26.2% of the sequences for the normal, 3-days, and 7-days cDNA libraries, respectively, shared significant similarity with known sequences in the protein database of GenBank but only present 10% of similarity among them. Analysis of the libraries in terms of functional processes, protein domains, and most common sequences suggests that a differential expression profile is taking place during the regeneration process. Further examination of the expressed sequence tag dataset revealed that 12 putative genes are differentially expressed at significant level (R > 6). Experimental validation by RT-PCR analysis reveals that at least three genes (unknown C-4677-1, melanotransferrin, and centaurin) present a differential expression during regeneration. These findings strongly suggest that the gene expression profile varies among regeneration stages and provide evidence for the existence of differential gene expression. PMID:17579180
Tissue Gene Expression Analysis Using Arrayed Normalized cDNA Libraries
Eickhoff, Holger; Schuchhardt, Johannes; Ivanov, Igor; Meier-Ewert, Sebastian; O'Brien, John; Malik, Arif; Tandon, Neeraj; Wolski, Eryk-Witold; Rohlfs, Elke; Nyarsik, Lajos; Reinhardt, Richard; Nietfeld, Wilfried; Lehrach, Hans
2000-01-01
We have used oligonucleotide-fingerprinting data on 60,000 cDNA clones from two different mouse embryonic stages to establish a normalized cDNA clone set. The normalized set of 5,376 clones represents different clusters and therefore, in almost all cases, different genes. The inserts of the cDNA clones were amplified by PCR and spotted on glass slides. The resulting arrays were hybridized with mRNA probes prepared from six different adult mouse tissues. Expression profiles were analyzed by hierarchical clustering techniques. We have chosen radioactive detection because it combines robustness with sensitivity and allows the comparison of multiple normalized experiments. Sensitive detection combined with highly effective clustering algorithms allowed the identification of tissue-specific expression profiles and the detection of genes specifically expressed in the tissues investigated. The obtained results are publicly available (http://www.rzpd.de) and can be used by other researchers as a digital expression reference. [The sequence data described in this paper have been submitted to the EMBL data library under accession nos. AL360374–AL36537.] PMID:10958641
Tedder, Philip; Zubko, Elena; Westhead, David R.; Meyer, Peter
2009-01-01
Two pools of small RNAs were cloned from inflorescences of Petunia hybrida using a 5′-ligation dependent and a 5′-ligation independent approach. The two libraries were integrated into a public website that allows the screening of individual sequences against 359,769 unique clones. The library contains 15 clones with 100% identity and 53 clones with one mismatch to miRNAs described for other plant species. For two conserved miRNAs, miR159 and miR390, we find clear differences in tissue-specific distribution, compared with other species. This shows that evolutionary conservation of miRNA sequences does not necessarily include a conservation of the miRNA expression profile. Almost 60% of all clones in the database are 24-nucleotide clones. In accordance with the role of 24mers in marking repetitive regions, we find them distributed across retroviral and transposable element sequences but other 24mers map to promoter regions and to different transcript regions. For one target region we observe tissue-specific variation of matching 24mers, which demonstrates that, as for 21mers, 24mer concentrations are not necessarily identical in different tissues. Asymmetric distribution of a putative novel miRNA in the two libraries suggests that the cloning method can be selective for the representation of certain small RNAs in a collection. PMID:19369427
Comprehensive Genetic Database of Expressed Sequence Tags for Coccolithophorids
NASA Astrophysics Data System (ADS)
Ranji, Mohammad; Hadaegh, Ahmad R.
Coccolithophorids are unicellular, marine, golden-brown, single-celled algae (Haptophyta) commonly found in near-surface waters in patchy distributions. They belong to the Phytoplankton family that is known to be responsible for much of the earth reproduction. Phytoplankton, just like plants live based on the energy obtained by Photosynthesis which produces oxygen. Substantial amount of oxygen in the earth's atmosphere is produced by Phytoplankton through Photosynthesis. The single-celled Emiliana Huxleyi is the most commonly known specie of Coccolithophorids and is known for extracting bicarbonate (HCO3) from its environment and producing calcium carbonate to form Coccoliths. Coccolithophorids are one of the world's primary producers, contributing about 15% of the average oceanic phytoplankton biomass to the oceans. They produce elaborate, minute calcite platelets (Coccoliths), covering the cell to form a Coccosphere and supplying up to 60% of the bulk pelagic calcite deposited on the sea floors. In order to understand the genetics of Coccolithophorid and the complexities of their biochemical reactions, we decided to build a database to store a complete profile of these organisms' genomes. Although a variety of such databases currently exist, (http://www.geneservice.co.uk/home/) none have yet been developed to comprehensively address the sequencing efforts underway by the Coccolithophorid research community. This database is called CocooExpress and is available to public (http://bioinfo.csusm.edu) for both data queries and sequence contribution.
2014-01-01
Background Neisseria meningitidis expresses type four pili (Tfp) which are important for colonisation and virulence. Tfp have been considered as one of the most variable structures on the bacterial surface due to high frequency gene conversion, resulting in amino acid sequence variation of the major pilin subunit (PilE). Meningococci express either a class I or a class II pilE gene and recent work has indicated that class II pilins do not undergo antigenic variation, as class II pilE genes encode conserved pilin subunits. The purpose of this work was to use whole genome sequences to further investigate the frequency and variability of the class II pilE genes in meningococcal isolate collections. Results We analysed over 600 publically available whole genome sequences of N. meningitidis isolates to determine the sequence and genomic organization of pilE. We confirmed that meningococcal strains belonging to a limited number of clonal complexes (ccs, namely cc1, cc5, cc8, cc11 and cc174) harbour a class II pilE gene which is conserved in terms of sequence and chromosomal context. We also identified pilS cassettes in all isolates with class II pilE, however, our analysis indicates that these do not serve as donor sequences for pilE/pilS recombination. Furthermore, our work reveals that the class II pilE locus lacks the DNA sequence motifs that enable (G4) or enhance (Sma/Cla repeat) pilin antigenic variation. Finally, through analysis of pilin genes in commensal Neisseria species we found that meningococcal class II pilE genes are closely related to pilE from Neisseria lactamica and Neisseria polysaccharea, suggesting horizontal transfer among these species. Conclusions Class II pilins can be defined by their amino acid sequence and genomic context and are present in meningococcal isolates which have persisted and spread globally. The absence of G4 and Sma/Cla sequences adjacent to the class II pilE genes is consistent with the lack of pilin subunit variation in these isolates, although horizontal transfer may generate class II pilin diversity. This study supports the suggestion that high frequency antigenic variation of pilin is not universal in pathogenic Neisseria. PMID:24690385
Generation of 2A-linked multicistronic cassettes by recombinant PCR.
Szymczak-Workman, Andrea L; Vignali, Kate M; Vignali, Dario A A
2012-02-01
The need for reliable, multicistronic vectors for multigene delivery is at the forefront of biomedical technology. It is now possible to express multiple proteins from a single open reading frame (ORF) using 2A peptide-linked multicistronic vectors. These small sequences, when cloned between genes, allow for efficient, stoichiometric production of discrete protein products within a single vector through a novel "cleavage" event within the 2A peptide sequence. Expression of more than two genes using conventional approaches has several limitations, most notably imbalanced protein expression and large size. The use of 2A peptide sequences alleviates these concerns. They are small (18-22 amino acids) and have divergent amino-terminal sequences, which minimizes the chance for homologous recombination and allows for multiple, different 2A peptide sequences to be used within a single vector. Importantly, separation of genes placed between 2A peptide sequences is nearly 100%, which allows for stoichiometric and concordant expression of the genes, regardless of the order of placement within the vector. This protocol describes the use of recombinant polymerase chain reaction (PCR) to connect multiple 2A-linked protein sequences. The final construct is subcloned into an expression vector.
Kogelman, Lisette J A; Cirera, Susanna; Zhernakova, Daria V; Fredholm, Merete; Franke, Lude; Kadarmideen, Haja N
2014-09-30
Obesity is a complex metabolic condition in strong association with various diseases, like type 2 diabetes, resulting in major public health and economic implications. Obesity is the result of environmental and genetic factors and their interactions, including genome-wide genetic interactions. Identification of co-expressed and regulatory genes in RNA extracted from relevant tissues representing lean and obese individuals provides an entry point for the identification of genes and pathways of importance to the development of obesity. The pig, an omnivorous animal, is an excellent model for human obesity, offering the possibility to study in-depth organ-level transcriptomic regulations of obesity, unfeasible in humans. Our aim was to reveal adipose tissue co-expression networks, pathways and transcriptional regulations of obesity using RNA Sequencing based systems biology approaches in a porcine model. We selected 36 animals for RNA Sequencing from a previously created F2 pig population representing three extreme groups based on their predicted genetic risks for obesity. We applied Weighted Gene Co-expression Network Analysis (WGCNA) to detect clusters of highly co-expressed genes (modules). Additionally, regulator genes were detected using Lemon-Tree algorithms. WGCNA revealed five modules which were strongly correlated with at least one obesity-related phenotype (correlations ranging from -0.54 to 0.72, P < 0.001). Functional annotation identified pathways enlightening the association between obesity and other diseases, like osteoporosis (osteoclast differentiation, P = 1.4E-7), and immune-related complications (e.g. Natural killer cell mediated cytotoxity, P = 3.8E-5; B cell receptor signaling pathway, P = 7.2E-5). Lemon-Tree identified three potential regulator genes, using confident scores, for the WGCNA module which was associated with osteoclast differentiation: CCR1, MSR1 and SI1 (probability scores respectively 95.30, 62.28, and 34.58). Moreover, detection of differentially connected genes identified various genes previously identified to be associated with obesity in humans and rodents, e.g. CSF1R and MARC2. To our knowledge, this is the first study to apply systems biology approaches using porcine adipose tissue RNA-Sequencing data in a genetically characterized porcine model for obesity. We revealed complex networks, pathways, candidate and regulatory genes related to obesity, confirming the complexity of obesity and its association with immune-related disorders and osteoporosis.
Music performance and the perception of key.
Thompson, W F; Cuddy, L L
1997-02-01
The effect of music performance on perceived key movement was examined. Listeners judged key movement in sequences presented without performance expression (mechanical) in Experiment 1 and with performance expression in Experiment 2. Modulation distance varied. Judgments corresponded to predictions based on the cycle of fifths and toroidal models of key relatedness, with the highest correspondence for performed versions with the toroidal model. In Experiment 3, listeners compared mechanical sequences with either performed sequences or modifications of performed sequences. Modifications preserved expressive differences between chords, but not between voices. Predictions from Experiments 1 and 2 held only for performed sequences, suggesting that differences between voices are informative of key movement. Experiment 4 confirmed that modifications did not disrupt musicality. Analyses of performances further suggested a link between performance expression and key.
Amirhaeri, S; Wohlrab, F; Wells, R D
1995-02-17
The influence of simple repeat sequences, cloned into different positions relative to the SV40 early promoter/enhancer, on the transient expression of the chloramphenicol acetyltransferase (CAT) gene was investigated. Insertion of (G)29.(C)29 in either orientation into the 5'-untranslated region of the CAT gene reduced expression in CV-1 cells 50-100 fold when compared with controls with random sequence inserts. Analysis of CAT-specific mRNA levels demonstrated that the effect was due to a reduction of CAT mRNA production rather than to posttranscriptional events. In contrast, insertion of the same insert in either orientation upstream of the promoter-enhancer or downstream of the gene stimulated gene expression 2-3-fold. These effects could be reversed by cotransfection of a competitor plasmid carrying (G)25.(C)25 sequences. The results suggest that a G.C-binding transcription factor modulates gene expression in this system and that promoter strength can be regulated by providing protein-binding sites in trans. Although constructs containing longer tracts of alternating (C-G), (T-G), or (A-T) sequences inhibited CAT expression when inserted in the 5'-untranslated region of the CAT gene, the amount of CAT mRNA was unaffected. Hence, these inhibitions must be due to posttranscriptional events, presumably at the level of translation. These effects of microsatellite sequences on gene expression are discussed with respect to recent data on related simple repeat sequences which cause several human genetic diseases.
Mahmood, Khalid; Højland, Dorte H; Asp, Torben; Kristensen, Michael
2016-01-01
Insecticide resistance in the housefly, Musca domestica, has been investigated for more than 60 years. It will enter a new era after the recent publication of the housefly genome and the development of multiple next generation sequencing technologies. The genetic background of the xenobiotic response can now be investigated in greater detail. Here, we investigate the 454-pyrosequencing transcriptome of the spinosad-resistant 791spin strain in relation to the housefly genome with focus on P450 genes. The de novo assembly of clean reads gave 35,834 contigs consisting of 21,780 sequences of the spinosad resistant strain. The 3,648 sequences were annotated with an enzyme code EC number and were mapped to 124 KEGG pathways with metabolic processes as most highly represented pathway. One hundred and twenty contigs were annotated as P450s covering 44 different P450 genes of housefly. Eight differentially expressed P450s genes were identified and investigated for SNPs, CpG islands and common regulatory motifs in promoter and coding regions. Functional annotation clustering of metabolic related genes and motif analysis of P450s revealed their association with epigenetic, transcription and gene expression related functions. The sequence variation analysis resulted in 12 SNPs and eight of them found in cyp6d1. There is variation in location, size and frequency of CpG islands and specific motifs were also identified in these P450s. Moreover, identified motifs were associated to GO terms and transcription factors using bioinformatic tools. Transcriptome data of a spinosad resistant strain provide together with genome data fundamental support for future research to understand evolution of resistance in houseflies. Here, we report for the first time the SNPs, CpG islands and common regulatory motifs in differentially expressed P450s. Taken together our findings will serve as a stepping stone to advance understanding of the mechanism and role of P450s in xenobiotic detoxification.
Duan, Jun; Ladd, Tim; Doucet, Daniel; Cusson, Michel; vanFrankenhuyzen, Kees; Mittapalli, Omprakash; Krell, Peter J; Quan, Guoxing
2015-01-01
The Emerald ash borer (EAB), Agrilus planipennis, is an invasive phloem-feeding insect pest of ash trees. Since its initial discovery near the Detroit, US- Windsor, Canada area in 2002, the spread of EAB has had strong negative economic, social and environmental impacts in both countries. Several transcriptomes from specific tissues including midgut, fat body and antenna have recently been generated. However, the relatively low sequence depth, gene coverage and completeness limited the usefulness of these EAB databases. High-throughput deep RNA-Sequencing (RNA-Seq) was used to obtain 473.9 million pairs of 100 bp length paired-end reads from various life stages and tissues. These reads were assembled into 88,907 contigs using the Trinity strategy and integrated into 38,160 unigenes after redundant sequences were removed. We annotated 11,229 unigenes by searching against the public nr, Swiss-Prot and COG. The EAB transcriptome assembly was compared with 13 other sequenced insect species, resulting in the prediction of 536 unigenes that are Coleoptera-specific. Differential gene expression revealed that 290 unigenes are expressed during larval molting and 3,911 unigenes during metamorphosis from larvae to pupae, respectively (FDR< 0.01 and log2 FC>2). In addition, 1,167 differentially expressed unigenes were identified from larval and adult midguts, 435 unigenes were up-regulated in larval midgut and 732 unigenes were up-regulated in adult midgut. Most of the genes involved in RNA interference (RNAi) pathways were identified, which implies the existence of a system RNAi in EAB. This study provides one of the most fundamental and comprehensive transcriptome resources available for EAB to date. Identification of the tissue- stage- or species- specific unigenes will benefit the further study of gene functions during growth and metamorphosis processes in EAB and other pest insects.
Duan, Jun; Ladd, Tim; Doucet, Daniel; Cusson, Michel; vanFrankenhuyzen, Kees; Mittapalli, Omprakash; Krell, Peter J.; Quan, Guoxing
2015-01-01
Background The Emerald ash borer (EAB), Agrilus planipennis, is an invasive phloem-feeding insect pest of ash trees. Since its initial discovery near the Detroit, US- Windsor, Canada area in 2002, the spread of EAB has had strong negative economic, social and environmental impacts in both countries. Several transcriptomes from specific tissues including midgut, fat body and antenna have recently been generated. However, the relatively low sequence depth, gene coverage and completeness limited the usefulness of these EAB databases. Methodology and Principal Findings High-throughput deep RNA-Sequencing (RNA-Seq) was used to obtain 473.9 million pairs of 100 bp length paired-end reads from various life stages and tissues. These reads were assembled into 88,907 contigs using the Trinity strategy and integrated into 38,160 unigenes after redundant sequences were removed. We annotated 11,229 unigenes by searching against the public nr, Swiss-Prot and COG. The EAB transcriptome assembly was compared with 13 other sequenced insect species, resulting in the prediction of 536 unigenes that are Coleoptera-specific. Differential gene expression revealed that 290 unigenes are expressed during larval molting and 3,911 unigenes during metamorphosis from larvae to pupae, respectively (FDR< 0.01 and log2 FC>2). In addition, 1,167 differentially expressed unigenes were identified from larval and adult midguts, 435 unigenes were up-regulated in larval midgut and 732 unigenes were up-regulated in adult midgut. Most of the genes involved in RNA interference (RNAi) pathways were identified, which implies the existence of a system RNAi in EAB. Conclusions and Significance This study provides one of the most fundamental and comprehensive transcriptome resources available for EAB to date. Identification of the tissue- stage- or species- specific unigenes will benefit the further study of gene functions during growth and metamorphosis processes in EAB and other pest insects. PMID:26244979
Computational annotation of genes differentially expressed along olive fruit development
Galla, Giulio; Barcaccia, Gianni; Ramina, Angelo; Collani, Silvio; Alagna, Fiammetta; Baldoni, Luciana; Cultrera, Nicolò GM; Martinelli, Federico; Sebastiani, Luca; Tonutti, Pietro
2009-01-01
Background Olea europaea L. is a traditional tree crop of the Mediterranean basin with a worldwide economical high impact. Differently from other fruit tree species, little is known about the physiological and molecular basis of the olive fruit development and a few sequences of genes and gene products are available for olive in public databases. This study deals with the identification of large sets of differentially expressed genes in developing olive fruits and the subsequent computational annotation by means of different software. Results mRNA from fruits of the cv. Leccino sampled at three different stages [i.e., initial fruit set (stage 1), completed pit hardening (stage 2) and veraison (stage 3)] was used for the identification of differentially expressed genes putatively involved in main processes along fruit development. Four subtractive hybridization libraries were constructed: forward and reverse between stage 1 and 2 (libraries A and B), and 2 and 3 (libraries C and D). All sequenced clones (1,132 in total) were analyzed through BlastX against non-redundant NCBI databases and about 60% of them showed similarity to known proteins. A total of 89 out of 642 differentially expressed unique sequences was further investigated by Real-Time PCR, showing a validation of the SSH results as high as 69%. Library-specific cDNA repertories were annotated according to the three main vocabularies of the gene ontology (GO): cellular component, biological process and molecular function. BlastX analysis, GO terms mapping and annotation analysis were performed using the Blast2GO software, a research tool designed with the main purpose of enabling GO based data mining on sequence sets for which no GO annotation is yet available. Bioinformatic analysis pointed out a significantly different distribution of the annotated sequences for each GO category, when comparing the three fruit developmental stages. The olive fruit-specific transcriptome dataset was used to query all known KEGG (Kyoto Encyclopaedia of Genes and Genomes) metabolic pathways for characterizing and positioning retrieved EST records. The integration of the olive sequence datasets within the MapMan platform for microarray analysis allowed the identification of specific biosynthetic pathways useful for the definition of key functional categories in time course analyses for gene groups. Conclusion The bioinformatic annotation of all gene sequences was useful to shed light on metabolic pathways and transcriptional aspects related to carbohydrates, fatty acids, secondary metabolites, transcription factors and hormones as well as response to biotic and abiotic stresses throughout olive drupe development. These results represent a first step toward both functional genomics and systems biology research for understanding the gene functions and regulatory networks in olive fruit growth and ripening. PMID:19852839
2012-01-01
Background For decades the tobacco plant has served as a model organism in plant biology to answer fundamental biological questions in the areas of plant development, physiology, and genetics. Due to the lack of sufficient coverage of genomic sequences, however, none of the expressed sequence tag (EST)-based chips developed to date cover gene expression from the whole genome. The availability of Tobacco Genome Initiative (TGI) sequences provides a useful resource to build a whole genome exon array, even if the assembled sequences are highly fragmented. Here, the design of a Tobacco Exon Array is reported and an application to improve the understanding of genes regulated by cadmium (Cd) in tobacco is described. Results From the analysis and annotation of the 1,271,256 Nicotiana tabacum fasta and quality files from methyl filtered genomic survey sequences (GSS) obtained from the TGI and ~56,000 ESTs available in public databases, an exon array with 272,342 probesets was designed (four probes per exon) and tested on two selected tobacco varieties. Two tobacco varieties out of 45 accumulating low and high cadmium in leaf were identified based on the GGE biplot analysis, which is analysis of the genotype main effect (G) plus analysis of the genotype by environment interaction (GE) of eight field trials (four fields over two years) showing reproducibility across the trials. The selected varieties were grown under greenhouse conditions in two different soils and subjected to exon array analyses using root and leaf tissues to understand the genetic make-up of the Cd accumulation. Conclusions An Affymetrix Exon Array was developed to cover a large (~90%) proportion of the tobacco gene space. The Tobacco Exon Array will be available for research use through Affymetrix array catalogue. As a proof of the exon array usability, we have demonstrated that the Tobacco Exon Array is a valuable tool for studying Cd accumulation in tobacco leaves. Data from field and greenhouse experiments supported by gene expression studies strongly suggested that the difference in leaf Cd accumulation between the two specific tobacco cultivars is dependent solely on genetic factors and genetic variability rather than on the environment. PMID:23190529
MicroRNA Expression Profile in Penile Cancer Revealed by Next-Generation Small RNA Sequencing
Zhang, Yuanwei; Xu, Bo; Zhou, Jun; Fan, Song; Hao, Zongyao; Shi, Haoqiang; Zhang, Xiansheng; Kong, Rui; Xu, Lingfan; Gao, Jingjing; Zou, Duohong; Liang, Chaozhao
2015-01-01
Penile cancer (PeCa) is a relatively rare tumor entity but possesses higher morbidity and mortality rates especially in developing countries. To date, the concrete pathogenic signaling pathways and core machineries involved in tumorigenesis and progression of PeCa remain to be elucidated. Several studies suggested miRNAs, which modulate gene expression at posttranscriptional level, were frequently mis-regulated and aberrantly expressed in human cancers. However, the miRNA profile in human PeCa has not been reported before. In this present study, the miRNA profile was obtained from 10 fresh penile cancerous tissues and matched adjacent non-cancerous tissues via next-generation sequencing. As a result, a total of 751 and 806 annotated miRNAs were identified in normal and cancerous penile tissues, respectively. Among which, 56 miRNAs with significantly different expression levels between paired tissues were identified. Subsequently, several annotated miRNAs were selected randomly and validated using quantitative real-time PCR. Compared with the previous publications regarding to the altered miRNAs expression in various cancers and especially genitourinary (prostate, bladder, kidney, testis) cancers, the most majority of deregulated miRNAs showed the similar expression pattern in penile cancer. Moreover, the bioinformatics analyses suggested that the putative target genes of differentially expressed miRNAs between cancerous and matched normal penile tissues were tightly associated with cell junction, proliferation, growth as well as genomic instability and so on, by modulating Wnt, MAPK, p53, PI3K-Akt, Notch and TGF-β signaling pathways, which were all well-established to participate in cancer initiation and progression. Our work presents a global view of the differentially expressed miRNAs and potentially regulatory networks of their target genes for clarifying the pathogenic transformation of normal penis to PeCa, which research resource also provides new insights into future investigations aimed to explore the in-depth mechanisms of miRNAs and other small RNAs including piRNAs in penile carcinogenesis regulation and effective target-specific theragnosis. PMID:26158897
Gao, Jian Ping; Wang, Dong; Cao, Ling Ya; Sun, Hai Feng
2015-01-01
Background Codonopsis pilosula (Franch.) Nannf. is one of the most widely used medicinal plants. Although chemical and pharmacological studies have shown that codonopsis polysaccharides (CPPs) are bioactive compounds and that their composition is variable, their biosynthetic pathways remain largely unknown. Next-generation sequencing is an efficient and high-throughput technique that allows the identification of candidate genes involved in secondary metabolism. Principal Findings To identify the components involved in CPP biosynthesis, a transcriptome library, prepared using root and other tissues, was assembled with the help of Illumina sequencing. A total of 9.2 Gb of clean nucleotides was obtained comprising 91,175,044 clean reads, 102,125 contigs, and 45,511 unigenes. After aligning the sequences to the public protein databases, 76.1% of the unigenes were annotated. Among these annotated unigenes, 26,189 were assigned to Gene Ontology categories, 11,415 to Clusters of Orthologous Groups, and 18,848 to Kyoto Encyclopedia of Genes and Genomes pathways. Analysis of abundance of transcripts in the library showed that genes, including those encoding metallothionein, aquaporin, and cysteine protease that are related to stress responses, were in the top list. Among genes involved in the biosynthesis of CPP, those responsible for the synthesis of UDP-L-arabinose and UDP-xylose were highly expressed. Significance To our knowledge, this is the first study to provide a public transcriptome dataset prepared from C. pilosula and an outline of the biosynthetic pathway of polysaccharides in a medicinal plant. Identified candidate genes involved in CPP biosynthesis provide understanding of the biosynthesis and regulation of CPP at the molecular level. PMID:25719364
Linking microarray reporters with protein functions.
Gaj, Stan; van Erk, Arie; van Haaften, Rachel I M; Evelo, Chris T A
2007-09-26
The analysis of microarray experiments requires accurate and up-to-date functional annotation of the microarray reporters to optimize the interpretation of the biological processes involved. Pathway visualization tools are used to connect gene expression data with existing biological pathways by using specific database identifiers that link reporters with elements in the pathways. This paper proposes a novel method that aims to improve microarray reporter annotation by BLASTing the original reporter sequences against a species-specific EMBL subset, that was derived from and crosslinked back to the highly curated UniProt database. The resulting alignments were filtered using high quality alignment criteria and further compared with the outcome of a more traditional approach, where reporter sequences were BLASTed against EnsEMBL followed by locating the corresponding protein (UniProt) entry for the high quality hits. Combining the results of both methods resulted in successful annotation of > 58% of all reporter sequences with UniProt IDs on two commercial array platforms, increasing the amount of Incyte reporters that could be coupled to Gene Ontology terms from 32.7% to 58.3% and to a local GenMAPP pathway from 9.6% to 16.7%. For Agilent, 35.3% of the total reporters are now linked towards GO nodes and 7.1% on local pathways. Our methods increased the annotation quality of microarray reporter sequences and allowed us to visualize more reporters using pathway visualization tools. Even in cases where the original reporter annotation showed the correct description the new identifiers often allowed improved pathway and Gene Ontology linking. These methods are freely available at http://www.bigcat.unimaas.nl/public/publications/Gaj_Annotation/.
Castle, John C; Armour, Christopher D; Löwer, Martin; Haynor, David; Biery, Matthew; Bouzek, Heather; Chen, Ronghua; Jackson, Stuart; Johnson, Jason M; Rohl, Carol A; Raymond, Christopher K
2010-07-26
Non-coding RNAs (ncRNAs) are an essential class of molecular species that have been difficult to monitor on high throughput platforms due to frequent lack of polyadenylation. Using a polyadenylation-neutral amplification protocol and next-generation sequencing, we explore ncRNA expression in eleven human tissues. ncRNAs 7SL, U2, 7SK, and HBII-52 are expressed at levels far exceeding mRNAs. C/D and H/ACA box snoRNAs are associated with rRNA methylation and pseudouridylation, respectively: spleen expresses both, hypothalamus expresses mainly C/D box snoRNAs, and testes show enriched expression of both H/ACA box snoRNAs and RNA telomerase TERC. Within the snoRNA 14q cluster, 14q(I-6) is expressed at much higher levels than other cluster members. More reads align to mitochondrial than nuclear tRNAs. Many lincRNAs are actively transcribed, particularly those overlapping known ncRNAs. Within the Prader-Willi syndrome loci, the snoRNA HBII-85 (group I) cluster is highly expressed in hypothalamus, greater than in other tissues and greater than group II or III. Additionally, within the disease locus we find novel transcription across a 400,000 nt span in ovaries. This genome-wide polyA-neutral expression compendium demonstrates the richness of ncRNA expression, their high expression patterns, their function-specific expression patterns, and is publicly available.
Nakazato, Takeru; Bono, Hidemasa
2017-01-01
Abstract It is important for public data repositories to promote the reuse of archived data. In the growing field of omics science, however, the increasing number of submissions of high-throughput sequencing (HTSeq) data to public repositories prevents users from choosing a suitable data set from among the large number of search results. Repository users need to be able to set a threshold to reduce the number of results to obtain a suitable subset of high-quality data for reanalysis. We calculated the quality of sequencing data archived in a public data repository, the Sequence Read Archive (SRA), by using the quality control software FastQC. We obtained quality values for 1 171 313 experiments, which can be used to evaluate the suitability of data for reuse. We also visualized the data distribution in SRA by integrating the quality information and metadata of experiments and samples. We provide quality information of all of the archived sequencing data, which enable users to obtain sufficient quality sequencing data for reanalyses. The calculated quality data are available to the public in various formats. Our data also provide an example of enhancing the reuse of public data by adding metadata to published research data by a third party. PMID:28449062
T3SEdb: data warehousing of virulence effectors secreted by the bacterial Type III Secretion System.
Tay, Daniel Ming Ming; Govindarajan, Kunde Ramamoorthy; Khan, Asif M; Ong, Terenze Yao Rui; Samad, Hanif M; Soh, Wei Wei; Tong, Minyan; Zhang, Fan; Tan, Tin Wee
2010-10-15
Effectors of Type III Secretion System (T3SS) play a pivotal role in establishing and maintaining pathogenicity in the host and therefore the identification of these effectors is important in understanding virulence. However, the effectors display high level of sequence diversity, therefore making the identification a difficult process. There is a need to collate and annotate existing effector sequences in public databases to enable systematic analyses of these sequences for development of models for screening and selection of putative novel effectors from bacterial genomes that can be validated by a smaller number of key experiments. Herein, we present T3SEdb http://effectors.bic.nus.edu.sg/T3SEdb, a specialized database of annotated T3SS effector (T3SE) sequences containing 1089 records from 46 bacterial species compiled from the literature and public protein databases. Procedures have been defined for i) comprehensive annotation of experimental status of effectors, ii) submission and curation review of records by users of the database, and iii) the regular update of T3SEdb existing and new records. Keyword fielded and sequence searches (BLAST, regular expression) are supported for both experimentally verified and hypothetical T3SEs. More than 171 clusters of T3SEs were detected based on sequence identity comparisons (intra-cluster difference up to ~60%). Owing to this high level of sequence diversity of T3SEs, the T3SEdb provides a large number of experimentally known effector sequences with wide species representation for creation of effector predictors. We created a reliable effector prediction tool, integrated into the database, to demonstrate the application of the database for such endeavours. T3SEdb is the first specialised database reported for T3SS effectors, enriched with manual annotations that facilitated systematic construction of a reliable prediction model for identification of novel effectors. The T3SEdb represents a platform for inclusion of additional annotations of metadata for future developments of sophisticated effector prediction models for screening and selection of putative novel effectors from bacterial genomes/proteomes that can be validated by a small number of key experiments.
Increased complexity of circRNA expression during species evolution.
Dong, Rui; Ma, Xu-Kai; Chen, Ling-Ling; Yang, Li
2017-08-03
Circular RNAs (circRNAs) are broadly identified from precursor mRNA (pre-mRNA) back-splicing across various species. Recent studies have suggested a cell-/tissue- specific manner of circRNA expression. However, the distinct expression pattern of circRNAs among species and its underlying mechanism still remain to be explored. Here, we systematically compared circRNA expression from human and mouse, and found that only a small portion of human circRNAs could be determined in parallel mouse samples. The conserved circRNA expression between human and mouse is correlated with the existence of orientation-opposite complementary sequences in introns that flank back-spliced exons in both species, but not the circRNA sequences themselves. Quantification of RNA pairing capacity of orientation-opposite complementary sequences across circRNA-flanking introns by Complementary Sequence Index (CSI) identifies that among all types of complementary sequences, SINEs, especially Alu elements in human, contribute the most for circRNA formation and that their diverse distribution across species leads to the increased complexity of circRNA expression during species evolution. Together, our integrated and comparative reference catalog of circRNAs in different species reveals a species-specific pattern of circRNA expression and suggests a previously under-appreciated impact of fast-evolved SINEs on the regulation of (circRNA) gene expression.
Alu sequence involvement in transcriptional insulation of the keratin 18 gene in transgenic mice.
Thorey, I S; Ceceña, G; Reynolds, W; Oshima, R G
1993-01-01
The human keratin 18 (K18) gene is expressed in a variety of adult simple epithelial tissues, including liver, intestine, lung, and kidney, but is not normally found in skin, muscle, heart, spleen, or most of the brain. Transgenic animals derived from the cloned K18 gene express the transgene in appropriate tissues at levels directly proportional to the copy number and independently of the sites of integration. We have investigated in transgenic mice the dependence of K18 gene expression on the distal 5' and 3' flanking sequences and upon the RNA polymerase III promoter of an Alu repetitive DNA transcription unit immediately upstream of the K18 promoter. Integration site-independent expression of tandemly duplicated K18 transgenes requires the presence of either an 825-bp fragment of the 5' flanking sequence or the 3.5-kb 3' flanking sequence. Mutation of the RNA polymerase III promoter of the Alu element within the 825-bp fragment abolishes copy number-dependent expression in kidney but does not abolish integration site-independent expression when assayed in the absence of the 3' flanking sequence of the K18 gene. The characteristics of integration site-independent expression and copy number-dependent expression are separable. In addition, the formation of the chromatin state of the K18 gene, which likely restricts the tissue-specific expression of this gene, is not dependent upon the distal flanking sequences of the 10-kb K18 gene but rather may depend on internal regulatory regions of the gene. Images PMID:7692231
Sequencing artifacts in the type A influenza databases and attempts to correct them.
Suarez, David L; Chester, Nikki; Hatfield, Jason
2014-07-01
There are over 276 000 influenza gene sequences in public databases, with the quality of the sequences determined by the contributor. As part of a high school class project, influenza sequences with possible errors were identified in the public databases based on the size of the gene being longer than expected, with the hypothesis that these sequences would have an error. Students contacted sequence submitters alerting them of the possible sequence issue(s) and requested they the suspect sequence(s) be correct as appropriate. Type A influenza viruses were screened, and gene segments longer than the accepted size were identified for further analysis. Attention was placed on sequences with additional nucleotides upstream or downstream of the highly conserved non-coding ends of the viral segments. A total of 1081 sequences were identified that met this criterion. Three types of errors were commonly observed: non-influenza primer sequence wasn't removed from the sequence; PCR product was cloned and plasmid sequence was included in the sequence; and Taq polymerase added an adenine at the end of the PCR product. Internal insertions of nucleotide sequence were also commonly observed, but in many cases it was unclear if the sequence was correct or actually contained an error. A total of 215 sequences, or 22.8% of the suspect sequences, were corrected in the public databases in the first year of the student project. Unfortunately 138 additional sequences with possible errors were added to the databases in the second year. Additional awareness of the need for data integrity of sequences submitted to public databases is needed to fully reap the benefits of these large data sets. © 2014 The Authors. Influenza and Other Respiratory Viruses Published by John Wiley & Sons Ltd.
Positive Selection Underlies Faster-Z Evolution of Gene Expression in Birds.
Dean, Rebecca; Harrison, Peter W; Wright, Alison E; Zimmer, Fabian; Mank, Judith E
2015-10-01
The elevated rate of evolution for genes on sex chromosomes compared with autosomes (Fast-X or Fast-Z evolution) can result either from positive selection in the heterogametic sex or from nonadaptive consequences of reduced relative effective population size. Recent work in birds suggests that Fast-Z of coding sequence is primarily due to relaxed purifying selection resulting from reduced relative effective population size. However, gene sequence and gene expression are often subject to distinct evolutionary pressures; therefore, we tested for Fast-Z in gene expression using next-generation RNA-sequencing data from multiple avian species. Similar to studies of Fast-Z in coding sequence, we recover clear signatures of Fast-Z in gene expression; however, in contrast to coding sequence, our data indicate that Fast-Z in expression is due to positive selection acting primarily in females. In the soma, where gene expression is highly correlated between the sexes, we detected Fast-Z in both sexes, although at a higher rate in females, suggesting that many positively selected expression changes in females are also expressed in males. In the gonad, where intersexual correlations in expression are much lower, we detected Fast-Z for female gene expression, but crucially, not males. This suggests that a large amount of expression variation is sex-specific in its effects within the gonad. Taken together, our results indicate that Fast-Z evolution of gene expression is the product of positive selection acting on recessive beneficial alleles in the heterogametic sex. More broadly, our analysis suggests that the adaptive potential of Z chromosome gene expression may be much greater than that of gene sequence, results which have important implications for the role of sex chromosomes in speciation and sexual selection. © The Author 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Analysis of Epstein-Barr Virus Genomes and Expression Profiles in Gastric Adenocarcinoma.
Borozan, Ivan; Zapatka, Marc; Frappier, Lori; Ferretti, Vincent
2018-01-15
Epstein-Barr virus (EBV) is a causative agent of a variety of lymphomas, nasopharyngeal carcinoma (NPC), and ∼9% of gastric carcinomas (GCs). An important question is whether particular EBV variants are more oncogenic than others, but conclusions are currently hampered by the lack of sequenced EBV genomes. Here, we contribute to this question by mining whole-genome sequences of 201 GCs to identify 13 EBV-positive GCs and by assembling 13 new EBV genome sequences, almost doubling the number of available GC-derived EBV genome sequences and providing the first non-Asian EBV genome sequences from GC. Whole-genome sequence comparisons of all EBV isolates sequenced to date (85 from tumors and 57 from healthy individuals) showed that most GC and NPC EBV isolates were closely related although American Caucasian GC samples were more distant, suggesting a geographical component. However, EBV GC isolates were found to contain some consistent changes in protein sequences regardless of geographical origin. In addition, transcriptome data available for eight of the EBV-positive GCs were analyzed to determine which EBV genes are expressed in GC. In addition to the expected latency proteins (EBNA1, LMP1, and LMP2A), specific subsets of lytic genes were consistently expressed that did not reflect a typical lytic or abortive lytic infection, suggesting a novel mechanism of EBV gene regulation in the context of GC. These results are consistent with a model in which a combination of specific latent and lytic EBV proteins promotes tumorigenesis. IMPORTANCE Epstein-Barr virus (EBV) is a widespread virus that causes cancer, including gastric carcinoma (GC), in a small subset of individuals. An important question is whether particular EBV variants are more cancer associated than others, but more EBV sequences are required to address this question. Here, we have generated 13 new EBV genome sequences from GC, almost doubling the number of EBV sequences from GC isolates and providing the first EBV sequences from non-Asian GC. We further identify sequence changes in some EBV proteins common to GC isolates. In addition, gene expression analysis of eight of the EBV-positive GCs showed consistent expression of both the expected latency proteins and a subset of lytic proteins that was not consistent with typical lytic or abortive lytic expression. These results suggest that novel mechanisms activate expression of some EBV lytic proteins and that their expression may contribute to oncogenesis. Copyright © 2018 American Society for Microbiology.
Public clonotype usage identifies protective Gag-specific CD8+ T cell responses in SIV infection
Asher, Tedi E.; Wilson, Nancy A.; Nason, Martha C.; Brenchley, Jason M.; Metzler, Ian S.; Venturi, Vanessa; Gostick, Emma; Chattopadhyay, Pratip K.; Roederer, Mario; Davenport, Miles P.; Watkins, David I.; Douek, Daniel C.
2009-01-01
Despite the pressing need for an AIDS vaccine, the determinants of protective immunity to HIV remain concealed within the complexity of adaptive immune responses. We dissected immunodominant virus-specific CD8+ T cell populations in Mamu-A*01+ rhesus macaques with primary SIV infection to elucidate the hallmarks of effective immunity at the level of individual constituent clonotypes, which were identified according to the expression of distinct T cell receptors (TCRs). The number of public clonotypes, defined as those that expressed identical TCR β-chain amino acid sequences and recurred in multiple individuals, contained within the acute phase CD8+ T cell population specific for the biologically constrained Gag CM9 (CTPYDINQM; residues 181–189) epitope correlated negatively with the virus load set point. This independent molecular signature of protection was confirmed in a prospective vaccine trial, in which clonotype engagement was governed by the nature of the antigen rather than the context of exposure and public clonotype usage was associated with enhanced recognition of epitope variants. Thus, the pattern of antigen-specific clonotype recruitment within a protective CD8+ T cell population is a prognostic indicator of vaccine efficacy and biological outcome in an AIDS virus infection. PMID:19349463
2012-01-01
Background RNA sequencing (RNA-Seq) has emerged as a powerful approach for the detection of differential gene expression with both high-throughput and high resolution capabilities possible depending upon the experimental design chosen. Multiplex experimental designs are now readily available, these can be utilised to increase the numbers of samples or replicates profiled at the cost of decreased sequencing depth generated per sample. These strategies impact on the power of the approach to accurately identify differential expression. This study presents a detailed analysis of the power to detect differential expression in a range of scenarios including simulated null and differential expression distributions with varying numbers of biological or technical replicates, sequencing depths and analysis methods. Results Differential and non-differential expression datasets were simulated using a combination of negative binomial and exponential distributions derived from real RNA-Seq data. These datasets were used to evaluate the performance of three commonly used differential expression analysis algorithms and to quantify the changes in power with respect to true and false positive rates when simulating variations in sequencing depth, biological replication and multiplex experimental design choices. Conclusions This work quantitatively explores comparisons between contemporary analysis tools and experimental design choices for the detection of differential expression using RNA-Seq. We found that the DESeq algorithm performs more conservatively than edgeR and NBPSeq. With regard to testing of various experimental designs, this work strongly suggests that greater power is gained through the use of biological replicates relative to library (technical) replicates and sequencing depth. Strikingly, sequencing depth could be reduced as low as 15% without substantial impacts on false positive or true positive rates. PMID:22985019
Robles, José A; Qureshi, Sumaira E; Stephen, Stuart J; Wilson, Susan R; Burden, Conrad J; Taylor, Jennifer M
2012-09-17
RNA sequencing (RNA-Seq) has emerged as a powerful approach for the detection of differential gene expression with both high-throughput and high resolution capabilities possible depending upon the experimental design chosen. Multiplex experimental designs are now readily available, these can be utilised to increase the numbers of samples or replicates profiled at the cost of decreased sequencing depth generated per sample. These strategies impact on the power of the approach to accurately identify differential expression. This study presents a detailed analysis of the power to detect differential expression in a range of scenarios including simulated null and differential expression distributions with varying numbers of biological or technical replicates, sequencing depths and analysis methods. Differential and non-differential expression datasets were simulated using a combination of negative binomial and exponential distributions derived from real RNA-Seq data. These datasets were used to evaluate the performance of three commonly used differential expression analysis algorithms and to quantify the changes in power with respect to true and false positive rates when simulating variations in sequencing depth, biological replication and multiplex experimental design choices. This work quantitatively explores comparisons between contemporary analysis tools and experimental design choices for the detection of differential expression using RNA-Seq. We found that the DESeq algorithm performs more conservatively than edgeR and NBPSeq. With regard to testing of various experimental designs, this work strongly suggests that greater power is gained through the use of biological replicates relative to library (technical) replicates and sequencing depth. Strikingly, sequencing depth could be reduced as low as 15% without substantial impacts on false positive or true positive rates.
Selinger, David A.; Chandler, Vicki L.
2001-01-01
The maize (Zea mays) b1 gene encodes a transcription factor that regulates the anthocyanin pigment pathway. Of the b1 alleles with distinct tissue-specific expression, B-Peru and B-Bolivia are the only alleles that confer seed pigmentation. B-Bolivia produces variable and weaker seed expression but darker, more regular plant expression relative to B-Peru. Our experiments demonstrated that B-Bolivia is not expressed in the seed when transmitted through the male. When transmitted through the female the proportion of kernels pigmented and the intensity of pigment varied. Molecular characterization of B-Bolivia demonstrated that it shares the first 530 bp of the upstream region with B-Peru, a region sufficient for seed expression. Immediately upstream of 530 bp, B-Bolivia is completely divergent from B-Peru. These sequences share sequence similarity to retrotransposons. Transient expression assays of various promoter constructs identified a 33-bp region in B-Bolivia that can account for the reduced aleurone pigment amounts (40%) observed with B-Bolivia relative to B-Peru. Transgenic plants carrying the B-Bolivia promoter proximal region produced pigmented seeds. Similar to native B-Bolivia, some transgene loci are variably expressed in seeds. In contrast to native B-Bolivia, the transgene loci are expressed in seeds when transmitted through both the male and female. Some transgenic lines produced pigment in vegetative tissues, but the tissue-specificity was different from B-Bolivia, suggesting the introduced sequences do not contain the B-Bolivia plant-specific regulatory sequences. We hypothesize that the chromatin context of the B-Bolivia allele controls its epigenetic seed expression properties, which could be influenced by the adjacent highly repeated retrotransposon sequence. PMID:11244116
expVIP: a Customizable RNA-seq Data Analysis and Visualization Platform1[OPEN
2016-01-01
The majority of transcriptome sequencing (RNA-seq) expression studies in plants remain underutilized and inaccessible due to the use of disparate transcriptome references and the lack of skills and resources to analyze and visualize these data. We have developed expVIP, an expression visualization and integration platform, which allows easy analysis of RNA-seq data combined with an intuitive and interactive interface. Users can analyze public and user-specified data sets with minimal bioinformatics knowledge using the expVIP virtual machine. This generates a custom Web browser to visualize, sort, and filter the RNA-seq data and provides outputs for differential gene expression analysis. We demonstrate expVIP’s suitability for polyploid crops and evaluate its performance across a range of biologically relevant scenarios. To exemplify its use in crop research, we developed a flexible wheat (Triticum aestivum) expression browser (www.wheat-expression.com) that can be expanded with user-generated data in a local virtual machine environment. The open-access expVIP platform will facilitate the analysis of gene expression data from a wide variety of species by enabling the easy integration, visualization, and comparison of RNA-seq data across experiments. PMID:26869702
Oishi, M; Gohma, H; Lejukole, H Y; Taniguchi, Y; Yamada, T; Suzuki, K; Shinkai, H; Uenishi, H; Yasue, H; Sasaki, Y
2004-05-01
Expressed sequence tags (ESTs) generated based on characterization of clones isolated randomly from cDNA libraries are used to study gene expression profiles in specific tissues and to provide useful information for characterizing tissue physiology. In this study, two directionally cloned cDNA libraries were constructed from 60 day-old bovine whole fetus and fetal placenta. We have characterized 5357 and 1126 clones, and then identified 3464 and 795 unique sequences for the fetus and placenta cDNA libraries: 1851 and 504 showed homology to already identified genes, and 1613 and 291 showed no significant matches to any of the sequences in DNA databases, respectively. Further, we found 94 unique sequences overlapping in both the fetus and the placenta, leading to a catalog of 4165 genes expressed in 60 day-old fetus and placenta. The catalog is used to examine expression profile of genes in 60 day-old bovine fetus and placenta.
Yang, Yabo; Hu, Dong; Wang, Lexun; Liang, Chi; Hu, Xuchu; Xu, Jin; Huang, Yan; Yu, Xinbing
2014-01-01
Clonorchiasis, which has been an important public health problem in China, is caused by ingestion of raw or undercooked fish contaminated by live metacercaria. Therefore, preventing fish from infecting is of great significance for controlling the disease. SERPINs (serine protease inhibitors) are well known as negative regulators of hemostasis, thrombolysis, and innate immune responses. In the present study, two full-length sequences encoding SERPIN were identified from metacercaria cDNA library of Clonorchis sinensis (C. sinensis) and were denominated as CsSERPIN and CsSERPIN3, respectively. Bioinformatics analysis showed that the two sequences shares 35.9% identity to each other. Both of the sequences have SERPIN domain and the greatest difference between the two domains is the reactive centre loop. Transmembrane region was found in CsSERPIN3 while not in CsSERPIN. The expression of the two CsSERPINs was significantly higher at the life stage of metacercaria than that of adult. The transcription levels of CsSERPIN and CsSERPIN3 at metacercaria stage were 3.249- and 11.314-fold of that at adult stage, respectively. Furthermore, the expression of CsSERPIN was 4.32-fold of that of CsSERPIN3 at metacercaria stage. Immunobiochemistry revealed that CsERPIN was dispersed at subtegument and oral sucker of metacercaria, while CsSERPIN3 localized intensely in the tegument of metacercaria of C. sinensis inside of the cyst wall. All these indicated that the CsSERPINs play important roles at metacercaria stage of the parasite. CsSERPIN may take part in regulation of endogenous serine proteinase and CsSERPIN3 may be involved in immune evasion and be a potential candidate for vaccine and drug target for clonorchiasis. PMID:24831344
Yang, Yabo; Hu, Dong; Wang, Lexun; Liang, Chi; Hu, Xuchu; Xu, Jin; Huang, Yan; Yu, Xinbing
2014-06-01
Clonorchiasis, which has been an important public health problem in China, is caused by ingestion of raw or undercooked fish contaminated by live metacercaria. Therefore, preventing fish from infecting is of great significance for controlling the disease. SERPINs (serine protease inhibitors) are well known as negative regulators of hemostasis, thrombolysis, and innate immune responses. In the present study, two full-length sequences encoding SERPIN were identified from metacercaria cDNA library of Clonorchis sinensis (C. sinensis) and were denominated as CsSERPIN and CsSERPIN3, respectively. Bioinformatics analysis showed that the two sequences shares 35.9% identity to each other. Both of the sequences have SERPIN domain and the greatest difference between the two domains is the reactive centre loop. Transmembrane region was found in CsSERPIN3 while not in CsSERPIN. The expression of the two CsSERPINs was significantly higher at the life stage of metacercaria than that of adult. The transcription levels of CsSERPIN and CsSERPIN3 at metacercaria stage were 3.249- and 11.314-fold of that at adult stage, respectively. Furthermore, the expression of CsSERPIN was 4.32-fold of that of CsSERPIN3 at metacercaria stage. Immunobiochemistry revealed that CsERPIN was dispersed at subtegument and oral sucker of metacercaria, while CsSERPIN3 localized intensely in the tegument of metacercaria of C. sinensis inside of the cyst wall. All these indicated that the CsSERPINs play important roles at metacercaria stage of the parasite. CsSERPIN may take part in regulation of endogenous serine proteinase and CsSERPIN3 may be involved in immune evasion and be a potential candidate for vaccine and drug target for clonorchiasis.
De novo transcriptome sequencing and analysis of the cereal cyst nematode, Heterodera avenae.
Kumar, Mukesh; Gantasala, Nagavara Prasad; Roychowdhury, Tanmoy; Thakur, Prasoon Kumar; Banakar, Prakash; Shukla, Rohit N; Jones, Michael G K; Rao, Uma
2014-01-01
The cereal cyst nematode (CCN, Heterodera avenae) is a major pest of wheat (Triticum spp) that reduces crop yields in many countries. Cyst nematodes are obligate sedentary endoparasites that reproduce by amphimixis. Here, we report the first transcriptome analysis of two stages of H. avenae. After sequencing extracted RNA from pre parasitic infective juvenile and adult stages of the life cycle, 131 million Illumina high quality paired end reads were obtained which generated 27,765 contigs with N50 of 1,028 base pairs, of which 10,452 were annotated. Comparative analyses were undertaken to evaluate H. avenae sequences with those of other plant, animal and free living nematodes to identify differences in expressed genes. There were 4,431 transcripts common to H. avenae and the free living nematode Caenorhabditis elegans, and 9,462 in common with more closely related potato cyst nematode, Globodera pallida. Annotation of H. avenae carbohydrate active enzymes (CAZy) revealed fewer glycoside hydrolases (GHs) but more glycosyl transferases (GTs) and carbohydrate esterases (CEs) when compared to M. incognita. 1,280 transcripts were found to have secretory signature, presence of signal peptide and absence of transmembrane. In a comparison of genes expressed in the pre-parasitic juvenile and feeding female stages, expression levels of 30 genes with high RPKM (reads per base per kilo million) value, were analysed by qRT-PCR which confirmed the observed differences in their levels of expression levels. In addition, we have also developed a user-friendly resource, Heterodera transcriptome database (HATdb) for public access of the data generated in this study. The new data provided on the transcriptome of H. avenae adds to the genetic resources available to study plant parasitic nematodes and provides an opportunity to seek new effectors that are specifically involved in the H. avenae-cereal host interaction.
De Novo Transcriptome Sequencing and Analysis of the Cereal Cyst Nematode, Heterodera avenae
Kumar, Mukesh; Gantasala, Nagavara Prasad; Roychowdhury, Tanmoy; Thakur, Prasoon Kumar; Banakar, Prakash; Shukla, Rohit N.; Jones, Michael G. K.; Rao, Uma
2014-01-01
The cereal cyst nematode (CCN, Heterodera avenae) is a major pest of wheat (Triticum spp) that reduces crop yields in many countries. Cyst nematodes are obligate sedentary endoparasites that reproduce by amphimixis. Here, we report the first transcriptome analysis of two stages of H. avenae. After sequencing extracted RNA from pre parasitic infective juvenile and adult stages of the life cycle, 131 million Illumina high quality paired end reads were obtained which generated 27,765 contigs with N50 of 1,028 base pairs, of which 10,452 were annotated. Comparative analyses were undertaken to evaluate H. avenae sequences with those of other plant, animal and free living nematodes to identify differences in expressed genes. There were 4,431 transcripts common to H. avenae and the free living nematode Caenorhabditis elegans, and 9,462 in common with more closely related potato cyst nematode, Globodera pallida. Annotation of H. avenae carbohydrate active enzymes (CAZy) revealed fewer glycoside hydrolases (GHs) but more glycosyl transferases (GTs) and carbohydrate esterases (CEs) when compared to M. incognita. 1,280 transcripts were found to have secretory signature, presence of signal peptide and absence of transmembrane. In a comparison of genes expressed in the pre-parasitic juvenile and feeding female stages, expression levels of 30 genes with high RPKM (reads per base per kilo million) value, were analysed by qRT-PCR which confirmed the observed differences in their levels of expression levels. In addition, we have also developed a user-friendly resource, Heterodera transcriptome database (HATdb) for public access of the data generated in this study. The new data provided on the transcriptome of H. avenae adds to the genetic resources available to study plant parasitic nematodes and provides an opportunity to seek new effectors that are specifically involved in the H. avenae-cereal host interaction. PMID:24802510
Wei, Hairong; Chen, Xin; Zong, Xiaojuan; Shu, Huairui; Gao, Dongsheng; Liu, Qingzhong
2015-01-01
Background Fruit color is one of the most important economic traits of the sweet cherry (Prunus avium L.). The red coloration of sweet cherry fruit is mainly attributed to anthocyanins. However, limited information is available regarding the molecular mechanisms underlying anthocyanin biosynthesis and its regulation in sweet cherry. Methodology/Principal Findings In this study, a reference transcriptome of P. avium L. was sequenced and annotated to identify the transcriptional determinants of fruit color. Normalized cDNA libraries from red and yellow fruits were sequenced using the next-generation Illumina/Solexa sequencing platform and de novo assembly. Over 66 million high-quality reads were assembled into 43,128 unigenes using a combined assembly strategy. Then a total of 22,452 unigenes were compared to public databases using homology searches, and 20,095 of these unigenes were annotated in the Nr protein database. Furthermore, transcriptome differences between the four stages of fruit ripening were analyzed using Illumina digital gene expression (DGE) profiling. Biological pathway analysis revealed that 72 unigenes were involved in anthocyanin biosynthesis. The expression patterns of unigenes encoding phenylalanine ammonia-lyase (PAL), 4-coumarate-CoA ligase (4CL), chalcone synthase (CHS), chalcone isomerase (CHI), flavanone 3-hydroxylase (F3H), flavanone 3’-hydroxylase (F3’H), dihydroflavonol 4-reductase (DFR), anthocyanidin synthase (ANS) and UDP glucose: flavonol 3-O-glucosyltransferase (UFGT) during fruit ripening differed between red and yellow fruit. In addition, we identified some transcription factor families (such as MYB, bHLH and WD40) that may control anthocyanin biosynthesis. We confirmed the altered expression levels of eighteen unigenes that encode anthocyanin biosynthetic enzymes and transcription factors using quantitative real-time PCR (qRT-PCR). Conclusions/Significance The obtained sweet cherry transcriptome and DGE profiling data provide comprehensive gene expression information that lends insights into the molecular mechanisms underlying anthocyanin biosynthesis. These results will provide a platform for further functional genomic research on this fruit crop. PMID:25799516
McArt, Darragh G.; Dunne, Philip D.; Blayney, Jaine K.; Salto-Tellez, Manuel; Van Schaeybroeck, Sandra; Hamilton, Peter W.; Zhang, Shu-Dong
2013-01-01
The advent of next generation sequencing technologies (NGS) has expanded the area of genomic research, offering high coverage and increased sensitivity over older microarray platforms. Although the current cost of next generation sequencing is still exceeding that of microarray approaches, the rapid advances in NGS will likely make it the platform of choice for future research in differential gene expression. Connectivity mapping is a procedure for examining the connections among diseases, genes and drugs by differential gene expression initially based on microarray technology, with which a large collection of compound-induced reference gene expression profiles have been accumulated. In this work, we aim to test the feasibility of incorporating NGS RNA-Seq data into the current connectivity mapping framework by utilizing the microarray based reference profiles and the construction of a differentially expressed gene signature from a NGS dataset. This would allow for the establishment of connections between the NGS gene signature and those microarray reference profiles, alleviating the associated incurring cost of re-creating drug profiles with NGS technology. We examined the connectivity mapping approach on a publicly available NGS dataset with androgen stimulation of LNCaP cells in order to extract candidate compounds that could inhibit the proliferative phenotype of LNCaP cells and to elucidate their potential in a laboratory setting. In addition, we also analyzed an independent microarray dataset of similar experimental settings. We found a high level of concordance between the top compounds identified using the gene signatures from the two datasets. The nicotine derivative cotinine was returned as the top candidate among the overlapping compounds with potential to suppress this proliferative phenotype. Subsequent lab experiments validated this connectivity mapping hit, showing that cotinine inhibits cell proliferation in an androgen dependent manner. Thus the results in this study suggest a promising prospect of integrating NGS data with connectivity mapping. PMID:23840550
Sun, Xiudong; Zhou, Shumei; Meng, Fanlu; Liu, Shiqi
2012-10-01
Garlic is widely used as a spice throughout the world for the culinary value of its flavor and aroma, which are created by the chemical transformation of a series of organic sulfur compounds. To analyze the transcriptome of Allium sativum and discover the genes involved in sulfur metabolism, cDNAs derived from the total RNA of Allium sativum buds were analyzed by Illumina sequencing. Approximately 26.67 million 90 bp paired-end clean reads were achieved in two libraries. A total of 127,933 unigenes were generated by de novo assembly and were compared with the sequences in public databases. Of these, 45,286 unigenes had significant hits to the sequences in the Nr database, 29,514 showed significant similarity to known proteins in the Swiss-Prot database and, 20,706 and 21,952 unigenes had significant similarity to existing sequences in the KEGG and COG databases, respectively. Moreover, genes involved in organic sulfur biosynthesis were identified. These unigenes data will provide the foundation for research on gene expression, genomics and functional genomics in Allium sativum. Key message The obtained unigenes will provide the foundation for research on functional genomics in Allium sativum and its closely related species, and fill the gap of the existing plant EST database.
Mining SNPs from EST sequences using filters and ensemble classifiers.
Wang, J; Zou, Q; Guo, M Z
2010-05-04
Abundant single nucleotide polymorphisms (SNPs) provide the most complete information for genome-wide association studies. However, due to the bottleneck of manual discovery of putative SNPs and the inaccessibility of the original sequencing reads, it is essential to develop a more efficient and accurate computational method for automated SNP detection. We propose a novel computational method to rapidly find true SNPs in public-available EST (expressed sequence tag) databases; this method is implemented as SNPDigger. EST sequences are clustered and aligned. SNP candidates are then obtained according to a measure of redundant frequency. Several new informative biological features, such as the structural neighbor profiles and the physical position of the SNP, were extracted from EST sequences, and the effectiveness of these features was demonstrated. An ensemble classifier, which employs a carefully selected feature set, was included for the imbalanced training data. The sensitivity and specificity of our method both exceeded 80% for human genetic data in the cross validation. Our method enables detection of SNPs from the user's own EST dataset and can be used on species for which there is no genome data. Our tests showed that this method can effectively guide SNP discovery in ESTs and will be useful to avoid and save the cost of biological analyses.
Picardi, Ernesto; Gallo, Angela; Galeano, Federica; Tomaselli, Sara; Pesole, Graziano
2012-01-01
RNA editing is a post-transcriptional process occurring in a wide range of organisms. In human brain, the A-to-I RNA editing, in which individual adenosine (A) bases in pre-mRNA are modified to yield inosine (I), is the most frequent event. Modulating gene expression, RNA editing is essential for cellular homeostasis. Indeed, its deregulation has been linked to several neurological and neurodegenerative diseases. To date, many RNA editing sites have been identified by next generation sequencing technologies employing massive transcriptome sequencing together with whole genome or exome sequencing. While genome and transcriptome reads are not always available for single individuals, RNA-Seq data are widespread through public databases and represent a relevant source of yet unexplored RNA editing sites. In this context, we propose a simple computational strategy to identify genomic positions enriched in novel hypothetical RNA editing events by means of a new two-steps mapping procedure requiring only RNA-Seq data and no a priori knowledge of RNA editing characteristics and genomic reads. We assessed the suitability of our procedure by confirming A-to-I candidates using conventional Sanger sequencing and performing RNA-Seq as well as whole exome sequencing of human spinal cord tissue from a single individual. PMID:22957051
Mining of haplotype-based expressed sequence tag single nucleotide polymorphisms in citrus
2013-01-01
Background Single nucleotide polymorphisms (SNPs), the most abundant variations in a genome, have been widely used in various studies. Detection and characterization of citrus haplotype-based expressed sequence tag (EST) SNPs will greatly facilitate further utilization of these gene-based resources. Results In this paper, haplotype-based SNPs were mined out of publicly available citrus expressed sequence tags (ESTs) from different citrus cultivars (genotypes) individually and collectively for comparison. There were a total of 567,297 ESTs belonging to 27 cultivars in varying numbers and consequentially yielding different numbers of haplotype-based quality SNPs. Sweet orange (SO) had the most (213,830) ESTs, generating 11,182 quality SNPs in 3,327 out of 4,228 usable contigs. Summed from all the individually mining results, a total of 25,417 quality SNPs were discovered – 15,010 (59.1%) were transitions (AG and CT), 9,114 (35.9%) were transversions (AC, GT, CG, and AT), and 1,293 (5.0%) were insertion/deletions (indels). A vast majority of SNP-containing contigs consisted of only 2 haplotypes, as expected, but the percentages of 2 haplotype contigs varied widely in these citrus cultivars. BLAST of the 25,417 25-mer SNP oligos to the Clementine reference genome scaffolds revealed 2,947 SNPs had “no hits found”, 19,943 had 1 unique hit / alignment, 1,571 had one hit and 2+ alignments per hit, and 956 had 2+ hits and 1+ alignment per hit. Of the total 24,293 scaffold hits, 23,955 (98.6%) were on the main scaffolds 1 to 9, and only 338 were on 87 minor scaffolds. Most alignments had 100% (25/25) or 96% (24/25) nucleotide identities, accounting for 93% of all the alignments. Considering almost all the nucleotide discrepancies in the 24/25 alignments were at the SNP sites, it served well as in silico validation of these SNPs, in addition to and consistent with the rate (81%) validated by sequencing and SNaPshot assay. Conclusions High-quality EST-SNPs from different citrus genotypes were detected, and compared to estimate the heterozygosity of each genome. All the SNP oligo sequences were aligned with the Clementine citrus genome to determine their distribution and uniqueness and for in silico validation, in addition to SNaPshot and sequencing validation of selected SNPs. PMID:24175923
Analysis of expressed sequence tags generated from full-length enriched cDNA libraries of melon
2011-01-01
Background Melon (Cucumis melo), an economically important vegetable crop, belongs to the Cucurbitaceae family which includes several other important crops such as watermelon, cucumber, and pumpkin. It has served as a model system for sex determination and vascular biology studies. However, genomic resources currently available for melon are limited. Result We constructed eleven full-length enriched and four standard cDNA libraries from fruits, flowers, leaves, roots, cotyledons, and calluses of four different melon genotypes, and generated 71,577 and 22,179 ESTs from full-length enriched and standard cDNA libraries, respectively. These ESTs, together with ~35,000 ESTs available in public domains, were assembled into 24,444 unigenes, which were extensively annotated by comparing their sequences to different protein and functional domain databases, assigning them Gene Ontology (GO) terms, and mapping them onto metabolic pathways. Comparative analysis of melon unigenes and other plant genomes revealed that 75% to 85% of melon unigenes had homologs in other dicot plants, while approximately 70% had homologs in monocot plants. The analysis also identified 6,972 gene families that were conserved across dicot and monocot plants, and 181, 1,192, and 220 gene families specific to fleshy fruit-bearing plants, the Cucurbitaceae family, and melon, respectively. Digital expression analysis identified a total of 175 tissue-specific genes, which provides a valuable gene sequence resource for future genomics and functional studies. Furthermore, we identified 4,068 simple sequence repeats (SSRs) and 3,073 single nucleotide polymorphisms (SNPs) in the melon EST collection. Finally, we obtained a total of 1,382 melon full-length transcripts through the analysis of full-length enriched cDNA clones that were sequenced from both ends. Analysis of these full-length transcripts indicated that sizes of melon 5' and 3' UTRs were similar to those of tomato, but longer than many other dicot plants. Codon usages of melon full-length transcripts were largely similar to those of Arabidopsis coding sequences. Conclusion The collection of melon ESTs generated from full-length enriched and standard cDNA libraries is expected to play significant roles in annotating the melon genome. The ESTs and associated analysis results will be useful resources for gene discovery, functional analysis, marker-assisted breeding of melon and closely related species, comparative genomic studies and for gaining insights into gene expression patterns. PMID:21599934
Caldwell, Rachel; Lin, Yan-Xia; Zhang, Ren
2015-01-01
There is a continuing interest in the analysis of gene architecture and gene expression to determine the relationship that may exist. Advances in high-quality sequencing technologies and large-scale resource datasets have increased the understanding of relationships and cross-referencing of expression data to the large genome data. Although a negative correlation between expression level and gene (especially transcript) length has been generally accepted, there have been some conflicting results arising from the literature concerning the impacts of different regions of genes, and the underlying reason is not well understood. The research aims to apply quantile regression techniques for statistical analysis of coding and noncoding sequence length and gene expression data in the plant, Arabidopsis thaliana, and fruit fly, Drosophila melanogaster, to determine if a relationship exists and if there is any variation or similarities between these species. The quantile regression analysis found that the coding sequence length and gene expression correlations varied, and similarities emerged for the noncoding sequence length (5′ and 3′ UTRs) between animal and plant species. In conclusion, the information described in this study provides the basis for further exploration into gene regulation with regard to coding and noncoding sequence length. PMID:26114098
USDA-ARS?s Scientific Manuscript database
Expressed sequence tag (EST) simple sequence repeats (SSRs) in Prunus were mined, and flanking primers designed and used for genome-wide characterization and selection of primers to optimize marker distribution and reliability. A total of 12,618 contigs were assembled from 84,727 ESTs, along with 34...
Generate Optimized Genetic Rhythm for Enzyme Expression in Non-native systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
2016-11-03
Most amino acids are represented by more than one codon, resulting in redundancy in the genetic code. Silent codon substitutions that do not alter the amino acid sequence still have an effect on protein expression. We have developed an algorithm, GoGREEN, to enhance the expression of foreign proteins in a host organism. GoGREEN selects codons according to frequency patterns seen in the gene of interest using the codon usage table from the host organism. GoGREEN is also designed to accommodate gaps in the sequence.This software takes for input (1) the aligned protein sequences for genes the user wishes to express,more » (2) the codon usage table for the host organism, (3) and the DNA sequence for the target protein found in the host organism. The program will select codons based on codon usage patterns for the target DNA sequence. The program will also select codons for “gaps” found in the aligned protein sequences using the codon usage table from the host organism.« less
Rcorrector: efficient and accurate error correction for Illumina RNA-seq reads.
Song, Li; Florea, Liliana
2015-01-01
Next-generation sequencing of cellular RNA (RNA-seq) is rapidly becoming the cornerstone of transcriptomic analysis. However, sequencing errors in the already short RNA-seq reads complicate bioinformatics analyses, in particular alignment and assembly. Error correction methods have been highly effective for whole-genome sequencing (WGS) reads, but are unsuitable for RNA-seq reads, owing to the variation in gene expression levels and alternative splicing. We developed a k-mer based method, Rcorrector, to correct random sequencing errors in Illumina RNA-seq reads. Rcorrector uses a De Bruijn graph to compactly represent all trusted k-mers in the input reads. Unlike WGS read correctors, which use a global threshold to determine trusted k-mers, Rcorrector computes a local threshold at every position in a read. Rcorrector has an accuracy higher than or comparable to existing methods, including the only other method (SEECER) designed for RNA-seq reads, and is more time and memory efficient. With a 5 GB memory footprint for 100 million reads, it can be run on virtually any desktop or server. The software is available free of charge under the GNU General Public License from https://github.com/mourisl/Rcorrector/.
Gene: a gene-centered information resource at NCBI.
Brown, Garth R; Hem, Vichet; Katz, Kenneth S; Ovetsky, Michael; Wallin, Craig; Ermolaeva, Olga; Tolstoy, Igor; Tatusova, Tatiana; Pruitt, Kim D; Maglott, Donna R; Murphy, Terence D
2015-01-01
The National Center for Biotechnology Information's (NCBI) Gene database (www.ncbi.nlm.nih.gov/gene) integrates gene-specific information from multiple data sources. NCBI Reference Sequence (RefSeq) genomes for viruses, prokaryotes and eukaryotes are the primary foundation for Gene records in that they form the critical association between sequence and a tracked gene upon which additional functional and descriptive content is anchored. Additional content is integrated based on the genomic location and RefSeq transcript and protein sequence data. The content of a Gene record represents the integration of curation and automated processing from RefSeq, collaborating model organism databases, consortia such as Gene Ontology, and other databases within NCBI. Records in Gene are assigned unique, tracked integers as identifiers. The content (citations, nomenclature, genomic location, gene products and their attributes, phenotypes, sequences, interactions, variation details, maps, expression, homologs, protein domains and external databases) is available via interactive browsing through NCBI's Entrez system, via NCBI's Entrez programming utilities (E-Utilities and Entrez Direct) and for bulk transfer by FTP. Published by Oxford University Press on behalf of Nucleic Acids Research 2014. This work is written by (a) US Government employee(s) and is in the public domain in the US.
Le Chevanton, L; Leblon, G
1989-04-15
We cloned the ura5 gene coding for the orotate phosphoribosyl transferase from the ascomycete Sordaria macrospora by heterologous probing of a Sordaria genomic DNA library with the corresponding Podospora anserina sequence. The Sordaria gene was expressed in an Escherichia coli pyrE mutant strain defective for the same enzyme, and expression was shown to be promoted by plasmid sequences. The nucleotide sequence of the 1246-bp DNA fragment encompassing the region of homology with the Podospora gene has been determined. This sequence contains an open reading frame of 699 nucleotides. The deduced amino acid sequence shows 72% similarity with the corresponding Podospora protein.
Database Resources of the BIG Data Center in 2018.
2018-01-04
The BIG Data Center at Beijing Institute of Genomics (BIG) of the Chinese Academy of Sciences provides freely open access to a suite of database resources in support of worldwide research activities in both academia and industry. With the vast amounts of omics data generated at ever-greater scales and rates, the BIG Data Center is continually expanding, updating and enriching its core database resources through big-data integration and value-added curation, including BioCode (a repository archiving bioinformatics tool codes), BioProject (a biological project library), BioSample (a biological sample library), Genome Sequence Archive (GSA, a data repository for archiving raw sequence reads), Genome Warehouse (GWH, a centralized resource housing genome-scale data), Genome Variation Map (GVM, a public repository of genome variations), Gene Expression Nebulas (GEN, a database of gene expression profiles based on RNA-Seq data), Methylation Bank (MethBank, an integrated databank of DNA methylomes), and Science Wikis (a series of biological knowledge wikis for community annotations). In addition, three featured web services are provided, viz., BIG Search (search as a service; a scalable inter-domain text search engine), BIG SSO (single sign-on as a service; a user access control system to gain access to multiple independent systems with a single ID and password) and Gsub (submission as a service; a unified submission service for all relevant resources). All of these resources are publicly accessible through the home page of the BIG Data Center at http://bigd.big.ac.cn. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Analysis of Cytoskeletal and Motility Proteins in the Sea Urchin Genome Assembly
RL, Morris; MP, Hoffman; RA, Obar; SS, McCafferty; IR, Gibbons; AD, Leone; J, Cool; EL, Allgood; AM, Musante; KM, Judkins; BJ, Rossetti; AP, Rawson; DR, Burgess
2007-01-01
The sea urchin embryo is a classical model system for studying the role of the cytoskeleton in such events as fertilization, mitosis, cleavage, cell migration and gastrulation. We have conducted an analysis of gene models derived from the Strongylocentrotus purpuratus genome assembly and have gathered strong evidence for the existence of multiple gene families encoding cytoskeletal proteins and their regulators in sea urchin. While many cytoskeletal genes have been cloned from sea urchin with sequences already existing in public databases, genome analysis reveals a significantly higher degree of diversity within certain gene families. Furthermore, genes are described corresponding to homologs of cytoskeletal proteins not previously documented in sea urchins. To illustrate the varying degree of sequence diversity that exists within cytoskeletal gene families, we conducted an analysis of genes encoding actins, specific actin-binding proteins, myosins, tubulins, kinesins, dyneins, specific microtubule-associated proteins, and intermediate filaments. We conducted ontological analysis of select genes to better understand the relatedness of urchin cytoskeletal genes to those of other deuterostomes. We analyzed developmental expression (EST) data to confirm the existence of select gene models and to understand their differential expression during various stages of early development. PMID:17027957
The BIG Data Center: from deposition to integration to translation
2017-01-01
Biological data are generated at unprecedentedly exponential rates, posing considerable challenges in big data deposition, integration and translation. The BIG Data Center, established at Beijing Institute of Genomics (BIG), Chinese Academy of Sciences, provides a suite of database resources, including (i) Genome Sequence Archive, a data repository specialized for archiving raw sequence reads, (ii) Gene Expression Nebulas, a data portal of gene expression profiles based entirely on RNA-Seq data, (iii) Genome Variation Map, a comprehensive collection of genome variations for featured species, (iv) Genome Warehouse, a centralized resource housing genome-scale data with particular focus on economically important animals and plants, (v) Methylation Bank, an integrated database of whole-genome single-base resolution methylomes and (vi) Science Wikis, a central access point for biological wikis developed for community annotations. The BIG Data Center is dedicated to constructing and maintaining biological databases through big data integration and value-added curation, conducting basic research to translate big data into big knowledge and providing freely open access to a variety of data resources in support of worldwide research activities in both academia and industry. All of these resources are publicly available and can be found at http://bigd.big.ac.cn. PMID:27899658
Gao, Meiping; Zhang, Shangwen; Luo, Cong; He, Xinhua; Wei, Shaolong; Jiang, Wen; He, Fanglian; Lin, Zhicheng; Yan, Meixin; Dong, Weiqong
2018-04-05
Sagittaria sagittifolia L is an important bulb vegetable that has high nutritional and medical value. Bulb formation and development are crucial to Sagittaria sagittifolia; however, its sucrose metabolism is poorly understood and there are a lack of sufficient transcriptomic and genomic data available to fully understand the molecular mechanisms underlying bulb formation and development as well as the bulb transcriptome. Five cDNA libraries were constructed at different developmental stages and sequenced using high-throughput Illumina RNA sequencing. From approximately 63.53 Gb clean reads, a total of 60,884 unigenes, with an average length of 897.34 bp and N50 of 1.368 kb, were obtained. A total of 36,590 unigenes were successfully annotated using five public databases. Across different developmental stages, 4195, 827, 832, 851, and 1494 were differentially expressed in T02, T03, T04, T05, and T06 libraries, respectively. Gene ontology (GO) analysis revealed several differentially-expressed genes (DEGs) associated with catalytic activity, binding, and transporter activity. The Kyoto encyclopedia of genes and genomes (KEGG) revealed that these DEGs are involved in physiological and biochemical processes. RT-qPCR was used to profile the expression of these unigenes and revealed that the expression patterns of the DEGs were consistent with the transcriptome data. In this study, we conducted a comparative gene expression analysis at the transcriptional level using RNA-seq across the different developmental stages of Sagittaria sagittifolia. We identified a set of genes that might contribute to starch and sucrose metabolism, and the genetic mechanisms related to bulblet development were also explored. This study provides important data for future studies of the genetic and molecular mechanisms underlying bulb formation and development in Sagittaria sagittifolia. Copyright © 2018. Published by Elsevier B.V.
A Review of Recommendations for Sequencing Receptive and Expressive Language Instruction
ERIC Educational Resources Information Center
Petursdottir, Anna Ingeborg; Carr, James E.
2011-01-01
We review recommendations for sequencing instruction in receptive and expressive language objectives in early and intensive behavioral intervention (EIBI) programs. Several books recommend completing receptive protocols before introducing corresponding expressive protocols. However, this recommendation has little empirical support, and some…
Compositions and methods for xylem-specific expression in plant cells
DOE Office of Scientific and Technical Information (OSTI.GOV)
Han, Kyung-Hwan; Ko, Jae-Heung
The invention provides promoter sequences that regulate specific expression of operably linked sequences in developing xylem cells and/or in developing xylem tissue. The developing xylem-specific sequences are exemplified by the DX5, DX8, DX11, and DX15 promoters, portions thereof, and homologs thereof. The invention further provides expression vectors, cells, tissues and plants that contain the invention's sequences. The compositions of the invention and methods of using them are useful in, for example, improving the quantity (biomass) and/or the quality (wood density, lignin content, sugar content etc.) of expressed biomass feedstock products that may be used for bioenergy, biorefinary, and generating woodmore » products such as pulp, paper, and solid wood.« less
PmiRExAt: plant miRNA expression atlas database and web applications
Gurjar, Anoop Kishor Singh; Panwar, Abhijeet Singh; Gupta, Rajinder; Mantri, Shrikant S.
2016-01-01
High-throughput small RNA (sRNA) sequencing technology enables an entirely new perspective for plant microRNA (miRNA) research and has immense potential to unravel regulatory networks. Novel insights gained through data mining in publically available rich resource of sRNA data will help in designing biotechnology-based approaches for crop improvement to enhance plant yield and nutritional value. Bioinformatics resources enabling meta-analysis of miRNA expression across multiple plant species are still evolving. Here, we report PmiRExAt, a new online database resource that caters plant miRNA expression atlas. The web-based repository comprises of miRNA expression profile and query tool for 1859 wheat, 2330 rice and 283 maize miRNA. The database interface offers open and easy access to miRNA expression profile and helps in identifying tissue preferential, differential and constitutively expressing miRNAs. A feature enabling expression study of conserved miRNA across multiple species is also implemented. Custom expression analysis feature enables expression analysis of novel miRNA in total 117 datasets. New sRNA dataset can also be uploaded for analysing miRNA expression profiles for 73 plant species. PmiRExAt application program interface, a simple object access protocol web service allows other programmers to remotely invoke the methods written for doing programmatic search operations on PmiRExAt database. Database URL: http://pmirexat.nabi.res.in. PMID:27081157
Ohta, Tazro; Nakazato, Takeru; Bono, Hidemasa
2017-06-01
It is important for public data repositories to promote the reuse of archived data. In the growing field of omics science, however, the increasing number of submissions of high-throughput sequencing (HTSeq) data to public repositories prevents users from choosing a suitable data set from among the large number of search results. Repository users need to be able to set a threshold to reduce the number of results to obtain a suitable subset of high-quality data for reanalysis. We calculated the quality of sequencing data archived in a public data repository, the Sequence Read Archive (SRA), by using the quality control software FastQC. We obtained quality values for 1 171 313 experiments, which can be used to evaluate the suitability of data for reuse. We also visualized the data distribution in SRA by integrating the quality information and metadata of experiments and samples. We provide quality information of all of the archived sequencing data, which enable users to obtain sufficient quality sequencing data for reanalyses. The calculated quality data are available to the public in various formats. Our data also provide an example of enhancing the reuse of public data by adding metadata to published research data by a third party. © The Authors 2017. Published by Oxford University Press.
Lewis, Jo E; Brameld, John M; Hill, Phil; Barrett, Perry; Ebling, Francis J P; Jethwa, Preeti H
2015-12-30
The viral 2A sequence has become an attractive alternative to the traditional internal ribosomal entry site (IRES) for simultaneous over-expression of two genes and in combination with recombinant adeno-associated viruses (rAAV) has been used to manipulate gene expression in vitro. To develop a rAAV construct in combination with the viral 2A sequence to allow long-term over-expression of the vgf gene and fluorescent marker gene for tracking of the transfected neurones in vivo. Transient transfection of the AAV plasmid containing the vgf gene, viral 2A sequence and eGFP into SH-SY5Y cells resulted in eGFP fluorescence comparable to a commercially available reporter construct. This increase in fluorescent cells was accompanied by an increase in VGF mRNA expression. Infusion of the rAAV vector containing the vgf gene, viral 2A sequence and eGFP resulted in eGFP fluorescence in the hypothalamus of both mice and Siberian hamsters, 32 weeks post infusion. In situ hybridisation confirmed that the location of VGF mRNA expression in the hypothalamus corresponded to the eGFP pattern of fluorescence. The viral 2A sequence is much smaller than the traditional IRES and therefore allowed over-expression of the vgf gene with fluorescent tracking without compromising viral capacity. The use of the viral 2A sequence in the AAV plasmid allowed the simultaneous expression of both genes in vitro. When used in combination with rAAV it resulted in long-term over-expression of both genes at equivalent locations in the hypothalamus of both Siberian hamsters and mice, without any adverse effects. Copyright © 2015 The Authors. Published by Elsevier B.V. All rights reserved.
2010-01-01
Background Cytochrome P450 monooxygenases (P450s) catalyze oxidation of various substrates using oxygen and NAD(P)H. Plant P450s are involved in the biosynthesis of primary and secondary metabolites performing diverse biological functions. The recent availability of the soybean genome sequence allows us to identify and analyze soybean putative P450s at a genome scale. Co-expression analysis using an available soybean microarray and Illumina sequencing data provides clues for functional annotation of these enzymes. This approach is based on the assumption that genes that have similar expression patterns across a set of conditions may have a functional relationship. Results We have identified a total number of 332 full-length P450 genes and 378 pseudogenes from the soybean genome. From the full-length sequences, 195 genes belong to A-type, which could be further divided into 20 families. The remaining 137 genes belong to non-A type P450s and are classified into 28 families. A total of 178 probe sets were found to correspond to P450 genes on the Affymetrix soybean array. Out of these probe sets, 108 represented single genes. Using the 28 publicly available microarray libraries that contain organ-specific information, some tissue-specific P450s were identified. Similarly, stress responsive soybean P450s were retrieved from 99 microarray soybean libraries. We also utilized Illumina transcriptome sequencing technology to analyze the expressions of all 332 soybean P450 genes. This dataset contains total RNAs isolated from nodules, roots, root tips, leaves, flowers, green pods, apical meristem, mock-inoculated and Bradyrhizobium japonicum-infected root hair cells. The tissue-specific expression patterns of these P450 genes were analyzed and the expression of a representative set of genes were confirmed by qRT-PCR. We performed the co-expression analysis on many of the 108 P450 genes on the Affymetrix arrays. First we confirmed that CYP93C5 (an isoflavone synthase gene) is co-expressed with several genes encoding isoflavonoid-related metabolic enzymes. We then focused on nodulation-induced P450s and found that CYP728H1 was co-expressed with the genes involved in phenylpropanoid metabolism. Similarly, CYP736A34 was highly co-expressed with lipoxygenase, lectin and CYP83D1, all of which are involved in root and nodule development. Conclusions The genome scale analysis of P450s in soybean reveals many unique features of these important enzymes in this crop although the functions of most of them are largely unknown. Gene co-expression analysis proves to be a useful tool to infer the function of uncharacterized genes. Our work presented here could provide important leads toward functional genomics studies of soybean P450s and their regulatory network through the integration of reverse genetics, biochemistry, and metabolic profiling tools. The identification of nodule-specific P450s and their further exploitation may help us to better understand the intriguing process of soybean and rhizobium interaction. PMID:21062474
Defining objective clusters for rabies virus sequences using affinity propagation clustering
Fischer, Susanne; Freuling, Conrad M.; Pfaff, Florian; Bodenhofer, Ulrich; Höper, Dirk; Fischer, Mareike; Marston, Denise A.; Fooks, Anthony R.; Mettenleiter, Thomas C.; Conraths, Franz J.; Homeier-Bachmann, Timo
2018-01-01
Rabies is caused by lyssaviruses, and is one of the oldest known zoonoses. In recent years, more than 21,000 nucleotide sequences of rabies viruses (RABV), from the prototype species rabies lyssavirus, have been deposited in public databases. Subsequent phylogenetic analyses in combination with metadata suggest geographic distributions of RABV. However, these analyses somewhat experience technical difficulties in defining verifiable criteria for cluster allocations in phylogenetic trees inviting for a more rational approach. Therefore, we applied a relatively new mathematical clustering algorythm named ‘affinity propagation clustering’ (AP) to propose a standardized sub-species classification utilizing full-genome RABV sequences. Because AP has the advantage that it is computationally fast and works for any meaningful measure of similarity between data samples, it has previously been applied successfully in bioinformatics, for analysis of microarray and gene expression data, however, cluster analysis of sequences is still in its infancy. Existing (516) and original (46) full genome RABV sequences were used to demonstrate the application of AP for RABV clustering. On a global scale, AP proposed four clusters, i.e. New World cluster, Arctic/Arctic-like, Cosmopolitan, and Asian as previously assigned by phylogenetic studies. By combining AP with established phylogenetic analyses, it is possible to resolve phylogenetic relationships between verifiably determined clusters and sequences. This workflow will be useful in confirming cluster distributions in a uniform transparent manner, not only for RABV, but also for other comparative sequence analyses. PMID:29357361
Benfey, PN; Takatsuji, H; Ren, L; Shah, DM; Chua, NH
1990-01-01
We have analyzed expression from deletion derivatives of the 5-enolpyruvylshikimate-3-phosphate synthase (EPSPS) 5[prime]-upstream region in transgenic petunia flowers and seedlings. In seedlings, expression was strongest in root cortex cells and in trichomes. High-level expression in petals and in seedling roots was conferred by large (>500 base-pair) stretches of sequence, but was lost when smaller fragments were analyzed individually. This apparent requirement for extensive sequence suggests that combinations of cis-elements that are widely separated control tissue-specific expression from the EPSPS promoter. We have also used the high-level, petal-specific expression of the EPSPS promoter to change petal color in two mutant petunia lines. PMID:12354968
Barling, Adam; Swaminathan, Kankshita; Mitros, Therese; James, Brandon T; Morris, Juliette; Ngamboma, Ornella; Hall, Megan C; Kirkpatrick, Jessica; Alabady, Magdy; Spence, Ashley K; Hudson, Matthew E; Rokhsar, Daniel S; Moose, Stephen P
2013-12-09
The Miscanthus genus of perennial C4 grasses contains promising biofuel crops for temperate climates. However, few genomic resources exist for Miscanthus, which limits understanding of its interesting biology and future genetic improvement. A comprehensive catalog of expressed sequences were generated from a variety of Miscanthus species and tissue types, with an emphasis on characterizing gene expression changes in spring compared to fall rhizomes. Illumina short read sequencing technology was used to produce transcriptome sequences from different tissues and organs during distinct developmental stages for multiple Miscanthus species, including Miscanthus sinensis, Miscanthus sacchariflorus, and their interspecific hybrid Miscanthus × giganteus. More than fifty billion base-pairs of Miscanthus transcript sequence were produced. Overall, 26,230 Sorghum gene models (i.e., ~ 96% of predicted Sorghum genes) had at least five Miscanthus reads mapped to them, suggesting that a large portion of the Miscanthus transcriptome is represented in this dataset. The Miscanthus × giganteus data was used to identify genes preferentially expressed in a single tissue, such as the spring rhizome, using Sorghum bicolor as a reference. Quantitative real-time PCR was used to verify examples of preferential expression predicted via RNA-Seq. Contiguous consensus transcript sequences were assembled for each species and annotated using InterProScan. Sequences from the assembled transcriptome were used to amplify genomic segments from a doubled haploid Miscanthus sinensis and from Miscanthus × giganteus to further disentangle the allelic and paralogous variations in genes. This large expressed sequence tag collection creates a valuable resource for the study of Miscanthus biology by providing detailed gene sequence information and tissue preferred expression patterns. We have successfully generated a database of transcriptome assemblies and demonstrated its use in the study of genes of interest. Analysis of gene expression profiles revealed biological pathways that exhibit altered regulation in spring compared to fall rhizomes, which are consistent with their different physiological functions. The expression profiles of the subterranean rhizome provides a better understanding of the biological activities of the underground stem structures that are essentials for perenniality and the storage or remobilization of carbon and nutrient resources.
de Souza, C R; Aragão, F J; Moreira, E C O; Costa, C N M; Nascimento, S B; Carvalho, L J
2009-03-24
Cassava is one of the most important tropical food crops for more than 600 million people worldwide. Transgenic technologies can be useful for increasing its nutritional value and its resistance to viral diseases and insect pests. However, tissue-specific promoters that guarantee correct expression of transgenes would be necessary. We used inverse polymerase chain reaction to isolate a promoter sequence of the Mec1 gene coding for Pt2L4, a glutamic acid-rich protein differentially expressed in cassava storage roots. In silico analysis revealed putative cis-acting regulatory elements within this promoter sequence, including root-specific elements that may be required for its expression in vascular tissues. Transient expression experiments showed that the Mec1 promoter is functional, since this sequence was able to drive GUS expression in bean embryonic axes. Results from our computational analysis can serve as a guide for functional experiments to identify regions with tissue-specific Mec1 promoter activity. The DNA sequence that we identified is a new promoter that could be a candidate for genetic engineering of cassava roots.
Mueller, Jakob C; Kuhl, Heiner; Timmermann, Bernd; Kempenaers, Bart
2016-03-01
Decoding genomic sequences and determining their variation within populations has potential to reveal adaptive processes and unravel the genetic basis of ecologically relevant trait variation within a species. The blue tit Cyanistes caeruleus--a long-time ecological model species--has been used to investigate fitness consequences of variation in mating and reproductive behaviour. However, very little is known about the underlying genetic changes due to natural and sexual selection in the genome of this songbird. As a step to bridge this gap, we assembled the first draft genome of a single blue tit, mapped the transcriptome of five females and five males to this reference, identified genomewide variants and performed sex-differential expression analysis in the gonads, brain and other tissues. In the gonads, we found a high number of sex-biased genes, and of those, a similar proportion were sex-limited (genes only expressed in one sex) in males and females. However, in the brain, the proportion of female-limited genes within the female-biased gene category (82%) was substantially higher than the proportion of male-limited genes within the male-biased category (6%). This suggests a predominant on-off switching mechanism for the female-limited genes. In addition, most male-biased genes were located on the Z-chromosome, indicating incomplete dosage compensation for the male-biased genes. We called more than 500,000 SNPs from the RNA-seq data. Heterozygote detection in the single reference individual was highly congruent between DNA-seq and RNA-seq calling. Using information from these polymorphisms, we identified potential selection signals in the genome. We list candidate genes which can be used for further sequencing and detailed selection studies, including genes potentially related to meiotic drive evolution. A public genome browser of the blue tit with the described information is available at http://public-genomes-ngs.molgen.mpg.de. © 2015 John Wiley & Sons Ltd.
Caste-Specific and Sex-Specific Expression of Chemoreceptor Genes in a Termite
Mikheyev, Alexander; Tin, Mandy M. Y.; Watanabe, Yutaka; Matsuura, Kenji
2016-01-01
The sophisticated colony organization of eusocial insects is primarily maintained through the utilization of pheromones. The regulation of these complex social interactions requires intricate chemoreception systems. The recent publication of the genome of Zootermopsis nevadensis opened a new avenue to study molecular basis of termite caste systems. Although there has been a growing interest in the termite chemoreception system that regulates their sophisticated caste system, the relationship between division of labor and expression of chemoreceptor genes remains to be explored. Using high-throughput mRNA sequencing (RNA-seq), we found several chemoreceptors that are differentially expressed among castes and between sexes in a subterranean termite Reticulitermes speratus. In total, 53 chemoreception-related genes were annotated, including 22 odorant receptors, 7 gustatory receptors, 12 ionotropic receptors, 9 odorant-binding proteins, and 3 chemosensory proteins. Most of the chemoreception-related genes had caste-related and sex-related expression patterns; in particular, some chemoreception genes showed king-biased or queen-biased expression patterns. Moreover, more than half of the genes showed significant age-dependent differences in their expression in female and/or male reproductives. These results reveal a strong relationship between the evolution of the division of labor and the regulation of chemoreceptor gene expression, thereby demonstrating the chemical communication and underlining chemoreception mechanism in social insects. PMID:26760975
Caste-Specific and Sex-Specific Expression of Chemoreceptor Genes in a Termite.
Mitaka, Yuki; Kobayashi, Kazuya; Mikheyev, Alexander; Tin, Mandy M Y; Watanabe, Yutaka; Matsuura, Kenji
2016-01-01
The sophisticated colony organization of eusocial insects is primarily maintained through the utilization of pheromones. The regulation of these complex social interactions requires intricate chemoreception systems. The recent publication of the genome of Zootermopsis nevadensis opened a new avenue to study molecular basis of termite caste systems. Although there has been a growing interest in the termite chemoreception system that regulates their sophisticated caste system, the relationship between division of labor and expression of chemoreceptor genes remains to be explored. Using high-throughput mRNA sequencing (RNA-seq), we found several chemoreceptors that are differentially expressed among castes and between sexes in a subterranean termite Reticulitermes speratus. In total, 53 chemoreception-related genes were annotated, including 22 odorant receptors, 7 gustatory receptors, 12 ionotropic receptors, 9 odorant-binding proteins, and 3 chemosensory proteins. Most of the chemoreception-related genes had caste-related and sex-related expression patterns; in particular, some chemoreception genes showed king-biased or queen-biased expression patterns. Moreover, more than half of the genes showed significant age-dependent differences in their expression in female and/or male reproductives. These results reveal a strong relationship between the evolution of the division of labor and the regulation of chemoreceptor gene expression, thereby demonstrating the chemical communication and underlining chemoreception mechanism in social insects.
Oliveira-Neto, Osmundo B; Batista, João A N; Rigden, Daniel J; Fragoso, Rodrigo R; Silva, Rodrigo O; Gomes, Eliane A; Franco, Octávio L; Dias, Simoni C; Cordeiro, Célia M T; Monnerat, Rose G; Grossi-De-Sá, Maria F
2004-09-01
Fourteen different cDNA fragments encoding serine proteinases were isolated by reverse transcription-PCR from cotton boll weevil (Anthonomus grandis) larvae. A large diversity between the sequences was observed, with a mean pairwise identity of 22% in the amino acid sequence. The cDNAs encompassed 11 trypsin-like sequences classifiable into three families and three chymotrypsin-like sequences belonging to a single family. Using a combination of 5' and 3' RACE, the full-length sequence was obtained for five of the cDNAs, named Agser2, Agser5, Agser6, Agser10 and Agser21. The encoded proteins included amino acid sequence motifs of serine proteinase active sites, conserved cysteine residues, and both zymogen activation and signal peptides. Southern blotting analysis suggested that one or two copies of these serine proteinase genes exist in the A. grandis genome. Northern blotting analysis of Agser2 and Agser5 showed that for both genes, expression is induced upon feeding and is concentrated in the gut of larvae and adult insects. Reverse northern analysis of the 14 cDNA fragments showed that only two trypsin-like and two chymotrypsin-like were expressed at detectable levels. Under the effect of the serine proteinase inhibitors soybean Kunitz trypsin inhibitor and black-eyed pea trypsin/chymotrypsin inhibitor, expression of one of the trypsin-like sequences was upregulated while expression of the two chymotrypsin-like sequences was downregulated. Copyright 2004 Elsevier Ltd.
Cloning, sequencing, and expression of cDNA for human. beta. -glucuronidase
DOE Office of Scientific and Technical Information (OSTI.GOV)
Oshima, A.; Kyle, J.W.; Miller, R.D.
1987-02-01
The authors report here the cDNA sequence for human placental ..beta..-glucuronidase (..beta..-D-glucuronoside glucuronosohydrolase, EC 3.2.1.31) and demonstrate expression of the human enzyme in transfected COS cells. They also sequenced a partial cDNA clone from human fibroblasts that contained a 153-base-pair deletion within the coding sequence and found a second type of cDNA clone from placenta that contained the same deletion. Nuclease S1 mapping studies demonstrated two types of mRNAs in human placenta that corresponded to the two types of cDNA clones isolated. The NH/sub 2/-terminal amino acid sequence determined for human spleen ..beta..-glucuronidase agreed with that inferred from the DNAmore » sequence of the two placental clones, beginning at amino acid 23, suggesting a cleaved signal sequence of 22 amino acids. When transfected into COS cells, plasmids containing either placental clone expressed an immunoprecipitable protein that contained N-linked oligosaccharides as evidenced by sensitivity to endoglycosidase F. However, only transfection with the clone containing the 153-base-pair segment led to expression of human ..beta..-glucuronidase activity. These studies provide the sequence for the full-length cDNA for human ..beta..-glucuronidase, demonstrate the existence of two populations of mRNA for ..beta..-glucuronidase in human placenta, only one of which specifies a catalytically active enzyme, and illustrate the importance of expression studies in verifying that a cDNA is functionally full-length.« less
2011-01-01
Background The genus Silene is widely used as a model system for addressing ecological and evolutionary questions in plants, but advances in using the genus as a model system are impeded by the lack of available resources for studying its genome. Massively parallel sequencing cDNA has recently developed into an efficient method for characterizing the transcriptomes of non-model organisms, generating massive amounts of data that enable the study of multiple species in a comparative framework. The sequences generated provide an excellent resource for identifying expressed genes, characterizing functional variation and developing molecular markers, thereby laying the foundations for future studies on gene sequence and gene expression divergence. Here, we report the results of a comparative transcriptome sequencing study of eight individuals representing four Silene and one Dianthus species as outgroup. All sequences and annotations have been deposited in a newly developed and publicly available database called SiESTa, the Silene EST annotation database. Results A total of 1,041,122 EST reads were generated in two runs on a Roche GS-FLX 454 pyrosequencing platform. EST reads were analyzed separately for all eight individuals sequenced and were assembled into contigs using TGICL. These were annotated with results from BLASTX searches and Gene Ontology (GO) terms, and thousands of single-nucleotide polymorphisms (SNPs) were characterized. Unassembled reads were kept as singletons and together with the contigs contributed to the unigenes characterized in each individual. The high quality of unigenes is evidenced by the proportion (49%) that have significant hits in similarity searches with the A. thaliana proteome. The SiESTa database is accessible at http://www.siesta.ethz.ch. Conclusion The sequence collections established in the present study provide an important genomic resource for four Silene and one Dianthus species and will help to further develop Silene as a plant model system. The genes characterized will be useful for future research not only in the species included in the present study, but also in related species for which no genomic resources are yet available. Our results demonstrate the efficiency of massively parallel transcriptome sequencing in a comparative framework as an approach for developing genomic resources in diverse groups of non-model organisms. PMID:21791039
Regulation of the Human Endogenous Retrovirus K (HML-2) Transcriptome by the HIV-1 Tat Protein
Gonzalez-Hernandez, Marta J.; Cavalcoli, James D.; Sartor, Maureen A.; Contreras-Galindo, Rafael; Meng, Fan; Dai, Manhong; Dube, Derek; Saha, Anjan K.; Gitlin, Scott D.; Omenn, Gilbert S.; Kaplan, Mark H.
2014-01-01
ABSTRACT Approximately 8% of the human genome is made up of endogenous retroviral sequences. As the HIV-1 Tat protein activates the overall expression of the human endogenous retrovirus type K (HERV-K) (HML-2), we used next-generation sequencing to determine which of the 91 currently annotated HERV-K (HML-2) proviruses are regulated by Tat. Transcriptome sequencing of total RNA isolated from Tat- and vehicle-treated peripheral blood lymphocytes from a healthy donor showed that Tat significantly activates expression of 26 unique HERV-K (HML-2) proviruses, silences 12, and does not significantly alter the expression of the remaining proviruses. Quantitative reverse transcription-PCR validation of the sequencing data was performed on Tat-treated PBLs of seven donors using provirus-specific primers and corroborated the results with a substantial degree of quantitative similarity. IMPORTANCE The expression of HERV-K (HML-2) is tightly regulated but becomes markedly increased following infection with HIV-1, in part due to the HIV-1 Tat protein. The findings reported here demonstrate the complexity of the genome-wide regulation of HERV-K (HML-2) expression by Tat. This work also demonstrates that although HERV-K (HML-2) proviruses in the human genome are highly similar in terms of DNA sequence, modulation of the expression of specific proviruses in a given biological situation can be ascertained using next-generation sequencing and bioinformatics analysis. PMID:24872592
Identification and expression profiles of the WRKY transcription factor family in Ricinus communis.
Li, Hui-Liang; Zhang, Liang-Bo; Guo, Dong; Li, Chang-Zhu; Peng, Shi-Qing
2012-07-25
In plants, WRKY proteins constitute a large family of transcription factors. They are involved in many biological processes, such as plant development, metabolism, and responses to biotic and abiotic stresses. A large number of WRKY transcription factors have been reported from Arabidopsis, rice, and other higher plants. The recent publication of the draft genome sequence of castor bean (Ricinus communis) has allowed a genome-wide search for R. communis WRKY (RcWRKY) transcription factors and the comparison of these positively identified proteins with their homologs in model plants. A total of 47 WRKY genes were identified in the castor bean genome. According to the structural features of the WRKY domain, the RcWRKY are classified into seven main phylogenetic groups. Furthermore, putative orthologs of RcWRKY proteins in Arabidopsis and rice could now be assigned. An analysis of expression profiles of RcWRKY genes indicates that 47 WRKY genes display differential expressions either in their transcript abundance or expression patterns under normal growth conditions. Copyright © 2012 Elsevier B.V. All rights reserved.
Howe, Douglas G.; Bradford, Yvonne M.; Eagle, Anne; Fashena, David; Frazer, Ken; Kalita, Patrick; Mani, Prita; Martin, Ryan; Moxon, Sierra Taylor; Paddock, Holly; Pich, Christian; Ramachandran, Sridhar; Ruzicka, Leyla; Schaper, Kevin; Shao, Xiang; Singer, Amy; Toro, Sabrina; Van Slyke, Ceri; Westerfield, Monte
2017-01-01
The Zebrafish Model Organism Database (ZFIN; http://zfin.org) is the central resource for zebrafish (Danio rerio) genetic, genomic, phenotypic and developmental data. ZFIN curators provide expert manual curation and integration of comprehensive data involving zebrafish genes, mutants, transgenic constructs and lines, phenotypes, genotypes, gene expressions, morpholinos, TALENs, CRISPRs, antibodies, anatomical structures, models of human disease and publications. We integrate curated, directly submitted, and collaboratively generated data, making these available to zebrafish research community. Among the vertebrate model organisms, zebrafish are superbly suited for rapid generation of sequence-targeted mutant lines, characterization of phenotypes including gene expression patterns, and generation of human disease models. The recent rapid adoption of zebrafish as human disease models is making management of these data particularly important to both the research and clinical communities. Here, we describe recent enhancements to ZFIN including use of the zebrafish experimental conditions ontology, ‘Fish’ records in the ZFIN database, support for gene expression phenotypes, models of human disease, mutation details at the DNA, RNA and protein levels, and updates to the ZFIN single box search. PMID:27899582
A review of recommendations for sequencing receptive and expressive language instruction.
Petursdottir, Anna Ingeborg; Carr, James E
2011-01-01
We review recommendations for sequencing instruction in receptive and expressive language objectives in early and intensive behavioral intervention (EIBI) programs. Several books recommend completing receptive protocols before introducing corresponding expressive protocols. However, this recommendation has little empirical support, and some evidence exists that the reverse sequence may be more efficient. Alternative recommendations include teaching receptive and expressive skills simultaneously (M. L. Sundberg & Partington, 1998) and building learning histories that lead to acquisition of receptive and expressive skills without direct instruction (Greer & Ross, 2008). Empirical support for these recommendations also is limited. Future research should assess the relative efficiency of receptive-before-expressive, expressive-before-receptive, and simultaneous training with children who have diagnoses of autism spectrum disorders. In addition, further evaluation is needed of the potential benefits of multiple-exemplar training and other variables that may influence the efficiency of receptive and expressive instruction.
A REVIEW OF RECOMMENDATIONS FOR SEQUENCING RECEPTIVE AND EXPRESSIVE LANGUAGE INSTRUCTION
Petursdottir, Anna Ingeborg; Carr, James E
2011-01-01
We review recommendations for sequencing instruction in receptive and expressive language objectives in early and intensive behavioral intervention (EIBI) programs. Several books recommend completing receptive protocols before introducing corresponding expressive protocols. However, this recommendation has little empirical support, and some evidence exists that the reverse sequence may be more efficient. Alternative recommendations include teaching receptive and expressive skills simultaneously (M. L. Sundberg & Partington, 1998) and building learning histories that lead to acquisition of receptive and expressive skills without direct instruction (Greer & Ross, 2008). Empirical support for these recommendations also is limited. Future research should assess the relative efficiency of receptive-before-expressive, expressive-before-receptive, and simultaneous training with children who have diagnoses of autism spectrum disorders. In addition, further evaluation is needed of the potential benefits of multiple-exemplar training and other variables that may influence the efficiency of receptive and expressive instruction. PMID:22219535
The contribution of 700,000 ORF sequence tags to the definition of the human transcriptome
Camargo, Anamaria A.; Samaia, Helena P. B.; Dias-Neto, Emmanuel; Simão, Daniel F.; Migotto, Italo A.; Briones, Marcelo R. S.; Costa, Fernando F.; Aparecida Nagai, Maria; Verjovski-Almeida, Sergio; Zago, Marco A.; Andrade, Luis Eduardo C.; Carrer, Helaine; El-Dorry, Hamza F. A.; Espreafico, Enilza M.; Habr-Gama, Angelita; Giannella-Neto, Daniel; Goldman, Gustavo H.; Gruber, Arthur; Hackel, Christine; Kimura, Edna T.; Maciel, Rui M. B.; Marie, Suely K. N.; Martins, Elizabeth A. L.; Nóbrega, Marina P.; Paçó-Larson, Maria Luisa; Pardini, Maria Inês M. C.; Pereira, Gonçalo G.; Pesquero, João Bosco; Rodrigues, Vanderlei; Rogatto, Silvia R.; da Silva, Ismael D. C. G.; Sogayar, Mari C.; Sonati, Maria de Fátima; Tajara, Eloiza H.; Valentini, Sandro R.; Alberto, Fernando L.; Amaral, Maria Elisabete J.; Aneas, Ivy; Arnaldi, Liliane A. T.; de Assis, Angela M.; Bengtson, Mário Henrique; Bergamo, Nadia Aparecida; Bombonato, Vanessa; de Camargo, Maria E. R.; Canevari, Renata A.; Carraro, Dirce M.; Cerutti, Janete M.; Corrêa, Maria Lucia C.; Corrêa, Rosana F. R.; Costa, Maria Cristina R.; Curcio, Cyntia; Hokama, Paula O. M.; Ferreira, Ari J. S.; Furuzawa, Gilberto K.; Gushiken, Tsieko; Ho, Paulo L.; Kimura, Elza; Krieger, José E.; Leite, Luciana C. C.; Majumder, Paromita; Marins, Mozart; Marques, Everaldo R.; Melo, Analy S. A.; Melo, Monica; Mestriner, Carlos Alberto; Miracca, Elisabete C.; Miranda, Daniela C.; Nascimento, Ana Lucia T. O.; Nóbrega, Francisco G.; Ojopi, Élida P. B.; Pandolfi, José Rodrigo C.; Pessoa, Luciana G.; Prevedel, Aline C.; Rahal, Paula; Rainho, Claudia A.; Reis, Eduardo M. R.; Ribeiro, Marcelo L.; da Rós, Nancy; de Sá, Renata G.; Sales, Magaly M.; Sant'anna, Simone Cristina; dos Santos, Mariana L.; da Silva, Aline M.; da Silva, Neusa P.; Silva, Wilson A.; da Silveira, Rosana A.; Sousa, Josane F.; Stecconi, Daniella; Tsukumo, Fernando; Valente, Valéria; Soares, Fernando; Moreira, Eloisa S.; Nunes, Diana N.; Correa, Ricardo G.; Zalcberg, Heloisa; Carvalho, Alex F.; Reis, Luis F. L.; Brentani, Ricardo R.; Simpson, Andrew J. G.; de Souza, Sandro J.
2001-01-01
Open reading frame expressed sequences tags (ORESTES) differ from conventional ESTs by providing sequence data from the central protein coding portion of transcripts. We generated a total of 696,745 ORESTES sequences from 24 human tissues and used a subset of the data that correspond to a set of 15,095 full-length mRNAs as a means of assessing the efficiency of the strategy and its potential contribution to the definition of the human transcriptome. We estimate that ORESTES sampled over 80% of all highly and moderately expressed, and between 40% and 50% of rarely expressed, human genes. In our most thoroughly sequenced tissue, the breast, the 130,000 ORESTES generated are derived from transcripts from an estimated 70% of all genes expressed in that tissue, with an equally efficient representation of both highly and poorly expressed genes. In this respect, we find that the capacity of the ORESTES strategy both for gene discovery and shotgun transcript sequence generation significantly exceeds that of conventional ESTs. The distribution of ORESTES is such that many human transcripts are now represented by a scaffold of partial sequences distributed along the length of each gene product. The experimental joining of the scaffold components, by reverse transcription–PCR, represents a direct route to transcript finishing that may represent a useful alternative to full-length cDNA cloning. PMID:11593022
The contribution of 700,000 ORF sequence tags to the definition of the human transcriptome.
Camargo, A A; Samaia, H P; Dias-Neto, E; Simão, D F; Migotto, I A; Briones, M R; Costa, F F; Nagai, M A; Verjovski-Almeida, S; Zago, M A; Andrade, L E; Carrer, H; El-Dorry, H F; Espreafico, E M; Habr-Gama, A; Giannella-Neto, D; Goldman, G H; Gruber, A; Hackel, C; Kimura, E T; Maciel, R M; Marie, S K; Martins, E A; Nobrega, M P; Paco-Larson, M L; Pardini, M I; Pereira, G G; Pesquero, J B; Rodrigues, V; Rogatto, S R; da Silva, I D; Sogayar, M C; Sonati, M F; Tajara, E H; Valentini, S R; Alberto, F L; Amaral, M E; Aneas, I; Arnaldi, L A; de Assis, A M; Bengtson, M H; Bergamo, N A; Bombonato, V; de Camargo, M E; Canevari, R A; Carraro, D M; Cerutti, J M; Correa, M L; Correa, R F; Costa, M C; Curcio, C; Hokama, P O; Ferreira, A J; Furuzawa, G K; Gushiken, T; Ho, P L; Kimura, E; Krieger, J E; Leite, L C; Majumder, P; Marins, M; Marques, E R; Melo, A S; Melo, M B; Mestriner, C A; Miracca, E C; Miranda, D C; Nascimento, A L; Nobrega, F G; Ojopi, E P; Pandolfi, J R; Pessoa, L G; Prevedel, A C; Rahal, P; Rainho, C A; Reis, E M; Ribeiro, M L; da Ros, N; de Sa, R G; Sales, M M; Sant'anna, S C; dos Santos, M L; da Silva, A M; da Silva, N P; Silva, W A; da Silveira, R A; Sousa, J F; Stecconi, D; Tsukumo, F; Valente, V; Soares, F; Moreira, E S; Nunes, D N; Correa, R G; Zalcberg, H; Carvalho, A F; Reis, L F; Brentani, R R; Simpson, A J; de Souza, S J; Melo, M
2001-10-09
Open reading frame expressed sequences tags (ORESTES) differ from conventional ESTs by providing sequence data from the central protein coding portion of transcripts. We generated a total of 696,745 ORESTES sequences from 24 human tissues and used a subset of the data that correspond to a set of 15,095 full-length mRNAs as a means of assessing the efficiency of the strategy and its potential contribution to the definition of the human transcriptome. We estimate that ORESTES sampled over 80% of all highly and moderately expressed, and between 40% and 50% of rarely expressed, human genes. In our most thoroughly sequenced tissue, the breast, the 130,000 ORESTES generated are derived from transcripts from an estimated 70% of all genes expressed in that tissue, with an equally efficient representation of both highly and poorly expressed genes. In this respect, we find that the capacity of the ORESTES strategy both for gene discovery and shotgun transcript sequence generation significantly exceeds that of conventional ESTs. The distribution of ORESTES is such that many human transcripts are now represented by a scaffold of partial sequences distributed along the length of each gene product. The experimental joining of the scaffold components, by reverse transcription-PCR, represents a direct route to transcript finishing that may represent a useful alternative to full-length cDNA cloning.
Hu, Yanhui; Sopko, Richelle; Foos, Marianna; Kelley, Colleen; Flockhart, Ian; Ammeux, Noemie; Wang, Xiaowei; Perkins, Lizabeth; Perrimon, Norbert; Mohr, Stephanie E.
2013-01-01
The evaluation of specific endogenous transcript levels is important for understanding transcriptional regulation. More specifically, it is useful for independent confirmation of results obtained by the use of microarray analysis or RNA-seq and for evaluating RNA interference (RNAi)-mediated gene knockdown. Designing specific and effective primers for high-quality, moderate-throughput evaluation of transcript levels, i.e., quantitative, real-time PCR (qPCR), is nontrivial. To meet community needs, predefined qPCR primer pairs for mammalian genes have been designed and sequences made available, e.g., via PrimerBank. In this work, we adapted and refined the algorithms used for the mammalian PrimerBank to design 45,417 primer pairs for 13,860 Drosophila melanogaster genes, with three or more primer pairs per gene. We experimentally validated primer pairs for ~300 randomly selected genes expressed in early Drosophila embryos, using SYBR Green-based qPCR and sequence analysis of products derived from conventional PCR. All relevant information, including primer sequences, isoform specificity, spatial transcript targeting, and any available validation results and/or user feedback, is available from an online database (www.flyrnai.org/flyprimerbank). At FlyPrimerBank, researchers can retrieve primer information for fly genes either one gene at a time or in batch mode. Importantly, we included the overlap of each predicted amplified sequence with RNAi reagents from several public resources, making it possible for researchers to choose primers suitable for knockdown evaluation of RNAi reagents (i.e., to avoid amplification of the RNAi reagent itself). We demonstrate the utility of this resource for validation of RNAi reagents in vivo. PMID:23893746
The immune gene repertoire of an important viral reservoir, the Australian black flying fox.
Papenfuss, Anthony T; Baker, Michelle L; Feng, Zhi-Ping; Tachedjian, Mary; Crameri, Gary; Cowled, Chris; Ng, Justin; Janardhana, Vijaya; Field, Hume E; Wang, Lin-Fa
2012-06-20
Bats are the natural reservoir host for a range of emerging and re-emerging viruses, including SARS-like coronaviruses, Ebola viruses, henipaviruses and Rabies viruses. However, the mechanisms responsible for the control of viral replication in bats are not understood and there is little information available on any aspect of antiviral immunity in bats. Massively parallel sequencing of the bat transcriptome provides the opportunity for rapid gene discovery. Although the genomes of one megabat and one microbat have now been sequenced to low coverage, no transcriptomic datasets have been reported from any bat species. In this study, we describe the immune transcriptome of the Australian flying fox, Pteropus alecto, providing an important resource for identification of genes involved in a range of activities including antiviral immunity. Towards understanding the adaptations that have allowed bats to coexist with viruses, we have de novo assembled transcriptome sequence from immune tissues and stimulated cells from P. alecto. We identified about 18,600 genes involved in a broad range of activities with the most highly expressed genes involved in cell growth and maintenance, enzyme activity, cellular components and metabolism and energy pathways. 3.5% of the bat transcribed genes corresponded to immune genes and a total of about 500 immune genes were identified, providing an overview of both innate and adaptive immunity. A small proportion of transcripts found no match with annotated sequences in any of the public databases and may represent bat-specific transcripts. This study represents the first reported bat transcriptome dataset and provides a survey of expressed bat genes that complement existing bat genomic data. In addition, these data provide insight into genes relevant to the antiviral responses of bats, and form a basis for examining the roles of these molecules in immune response to viral infection.
Gardiner, Jack; Schroeder, Steven; Polacco, Mary L.; Sanchez-Villeda, Hector; Fang, Zhiwei; Morgante, Michele; Landewe, Tim; Fengler, Kevin; Useche, Francisco; Hanafey, Michael; Tingey, Scott; Chou, Hugh; Wing, Rod; Soderlund, Carol; Coe, Edward H.
2004-01-01
Our goal is to construct a robust physical map for maize (Zea mays) comprehensively integrated with the genetic map. We have used a two-dimensional 24 × 24 overgo pooling strategy to anchor maize expressed sequence tagged (EST) unigenes to 165,888 bacterial artificial chromosomes (BACs) on high-density filters. A set of 70,716 public maize ESTs seeded derivation of 10,723 EST unigene assemblies. From these assemblies, 10,642 overgo sequences of 40 bp were applied as hybridization probes. BAC addresses were obtained for 9,371 overgo probes, representing an 88% success rate. More than 96% of the successful overgo probes identified two or more BACs, while 5% identified more than 50 BACs. The majority of BACs identified (79%) were hybridized with one or two overgos. A small number of BACs hybridized with eight or more overgos, suggesting that these BACs must be gene rich. Approximately 5,670 overgos identified BACs assembled within one contig, indicating that these probes are highly locus specific. A total of 1,795 megabases (Mb; 87%) of the total 2,050 Mb in BAC contigs were associated with one or more overgos, which are serving as sequence-tagged sites for single nucleotide polymorphism development. Overgo density ranged from less than one overgo per megabase to greater than 20 overgos per megabase. The majority of contigs (52%) hit by overgos contained three to nine overgos per megabase. Analysis of approximately 1,022 Mb of genetically anchored BAC contigs indicates that 9,003 of the total 13,900 overgo-contig sites are genetically anchored. Our results indicate overgos are a powerful approach for generating gene-specific hybridization probes that are facilitating the assembly of an integrated genetic and physical map for maize. PMID:15020742
Kayesh, E; Bilkish, N; Liu, G S; Chen, W; Leng, X P; Fang, J G
2014-03-31
Among different classes of molecular markers, expressed sequence tags (ESTs) are a new resource for developing simple sequence repeat (SSR) functional markers for genotyping and genetic mapping in F1 hybrid populations of Vitis vinifera L. Recently, because of the availability of an enormous amount of data for ESTs in the public domain, the emphasis has shifted from genomic SSRs to EST-SSRs, which belong to transcribed regions of the genome and may have a role in gene expression or function. The objective of this study was to assess the polymorphisms among 94 F1 hybrids from "Early Rose" and "Red Globe" using 25 EST-derived and 25 non-EST SSR markers. A total collection of 362,375 grape ESTs that were retrieved from the National Center for Biotechnology Information (NCBI) and 2522 EST-SSR sequences were identified. From them, 205 primer pairs were randomly selected, including 176 pairs that were EST-derived and 29 non-EST SSR primer pairs, for polymerase chain reaction amplification. A total of 131 alleles were amplified using 50 pairs of primers; 78 alleles were amplified using EST-derived SSR primers and 53 were from non-EST SSR primers. At most, 6 and 5 alleles were amplified by EST-derived and non-EST SSR primers, respectively. The EST-derived SSR markers showed a maximum polymorphic information content (PIC) value of 1 and a minimum of 0.33 while non-EST SSR markers had maximum and minimum PIC values of 1 and 0.25, respectively. The average PIC value was 0.56 for EST-derived SSR markers and 0.45 for non-EST SSR markers.
Mennigen, Jan A; Zhang, Dapeng
2016-12-01
Rainbow trout represent an important teleost research model and aquaculture species. As such, rainbow trout are employed in diverse areas of biological research, including basic biological disciplines such as comparative physiology, toxicology, and, since rainbow trout have undergone both teleost- and salmonid-specific rounds of genome duplication, molecular evolution. In recent years, microRNAs (miRNAs, small non-protein coding RNAs) have emerged as important posttranscriptional regulators of gene expression in animals. Given the increasingly recognized importance of miRNAs as an additional layer in the regulation of gene expression and hence biological function, recent efforts using RNA- and genome sequencing approaches have resulted in the creation of several resources for the construction of a comprehensive repertoire of rainbow trout miRNAs and isomiRs (variant miRNA sequences that all appear to derive from the same gene but vary in sequence due to post-transcriptional processing). Importantly, through the recent publication of the rainbow trout genome (Berthelot et al., 2014), mRNA 3'UTR information has become available, allowing for the first time the genome-wide prediction of miRNA-target RNA relationships in this species. We here report the creation of the microtrout database, a comprehensive resource for rainbow trout miRNA and annotated 3'UTRs. The comprehensive database was used to implement an algorithm to predict genome-wide rainbow trout-specific miRNA-mRNA target relationships, generating an improved predictive framework over previously published approaches. This work will serve as a useful framework and sequence resource to experimentally address the role of miRNAs in several research areas using the rainbow trout model, examples of which are discussed. Copyright © 2016 Elsevier Inc. All rights reserved.
Improving RNA-Seq expression estimation by modeling isoform- and exon-specific read sequencing rate.
Liu, Xuejun; Shi, Xinxin; Chen, Chunlin; Zhang, Li
2015-10-16
The high-throughput sequencing technology, RNA-Seq, has been widely used to quantify gene and isoform expression in the study of transcriptome in recent years. Accurate expression measurement from the millions or billions of short generated reads is obstructed by difficulties. One is ambiguous mapping of reads to reference transcriptome caused by alternative splicing. This increases the uncertainty in estimating isoform expression. The other is non-uniformity of read distribution along the reference transcriptome due to positional, sequencing, mappability and other undiscovered sources of biases. This violates the uniform assumption of read distribution for many expression calculation approaches, such as the direct RPKM calculation and Poisson-based models. Many methods have been proposed to address these difficulties. Some approaches employ latent variable models to discover the underlying pattern of read sequencing. However, most of these methods make bias correction based on surrounding sequence contents and share the bias models by all genes. They therefore cannot estimate gene- and isoform-specific biases as revealed by recent studies. We propose a latent variable model, NLDMseq, to estimate gene and isoform expression. Our method adopts latent variables to model the unknown isoforms, from which reads originate, and the underlying percentage of multiple spliced variants. The isoform- and exon-specific read sequencing biases are modeled to account for the non-uniformity of read distribution, and are identified by utilizing the replicate information of multiple lanes of a single library run. We employ simulation and real data to verify the performance of our method in terms of accuracy in the calculation of gene and isoform expression. Results show that NLDMseq obtains competitive gene and isoform expression compared to popular alternatives. Finally, the proposed method is applied to the detection of differential expression (DE) to show its usefulness in the downstream analysis. The proposed NLDMseq method provides an approach to accurately estimate gene and isoform expression from RNA-Seq data by modeling the isoform- and exon-specific read sequencing biases. It makes use of a latent variable model to discover the hidden pattern of read sequencing. We have shown that it works well in both simulations and real datasets, and has competitive performance compared to popular methods. The method has been implemented as a freely available software which can be found at https://github.com/PUGEA/NLDMseq.
Content of intrinsic disorder influences the outcome of cell-free protein synthesis.
Tokmakov, Alexander A; Kurotani, Atsushi; Ikeda, Mariko; Terazawa, Yumiko; Shirouzu, Mikako; Stefanov, Vasily; Sakurai, Tetsuya; Yokoyama, Shigeyuki
2015-09-11
Cell-free protein synthesis is used to produce proteins with various structural traits. Recent bioinformatics analyses indicate that more than half of eukaryotic proteins possess long intrinsically disordered regions. However, no systematic study concerning the connection between intrinsic disorder and expression success of cell-free protein synthesis has been presented until now. To address this issue, we examined correlations of the experimentally observed cell-free protein expression yields with the contents of intrinsic disorder bioinformatically predicted in the expressed sequences. This analysis revealed strong relationships between intrinsic disorder and protein amenability to heterologous cell-free expression. On the one hand, elevated disorder content was associated with the increased ratio of soluble expression. On the other hand, overall propensity for detectable protein expression decreased with disorder content. We further demonstrated that these tendencies are rooted in some distinct features of intrinsically disordered regions, such as low hydrophobicity, elevated surface accessibility and high abundance of sequence motifs for proteolytic degradation, including sites of ubiquitination and PEST sequences. Our findings suggest that identification of intrinsically disordered regions in the expressed amino acid sequences can be of practical use for predicting expression success and optimizing cell-free protein synthesis.
A re-evaluation of the final step of vanillin biosynthesis in the orchid Vanilla planifolia.
Yang, Hailian; Barros-Rios, Jaime; Kourteva, Galina; Rao, Xiaolan; Chen, Fang; Shen, Hui; Liu, Chenggang; Podstolski, Andrzej; Belanger, Faith; Havkin-Frenkel, Daphna; Dixon, Richard A
2017-07-01
A recent publication describes an enzyme from the vanilla orchid Vanilla planifolia with the ability to convert ferulic acid directly to vanillin. The authors propose that this represents the final step in the biosynthesis of vanillin, which is then converted to its storage form, glucovanillin, by glycosylation. The existence of such a "vanillin synthase" could enable biotechnological production of vanillin from ferulic acid using a "natural" vanilla enzyme. The proposed vanillin synthase exhibits high identity to cysteine proteases, and is identical at the protein sequence level to a protein identified in 2003 as being associated with the conversion of 4-coumaric acid to 4-hydroxybenzaldehyde. We here demonstrate that the recombinant cysteine protease-like protein, whether expressed in an in vitro transcription-translation system, E. coli, yeast, or plants, is unable to convert ferulic acid to vanillin. Rather, the protein is a component of an enzyme complex that preferentially converts 4-coumaric acid to 4-hydroxybenzaldehyde, as demonstrated by the purification of this complex and peptide sequencing. Furthermore, RNA sequencing provides evidence that this protein is expressed in many tissues of V. planifolia irrespective of whether or not they produce vanillin. On the basis of our results, V. planifolia does not appear to contain a cysteine protease-like "vanillin synthase" that can, by itself, directly convert ferulic acid to vanillin. The pathway to vanillin in V. planifolia is yet to be conclusively determined. Copyright © 2017 Elsevier Ltd. All rights reserved.
NCBI GEO: archive for high-throughput functional genomic data.
Barrett, Tanya; Troup, Dennis B; Wilhite, Stephen E; Ledoux, Pierre; Rudnev, Dmitry; Evangelista, Carlos; Kim, Irene F; Soboleva, Alexandra; Tomashevsky, Maxim; Marshall, Kimberly A; Phillippy, Katherine H; Sherman, Patti M; Muertter, Rolf N; Edgar, Ron
2009-01-01
The Gene Expression Omnibus (GEO) at the National Center for Biotechnology Information (NCBI) is the largest public repository for high-throughput gene expression data. Additionally, GEO hosts other categories of high-throughput functional genomic data, including those that examine genome copy number variations, chromatin structure, methylation status and transcription factor binding. These data are generated by the research community using high-throughput technologies like microarrays and, more recently, next-generation sequencing. The database has a flexible infrastructure that can capture fully annotated raw and processed data, enabling compliance with major community-derived scientific reporting standards such as 'Minimum Information About a Microarray Experiment' (MIAME). In addition to serving as a centralized data storage hub, GEO offers many tools and features that allow users to effectively explore, analyze and download expression data from both gene-centric and experiment-centric perspectives. This article summarizes the GEO repository structure, content and operating procedures, as well as recently introduced data mining features. GEO is freely accessible at http://www.ncbi.nlm.nih.gov/geo/.
Castle, John C.; Armour, Christopher D.; Löwer, Martin; Haynor, David; Biery, Matthew; Bouzek, Heather; Chen, Ronghua; Jackson, Stuart; Johnson, Jason M.; Rohl, Carol A.; Raymond, Christopher K.
2010-01-01
Non-coding RNAs (ncRNAs) are an essential class of molecular species that have been difficult to monitor on high throughput platforms due to frequent lack of polyadenylation. Using a polyadenylation-neutral amplification protocol and next-generation sequencing, we explore ncRNA expression in eleven human tissues. ncRNAs 7SL, U2, 7SK, and HBII-52 are expressed at levels far exceeding mRNAs. C/D and H/ACA box snoRNAs are associated with rRNA methylation and pseudouridylation, respectively: spleen expresses both, hypothalamus expresses mainly C/D box snoRNAs, and testes show enriched expression of both H/ACA box snoRNAs and RNA telomerase TERC. Within the snoRNA 14q cluster, 14q(I-6) is expressed at much higher levels than other cluster members. More reads align to mitochondrial than nuclear tRNAs. Many lincRNAs are actively transcribed, particularly those overlapping known ncRNAs. Within the Prader-Willi syndrome loci, the snoRNA HBII-85 (group I) cluster is highly expressed in hypothalamus, greater than in other tissues and greater than group II or III. Additionally, within the disease locus we find novel transcription across a 400,000 nt span in ovaries. This genome-wide polyA-neutral expression compendium demonstrates the richness of ncRNA expression, their high expression patterns, their function-specific expression patterns, and is publicly available. PMID:20668672
A High-Resolution Enhancer Atlas of the Developing Telencephalon
Visel, Axel; Taher, Leila; Girgis, Hani; May, Dalit; Golonzhka, Olga; Hoch, Renee; McKinsey, Gabriel L.; Pattabiraman, Kartik; Silberberg, Shanni N.; Blow, Matthew J.; Hansen, David V.; Nord, Alex S.; Akiyama, Jennifer A.; Holt, Amy; Hosseini, Roya; Phouanenavong, Sengthavy; Plajzer-Frick, Ingrid; Shoukry, Malak; Afzal, Veena; Kaplan, Tommy; Kriegstein, Arnold R.; Rubin, Edward M.; Ovcharenko, Ivan; Pennacchio, Len A.; Rubenstein, John L. R.
2013-01-01
Summary The mammalian telencephalon plays critical roles in cognition, motor function, and emotion. While many of the genes required for its development have been identified, the distant-acting regulatory sequences orchestrating their in vivo expression are mostly unknown. Here we describe a digital atlas of in vivo enhancers active in subregions of the developing telencephalon. We identified over 4,600 candidate embryonic forebrain enhancers and studied the in vivo activity of 329 of these sequences in transgenic mouse embryos. We generated serial sets of histological brain sections for 145 reproducible forebrain enhancers, resulting in a publicly accessible web-based data collection comprising over 32,000 sections. We also used epigenomic analysis of human and mouse cortex tissue to directly compare the genome-wide enhancer architecture in these species. These data provide a primary resource for investigating gene regulatory mechanisms of telencephalon development and enable studies of the role of distant-acting enhancers in neurodevelopmental disorders. PMID:23375746
Neisseria meningitidis; clones, carriage, and disease.
Read, R C
2014-05-01
Neisseria meningitidis, the cause of meningococcal disease, has been the subject of sophisticated molecular epidemiological investigation as a consequence of the significant public health threat posed by this organism. The use of multilocus sequence typing and whole genome sequencing classifies the organism into clonal complexes. Extensive phenotypic, genotypic and epidemiological information is available on the PubMLST website. The human nasopharynx is the sole ecological niche of this species, and carrier isolates show extensive genetic diversity as compared with hyperinvasive lineages. Horizontal gene exchange and recombinant events within the meningococcal genome during residence in the human nasopharynx result in antigenic diversity even within clonal complexes, so that individual clones may express, for example, more than one capsular polysaccharide (serogroup). Successful clones are capable of wide global dissemination, and may be associated with explosive epidemics of invasive disease. © 2014 The Author Clinical Microbiology and Infection © 2014 European Society of Clinical Microbiology and Infectious Diseases.
IPD—the Immuno Polymorphism Database
Robinson, James; Halliwell, Jason A.; McWilliam, Hamish; Lopez, Rodrigo; Marsh, Steven G. E.
2013-01-01
The Immuno Polymorphism Database (IPD), http://www.ebi.ac.uk/ipd/ is a set of specialist databases related to the study of polymorphic genes in the immune system. The IPD project works with specialist groups or nomenclature committees who provide and curate individual sections before they are submitted to IPD for online publication. The IPD project stores all the data in a set of related databases. IPD currently consists of four databases: IPD-KIR, contains the allelic sequences of killer-cell immunoglobulin-like receptors, IPD-MHC, a database of sequences of the major histocompatibility complex of different species; IPD-HPA, alloantigens expressed only on platelets; and IPD-ESTDAB, which provides access to the European Searchable Tumour Cell-Line Database, a cell bank of immunologically characterized melanoma cell lines. The data is currently available online from the website and FTP directory. This article describes the latest updates and additional tools added to the IPD project. PMID:23180793
PWMScan: a fast tool for scanning entire genomes with a position-specific weight matrix.
Ambrosini, Giovanna; Groux, Romain; Bucher, Philipp
2018-03-05
Transcription factors (TFs) regulate gene expression by binding to specific short DNA sequences of 5 to 20-bp to regulate the rate of transcription of genetic information from DNA to messenger RNA. We present PWMScan, a fast web-based tool to scan server-resident genomes for matches to a user-supplied PWM or TF binding site model from a public database. The web server and source code are available at http://ccg.vital-it.ch/pwmscan and https://sourceforge.net/projects/pwmscan, respectively. giovanna.ambrosini@epfl.ch. SUPPLEMENTARY DATA ARE AVAILABLE AT BIOINFORMATICS ONLINE.
From Genome to Function: Systematic Analysis of the Soil Bacterium Bacillus Subtilis
Crawshaw, Samuel G.; Wipat, Anil
2001-01-01
Bacillus subtilis is a sporulating Gram-positive bacterium that lives primarily in the soil and associated water sources. Whilst this bacterium has been studied extensively in the laboratory, relatively few studies have been undertaken to study its activity in natural environments. The publication of the B. subtilis genome sequence and subsequent systematic functional analysis programme have provided an opportunity to develop tools for analysing the role and expression of Bacillus genes in situ. In this paper we discuss analytical approaches that are being developed to relate genes to function in environments such as the rhizosphere. PMID:18628943
Bioinformatic mining of EST-SSR loci in the Pacific oyster, Crassostrea gigas.
Wang, Y; Ren, R; Yu, Z
2008-06-01
A set of expressed sequence tag-simple sequence repeat (EST-SSR) markers of the Pacific oyster, Crassostrea gigas, was developed through bioinformatic mining of the GenBank public database. As of June 30, 2007, a total of 5132 EST sequences from GenBank were downloaded and screened for di-, tri- and tetra-nucleotide repeats, with criteria set at a minimum of 5, 4 and 4 repeats for the three categories of SSRs respectively. Seventeen polymorphic microsatellite markers were characterized. Allele numbers ranged from 3 to 10, and the observed and expected heterozygosity values varied from 0.125 to 0.770 and from 0.113 to 0.732 respectively. Eleven loci were at Hardy-Weinberg equilibrium (HWE); the other six loci showed significant departure from HWE (P < 0.01), suggesting possible presence of null alleles. Pairwise check of linkage disequilibrium (LD) indicated that 11 of 136 pairs of loci showed significant LD (P < 0.01), likely due to HWE present in single markers. Cross-species amplification was examined for five other Crassostrea species and reasonable results were obtained, promising usefulness of these markers in oyster genetics.
Gonzalez, Sergio; Clavijo, Bernardo; Rivarola, Máximo; Moreno, Patricio; Fernandez, Paula; Dopazo, Joaquín; Paniego, Norma
2017-02-22
In the last years, applications based on massively parallelized RNA sequencing (RNA-seq) have become valuable approaches for studying non-model species, e.g., without a fully sequenced genome. RNA-seq is a useful tool for detecting novel transcripts and genetic variations and for evaluating differential gene expression by digital measurements. The large and complex datasets resulting from functional genomic experiments represent a challenge in data processing, management, and analysis. This problem is especially significant for small research groups working with non-model species. We developed a web-based application, called ATGC transcriptomics, with a flexible and adaptable interface that allows users to work with new generation sequencing (NGS) transcriptomic analysis results using an ontology-driven database. This new application simplifies data exploration, visualization, and integration for a better comprehension of the results. ATGC transcriptomics provides access to non-expert computer users and small research groups to a scalable storage option and simple data integration, including database administration and management. The software is freely available under the terms of GNU public license at http://atgcinta.sourceforge.net .
Linking microarray reporters with protein functions
Gaj, Stan; van Erk, Arie; van Haaften, Rachel IM; Evelo, Chris TA
2007-01-01
Background The analysis of microarray experiments requires accurate and up-to-date functional annotation of the microarray reporters to optimize the interpretation of the biological processes involved. Pathway visualization tools are used to connect gene expression data with existing biological pathways by using specific database identifiers that link reporters with elements in the pathways. Results This paper proposes a novel method that aims to improve microarray reporter annotation by BLASTing the original reporter sequences against a species-specific EMBL subset, that was derived from and crosslinked back to the highly curated UniProt database. The resulting alignments were filtered using high quality alignment criteria and further compared with the outcome of a more traditional approach, where reporter sequences were BLASTed against EnsEMBL followed by locating the corresponding protein (UniProt) entry for the high quality hits. Combining the results of both methods resulted in successful annotation of > 58% of all reporter sequences with UniProt IDs on two commercial array platforms, increasing the amount of Incyte reporters that could be coupled to Gene Ontology terms from 32.7% to 58.3% and to a local GenMAPP pathway from 9.6% to 16.7%. For Agilent, 35.3% of the total reporters are now linked towards GO nodes and 7.1% on local pathways. Conclusion Our methods increased the annotation quality of microarray reporter sequences and allowed us to visualize more reporters using pathway visualization tools. Even in cases where the original reporter annotation showed the correct description the new identifiers often allowed improved pathway and Gene Ontology linking. These methods are freely available at http://www.bigcat.unimaas.nl/public/publications/Gaj_Annotation/. PMID:17897448
USDA-ARS?s Scientific Manuscript database
Background: Due to a relatively high level of codominant inheritance and transferability within and among taxonomic groups, simple sequence repeat (SSR) markers are important elements in comparative mapping and delineation of genomic regions associated with traits of economic importance. Expressed S...
Itzhaki, H; Maxson, J M; Woodson, W R
1994-09-13
The increased production of ethylene during carnation petal senescence regulates the transcription of the GST1 gene encoding a subunit of glutathione-S-transferase. We have investigated the molecular basis for this ethylene-responsive transcription by examining the cis elements and trans-acting factors involved in the expression of the GST1 gene. Transient expression assays following delivery of GST1 5' flanking DNA fused to a beta-glucuronidase receptor gene were used to functionally define sequences responsible for ethylene-responsive expression. Deletion analysis of the 5' flanking sequences of GST1 identified a single positive regulatory element of 197 bp between -667 and -470 necessary for ethylene-responsive expression. The sequences within this ethylene-responsive region were further localized to 126 bp between -596 and -470. The ethylene-responsive element (ERE) within this region conferred ethylene-regulated expression upon a minimal cauliflower mosaic virus-35S TATA-box promoter in an orientation-independent manner. Gel electrophoresis mobility-shift assays and DNase I footprinting were used to identify proteins that bind to sequences within the ERE. Nuclear proteins from carnation petals were shown to specifically interact with the 126-bp ERE and the presence and binding of these proteins were independent of ethylene or petal senescence. DNase I footprinting defined DNA sequences between -510 and -488 within the ERE specifically protected by bound protein. An 8-bp sequence (ATTTCAAA) within the protected region shares significant homology with promoter sequences required for ethylene responsiveness from the tomato fruit-ripening E4 gene.
Wittenberger, T; Schaller, H C; Hellebrand, S
2001-03-30
We have developed a comprehensive expressed sequence tag database search method and used it for the identification of new members of the G-protein coupled receptor superfamily. Our approach proved to be especially useful for the detection of expressed sequence tag sequences that do not encode conserved parts of a protein, making it an ideal tool for the identification of members of divergent protein families or of protein parts without conserved domain structures in the expressed sequence tag database. At least 14 of the expressed sequence tags found with this strategy are promising candidates for new putative G-protein coupled receptors. Here, we describe the sequence and expression analysis of five new members of this receptor superfamily, namely GPR84, GPR86, GPR87, GPR90 and GPR91. We also studied the genomic structure and chromosomal localization of the respective genes applying in silico methods. A cluster of six closely related G-protein coupled receptors was found on the human chromosome 3q24-3q25. It consists of four orphan receptors (GPR86, GPR87, GPR91, and H963), the purinergic receptor P2Y1, and the uridine 5'-diphosphoglucose receptor KIAA0001. It seems likely that these receptors evolved from a common ancestor and therefore might have related ligands. In conclusion, we describe a data mining procedure that proved to be useful for the identification and first characterization of new genes and is well applicable for other gene families. Copyright 2001 Academic Press.
Design and construction of 2A peptide-linked multicistronic vectors.
Szymczak-Workman, Andrea L; Vignali, Kate M; Vignali, Dario A A
2012-02-01
The need for reliable, multicistronic vectors for multigene delivery is at the forefront of biomedical technology. This article describes the design and construction of 2A peptide-linked multicistronic vectors, which can be used to express multiple proteins from a single open reading frame (ORF). The small 2A peptide sequences, when cloned between genes, allow for efficient, stoichiometric production of discrete protein products within a single vector through a novel "cleavage" event within the 2A peptide sequence. Expression of more than two genes using conventional approaches has several limitations, most notably imbalanced protein expression and large size. The use of 2A peptide sequences alleviates these concerns. They are small (18-22 amino acids) and have divergent amino-terminal sequences, which minimizes the chance for homologous recombination and allows for multiple, different 2A peptide sequences to be used within a single vector. Importantly, separation of genes placed between 2A peptide sequences is nearly 100%, which allows for stoichiometric and concordant expression of the genes, regardless of the order of placement within the vector.
Senescence responsive transcriptional element
Campisi, Judith; Testori, Alessandro
1999-01-01
Recombinant polynucleotides have expression control sequences that have a senescence responsive element and a minimal promoter, and which are operatively linked to a heterologous nucleotide sequence. The molecules are useful for achieving high levels of expression of genes in senescent cells. Methods of inhibiting expression of genes in senescent cells also are provided.
Candidate Genes Expressed in Tolerant Common Wheat With Resistant to English Grain Aphid.
Luo, Kun; Zhang, Gaisheng; Wang, Chunping; Ouellet, Thérèse; Wu, Jingjing; Zhu, Qidi; Zhao, Huiyan
2014-10-01
The English grain aphid, Sitobion avenae (F.) (Hemiptera: Aphididae), is a common worldwide pest of wheat (Triticum aestivum L.). The use of improved resistant cultivars by the farmers is the most effective and environmentally friendly method to control this aphid in the field. The winter wheat genotypes 98-10-35 and Amigo are resistant to S. avenae. To identify genes responsible for resistance to S. avenae in these genotypes, differential-display reverse transcription-polymerase chain reaction was used to identify the corresponding differentially expressed sequences in current study. Two backcross progenies were obtained by crossing the two resistant genotypes with the susceptible genotype 1376. Six potential expected-differential bands were sequenced. Lengths of the expressed sequence tags ranged from 128 to 532 bp. Although these expressed sequences were likely associated with S. avenae resistance, there was one expressed sequence tag located on 7DL chromosome, and its potential function may associate with the ability to maintain photosynthesis in wheat. That serves as an active way for tolerant common wheat with resistant to S. avenae. Cloning the full length of these sequences would help us thoroughly understand the mechanism of wheat resistance to S. avenae and be valuable for breeding cultivars with S. avenae resistance. © 2014 Entomological Society of America.
Metal resistance sequences and transgenic plants
Meagher, Richard Brian; Summers, Anne O.; Rugh, Clayton L.
1999-10-12
The present invention provides nucleic acid sequences encoding a metal ion resistance protein, which are expressible in plant cells. The metal resistance protein provides for the enzymatic reduction of metal ions including but not limited to divalent Cu, divalent mercury, trivalent gold, divalent cadmium, lead ions and monovalent silver ions. Transgenic plants which express these coding sequences exhibit increased resistance to metal ions in the environment as compared with plants which have not been so genetically modified. Transgenic plants with improved resistance to organometals including alkylmercury compounds, among others, are provided by the further inclusion of plant-expressible organometal lyase coding sequences, as specifically exemplified by the plant-expressible merB coding sequence. Furthermore, these transgenic plants which have been genetically modified to express the metal resistance coding sequences of the present invention can participate in the bioremediation of metal contamination via the enzymatic reduction of metal ions. Transgenic plants resistant to organometals can further mediate remediation of organic metal compounds, for example, alkylmetal compounds including but not limited to methyl mercury, methyl lead compounds, methyl cadmium and methyl arsenic compounds, in the environment by causing the freeing of mercuric or other metal ions and the reduction of the ionic mercury or other metal ions to the less toxic elemental mercury or other metals.
Curated eutherian third party data gene data sets.
Premzl, Marko
2016-03-01
The free available eutherian genomic sequence data sets advanced scientific field of genomics. Of note, future revisions of gene data sets were expected, due to incompleteness of public eutherian genomic sequence assemblies and potential genomic sequence errors. The eutherian comparative genomic analysis protocol was proposed as guidance in protection against potential genomic sequence errors in public eutherian genomic sequences. The protocol was applicable in updates of 7 major eutherian gene data sets, including 812 complete coding sequences deposited in European Nucleotide Archive as curated third party data gene data sets.
Subang, Maria C; Fatah, Rewas; Wu, Ying; Hannaman, Drew; Rice, Jason; Evans, Claire F; Chernajovsky, Yuti; Gould, David
2015-01-01
Immune responses to expressed foreign transgenes continue to hamper progress of gene therapy development. Translated foreign proteins with intracellular location are generally less accessible to the immune system, nevertheless they can be presented to the immune system through both MHC Class I and Class II pathways. When the foreign protein luciferase was expressed following intramuscular delivery of plasmid DNA in outbred mice, expression rapidly declined over 4 weeks. Through modifications to the expression plasmid and the luciferase transgene we examined the effect of detargeting expression away from antigen-presenting cells (APCs), targeting expression to skeletal muscle and fusion with glycine-alanine repeats (GAr) that block MHC-Class I presentation on the duration of luciferase expression. De-targeting expression from APCs with miR142-3p target sequences incorporated into the luciferase 3'UTR reduced the humoral immune response to both native and luciferase modified with a short GAr sequence but did not prolong the duration of expression. When a skeletal muscle specific promoter was combined with the miR target sequences the humoral immune response was dampened and luciferase expression persisted at higher levels for longer. Interestingly, fusion of luciferase with a longer GAr sequence promoted the decline in luciferase expression and increased the humoral immune response to luciferase. These studies demonstrate that expression elements and transgene modifications can alter the duration of transgene expression but other factors will need to overcome before foreign transgenes expressed in skeletal muscle are immunologically silent.
Wide-Open: Accelerating public data release by automating detection of overdue datasets
Poon, Hoifung; Howe, Bill
2017-01-01
Open data is a vital pillar of open science and a key enabler for reproducibility, data reuse, and novel discoveries. Enforcement of open-data policies, however, largely relies on manual efforts, which invariably lag behind the increasingly automated generation of biological data. To address this problem, we developed a general approach to automatically identify datasets overdue for public release by applying text mining to identify dataset references in published articles and parse query results from repositories to determine if the datasets remain private. We demonstrate the effectiveness of this approach on 2 popular National Center for Biotechnology Information (NCBI) repositories: Gene Expression Omnibus (GEO) and Sequence Read Archive (SRA). Our Wide-Open system identified a large number of overdue datasets, which spurred administrators to respond directly by releasing 400 datasets in one week. PMID:28594819
Wide-Open: Accelerating public data release by automating detection of overdue datasets.
Grechkin, Maxim; Poon, Hoifung; Howe, Bill
2017-06-01
Open data is a vital pillar of open science and a key enabler for reproducibility, data reuse, and novel discoveries. Enforcement of open-data policies, however, largely relies on manual efforts, which invariably lag behind the increasingly automated generation of biological data. To address this problem, we developed a general approach to automatically identify datasets overdue for public release by applying text mining to identify dataset references in published articles and parse query results from repositories to determine if the datasets remain private. We demonstrate the effectiveness of this approach on 2 popular National Center for Biotechnology Information (NCBI) repositories: Gene Expression Omnibus (GEO) and Sequence Read Archive (SRA). Our Wide-Open system identified a large number of overdue datasets, which spurred administrators to respond directly by releasing 400 datasets in one week.
Streaming fragment assignment for real-time analysis of sequencing experiments
Roberts, Adam; Pachter, Lior
2013-01-01
We present eXpress, a software package for highly efficient probabilistic assignment of ambiguously mapping sequenced fragments. eXpress uses a streaming algorithm with linear run time and constant memory use. It can determine abundances of sequenced molecules in real time, and can be applied to ChIP-seq, metagenomics and other large-scale sequencing data. We demonstrate its use on RNA-seq data, showing greater efficiency than other quantification methods. PMID:23160280
Harraghy, Niamh; Homerova, Dagmar; Herrmann, Mathias; Kormanec, Jan
2008-01-01
Mapping the transcription start points of the eap, emp, and vwb promoters revealed a conserved octanucleotide sequence (COS). Deleting this sequence abolished the expression of eap, emp, and vwb. However, electrophoretic mobility shift assays gave no evidence that this sequence was a binding site for SarA or SaeR, known regulators of eap and emp.
Rethinking the Undergraduate Public Relations Sequence: Evolution of Thought 1975-1995.
ERIC Educational Resources Information Center
Fischer, Rick
Public relations sequence heads have the luxury of a strong and supportive foundation on which to build a program of instruction. The field has a rich collection of thinking and recommendations relating to public relations education. The Association for Education in Journalism (AEJ) and the Public Relations Society of America (PRSA) conducted a…
Raventós, D; Jensen, A B; Rask, M B; Casacuberta, J M; Mundy, J; San Segundo, B
1995-01-01
Transient gene expression assays in barley aleurone protoplasts were used to identify a cis-regulatory element involved in the elicitor-responsive expression of the maize PRms gene. Analysis of transcriptional fusions between PRms 5' upstream sequences and a chloramphenicol acetyltransferase reporter gene, as well as chimeric promoters containing PRms promoter fragments or repeated oligonucleotides fused to a minimal promoter, delineated a 20 bp sequence which functioned as an elicitor-response element (ERE). This sequence contains a motif (-246 AATTGACC) similar to sequences found in promoters of other pathogen-responsive genes. The analysis also indicated that an enhancing sequence(s) between -397 and -296 is required for full PRms activation by elicitors. The protein kinase inhibitor staurosporine was found to completely block the transcriptional activation induced by elicitors. These data indicate that protein phosphorylation is involved in the signal transduction pathway leading to PRms expression.
Stenman, G; Anisowicz, A; Sager, R
1988-11-01
The KRAS gene is constitutionally amplified in the Chinese hamster. We have mapped the amplified sequences by in situ hybridization to two major sites on the X and Y chromosomes, Xq4 and Yp2. No autosomal site was detected despite a search under relaxed hybridization conditions. KRAS DNA is amplified about 50-fold compared to a human cell line known to have a diploid number of KRAS sequences, whereas mRNA expression is 5- to 10-fold lower than in normal human cells. While mRNA expression levels do not necessarily parallel gene copy number, the low expression level strongly suggests that the amplified sequences are transcriptionally silent. It is suggested that the amplified sequences arose from the original KRAS gene on chromosome 8 and that the KRAS sequences on the Y chromosome arose by X-Y recombination.
Li, De-Zhu; Guo, Zhen-Hua
2012-01-01
Background Transcriptome sequencing can be used to determine gene sequences and transcript abundance in non-model species, and the advent of next-generation sequencing (NGS) technologies has greatly decreased the cost and time required for this process. Transcriptome data are especially desirable in bamboo species, as certain members constitute an economically and culturally important group of mostly semelparous plants with remarkable flowering features, yet little bamboo genomic research has been performed. Here we present, for the first time, extensive sequence and transcript abundance data for the floral transcriptome of a key bamboo species, Dendrocalamus latiflorus, obtained using the Illumina GAII sequencing platform. Our further goal was to identify patterns of gene expression during bamboo flower development. Results Approximately 96 million sequencing reads were generated and assembled de novo, yielding 146,395 high quality unigenes with an average length of 461 bp. Of these, 80,418 were identified as putative homologs of annotated sequences in the public protein databases, of which 290 were associated with the floral transition and 47 were related to flower development. Digital abundance analysis identified 26,529 transcripts differentially enriched between two developmental stages, young flower buds and older developing flowers. Unigenes found at each stage were categorized according to their putative functional categories. These sequence and putative function data comprise a resource for future investigation of the floral transition and flower development in bamboo species. Conclusions Our results present the first broad survey of a bamboo floral transcriptome. Although it will be necessary to validate the functions carried out by these genes, these results represent a starting point for future functional research on D. latiflorus and related species. PMID:22916120
Song, Wei; Jiang, Keji; Zhang, Fengying; Lin, Yu; Ma, Lingbo
2016-08-08
Acipenser baeri, one of the critically endangered animals on the verge of extinction, is a key species for evolutionary, developmental, physiology and conservation studies and a standout amongst the most important food products worldwide. Though the transcriptome of the early development of A. baeri has been published recently, the transcriptome changes occurring in the transition from embryonic to late stages are still unknown. The aim of this work was to analyze the transcriptomes of embryonic and post-embryonic stages of A. baeri and identify differentially expressed genes (DEGs) and their expression patterns using mRNA collected from specimens at big yolk plug, wide neural plate and 64 day old sturgeon developmental stages for RNA-Seq. The paired-end sequencing of the transcriptome of samples of A. baeri collected at two early (big yolk plug (T1, 32 h after fertilization) and wide neural plate formation (T2, 45 h after fertilization)) and one late (T22, 64 day old sturgeon) developmental stages using Illumina Hiseq2000 platform generated 64039846, 64635214 and 75293762 clean paired-end reads for T1, T2 and T22, respectively. After quality control, the sequencing reads were de novo assembled to generate a set of 149,265 unigenes with N50 value of 1277 bp. Functional annotation indicated that a substantial number of these unigenes had significant similarity with proteins in public databases. Differential expression profiling allowed the identification of 2789, 12,819 and 10,824 DEGs from the respective T1 vs. T2, T1 vs. T22 and T2 vs. T22 comparisons. High correlation of DEGs' features was recorded among early stages while significant divergences were observed when comparing the late stage with early stages. GO and KEGG enrichment analyses revealed the biological processes, cellular component, molecular functions and metabolic pathways associated with identified DEGs. The qRT-PCR performed for candidate genes in specimens confirmed the validity of the RNA-seq data. This study presents, for the first time, an extensive overview of RNA-Seq based characterization of the early and post-embryonic developmental transcriptomes of A. baeri and provided 149,265 gene sequences that will be potentially valuable for future molecular and genetic studies in A. baeri.
Zhao, Daqiu; Jiang, Yao; Ning, Chuanlong; Meng, Jiasong; Lin, Shasha; Ding, Wen; Tao, Jun
2014-08-19
Herbaceous peony (Paeonia lactiflora Pall.) is a traditional flower in China and a wedding attractive flower in worldwide. In its flower colour, yellow is the rarest which is ten times the price of the other colours. However, the breeding of new yellow P. lactiflora varieties using genetic engineering is severely limited due to the little-known biochemical and molecular mechanisms underlying its characteristic formation. In this study, two cDNA libraries generated from P. lactiflora chimaera with red outer-petal and yellow inner-petal were sequenced using an Illumina HiSeq™ 2000 platform. 66,179,398 and 65,481,444 total raw reads from red outer-petal and yellow inner-petal cDNA libraries were generated, which were assembled into 61,431 and 70,359 Unigenes with an average length of 628 and 617 nt, respectively. Moreover, 61,408 non-redundant All-unigenes were obtained, with 37,511 All-unigenes (61.08%) annotated in public databases. In addition, 6,345 All-unigenes were differentially expressed between the red outer-petal and yellow inner-petal, with 3,899 up-regulated and 2,446 down-regulated All-unigenes, and the flavonoid metabolic pathway related to colour development was identified using the Kyoto encyclopedia of genes and genomes database (KEGG). Subsequently, the expression patterns of 10 candidate differentially expressed genes (DEGs) involved in the flavonoid metabolic pathway were examined, and flavonoids were qualitatively and quantitatively analysed. Numerous anthoxanthins (flavone and flavonol) and a few anthocyanins were detected in the yellow inner-petal, which were all lower than those in the red outer-petal due to the low expression levels of the phenylalanine ammonialyase gene (PlPAL), flavonol synthase gene (PlFLS), dihydroflavonol 4-reductase gene (PlDFR), anthocyanidin synthase gene (PlANS), anthocyanidin 3-O-glucosyltransferase gene (Pl3GT) and anthocyanidin 5-O-glucosyltransferase gene (Pl5GT). Transcriptome sequencing (RNA-Seq) analysis based on the high throughput sequencing technology was an efficient approach to identify critical genes in P. lactiflora and other non-model plants. The flavonoid metabolic pathway and glucide metabolic pathway were identified as relatived yellow formation in P. lactiflora, PlPAL, PlFLS, PlDFR, PlANS, Pl3GT and Pl5GT were selected as potential candidates involved in flavonoid metabolic pathway, which inducing inhibition of anthocyanin biosynthesis mediated yellow formation in P. lactiflora. This study could lay a theoretical foundation for breeding new yellow P. lactiflora varieties.
Babak, Tomas; Garrett-Engele, Philip; Armour, Christopher D; Raymond, Christopher K; Keller, Mark P; Chen, Ronghua; Rohl, Carol A; Johnson, Jason M; Attie, Alan D; Fraser, Hunter B; Schadt, Eric E
2010-08-13
Identifying associations between genotypes and gene expression levels using microarrays has enabled systematic interrogation of regulatory variation underlying complex phenotypes. This approach has vast potential for functional characterization of disease states, but its prohibitive cost, given hundreds to thousands of individual samples from populations have to be genotyped and expression profiled, has limited its widespread application. Here we demonstrate that genomic regions with allele-specific expression (ASE) detected by sequencing cDNA are highly enriched for cis-acting expression quantitative trait loci (cis-eQTL) identified by profiling of 500 animals in parallel, with up to 90% agreement on the allele that is preferentially expressed. We also observed widespread noncoding and antisense ASE and identified several allele-specific alternative splicing variants. Monitoring ASE by sequencing cDNA from as little as one sample is a practical alternative to expression genetics for mapping cis-acting variation that regulates RNA transcription and processing.
Koho, N M; Mykkänen, A K; Reeben, M; Raekallio, M R; Ilves, M; Pösö, A R
2012-01-01
MCT1-CD147 complex is the prime lactate transporter in mammalian plasma membranes. In equine red blood cells (RBCs), activity of the complex and expression of MCT1 and CD147 is bimodal; high in 70% and low in 30%. We studied whether sequence variations contribute to the bimodal expression of MCT1 and CD147. Samples of blood and cremaster muscle were collected in connection of castration from 24 horses. Additional gluteus muscle samples were collected from 15 Standardbreds of which seven were known to express low amounts of CD147 in RBCs. The cDNA of MCT1 and CD147 together with a promoter region of CD147 was sequenced. The amounts of MCT1 and CD147 expressed in RBC and muscle membranes were measured by Western blot and mRNA levels in muscles by qPCR. MCT1 and CD147 were expressed in 20 castrates, and in four only were traces found. Sequence variations found in MCT1 were not linked to MCT1 expression. In CD147 linked heterozygous single nucleotide polymorphisms (SNPs) 389A>G (Met(125)Val) and 990C>T (3'-UTR) were associated to low expression of CD147. Also a mutation 168A>G (Ile(51)Val) in CD147 was associated to low MCT1 and CD147 expression. Low MCT1 and CD147 mRNA levels in gluteus were found in Standardbreds with low CD147 expression in RBCs. The results suggest that sequence variations affect the expression level of CD147, but do not explain its bimodality. The levels of MCT1 and CD147 mRNA correlated with the expression of CD147 and suggest that bimodality of their expression is regulated at transcriptional level. Copyright © 2011 Elsevier B.V. All rights reserved.
Kasi, Devi; Catherine, Christy; Lee, Seung-Won; Lee, Kyung-Ho; Kim, Yu Jung; Ro Lee, Myeong; Ju, Jung Won; Kim, Dong-Myung
2017-05-01
The rapidly evolving cloning and sequencing technologies have enabled understanding of genomic structure of parasite genomes, opening up new ways of combatting parasite-related diseases. To make the most of the exponentially accumulating genomic data, however, it is crucial to analyze the proteins encoded by these genomic sequences. In this study, we adopted an engineered cell-free protein synthesis system for large-scale expression screening of an expression sequence tag (EST) library of Clonorchis sinensis to identify potential antigens that can be used for diagnosis and treatment of clonorchiasis. To allow high-throughput expression and identification of individual genes comprising the library, a cell-free synthesis reaction was designed such that both the template DNA and the expressed proteins were co-immobilized on the same microbeads, leading to microbead-based linkage of the genotype and phenotype. This reaction configuration allowed streamlined expression, recovery, and analysis of proteins. This approach enabled us to identify 21 antigenic proteins. © 2017 American Institute of Chemical Engineers Biotechnol. Prog., 33:832-837, 2017. © 2017 American Institute of Chemical Engineers.
Peng, Jing; Peng, Futian; Zhu, Chunfu; Wei, Shaochong
2008-06-01
A putative isopentenyltransferase (IPT) encoding gene was identified from a pingyitiancha (Malus hupehensis Rehd.) expressed sequence tag database, and the full-length gene was cloned by RACE. Based on expression profile and sequence alignment, the nucleotide sequence of the clone, named MhIPT3, was most similar to AtIPT3, an IPT gene in Arabidopsis. The full-length cDNA contained a 963-bp open reading frame encoding a protein of 321 amino acids with a molecular mass of 37.3 kDa. Sequence analysis of genomic DNA revealed the absence of introns in the frame. Quantitative real-time PCR analysis demonstrated that the gene was expressed in roots, stems and leaves. Application of nitrate to roots of nitrogen-deprived seedlings strongly induced expression of MhIPT3 and was accompanied by the accumulation of cytokinins, whereas MhIPT3 expression was little affected by ammonium application to roots of nitrogen-deprived seedlings. Application of nitrate to leaves also up-regulated the expression of MhIPT3 and corresponded closely with the accumulation of isopentyladenine and isopentyladenosine in leaves.
BigFoot: Bayesian alignment and phylogenetic footprinting with MCMC
Satija, Rahul; Novák, Ádám; Miklós, István; Lyngsø, Rune; Hein, Jotun
2009-01-01
Background We have previously combined statistical alignment and phylogenetic footprinting to detect conserved functional elements without assuming a fixed alignment. Considering a probability-weighted distribution of alignments removes sensitivity to alignment errors, properly accommodates regions of alignment uncertainty, and increases the accuracy of functional element prediction. Our method utilized standard dynamic programming hidden markov model algorithms to analyze up to four sequences. Results We present a novel approach, implemented in the software package BigFoot, for performing phylogenetic footprinting on greater numbers of sequences. We have developed a Markov chain Monte Carlo (MCMC) approach which samples both sequence alignments and locations of slowly evolving regions. We implement our method as an extension of the existing StatAlign software package and test it on well-annotated regions controlling the expression of the even-skipped gene in Drosophila and the α-globin gene in vertebrates. The results exhibit how adding additional sequences to the analysis has the potential to improve the accuracy of functional predictions, and demonstrate how BigFoot outperforms existing alignment-based phylogenetic footprinting techniques. Conclusion BigFoot extends a combined alignment and phylogenetic footprinting approach to analyze larger amounts of sequence data using MCMC. Our approach is robust to alignment error and uncertainty and can be applied to a variety of biological datasets. The source code and documentation are publicly available for download from PMID:19715598
BigFoot: Bayesian alignment and phylogenetic footprinting with MCMC.
Satija, Rahul; Novák, Adám; Miklós, István; Lyngsø, Rune; Hein, Jotun
2009-08-28
We have previously combined statistical alignment and phylogenetic footprinting to detect conserved functional elements without assuming a fixed alignment. Considering a probability-weighted distribution of alignments removes sensitivity to alignment errors, properly accommodates regions of alignment uncertainty, and increases the accuracy of functional element prediction. Our method utilized standard dynamic programming hidden markov model algorithms to analyze up to four sequences. We present a novel approach, implemented in the software package BigFoot, for performing phylogenetic footprinting on greater numbers of sequences. We have developed a Markov chain Monte Carlo (MCMC) approach which samples both sequence alignments and locations of slowly evolving regions. We implement our method as an extension of the existing StatAlign software package and test it on well-annotated regions controlling the expression of the even-skipped gene in Drosophila and the alpha-globin gene in vertebrates. The results exhibit how adding additional sequences to the analysis has the potential to improve the accuracy of functional predictions, and demonstrate how BigFoot outperforms existing alignment-based phylogenetic footprinting techniques. BigFoot extends a combined alignment and phylogenetic footprinting approach to analyze larger amounts of sequence data using MCMC. Our approach is robust to alignment error and uncertainty and can be applied to a variety of biological datasets. The source code and documentation are publicly available for download from http://www.stats.ox.ac.uk/~satija/BigFoot/
Miller, Jason R; Koren, Sergey; Dilley, Kari A; Puri, Vinita; Brown, David M; Harkins, Derek M; Thibaud-Nissen, Françoise; Rosen, Benjamin; Chen, Xiao-Guang; Tu, Zhijian; Sharakhov, Igor V; Sharakhova, Maria V; Sebra, Robert; Stockwell, Timothy B; Bergman, Nicholas H; Sutton, Granger G; Phillippy, Adam M; Piermarini, Peter M; Shabman, Reed S
2018-03-01
The 50-year-old Aedes albopictus C6/36 cell line is a resource for the detection, amplification, and analysis of mosquito-borne viruses including Zika, dengue, and chikungunya. The cell line is derived from an unknown number of larvae from an unspecified strain of Aedes albopictus mosquitoes. Toward improved utility of the cell line for research in virus transmission, we present an annotated assembly of the C6/36 genome. The C6/36 genome assembly has the largest contig N50 (3.3 Mbp) of any mosquito assembly, presents the sequences of both haplotypes for most of the diploid genome, reveals independent null mutations in both alleles of the Dicer locus, and indicates a male-specific genome. Gene annotation was computed with publicly available mosquito transcript sequences. Gene expression data from cell line RNA sequence identified enrichment of growth-related pathways and conspicuous deficiency in aquaporins and inward rectifier K+ channels. As a test of utility, RNA sequence data from Zika-infected cells were mapped to the C6/36 genome and transcriptome assemblies. Host subtraction reduced the data set by 89%, enabling faster characterization of nonhost reads. The C6/36 genome sequence and annotation should enable additional uses of the cell line to study arbovirus vector interactions and interventions aimed at restricting the spread of human disease.
Pittendrigh, B R; Clark, J M; Johnston, J S; Lee, S H; Romero-Severson, J; Dasch, G A
2006-11-01
The human body louse, Pediculus humanus humanus (L.), and the human head louse, Pediculus humanus capitis, belong to the hemimetabolous order Phthiraptera. The body louse is the primary vector that transmits the bacterial agents of louse-borne relapsing fever, trench fever, and epidemic typhus. The genomes of the bacterial causative agents of several of these aforementioned diseases have been sequenced. Thus, determining the body louse genome will enhance studies of host-vector-pathogen interactions. Although not important as a major disease vector, head lice are of major social concern. Resistance to traditional pesticides used to control head and body lice have developed. It is imperative that new molecular targets be discovered for the development of novel compounds to control these insects. No complete genome sequence exists for a hemimetabolous insect species primarily because hemimetabolous insects often have large (2000 Mb) to very large (up to 16,300 Mb) genomes. Fortuitously, we determined that the human body louse has one of the smallest genome sizes known in insects, suggesting it may be a suitable choice as a minimal hemimetabolous genome in which many genes have been eliminated during its adaptation to human parasitism. Because many louse species infest birds and mammals, the body louse genome-sequencing project will facilitate studies of their comparative genomics. A 6-8X coverage of the body louse genome, plus sequenced expressed sequence tags, should provide the entomological, evolutionary biology, medical, and public health communities with useful genetic information.
BayesPI-BAR: a new biophysical model for characterization of regulatory sequence variations
Wang, Junbai; Batmanov, Kirill
2015-01-01
Sequence variations in regulatory DNA regions are known to cause functionally important consequences for gene expression. DNA sequence variations may have an essential role in determining phenotypes and may be linked to disease; however, their identification through analysis of massive genome-wide sequencing data is a great challenge. In this work, a new computational pipeline, a Bayesian method for protein–DNA interaction with binding affinity ranking (BayesPI-BAR), is proposed for quantifying the effect of sequence variations on protein binding. BayesPI-BAR uses biophysical modeling of protein–DNA interactions to predict single nucleotide polymorphisms (SNPs) that cause significant changes in the binding affinity of a regulatory region for transcription factors (TFs). The method includes two new parameters (TF chemical potentials or protein concentrations and direct TF binding targets) that are neglected by previous methods. The new method is verified on 67 known human regulatory SNPs, of which 47 (70%) have predicted true TFs ranked in the top 10. Importantly, the performance of BayesPI-BAR, which uses principal component analysis to integrate multiple predictions from various TF chemical potentials, is found to be better than that of existing programs, such as sTRAP and is-rSNP, when evaluated on the same SNPs. BayesPI-BAR is a publicly available tool and is able to carry out parallelized computation, which helps to investigate a large number of TFs or SNPs and to detect disease-associated regulatory sequence variations in the sea of genome-wide noncoding regions. PMID:26202972
Functional expression of dental plaque microbiota.
Peterson, Scott N; Meissner, Tobias; Su, Andrew I; Snesrud, Erik; Ong, Ana C; Schork, Nicholas J; Bretz, Walter A
2014-01-01
Dental caries remains a significant public health problem and is considered pandemic worldwide. The prediction of dental caries based on profiling of microbial species involved in disease and equally important, the identification of species conferring dental health has proven more difficult than anticipated due to high interpersonal and geographical variability of dental plaque microbiota. We have used RNA-Seq to perform global gene expression analysis of dental plaque microbiota derived from 19 twin pairs that were either concordant (caries-active or caries-free) or discordant for dental caries. The transcription profiling allowed us to define a functional core microbiota consisting of nearly 60 species. Similarities in gene expression patterns allowed a preliminary assessment of the relative contribution of human genetics, environmental factors and caries phenotype on the microbiota's transcriptome. Correlation analysis of transcription allowed the identification of numerous functional networks, suggesting that inter-personal environmental variables may co-select for groups of genera and species. Analysis of functional role categories allowed the identification of dominant functions expressed by dental plaque biofilm communities, that highlight the biochemical priorities of dental plaque microbes to metabolize diverse sugars and cope with the acid and oxidative stress resulting from sugar fermentation. The wealth of data generated by deep sequencing of expressed transcripts enables a greatly expanded perspective concerning the functional expression of dental plaque microbiota.
Functional expression of dental plaque microbiota
Peterson, Scott N.; Meissner, Tobias; Su, Andrew I.; Snesrud, Erik; Ong, Ana C.; Schork, Nicholas J.; Bretz, Walter A.
2014-01-01
Dental caries remains a significant public health problem and is considered pandemic worldwide. The prediction of dental caries based on profiling of microbial species involved in disease and equally important, the identification of species conferring dental health has proven more difficult than anticipated due to high interpersonal and geographical variability of dental plaque microbiota. We have used RNA-Seq to perform global gene expression analysis of dental plaque microbiota derived from 19 twin pairs that were either concordant (caries-active or caries-free) or discordant for dental caries. The transcription profiling allowed us to define a functional core microbiota consisting of nearly 60 species. Similarities in gene expression patterns allowed a preliminary assessment of the relative contribution of human genetics, environmental factors and caries phenotype on the microbiota's transcriptome. Correlation analysis of transcription allowed the identification of numerous functional networks, suggesting that inter-personal environmental variables may co-select for groups of genera and species. Analysis of functional role categories allowed the identification of dominant functions expressed by dental plaque biofilm communities, that highlight the biochemical priorities of dental plaque microbes to metabolize diverse sugars and cope with the acid and oxidative stress resulting from sugar fermentation. The wealth of data generated by deep sequencing of expressed transcripts enables a greatly expanded perspective concerning the functional expression of dental plaque microbiota. PMID:25177549
Microarray-Based Gene Expression Analysis for Veterinary Pathologists: A Review.
Raddatz, Barbara B; Spitzbarth, Ingo; Matheis, Katja A; Kalkuhl, Arno; Deschl, Ulrich; Baumgärtner, Wolfgang; Ulrich, Reiner
2017-09-01
High-throughput, genome-wide transcriptome analysis is now commonly used in all fields of life science research and is on the cusp of medical and veterinary diagnostic application. Transcriptomic methods such as microarrays and next-generation sequencing generate enormous amounts of data. The pathogenetic expertise acquired from understanding of general pathology provides veterinary pathologists with a profound background, which is essential in translating transcriptomic data into meaningful biological knowledge, thereby leading to a better understanding of underlying disease mechanisms. The scientific literature concerning high-throughput data-mining techniques usually addresses mathematicians or computer scientists as the target audience. In contrast, the present review provides the reader with a clear and systematic basis from a veterinary pathologist's perspective. Therefore, the aims are (1) to introduce the reader to the necessary methodological background; (2) to introduce the sequential steps commonly performed in a microarray analysis including quality control, annotation, normalization, selection of differentially expressed genes, clustering, gene ontology and pathway analysis, analysis of manually selected genes, and biomarker discovery; and (3) to provide references to publically available and user-friendly software suites. In summary, the data analysis methods presented within this review will enable veterinary pathologists to analyze high-throughput transcriptome data obtained from their own experiments, supplemental data that accompany scientific publications, or public repositories in order to obtain a more in-depth insight into underlying disease mechanisms.
Madi, Asaf; Poran, Asaf; Shifrut, Eric; Reich-Zeliger, Shlomit; Greenstein, Erez; Zaretsky, Irena; Arnon, Tomer; Laethem, Francois Van; Singer, Alfred; Lu, Jinghua; Sun, Peter D; Cohen, Irun R; Friedman, Nir
2017-01-01
Diversity of T cell receptor (TCR) repertoires, generated by somatic DNA rearrangements, is central to immune system function. However, the level of sequence similarity of TCR repertoires within and between species has not been characterized. Using network analysis of high-throughput TCR sequencing data, we found that abundant CDR3-TCRβ sequences were clustered within networks generated by sequence similarity. We discovered a substantial number of public CDR3-TCRβ segments that were identical in mice and humans. These conserved public sequences were central within TCR sequence-similarity networks. Annotated TCR sequences, previously associated with self-specificities such as autoimmunity and cancer, were linked to network clusters. Mechanistically, CDR3 networks were promoted by MHC-mediated selection, and were reduced following immunization, immune checkpoint blockade or aging. Our findings provide a new view of T cell repertoire organization and physiology, and suggest that the immune system distributes its TCR sequences unevenly, attending to specific foci of reactivity. DOI: http://dx.doi.org/10.7554/eLife.22057.001 PMID:28731407
Digital gene expression analysis of the zebra finch genome
2010-01-01
Background In order to understand patterns of adaptation and molecular evolution it is important to quantify both variation in gene expression and nucleotide sequence divergence. Gene expression profiling in non-model organisms has recently been facilitated by the advent of massively parallel sequencing technology. Here we investigate tissue specific gene expression patterns in the zebra finch (Taeniopygia guttata) with special emphasis on the genes of the major histocompatibility complex (MHC). Results Almost 2 million 454-sequencing reads from cDNA of six different tissues were assembled and analysed. A total of 11,793 zebra finch transcripts were represented in this EST data, indicating a transcriptome coverage of about 65%. There was a positive correlation between the tissue specificity of gene expression and non-synonymous to synonymous nucleotide substitution ratio of genes, suggesting that genes with a specialised function are evolving at a higher rate (or with less constraint) than genes with a more general function. In line with this, there was also a negative correlation between overall expression levels and expression specificity of contigs. We found evidence for expression of 10 different genes related to the MHC. MHC genes showed relatively tissue specific expression levels and were in general primarily expressed in spleen. Several MHC genes, including MHC class I also showed expression in brain. Furthermore, for all genes with highest levels of expression in spleen there was an overrepresentation of several gene ontology terms related to immune function. Conclusions Our study highlights the usefulness of next-generation sequence data for quantifying gene expression in the genome as a whole as well as in specific candidate genes. Overall, the data show predicted patterns of gene expression profiles and molecular evolution in the zebra finch genome. Expression of MHC genes in particular, corresponds well with expression patterns in other vertebrates. PMID:20359325
Mismer, D.; Rubin, G. M.
1989-01-01
We have analyzed the cis-acting regulatory sequences of the Rh1 (ninaE) gene in Drosophila melanogaster by P-element-mediated germline transformation of indicator genes transcribed from mutant ninaE promoter sequences. We have previously shown that a 200-bp region extending from -120 to +67 relative to the transcription start site is sufficient to obtain eye-specific expression from the ninaE promoter. In the present study, 22 different 4-13-bp sequences in the -120/+67 promoter region were altered by oligonucleotide-directed mutagenesis. Several of these sequences were found to be required for proper promoter function; two of these are conserved in the promoter of the homologous gene isolated from the related species Drosophila virilis. Alteration of a conserved 9-bp sequence results in aberrant, low level expression in the body. Alteration of a separate 11-bp sequence, found in the promoter regions of several photoreceptor-specific genes of Drosophila, results in an approximately 15-fold reduction in promoter efficiency but without apparent alteration of tissue-specificity. A protein factor capable of interacting with this 11-bp sequence has been detected by DNaseI footprinting in embryonic nuclear extracts. Finally, we have further characterized two separable enhancer sequences previously shown to be required for normal levels of expression from this promoter. PMID:2521839
DNA methylation inhibits expression and transposition of the Neurospora Tad retrotransposon.
Zhou, Y; Cambareri, E B; Kinsey, J A
2001-06-01
Tad is a LINE-like retrotransposon of the filamentous fungus Neurospora crassa. We have analyzed both expression and transposition of this element using strains with a single copy of Tad located in the 5' noncoding sequences of the am (glutamate dehydrogenase) gene. Tad in this position has been shown to carry a de novo cytosine methylation signal which causes reversible methylation of both Tad and am upstream sequences. Here we find that methylation of the Tad sequences inhibits both Tad expression and transposition. This inhibition can be relieved by the use of 5-azacytidine, a drug which reduces cytosine methylation, or by placing the Tad/am sequences in a dim-2 genetic background.
Groves, Benjamin; Kuchina, Anna; Rosenberg, Alexander B.; Jojic, Nebojsa; Fields, Stanley; Seelig, Georg
2017-01-01
Our ability to predict protein expression from DNA sequence alone remains poor, reflecting our limited understanding of cis-regulatory grammar and hampering the design of engineered genes for synthetic biology applications. Here, we generate a model that predicts the protein expression of the 5′ untranslated region (UTR) of mRNAs in the yeast Saccharomyces cerevisiae. We constructed a library of half a million 50-nucleotide-long random 5′ UTRs and assayed their activity in a massively parallel growth selection experiment. The resulting data allow us to quantify the impact on protein expression of Kozak sequence composition, upstream open reading frames (uORFs), and secondary structure. We trained a convolutional neural network (CNN) on the random library and showed that it performs well at predicting the protein expression of both a held-out set of the random 5′ UTRs as well as native S. cerevisiae 5′ UTRs. The model additionally was used to computationally evolve highly active 5′ UTRs. We confirmed experimentally that the great majority of the evolved sequences led to higher protein expression rates than the starting sequences, demonstrating the predictive power of this model. PMID:29097404
Croteau, Rodney Bruce; Wildung, Mark Raymond; Lange, Bernd Markus; McCaskill, David G.
2001-01-01
cDNAs encoding 1-deoxyxylulose-5-phosphate synthase from peppermint (Mentha piperita) have been isolated and sequenced, and the corresponding amino acid sequences have been determined. Accordingly, isolated DNA sequences (SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7) are provided which code for the expression of 1-deoxyxylulose-5-phosphate synthase from plants. In another aspect the present invention provides for isolated, recombinant DXPS proteins, such as the proteins having the sequences set forth in SEQ ID NO:4, SEQ ID NO:6 and SEQ ID NO:8. In other aspects, replicable recombinant cloning vehicles are provided which code for plant 1-deoxyxylulose-5-phosphate synthases, or for a base sequence sufficiently complementary to at least a portion of 1-deoxyxylulose-5-phosphate synthase DNA or RNA to enable hybridization therewith. In yet other aspects, modified host cells are provided that have been transformed, transfected, infected and/or injected with a recombinant cloning vehicle and/or DNA sequence encoding a plant 1-deoxyxylulose-5-phosphate synthase. Thus, systems and methods are provided for the recombinant expression of the aforementioned recombinant 1-deoxyxylulose-5-phosphate synthase that may be used to facilitate its production, isolation and purification in significant amounts. Recombinant 1-deoxyxylulose-5-phosphate synthase may be used to obtain expression or enhanced expression of 1-deoxyxylulose-5-phosphate synthase in plants in order to enhance the production of 1-deoxyxylulose-5-phosphate, or its derivatives such as isopentenyl diphosphate (BP), or may be otherwise employed for the regulation or expression of 1-deoxyxylulose-5-phosphate synthase, or the production of its products.
Bubier, Jason A.; Jay, Jeremy J.; Baker, Christopher L.; Bergeson, Susan E.; Ohno, Hiroshi; Metten, Pamela; Crabbe, John C.; Chesler, Elissa J.
2014-01-01
Extensive genetic and genomic studies of the relationship between alcohol drinking preference and withdrawal severity have been performed using animal models. Data from multiple such publications and public data resources have been incorporated in the GeneWeaver database with >60,000 gene sets including 285 alcohol withdrawal and preference-related gene sets. Among these are evidence for positional candidates regulating these behaviors in overlapping quantitative trait loci (QTL) mapped in distinct mouse populations. Combinatorial integration of functional genomics experimental results revealed a single QTL positional candidate gene in one of the loci common to both preference and withdrawal. Functional validation studies in Ap3m2 knockout mice confirmed these relationships. Genetic validation involves confirming the existence of segregating polymorphisms that could account for the phenotypic effect. By exploiting recent advances in mouse genotyping, sequence, epigenetics, and phylogeny resources, we confirmed that Ap3m2 resides in an appropriately segregating genomic region. We have demonstrated genetic and alcohol-induced regulation of Ap3m2 expression. Although sequence analysis revealed no polymorphisms in the Ap3m2-coding region that could account for all phenotypic differences, there are several upstream SNPs that could. We have identified one of these to be an H3K4me3 site that exhibits strain differences in methylation. Thus, by making cross-species functional genomics readily computable we identified a common QTL candidate for two related bio-behavioral processes via functional evidence and demonstrate sufficiency of the genetic locus as a source of variation underlying two traits. PMID:24923803
Bubier, Jason A; Jay, Jeremy J; Baker, Christopher L; Bergeson, Susan E; Ohno, Hiroshi; Metten, Pamela; Crabbe, John C; Chesler, Elissa J
2014-08-01
Extensive genetic and genomic studies of the relationship between alcohol drinking preference and withdrawal severity have been performed using animal models. Data from multiple such publications and public data resources have been incorporated in the GeneWeaver database with >60,000 gene sets including 285 alcohol withdrawal and preference-related gene sets. Among these are evidence for positional candidates regulating these behaviors in overlapping quantitative trait loci (QTL) mapped in distinct mouse populations. Combinatorial integration of functional genomics experimental results revealed a single QTL positional candidate gene in one of the loci common to both preference and withdrawal. Functional validation studies in Ap3m2 knockout mice confirmed these relationships. Genetic validation involves confirming the existence of segregating polymorphisms that could account for the phenotypic effect. By exploiting recent advances in mouse genotyping, sequence, epigenetics, and phylogeny resources, we confirmed that Ap3m2 resides in an appropriately segregating genomic region. We have demonstrated genetic and alcohol-induced regulation of Ap3m2 expression. Although sequence analysis revealed no polymorphisms in the Ap3m2-coding region that could account for all phenotypic differences, there are several upstream SNPs that could. We have identified one of these to be an H3K4me3 site that exhibits strain differences in methylation. Thus, by making cross-species functional genomics readily computable we identified a common QTL candidate for two related bio-behavioral processes via functional evidence and demonstrate sufficiency of the genetic locus as a source of variation underlying two traits. Copyright © 2014 by the Genetics Society of America.
Speiser, Daniel I; Pankey, M Sabrina; Zaharoff, Alexander K; Battelle, Barbara A; Bracken-Grissom, Heather D; Breinholt, Jesse W; Bybee, Seth M; Cronin, Thomas W; Garm, Anders; Lindgren, Annie R; Patel, Nipam H; Porter, Megan L; Protas, Meredith E; Rivera, Ajna S; Serb, Jeanne M; Zigler, Kirk S; Crandall, Keith A; Oakley, Todd H
2014-11-19
Tools for high throughput sequencing and de novo assembly make the analysis of transcriptomes (i.e. the suite of genes expressed in a tissue) feasible for almost any organism. Yet a challenge for biologists is that it can be difficult to assign identities to gene sequences, especially from non-model organisms. Phylogenetic analyses are one useful method for assigning identities to these sequences, but such methods tend to be time-consuming because of the need to re-calculate trees for every gene of interest and each time a new data set is analyzed. In response, we employed existing tools for phylogenetic analysis to produce a computationally efficient, tree-based approach for annotating transcriptomes or new genomes that we term Phylogenetically-Informed Annotation (PIA), which places uncharacterized genes into pre-calculated phylogenies of gene families. We generated maximum likelihood trees for 109 genes from a Light Interaction Toolkit (LIT), a collection of genes that underlie the function or development of light-interacting structures in metazoans. To do so, we searched protein sequences predicted from 29 fully-sequenced genomes and built trees using tools for phylogenetic analysis in the Osiris package of Galaxy (an open-source workflow management system). Next, to rapidly annotate transcriptomes from organisms that lack sequenced genomes, we repurposed a maximum likelihood-based Evolutionary Placement Algorithm (implemented in RAxML) to place sequences of potential LIT genes on to our pre-calculated gene trees. Finally, we implemented PIA in Galaxy and used it to search for LIT genes in 28 newly-sequenced transcriptomes from the light-interacting tissues of a range of cephalopod mollusks, arthropods, and cubozoan cnidarians. Our new trees for LIT genes are available on the Bitbucket public repository ( http://bitbucket.org/osiris_phylogenetics/pia/ ) and we demonstrate PIA on a publicly-accessible web server ( http://galaxy-dev.cnsi.ucsb.edu/pia/ ). Our new trees for LIT genes will be a valuable resource for researchers studying the evolution of eyes or other light-interacting structures. We also introduce PIA, a high throughput method for using phylogenetic relationships to identify LIT genes in transcriptomes from non-model organisms. With simple modifications, our methods may be used to search for different sets of genes or to annotate data sets from taxa outside of Metazoa.
Rama Reddy, Nagaraja Reddy; Mehta, Rucha Harishbhai; Soni, Palak Harendrabhai; Makasana, Jayanti; Gajbhiye, Narendra Athamaram; Ponnuchamy, Manivel; Kumar, Jitendra
2015-01-01
Senna (Cassia angustifolia Vahl.) is a world's natural laxative medicinal plant. Laxative properties are due to sennosides (anthraquinone glycosides) natural products. However, little genetic information is available for this species, especially concerning the biosynthetic pathways of sennosides. We present here the transcriptome sequencing of young and mature leaf tissue of Cassia angustifolia using Illumina MiSeq platform that resulted in a total of 6.34 Gb of raw nucleotide sequence. The sequence assembly resulted in 42230 and 37174 transcripts with an average length of 1119 bp and 1467 bp for young and mature leaf, respectively. The transcripts were annotated using NCBI BLAST with 'green plant database (txid 33090)', Swiss Prot, Kyoto Encylcopedia of Genes & Genomes (KEGG), Cluster of Orthologous Gene (COG) and Gene Ontology (GO). Out of the total transcripts, 40138 (95.0%) and 36349 (97.7%) from young and mature leaf, respectively, were annotated by BLASTX against green plant database of NCBI. We used InterProscan to see protein similarity at domain level, a total of 34031 (young leaf) and 32077 (mature leaf) transcripts were annotated against the Pfam domains. All transcripts from young and mature leaf were assigned to 191 KEGG pathways. There were 166 and 159 CDS, respectively, from young and mature leaf involved in metabolism of terpenoids and polyketides. Many CDS encoding enzymes leading to biosynthesis of sennosides were identified. A total of 10,763 CDS differentially expressing in both young and mature leaf libraries of which 2,343 (21.7%) CDS were up-regulated in young compared to mature leaf. Several differentially expressed genes found functionally associated with sennoside biosynthesis. CDS encoding for many CYPs and TF families were identified having probable roles in metabolism of primary as well as secondary metabolites. We developed SSR markers for molecular breeding of senna. We have identified a set of putative genes involved in various secondary metabolite pathways, especially those related to the synthesis of sennosides which will serve as an important platform for public information about gene expression, genomics, and functional genomics in senna.
Rama Reddy, Nagaraja Reddy; Mehta, Rucha Harishbhai; Soni, Palak Harendrabhai; Makasana, Jayanti; Gajbhiye, Narendra Athamaram; Ponnuchamy, Manivel; Kumar, Jitendra
2015-01-01
Senna (Cassia angustifolia Vahl.) is a world’s natural laxative medicinal plant. Laxative properties are due to sennosides (anthraquinone glycosides) natural products. However, little genetic information is available for this species, especially concerning the biosynthetic pathways of sennosides. We present here the transcriptome sequencing of young and mature leaf tissue of Cassia angustifolia using Illumina MiSeq platform that resulted in a total of 6.34 Gb of raw nucleotide sequence. The sequence assembly resulted in 42230 and 37174 transcripts with an average length of 1119 bp and 1467 bp for young and mature leaf, respectively. The transcripts were annotated using NCBI BLAST with ‘green plant database (txid 33090)’, Swiss Prot, Kyoto Encylcopedia of Genes & Genomes (KEGG), Cluster of Orthologous Gene (COG) and Gene Ontology (GO). Out of the total transcripts, 40138 (95.0%) and 36349 (97.7%) from young and mature leaf, respectively, were annotated by BLASTX against green plant database of NCBI. We used InterProscan to see protein similarity at domain level, a total of 34031 (young leaf) and 32077 (mature leaf) transcripts were annotated against the Pfam domains. All transcripts from young and mature leaf were assigned to 191 KEGG pathways. There were 166 and 159 CDS, respectively, from young and mature leaf involved in metabolism of terpenoids and polyketides. Many CDS encoding enzymes leading to biosynthesis of sennosides were identified. A total of 10,763 CDS differentially expressing in both young and mature leaf libraries of which 2,343 (21.7%) CDS were up-regulated in young compared to mature leaf. Several differentially expressed genes found functionally associated with sennoside biosynthesis. CDS encoding for many CYPs and TF families were identified having probable roles in metabolism of primary as well as secondary metabolites. We developed SSR markers for molecular breeding of senna. We have identified a set of putative genes involved in various secondary metabolite pathways, especially those related to the synthesis of sennosides which will serve as an important platform for public information about gene expression, genomics, and functional genomics in senna. PMID:26098898
Asp, Torben; Kristensen, Michael
2016-01-01
Background Insecticide resistance in the housefly, Musca domestica, has been investigated for more than 60 years. It will enter a new era after the recent publication of the housefly genome and the development of multiple next generation sequencing technologies. The genetic background of the xenobiotic response can now be investigated in greater detail. Here, we investigate the 454-pyrosequencing transcriptome of the spinosad-resistant 791spin strain in relation to the housefly genome with focus on P450 genes. Results The de novo assembly of clean reads gave 35,834 contigs consisting of 21,780 sequences of the spinosad resistant strain. The 3,648 sequences were annotated with an enzyme code EC number and were mapped to 124 KEGG pathways with metabolic processes as most highly represented pathway. One hundred and twenty contigs were annotated as P450s covering 44 different P450 genes of housefly. Eight differentially expressed P450s genes were identified and investigated for SNPs, CpG islands and common regulatory motifs in promoter and coding regions. Functional annotation clustering of metabolic related genes and motif analysis of P450s revealed their association with epigenetic, transcription and gene expression related functions. The sequence variation analysis resulted in 12 SNPs and eight of them found in cyp6d1. There is variation in location, size and frequency of CpG islands and specific motifs were also identified in these P450s. Moreover, identified motifs were associated to GO terms and transcription factors using bioinformatic tools. Conclusion Transcriptome data of a spinosad resistant strain provide together with genome data fundamental support for future research to understand evolution of resistance in houseflies. Here, we report for the first time the SNPs, CpG islands and common regulatory motifs in differentially expressed P450s. Taken together our findings will serve as a stepping stone to advance understanding of the mechanism and role of P450s in xenobiotic detoxification. PMID:27019205
Sequence evaluation of four specific cDNA libraries for developmental genomics of sunflower.
Tamborindeguy, C; Ben, C; Liboz, T; Gentzbittel, L
2004-04-01
Four different cDNA libraries were constructed from sunflower protoplasts growing under embryogenic and non-embryogenic conditions: one standard library from each condition and two subtractive libraries in opposite sense. A total of 22,876 cDNA clones were obtained and 4800 ESTs were sequenced, giving rise to 2479 high quality ESTs representing an unigene set of 1502 sequences. This set was compared with ESTs represented in public databases using the programs BLASTN and BLASTX, and its members were classified according to putative function using the catalog in the Kyoto Encyclopedia of Genes and Genomes (KEGG). Some 33% of sequences failed to align with existing plant ESTs and therefore represent putative novel genes. The libraries show a low level of redundancy and, on average, 50% of the present ESTs have not been previously reported for sunflower. Several potentially interesting genes were identified, based on their homology with genes involved in animal zygotic division or plant embryogenesis. We also identified two ESTs that show significantly different levels of expression under embryogenic and non-embryogenic conditions. The libraries described here represent an original and valuable resource for the discovery of yet unknown genes putatively involved in dicot embryogenesis and improving our knowledge of the mechanisms involved in polarity acquisition by plant embryos.
Quantitative analysis of a deeply sequenced marine microbial metatranscriptome.
Gifford, Scott M; Sharma, Shalabh; Rinta-Kanto, Johanna M; Moran, Mary Ann
2011-03-01
The potential of metatranscriptomic sequencing to provide insights into the environmental factors that regulate microbial activities depends on how fully the sequence libraries capture community expression (that is, sample-sequencing depth and coverage depth), and the sensitivity with which expression differences between communities can be detected (that is, statistical power for hypothesis testing). In this study, we use an internal standard approach to make absolute (per liter) estimates of transcript numbers, a significant advantage over proportional estimates that can be biased by expression changes in unrelated genes. Coastal waters of the southeastern United States contain 1 × 10(12) bacterioplankton mRNA molecules per liter of seawater (~200 mRNA molecules per bacterial cell). Even for the large bacterioplankton libraries obtained in this study (~500,000 possible protein-encoding sequences in each of two libraries after discarding rRNAs and small RNAs from >1 million 454 FLX pyrosequencing reads), sample-sequencing depth was only 0.00001%. Expression levels of 82 genes diagnostic for transformations in the marine nitrogen, phosphorus and sulfur cycles ranged from below detection (<1 × 10(6) transcripts per liter) for 36 genes (for example, phosphonate metabolism gene phnH, dissimilatory nitrate reductase subunit napA) to >2.7 × 10(9) transcripts per liter (ammonia transporter amt and ammonia monooxygenase subunit amoC). Half of the categories for which expression was detected, however, had too few copy numbers for robust statistical resolution, as would be required for comparative (experimental or time-series) expression studies. By representing whole community gene abundance and expression in absolute units (per volume or mass of environment), 'omics' data can be better leveraged to improve understanding of microbially mediated processes in the ocean.
Multi-targeted priming for genome-wide gene expression assays.
Adomas, Aleksandra B; Lopez-Giraldez, Francesc; Clark, Travis A; Wang, Zheng; Townsend, Jeffrey P
2010-08-17
Complementary approaches to assaying global gene expression are needed to assess gene expression in regions that are poorly assayed by current methodologies. A key component of nearly all gene expression assays is the reverse transcription of transcribed sequences that has traditionally been performed by priming the poly-A tails on many of the transcribed genes in eukaryotes with oligo-dT, or by priming RNA indiscriminately with random hexamers. We designed an algorithm to find common sequence motifs that were present within most protein-coding genes of Saccharomyces cerevisiae and of Neurospora crassa, but that were not present within their ribosomal RNA or transfer RNA genes. We then experimentally tested whether degenerately priming these motifs with multi-targeted primers improved the accuracy and completeness of transcriptomic assays. We discovered two multi-targeted primers that would prime a preponderance of genes in the genomes of Saccharomyces cerevisiae and Neurospora crassa while avoiding priming ribosomal RNA or transfer RNA. Examining the response of Saccharomyces cerevisiae to nitrogen deficiency and profiling Neurospora crassa early sexual development, we demonstrated that using multi-targeted primers in reverse transcription led to superior performance of microarray profiling and next-generation RNA tag sequencing. Priming with multi-targeted primers in addition to oligo-dT resulted in higher sensitivity, a larger number of well-measured genes and greater power to detect differences in gene expression. Our results provide the most complete and detailed expression profiles of the yeast nitrogen starvation response and N. crassa early sexual development to date. Furthermore, our multi-targeting priming methodology for genome-wide gene expression assays provides selective targeting of multiple sequences and counter-selection against undesirable sequences, facilitating a more complete and precise assay of the transcribed sequences within the genome.
Schultheiss, Oliver C; Pang, Joyce S; Torges, Cynthia M; Wirth, Michelle M; Treynor, Wendy; Derryberry, Douglas
2005-03-01
Participants (N = 216) were administered a differential implicit learning task during which they were trained and tested on 3 maximally distinct 2nd-order visuomotor sequences, with sequence color serving as discriminative stimulus. During training, 1 sequence each was followed by an emotional face, a neutral face, and no face, using backward masking. Emotion (joy, surprise, anger), face gender, and exposure duration (12 ms, 209 ms) were varied between participants; implicit motives were assessed with a picture-story exercise. For power-motivated individuals, low-dominance facial expressions enhanced and high-dominance expressions impaired learning. For affiliation-motivated individuals, learning was impaired in the context of hostile faces. These findings did not depend on explicit learning of fixed sequences or on awareness of sequence-face contingencies. Copyright 2005 APA, all rights reserved.
Pollier, Jacob; González-Guzmán, Miguel; Ardiles-Diaz, Wilson; Geelen, Danny; Goossens, Alain
2011-01-01
cDNA-Amplified Fragment Length Polymorphism (cDNA-AFLP) is a commonly used technique for genome-wide expression analysis that does not require prior sequence knowledge. Typically, quantitative expression data and sequence information are obtained for a large number of differentially expressed gene tags. However, most of the gene tags do not correspond to full-length (FL) coding sequences, which is a prerequisite for subsequent functional analysis. A medium-throughput screening strategy, based on integration of polymerase chain reaction (PCR) and colony hybridization, was developed that allows in parallel screening of a cDNA library for FL clones corresponding to incomplete cDNAs. The method was applied to screen for the FL open reading frames of a selection of 163 cDNA-AFLP tags from three different medicinal plants, leading to the identification of 109 (67%) FL clones. Furthermore, the protocol allows for the use of multiple probes in a single hybridization event, thus significantly increasing the throughput when screening for rare transcripts. The presented strategy offers an efficient method for the conversion of incomplete expressed sequence tags (ESTs), such as cDNA-AFLP tags, to FL-coding sequences.
An efficient annotation and gene-expression derivation tool for Illumina Solexa datasets.
Hosseini, Parsa; Tremblay, Arianne; Matthews, Benjamin F; Alkharouf, Nadim W
2010-07-02
The data produced by an Illumina flow cell with all eight lanes occupied, produces well over a terabyte worth of images with gigabytes of reads following sequence alignment. The ability to translate such reads into meaningful annotation is therefore of great concern and importance. Very easily, one can get flooded with such a great volume of textual, unannotated data irrespective of read quality or size. CASAVA, a optional analysis tool for Illumina sequencing experiments, enables the ability to understand INDEL detection, SNP information, and allele calling. To not only extract from such analysis, a measure of gene expression in the form of tag-counts, but furthermore to annotate such reads is therefore of significant value. We developed TASE (Tag counting and Analysis of Solexa Experiments), a rapid tag-counting and annotation software tool specifically designed for Illumina CASAVA sequencing datasets. Developed in Java and deployed using jTDS JDBC driver and a SQL Server backend, TASE provides an extremely fast means of calculating gene expression through tag-counts while annotating sequenced reads with the gene's presumed function, from any given CASAVA-build. Such a build is generated for both DNA and RNA sequencing. Analysis is broken into two distinct components: DNA sequence or read concatenation, followed by tag-counting and annotation. The end result produces output containing the homology-based functional annotation and respective gene expression measure signifying how many times sequenced reads were found within the genomic ranges of functional annotations. TASE is a powerful tool to facilitate the process of annotating a given Illumina Solexa sequencing dataset. Our results indicate that both homology-based annotation and tag-count analysis are achieved in very efficient times, providing researchers to delve deep in a given CASAVA-build and maximize information extraction from a sequencing dataset. TASE is specially designed to translate sequence data in a CASAVA-build into functional annotations while producing corresponding gene expression measurements. Achieving such analysis is executed in an ultrafast and highly efficient manner, whether the analysis be a single-read or paired-end sequencing experiment. TASE is a user-friendly and freely available application, allowing rapid analysis and annotation of any given Illumina Solexa sequencing dataset with ease.
Ewulonu, U K; Snyder, L; Silver, L M; Schimenti, J C
1996-03-01
Transgenic mice were generated to localize essential promoter elements in the mouse testis-expressed Tcp-10 genes. These genes are expressed exclusively in male germ cells, and exhibit a diffuse range of transcriptional start sites, possibly due to the absence of a TATA box. A series of transgene constructs containing different amounts of 5' flanking DNA revealed that all sequences necessary for appropriate temporal and tissue-specific transcription of Tcp-10 reside between positions -1 to -973. All transgenic animals containing these sequences expressed a chimeric transgene at high levels, in a pattern that paralleled the endogenous genes. These experiments further defined a 227 bp fragment from -746 to -973 that was absolutely essential for expression. In a gel-shift assay, this 227-bp fragment bound nuclear protein from testis, but not other tissues, to yield two retarded bands. Sequence analysis of this fragment revealed a half-site for the AP-2 transcription factor recognition sequence. Gel shift assays using native or mutant oligonucleotides demonstrated that the putative AP-2 recognition sequence was essential for generating the retarded bands. Since the binding activity is testis-specific, but AP-2 expression is not exclusive to male germ cells, it is possible that transcription of Tcp-10 requires interaction between AP-2 and a germ cell-specific transcription factor.
NASA Astrophysics Data System (ADS)
Boone, R. D.; Rogers, S. L.
2004-12-01
We report on work to assess the functional gene sequences for soil microbiota that control nitrogen cycle pathways along the successional sequence (willow, alder, poplar, white spruce, black spruce) on the Tanana River floodplain, Interior Alaska. Microbial DNA and mRNA were extracted from soils (0-10 cm depth) for amoA (ammonium monooxygenase), nifH (nitrogenase reductase), napA (nitrate reductase), and nirS and nirK (nitrite reductase) genes. Gene presence was determined by amplification of a conserved sequence of each gene employing sequence specific oligonucleotide primers and Polymerase Chain Reaction (PCR). Expression of the genes was measured via nested reverse transcriptase PCR amplification of the extracted mRNA. Amplified PCR products were visualized on agarose electrophoresis gels. All five successional stages show evidence for the presence and expression of microbial genes that regulate N fixation (free-living), nitrification, and nitrate reduction. We detected (1) nifH, napA, and nirK presence and amoA expression (mRNA production) for all five successional stages and (2) nirS and amoA presence and nifH, nirK, and napA expression for early successional stages (willow, alder, poplar). The results highlight that the existing body of previous process-level work has not sufficiently considered the microbial potential for a nitrate economy and free-living N fixation along the complete floodplain successional sequence.
Isoform-level gene expression patterns in single-cell RNA-sequencing data.
Vu, Trung Nghia; Wills, Quin F; Kalari, Krishna R; Niu, Nifang; Wang, Liewei; Pawitan, Yudi; Rantalainen, Mattias
2018-02-27
RNA sequencing of single cells enables characterization of transcriptional heterogeneity in seemingly homogeneous cell populations. Single-cell sequencing has been applied in a wide range of researches fields. However, few studies have focus on characterization of isoform-level expression patterns at the single-cell level. In this study we propose and apply a novel method, ISOform-Patterns (ISOP), based on mixture modeling, to characterize the expression patterns of isoform pairs from the same gene in single-cell isoform-level expression data. We define six principal patterns of isoform expression relationships and describe a method for differential-pattern analysis. We demonstrate ISOP through analysis of single-cell RNA-sequencing data from a breast cancer cell line, with replication in three independent datasets. We assigned the pattern types to each of 16,562 isoform-pairs from 4,929 genes. Among those, 26% of the discovered patterns were significant (p<0.05), while remaining patterns are possibly effects of transcriptional bursting, drop-out and stochastic biological heterogeneity. Furthermore, 32% of genes discovered through differential-pattern analysis were not detected by differential-expression analysis. The effect of drop-out events, mean expression level, and properties of the expression distribution on the performances of ISOP were also investigated through simulated datasets. To conclude, ISOP provides a novel approach for characterization of isoformlevel preference, commitment and heterogeneity in single-cell RNA-sequencing data. The ISOP method has been implemented as a R package and is available at https://github.com/nghiavtr/ISOP under a GPL-3 license. mattias.rantalainen@ki.se. Supplementary data are available at Bioinformatics online.
Gu, Yifeng; Zhang, Lei; Chen, Xiaowu
2014-08-01
MicroRNAs (miRNAs) play an important role in gonadal development and differentiation in fish. However, understanding of the mechanism of this process is hindered by our poor knowledge of miRNA expression patterns in fish gonads. In this study, miRNA libraries derived from adult gonads of Paralichthys olivaceus were generated by using next-generation sequencing (NGS) technology. Bioinformatics analysis was performed to distinguish mature miRNA sequences from two classes of small RNAs represented in the sequencing data. A total of 141 mature miRNAs were identified, in which 21 miRNAs were found in P. olivaceus for the first time. Variance and preference of miRNAs expression were concluded from the deep sequencing reads. Some miRNAs, such as pol-miR-143, pol-miR-26a and pol-let-7a were found with quite high expression levels in both gonads, while some exhibited a clear sex-biased expression in different gonad. Approximate 20.0% and 13.1% of the isolated miRNAs were preferentially expressed in the testis (FC<0.5) or ovary (FC>2), respectively. The identification and the preliminary analysis of the sex-biased expression of miRNAs in P. olivaceus gonads in our work by using NGS will provide us a basic catalog of miRNAs to facilitate future improvement and exploitation of sexual regulatory mechanisms in P. olivaceus. Copyright © 2014. Published by Elsevier Inc.
Lee, Je Hyuk; Daugharthy, Evan R.; Scheiman, Jonathan; Kalhor, Reza; Ferrante, Thomas C.; Terry, Richard; Turczyk, Brian M.; Yang, Joyce L.; Lee, Ho Suk; Aach, John; Zhang, Kun; Church, George M.
2014-01-01
RNA sequencing measures the quantitative change in gene expression over the whole transcriptome, but it lacks spatial context. On the other hand, in situ hybridization provides the location of gene expression, but only for a small number of genes. Here we detail a protocol for genome-wide profiling of gene expression in situ in fixed cells and tissues, in which RNA is converted into cross-linked cDNA amplicons and sequenced manually on a confocal microscope. Unlike traditional RNA-seq our method enriches for context-specific transcripts over house-keeping and/or structural RNA, and it preserves the tissue architecture for RNA localization studies. Our protocol is written for researchers experienced in cell microscopy with minimal computing skills. Library construction and sequencing can be completed within 14 d, with image analysis requiring an additional 2 d. PMID:25675209
Switchgrass ubiquitin promoter (PVUBI2) and uses thereof
Stewart, C. Neal; Mann, David George James
2013-12-10
The subject application provides polynucleotides, compositions thereof and methods for regulating gene expression in a plant. Polynucleotides disclosed herein comprise novel sequences for a promoter isolated from Panicum virgatum (switchgrass) that initiates transcription of an operably linked nucleotide sequence. Thus, various embodiments of the invention comprise the nucleotide sequence of SEQ ID NO: 2 or fragments thereof comprising nucleotides 1 to 692 of SEQ ID NO: 2 that are capable of driving the expression of an operably linked nucleic acid sequence.
Majumder, P; Choudhury, A; Banerjee, M; Lahiri, A; Bhattacharyya, N P
2007-08-01
To investigate the mechanism of increased expression of caspase-1 caused by exogenous Hippi, observed earlier in HeLa and Neuro2A cells, in this work we identified a specific motif AAAGACATG (- 101 to - 93) at the caspase-1 gene upstream sequence where HIPPI could bind. Various mutations in this specific sequence compromised the interaction, showing the specificity of the interactions. In the luciferase reporter assay, when the reporter gene was driven by caspase-1 gene upstream sequences (- 151 to - 92) with the mutation G to T at position - 98, luciferase activity was decreased significantly in green fluorescent protein-Hippi-expressing HeLa cells in comparison to that obtained with the wild-type caspase-1 gene 60 bp upstream sequence, indicating the biological significance of such binding. It was observed that the C-terminal 'pseudo' death effector domain of HIPPI interacted with the 60 bp (- 151 to - 92) upstream sequence of the caspase-1 gene containing the motif. We further observed that expression of caspase-8 and caspase-10 was increased in green fluorescent protein-Hippi-expressing HeLa cells. In addition, HIPPI interacted in vitro with putative promoter sequences of these genes, containing a similar motif. In summary, we identified a novel function of HIPPI; it binds to specific upstream sequences of the caspase-1, caspase-8 and caspase-10 genes and alters the expression of the genes. This result showed the motif-specific interaction of HIPPI with DNA, and indicates that it could act as transcription regulator.
Tsukagoshi, Hironaka; Suzuki, Takamasa; Nishikawa, Kouki; Agarie, Sakae; Ishiguro, Sumie; Higashiyama, Tetsuya
2015-01-01
Understanding the molecular mechanisms that convey salt tolerance in plants is a crucial issue for increasing crop yield. The ice plant (Mesembryanthemum crystallinum) is a halophyte that is capable of growing under high salt conditions. For example, the roots of ice plant seedlings continue to grow in 140 mM NaCl, a salt concentration that completely inhibits Arabidopsis thaliana root growth. Identifying the molecular mechanisms responsible for this high level of salt tolerance in a halophyte has the potential of revealing tolerance mechanisms that have been evolutionarily successful. In the present study, deep sequencing (RNAseq) was used to examine gene expression in ice plant roots treated with various concentrations of NaCl. Sequencing resulted in the identification of 53,516 contigs, 10,818 of which were orthologs of Arabidopsis genes. In addition to the expression analysis, a web-based ice plant database was constructed that allows broad public access to the data. The results obtained from an analysis of the RNAseq data were confirmed by RT-qPCR. Novel patterns of gene expression in response to high salinity within 24 hours were identified in the ice plant when the RNAseq data from the ice plant was compared to gene expression data obtained from Arabidopsis plants exposed to high salt. Although ABA responsive genes and a sodium transporter protein (HKT1), are up-regulated and down-regulated respectively in both Arabidopsis and the ice plant; peroxidase genes exhibit opposite responses. The results of this study provide an important first step towards analyzing environmental tolerance mechanisms in a non-model organism and provide a useful dataset for predicting novel gene functions. PMID:25706745
Wang, Xiaoyun; Liu, Hailiang; Chen, Yangyi; Guo, Lei; Luo, Fang; Sun, Jiufeng; Mao, Qiang; Liang, Pei; Xie, Zhizhi; Zhou, Chenhui; Tian, Yanli; Lv, Xiaoli; Huang, Lisi; Zhou, Juanjuan; Hu, Yue; Li, Ran; Zhang, Fan; Lei, Huali; Li, Wenfang; Hu, Xuchu; Liang, Chi; Xu, Jin; Li, Xuerong; Yu, Xinbing
2013-01-01
Clonorchis sinensis (C. sinensis), an important food-borne parasite that inhabits the intrahepatic bile duct and causes clonorchiasis, is of interest to both the public health field and the scientific research community. To learn more about the migration, parasitism and pathogenesis of C. sinensis at the molecular level, the present study developed an upgraded genomic assembly and annotation by sequencing paired-end and mate-paired libraries. We also performed transcriptome sequence analyses on multiple C. sinensis tissues (sucker, muscle, ovary and testis). Genes encoding molecules involved in responses to stimuli and muscle-related development were abundantly expressed in the oral sucker. Compared with other species, genes encoding molecules that facilitate the recognition and transport of cholesterol were observed in high copy numbers in the genome and were highly expressed in the oral sucker. Genes encoding transporters for fatty acids, glucose, amino acids and oxygen were also highly expressed, along with other molecules involved in metabolizing these substrates. All genes involved in energy metabolism pathways, including the β-oxidation of fatty acids, the citrate cycle, oxidative phosphorylation, and fumarate reduction, were expressed in the adults. Finally, we also provide valuable insights into the mechanism underlying the process of pathogenesis by characterizing the secretome of C. sinensis. The characterization and elaborate analysis of the upgraded genome and the tissue transcriptomes not only form a detailed and fundamental C. sinensis resource but also provide novel insights into the physiology and pathogenesis of C. sinensis. We anticipate that this work will aid the development of innovative strategies for the prevention and control of clonorchiasis. PMID:23382950
Porin Deficiency in Carbapenem-Resistant Enterobacter aerogenes Strains.
Hao, Min; Ye, Meiping; Shen, Zhen; Hu, Fupin; Yang, Yang; Wu, Shi; Xu, Xiaogang; Zhu, Sihui; Qin, Xiaohua; Wang, Minggui
2018-03-13
The more frequent reports of carbapenem-resistant Enterobacteriaceae have raised the alarm for public health. Apart from the production of carbapenemases, deficiency (decreased or loss of expression) of outer membrane proteins (OMPs) has been proposed as a potentially important mechanism of carbapenem resistance. The aim of the present study was to evaluate the contribution of the major OMPs to carbapenem resistance in Enterobacter aerogenes (CREA) isolates and also investigate the role of small RNAs (sRNAs) in inducing porin-associated permeability defects. The differential expression of OMPs was analyzed in four clinical CREA isolates. omp35 and omp36 genes were further investigated by whole-genome sequencing, induction of meropenem resistance, sRNA overexpression, OMP complementation assays, and reverse transcription-quantitative PCR. All four isolates examined were deficient in omp35 and omp36. Functional restoration of these two genes confirmed their contribution to carbapenem resistance. The meropenem induction assay further revealed that porin deficiency plays a role in carbapenem resistance under antibiotic selection pressure. Single-point mutations in omp36 leading to premature stop codons were detected in two of the isolates. Elevated expression levels of the sRNAs micF and micC were detected in the other two porin-deficient isolates, which were predicted to be potential porin regulators from whole-genome sequencing. Overexpression of micF and micC downregulated the expression of Omp35 and Omp36, respectively. Porin deficiency plays an important role in carbapenem resistance among clinical E. aerogenes isolates under regulation of the sRNAs micC and micF. Furthermore, overexpression of micC and micF had a minor to no impact on carbapenem minimum inhibitory concentrations, and thus, the regulatory mechanism is likely to be complex.
Dean, C; Jones, J; Favreau, M; Dunsmuir, P; Bedbrook, J
1988-01-01
The petunia rbcS gene SSU301 was introduced into tobacco using Agrobacterium tumefaciens-mediated transformation. The time at which rbcS expression was maximal after transfer of the tobacco plants to the greenhouse was determined. The expression level of the SSU301 gene varied up to 9 fold between individual tobacco plants which had been standardized physiologically as much as possible. The presence of adjacent pUC plasmid sequences did not affect the expression of the SSU301 gene. In an attempt to reduce the between-transformant variability in expression, the SSU301 gene was introduced into tobacco surrounded by 10kb of 5' and 13 kb of 3' DNA sequences which normally flank SSU301 in petunia. The longer flanking regions did not reduce the between-transformant variability of SSU301 gene expression. Images PMID:3174450
Gu, Lijiao; Li, Libei; Wei, Hengling; Wang, Hantao; Su, Junji; Guo, Yaning; Yu, Shuxun
2018-01-01
WRKY transcription factors play important roles in plant defense, stress response, leaf senescence, and plant growth and development. Previous studies have revealed the important roles of the group IIa GhWRKY genes in cotton. To comprehensively analyze the group IIa GhWRKY genes in upland cotton, we identified 15 candidate group IIa GhWRKY genes in the Gossypium hirsutum genome. The phylogenetic tree, intron-exon structure, motif prediction and Ka/Ks analyses indicated that most group IIa GhWRKY genes shared high similarity and conservation and underwent purifying selection during evolution. In addition, we detected the expression patterns of several group IIa GhWRKY genes in individual tissues as well as during leaf senescence using public RNA sequencing data and real-time quantitative PCR. To better understand the functions of group IIa GhWRKYs in cotton, GhWRKY17 (KF669857) was isolated from upland cotton, and its sequence alignment, promoter cis-acting elements and subcellular localization were characterized. Moreover, the over-expression of GhWRKY17 in Arabidopsis up-regulated the senescence-associated genes AtWRKY53, AtSAG12 and AtSAG13, enhancing the plant's susceptibility to leaf senescence. These findings lay the foundation for further analysis and study of the functions of WRKY genes in cotton.
Zhu, Jia-Hong; Cao, Tian-Jun; Dai, Hao-Fu; Li, Hui-Liang; Guo, Dong; Mei, Wen-Li; Peng, Shi-Qing
2016-12-06
Dragon's blood is a red resin mainly extracted from Dracaena plants, and has been widely used as a traditional medicine in East and Southeast Asia. The major components of dragon's blood are flavonoids. Owing to a lack of Dracaena plants genomic information, the flavonoids biosynthesis and regulation in Dracaena plants remain unknown. In this study, three cDNA libraries were constructed from the stems of D. cambodiana after injecting the inducer. Approximately 266.57 million raw sequencing reads were de novo assembled into 198,204 unigenes, of which 34,873 unique sequences were annotated in public protein databases. Many candidate genes involved in flavonoid accumulation were identified. Differential expression analysis identified 20 genes involved in flavonoid biosynthesis, 27 unigenes involved in flavonoid modification and 68 genes involved in flavonoid transport that were up-regulated in the stems of D. cambodiana after injecting the inducer, consistent with the accumulation of flavonoids. Furthermore, we have revealed the differential expression of transcripts encoding for transcription factors (MYB, bHLH and WD40) involved in flavonoid metabolism. These de novo transcriptome data sets provide insights on pathways and molecular regulation of flavonoid biosynthesis and transport, and improve our understanding of molecular mechanisms of dragon's blood formation in D. cambodiana.
Zhu, Jia-Hong; Cao, Tian-Jun; Dai, Hao-Fu; Li, Hui-Liang; Guo, Dong; Mei, Wen-Li; Peng, Shi-Qing
2016-01-01
Dragon’s blood is a red resin mainly extracted from Dracaena plants, and has been widely used as a traditional medicine in East and Southeast Asia. The major components of dragon’s blood are flavonoids. Owing to a lack of Dracaena plants genomic information, the flavonoids biosynthesis and regulation in Dracaena plants remain unknown. In this study, three cDNA libraries were constructed from the stems of D. cambodiana after injecting the inducer. Approximately 266.57 million raw sequencing reads were de novo assembled into 198,204 unigenes, of which 34,873 unique sequences were annotated in public protein databases. Many candidate genes involved in flavonoid accumulation were identified. Differential expression analysis identified 20 genes involved in flavonoid biosynthesis, 27 unigenes involved in flavonoid modification and 68 genes involved in flavonoid transport that were up-regulated in the stems of D. cambodiana after injecting the inducer, consistent with the accumulation of flavonoids. Furthermore, we have revealed the differential expression of transcripts encoding for transcription factors (MYB, bHLH and WD40) involved in flavonoid metabolism. These de novo transcriptome data sets provide insights on pathways and molecular regulation of flavonoid biosynthesis and transport, and improve our understanding of molecular mechanisms of dragon’s blood formation in D. cambodiana. PMID:27922066
Analysis of Expressed Sequence Tags (EST) in Date Palm.
Al-Faifi, Sulieman A; Migdadi, Hussein M; Algamdi, Salem S; Khan, Mohammad Altaf; Al-Obeed, Rashid S; Ammar, Megahed H; Jakse, Jerenj
2017-01-01
Expressed sequence tags (EST) were generated from a normalized cDNA library of the date palm Sukkari cv. to understand the high-quality and better field performance of this well-known commercial cultivar. A total of 6943 high-quality ESTs were generated, out of them 6671 are submitted to the GenBank dbEST (LIBEST_028537). The generated ESTs were assembled into 6362 unigenes, consisting of 494 (14.4%) contigs and 5868 (84.53%) singletons. The functional annotation shows that the majority of the ESTs are associated with binding (44%), catalytic (40%), transporter (5%), and structural molecular (5%) activities. The blastx results show that 73% of unigenes are significantly similar to known plant genes and 27% are novel. The latter could be of particular interest in date palm genetic studies. Further analysis shows that some ESTs are categorized as stress/defense- and fruit development-related genes. These newly generated ESTs could significantly enhance date palm EST databases in the public domain and are available to scientists and researchers across the globe. This knowledge will facilitate the discovery of candidate genes that govern important developmental and agronomical traits in date palm. It will provide important resources for developing genetic tools, comparative genomics, and genome evolution among date palm cultivars.
Fundamentals of nutrigenetics and nutrigenomics.
Bouchard, Claude; Ordovas, Jose M
2012-01-01
This volume of Progress in Molecular Biology and Translational Science is devoted to the exciting and promising field of nutrigenetics and nutrigenomics. The introductory chapter defines the basic concepts necessary for the interpretation of the material covered in the remainder of the volume. Emphasis is on the concept of personalized nutrition and its likely role in public health and disease prevention, as well as in therapeutics. Nutrigenetics refers to the role of DNA sequence variation in the responses to nutrients, whereas nutrigenomics is the study of the role of nutrients in gene expression. This research is predicated on the assumption that there are individual differences in responsiveness to acute or repeated exposures to a given nutrient or combination of nutrients. Throughout human history, diet has affected the expression of genes, resulting in phenotypes that are able to successfully respond to environmental challenges and that allow better exploitation of food resources. These adaptations have been key to human growth and development. Technological advances have made it possible to investigate not only specific genes but also to explore in unbiased designs the whole genome-wide complement of DNA sequence variants or transcriptome. These advances provide an opportunity to establish the foundation for incorporating biological individuality into dietary recommendations, with significant therapeutic potential. Copyright © 2012 Elsevier Inc. All rights reserved.
The BIG Data Center: from deposition to integration to translation.
2017-01-04
Biological data are generated at unprecedentedly exponential rates, posing considerable challenges in big data deposition, integration and translation. The BIG Data Center, established at Beijing Institute of Genomics (BIG), Chinese Academy of Sciences, provides a suite of database resources, including (i) Genome Sequence Archive, a data repository specialized for archiving raw sequence reads, (ii) Gene Expression Nebulas, a data portal of gene expression profiles based entirely on RNA-Seq data, (iii) Genome Variation Map, a comprehensive collection of genome variations for featured species, (iv) Genome Warehouse, a centralized resource housing genome-scale data with particular focus on economically important animals and plants, (v) Methylation Bank, an integrated database of whole-genome single-base resolution methylomes and (vi) Science Wikis, a central access point for biological wikis developed for community annotations. The BIG Data Center is dedicated to constructing and maintaining biological databases through big data integration and value-added curation, conducting basic research to translate big data into big knowledge and providing freely open access to a variety of data resources in support of worldwide research activities in both academia and industry. All of these resources are publicly available and can be found at http://bigd.big.ac.cn. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Zhang, Zhang; Liu, Jingxing; Wu, Jiayan; Yu, Jun
2013-01-01
The regulation of gene expression is essential for eukaryotes, as it drives the processes of cellular differentiation and morphogenesis, leading to the creation of different cell types in multicellular organisms. RNA-Sequencing (RNA-Seq) provides researchers with a powerful toolbox for characterization and quantification of transcriptome. Many different human tissue/cell transcriptome datasets coming from RNA-Seq technology are available on public data resource. The fundamental issue here is how to develop an effective analysis method to estimate expression pattern similarities between different tumor tissues and their corresponding normal tissues. We define the gene expression pattern from three directions: 1) expression breadth, which reflects gene expression on/off status, and mainly concerns ubiquitously expressed genes; 2) low/high or constant/variable expression genes, based on gene expression level and variation; and 3) the regulation of gene expression at the gene structure level. The cluster analysis indicates that gene expression pattern is higher related to physiological condition rather than tissue spatial distance. Two sets of human housekeeping (HK) genes are defined according to cell/tissue types, respectively. To characterize the gene expression pattern in gene expression level and variation, we firstly apply improved K-means algorithm and a gene expression variance model. We find that cancer-associated HK genes (a HK gene is specific in cancer group, while not in normal group) are expressed higher and more variable in cancer condition than in normal condition. Cancer-associated HK genes prefer to AT-rich genes, and they are enriched in cell cycle regulation related functions and constitute some cancer signatures. The expression of large genes is also avoided in cancer group. These studies will help us understand which cell type-specific patterns of gene expression differ among different cell types, and particularly for cancer. PMID:23382867
Croteau, Rodney Bruce; Wildung, Mark Raymond; Crock, John E.
1999-01-01
A cDNA encoding (E)-.beta.-farnesene synthase from peppermint (Mentha piperita) has been isolated and sequenced, and the corresponding amino acid sequence has been determined. Accordingly, an isolated DNA sequence (SEQ ID NO:1) is provided which codes for the expression of (E)-.beta.-farnesene synthase (SEQ ID NO:2), from peppermint (Mentha piperita). In other aspects, replicable recombinant cloning vehicles are provided which code for (E)-.beta.-farnesene synthase, or for a base sequence sufficiently complementary to at least a portion of (E)-.beta.-farnesene synthase DNA or RNA to enable hybridization therewith. In yet other aspects, modified host cells are provided that have been transformed, transfected, infected and/or injected with a recombinant cloning vehicle and/or DNA sequence encoding (E)-.beta.-farnesene synthase. Thus, systems and methods are provided for the recombinant expression of the aforementioned recombinant (E)-.beta.-farnesene synthase that may be used to facilitate its production, isolation and purification in significant amounts. Recombinant (E)-.beta.-farnesene synthase may be used to obtain expression or enhanced expression of (E)-.beta.-farnesene synthase in plants in order to enhance the production of (E)-.beta.-farnesene, or may be otherwise employed for the regulation or expression of (E)-.beta.-farnesene synthase, or the production of its product.
Croteau, Rodney Bruce; Crock, John E.
2005-01-25
A cDNA encoding (E)-.beta.-farnesene synthase from peppermint (Mentha piperita) has been isolated and sequenced, and the corresponding amino acid sequence has been determined. Accordingly, an isolated DNA sequence (SEQ ID NO:1) is provided which codes for the expression of (E)-.beta.-farnesene synthase (SEQ ID NO:2), from peppermint (Mentha piperita). In other aspects, replicable recombinant cloning vehicles are provided which code for (E)-.beta.-farnesene synthase, or for a base sequence sufficiently complementary to at least a portion of (E)-.beta.-farnesene synthase DNA or RNA to enable hybridization therewith. In yet other aspects, modified host cells are provided that have been transformed, transfected, infected and/or injected with a recombinant cloning vehicle and/or DNA sequence encoding (E)-.beta.-farnesene synthase. Thus, systems and methods are provided for the recombinant expression of the aforementioned recombinant (E)-.beta.-famesene synthase that may be used to facilitate its production, isolation and purification in significant amounts. Recombinant (E)-.beta.-farnesene synthase may be used to obtain expression or enhanced expression of (E)-.beta.-famesene synthase in plants in order to enhance the production of (E)-.beta.-farnesene, or may be otherwise employed for the regulation or expression of (E)-.beta.-farnesene synthase, or the production of its product.
Yang, J; Yamamoto, M; Ishibashi, J; Taniai, K; Yamakawa, M
1998-08-01
An antibacterial protein, designated rhinocerosin, was purified to homogeneity from larvae of the coconut rhinoceros beetle, Oryctes rhinoceros immunized with Escherichia coli. Based on the amino acid sequence of the N-terminal region, a degenerate primer was synthesized and reverse-transcriptase PCR was performed to clone rhinocerosin cDNA. As a result, a 279-bp fragment was obtained. The complete nucleotide sequence was determined by sequencing the extended rhinocerosin cDNA clone by 5' rapid amplification of cDNA ends. The deduced amino acid sequence of the mature portion of rhinocerosin was composed of 72 amino acids without cystein residues and was shown to be rich in glycine (11.1%) and proline (11.1%) residues. Comparison of the deduced amino acid sequence of rhinocerosin with those of other antibacterial proteins indicated that it has 77.8% and 44.6% identity with holotricin 2 and coleoptrecin, respectively. Rhinocerosin had strong antibacterial activity against E. coli, Streptococcus pyogenes, Staphylococcus aureus but not against Pseudomonas aeruginosa. Results of reverse-transcriptase PCR analysis of gene expression in different tissues indicated that the rhinocerosin gene is strongly expressed in the fat body and the Malpighian tubule, and weakly expressed in hemocytes and midgut. In addition, gene expression was inducible by bacteria in the fat body, the Malpighian tubule and hemocyte but constitutive expression was observed in the midgut.
Molecular Characterization of Copepod Photoreception.
Porter, Megan L; Steck, Mireille; Roncalli, Vittoria; Lenz, Petra H
2017-08-01
Copepod crustaceans are an abundant and ecologically significant group whose basic biology is guided by numerous visually guided behaviors. These behaviors are driven by copepod eyes, including naupliar eyes and Gicklhorn's organs, which vary widely in structure and function among species. Yet little is known about the molecular aspects of copepod vision. In this study we present a general overview of the molecular aspects of copepod vision by identifying phototransduction genes from newly generated and publicly available RNA-sequencing data and assemblies from 12 taxonomically diverse copepod species. We identify a set of 10 expressed transcripts that serve as a set of target genes for future studies of copepod phototransduction. Our more detailed evolutionary analyses of the opsin gene responsible for forming visual pigments found that all of the copepod species investigated express two main groups of opsins: middle-wavelength-sensitive (MWS) opsins and pteropsins. Additionally, there is evidence from a few species (e.g., Calanus finmarchicus, Eurytemora affinis, Paracyclopina nana, and Lernaea cyprinacea) for the expression of two additional groups of opsins-the peropsins and rhodopsin 7 (Rh7) opsins-at low levels or distinct developmental stages. An ontogenetic analysis of opsin expression in Calanus finmarchicus found the expression of a single dominant MWS opsin, as well as evidence for differences in expression across development in some MWS, pteropsin, and Rh7 opsins, with expression peaking in early naupliar through early copepodite stages.
Rodovalho, Cynara M; Ferro, Milene; Fonseca, Fernando Pp; Antonio, Erik A; Guilherme, Ivan R; Henrique-Silva, Flávio; Bacci, Maurício
2011-06-17
Leafcutters are the highest evolved within Neotropical ants in the tribe Attini and model systems for studying caste formation, labor division and symbiosis with microorganisms. Some species of leafcutters are agricultural pests controlled by chemicals which affect other animals and accumulate in the environment. Aiming to provide genetic basis for the study of leafcutters and for the development of more specific and environmentally friendly methods for the control of pest leafcutters, we generated expressed sequence tag data from Atta laevigata, one of the pest ants with broad geographic distribution in South America. The analysis of the expressed sequence tags allowed us to characterize 2,006 unique sequences in Atta laevigata. Sixteen of these genes had a high number of transcripts and are likely positively selected for high level of gene expression, being responsible for three basic biological functions: energy conservation through redox reactions in mitochondria; cytoskeleton and muscle structuring; regulation of gene expression and metabolism. Based on leafcutters lifestyle and reports of genes involved in key processes of other social insects, we identified 146 sequences potential targets for controlling pest leafcutters. The targets are responsible for antixenobiosis, development and longevity, immunity, resistance to pathogens, pheromone function, cell signaling, behavior, polysaccharide metabolism and arginine kynase activity. The generation and analysis of expressed sequence tags from Atta laevigata have provided important genetic basis for future studies on the biology of leaf-cutting ants and may contribute to the development of a more specific and environmentally friendly method for the control of agricultural pest leafcutters.
Jaber, Tareq; Henderson, Gail; Li, Sumin; Perng, Guey-Chuen; Carpenter, Dale; Wechsler, Steven L; Jones, Clinton
2009-10-01
The herpes simplex virus type 1 (HSV-1) latency-associated transcript (LAT) is abundantly expressed in latently infected sensory neurons. In small animal models of infection, expression of the first 1.5 kb of LAT coding sequences is necessary and sufficient for wild-type reactivation from latency. The ability of LAT to inhibit apoptosis is important for reactivation from latency. Within the first 1.5 kb of LAT coding sequences and LAT promoter sequences, additional transcripts have been identified. For example, the anti-sense to LAT transcript (AL) is expressed in the opposite direction to LAT from the 5' end of LAT and LAT promoter sequences. In addition, the upstream of LAT (UOL) transcript is expressed in the LAT direction from sequences in the LAT promoter. Further examination of the first 1.5 kb of LAT coding sequences revealed two small ORFs that are anti-sense with respect to LAT (AL2 and AL3). A transcript spanning AL3 was detected in productively infected cells, mouse neuroblastoma cells stably expressing LAT and trigeminal ganglia (TG) of latently infected mice. Peptide-specific IgG directed against AL3 specifically recognized a protein migrating near 15 kDa in cells stably transfected with LAT, mouse neuroblastoma cells transfected with a plasmid containing the AL3 ORF and TG of latently infected mice. The inability to detect the AL3 protein during productive infection may have been because the 5' terminus of the AL3 transcript was downstream of the first in-frame methionine of the AL3 ORF during productive infection.
2011-01-01
Background Leafcutters are the highest evolved within Neotropical ants in the tribe Attini and model systems for studying caste formation, labor division and symbiosis with microorganisms. Some species of leafcutters are agricultural pests controlled by chemicals which affect other animals and accumulate in the environment. Aiming to provide genetic basis for the study of leafcutters and for the development of more specific and environmentally friendly methods for the control of pest leafcutters, we generated expressed sequence tag data from Atta laevigata, one of the pest ants with broad geographic distribution in South America. Results The analysis of the expressed sequence tags allowed us to characterize 2,006 unique sequences in Atta laevigata. Sixteen of these genes had a high number of transcripts and are likely positively selected for high level of gene expression, being responsible for three basic biological functions: energy conservation through redox reactions in mitochondria; cytoskeleton and muscle structuring; regulation of gene expression and metabolism. Based on leafcutters lifestyle and reports of genes involved in key processes of other social insects, we identified 146 sequences potential targets for controlling pest leafcutters. The targets are responsible for antixenobiosis, development and longevity, immunity, resistance to pathogens, pheromone function, cell signaling, behavior, polysaccharide metabolism and arginine kynase activity. Conclusion The generation and analysis of expressed sequence tags from Atta laevigata have provided important genetic basis for future studies on the biology of leaf-cutting ants and may contribute to the development of a more specific and environmentally friendly method for the control of agricultural pest leafcutters. PMID:21682882
Llopart, Ana
2018-05-01
The hemizygosity of the X (Z) chromosome fully exposes the fitness effects of mutations on that chromosome and has evolutionary consequences on the relative rates of evolution of X and autosomes. Specifically, several population genetics models predict increased rates of evolution in X-linked loci relative to autosomal loci. This prediction of faster-X evolution has been evaluated and confirmed for both protein coding sequences and gene expression. In the case of faster-X evolution for gene expression divergence, it is often assumed that variation in 5' noncoding sequences is associated with variation in transcript abundance between species but a formal, genomewide test of this hypothesis is still missing. Here, I use whole genome sequence data in Drosophila yakuba and D. santomea to evaluate this hypothesis and report positive correlations between sequence divergence at 5' noncoding sequences and gene expression divergence. I also examine polymorphism and divergence in 9,279 noncoding sequences located at the 5' end of annotated genes and detected multiple signals of positive selection. Notably, I used the traditional synonymous sites as neutral reference to test for adaptive evolution, but I also used bases 8-30 of introns <65 bp, which have been proposed to be a better neutral choice. X-linked genes with high degree of male-biased expression show the most extreme adaptive pattern at 5' noncoding regions, in agreement with faster-X evolution for gene expression divergence and a higher incidence of positively selected recessive mutations. © 2018 The Authors. Molecular Ecology Published by John Wiley & Sons Ltd.
Gene expression analysis of flax seed development
2011-01-01
Background Flax, Linum usitatissimum L., is an important crop whose seed oil and stem fiber have multiple industrial applications. Flax seeds are also well-known for their nutritional attributes, viz., omega-3 fatty acids in the oil and lignans and mucilage from the seed coat. In spite of the importance of this crop, there are few molecular resources that can be utilized toward improving seed traits. Here, we describe flax embryo and seed development and generation of comprehensive genomic resources for the flax seed. Results We describe a large-scale generation and analysis of expressed sequences in various tissues. Collectively, the 13 libraries we have used provide a broad representation of genes active in developing embryos (globular, heart, torpedo, cotyledon and mature stages) seed coats (globular and torpedo stages) and endosperm (pooled globular to torpedo stages) and genes expressed in flowers, etiolated seedlings, leaves, and stem tissue. A total of 261,272 expressed sequence tags (EST) (GenBank accessions LIBEST_026995 to LIBEST_027011) were generated. These EST libraries included transcription factor genes that are typically expressed at low levels, indicating that the depth is adequate for in silico expression analysis. Assembly of the ESTs resulted in 30,640 unigenes and 82% of these could be identified on the basis of homology to known and hypothetical genes from other plants. When compared with fully sequenced plant genomes, the flax unigenes resembled poplar and castor bean more than grape, sorghum, rice or Arabidopsis. Nearly one-fifth of these (5,152) had no homologs in sequences reported for any organism, suggesting that this category represents genes that are likely unique to flax. Digital analyses revealed gene expression dynamics for the biosynthesis of a number of important seed constituents during seed development. Conclusions We have developed a foundational database of expressed sequences and collection of plasmid clones that comprise even low-expressed genes such as those encoding transcription factors. This has allowed us to delineate the spatio-temporal aspects of gene expression underlying the biosynthesis of a number of important seed constituents in flax. Flax belongs to a taxonomic group of diverse plants and the large sequence database will allow for evolutionary studies as well. PMID:21529361
Update Direct-Strike Lightning Environment for Stockpile-to-Target Sequence (Second Revision)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Uman, Martin A.; Rakov, V. A.; Elisme, J. O.
2010-10-05
The University of Florida has surveyed all relevant publications reporting lightning characteristics and presents here an up-to-date version of the direct-strike lightning environment specifications for nuclear weapons published in 1989 by R. J. Fisher and M. A. Uman. Further, we present functional expressions for current vs. time, current derivative vs. time, second current derivative vs. time, charge transfer vs. time, and action integral (specific energy) vs. time for positive and negative first return strokes, for negative subsequent return strokes, and for positive and negative continuing currents; and we give sets of constants for these functional expressions so that the resultantmore » waveforms exhibit approximately the median and extreme lightning parameters presented in the updated direct strike environment. Fourier transforms of the return stroke current waveforms are presented. The results of our literature survey are included in three Appendices entitled Return Stroke Current, Continuing Current, and Positive Lightning.« less
Jorge, Sérgio; Monte, Leonardo G; Coimbra, Marco Antonio; Albano, Ana Paula; Hartwig, Daiane D; Lucas, Caroline; Seixas, Fabiana K; Dellagostin, Odir A; Hartleben, Cláudia P
2012-10-01
Leptospirosis is a globally prevalent zoonosis caused by pathogenic Leptospira spp.; several serologic variants have reservoirs in synanthropic rodents. The capybara is the largest living rodent in the world, and it has a wide geographical distribution in Central and South America. This rodent is a significant source of Leptospira since the agent is shed via urine into the environment and is a potential public health threat. In this study, we isolated and identified by molecular techniques a pathogenic Leptospira from capybara in southern Brazil. The isolated strain was characterized by partial rpoB gene sequencing and variable-number tandem-repeats analysis as L. interrogans, serogroup Icterohaemorrhagiae. In addition, to confirm the expression of virulence factors, the bacterial immunoglobulin-like proteins A and B expression was detected by indirect immunofluorescence using leptospiral specific monoclonal antibodies. This report identifies capybaras as an important source of infection and provides insight into the epidemiology of leptospirosis.
Bove, Jérôme; Lucas, Philippe; Godin, Béatrice; Ogé, Laurent; Jullien, Marc; Grappin, Philippe
2005-03-01
Seed dormancy in Nicotiana plumbaginifolia is characterized by an abscisic acid accumulation linked to a pronounced germination delay. Dormancy can be released by 1 year after-ripening treatment. Using a cDNA-amplified fragment length polymorphism (cDNA-AFLP) approach we compared the gene expression patterns of dormant and after-ripened seeds, air-dry or during one day imbibition and analyzed 15,000 cDNA fragments. Among them 1020 were found to be differentially regulated by dormancy. Of 412 sequenced cDNA fragments, 83 were assigned to a known function by search similarities to public databases. The functional categories of the identified dormancy maintenance and breaking responsive genes, give evidence that after-ripening turns in the air-dry seed to a new developmental program that modulates, at the RNA level, components of translational control, signaling networks, transcriptional control and regulated proteolysis.
Sequence analysis of 497 mouse brain ESTs expressed in the substantia nigra
DOE Office of Scientific and Technical Information (OSTI.GOV)
Stewart, G.J.; Savioz, A.; Davies, R.W.
1997-01-15
The use of subtracted, region-specific cDNA libraries combined with single-pass cDNA sequencing allows the discovery of novel genes and facilitates molecular description of the tissue or region involved. We report the sequence of 497 mouse expressed sequence tags (ESTs) from two subtracted libraries enriched for cDNAs expressed in the substantia nigra, a brain region with important roles in movement control and Parkinson disease. Of these, 238 ESTs give no database matches and therefore derive from novel genes. A further 115 ESTs show sequence similarity to ESTs from other organisms, which themselves do not yield any significant database matches to genesmore » of known function. Fifty-six ESTs show sequence similarity to previously identified genes whose mouse homologues have not been reported. The total number of ESTs reported that are new for the mouse is 407, which, together with the 90 ESTs corresponding to known mouse genes or cDNAs, contributes to the molecular description of the substantia nigra. 21 refs., 4 tabs.« less
RNA-ID, a Powerful Tool for Identifying and Characterizing Regulatory Sequences.
Brule, C E; Dean, K M; Grayhack, E J
2016-01-01
The identification and analysis of sequences that regulate gene expression is critical because regulated gene expression underlies biology. RNA-ID is an efficient and sensitive method to discover and investigate regulatory sequences in the yeast Saccharomyces cerevisiae, using fluorescence-based assays to detect green fluorescent protein (GFP) relative to a red fluorescent protein (RFP) control in individual cells. Putative regulatory sequences can be inserted either in-frame or upstream of a superfolder GFP fusion protein whose expression, like that of RFP, is driven by the bidirectional GAL1,10 promoter. In this chapter, we describe the methodology to identify and study cis-regulatory sequences in the RNA-ID system, explaining features and variations of the RNA-ID reporter, as well as some applications of this system. We describe in detail the methods to analyze a single regulatory sequence, from construction of a single GFP variant to assay of variants by flow cytometry, as well as modifications required to screen libraries of different strains simultaneously. We also describe subsequent analyses of regulatory sequences. © 2016 Elsevier Inc. All rights reserved.
Lima, L S; Gramacho, K P; Carels, N; Novais, R; Gaiotto, F A; Lopes, U V; Gesteira, A S; Zaidan, H A; Cascardo, J C M; Pires, J L; Micheli, F
2009-07-14
In order to increase the efficiency of cacao tree resistance to witches' broom disease, which is caused by Moniliophthora perniciosa (Tricholomataceae), we looked for molecular markers that could help in the selection of resistant cacao genotypes. Among the different markers useful for developing marker-assisted selection, single nucleotide polymorphisms (SNPs) constitute the most common type of sequence difference between alleles and can be easily detected by in silico analysis from expressed sequence tag libraries. We report the first detection and analysis of SNPs from cacao-M. perniciosa interaction expressed sequence tags, using bioinformatics. Selection based on analysis of these SNPs should be useful for developing cacao varieties resistant to this devastating disease.
Highly multiplexed subcellular RNA sequencing in situ
Lee, Je Hyuk; Daugharthy, Evan R.; Scheiman, Jonathan; Kalhor, Reza; Ferrante, Thomas C.; Yang, Joyce L.; Terry, Richard; Jeanty, Sauveur S. F.; Li, Chao; Amamoto, Ryoji; Peters, Derek T.; Turczyk, Brian M.; Marblestone, Adam H.; Inverso, Samuel A.; Bernard, Amy; Mali, Prashant; Rios, Xavier; Aach, John; Church, George M.
2014-01-01
Understanding the spatial organization of gene expression with single nucleotide resolution requires localizing the sequences of expressed RNA transcripts within a cell in situ. Here we describe fluorescent in situ RNA sequencing (FISSEQ), in which stably cross-linked cDNA amplicons are sequenced within a biological sample. Using 30-base reads from 8,742 genes in situ, we examined RNA expression and localization in human primary fibroblasts using a simulated wound healing assay. FISSEQ is compatible with tissue sections and whole mount embryos, and reduces the limitations of optical resolution and noisy signals on single molecule detection. Our platform enables massively parallel detection of genetic elements, including gene transcripts and molecular barcodes, and can be used to investigate cellular phenotype, gene regulation, and environment in situ. PMID:24578530
Tavares, D; Tully, K; Dobner, P R
1999-10-15
The promoter region of the mouse high affinity neurotensin receptor (Ntr-1) gene was characterized, and sequences required for expression in neuroblastoma cell lines that express high affinity NT-binding sites were characterized. Me(2)SO-induced neuronal differentiation of N1E-115 neuroblastoma cells increased both the expression of the endogenous Ntr-1 gene and reporter genes driven by NTR-1 promoter sequences by 3-4-fold. Deletion analysis revealed that an 83-base pair promoter region containing the transcriptional start site is required for Me(2)SO activation. Detailed mutational analysis of this region revealed that a CACCC box and the central region of a large GC-rich palindrome are the crucial cis-regulatory elements required for Me(2)SO induction. The CACCC box is bound by at least one factor that is induced upon Me(2)SO treatment of N1E-115 cells. The Me(2)SO effect was found to be both selective and cell type-restricted. Basal expression in the neuroblastoma cell lines required a distinct set of sequences, including an Sp1-like sequence, and a sequence resembling an NGFI-A-binding site; however, a more distal 5' sequence was found to repress basal activity in N1E-115 cells. These results provide evidence that Ntr-1 gene regulation involves both positive and negative regulatory elements located in the 5'-flanking region and that Ntr-1 gene activation involves the coordinate activation or induction of several factors, including a CACCC box binding complex.
Rizk, Francine; Laverdure, Sylvain; d'Alençon, Emmanuelle; Bossin, Hervé; Dupressoir, Thierry
2018-01-01
The Lepidopteran ambidensovirus 1 isolated from Junonia coenia (hereafter JcDV) is an invertebrate parvovirus considered as a viral transduction vector as well as a potential tool for the biological control of insect pests. Previous works showed that JcDV-based circular plasmids experimentally integrate into insect cells genomic DNA. In order to approach the natural conditions of infection and possible integration, we generated linear JcDV- gfp based molecules which were transfected into non permissive Spodoptera frugiperda ( Sf9 ) cultured cells. Cells were monitored for the expression of green fluorescent protein (GFP) and DNA was analyzed for integration of transduced viral sequences. Non-structural protein modulation of the VP-gene cassette promoter activity was additionally assayed. We show that linear JcDV-derived molecules are capable of long term genomic integration and sustained transgene expression in Sf9 cells. As expected, only the deletion of both inverted terminal repeats (ITR) or the polyadenylation signals of NS and VP genes dramatically impairs the global transduction/expression efficiency. However, all the integrated viral sequences we characterized appear "scrambled" whatever the viral content of the transfected vector. Despite a strong GFP expression, we were unable to recover any full sequence of the original constructs and found rearranged viral and non-viral sequences as well. Cellular flanking sequences were identified as non-coding ones. On the other hand, the kinetics of GFP expression over time led us to investigate the apparent down-regulation by non-structural proteins of the VP-gene cassette promoter. Altogether, our results show that JcDV-derived sequences included in linear DNA molecules are able to drive efficiently the integration and expression of a foreign gene into the genome of insect cells, whatever their composition, provided that at least one ITR is present. However, the transfected sequences were extensively rearranged with cellular DNA during or after random integration in the host cell genome. Lastly, the non-structural proteins seem to participate in the regulation of p9 promoter activity rather than to the integration of viral sequences.
2010-01-01
Background Identifying associations between genotypes and gene expression levels using microarrays has enabled systematic interrogation of regulatory variation underlying complex phenotypes. This approach has vast potential for functional characterization of disease states, but its prohibitive cost, given hundreds to thousands of individual samples from populations have to be genotyped and expression profiled, has limited its widespread application. Results Here we demonstrate that genomic regions with allele-specific expression (ASE) detected by sequencing cDNA are highly enriched for cis-acting expression quantitative trait loci (cis-eQTL) identified by profiling of 500 animals in parallel, with up to 90% agreement on the allele that is preferentially expressed. We also observed widespread noncoding and antisense ASE and identified several allele-specific alternative splicing variants. Conclusion Monitoring ASE by sequencing cDNA from as little as one sample is a practical alternative to expression genetics for mapping cis-acting variation that regulates RNA transcription and processing. PMID:20707912
Lazar, Zbigniew; Rossignol, Tristan; Verbeke, Jonathan; Crutz-Le Coq, Anne-Marie; Nicaud, Jean-Marc; Robak, Małgorzata
2013-11-01
Yarrowia lipolytica requires the expression of a heterologous invertase to grow on a sucrose-based substrate. This work reports the construction of an optimized invertase expression cassette composed of Saccharomyces cerevisiae Suc2p secretion signal sequence followed by the SUC2 sequence and under the control of the strong Y. lipolytica pTEF promoter. This new construction allows a fast and optimal cleavage of sucrose into glucose and fructose and allows cells to reach the maximum growth rate. Contrary to pre-existing constructions, the expression of SUC2 is not sensitive to medium composition in this context. The strain JMY2593, expressing this new cassette with an optimized secretion signal sequence and a strong promoter, produces 4,519 U/l of extracellular invertase in bioreactor experiments compared to 597 U/l in a strain expressing the former invertase construction. The expression of this cassette strongly improved production of invertase and is suitable for simultaneously high production level of citric acid from sucrose-based media.
Transposon Variants and Their Effects on Gene Expression in Arabidopsis
Wang, Xi; Weigel, Detlef; Smith, Lisa M.
2013-01-01
Transposable elements (TEs) make up the majority of many plant genomes. Their transcription and transposition is controlled through siRNAs and epigenetic marks including DNA methylation. To dissect the interplay of siRNA–mediated regulation and TE evolution, and to examine how TE differences affect nearby gene expression, we investigated genome-wide differences in TEs, siRNAs, and gene expression among three Arabidopsis thaliana accessions. Both TE sequence polymorphisms and presence of linked TEs are positively correlated with intraspecific variation in gene expression. The expression of genes within 2 kb of conserved TEs is more stable than that of genes next to variant TEs harboring sequence polymorphisms. Polymorphism levels of TEs and closely linked adjacent genes are positively correlated as well. We also investigated the distribution of 24-nt-long siRNAs, which mediate TE repression. TEs targeted by uniquely mapping siRNAs are on average farther from coding genes, apparently because they more strongly suppress expression of adjacent genes. Furthermore, siRNAs, and especially uniquely mapping siRNAs, are enriched in TE regions missing in other accessions. Thus, targeting by uniquely mapping siRNAs appears to promote sequence deletions in TEs. Overall, our work indicates that siRNA–targeting of TEs may influence removal of sequences from the genome and hence evolution of gene expression in plants. PMID:23408902
Efficient search, mapping, and optimization of multi-protein genetic systems in diverse bacteria
Farasat, Iman; Kushwaha, Manish; Collens, Jason; Easterbrook, Michael; Guido, Matthew; Salis, Howard M
2014-01-01
Developing predictive models of multi-protein genetic systems to understand and optimize their behavior remains a combinatorial challenge, particularly when measurement throughput is limited. We developed a computational approach to build predictive models and identify optimal sequences and expression levels, while circumventing combinatorial explosion. Maximally informative genetic system variants were first designed by the RBS Library Calculator, an algorithm to design sequences for efficiently searching a multi-protein expression space across a > 10,000-fold range with tailored search parameters and well-predicted translation rates. We validated the algorithm's predictions by characterizing 646 genetic system variants, encoded in plasmids and genomes, expressed in six gram-positive and gram-negative bacterial hosts. We then combined the search algorithm with system-level kinetic modeling, requiring the construction and characterization of 73 variants to build a sequence-expression-activity map (SEAMAP) for a biosynthesis pathway. Using model predictions, we designed and characterized 47 additional pathway variants to navigate its activity space, find optimal expression regions with desired activity response curves, and relieve rate-limiting steps in metabolism. Creating sequence-expression-activity maps accelerates the optimization of many protein systems and allows previous measurements to quantitatively inform future designs. PMID:24952589
The tripartite leader sequence is required for ectopic expression of HAdV-B and HAdV-E E3 CR1 genes.
Bair, Camden R; Kotha Lakshmi Narayan, Poornima; Kajon, Adriana E
2017-05-01
The unique repertoire of genes that characterizes the early region 3 (E3) of the different species of human adenovirus (HAdV) likely contributes to their distinct pathogenic traits. The function of many E3 CR1 proteins remains unknown possibly due to unidentified intrinsic properties that make them difficult to express ectopically. This study shows that the species HAdV-B- and HAdV-E-specific E3 CR1 genes can be expressed from vectors carrying the HAdV tripartite leader (TPL) sequence but not from traditional mammalian expression vectors. Insertion of the TPL sequence upstream of the HAdV-B and HAdV-E E3 CR1 open reading frames was sufficient to rescue protein expression from pCI-neo constructs in transfected 293T cells. The detection of higher levels of HAdV-B and HAdV-E E3 CR1 transcripts suggests that the TPL sequence may enhance gene expression at both the transcriptional and translational levels. Our findings will facilitate the characterization of additional AdV E3 proteins. Copyright © 2017 Elsevier Inc. All rights reserved.
Newborn Sequencing in Genomic Medicine and Public Health
Agrawal, Pankaj B.; Bailey, Donald B.; Beggs, Alan H.; Brenner, Steven E.; Brower, Amy M.; Cakici, Julie A.; Ceyhan-Birsoy, Ozge; Chan, Kee; Chen, Flavia; Currier, Robert J.; Dukhovny, Dmitry; Green, Robert C.; Harris-Wai, Julie; Holm, Ingrid A.; Iglesias, Brenda; Joseph, Galen; Kingsmore, Stephen F.; Koenig, Barbara A.; Kwok, Pui-Yan; Lantos, John; Leeder, Steven J.; Lewis, Megan A.; McGuire, Amy L.; Milko, Laura V.; Mooney, Sean D.; Parad, Richard B.; Pereira, Stacey; Petrikin, Joshua; Powell, Bradford C.; Powell, Cynthia M.; Puck, Jennifer M.; Rehm, Heidi L.; Risch, Neil; Roche, Myra; Shieh, Joseph T.; Veeraraghavan, Narayanan; Watson, Michael S.; Willig, Laurel; Yu, Timothy W.; Urv, Tiina; Wise, Anastasia L.
2017-01-01
The rapid development of genomic sequencing technologies has decreased the cost of genetic analysis to the extent that it seems plausible that genome-scale sequencing could have widespread availability in pediatric care. Genomic sequencing provides a powerful diagnostic modality for patients who manifest symptoms of monogenic disease and an opportunity to detect health conditions before their development. However, many technical, clinical, ethical, and societal challenges should be addressed before such technology is widely deployed in pediatric practice. This article provides an overview of the Newborn Sequencing in Genomic Medicine and Public Health Consortium, which is investigating the application of genome-scale sequencing in newborns for both diagnosis and screening. PMID:28096516
Mondal, Tapan Kumar; Ganie, Showkat Ahmad; Debnath, Ananda Bhusan
2015-01-01
Oryza coarctata, a halophyte and wild relative of rice, is grown normally in saline water. MicroRNAs (miRNAs) are non-coding RNAs that play pivotal roles in every domain of life including stress response. There are very few reports on the discovery of salt-responsive miRNAs from halophytes. In this study, two small RNA libraries, one each from the control and salt-treated (450 mM NaCl for 24 h) leaves of O. coarctata were sequenced, which yielded 338 known and 95 novel miRNAs. Additionally, we used publicly available transcriptomics data of O. coarctata which led to the discovery of additional 48 conserved miRNAs along with their pre-miRNA sequences through in silico analysis. In total, 36 known and 7 novel miRNAs were up-regulated whereas, 12 known and 7 novel miRNAs were down-regulated under salinity stress. Further, 233 and 154 target genes were predicted for 48 known and 14 novel differentially regulated miRNAs respectively. These targets with the help of gene ontology analysis were found to be involved in several important biological processes that could be involved in salinity tolerance. Relative expression trends of majority of the miRNAs as detected by real time-PCR as well as predicted by Illumina sequencing were found to be coherent. Additionally, expression of most of the target genes was negatively correlated with their corresponding miRNAs. Thus, the present study provides an account of miRNA-target networking that is involved in salinity adaption of O. coarctata.
Ji, Rui; Wang, Yujun; Cheng, Yanbin; Zhang, Meiping; Zhang, Hong-Bin; Zhu, Li; Fang, Jichao; Zhu-Salzman, Keyan
2016-01-01
Green peach aphid (Myzus persicae) and pea aphid (Acyrthosiphon pisum) are two phylogenetically closely related agricultural pests. While pea aphid is restricted to Fabaceae, green peach aphid feeds on hundreds of plant species from more than 40 families. Transcriptome comparison could shed light on the genetic factors underlying the difference in host range between the two species. Furthermore, a large scale study contrasting gene expression between immature nymphs and fully developed adult aphids would fill a previous knowledge gap. Here, we obtained transcriptomic sequences of green peach aphid nymphs and adults, respectively, using Illumina sequencing technology. A total of 2244 genes were found to be differentially expressed between the two developmental stages, many of which were associated with detoxification, hormone production, cuticle formation, metabolism, food digestion, and absorption. When searched against publically available pea aphid mRNA sequences, 13,752 unigenes were found to have no homologous counterparts. Interestingly, many of these unigenes that could be annotated in other databases were involved in the “xenobiotics biodegradation and metabolism” pathway, suggesting the two aphids differ in their adaptation to secondary metabolites of host plants. Conversely, 3989 orthologous gene pairs between the two species were subjected to calculations of synonymous and nonsynonymous substitutions, and 148 of the genes potentially evolved in response to positive selection. Some of these genes were predicted to be associated with insect-plant interactions. Our study has revealed certain molecular events related to aphid development, and provided some insight into biological variations in two aphid species, possibly as a result of host plant adaptation. PMID:27812361
CDinFusion – Submission-Ready, On-Line Integration of Sequence and Contextual Data
Hankeln, Wolfgang; Wendel, Norma Johanna; Gerken, Jan; Waldmann, Jost; Buttigieg, Pier Luigi; Kostadinov, Ivaylo; Kottmann, Renzo; Yilmaz, Pelin; Glöckner, Frank Oliver
2011-01-01
State of the art (DNA) sequencing methods applied in “Omics” studies grant insight into the ‘blueprints’ of organisms from all domains of life. Sequencing is carried out around the globe and the data is submitted to the public repositories of the International Nucleotide Sequence Database Collaboration. However, the context in which these studies are conducted often gets lost, because experimental data, as well as information about the environment are rarely submitted along with the sequence data. If these contextual or metadata are missing, key opportunities of comparison and analysis across studies and habitats are hampered or even impossible. To address this problem, the Genomic Standards Consortium (GSC) promotes checklists and standards to better describe our sequence data collection and to promote the capturing, exchange and integration of sequence data with contextual data. In a recent community effort the GSC has developed a series of recommendations for contextual data that should be submitted along with sequence data. To support the scientific community to significantly enhance the quality and quantity of contextual data in the public sequence data repositories, specialized software tools are needed. In this work we present CDinFusion, a web-based tool to integrate contextual and sequence data in (Multi)FASTA format prior to submission. The tool is open source and available under the Lesser GNU Public License 3. A public installation is hosted and maintained at the Max Planck Institute for Marine Microbiology at http://www.megx.net/cdinfusion. The tool may also be installed locally using the open source code available at http://code.google.com/p/cdinfusion. PMID:21935468
NCBI GEO: archive for functional genomics data sets--update.
Barrett, Tanya; Wilhite, Stephen E; Ledoux, Pierre; Evangelista, Carlos; Kim, Irene F; Tomashevsky, Maxim; Marshall, Kimberly A; Phillippy, Katherine H; Sherman, Patti M; Holko, Michelle; Yefanov, Andrey; Lee, Hyeseung; Zhang, Naigong; Robertson, Cynthia L; Serova, Nadezhda; Davis, Sean; Soboleva, Alexandra
2013-01-01
The Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo/) is an international public repository for high-throughput microarray and next-generation sequence functional genomic data sets submitted by the research community. The resource supports archiving of raw data, processed data and metadata which are indexed, cross-linked and searchable. All data are freely available for download in a variety of formats. GEO also provides several web-based tools and strategies to assist users to query, analyse and visualize data. This article reports current status and recent database developments, including the release of GEO2R, an R-based web application that helps users analyse GEO data.
Naville, Magali; Bressan, Cédric; Hühns, Maja; Gock, Michael; Kühn, Florian; Volff, Jean-Nicolas; Trillet-Lenoir, Véronique
2015-01-01
Background Expression of the human endogenous retrovirus (HERV)-H family has been associated with colorectal carcinomas (CRC), yet no individual HERV-H locus expression has been thoroughly correlated with clinical data. Here, we characterized HERV-H reactivations in clinical CRC samples by integrating expression profiles, molecular patterns and clinical data. Expression of relevant HERV-H sequences was analyzed by qRT-PCR on two well-defined clinical cohorts (n = 139 pairs of tumor and adjacent normal colon tissue) including samples from adenomas (n = 21) and liver metastases (n = 16). Correlations with clinical and molecular data were assessed. Results CRC specific HERV-H sequences were validated and found expressed throughout CRC disease progression. Correlations between HERV-H expression and lymph node invasion of tumor cells (p = 0.0006) as well as microsatellite instable tumors (p < 0.0001) were established. No association with regard to age, tumor localization, grading or common mutations became apparent. Interestingly, CRC expressed elements belonged to specific young HERV-H subfamilies and their 5′ LTR often presented active histone marks. Conclusion These results suggest a functional role of HERV-H sequences in colorectal carcinogenesis. The pronounced connection with microsatellite instability warrants a more detailed investigation. Thus, HERV-H sequences in addition to tumor specific mutations may represent clinically relevant, truly CRC specific markers for diagnostic, prognostic and therapeutic purposes. PMID:26517682
Pérot, Philippe; Mullins, Christina Susanne; Naville, Magali; Bressan, Cédric; Hühns, Maja; Gock, Michael; Kühn, Florian; Volff, Jean-Nicolas; Trillet-Lenoir, Véronique; Linnebacher, Michael; Mallet, François
2015-11-24
Expression of the human endogenous retrovirus (HERV)-H family has been associated with colorectal carcinomas (CRC), yet no individual HERV-H locus expression has been thoroughly correlated with clinical data.Here, we characterized HERV-H reactivations in clinical CRC samples by integrating expression profiles, molecular patterns and clinical data. Expression of relevant HERV-H sequences was analyzed by qRT-PCR on two well-defined clinical cohorts (n = 139 pairs of tumor and adjacent normal colon tissue) including samples from adenomas (n = 21) and liver metastases (n = 16). Correlations with clinical and molecular data were assessed. CRC specific HERV-H sequences were validated and found expressed throughout CRC disease progression. Correlations between HERV-H expression and lymph node invasion of tumor cells (p = 0.0006) as well as microsatellite instable tumors (p < 0.0001) were established. No association with regard to age, tumor localization, grading or common mutations became apparent. Interestingly, CRC expressed elements belonged to specific young HERV-H subfamilies and their 5' LTR often presented active histone marks. These results suggest a functional role of HERV-H sequences in colorectal carcinogenesis. The pronounced connection with microsatellite instability warrants a more detailed investigation. Thus, HERV-H sequences in addition to tumor specific mutations may represent clinically relevant, truly CRC specific markers for diagnostic, prognostic and therapeutic purposes.
Grace, Christy R.; Ferreira, Antonio M.; Waddell, M. Brett; Ridout, Granger; Naeve, Deanna; Leuze, Michael; LoCascio, Philip F.; Panetta, John C.; Wilkinson, Mark R.; Pui, Ching-Hon; Naeve, Clayton W.; Uberbacher, Edward C.; Bonten, Erik J.; Evans, William E.
2016-01-01
MicroRNAs are important regulators of gene expression, acting primarily by binding to sequence-specific locations on already transcribed messenger RNAs (mRNA) and typically down-regulating their stability or translation. Recent studies indicate that microRNAs may also play a role in up-regulating mRNA transcription levels, although a definitive mechanism has not been established. Double-helical DNA is capable of forming triple-helical structures through Hoogsteen and reverse Hoogsteen interactions in the major groove of the duplex, and we show physical evidence (i.e., NMR, FRET, SPR) that purine or pyrimidine-rich microRNAs of appropriate length and sequence form triple-helical structures with purine-rich sequences of duplex DNA, and identify microRNA sequences that favor triplex formation. We developed an algorithm (Trident) to search genome-wide for potential triplex-forming sites and show that several mammalian and non-mammalian genomes are enriched for strong microRNA triplex binding sites. We show that those genes containing sequences favoring microRNA triplex formation are markedly enriched (3.3 fold, p<2.2 × 10−16) for genes whose expression is positively correlated with expression of microRNAs targeting triplex binding sequences. This work has thus revealed a new mechanism by which microRNAs could interact with gene promoter regions to modify gene transcription. PMID:26844769
Nucleic acid molecules encoding isopentenyl monophosphate kinase, and methods of use
Croteau, Rodney B.; Lange, Bernd M.
2001-01-01
A cDNA encoding isopentenyl monophosphate kinase (IPK) from peppermint (Mentha x piperita) has been isolated and sequenced, and the corresponding amino acid sequence has been determined. Accordingly, an isolated DNA sequence (SEQ ID NO:1) is provided which codes for the expression of isopentenyl monophosphate kinase (SEQ ID NO:2), from peppermint (Mentha x piperita). In other aspects, replicable recombinant cloning vehicles are provided which code for isopentenyl monophosphate kinase, or for a base sequence sufficiently complementary to at least a portion of isopentenyl monophosphate kinase DNA or RNA to enable hybridization therewith. In yet other aspects, modified host cells are provided that have been transformed, transfected, infected and/or injected with a recombinant cloning vehicle and/or DNA sequence encoding isopentenyl monophosphate kinase. Thus, systems and methods are provided for the recombinant expression of the aforementioned recombinant isopentenyl monophosphate kinase that may be used to facilitate its production, isolation and purification in significant amounts. Recombinant isopentenyl monophosphate kinase may be used to obtain expression or enhanced expression of isopentenyl monophosphate kinase in plants in order to enhance the production of isopentenyl monophosphate kinase, or isoprenoids derived therefrom, or may be otherwise employed for the regulation or expression of isopentenyl monophosphate kinase, or the production of its products.
Wang, S Y; Huo, J L; Miao, Y W; Cheng, W M; Zeng, Y Z
2013-04-02
U2 small nuclear RNA auxiliary factor 2 (U2AF2) is an important gene for pre-messenger RNA splicing in higher eukaryotes. In this study, the Banna mini-pig inbred line (BMI) U2AF2 coding sequence (CDS) was cloned, sequenced, and characterized. The U2AF2 complete CDS was amplified using the reverse transcription-polymerase chain reaction (RT-PCR) technique based on the conserved sequence information of cattle and known highly homologous swine expressed sequence tags. This novel gene was deposited into the National Center for Biotechnology Information database (Accession No. JQ839267). Sequence analysis revealed that the BMI U2AF2 coding sequence consisted of 1416 bp and encoded 471 amino acids with a molecular weight of 53.12 kDa. The protein sequence has high sequence homology with U2AF65 of 6 species - Homo sapiens (100%), Equus caballus (100%), Canis lupus (100%), Macaca mulatta (99.8%), Bos taurus (74.4%), and Mus musculus (74.4%). The phylogenetic tree analysis revealed that BMI U2AF65 has a closer genetic relationship with B. taurus U2AF65 than with U2AF65 of E. caballus, C. lupus, M. mulatta, H. sapiens, and M. musculus. RT-PCR analysis showed that BMI U2AF2 was most highly expressed in the brain; moderately expressed in the spleen, lung, muscle, and skin; and weakly expressed in the liver, kidney, and ovary. Its expression was nearly silent in the spinal cord, nerve fiber, heart, stomach, pancreas, and intestine. Three microRNA target sites were predicted in the CDS of BMI U2AF2 messenger RNA. Our results establish a foundation for further insight into this swine gene.
Schilmiller, Anthony L; Miner, Dennis P; Larson, Matthew; McDowell, Eric; Gang, David R; Wilkerson, Curtis; Last, Robert L
2010-07-01
Shotgun proteomics analysis allows hundreds of proteins to be identified and quantified from a single sample at relatively low cost. Extensive DNA sequence information is a prerequisite for shotgun proteomics, and it is ideal to have sequence for the organism being studied rather than from related species or accessions. While this requirement has limited the set of organisms that are candidates for this approach, next generation sequencing technologies make it feasible to obtain deep DNA sequence coverage from any organism. As part of our studies of specialized (secondary) metabolism in tomato (Solanum lycopersicum) trichomes, 454 sequencing of cDNA was combined with shotgun proteomics analyses to obtain in-depth profiles of genes and proteins expressed in leaf and stem glandular trichomes of 3-week-old plants. The expressed sequence tag and proteomics data sets combined with metabolite analysis led to the discovery and characterization of a sesquiterpene synthase that produces beta-caryophyllene and alpha-humulene from E,E-farnesyl diphosphate in trichomes of leaf but not of stem. This analysis demonstrates the utility of combining high-throughput cDNA sequencing with proteomics experiments in a target tissue. These data can be used for dissection of other biochemical processes in these specialized epidermal cells.
Schilmiller, Anthony L.; Miner, Dennis P.; Larson, Matthew; McDowell, Eric; Gang, David R.; Wilkerson, Curtis; Last, Robert L.
2010-01-01
Shotgun proteomics analysis allows hundreds of proteins to be identified and quantified from a single sample at relatively low cost. Extensive DNA sequence information is a prerequisite for shotgun proteomics, and it is ideal to have sequence for the organism being studied rather than from related species or accessions. While this requirement has limited the set of organisms that are candidates for this approach, next generation sequencing technologies make it feasible to obtain deep DNA sequence coverage from any organism. As part of our studies of specialized (secondary) metabolism in tomato (Solanum lycopersicum) trichomes, 454 sequencing of cDNA was combined with shotgun proteomics analyses to obtain in-depth profiles of genes and proteins expressed in leaf and stem glandular trichomes of 3-week-old plants. The expressed sequence tag and proteomics data sets combined with metabolite analysis led to the discovery and characterization of a sesquiterpene synthase that produces β-caryophyllene and α-humulene from E,E-farnesyl diphosphate in trichomes of leaf but not of stem. This analysis demonstrates the utility of combining high-throughput cDNA sequencing with proteomics experiments in a target tissue. These data can be used for dissection of other biochemical processes in these specialized epidermal cells. PMID:20431087
Snyder, Lindsey L.; Esser, Jonathan M.; Pachuk, Catherine J.; Steel, Laura F.
2008-01-01
RNA interference (RNAi) is a process that can target intracellular RNAs for degradation in a highly sequence specific manner, making it a powerful tool that is being pursued in both research and therapeutic applications. Hepatitis B virus (HBV) is a serious public health problem in need of better treatment options, and aspects of its life cycle make it an excellent target for RNAi-based therapeutics. We have designed a vector that expresses interfering RNAs that target HBV transcripts, including both viral RNA replicative intermediates and mRNAs encoding viral proteins. Our vector design incorporates many features of endogenous microRNA (miRNA) gene organization that are proving useful for the development of reagents for RNAi. In particular, our vector contains an RNA pol II driven gene cassette that leads to tissue specific expression and efficient processing of multiple interfering RNAs from a single transcript, without the co-expression of any protein product. This vector shows potent silencing of HBV targets in cell culture models of HBV infection. The vector design will be applicable to silencing of additional cellular or disease-related genes. PMID:18499277
Faccio, Greta; Kruus, Kristiina; Buchert, Johanna; Saloheimo, Markku
2010-08-20
Sulfhydryl oxidases are flavin-dependent enzymes that catalyse the formation of de novo disulfide bonds from free thiol groups, with the reduction of molecular oxygen to hydrogen peroxide. Sulfhydryl oxidases have been investigated in the food industry to remove the burnt flavour of ultraheat-treated milk and are currently studied as potential crosslinking enzymes, aiming at strengthening wheat dough and improving the overall bread quality. In the present study, potential sulfhydryl oxidases were identified in the publicly available fungal genome sequences and their sequence characteristics were studied. A representative sulfhydryl oxidase from Aspergillus oryzae, AoSOX1, was expressed in the fungus Trichoderma reesei. AoSOX1 was produced in relatively good yields and was purified and biochemically characterised. The enzyme catalysed the oxidation of thiol-containing compounds like glutathione, D/L-cysteine, beta-mercaptoethanol and DTT. The enzyme had a melting temperature of 57°C, a pH optimum of 7.5 and its enzymatic activity was completely inhibited in the presence of 1 mM ZnSO4. Eighteen potentially secreted sulfhydryl oxidases were detected in the publicly available fungal genomes analysed and a novel proline-tryptophan dipeptide in the characteristic motif CXXC, where X is any amino acid, was found. A representative protein, AoSOX1 from A. oryzae, was produced in T. reesei in an active form and had the characteristics of sulfhydryl oxidases. Further testing of the activity on thiol groups within larger peptides and on protein level will be needed to assess the application potential of this enzyme.
Huntley, Stuart; Baggott, Daniel M.; Hamilton, Aaron T.; Tran-Gyamfi, Mary; Yang, Shan; Kim, Joomyeong; Gordon, Laurie; Branscomb, Elbert; Stubbs, Lisa
2006-01-01
Krüppel-type zinc finger (ZNF) motifs are prevalent components of transcription factor proteins in all eukaryotes. KRAB-ZNF proteins, in which a potent repressor domain is attached to a tandem array of DNA-binding zinc-finger motifs, are specific to tetrapod vertebrates and represent the largest class of ZNF proteins in mammals. To define the full repertoire of human KRAB-ZNF proteins, we searched the genome sequence for key motifs and then constructed and manually curated gene models incorporating those sequences. The resulting gene catalog contains 423 KRAB-ZNF protein-coding loci, yielding alternative transcripts that altogether predict at least 742 structurally distinct proteins. Active rounds of segmental duplication, involving single genes or larger regions and including both tandem and distributed duplication events, have driven the expansion of this mammalian gene family. Comparisons between the human genes and ZNF loci mined from the draft mouse, dog, and chimpanzee genomes not only identified 103 KRAB-ZNF genes that are conserved in mammals but also highlighted a substantial level of lineage-specific change; at least 136 KRAB-ZNF coding genes are primate specific, including many recent duplicates. KRAB-ZNF genes are widely expressed and clustered genes are typically not coregulated, indicating that paralogs have evolved to fill roles in many different biological processes. To facilitate further study, we have developed a Web-based public resource with access to gene models, sequences, and other data, including visualization tools to provide genomic context and interaction with other public data sets. PMID:16606702
[Isolation and function of genes regulating aphB expression in Vibrio cholerae].
Chen, Haili; Zhu, Zhaoqin; Zhong, Zengtao; Zhu, Jun; Kan, Biao
2012-02-04
We identified genes that regulate the expression of aphB, the gene encoding a key virulence regulator in Vibrio cholerae O1 E1 Tor C6706(-). We constructed a transposon library in V. cholerae C6706 strain containing a P(aphB)-luxCDABE and P(aphB)-lacZ transcriptional reporter plasmids. Using a chemiluminescence imager system, we rapidly detected aphB promoter expression level at a large scale. We then sequenced the transposon insertion sites by arbitrary PCR and sequencing analysis. We obtained two candidate mutants T1 and T2 which displayed reduced aphB expression from approximately 40,000 transposon insertion mutants. Sequencing analysis shows that Tn inserted in vc1585 reading frame in the T1 mutant and Tn inserted in the end of coding sequence of vc1602 in the T2 mutant. By using a genetic screen, we identified two potential genes that may involve in regulation of the expression of the key virulence regulator AphB. This study sheds light on our further investigation to fully understand V. cholerae virulence gene regulatory cascades.
Solanum torvum responses to the root-knot nematode Meloidogyne incognita
2013-01-01
Background Solanum torvum Sw is worldwide employed as rootstock for eggplant cultivation because of its vigour and resistance/tolerance to the most serious soil-borne diseases as bacterial, fungal wilts and root-knot nematodes. The little information on Solanum torvum (hereafter Torvum) resistance mechanisms, is mostly attributable to the lack of genomic tools (e.g. dedicated microarray) as well as to the paucity of database information limiting high-throughput expression studies in Torvum. Results As a first step towards transcriptome profiling of Torvum inoculated with the nematode M. incognita, we built a Torvum 3’ transcript catalogue. One-quarter of a 454 full run resulted in 205,591 quality-filtered reads. De novo assembly yielded 24,922 contigs and 11,875 singletons. Similarity searches of the S. torvum transcript tags catalogue produced 12,344 annotations. A 30,0000 features custom combimatrix chip was then designed and microarray hybridizations were conducted for both control and 14 dpi (day post inoculation) with Meloidogyne incognita-infected roots samples resulting in 390 differentially expressed genes (DEG). We also tested the chip with samples from the phylogenetically-related nematode-susceptible eggplant species Solanum melongena. An in-silico validation strategy was developed based on assessment of sequence similarity among Torvum probes and eggplant expressed sequences available in public repositories. GO term enrichment analyses with the 390 Torvum DEG revealed enhancement of several processes as chitin catabolism and sesquiterpenoids biosynthesis, while no GO term enrichment was found with eggplant DEG. The genes identified from S. torvum catalogue, bearing high similarity to known nematode resistance genes, were further investigated in view of their potential role in the nematode resistance mechanism. Conclusions By combining 454 pyrosequencing and microarray technology we were able to conduct a cost-effective global transcriptome profiling in a non-model species. In addition, the development of an in silico validation strategy allowed to further extend the use of the custom chip to a related species and to assess by comparison the expression of selected genes without major concerns of artifacts. The expression profiling of S. torvum responses to nematode infection points to sesquiterpenoids and chitinases as major effectors of nematode resistance. The availability of the long sequence tags in S. torvum catalogue will allow precise identification of active nematocide/nematostatic compounds and associated enzymes posing the basis for exploitation of these resistance mechanisms in other species. PMID:23937585
Functional Analysis of Maize Silk-Specific ZmbZIP25 Promoter.
Li, Wanying; Yu, Dan; Yu, Jingjuan; Zhu, Dengyun; Zhao, Qian
2018-03-12
ZmbZIP25 ( Zea mays bZIP (basic leucine zipper) transcription factor 25) is a function-unknown protein that belongs to the D group of the bZIP transcription factor family. RNA-seq data showed that the expression of ZmbZIP25 was tissue-specific in maize silks, and this specificity was confirmed by RT-PCR (reverse transcription-polymerase chain reaction). In situ RNA hybridization showed that ZmbZIP25 was expressed exclusively in the xylem of maize silks. A 5' RACE (rapid amplification of cDNA ends) assay identified an adenine residue as the transcription start site of the ZmbZIP25 gene. To characterize this silk-specific promoter, we isolated and analyzed a 2450 bp (from -2083 to +367) and a 2600 bp sequence of ZmbZIP25 (from -2083 to +517, the transcription start site was denoted +1). Stable expression assays in Arabidopsis showed that the expression of the reporter gene GUS driven by the 2450 bp ZmbZIP25 5'-flanking fragment occurred exclusively in the papillae of Arabidopsis stigmas. Furthermore, transient expression assays in maize indicated that GUS and GFP expression driven by the 2450 bp ZmbZIP25 5'-flanking sequences occurred only in maize silks and not in other tissues. However, no GUS or GFP expression was driven by the 2600 bp ZmbZIP25 5'-flanking sequences in either stable or transient expression assays. A series of deletion analyses of the 2450 bp ZmbZIP25 5'-flanking sequence was performed in transgenic Arabidopsis plants, and probable elements prediction analysis revealed the possible presence of negative regulatory elements within the 161 bp region from -1117 to -957 that were responsible for the specificity of the ZmbZIP25 5'-flanking sequence.
Functional Analysis of Maize Silk-Specific ZmbZIP25 Promoter
Li, Wanying; Yu, Dan; Yu, Jingjuan; Zhu, Dengyun; Zhao, Qian
2018-01-01
ZmbZIP25 (Zea mays bZIP (basic leucine zipper) transcription factor 25) is a function-unknown protein that belongs to the D group of the bZIP transcription factor family. RNA-seq data showed that the expression of ZmbZIP25 was tissue-specific in maize silks, and this specificity was confirmed by RT-PCR (reverse transcription-polymerase chain reaction). In situ RNA hybridization showed that ZmbZIP25 was expressed exclusively in the xylem of maize silks. A 5′ RACE (rapid amplification of cDNA ends) assay identified an adenine residue as the transcription start site of the ZmbZIP25 gene. To characterize this silk-specific promoter, we isolated and analyzed a 2450 bp (from −2083 to +367) and a 2600 bp sequence of ZmbZIP25 (from −2083 to +517, the transcription start site was denoted +1). Stable expression assays in Arabidopsis showed that the expression of the reporter gene GUS driven by the 2450 bp ZmbZIP25 5′-flanking fragment occurred exclusively in the papillae of Arabidopsis stigmas. Furthermore, transient expression assays in maize indicated that GUS and GFP expression driven by the 2450 bp ZmbZIP25 5′-flanking sequences occurred only in maize silks and not in other tissues. However, no GUS or GFP expression was driven by the 2600 bp ZmbZIP25 5′-flanking sequences in either stable or transient expression assays. A series of deletion analyses of the 2450 bp ZmbZIP25 5′-flanking sequence was performed in transgenic Arabidopsis plants, and probable elements prediction analysis revealed the possible presence of negative regulatory elements within the 161 bp region from −1117 to −957 that were responsible for the specificity of the ZmbZIP25 5′-flanking sequence. PMID:29534529
Regulation of Bacteria-Induced Intercellular Adhesion Molecule-1 by CCAAT/Enhancer Binding Proteins
Manzel, Lori J.; Chin, Cecilia L.; Behlke, Mark A.; Look, Dwight C.
2009-01-01
Direct interaction between bacteria and epithelial cells may initiate or amplify the airway response through induction of epithelial defense gene expression by nuclear factor-κB (NF-κB). However, multiple signaling pathways modify NF-κB effects to modulate gene expression. In this study, the effects of CCAAT/enhancer binding protein (C/EBP) family members on induction of the leukocyte adhesion glycoprotein intercellular adhesion molecule-1 (ICAM-1) was examined in primary cultures of human tracheobronchial epithelial cells incubated with nontypeable Haemophilus influenzae. Increased ICAM-1 gene transcription in response to H. influenzae required gene sequences located at −200 to −135 in the 5′-flanking region that contain a C/EBP-binding sequence immediately upstream of the NF-κB enhancer site. Constitutive C/EBPβ was found to have an important role in epithelial cell ICAM-1 regulation, while the adjacent NF-κB sequence binds the RelA/p65 and NF-κB1/p50 members of the NF-κB family to induce ICAM-1 expression in response to H. influenzae. The expression of C/EBP proteins is not regulated by p38 mitogen-activated protein kinase activation, but p38 affects gene transcription by increasing the binding of TATA-binding protein to TATA-box–containing gene sequences. Epithelial cell ICAM-1 expression in response to H. influenzae was decreased by expressing dominant-negative protein or RNA interference against C/EBPβ, confirming its role in ICAM-1 regulation. Although airway epithelial cells express multiple constitutive and inducible C/EBP family members that bind C/EBP sequences, the results indicate that C/EBPβ plays a central role in modulation of NF-κB–dependent defense gene expression in human airway epithelial cells after exposure to H. influenzae. PMID:18703796
Kassar, Telissa C; Magalhães, Tereza; S, José V J; Carvalho, Amanda G O; Silva, Andréa N M R DA; Queiroz, Sabrina R A; Bertani, Giovani R; Gil, Laura H V G
2017-01-01
Yellow fever is an arthropod-borne viral disease that still poses high public health concerns, despite the availability of an effective vaccine. The development of recombinant viruses is of utmost importance for several types of studies, such as those aimed to dissect virus-host interactions and to search for novel antiviral strategies. Moreover, recombinant viruses expressing reporter genes may greatly facilitate these studies. Here, we report the construction of a recombinant yellow fever virus (YFV) expressing Gaussia luciferase (GLuc) (YFV-GLuc). We show, through RT-PCR, sequencing and measurement of GLuc activity, that stability of the heterologous gene was maintained after six passages. Furthermore, a direct association between GLuc expression and viral replication was observed (r2=0.9967), indicating that measurement of GLuc activity may be used to assess viral replication in different applications. In addition, we evaluated the use of the recombinant virus in an antiviral assay with recombinant human alfa-2b interferon. A 60% inhibition of GLuc expression was observed in cells infected with YFV-GLuc and incubated with IFN alfa-2b. Previously tested on YFV inhibition by plaque assays indicated a similar fold-decrease in viral replication. These results are valuable as they show the stability of YFV-GLuc and one of several possible applications of this construct.
Wang, Cheng; Yu, Jie; Kallen, Caleb B
2008-01-01
The proliferating cell nuclear antigen (PCNA) is an essential component of DNA replication, cell cycle regulation, and epigenetic inheritance. High expression of PCNA is associated with poor prognosis in patients with breast cancer. The 5'-region of the PCNA gene contains two computationally-detected estrogen response element (ERE) sequences, one of which is evolutionarily conserved. Both of these sequences are of undocumented cis-regulatory function. We recently demonstrated that estradiol (E2) enhances PCNA mRNA expression in MCF7 breast cancer cells. MCF7 cells proliferate in response to E2. Here, we demonstrate that E2 rapidly enhanced PCNA mRNA and protein expression in a process that requires ERalpha as well as de novo protein synthesis. One of the two upstream ERE sequences was specifically bound by ERalpha-containing protein complexes, in vitro, in gel shift analysis. Yet, each ERE sequence, when cloned as a single copy, or when engineered as two tandem copies of the ERE-containing sequence, was not capable of activating a luciferase reporter construct in response to E2. In MCF7 cells, neither ERE-containing genomic region demonstrated E2-dependent recruitment of ERalpha by sensitive ChIP-PCR assays. We conclude that E2 enhances PCNA gene expression by an indirect process and that computational detection of EREs, even when evolutionarily conserved and when near E2-responsive genes, requires biochemical validation.
Regulatory link between DNA methylation and active demethylation in Arabidopsis
Lei, Mingguang; Zhang, Huiming; Julian, Russell; Tang, Kai; Xie, Shaojun; Zhu, Jian-Kang
2015-01-01
De novo DNA methylation through the RNA-directed DNA methylation (RdDM) pathway and active DNA demethylation play important roles in controlling genome-wide DNA methylation patterns in plants. Little is known about how cells manage the balance between DNA methylation and active demethylation activities. Here, we report the identification of a unique RdDM target sequence, where DNA methylation is required for maintaining proper active DNA demethylation of the Arabidopsis genome. In a genetic screen for cellular antisilencing factors, we isolated several REPRESSOR OF SILENCING 1 (ros1) mutant alleles, as well as many RdDM mutants, which showed drastically reduced ROS1 gene expression and, consequently, transcriptional silencing of two reporter genes. A helitron transposon element (TE) in the ROS1 gene promoter negatively controls ROS1 expression, whereas DNA methylation of an RdDM target sequence between ROS1 5′ UTR and the promoter TE region antagonizes this helitron TE in regulating ROS1 expression. This RdDM target sequence is also targeted by ROS1, and defective DNA demethylation in loss-of-function ros1 mutant alleles causes DNA hypermethylation of this sequence and concomitantly causes increased ROS1 expression. Our results suggest that this sequence in the ROS1 promoter region serves as a DNA methylation monitoring sequence (MEMS) that senses DNA methylation and active DNA demethylation activities. Therefore, the ROS1 promoter functions like a thermostat (i.e., methylstat) to sense DNA methylation levels and regulates DNA methylation by controlling ROS1 expression. PMID:25733903
Nayidu, Naghabushana K.; Kagale, Sateesh; Taheri, Ali; Withana-Gamage, Thushan S.; Parkin, Isobel A. P.; Sharpe, Andrew G.; Gruber, Margaret Y.
2014-01-01
Coding sequences for major trichome regulatory genes, including the positive regulators GLABRA 1(GL1), GLABRA 2 (GL2), ENHANCER OF GLABRA 3 (EGL3), and TRANSPARENT TESTA GLABRA 1 (TTG1) and the negative regulator TRIPTYCHON (TRY), were cloned from wild Brassica villosa, which is characterized by dense trichome coverage over most of the plant. Transcript (FPKM) levels from RNA sequencing indicated much higher expression of the GL2 and TTG1 regulatory genes in B. villosa leaves compared with expression levels of GL1 and EGL3 genes in either B. villosa or the reference genome species, glabrous B. oleracea; however, cotyledon TTG1 expression was high in both species. RNA sequencing and Q-PCR also revealed an unusual expression pattern for the negative regulators TRY and CPC, which were much more highly expressed in trichome-rich B. villosa leaves than in glabrous B. oleracea leaves and in glabrous cotyledons from both species. The B. villosa TRY expression pattern also contrasted with TRY expression patterns in two diploid Brassica species, and with the Arabidopsis model for expression of negative regulators of trichome development. Further unique sequence polymorphisms, protein characteristics, and gene evolution studies highlighted specific amino acids in GL1 and GL2 coding sequences that distinguished glabrous species from hairy species and several variants that were specific for each B. villosa gene. Positive selection was observed for GL1 between hairy and non-hairy plants, and as expected the origin of the four expressed positive trichome regulatory genes in B. villosa was predicted to be from B. oleracea. In particular the unpredicted expression patterns for TRY and CPC in B. villosa suggest additional characterization is needed to determine the function of the expanded families of trichome regulatory genes in more complex polyploid species within the Brassicaceae. PMID:24755905
Tamori, Akihiro; Yamanishi, Yoshihiro; Kawashima, Shuichi; Kanehisa, Minoru; Enomoto, Masaru; Tanaka, Hiromu; Kubo, Shoji; Shiomi, Susumu; Nishiguchi, Shuhei
2005-08-15
Integration of hepatitis B virus (HBV) DNA into the human genome is one of the most important steps in HBV-related carcinogenesis. This study attempted to find the link between HBV DNA, the adjoining cellular sequence, and altered gene expression in hepatocellular carcinoma (HCC) with integrated HBV DNA. We examined 15 cases of HCC infected with HBV by cassette ligation-mediated PCR. The human DNA adjacent to the integrated HBV DNA was sequenced. Protein coding sequences were searched for in the human sequence. In five cases with HBV DNA integration, from which good quality RNA was extracted, gene expression was examined by cDNA microarray analysis. The human DNA sequence successive to integrated HBV DNA was determined in the 15 HCCs. Eight protein-coding regions were involved: ras-responsive element binding protein 1, calmodulin 1, mixed lineage leukemia 2 (MLL2), FLJ333655, LOC220272, LOC255345, LOC220220, and LOC168991. The MLL2 gene was expressed in three cases with HBV DNA integrated into exon 3 of MLL2 and in one case with HBV DNA integrated into intron 3 of MLL2. Gene expression analysis suggested that two HCCs with HBV integrated into MLL2 had similar patterns of gene expression compared with three HCCs with HBV integrated into other loci of human chromosomes. HBV DNA was integrated at random sites of human DNA, and the MLL2 gene was one of the targets for integration. Our results suggest that HBV DNA might modulate human genes near integration sites, followed by integration site-specific expression of such genes during hepatocarcinogenesis.
2013-01-01
Background Ginger (Zingiber officinale) and turmeric (Curcuma longa) accumulate important pharmacologically active metabolites at high levels in their rhizomes. Despite their importance, relatively little is known regarding gene expression in the rhizomes of ginger and turmeric. Results In order to identify rhizome-enriched genes and genes encoding specialized metabolism enzymes and pathway regulators, we evaluated an assembled collection of expressed sequence tags (ESTs) from eight different ginger and turmeric tissues. Comparisons to publicly available sorghum rhizome ESTs revealed a total of 777 gene transcripts expressed in ginger/turmeric and sorghum rhizomes but apparently absent from other tissues. The list of rhizome-specific transcripts was enriched for genes associated with regulation of tissue growth, development, and transcription. In particular, transcripts for ethylene response factors and AUX/IAA proteins appeared to accumulate in patterns mirroring results from previous studies regarding rhizome growth responses to exogenous applications of auxin and ethylene. Thus, these genes may play important roles in defining rhizome growth and development. Additional associations were made for ginger and turmeric rhizome-enriched MADS box transcription factors, their putative rhizome-enriched homologs in sorghum, and rhizomatous QTLs in rice. Additionally, analysis of both primary and specialized metabolism genes indicates that ginger and turmeric rhizomes are primarily devoted to the utilization of leaf supplied sucrose for the production and/or storage of specialized metabolites associated with the phenylpropanoid pathway and putative type III polyketide synthase gene products. This finding reinforces earlier hypotheses predicting roles of this enzyme class in the production of curcuminoids and gingerols. Conclusion A significant set of genes were found to be exclusively or preferentially expressed in the rhizome of ginger and turmeric. Specific transcription factors and other regulatory genes were found that were common to the two species and that are excellent candidates for involvement in rhizome growth, differentiation and development. Large classes of enzymes involved in specialized metabolism were also found to have apparent tissue-specific expression, suggesting that gene expression itself may play an important role in regulating metabolite production in these plants. PMID:23410187
Plant nitrogen regulatory P-PII genes
Coruzzi, Gloria M.; Lam, Hon-Ming; Hsieh, Ming-Hsiun
2001-01-01
The present invention generally relates to plant nitrogen regulatory PII gene (hereinafter P-PII gene), a gene involved in regulating plant nitrogen metabolism. The invention provides P-PII nucleotide sequences, expression constructs comprising said nucleotide sequences, and host cells and plants having said constructs and, optionally expressing the P-PII gene from said constructs. The invention also provides substantially pure P-PII proteins. The P-PII nucleotide sequences and constructs of the
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chattopadhyay, Saket; Ely, Abdullah; Bloom, Kristie
2009-11-20
RNA interference (RNAi) may be harnessed to inhibit viral gene expression and this approach is being developed to counter chronic infection with hepatitis B virus (HBV). Compared to synthetic RNAi activators, DNA expression cassettes that generate silencing sequences have advantages of sustained efficacy and ease of propagation in plasmid DNA (pDNA). However, the large size of pDNAs and inclusion of sequences conferring antibiotic resistance and immunostimulation limit delivery efficiency and safety. To develop use of alternative DNA templates that may be applied for therapeutic gene silencing, we assessed the usefulness of PCR-generated linear expression cassettes that produce anti-HBV micro-RNA (miR)more » shuttles. We found that silencing of HBV markers of replication was efficient (>75%) in cell culture and in vivo. miR shuttles were processed to form anti-HBV guide strands and there was no evidence of induction of the interferon response. Modification of terminal sequences to include flanking human adenoviral type-5 inverted terminal repeats was easily achieved and did not compromise silencing efficacy. These linear DNA sequences should have utility in the development of gene silencing applications where modifications of terminal elements with elimination of potentially harmful and non-essential sequences are required.« less
Jeong, Seongmun; Kim, Jiwoong; Park, Won; Jeon, Hongmin; Kim, Namshin
2017-01-01
Over the last decade, a large number of nucleotide sequences have been generated by next-generation sequencing technologies and deposited to public databases. However, most of these datasets do not specify the sex of individuals sampled because researchers typically ignore or hide this information. Male and female genomes in many species have distinctive sex chromosomes, XX/XY and ZW/ZZ, and expression levels of many sex-related genes differ between the sexes. Herein, we describe how to develop sex marker sequences from syntenic regions of sex chromosomes and use them to quickly identify the sex of individuals being analyzed. Array-based technologies routinely use either known sex markers or the B-allele frequency of X or Z chromosomes to deduce the sex of an individual. The same strategy has been used with whole-exome/genome sequence data; however, all reads must be aligned onto a reference genome to determine the B-allele frequency of the X or Z chromosomes. SEXCMD is a pipeline that can extract sex marker sequences from reference sex chromosomes and rapidly identify the sex of individuals from whole-exome/genome and RNA sequencing after training with a known dataset through a simple machine learning approach. The pipeline counts total numbers of hits from sex-specific marker sequences and identifies the sex of the individuals sampled based on the fact that XX/ZZ samples do not have Y or W chromosome hits. We have successfully validated our pipeline with mammalian (Homo sapiens; XY) and avian (Gallus gallus; ZW) genomes. Typical calculation time when applying SEXCMD to human whole-exome or RNA sequencing datasets is a few minutes, and analyzing human whole-genome datasets takes about 10 minutes. Another important application of SEXCMD is as a quality control measure to avoid mixing samples before bioinformatics analysis. SEXCMD comprises simple Python and R scripts and is freely available at https://github.com/lovemun/SEXCMD.
Zavalova, L; Lukyanov, S; Baskova, I; Snezhkov, E; Akopov, S; Berezhnoy, S; Bogdanova, E; Barsova, E; Sverdlov, E D
1996-11-27
We previously detected in salivary gland secretions of the medicinal leech (Hirudo medicinalis) a novel enzymatic activity, endo-epsilon(gamma-Glu)-Lys isopeptidase, which cleaves isopeptide bonds formed by transglutaminase (Factor XIIIa) between glutamine gamma-carboxamide and the epsilon-amino group of lysine. Such isopeptide bonds, either within or between protein polypeptide chains are formed in many biological processes. However, before we started our work no enzymes were known to be capable of specifically splitting isopeptide bonds in proteins. The isopeptidase activity we detected was specific for isopeptide bonds. The enzyme was termed destabilase. Here we report the first purification of destabilase, part of its amino acid sequence isolation and sequencing of two related cDNAs derived from the gene family that encodes destabilase proteins, and the detection of isopeptidase activity encoded by one of these cDNAs cloned in a baculovirus expression vector. The deduced mature protein products of these cDNAs contain 115 and 116 amino acid residues, including 14 highly conserved Cys residues, and are formed from precursors containing specific leader peptides. No homologous sequences were found in public databases.
Agarwala, Prachi; Pandey, Satyaprakash; Mapa, Koyeli; Maiti, Souvik
2013-03-05
Transforming growth factor β2 (TGFβ2) is a versatile cytokine with a prominent role in cell migration, invasion, cellular development, and immunomodulation. TGFβ2 promotes the malignancy of tumors by inducing epithelial-mesenchymal transition, angiogenesis, and immunosuppression. As it is well-documented that nucleic acid secondary structure can regulate gene expression, we assessed whether any secondary motif regulates its expression at the post-transcriptional level. Bioinformatics analysis predicts an existence of a 23-nucleotide putative G-quadruplex sequence (PG4) in the 5' untranslated region (UTR) of TGFβ2 mRNA. The ability of this stretch of sequence to form a highly stable, intramolecular parallel quadruplex was demonstrated using ultraviolet and circular dichroism spectroscopy. Footprinting studies further validated its existence in the presence of a neighboring nucleotide sequence. Following structural characterization, we evaluated the biological relevance of this secondary motif using a dual luciferase assay. Although PG4 inhibits the expression of the reporter gene, its presence in the context of the entire 5' UTR sequence interestingly enhances gene expression. Mutation or removal of the G-quadruplex sequence from the 5' UTR of the gene diminished the level of expression of this gene at the translational level. Thus, here we highlight an activating role of the G-quadruplex in modulating gene expression of TGFβ2 at the translational level and its potential to be used as a target for the development of therapeutics against cancer.
Sacramento, C B; Moraes, J Z; Denapolis, P M A; Han, S W
2010-08-01
The main objective of the present study was to find suitable DNA-targeting sequences (DTS) for the construction of plasmid vectors to be used to treat ischemic diseases. The well-known Simian virus 40 nuclear DTS (SV40-DTS) and hypoxia-responsive element (HRE) sequences were used to construct plasmid vectors to express the human vascular endothelial growth factor gene (hVEGF). The rate of plasmid nuclear transport and consequent gene expression under normoxia (20% O2) and hypoxia (less than 5% O2) were determined. Plasmids containing the SV40-DTS or HRE sequences were constructed and used to transfect the A293T cell line (a human embryonic kidney cell line) in vitro and mouse skeletal muscle cells in vivo. Plasmid transport to the nucleus was monitored by real-time PCR, and the expression level of the hVEGF gene was measured by ELISA. The in vitro nuclear transport efficiency of the SV40-DTS plasmid was about 50% lower under hypoxia, while the HRE plasmid was about 50% higher under hypoxia. Quantitation of reporter gene expression in vitro and in vivo, under hypoxia and normoxia, confirmed that the SV40-DTS plasmid functioned better under normoxia, while the HRE plasmid was superior under hypoxia. These results indicate that the efficiency of gene expression by plasmids containing DNA binding sequences is affected by the concentration of oxygen in the medium.
Li, XiaoChing; Wang, Xiu-Jie; Tannenhauser, Jonathan; Podell, Sheila; Mukherjee, Piali; Hertel, Moritz; Biane, Jeremy; Masuda, Shoko; Nottebohm, Fernando; Gaasterland, Terry
2007-01-01
Vocal learning and neuronal replacement have been studied extensively in songbirds, but until recently, few molecular and genomic tools for songbird research existed. Here we describe new molecular/genomic resources developed in our laboratory. We made cDNA libraries from zebra finch (Taeniopygia guttata) brains at different developmental stages. A total of 11,000 cDNA clones from these libraries, representing 5,866 unique gene transcripts, were randomly picked and sequenced from the 3′ ends. A web-based database was established for clone tracking, sequence analysis, and functional annotations. Our cDNA libraries were not normalized. Sequencing ESTs without normalization produced many developmental stage-specific sequences, yielding insights into patterns of gene expression at different stages of brain development. In particular, the cDNA library made from brains at posthatching day 30–50, corresponding to the period of rapid song system development and song learning, has the most diverse and richest set of genes expressed. We also identified five microRNAs whose sequences are highly conserved between zebra finch and other species. We printed cDNA microarrays and profiled gene expression in the high vocal center of both adult male zebra finches and canaries (Serinus canaria). Genes differentially expressed in the high vocal center were identified from the microarray hybridization results. Selected genes were validated by in situ hybridization. Networks among the regulated genes were also identified. These resources provide songbird biologists with tools for genome annotation, comparative genomics, and microarray gene expression analysis. PMID:17426146
Hussain, Khalid; Mungikar, Kanak; Kulkarni, Abhijeet; Kamble, Avinash
2018-05-05
Upon confrontation with unfavourable conditions, plants invoke a very complex set of biochemical and physiological reactions and alter gene expression patterns to combat the situations. MicroRNAs (miRNAs), a class of small non-coding RNA, contribute extensively in regulation of gene expression through translation inhibition or degradation of their target mRNAs during such conditions. Therefore, identification of miRNAs and their targets holds importance in understanding the regulatory networks triggered during stress. Structure and sequence similarity based in silico prediction of miRNAs in Cajanus cajan L. (Pigeonpea) draft genome sequence has been carried out earlier. These annotations also appear in related GenBank genome sequence entries. However, there are no reports available on context dependent miRNA expression and their targets in pigeonpea. Therefore, in the present study we addressed these questions computationally, using pigeonpea EST sequence information. We identified five novel pigeonpea miRNA precursors, their mature forms and targets. Interestingly, only one of these miRNAs (miR169i-3p) was identified earlier in draft genome sequence. We then validated expression of these miRNAs, experimentally. It was also observed that these miRNAs show differential expression patterns in response to Fusarium inoculation indicating their biotic stress responsive nature. Overall these results will help towards better understanding the regulatory network of defense during pigeonpea -pathogen interactions and role of miRNAs in the process. Copyright © 2018 Elsevier B.V. All rights reserved.
Making sense of deep sequencing
Goldman, D.; Domschke, K.
2016-01-01
This review, the first of an occasional series, tries to make sense of the concepts and uses of deep sequencing of polynucleic acids (DNA and RNA). Deep sequencing, synonymous with next-generation sequencing, high-throughput sequencing and massively parallel sequencing, includes whole genome sequencing but is more often and diversely applied to specific parts of the genome captured in different ways, for example the highly expressed portion of the genome known as the exome and portions of the genome that are epigenetically marked either by DNA methylation, the binding of proteins including histones, or that are in different configurations and thus more or less accessible to enzymes that cleave DNA. Deep sequencing of RNA (RNASeq) reverse-transcribed to complementary DNA is invaluable for measuring RNA expression and detecting changes in RNA structure. Important concepts in deep sequencing include the length and depth of sequence reads, mapping and assembly of reads, sequencing error, haplotypes, and the propensity of deep sequencing, as with other types of ‘big data’, to generate large numbers of errors, requiring monitoring for methodologic biases and strategies for replication and validation. Deep sequencing yields a unique genetic fingerprint that can be used to identify a person, and a trove of predictors of genetic medical diseases. Deep sequencing to identify epigenetic events including changes in DNA methylation and RNA expression can reveal the history and impact of environmental exposures. Because of the power of sequencing to identify and deliver biomedically significant information about a person and their blood relatives, it creates ethical dilemmas and practical challenges in research and clinical care, for example the decision and procedures to report incidental findings that will increasingly and frequently be discovered. PMID:24925306
Zhu, Xiao; Kong, Qingming; Xie, Liwei; Chen, Zhihong; Li, Hongmei; Zhu, Zhu; Huang, Yongmei; Lan, Feifei; Luo, Haiqing; Zhan, Jingting; Ding, Hongrong; Lei, Jinli; Xiao, Qin; Fu, Weiming; Fan, Wenguo; Zhang, Jinfang; Luo, Hui
2018-01-01
Previous studies showed that the low expressions of chromodomain-helicase-DNA-binding protein 5 (CHD5) were intensively associated with deteriorative biologic and clinical characteristics as well as outcomes in many tumors. The aim of this study is to determine whether CHD5 single nucleotide polymorphisms (SNPs) contribute to the prognosis of hepatocellular carcima (HCC). The SNPs were selected according to their linkage disequilibrium (LD) in the targeted next-generation sequencing (NGS) and then genotyped with TaqMan probers. We revealed a rare haplotype AG in CHD5 (SNPs: rs12564469-rs9434711) was markedly associated with HCC prognosis. The univariate and multivariate regression analyses revealed the patients with worse overall survival time were those with tumor metastasis and haplotype AG, as well as cirrhosis, poor differentiation and IV-TNM stage. Based on the available public databases, we discovered the significant association between haplotype AG and CHD5 mRNA expressions only existed in Chinese. These data proposed that the potentially genetic haplotype might functionally contribute to HCC prognosis and CHD5 mRNA expressions. PMID:29568352
Metabolism and Genetics of Helicobacter pylori: the Genome Era
Marais, Armelle; Mendz, George L.; Hazell, Stuart L.; Mégraud, Francis
1999-01-01
The publication of the complete sequence of Helicobacter pylori 26695 in 1997 and more recently that of strain J99 has provided new insight into the biology of this organism. In this review, we attempt to analyze and interpret the information provided by sequence annotations and to compare these data with those provided by experimental analyses. After a brief description of the general features of the genomes of the two sequenced strains, the principal metabolic pathways are analyzed. In particular, the enzymes encoded by H. pylori involved in fermentative and oxidative metabolism, lipopolysaccharide biosynthesis, nucleotide biosynthesis, aerobic and anaerobic respiration, and iron and nitrogen assimilation are described, and the areas of controversy between the experimental data and those provided by the sequence annotation are discussed. The role of urease, particularly in pH homeostasis, and other specialized mechanisms developed by the bacterium to maintain its internal pH are also considered. The replicational, transcriptional, and translational apparatuses are reviewed, as is the regulatory network. The numerous findings on the metabolism of the bacteria and the paucity of gene expression regulation systems are indicative of the high level of adaptation to the human gastric environment. Arguments in favor of the diversity of H. pylori and molecular data reflecting possible mechanisms involved in this diversity are presented. Finally, we compare the numerous experimental data on the colonization factors and those provided from the genome sequence annotation, in particular for genes involved in motility and adherence of the bacterium to the gastric tissue. PMID:10477311
Grove, J R; Deutsch, P J; Price, D J; Habener, J F; Avruch, J
1989-11-25
Plasmids that encode a bioactive amino-terminal fragment of the heat-stable inhibitor of the cAMP-dependent protein kinase, PKI(1-31), were employed to characterize the role of this protein kinase in the control of transcriptional activity mediated by three DNA regulatory elements in the JEG-3 human placental cell line. The 5'-flanking sequence of the human collagenase gene contains the heptameric sequence, 5'-TGAGTCA-3', previously identified as a "phorbol ester" response element. Reporter genes containing either the intact 1.2-kilobase 5'-flanking sequence from the human collagenase gene or just the 7-base pair (bp) response element, when coupled to an enhancerless promoter, each exhibit both cAMP and phorbol ester-stimulated expression in JEG-3 cells. Cotransfection of either construct with plasmids encoding PKI(1-31) inhibits cAMP-stimulated but not basal- or phorbol ester-stimulated expression. Pretreatment of cells with phorbol ester for 1 or 2 days abrogates completely the response to rechallenge with phorbol ester but does not alter the basal expression of either construct; cAMP-stimulated expression, while modestly inhibited, remains vigorous. The 5'-flanking sequence of the human chorionic gonadotropin-alpha subunit (HCG alpha) gene has two copies of the sequence, 5'-TGACGTCA-3', contained in directly adjacent identical 18-bp segments, previously identified as a cAMP-response element. Reporter genes containing either the intact 1.5 kilobase of 5'-flanking sequence from the HCG alpha gene, or just the 36-bp tandem repeat cAMP response element, when coupled to an enhancerless promoter, both exhibit a vigorous cAMP stimulation of expression but no response to phorbol ester in JEG-3 cells. Cotransfection with plasmids encoding PKI(1-31) inhibits both basal and cAMP-stimulated expression in a parallel fashion. The 5'-flanking sequence of the human enkephalin gene mediates cAMP-stimulated expression of reporter genes in both JEG-3 and CV-1 cells. Plasmids encoding PKI(1-31) inhibit the expression that is stimulated by the addition of cAMP analogs in both cell lines; basal expression, however, is inhibited by PKI(1-31) only in the JEG-3 cell line and not in the CV-1 cells. These observations indicate that, in JEG-3 cells, PKI(1-31) is a specific inhibitor of kinase A-mediated gene transcription, but it does not modify kinase C-directed transcription.(ABSTRACT TRUNCATED AT 400 WORDS)
Newborn Sequencing in Genomic Medicine and Public Health.
Berg, Jonathan S; Agrawal, Pankaj B; Bailey, Donald B; Beggs, Alan H; Brenner, Steven E; Brower, Amy M; Cakici, Julie A; Ceyhan-Birsoy, Ozge; Chan, Kee; Chen, Flavia; Currier, Robert J; Dukhovny, Dmitry; Green, Robert C; Harris-Wai, Julie; Holm, Ingrid A; Iglesias, Brenda; Joseph, Galen; Kingsmore, Stephen F; Koenig, Barbara A; Kwok, Pui-Yan; Lantos, John; Leeder, Steven J; Lewis, Megan A; McGuire, Amy L; Milko, Laura V; Mooney, Sean D; Parad, Richard B; Pereira, Stacey; Petrikin, Joshua; Powell, Bradford C; Powell, Cynthia M; Puck, Jennifer M; Rehm, Heidi L; Risch, Neil; Roche, Myra; Shieh, Joseph T; Veeraraghavan, Narayanan; Watson, Michael S; Willig, Laurel; Yu, Timothy W; Urv, Tiina; Wise, Anastasia L
2017-02-01
The rapid development of genomic sequencing technologies has decreased the cost of genetic analysis to the extent that it seems plausible that genome-scale sequencing could have widespread availability in pediatric care. Genomic sequencing provides a powerful diagnostic modality for patients who manifest symptoms of monogenic disease and an opportunity to detect health conditions before their development. However, many technical, clinical, ethical, and societal challenges should be addressed before such technology is widely deployed in pediatric practice. This article provides an overview of the Newborn Sequencing in Genomic Medicine and Public Health Consortium, which is investigating the application of genome-scale sequencing in newborns for both diagnosis and screening. Copyright © 2017 by the American Academy of Pediatrics.
An efficient annotation and gene-expression derivation tool for Illumina Solexa datasets
2010-01-01
Background The data produced by an Illumina flow cell with all eight lanes occupied, produces well over a terabyte worth of images with gigabytes of reads following sequence alignment. The ability to translate such reads into meaningful annotation is therefore of great concern and importance. Very easily, one can get flooded with such a great volume of textual, unannotated data irrespective of read quality or size. CASAVA, a optional analysis tool for Illumina sequencing experiments, enables the ability to understand INDEL detection, SNP information, and allele calling. To not only extract from such analysis, a measure of gene expression in the form of tag-counts, but furthermore to annotate such reads is therefore of significant value. Findings We developed TASE (Tag counting and Analysis of Solexa Experiments), a rapid tag-counting and annotation software tool specifically designed for Illumina CASAVA sequencing datasets. Developed in Java and deployed using jTDS JDBC driver and a SQL Server backend, TASE provides an extremely fast means of calculating gene expression through tag-counts while annotating sequenced reads with the gene's presumed function, from any given CASAVA-build. Such a build is generated for both DNA and RNA sequencing. Analysis is broken into two distinct components: DNA sequence or read concatenation, followed by tag-counting and annotation. The end result produces output containing the homology-based functional annotation and respective gene expression measure signifying how many times sequenced reads were found within the genomic ranges of functional annotations. Conclusions TASE is a powerful tool to facilitate the process of annotating a given Illumina Solexa sequencing dataset. Our results indicate that both homology-based annotation and tag-count analysis are achieved in very efficient times, providing researchers to delve deep in a given CASAVA-build and maximize information extraction from a sequencing dataset. TASE is specially designed to translate sequence data in a CASAVA-build into functional annotations while producing corresponding gene expression measurements. Achieving such analysis is executed in an ultrafast and highly efficient manner, whether the analysis be a single-read or paired-end sequencing experiment. TASE is a user-friendly and freely available application, allowing rapid analysis and annotation of any given Illumina Solexa sequencing dataset with ease. PMID:20598141
RHOMBOID DOMAIN CONTAINING 2 (RHBDD2)
Abba, MC; Lacunza, E; Nunez, MI; Colussi, A; Isla-Larrain, M; Segal-Eiras, A; Croce, MV; Aldaz, CM
2009-01-01
In the course of breast cancer global gene expression studies, we identified an uncharacterized gene known as RHBDD2 (Rhomboid domain containing 2) to be markedly over-expressed in primary tumors from patients with recurrent disease. In this study, we identified RHBDD2 mRNA and protein expression significantly elevated in breast carcinomas compared with normal breast samples as analyzed by SAGE (n=46) and immunohistochemistry (n=213). Interestingly, specimens displaying RHBDD2 over-expression were predominantly advanced stage III breast carcinomas (p=0.001). Western-blot, RT-PCR and cDNA sequencing analyses allowed us to identify two RHBDD2 alternatively spliced mRNA isoforms expressed in breast cancer cell lines. We further investigated the occurrence and frequency of gene amplification and over-expression affecting RHBDD2 in 131 breast samples. RHBDD2 gene amplification was detected in 21% of 98 invasive breast carcinomas analyzed. However, no RHBDD2 amplification was detected in normal breast tissues (n=17) or breast benign lesions (n=16) (p=0.014). Interestingly, siRNA mediated silencing of RHBDD2 expression results in a decrease of MCF7 breast cancer cells proliferation compared with the corresponding controls (p=0.001). In addition, analysis of publicly available gene expression data showed a strong association between high RHBDD2 expression and decreased overall survival (p=0.0023), relapse-free survival (p= 0.0013), and metastasis-free interval (p=0.006) in patients with primary ER-negative breast carcinomas. In conclusion, our findings suggest that RHBDD2 over-expression behaves as an indicator of poor prognosis and may play a role facilitating breast cancer progression. PMID:19616622
Asamizu, Erika; Nakamura, Yasukazu; Sato, Shusei; Tabata, Satoshi
2004-02-01
To perform a comprehensive analysis of genes expressed in a model legume, Lotus japonicus, a total of 74472 3'-end expressed sequence tags (EST) were generated from cDNA libraries produced from six different organs. Clustering of sequences was performed with an identity criterion of 95% for 50 bases, and a total of 20457 non-redundant sequences, 8503 contigs and 11954 singletons were generated. EST sequence coverage was analyzed by using the annotated L. japonicus genomic sequence and 1093 of the 1889 predicted protein-encoding genes (57.9%) were hit by the EST sequence(s). Gene content was compared to several plant species. Among the 8503 contigs, 471 were identified as sequences conserved only in leguminous species and these included several disease resistance-related genes. This suggested that in legumes, these genes may have evolved specifically to resist pathogen attack. The rate of gene sequence divergence was assessed by comparing similarity level and functional category based on the Gene Ontology (GO) annotation of Arabidopsis genes. This revealed that genes encoding ribosomal proteins, as well as those related to translation, photosynthesis, and cellular structure were more abundantly represented in the highly conserved class, and that genes encoding transcription factors and receptor protein kinases were abundantly represented in the less conserved class. To make the sequence information and the cDNA clones available to the research community, a Web database with useful services was created at http://www.kazusa.or.jp/en/plant/lotus/EST/.
Sequencing Stories in Spanish and English.
ERIC Educational Resources Information Center
Steckbeck, Pamela Meza
The guide was designed for speech pathologists, bilingual teachers, and specialists in English as a second language who work with Spanish-speaking children. The guide contains twenty illustrated stories that facilitate the learning of auditory sequencing, auditory and visual memory, receptive and expressive vocabulary, and expressive language…
Brady, J; Radonovich, M; Thoren, M; Das, G; Salzman, N P
1984-01-01
We have previously identified an 11-base DNA sequence, 5'-G-G-T-A-C-C-T-A-A-C-C-3' (simian virus 40 [SV40] map position 294 to 304), which is important in the control of SV40 late RNA expression in vitro and in vivo (Brady et al., Cell 31:625-633, 1982). We report here the identification of another domain of the SV40 late promoter. A series of mutants with deletions extending from SV40 map position 0 to 300 was prepared by nuclease BAL 31 treatment. The cloned templates were then analyzed for efficiency and accuracy of late SV40 RNA expression in the Manley in vitro transcription system. Our studies showed that, in addition to the promoter domain near map position 300, there are essential DNA sequences between nucleotide positions 74 and 95 that are required for efficient expression of late SV40 RNA. Included in this SV40 DNA sequence were two of the six GGGCGG SV40 repeat sequences and an 11-nucleotide segment which showed strong homology with the upstream sequences required for the efficient in vitro and in vivo expression of the histone H2A gene. This upstream promoter sequence supported transcription with the same efficiency even when it was moved 72 nucleotides closer to the major late cap site. In vitro promoter competition analysis demonstrated that the upstream promoter sequence, independent of the 294 to 304 promoter element, is capable of binding polymerase-transcription factors required for SV40 late gene transcription. Finally, we show that DNA sequences which control the specificity of RNA initiation at nucleotide 325 lie downstream of map position 294. Images PMID:6321950
Superior Cross-Species Reference Genes: A Blueberry Case Study
Die, Jose V.; Rowland, Lisa J.
2013-01-01
The advent of affordable Next Generation Sequencing technologies has had major impact on studies of many crop species, where access to genomic technologies and genome-scale data sets has been extremely limited until now. The recent development of genomic resources in blueberry will enable the application of high throughput gene expression approaches that should relatively quickly increase our understanding of blueberry physiology. These studies, however, require a highly accurate and robust workflow and make necessary the identification of reference genes with high expression stability for correct target gene normalization. To create a set of superior reference genes for blueberry expression analyses, we mined a publicly available transcriptome data set from blueberry for orthologs to a set of Arabidopsis genes that showed the most stable expression in a developmental series. In total, the expression stability of 13 putative reference genes was evaluated by qPCR and a set of new references with high stability values across a developmental series in fruits and floral buds of blueberry were identified. We also demonstrated the need to use at least two, preferably three, reference genes to avoid inconsistencies in results, even when superior reference genes are used. The new references identified here provide a valuable resource for accurate normalization of gene expression in Vaccinium spp. and may be useful for other members of the Ericaceae family as well. PMID:24058469
Pernicious plans revealed: Plasmodium falciparum genome wide expression analysis.
Llinás, Manuel; DeRisi, Joseph L
2004-08-01
The asexual intraerythrocytic developmental cycle (IDC) of Plasmodium falciparum is responsible for the majority of the clinical manifestations of malaria in humans. Although malaria has been studied for over a century, the elucidation of the full genome sequence of P. falciparum has now allowed for in-depth studies of gene expression throughout the entire intraerythrocytic stage. As the mainstays of anti-malarial chemotherapy become increasingly ineffective, we need a deeper understanding of fundamental plasmodial bioregulatory mechanisms to successfully subvert them. Recent gene expression studies have begun to examine different aspects of the IDC and are providing key insights into the basic mechanisms of Plasmodium gene regulation and are helping to define gene functions. However, to date, no transcription factor has been fully characterized from Plasmodium and the definitive identification of cis-acting regulatory elements along with their corresponding trans-acting partners is still lacking. The characterization of the transcriptome of P. falciparum is the first major step towards the understanding of the genome wide regulation of gene expression in this parasite. IDC expression data for almost every gene in the P. falciparum genome can now be publicly queried at and. The results of these studies suggest promising leads for identifying novel targets for anti-malarial therapeutics and vaccines in addition to providing a solid foundation for the ongoing elucidation of plasmodial gene expression.
Vidal, Ramon Oliveira; Mondego, Jorge Maurício Costa; Pot, David; Ambrósio, Alinne Batista; Andrade, Alan Carvalho; Pereira, Luiz Filipe Protasio; Colombo, Carlos Augusto; Vieira, Luiz Gonzaga Esteves; Carazzolle, Marcelo Falsarella; Pereira, Gonçalo Amarante Guimarães
2010-01-01
Polyploidization constitutes a common mode of evolution in flowering plants. This event provides the raw material for the divergence of function in homeologous genes, leading to phenotypic novelty that can contribute to the success of polyploids in nature or their selection for use in agriculture. Mounting evidence underlined the existence of homeologous expression biases in polyploid genomes; however, strategies to analyze such transcriptome regulation remained scarce. Important factors regarding homeologous expression biases remain to be explored, such as whether this phenomenon influences specific genes, how paralogs are affected by genome doubling, and what is the importance of the variability of homeologous expression bias to genotype differences. This study reports the expressed sequence tag assembly of the allopolyploid Coffea arabica and one of its direct ancestors, Coffea canephora. The assembly was used for the discovery of single nucleotide polymorphisms through the identification of high-quality discrepancies in overlapped expressed sequence tags and for gene expression information indirectly estimated by the transcript redundancy. Sequence diversity profiles were evaluated within C. arabica (Ca) and C. canephora (Cc) and used to deduce the transcript contribution of the Coffea eugenioides (Ce) ancestor. The assignment of the C. arabica haplotypes to the C. canephora (CaCc) or C. eugenioides (CaCe) ancestral genomes allowed us to analyze gene expression contributions of each subgenome in C. arabica. In silico data were validated by the quantitative polymerase chain reaction and allele-specific combination TaqMAMA-based method. The presence of differential expression of C. arabica homeologous genes and its implications in coffee gene expression, ontology, and physiology are discussed. PMID:20864545
Functionally conserved cis-regulatory elements of COL18A1 identified through zebrafish transgenesis.
Kague, Erika; Bessling, Seneca L; Lee, Josephine; Hu, Gui; Passos-Bueno, Maria Rita; Fisher, Shannon
2010-01-15
Type XVIII collagen is a component of basement membranes, and expressed prominently in the eye, blood vessels, liver, and the central nervous system. Homozygous mutations in COL18A1 lead to Knobloch Syndrome, characterized by ocular defects and occipital encephalocele. However, relatively little has been described on the role of type XVIII collagen in development, and nothing is known about the regulation of its tissue-specific expression pattern. We have used zebrafish transgenesis to identify and characterize cis-regulatory sequences controlling expression of the human gene. Candidate enhancers were selected from non-coding sequence associated with COL18A1 based on sequence conservation among mammals. Although these displayed no overt conservation with orthologous zebrafish sequences, four regions nonetheless acted as tissue-specific transcriptional enhancers in the zebrafish embryo, and together recapitulated the major aspects of col18a1 expression. Additional post-hoc computational analysis on positive enhancer sequences revealed alignments between mammalian and teleost sequences, which we hypothesize predict the corresponding zebrafish enhancers; for one of these, we demonstrate functional overlap with the orthologous human enhancer sequence. Our results provide important insight into the biological function and regulation of COL18A1, and point to additional sequences that may contribute to complex diseases involving COL18A1. More generally, we show that combining functional data with targeted analyses for phylogenetic conservation can reveal conserved cis-regulatory elements in the large number of cases where computational alignment alone falls short. Copyright 2009 Elsevier Inc. All rights reserved.
Improving membrane protein expression by optimizing integration efficiency
2017-01-01
The heterologous overexpression of integral membrane proteins in Escherichia coli often yields insufficient quantities of purifiable protein for applications of interest. The current study leverages a recently demonstrated link between co-translational membrane integration efficiency and protein expression levels to predict protein sequence modifications that improve expression. Membrane integration efficiencies, obtained using a coarse-grained simulation approach, robustly predicted effects on expression of the integral membrane protein TatC for a set of 140 sequence modifications, including loop-swap chimeras and single-residue mutations distributed throughout the protein sequence. Mutations that improve simulated integration efficiency were 4-fold enriched with respect to improved experimentally observed expression levels. Furthermore, the effects of double mutations on both simulated integration efficiency and experimentally observed expression levels were cumulative and largely independent, suggesting that multiple mutations can be introduced to yield higher levels of purifiable protein. This work provides a foundation for a general method for the rational overexpression of integral membrane proteins based on computationally simulated membrane integration efficiencies. PMID:28918393
Intronic sequences are required for AINTEGUMENTA-LIKE6 expression in Arabidopsis flowers.
Krizek, Beth A
2015-10-12
The AINTEGUMENTA-LIKE6/PLETHORA3 (AIL6/PLT3) gene of Arabidopsis thaliana is a key regulator of growth and patterning in both shoots and roots. AIL6 encodes an AINTEGUMENTA-LIKE/PLETHORA (AIL/PLT) transcription factor that is expressed in the root stem cell niche, the peripheral region of the shoot apical meristem and young lateral organ primordia. In flowers, AIL6 acts redundantly with AINTEGUMENTA (ANT) to regulate floral organ positioning, growth, identity and patterning. Experiments were undertaken to define the genomic regions required for AIL6 function and expression in flowers. Transgenic plants expressing a copy of the coding region of AIL6 in the context of 7.7 kb of 5' sequence and 919 bp of 3' sequence (AIL6:cAIL6-3') fail to fully complement AIL6 function when assayed in the ant-4 ail6-2 double mutant background. In contrast, a genomic copy of AIL6 with the same amount of 5' and 3' sequence (AIL6:gAIL6-3') can fully complement ant-4 ail6-2. In addition, a genomic copy of AIL6 with 590 bp of 5' sequence and 919 bp of 3' sequence (AIL6m:gAIL6-3') complements ant-4 ail6-2 and contains all regulatory elements needed to confer normal AIL6 expression in inflorescences. Efforts to map cis-regulatory elements reveal that the third intron of AIL6 contains enhancer elements that confer expression in young flowers but in a broader pattern than that of AIL6 mRNA in wild-type flowers. Some AIL6:gAIL6-3' and AIL6m:gAIL6-3' lines confer an over-rescue phenotype in the ant-4 ail6-2 background that is correlated with higher levels of AIL6 mRNA accumulation. The results presented here indicate that AIL6 intronic sequences serve as transcriptional enhancer elements. In addition, the results show that increased expression of AIL6 can partially compensate for loss of ANT function in flowers.
Perigone Lobe Transcriptome Analysis Provides Insights into Rafflesia cantleyi Flower Development.
Lee, Xin-Wei; Mat-Isa, Mohd-Noor; Mohd-Elias, Nur-Atiqah; Aizat-Juhari, Mohd Afiq; Goh, Hoe-Han; Dear, Paul H; Chow, Keng-See; Haji Adam, Jumaat; Mohamed, Rahmah; Firdaus-Raih, Mohd; Wan, Kiew-Lian
2016-01-01
Rafflesia is a biologically enigmatic species that is very rare in occurrence and possesses an extraordinary morphology. This parasitic plant produces a gigantic flower up to one metre in diameter with no leaves, stem or roots. However, little is known about the floral biology of this species especially at the molecular level. In an effort to address this issue, we have generated and characterised the transcriptome of the Rafflesia cantleyi flower, and performed a comparison with the transcriptome of its floral bud to predict genes that are expressed and regulated during flower development. Approximately 40 million sequencing reads were generated and assembled de novo into 18,053 transcripts with an average length of 641 bp. Of these, more than 79% of the transcripts had significant matches to annotated sequences in the public protein database. A total of 11,756 and 7,891 transcripts were assigned to Gene Ontology categories and clusters of orthologous groups respectively. In addition, 6,019 transcripts could be mapped to 129 pathways in Kyoto Encyclopaedia of Genes and Genomes Pathway database. Digital abundance analysis identified 52 transcripts with very high expression in the flower transcriptome of R. cantleyi. Subsequently, analysis of differential expression between developing flower and the floral bud revealed a set of 105 transcripts with potential role in flower development. Our work presents a deep transcriptome resource analysis for the developing flower of R. cantleyi. Genes potentially involved in the growth and development of the R. cantleyi flower were identified and provide insights into biological processes that occur during flower development.
[Construction of the superantigen SEA transfected laryngocarcinoma cells].
Ji, Xiaobin; Jingli, J V; Liu, Qicai; Xie, Jinghua
2013-04-01
To construct an eukaryotic expression vectors containing superantigen staphylococcal enterotoxin A (SEA) gene, and to identify its expression in laryngeal squamous carcinoma cells. SEA full-length gene fragment was obtained from ATCC13565 genome of the staphylococcus, referencing standard strains producing SEA. Coding sequence of SEA was artificially synthetized. Than, SEA gene fragments was subcloned into eukaryotic expression vector pIRES-EGFP. The recombinant plasmid pSEA-IRES-EGFP was constructed and was transfected to laryngocarcinoma Hep-2 cells. Resistant clones were screened by G418. The expression of SEA in laryngocarcinoma cells was identified with ELISA and RT-PCR method. The subclone of artificially synthetized SEA gene was subclone to eukaryotic expression vector pires-EGFP. Flanking sequence confirmed that SEA sequence was fully identical to the coding sequence of standard staphylococcus strains ATCC13565 in Genbank. After recombinant plasmid transfected to laryngocarcinoma cells, the resistant clones was obtained after screening for two weeks. The clones were selected. The specific gene fragment was obtained by RT-PCR amplification. ELISA assay confirmed that the content of SEA protein in supernatant fluid of cell culture had reached about Pg level. The recombinant eukaryotic expression vector containing superantigen SEA gene is successfully constructed, and is capable of effective expression and continued secretion of SEA protein in laryngochrcinoma Hep-2 cells after recombinant plasmid transfected to laryngocarcinoma cells.
A high-resolution enhancer atlas of the developing telencephalon.
Visel, Axel; Taher, Leila; Girgis, Hani; May, Dalit; Golonzhka, Olga; Hoch, Renee V; McKinsey, Gabriel L; Pattabiraman, Kartik; Silberberg, Shanni N; Blow, Matthew J; Hansen, David V; Nord, Alex S; Akiyama, Jennifer A; Holt, Amy; Hosseini, Roya; Phouanenavong, Sengthavy; Plajzer-Frick, Ingrid; Shoukry, Malak; Afzal, Veena; Kaplan, Tommy; Kriegstein, Arnold R; Rubin, Edward M; Ovcharenko, Ivan; Pennacchio, Len A; Rubenstein, John L R
2013-02-14
The mammalian telencephalon plays critical roles in cognition, motor function, and emotion. Though many of the genes required for its development have been identified, the distant-acting regulatory sequences orchestrating their in vivo expression are mostly unknown. Here, we describe a digital atlas of in vivo enhancers active in subregions of the developing telencephalon. We identified more than 4,600 candidate embryonic forebrain enhancers and studied the in vivo activity of 329 of these sequences in transgenic mouse embryos. We generated serial sets of histological brain sections for 145 reproducible forebrain enhancers, resulting in a publicly accessible web-based data collection comprising more than 32,000 sections. We also used epigenomic analysis of human and mouse cortex tissue to directly compare the genome-wide enhancer architecture in these species. These data provide a primary resource for investigating gene regulatory mechanisms of telencephalon development and enable studies of the role of distant-acting enhancers in neurodevelopmental disorders. Copyright © 2013 Elsevier Inc. All rights reserved.
Pine Gene Discovery Project - Final Report - 08/31/1997 - 02/28/2001
DOE Office of Scientific and Technical Information (OSTI.GOV)
Whetten, R. W.; Sederoff, R. R.; Kinlaw, C.
2001-04-30
Integration of pines into the large scope of plant biology research depends on study of pines in parallel with study of annual plants, and on availability of research materials from pine to plant biologists interested in comparing pine with annual plant systems. The objectives of the Pine Gene Discovery Project were to obtain 10,000 partial DNA sequences of genes expressed in loblolly pine, to determine which of those pine genes were similar to known genes from other organisms, and to make the DNA sequences and isolated pine genes available to plant researchers to stimulate integration of pines into the widermore » scope of plant biology research. Those objectives have been completed, and the results are available to the public. Requests for pine genes have been received from a number of laboratories that would otherwise not have included pine in their research, indicating that progress is being made toward the goal of integrating pine research into the larger molecular biology research community.« less
DCODE.ORG Anthology of Comparative Genomic Tools
DOE Office of Scientific and Technical Information (OSTI.GOV)
Loots, G G; Ovcharenko, I
2005-01-11
Comparative genomics provides the means to demarcate functional regions in anonymous DNA sequences. The successful application of this method to identifying novel genes is currently shifting to deciphering the noncoding encryption of gene regulation across genomes. To facilitate the use of comparative genomics to practical applications in genetics and genomics we have developed several analytical and visualization tools for the analysis of arbitrary sequences and whole genomes. These tools include two alignment tools: zPicture and Mulan; a phylogenetic shadowing tool: eShadow for identifying lineage- and species-specific functional elements; two evolutionary conserved transcription factor analysis tools: rVista and multiTF; a toolmore » for extracting cis-regulatory modules governing the expression of co-regulated genes, CREME; and a dynamic portal to multiple vertebrate and invertebrate genome alignments, the ECR Browser. Here we briefly describe each one of these tools and provide specific examples on their practical applications. All the tools are publicly available at the http://www.dcode.org/ web site.« less
Basic Helix-Loop-Helix Transcription Factor Gene Family Phylogenetics and Nomenclature
Skinner, Michael K.; Rawls, Alan; Wilson-Rawls, Jeanne; Roalson, Eric H.
2010-01-01
A phylogenetic analysis of the basic helix-loop-helix (bHLH) gene superfamily was performed using seven different species (human, mouse, rat, worm, fly, yeast, and plant Arabidopsis) and involving over 600 bHLH genes [1]. All bHLH genes were identified in the genomes of the various species, including expressed sequence tags, and the entire coding sequence was used in the analysis. Nearly 15% of the gene family has been updated or added since the original publication. A super-tree involving six clades and all structural relationships was established and is now presented for four of the species. The wealth of functional data available for members of the bHLH gene superfamily provides us with the opportunity to use this exhaustive phylogenetic tree to predict potential functions of uncharacterized members of the family. This phylogenetic and genomic analysis of the bHLH gene family has revealed unique elements of the evolution and functional relationships of the different genes in the bHLH gene family. PMID:20219281
D'Angelo, Maria C; Jiménez, Luis; Milliken, Bruce; Lupiáñez, Juan
2013-01-01
Individuals experience less interference from conflicting information following events that contain conflicting information. Recently, Jiménez, Lupiáñez, and Vaquero (2009) demonstrated that such adaptations to conflict occur even when the source of conflict arises from implicit knowledge of sequences. There is accumulating evidence that momentary changes in adaptations made in response to conflicting information are conflict-type specific (e.g., Funes, Lupiáñez, & Humphreys, 2010a), suggesting that there are multiple modes of control. The current study examined whether conflict-specific sequential congruency effects occur when the 2 sources of conflict are implicitly learned. Participants implicitly learned a motor sequence while simultaneously learning a perceptual sequence. In a first experiment, after learning the 2 orthogonal sequences, participants expressed knowledge of the 2 sequences independently of each other in a transfer phase. In Experiments 2 and 3, within each sequence, the presence of a single control trial disrupted the expression of this specific type of learning on the following trial. There was no evidence of cross-conflict modulations in the expression of sequence learning. The results suggest that the mechanisms involved in transient shifts in conflict-specific control, as reflected in sequential congruency effects, are also engaged when the source of conflict is implicit. (c) 2013 APA, all rights reserved.
RefSeq microbial genomes database: new representation and annotation strategy.
Tatusova, Tatiana; Ciufo, Stacy; Fedorov, Boris; O'Neill, Kathleen; Tolstoy, Igor
2014-01-01
The source of the microbial genomic sequences in the RefSeq collection is the set of primary sequence records submitted to the International Nucleotide Sequence Database public archives. These can be accessed through the Entrez search and retrieval system at http://www.ncbi.nlm.nih.gov/genome. Next-generation sequencing has enabled researchers to perform genomic sequencing at rates that were unimaginable in the past. Microbial genomes can now be sequenced in a matter of hours, which has led to a significant increase in the number of assembled genomes deposited in the public archives. This huge increase in DNA sequence data presents new challenges for the annotation, analysis and visualization bioinformatics tools. New strategies have been developed for the annotation and representation of reference genomes and sequence variations derived from population studies and clinical outbreaks.
RNA sequencing: current and prospective uses in metabolic research.
Vikman, Petter; Fadista, Joao; Oskolkov, Nikolay
2014-10-01
Previous global RNA analysis was restricted to known transcripts in species with a defined transcriptome. Next generation sequencing has transformed transcriptomics by making it possible to analyse expressed genes with an exon level resolution from any tissue in any species without any a priori knowledge of which genes that are being expressed, splice patterns or their nucleotide sequence. In addition, RNA sequencing is a more sensitive technique compared with microarrays with a larger dynamic range, and it also allows for investigation of imprinting and allele-specific expression. This can be done for a cost that is able to compete with that of a microarray, making RNA sequencing a technique available to most researchers. Therefore RNA sequencing has recently become the state of the art with regards to large-scale RNA investigations and has to a large extent replaced microarrays. The only drawback is the large data amounts produced, which together with the complexity of the data can make a researcher spend far more time on analysis than performing the actual experiment. © 2014 Society for Endocrinology.
The Total Library System (TLS). Volume 2. The Files.
1983-01-01
file name. USER private By user sequence number. PROFILE private By user sequence number. TITL2 public By short form of the Library of Congress number...SHELF public By short form of the Library of Congress number. CATCOST private By requisition number. CATCHG private By Library of Congress number...including edition year and copy number--as on label of book, etc. CONTIN public By short form of the Library of Congress number. LIST public One record
Holmes, Andrew; Szafranski, Karol; Faulkes, Chris G.; Coen, Clive W.; Buffenstein, Rochelle; Platzer, Matthias; de Magalhães, João Pedro; Church, George M.
2011-01-01
The naked mole-rat (Heterocephalus glaber) is a long-lived, cancer resistant rodent and there is a great interest in identifying the adaptations responsible for these and other of its unique traits. We employed RNA sequencing to compare liver gene expression profiles between naked mole-rats and wild-derived mice. Our results indicate that genes associated with oxidoreduction and mitochondria were expressed at higher relative levels in naked mole-rats. The largest effect is nearly 300-fold higher expression of epithelial cell adhesion molecule (Epcam), a tumour-associated protein. Also of interest are the protease inhibitor, alpha2-macroglobulin (A2m), and the mitochondrial complex II subunit Sdhc, both ageing-related genes found strongly over-expressed in the naked mole-rat. These results hint at possible candidates for specifying species differences in ageing and cancer, and in particular suggest complex alterations in mitochondrial and oxidation reduction pathways in the naked mole-rat. Our differential gene expression analysis obviated the need for a reference naked mole-rat genome by employing a combination of Illumina/Solexa and 454 platforms for transcriptome sequencing and assembling transcriptome contigs of the non-sequenced species. Overall, our work provides new research foci and methods for studying the naked mole-rat's fascinating characteristics. PMID:22073188
NASA Astrophysics Data System (ADS)
Tian, Caihong; Tek Tay, Wee; Feng, Hongqiang; Wang, Ying; Hu, Yongmin; Li, Guoping
2015-06-01
Adelphocoris suturalis is one of the most serious pest insects of Bt cotton in China, however its molecular genetics, biochemistry and physiology are poorly understood. We used high throughput sequencing platform to perform de novo transcriptome assembly and gene expression analyses across different developmental stages (eggs, 2nd and 5th instar nymphs, female and male adults). We obtained 20 GB of clean data and revealed 88,614 unigenes, including 23,830 clusters and 64,784 singletons. These unigene sequences were annotated and classified by Gene Ontology, Clusters of Orthologous Groups, and Kyoto Encyclopedia of Genes and Genomes databases. A large number of differentially expressed genes were discovered through pairwise comparisons between these developmental stages. Gene expression profiles were dramatically different between life stage transitions, with some of these most differentially expressed genes being associated with sex difference, metabolism and development. Quantitative real-time PCR results confirm deep-sequencing findings based on relative expression levels of nine randomly selected genes. Furthermore, over 791,390 single nucleotide polymorphisms and 2,682 potential simple sequence repeats were identified. Our study provided comprehensive transcriptional gene expression information for A. suturalis that will form the basis to better understanding of development pathways, hormone biosynthesis, sex differences and wing formation in mirid bugs.
Tian, Caihong; Tek Tay, Wee; Feng, Hongqiang; Wang, Ying; Hu, Yongmin; Li, Guoping
2015-01-01
Adelphocoris suturalis is one of the most serious pest insects of Bt cotton in China, however its molecular genetics, biochemistry and physiology are poorly understood. We used high throughput sequencing platform to perform de novo transcriptome assembly and gene expression analyses across different developmental stages (eggs, 2nd and 5th instar nymphs, female and male adults). We obtained 20 GB of clean data and revealed 88,614 unigenes, including 23,830 clusters and 64,784 singletons. These unigene sequences were annotated and classified by Gene Ontology, Clusters of Orthologous Groups, and Kyoto Encyclopedia of Genes and Genomes databases. A large number of differentially expressed genes were discovered through pairwise comparisons between these developmental stages. Gene expression profiles were dramatically different between life stage transitions, with some of these most differentially expressed genes being associated with sex difference, metabolism and development. Quantitative real-time PCR results confirm deep-sequencing findings based on relative expression levels of nine randomly selected genes. Furthermore, over 791,390 single nucleotide polymorphisms and 2,682 potential simple sequence repeats were identified. Our study provided comprehensive transcriptional gene expression information for A. suturalis that will form the basis to better understanding of development pathways, hormone biosynthesis, sex differences and wing formation in mirid bugs. PMID:26047353
Characterization of Cer-1 cis-regulatory region during early Xenopus development.
Silva, Ana Cristina; Filipe, Mário; Steinbeisser, Herbert; Belo, José António
2011-05-01
Cerberus-related molecules are well-known Wnt, Nodal, and BMP inhibitors that have been implicated in different processes including anterior–posterior patterning and left–right asymmetry. In both mouse and frog, two Cerberus-related genes have been isolated, mCer-1 and mCer-2, and Xcer and Xcoco, respectively. Until now, little is known about the mechanisms involved in their transcriptional regulation. Here, we report a heterologous analysis of the mouse Cerberus-1 gene upstream regulatory regions, responsible for its expression in the visceral endodermal cells. Our analysis showed that the consensus sequences for a TATA, CAAT, or GC boxes were absent but a TGTGG sequence was present at position -172 to -168 bp, relative to the ATG. Using a series of deletion constructs and transient expression in Xenopus embryos, we found that a fragment of 1.4 kb of Cer-1 promoter sequence could reproduce the endogenous expression pattern of Xenopus cerberus. A 0.7-kb mcer-1 upstream region was able to drive reporter expression to the involuting mesendodermal cells, while further deletions abolished reporter gene expression. Our results suggest that although no sequence similarity was found between mouse and Xenopus cerberus cis-regulatory regions, the signaling cascades regulating cerberus expression, during gastrulation, is conserved.
Sakai, Kazuko; Takeda, Masayuki; Okamoto, Isamu; Nakagawa, Kazuhiko; Nishio, Kazuto
2015-01-01
Hepatocyte growth factor (HGF) expression is a poor prognostic factor in various types of cancer. Expression levels of HGF have been reported to be regulated by shorter poly(dA) sequences in the promoter region. In the present study, the poly(dA) mononucleotide tract in various types of human cancer cell lines was examined and compared with the HGF expression levels in those cells. Short deoxyadenosine repeat sequences were detected in five of the 55 cell lines used in the present study. The H69, IM95, CCK-81, Sui73 and H28 cells exhibited a truncated poly(dA) sequence in which the number of poly(dA) repeats was reduced by ≥5 bp. Two of the cell lines exhibited high HGF expression, determined by reverse transcription quantitative polymerase chain reaction and enzyme-linked immunosorbent assay. The CCK-81, Sui73 and H28 cells with shorter poly(dA) sequences exhibited low HGF expression. The cause of the suppression of HGF expression in the CCK-81, Sui73 and H28 cells was clarified by two approaches, suppression by methylation and single nucleotide polymorphisms in the HGF gene. Exposure to 5-Aza-dC, an inhibitor of DNA methyltransferase 1, induced an increased expression of HGF in the CCK-81 cells, but not in the other cells. Single-nucleotide polymorphism (SNP) rs72525097 in intron 1 was detected in the Sui73 and H28 cells. Taken together, it was found that the defect of poly(dA) in the HGF promoter was present in various types of cancer, including lung, stomach, colorectal, pancreas and mesothelioma. The present study proposes the negative regulation mechanisms by methylation and SNP in intron 1 of HGF for HGF expression in cancer cells with short poly(dA).
Transcriptome response to elevated CO2, water deficit, and thermal stress in peanut
USDA-ARS?s Scientific Manuscript database
Previously, our laboratories have performed gene expression studies using EST sequencing and spotted microarrays to investigate tissue-specific gene expression and response to abiotic stress. While these studies have provided valuable insight into these processes, they are constrained by sequencer t...
Shang, Feng; Ding, Bi-Yue; Xiong, Ying; Dou, Wei; Wei, Dong; Jiang, Hong-Bo; Wei, Dan-Dan; Wang, Jin-Jun
2016-01-01
Winged and wingless morphs in insects represent a trade-off between dispersal ability and reproduction. We studied key genes associated with apterous and alate morphs in Toxoptera citricida (Kirkaldy) using RNAseq, digital gene expression (DGE) profiling, and RNA interference. The de novo assembly of the transcriptome was obtained through Illumina short-read sequencing technology. A total of 44,199 unigenes were generated and 27,640 were annotated. The transcriptomic differences between alate and apterous adults indicated that 279 unigenes were highly expressed in alate adults, whereas 5,470 were expressed at low levels. Expression patterns of the top 10 highly expressed genes in alate adults agreed with wing bud development trends. Silencing of the lipid synthesis and degradation gene (3-ketoacyl-CoA thiolase, mitochondrial-like) and glycogen genes (Phosphoenolpyruvate carboxykinase [GTP]-like and Glycogen phosphorylase-like isoform 2) resulted in underdeveloped wings. This suggests that both lipid and glycogen metabolism provide energy for aphid wing development. The large number of sequences and expression data produced from the transcriptome and DGE sequencing, respectively, increases our understanding of wing development mechanisms. PMID:27577531
Setliff, Ian; McDonnell, Wyatt J; Raju, Nagarajan; Bombardi, Robin G; Murji, Amyn A; Scheepers, Cathrine; Ziki, Rutendo; Mynhardt, Charissa; Shepherd, Bryan E; Mamchak, Alusha A; Garrett, Nigel; Karim, Salim Abdool; Mallal, Simon A; Crowe, James E; Morris, Lynn; Georgiev, Ivelin S
2018-06-13
Characterization of single antibody lineages within infected individuals has provided insights into the development of Env-specific antibodies. However, a systems-level understanding of the humoral response against HIV-1 is limited. Here, we interrogated the antibody repertoires of multiple HIV-infected donors from an infection-naive state through acute and chronic infection using next-generation sequencing. This analysis revealed the existence of "public" antibody clonotypes that were shared among multiple HIV-infected individuals. The HIV-1 reactivity for representative antibodies from an identified public clonotype shared by three donors was confirmed. Furthermore, a meta-analysis of publicly available antibody repertoire sequencing datasets revealed antibodies with high sequence identity to known HIV-reactive antibodies, even in repertoires that were reported to be HIV naive. The discovery of public antibody clonotypes in HIV-infected individuals represents an avenue of significant potential for better understanding antibody responses to HIV-1 infection, as well as for clonotype-specific vaccine development. Copyright © 2018 The Authors. Published by Elsevier Inc. All rights reserved.
Bedon, Frank; Grima-Pettenati, Jacqueline; Mackay, John
2007-01-01
Background Several members of the R2R3-MYB family of transcription factors act as regulators of lignin and phenylpropanoid metabolism during wood formation in angiosperm and gymnosperm plants. The angiosperm Arabidopsis has over one hundred R2R3-MYBs genes; however, only a few members of this family have been discovered in gymnosperms. Results We isolated and characterised full-length cDNAs encoding R2R3-MYB genes from the gymnosperms white spruce, Picea glauca (13 sequences), and loblolly pine, Pinus taeda L. (five sequences). Sequence similarities and phylogenetic analyses placed the spruce and pine sequences in diverse subgroups of the large R2R3-MYB family, although several of the sequences clustered closely together. We searched the highly variable C-terminal region of diverse plant MYBs for conserved amino acid sequences and identified 20 motifs in the spruce MYBs, nine of which have not previously been reported and three of which are specific to conifers. The number and length of the introns in spruce MYB genes varied significantly, but their positions were well conserved relative to angiosperm MYB genes. Quantitative RTPCR of MYB genes transcript abundance in root and stem tissues revealed diverse expression patterns; three MYB genes were preferentially expressed in secondary xylem, whereas others were preferentially expressed in phloem or were ubiquitous. The MYB genes expressed in xylem, and three others, were up-regulated in the compression wood of leaning trees within 76 hours of induction. Conclusion Our survey of 18 conifer R2R3-MYB genes clearly showed a gene family structure similar to that of Arabidopsis. Three of the sequences are likely to play a role in lignin metabolism and/or wood formation in gymnosperm trees, including a close homolog of the loblolly pine PtMYB4, shown to regulate lignin biosynthesis in transgenic tobacco. PMID:17397551
Filloux, Denis; Murrell, Sasha; Koohapitagtam, Maneerat; Golden, Michael; Julian, Charlotte; Galzi, Serge; Uzest, Marilyne; Rodier-Goud, Marguerite; D’Hont, Angélique; Vernerey, Marie Stephanie; Wilkin, Paul; Peterschmitt, Michel; Winter, Stephan; Murrell, Ben; Martin, Darren P.; Roumagnac, Philippe
2015-01-01
Endogenous viral sequences are essentially ‘fossil records’ that can sometimes reveal the genomic features of long extinct virus species. Although numerous known instances exist of single-stranded DNA (ssDNA) genomes becoming stably integrated within the genomes of bacteria and animals, there remain very few examples of such integration events in plants. The best studied of these events are those which yielded the geminivirus-related DNA elements found within the nuclear genomes of various Nicotiana species. Although other ssDNA virus-like sequences are included within the draft genomes of various plant species, it is not entirely certain that these are not contaminants. The Nicotiana geminivirus-related DNA elements therefore remain the only definitively proven instances of endogenous plant ssDNA virus sequences. Here, we characterize two new classes of endogenous plant virus sequence that are also apparently derived from ancient geminiviruses in the genus Begomovirus. These two endogenous geminivirus-like elements (EGV1 and EGV2) are present in the Dioscorea spp. of the Enantiophyllum clade. We used fluorescence in situ hybridization to confirm that the EGV1 sequences are integrated in the D. alata genome and showed that one or two ancestral EGV sequences likely became integrated more than 1.4 million years ago during or before the diversification of the Asian and African Enantiophyllum Dioscorea spp. Unexpectedly, we found evidence of natural selection actively favouring the maintenance of EGV-expressed replication-associated protein (Rep) amino acid sequences, which clearly indicates that functional EGV Rep proteins were probably expressed for prolonged periods following endogenization. Further, the detection in D. alata of EGV gene transcripts, small 21–24 nt RNAs that are apparently derived from these transcripts, and expressed Rep proteins, provides evidence that some EGV genes are possibly still functionally expressed in at least some of the Enantiophyllum clade species. PMID:27774276