Science.gov

Sample records for sequence tags analysis

  1. Expressed sequence tags analysis of Blattella germanica.

    PubMed

    Chung, Hyang Suk; Yu, Tai Hyun; Kim, Bong Jin; Kim, Sun Mi; Kim, Joo Yeong; Yu, Hak Sun; Jeong, Hae Jin; Ock, Mee Sun

    2005-12-01

    Four hundred and sixty five randomly selected clones from a cDNA library of Blattella germanica were partially sequenced and searched using BLAST as a means of analyzing the transcribed sequences of its genome. A total of 363 expressed sequence tags (ESTs) were generated from 465 clones after editing and trimming the vector and ambiguous sequences. About 42% (154/363) of these clones showed significant homology with other database-registered genes. These new B. germanica genes constituted a broad range of transcripts distributed among ribosomal proteins, energy metabolism, allergens, proteases, protease inhibitors, enzymes, translation, cell signaling pathways, and proteins of unknown function. Eighty clones were not well-matched by database searches, and these represent new B. germanica-specific ESTs. Some genes which drew our attention are discussed. The information obtained increases our understanding of the B. germanica genome.

  2. Expressed sequence tags analysis of Blattella germanica

    PubMed Central

    Chung, Hyang Suk; Yu, Tai Hyun; Kim, Bong Jin; Kim, Sun Mi; Kim, Joo Yeong; Yu, Hak Sun; Jeong, Hae Jin

    2005-01-01

    Four hundred and sixty five randomly selected clones from a cDNA library of Blattella germanica were partially sequenced and searched using BLAST as a means of analyzing the transcribed sequences of its genome. A total of 363 expressed sequence tags (ESTs) were generated from 465 clones after editing and trimming the vector and ambiguous sequences. About 42% (154/363) of these clones showed significant homology with other database-registered genes. These new B. germanica genes constituted a broad range of transcripts distributed among ribosomal proteins, energy metabolism, allergens, proteases, protease inhibitors, enzymes, translation, cell signaling pathways, and proteins of unknown function. Eighty clones were not well-matched by database searches, and these represent new B. germanica-specific ESTs. Some genes which drew our attention are discussed. The information obtained increases our understanding of the B. germanica genome. PMID:16340304

  3. RED: the analysis, management and dissemination of expressed sequence tags.

    PubMed

    Everitt, R; Minnema, S E; Wride, M A; Koster, C S; Hance, J E; Mansergh, F C; Rancourt, D E

    2002-12-01

    The Rancourt EST Database (RED) is a web-based system for the analysis, management, and dissemination of expressed sequence tags (ESTs). RED represents a flexible template DNA sequence database that can be easily manipulated to suit the needs of other laboratories undertaking mid-size sequencing projects.

  4. Transcript identification by analysis of short sequence tags--influence of tag length, restriction site and transcript database.

    PubMed

    Unneberg, Per; Wennborg, Anders; Larsson, Magnus

    2003-04-15

    There exist a number of gene expression profiling techniques that utilize restriction enzymes for generation of short expressed sequence tags. We have studied how the choice of restriction enzyme influences various characteristics of tags generated in an experiment. We have also investigated various aspects of in silico transcript identification that these profiling methods rely on. First, analysis of 14 248 mRNA sequences derived from the RefSeq transcript database showed that 1-30% of the sequences lack a given restriction enzyme recognition site. Moreover, 1-5% of the transcripts have recognition sites located less than 10 bases from the poly(A) tail. The uniqueness of 10 bp tags lies in the range 90-95%, which increases only slightly with longer tags, due to the existence of closely related transcripts. Furthermore, 3-30% of upstream 10 bp tags are identical to 3' tags, introducing a risk of misclassification if upstream tags are present in a sample. Second, we found that a sequence length of 16-17 bp, including the recognition site, is sufficient for unique transcript identification by BLAST based sequence alignment to the UniGene Human non-redundant database. Third, we constructed a tag-to-gene mapping for UniGene and compared it to an existing mapping database. The mappings agreed to 79-83%, where the selection of representative sequences in the UniGene clusters is the main cause of the disagreement. The results of this study may serve to improve the interpretation of sequence-based expression studies and the design of hybridization arrays, by identifying short tags that have a high reliability and separating them from tags that carry an inherent ambiguity in their capacity to discriminate between genes. To this end, supplementary information in the form of a web companion to this paper is located at http://biobase.biotech.kth.se/tagseq.
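
    The in silico tag extraction summarized in this record can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes a SAGE-style scheme in which the tag is the 10 bases immediately downstream of the 3'-most recognition site, and it uses CATG (the NlaIII site) only as an example enzyme; the function names and the definition of "uniqueness" used here (share of tagged transcripts whose tag occurs in no other transcript) are illustrative assumptions.

```python
from collections import Counter
from typing import Dict, Optional


def three_prime_tag(mrna: str, site: str = "CATG", tag_len: int = 10) -> Optional[str]:
    """Return the tag_len bases following the 3'-most occurrence of `site`.

    Returns None when the transcript lacks the site or when fewer than
    tag_len bases remain downstream of it.
    """
    seq = mrna.upper()
    pos = seq.rfind(site)  # 3'-most recognition site
    if pos == -1:
        return None
    start = pos + len(site)
    tag = seq[start:start + tag_len]
    return tag if len(tag) == tag_len else None


def tag_uniqueness(transcripts: Dict[str, str], site: str = "CATG", tag_len: int = 10) -> float:
    """Fraction of tagged transcripts whose tag is shared with no other transcript."""
    tags = [three_prime_tag(seq, site, tag_len) for seq in transcripts.values()]
    tags = [t for t in tags if t is not None]
    if not tags:
        return 0.0
    counts = Counter(tags)
    return sum(1 for t in tags if counts[t] == 1) / len(tags)


# Toy sequences (hypothetical, for illustration only).
toy = {
    "t1": "GGGCATGAAACCCGGGTTTAA",
    "t2": "CCCATGAAACCCGGGTCC",    # same tag as t1
    "t3": "TTCATGTTTTTTTTTTGG",
    "t4": "GGGGGG",                # no recognition site
}
uniqueness = tag_uniqueness(toy)
```

    In this toy set, t1 and t2 share a tag, t3 has its own, and t4 has no site, so one of the three tagged transcripts is unique.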

  5. Analysis of expressed sequence tags from the Ulva prolifera (Chlorophyta)

    NASA Astrophysics Data System (ADS)

    Niu, Jianfeng; Hu, Haiyan; Hu, Songnian; Wang, Guangce; Peng, Guang; Sun, Song

    2010-01-01

    In 2008, a green tide broke out before the sailing competition of the 29th Olympic Games in Qingdao. The causative species was determined to be Enteromorpha prolifera (Ulva prolifera O. F. Müller), a familiar green macroalga along the coastline of China. Rapid accumulation of a large biomass of floating U. prolifera prompted research on different aspects of this species. In this study, we constructed a nonnormalized cDNA library from the thalli of U. prolifera and acquired 10 072 high-quality expressed sequence tags (ESTs). These ESTs were assembled into 3 519 nonredundant gene groups, including 1 446 clusters and 2 073 singletons. After annotation with the nr database, a large number of genes were found to be related to chloroplast and ribosomal proteins. GO functional classification showed that 1 418 ESTs participated in photosynthesis and 1 359 ESTs were responsible for the generation of precursor metabolites and energy. In addition, rather comprehensive carbon fixation pathways were found in U. prolifera using KEGG. Some stress-related and signal transduction-related genes were also found in this study. Together, the evidence indicates that U. prolifera has the material and energy basis for intense photosynthesis and rapid proliferation. Phylogenetic analysis of cytochrome c oxidase subunit I revealed that this green-tide causative species is most closely affiliated to Pseudendoclonium akinetum (Ulvophyceae).

  6. Expressed sequence tag analysis in tef (Eragrostis tef (Zucc) Trotter).

    PubMed

    Yu, Ju-Kyung; Sun, Qi; Rota, Mauricio La; Edwards, Hugh; Tefera, Hailu; Sorrells, Mark E

    2006-04-01

    Tef (Eragrostis tef (Zucc.) Trotter) is the most important cereal crop in Ethiopia; however, there is very little DNA sequence information available for this species. Expressed sequence tags (ESTs) were generated from 4 cDNA libraries: seedling leaf, seedling root, and inflorescence of E. tef and seedling leaf of Eragrostis pilosa, a wild relative of E. tef. Clustering of 3603 sequences produced 530 clusters and 1890 singletons, resulting in 2420 tef unigenes. Approximately 3/4 of tef unigenes matched protein or nucleotide sequences in public databases. Annotation of unigenes associated 68% of the putative tef genes with gene ontology categories. Identification of the translated unigenes for conserved protein domains revealed 389 protein family domains (Pfam), the most frequent of which was protein kinase. A total of 170 ESTs containing simple sequence repeats (EST-SSRs) were identified and 80 EST-SSR markers were developed. In addition, 19 single-nucleotide polymorphism (SNP) and (or) insertion-deletion (indel) and 34 intron fragment length polymorphism (IFLP) markers were developed. The EST database and molecular markers generated in this study will be valuable resources for further tef genetic research.

  7. Analysis of the dermatophyte Trichophyton rubrum expressed sequence tags

    PubMed Central

    Wang, Lingling; Ma, Li; Leng, Wenchuan; Liu, Tao; Yu, Lu; Yang, Jian; Yang, Li; Zhang, Wenliang; Zhang, Qian; Dong, Jie; Xue, Ying; Zhu, Yafang; Xu, Xingye; Wan, Zhe; Ding, Guohui; Yu, Fudong; Tu, Kang; Li, Yixue; Li, Ruoyu; Shen, Yan; Jin, Qi

    2006-01-01

    Background Dermatophytes are the primary causative agent of dermatophytoses, a disease that affects billions of individuals worldwide. Trichophyton rubrum is the most common of the superficial fungi. Although T. rubrum is a recognized pathogen for humans, little is known about how its transcriptional pattern is related to development of the fungus and establishment of disease. It is therefore necessary to identify genes whose expression is relevant to growth, metabolism and virulence of T. rubrum. Results We generated 10 cDNA libraries covering nearly the entire growth phase and used them to isolate 11,085 unique expressed sequence tags (ESTs), including 3,816 contigs and 7,269 singletons. Comparisons with the GenBank non-redundant (NR) protein database revealed putative functions or matched homologs from other organisms for 7,764 (70%) of the ESTs. The remaining 3,321 (30%) of ESTs were only weakly similar or not similar to known sequences, suggesting that these ESTs represent novel genes. Conclusion The present data provide a comprehensive view of fungal physiological processes including metabolism, sexual and asexual growth cycles, signal transduction and pathogenic mechanisms. PMID:17032460

  8. Expressed sequence tags (ESTs) analysis of Acanthamoeba healyi

    PubMed Central

    Kong, Hyun-Hee; Hwang, Mee-Yeul; Kim, Hyo-Kyung

    2001-01-01

    A total of 435 randomly selected clones from an Acanthamoeba healyi cDNA library were sequenced, and 387 expressed sequence tags (ESTs) were generated. Based on the results of a BLAST search, 130 clones (34.4%) were identified as genes encoding surface proteins, enzymes for DNA, energy production or other metabolism, kinases and phosphatases, proteases, proteins for signal transduction, structural and cytoskeletal proteins, cell cycle related proteins, transcription factors, transcriptional and translational machinery, and transporter proteins. Most of the genes (88.5%) are newly identified in the genus Acanthamoeba. Although 15 clones matched Acanthamoeba genes in the public databases, twelve of these clones were the actin gene, which was the most frequently expressed gene in this study. These Acanthamoeba ESTs should give valuable information for studying the organism as a model system for biological investigations such as cytoskeleton or cell movement, signal transduction, and transcriptional and translational regulation. These results should also provide clues to elucidate factors for pathogenesis in human granulomatous amoebic encephalitis or keratitis caused by Acanthamoeba. PMID:11441502

  9. Analysis of expressed sequence tags (ESTs) from Agrostis species obtained using sequence related amplified polymorphism.

    PubMed

    Dinler, Gizem; Budak, Hikmet

    2008-10-01

    Bentgrass (Agrostis spp.), a genus of the Poaceae family, consists of more than 200 species and is mainly used in athletic fields and golf courses. Creeping bentgrass (A. stolonifera L.) is the most commonly used species in maintaining golf courses, followed by colonial bentgrass (A. capillaris L.) and velvet bentgrass (A. canina L.). The presence and nature of sequence related amplified polymorphism (SRAP) at the cDNA level were investigated. We isolated 80 unique cDNA fragment bands from these species using 56 SRAP primer combinations. Sequence analysis of cDNA clones and analysis of putative translation products revealed that some encoded amino acid sequences were similar to proteins involved in DNA synthesis, transcription, and signal transduction. The cytosolic glyceraldehyde-3-phosphate dehydrogenase (GAPDH) gene (GenBank accession no. EB812822) was also identified from velvet bentgrass, and the corresponding protein sequence is further analyzed due to its critical role in many cellular processes. The partial peptide sequence obtained was 112 amino acids long, presenting a high degree of homology to parts of the N-terminal and C-terminal regions of cytosolic phosphorylating GAPDH (GapC). The existence of common expressed sequence tags (ESTs) revealed by a minimum evolutionary dendrogram among the Agrostis ESTs indicated the usefulness of SRAP for comparative genome analysis of transcribed genes in the grass species.

  10. ChIA-PET tool for comprehensive chromatin interaction analysis with paired-end tag sequencing.

    PubMed

    Li, Guoliang; Fullwood, Melissa J; Xu, Han; Mulawadi, Fabianus Hendriyan; Velkov, Stoyan; Vega, Vinsensius; Ariyaratne, Pramila Nuwantha; Mohamed, Yusoff Bin; Ooi, Hong-Sain; Tennakoon, Chandana; Wei, Chia-Lin; Ruan, Yijun; Sung, Wing-Kin

    2010-01-01

    Chromatin interaction analysis with paired-end tag sequencing (ChIA-PET) is a new technology to study genome-wide long-range chromatin interactions bound by protein factors. Here we present ChIA-PET Tool, a software package for automatic processing of ChIA-PET sequence data, including linker filtering, mapping tags to reference genomes, identifying protein binding sites and chromatin interactions, and displaying the results on a graphical genome browser. ChIA-PET Tool is fast, accurate, comprehensive, user-friendly, and open source (available at http://chiapet.gis.a-star.edu.sg).

  11. Insights into a dinoflagellate genome through expressed sequence tag analysis

    PubMed Central

    Hackett, Jeremiah D; Scheetz, Todd E; Yoon, Hwan Su; Soares, Marcelo B; Bonaldo, Maria F; Casavant, Thomas L; Bhattacharya, Debashish

    2005-01-01

    Background Dinoflagellates are important marine primary producers and grazers and cause toxic "red tides". These taxa are characterized by many unique features such as immense genomes, the absence of nucleosomes, and photosynthetic organelles (plastids) that have been gained and lost multiple times. We generated EST sequences from non-normalized and normalized cDNA libraries from a culture of the toxic species Alexandrium tamarense to elucidate dinoflagellate evolution. Previous analyses of these data have clarified plastid origin and here we study the gene content, annotate the ESTs, and analyze the genes that are putatively involved in DNA packaging. Results Approximately 20% of the 6,723 unique (11,171 total 3'-reads) ESTs data could be annotated using Blast searches against GenBank. Several putative dinoflagellate-specific mRNAs were identified, including one novel plastid protein. Dinoflagellate genes, similar to other eukaryotes, have a high GC-content that is reflected in the amino acid codon usage. Highly represented transcripts include histone-like (HLP) and luciferin binding proteins and several genes occur in families that encode nearly identical proteins. We also identified rare transcripts encoding a predicted protein highly similar to histone H2A.X. We speculate this histone may be retained for its role in DNA double-strand break repair. Conclusion This is the most extensive collection to date of ESTs from a toxic dinoflagellate. These data will be instrumental to future research to understand the unique and complex cell biology of these organisms and for potentially identifying the genes involved in toxin production. PMID:15921535

  12. Mapping DNA-protein interactions in large genomes by sequence tag analysis of genomic enrichment.

    PubMed

    Kim, Jonghwan; Bhinge, Akshay A; Morgan, Xochitl C; Iyer, Vishwanath R

    2005-01-01

    Identifying the chromosomal targets of transcription factors is important for reconstructing the transcriptional regulatory networks underlying global gene expression programs. We have developed an unbiased genomic method called sequence tag analysis of genomic enrichment (STAGE) to identify the direct binding targets of transcription factors in vivo. STAGE is based on high-throughput sequencing of concatemerized tags derived from target DNA enriched by chromatin immunoprecipitation. We first used STAGE in yeast to confirm that RNA polymerase III genes are the most prominent targets of the TATA-box binding protein. We optimized the STAGE protocol and developed analysis methods to allow the identification of transcription factor targets in human cells. We used STAGE to identify several previously unknown binding targets of human transcription factor E2F4 that we independently validated by promoter-specific PCR and microarray hybridization. STAGE provides a means of identifying the chromosomal targets of DNA-associated proteins in any sequenced genome.

  13. Analysis and Functional Annotation of an Expressed Sequence Tag Collection for Tropical Crop Sugarcane

    PubMed Central

    Vettore, André L.; da Silva, Felipe R.; Kemper, Edson L.; Souza, Glaucia M.; da Silva, Aline M.; Ferro, Maria Inês T.; Henrique-Silva, Flavio; Giglioti, Éder A.; Lemos, Manoel V.F.; Coutinho, Luiz L.; Nobrega, Marina P.; Carrer, Helaine; França, Suzelei C.; Bacci, Maurício; Goldman, Maria Helena S.; Gomes, Suely L.; Nunes, Luiz R.; Camargo, Luis E.A.; Siqueira, Walter J.; Van Sluys, Marie-Anne; Thiemann, Otavio H.; Kuramae, Eiko E.; Santelli, Roberto V.; Marino, Celso L.; Targon, Maria L.P.N.; Ferro, Jesus A.; Silveira, Henrique C.S.; Marini, Danyelle C.; Lemos, Eliana G.M.; Monteiro-Vitorello, Claudia B.; Tambor, José H.M.; Carraro, Dirce M.; Roberto, Patrícia G.; Martins, Vanderlei G.; Goldman, Gustavo H.; de Oliveira, Regina C.; Truffi, Daniela; Colombo, Carlos A.; Rossi, Magdalena; de Araujo, Paula G.; Sculaccio, Susana A.; Angella, Aline; Lima, Marleide M.A.; de Rosa, Vicente E.; Siviero, Fábio; Coscrato, Virginia E.; Machado, Marcos A.; Grivet, Laurent; Di Mauro, Sonia M.Z.; Nobrega, Francisco G.; Menck, Carlos F.M.; Braga, Marilia D.V.; Telles, Guilherme P.; Cara, Frank A.A.; Pedrosa, Guilherme; Meidanis, João; Arruda, Paulo

    2003-01-01

    To contribute to our understanding of the genome complexity of sugarcane, we undertook a large-scale expressed sequence tag (EST) program. More than 260,000 cDNA clones were partially sequenced from 26 standard cDNA libraries generated from different sugarcane tissues. After the processing of the sequences, 237,954 high-quality ESTs were identified. These ESTs were assembled into 43,141 putative transcripts. Of the assembled sequences, 35.6% presented no matches with existing sequences in public databases. A global analysis of the whole SUCEST data set indicated that 14,409 assembled sequences (33% of the total) contained at least one cDNA clone with a full-length insert. Annotation of the 43,141 assembled sequences associated almost 50% of the putative identified sugarcane genes with protein metabolism, cellular communication/signal transduction, bioenergetics, and stress responses. Inspection of the translated assembled sequences for conserved protein domains revealed 40,821 amino acid sequences with 1415 Pfam domains. Reassembling the consensus sequences of the 43,141 transcripts revealed a 22% redundancy in the first assembling. This indicated that possibly 33,620 unique genes had been identified and indicated that >90% of the sugarcane expressed genes were tagged. PMID:14613979

  14. Analysis of expressed sequence tags of the water flea Daphnia magna.

    PubMed

    Watanabe, Hajime; Tatarazako, Norihisa; Oda, Shigeto; Nishide, Hiroyo; Uchiyama, Ikuo; Morita, Masatoshi; Iguchi, Taisen

    2005-08-01

    To study gene expression in the water flea Daphnia magna we constructed a cDNA library and characterized the expressed sequence tags (ESTs) of 7210 clones. The EST sequences clustered into 2958 nonredundant groups. BLAST analyses of both protein and DNA databases showed that 1218 (41%) of the unique sequences shared significant similarities to known nucleotide or amino acid sequences, whereas the remaining 1740 (59%) showed no significant similarities to other genes. Clustering analysis revealed particularly high expression of genes related to ATP synthesis, structural proteins, and proteases. The cDNA clones and EST sequence information should be useful for future functional analysis of daphnid biology and investigation of the links between ecology and genomics.

  15. Sequencing, Analysis, and Annotation of Expressed Sequence Tags for Camelus dromedarius

    PubMed Central

    Al-Swailem, Abdulaziz M.; Shehata, Maher M.; Abu-Duhier, Faisel M.; Al-Yamani, Essam J.; Al-Busadah, Khalid A.; Al-Arawi, Mohammed S.; Al-Khider, Ali Y.; Al-Muhaimeed, Abdullah N.; Al-Qahtani, Fahad H.; Manee, Manee M.; Al-Shomrani, Badr M.; Al-Qhtani, Saad M.; Al-Harthi, Amer S.; Akdemir, Kadir C.; Otu, Hasan H.

    2010-01-01

    Despite its economical, cultural, and biological importance, there has not been a large scale sequencing project to date for Camelus dromedarius. With the goal of sequencing complete DNA of the organism, we first established and sequenced camel EST libraries, generating 70,272 reads. Following trimming, chimera check, repeat masking, cluster and assembly, we obtained 23,602 putative gene sequences, out of which over 4,500 potentially novel or fast evolving gene sequences do not carry any homology to other available genomes. Functional annotation of sequences with similarities in nucleotide and protein databases has been obtained using Gene Ontology classification. Comparison to available full length cDNA sequences and Open Reading Frame (ORF) analysis of camel sequences that exhibit homology to known genes show more than 80% of the contigs with an ORF>300 bp and ∼40% hits extending to the start codons of full length cDNAs suggesting successful characterization of camel genes. Similarity analyses are done separately for different organisms including human, mouse, bovine, and rat. Accompanying web portal, CAGBASE (http://camel.kacst.edu.sa/), hosts a relational database containing annotated EST sequences and analysis tools with possibility to add sequences from public domain. We anticipate our results to provide a home base for genomic studies of camel and other comparative studies enabling a starting point for whole genome sequencing of the organism. PMID:20502665

  16. Sequencing, analysis, and annotation of expressed sequence tags for Camelus dromedarius.

    PubMed

    Al-Swailem, Abdulaziz M; Shehata, Maher M; Abu-Duhier, Faisel M; Al-Yamani, Essam J; Al-Busadah, Khalid A; Al-Arawi, Mohammed S; Al-Khider, Ali Y; Al-Muhaimeed, Abdullah N; Al-Qahtani, Fahad H; Manee, Manee M; Al-Shomrani, Badr M; Al-Qhtani, Saad M; Al-Harthi, Amer S; Akdemir, Kadir C; Inan, Mehmet S; Otu, Hasan H

    2010-05-19

    Despite its economical, cultural, and biological importance, there has not been a large scale sequencing project to date for Camelus dromedarius. With the goal of sequencing complete DNA of the organism, we first established and sequenced camel EST libraries, generating 70,272 reads. Following trimming, chimera check, repeat masking, cluster and assembly, we obtained 23,602 putative gene sequences, out of which over 4,500 potentially novel or fast evolving gene sequences do not carry any homology to other available genomes. Functional annotation of sequences with similarities in nucleotide and protein databases has been obtained using Gene Ontology classification. Comparison to available full length cDNA sequences and Open Reading Frame (ORF) analysis of camel sequences that exhibit homology to known genes show more than 80% of the contigs with an ORF>300 bp and approximately 40% hits extending to the start codons of full length cDNAs suggesting successful characterization of camel genes. Similarity analyses are done separately for different organisms including human, mouse, bovine, and rat. Accompanying web portal, CAGBASE (http://camel.kacst.edu.sa/), hosts a relational database containing annotated EST sequences and analysis tools with possibility to add sequences from public domain. We anticipate our results to provide a home base for genomic studies of camel and other comparative studies enabling a starting point for whole genome sequencing of the organism.

  17. Gene expression profile of human bone marrow stromal cells: high-throughput expressed sequence tag sequencing analysis.

    PubMed

    Jia, Libin; Young, Marian F; Powell, John; Yang, Liming; Ho, Nicola C; Hotchkiss, Robert; Robey, Pamela Gehron; Francomano, Clair A

    2002-01-01

    Human bone marrow stromal cells (HBMSC) are pluripotent cells with the potential to differentiate into osteoblasts, chondrocytes, myelosupportive stroma, and marrow adipocytes. We used high-throughput DNA sequencing analysis to generate 4258 single-pass sequencing reactions (known as expressed sequence tags, or ESTs) obtained from the 5' (97) and 3' (4161) ends of human cDNA clones from a HBMSC cDNA library. Our goal was to obtain tag sequences from the maximum number of possible genes and to deposit them in the publicly accessible database for ESTs (dbEST of the National Center for Biotechnology Information). Comparisons of our EST sequencing data with nonredundant human mRNA and protein databases showed that the ESTs represent 1860 gene clusters. The EST sequencing data analysis showed 60 novel genes found only in this cDNA library after BLAST analysis against 3.0 million ESTs in NCBI's dbEST database. The BLAST search also showed the identified ESTs that have close homology to known genes, which suggests that these may be newly recognized members of known gene families. The gene expression profile of this cell type is revealed by analyzing both the frequency with which a message is encountered and the functional categorization of expressed sequences. Comparing an EST sequence with the human genomic sequence database enables assignment of an EST to a specific chromosomal region (a process called digital gene localization) and often enables immediate partial determination of intron/exon boundaries within the genomic structure. It is expected that high-throughput EST sequencing and data mining analysis will greatly promote our understanding of gene expression in these cells and of growth and development of the skeleton.

  18. Generation and analysis of expressed sequence tags from the ciliate protozoan parasite Ichthyophthirius multifiliis

    PubMed Central

    Abernathy, Jason W; Xu, Peng; Li, Ping; Xu, De-Hai; Kucuktas, Huseyin; Klesius, Phillip; Arias, Covadonga; Liu, Zhanjiang

    2007-01-01

    Background The ciliate protozoan Ichthyophthirius multifiliis (Ich) is an important parasite of freshwater fish that causes 'white spot disease' leading to significant losses. A genomic resource for large-scale studies of this parasite has been lacking. To study gene expression involved in Ich pathogenesis and virulence, our goal was to generate expressed sequence tags (ESTs) for the development of a powerful microarray platform for the analysis of global gene expression in this species. Here, we initiated a project to sequence and analyze over 10,000 ESTs. Results We sequenced 10,368 EST clones using a normalized cDNA library made from pooled samples of the trophont, tomont, and theront life-cycle stages, and generated 9,769 sequences (94.2% success rate). Post-sequencing processing led to 8,432 high quality sequences. Clustering analysis of these ESTs allowed identification of 4,706 unique sequences containing 976 contigs and 3,730 singletons. These unique sequences represent over two million base pairs (~10% of Plasmodium falciparum genome, a phylogenetically related protozoan). BLASTX searches produced 2,518 significant (E-value < 10⁻⁵) hits and further Gene Ontology (GO) analysis annotated 1,008 of these genes. The ESTs were analyzed comparatively against the genomes of the related protozoa Tetrahymena thermophila and P. falciparum, allowing putative identification of additional genes. All the EST sequences were deposited by dbEST in GenBank (GenBank: EG957858–EG966289). Gene discovery and annotations are presented and discussed. Conclusion This set of ESTs represents a significant proportion of the Ich transcriptome, and provides a material basis for the development of microarrays useful for gene expression studies concerning Ich development, pathogenesis, and virulence. PMID:17577414
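
    The E-value cutoff quoted in this record is a common way to call BLAST hits "significant". As a hedged illustration (not the authors' pipeline), the Python sketch below filters BLAST tabular output by E-value. It assumes the default 12-column tabular layout (`-outfmt 6` in BLAST+), in which the 11th column is the E-value; the function names and the sample rows are hypothetical.

```python
from typing import Iterable, List, Set, Tuple


def significant_hits(lines: Iterable[str], max_evalue: float = 1e-5) -> List[Tuple[str, str, float]]:
    """Return (query, subject, evalue) for rows with evalue < max_evalue.

    Assumes tab-separated rows in the default BLAST+ tabular column order:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and comment lines
            continue
        fields = line.split("\t")
        evalue = float(fields[10])
        if evalue < max_evalue:
            hits.append((fields[0], fields[1], evalue))
    return hits


def queries_with_hits(lines: Iterable[str], max_evalue: float = 1e-5) -> Set[str]:
    """Set of query IDs that have at least one hit below the cutoff."""
    return {query for query, _, _ in significant_hits(lines, max_evalue)}


# Hypothetical rows, for illustration only.
sample = [
    "# BLASTX tabular output",
    "est1\tprotA\t85.0\t120\t18\t0\t1\t360\t5\t124\t2e-30\t110",
    "est1\tprotB\t40.0\t50\t30\t0\t1\t150\t5\t54\t0.5\t30",
    "est2\tprotC\t55.0\t60\t27\t0\t1\t180\t9\t68\t1e-5\t45",
    "est3\tprotD\t99.0\t200\t2\t0\t1\t600\t1\t200\t0.0\t400",
]
hits = significant_hits(sample)
annotated = queries_with_hits(sample)
```

    Counting distinct queries rather than rows matters when one EST has several database matches; the strict `<` comparison excludes a hit sitting exactly at the cutoff.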

  19. Analysis of expressed sequence tags from a naked foraminiferan Reticulomyxa filosa.

    PubMed

    Burki, Fabien; Nikolaev, Sergey I; Bolivar, Ignacio; Guiard, Jackie; Pawlowski, Jan

    2006-08-01

    Foraminifers are a major component of modern marine ecosystems and one of the most important oceanic producers of calcium carbonate. They are a key phylogenetic group among amoeboid protists, but our knowledge of their genome is still mostly limited to a few conserved genes. Here, we report the first study of expressed genes by means of expressed sequence tag (EST) from the freshwater naked foraminiferan Reticulomyxa filosa. Cluster analysis of 1630 valid ESTs enabled the identification of 178 groups of related sequences and 871 singlets. Approximately 50% of the putative unique 1059 ESTs could be annotated using Blast searches against the protein database SwissProt + TrEMBL. The EST database described here is the first step towards gene discovery in Foraminifera and should provide the basis for new insights into the genomic and transcriptomic characteristics of these interesting but poorly understood protists.

  20. Generation and analysis of expressed sequence tags from the bone marrow of Chinese Sika deer.

    PubMed

    Yao, Baojin; Zhao, Yu; Zhang, Mei; Li, Juan

    2012-03-01

    Sika deer is one of the best-known and highly valued animals of China. Despite its economic, cultural, and biological importance, there has not been a large-scale sequencing project for Sika deer to date. With the ultimate goal of sequencing the complete genome of this organism, we first established a bone marrow cDNA library for Sika deer and generated a total of 2,025 reads. After processing the sequences, 2,017 high-quality expressed sequence tags (ESTs) were obtained. These ESTs were assembled into 1,157 unigenes, including 238 contigs and 919 singletons. Comparative analyses indicated that 888 (76.75%) of the unigenes had significant matches to sequences in the non-redundant protein database. In addition to highly expressed genes, such as stearoyl-CoA desaturase, cytochrome c oxidase, adipocyte-type fatty acid-binding protein, adiponectin and thymosin beta-4, we also obtained vascular endothelial growth factor-A and heparin-binding growth-associated molecule, both of which are of great importance for angiogenesis research. There were 244 (21.09%) unigenes with no significant match to any sequence in current protein or nucleotide databases, and these sequences may represent genes with unknown function in Sika deer. Open reading frame analysis of the sequences was performed using the getorf program. In addition, the sequences were functionally classified using the gene ontology hierarchy, clusters of orthologous groups of proteins and Kyoto encyclopedia of genes and genomes databases. Analysis of ESTs described in this paper provides an important resource for the transcriptome exploration of Sika deer, and will also facilitate further studies on functional genomics, gene discovery and genome annotation of Sika deer.

  1. Cloning, analysis and functional annotation of expressed sequence tags from the Earthworm Eisenia fetida

    PubMed Central

    Pirooznia, Mehdi; Gong, Ping; Guan, Xin; Inouye, Laura S; Yang, Kuan; Perkins, Edward J; Deng, Youping

    2007-01-01

    Background Eisenia fetida, commonly known as red wiggler or compost worm, belongs to the Lumbricidae family of the Annelida phylum. Little is known about its genome sequence although it has been extensively used as a test organism in terrestrial ecotoxicology. In order to understand its gene expression response to environmental contaminants, we cloned 4032 cDNAs or expressed sequence tags (ESTs) from two E. fetida libraries enriched with genes responsive to ten ordnance related compounds using suppressive subtractive hybridization-PCR. Results A total of 3144 good quality ESTs (GenBank dbEST accession number EH669363–EH672369 and EL515444–EL515580) were obtained from the raw clone sequences after cleaning. Clustering analysis yielded 2231 unique sequences including 448 contigs (from 1361 ESTs) and 1783 singletons. Comparative genomic analysis showed that 743 or 33% of the unique sequences shared high similarity with existing genes in the GenBank nr database. Provisional function annotation assigned 830 Gene Ontology terms to 517 unique sequences based on their homology with the annotated genomes of four model organisms Drosophila melanogaster, Mus musculus, Saccharomyces cerevisiae, and Caenorhabditis elegans. Seven percent of the unique sequences were further mapped to 99 Kyoto Encyclopedia of Genes and Genomes pathways based on their matching Enzyme Commission numbers. All the information is stored and retrievable at a highly performed, web-based and user-friendly relational database called EST model database or ESTMD version 2. Conclusion The ESTMD containing the sequence and annotation information of 4032 E. fetida ESTs is publicly accessible at . PMID:18047730

  2. Analysis of expressed sequence tags from Prunus mume flower and fruit and development of simple sequence repeat markers

    PubMed Central

    2010-01-01

    Background Expressed sequence tags (ESTs) have been a cost-effective tool in molecular biology and represent an abundant, valuable resource for genome annotation, gene expression, and comparative genomics in plants. Results In this study, we constructed a cDNA library of Prunus mume flower and fruit, sequenced 10,123 clones of the library, and obtained 8,656 high-quality EST sequences. The ESTs were assembled into 4,473 unigenes comprising 1,492 contigs and 2,981 singletons, which have been deposited in NCBI (accession IDs: GW868575 - GW873047); 1,294 of these unique ESTs had known or putative functions. Furthermore, we found 1,233 putative simple sequence repeats (SSRs) in the P. mume unigene dataset. We randomly tested 42 pairs of PCR primers flanking potential SSRs, and 14 pairs were identified as true-to-type SSR loci and could amplify polymorphic bands from 20 individual plants of P. mume. We further used the 14 EST-SSR primer pairs to test the transferability on peach and plum. The result showed that nearly 89% of the primer pairs produced target PCR bands in the two species. Marker polymorphism was high in plum (65%) and lower in peach (46%), and the clustering analysis of the three species indicated that these SSR markers were useful in the evaluation of genetic relationships and diversity between and within the Prunus species. Conclusions We have constructed the first cDNA library of P. mume flower and fruit, and our data provide sets of molecular biology resources for P. mume and other Prunus species. These resources will be useful for further study such as genome annotation, new gene discovery, gene functional analysis, molecular breeding, evolution and comparative genomics between Prunus species. PMID:20626882
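
    The abstract does not say how the putative SSRs were detected. As a rough sketch of the general idea only, perfect microsatellites can be located with a back-referencing regular expression. The minimum repeat counts in `MIN_REPEATS`, the function name `find_ssrs` and the test sequence are assumptions made for this illustration, not the authors' criteria or data.

```python
import re

# Hypothetical thresholds: motif length -> minimum number of tandem copies.
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4, 6: 4}


def find_ssrs(seq: str):
    """Return sorted (start, motif, copies) tuples for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for motif_len, min_rep in MIN_REPEATS.items():
        # One motif of motif_len bases followed by at least min_rep - 1 copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are themselves repeats of a shorter unit
            # (e.g. "AA" or "AGAG"), so each locus is reported once.
            if motif in (motif + motif)[1:-1]:
                continue
            hits.append((m.start(), motif, len(m.group(0)) // motif_len))
    return sorted(hits)


# Invented test sequence: (AG)x8 and (GAT)x5 embedded in flanking bases.
example = "TTGC" + "AG" * 8 + "CCTA" + "GAT" * 5 + "C"
ssrs = find_ssrs(example)
print(ssrs)
```

    Primer design, as described in the abstract, would additionally require enough flanking sequence on both sides of each hit; this sketch does not check that.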

  3. Analysis of transcripts from intracellular stages of Eimeria acervulina using expressed sequence tags.

    PubMed

    Miska, K B; Fetterer, R H; Rosenberg, G H

    2008-04-01

    Coccidiosis in chickens is caused by 7 species of Eimeria. Even though coccidiosis is a complex disease that can be caused by any combination of these species, most of the molecular research concerning chicken coccidiosis has been limited to Eimeria tenella. The present study describes the first large-scale analysis of expressed sequence tags (ESTs) generated primarily from second-stage merozoites (and schizonts) of E. acervulina. In total, 1,847 ESTs were sequenced; these represent 1,026 unique sequences. Approximately half of the ESTs encode proteins of unknown function, or hypothetical proteins. Twenty-nine percent of the E. acervulina ESTs share significant sequence identity with sequences in the E. tenella genome. Additionally, EST hits seem to be much different compared with those of E. tenella. One of the differences is the very low number of ESTs that encode putative microneme proteins. This study underlines the potential differences in the molecular aspects of 2 Eimeria species that in the past were thought to be highly similar in nature.

  4. Comparative gene expression in the symbiotic and aposymbiotic Aiptasia pulchella by expressed sequence tag analysis.

    PubMed

    Kuo, Jimmy; Chen, Ming-Chyuan; Lin, Chorng-Horng; Fang, Lee-Shing

    2004-05-21

    Intracellular symbiotic relationships are prevalent between cnidarians, such as corals and sea anemones, and the photosynthetic dinoflagellate symbionts. However, there is little understanding of how genes are expressed when the symbiotic relationship is established. To characterize genes involved in this association, the endosymbiosis between sea anemone, Aiptasia pulchella, and dinoflagellate zooxanthellae, Symbiodinium spp., was employed as a model. Two complementary DNA (cDNA) libraries were constructed from RNA isolated from symbiotic and aposymbiotic A. pulchella. Using single-pass sequencing of cDNA clones, a total of 870 expressed sequence tag (EST) clones were generated from the two libraries: 474 from symbiotic animal and 396 from aposymbiotic animal. The initial ESTs consisted of 143 clusters and 231 singletons. A BLASTX search revealed that 147 unique genes had similarities with protein sequences available from databases; 120 of these clones were categorized according to their putative function. However, many ESTs could not be functionally assigned. The putative roles of some of the identified genes relative to endosymbiosis are discussed. This is the first report of the use of EST analysis to examine the gene expression in symbiotic and aposymbiotic states of the cnidarians. The systematic analysis of ESTs from this study provides a useful database for future investigations of the molecular mechanisms involved in algal-cnidarian symbiosis.

  5. Genomic analysis of expressed sequence tags in American black bear Ursus americanus.

    PubMed

    Zhao, Sen; Shao, Chunxuan; Goropashnaya, Anna V; Stewart, Nathan C; Xu, Yichi; Tøien, Øivind; Barnes, Brian M; Fedorov, Vadim B; Yan, Jun

    2010-03-26

    Species of the bear family (Ursidae) are important organisms for research in molecular evolution, comparative physiology and conservation biology, but relatively little genetic sequence information is available for this group. Here we report the development and analyses of the first large scale Expressed Sequence Tag (EST) resource for the American black bear (Ursus americanus). Comprehensive analyses of molecular functions, alternative splicing, and tissue-specific expression of 38,757 black bear EST sequences were conducted using the dog genome as a reference. We identified 18 genes, involved in functions such as lipid catabolism, cell cycle, and vesicle-mediated transport, that are showing rapid evolution in the bear lineage. Three genes, Phospholamban (PLN), cysteine glycine-rich protein 3 (CSRP3) and Troponin I type 3 (TNNI3), are related to heart contraction, and defects in these genes in humans lead to heart disease. Two genes, biphenyl hydrolase-like (BPHL) and CSRP3, contain positively selected sites in bear. Global analysis of evolution rates of hibernation-related genes in bear showed that they are largely conserved and slowly evolving genes, rather than novel and fast-evolving genes. We provide a genomic resource for an important mammalian organism and our study sheds new light on the possible functions and evolution of bear genes.

  6. Genomic analysis of expressed sequence tags in American black bear Ursus americanus

    PubMed Central

    2010-01-01

    Background Species of the bear family (Ursidae) are important organisms for research in molecular evolution, comparative physiology and conservation biology, but relatively little genetic sequence information is available for this group. Here we report the development and analyses of the first large scale Expressed Sequence Tag (EST) resource for the American black bear (Ursus americanus). Results Comprehensive analyses of molecular functions, alternative splicing, and tissue-specific expression of 38,757 black bear EST sequences were conducted using the dog genome as a reference. We identified 18 genes, involved in functions such as lipid catabolism, cell cycle, and vesicle-mediated transport, that are showing rapid evolution in the bear lineage. Three genes, Phospholamban (PLN), cysteine glycine-rich protein 3 (CSRP3) and Troponin I type 3 (TNNI3), are related to heart contraction, and defects in these genes in humans lead to heart disease. Two genes, biphenyl hydrolase-like (BPHL) and CSRP3, contain positively selected sites in bear. Global analysis of evolution rates of hibernation-related genes in bear showed that they are largely conserved and slowly evolving genes, rather than novel and fast-evolving genes. Conclusion We provide a genomic resource for an important mammalian organism and our study sheds new light on the possible functions and evolution of bear genes. PMID:20338065

  7. Generation and analysis of expressed sequence tags from the medicinal plant Salvia miltiorrhiza.

    PubMed

    Yan, YaPing; Wang, ZheZhi; Tian, Wei; Dong, ZhongMin; Spencer, David F

    2010-02-01

    Salvia miltiorrhiza Bge. is a well-known traditional Chinese herb. Its roots have been formulated and used clinically for the treatment of various diseases. However, little genetic information has so far been available and this fact has become a major obstacle for molecular studies. To address this lack of genetic information, an Expressed Sequence Tag (EST) library from whole plantlets of S. miltiorrhiza was generated. From the 12959 cDNA clones that were randomly selected and subjected to single-pass sequencing from their 5' ends, 10288 ESTs (with sizes ≥ 100 bp) were selected and assembled into 1288 contigs, leaving 2937 singletons, for a total of 4225 unigenes. These were analyzed using BLASTX (against protein databases), RPS-BLAST (against a conserved domain database) as well as the web-based KEGG Automatic Annotation Server for metabolic enzyme assignment. Based on the metabolic enzyme assignment, expression patterns of 14 secondary metabolic enzyme genes in different organs and under different treatments were verified using real-time PCR analysis. Additionally, a total of 122 microsatellites were identified from the ESTs, with 89 having sufficient flanking sequences for primer design. This set of ESTs represents a significant proportion of the S. miltiorrhiza transcriptome, and gives preliminary insights into the gene complement of S. miltiorrhiza. They will prove useful for uncovering secondary metabolic pathways, analyzing cDNA-array based gene expression, genetic manipulation to improve yield of desirable secondary products, and molecular marker identification.

  8. Generation and analysis of the expressed sequence tags from the mycelium of Ganoderma lucidum.

    PubMed

    Huang, Yen-Hua; Wu, Hung-Yi; Wu, Keh-Ming; Liu, Tze-Tze; Liou, Ruey-Fen; Tsai, Shih-Feng; Shiao, Ming-Shi; Ho, Low-Tone; Tzean, Shean-Shong; Yang, Ueng-Cheng

    2013-01-01

    Ganoderma lucidum (G. lucidum) is a medicinal mushroom renowned in East Asia for its potential biological effects. To enable a systematic exploration of the genes associated with the various phenotypes of the fungus, the genome consortium of G. lucidum has carried out an expressed sequence tag (EST) sequencing project. Using a Sanger sequencing based approach, 47,285 ESTs were obtained from in vitro cultures of G. lucidum mycelium of various durations. These ESTs were further clustered and merged into 7,774 non-redundant expressed loci. The features of these expressed contigs were explored in terms of over-representation, alternative splicing, and natural antisense transcripts. Our results provide an invaluable information resource for exploring the G. lucidum transcriptome and its regulation. Many cases of the genes over-represented in fast-growing dikaryotic mycelium are closely related to growth, such as cell wall and bioactive compound synthesis. In addition, the EST-genome alignments containing putative cassette exons and retained introns were manually curated and then used to make inferences about the predominating splice-site recognition mechanism of G. lucidum. Moreover, a number of putative antisense transcripts have been pinpointed, from which we noticed that two cases are likely to reveal hitherto undiscovered biological pathways. To allow users to access the data and the initial analysis of the results of this project, a dedicated web site has been created at http://csb2.ym.edu.tw/est/.

  9. Comprehensive analysis of expressed sequence tags from cultivated and wild radish (Raphanus spp.).

    PubMed

    Shen, Di; Sun, Honghe; Huang, Mingyun; Zheng, Yi; Qiu, Yang; Li, Xixiang; Fei, Zhangjun

    2013-10-21

    Radish (Raphanus sativus L., 2n = 2× = 18) is an economically important vegetable crop worldwide. A large collection of radish expressed sequence tags (ESTs) has been generated but remains largely uncharacterized. In this study, approximately 315,000 ESTs derived from 22 Raphanus cDNA libraries from 18 different genotypes were analyzed, for the purpose of gene and marker discovery and to evaluate large-scale genome duplication and phylogenetic relationships among Raphanus spp. The ESTs were assembled into 85,083 unigenes, of which 90%, 65%, 89% and 89% had homologous sequences in the GenBank nr, SwissProt, TrEMBL and Arabidopsis protein databases, respectively. A total of 66,194 (78%) could be assigned at least one gene ontology (GO) term. Comparative analysis identified 5,595 gene families unique to radish that were significantly enriched with genes related to small molecule metabolism, as well as 12,899 specific to the Brassicaceae that were enriched with genes related to seed oil body biogenesis and responses to phytohormones. The analysis further indicated that the divergence of radish and Brassica rapa occurred approximately 8.9-14.9 million years ago (MYA), following a whole-genome duplication event (12.8-21.4 MYA) in their common ancestor. An additional whole-genome duplication event in radish occurred at 5.1-8.4 MYA, after its divergence from B. rapa. A total of 13,570 simple sequence repeats (SSRs) and 28,758 high-quality single nucleotide polymorphisms (SNPs) were also identified. Using a subset of SNPs, the phylogenetic relationships of eight different accessions of Raphanus were inferred. Comprehensive analysis of radish ESTs provided new insights into radish genome evolution and the phylogenetic relationships of different radish accessions. Moreover, the radish EST sequences and the associated SSR and SNP markers described in this study represent a valuable resource for radish functional genomics studies and breeding.

  10. Expressed sequence tag analysis of guinea pig (Cavia porcellus) eye tissues for NEIBank

    PubMed Central

    Simpanya, Mukoma F.; Wistow, Graeme; Gao, James; David, Larry L.; Giblin, Frank J.

    2008-01-01

    Purpose To characterize gene expression patterns in guinea pig ocular tissues and identify orthologs of human genes from NEIBank expressed sequence tags. Methods RNA was extracted from dissected eye tissues of 2.5-month-old guinea pigs to make three unamplified and unnormalized cDNA libraries in the pCMVSport-6 vector for the lens, retina, and eye minus lens and retina. Over 4,000 clones were sequenced from each library and were analyzed using GRIST for clustering and gene identification. Lens crystallin EST data were validated using two-dimensional electrophoresis (2-DE), matrix assisted laser desorption (MALDI), and electrospray ionization mass spectrometry (ESIMS). Results Combined data from the three libraries generated a total of 6,694 distinctive gene clusters, with each library having between 1,000 and 3,000 clusters. Approximately 60% of the total gene clusters were novel cDNA sequences and had significant homologies to other mammalian sequences in GenBank. Complete cDNA sequences were obtained for many guinea pig lens proteins, including αA/αAinsert-, γN-, and γS-crystallins, lengsin and GRIFIN. The ratio of αA- to αB-crystallin on 2-DE gels was 8:1 in the lens nucleus and 6.5:1 in the cortex. Analysis of ESTs, genome sequence, and proteins (by MALDI) did not reveal any evidence for the presence of γD-, γE-, and γF-crystallin in the guinea pig. Predicted masses of many guinea pig lens crystallins were confirmed by ESIMS analysis. For the retina, orthologs of human phototransduction genes were found, such as Rhodopsin, S-antigen (Sag, Arrestin), and Transducin. The guinea-pig ortholog of NRL, a key rod photoreceptor-specific transcription factor, was also represented in EST data. In the ‘rest-of-eye’ library, the most abundant transcripts included decorin and keratin 12, representative of the cornea. Conclusions Genomic analysis of guinea pig eye tissues provides sequence-verified clones for future studies. Guinea pig orthologs of many human

  11. Primary Analysis of the Expressed Sequence Tags in a Pentastomid Nymph cDNA Library

    PubMed Central

    Yuan, Zhongying; Yin, Jianhai; Zang, Wei; Xu, Yuxin; Lu, Weiyuan; Wang, Yanjuan; Wang, Ying; Cao, Jianping

    2013-01-01

    Background Pentastomiasis is a rare zoonotic disease caused by pentastomids. Despite their worm-like appearance, they are commonly placed into a separate sub-class of the subphylum Crustacea, phylum Arthropoda. However, the systematic classification of the pentastomids and the diagnosis of pentastomiasis remain immature, and genetic information about pentastomid nymphs is almost nonexistent. The objective of this study was to obtain information on pentastomid nymph genes and identify the gene homologues related to host-parasite interactions or stage-specific antigens. Methodology/Principal Findings Total pentastomid nymph RNA was used to construct a cDNA library and 500 colonies were sequenced. The analysis identified 197 unigenes, of which 147 were annotated, and 75 unigenes (53.19%) were mapped to 82 KEGG pathways, including 29 metabolism pathways, 29 genetic information processing pathways, 4 environmental information processing pathways, 7 cell motility pathways and 5 organismal systems pathways. Additionally, two host-parasite interaction-related gene homologues were identified: a putative Kunitz inhibitor and a putative cysteine protease. Conclusion/Significance We have, for the first time, constructed a cDNA library and obtained a number of expressed sequence tags (ESTs) from pentastomid nymphs, which lays the foundation for further study of pentastomids and pentastomiasis. PMID:23437150

  12. Generation and analysis of expressed sequence tags from NaCl-treated Glycine soja

    PubMed Central

    Ji, Wei; Li, Yong; Li, Jie; Dai, Cui-hong; Wang, Xi; Bai, Xi; Cai, Hua; Yang, Liang; Zhu, Yan-ming

    2006-01-01

    Background Salinization causes negative effects on plant productivity and poses an increasingly serious threat to the sustainability of agriculture. Wild soybean (Glycine soja) can survive in highly saline conditions and therefore provides an ideal candidate plant system for salt tolerance gene mining. Results As a first step towards the characterization of genes that contribute to combating salinity stress, we constructed a full-length cDNA library of Glycine soja (50109) leaf treated with 150 mM NaCl, using the SMART technology. Random expressed sequence tag (EST) sequencing of 2,219 clones produced 2,003 cleaned ESTs for gene expression analysis. The average read length of cleaned ESTs was 454 bp, with an average GC content of 40%. These ESTs were assembled using the PHRAP program to generate 375 contigs and 696 singlets. The resulting unigenes were categorized according to the Gene Ontology (GO) hierarchy. The potential roles of gene products associated with stress-related ESTs are discussed. We compared the EST sequences of Glycine soja with those of Glycine max using the blastn algorithm. Most expressed sequences from wild soybean exhibited similarity with soybean. All our EST data are available on the Internet (GenBank accession numbers DT082443-DT084445). Conclusion The Glycine soja ESTs will be used to mine salt tolerance genes, whose full-length cDNAs will be obtained easily from the full-length cDNA library. Comparison of Glycine soja ESTs with those of Glycine max revealed the potential to investigate the wild soybean's expression profile using the soybean's gene chip. This will provide opportunities to understand the genetic mechanisms underlying stress response of plants. PMID:16504061
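
    Summary statistics such as the average read length and GC content reported above are simple to compute from a set of cleaned sequences. The sketch below is a minimal illustration: the function names and the toy sequences are invented here, and it assumes a pooled GC fraction (all bases combined), whereas the abstract does not state whether its 40% figure is pooled or a mean of per-read values.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G/C among the unambiguous A, C, G, T bases of a sequence."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return gc / acgt if acgt else 0.0


def summarize(ests):
    """Return (mean read length in bp, pooled GC fraction) for a list of ESTs."""
    if not ests:
        raise ValueError("need at least one sequence")
    mean_length = sum(len(s) for s in ests) / len(ests)
    pooled_gc = gc_fraction("".join(ests))
    return mean_length, pooled_gc


# Toy sequences, invented for illustration only (not data from the study).
toy_ests = ["ATGC", "AATT", "GGCC"]
mean_length, pooled_gc = summarize(toy_ests)
print(mean_length, pooled_gc)
```

    Ambiguous bases such as N are excluded from the denominator, so poorly called positions do not dilute the estimate.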

  13. Comprehensive analysis of expressed sequence tags from cultivated and wild radish (Raphanus spp.)

    PubMed Central

    2013-01-01

    Background Radish (Raphanus sativus L., 2n = 2× = 18) is an economically important vegetable crop worldwide. A large collection of radish expressed sequence tags (ESTs) has been generated but remains largely uncharacterized. Results In this study, approximately 315,000 ESTs derived from 22 Raphanus cDNA libraries from 18 different genotypes were analyzed, for the purpose of gene and marker discovery and to evaluate large-scale genome duplication and phylogenetic relationships among Raphanus spp. The ESTs were assembled into 85,083 unigenes, of which 90%, 65%, 89% and 89% had homologous sequences in the GenBank nr, SwissProt, TrEMBL and Arabidopsis protein databases, respectively. A total of 66,194 (78%) could be assigned at least one gene ontology (GO) term. Comparative analysis identified 5,595 gene families unique to radish that were significantly enriched with genes related to small molecule metabolism, as well as 12,899 specific to the Brassicaceae that were enriched with genes related to seed oil body biogenesis and responses to phytohormones. The analysis further indicated that the divergence of radish and Brassica rapa occurred approximately 8.9-14.9 million years ago (MYA), following a whole-genome duplication event (12.8-21.4 MYA) in their common ancestor. An additional whole-genome duplication event in radish occurred at 5.1-8.4 MYA, after its divergence from B. rapa. A total of 13,570 simple sequence repeats (SSRs) and 28,758 high-quality single nucleotide polymorphisms (SNPs) were also identified. Using a subset of SNPs, the phylogenetic relationships of eight different accessions of Raphanus were inferred. Conclusion Comprehensive analysis of radish ESTs provided new insights into radish genome evolution and the phylogenetic relationships of different radish accessions. Moreover, the radish EST sequences and the associated SSR and SNP markers described in this study represent a valuable resource for radish functional genomics studies and

  14. Transcriptome analysis of the Amazonian viper Bothrops atrox venom gland using expressed sequence tags (ESTs).

    PubMed

    Neiva, Márcia; Arraes, Fabricio B M; de Souza, Jonso Vieira; Rádis-Baptista, Gandhi; Prieto da Silva, Alvaro R B; Walter, Maria Emilia M T; Brigido, Marcelo de Macedo; Yamane, Tetsuo; López-Lozano, Jorge Luiz; Astolfi-Filho, Spartaco

    2009-03-15

    Bothrops atrox is a highly dangerous pit viper in the Brazilian Amazon region. We produced a global catalogue of gene transcripts to identify the main toxin and other protein families present in the B. atrox venom gland. We prepared a directional cDNA library, from which a set of 610 high quality expressed sequence tags (ESTs) were generated by bioinformatics processing. Our data indicated a predominance of transcripts encoding mainly metalloproteinases (59% of the toxins). The expression pattern of the B. atrox venom was similar to Bothrops insularis, Bothrops jararaca and Bothrops jararacussu in terms of toxin type, although some differences were observed. B. atrox showed a higher amount of the PIII class of metalloproteinases which correlates well with the observed intense hemorrhagic action of its toxin. Also, the PLA2 content was the second highest in this sample compared to the other three Bothrops transcriptomes. To our knowledge, this work is the first transcriptome analysis of an Amazonian rain forest pit viper and it will contribute to the body of knowledge regarding the gene diversity of the venom gland of members of the Bothrops genus. Moreover, our results can be used for future studies with other snake species from the Amazon region to investigate differences in gene patterns or phylogenetic relationships.

  15. Expressed sequence tag analysis in Cycas, the most primitive living seed plant

    PubMed Central

    Brenner, Eric D; Stevenson, Dennis W; McCombie, Richard W; Katari, Manpreet S; Rudd, Stephen A; Mayer, Klaus FX; Palenchar, Peter M; Runko, Suzan J; Twigg, Richard W; Dai, Guangwei; Martienssen, Rob A; Benfey, Phillip N; Coruzzi, Gloria M

    2003-01-01

    Background Cycads are ancient seed plants (living fossils) with origins in the Paleozoic. Cycads are sometimes considered a 'missing link' as they exhibit characteristics intermediate between vascular non-seed plants and the more derived seed plants. Cycads have also been implicated as the source of 'Guam's dementia', possibly due to the production of S(+)-beta-methyl-alpha, beta-diaminopropionic acid (BMAA), which is an agonist of animal glutamate receptors. Results A total of 4,200 expressed sequence tags (ESTs) were created from Cycas rumphii and clustered into 2,458 contigs, of which 1,764 had low-stringency BLAST similarity to other plant genes. Among those cycad contigs with similarity to plant genes, 1,718 cycad 'hits' are to angiosperms, 1,310 match genes in gymnosperms and 734 match lower (non-seed) plants. Forty-six contigs were found that matched only genes in lower plants and gymnosperms. Upon obtaining the complete sequence from the clones of 37/46 contigs, 14 still matched only gymnosperms. Among those cycad contigs common to higher plants, ESTs were discovered that correspond to those involved in development and signaling in present-day flowering plants. We purified a cycad EST for a glutamate receptor (GLR)-like gene, as well as ESTs potentially involved in the synthesis of the GLR agonist BMAA. Conclusions Analysis of cycad ESTs has uncovered conserved and potentially novel genes. Furthermore, the presence of a glutamate receptor agonist, as well as a glutamate receptor-like gene in cycads, supports the hypothesis that such neuroactive plant products are not merely herbivore deterrents but may also serve a role in plant signaling. PMID:14659015

  16. Analysis and functional annotation of expressed sequence tags from the fall armyworm Spodoptera frugiperda.

    PubMed

    Deng, Youping; Dong, Yinghua; Thodima, Venkata; Clem, Rollie J; Passarelli, A Lorena

    2006-10-19

    Little is known about the genome sequences of lepidopteran insects, although this group of insects has been studied extensively in the fields of endocrinology, development, immunity, and pathogen-host interactions. In addition, cell lines derived from Spodoptera frugiperda and other lepidopteran insects are routinely used for baculovirus foreign gene expression. This study reports the results of an expressed sequence tag (EST) sequencing project in cells from the lepidopteran insect S. frugiperda, the fall armyworm. We have constructed an EST database using two cDNA libraries from the S. frugiperda-derived cell line, SF-21. The database consists of 2,367 ESTs which were assembled into 244 contigs and 951 singlets for a total of 1,195 unique sequences. S. frugiperda is an agriculturally important pest insect and genomic information will be instrumental for establishing initial transcriptional profiling and gene function studies, and for obtaining information about genes manipulated during infections by insect pathogens such as baculoviruses.

  17. Myocardial tagging by cardiovascular magnetic resonance: evolution of techniques--pulse sequences, analysis algorithms, and applications.

    PubMed

    Ibrahim, El-Sayed H

    2011-07-28

    Cardiovascular magnetic resonance (CMR) tagging has been established as an essential technique for measuring regional myocardial function. It allows quantification of local intramyocardial motion measures, e.g. strain and strain rate. The invention of CMR tagging came in the late eighties, where the technique allowed for the first time for visualizing transmural myocardial movement without having to implant physical markers. This new idea opened the door for a series of developments and improvements that continue up to the present time. Different tagging techniques are currently available that are more extensive, improved, and sophisticated than they were twenty years ago. Each of these techniques has different versions for improved resolution, signal-to-noise ratio (SNR), scan time, anatomical coverage, three-dimensional capability, and image quality. The tagging techniques covered in this article can be broadly divided into two main categories: 1) Basic techniques, which include magnetization saturation, spatial modulation of magnetization (SPAMM), delay alternating with nutations for tailored excitation (DANTE), and complementary SPAMM (CSPAMM); and 2) Advanced techniques, which include harmonic phase (HARP), displacement encoding with stimulated echoes (DENSE), and strain encoding (SENC). Although most of these techniques were developed by separate groups and evolved from different backgrounds, they are in fact closely related to each other, and they can be interpreted from more than one perspective. Some of these techniques even followed parallel paths of developments, as illustrated in the article. As each technique has its own advantages, some efforts have been made to combine different techniques together for improved image quality or composite information acquisition. In this review, different developments in pulse sequences and related image processing techniques are described along with the necessities that led to their invention, which makes this

  18. Myocardial tagging by Cardiovascular Magnetic Resonance: evolution of techniques--pulse sequences, analysis algorithms, and applications

    PubMed Central

    2011-01-01

    Cardiovascular magnetic resonance (CMR) tagging has been established as an essential technique for measuring regional myocardial function. It allows quantification of local intramyocardial motion measures, e.g. strain and strain rate. The invention of CMR tagging came in the late eighties, where the technique allowed for the first time for visualizing transmural myocardial movement without having to implant physical markers. This new idea opened the door for a series of developments and improvements that continue up to the present time. Different tagging techniques are currently available that are more extensive, improved, and sophisticated than they were twenty years ago. Each of these techniques has different versions for improved resolution, signal-to-noise ratio (SNR), scan time, anatomical coverage, three-dimensional capability, and image quality. The tagging techniques covered in this article can be broadly divided into two main categories: 1) Basic techniques, which include magnetization saturation, spatial modulation of magnetization (SPAMM), delay alternating with nutations for tailored excitation (DANTE), and complementary SPAMM (CSPAMM); and 2) Advanced techniques, which include harmonic phase (HARP), displacement encoding with stimulated echoes (DENSE), and strain encoding (SENC). Although most of these techniques were developed by separate groups and evolved from different backgrounds, they are in fact closely related to each other, and they can be interpreted from more than one perspective. Some of these techniques even followed parallel paths of developments, as illustrated in the article. As each technique has its own advantages, some efforts have been made to combine different techniques together for improved image quality or composite information acquisition. In this review, different developments in pulse sequences and related image processing techniques are described along with the necessities that led to their invention, which makes this

  19. Not all sequence tags are created equal: designing and validating sequence identification tags robust to indels.

    PubMed

    Faircloth, Brant C; Glenn, Travis C

    2012-01-01

    Ligating adapters with unique synthetic oligonucleotide sequences (sequence tags) onto individual DNA samples before massively parallel sequencing is a popular and efficient way to obtain sequence data from many individual samples. Tag sequences should be numerous and sufficiently different to ensure sequencing, replication, and oligonucleotide synthesis errors do not cause tags to be unrecoverable or confused. However, many design approaches only protect against substitution errors during sequencing and extant tag sets contain too few tag sequences. We developed an open-source software package to validate sequence tags for conformance to two distance metrics and design sequence tags robust to indel and substitution errors. We use this software package to evaluate several commercial and non-commercial sequence tag sets, design several large sets (max(count) = 7,198) of edit metric sequence tags having different lengths and degrees of error correction, and integrate a subset of these edit metric tags to polymerase chain reaction (PCR) primers and sequencing adapters. We validate a subset of these edit metric tagged PCR primers and sequencing adapters by sequencing on several platforms and subsequent comparison to commercially available alternatives. We find that several commonly used sets of sequence tags or design methodologies used to produce sequence tags do not meet the minimum expectations of their underlying distance metric, and we find that PCR primers and sequencing adapters incorporating edit metric sequence tags designed by our software package perform as well as their commercial counterparts. We suggest that researchers evaluate sequence tags prior to use or evaluate tags that they have been using. The sequence tag sets we design improve on extant sets because they are large, valid across the set, and robust to the suite of substitution, insertion, and deletion errors affecting massively parallel sequencing workflows on all currently used platforms.
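
    The difference between protecting against substitutions only and protecting against indels as well can be shown with two distance functions. The sketch below is not the authors' software package. It is a minimal illustration that assumes Hamming distance as the substitution-only metric and Levenshtein (edit) distance as the indel-aware one; the function names and the three toy tags are invented here.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length tags."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length tags")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a: str, b: str) -> int:
    """Minimum number of substitutions, insertions and deletions from a to b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # match or substitute
        prev = curr
    return prev[-1]


def min_pairwise_distance(tags, metric) -> int:
    """Smallest distance between any two tags in the set under metric."""
    return min(metric(a, b) for a, b in combinations(tags, 2))


# Toy tag set, invented for illustration only.
tags = ["ACGT", "CGTA", "TTTT"]
min_hamming = min_pairwise_distance(tags, hamming)
min_edit = min_pairwise_distance(tags, levenshtein)
print(min_hamming, min_edit)
```

    Here ACGT and CGTA differ at every position, yet one deletion plus one insertion turns one into the other, so a set that looks well separated under the substitution-only metric can be weaker once indels are counted.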

  20. Desiccation survival in an Antarctic nematode: molecular analysis using expressed sequenced tags

    PubMed Central

    Adhikari, Bishwo N; Wall, Diana H; Adams, Byron J

    2009-01-01

    Background Nematodes are the dominant soil animals in Antarctic Dry Valleys and are capable of surviving desiccation and freezing in an anhydrobiotic state. Genes induced by desiccation stress have been successfully enumerated in nematodes; however we have little knowledge of gene regulation by Antarctic nematodes which can survive multiple environmental stresses. To address this problem we investigated the genetic responses of a nematode species, Plectus murrayi, that is capable of tolerating Antarctic environmental extremes, in particular desiccation and freezing. In this study, we provide the first insight into the desiccation induced transcriptome of an Antarctic nematode through cDNA library construction and suppressive subtractive hybridization. Results We obtained 2,486 expressed sequence tags (ESTs) from 2,586 clones derived from the cDNA library of desiccated P. murrayi. The 2,486 ESTs formed 1,387 putative unique transcripts of which 523 (38%) had matches in the model-nematode Caenorhabditis elegans, 107 (7%) in nematodes other than C. elegans, 153 (11%) in non-nematode organisms and 605 (44%) had no significant match to any sequences in the current databases. The 1,387 unique transcripts were functionally classified by using Gene Ontology (GO) hierarchy and the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. The results indicate that the transcriptome contains a group of transcripts from diverse functional areas. The subtractive library of desiccated nematodes showed 80 transcripts differentially expressed during desiccation stress, of which 28% were metabolism related, 19% were involved in environmental information processing, 28% involved in genetic information processing and 21% were novel transcripts. Expression profiling of 14 selected genes by quantitative Real-time PCR showed 9 genes significantly up-regulated, 3 down-regulated and 2 continuously expressed in response to desiccation. Conclusion The establishment of a desiccation EST

  1. Analysis and functional annotation of expressed sequence tags from the fall armyworm Spodoptera frugiperda

    PubMed Central

    Deng, Youping; Dong, Yinghua; Thodima, Venkata; Clem, Rollie J; Passarelli, A Lorena

    2006-01-01

    Background Little is known about the genome sequences of lepidopteran insects, although this group of insects has been studied extensively in the fields of endocrinology, development, immunity, and pathogen-host interactions. In addition, cell lines derived from Spodoptera frugiperda and other lepidopteran insects are routinely used for baculovirus foreign gene expression. This study reports the results of an expressed sequence tag (EST) sequencing project in cells from the lepidopteran insect S. frugiperda, the fall armyworm. Results We have constructed an EST database using two cDNA libraries from the S. frugiperda-derived cell line, SF-21. The database consists of 2,367 ESTs which were assembled into 244 contigs and 951 singlets for a total of 1,195 unique sequences. Conclusion S. frugiperda is an agriculturally important pest insect and genomic information will be instrumental for establishing initial transcriptional profiling and gene function studies, and for obtaining information about genes manipulated during infections by insect pathogens such as baculoviruses. PMID:17052344

  2. Generation and analysis of expressed sequence tags in the extreme large genomes Lilium and Tulipa

    PubMed Central

    2012-01-01

    Background Bulbous flowers such as lily and tulip (Liliaceae family) are monocot perennial herbs that are economically very important ornamental plants worldwide. However, hardly any genetic studies have been performed and genomic resources are lacking. To build genomic resources and develop tools to speed up the breeding in both crops, next generation sequencing was implemented. We sequenced and assembled transcriptomes of four lily and five tulip genotypes using 454 pyro-sequencing technology. Results We developed the first set of 81,791 contigs with an average length of 514 bp for tulip, and enriched the very limited number of 3,329 available ESTs (Expressed Sequence Tags) for lily with 52,172 contigs with an average length of 555 bp. The contigs together with singletons covered on average 37% of lily and 39% of tulip estimated transcriptome. Mining lily and tulip sequence data for SSRs (Simple Sequence Repeats) showed that di-nucleotide repeats were twice as abundant in UTRs (UnTranslated Regions) as in coding regions, while tri-nucleotide repeats were equally spread over coding and UTR regions. Two sets of single nucleotide polymorphism (SNP) markers suitable for high throughput genotyping were developed. In the first set, no SNPs flanking the target SNP (50 bp on either side) were allowed. In the second set, one SNP in the flanking regions was allowed, which resulted in a 2 to 3 fold increase in SNP marker numbers compared with the first set. Orthologous groups between the two flower bulbs: lily and tulip (12,017 groups) and among the three monocot species: lily, tulip, and rice (6,900 groups) were determined using OrthoMCL. Orthologous groups were screened for common SNP markers and EST-SSRs to study synteny between lily and tulip, which resulted in 113 common SNP markers and 292 common EST-SSR. Lily and tulip contigs generated were annotated and described according to Gene Ontology terminology. Conclusions Two transcriptome sets

  3. Generation and analysis of expressed sequence tags in the extreme large genomes Lilium and Tulipa.

    PubMed

    Shahin, Arwa; van Kaauwen, Martijn; Esselink, Danny; Bargsten, Joachim W; van Tuyl, Jaap M; Visser, Richard G F; Arens, Paul

    2012-11-20

    Bulbous flowers such as lily and tulip (Liliaceae family) are monocot perennial herbs that are economically very important ornamental plants worldwide. However, hardly any genetic studies have been performed and genomic resources are lacking. To build genomic resources and develop tools to speed up the breeding in both crops, next generation sequencing was implemented. We sequenced and assembled transcriptomes of four lily and five tulip genotypes using 454 pyro-sequencing technology. We developed the first set of 81,791 contigs with an average length of 514 bp for tulip, and enriched the very limited number of 3,329 available ESTs (Expressed Sequence Tags) for lily with 52,172 contigs with an average length of 555 bp. The contigs together with singletons covered on average 37% of lily and 39% of tulip estimated transcriptome. Mining lily and tulip sequence data for SSRs (Simple Sequence Repeats) showed that di-nucleotide repeats were twice as abundant in UTRs (UnTranslated Regions) as in coding regions, while tri-nucleotide repeats were equally spread over coding and UTR regions. Two sets of single nucleotide polymorphism (SNP) markers suitable for high throughput genotyping were developed. In the first set, no SNPs flanking the target SNP (50 bp on either side) were allowed. In the second set, one SNP in the flanking regions was allowed, which resulted in a 2 to 3 fold increase in SNP marker numbers compared with the first set. Orthologous groups between the two flower bulbs: lily and tulip (12,017 groups) and among the three monocot species: lily, tulip, and rice (6,900 groups) were determined using OrthoMCL. Orthologous groups were screened for common SNP markers and EST-SSRs to study synteny between lily and tulip, which resulted in 113 common SNP markers and 292 common EST-SSR. Lily and tulip contigs generated were annotated and described according to Gene Ontology terminology. Two transcriptome sets were built that are valuable

  4. Analysis of expressed sequence tags of the cyclically parthenogenetic rotifer Brachionus plicatilis.

    PubMed

    Suga, Koushirou; Welch, David Mark; Tanaka, Yukari; Sakakura, Yoshitaka; Hagiwara, Atsushi

    2007-08-01

    Rotifers are among the most common non-arthropod animals and are the most experimentally tractable members of the basal assemblage of metazoan phyla known as Gnathifera. The monogonont rotifer Brachionus plicatilis is a developing model system for ecotoxicology, aquatic ecology, cryptic speciation, and the evolution of sex, and is an important food source for finfish aquaculture. However, basic knowledge of the genome and transcriptome of any rotifer species has been lacking. We generated and partially sequenced a cDNA library from B. plicatilis and constructed a database of over 2300 expressed sequence tags corresponding to more than 450 transcripts. About 20% of the transcripts had no significant similarity to database sequences by BLAST; most of these contained open reading frames of significant length but few had recognized Pfam motifs. Sixteen transcripts accounted for 25% of the ESTs; four of these had no significant similarity to BLAST or Pfam databases. Putative up- and downstream untranslated regions are relatively short and AT rich. In contrast to bdelloid rotifers, there was no evidence of a conserved trans-spliced leader sequence among the transcripts and most genes were single-copy. Despite the small size of this EST project it revealed several important features of the rotifer transcriptome and of individual monogonont genes. Because there is little genomic data for Gnathifera, the transcripts we found with no known function may represent genes that are species-, class-, phylum- or even superphylum-specific; the fact that some are among the most highly expressed indicates their importance. The absence of trans-spliced leader exons in this monogonont species contrasts with their abundance in bdelloid rotifers and indicates that the presence of this phenomenon can vary at the subphylum level. Our EST database provides a relatively large quantity of transcript-level data for B. plicatilis, and more generally of rotifers and other gnathiferan phyla, and

  5. Analysis of Expressed Sequence Tags of the Cyclically Parthenogenetic Rotifer Brachionus plicatilis

    PubMed Central

    Suga, Koushirou; Mark Welch, David; Tanaka, Yukari; Sakakura, Yoshitaka; Hagiwara, Atsushi

    2007-01-01

    Background Rotifers are among the most common non-arthropod animals and are the most experimentally tractable members of the basal assemblage of metazoan phyla known as Gnathifera. The monogonont rotifer Brachionus plicatilis is a developing model system for ecotoxicology, aquatic ecology, cryptic speciation, and the evolution of sex, and is an important food source for finfish aquaculture. However, basic knowledge of the genome and transcriptome of any rotifer species has been lacking. Methodology/Principal Findings We generated and partially sequenced a cDNA library from B. plicatilis and constructed a database of over 2300 expressed sequence tags corresponding to more than 450 transcripts. About 20% of the transcripts had no significant similarity to database sequences by BLAST; most of these contained open reading frames of significant length but few had recognized Pfam motifs. Sixteen transcripts accounted for 25% of the ESTs; four of these had no significant similarity to BLAST or Pfam databases. Putative up- and downstream untranslated regions are relatively short and AT rich. In contrast to bdelloid rotifers, there was no evidence of a conserved trans-spliced leader sequence among the transcripts and most genes were single-copy. Conclusions/Significance Despite the small size of this EST project it revealed several important features of the rotifer transcriptome and of individual monogonont genes. Because there is little genomic data for Gnathifera, the transcripts we found with no known function may represent genes that are species-, class-, phylum- or even superphylum-specific; the fact that some are among the most highly expressed indicates their importance. The absence of trans-spliced leader exons in this monogonont species contrasts with their abundance in bdelloid rotifers and indicates that the presence of this phenomenon can vary at the subphylum level. Our EST database provides a relatively large quantity of transcript-level data for B. plicatilis

  6. Functional analysis and comparative genomics of expressed sequence tags from the lycophyte Selaginella moellendorffii

    PubMed Central

    Weng, Jing-Ke; Tanurdzic, Milos; Chapple, Clint

    2005-01-01

    Background The lycophyte Selaginella moellendorffii is a member of one of the oldest lineages of vascular plants on Earth. Fossil records show that the lycophyte clade arose 400 million years ago, 150–200 million years earlier than angiosperms, a group of plants that includes the well-studied flowering plant Arabidopsis thaliana. S. moellendorffii has a genome size of approximately 100 Mbp, as small or smaller than that of A. thaliana. S. moellendorffii has the potential to provide significant comparative information to better understand the evolution of vascular plants. Results We sequenced 2181 Expressed Sequence Tags (ESTs) from a S. moellendorffii cDNA library. One thousand three hundred and one non-redundant sequences were assembled, containing 291 contigs and 1010 singletons. Approximately 75% of the ESTs matched proteins in the non-redundant protein database. Among 1301 clusters, 343 were categorized according to Gene Ontology (GO) hierarchy and were compared to the GO mapping of A. thaliana tentative consensus sequences. We compared S. moellendorffii ESTs to the A. thaliana and Physcomitrella patens EST databases, using the tBLASTX algorithm. Approximately 60% of the ESTs exhibited similarity with both A. thaliana and P. patens ESTs; whereas, 13% and 1% of the ESTs had exclusive similarity with A. thaliana and P. patens ESTs, respectively. A substantial proportion of the ESTs (26%) had no match with A. thaliana or P. patens ESTs. Conclusion We discovered 1301 putative unigenes in S. moellendorffii. These results give an initial insight into its transcriptome that will aid in the study of the S. moellendorffii genome in the near future. PMID:15938755

  7. Immune gene discovery by expressed sequence tag (EST) analysis of hemocytes in the ridgetail white prawn Exopalaemon carinicauda

    PubMed Central

    Duan, Yafei; Liu, Ping; Li, Jitao; Li, Jian; Chen, Ping

    2013-01-01

    The ridgetail white prawn Exopalaemon carinicauda is one of the most important commercial species in eastern China. However, little information of immune genes in E. carinicauda has been reported. To identify distinctive genes associated with immunity, an expressed sequence tag (EST) library was constructed from hemocytes of E. carinicauda. A total of 3411 clones were sequenced, yielding 2853 ESTs and the average sequence length is 436 bp. The cluster and assembly analysis yielded 1053 unique sequences including 329 contigs and 724 singletons. Blast analysis identified 593 (56.3%) of the unique sequences as orthologs of genes from other organisms (E-value < 1e-5). Based on the COG and Gene Ontology (GO), 593 unique sequences were classified. Through comparison with previous studies, 153 genes assembled from 367 ESTs have been identified as possibly involved in defense or immune functions. These genes are categorized into seven categories according to their putative functions in shrimp immune system: antimicrobial peptides, prophenoloxidase activating system, antioxidant defense systems, chaperone proteins, clottable proteins, pattern recognition receptors and other immune-related genes. According to EST abundance, the major immune-related genes were thioredoxin (141, 4.94% of all ESTs) and calmodulin (14, 0.49% of all ESTs). The EST sequences of E. carinicauda hemocytes provide important information of the immune system and lay the groundwork for development of molecular markers related to disease resistance in prawn species. PMID:23092732

  8. MT-Toolbox: improved amplicon sequencing using molecule tags.

    PubMed

    Yourstone, Scott M; Lundberg, Derek S; Dangl, Jeffery L; Jones, Corbin D

    2014-08-22

    Short oligonucleotides can be used as markers to tag and track DNA sequences. For example, barcoding techniques (i.e. Multiplex Identifiers or Indexing) use short oligonucleotides to distinguish between reads from different DNA samples pooled for high-throughput sequencing. A similar technique called molecule tagging uses the same principles but is applied to individual DNA template molecules. Each template molecule is tagged with a unique oligonucleotide prior to polymerase chain reaction. The resulting amplicon sequences can be traced back to their original templates by their oligonucleotide tag. Consensus building from sequences sharing the same tag enables inference of original template molecules, thereby reducing effects of sequencing error and polymerase chain reaction bias. Several independent groups have developed similar protocols for molecule tagging; however, user-friendly software for building consensus sequences from molecule tagged reads is not readily available or is highly specific for a particular protocol. MT-Toolbox recognizes oligonucleotide tags in amplicons and infers the correct template sequence. On a set of molecule tagged test reads, MT-Toolbox generates sequences having on average 0.00047 errors per base. MT-Toolbox includes a graphical user interface, command line interface, and options for speed and accuracy maximization. It can be run in serial on a standard personal computer or in parallel on a Load Sharing Facility based cluster system. An optional plugin provides features for common 16S metagenome profiling analysis such as chimera filtering, building operational taxonomic units, contaminant removal, and taxonomy assignments. MT-Toolbox provides an accessible, user-friendly environment for analysis of molecule tagged reads, thereby reducing technical errors and polymerase chain reaction bias. These improvements reduce noise and allow for greater precision in single amplicon sequencing experiments.
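
    The consensus-building idea in this record can be sketched in a few lines. This is a deliberately simplified illustration, not MT-Toolbox's algorithm: it assumes the tag is a fixed-length prefix of each read, skips alignment by keeping only reads of the most common length, and takes a per-position majority vote. The function name, parameters, and reads are hypothetical.

```python
from collections import Counter, defaultdict


def consensus_by_tag(reads, tag_length, min_reads=2):
    """Group reads by their leading tag and majority-vote each position.

    Assumes (for illustration only) that the molecule tag occupies the first
    `tag_length` bases of every read and that no alignment is needed.
    """
    groups = defaultdict(list)
    for read in reads:
        groups[read[:tag_length]].append(read[tag_length:])

    consensus = {}
    for tag, templates in groups.items():
        if len(templates) < min_reads:
            continue  # too few reads to out-vote an error
        # Keep only reads of the most common length so columns line up.
        length = Counter(len(t) for t in templates).most_common(1)[0][0]
        same_length = [t for t in templates if len(t) == length]
        consensus[tag] = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*same_length)
        )
    return consensus


# Hypothetical reads: three copies of one tagged template (one carrying an
# error at the fourth base) and a singleton from another template.
reads = [
    "AAAC" + "ACGTACGT",
    "AAAC" + "ACGTACGT",
    "AAAC" + "ACGAACGT",
    "GGGT" + "TTTTTTTT",
]
print(consensus_by_tag(reads, tag_length=4))
```

    With these reads the erroneous base is out-voted and the singleton group is dropped, which is the intuition behind reducing sequencing error by sharing a tag.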

  9. Generation and analysis of expressed sequence tags (ESTs) for marker development in yam (Dioscorea alata L.)

    PubMed Central

    2011-01-01

    Background Anthracnose (Colletotrichum gloeosporioides) is a major limiting factor in the production of yam (Dioscorea spp.) worldwide. Availability of high quality sequence information is necessary for designing molecular markers associated with resistance. However, very limited sequence information pertaining to yam is available at public genome databases. Therefore, this collaborative project was developed for genetic improvement and germplasm characterization of yams using molecular markers. The current investigation is focused on studying gene expression, by large scale generation of ESTs, from one susceptible (TDa 95-0310) and two resistant yam genotypes (TDa 87-01091, TDa 95-0328) challenged with the fungus. Total RNA was isolated from young leaves of resistant and susceptible genotypes and cDNA libraries were sequenced using Roche 454 technology. Results A total of 44,757 EST sequences were generated from the cDNA libraries of the resistant and susceptible genotypes. Greater than 56% of ESTs were annotated using MapMan Mercator tool and Blast2GO search tools. Gene annotations were used to characterize the transcriptome in yam and also perform a differential gene expression analysis between the resistant and susceptible EST datasets. Mining for SSRs in the ESTs revealed 1702 unique sequences containing SSRs and 1705 SSR markers were designed using those sequences. Conclusion We have developed a comprehensive annotated transcriptome data set in yam to enrich the EST information in public databases. cDNA libraries were constructed from anthracnose fungus challenged leaf tissues for transcriptome characterization, and differential gene expression analysis. Thus, it helped in identifying unique transcripts in each library for disease resistance. These EST resources provide the basis for future microarray development, marker validation, genetic linkage mapping and QTL analysis in Dioscorea species. PMID:21303556

  10. Generation and analysis of expressed sequence tags (ESTs) for marker development in yam (Dioscorea alata L.).

    PubMed

    Narina, Satya S; Buyyarapu, Ramesh; Kottapalli, Kameswara Rao; Sartie, Alieu M; Ali, Mohamed I; Robert, Asiedu; Hodeba, Mignouna J D; Sayre, Brian L; Scheffler, Brian E

    2011-02-09

    Anthracnose (Colletotrichum gloeosporioides) is a major limiting factor in the production of yam (Dioscorea spp.) worldwide. Availability of high quality sequence information is necessary for designing molecular markers associated with resistance. However, very limited sequence information pertaining to yam is available at public genome databases. Therefore, this collaborative project was developed for genetic improvement and germplasm characterization of yams using molecular markers. The current investigation is focused on studying gene expression, by large scale generation of ESTs, from one susceptible (TDa 95-0310) and two resistant yam genotypes (TDa 87-01091, TDa 95-0328) challenged with the fungus. Total RNA was isolated from young leaves of resistant and susceptible genotypes and cDNA libraries were sequenced using Roche 454 technology. A total of 44,757 EST sequences were generated from the cDNA libraries of the resistant and susceptible genotypes. Greater than 56% of ESTs were annotated using MapMan Mercator tool and Blast2GO search tools. Gene annotations were used to characterize the transcriptome in yam and also perform a differential gene expression analysis between the resistant and susceptible EST datasets. Mining for SSRs in the ESTs revealed 1702 unique sequences containing SSRs and 1705 SSR markers were designed using those sequences. We have developed a comprehensive annotated transcriptome data set in yam to enrich the EST information in public databases. cDNA libraries were constructed from anthracnose fungus challenged leaf tissues for transcriptome characterization, and differential gene expression analysis. Thus, it helped in identifying unique transcripts in each library for disease resistance. These EST resources provide the basis for future microarray development, marker validation, genetic linkage mapping and QTL analysis in Dioscorea species.
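
    Mining ESTs for SSRs, as mentioned in this record, usually means scanning each sequence for short motifs repeated in tandem. The sketch below finds perfect di- and tri-nucleotide repeats with a regular expression. The record does not state which tool or thresholds were used, so the minimum repeat counts, the function name, and the example sequence are assumptions.

```python
import re


def find_ssrs(sequence, min_repeats=None):
    """Return (start, motif, repeat_count) for perfect tandem repeats.

    `min_repeats` maps motif length to the minimum number of tandem copies;
    the defaults (6 for di-, 5 for tri-nucleotides) are illustrative only.
    """
    if min_repeats is None:
        min_repeats = {2: 6, 3: 5}
    seq = sequence.upper()
    hits = []
    for motif_len, min_rep in sorted(min_repeats.items()):
        # One motif copy captured, then at least (min_rep - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if len(set(motif)) == 1:
                continue  # skip homopolymer runs such as AAAAAA
            hits.append((match.start(), motif, len(match.group(0)) // motif_len))
    return hits


# Hypothetical EST fragment containing (AT)7 and (CAG)5.
est = "GGC" + "AT" * 7 + "CCGTT" + "CAG" * 5 + "T"
for start, motif, count in find_ssrs(est):
    print(start, motif, count)
```

    Primer design around each hit, as done for the SSR markers in the study, would be a separate step and is not attempted here.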

  11. Analysis of expressed sequence tags (ESTs) from cocoa (Theobroma cacao L) upon infection with Phytophthora megakarya.

    PubMed

    Naganeeswaran, Sudalaimuthu Asari; Subbian, Elain Apshara; Ramaswamy, Manimekalai

    2012-01-01

    Phytophthora megakarya, the causative agent of cacao black pod disease in West African countries, causes an extensive loss of yield. In this study we analyzed 4 libraries of ESTs derived from Phytophthora megakarya infected cocoa leaf and pod tissues. A total of 6379 redundant sequences were retrieved from the ESTtik database and EST processing was performed using the seqclean tool. Clustering and assembling using CAP3 generated 3333 non-redundant sequences (907 contigs and 2426 singletons). The primary sequence analysis of the 3333 non-redundant sequences showed that the GC percentage was 42.7 and the sequence length ranged from 101 to 2576 nucleotides. Further, functional analyses (Blast, Interproscan, Gene Ontology and KEGG search) were executed and 1230 orthologous genes were annotated. A total of 272 enzymes corresponding to 114 metabolic pathways were identified. Functional annotation revealed that most of the sequences are related to molecular function, stress response and biological processes. The annotated enzymes are aldehyde dehydrogenase (E.C: 1.2.1.3), catalase (E.C: 1.11.1.6), acetyl-CoA C-acetyltransferase (E.C: 2.3.1.9), threonine ammonia-lyase (E.C: 4.3.1.19), acetolactate synthase (E.C: 2.2.1.6), and O-methyltransferase (E.C: 2.1.1.68), which play an important role in amino acid biosynthesis and phenyl propanoid biosynthesis. All this information was stored in a MySQL database management system to be used in future for reconstruction of the biotic stress response pathway in cocoa.

  12. Expressed sequence tag analysis of Antarctic hairgrass Deschampsia antarctica from King George Island, Antarctica.

    PubMed

    Lee, Hyoungseok; Cho, Hyun Hee; Kim, Il-Chan; Yim, Joung Han; Lee, Hong Kum; Lee, Yoo Kyung

    2008-04-30

    Deschampsia antarctica is the only monocot that thrives in the tough conditions of the Antarctic region. It is an invaluable resource for the identification of genes associated with tolerance to various environmental pressures. In order to identify genes that are differentially regulated between greenhouse-grown and Antarctic field-grown plants, we initiated a detailed gene expression analysis. Antarctic plants were collected and greenhouse plants served as controls. Two different cDNA libraries were constructed with these plants. A total of 2,112 cDNA clones was sequenced and grouped into 1,199 unigene clusters consisting of 243 consensus and 956 singleton sequences. Using similarity searches against several public databases, we constructed a functional classification of the ESTs into categories such as genes related to responses to stimuli, as well as photosynthesis and metabolism. Real-time PCR analysis of various stress responsive genes revealed different patterns of regulation in the different environments, suggesting that these genes are involved in responses to specific environmental factors.

  13. Analysis of expressed sequence tags from the blue-green sharpshooter, Graphocephala atropunctata

    USDA-ARS's Scientific Manuscript database

    We used a metagenomic approach and identified and sequenced 6,836 genetic sequences isolated from adult blue-green sharpshooters, BGSS, Graphocephala atropunctata. These results provided over 70% of the mitochondrial genome sequence which is being completed. The BGSS is endemic to southern Californ...

  14. Expressed Sequence Tags Analysis and Design of Simple Sequence Repeats Markers from a Full-Length cDNA Library in Perilla frutescens (L.)

    PubMed Central

    Seong, Eun Soo; Yoo, Ji Hye; Choi, Jae Hoo; Kim, Chang Heum; Jeon, Mi Ran; Kang, Byeong Ju; Lee, Jae Geun; Choi, Seon Kang; Ghimire, Bimal Kumar; Yu, Chang Yeon

    2015-01-01

    Perilla frutescens is valuable as a medicinal plant as well as a natural medicine and functional food. However, comparative genomics analyses of P. frutescens are limited due to a lack of gene annotations and characterization. A full-length cDNA library from P. frutescens leaves was constructed to identify functional gene clusters and probable EST-SSR markers via analysis of 1,056 expressed sequence tags. Unigene assembly was performed using basic local alignment search tool (BLAST) homology searches and annotated Gene Ontology (GO). A total of 18 simple sequence repeats (SSRs) were designed as primer pairs. This study is the first to report comparative genomics and EST-SSR markers from P. frutescens, which will help gene discovery and provide an important source for functional genomics and molecular genetic research in this interesting medicinal plant. PMID:26664999

  15. Multiple platform assessment of the EGF dependent transcriptome by microarray and deep tag sequencing analysis

    PubMed Central

    2011-01-01

    Background Epidermal Growth Factor (EGF) is a key regulatory growth factor activating many processes relevant to normal development and disease, affecting cell proliferation and survival. Here we use a combined approach to study the EGF dependent transcriptome of HeLa cells by using multiple long oligonucleotide based microarray platforms (from Agilent, Operon, and Illumina) in combination with digital gene expression profiling (DGE) with the Illumina Genome Analyzer. Results By applying a procedure for cross-platform data meta-analysis based on RankProd and GlobalAncova tests, we establish a well validated gene set with transcript levels altered after EGF treatment. We use this robust gene list to build higher order networks of gene interaction by interconnecting associated networks, supporting and extending the important role of the EGF signaling pathway in cancer. In addition, we find an entirely new set of genes previously unrelated to the currently accepted EGF associated cellular functions. Conclusions We propose that the use of global genomic cross-validation derived from high content technologies (microarrays or deep sequencing) can be used to generate more reliable datasets. This approach should help to improve the confidence of downstream in silico functional inference analyses based on high content data. PMID:21699700

  16. The venom composition of the parasitic wasp Chelonus inanitus resolved by combined expressed sequence tags analysis and proteomic approach

    PubMed Central

    2010-01-01

    Background Parasitic wasps constitute one of the largest groups of venomous animals. Although some physiological effects of their venoms are well documented, relatively little is known at the molecular level on the protein composition of these secretions. To identify the majority of the venom proteins of the endoparasitoid wasp Chelonus inanitus (Hymenoptera: Braconidae), we have randomly sequenced 2111 expressed sequence tags (ESTs) from a cDNA library of venom gland. In parallel, proteins from pure venom were separated by gel electrophoresis and individually submitted to a nano-LC-MS/MS analysis allowing comparison of peptides and ESTs sequences. Results About 60% of sequenced ESTs encoded proteins whose presence in venom was attested by mass spectrometry. Most of the remaining ESTs corresponded to gene products likely involved in the transcriptional and translational machinery of venom gland cells. In addition, a small number of transcripts were found to encode proteins that share sequence similarity with well-known venom constituents of social hymenopteran species, such as hyaluronidase-like proteins and an Allergen-5 protein. An overall number of 29 venom proteins could be identified through the combination of ESTs sequencing and proteomic analyses. The most highly redundant set of ESTs encoded a protein that shared sequence similarity with a venom protein of unknown function potentially specific of the Chelonus lineage. Venom components specific to C. inanitus included a C-type lectin domain containing protein, a chemosensory protein-like protein, a protein related to yellow-e3 and ten new proteins which shared no significant sequence similarity with known sequences. In addition, several venom proteins potentially able to interact with chitin were also identified including a chitinase, an imaginal disc growth factor-like protein and two putative mucin-like peritrophins. Conclusions The use of the combined approaches has allowed to discriminate between cellular

  17. An Ambystoma mexicanum EST sequencing project: analysis of 17,352 expressed sequence tags from embryonic and regenerating blastema cDNA libraries

    PubMed Central

    Habermann, Bianca; Bebin, Anne-Gaelle; Herklotz, Stephan; Volkmer, Michael; Eckelt, Kay; Pehlke, Kerstin; Epperlein, Hans Henning; Schackert, Hans Konrad; Wiebe, Glenis; Tanaka, Elly M

    2004-01-01

    Background The ambystomatid salamander, Ambystoma mexicanum (axolotl), is an important model organism in evolutionary and regeneration research but relatively little sequence information has so far been available. This is a major limitation for molecular studies on caudate development, regeneration and evolution. To address this lack of sequence information we have generated an expressed sequence tag (EST) database for A. mexicanum. Results Two cDNA libraries, one made from stage 18-22 embryos and the other from day-6 regenerating tail blastemas, generated 17,352 sequences. From the sequenced ESTs, 6,377 contigs were assembled that probably represent 25% of the expressed genes in this organism. Sequence comparison revealed significant homology to entries in the NCBI non-redundant database. Further examination of this gene set revealed the presence of genes involved in important cell and developmental processes, including cell proliferation, cell differentiation and cell-cell communication. On the basis of these data, we have performed phylogenetic analysis of key cell-cycle regulators. Interestingly, while cell-cycle proteins such as the cyclin B family display expected evolutionary relationships, the cyclin-dependent kinase inhibitor 1 gene family shows an unusual evolutionary behavior among the amphibians. Conclusions Our analysis reveals the importance of a comprehensive sequence set from a representative of the Caudata and illustrates that the EST sequence database is a rich source of molecular, developmental and regeneration studies. To aid in data mining, the ESTs have been organized into an easily searchable database that is freely available online. PMID:15345051

  18. Generation and Analysis of Expressed Sequence Tags from Olea europaea L.

    PubMed Central

    Ozdemir Ozgenturk, Nehir; Oruç, Fatma; Sezerman, Ugur; Kuçukural, Alper; Vural Korkut, Senay; Toksoz, Feriha; Un, Cemal

    2010-01-01

    Olive (Olea europaea L.), which originated in the Near East, is an important source of edible oil. In this study, two cDNA libraries were constructed from young olive leaves and immature olive fruits to generate ESTs, discover novel genes, and investigate the function of unknown olive genes. A total of 3840 randomly selected colonies from both libraries were sequenced for EST collection. The 2228 readable sequences from olive leaf and 1506 from olive fruit were assembled into 205 and 69 contigs, respectively, whereas 2478 were singletons. Putative functions of all 2752 differentially expressed unique sequences were assigned by BLAST-based gene homology and annotated using BLAST2GO. While 1339 ESTs showed no homology to database entries, 2024 ESTs had homology (under 80%) with hypothetical, putative, expressed, and unknown proteins in NCBI GenBank. A further 635 ESTs were identified by over 80% homology to genes of known function in other species that had not previously been described in the Olea family. Only 3.1% of the total ESTs showed similarity to the olive sequences already in NCBI. The EST data and consensus sequences were submitted to NCBI as a valuable resource for functional genomic studies of olive. PMID:21197085
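
    The assembly figures in this abstract can be cross-checked with simple arithmetic. The sketch below only re-adds numbers quoted in the abstract; the variable names are illustrative and do not come from the paper.

```python
# Counts quoted in the olive EST abstract (variable names are illustrative).
leaf_contigs = 205
fruit_contigs = 69
singletons = 2478
readable_leaf = 2228
readable_fruit = 1506

# Unique sequences = contigs from both libraries plus singletons.
unique_sequences = leaf_contigs + fruit_contigs + singletons
# Readable reads across both libraries.
readable_total = readable_leaf + readable_fruit

print(unique_sequences)  # 2752, matching the "2752 unique sequences" in the abstract
print(readable_total)    # 3734 readable reads out of 3840 sequenced colonies
```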

  19. Analysis of expressed sequence tags from the venom ducts of Conus striatus: focusing on the expression profile of conotoxins.

    PubMed

    Pi, Canhui; Liu, Yun; Peng, Can; Jiang, Xiuhua; Liu, Junliang; Xu, Bin; Yu, Xuesong; Yu, Yanghong; Jiang, Xiaoyu; Wang, Lei; Dong, Meiling; Chen, Shangwu; Xu, An-Long

    2006-02-01

    Cone snails (genus Conus) are predatory marine gastropods that use venom peptides for interacting with prey, predators and competitors. A majority of these peptides, generally known as conotoxins, demonstrate striking selectivity in targeting specific subtypes of ion channels and neurotransmitter receptors. They are therefore not only useful tools in neuroscience to characterize receptors and receptor subtypes, but also offer great potential in new drug research and development. Here, a cDNA library from the venom ducts of a fish-hunting cone snail species, Conus striatus, is described for the generation of expressed sequence tags (ESTs). A total of 429 ESTs were grouped into 137 clusters or singletons. Among these sequences, 221 were toxin sequences, accounting for 52.1% (corresponding to 19 clusters) of all transcripts. A-superfamily (132 ESTs) and O-superfamily conotoxins (80 ESTs) constitute the predominant toxin components. Some non-disulfide-rich Conus peptides were also found. The expression profile of conotoxins also explained to some extent the pharmacological and physiological reactions elicited by this typical piscivorous species. For the first time, a nonstop transcript of conotoxin was identified, which suggests that alternative polyadenylation may be a means of post-transcriptional regulation of conotoxin production. A comparative analysis of these conotoxins reveals the different variation and divergence patterns in these two superfamilies. Our investigations indicate that focal hyper-mutation, block substitution and exon shuffling are three main mechanisms leading to the conotoxin diversity in a species. The comprehensive set of Conus gene sequences allowed the identification of the representative classes of conotoxins and related components, which may lay the foundation for further research and development of conotoxins.

  20. Sequence analysis of expressed sequence tags from an ABA-treated cDNA library identifies stress response genes in the moss Physcomitrella patens.

    PubMed

    Machuka, J; Bashiardes, S; Ruben, E; Spooner, K; Cuming, A; Knight, C; Cove, D

    1999-04-01

    Partial cDNA sequencing was used to obtain 169 expressed sequence tags (ESTs) in the moss, Physcomitrella patens. The source of ESTs was a random cDNA library constructed from 7-day-old protonemata following treatment with 10^-4 M abscisic acid (ABA). Analysis of the ESTs identified 69% with homology to known sequences, 61% of which had significant homology to sequences of plant origin. More importantly, at least 11 ESTs had significant similarities to genes which are implicated in plant stress-responses, including responses which may involve ABA. These included a cDNA associated with desiccation tolerance, two heat shock protein genes, one cold acclimation protein cDNA and five others that may be involved in either oxidative or chemical stress or both, i.e., Zn/Cu-superoxide dismutase, NADPH protochlorophyllide oxidoreductase (PorB), selenium binding protein, glutathione peroxidase and glutathione S transferase. Analysis of codon usage between P. patens and seed plants indicated that although mosses and higher plants are to a large extent similar, minor variations also exist that may represent the distinctiveness of each group.
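
    The codon usage comparison mentioned at the end of this abstract rests on tabulating how often each codon occurs in coding sequences. A minimal sketch of that tabulation follows; it is not the authors' method, and the function name `codon_usage` and the short example sequence are hypothetical.

```python
from collections import Counter

def codon_usage(cds):
    """Return the relative frequency of each codon in an in-frame coding sequence."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    # Split the sequence into consecutive, non-overlapping triplets.
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    counts = Counter(codons)
    total = len(codons)
    return {codon: n / total for codon, n in counts.items()}

# Hypothetical toy sequence: Met-Ala-Ala-Ala-stop, with two synonymous Ala codons.
usage = codon_usage("ATGGCTGCTGCCTAA")
print(usage["GCT"], usage["GCC"])
```

    Comparing such frequency tables between two groups of organisms is one simple way to look for the kind of minor differences in codon preference that the abstract describes.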

  1. Analysis and functional annotation of expressed sequence tags (ESTs) from multiple tissues of oil palm (Elaeis guineensis Jacq.)

    PubMed Central

    Ho, Chai-Ling; Kwan, Yen-Yen; Choi, Mei-Chooi; Tee, Sue-Sean; Ng, Wai-Har; Lim, Kok-Ang; Lee, Yang-Ping; Ooi, Siew-Eng; Lee, Weng-Wah; Tee, Jin-Ming; Tan, Siang-Hee; Kulaveerasingam, Harikrishna; Alwee, Sharifah Shahrul Rabiah Syed; Abdullah, Meilina Ong

    2007-01-01

    Background Oil palm is the second largest source of edible oil, contributing approximately 20% of the world's production of oils and fats. In order to understand the molecular biology involved in in vitro propagation, flowering, efficient utilization of nitrogen sources and root diseases, we have initiated an expressed sequence tag (EST) analysis on oil palm. Results In this study, six cDNA libraries from oil palm zygotic embryos, suspension cells, shoot apical meristems, young flowers, mature flowers and roots, were constructed. We have generated a total of 14537 expressed sequence tags (ESTs) from these libraries, from which 6464 tentative unique contigs (TUCs) and 2129 singletons were obtained. Approximately 6008 of these tentative unique genes (TUGs) have significant matches to the non-redundant protein database, from which 2361 were assigned to one or more Gene Ontology categories. Predominant transcripts and differentially expressed genes were identified in multiple oil palm tissues. Homologues of genes involved in many aspects of flower development were also identified among the EST collection, such as CONSTANS-like, AGAMOUS-like (AGL)2, AGL20, LFY-like, SQUAMOSA, and SQUAMOSA binding protein (SBP). The majority of them are the first representatives in oil palm, providing opportunities to explore the cause of epigenetic homeotic flowering abnormality in oil palm, given the importance of flowering in fruit production. The transcript levels of two flowering-related genes, EgSBP and EgSEP, were analysed in the flower tissues of various developmental stages. Gene homologues for enzymes involved in oil biosynthesis, utilization of nitrogen sources, and scavenging of oxygen radicals were also uncovered among the oil palm ESTs. Conclusion The EST sequences generated will allow comparative genomic studies between oil palm and other monocotyledonous and dicotyledonous plants, development of gene-targeted markers for the reference genetic map, design and

  2. Comparative analysis and functional annotation of a large expressed sequence tag collection of apple

    USDA-ARS's Scientific Manuscript database

    A total of 34 apple cDNA libraries were constructed from root, leaf, bud, shoot, flower, and fruit tissues, at varying developmental stages and/or under biotic or abiotic stress conditions, and of several genotypes. From these libraries, 190,425 clones were partially sequenced from the 5’ end and 4...

  3. Sequence tagging reveals unexpected modifications in toxicoproteomics.

    PubMed

    Dasari, Surendra; Chambers, Matthew C; Codreanu, Simona G; Liebler, Daniel C; Collins, Ben C; Pennington, Stephen R; Gallagher, William M; Tabb, David L

    2011-02-18

    Toxicoproteomic samples are rich in posttranslational modifications (PTMs) of proteins. Identifying these modifications via standard database searching can incur significant performance penalties. Here, we describe the latest developments in TagRecon, an algorithm that leverages inferred sequence tags to identify modified peptides in toxicoproteomic data sets. TagRecon identifies known modifications more effectively than the MyriMatch database search engine. TagRecon outperformed state of the art software in recognizing unanticipated modifications from LTQ, Orbitrap, and QTOF data sets. We developed user-friendly software for detecting persistent mass shifts from samples. We follow a three-step strategy for detecting unanticipated PTMs in samples. First, we identify the proteins present in the sample with a standard database search. Next, identified proteins are interrogated for unexpected PTMs with a sequence tag-based search. Finally, additional evidence is gathered for the detected mass shifts with a refinement search. Application of this technology on toxicoproteomic data sets revealed unintended cross-reactions between proteins and sample processing reagents. Twenty-five proteins in rat liver showed signs of oxidative stress when exposed to potentially toxic drugs. These results demonstrate the value of mining toxicoproteomic data sets for modifications.

  4. Analysis of expressed sequence tags from Uromyces appendiculatus hyphae and haustoria and their comparison to sequences from other rust fungi

    USDA-ARS's Scientific Manuscript database

    Two separate cDNA libraries were prepared from RNA extracted from bean rust (Uromyces appendiculatus) hyphae and haustoria isolated from infected bean leaves (Phaseolus vulgaris cv Pint 111) between 2 and 8 dpi. Approximately 13,000 clones were sequenced from both ends and the sequences assem...

  5. Transcriptome analysis of the phytopathogenic fungus Rhizoctonia solani AG1-IB 7/3/14 applying high-throughput sequencing of expressed sequence tags (ESTs).

    PubMed

    Wibberg, Daniel; Jelonek, Lukas; Rupp, Oliver; Kröber, Magdalena; Goesmann, Alexander; Grosch, Rita; Pühler, Alfred; Schlüter, Andreas

    2014-01-01

    Rhizoctonia solani is a soil-borne plant pathogenic fungus of the phylum Basidiomycota. It affects a wide range of agriculturally important crops and hence is responsible for economically relevant crop losses. Transcriptome analysis of the bottom rot pathogen R. solani AG1-IB (isolate 7/3/14) by applying high-throughput sequencing and bioinformatics methods addressing Expressed Sequence Tag (EST) data interpretation provided new insights into the expressed genes of this fungus. Two normalized cDNA libraries representing different cultivation conditions of the fungus were sequenced on the 454 FLX (Roche) system. Subsequent to cDNA sequence assembly and quality control, ESTs were analysed applying advanced bioinformatics methods. More than 14 000 transcript isoforms originating from approximately 10 000 predictable R. solani AG1-IB 7/3/14 genes are represented in each dataset. Comparative analyses revealed several differentially expressed genes depending on the growth conditions applied. Determinants with predicted functions in recognition processes between the fungus and the host plant were identified. Moreover, many R. solani AG1-IB ESTs were predicted to encode putative cellulose, pectin, and lignin degrading enzymes. Furthermore, genes playing a possible role in mitogen-activated protein (MAP) kinase cascades, 4-aminobutyric acid (GABA) metabolism, melanin synthesis, plant defence antagonism, phytotoxin, and mycotoxin synthesis were detected.

  6. Microarray analysis of expressed sequence tags from haustoria of the rust fungus Uromyces fabae.

    PubMed

    Jakupović, Mirza; Heintz, Manuel; Reichmann, Peter; Mendgen, Kurt; Hahn, Matthias

    2006-01-01

    Rust fungi are plant parasites which colonise host tissue with an intercellular mycelium that forms haustoria within living plant cells. To identify genes expressed during biotrophic growth, EST sequencing was performed with a haustorium-specific cDNA library from Uromyces fabae. One thousand seventeen ESTs were generated, which assembled into 530 contigs. Several of the most frequently represented sequences in the EST database were identical to the in planta induced genes (PIGs) identified previously (Hahn, M., Mendgen, K., 1997. Characterisation of in planta-induced rust genes isolated from a haustorium-specific cDNA library, Mol. Plant-Microbe Interact. 10, 427-437). Virus-encoded sequences were identified, providing evidence for two novel RNA mycoviruses in U. fabae. Microarray hybridisation revealed many cDNAs that were significantly activated in rust-infected leaves compared to germinated uredospores. Very strong in planta expression was found for two PIGs encoding putative metallothioneins. Furthermore, several genes involved in ribosome biogenesis and translation, glycolysis, amino acid metabolism, stress response, and detoxification showed an increased expression in the parasitic mycelium. These data indicate a strong shift in gene expression in rust fungi between germination and the biotrophic stage of development.

  7. Comprehensive analysis of expressed sequence tags from the pulp of the red mutant 'Cara Cara' navel orange (Citrus sinensis Osbeck).

    PubMed

    Ye, Jun-Li; Zhu, An-Dan; Tao, Neng-Guo; Xu, Qiang; Xu, Juan; Deng, Xiu-Xin

    2010-10-01

    Expressed sequence tag (EST) analysis of the pulp of the red-fleshed mutant 'Cara Cara' navel orange provided a starting point for gene discovery and transcriptome survey during citrus fruit maturation. Interpretation of the EST datasets revealed that the mutant pulp transcriptome contained a high proportion of stress-response-related genes, such as the type III metallothionein-like gene (6.0%), heat shock protein (2.8%), Cu/Zn superoxide dismutase (0.8%), and late embryogenesis abundant protein 5 (0.8%). A total of 133 transcripts were found to be differentially expressed between the red mutant and its orange-colored wild-type genotype 'Washington' via digital expression analysis. Among them, genes involved in metabolism, defense/stress and signal transduction were statistically overrepresented. Fifteen transcription factors, comprising NAM, ATAF, and CUC transcription factor (NAC); myeloblastosis (MYB); myelocytomatosis (MYC); basic helix-loop-helix (bHLH); and basic leucine zipper (bZIP) domain members, were also included. The data reflected the distinct expression profile and the unique regulatory module associated with these two genotypes. Eight differentially expressed genes identified by digital expression analysis were validated by quantitative real-time polymerase chain reaction. For structural polymorphism, both simple sequence repeat and single nucleotide polymorphism (SNP) loci were surveyed; dinucleotide representation revealed a bias toward AG/GA/TC/CT repeats (52.5%) and against GC/CG repeats (0%). SNP analysis found that transitions (73%) outnumbered transversions (27%). Seventeen potential cultivar-specific and 387 heterozygous SNP loci were detected in the 'Cara Cara' and 'Washington' EST pool. © 2010 Institute of Botany, Chinese Academy of Sciences.
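
    The transition/transversion split reported for the SNPs follows from a standard rule: a substitution within the purines (A, G) or within the pyrimidines (C, T) is a transition, and any purine-pyrimidine exchange is a transversion. The sketch below illustrates only that classification; the function names and the four example SNPs are hypothetical and are not data from the study.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def classify_snp(ref, alt):
    """Label a single-base substitution as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid or ref == alt:
        raise ValueError("ref and alt must be two different bases from A, C, G, T")
    # Same chemical class (purine<->purine or pyrimidine<->pyrimidine) is a transition.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"

def transition_fraction(snps):
    """Fraction of (ref, alt) pairs that are transitions."""
    labels = [classify_snp(ref, alt) for ref, alt in snps]
    return labels.count("transition") / len(labels)

# Hypothetical SNP list, for illustration only.
example_snps = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C")]
print(transition_fraction(example_snps))  # 0.75
```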

  8. SSH Analysis of Endosperm Transcripts and Characterization of Heat Stress Regulated Expressed Sequence Tags in Bread Wheat

    PubMed Central

    Goswami, Suneha; Kumar, Ranjeet R.; Dubey, Kavita; Singh, Jyoti P.; Tiwari, Sachidanand; Kumar, Ashok; Smita, Shuchi; Mishra, Dwijesh C.; Kumar, Sanjeev; Grover, Monendra; Padaria, Jasdeep C.; Kala, Yugal K.; Singh, Gyanendra P.; Pathak, Himanshu; Chinnusamy, Viswanathan; Rai, Anil; Praveen, Shelly; Rai, Raj D.

    2016-01-01

    Heat stress is one of the major problems in agriculturally important cereal crops, especially wheat. Here, we have constructed a subtracted cDNA library from the endosperm of HS-treated (42°C for 2 h) wheat cv. HD2985 by suppression subtractive hybridization (SSH). We identified ~550 recombinant clones ranging from 200 to 500 bp with an average size of 300 bp. Sanger sequencing was performed with 205 positive clones to generate the differentially expressed sequence tags (ESTs). Most of the ESTs were observed to be localized on the long arm of chromosome 2A and associated with heat stress tolerance and metabolic pathways. Identified ESTs were searched by BLAST against the Ensemble, TriFLD, and TIGR databases, and the predicted CDS were translated and aligned with the protein sequences available in the pfam and InterProScan 5 databases to predict the differentially expressed proteins (DEPs). We observed eight different types of post-translational modifications (PTMs) in the DEPs corresponding to the cloned ESTs: 147 sites with phosphorylation, 21 sites with sumoylation, 237 with palmitoylation, 96 sites with S-nitrosylation, 3066 calpain cleavage sites, and 103 tyrosine nitration sites, predicted to sense the heat stress and regulate the expression of stress genes. Twelve DEPs were observed to have transmembrane helices (TMH) in their structure, predicted to play the role of sensors of HS. Quantitative real-time PCR of randomly selected ESTs showed very high relative expression of HSP17 under HS; up-regulation was greater in wheat cv. HD2985 (thermotolerant) than in HD2329 (thermosusceptible) during grain-filling. The abundance of transcripts was further validated through northern blot analysis. The ESTs and their corresponding DEPs can be used as molecular markers for screening or in targeted precision breeding programs. PTMs identified in the DEPs can be used to elucidate the thermotolerance mechanism of wheat, a novel step toward the development of

  9. Transcriptomic analysis of the venom gland of the red-headed krait (Bungarus flaviceps) using expressed sequence tags

    PubMed Central

    2010-01-01

    Background The Red-headed krait (Bungarus flaviceps, Squamata: Serpentes: Elapidae) is a medically important venomous snake that inhabits South-East Asia. Although the venoms of most species of the snake genus Bungarus have been well characterized, a detailed compositional analysis of B. flaviceps is currently lacking. Results Here, we have sequenced 845 expressed sequence tags (ESTs) from the venom gland of a B. flaviceps. Of the transcripts, 74.8% were putative toxins; 20.6% were cellular; and 4.6% were unknown. The main venom protein families identified were three-finger toxins (3FTxs), Kunitz-type serine protease inhibitors (including chain B of β-bungarotoxin), phospholipase A2 (including chain A of β-bungarotoxin), natriuretic peptide (NP), CRISPs, and C-type lectin. Conclusion The 3FTxs were found to be the major component of the venom (39%). We found eight groups of unique 3FTxs and most of them were different from the well-characterized 3FTxs. We found three groups of Kunitz-type serine protease inhibitors (SPIs); one group was comparable to the classical SPIs and the other two groups to chain B of β-bungarotoxins (with or without the extra cysteine) based on sequence identity. The latter group may be functional equivalents of dendrotoxins in Bungarus venoms. The natriuretic peptide (NP) found is the first NP for any Asian elapid, and distantly related to Australian elapid NPs. Our study identifies several unique toxins in B. flaviceps venom, which may help in understanding the evolution of venom toxins and the pathophysiological symptoms induced after envenomation. PMID:20350308

  10. SSH Analysis of Endosperm Transcripts and Characterization of Heat Stress Regulated Expressed Sequence Tags in Bread Wheat.

    PubMed

    Goswami, Suneha; Kumar, Ranjeet R; Dubey, Kavita; Singh, Jyoti P; Tiwari, Sachidanand; Kumar, Ashok; Smita, Shuchi; Mishra, Dwijesh C; Kumar, Sanjeev; Grover, Monendra; Padaria, Jasdeep C; Kala, Yugal K; Singh, Gyanendra P; Pathak, Himanshu; Chinnusamy, Viswanathan; Rai, Anil; Praveen, Shelly; Rai, Raj D

    2016-01-01

    Heat stress is one of the major problems in agriculturally important cereal crops, especially wheat. Here, we have constructed a subtracted cDNA library from the endosperm of HS-treated (42°C for 2 h) wheat cv. HD2985 by suppression subtractive hybridization (SSH). We identified ~550 recombinant clones ranging from 200 to 500 bp with an average size of 300 bp. Sanger sequencing was performed with 205 positive clones to generate the differentially expressed sequence tags (ESTs). Most of the ESTs were observed to be localized on the long arm of chromosome 2A and associated with heat stress tolerance and metabolic pathways. Identified ESTs were searched by BLAST against the Ensemble, TriFLD, and TIGR databases, and the predicted CDS were translated and aligned with the protein sequences available in the pfam and InterProScan 5 databases to predict the differentially expressed proteins (DEPs). We observed eight different types of post-translational modifications (PTMs) in the DEPs corresponding to the cloned ESTs: 147 sites with phosphorylation, 21 sites with sumoylation, 237 with palmitoylation, 96 sites with S-nitrosylation, 3066 calpain cleavage sites, and 103 tyrosine nitration sites, predicted to sense the heat stress and regulate the expression of stress genes. Twelve DEPs were observed to have transmembrane helices (TMH) in their structure, predicted to play the role of sensors of HS. Quantitative real-time PCR of randomly selected ESTs showed very high relative expression of HSP17 under HS; up-regulation was greater in wheat cv. HD2985 (thermotolerant) than in HD2329 (thermosusceptible) during grain-filling. The abundance of transcripts was further validated through northern blot analysis. The ESTs and their corresponding DEPs can be used as molecular markers for screening or in targeted precision breeding programs. PTMs identified in the DEPs can be used to elucidate the thermotolerance mechanism of wheat, a novel step toward the development of "climate-smart" wheat.

  11. Next-generation tag sequencing for cancer gene expression profiling.

    PubMed

    Morrissy, A Sorana; Morin, Ryan D; Delaney, Allen; Zeng, Thomas; McDonald, Helen; Jones, Steven; Zhao, Yongjun; Hirst, Martin; Marra, Marco A

    2009-10-01

    We describe a new method, Tag-seq, which employs ultra high-throughput sequencing of 21 base pair cDNA tags for sensitive and cost-effective gene expression profiling. We compared Tag-seq data to LongSAGE data and observed improved representation of several classes of rare transcripts, including transcription factors, antisense transcripts, and intronic sequences, the latter possibly representing novel exons or genes. We observed increases in the diversity, abundance, and dynamic range of such rare transcripts and took advantage of the greater dynamic range of expression to identify, in cancers and normal libraries, altered expression ratios of alternative transcript isoforms. The strand-specific information of Tag-seq reads further allowed us to detect altered expression ratios of sense and antisense (S-AS) transcripts between cancer and normal libraries. S-AS transcripts were enriched in known cancer genes, while transcript isoforms were enriched in miRNA targeting sites. We found that transcript abundance had a stronger GC-bias in LongSAGE than Tag-seq, such that AT-rich tags were less abundant than GC-rich tags in LongSAGE. Tag-seq also performed better in gene discovery, identifying >98% of genes detected by LongSAGE and profiling a distinct subset of the transcriptome characterized by AT-rich genes, which was expressed at levels below those detectable by LongSAGE. Overall, Tag-seq is sensitive to rare transcripts, has less sequence composition bias relative to LongSAGE, and allows differential expression analysis for a greater range of transcripts, including transcripts encoding important regulatory molecules.
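
    The GC-bias comparison described in this abstract depends on measuring the base composition of each 21 bp tag. The following is a minimal sketch of that measurement under stated assumptions: the 0.5 threshold for calling a tag "GC-rich", the function names, and the two example tags with their counts are all hypothetical and are not taken from the Tag-seq paper.

```python
def gc_fraction(tag):
    """Fraction of G and C bases in a tag sequence."""
    tag = tag.upper()
    if not tag or set(tag) - set("ACGT"):
        raise ValueError("tag must be a non-empty string over A, C, G, T")
    return sum(base in "GC" for base in tag) / len(tag)

def split_by_gc(tag_counts, threshold=0.5):
    """Sum observed counts for AT-rich and GC-rich tags (threshold is an assumption)."""
    at_rich = sum(n for tag, n in tag_counts.items() if gc_fraction(tag) < threshold)
    gc_rich = sum(n for tag, n in tag_counts.items() if gc_fraction(tag) >= threshold)
    return at_rich, gc_rich

# Two hypothetical 21 bp tags with made-up observation counts.
at_tag = "AT" * 10 + "A"
gc_tag = "GC" * 10 + "G"
tag_counts = {at_tag: 5, gc_tag: 20}
print(split_by_gc(tag_counts))  # (5, 20)
```

    Running the same split on tag counts from two platforms would show whether one of them under-represents AT-rich tags, which is the pattern the abstract reports for LongSAGE.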

  12. Construction of cDNA library and preliminary analysis of expressed sequence tags from Siberian tiger

    PubMed Central

    Liu, Chang-Qing; Lu, Tao-Feng; Feng, Bao-Gang; Liu, Dan; Guan, Wei-Jun; Ma, Yue-Hui

    2010-01-01

    In this study we successfully constructed a full-length cDNA library from the Siberian tiger, Panthera tigris altaica, the best-known wild animal. Total RNA was extracted from Siberian tiger fibroblasts cultured in vitro. The titers of the primary and amplified libraries were 1.30×10^6 pfu/ml and 1.62×10^9 pfu/ml, respectively. The proportion of recombinants in the unamplified library was 90.5% and the average length of exogenous inserts was 1.13 kb. A total of 282 individual ESTs with sizes ranging from 328 to 1,142 bp were then analyzed. The BLASTX scores revealed that 53.9% of the sequences were classified as strong matches, 38.6% as nominal and 7.4% as weak matches. Of the ESTs, 28.0% were related to enzyme/catalytic protein, 20.9% to metabolism, 13.1% to transport, 12.1% to signal transducer/cell communication, 9.9% to structure protein, 3.9% to immunity protein/defense metabolism, and 3.2% to cell cycle, while 8.9% were classified as novel genes. These results demonstrated that the reliability and representativeness of the cDNA library met the requirements of a standard cDNA library. This library provided a useful platform for functional genomic research on Siberian tigers. PMID:20941376

  13. Construction of cDNA library and preliminary analysis of expressed sequence tags from Siberian tiger.

    PubMed

    Liu, Chang-Qing; Lu, Tao-Feng; Feng, Bao-Gang; Liu, Dan; Guan, Wei-Jun; Ma, Yue-Hui

    2010-10-01

    In this study we successfully constructed a full-length cDNA library from the Siberian tiger, Panthera tigris altaica, the best-known wild animal. Total RNA was extracted from Siberian tiger fibroblasts cultured in vitro. The titers of the primary and amplified libraries were 1.30×10^6 pfu/ml and 1.62×10^9 pfu/ml, respectively. The proportion of recombinants in the unamplified library was 90.5% and the average length of exogenous inserts was 1.13 kb. A total of 282 individual ESTs with sizes ranging from 328 to 1,142 bp were then analyzed. The BLASTX scores revealed that 53.9% of the sequences were classified as strong matches, 38.6% as nominal and 7.4% as weak matches. Of the ESTs, 28.0% were related to enzyme/catalytic protein, 20.9% to metabolism, 13.1% to transport, 12.1% to signal transducer/cell communication, 9.9% to structure protein, 3.9% to immunity protein/defense metabolism, and 3.2% to cell cycle, while 8.9% were classified as novel genes. These results demonstrated that the reliability and representativeness of the cDNA library met the requirements of a standard cDNA library. This library provided a useful platform for functional genomic research on Siberian tigers.
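
    As a quick consistency check on the percentages in this abstract, the functional categories add up to 100% only if the final "novel genes" figure of 8.9 is read as a percentage, which is the reading assumed here. The dictionary keys below are shortened labels chosen for illustration.

```python
# Percentages quoted in the abstract; keys are shortened, illustrative labels.
blastx_classes = {"strong": 53.9, "nominal": 38.6, "weak": 7.4}
functional_categories = {
    "enzyme/catalytic": 28.0,
    "metabolism": 20.9,
    "transport": 13.1,
    "signal/cell communication": 12.1,
    "structure": 9.9,
    "immunity/defense": 3.9,
    "cell cycle": 3.2,
    "novel (assumed to be a percentage)": 8.9,
}

blastx_total = round(sum(blastx_classes.values()), 1)
functional_total = round(sum(functional_categories.values()), 1)
print(blastx_total)      # 99.9, consistent with rounding of the three classes
print(functional_total)  # 100.0
```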

  14. Comparative analysis of expressed sequence tags from three castes and two life stages of the termite Reticulitermes flavipes

    PubMed Central

    2010-01-01

    Background Termites (Isoptera) are eusocial insects whose colonies consist of morphologically and behaviorally specialized castes of sterile workers and soldiers, and reproductive alates. Previous studies on eusocial insects have indicated that caste differentiation and behavior are underlain by differential gene expression. Although much is known about gene expression in the honey bee, Apis mellifera, termites remain relatively understudied in this regard. Therefore, our objective was to assemble an expressed sequence tag (EST) data base for the eastern subterranean termite, Reticulitermes flavipes, for future gene expression studies. Results Soldier, worker, and alate caste and two larval cDNA libraries were constructed, and approximately 15,000 randomly chosen clones were sequenced to compile an EST data base. Putative gene functions were assigned based on a BLASTX Swissprot search. Categorical in silico expression patterns for each library were compared using the R-statistic. A significant proportion of the ESTs of each caste and life stage had no significant similarity to those in existing data bases. All cDNA libraries, including those of non-reproductive worker and soldier castes, contained sequences with putative reproductive functions. Genes that showed a potential expression bias among castes included a putative antibacterial humoral response and translation elongation protein in soldiers and a chemosensory protein in alates. Conclusions We have expanded upon the available sequences for R. flavipes and utilized an in silico method to compare gene expression in different castes of a eusocial insect. The in silico analysis allowed us to identify several genes which may be differentially expressed and involved in caste differences. These include a gene overrepresented in the alate cDNA library with a predicted function of neurotransmitter secretion or cholesterol absorption and a gene predicted to be involved in protein biosynthesis and ligase activity
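
    The R-statistic named in this abstract is commonly attributed to Stekel and colleagues and is usually written as a log-likelihood ratio over libraries: R = sum over libraries i of x_i * log(x_i / (N_i * f)), where x_i is the EST count for a gene in library i, N_i is the library size, and f is the gene's pooled frequency across all libraries. The sketch below assumes that form and a natural logarithm (a different base only rescales R); it is not taken from the termite paper, and the example counts and library sizes are hypothetical.

```python
import math

def r_statistic(counts, library_sizes):
    """Log-likelihood-ratio style R value for one gene across several cDNA libraries."""
    if len(counts) != len(library_sizes):
        raise ValueError("counts and library_sizes must have the same length")
    total_count = sum(counts)
    if total_count == 0:
        return 0.0
    # Pooled frequency of the gene if it were equally expressed in every library.
    pooled_frequency = total_count / sum(library_sizes)
    r = 0.0
    for x, n in zip(counts, library_sizes):
        if x > 0:  # a zero count contributes nothing to the sum
            r += x * math.log(x / (n * pooled_frequency))
    return r

# Hypothetical EST counts for one gene in three equally sized libraries.
sizes = [1000, 1000, 1000]
print(r_statistic([10, 10, 10], sizes))  # approximately 0: no evidence of bias
print(r_statistic([30, 0, 0], sizes))    # about 32.96: strongly library-specific
```

    Larger R values indicate a gene whose EST counts depart more from uniform representation, which is how a comparison like the one in the abstract flags candidate caste-biased genes.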

  15. Identification of host immune regulation candidate genes of Toxascaris leonina by expression sequenced tags (ESTs) analysis.

    PubMed

    Cho, Min Kyoung; Lee, Keun Hee; Lee, Sun Joo; Kang, Se Won; Ock, Mee Sun; Hong, Yeon Chul; Lee, Yong Seok; Yu, Hak Sun

    2009-10-14

    Toxascaris leonina adult worms live in the gastrointestinal tract of dogs, cats, and foxes, releasing eggs which enter the environment by the fecal route. Previously, we reported that T. leonina adult worm-derived protein was able to inhibit OVA-specific Th2 responses, and in particular, that immunization with parasite proteins exerts a more profound protective effect than allergen treatment. In order to gain greater insight into the relevant immune evasion mechanisms as well as basic scientific information, we have generated ESTs of T. leonina adult female worms and investigated their functions using euKaryotic Orthologous Groups (KOG) database analysis. From the randomly selected plasmids containing DNA inserts, a total of 487 reads were collected from the T. leonina adult worm cDNA library. The annotated ESTs were classified into 25 KOG categories; the largest share of ESTs (7.90%) was annotated with energy production and conversion, and the second most highly annotated category was translation, ribosomal structure and biogenesis (7.69%). We also identified many host-parasite immune related genes including C-type lectin, galectin, SXP, and cathepsin L-like cysteine protease coding genes. More information about these genes is needed to understand the mechanisms of immune evasion by Toxascaris.

  16. Generation and analysis of expressed sequence tags from Trypanosoma cruzi trypomastigote and amastigote cDNA libraries.

    PubMed

    Agüero, Fernán; Abdellah, Karim Ben; Tekiel, Valeria; Sánchez, Daniel O; González, Antonio

    2004-08-01

    We have generated 2771 expressed sequence tags (ESTs) from two cDNA libraries of Trypanosoma cruzi CL-Brener. The libraries were constructed from trypomastigotes and amastigotes, using a spliced leader primer to synthesize the cDNA second strand, thus selecting for full-length cDNAs. Since the libraries were neither normalized nor pre-screened, we compared the representation of transcripts between the two using a statistical test and identified a subset of transcripts that show apparent differential representation. A non-redundant set of 1619 reconstructed transcripts was generated by sequence clustering. This dataset was used to perform similarity searches against protein and nucleotide databases. Based on these searches, 339 sequences could be assigned a putative identity. One thousand one hundred and sixteen sequences in the non-redundant clustered dataset (68.8%) are new expression tags, not represented in the T. cruzi epimastigote ESTs that are in the public databases. Additional information is provided online at http://genoma.unsam.edu.ar/projects/tram. To the best of our knowledge these are the first ESTs reported for the life cycle stages of T. cruzi that occur in the vertebrate host.

  17. Expressed sequence tags of the peanut pod nematode Ditylenchus africanus: the first transcriptome analysis of an Anguinid nematode.

    PubMed

    Haegeman, Annelies; Jacob, Joachim; Vanholme, Bartel; Kyndt, Tina; Mitreva, Makedonka; Gheysen, Godelieve

    2009-09-01

    In this study, 4847 expressed sequence tags (ESTs) from mixed stages of the migratory plant-parasitic nematode Ditylenchus africanus (peanut pod nematode) were investigated. It is the first molecular survey of a nematode which belongs to the family of the Anguinidae (order Rhabditida, superfamily Sphaerularioidea). The sequences were clustered into 2596 unigenes, of which 43% did not show any homology to known protein, nucleotide, nematode EST or plant-parasitic nematode genome sequences. Gene ontology mapping revealed that most putative proteins are involved in developmental and reproductive processes. In addition, unigenes involved in oxidative stress as well as in anhydrobiosis, such as LEA (late embryogenesis abundant protein) and trehalose-6-phosphate synthase, were identified. Other tags showed homology to genes previously described as being involved in parasitism (expansin, SEC-2, calreticulin, 14-3-3b and various allergen proteins). In situ hybridization revealed that the expression of a putative expansin and a venom allergen protein was restricted to the gland cell area of the nematode, in agreement with their presumed role in parasitism. Furthermore, seven putative novel candidate parasitism genes were identified based on the prediction of a signal peptide in the corresponding protein sequence and homologous ESTs exclusively in parasitic nematodes. These genes are interesting for further research and functional characterization. Finally, 34 unigenes were retained as good target candidates for future RNAi experiments, because of their nematode specific nature and observed lethal phenotypes of Caenorhabditis elegans homologs.

  18. Expressed sequence tags of the peanut pod nematode Ditylenchus africanus: the first transcriptome analysis of an Anguinid nematode

    PubMed Central

    Haegeman, Annelies; Jacob, Joachim; Vanholme, Bartel; Kyndt, Tina; Mitreva, Makedonka; Gheysen, Godelieve

    2009-01-01

    In this study, 4847 expressed sequence tags (ESTs) from mixed stages of the migratory plant-parasitic nematode Ditylenchus africanus (peanut pod nematode) were investigated. It is the first molecular survey of a nematode which belongs to the family of the Anguinidae (order Rhabditida, superfamily Sphaerularioidea). The sequences were clustered into 2596 unigenes, of which 43% did not show any homology to known protein, nucleotide, nematode EST or plant-parasitic nematode genome sequences. Gene ontology mapping revealed that most putative proteins are involved in developmental and reproductive processes. In addition, unigenes involved in oxidative stress as well as in anhydrobiosis, such as LEA (late embryogenesis abundant protein) and trehalose-6-phosphate synthase, were identified. Other tags showed homology to genes previously described as being involved in parasitism (expansin, SEC-2, calreticulin, 14-3-3b and various allergen proteins). In situ hybridization revealed that the expression of a putative expansin and a venom allergen protein was restricted to the gland cell area of the nematode, in agreement with their presumed role in parasitism. Furthermore, 7 putative novel candidate parasitism genes were identified based on the prediction of a signal peptide in the corresponding protein sequence and homologous ESTs exclusively in parasitic nematodes. These genes are interesting for further research and functional characterization. Finally, 34 unigenes were retained as good target candidates for future RNAi experiments, because of their nematode specific nature and observed lethal phenotypes of Caenorhabditis elegans homologs. PMID:19383517

  19. Expressed sequence tag analysis of adult human optic nerve for NEIBank: Identification of cell type and tissue markers

    PubMed Central

    Bernstein, Steven L; Guo, Yan; Peterson, Katherine; Wistow, Graeme

    2009-01-01

    Background The optic nerve is a pure white matter central nervous system (CNS) tract with an isolated blood supply, and is widely used in physiological studies of white matter response to various insults. We examined the gene expression profile of human optic nerve (ON) in order to provide, through the NEIBank online resource, a collection of sequence-verified cDNA clones. An un-normalized cDNA library was constructed from pooled human ON tissues and was used in expressed sequence tag (EST) analysis. Location of an abundant oligodendrocyte marker was examined by immunofluorescence. Quantitative real time polymerase chain reaction (qRT-PCR) and Western analysis were used to compare levels of expression for key calcium channel protein genes and protein product in primate and rodent ON. Results Our analyses revealed a profile similar in many respects to other white matter related tissues, but significantly different from previously available ON cDNA libraries. The previous libraries were found to include specific markers for other eye tissues, suggesting contamination. Immune/inflammatory markers were abundant in the new ON library. The oligodendrocyte marker QKI was abundant at the EST level. Immunofluorescence revealed that this protein is a useful oligodendrocyte cell-type marker in rodent and primate ONs. L-type calcium channel EST abundance was found to be particularly low. A qRT-PCR-based comparative mammalian species analysis reveals that L-type calcium channel expression levels are significantly lower in primate than in rodent ON, which may help account for the class-specific difference in responsiveness to calcium channel blocking agents. Several known eye disease genes are abundantly expressed in ON. Many genes associated with normal axonal function and mRNAs associated with axonal transport, inflammation and neuroprotection are observed.
Conclusion We conclude that the new cDNA library is a faithful representation of human ON and EST data provide an initial overview

  20. Generation and analysis of expressed sequence tags from six developing xylem libraries in Pinus radiata D. Don

    PubMed Central

    Li, Xinguo; Wu, Harry X; Dillon, Shannon K; Southerton, Simon G

    2009-01-01

    Background Wood is a major renewable natural resource for the timber, fibre and bioenergy industry. Pinus radiata D. Don is the most important commercial plantation tree species in Australia and several other countries; however, genomic resources for this species are very limited in public databases. Our primary objective was to sequence a large number of expressed sequence tags (ESTs) from genes involved in wood formation in radiata pine. Results Six developing xylem cDNA libraries were constructed from earlywood and latewood tissues sampled at juvenile (7 yrs), transition (11 yrs) and mature (30 yrs) ages, respectively. These xylem tissues represent six typical development stages in a rotation period of radiata pine. A total of 6,389 high quality ESTs were collected from 5,952 cDNA clones. Assembly of 5,952 ESTs from 5' end sequences generated 3,304 unigenes including 952 contigs and 2,352 singletons. About 97.0% of the 5,952 ESTs and 96.1% of the unigenes have matches in the UniProt and TIGR databases. Of the 3,174 unigenes with matches, 42.9% were not assigned GO (Gene Ontology) terms and their functions are unknown or unclassified. More than half (52.1%) of the 5,952 ESTs have matches in the Pfam database and represent 772 known protein families. About 18.0% of the 5,952 ESTs matched cell wall related genes in the MAIZEWALL database, representing all 18 categories, 91 of all 174 families and possibly 557 genes. Fifteen cell wall-related genes are ranked in the 30 most abundant genes, including CesA, tubulin, AGP, SAMS, actin, laccase, CCoAMT, MetE, phytocyanin, pectate lyase, cellulase, SuSy, expansin, chitinase and UDP-glucose dehydrogenase. Based on the PlantTFDB database 41 of the 64 transcription factor families in the poplar genome were identified as being involved in radiata pine wood formation. Comparative analysis of GO term abundance revealed a distinct transcriptome in juvenile earlywood formation compared to other stages of wood development

  1. Applying thiouracil tagging to mouse transcriptome analysis.

    PubMed

    Gay, Leslie; Karfilis, Kate V; Miller, Michael R; Doe, Chris Q; Stankunas, Kryn

    2014-02-01

    Transcriptional profiling is a powerful approach for studying mouse development, physiology and disease models. Here we describe a protocol for mouse thiouracil tagging (TU tagging), a transcriptome analysis technology that includes in vivo covalent labeling, purification and analysis of cell type-specific RNA. TU tagging enables the isolation of RNA from a given cell population of a complex tissue, avoiding transcriptional changes induced by cell isolation trauma, as well as the identification of actively transcribed RNAs and not preexisting transcripts. Therefore, in contrast to other cell-specific transcriptional profiling methods based on the purification of tagged ribosomes or nuclei, TU tagging provides a direct examination of transcriptional regulation. We describe how to (i) deliver 4-thiouracil to transgenic mice to thio-label cell lineage-specific transcripts, (ii) purify TU-tagged RNA and prepare libraries for Illumina sequencing and (iii) follow a straightforward bioinformatics workflow to identify cell type-enriched or differentially expressed genes. Tissue containing TU-tagged RNA can be obtained in 1 d, RNA-seq libraries can be generated within 2 d and, after sequencing, an initial bioinformatics analysis can be completed in 1 additional day.

  2. Expression profiling of salinity-alkali stress responses by large-scale expressed sequence tag analysis in Tamarix hispid.

    PubMed

    Gao, Caiqiu; Wang, Yucheng; Liu, Guifeng; Yang, Chuanping; Jiang, Jing; Li, Huiyu

    2008-02-01

    Tamarix hispida, a woody halophyte, thrives in saline and saline-alkali soil. To better understand the gene expression profiles that manifest in response to saline-alkali stress, three cDNA libraries were constructed from leaf tissue of T. hispida plants that were well watered and exposed to NaHCO3 for 24 and 52 h. A total of 9,447 high quality expressed sequence tags (ESTs) were obtained from the three libraries. These ESTs represent 3,945 unigenes, including 986 contigs and 2,959 singlets. The numbers of unigenes obtained from the three libraries were 1,752, 1,558 and 1,675, respectively. EST analysis was performed to compare gene expression in the three cDNA libraries, and the differentially expressed transcripts responsive to NaHCO3 were identified. The up-regulated genes were involved in a variety of functional areas, such as stress-related proteins, hormone signal transduction, antioxidative response, transcriptional regulators, protein synthesis and destination, ion homeostasis, photosynthesis and metabolism. The results indicated that the response to NaHCO3 in T. hispida is a complex one, involving multiple physiological and metabolic pathways. Nine gene expression patterns were compared in response to NaHCO3 and NaCl using real time reverse transcription-polymerase chain reaction (RT-PCR). Gene expression trends were similar after a 24-h exposure to either NaCl or NaHCO3; however, great variability was found after a 52-h exposure, indicating that short-term responses to either salt may not be obviously different.
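    Several records in this listing report EST assemblies as a unigene total split into contigs and singlets (singletons). The bookkeeping behind those figures can be checked with a short sketch using the numbers from the T. hispida abstract above. The function name is hypothetical, and "redundancy" is defined here as 1 - unigenes/ESTs, which is one common convention and an assumption rather than something the abstract states.

```python
def summarize_assembly(total_ests: int, contigs: int, singletons: int) -> dict:
    """Derive basic summary figures from an EST clustering result."""
    unigenes = contigs + singletons
    # Each singleton is exactly one EST, so the rest were merged into contigs
    ests_in_contigs = total_ests - singletons
    return {
        "unigenes": unigenes,
        "ests_in_contigs": ests_in_contigs,
        "mean_ests_per_contig": ests_in_contigs / contigs,
        "redundancy": 1 - unigenes / total_ests,
    }


# Figures reported for the three T. hispida libraries combined
summary = summarize_assembly(total_ests=9447, contigs=986, singletons=2959)
assert summary["unigenes"] == 3945  # matches the reported unigene count
print(summary)
```

    The per-library unigene counts (1,752, 1,558 and 1,675) sum to more than 3,945, which is expected if some unigenes are represented in more than one library.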

  3. Gene Discovery through Expressed Sequence Tag Sequencing in Trypanosoma cruzi

    PubMed Central

    Verdun, Ramiro E.; Di Paolo, Nelson; Urmenyi, Turan P.; Rondinelli, Edson; Frasch, Alberto C. C.; Sanchez, Daniel O.

    1998-01-01

    Analysis of expressed sequence tags (ESTs) constitutes a useful approach for gene identification that, in the case of human pathogens, might result in the identification of new targets for chemotherapy and vaccine development. As part of the Trypanosoma cruzi genome project, we have partially sequenced the 5′ ends of 1,949 clones to generate ESTs. The clones were randomly selected from a normalized CL Brener epimastigote cDNA library. A total of 14.6% of the clones were homologous to previously identified T. cruzi genes, while 18.4% had significant matches to genes from other organisms in the database. A total of 67% of the ESTs had no matches in the database, and thus, some of them might be T. cruzi-specific genes. Functional groups of those sequences with matches in the database were constructed according to their putative biological functions. The two largest categories were protein synthesis (23.3%) and cell surface molecules (10.8%). The information reported in this paper should be useful for researchers in the field to analyze genes and proteins of their own interest. PMID:9784549

  4. Analysis and comparison of a set of expressed sequence tags of the parthenogenetic water flea Daphnia carinata.

    PubMed

    Xu, Xiaoqian; Song, Shuhui; Wang, Qun; Qin, Fen; Liu, Kan; Zhang, Xiaowei; Hu, Songnian; Zhao, Yunlong

    2009-08-01

    The water flea Daphnia carinata (D. carinata) reproduces both sexually and parthenogenetically, yet little is known about the genes involved in these processes. To further clarify the reproductive biology of Daphnia and elucidate their unique mechanism of reproductive transformation, we have generated and characterized an expressed sequence tag (EST) data set from D. carinata. A set of 1,495 clusters was generated from sequencing 3,072 randomly chosen clones from a parthenogenetic, juvenile water flea cDNA library. The nucleic acid and deduced amino acid sequences were compared with known GenBank sequences. Functional annotation found that 959 clusters showed significant homology with known genes involved in a broad range of activities, including metabolism, translation, development and reproduction, as well as genes involved in sensing environmental factors. We speculate that genes involved in development and reproduction, along with genes that allow the organism to sense changes in the environment, play important roles in the process of parthenogenetic reproduction and could be markers of the early steps of sexual differentiation. Additionally, 86% of the D. carinata unique sequences could be stringently mapped to the D. pulex genome, of which 125 mapped to intergenic and intronic regions on the current assembly. Our results provide practical insight into crustacean reproductive biology, in addition to establishing a new animal model for reproductive and developmental biology.

  5. Obtaining accurate translations from expressed sequence tags.

    PubMed

    Wasmuth, James; Blaxter, Mark

    2009-01-01

    The genomes of an increasing number of species are being investigated through the generation of expressed sequence tags (ESTs). However, ESTs are prone to sequencing errors and typically define incomplete transcripts, making downstream annotation difficult. Annotation would be greatly improved with robust polypeptide translations. Many current solutions for EST translation require a large number of full-length gene sequences for training purposes, a resource that is not available for the majority of EST projects. As part of our ongoing EST programs investigating these "neglected" genomes, we have developed a polypeptide prediction pipeline, prot4EST. It incorporates freely available software to produce final translations that are more accurate than those derived from any single method. We describe how this integrated approach goes a long way to overcoming the deficit in training data.
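    To illustrate why EST translation is harder than it looks, here is a minimal sketch of the naive baseline: translate all six reading frames with the standard genetic code and keep the longest stop-free stretch. This is not prot4EST and makes no use of its methods; the function names are hypothetical, and the approach ignores the frameshift-causing sequencing errors that the abstract identifies as a key problem.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate from position 0; codons with ambiguous bases become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def longest_open_frame(est: str) -> str:
    """Return the longest stop-free peptide found in any of the six frames."""
    est = est.upper()
    best = ""
    for strand in (est, reverse_complement(est)):
        for offset in range(3):
            for segment in translate(strand[offset:]).split("*"):
                if len(segment) > len(best):
                    best = segment
    return best


print(translate("ATGGCCAAATGA"))           # forward frame 0
print(longest_open_frame("ATGGCCAAATGA"))  # may come from another frame or strand
```

    On short or error-prone reads the longest open frame is often not the true coding frame, which is one reason a pipeline that combines several prediction methods can outperform any single one.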

  6. Internal epitope tagging informed by relative lack of sequence conservation

    PubMed Central

    Burg, Leonard; Zhang, Karen; Bonawitz, Tristan; Grajevskaja, Viktorija; Bellipanni, Gianfranco; Waring, Richard; Balciunas, Darius

    2016-01-01

    Many experimental techniques rely on specific recognition and stringent binding of proteins by antibodies. This can readily be achieved by introducing an epitope tag. We employed an approach that uses a relative lack of evolutionary conservation to inform epitope tag site selection, followed by integration of the tag-coding sequence into the endogenous locus in zebrafish. We demonstrate that an internal epitope tag is accessible for antibody binding, and that tagged proteins retain wild type function. PMID:27892520

  7. Chromatin Interaction Analysis with Paired-End Tag Sequencing (ChIA-PET) for Mapping Chromatin Interactions and Understanding Transcription Regulation

    PubMed Central

    Poh, Huay Mei; Peh, Su Qin; Ong, Chin Thing; Zhang, Jingyao; Ruan, Xiaoan; Ruan, Yijun

    2012-01-01

    Genomes are organized into three-dimensional structures, adopting higher-order conformations inside the micron-sized nuclear spaces 7, 2, 12. Such architectures are not random and involve interactions between gene promoters and regulatory elements 13. The binding of transcription factors to specific regulatory sequences brings about a network of transcription regulation and coordination 1, 14. Chromatin Interaction Analysis by Paired-End Tag Sequencing (ChIA-PET) was developed to identify these higher-order chromatin structures 5,6. Cells are fixed and interacting loci are captured by covalent DNA-protein cross-links. To minimize non-specific noise and reduce complexity, as well as to increase the specificity of the chromatin interaction analysis, chromatin immunoprecipitation (ChIP) is used against specific protein factors to enrich chromatin fragments of interest before proximity ligation. Ligation involving half-linkers subsequently forms covalent links between pairs of DNA fragments tethered together within individual chromatin complexes. The flanking MmeI restriction enzyme sites in the half-linkers allow extraction of paired end tag-linker-tag constructs (PETs) upon MmeI digestion. As the half-linkers are biotinylated, these PET constructs are purified using streptavidin-magnetic beads. The purified PETs are ligated with next-generation sequencing adaptors and a catalog of interacting fragments is generated via next-generation sequencers such as the Illumina Genome Analyzer. Mapping and bioinformatics analysis is then performed to identify ChIP-enriched binding sites and ChIP-enriched chromatin interactions 8. We have produced a video to demonstrate critical aspects of the ChIA-PET protocol, especially the preparation of ChIP as the quality of ChIP plays a major role in the outcome of a ChIA-PET library. As the protocols are very long, only the critical steps are shown in the video. PMID:22564980

  8. Flavonoid biosynthesis genes putatively identified in the aromatic plant Polygonum minus via Expressed Sequences Tag (EST) analysis.

    PubMed

    Roslan, Nur Diyana; Yusop, Jastina Mat; Baharum, Syarul Nataqain; Othman, Roohaida; Mohamed-Hussein, Zeti-Azura; Ismail, Ismanizan; Noor, Normah Mohd; Zainal, Zamri

    2012-01-01

    P. minus is an aromatic plant, the leaf of which is widely used as a food additive and in the perfume industry. The leaf also accumulates secondary metabolites that act as active ingredients, such as flavonoids. Due to limited genomic and transcriptomic data, the biosynthetic pathway of flavonoids is currently unclear. Identification of candidate genes involved in the flavonoid biosynthetic pathway will significantly contribute to understanding the biosynthesis of active compounds. We have constructed a standard cDNA library from P. minus leaves, and two normalized full-length enriched cDNA libraries were constructed from stem and root organs in order to create a gene resource for the biosynthesis of secondary metabolites, especially flavonoid biosynthesis. Large-scale sequencing of P. minus cDNA libraries identified 4196 expressed sequence tags (ESTs), which were deposited in dbEST at the National Center for Biotechnology Information (NCBI). From the three constructed cDNA libraries, 11 ESTs encoding seven genes were mapped to the flavonoid biosynthetic pathway. Finally, three flavonoid biosynthetic pathway-related ESTs, chalcone synthase, CHS (JG745304), flavonol synthase, FLS (JG705819) and leucoanthocyanidin dioxygenase, LDOX (JG745247), were selected for further examination by quantitative RT-PCR (qRT-PCR) in different P. minus organs. Expression was detected in leaf, stem and root. Gene expression studies have been initiated in order to better understand the underlying physiological processes.

  9. Sequence-tagged-site (STS) markers of arbitrary genes: development, characterization and analysis of linkage in black spruce.

    PubMed Central

    Perry, D J; Bousquet, J

    1998-01-01

    Sequence-tagged-site (STS) markers of arbitrary genes were investigated in black spruce [Picea mariana (Mill.) B.S.P.]. Thirty-nine pairs of PCR primers were used to screen diverse panels of haploid and diploid DNAs for variation that could be detected by standard agarose gel electrophoresis without further manipulation of amplification products. Codominant length polymorphisms were revealed at 15 loci. Three of these loci also had null amplification alleles as did 3 other loci that had no apparent product-length variation. Dominant length polymorphisms were observed at 2 other loci. Alleles of codominant markers differed in size by as little as 1 bp to as much as an estimated 175 bp with nearly all insertions/deletions found in noncoding regions. Polymorphisms at 3 loci involved large (33 bp to at least 114 bp) direct repeats and similar repeats were found in 7 of 51 cDNAs sequenced. Allelic segregation was in accordance with Mendelian inheritance and linkage was detected for 5 of 63 pairwise combinations of loci tested. Codominant STS markers of 12 loci revealed an average heterozygosity of 0.26 and an average of 2.8 alleles in a range-wide sample of 22 trees. PMID:9611216

  10. Generation and Analysis of Expressed Sequence Tags from Chimonanthus praecox (Wintersweet) Flowers for Discovering Stress-Responsive and Floral Development-Related Genes

    PubMed Central

    Sui, Shunzhao; Luo, Jianghui; Ma, Jing; Zhu, Qinlong; Lei, Xinghua; Li, Mingyang

    2012-01-01

    A complementary DNA library was constructed from the flowers of Chimonanthus praecox, an ornamental perennial shrub blossoming in winter in China. Eight hundred sixty-seven high-quality expressed sequence tag sequences with an average read length of 673.8 bp were acquired. A nonredundant set of 479 unigenes, including 94 contigs and 385 singletons, was identified after the expressed sequence tags were clustered and assembled. BLAST analysis against the nonredundant protein database and nonredundant nucleotide database revealed that 405 unigenes shared significant homology with known genes. The homologous unigenes were categorized according to Gene Ontology hierarchies (biological, cellular, and molecular). By BLAST analysis and Gene Ontology annotation, 95 unigenes involved in stress and defense and 19 unigenes related to floral development were identified based on existing knowledge. Twelve genes, of which 9 were annotated as “cold response,” were examined by real-time RT-PCR to understand the changes in expression patterns under cold stress and to validate the findings. Fourteen genes, including 11 genes related to floral development, were also detected by real-time RT-PCR to validate the expression patterns in the blooming process and in different tissues. This study provides a useful basis for the genomic analysis of C. praecox. PMID:22536115

  11. Identification and isolation of full-length cDNA sequences by sequencing and analysis of expressed sequence tags from guarana (Paullinia cupana).

    PubMed

    Figueirêdo, L C; Faria-Campos, A C; Astolfi-Filho, S; Azevedo, J L

    2011-06-21

    The current intense production of biological data, generated by sequencing techniques, has created an ever-growing volume of unanalyzed data. We reevaluated data produced by the guarana (Paullinia cupana) transcriptome sequencing project to identify cDNA clones with complete coding sequences (full-length clones) and complete sequences of genes of biotechnological interest, contributing to the knowledge of biological characteristics of this organism. We analyzed 15,490 ESTs of guarana in search of clones with complete coding regions. A total of 12,402 sequences were analyzed using BLAST, and 4697 full-length clones were identified, responsible for the production of 2297 different proteins. Eighty-four clones were identified as full-length for N-methyltransferase and 18 were sequenced in both directions to obtain the complete genome sequence, and confirm the search made in silico for full-length clones. Phylogenetic analyses were made with the complete genome sequences of three clones, which showed only 0.017% dissimilarity; these are phylogenetically close to the caffeine synthase of Theobroma cacao. The search for full-length clones allowed the identification of numerous clones that had the complete coding region, demonstrating this to be an efficient and useful tool in the process of biological data mining. The sequencing of the complete coding region of identified full-length clones corroborated the data from the in silico search, strengthening its efficiency and utility.

  12. Exploring the Host Parasitism of the Migratory Plant-Parasitic Nematode Ditylenchus destuctor by Expressed Sequence Tags Analysis

    PubMed Central

    Peng, Huan; Gao, Bing-li; Kong, Ling-an; Yu, Qing; Huang, Wen-kun; He, Xu-feng; Long, Hai-bo; Peng, De-liang

    2013-01-01

    The potato rot nematode, Ditylenchus destructor, is a very destructive nematode pest on many agriculturally important crops worldwide, but the molecular characterization of its parasitism of plants has been limited. The effectors involved in nematode parasitism of plants for several sedentary endo-parasitic nematodes such as Heterodera glycines, Globodera rostochiensis and Meloidogyne incognita have been identified and extensively studied over the past two decades. Ditylenchus destructor, as a migratory plant parasitic nematode, has different feeding behavior, life cycle and host response. Comparing the transcriptome and parasitome among different types of plant-parasitic nematodes is the way to understand more fully the parasitic mechanism of plant nematodes. We undertook the approach of sequencing expressed sequence tags (ESTs) derived from a mixed stage cDNA library of D. destructor. This is the first study of D. destructor ESTs. A total of 9800 ESTs were grouped into 5008 clusters including 3606 singletons and 1402 multi-member contigs, representing a catalog of D. destructor genes. Implementing a bioinformatics workflow, we found 1391 clusters have no match in the available gene database; 31 clusters only have similarities to genes identified from D. africanus, the most closely related species to D. destructor; 1991 clusters were annotated using Gene Ontology (GO); 1550 clusters were assigned enzyme commission (EC) numbers; and 1211 clusters were mapped to 181 KEGG biochemical pathways. 22 ESTs had similarities to reported nematode effectors. Interestingly, most of the effectors identified in this study are involved in host cell wall degradation or modification, such as 1,4-beta-glucanase, 1,3-beta-glucanase, pectate lyase, chitinases and expansin, or host defense suppression such as calreticulin, annexin and venom allergen-like protein. This result implies that the migratory plant-parasitic nematode D. destructor secretes similar effectors to those of sedentary

  13. Identification and analysis of safener-inducible expressed sequence tags in Populus using a cDNA microarray.

    PubMed

    Rishi, A S; Munir, Shirin; Kapur, Vivek; Nelson, Neil D; Goyal, Arun

    2004-12-01

    Safeners are chemicals used to protect plants from detrimental effects of herbicides, but their mode of action at the molecular level is not well understood. As an initial step towards understanding the molecular mechanism of safener action in trees, homologous genes in hybrid poplar (Populus nigra x Populus maximowiczii) that were induced by a safener were identified. We here describe the identification of differentially expressed genes in Populus that are induced by Concep-III, a herbicide safener. Expressed sequence tags (ESTs) enriched for transcriptionally induced genes were isolated by suppressive subtractive hybridization (SSH). The SSH library cDNA inserts were used to construct a cDNA microarray for high-throughput validation of the up-regulated expression of safener-induced genes. Single-pass and partial sequences of 1,344 safener-induced ESTs were assembled into 418 singletons and 328 clusters, but the putative functions of almost 53% of the ESTs are not known. Genes encoding proteins involved in all three different phases of safener action, viz., oxidation, conjugation, and sequestration, were found in the SSH library. Almost 75% of genes that showed greater than 2-fold expression upon safener treatment were redundant in the SSH library. The expression pattern for selected genes was validated by reverse transcription-polymerase chain reaction. A few safener-induced genes that were not previously reported to be induced by safeners, but which may have a role in herbicide metabolism, were identified. The newly identified genes could have potential for application in genetic engineering of plants for herbicide detoxification and tolerance.

  14. Exploring the host parasitism of the migratory plant-parasitic nematode Ditylenchus destuctor by expressed sequence tags analysis.

    PubMed

    Peng, Huan; Gao, Bing-li; Kong, Ling-an; Yu, Qing; Huang, Wen-kun; He, Xu-feng; Long, Hai-bo; Peng, De-liang

    2013-01-01

    The potato rot nematode, Ditylenchus destructor, is a very destructive nematode pest on many agriculturally important crops worldwide, but the molecular characterization of its parasitism of plants has been limited. The effectors involved in nematode parasitism of plants for several sedentary endo-parasitic nematodes such as Heterodera glycines, Globodera rostochiensis and Meloidogyne incognita have been identified and extensively studied over the past two decades. Ditylenchus destructor, as a migratory plant parasitic nematode, has different feeding behavior, life cycle and host response. Comparing the transcriptome and parasitome among different types of plant-parasitic nematodes is the way to understand more fully the parasitic mechanism of plant nematodes. We undertook the approach of sequencing expressed sequence tags (ESTs) derived from a mixed stage cDNA library of D. destructor. This is the first study of D. destructor ESTs. A total of 9800 ESTs were grouped into 5008 clusters including 3606 singletons and 1402 multi-member contigs, representing a catalog of D. destructor genes. Implementing a bioinformatics workflow, we found 1391 clusters have no match in the available gene database; 31 clusters only have similarities to genes identified from D. africanus, the most closely related species to D. destructor; 1991 clusters were annotated using Gene Ontology (GO); 1550 clusters were assigned enzyme commission (EC) numbers; and 1211 clusters were mapped to 181 KEGG biochemical pathways. 22 ESTs had similarities to reported nematode effectors. Interestingly, most of the effectors identified in this study are involved in host cell wall degradation or modification, such as 1,4-beta-glucanase, 1,3-beta-glucanase, pectate lyase, chitinases and expansin, or host defense suppression such as calreticulin, annexin and venom allergen-like protein. This result implies that the migratory plant-parasitic nematode D. destructor secretes similar effectors to those of sedentary

  15. Comparative analysis of expressed sequence tags (ESTs) between drought-tolerant and -susceptible genotypes of chickpea under terminal drought stress

    PubMed Central

    2011-01-01

    Background Chickpea (Cicer arietinum L.) is an important grain-legume crop that is mainly grown in rainfed areas, where terminal drought is a major constraint to its productivity. We generated expressed sequence tags (ESTs) by suppression subtraction hybridization (SSH) to identify differentially expressed genes in drought-tolerant and -susceptible genotypes in chickpea. Results EST libraries were generated by SSH from root and shoot tissues of ICC 4958 (drought tolerant) and ICC 1882 (drought susceptible) exposed to terminal drought conditions by the dry down method. SSH libraries were also constructed by using 2 sets of bulks prepared from the RNA of root tissues from selected recombinant inbred lines (RILs) (10 each) for the extreme high and low root biomass phenotype. A total of 3062 unigenes (638 contigs and 2424 singletons), 51.4% of which were novel in chickpea, were derived by cluster assembly and sequence alignment of 5949 ESTs. Only 2185 (71%) unigenes showed significant BLASTX similarity (<1E-06) in the NCBI non-redundant (nr) database. Gene ontology functional classification terms (BLASTX results and GO term) were retrieved for 2006 (92.0%) sequences, and 656 sequences were further annotated with 812 Enzyme Commission (EC) codes and were mapped to 108 different KEGG pathways. In addition, expression status of 830 unigenes in response to terminal drought stress was evaluated using macro-array (dot blots). The expression of a few selected genes was validated by northern blotting and quantitative real-time PCR assay. Conclusion Our study compares not only genes that are up- and down-regulated in a drought-tolerant genotype under terminal drought stress and a drought-susceptible genotype but also between the bulks of the selected RILs exhibiting extreme phenotypes. More than 50% of the genes identified have been shown to be associated with drought stress in chickpea for the first time. This study not only serves as a resource for marker discovery, but can provide

  16. De novo sequencing of highly modified therapeutic oligonucleotides by hydrophobic tag sequencing coupled with LC-MS.

    PubMed

    Goto, R; Miyakawa, S; Inomata, E; Takami, T; Yamaura, J; Nakamura, Y

    2017-02-01

    Correct sequences are prerequisite for quality control of therapeutic oligonucleotides. However, there is no definitive method available for determining sequences of highly modified therapeutic RNAs, and thereby, most of the oligonucleotides have been used clinically without direct sequence determination. In this study, we developed a novel sequencing method called 'hydrophobic tag sequencing'. Highly modified oligonucleotides are sequenced by partially digesting oligonucleotides conjugated with a 5'-hydrophobic tag, followed by liquid chromatography-mass spectrometry analysis. 5'-Hydrophobic tag-printed fragments (5'-tag degradates) can be separated in order of their molecular masses from tag-free oligonucleotides by reversed-phase liquid chromatography. As models for the sequencing, the anti-VEGF aptamer (Macugen) and the highly modified 38-mer RNA sequences were analyzed under blind conditions. Most nucleotides were identified from the molecular weight of hydrophobic 5'-tag degradates calculated from monoisotopic mass in simple full mass data. When monoisotopic mass could not be assigned, the nucleotide was estimated using the molecular weight of the most abundant mass. The sequences of Macugen and 38-mer RNA perfectly matched the theoretical sequences. The hydrophobic tag sequencing worked well to obtain simple full mass data, resulting in accurate and clear sequencing. The present study provides for the first time a de novo sequencing technology for highly modified RNAs and contributes to quality control of therapeutic oligonucleotides. Copyright © 2016 John Wiley & Sons, Ltd.

  17. Generation, Annotation and Analysis of First Large-Scale Expressed Sequence Tags from Developing Fiber of Gossypium barbadense L

    PubMed Central

    Yuan, Daojun; Tu, Lili; Zhang, Xianlong

    2011-01-01

    Background Cotton fiber is the world's leading natural fiber used in the manufacture of textiles. Gossypium is also the model plant in the study of polyploidization, evolution, cell elongation, cell wall development, and cellulose biosynthesis. G. barbadense L. is an ideal candidate for providing new genetic variations useful to improve fiber quality for its superior properties. However, little is known about fiber development mechanisms of G. barbadense and only a few molecular resources are available in GenBank. Methodology and Principal Findings In total, 10,979 high-quality expressed sequence tags (ESTs) were generated from a normalized fiber cDNA library of G. barbadense. The ESTs were clustered and assembled into 5852 unigenes, consisting of 1492 contigs and 4360 singletons. The blastx result showed 2165 unigenes with significant similarity to known genes and 2687 unigenes with significant similarity to genes of predicted proteins. Functional classification revealed that unigenes were abundant in the functions of binding, catalytic activity, and metabolic pathways of carbohydrate, amino acid, energy, and lipids. The function motif/domain-related cytoskeleton and redox homeostasis were enriched. Among the 5852 unigenes, 282 and 736 unigenes were identified as potential cell wall biosynthesis and transcription factors, respectively. Furthermore, the relationships among cotton species or between cotton and other model plant systems were analyzed. Some putative species-specific unigenes of G. barbadense were highlighted. Conclusions/Significance The ESTs generated in this study are from the first large-scale EST project for G. barbadense and significantly enhance the number of G. barbadense ESTs in public databases. This knowledge will contribute to cotton improvements by studying fiber development mechanisms of G. barbadense, establishing a breeding program using marker-assisted selection, and discovering candidate genes related to important agronomic traits of

  18. Comparative analysis of expressed sequence tag (EST) libraries in the seagrass Zostera marina subjected to temperature stress.

    PubMed

    Reusch, Thorsten B H; Veron, Amelie S; Preuss, Christoph; Weiner, January; Wissler, Lothar; Beck, Alfred; Klages, Sven; Kube, Michael; Reinhardt, Richard; Bornberg-Bauer, Erich

    2008-01-01

    Global warming is associated with increasing stress and mortality on temperate seagrass beds, in particular during periods of high sea surface temperatures during summer months, adding to existing anthropogenic impacts, such as eutrophication and habitat destruction. We compare several expressed sequence tag (EST) libraries in the ecologically important seagrass Zostera marina (eelgrass) to elucidate the molecular genetic basis of adaptation to environmental extremes. We compared the tentative unigene (TUG) frequencies of libraries derived from leaf and meristematic tissue from a control situation with two experimentally imposed temperature stress conditions and found that TUG composition is markedly different among these conditions (all P < 0.0001). Under heat stress, we find that 63 TUGs are differentially expressed (d.e.) at 25 degrees C compared with lower, no-stress condition temperatures (4 degrees C and 17 degrees C). Approximately one-third of d.e. eelgrass genes were characteristic for the stress response of the terrestrial plant model Arabidopsis thaliana. The changes in gene expression suggest complex photosynthetic adjustments among light-harvesting complexes, reaction center subunits of photosystem I and II, and components of the dark reaction. Heat shock encoding proteins and reactive oxygen scavengers also were identified, but their overall frequency was too low to perform statistical tests. In all conditions, the most abundant transcript (3-15%) was a putative metallothionein gene with unknown function. We also find evidence that heat stress may translate to enhanced infection by protists. A total of 210 TUGs contain one or more microsatellites as potential candidates for gene-linked genetic markers. Data are publicly available in a user-friendly database at http://www.uni-muenster.de/Evolution/ebb/Services/zostera.

  19. Expressed sequence-tag analysis of ovaries of Brachiaria brizantha reveals genes associated with the early steps of embryo sac differentiation of apomictic plants.

    PubMed

    Silveira, Erica Duarte; Guimarães, Larissa Arrais; de Alencar Dusi, Diva Maria; da Silva, Felipe Rodrigues; Martins, Natália Florencio; do Carmo Costa, Marcos Mota; Alves-Ferreira, Márcio; de Campos Carneiro, Vera Tavares

    2012-02-01

    In apomixis, an asexual mode of plant reproduction through seeds, an unreduced megagametophyte is formed due to circumvented or altered meiosis. The embryo develops autonomously from the unreduced egg cell, independently of fertilization. Brachiaria is a genus of tropical forage grasses that reproduces sexually or by apomixis. A limited number of studies have reported the sequencing of apomixis-related genes and a few Brachiaria sequences have been deposited at genebank databases. This work shows sequencing and expression analyses of expressed sequence-tags (ESTs) of the Brachiaria genus and points to transcripts from ovaries with preferential expression at megasporogenesis in apomictic plants. From the 11 differentially expressed sequences from immature ovaries of sexual and apomictic Brachiaria brizantha obtained from macroarray analysis, 9 were preferentially detected in ovaries of apomicts, as confirmed by RT-qPCR. A putative involvement in early steps of Panicum-type embryo sac differentiation of four sequences from B. brizantha ovaries: BbrizHelic, BbrizRan, BbrizSec13 and BbrizSti1 is suggested. Two of these, BbrizSti1 and BbrizHelic, with similarity to genes coding for a stress-induced protein and a helicase, respectively, are preferentially expressed in the early stages of apomictic ovary development, especially in the nucellus, at a stage prior to the differentiation of aposporous initials, as verified by in situ hybridization.

  20. Expressed sequence tag analysis of human RPE/choroid for the NEIBank Project: over 6000 non-redundant transcripts, novel genes and splice variants.

    PubMed

    Wistow, Graeme; Bernstein, Steven L; Wyatt, M Keith; Fariss, Robert N; Behal, Amita; Touchman, Jeffrey W; Bouffard, Gerald; Smith, Don; Peterson, Katherine

    2002-06-15

    The retinal pigment epithelium (RPE) and choroid comprise a functional unit of the eye that is essential to normal retinal health and function. Here we describe expressed sequence tag (EST) analysis of human RPE/choroid as part of a project for ocular bioinformatics. A cDNA library (cs) was made from human RPE/choroid and sequenced. Data were analyzed and assembled using the program GRIST (GRouping and Identification of Sequence Tags). Complete sequencing, Northern and Western blots, RH mapping, peptide antibody synthesis and immunofluorescence (IF) have been used to examine expression patterns and genome location for selected transcripts and proteins. Ten thousand individual sequence reads yield over 6300 unique gene clusters of which almost half have no matches with named genes. One of the most abundant transcripts is from a gene (named "alpha") that maps to the BBS1 region of chromosome 11. A number of tissue preferred transcripts are common to both RPE/choroid and iris. These include oculoglycan/opticin, for which an alternative splice form is detected in RPE/choroid, and "oculospanin" (Ocsp), a novel tetraspanin that maps to chromosome 17q. Antiserum to Ocsp detects expression in RPE, iris, ciliary body, and retinal ganglion cells by IF. A newly identified gene for a zinc-finger protein (TIRC) maps to 19q13.4. Variant transcripts of several genes were also detected. Most notably, the predominant form of Bestrophin represented in cs contains a longer open reading frame as a result of splice junction skipping. The unamplified cs library gives a view of the transcriptional repertoire of the adult RPE/choroid. A large number of potentially novel genes and splice forms and candidates for genetic diseases are revealed. Clones from this collection are being included in a large, nonredundant set for cDNA microarray construction.

  1. An analysis of expressed sequence tags of developing castor endosperm using a full-length cDNA library

    PubMed Central

    Lu, Chaofu; Wallis, James G; Browse, John

    2007-01-01

    Background Castor seeds are a major source for ricinoleate, an important industrial raw material. Genomics studies of castor plant will provide critical information for understanding seed metabolism, for effectively engineering ricinoleate production in transgenic oilseeds, or for genetically improving castor plants by eliminating toxic and allergic proteins in seeds. Results Full-length cDNAs are useful resources in annotating genes and in providing functional analysis of genes and their products. We constructed a full-length cDNA library from developing castor endosperm, and obtained 4,720 ESTs from 5'-ends of the cDNA clones representing 1,908 unique sequences. The most abundant transcripts are genes encoding storage proteins, ricin, agglutinin and oleosins. Several other sequences are also very numerous, including two acidic triacylglycerol lipases, and the oleate hydroxylase (FAH12) gene that is responsible for ricinoleate biosynthesis. The role(s) of the lipases in developing castor seeds are not clear, and co-expression of a lipase and the FAH12 did not result in significant changes in hydroxy fatty acid accumulation in transgenic Arabidopsis seeds. Only one oleate desaturase (FAD2) gene was identified in our cDNA sequences. Sequence and functional analyses of the castor FAD2 were carried out since it had not been characterized previously. Overexpression of castor FAD2 in a FAH12-expressing Arabidopsis line resulted in decreased accumulation of hydroxy fatty acids in transgenic seeds. Conclusion Our results suggest that transcriptional regulation of FAD2 and FAH12 genes may be one of the mechanisms that contribute to a high level of ricinoleate accumulation in castor endosperm. The full-length cDNA library will be used to search for additional genes that affect ricinoleate accumulation in seed oils. Our EST sequences will also be useful to annotate the castor genome, whose whole sequence is being generated by shotgun sequencing at the Institute for Genome

  2. Analysis of expressed sequence tags generated from full-length enriched cDNA libraries of melon.

    PubMed

    Clepet, Christian; Joobeur, Tarek; Zheng, Yi; Jublot, Delphine; Huang, Mingyun; Truniger, Veronica; Boualem, Adnane; Hernandez-Gonzalez, Maria Elena; Dolcet-Sanjuan, Ramon; Portnoy, Vitaly; Mascarell-Creus, Albert; Caño-Delgado, Ana I; Katzir, Nurit; Bendahmane, Abdelhafid; Giovannoni, James J; Aranda, Miguel A; Garcia-Mas, Jordi; Fei, Zhangjun

    2011-05-20

    Melon (Cucumis melo), an economically important vegetable crop, belongs to the Cucurbitaceae family which includes several other important crops such as watermelon, cucumber, and pumpkin. It has served as a model system for sex determination and vascular biology studies. However, genomic resources currently available for melon are limited. We constructed eleven full-length enriched and four standard cDNA libraries from fruits, flowers, leaves, roots, cotyledons, and calluses of four different melon genotypes, and generated 71,577 and 22,179 ESTs from full-length enriched and standard cDNA libraries, respectively. These ESTs, together with ~35,000 ESTs available in public domains, were assembled into 24,444 unigenes, which were extensively annotated by comparing their sequences to different protein and functional domain databases, assigning them Gene Ontology (GO) terms, and mapping them onto metabolic pathways. Comparative analysis of melon unigenes and other plant genomes revealed that 75% to 85% of melon unigenes had homologs in other dicot plants, while approximately 70% had homologs in monocot plants. The analysis also identified 6,972 gene families that were conserved across dicot and monocot plants, and 181, 1,192, and 220 gene families specific to fleshy fruit-bearing plants, the Cucurbitaceae family, and melon, respectively. Digital expression analysis identified a total of 175 tissue-specific genes, which provides a valuable gene sequence resource for future genomics and functional studies. Furthermore, we identified 4,068 simple sequence repeats (SSRs) and 3,073 single nucleotide polymorphisms (SNPs) in the melon EST collection. Finally, we obtained a total of 1,382 melon full-length transcripts through the analysis of full-length enriched cDNA clones that were sequenced from both ends. Analysis of these full-length transcripts indicated that sizes of melon 5' and 3' UTRs were similar to those of tomato, but longer than many other dicot plants. Codon

  3. Analysis of expressed sequence tags generated from full-length enriched cDNA libraries of melon

    PubMed Central

    2011-01-01

    Background Melon (Cucumis melo), an economically important vegetable crop, belongs to the Cucurbitaceae family which includes several other important crops such as watermelon, cucumber, and pumpkin. It has served as a model system for sex determination and vascular biology studies. However, genomic resources currently available for melon are limited. Result We constructed eleven full-length enriched and four standard cDNA libraries from fruits, flowers, leaves, roots, cotyledons, and calluses of four different melon genotypes, and generated 71,577 and 22,179 ESTs from full-length enriched and standard cDNA libraries, respectively. These ESTs, together with ~35,000 ESTs available in public domains, were assembled into 24,444 unigenes, which were extensively annotated by comparing their sequences to different protein and functional domain databases, assigning them Gene Ontology (GO) terms, and mapping them onto metabolic pathways. Comparative analysis of melon unigenes and other plant genomes revealed that 75% to 85% of melon unigenes had homologs in other dicot plants, while approximately 70% had homologs in monocot plants. The analysis also identified 6,972 gene families that were conserved across dicot and monocot plants, and 181, 1,192, and 220 gene families specific to fleshy fruit-bearing plants, the Cucurbitaceae family, and melon, respectively. Digital expression analysis identified a total of 175 tissue-specific genes, which provides a valuable gene sequence resource for future genomics and functional studies. Furthermore, we identified 4,068 simple sequence repeats (SSRs) and 3,073 single nucleotide polymorphisms (SNPs) in the melon EST collection. Finally, we obtained a total of 1,382 melon full-length transcripts through the analysis of full-length enriched cDNA clones that were sequenced from both ends. Analysis of these full-length transcripts indicated that sizes of melon 5' and 3' UTRs were similar to those of tomato, but longer than many other dicot

  4. Identification of stress-induced genes from the drought-tolerant plant Prosopis juliflora (Swartz) DC. through analysis of expressed sequence tags.

    PubMed

    George, Suja; Venkataraman, Gayatri; Parida, Ajay

    2007-05-01

    Abiotic stresses such as cold, salinity, drought, wounding, and heavy metal contamination adversely affect crop productivity throughout the world. Prosopis juliflora is a phreatophyte that can tolerate severe adverse environmental conditions such as drought, salinity, and heavy metal contamination. As a first step towards the characterization of genes that contribute to combating abiotic stress, construction and analysis of a cDNA library of P. juliflora genes is reported here. Random expressed sequence tag (EST) sequencing of 1750 clones produced 1467 high-quality reads. These clones were classified into functional categories, and BLAST comparisons revealed that 114 clones were homologous to genes implicated in stress response(s) and included heat shock proteins, metallothioneins, lipid transfer proteins, and late embryogenesis abundant proteins. Of the ESTs analyzed, 26% showed homology to previously uncharacterized genes in the databases. Fifty-two clones from this category were selected for reverse Northern analysis: 21 were shown to be upregulated and 16 downregulated. The results obtained by reverse Northern analysis were confirmed by Northern analysis. Clustering of the 1467 ESTs produced a total of 295 contigs encompassing 790 ESTs, resulting in a 54.2% redundancy. Two of the abundant genes coding for a nonspecific lipid transfer protein and late embryogenesis abundant protein were sequenced completely. Northern analysis (after polyethylene glycol stress) of the 2 genes was carried out. The implications of the analyzed genes in abiotic stress tolerance are also discussed.

  5. Expressed sequence tag analysis of adult human lens for the NEIBank Project: over 2000 non-redundant transcripts, novel genes and splice variants.

    PubMed

    Wistow, Graeme; Bernstein, Steven L; Wyatt, M Keith; Behal, Amita; Touchman, Jeffrey W; Bouffard, Gerald; Smith, Don; Peterson, Katherine

    2002-06-15

    To explore the expression profile of the human lens and to provide a resource for microarray studies, expressed sequence tag (EST) analysis has been performed on cDNA libraries from adult lenses. A cDNA library was constructed from two adult (40 year old) human lenses. Over two thousand clones were sequenced from the unamplified, un-normalized library. The library was then normalized and a further 2200 sequences were obtained. All the data were analyzed using GRIST (GRouping and Identification of Sequence Tags), a procedure for gene identification and clustering. The lens library (by) contains a low percentage of non-mRNA contaminants and a high fraction (over 75%) of apparently full length cDNA clones. Approximately 2000 reads from the unamplified library yield 810 clusters, potentially representing individual genes expressed in the lens. After normalization, the content of crystallins and other abundant cDNAs is markedly reduced and a similar number of reads from this library (fs) yields 1455 unique groups of which only two thirds correspond to named genes in GenBank. Among the most abundant cDNAs is one for a novel gene related to glutamine synthetase, which was designated "lengsin" (LGS). Analyses of ESTs also reveal examples of alternative transcripts, including a major alternative splice form for the lens specific membrane protein MP19. Variant forms for other transcripts, including those encoding the apoptosis inhibitor Livin and the armadillo repeat protein ARVCF, are also described. The lens cDNA libraries are a resource for gene discovery, full length cDNAs for functional studies and microarrays. The discovery of an abundant, novel transcript, lengsin, and a major novel splice form of MP19 reflects the utility of unamplified libraries constructed from dissected tissue. Many novel transcripts and splice forms are represented, some of which may be candidates for genetic diseases.

  6. Construction of cDNA library and preliminary analysis of expressed sequence tags from green microalga Ankistrodesmus convolutus Corda.

    PubMed

    Thanh, Tran; Chi, Vu Thi Quynh; Abdullah, Mohd Puad; Omar, Hishamuddin; Noroozi, Mostafa; Ky, Huynh; Napis, Suhaimi

    2011-01-01

    The green microalga Ankistrodesmus convolutus Corda is a fast-growing alga which produces appreciable amounts of carotenoids and polyunsaturated fatty acids. To our knowledge, this is the first report on the construction of a cDNA library and preliminary analysis of ESTs for this species. The titers of the primary and amplified cDNA libraries were 1.1×10^6 and 6.0×10^9 pfu/ml, respectively. The percentage of recombinants was 97% in the primary library and a total of 337 out of 415 original cDNA clones selected randomly contained inserts ranging from 600 to 1,500 bp. A total of 201 individual ESTs with sizes ranging from 390 to 1,038 bp were then analyzed and the BLASTX score revealed that 35.8% of the sequences were classified as strong match, 38.3% as nominal and 25.9% as weak match. Among the ESTs with known putative function, 21.4% of them were found to be related to gene expression, 14.4% ESTs to photosynthesis, 10.9% ESTs to metabolism, 5.5% ESTs to miscellaneous, 2.0% to stress response, and the remaining 45.8% were classified as novel genes. Analysis of ESTs described in this paper can be an effective approach to isolate and characterize new genes from A. convolutus and thus the sequences obtained represent a significant contribution to the extensive database of sequences from green microalgae.

  7. Application of the High Resolution Melting analysis for genetic mapping of Sequence Tagged Site markers in narrow-leafed lupin (Lupinus angustifolius L.).

    PubMed

    Kamel, Katarzyna A; Kroc, Magdalena; Święcicki, Wojciech

    2015-01-01

    Sequence tagged site (STS) markers are valuable tools for genetic and physical mapping that can be successfully used in comparative analyses among related species. Current challenges for molecular markers genotyping in plants include the lack of fast, sensitive and inexpensive methods suitable for sequence variant detection. In contrast, high resolution melting (HRM) is a simple and high-throughput assay, which has been widely applied in sequence polymorphism identification as well as in the studies of genetic variability and genotyping. The present study is the first attempt to use the HRM analysis to genotype STS markers in narrow-leafed lupin (Lupinus angustifolius L.). The sensitivity and utility of this method was confirmed by the sequence polymorphism detection based on melting curve profiles in the parental genotypes and progeny of the narrow-leafed lupin mapping population. Application of different approaches, including amplicon size and a simulated heterozygote analysis, has allowed for successful genetic mapping of 16 new STS markers in the narrow-leafed lupin genome.

  8. Diversity Analysis in Cannabis sativa Based on Large-Scale Development of Expressed Sequence Tag-Derived Simple Sequence Repeat Markers

    PubMed Central

    Cheng, Chaohua; Tang, Qing; Chen, Ping; Wang, Changbiao; Zang, Gonggu; Zhao, Lining

    2014-01-01

    Cannabis sativa L. is an important economic plant for the production of food, fiber, oils, and intoxicants. However, lack of sufficient simple sequence repeat (SSR) markers has limited the development of cannabis genetic research. Here, large-scale development of expressed sequence tag simple sequence repeat (EST-SSR) markers was performed to obtain more informative genetic markers, and to assess genetic diversity in cannabis (Cannabis sativa L.). Based on the cannabis transcriptome, 4,577 SSRs were identified from 3,624 ESTs. From there, a total of 3,442 complementary primer pairs were designed as SSR markers. Among these markers, trinucleotide repeat motifs (50.99%) were the most abundant, followed by hexanucleotide (25.13%), dinucleotide (16.34%), tetranucleotide (3.8%), and pentanucleotide (3.74%) repeat motifs, respectively. The AAG/CTT trinucleotide repeat (17.96%) was the most abundant motif detected in the SSRs. One hundred and seventeen EST-SSR markers were randomly selected to evaluate primer quality in 24 cannabis varieties. Among these 117 markers, 108 (92.31%) were successfully amplified and 87 (74.36%) were polymorphic. Forty-five polymorphic primer pairs were selected to evaluate genetic diversity and relatedness among the 115 cannabis genotypes. The results showed that 115 varieties could be divided into 4 groups primarily based on geography: Northern China, Europe, Central China, and Southern China. Moreover, the coefficient of similarity when comparing cannabis from Northern China with the European group cannabis was higher than that when comparing with cannabis from the other two groups, owing to a similar climate. This study outlines the first large-scale development of SSR markers for cannabis. These data may serve as a foundation for the development of genetic linkage, quantitative trait loci mapping, and marker-assisted breeding of cannabis. PMID:25329551

  9. Diversity analysis in Cannabis sativa based on large-scale development of expressed sequence tag-derived simple sequence repeat markers.

    PubMed

    Gao, Chunsheng; Xin, Pengfei; Cheng, Chaohua; Tang, Qing; Chen, Ping; Wang, Changbiao; Zang, Gonggu; Zhao, Lining

    2014-01-01

    Cannabis sativa L. is an important economic plant for the production of food, fiber, oils, and intoxicants. However, lack of sufficient simple sequence repeat (SSR) markers has limited the development of cannabis genetic research. Here, large-scale development of expressed sequence tag simple sequence repeat (EST-SSR) markers was performed to obtain more informative genetic markers, and to assess genetic diversity in cannabis (Cannabis sativa L.). Based on the cannabis transcriptome, 4,577 SSRs were identified from 3,624 ESTs. From there, a total of 3,442 complementary primer pairs were designed as SSR markers. Among these markers, trinucleotide repeat motifs (50.99%) were the most abundant, followed by hexanucleotide (25.13%), dinucleotide (16.34%), tetranucleotide (3.8%), and pentanucleotide (3.74%) repeat motifs, respectively. The AAG/CTT trinucleotide repeat (17.96%) was the most abundant motif detected in the SSRs. One hundred and seventeen EST-SSR markers were randomly selected to evaluate primer quality in 24 cannabis varieties. Among these 117 markers, 108 (92.31%) were successfully amplified and 87 (74.36%) were polymorphic. Forty-five polymorphic primer pairs were selected to evaluate genetic diversity and relatedness among the 115 cannabis genotypes. The results showed that 115 varieties could be divided into 4 groups primarily based on geography: Northern China, Europe, Central China, and Southern China. Moreover, the coefficient of similarity when comparing cannabis from Northern China with the European group cannabis was higher than that when comparing with cannabis from the other two groups, owing to a similar climate. This study outlines the first large-scale development of SSR markers for cannabis. These data may serve as a foundation for the development of genetic linkage, quantitative trait loci mapping, and marker-assisted breeding of cannabis.

  10. Analysis of expression sequence tags from a full-length-enriched cDNA library of developing sesame seeds (Sesamum indicum)

    PubMed Central

    2011-01-01

    Background Sesame (Sesamum indicum) is one of the most important oilseed crops with high oil content and rich nutrient value. However, genetic improvement efforts in sesame have not benefited from molecular biology technology due to poor DNA and RNA sequence resources. In this study, we carried out large-scale expressed sequence tag (EST) sequencing from developing sesame seeds and further conducted analysis on seed storage products-related genes. Results A normalized and full-length enriched cDNA library from 5 ~ 30 days old immature seeds was constructed and randomly sequenced, leading to generation of 41,248 expressed sequence tags (ESTs) which then formed 4,713 contigs and 27,708 singletons with 44.9% uniESTs being putative full-length open reading frames. Approximately 26,091 of all these uniESTs have significant matches to the counterparts in Nr database of GenBank, and 21,628 of them were assigned to one or more Gene ontology (GO) terms. Homologous genes involved in oil biosynthesis were identified including some conservative transcription factors regulating oil biosynthesis such as LEAFY COTYLEDON1 (LEC1), PICKLE (PKL), WRINKLED1 (WRI1) and the majority of them were found for the first time in sesame seeds. One hundred and seventeen ESTs were identified as possibly involved in biosynthesis of sesame lignans, sesamin and sesamolin. In total, 9,347 putative functional genes from developing seeds were identified, which accounts for one third of total genes in the sesame genome. Further analysis of the uniESTs identified 1,949 non-redundant simple sequence repeats (SSRs). Conclusions This study has provided an overview of genes expressed during sesame seed development. This collection of sesame full-length cDNAs covered a wide variety of genes in seeds, in particular, candidate genes involved in biosynthesis of sesame oils and lignans. These EST sequences enriched with full length will contribute to comparative genomic studies on sesame and other oilseed plants

  11. Analysis of expression sequence tags from a full-length-enriched cDNA library of developing sesame seeds (Sesamum indicum).

    PubMed

    Ke, Tao; Dong, Caihua; Mao, Han; Zhao, Yingzhong; Chen, Hong; Liu, Hongyan; Dong, Xuyan; Tong, Chaobo; Liu, Shengyi

    2011-12-24

    Sesame (Sesamum indicum) is one of the most important oilseed crops, with high oil content and rich nutritional value. However, genetic improvement efforts in sesame have not benefited from molecular biology technology because of poor DNA and RNA sequence resources. In this study, we carried out large-scale expressed sequence tag (EST) sequencing from developing sesame seeds and further analyzed genes related to seed storage products. A normalized and full-length-enriched cDNA library from 5- to 30-day-old immature seeds was constructed and randomly sequenced, generating 41,248 expressed sequence tags (ESTs), which then formed 4,713 contigs and 27,708 singletons, with 44.9% of the uniESTs being putative full-length open reading frames. Approximately 26,091 of these uniESTs have significant matches to counterparts in the Nr database of GenBank, and 21,628 of them were assigned to one or more Gene Ontology (GO) terms. Homologous genes involved in oil biosynthesis were identified, including some conserved transcription factors regulating oil biosynthesis such as LEAFY COTYLEDON1 (LEC1), PICKLE (PKL), and WRINKLED1 (WRI1), and the majority of them were found for the first time in sesame seeds. A total of 117 ESTs possibly involved in the biosynthesis of the sesame lignans sesamin and sesamolin were identified. In total, 9,347 putative functional genes from developing seeds were identified, which accounts for one third of the total genes in the sesame genome. Further analysis of the uniESTs identified 1,949 non-redundant simple sequence repeats (SSRs). This study has provided an overview of genes expressed during sesame seed development. This collection of sesame full-length cDNAs covered a wide variety of genes in seeds, in particular candidate genes involved in the biosynthesis of sesame oils and lignans. These EST sequences enriched with full length will contribute to comparative genomic studies on sesame and other oilseed plants and serve as an abundant

  12. Development of Microsatellite Markers Derived from Expressed Sequence Tags of Polyporales for Genetic Diversity Analysis of Endangered Polyporus umbellatus

    PubMed Central

    Zhang, Yuejin; Chen, Yuanyuan; Wang, Ruihong; Zeng, Ailin; Deyholos, Michael K.; Shu, Jia; Guo, Hongbo

    2015-01-01

    A large set of EST sequences of Polyporales was screened in this investigation in order to identify EST-SSR markers for various applications. The distribution of EST sequences and SSRs in five families of Polyporales was analyzed. Mononucleotide repeats were the most abundant type, followed by trinucleotide repeats. Among the five families, Ganodermataceae contained the most SSR markers, followed by Coriolaceae. Functional prediction of SSR marker-containing EST sequences in Ganoderma lucidum yielded three main groups, namely, cellular component, biological process, and molecular function. Thirty EST-SSR primers were designed to evaluate the genetic diversity of 13 natural Polyporus umbellatus accessions. Twenty-one EST-SSRs were polymorphic, with an average PIC value of 0.33 and a transferability rate of 71%. These 13 P. umbellatus accessions showed relatively high genetic diversity. The expected heterozygosity, Nei's gene diversity, and Shannon information index were 0.41, 0.39, and 0.57, respectively. Both the UPGMA dendrogram and principal coordinate analysis (PCA) gave the same clustering, dividing the 13 accessions into three or four groups. PMID:26146636
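
    The diversity statistics named in this abstract (PIC, gene diversity, Shannon information index) can be illustrated with a minimal sketch. It uses the textbook single-locus definitions computed from allele frequencies; the software, sample-size corrections, and allele data used by the authors are not given in the abstract, so the function names and the example frequencies below are assumptions for illustration only.

```python
import math


def gene_diversity(freqs):
    """Uncorrected gene diversity at one locus: 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """Polymorphism information content:
    1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2."""
    homozygosity = sum(p * p for p in freqs)
    cross_term = sum(
        2.0 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    return 1.0 - homozygosity - cross_term


def shannon_index(freqs):
    """Shannon information index: -sum(p_i * ln p_i), skipping absent alleles."""
    return -sum(p * math.log(p) for p in freqs if p > 0)


# Hypothetical locus with two equally frequent alleles (not data from the study).
example = [0.5, 0.5]
print(gene_diversity(example), pic(example), shannon_index(example))
```

    Per-locus values like these would typically be averaged over all polymorphic markers to obtain summary figures of the kind reported above.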

  13. Expressed sequence tag analysis and development of gene associated markers in a near-isogenic plant system of Eragrostis curvula.

    PubMed

    Cervigni, Gerardo D L; Paniego, Norma; Díaz, Marina; Selva, Juan P; Zappacosta, Diego; Zanazzi, Darío; Landerreche, Iñaki; Martelotto, Luciano; Felitti, Silvina; Pessino, Silvina; Spangenberg, Germán; Echenique, Viviana

    2008-05-01

    Eragrostis curvula (Schrad.) Nees is a forage grass native to the semiarid regions of Southern Africa, which reproduces mainly by pseudogamous diplosporous apomixis. A collection of ESTs was generated from four cDNA libraries, three of them obtained from panicles of near-isogenic lines with different ploidy levels and reproductive modes, and one obtained from leaves of 12-day-old plants. A total of 12,295 high-quality ESTs were clustered and assembled, rendering 8,864 unigenes, including 1,490 contigs and 7,394 singletons, with a genome coverage of 22%. A total of 7,029 (79.11%) unigenes were functionally categorized by BLASTX analysis against sequences deposited in public databases, but only 37.80% could be classified according to Gene Ontology. Sequence comparison against the cereal gene indices (GI) revealed 50% significant hits. A total of 254 EST-SSRs were detected, 219 from singletons and 35 from contigs. Di- and trinucleotide motifs were similarly represented, with percentages of 38.95 and 40.16%, respectively. In addition, 190 SNPs and indels were detected in 18 contigs generated from 3 to 4 libraries. The ESTs and the molecular markers obtained in this study will provide valuable resources for a wide range of applications including gene identification, genetic mapping, cultivar identification, analysis of genetic diversity, phenotype mapping and marker-assisted selection.
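
    As an illustration of how di- and trinucleotide EST-SSRs of the kind counted above can be located in a sequence, here is a minimal regular-expression sketch. The abstract does not state which tool or repeat thresholds were used, so the minimum repeat counts (6 for dinucleotide, 5 for trinucleotide motifs), the function name, and the test sequence are hypothetical.

```python
import re

# Hypothetical thresholds: motif length -> minimum number of tandem copies.
MIN_REPEATS = {2: 6, 3: 5}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, copies) tuples for perfect tandem repeats."""
    seq = seq.upper()
    hits = []
    for motif_len, copies in min_repeats.items():
        # One captured motif followed by at least (copies - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, copies - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if len(set(motif)) == 1:
                continue  # a run such as AAAAAA is a mononucleotide repeat
            hits.append((match.start(), motif, len(match.group(0)) // motif_len))
    return sorted(hits)


# Made-up sequence containing (AG)7 and (CAT)5.
demo = "TT" + "AG" * 7 + "TTC" + "CAT" * 5 + "G"
print(find_ssrs(demo))
```

    Dedicated SSR-mining programs also handle imperfect and compound repeats, which this sketch deliberately ignores.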

  14. Use of expressed sequence tag analysis and cDNA microarrays of the filamentous fungus Aspergillus nidulans.

    PubMed

    Sims, Andrew H; Robson, Geoffrey D; Hoyle, David C; Oliver, Stephen G; Turner, Geoffrey; Prade, Rolf A; Russell, Hugh H; Dunn-Coleman, Nigel S; Gent, Manda E

    2004-02-01

    The use of microarrays in the analysis of gene expression is becoming widespread for many organisms, including yeast. However, although the genomes of a number of filamentous fungi have been fully or partially sequenced, microarray analysis is still in its infancy in these organisms. Here, we describe the construction and validation of microarrays for the fungus Aspergillus nidulans using PCR products from a 4092 EST conidial germination library. An experiment was designed to validate these arrays by monitoring the expression profiles of known genes following the addition of 1% (w/v) glucose to wild-type A. nidulans cultures grown to mid-exponential phase in Vogel's minimal medium with ethanol as the sole carbon source. The profiles of genes showing statistically significant differential expression following the glucose up-shift are presented and an assessment of the quality and reproducibility of the A. nidulans arrays discussed.

  15. Accumulation, functional annotation, and comparative analysis of expressed sequence tags in eggplant (Solanum melongena L.), the third pole of the genus Solanum species after tomato and potato.

    PubMed

    Fukuoka, Hiroyuki; Yamaguchi, Hirotaka; Nunome, Tsukasa; Negoro, Satomi; Miyatake, Koji; Ohyama, Akio

    2010-01-15

    Eggplant (Solanum melongena L.) is a widely grown vegetable crop that belongs to the genus Solanum, which is comprised of more than 1000 species of wide genetic and phenotypic variation. Unlike tomato and potato, Solanum crops that belong to subgenus Potatoe and have been targets for comprehensive genomic studies, eggplant is endemic to the Old World and belongs to a different subgenus, Leptostemonum, and therefore, would be a unique member for comparative molecular biology in Solanum. In this study, more than 60,000 eggplant cDNA clones from various tissues and treatments were sequenced from both the 5'- and 3'-ends, and a unigene set consisting of 16,245 unique sequences was constructed. Functional annotations based on sequence similarity to known plant reference datasets revealed a distribution of functional categories almost similar to that of tomato, while 1316 unigenes were suggested to be eggplant-specific. Sequence-based comparative analysis using putative orthologous gene groups setup by reciprocal sequence comparison among six solanaceous species suggested that eggplant and its wild ally Solanum torvum were clustered separately from subgenus Potatoe species, and then, all Solanum species were clustered separately from the genus Capsicum. Microsatellite motif distribution was different among species and likely to be coincident with the phylogenetic relationships. Furthermore, the eggplant unigene dataset exhibited its utility in transcriptome analysis by the SAGE strategy where a considerable number of short tag sequences of interest were successfully assigned to unigenes and their functional annotations. The eggplant ESTs and 16k unigene set developed in this study would be a useful resource not only for molecular genetics and breeding in eggplant itself, but for expanding the scope of comparative biology in Solanum species.

  16. Characterizing the Grape Transcriptome. Analysis of Expressed Sequence Tags from Multiple Vitis Species and Development of a Compendium of Gene Expression during Berry Development

    PubMed Central

    Silva, Francisco Goes da; Iandolino, Alberto; Al-Kayal, Fadi; Bohlmann, Marlene C.; Cushman, Mary Ann; Lim, Hyunju; Ergul, Ali; Figueroa, Rubi; Kabuloglu, Elif K.; Osborne, Craig; Rowe, Joan; Tattersall, Elizabeth; Leslie, Anna; Xu, Jane; Baek, JongMin; Cramer, Grant R.; Cushman, John C.; Cook, Douglas R.

    2005-01-01

    We report the analysis and annotation of 146,075 expressed sequence tags from Vitis species. The majority of these sequences were derived from different cultivars of Vitis vinifera, comprising an estimated 25,746 unique contig and singleton sequences that survey transcription in various tissues and developmental stages and during biotic and abiotic stress. Putatively homologous proteins were identified for over 17,752 of the transcripts, with 1,962 transcripts further subdivided into one or more Gene Ontology categories. A simple structured vocabulary, with modules for plant genotype, plant development, and stress, was developed to describe the relationship between individual expressed sequence tags and cDNA libraries; the resulting vocabulary provides query terms to facilitate data mining within the context of a relational database. As a measure of the extent to which characterized metabolic pathways were encompassed by the data set, we searched for homologs of the enzymes leading from glycolysis, through the oxidative/nonoxidative pentose phosphate pathway, and into the general phenylpropanoid pathway. Homologs were identified for 65 of these 77 enzymes, with 86% of enzymatic steps represented by paralogous genes. Differentially expressed transcripts were identified by means of a stringent believability index cutoff of ≥98.4%. Correlation analysis and two-dimensional hierarchical clustering grouped these transcripts according to similarity of expression. In the broadest analysis, 665 differentially expressed transcripts were identified across 29 cDNA libraries, representing a range of developmental and stress conditions. The groupings revealed expected associations between plant developmental stages and tissue types, with the notable exception of abiotic stress treatments. A more focused analysis of flower and berry development identified 87 differentially expressed transcripts and provides the basis for a compendium that relates gene expression and annotation

  17. Transcriptional Regulations on the Low-Temperature-Induced Floral Transition in an Orchidaceae Species, Dendrobium nobile: An Expressed Sequence Tags Analysis

    PubMed Central

    Liang, Shan; Ye, Qing-Sheng; Li, Rui-Hong; Leng, Jia-Yi; Li, Mei-Ru; Wang, Xiao-Jing; Li, Hong-Qing

    2012-01-01

    Vernalization-induced flowering is a cold-related adaptation in many species, but little is known about its genetic basis in Orchidaceae species. Here, we report a collection of 15,017 expressed sequence tags (ESTs) from the vernalized axillary buds of an Orchidaceae species, Dendrobium nobile, which were assembled into 9,616 unique gene clusters. Functional enrichment analysis showed that genes related to stress responses, especially responses to low temperature, and genes involved in protein biosynthesis and chromatin assembly were significantly overrepresented during 40 days of vernalization. Additionally, a total of 59 putative flowering-related genes were recognized, including those homologous to known key players in vernalization pathways in temperate cereals or Arabidopsis, such as cereal VRN1, FT/VRN3, and Arabidopsis AGL19. Results from this study suggest that the networks regulating vernalization-induced floral transition are only partly conserved among D. nobile, temperate cereals, and Arabidopsis. PMID:22550428

  18. Identification of salt-induced genes from Salicornia brachiata, an extreme halophyte through expressed sequence tags analysis.

    PubMed

    Jha, Bhavanath; Agarwal, Pradeep K; Reddy, Palakolanu Sudhakar; Lal, Sanjay; Sopory, Sudhir K; Reddy, Malireddy K

    2009-04-01

    Salinity severely affects plant growth and development, causing crop loss worldwide. We have isolated a large number of salt-induced genes as well as unknown and hypothetical genes from Salicornia brachiata Roxb. (Amaranthaceae). This is the first description of genes identified in response to salinity stress in this extreme halophyte. Salicornia accumulates salt in its pith and survives even at 2 M NaCl under field conditions. To isolate salt-responsive genes, cDNA subtractive hybridization was performed between control and 500 mM NaCl-treated plants. Out of the 1200 recombinant clones, 930 sequences were submitted to the NCBI database (GenBank accession: EB484528 to EB485289 and EC906125 to EC906292). A total of 789 ESTs matched different genes in the NCBI database. About 4.8% of the ESTs belonged to the stress-tolerance gene category, and approximately 29% of the ESTs showed no homology with known functional gene sequences and were thus classified as unknown or hypothetical. The detection of a large number of ESTs with unknown putative function in this species makes it an interesting contribution. Ninety unknown and hypothetical genes were selected to study their differential regulation by reverse Northern analysis in order to identify their role in salinity tolerance. Interestingly, both up- and down-regulation at 500 mM NaCl were observed (21 and 10 genes, respectively). Northern analysis of two important salt-tolerance genes, ASR1 (abscisic acid stress ripening gene) and plasma membrane H+-ATPase, showed a basal level of transcripts in the control condition and an increase with NaCl treatment. The full-length ASR1 gene was obtained using 5' RACE, and its potential role in imparting salt tolerance is being studied.

  19. Construction of cDNA library and preliminary analysis of expressed sequence tags from tea plant [Camellia sinensis (L) O. Kuntze].

    PubMed

    Phukon, Munmi; Namdev, Richa; Deka, Diganta; Modi, Mahendra K; Sen, Priyabrata

    2012-09-10

    Tea is the most popular non-alcoholic and healthy beverage across the world. An understanding of the genetic organization and molecular biology of the tea plant, which is very poor at present, is required for a quantum increase in productivity and for efficient use of germplasm in either cultivation or breeding programs. Single-pass sequencing of randomly selected cDNA clones is the most widely accepted technique for gene identification and cloning. In the present study, a good-quality cDNA library was constructed and a preliminary analysis of ESTs was carried out. The titers of the unamplified and amplified libraries were 1.4 × 10^6 pfu/ml and 5.27 × 10^8 pfu/ml, respectively. A total of 210 cDNA clones from the constructed cDNA library were sequenced and analyzed. A total of 84 high-quality expressed sequence tags (ESTs) were generated, among which 71 ESTs had significant homology with sequences in the NCBI non-redundant protein database by BLASTX analysis. About 80% of the ESTs had a poly(A) tail at the 3' end, indicating that the cDNAs were full length. The database-matched ESTs were classified into putative cellular roles, viz. energy-related category (corresponding to 20% of total BLASTX-matched ESTs), transcription (14.2%), protein synthesis (14.2%), cell growth and division (8.6%), cell structure (5.7%), signal transduction (5.7%), transporters (2.9%), disease and defenses (2.9%), secondary metabolism (2.9%) and gene regulation (2.9%). This study provides an overview of the mRNA expression profile and first-hand information on the gene sequences expressed in tender leaves and apical buds of the tea plant.

  20. Comparative analysis of secreted protein evolution using expressed sequence tags from four poplar leaf rusts (Melampsora spp.)

    PubMed Central

    2010-01-01

    Background Obligate biotrophs such as rust fungi are believed to establish long-term relationships by modulating plant defenses through a plethora of effector proteins, whose most recognizable feature is the presence of a signal peptide for secretion. Since the phenotypes of these effectors extend to host cells, their genes are expected to be under accelerated evolution stimulated by host-pathogen coevolutionary arms races. Recently, whole genome sequence data has allowed the prediction of secretomes, facilitating the identification of putative effectors. Results We generated cDNA libraries from four poplar leaf rust pathogens (Melampsora spp.) and used computational approaches to identify and annotate putative secreted proteins with the aim of uncovering new knowledge about the nature and evolution of the rust secretome. While more than half of the predicted secretome members encoded lineage-specific proteins, similarities with experimentally characterized fungal effectors were also identified. A SAGE analysis indicated a strong stage-specific regulation of transcripts encoding secreted proteins. The average sequence identity of putative secreted proteins to their closest orthologs in the wheat stem rust Puccinia graminis f. sp. tritici was dramatically reduced compared with non-secreted ones. A comparative genomics approach based on homologous gene groups unravelled positive selection in putative members of the secretome. Conclusion We uncovered robust evidence that different evolutionary constraints are acting on the rust secretome when compared to the rest of the genome. These results are consistent with the view that these genes are more likely to exhibit an effector activity and be involved in coevolutionary arms races with host factors. PMID:20615251

  1. Bioinformatic analysis of fruit-specific expressed sequence tag libraries of Diospyros kaki Thunb.: view at the transcriptome at different developmental stages.

    PubMed

    Sablok, Gaurav; Luo, Chun; Lee, Wan Sin; Rahman, Farzana; Tatarinova, Tatiana V; Harikrishna, Jennifer Ann; Luo, Zhengrong

    2011-07-01

    We present here a systematic analysis of the Diospyros kaki expressed sequence tags (ESTs) generated from development stage-specific libraries. A total of 2,529 putative tentative unigenes were identified in the MF library, whereas the OYF library displayed 3,775 tentative unigenes. In the MF library, 325 EST-simple sequence repeats (SSRs) were detected in 296 putative unigenes, an occurrence of 11.7% with a frequency of 1 SSR/3.16 kb, whereas the OYF library had an EST-SSR occurrence of 10.8%, with 407 EST-SSRs in 352 putative unigenes and a frequency of 1 SSR/2.92 kb. We observed a higher frequency of SNPs and indels in the OYF library (20.94 SNPs/indels per 100 bp) than in the MF library (0.74 SNPs/indels per 100 bp). A combined homology and secondary structure analysis approach identified a potential miRNA precursor, an ortholog of miR159, and potential miR159 targets in the development-specific ESTs of D. kaki. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1007/s13205-011-0005-9) contains supplementary material, which is available to authorized users.

  2. Generation and analysis of large-scale expressed sequence tags (ESTs) from a full-length enriched cDNA library of porcine backfat tissue

    PubMed Central

    Kim, Tae-Hun; Kim, Nam-Soon; Lim, Dajeong; Lee, Kyung-Tai; Oh, Jung-Hwa; Park, Hye-Sook; Jang, Gil-Won; Kim, Hyung-Yong; Jeon, Mina; Choi, Bong-Hwan; Lee, Hae-Young; Chung, HY; Kim, Heebal

    2006-01-01

    Background Genome research in farm animals will expand our basic knowledge of the genetic control of complex traits, and the results will be applied in the livestock industry to improve meat quality and productivity, as well as to reduce the incidence of disease. A combination of quantitative trait locus mapping and microarray analysis is a useful approach to reduce the overall effort needed to identify genes associated with quantitative traits of interest. Results We constructed a full-length enriched cDNA library from porcine backfat tissue. The estimated average size of the cDNA inserts was 1.7 kb, and the cDNA fullness ratio was 70%. In total, we deposited 16,110 high-quality sequences in the dbEST division of GenBank (accession numbers: DT319652-DT335761). For all the expressed sequence tags (ESTs), approximately 10.9 Mb of porcine sequence were generated with an average length of 674 bp per EST (range: 200–952 bp). Clustering and assembly of these ESTs resulted in a total of 5,008 unique sequences with 1,776 contigs (35.46%) and 3,232 singleton ESTs (64.54%). From a total of 5,008 unique sequences, 3,154 (62.98%) were similar to other sequences, and 1,854 (37.02%) were identified as having no hit or low identity (<95%) and 60% coverage in The Institute for Genomic Research (TIGR) gene index of Sus scrofa. Gene ontology (GO) annotation of unique sequences showed that approximately 31.7, 32.3, and 30.8% were assigned molecular function, biological process, and cellular component GO terms, respectively. A total of 1,854 putative novel transcripts resulted after comparison and filtering with the TIGR SsGI; these included a large percentage of singletons (80.64%) and a small proportion of contigs (13.36%). Conclusion The sequence data generated in this study will provide valuable information for studying expression profiles using EST-based microarrays and assist in the condensation of current pig TCs into clusters representing longer stretches of cDNA sequences

  3. Computational exploration of microRNAs from expressed sequence tags of Humulus lupulus, target predictions and expression analysis.

    PubMed

    Mishra, Ajay Kumar; Duraisamy, Ganesh Selvaraj; Týcová, Anna; Matoušek, Jaroslav

    2015-12-01

    Among computationally predicted and experimentally validated plant miRNAs, several are conserved across species boundaries in the plant kingdom. In this study, a combined experimental and in silico approach was adopted for the identification and characterization of miRNAs in Humulus lupulus (hop), which is widely cultivated for use by the brewing industry and is also used as a medicinal herb. A total of 22 miRNAs belonging to 17 miRNA families were identified in hop following a comparative computational approach and EST-based homology search according to a series of filtering criteria. Selected miRNAs were validated by end-point PCR and quantitative reverse transcription-polymerase chain reaction (qRT-PCR), which confirmed the existence of conserved miRNAs in hop. Based on the characteristic that miRNAs exhibit perfect or nearly perfect complementarity with their target mRNA sequences, a total of 47 potential miRNA targets were identified in hop. Strikingly, the majority of predicted targets were transcription factors that could regulate hop growth and development, including leaf, root and even cone development. Moreover, the identified miRNAs may also be involved in other cellular and metabolic processes, such as stress response, signal transduction, and other physiological processes. Cis-regulatory elements relevant to biotic and abiotic stress, plant hormone response, and flavonoid biosynthesis were identified in the promoter regions of those miRNA genes. Overall, the findings from this study will facilitate further research on miRNAs and their functions in hop, and show a path for the prediction and analysis of miRNAs in species whose genomes are not available.
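
    The target-prediction principle mentioned in this abstract (perfect or near-perfect complementarity between a miRNA and its target mRNA) can be sketched as a simple sliding-window scan. This is only an illustration: the abstract does not describe the authors' software or scoring rules, so the mismatch cutoff, the function names, and the short sequences are hypothetical, and G:U wobble pairs are counted as mismatches for simplicity.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Reverse complement of an RNA string written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def count_mismatches(mirna, site):
    """Mismatches when mirna pairs antiparallel with site (both 5'->3')."""
    if len(mirna) != len(site):
        raise ValueError("miRNA and candidate site must have equal length")
    ideal_site = reverse_complement(mirna)
    return sum(a != b for a, b in zip(ideal_site, site))


def find_target_sites(mirna, mrna, max_mismatches=3):
    """Return (start, mismatches) for every window within the cutoff."""
    width = len(mirna)
    hits = []
    for start in range(len(mrna) - width + 1):
        mismatches = count_mismatches(mirna, mrna[start:start + width])
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


# Made-up 8-nt example; real plant miRNAs are about 21 nt long.
mirna = "UGACCGAU"
mrna = "GGG" + reverse_complement(mirna) + "CCC"
print(find_target_sites(mirna, mrna, max_mismatches=0))
```

    Published plant target predictors usually weight mismatches by position and treat G:U pairs separately, which this sketch does not attempt.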

  4. Analysis of expressed sequence tags and identification of genes encoding cell-wall-degrading enzymes from the fungivorous nematode Aphelenchus avenae.

    PubMed

    Karim, Nurul; Jones, John T; Okada, Hiroaki; Kikuchi, Taisei

    2009-11-16

    The fungivorous nematode, Aphelenchus avenae is widespread in soil and is found in association with decaying plant material. This nematode is also found in association with plants but its ability to cause plant disease remains largely undetermined. The taxonomic position and intermediate lifestyle of A. avenae make it an important model for studying the evolution of plant parasitism within the Nematoda. In addition, the exceptional capacity of this nematode to survive desiccation makes it an important system for study of anhydrobiosis. Expressed sequence tag (EST) analysis may therefore be useful in providing an initial insight into the poorly understood genetic background of A. avenae. We present the generation, analysis and annotation of over 5,000 ESTs from a mixed-stage A. avenae cDNA library. Clustering of 5,076 high-quality ESTs resulted in a set of 2,700 non-redundant sequences comprising 695 contigs and 2,005 singletons. Comparative analyses indicated that 1,567 (58.0%) of the cluster sequences had homologues in Caenorhabditis elegans, 1,750 (64.8%) in other nematodes, 1,321 (48.9%) in organisms other than nematodes, and 862 (31.9%) had no significant match to any sequence in current protein or nucleotide databases. In addition, 1,100 (40.7%) of the sequences were functionally classified using Gene Ontology (GO) hierarchy. Similarity searches of the cluster sequences identified a set of genes with significant homology to genes encoding enzymes that degrade plant or fungal cell walls. The full length sequences of two genes encoding glycosyl hydrolase family 5 (GHF5) cellulases and two pectate lyase genes encoding polysaccharide lyase family 3 (PL3) proteins were identified and characterized. We have described at least 2,214 putative genes from A. avenae and identified a set of genes encoding a range of cell-wall-degrading enzymes. This EST dataset represents a starting point for studies in a number of different fundamental and applied areas. The presence of

  5. Analysis of expressed sequence tags and identification of genes encoding cell-wall-degrading enzymes from the fungivorous nematode Aphelenchus avenae

    PubMed Central

    2009-01-01

    Background The fungivorous nematode, Aphelenchus avenae is widespread in soil and is found in association with decaying plant material. This nematode is also found in association with plants but its ability to cause plant disease remains largely undetermined. The taxonomic position and intermediate lifestyle of A. avenae make it an important model for studying the evolution of plant parasitism within the Nematoda. In addition, the exceptional capacity of this nematode to survive desiccation makes it an important system for study of anhydrobiosis. Expressed sequence tag (EST) analysis may therefore be useful in providing an initial insight into the poorly understood genetic background of A. avenae. Results We present the generation, analysis and annotation of over 5,000 ESTs from a mixed-stage A. avenae cDNA library. Clustering of 5,076 high-quality ESTs resulted in a set of 2,700 non-redundant sequences comprising 695 contigs and 2,005 singletons. Comparative analyses indicated that 1,567 (58.0%) of the cluster sequences had homologues in Caenorhabditis elegans, 1,750 (64.8%) in other nematodes, 1,321 (48.9%) in organisms other than nematodes, and 862 (31.9%) had no significant match to any sequence in current protein or nucleotide databases. In addition, 1,100 (40.7%) of the sequences were functionally classified using Gene Ontology (GO) hierarchy. Similarity searches of the cluster sequences identified a set of genes with significant homology to genes encoding enzymes that degrade plant or fungal cell walls. The full length sequences of two genes encoding glycosyl hydrolase family 5 (GHF5) cellulases and two pectate lyase genes encoding polysaccharide lyase family 3 (PL3) proteins were identified and characterized. Conclusion We have described at least 2,214 putative genes from A. avenae and identified a set of genes encoding a range of cell-wall-degrading enzymes. This EST dataset represents a starting point for studies in a number of different fundamental and

  6. Deductions about the Number, Organization, and Evolution of Genes in the Tomato Genome Based on Analysis of a Large Expressed Sequence Tag Collection and Selective Genomic Sequencing

    PubMed Central

    Van der Hoeven, Rutger; Ronning, Catherine; Giovannoni, James; Martin, Gregory; Tanksley, Steven

    2002-01-01

    Analysis of a collection of 120,892 single-pass ESTs, derived from 26 different tomato cDNA libraries and reduced to a set of 27,274 unique consensus sequences (unigenes), revealed that 70% of the unigenes have identifiable homologs in the Arabidopsis genome. Genes corresponding to metabolism have remained most conserved between these two genomes, whereas genes encoding transcription factors are among the fastest evolving. The majority of the 10 largest conserved multigene families share similar copy numbers in tomato and Arabidopsis, suggesting that the multiplicity of these families may have occurred before the divergence of these two species. An exception to this multigene conservation was observed for the E8-like protein family, which is associated with fruit ripening and has higher copy number in tomato than in Arabidopsis. Finally, six BAC clones from different parts of the tomato genome were isolated, genetically mapped, sequenced, and annotated. The combined analysis of the EST database and these six sequenced BACs leads to the prediction that the tomato genome encodes ∼35,000 genes, which are sequestered largely in euchromatic regions corresponding to less than one-quarter of the total DNA in the tomato nucleus. PMID:12119366

  7. Analysis and functional annotation of expressed sequence tags from in vitro cell lines of elasmobranchs: Spiny dogfish shark (Squalus acanthias) and little skate (Leucoraja erinacea).

    PubMed

    Parton, Angela; Bayne, Christopher J; Barnes, David W

    2010-09-01

    Elasmobranchs are the most commonly used experimental models among the jawed, cartilaginous fish (Chondrichthyes). Previously we developed cell lines from embryos of two elasmobranchs, Squalus acanthias the spiny dogfish shark (SAE line), and Leucoraja erinacea the little skate (LEE-1 line). From these lines cDNA libraries were derived and expressed sequence tags (ESTs) generated. From the SAE cell line 4303 unique transcripts were identified, with 1848 of these representing unknown sequences (showing no BLASTX identification). From the LEE-1 cell line, 3660 unique transcripts were identified, and unknown, unique sequences totaled 1333. Gene Ontology (GO) annotation showed that GO assignments for the two cell lines were in general similar. These results suggest that the procedures used to derive the cell lines led to isolation of cell types of the same general embryonic origin from both species. The LEE-1 transcripts included GO categories "envelope" and "oxidoreductase activity" but the SAE transcripts did not. GO analysis of SAE transcripts identified the category "anatomical structure formation" that was not present in LEE-1 cells. Increased organelle compartments may exist within LEE-1 cells compared to SAE cells, and the higher oxidoreductase activity in LEE-1 cells may indicate a role for these cells in responses associated with innate immunity or in steroidogenesis. These EST libraries from elasmobranch cell lines provide information for assembly of genomic sequences and are useful in revealing gene diversity, new genes and molecular markers, as well as in providing means for elucidation of full-length cDNAs and probes for gene array analyses. This is the first study of this type with members of the Chondrichthyes.

  8. Identification of human chromosome 22 transcribed sequences with ORF expressed sequence tags

    PubMed Central

    de Souza, Sandro J.; Camargo, Anamaria A.; Briones, Marcelo R. S.; Costa, Fernando F.; Nagai, Maria Aparecida; Verjovski-Almeida, Sergio; Zago, Marco A.; Andrade, Luis Eduardo C.; Carrer, Helaine; El-Dorry, Hamza F. A.; Espreafico, Enilza M.; Habr-Gama, Angelita; Giannella-Neto, Daniel; Goldman, Gustavo H.; Gruber, Arthur; Hackel, Christine; Kimura, Edna T.; Maciel, Rui M. B.; Marie, Suely K. N.; Martins, Elizabeth A. L.; Nóbrega, Marina P.; Paçó-Larson, Maria Luisa; Pardini, Maria Inês M. C.; Pereira, Gonçalo G.; Pesquero, João Bosco; Rodrigues, Vanderlei; Rogatto, Silvia R.; da Silva, Ismael D. C. G.; Sogayar, Mari C.; de Fátima Sonati, Maria; Tajara, Eloiza H.; Valentini, Sandro R.; Acencio, Marcio; Alberto, Fernando L.; Amaral, Maria Elisabete J.; Aneas, Ivy; Bengtson, Mário Henrique; Carraro, Dirce M.; Carvalho, Alex F.; Carvalho, Lúcia Helena; Cerutti, Janete M.; Corrêa, Maria Lucia C.; Costa, Maria Cristina R.; Curcio, Cyntia; Gushiken, Tsieko; Ho, Paulo L.; Kimura, Elza; Leite, Luciana C. C.; Maia, Gustavo; Majumder, Paromita; Marins, Mozart; Matsukuma, Adriana; Melo, Analy S. A.; Mestriner, Carlos Alberto; Miracca, Elisabete C.; Miranda, Daniela C.; Nascimento, Ana Lucia T. O.; Nóbrega, Francisco G.; Ojopi, Élida P. B.; Pandolfi, José Rodrigo C.; Pessoa, Luciana Gilbert; Rahal, Paula; Rainho, Claudia A.; da Rós, Nancy; de Sá, Renata G.; Sales, Magaly M.; da Silva, Neusa P.; Silva, Tereza C.; da Silva, Wilson; Simão, Daniel F.; Sousa, Josane F.; Stecconi, Daniella; Tsukumo, Fernando; Valente, Valéria; Zalcberg, Heloisa; Brentani, Ricardo R.; Reis, Luis F. L.; Dias-Neto, Emmanuel; Simpson, Andrew J. G.

    2000-01-01

    Transcribed sequences in the human genome can be identified with confidence only by alignment with sequences derived from cDNAs synthesized from naturally occurring mRNAs. We constructed a set of 250,000 cDNAs that represent partial expressed gene sequences and that are biased toward the central coding regions of the resulting transcripts. They are termed ORF expressed sequence tags (ORESTES). The 250,000 ORESTES were assembled into 81,429 contigs. Of these, 1,181 (1.45%) were found to match sequences in chromosome 22 with at least one ORESTES contig for 162 (65.6%) of the 247 known genes, for 67 (44.6%) of the 150 related genes, and for 45 of the 148 (30.4%) EST-predicted genes on this chromosome. Using a set of stringent criteria to validate our sequences, we identified a further 219 previously unannotated transcribed sequences on chromosome 22. Of these, 171 were in fact also defined by EST or full length cDNA sequences available in GenBank but not utilized in the initial annotation of the first human chromosome sequence. Thus despite representing less than 15% of all expressed human sequences in the public databases at the time of the present analysis, ORESTES sequences defined 48 transcribed sequences on chromosome 22 not defined by other sequences. All of the transcribed sequences defined by ORESTES coincided with DNA regions predicted as encoding exons by genscan (http://genes.mit.edu/GENSCAN.html). PMID:11070084
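
    The coverage figures quoted in this abstract can be re-derived from its own counts. The following is a minimal sketch in Python; the dictionary layout and the helper name `percent` are illustrative choices, not part of the original record. Note that 67/150 rounds to 44.7%, whereas the abstract prints 44.6%, which presumably reflects truncation rather than rounding.

```python
# Counts as quoted in the abstract: (genes matched by an ORESTES contig, total genes)
gene_classes = {
    "known genes": (162, 247),
    "related genes": (67, 150),
    "EST-predicted genes": (45, 148),
}


def percent(part, whole, digits=1):
    """Return part/whole as a percentage rounded to `digits` decimals."""
    return round(100.0 * part / whole, digits)


# Share of each chromosome 22 gene class hit by at least one ORESTES contig
coverage = {name: percent(hit, total) for name, (hit, total) in gene_classes.items()}

# Share of the 81,429 contigs that matched chromosome 22
contig_fraction = percent(1181, 81429, digits=2)

# Newly identified transcribed sequences not also defined by GenBank ESTs or cDNAs
orestes_only = 219 - 171

for name, pct in coverage.items():
    print(f"{name}: {pct}%")
print(f"contigs matching chromosome 22: {contig_fraction}%")
print(f"sequences defined only by ORESTES: {orestes_only}")
```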

  9. Multiple tag labeling method for DNA sequencing

    DOEpatents

    Mathies, Richard A.; Huang, Xiaohua C.; Quesada, Mark A.

    1995-01-01

    A DNA sequencing method is described which uses single lane or channel electrophoresis. Sequencing fragments are separated in the lane and detected using a laser-excited, confocal fluorescence scanner. Each set of DNA sequencing fragments is separated in the same lane and then distinguished using a binary coding scheme employing only two different fluorescent labels. Also described is a method of using radioisotope labels.

  10. Multiple tag labeling method for DNA sequencing

    DOEpatents

    Mathies, R.A.; Huang, X.C.; Quesada, M.A.

    1995-07-25

    A DNA sequencing method is described which uses single lane or channel electrophoresis. Sequencing fragments are separated in the lane and detected using a laser-excited, confocal fluorescence scanner. Each set of DNA sequencing fragments is separated in the same lane and then distinguished using a binary coding scheme employing only two different fluorescent labels. Also described is a method of using radioisotope labels. 5 figs.

  11. Generation and analysis of expressed sequence tags (ESTs) for marker development in yam (Dioscorea alata L.)

    USDA-ARS's Scientific Manuscript database

    A total of 44,757 EST sequences, 1,705 EST-SSR markers and 104 SNP markers were generated from the cDNA libraries of the resistant and susceptible genotypes. We have developed a comprehensive annotated transcriptome data set in yam to enrich the EST information in public databases. These EST resources prov...

  12. Generation and Analysis of Expressed Sequence Tags (ESTs) from Halophyte Atriplex canescens to Explore Salt-Responsive Related Genes

    PubMed Central

    Li, Jingtao; Sun, Xinhua; Yu, Gang; Jia, Chengguo; Liu, Jinliang; Pan, Hongyu

    2014-01-01

    Little information is available on gene expression profiling of the halophyte A. canescens. To elucidate the molecular mechanism of stress tolerance in A. canescens, a full-length complementary DNA library was generated from A. canescens exposed to 400 mM NaCl, and provided 343 high-quality ESTs. In an evaluation of the 343 valid EST sequences in the cDNA library, 197 unigenes were assembled, among which 190 unigenes (83.1% ESTs) were identified according to their significant similarities with proteins of known functions. All 343 EST sequences have been deposited in the dbEST division of GenBank under accession numbers JZ535802 to JZ536144. According to Arabidopsis MIPS functional category and GO classifications, we identified 193 unigenes of the 311 annotated ESTs, representing 72 non-redundant unigenes sharing similarities with genes related to the defense response. The sets of ESTs obtained provide a rich genetic resource, and 17 up-regulated genes related to salt stress resistance were identified by qRT-PCR. Six of these genes may contribute crucially to earlier and later stage salt stress resistance. Additionally, among the 343 unigene sequences, 22 simple sequence repeats (SSRs) were also identified, contributing to the study of A. canescens resources. PMID:24960361

  13. A molecular analysis of desiccation tolerance mechanisms in the anhydrobiotic nematode Panagrolaimus superbus using expressed sequenced tags

    PubMed Central

    2012-01-01

    Background Some organisms can survive extreme desiccation by entering into a state of suspended animation known as anhydrobiosis. Panagrolaimus superbus is a free-living anhydrobiotic nematode that can survive rapid environmental desiccation. The mechanisms that P. superbus uses to combat the potentially lethal effects of cellular dehydration may include the constitutive and inducible expression of protective molecules, along with behavioural and/or morphological adaptations that slow the rate of cellular water loss. In addition, inducible repair and revival programmes may also be required for successful rehydration and recovery from anhydrobiosis. Results To identify constitutively expressed candidate anhydrobiotic genes we obtained 9,216 ESTs from an unstressed mixed stage population of P. superbus. We derived 4,009 unigenes from these ESTs. These unigene annotations and sequences can be accessed at http://www.nematodes.org/nembase4/species_info.php?species=PSC. We manually annotated a set of 187 constitutively expressed candidate anhydrobiotic genes from P. superbus. Notable among those is a putative lineage expansion of the lea (late embryogenesis abundant) gene family. The most abundantly expressed sequence was a member of the nematode specific sxp/ral-2 family that is highly expressed in parasitic nematodes and secreted onto the surface of the nematodes' cuticles. There were 2,059 novel unigenes (51.7% of the total), 149 of which are predicted to encode intrinsically disordered proteins lacking a fixed tertiary structure. One unigene may encode an exo-β-1,3-glucanase (GHF5 family), most similar to a sequence from Phytophthora infestans. GHF5 enzymes have been reported from several species of plant parasitic nematodes, with horizontal gene transfer (HGT) from bacteria proposed to explain their evolutionary origin. This P. superbus sequence represents another possible HGT event within the Nematoda. The expression of five of the 19 putative stress response

  14. Analysis of bacterial and archaeal diversity in coastal microbial mats using massive parallel 16S rRNA gene tag sequencing.

    PubMed

    Bolhuis, Henk; Stal, Lucas J

    2011-11-01

    Coastal microbial mats are small-scale and largely closed ecosystems in which a plethora of different functional groups of microorganisms are responsible for the biogeochemical cycling of the elements. Coastal microbial mats play an important role in coastal protection and morphodynamics through stabilization of the sediments and by initiating the development of salt-marshes. Little is known about the bacterial and especially archaeal diversity and how it contributes to the ecological functioning of coastal microbial mats. Here, we analyzed three different types of coastal microbial mats that are located along a tidal gradient and can be characterized as marine (ST2), brackish (ST3) and freshwater (ST3) systems. The mats were sampled during three different seasons and subjected to massive parallel tag sequencing of the V6 region of the 16S rRNA genes of Bacteria and Archaea. Sequence analysis revealed that the mats are among the most diverse marine ecosystems studied so far and consist of several novel taxonomic levels ranging from classes to species. The diversity between the different mat types was far more pronounced than the changes between the different seasons at one location. The archaeal community of these mats has not been studied before; it reacted strongly to a short period of drought during summer, which resulted in a massive increase in halobacterial sequences, whereas the bacterial community was barely affected. We concluded that the community composition and the microbial diversity were intrinsic to the mat type and depend on the location along the tidal gradient, indicating a relation with salinity.

  15. Unraveling new genes associated with seed development and metabolism in Bixa orellana L. by expressed sequence tag (EST) analysis.

    PubMed

    Soares, Virgínia L F; Rodrigues, Simone M; de Oliveira, Tahise M; de Queiroz, Talisson O; Lima, Lívia S; Hora-Júnior, Braz T; Gramacho, Karina P; Micheli, Fabienne; Cascardo, Júlio C M; Otoni, Wagner C; Gesteira, Abelmon S; Costa, Marcio G C

    2011-02-01

    The tropical tree Bixa orellana L. produces a range of secondary metabolites whose biochemical and molecular biosynthetic basis is not well understood. In this work we have characterized a set of ESTs from a non-normalized cDNA library of B. orellana seeds to obtain information about the main developmental and metabolic processes taking place in developing seeds and their associated genes. After sequencing a set of randomly selected clones, most of the sequences were assigned putative functions based on similarity, GO annotations and protein domains. The most abundant transcripts encoded proteins associated with cell wall (prolyl 4-hydroxylase), fatty acid (acyl carrier protein), and hormone/flavonoid (2OG-Fe oxygenase) synthesis, germination (MADS FLC-like protein) and embryo development (AP2/ERF transcription factor) regulation, photosynthesis (chlorophyll a-b binding protein), cell elongation (MAP65-1a), and stress responses (metallothionein- and thaumatin-like proteins). Enzymes were assigned to 16 different metabolic pathways related to both primary and secondary metabolism. Characterization of two candidate genes of the bixin biosynthetic pathway, BoCCD and BoOMT, showed that they belong, respectively, to the carotenoid-cleavage dioxygenase 4 (CCD4) and caffeic acid O-methyltransferase (COMT) families, and are up-regulated during seed development. This indicates their involvement in the synthesis of this commercially important carotenoid pigment in seeds of B. orellana. Most of the genes identified here are the first representatives of their gene families in B. orellana.

  16. Analysis of expressed sequence tags from Actinidia: applications of a cross species EST database for gene discovery in the areas of flavor, health, color and ripening.

    PubMed

    Crowhurst, Ross N; Gleave, Andrew P; MacRae, Elspeth A; Ampomah-Dwamena, Charles; Atkinson, Ross G; Beuning, Lesley L; Bulley, Sean M; Chagne, David; Marsh, Ken B; Matich, Adam J; Montefiori, Mirco; Newcomb, Richard D; Schaffer, Robert J; Usadel, Björn; Allan, Andrew C; Boldingh, Helen L; Bowen, Judith H; Davy, Marcus W; Eckloff, Rheinhart; Ferguson, A Ross; Fraser, Lena G; Gera, Emma; Hellens, Roger P; Janssen, Bart J; Klages, Karin; Lo, Kim R; MacDiarmid, Robin M; Nain, Bhawana; McNeilage, Mark A; Rassam, Maysoon; Richardson, Annette C; Rikkerink, Erik Ha; Ross, Gavin S; Schröder, Roswitha; Snowden, Kimberley C; Souleyre, Edwige J F; Templeton, Matt D; Walton, Eric F; Wang, Daisy; Wang, Mindy Y; Wang, Yanming Y; Wood, Marion; Wu, Rongmei; Yauk, Yar-Khing; Laing, William A

    2008-07-27

    Kiwifruit (Actinidia spp.) are a relatively new, but economically important crop grown in many different parts of the world. Commercial success is driven by the development of new cultivars with novel consumer traits including flavor, appearance, healthful components and convenience. To increase our understanding of the genetic diversity and gene-based control of these key traits in Actinidia, we have produced a collection of 132,577 expressed sequence tags (ESTs). The ESTs were derived mainly from four Actinidia species (A. chinensis, A. deliciosa, A. arguta and A. eriantha) and fell into 41,858 non redundant clusters (18,070 tentative consensus sequences and 23,788 EST singletons). Analysis of flavor and fragrance-related gene families (acyltransferases and carboxylesterases) and pathways (terpenoid biosynthesis) is presented in comparison with a chemical analysis of the compounds present in Actinidia including esters, acids, alcohols and terpenes. ESTs are identified for most genes in color pathways controlling chlorophyll degradation and carotenoid biosynthesis. In the health area, data are presented on the ESTs involved in ascorbic acid and quinic acid biosynthesis showing not only that genes for many of the steps in these pathways are represented in the database, but that genes encoding some critical steps are absent. In the convenience area, genes related to different stages of fruit softening are identified. This large EST resource will allow researchers to undertake the tremendous challenge of understanding the molecular basis of genetic diversity in the Actinidia genus as well as provide an EST resource for comparative fruit genomics. The various bioinformatics analyses we have undertaken demonstrate the extent of coverage of ESTs for genes encoding different biochemical pathways in Actinidia.

  17. Analysis of expressed sequence tags from Actinidia: applications of a cross species EST database for gene discovery in the areas of flavor, health, color and ripening

    PubMed Central

    Crowhurst, Ross N; Gleave, Andrew P; MacRae, Elspeth A; Ampomah-Dwamena, Charles; Atkinson, Ross G; Beuning, Lesley L; Bulley, Sean M; Chagne, David; Marsh, Ken B; Matich, Adam J; Montefiori, Mirco; Newcomb, Richard D; Schaffer, Robert J; Usadel, Björn; Allan, Andrew C; Boldingh, Helen L; Bowen, Judith H; Davy, Marcus W; Eckloff, Rheinhart; Ferguson, A Ross; Fraser, Lena G; Gera, Emma; Hellens, Roger P; Janssen, Bart J; Klages, Karin; Lo, Kim R; MacDiarmid, Robin M; Nain, Bhawana; McNeilage, Mark A; Rassam, Maysoon; Richardson, Annette C; Rikkerink, Erik HA; Ross, Gavin S; Schröder, Roswitha; Snowden, Kimberley C; Souleyre, Edwige JF; Templeton, Matt D; Walton, Eric F; Wang, Daisy; Wang, Mindy Y; Wang, Yanming Y; Wood, Marion; Wu, Rongmei; Yauk, Yar-Khing; Laing, William A

    2008-01-01

    Background Kiwifruit (Actinidia spp.) are a relatively new, but economically important crop grown in many different parts of the world. Commercial success is driven by the development of new cultivars with novel consumer traits including flavor, appearance, healthful components and convenience. To increase our understanding of the genetic diversity and gene-based control of these key traits in Actinidia, we have produced a collection of 132,577 expressed sequence tags (ESTs). Results The ESTs were derived mainly from four Actinidia species (A. chinensis, A. deliciosa, A. arguta and A. eriantha) and fell into 41,858 non redundant clusters (18,070 tentative consensus sequences and 23,788 EST singletons). Analysis of flavor and fragrance-related gene families (acyltransferases and carboxylesterases) and pathways (terpenoid biosynthesis) is presented in comparison with a chemical analysis of the compounds present in Actinidia including esters, acids, alcohols and terpenes. ESTs are identified for most genes in color pathways controlling chlorophyll degradation and carotenoid biosynthesis. In the health area, data are presented on the ESTs involved in ascorbic acid and quinic acid biosynthesis showing not only that genes for many of the steps in these pathways are represented in the database, but that genes encoding some critical steps are absent. In the convenience area, genes related to different stages of fruit softening are identified. Conclusion This large EST resource will allow researchers to undertake the tremendous challenge of understanding the molecular basis of genetic diversity in the Actinidia genus as well as provide an EST resource for comparative fruit genomics. The various bioinformatics analyses we have undertaken demonstrate the extent of coverage of ESTs for genes encoding different biochemical pathways in Actinidia. PMID:18655731

  18. Construction of a cDNA library and preliminary analysis of expressed sequence tags in Piper hainanense.

    PubMed

    Fan, R; Ling, P; Hao, C Y; Li, F P; Huang, L F; Wu, B D; Wu, H S

    2015-10-19

    Black pepper is a perennial climbing vine. It is widely cultivated because its berries can be utilized not only as a spice in food but also for medicinal use. This study aimed to construct a standardized, high-quality cDNA library to facilitate identification of new Piper hainanense transcripts. For this, 262 unigenes were used to generate raw reads. The average length of these 262 unigenes was 774.8 bp. Of these, 94 genes (35.9%) were newly identified, according to the NCBI protein database. Thus, identification of new genes may broaden the molecular knowledge of P. hainanense on the basis of Clusters of Orthologous Groups and Gene Ontology categories. In addition, certain basic genes linked to physiological processes were identified, which can contribute to disease resistance and thereby to the breeding of black pepper. A total of 26 unigenes were found to be SSR markers. Dinucleotide SSR was the main repeat motif, accounting for 61.54%, followed by trinucleotide SSR (23.07%). Eight primer pairs successfully amplified DNA fragments and detected significant amounts of polymorphism among twenty-one Piper germplasm accessions. These results present novel sequence information for P. hainanense, which can serve as the foundation for further genetic research on this species.
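
    The abstract does not say how the SSR markers were detected. As a rough illustration only, a regular-expression scan for di- and trinucleotide repeats could look like the sketch below; the minimum repeat counts in `MIN_REPEATS`, the function names and the test sequence are all assumptions, not the authors' method or data.

```python
import re
from collections import Counter

# Assumed thresholds (not from the abstract): motif length -> minimum number of repeats
MIN_REPEATS = {2: 6, 3: 5}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, motif, repeat_count) for each perfect SSR found in seq."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # One motif of unit_len bases followed by at least (min_rep - 1) exact copies
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if len(set(motif)) == 1:
                continue  # ignore homopolymer runs such as AAAAAAAAAAAA
            hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return hits


def motif_length_shares(hits):
    """Percentage of SSRs per motif length (2 = dinucleotide, 3 = trinucleotide)."""
    counts = Counter(len(motif) for _, motif, _ in hits)
    total = sum(counts.values())
    return {length: round(100.0 * n / total, 2) for length, n in counts.items()}


# Made-up sequence: seven AT repeats and five GAA repeats
example = "GGC" + "AT" * 7 + "CCG" + "GAA" * 5 + "TTC"
ssrs = find_ssrs(example)
shares = motif_length_shares(ssrs)
print(ssrs)
print(shares)
```

    Applied to a whole unigene set, `motif_length_shares` would yield the kind of dinucleotide/trinucleotide breakdown reported above, although the exact percentages depend on the thresholds chosen.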

  19. Generation and analysis of expressed sequence tags (ESTs) of Camelina sativa to mine drought stress-responsive genes.

    PubMed

    Kanth, Bashistha Kumar; Kumari, Shipra; Choi, Seo Hee; Ha, Hye-Jeong; Lee, Geung-Joo

    2015-11-06

    Camelina sativa is an oil-producing crop belonging to the family Brassicaceae. Due to its exceptionally high content of omega fatty acid, it is commercially grown around the world for edible oil, biofuel, and animal feed. Commonly referred to as 'false flax' or gold-of-pleasure, Camelina sativa has attracted interest as a biofuel feedstock. The species can grow on marginal land due to its superior drought tolerance and low requirement for agricultural inputs. This crop has remained underexploited because of very limited transcriptomic and genomic data. Use of gene-specific molecular markers is an important strategy for new cultivar development in breeding programs. In this study, Illumina paired-end sequencing technology and bioinformatics tools were used to obtain expression profiles of genes responding to drought stress in Camelina sativa BN14. A total of more than 60,000 loci were assembled, corresponding to approximately 275 K transcripts. When the species was exposed to 10 kPa drought stress, 100 kPa drought stress, and rehydrated conditions, a total of 107, 2,989, and 982 genes, respectively, were up-regulated, while 146, 3,659, and 1,189 genes, respectively, were down-regulated compared to the control condition. Some unknown genes were found to be highly expressed under drought conditions, together with some already reported gene families such as senescence-associated genes, CAP160, and LEA under the 100 kPa soil water condition, and cysteine protease, 2OG, Fe(II)-dependent oxygenase, and RAD-like 1 under the rehydrated condition. These genes will be further validated and mapped to determine their function and loci. This EST library will be useful for developing gene-specific molecular markers and discovering genes responsible for drought tolerance in Camelina species.

  20. Applying thiouracil (TU)-tagging for mouse transcriptome analysis

    PubMed Central

    Gay, Leslie; Karfilis, Kate V.; Miller, Michael R.; Doe, Chris Q.; Stankunas, Kryn

    2014-01-01

    Transcriptional profiling is a powerful approach to study mouse development, physiology, and disease models. Here, we describe a protocol for mouse thiouracil-tagging (TU-tagging), a transcriptome analysis technology that includes in vivo covalent labeling, purification, and analysis of cell type-specific RNA. TU-tagging enables 1) the isolation of RNA from a given cell population of a complex tissue, avoiding transcriptional changes induced by cell isolation trauma, and 2) the identification of actively transcribed RNAs and not pre-existing transcripts. Therefore, in contrast to other cell-specific transcriptional profiling methods based on purification of tagged ribosomes or nuclei, TU-tagging provides a direct examination of transcriptional regulation. We describe how to: 1) deliver 4-thiouracil to transgenic mice to thio-label cell lineage-specific transcripts, 2) purify TU-tagged RNA and prepare libraries for Illumina sequencing, and 3) follow a straight-forward bioinformatics workflow to identify cell type-enriched or differentially expressed genes. Tissue containing TU-tagged RNA can be obtained in one day, RNA-Seq libraries generated within two days, and, following sequencing, an initial bioinformatics analysis completed in one additional day. PMID:24457332

  1. SAGETTARIUS: a program to reduce the number of tags mapped to multiple transcripts and to plan SAGE sequencing stages.

    PubMed

    Bianchetti, Laurent; Wu, Yan; Guerin, Eric; Plewniak, Frédéric; Poch, Olivier

    2007-01-01

    SAGE (Serial Analysis of Gene Expression) experiments generate short nucleotide sequences called 'tags' which are assumed to map unambiguously to their original transcripts (1 tag to 1 transcript mapping). Nevertheless, many tags are generated that do not map to any transcript or map to multiple transcripts. Current bioinformatics resources, such as SAGEmap and TAGmapper, have focused on reducing the number of unmapped tags. Here, we describe SAGETTARIUS, a new high-throughput program that performs successive precise Nla3 and Sau3A tag to transcript mapping, based on specifically designed Virtual Tag (VT) libraries. First, SAGETTARIUS decreases the number of tags mapped to multiple transcripts. Among the various mapping resources compared, SAGETTARIUS performed best in this respect, decreasing the number of multiply mapped tags by up to 11%. Second, SAGETTARIUS allows the establishment of a guideline for SAGE experiment sequencing efforts through efficient mapping of the CRT (Cytoplasmic Ribosomal protein Transcripts)-specific tags. Using all publicly available human and mouse Nla3 SAGE experiments, we show that sequencing 100,000 tags is sufficient to map almost all CRT-specific tags and that four sequencing stages can be identified when carrying out a human or mouse SAGE project. SAGETTARIUS is web interfaced and freely accessible to academic users.
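
    The idea of a virtual tag library can be illustrated with a short sketch. It is not the SAGETTARIUS implementation: it assumes the classic SAGE convention that a tag is the 10 bases immediately downstream of the 3'-most NlaIII site (CATG) in a transcript, and the function names and toy transcripts are hypothetical.

```python
from collections import defaultdict

ANCHOR = "CATG"  # NlaIII recognition site; Sau3A libraries would use "GATC"
TAG_LEN = 10     # assumed tag length (classic short SAGE)


def virtual_tag(transcript, anchor=ANCHOR, tag_len=TAG_LEN):
    """Return the tag following the 3'-most anchor site, or None if unavailable."""
    seq = transcript.upper()
    pos = seq.rfind(anchor)  # 3'-most occurrence of the anchoring-enzyme site
    if pos == -1:
        return None
    start = pos + len(anchor)
    tag = seq[start:start + tag_len]
    return tag if len(tag) == tag_len else None  # site too close to the 3' end


def build_tag_index(transcripts):
    """Map each virtual tag to the set of transcript names that produce it."""
    index = defaultdict(set)
    for name, seq in transcripts.items():
        tag = virtual_tag(seq)
        if tag is not None:
            index[tag].add(name)
    return index


def classify(tag, index):
    """Label an observed tag as 'unmapped', 'unique' or 'multiple'."""
    n = len(index.get(tag, ()))
    return "unmapped" if n == 0 else "unique" if n == 1 else "multiple"


# Toy transcripts (made up): t1 and t2 share the same virtual tag
transcripts = {
    "t1": "AAACATGACGTACGTAGTTT",
    "t2": "GGCATGACGTACGTAGCC",
    "t3": "CATGTTTTTTTTTTAA",
    "t4": "AAAA",
}
index = build_tag_index(transcripts)
labels = {tag: classify(tag, index) for tag in ("ACGTACGTAG", "TTTTTTTTTT", "GGGGGGGGGG")}
print(labels)
```

    Tags labelled "multiple" are the ambiguous cases that the abstract says SAGETTARIUS aims to reduce, for example by a second round of mapping with a different anchoring enzyme.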

  2. Needles in the EST haystack: large-scale identification and analysis of excretory-secretory (ES) proteins in parasitic nematodes using expressed sequence tags (ESTs).

    PubMed

    Nagaraj, Shivashankar H; Gasser, Robin B; Ranganathan, Shoba

    2008-09-24

    Parasitic nematodes of humans, other animals and plants continue to impose a significant public health and economic burden worldwide, due to the diseases they cause. Promising antiparasitic drug and vaccine candidates have been discovered from excreted or secreted (ES) proteins released from the parasite and exposed to the immune system of the host. Mining the entire expressed sequence tag (EST) data available from parasitic nematodes represents an approach to discover such ES targets. In this study, we predicted, using EST2Secretome, a novel, high-throughput, computational workflow system, 4,710 ES proteins from 452,134 ESTs derived from 39 different species of nematodes, parasitic in animals (including humans) or plants. In total, 2,632, 786, and 1,292 ES proteins were predicted for animal-, human-, and plant-parasitic nematodes. Subsequently, we systematically analysed ES proteins using computational methods. Of these 4,710 proteins, 2,490 (52.8%) had orthologues in Caenorhabditis elegans, whereas 621 (13.8%) appeared to be novel, currently having no significant match to any molecule available in public databases. Of the C. elegans homologues, 267 had strong "loss-of-function" phenotypes by RNA interference (RNAi) in this nematode. We could functionally classify 1,948 (41.3%) sequences using the Gene Ontology (GO) terms, establish pathway associations for 573 (12.2%) sequences using Kyoto Encyclopaedia of Genes and Genomes (KEGG), and identify protein interaction partners for 1,774 (37.6%) molecules. We also mapped 758 (16.1%) proteins to protein domains including the nematode-specific protein family "transthyretin-like" and "chromadorea ALT," considered as vaccine candidates against filariasis in humans. We report the large-scale analysis of ES proteins inferred from EST data for a range of parasitic nematodes. This set of ES proteins provides an inventory of known and novel members of ES proteins as a foundation for studies focused on understanding the biology

  3. Needles in the EST Haystack: Large-Scale Identification and Analysis of Excretory-Secretory (ES) Proteins in Parasitic Nematodes Using Expressed Sequence Tags (ESTs)

    PubMed Central

    Nagaraj, Shivashankar H.; Gasser, Robin B.; Ranganathan, Shoba

    2008-01-01

    Background Parasitic nematodes of humans, other animals and plants continue to impose a significant public health and economic burden worldwide, due to the diseases they cause. Promising antiparasitic drug and vaccine candidates have been discovered from excreted or secreted (ES) proteins released from the parasite and exposed to the immune system of the host. Mining the entire expressed sequence tag (EST) data available from parasitic nematodes represents an approach to discover such ES targets. Methods and Findings In this study, we predicted, using EST2Secretome, a novel, high-throughput, computational workflow system, 4,710 ES proteins from 452,134 ESTs derived from 39 different species of nematodes, parasitic in animals (including humans) or plants. In total, 2,632, 786, and 1,292 ES proteins were predicted for animal-, human-, and plant-parasitic nematodes. Subsequently, we systematically analysed ES proteins using computational methods. Of these 4,710 proteins, 2,490 (52.8%) had orthologues in Caenorhabditis elegans, whereas 621 (13.8%) appeared to be novel, currently having no significant match to any molecule available in public databases. Of the C. elegans homologues, 267 had strong “loss-of-function” phenotypes by RNA interference (RNAi) in this nematode. We could functionally classify 1,948 (41.3%) sequences using the Gene Ontology (GO) terms, establish pathway associations for 573 (12.2%) sequences using Kyoto Encyclopaedia of Genes and Genomes (KEGG), and identify protein interaction partners for 1,774 (37.6%) molecules. We also mapped 758 (16.1%) proteins to protein domains including the nematode-specific protein family “transthyretin-like” and “chromadorea ALT,” considered as vaccine candidates against filariasis in humans. Conclusions We report the large-scale analysis of ES proteins inferred from EST data for a range of parasitic nematodes. This set of ES proteins provides an inventory of known and novel members of ES proteins as a

  4. QTL analysis of photoperiod sensitivity in common buckwheat by using markers for expressed sequence tags and photoperiod-sensitivity candidate genes

    PubMed Central

    Hara, Takashi; Iwata, Hiroyoshi; Okuno, Kazutoshi; Matsui, Katsuhiro; Ohsawa, Ryo

    2011-01-01

    Photoperiod sensitivity is an important trait related to crop adaptation and ecological breeding in common buckwheat (Fagopyrum esculentum Moench). Although photoperiod sensitivity in this species is thought to be controlled by quantitative trait loci (QTLs), no genes or regions related to photoperiod sensitivity had been identified until now. Here, we identified QTLs controlling photoperiod sensitivity by QTL analysis in a segregating F4 population (n = 100) derived from a cross of two autogamous lines, 02AL113(Kyukei SC2)LH.self and C0408-0 RP. The F4 progenies were genotyped with three markers for photoperiod-sensitivity candidate genes, which were identified based on homology to photoperiod-sensitivity genes in Arabidopsis and 76 expressed sequence tag markers. Among the three photoperiod-sensitivity candidate genes (FeCCA1, FeELF3 and FeCOL3) identified in common buckwheat, FeELF3 was associated with photoperiod sensitivity. Two EST regions, Fest_L0606_4 and Fest_L0337_6, were associated with photoperiod sensitivity and explained 20.0% and 14.2% of the phenotypic variation, respectively. For both EST regions, the allele from 02AL113(Kyukei SC2)LH.self led to early flowering. An epistatic interaction was also confirmed between Fest_L0606_4 and Fest_L0337_6. These results demonstrate that photoperiod sensitivity in common buckwheat is controlled by a pathway consisting of photoperiod-sensitivity candidate genes as well as multiple gene action. PMID:23136477

  5. Global Transcriptome Analysis of the Tentacle of the Jellyfish Cyanea capillata Using Deep Sequencing and Expressed Sequence Tags: Insight into the Toxin- and Degenerative Disease-Related Transcripts

    PubMed Central

    Liu, Dan; Wang, Qianqian; Ruan, Zengliang; He, Qian; Zhang, Liming

    2015-01-01

    Background Jellyfish contain diverse toxins and other bioactive components. However, large-scale identification of novel toxins and bioactive components from jellyfish has been hampered by the low efficiency of traditional isolation and purification methods. Results We performed de novo transcriptome sequencing of the tentacle tissue of the jellyfish Cyanea capillata. A total of 51,304,108 reads were obtained and assembled into 50,536 unigenes. Of these, 21,357 unigenes had homologues in public databases, but the remaining unigenes had no significant matches due to the limited sequence information available and species-specific novel sequences. Functional annotation of the unigenes also revealed general gene expression profile characteristics in the tentacle of C. capillata. A primary goal of this study was to identify putative toxin transcripts. As expected, we screened many transcripts encoding proteins similar to several well-known toxin families including phospholipases, metalloproteases, serine proteases and serine protease inhibitors. In addition, some transcripts also resembled molecules with potential toxic activities, including cnidarian CfTX-like toxins with hemolytic activity, plancitoxin-1, venom toxin-like peptide-6, histamine-releasing factor, neprilysin, dipeptidyl peptidase 4, vascular endothelial growth factor A, angiotensin-converting enzyme-like and endothelin-converting enzyme 1-like proteins. Most of these molecules have not been previously reported in jellyfish. Interestingly, we also characterized a number of transcripts with similarities to proteins relevant to several degenerative diseases, including Huntington’s, Alzheimer’s and Parkinson’s diseases. This is the first description of degenerative disease-associated genes in jellyfish. Conclusion We obtained a well-categorized and annotated transcriptome of C. capillata tentacle that will be an important and valuable resource for further understanding of jellyfish at the molecular

  6. Global Transcriptome Analysis of the Tentacle of the Jellyfish Cyanea capillata Using Deep Sequencing and Expressed Sequence Tags: Insight into the Toxin- and Degenerative Disease-Related Transcripts.

    PubMed

    Liu, Guoyan; Zhou, Yonghong; Liu, Dan; Wang, Qianqian; Ruan, Zengliang; He, Qian; Zhang, Liming

    2015-01-01

    Jellyfish contain diverse toxins and other bioactive components. However, large-scale identification of novel toxins and bioactive components from jellyfish has been hampered by the low efficiency of traditional isolation and purification methods. We performed de novo transcriptome sequencing of the tentacle tissue of the jellyfish Cyanea capillata. A total of 51,304,108 reads were obtained and assembled into 50,536 unigenes. Of these, 21,357 unigenes had homologues in public databases, but the remaining unigenes had no significant matches due to the limited sequence information available and species-specific novel sequences. Functional annotation of the unigenes also revealed general gene expression profile characteristics in the tentacle of C. capillata. A primary goal of this study was to identify putative toxin transcripts. As expected, we screened many transcripts encoding proteins similar to several well-known toxin families including phospholipases, metalloproteases, serine proteases and serine protease inhibitors. In addition, some transcripts also resembled molecules with potential toxic activities, including cnidarian CfTX-like toxins with hemolytic activity, plancitoxin-1, venom toxin-like peptide-6, histamine-releasing factor, neprilysin, dipeptidyl peptidase 4, vascular endothelial growth factor A, angiotensin-converting enzyme-like and endothelin-converting enzyme 1-like proteins. Most of these molecules have not been previously reported in jellyfish. Interestingly, we also characterized a number of transcripts with similarities to proteins relevant to several degenerative diseases, including Huntington's, Alzheimer's and Parkinson's diseases. This is the first description of degenerative disease-associated genes in jellyfish. We obtained a well-categorized and annotated transcriptome of C. capillata tentacle that will be an important and valuable resource for further understanding of jellyfish at the molecular level and information on the underlying

  7. The ABRF Edman Sequencing Research Group 2008 Study: Investigation into Homopolymeric Amino Acid N-Terminal Sequence Tags and Their Effects on Automated Edman Degradation

    PubMed Central

    Thoma, R. S.; Smith, J. S.; Sandoval, W.; Leone, J. W.; Hunziker, P.; Hampton, B.; Linse, K. D.; Denslow, N. D.

    2009-01-01

    The Edman Sequencing Research Group (ESRG) of the Association of Biomolecular Resource Facilities (ABRF) designs and executes interlaboratory studies investigating the use of automated Edman degradation for protein and peptide analysis. In 2008, the ESRG enlisted the help of core sequencing facilities to investigate the effects of a repeating amino acid tag at the N-terminus of a protein. Commonly, to facilitate protein purification, an affinity tag containing a polyhistidine sequence is conjugated to the N-terminus of the protein. After expression, polyhistidine-tagged protein is readily purified via chelation with an immobilized metal affinity resin. The addition of the polyhistidine tag presents unique challenges for the determination of protein identity using Edman degradation chemistry. Participating laboratories were asked to sequence one protein engineered in three configurations: with an N-terminal polyhistidine tag; with an N-terminal polyalanine tag; or with no tag. Study participants were asked to return a data file containing the uncorrected amino acid picomole yields for the first 17 cycles. Initial and repetitive yield (R.Y.) information and the amount of lag were evaluated. Information about instrumentation and sample treatment was also collected as part of the study. For this study, the majority of participating laboratories successfully called the amino acid sequence for 17 cycles for all three test proteins. In general, laboratories found it more difficult to call the sequence containing the polyhistidine tag. Lag was observed earlier and more consistently with the polyhistidine-tagged protein than the polyalanine-tagged protein. Histidine yields were significantly less than the alanine yields in the tag portion of each analysis. The polyhistidine and polyalanine protein-R.Y. calculations were found to be equivalent. These calculations showed that the nontagged portion from each protein was equivalent. The terminal histidines from the tagged portion of the protein

  8. SAGETTARIUS: a program to reduce the number of tags mapped to multiple transcripts and to plan SAGE sequencing stages

    PubMed Central

    Bianchetti, Laurent; Wu, Yan; Guerin, Eric; Plewniak, Frédéric; Poch, Olivier

    2007-01-01

    SAGE (Serial Analysis of Gene Expression) experiments generate short nucleotide sequences called ‘tags’ which are assumed to map unambiguously to their original transcripts (1 tag to 1 transcript mapping). Nevertheless, many tags are generated that do not map to any transcript or map to multiple transcripts. Current bioinformatics resources, such as SAGEmap and TAGmapper, have focused on reducing the number of unmapped tags. Here, we describe SAGETTARIUS, a new high-throughput program that performs successive precise Nla3 and Sau3A tag to transcript mapping, based on specifically designed Virtual Tag (VT) libraries. First, SAGETTARIUS decreases the number of tags mapped to multiple transcripts. Among the various mapping resources compared, SAGETTARIUS performed best in this respect, reducing the number of multiply mapped tags by up to 11%. Second, SAGETTARIUS allows the establishment of a guideline for SAGE experiment sequencing efforts through efficient mapping of the CRT (Cytoplasmic Ribosomal protein Transcripts)-specific tags. Using all publicly available human and mouse Nla3 SAGE experiments, we show that sequencing 100 000 tags is sufficient to map almost all CRT-specific tags and that four sequencing stages can be identified when carrying out a human or mouse SAGE project. SAGETTARIUS is web interfaced and freely accessible to academic users. PMID:17884916
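
The tag-to-transcript mapping problem described in this abstract can be illustrated with a minimal Python sketch. It assumes the common SAGE convention that a tag is the 10 bases immediately downstream of the 3'-most NlaIII site (CATG) in a transcript. The function names, the tag length, and the toy transcripts are hypothetical and are not taken from SAGETTARIUS.

```python
from collections import defaultdict

ANCHOR_SITE = "CATG"  # NlaIII recognition site (assumed anchoring enzyme)
TAG_LENGTH = 10       # assumed number of bases taken downstream of the site


def virtual_tag(transcript, site=ANCHOR_SITE, tag_length=TAG_LENGTH):
    """Return the tag that follows the 3'-most site, or None if unavailable."""
    seq = transcript.upper()
    pos = seq.rfind(site)  # 3'-most occurrence of the site
    if pos == -1:
        return None
    start = pos + len(site)
    tag = seq[start:start + tag_length]
    # A site too close to the 3' end cannot yield a full-length tag.
    return tag if len(tag) == tag_length else None


def build_virtual_tag_library(transcripts):
    """Map each virtual tag to the set of transcript names that produce it."""
    library = defaultdict(set)
    for name, seq in transcripts.items():
        tag = virtual_tag(seq)
        if tag is not None:
            library[tag].add(name)
    return dict(library)


def multiply_mapped(library):
    """Return only the tags that map to more than one transcript."""
    return {tag: sorted(names) for tag, names in library.items() if len(names) > 1}


# Toy transcripts (hypothetical): tx_a and tx_b share the same virtual tag.
TOY_TRANSCRIPTS = {
    "tx_a": "GGGCATGAAACCCGGGTTTAAA",
    "tx_b": "TTCATGAAACCCGGGTCC",
    "tx_c": "CATGTTTTTTTTTTCATGACGTACGTAC",
}
```

Calling `multiply_mapped(build_virtual_tag_library(TOY_TRANSCRIPTS))` lists the ambiguous tags. Reducing that list is the quantity the abstract reports as improved by up to 11%.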

  9. Tag jumps illuminated--reducing sequence-to-sample misidentifications in metabarcoding studies.

    PubMed

    Schnell, Ida Baerholm; Bohmann, Kristine; Gilbert, M Thomas P

    2015-11-01

    Metabarcoding of environmental samples on second-generation sequencing platforms has rapidly become a valuable tool for ecological studies. A fundamental assumption of this approach is the reliance on being able to track tagged amplicons back to the samples from which they originated. In this study, we address the problem of sequences in metabarcoding sequencing outputs with false combinations of used tags (tag jumps). Unless these sequences can be identified and excluded from downstream analyses, tag jumps creating sequences with false, but already used tag combinations, can cause incorrect assignment of sequences to samples and artificially inflate diversity. In this study, we document and investigate tag jumping in metabarcoding studies on Illumina sequencing platforms by amplifying mixed-template extracts obtained from bat droppings and leech gut contents with tagged generic arthropod and mammal primers, respectively. We found that an average of 2.6% and 2.1% of sequences had tag combinations, which could be explained by tag jumping in the leech and bat diet study, respectively. We suggest that tag jumping can happen during blunt-ending of pools of tagged amplicons during library build and as a consequence of chimera formation during bulk amplification of tagged amplicons during library index PCR. We argue that tag jumping and contamination between libraries represents a considerable challenge for Illumina-based metabarcoding studies, and suggest measures to avoid false assignment of tag jumping-derived sequences to samples. © 2015 John Wiley & Sons Ltd.
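
A minimal Python sketch of the bookkeeping behind this kind of screening follows. It assumes each read carries a forward and a reverse tag and that each sample was assigned one tag combination. Reads whose two tags were both used in the study, but never together, are counted as possible tag jumps. The function names and toy data are hypothetical. As the abstract notes, a jump that recreates an already assigned combination cannot be detected this way.

```python
from collections import Counter


def classify_reads(read_tags, assigned):
    """Count reads by tag-combination status.

    read_tags: iterable of (forward_tag, reverse_tag) pairs, one per read.
    assigned:  dict mapping (forward_tag, reverse_tag) -> sample name.
    """
    used_forward = {fwd for fwd, _ in assigned}
    used_reverse = {rev for _, rev in assigned}
    counts = Counter()
    for fwd, rev in read_tags:
        if (fwd, rev) in assigned:
            counts["assigned"] += 1
        elif fwd in used_forward and rev in used_reverse:
            # Both tags were used, but never together: candidate tag jump.
            counts["possible_tag_jump"] += 1
        else:
            counts["unknown_tag"] += 1
    return counts


def tag_jump_fraction(counts):
    """Fraction of all reads flagged as possible tag jumps."""
    total = sum(counts.values())
    return counts["possible_tag_jump"] / total if total else 0.0
```

Whether the fraction is taken over all reads or only over reads with recognized tags is a design choice; this sketch uses all reads.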

  10. Nonradioactive sequence-tagged microsatellite site analyses: a method transferable to the tropics.

    PubMed

    Lagoda, P J; Dambier, D; Grapin, A; Baurens, F C; Lanaud, C; Noyer, J L

    1998-02-01

    Utilization of existing isozyme analysis facilities to detect sequence-tagged microsatellite site (STMS) polymorphism or any simple sequence repeat (SSR) variation is described. Different parameters concerning the difficulties in transferring molecular techniques to less sophisticated laboratory infrastructures (i.e. tropical outstations) are discussed (e.g. reproducibility, efficacy, precision). Nonradioactive STMS analysis is bound to foster collaborative research between "biodiversity" and "biotechnology" centers.

  11. Development of expressed sequence tag-based microsatellite markers for the critically endangered Isoëtes sinensis (Isoetaceae) based on transcriptome analysis.

    PubMed

    Gichira, A W; Long, Z C; Wang, Q F; Chen, J M; Liao, K

    2016-07-15

    Isoëtes sinensis is a critically endangered quillwort. To facilitate studies on the conservation genetics of this species, we developed expressed sequence tag-simple sequence repeat (EST-SSR) markers. A total of 50,063 unigenes were predicted by transcriptome sequencing, 5294 (10.6%) of which significantly matched 3011 Gene Ontology annotations and 2363 were assigned to Kyoto Encyclopedia of Genes and Genomes metabolic pathways. Most of these (2297) were involved in metabolism. A total of 1982 SSR motifs were identified, with trinucleotides being the dominant repeat motif, and 1438 (72.6%) SSR primers were designed. Eighteen randomly selected primer pairs were used to genotype 24 I. sinensis accessions, which confirmed the suitability of these novel markers for molecular studies of I. sinensis. The heterozygosity index value ranged between 0.0799 and 0.9106, while the Shannon-Wiener diversity index value ranged between 0.1732 and 2.5589. The EST-SSRs reported in this study are linked to genic sequences, and are therefore ideal for investigating the evolutionary history of I. sinensis. These markers, together with the large EST dataset generated in this study, will greatly facilitate conservation genetic studies of I. sinensis.
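
The two per-locus statistics quoted in this abstract are commonly derived from allele frequencies. The Python sketch below assumes the usual textbook definitions: heterozygosity as 1 minus the sum of squared allele frequencies, and the Shannon-Wiener index as the negative sum of p ln p. The abstract does not state which estimator or logarithm base the authors used, so the function names and formulas here are illustrative assumptions.

```python
import math
from collections import Counter


def allele_frequencies(alleles):
    """Return a dict of allele -> relative frequency from a list of allele calls."""
    counts = Counter(alleles)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def expected_heterozygosity(freqs):
    """1 - sum(p_i^2): probability that two randomly drawn alleles differ."""
    return 1.0 - sum(p * p for p in freqs.values())


def shannon_index(freqs):
    """-sum(p_i * ln p_i), using the natural logarithm (an assumption)."""
    return -sum(p * math.log(p) for p in freqs.values() if p > 0)
```

For a locus with two equally frequent alleles, these functions give a heterozygosity of 0.5 and a Shannon index of ln 2.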

  12. HIV-1 quasispecies delineation by tag linkage deep sequencing.

    PubMed

    Wu, Nicholas C; De La Cruz, Justin; Al-Mawsawi, Laith Q; Olson, C Anders; Qi, Hangfei; Luan, Harding H; Nguyen, Nguyen; Du, Yushen; Le, Shuai; Wu, Ting-Ting; Li, Xinmin; Lewis, Martha J; Yang, Otto O; Sun, Ren

    2014-01-01

    Trade-offs between throughput, read length, and error rates in high-throughput sequencing limit certain applications such as monitoring viral quasispecies. Here, we describe a molecular-based tag linkage method that allows assemblage of short sequence reads into long DNA fragments. It enables haplotype phasing with high accuracy and sensitivity to interrogate individual viral sequences in a quasispecies. This approach is demonstrated to deduce ∼ 2000 unique 1.3 kb viral sequences from HIV-1 quasispecies in vivo and after passaging ex vivo with a detection limit of ∼ 0.005% to ∼ 0.001%. Reproducibility of the method is validated quantitatively and qualitatively by a technical replicate. This approach can improve monitoring of the genetic architecture and evolution dynamics in any quasispecies population.

  13. Comparative analyses of potato expressed sequence tag libraries.

    PubMed

    Ronning, Catherine M; Stegalkina, Svetlana S; Ascenzi, Robert A; Bougri, Oleg; Hart, Amy L; Utterbach, Teresa R; Vanaken, Susan E; Riedmuller, Steve B; White, Joseph A; Cho, Jennifer; Pertea, Geo M; Lee, Yuandan; Karamycheva, Svetlana; Sultana, Razvan; Tsai, Jennifer; Quackenbush, John; Griffiths, Helen M; Restrepo, Silvia; Smart, Christine D; Fry, William E; Van Der Hoeven, Rutger; Tanksley, Steve; Zhang, Peifen; Jin, Hailing; Yamamoto, Miki L; Baker, Barbara J; Buell, C Robin

    2003-02-01

    The cultivated potato (Solanum tuberosum) shares similar biology with other members of the Solanaceae, yet has features unique within the family, such as modified stems (stolons) that develop into edible tubers. To better understand potato biology, we have undertaken a survey of the potato transcriptome using expressed sequence tags (ESTs) from diverse tissues. A total of 61,940 ESTs were generated from aerial tissues, below-ground tissues, and tissues challenged with the late-blight pathogen (Phytophthora infestans). Clustering and assembly of these ESTs resulted in a total of 19,892 unique sequences with 8,741 tentative consensus sequences and 11,151 singleton ESTs. We were able to identify a putative function for 43.7% of these sequences. A number of sequences (48) were expressed throughout the libraries sampled, representing constitutively expressed sequences. Other sequences (13,068, 21%) were uniquely expressed and were detected only in a single library. Using hierarchical and k-means clustering of the EST sequences, we were able to correlate changes in gene expression with major physiological events in potato biology. Using pair-wise comparisons of tuber-related tissues, we were able to associate genes with tuber initiation, dormancy, and sprouting. We also were able to identify a number of characterized as well as novel sequences that were unique to the incompatible interaction of late-blight pathogen, thereby providing a foundation for further understanding the mechanism of resistance.

  14. Comparative Analyses of Potato Expressed Sequence Tag Libraries

    PubMed Central

    Ronning, Catherine M.; Stegalkina, Svetlana S.; Ascenzi, Robert A.; Bougri, Oleg; Hart, Amy L.; Utterbach, Teresa R.; Vanaken, Susan E.; Riedmuller, Steve B.; White, Joseph A.; Cho, Jennifer; Pertea, Geo M.; Lee, Yuandan; Karamycheva, Svetlana; Sultana, Razvan; Tsai, Jennifer; Quackenbush, John; Griffiths, Helen M.; Restrepo, Silvia; Smart, Christine D.; Fry, William E.; van der Hoeven, Rutger; Tanksley, Steve; Zhang, Peifen; Jin, Hailing; Yamamoto, Miki L.; Baker, Barbara J.; Buell, C. Robin

    2003-01-01

    The cultivated potato (Solanum tuberosum) shares similar biology with other members of the Solanaceae, yet has features unique within the family, such as modified stems (stolons) that develop into edible tubers. To better understand potato biology, we have undertaken a survey of the potato transcriptome using expressed sequence tags (ESTs) from diverse tissues. A total of 61,940 ESTs were generated from aerial tissues, below-ground tissues, and tissues challenged with the late-blight pathogen (Phytophthora infestans). Clustering and assembly of these ESTs resulted in a total of 19,892 unique sequences with 8,741 tentative consensus sequences and 11,151 singleton ESTs. We were able to identify a putative function for 43.7% of these sequences. A number of sequences (48) were expressed throughout the libraries sampled, representing constitutively expressed sequences. Other sequences (13,068, 21%) were uniquely expressed and were detected only in a single library. Using hierarchical and k-means clustering of the EST sequences, we were able to correlate changes in gene expression with major physiological events in potato biology. Using pair-wise comparisons of tuber-related tissues, we were able to associate genes with tuber initiation, dormancy, and sprouting. We also were able to identify a number of characterized as well as novel sequences that were unique to the incompatible interaction of late-blight pathogen, thereby providing a foundation for further understanding the mechanism of resistance. PMID:12586867

  15. Massively parallel tag sequencing reveals the complexity of anaerobic marine protistan communities

    PubMed Central

    Stoeck, Thorsten; Behnke, Anke; Christen, Richard; Amaral-Zettler, Linda; Rodriguez-Mora, Maria J; Chistoserdov, Andrei; Orsi, William; Edgcomb, Virginia P

    2009-01-01

    Background Recent advances in sequencing strategies make possible unprecedented depth and scale of sampling for molecular detection of microbial diversity. Two major paradigm-shifting discoveries include the detection of bacterial diversity that is one to two orders of magnitude greater than previous estimates, and the discovery of an exciting 'rare biosphere' of molecular signatures ('species') of poorly understood ecological significance. We applied a high-throughput parallel tag sequencing (454 sequencing) protocol adopted for eukaryotes to investigate protistan community complexity in two contrasting anoxic marine ecosystems (Framvaren Fjord, Norway; Cariaco deep-sea basin, Venezuela). Both sampling sites have previously been scrutinized for protistan diversity by traditional clone library construction and Sanger sequencing. By comparing these clone library data with 454 amplicon library data, we assess the efficiency of high-throughput tag sequencing strategies. We here present a novel, highly conservative bioinformatic analysis pipeline for the processing of large tag sequence data sets. Results The analyses of ca. 250,000 sequence reads revealed that the number of detected Operational Taxonomic Units (OTUs) far exceeded previous richness estimates from the same sites based on clone libraries and Sanger sequencing. More than 90% of this diversity was represented by OTUs with less than 10 sequence tags. We detected a substantial number of taxonomic groups like Apusozoa, Chrysomerophytes, Centroheliozoa, Eustigmatophytes, hyphochytriomycetes, Ichthyosporea, Oikomonads, Phaeothamniophytes, and rhodophytes which remained undetected by previous clone library-based diversity surveys of the sampling sites. 
    The most important innovations in our newly developed bioinformatics pipeline employ (i) BLASTN with query parameters adjusted for highly variable domains and a complete database of public ribosomal RNA (rRNA) gene sequences for taxonomic assignments of tags; (ii

  16. DNA methylation mapping by tag-modified bisulfite genomic sequencing.

    PubMed

    Han, Weiguo; Cauchi, Stephane; Herman, James G; Spivack, Simon D

    2006-08-01

    A tag-modified bisulfite genomic sequencing (tBGS) method employing direct cycle sequencing of polymerase chain reaction (PCR) products at kilobase scale, without conventional DNA fragment cloning, was developed for simplified evaluation of DNA methylation sites. The method entails subjecting bisulfite-modified genomic DNA to a second-round PCR amplification employing GC-tagged primers. Qualitative results from tBGS closely correlated with those from conventional BGS (R=0.935, p=0.002). In application, the intertissue and interindividual CpG methylation differences in promoter sequence for two genes, CYP1B1 and GSTP1, were then explored across four human tissue types (peripheral blood cells, exfoliated buccal cells, paired nontumor-tumor lung tissues), and two lung cell types in culture (normal NHBE and malignant A549). Predominantly conserved methylation maps for the two gene promoters were apparent across donors and tissues. At any given CpG site, variation in the degree of methylation could be determined by the relative height of C and T peaks in the sequencing trace. Methylation maps for the GSTP1 promoter diverged between NHBE (unmethylated) and A549 (completely methylated) cells in a previously unexplored upstream region, correlating with a 2.7-fold difference in GSTP1 mRNA expression (p<0.01). The tBGS method simplifies detailed methylation scanning of kilobase-scale genomic DNA, facilitating more ambitious genomic methylation mapping studies.
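
The peak-height readout mentioned in this abstract can be sketched in a few lines of Python. The sketch assumes the standard bisulfite logic, in which a methylated cytosine is still read as C while an unmethylated cytosine is read as T, so the methylated fraction at a CpG site is approximated by C / (C + T). The function names and peak values are hypothetical, and the authors' exact quantification may differ.

```python
def methylation_fraction(c_peak, t_peak):
    """Approximate methylated fraction at one CpG site from trace peak heights."""
    if c_peak < 0 or t_peak < 0:
        raise ValueError("peak heights must be non-negative")
    total = c_peak + t_peak
    if total == 0:
        raise ValueError("no signal at this CpG site")
    return c_peak / total


def methylation_map(peaks):
    """Build a position -> methylated fraction map.

    peaks: dict mapping CpG position -> (c_peak_height, t_peak_height).
    """
    return {pos: methylation_fraction(c, t) for pos, (c, t) in sorted(peaks.items())}
```

For example, a site with a C peak of 300 and a T peak of 100 would be scored as 75% methylated under these assumptions.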

  17. Generation and analysis of a 29,745 unique Expressed Sequence Tags from the Pacific oyster (Crassostrea gigas) assembled into a publicly accessible database: the GigasDatabase

    PubMed Central

    2009-01-01

    Background Although bivalves are among the most-studied marine organisms because of their ecological role and economic importance, very little information is available on the genome sequences of oyster species. This report documents three large-scale cDNA sequencing projects for the Pacific oyster Crassostrea gigas initiated to provide a large number of expressed sequence tags that were subsequently compiled in a publicly accessible database. This resource allowed for the identification of a large number of transcripts and provides valuable information for ongoing investigations of tissue-specific and stimulus-dependent gene expression patterns. These data are crucial for constructing comprehensive DNA microarrays, identifying single nucleotide polymorphisms and microsatellites in coding regions, and for identifying genes when the entire genome sequence of C. gigas becomes available. Description In the present paper, we report the production of 40,845 high-quality ESTs that identify 29,745 unique transcribed sequences consisting of 7,940 contigs and 21,805 singletons. All of these new sequences, together with existing public sequence data, have been compiled into a publicly available website (http://public-contigbrowser.sigenae.org:9090/Crassostrea_gigas/index.html). Approximately 43% of the unique ESTs had significant matches against the SwissProt database and 27% were annotated using Gene Ontology terms. In addition, we identified a total of 208 in silico microsatellites from the ESTs, with 173 having sufficient flanking sequence for primer design. We also identified a total of 7,530 putative in silico single-nucleotide polymorphisms using existing and newly generated EST resources for the Pacific oyster. Conclusion A publicly available database has been populated with 29,745 unique sequences for the Pacific oyster Crassostrea gigas. The database provides many tools to search cleaned and assembled ESTs. The user may input and submit several filters, such as

  18. A blackberry (Rubus L.) expressed sequence tag library for the development of simple sequence repeat markers

    USDA-ARS's Scientific Manuscript database

    A blackberry (Rubus L.) expressed sequence tag (EST) library was produced for developing simple sequence repeat (SSR) markers from the tetraploid blackberry cultivar, Merton Thornless, the source of the thornless trait in commercial cultivars. RNA was extracted from young expanding leaves and used f...

  19. Searching the expressed sequence tag (EST) databases: panning for genes.

    PubMed

    Jongeneel, C V

    2000-02-01

    The genomes of living organisms contain many elements, including genes coding for proteins. The portions of the genes expressed as mature mRNA, collectively known as the transcriptome, represent only a small part of the genome. The expressed sequence tag (EST) databases contain an increasingly large part of the transcriptome of many species. For this reason, these databases are probably the most abundant source of new coding sequences available today. However, the raw data deposited in the EST databases are to a large extent unorganised, unannotated, redundant and of relatively low quality. This paper reviews some of the characteristics of the EST data, and the methods that can be used to find novel protein sequences within them. It also documents a collection of databases, software and web sites that can be useful to biologists interested in mining the EST databases over the Internet, or in establishing a local environment for such analyses.

  20. Single nucleotide polymorphisms from Theobroma cacao expressed sequence tags associated with witches' broom disease in cacao.

    PubMed

    Lima, L S; Gramacho, K P; Carels, N; Novais, R; Gaiotto, F A; Lopes, U V; Gesteira, A S; Zaidan, H A; Cascardo, J C M; Pires, J L; Micheli, F

    2009-07-14

    In order to increase the efficiency of cacao tree resistance to witches' broom disease, which is caused by Moniliophthora perniciosa (Tricholomataceae), we looked for molecular markers that could help in the selection of resistant cacao genotypes. Among the different markers useful for developing marker-assisted selection, single nucleotide polymorphisms (SNPs) constitute the most common type of sequence difference between alleles and can be easily detected by in silico analysis from expressed sequence tag libraries. We report the first detection and analysis of SNPs from cacao-M. perniciosa interaction expressed sequence tags, using bioinformatics. Selection based on analysis of these SNPs should be useful for developing cacao varieties resistant to this devastating disease.

  1. Analyses of Expressed Sequence Tags from Apple

    PubMed Central

    Newcomb, Richard D.; Crowhurst, Ross N.; Gleave, Andrew P.; Rikkerink, Erik H.A.; Allan, Andrew C.; Beuning, Lesley L.; Bowen, Judith H.; Gera, Emma; Jamieson, Kim R.; Janssen, Bart J.; Laing, William A.; McArtney, Steve; Nain, Bhawana; Ross, Gavin S.; Snowden, Kimberley C.; Souleyre, Edwige J.F.; Walton, Eric F.; Yauk, Yar-Khing

    2006-01-01

    The domestic apple (Malus domestica; also known as Malus pumila Mill.) has become a model fruit crop in which to study commercial traits such as disease and pest resistance, grafting, and flavor and health compound biosynthesis. To speed the discovery of genes involved in these traits, develop markers to map genes, and breed new cultivars, we have produced a substantial expressed sequence tag collection from various tissues of apple, focusing on fruit tissues of the cultivar Royal Gala. Over 150,000 expressed sequence tags have been collected from 43 different cDNA libraries representing 34 different tissues and treatments. Clustering of these sequences results in a set of 42,938 nonredundant sequences comprising 17,460 tentative contigs and 25,478 singletons, together representing what we predict are approximately one-half the expressed genes from apple. Many potential molecular markers are abundant in the apple transcripts. Dinucleotide repeats are found in 4,018 nonredundant sequences, mainly in the 5′-untranslated region of the gene, with a bias toward one repeat type (containing AG, 88%) and against another (repeats containing CG, 0.1%). Trinucleotide repeats are most common in the predicted coding regions and do not show a similar degree of sequence bias in their representation. Bi-allelic single-nucleotide polymorphisms are highly abundant with one found, on average, every 706 bp of transcribed DNA. Predictions of the numbers of representatives from protein families indicate the presence of many genes involved in disease resistance and the biosynthesis of flavor and health-associated compounds. Comparisons of some of these gene families with Arabidopsis (Arabidopsis thaliana) suggest instances where there have been duplications in the lineages leading to apple of biosynthetic and regulatory genes that are expressed in fruit. This resource paves the way for a concerted functional genomics effort in this important temperate fruit crop. PMID:16531485

  2. Construction of a full-length cDNA library and preliminary analysis of expressed sequence tags from lymphocytes of half-pipe snowboarding athletes.

    PubMed

    Zhao, Y H; Zhang, Z B; Zhao, C Q; Zhang, Y; Wang, Y F; Guan, W J; Zhu, Z Q

    2015-10-21

    The genes of top athletes are a valuable genetic resource for the human race, and could be exploited to identify novel genes related to sports ability, as well as other functions. We analyzed the expressed sequence tags from top half-pipe snowboarding athletes using the SMART complementary DNA (cDNA) library construction method to elucidate the characteristics of the athlete genome and the differential expression of the genes it contains. Overall, we established a full-length cDNA library from the lymphocytes of half-pipe snowboarding athletes and analyzed the inserted gene fragments. We also classified those genes according to molecular function, biological characteristics, cellular composition, protein types, and signal paths. A total of 201 functional genes were noted, which were distributed in 27 pathways. TXN, MDH1, ARL1, ARPC3, ACTG1, and other genes measured in sequence may be associated with physical ability. This suggests that the SMART cDNA library constructed from the genetic material from top athletes is an effective tool for preserving genetic sports resources and providing genetic markers of physical ability for athlete selection.

  3. An expressed sequence tag analysis of the intertidal brown seaweeds Fucus serratus (L.) and F. vesiculosus (L.) (Heterokontophyta, Phaeophyceae) in response to abiotic stressors.

    PubMed

    Pearson, Gareth A; Hoarau, Galice; Lago-Leston, Asuncion; Coyer, James A; Kube, Michael; Reinhardt, Richard; Henckel, Kolja; Serrão, Ester T A; Corre, Erwan; Olsen, Jeanine L

    2010-04-01

    In order to aid gene discovery and uncover genes responding to abiotic stressors in stress-tolerant brown algae of the genus Fucus, expressed sequence tags (ESTs) were studied in two species, Fucus serratus and Fucus vesiculosus. Clustering of over 12,000 ESTs from three libraries for heat shock/recovery and desiccation/rehydration resulted in identification of 2,503, 1,290, and 2,409 unigenes from heat-shocked F. serratus, desiccated F. serratus, and desiccated F. vesiculosus, respectively. Low overall annotation rates (18-31%) were strongly associated with the presence of long 3' untranslated regions in Fucus transcripts, as shown by analyses of predicted protein-coding sequence in annotated and nonannotated tentative consensus sequences. Posttranslational modification genes were overrepresented in the heat shock/recovery library, including many chaperones, the most abundant of which were a family of small heat shock protein transcripts, Hsp90 and Hsp70 members. Transcripts of LI818-like light-harvesting genes implicated in photoprotection were also expressed during heat shock in high light. The expression of several heat-shock-responsive genes was confirmed by quantitative reverse transcription polymerase chain reaction. However, candidate genes were notably absent from both desiccation/rehydration libraries, while the responses of the two species to desiccation were divergent, perhaps reflecting the species-specific physiological differences in stress tolerance previously established. Desiccation-tolerant F. vesiculosus overexpressed at least 17 ribosomal protein genes and two ubiquitin-ribosomal protein fusion genes, suggesting that ribosome function and/or biogenesis are important during cycles of rapid desiccation and rehydration in the intertidal zone and possibly indicate parallels with other poikilohydric organisms such as desiccation-tolerant bryophytes.

  4. Discovering conserved insect microRNAs from expressed sequence tags.

    PubMed

    Jia, Qidong; Lin, Kejian; Liang, Jingdong; Yu, Lun; Li, Fei

    2010-12-01

    MicroRNAs (miRNAs) participate in regulating diverse biological pathways by translational repression in animals. They have attracted increasing attention recently. However, little work has been done on the miRNA genes in agriculturally important pests. Because the transcripts of most miRNA genes are the products of RNA polymerase II, pri-miRNA has a poly(A) tail and appears in expressed sequence tags (ESTs). We developed a computational pipeline to identify miRNA genes from insect ESTs. First, 980,697 ESTs from 63 insects were collected and used to search the nr database. The ESTs which did not share significant similarities with any known protein-coding genes were treated as non-coding ESTs. Next, known mature miRNAs were aligned with the non-coding ESTs. The ESTs which contained the sequence of a mature miRNA were treated as candidate ESTs. Finally, putative precursors were extracted from the regions flanking the mature miRNA in candidate ESTs and evaluated by the Triplet-SVM algorithm. As a result, 86 miRNAs from 30 insect species were found based on a strict criterion while 330 miRNAs from 51 species were found based on a loose criterion. Evolution analysis indicated that mir-467, mir-297 and mir-466 were the most highly conserved miRNA families in insects. To confirm the reliability of putative insect miRNAs, the expression profile of nine predicted miRNAs in Locusta migratoria was investigated. Eight miRNAs were successfully detected by RT-PCR. Most miRNAs were expressed ubiquitously in all examined tissues and developmental stages whereas Lmi-mir-509 was specifically expressed in the thorax of the 2nd, 4th and 5th instars and adult locust. In all, our work reported an efficient computational strategy for predicting miRNA genes from insect ESTs and presented tens of miRNAs in diverse insect species which are expected to participate in many important physiological processes.
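
The middle steps of this pipeline (locating a known mature miRNA inside a non-coding EST and cutting out the flanking region as a putative precursor) can be sketched in Python as follows. This is an illustration under stated assumptions only: it uses exact forward-strand matching, a hypothetical flank length of 70 bases, and hypothetical function names. The protein-coding filter against nr and the Triplet-SVM evaluation are not reproduced.

```python
def to_dna(seq):
    """Upper-case a sequence and convert RNA notation (U) to DNA notation (T)."""
    return seq.upper().replace("U", "T")


def candidate_precursors(ests, mature_mirnas, flank=70):
    """Return (est_id, mirna_id, precursor_window) for every exact match.

    ests:          dict mapping EST id -> EST sequence (DNA).
    mature_mirnas: dict mapping miRNA id -> mature sequence (RNA or DNA).
    flank:         bases kept on each side of the match (assumed value).
    """
    hits = []
    for est_id, est_seq in ests.items():
        est = to_dna(est_seq)
        for mir_id, mir_seq in mature_mirnas.items():
            mir = to_dna(mir_seq)
            pos = est.find(mir)  # first exact occurrence on the forward strand
            if pos == -1:
                continue
            start = max(0, pos - flank)
            end = min(len(est), pos + len(mir) + flank)
            hits.append((est_id, mir_id, est[start:end]))
    return hits
```

Each returned window would then be passed to a precursor classifier; in the study described here that role is played by Triplet-SVM.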

  5. Microsatellite markers derived from Calophyllum inophyllum (Clusiaceae) expressed sequence tags.

    PubMed

    Setsuko, Suzuki; Uchiyama, Kentaro; Sugai, Kyoko; Hanaoka, So; Yoshimaru, Hiroshi

    2012-01-01

    Robust markers are required (inter alia) for assessing origins of Calophyllum inophyllum populations on the Bonin Islands, Japan. Therefore, informative expressed sequence tag (EST)-based microsatellite or simple sequence repeat (SSR) markers in the species were sought. Using 135,378 ESTs derived from de novo pyrosequencing, primers for 475 EST-SSRs were developed, 48 of which were tested for PCR amplification. Thirty-six of the 48 primers showed clear amplification, with 23 displaying polymorphism in sampled populations. Expected heterozygosity in the samples from the Bonin Islands and Ryukyu Islands populations ranged from 0.041 to 0.697 and from 0.041 to 0.773, respectively. As EST-SSRs are potentially tightly linked with functional genes, and reportedly more transferable to related species than anonymous genomic SSRs, the developed primers have utility for future studies of the origins, genetic structure, and conservation of C. inophyllum and related species.
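
    For readers unfamiliar with the statistic, expected heterozygosity at a locus is commonly computed as one minus the sum of squared allele frequencies. The sketch below assumes that plain definition; the abstract does not say whether the authors applied a sample-size correction, and the allele labels and counts in the example are invented for illustration.

```python
def expected_heterozygosity(allele_counts):
    """Expected heterozygosity, He = 1 - sum(p_i ** 2), for one locus.

    allele_counts -- dict mapping allele label -> observed copy number
    """
    total = sum(allele_counts.values())
    if total == 0:
        raise ValueError("no alleles observed")
    return 1.0 - sum((n / total) ** 2 for n in allele_counts.values())


if __name__ == "__main__":
    # Hypothetical locus with one common and one rare allele.
    print(expected_heterozygosity({"152": 39, "156": 1}))
    # Hypothetical locus with two equally frequent alleles.
    print(expected_heterozygosity({"152": 20, "156": 20}))
```

    A locus dominated by one allele gives a value near zero, while evenly distributed alleles push the value up, which is why the statistic is used to rank markers by informativeness.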

  6. Construction of a Full-Length Enriched cDNA Library and Preliminary Analysis of Expressed Sequence Tags from Bengal Tiger Panthera tigris tigris

    PubMed Central

    Liu, Changqing; Liu, Dan; Guo, Yu; Lu, Taofeng; Li, Xiangchen; Zhang, Minghai; Ma, Jianzhang; Ma, Yuehui; Guan, Weijun

    2013-01-01

    In this study, a full-length enriched cDNA library was successfully constructed from Bengal tiger, Panthera tigris tigris, the most well-known wild animal. Total RNA was extracted from cultured Bengal tiger fibroblasts in vitro. The titers of primary and amplified libraries were 1.28 × 10^6 pfu/mL and 1.56 × 10^9 pfu/mL respectively. The percentage of recombinants from unamplified library was 90.2% and average length of exogenous inserts was 0.98 kb. A total of 212 individual ESTs with sizes ranging from 356 to 1108 bp were then analyzed. The BLASTX score revealed that 48.1% of the sequences were classified as a strong match, 45.3% as nominal and 6.6% as a weak match. Among the ESTs with known putative function, 26.4% ESTs were found to be related to all kinds of metabolisms, 19.3% ESTs to information storage and processing, 11.3% ESTs to posttranslational modification, protein turnover, chaperones, 11.3% ESTs to transport, 9.9% ESTs to signal transducer/cell communication, 9.0% ESTs to structure protein, 3.8% ESTs to cell cycle, and only 6.6% ESTs classified as novel genes. By EST sequencing, a full-length gene coding ferritin was identified and characterized. The recombinant plasmid pET32a-TAT-Ferritin was constructed, coding for the TAT-Ferritin fusion protein with two 6× His-tags at the N and C termini. After BCA assay, the concentration of soluble Trx-TAT-Ferritin recombinant protein was 2.32 ± 0.12 mg/mL. These results demonstrated that the reliability and representativeness of the cDNA library met the requirements of a standard cDNA library. This library provided a useful platform for the functional genome and transcriptome research of Bengal tigers. PMID:23708105

  7. Construction of a full-length enriched cDNA library and preliminary analysis of expressed sequence tags from Bengal Tiger Panthera tigris tigris.

    PubMed

    Liu, Changqing; Liu, Dan; Guo, Yu; Lu, Taofeng; Li, Xiangchen; Zhang, Minghai; Ma, Jianzhang; Ma, Yuehui; Guan, Weijun

    2013-05-24

    In this study, a full-length enriched cDNA library was successfully constructed from Bengal tiger, Panthera tigris tigris, the most well-known wild animal. Total RNA was extracted from cultured Bengal tiger fibroblasts in vitro. The titers of primary and amplified libraries were 1.28 × 10^6 pfu/mL and 1.56 × 10^9 pfu/mL respectively. The percentage of recombinants from unamplified library was 90.2% and average length of exogenous inserts was 0.98 kb. A total of 212 individual ESTs with sizes ranging from 356 to 1108 bp were then analyzed. The BLASTX score revealed that 48.1% of the sequences were classified as a strong match, 45.3% as nominal and 6.6% as a weak match. Among the ESTs with known putative function, 26.4% ESTs were found to be related to all kinds of metabolisms, 19.3% ESTs to information storage and processing, 11.3% ESTs to posttranslational modification, protein turnover, chaperones, 11.3% ESTs to transport, 9.9% ESTs to signal transducer/cell communication, 9.0% ESTs to structure protein, 3.8% ESTs to cell cycle, and only 6.6% ESTs classified as novel genes. By EST sequencing, a full-length gene coding ferritin was identified and characterized. The recombinant plasmid pET32a-TAT-Ferritin was constructed, coding for the TAT-Ferritin fusion protein with two 6× His-tags at the N and C termini. After BCA assay, the concentration of soluble Trx-TAT-Ferritin recombinant protein was 2.32 ± 0.12 mg/mL. These results demonstrated that the reliability and representativeness of the cDNA library met the requirements of a standard cDNA library. This library provided a useful platform for the functional genome and transcriptome research of Bengal tigers.

  8. CREST--classification resources for environmental sequence tags.

    PubMed

    Lanzén, Anders; Jørgensen, Steffen L; Huson, Daniel H; Gorfer, Markus; Grindhaug, Svenn Helge; Jonassen, Inge; Øvreås, Lise; Urich, Tim

    2012-01-01

    Sequencing of taxonomic or phylogenetic markers is becoming a fast and efficient method for studying environmental microbial communities. This has resulted in a steadily growing collection of marker sequences, most notably of the small-subunit (SSU) ribosomal RNA gene, and an increased understanding of microbial phylogeny, diversity and community composition patterns. However, to utilize these large datasets together with new sequencing technologies, a reliable and flexible system for taxonomic classification is critical. We developed CREST (Classification Resources for Environmental Sequence Tags), a set of resources and tools for generating and utilizing custom taxonomies and reference datasets for classification of environmental sequences. CREST uses an alignment-based classification method with the lowest common ancestor algorithm. It also uses explicit rank similarity criteria to reduce false positives and identify novel taxa. We implemented this method in a web server, a command line tool and the graphical user interfaced program MEGAN. Further, we provide the SSU rRNA reference database and taxonomy SilvaMod, derived from the publicly available SILVA SSURef, for classification of sequences from bacteria, archaea and eukaryotes. Using cross-validation and environmental datasets, we compared the performance of CREST and SilvaMod to the RDP Classifier. We also utilized Greengenes as a reference database, both with CREST and the RDP Classifier. These analyses indicate that CREST performs better than alignment-free methods with higher recall rate (sensitivity) as well as precision, and with the ability to accurately identify most sequences from novel taxa. Classification using SilvaMod performed better than with Greengenes, particularly when applied to environmental sequences. CREST is freely available under a GNU General Public License (v3) from http://apps.cbu.uib.no/crest and http://lcaclassifier.googlecode.com.
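
    The lowest common ancestor step mentioned in this abstract can be illustrated with a short sketch. It assumes each alignment hit has already been mapped to a root-to-leaf lineage; the function name and the toy lineages are hypothetical, and CREST's own hit selection and rank-specific similarity criteria are not reproduced here.

```python
def lowest_common_ancestor(lineages):
    """Return the longest shared prefix of several root-to-leaf lineages.

    lineages -- list of lists, each ordered from the root downwards
    """
    if not lineages:
        return []
    shared = []
    # zip stops at the shortest lineage, so ragged inputs are handled.
    for ranks in zip(*lineages):
        if all(rank == ranks[0] for rank in ranks):
            shared.append(ranks[0])
        else:
            break
    return shared


if __name__ == "__main__":
    # Toy lineages for three hypothetical best hits of one read.
    hits = [
        ["Bacteria", "Proteobacteria", "Gammaproteobacteria", "Enterobacterales"],
        ["Bacteria", "Proteobacteria", "Gammaproteobacteria", "Pseudomonadales"],
        ["Bacteria", "Proteobacteria", "Alphaproteobacteria"],
    ]
    print(lowest_common_ancestor(hits))
```

    Assigning a read to the deepest node shared by all of its good hits is what keeps ambiguous reads at a conservative rank rather than forcing a species-level call.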

  9. CREST – Classification Resources for Environmental Sequence Tags

    PubMed Central

    Lanzén, Anders; Jørgensen, Steffen L.; Huson, Daniel H.; Gorfer, Markus; Grindhaug, Svenn Helge; Jonassen, Inge; Øvreås, Lise; Urich, Tim

    2012-01-01

    Sequencing of taxonomic or phylogenetic markers is becoming a fast and efficient method for studying environmental microbial communities. This has resulted in a steadily growing collection of marker sequences, most notably of the small-subunit (SSU) ribosomal RNA gene, and an increased understanding of microbial phylogeny, diversity and community composition patterns. However, to utilize these large datasets together with new sequencing technologies, a reliable and flexible system for taxonomic classification is critical. We developed CREST (Classification Resources for Environmental Sequence Tags), a set of resources and tools for generating and utilizing custom taxonomies and reference datasets for classification of environmental sequences. CREST uses an alignment-based classification method with the lowest common ancestor algorithm. It also uses explicit rank similarity criteria to reduce false positives and identify novel taxa. We implemented this method in a web server, a command line tool and the graphical user interfaced program MEGAN. Further, we provide the SSU rRNA reference database and taxonomy SilvaMod, derived from the publicly available SILVA SSURef, for classification of sequences from bacteria, archaea and eukaryotes. Using cross-validation and environmental datasets, we compared the performance of CREST and SilvaMod to the RDP Classifier. We also utilized Greengenes as a reference database, both with CREST and the RDP Classifier. These analyses indicate that CREST performs better than alignment-free methods with higher recall rate (sensitivity) as well as precision, and with the ability to accurately identify most sequences from novel taxa. Classification using SilvaMod performed better than with Greengenes, particularly when applied to environmental sequences. CREST is freely available under a GNU General Public License (v3) from http://apps.cbu.uib.no/crest and http://lcaclassifier.googlecode.com. PMID:23145153

  10. prot4EST: Translating Expressed Sequence Tags from neglected genomes

    PubMed Central

    Wasmuth, James D; Blaxter, Mark L

    2004-01-01

    Background: The genomes of an increasing number of species are being investigated through generation of expressed sequence tags (ESTs). However, ESTs are prone to sequencing errors and typically define incomplete transcripts, making downstream annotation difficult. Annotation would be greatly improved with robust polypeptide translations. Many current solutions for EST translation require a large number of full-length gene sequences for training purposes, a resource that is not available for the majority of EST projects. Results: As part of our ongoing EST programs investigating these "neglected" genomes, we have developed a polypeptide prediction pipeline, prot4EST. It incorporates freely available software to produce final translations that are more accurate than those derived from any single method. We show that this integrated approach goes a long way to overcoming the deficit in training data. Conclusions: prot4EST provides a portable EST translation solution and can be usefully applied to >95% of EST projects to improve downstream annotation. It is freely available from . PMID:15571632

  11. Analysis of expressed sequence tags (ESTs) and gene expression changes under different growth conditions for the ciliate Anophryoides haemophila, the causative agent of bumper car disease in the American lobster (Homarus americanus).

    PubMed

    Acorn, Adam R; Clark, K Fraser; Jones, Sarah; Després, Béatrice M; Munro, Sarah; Cawthorn, Richard J; Greenwood, Spencer J

    2011-06-01

    The scuticociliate Anophryoides haemophila causes bumper car disease in American lobster (Homarus americanus) in commercial holding facilities in Atlantic Canada. While the parasite has been recognized since the 1970s and much has been learned about its biology, minimal molecular characterization exists. With genome consortiums turning to model organisms like the ciliates Tetrahymena and Paramecium, the amount of relevant sequence data available has made sequence surveys more attractive for gene discovery in related ciliates. We sequenced 9984 expressed sequence tags (ESTs) from a non-normalized A. haemophila cDNA library to characterize gene expression patterns, functional gene distribution and to discover novel genes related to the parasitic life history. The A. haemophila ESTs were grouped into 843 clusters and singletons with 658 EST clusters having identifiable homologs, while 159 ESTs were unique and had no similarity to any sequences in the public databases. Not unexpectedly, about 67% of the A. haemophila ESTs have similarity to annotated and hypothetical genes from the related oligohymenophorean ciliate, Tetrahymena. Numerous cysteine proteases, hypothetical proteins and novel sequences possess putative secretory signal peptides suggesting that they may contribute to the pathogenesis of bumper car disease in lobster. Real time RT-qPCR analysis of cathepsin L and two homologs of cathepsin B did not show any changes in gene expression under varying in vitro growth conditions or during a modified-in vivo infection which may be suggestive of the opportunistic life history strategy of this ciliate.

  12. Identification of molecular motors in the Woods Hole squid, Loligo pealei: an expressed sequence tag approach.

    PubMed

    DeGiorgis, Joseph A; Cavaliere, Kimberly R; Burbach, J Peter H

    2011-10-01

    The squid giant axon and synapse are unique systems for studying neuronal function. While a few nucleotide and amino acid sequences have been obtained from squid, large scale genetic and proteomic information is lacking. We have been particularly interested in motors present in axons and their roles in transport processes. Here, to obtain genetic data and to identify motors expressed in squid, we initiated an expressed sequence tag project by single-pass sequencing mRNAs isolated from the stellate ganglia of the Woods Hole Squid, Loligo pealei. A total of 22,689 high quality expressed sequence tag (EST) sequences were obtained and subjected to basic local alignment search tool (BLAST) analysis. Seventy-six percent of these sequences matched genes in the National Center for Biotechnology Information databases. By CAP3 analysis this library contained 2459 contigs and 7568 singletons. Mining for motors successfully identified six kinesins, six myosins, a single dynein heavy chain, as well as components of the dynactin complex, and motor light chains and accessory proteins. This initiative demonstrates that EST projects represent an effective approach to obtain sequences of interest. Copyright © 2011 Wiley Periodicals, Inc.

  13. Analysis of Expressed Sequence Tags from Chinese Bayberry Fruit (Myrica rubra Sieb. and Zucc.) at Different Ripening Stages and Their Association with Fruit Quality Development

    PubMed Central

    Zhu, Changqing; Feng, Chao; Li, Xian; Xu, Changjie; Sun, Chongde; Chen, Kunsong

    2013-01-01

    A total of 2000 EST sequences were produced from cDNA libraries generated from Chinese bayberry fruit (Myrica rubra Sieb. and Zucc. cv. “Biqi”) at four different ripening stages. After cluster and assembly analysis of the datasets by UniProt, 395 unigenes were identified, and their presumed functions were assigned to 14 putative cellular roles. Furthermore, a sequence BLAST was done for the top ten highly expressed genes in the ESTs, and genes associated with disease/defense and anthocyanin accumulation were analyzed. Gene-encoding elements associated with ethylene biosynthesis and signal transductions, in addition to other senescence-regulating proteins, as well as those associated with quality formation during fruit ripening, were also identified. Their possible roles were subsequently discussed. PMID:23377019

  14. Analysis of STIS time-tag data

    NASA Technical Reports Server (NTRS)

    Lindler, Don J.; Gull, Theodore R.; Kraemer, Steven B.; Hulbert, Stephen J.

    1997-01-01

    Very high time resolution data can be obtained from the Space Telescope Imaging Spectrograph (STIS) Multi-Anode Microchannel Array (MAMA) detectors using the time-tag observing mode. In this mode, the photon events are not accumulated onboard the spacecraft. Instead, each event is recorded internally and transmitted to the ground as an X and Y location with an event time. Event times are recorded in units of 125 microseconds. Analysis of STIS Crab Pulsar data demonstrates that a time resolution approaching 125 microseconds can be achieved. Furthermore, the time-tag observing mode has been demonstrated to be a very powerful diagnostic tool and can be used to increase the resolution of both imaging and spectral data.
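
    As a rough illustration of working with event lists of this kind, the sketch below converts event times stored as integer counts of 125-microsecond units into seconds and bins them into a simple light curve. Only the 125-microsecond unit comes from the abstract; the function names, the plain-list input, and the toy event times are assumptions, and real STIS files carry additional columns and calibration steps not shown here.

```python
TICK_SECONDS = 125e-6  # event-time unit stated in the abstract


def ticks_to_seconds(tick):
    """Convert an integer event time in 125-microsecond units to seconds."""
    return tick * TICK_SECONDS


def light_curve(event_ticks, ticks_per_bin):
    """Count events in consecutive bins, each ticks_per_bin ticks wide.

    event_ticks   -- list of non-negative integer event times (in ticks)
    ticks_per_bin -- bin width in ticks (8 ticks = 1 millisecond)
    """
    if ticks_per_bin <= 0:
        raise ValueError("ticks_per_bin must be positive")
    if not event_ticks:
        return []
    counts = [0] * (max(event_ticks) // ticks_per_bin + 1)
    for tick in event_ticks:
        counts[tick // ticks_per_bin] += 1
    return counts


if __name__ == "__main__":
    toy_events = [0, 1, 7, 8, 9, 16]  # hypothetical event times in ticks
    print(light_curve(toy_events, ticks_per_bin=8))
    print(ticks_to_seconds(8))
```

    Binning on the integer ticks rather than on converted floating-point seconds avoids rounding events across bin edges.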

  15. Direct Quantitative Bisulfite Sequencing Using Tag-modified Primers and Internal Normalization.

    PubMed

    Dietrich, Dimo

    2016-12-01

    For the investigation of DNA methylation patterns, bisulfite conversion of the DNA followed by polymerase chain reaction (PCR) amplification and sequencing of the region of interest is the method of choice when information at single CpG site resolution is desired. In this study, a simple method for direct quantitative bisulfite sequencing based on the Sanger method is shown to be usable for the accurate analysis of single CpG sites. This method is based on the usage of tag-modified primers to obtain an internal normalization signal within the PCR product.

  16. The C-terminal amino acid sequence of nascent peptide is a major determinant of SsrA tagging at all three stop codons.

    PubMed Central

    Sunohara, Takafumi; Abo, Tatsuhiko; Inada, Toshifumi; Aiba, Hiroji

    2002-01-01

    Recent studies on endogenous SsrA-tagged proteins have revealed that the tagging could occur at a position corresponding to the normal termination codon. During the study of SsrA-mediated LacI tagging (Abo et al., EMBO J, 2000 19:3762-3769), we found that a variant LacI (LacI deltaC1) lacking the last C-terminal amino acid residue is efficiently tagged in a stop codon-dependent manner. SsrA tagging of LacI deltaC1 occurred efficiently without LacI binding to the lac operators at any one of three stop codons. The C-terminal (R)LESG peptide of LacI deltaC1 was shown to trigger the SsrA tagging of an unrelated protein (CRP) when fused to its C terminus. Mass spectrometry analysis of the purified fusion proteins revealed that SsrA tagging occurs at a position corresponding to the termination codon. The alteration of the amino acid sequence but not the nucleotide sequence of the C-terminal portion eliminated the tagging. We also showed that the tagging-provoking sequences cause an efficient translational readthrough at UGA but not UAA codons. In addition, we found that C-terminal dipeptides known to induce an efficient translation readthrough could cause an efficient tagging at stop codons. We conclude that the amino acid sequence of nascent polypeptide prior to stop codons is a major determinant for the SsrA tagging at all three stop codons. PMID:12458795

  17. Sequence-Directed Covalent Protein-DNA Linkages in a Single Step Using HUH-Tags.

    PubMed

    Lovendahl, Klaus N; Hayward, Amanda N; Gordon, Wendy R

    2017-05-24

    We present a robust strategy to covalently link proteins and DNA using HUH-endonuclease domains as fusion partners (HUH-tags). We show that HUH-tags react robustly with specific sequences of unmodified single-stranded DNA, and we have identified five tags that react orthogonally with distinct DNA sequences. We demonstrate the versatility of HUH-tags as fusion partners in Cas9-mediated gene editing and the construction of doubly DNA-tethered proteins for single-molecule studies. Finally we demonstrate application to cellular imaging in live and fixed cells.

  18. Identification of differentially expressed transcripts from maturing stem of sugarcane by in silico analysis of stem expressed sequence tags and gene expression profiling.

    PubMed

    Casu, Rosanne E; Dimmock, Christine M; Chapman, Scott C; Grof, Christopher P L; McIntyre, C Lynne; Bonnett, Graham D; Manners, John M

    2004-03-01

    Sugarcane accumulates high concentrations of sucrose in the mature stem and a number of physiological processes on-going in maturing stem tissue both directly and indirectly allow this process. To identify transcripts that are associated with stem maturation, we compared patterns of gene expression in maturing and immature stem tissue by expression profiling and bioinformatic analysis of sets of stem ESTs. This study complements a previous study of gene expression associated directly with sugar metabolism in sugarcane. A survey of sequences derived from stem tissue identified an abundance of several classes of sequence that are associated with fibre biosynthesis in the maturing stem. A combination of EST analyses and microarray hybridization revealed that genes encoding homologues of the dirigent protein, a protein that assists in the stereospecificity of lignin assembly, were the most abundant and most strongly differentially expressed transcripts in maturing stem tissue. There was also evidence of coordinated expression of other categories of fibre biosynthesis and putative defence- and stress-related transcripts in the maturing stem. This study has demonstrated the utility of genomic approaches using large-scale EST acquisition and microarray hybridization techniques to highlight the very significant transcriptional investment the maturing stem of sugarcane has placed in fibre biosynthesis and stress tolerance, in addition to its already well-documented role in sugar accumulation.

  19. Comprehensive functional analyses of expressed sequence tags in common wheat (Triticum aestivum).

    PubMed

    Manickavelu, Alagu; Kawaura, Kanako; Oishi, Kazuko; Shin-I, Tadasu; Kohara, Yuji; Yahiaoui, Nabila; Keller, Beat; Abe, Reina; Suzuki, Ayako; Nagayama, Taishi; Yano, Kentaro; Ogihara, Yasunari

    2012-04-01

    About 1 million expressed sequence tag (EST) sequences comprising 125.3 Mb nucleotides were accumulated from 51 cDNA libraries constructed from a variety of tissues and organs under a range of conditions, including abiotic stresses and pathogen challenges in common wheat (Triticum aestivum). Expressed sequence tags were assembled with stringent parameters after processing with inbuilt scripts, resulting in 37,138 contigs and 215,199 singlets. In the assembled sequences, 10.6% presented no matches with existing sequences in public databases. Functional characterization of wheat unigenes by gene ontology annotation, mining transcription factors, full-length cDNA, and miRNA targeting sites was carried out. A bioinformatics strategy was developed to discover single-nucleotide polymorphisms (SNPs) within our large EST resource, and the SNPs between and within (homoeologous) cultivars were reported. Digital gene expression was performed to find the tissue-specific gene expression, and correspondence analysis was executed to identify common and specific gene expression by selecting four biotic stress-related libraries. The assembly and associated information provide a framework for future investigation in functional genomics.

  20. Comprehensive Functional Analyses of Expressed Sequence Tags in Common Wheat (Triticum aestivum)

    PubMed Central

    Manickavelu, Alagu; Kawaura, Kanako; Oishi, Kazuko; Shin-I, Tadasu; Kohara, Yuji; Yahiaoui, Nabila; Keller, Beat; Abe, Reina; Suzuki, Ayako; Nagayama, Taishi; Yano, Kentaro; Ogihara, Yasunari

    2012-01-01

    About 1 million expressed sequence tag (EST) sequences comprising 125.3 Mb nucleotides were accumulated from 51 cDNA libraries constructed from a variety of tissues and organs under a range of conditions, including abiotic stresses and pathogen challenges in common wheat (Triticum aestivum). Expressed sequence tags were assembled with stringent parameters after processing with inbuilt scripts, resulting in 37,138 contigs and 215,199 singlets. In the assembled sequences, 10.6% presented no matches with existing sequences in public databases. Functional characterization of wheat unigenes by gene ontology annotation, mining transcription factors, full-length cDNA, and miRNA targeting sites was carried out. A bioinformatics strategy was developed to discover single-nucleotide polymorphisms (SNPs) within our large EST resource, and the SNPs between and within (homoeologous) cultivars were reported. Digital gene expression was performed to find the tissue-specific gene expression, and correspondence analysis was executed to identify common and specific gene expression by selecting four biotic stress-related libraries. The assembly and associated information provide a framework for future investigation in functional genomics. PMID:22334568

  1. Generation of 7137 non-redundant expressed sequence tags from a legume, Lotus japonicus.

    PubMed

    Asamizu, E; Nakamura, Y; Sato, S; Tabata, S

    2000-04-28

    For comprehensive analysis of genes expressed in a model legume, Lotus japonicus, a total of 22,983 5' end expressed sequence tags (ESTs) were accumulated from normalized and size-selected cDNA libraries constructed from young (2 weeks old) plants. The EST sequences were clustered into 7137 non-redundant groups. Similarity search against public non-redundant protein database indicated that 3302 groups showed similarity to genes of known function, 1143 groups to hypothetical genes, and 2692 were novel sequences. Homologues of 5 nodule-specific genes which have been reported in other legume species were contained in the collected ESTs, suggesting that the EST source generated in this study will become a useful tool for identification of genes related to legume-specific biological processes. The sequence data of individual ESTs are available at the web site: http://www.kazusa.or.jp/en/plant/lotus/EST/.

  2. Application of an E. coli signal sequence as a versatile inclusion body tag.

    PubMed

    Jong, Wouter S P; Vikström, David; Houben, Diane; van den Berg van Saparoea, H Bart; de Gier, Jan-Willem; Luirink, Joen

    2017-03-21

    Heterologous protein production in Escherichia coli often suffers from bottlenecks such as proteolytic degradation, complex purification procedures and toxicity towards the expression host. Production of proteins in an insoluble form in inclusion bodies (IBs) can alleviate these problems. Unfortunately, the propensity of heterologous proteins to form IBs is variable and difficult to predict. Hence, fusing the target protein to an aggregation prone polypeptide or IB-tag is a useful strategy to produce difficult-to-express proteins in an insoluble form. When screening for signal sequences that mediate optimal targeting of heterologous proteins to the periplasmic space of E. coli, we observed that fusion to the 39 amino acid signal sequence of E. coli TorA (ssTorA) did not promote targeting but rather directed high-level expression of the human proteins hEGF, Pla2 and IL-3 in IBs. Further analysis revealed that ssTorA even mediated IB formation of the highly soluble endogenous E. coli proteins TrxA and MBP. The ssTorA also induced aggregation when fused to the C-terminus of target proteins and appeared functional as IB-tag in E. coli K-12 as well as B strains. An additive effect on IB-formation was observed upon fusion of multiple ssTorA sequences in tandem, provoking almost complete aggregation of TrxA and MBP. The ssTorA-moiety was successfully used to produce the intrinsically unstable hEGF and the toxic fusion partner SymE, demonstrating its applicability as an IB-tag for difficult-to-express and toxic proteins. We present proof-of-concept for the use of ssTorA as a small, versatile tag for robust E. coli-based expression of heterologous proteins in IBs.

  3. Protein identification with N and C-terminal sequence tags in proteome projects.

    PubMed

    Wilkins, M R; Gasteiger, E; Tonella, L; Ou, K; Tyler, M; Sanchez, J C; Gooley, A A; Walsh, B J; Bairoch, A; Appel, R D; Williams, K L; Hochstrasser, D F

    1998-05-08

    Genome sequences are available for increasing numbers of organisms. The proteomes (protein complement expressed by the genome) of many such organisms are being studied with two-dimensional (2D) gel electrophoresis. Here we have investigated the application of short N-terminal and C-terminal sequence tags to the identification of proteins separated on 2D gels. The theoretical N and C termini of 15,519 proteins, representing all SWISS-PROT entries for the organisms Mycoplasma genitalium, Bacillus subtilis, Escherichia coli, Saccharomyces cerevisiae and human, were analysed. Sequence tags were found to be surprisingly specific, with N-terminal tags of four amino acid residues found to be unique for between 43% and 83% of proteins, and C-terminal tags of four amino acid residues unique for between 74% and 97% of proteins, depending on the species studied. Sequence tags of five amino acid residues were found to be even more specific. To utilise this specificity of sequence tags for protein identification, we created a world-wide web-accessible protein identification program, TagIdent (http://www.expasy.ch/www/tools.html), which matches sequence tags of up to six amino acid residues as well as estimated protein pI and mass against proteins in the SWISS-PROT database. We demonstrate the utility of this identification approach with sequence tags generated from 91 different E. coli proteins purified by 2D gel electrophoresis. Fifty-one proteins were unambiguously identified by virtue of their sequence tags and estimated pI and mass, and a further 11 proteins identified when sequence tags were combined with protein amino acid composition data. We conclude that the TagIdent identification approach is best suited to the identification of proteins from prokaryotes whose complete genome sequences are available. The approach is less well suited to proteins from eukaryotes, as many eukaryotic proteins are not amenable to sequencing via Edman degradation, and tag protein
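
    The tag-specificity figures quoted in this abstract come from asking how many proteins in a set carry a terminal tag that no other protein shares. A minimal sketch of that calculation is given below; the function name and the toy sequences are hypothetical, and the sketch ignores signal-peptide cleavage, pI, and mass, which the TagIdent approach also takes into account.

```python
from collections import Counter


def unique_tag_fraction(proteins, tag_length=4, terminus="N"):
    """Fraction of proteins whose terminal tag occurs only once in the set.

    proteins   -- list of amino acid sequences (one-letter codes)
    tag_length -- number of residues in the tag
    terminus   -- "N" for the first residues, "C" for the last residues
    """
    if terminus not in ("N", "C"):
        raise ValueError("terminus must be 'N' or 'C'")
    # Sequences shorter than the tag cannot supply one and are skipped.
    usable = [p for p in proteins if len(p) >= tag_length]
    if terminus == "N":
        tags = [p[:tag_length] for p in usable]
    else:
        tags = [p[-tag_length:] for p in usable]
    if not tags:
        return 0.0
    counts = Counter(tags)
    return sum(1 for tag in tags if counts[tag] == 1) / len(tags)


if __name__ == "__main__":
    toy_proteome = ["MKTAYIAK", "MKTALLLL", "MSDQEAKP"]  # invented sequences
    print(unique_tag_fraction(toy_proteome, terminus="N"))
    print(unique_tag_fraction(toy_proteome, terminus="C"))
```

    In the toy set, two sequences share the same first four residues, so the N-terminal tags are less discriminating than the C-terminal ones, which mirrors the direction of the difference reported in the abstract.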

  4. Immunological responses of turbot (Psetta maxima) to nodavirus infection or polyriboinosinic polyribocytidylic acid (pIC) stimulation, using expressed sequence tags (ESTs) analysis and cDNA microarrays.

    PubMed

    Park, Kyoung C; Osborne, Jane A; Montes, Ariana; Dios, Sonia; Nerland, Audun H; Novoa, Beatriz; Figueras, Antonio; Brown, Laura L; Johnson, Stewart C

    2009-01-01

    To investigate the immunological responses of turbot to nodavirus infection or pIC stimulation, we constructed cDNA libraries from liver, kidney and gill tissues of nodavirus-infected fish and examined the differential gene expression within turbot kidney in response to nodavirus infection or pIC stimulation using a turbot cDNA microarray. Turbot were experimentally infected with nodavirus and samples of each tissue were collected at selected time points post-infection. Using equal amounts of total RNA at each sampling time, we made three tissue-specific cDNA libraries. After sequencing 3230 clones we obtained 3173 (98.2%) high quality sequences from our liver, kidney and gill libraries. Of these, 2568 (80.9%) were identified as known genes and 605 (19.1%) as unknown genes. A total of 768 unique genes were identified. The two largest groups resulting from the classification of ESTs according to function were the cell/organism defense genes (71 uni-genes) and apoptosis-related process (23 uni-genes). Using these clones, a 1920 element cDNA microarray was constructed and used to investigate the differential gene expression within turbot in response to experimental nodavirus infection or pIC stimulation. Kidney tissue was collected at selected times post-infection (HPI) or stimulation (HPS), and total RNA was isolated for microarray analysis. Of the 1920 genes studied on the microarray, we identified a total of 121 differentially expressed genes in the kidney: 94 genes from nodavirus-infected animals and 79 genes from those stimulated with pIC. Within the nodavirus-infected fish we observed the highest number of differentially expressed genes at 24 HPI. Our results indicate that certain genes in turbot have important roles in immune responses to nodavirus infection and dsRNA stimulation.

  5. Strategies for undertaking expressed sequence tag (EST) projects.

    PubMed

    Clifton, Sandra W; Mitreva, Makedonka

    2009-01-01

    Complementary DNA (cDNA) sequencing can be used to sample an organism's transcriptome, and the generated EST sequences can be used for a variety of purposes. They are especially important for enhancing the utility of a genome sequence or for providing a gene catalog for a genome that has not or will not be sequenced. In planning and executing a cDNA project, several criteria must be considered. One should clearly define the project purpose, including organism tissue(s) choice, whether those tissues should be pooled, ability to acquire adequate amounts of clean and well-preserved tissue, choice of type(s) of library, and construction of a library (or libraries) that is compatible with project goals. In addition, one must possess the skills to construct the library (or libraries), keeping in mind the number of clones that will be necessary to meet the project requirements. If one is inexperienced in cDNA library construction, it might be wise to outsource the library production and/or sequence and analysis to a sequencing center or to a company that specializes in those activities. One should also be aware that new sequencing platforms are being marketed that may offer simpler protocols that can produce cDNA data in a more rapid and economical manner. Of course, the bioinformatics tools will have to be in place to de-convolute and aid in data analysis for these newer technologies. Possible funding sources for these projects include well-justified grant proposals, private funding, and/or collaborators with available funds.

  6. Studies of a Biochemical Factory: Tomato Trichome Deep Expressed Sequence Tag Sequencing and Proteomics

    PubMed Central

    Schilmiller, Anthony L.; Miner, Dennis P.; Larson, Matthew; McDowell, Eric; Gang, David R.; Wilkerson, Curtis; Last, Robert L.

    2010-01-01

    Shotgun proteomics analysis allows hundreds of proteins to be identified and quantified from a single sample at relatively low cost. Extensive DNA sequence information is a prerequisite for shotgun proteomics, and it is ideal to have sequence for the organism being studied rather than from related species or accessions. While this requirement has limited the set of organisms that are candidates for this approach, next generation sequencing technologies make it feasible to obtain deep DNA sequence coverage from any organism. As part of our studies of specialized (secondary) metabolism in tomato (Solanum lycopersicum) trichomes, 454 sequencing of cDNA was combined with shotgun proteomics analyses to obtain in-depth profiles of genes and proteins expressed in leaf and stem glandular trichomes of 3-week-old plants. The expressed sequence tag and proteomics data sets combined with metabolite analysis led to the discovery and characterization of a sesquiterpene synthase that produces β-caryophyllene and α-humulene from E,E-farnesyl diphosphate in trichomes of leaf but not of stem. This analysis demonstrates the utility of combining high-throughput cDNA sequencing with proteomics experiments in a target tissue. These data can be used for dissection of other biochemical processes in these specialized epidermal cells. PMID:20431087

  7. Analysis of expressed sequence tags from a single wheat cultivar facilitates interpretation of tandem mass spectrometry data and discrimination of gamma gliadin proteins that may play different functional roles in flour

    USDA-ARS's Scientific Manuscript database

    The complement of gamma gliadin genes expressed in the wheat cultivar Butte 86 was evaluated by analyzing publicly available expressed sequence tag (EST) data. Eleven contigs were assembled from 153 Butte 86 ESTs. Nine of the contigs encoded full-length proteins and four of the proteins contained an...

  8. Genomic Sequence or Signature Tags (GSTs) from the Genome Group at Brookhaven National Laboratory (BNL)

    DOE Data Explorer

    Dunn, John J.; McCorkle, Sean R.; Praissman, Laura A.; Hind, Geoffrey; Van der Lelie, Daniel; Bahou, Wadie F.; Gnatenko, Dmitri V.; Krause, Maureen K.

    Genomic Signature Tags (GSTs) are the products of a method we have developed for identifying and quantitatively analyzing genomic DNAs. The DNA is initially fragmented with a type II restriction enzyme. An oligonucleotide adaptor containing a recognition site for MmeI, a type IIS restriction enzyme, is then used to release 21-bp tags from fixed positions in the DNA relative to the sites recognized by the fragmenting enzyme. These tags are PCR-amplified, purified, concatenated and then cloned and sequenced. The tag sequences and abundances are used to create a high resolution GST sequence profile of the genomic DNA. [Quoted from Genomic Signature Tags (GSTs): A System for Profiling Genomic DNA, Dunn, John J.; McCorkle, Sean R.; Praissman, Laura A.; Hind, Geoffrey; Van der Lelie, Daniel; Bahou, Wadie F.; Gnatenko, Dmitri V.; Krause, Maureen K., Revised 9/13/2002]
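The tag-and-count idea described above can be imitated in silico. The following is a minimal sketch, not the BNL pipeline: it assumes a hypothetical fragmenting-enzyme site of `CATG`, takes the 21 bases immediately downstream of each site as the tag, scans one strand only, and tallies tag abundances. The function names `extract_tags` and `tag_profile` are invented for illustration.

```python
from collections import Counter

def extract_tags(dna, site="CATG", tag_len=21):
    """Return the tag_len-base tags that directly follow each occurrence of site.

    Assumption: the tag starts right after the recognition site; the real
    offset depends on the enzymes and adaptor used.
    """
    dna = dna.upper()
    tags = []
    start = dna.find(site)
    while start != -1:
        tag_start = start + len(site)
        tag = dna[tag_start:tag_start + tag_len]
        if len(tag) == tag_len:          # skip sites too close to the sequence end
            tags.append(tag)
        start = dna.find(site, start + 1)
    return tags

def tag_profile(sequences, site="CATG", tag_len=21):
    """Count how often each tag occurs across a collection of sequences."""
    counts = Counter()
    for seq in sequences:
        counts.update(extract_tags(seq, site, tag_len))
    return counts

profile = tag_profile(["AACATG" + "A" * 21 + "GG", "ttcatg" + "A" * 21])
print(profile.most_common(1))
```

A fuller model would also scan the reverse complement and handle overlapping sites according to the digestion chemistry; both are left out here to keep the sketch short.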

  9. Peanut (Arachis hypogaea) expressed sequence tag (EST) project: Progress and application.

    USDA-ARS's Scientific Manuscript database

    Millions of expressed sequence tag (EST) sequences from several hundred plant species have been deposited in public EST databases. Many plant ESTs have been sequenced as an alternative to whole genome sequences, including peanut because of the genome size and complexity. The US peanut research commu...

  10. Construction and characterization of an expressed sequenced tag library for the mosquito vector Armigeres subalbatus.

    PubMed

    Mayhew, George F; Bartholomay, Lyric C; Kou, Hang-Yen; Rocheleau, Thomas A; Fuchs, Jeremy F; Aliota, Matthew T; Tsao, I-Yu; Huang, Chiung-Yen; Liu, Tze-Tze; Hsiao, Kwang-Jen; Tsai, Shih-Feng; Yang, Ueng-Cheng; Perna, Nicole T; Cho, Wen-Long; Christensen, Bruce M; Chen, Cheng-Chen

    2007-12-18

    The mosquito, Armigeres subalbatus, mounts a distinctively robust innate immune response when infected with the nematode Brugia malayi, a causative agent of lymphatic filariasis. In order to mine the transcriptome for new insight into the cascade of events that takes place in response to infection in this mosquito, 6 cDNA libraries were generated from tissues of adult female mosquitoes subjected to immune-response activation treatments that lead to well-characterized responses, and from aging, naïve mosquitoes. Expressed sequence tags (ESTs) from each library were produced, annotated, and subjected to comparative analyses. Six libraries were constructed and used to generate 44,940 expressed sequence tags, of which 38,079 passed quality filters to be included in the annotation project and subsequent analyses. All of these sequences were collapsed into clusters resulting in 8,020 unique sequence clusters or singletons. EST clusters were annotated and curated manually within ASAP (A Systematic Annotation Package for Community Analysis of Genomes) web portal according to BLAST results from comparisons to Genbank, and the Anopheles gambiae and Drosophila melanogaster genome projects. The resulting dataset is the first of its kind for this mosquito vector and provides a basis for future studies of mosquito vectors regarding the cascade of events that occurs in response to infection, and thereby providing insight into vector competence and innate immunity.

  11. Grouping and identification of sequence tags (GRIST): bioinformatics tools for the NEIBank database.

    PubMed

    Wistow, Graeme; Bernstein, Steven L; Touchman, Jeffrey W; Bouffard, Gerald; Wyatt, M Keith; Peterson, Katherine; Behal, Amita; Gao, James; Buchoff, Patee; Smith, Don

    2002-06-15

    NEIBank is a project to develop and organize genomics and bioinformatics resources for the eye. As part of this effort, tools have been developed for bioinformatics analysis and web based display of data from expressed sequence tag (EST) analyses. EST sequences are identified and formed into groups or clusters representing related transcripts from the same gene. This is carried out by a rules-based procedure called GRIST (GRouping and Identification of Sequence Tags) that uses sequence match parameters derived from BLAST programs. Linked procedures are used to eliminate non-mRNA contaminants. All data are assembled in a relational database and assembled for display as web pages with annotations and links to other informatics resources. Genome projects generate huge amounts of data that need to be classified and organized to become easily accessible to the research community. GRIST provides a useful tool for assembling and displaying the results of EST analyses. The NEIBank web site contains a growing set of pages cataloging the known transcriptional repertoire of eye tissues, derived from new NEIBank cDNA libraries and from eye-related data deposited in the dbEST section of GenBank.
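Grouping ESTs into clusters from pairwise match results can be sketched as single-linkage clustering with a union-find structure. This is an illustration of the general technique only, not the GRIST rule set: the thresholds `min_identity` and `min_overlap` and the hit tuple layout (query, subject, percent identity, overlap length) are assumptions made for the example.

```python
def cluster_ests(est_ids, hits, min_identity=95.0, min_overlap=100):
    """Group EST ids into clusters linked by sufficiently strong pairwise hits.

    hits is an iterable of (query, subject, identity_percent, overlap_bp).
    The threshold values are hypothetical defaults, not GRIST parameters.
    """
    parent = {e: e for e in est_ids}

    def find(x):
        # Follow parent links to the cluster root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for query, subject, identity, overlap in hits:
        if identity >= min_identity and overlap >= min_overlap:
            root_q, root_s = find(query), find(subject)
            if root_q != root_s:
                parent[root_s] = root_q   # merge the two clusters

    clusters = {}
    for e in est_ids:
        clusters.setdefault(find(e), []).append(e)
    return sorted(sorted(members) for members in clusters.values())

example_hits = [
    ("a", "b", 98.0, 150),   # strong match: a and b join
    ("b", "c", 96.0, 120),   # strong match: c joins the same cluster
    ("c", "d", 80.0, 300),   # identity too low: d stays a singleton
]
print(cluster_ests(["a", "b", "c", "d"], example_hits))
```

Single linkage is the simplest choice and can chain unrelated transcripts through chimeric clones, which is one reason rule-based procedures add further checks.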

  12. Expressed sequence tags identify human isologs of the ARF-dependent phospholipase D.

    PubMed

    Ribbes, G; Henry, J; Cariven, C; Pontarotti, P; Chap, H; Record, M

    1996-07-05

    By searching into Expressed Sequence Tags databases (dbEST) using Blast X algorithm software and a plant phospholipase D as template, we have identified a cDNA from human brain (Z45777) which encodes for a protein similar to the amino acid region 743-929 of the human phospholipase D1 (PLD1), and a cDNA from human liver (R93485) which encodes for a protein similar to region 815-932 of PLD1. Sequence comparison between cloned phospholipases showed the presence of 3 conserved amino acid sequences: AFVGGIDLAYGRWD (box A), IIGSANINDRS (box B), and YIYIENQFFI (box C). Phylogenic analysis indicated that the cDNA from brain and liver encoded for human isologs of PLD1.

  13. The construction of Arabidopsis expressed sequence tag assemblies. A new resource to facilitate gene identification.

    PubMed Central

    Rounsley, S D; Glodek, A; Sutton, G; Adams, M D; Somerville, C R; Venter, J C; Kerlavage, A R

    1996-01-01

    The generation of large numbers of partial cDNA sequences, or expressed sequence tags (ESTs), has provided a method with which to sample a large number of genes from an organism. More than 25,000 Arabidopsis thaliana ESTs have been deposited in public databases, producing the largest collection of ESTs for any plant species. We describe here the application of a method of reducing redundancy and increasing information content in this collection by grouping overlapping ESTs representing the same gene into a "contig" or assembly. The increased information content of these assemblies allows more putative identifications to be assigned based on the results of similarity searches with nucleotide and protein databases. The results of this analysis indicate that sequence information is available for approximately 12,600 nonoverlapping ESTs from Arabidopsis. Comparison of the assemblies with 953 Arabidopsis coding sequences indicates that up to 57% of all Arabidopsis genes are represented by an EST. Clustering analysis of these sequences suggests that between 300 and 700 gene families are represented by between 700 and 2000 sequences in the EST database. A database of the assembled sequences, their putative identifications, and cellular roles is available through the World Wide Web. PMID:8938416

  14. Expressed sequence tags from a NaCl-treated Suaeda salsa cDNA library.

    PubMed

    Zhang, L; Ma, X L; Zhang, Q; Ma, C L; Wang, P P; Sun, Y F; Zhao, Y X; Zhang, H

    2001-04-18

    Past efforts to improve plant tolerance to osmotic stress have had limited success owing to the genetic complexity of stress responses. The first step towards cataloging and categorizing genetically complex abiotic stress responses is the rapid discovery of genes by the large-scale partial sequencing of randomly selected cDNA clones or expressed sequence tags (ESTs). Suaeda salsa, which can survive seawater-level salinity, is a favorite halophytic model for salt-tolerance research. We constructed a NaCl-treated cDNA library of Suaeda salsa and sequenced 1048 randomly selected clones, out of which 1016 clones produced readable sequences (773 showed homology to previously identified genes, 227 matched unknown protein coding regions, 16 anomalous sequences or sequences of bacterial origin were excluded from further analysis). By sequence analysis we identified 492 unique clones: 315 showed homology to previously identified genes, 177 matched unknown protein coding regions (101 of which have been found before in other organisms and 76 are completely novel). All our EST data are available on the Internet. We believe that our dbEST and the associated DNA materials will be a useful source to scientists engaging in stress-tolerance study.

  15. Scratching the surface of the rare biosphere with ribosomal sequence tag primers.

    PubMed

    Neufeld, Josh D; Li, Jason; Mohn, William W

    2008-06-01

    Increasingly large datasets of 16S rRNA gene sequences reveal new information about the extent of microbial diversity and the surprising extent of the rare biosphere. Currently, many of the largest datasets are represented by short and variable ribosomal sequence tags (RSTs) that are limited in their ability to accurately assign sequences to broad-scale phylogenetic trees. In this study, we selected 30 rare RSTs from existing sequence datasets and designed primers to amplify c. 1400 bases of the 16S rRNA gene to determine whether these sequences were represented by existing databases or if they might reveal new lineages within the Bacteria. Approximately one-third of the RST primers successfully amplified longer portions of these low-abundance 16S rRNA genes in a specific manner. Subsequent phylogenetic analysis demonstrated that most of these sequences were (1) distantly related to existing cultivated microorganisms and (2) closely related to uncultivated clone sequences that were recently deposited in GenBank. The presence of so many recently collected 16S rRNA gene reference sequences in existing databases suggests that progress is being made quickly towards a microbial census, one which has begun scratching the surface of the 'rare biosphere'.

  16. An expressed sequence tag (EST) data mining strategy succeeding in the discovery of new G-protein coupled receptors.

    PubMed

    Wittenberger, T; Schaller, H C; Hellebrand, S

    2001-03-30

    We have developed a comprehensive expressed sequence tag database search method and used it for the identification of new members of the G-protein coupled receptor superfamily. Our approach proved to be especially useful for the detection of expressed sequence tag sequences that do not encode conserved parts of a protein, making it an ideal tool for the identification of members of divergent protein families or of protein parts without conserved domain structures in the expressed sequence tag database. At least 14 of the expressed sequence tags found with this strategy are promising candidates for new putative G-protein coupled receptors. Here, we describe the sequence and expression analysis of five new members of this receptor superfamily, namely GPR84, GPR86, GPR87, GPR90 and GPR91. We also studied the genomic structure and chromosomal localization of the respective genes applying in silico methods. A cluster of six closely related G-protein coupled receptors was found on the human chromosome 3q24-3q25. It consists of four orphan receptors (GPR86, GPR87, GPR91, and H963), the purinergic receptor P2Y1, and the uridine 5'-diphosphoglucose receptor KIAA0001. It seems likely that these receptors evolved from a common ancestor and therefore might have related ligands. In conclusion, we describe a data mining procedure that proved to be useful for the identification and first characterization of new genes and is well applicable for other gene families.

  17. Expressed sequence tags (ESTs) and simple sequence repeat (SSR) markers from octoploid strawberry (Fragaria × ananassa)

    PubMed Central

    Folta, Kevin M; Staton, Margaret; Stewart, Philip J; Jung, Sook; Bies, Dawn H; Jesdurai, Christopher; Main, Dorrie

    2005-01-01

    Background Cultivated strawberry (Fragaria × ananassa) represents one of the most valued fruit crops in the United States. Despite its economic importance, the octoploid genome presents a formidable barrier to efficient study of genome structure and molecular mechanisms that underlie agriculturally-relevant traits. Many potentially fruitful research avenues, especially large-scale gene expression surveys and development of molecular genetic markers have been limited by a lack of sequence information in public databases. As a first step to remedy this discrepancy a cDNA library has been developed from salicylate-treated, whole-plant tissues and over 1800 expressed sequence tags (EST's) have been sequenced and analyzed. Results A putative unigene set of 1304 sequences – 133 contigs and 1171 singlets – has been developed, and the transcripts have been functionally annotated. Homology searches indicate that 89.5% of sequences share significant similarity to known/putative proteins or Rosaceae ESTs. The ESTs have been functionally characterized and genes relevant to specific physiological processes of economic importance have been identified. A set of tools useful for SSR development and mapping is presented. Conclusion Sequences derived from this effort may be used to speed gene discovery efforts in Fragaria and the Rosaceae in general and also open avenues of comparative mapping. This report represents a first step in expanding molecular-genetic analyses in strawberry and demonstrates how computational tools can be used to optimally mine a large body of useful information from a relatively small data set. PMID:15985176

  18. Signature tagged mutagenesis in the functional genetic analysis of gastrointestinal pathogens

    PubMed Central

    Cummins, Joanne; Gahan, Cormac G.M.

    2012-01-01

    Signature tagged mutagenesis is a genetic approach that was developed to identify novel bacterial virulence factors. It is a negative selection method in which unique identification tags allow analysis of pools of mutants in mixed populations. The approach is particularly well suited to functional genetic analysis of the gastrointestinal phase of infection in foodborne pathogens and has the capacity to guide the development of novel vaccines and therapeutics. In this review we outline the technical principles underpinning signature-tagged mutagenesis as well as novel sequencing-based approaches for transposon mutant identification such as TraDIS (transposon directed insertion-site sequencing). We also provide an analysis of screens that have been performed in gastrointestinal pathogens which are a global health concern (Escherichia coli, Listeria monocytogenes, Helicobacter pylori, Vibrio cholerae and Salmonella enterica). The identification of key virulence loci through the use of signature tagged mutagenesis in mice and relevant larger animal models is discussed. PMID:22555467

  19. Generation and analysis of a large-scale expressed sequence tags from a full-length enriched cDNA library of Siberian tiger (Panthera tigris altaica).

    PubMed

    Guo, Yu; Liu, Changqing; Lu, Taofeng; Liu, Dan; Bai, Chunyu; Li, Xiangchen; Ma, Yuehui; Guan, Weijun

    2014-05-15

    In this study, a full-length enriched cDNA library was successfully constructed from Siberian tiger, the world's most endangered species. The titers of the primary and amplified libraries were 1.28×10^6 pfu/mL and 1.59×10^10 pfu/mL, respectively. The proportion of recombinants in the unamplified library was 91.3% and the average length of exogenous inserts was 1.06 kb. A total of 279 individual ESTs with sizes ranging from 316 to 1258 bp were then analyzed. Furthermore, 204 unigenes were successfully annotated and assigned to 49 functions of the GO classification; cell (175, 85.5%), cellular process (165, 80.9%), and binding (152, 74.5%) were the dominant terms. A further 198 unigenes were assigned to 156 KEGG pathways, and the most represented were metabolic pathways (18, 9.1%). The proportion pattern of each COG subcategory was similar among Panthera tigris altaica, P. tigris tigris and Homo sapiens; the general function prediction only cluster (44, 15.8%) was the largest group, followed by translation, ribosomal structure and biogenesis (33, 11.8%) and replication, recombination and repair (24, 8.6%), and only 7.2% of ESTs were classified as novel genes. Moreover, the recombinant plasmid pET32a-TAT-COL6A2 was constructed, coding for the Trx-TAT-COL6A2 fusion protein with two 6× His-tags at the N- and C-termini. As measured by BCA assay, the concentration of soluble Trx-TAT-COL6A2 recombinant protein was 2.64±0.18 mg/mL. This library will provide a useful platform for functional genome and transcriptome research on P. tigris and other felids in the future.

  20. Primer and platform effects on 16S rRNA tag sequencing

    SciTech Connect

    Tremblay, Julien; Singh, Kanwar; Fern, Alison; Kirton, Edward S.; He, Shaomei; Woyke, Tanja; Lee, Janey; Chen, Feng; Dangl, Jeffery L.; Tringe, Susannah G.

    2015-08-04

    Sequencing of 16S rRNA gene tags is a popular method for profiling and comparing microbial communities. The protocols and methods used, however, vary considerably with regard to amplification primers, sequencing primers, sequencing technologies; as well as quality filtering and clustering. How results are affected by these choices, and whether data produced with different protocols can be meaningfully compared, is often unknown. Here we compare results obtained using three different amplification primer sets (targeting V4, V6–V8, and V7–V8) and two sequencing technologies (454 pyrosequencing and Illumina MiSeq) using DNA from a mock community containing a known number of species as well as complex environmental samples whose PCR-independent profiles were estimated using shotgun sequencing. We find that paired-end MiSeq reads produce higher quality data and enabled the use of more aggressive quality control parameters over 454, resulting in a higher retention rate of high quality reads for downstream data analysis. While primer choice considerably influences quantitative abundance estimations, sequencing platform has relatively minor effects when matched primers are used. In conclusion, beta diversity metrics are surprisingly robust to both primer and sequencing platform biases.

  1. Primer and platform effects on 16S rRNA tag sequencing

    DOE PAGES

    Tremblay, Julien; Singh, Kanwar; Fern, Alison; ...

    2015-08-04

    Sequencing of 16S rRNA gene tags is a popular method for profiling and comparing microbial communities. The protocols and methods used, however, vary considerably with regard to amplification primers, sequencing primers, sequencing technologies; as well as quality filtering and clustering. How results are affected by these choices, and whether data produced with different protocols can be meaningfully compared, is often unknown. Here we compare results obtained using three different amplification primer sets (targeting V4, V6–V8, and V7–V8) and two sequencing technologies (454 pyrosequencing and Illumina MiSeq) using DNA from a mock community containing a known number of species as well as complex environmental samples whose PCR-independent profiles were estimated using shotgun sequencing. We find that paired-end MiSeq reads produce higher quality data and enabled the use of more aggressive quality control parameters over 454, resulting in a higher retention rate of high quality reads for downstream data analysis. While primer choice considerably influences quantitative abundance estimations, sequencing platform has relatively minor effects when matched primers are used. In conclusion, beta diversity metrics are surprisingly robust to both primer and sequencing platform biases.

  2. Rapid in silico cloning of genes using expressed sequence tags (ESTs).

    PubMed

    Gill, R W; Sanseau, P

    2000-01-01

    Expressed sequence tags (ESTs) are short single-pass DNA sequences obtained from either end of cDNA clones. These ESTs are derived from a vast number of cDNA libraries obtained from different species. Human ESTs are the bulk of the data and have been widely used to identify new members of gene families, as markers on the human chromosomes, to discover polymorphism sites and to compare expression patterns in different tissues or pathologies states. Information strategies have been devised to query EST databases. Since most of the analysis is performed with a computer, the term "in silico" strategy has been coined. In this chapter we will review the current status of EST databases, the pros and cons of EST-type data and describe possible strategies to retrieve meaningful information.

  3. Generation of expressed sequence tags from a normalized porcine skeletal muscle cDNA library.

    PubMed

    Yao, Jianbo; Coussens, Paul M; Saama, Peter; Suchyta, Steven; Ernst, Catherine W

    2002-11-01

    Recent developments in microarray technologies permit scientists to analyze expression of thousands of genes simultaneously in diverse biological systems. In an effort to provide integrated resources for application of microarray technologies to studies of skeletal muscle growth and development in swine, we have constructed a normalized cDNA library from porcine skeletal muscle. The effectiveness of normalization was evaluated by DNA sequencing of clones randomly picked from the library before and after normalization, and also by Southern blot hybridization using probes representing abundant transcripts. Our data suggest that the normalization procedure successfully reduced the highly abundant cDNA species in the normalized library. To date, a total of 782 EST (expressed sequence tag) sequences have been generated from this normalized library (687 ESTs) and the original library (95 ESTs). The sequence information of these ESTs plus their BLAST results has been made available through a web accessible database (http://nbfgc.msu.edu). Cluster analysis of the data indicates that a total of 742 unique sequences are present in this collection. BLASTN search of the 742 EST sequences against the public database (dbEST) revealed that 139 had no significant matches (E-value > 10^-15) to porcine ESTs already entered in the database, suggesting the possibility of their specific expression in porcine skeletal muscle. Generation of non-redundant ESTs from this library will allow us to construct cDNA microarrays for identification of gene expression changes that regulate muscle growth and affect meat quality in swine.

  4. Phylogeny of Saccharina and Laminaria (Laminariaceae, Laminariales, Phaeophyta) in sequence-tagged-site markers

    NASA Astrophysics Data System (ADS)

    Qu, Jieqiong; Zhang, Jing; Wang, Xumin; Chi, Shan; Liu, Cui; Liu, Tao

    2014-01-01

    Laminaria and Saccharina have recently been recognized as two independent clades from the former genus Laminaria. Traditional morphological taxonomy is being challenged by molecular evidence from both nucleus and plastid. Intensive work is in great demand from the perspective of genome colinearity. In this study, 118 sequence-tagged site (STS) markers were screened for phylogenetic analyses, 29 based on genome sequences, while 89 were based on expressed sequence tag (EST) sequences. EST-based STS marker development (29.37%) had an efficiency twice as high as genome-sequence-based development (9.48%) as a result of high conservation of gene transcripts among the related species. S. ochotensis, S. religiosa, S. japonica, and L. hyperborea showed great homogeneity in all 118 STS markers. Our result supports the view that the diversification between the genera Saccharina and Laminaria was a more recent event and that Saccharina and Laminaria shared high phylogenetic affinity. However, when it came to the single nucleotide polymorphism (SNP) level among the 41 SNPs, L. hyperborea owned 29 unique SNPs against 12 within the remaining three Saccharina species and 12 of the 13 indels were supposedly unique for L. hyperborea, indicated by its high variability. Originating from homologous ancestors, species between the recently diverged genera Laminaria and Saccharina may have taken in enough mutations at the SNP level only, in spite of different evolutionary strategies for better adaptation to the environment. Our study lays a solid foundation from a new perspective, although more accurate phylogenetic analysis is still needed to clarify the evolutionary traces between the genera Saccharina and Laminaria.

  5. Isolation, annotation and applications of expressed sequence tags from the olive fly, Bactrocera oleae.

    PubMed

    Tsoumani, K T; Augustinos, A A; Kakani, E G; Drosopoulou, E; Mavragani-Tsipidou, P; Mathiopoulos, K D

    2011-01-01

    The olive fruit fly, Bactrocera oleae, is the major pest of the olive tree. Despite its importance, very little genetic and molecular knowledge is available. The present study is a first attempt to identify and characterize B. oleae expressed sequence tags (ESTs). One hundred and ninety-five randomly selected cDNA clones were isolated and the obtained sequences were annotated through BLASTX similarity searches. A set of 159 unique putative transcripts were functionally assigned using Gene Ontology terms in broad categories of biological process, molecular function and cellular component based on D. melanogaster matches. Moreover, the cytogenetic location of 35 ESTs was determined by in situ hybridization to B. oleae polytene chromosomes. The resulting low-resolution EST map more than doubles the available entry points to the insect's genome and can assist syntenic comparisons with other distant species. The deduced codon usage of the isolated ESTs suggested a conserved pattern of B. oleae with its closest relatives. Additionally, the comparative analysis of B. oleae ESTs with the homologous D. melanogaster genes led to the development of 17 nuclear EPIC-PCR markers for the amplification of intron sequences of 11 Tephritidae species. Sequencing analysis of several cross-amplified intron sequences revealed a high degree of conservation among Bactrocera species and a varying transferability of the generated markers across the examined genera, suggesting that this method can provide a useful tool for the clarification of phylogenetic relationships among different species, particularly in cases of species complexes.

  6. A blackberry (Rubus L.) expressed sequence tag library for the development of simple sequence repeat markers.

    PubMed

    Lewers, Kim S; Saski, Chris A; Cuthbertson, Brandon J; Henry, David C; Staton, Meg E; Main, Dorrie S; Dhanaraj, Anik L; Rowland, Lisa J; Tomkins, Jeff P

    2008-06-20

    The recent development of novel repeat-fruiting types of blackberry (Rubus L.) cultivars, combined with a long history of morphological marker-assisted selection for thornlessness by blackberry breeders, has given rise to increased interest in using molecular markers to facilitate blackberry breeding. Yet no genetic maps, molecular markers, or even sequences exist specifically for cultivated blackberry. The purpose of this study is to begin development of these tools by generating and annotating the first blackberry expressed sequence tag (EST) library, designing primers from the ESTs to amplify regions containing simple sequence repeats (SSR), and testing the usefulness of a subset of the EST-SSRs with two blackberry cultivars. A cDNA library of 18,432 clones was generated from expanding leaf tissue of the cultivar Merton Thornless, a progenitor of many thornless commercial cultivars. Among the most abundantly expressed of the 3,000 genes annotated were those involved with energy, cell structure, and defense. From individual sequences containing SSRs, 673 primer pairs were designed. Of a randomly chosen set of 33 primer pairs tested with two blackberry cultivars, 10 detected an average of 1.9 polymorphic PCR products. This rate predicts that this library may yield as many as 940 SSR primer pairs detecting 1,786 polymorphisms. This may be sufficient to generate a genetic map that can be used to associate molecular markers with phenotypic traits, making possible molecular marker-assisted breeding to complement existing morphological marker-assisted breeding in blackberry.
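The final projection in this abstract follows from two reported figures. The sketch below takes the predicted 940 polymorphic primer pairs as given (the abstract does not show how that figure was derived) and multiplies it by the observed average of 1.9 polymorphic products per informative pair; it also computes the fraction of tested pairs that were polymorphic. Variable names are illustrative.

```python
# Figures reported in the abstract above.
pairs_tested = 33
pairs_polymorphic = 10
mean_products_per_pair = 1.9
predicted_pairs = 940            # taken as given; its derivation is not stated

# Share of tested primer pairs that detected polymorphism.
polymorphic_rate = round(100.0 * pairs_polymorphic / pairs_tested, 1)

# Expected polymorphisms if every predicted pair behaves like the tested ones.
predicted_polymorphisms = round(predicted_pairs * mean_products_per_pair)

print(polymorphic_rate, predicted_polymorphisms)
```

The product reproduces the 1,786 polymorphisms quoted in the abstract, and roughly 30% of the tested pairs were informative.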

  7. Construction of a chromosome-assigned, sequence-tagged linkage map for the radish, Raphanus sativus L. and QTL analysis of morphological traits

    PubMed Central

    Hashida, Tomoko; Nakatsuji, Ryoichi; Budahn, Holger; Schrader, Otto; Peterka, Herbert; Fujimura, Tatsuhito; Kubo, Nakao; Hirai, Masashi

    2013-01-01

    The radish displays great morphological variation but the genetic factors underlying this variability are mostly unknown. To identify quantitative trait loci (QTLs) controlling radish morphological traits, we cultivated 94 F4 and F5 recombinant inbred lines derived from a cross between the rat-tail radish and the Japanese radish cultivar ‘Harufuku’ inbred lines. Eight morphological traits (ovule and seed numbers per silique, plant shape, pubescence and root formation) were measured for investigation. We constructed a map composed of 322 markers with a total length of 673.6 cM. The linkage groups were assigned to the radish chromosomes using disomic rape-radish chromosome-addition lines. On the map, eight and 10 QTLs were identified in 2008 and 2009, respectively. The chromosome-linkage group correspondence, the sequence-specific markers and the QTLs detected here will provide useful information for further genetic studies and for selection during radish breeding programs. PMID:23853517

  8. Developing expressed sequence tag libraries and the discovery of simple sequence repeat markers for two species of raspberry (Rubus L.).

    PubMed

    Bushakra, Jill M; Lewers, Kim S; Staton, Margaret E; Zhebentyayeva, Tetyana; Saski, Christopher A

    2015-10-26

    Due to a relatively high level of codominant inheritance and transferability within and among taxonomic groups, simple sequence repeat (SSR) markers are important elements in comparative mapping and delineation of genomic regions associated with traits of economic importance. Expressed sequence tags (ESTs) are a source of SSRs that can be used to develop markers to facilitate plant breeding and for more basic research across genera and higher plant orders. Leaf and meristem tissue from 'Heritage' red raspberry (Rubus idaeus) and 'Bristol' black raspberry (R. occidentalis) were utilized for RNA extraction. After conversion to cDNA and library construction, ESTs were sequenced, quality verified, assembled and scanned for SSRs.  Primers flanking the SSRs were designed and a subset tested for amplification, polymorphism and transferability across species. ESTs containing SSRs were functionally annotated using the GenBank non-redundant (nr) database and further classified using the gene ontology database. To accelerate development of EST-SSRs in the genus Rubus (Rosaceae), 1149 and 2358 cDNA sequences were generated from red raspberry and black raspberry, respectively. The cDNA sequences were screened using rigorous filtering criteria which resulted in the identification of 121 and 257 SSR loci for red and black raspberry, respectively. Primers were designed from the surrounding sequences resulting in 131 and 288 primer pairs, respectively, as some sequences contained more than one SSR locus. Sequence analysis revealed that the SSR-containing genes span a diversity of functions and share more sequence identity with strawberry genes than with other Rosaceous species. This resource of Rubus-specific, gene-derived markers will facilitate the construction of linkage maps composed of transferable markers for studying and manipulating important traits in this economically important genus.
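
As a rough illustration of what scanning EST sequences for SSRs involves, the following sketch finds perfect tandem repeats with a regular expression. The abstract does not give the authors' filtering criteria, so the motif lengths (2 to 6 bases) and the minimum of four repeats are assumptions, and `find_ssrs` is a hypothetical helper rather than the tool used in the study.

```python
import re

def find_ssrs(seq, min_unit=2, max_unit=6, min_repeats=4):
    """Return (start, motif, repeat_count) for perfect tandem repeats.

    Thresholds are illustrative assumptions, not the study's criteria.
    """
    seq = seq.upper()
    # A motif of min_unit..max_unit bases followed by at least
    # (min_repeats - 1) further copies of itself.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue  # skip homopolymer runs such as AAAAAAAA
        repeats = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, repeats))
    return hits

# Toy sequence with a dinucleotide repeat (AG) repeated six times.
print(find_ssrs("GGCT" + "AG" * 6 + "TTCCA"))
```

Primer design would then use the sequence flanking each reported position, which is why one EST can yield more than one primer pair when it contains several SSR loci.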

  9. Transcript profiling of expressed sequence tags from semimembranosus muscle of commercial and naturalized pig breeds.

    PubMed

    Nascimento, C S; Peixoto, J O; Verardo, L L; Campos, C F; Weller, M M C; Faria, V R; Botelho, M E; Martins, M F; Machado, M A; Silva, F F; Lopes, P S; Guimarães, S E F

    2012-09-17

    In general, genetic differences across different breeds of pig lead to variation in mature body size and slaughter age. The commercial breeds Duroc and Large White and the local Brazilian breed Piau are ostensibly distinct in terms of growth and muscularity: commercial breeds are much leaner, while local breeds grow much more slowly and are fat-type pigs. However, the genetic factors that underlie such distinctions remain unclear. We used expressed sequence tags (ESTs) to characterize and compare transcript profiles in the semimembranosus muscle of these pig breeds. Our aim was to identify differences in breed-related gene expression that might influence growth performance and meat quality. We constructed three non-normalized cDNA libraries from semimembranosus muscle, using two samples from each of these three breeds; 6902 high-quality ESTs were obtained. Cluster analysis was performed and these sequences were clustered into 3670 unique sequences; 24.7% of the sequences were categorized as contigs and 75.3% of the sequences were singletons. Based on homology searches against the SwissProt protein database, we were able to assign a putative protein identity to only 1050 unique sequences. Among these, 58.5% were full-length protein sequences and 17.2% were pig-specific sequences. Muscle structural and cytoskeletal proteins, such as actin and myosin, were the most abundant transcripts (16.7%), followed by those related to mitochondrial function (12.9%) and ribosomal proteins (12.4%). Furthermore, ESTs generated in this study provide a rich source for identification of novel genes and for the comparative analysis of gene expression patterns in divergent pig breeds.

  10. Inferring gene structures in genomic sequences using pattern recognition and expressed sequence tags

    SciTech Connect

    Xu, Y.; Mural, R.; Uberbacher, E.

    1997-02-01

    Computational methods for gene identification in genomic sequences typically have two phases: coding region prediction and gene parsing. While there are many effective methods for predicting coding regions (exons), parsing the predicted exons into proper gene structures, to a large extent, remains an unsolved problem. This paper presents an algorithm for inferring gene structures from predicted exon candidates, based on Expressed Sequence Tags (ESTs) and biological intuition/rules. The algorithm first finds all the related ESTs in the EST database (dbEST) for each predicted exon, and infers the boundaries of one or a series of genes based on the available EST information and biological rules. Then it constructs gene models within each pair of gene boundaries, that are most consistent with the EST information. By exploiting EST information and biological rules, the algorithm can (1) model complicated multiple gene structures, including embedded genes, (2) identify falsely-predicted exons and locate missed exons, and (3) make more accurate exon boundary predictions. The algorithm has been implemented and tested on long genomic sequences with a number of genes. Test results show that very accurate (predicted) gene models can be expected when related ESTs exist for the predicted exons.

  11. Inferring gene structures in genomic sequences using pattern recognition and expressed sequence tags.

    PubMed

    Xu, Y; Mural, R J; Uberbacher, E C

    1997-01-01

    Computational methods for gene identification in genomic sequences typically have two phases: coding region prediction and gene parsing. While there are many effective methods for predicting coding regions (exons), parsing the predicted exons into proper gene structures, to a large extent, remains an unsolved problem. This paper presents an algorithm for inferring gene structures from predicted exon candidates, based on Expressed Sequence Tags (ESTs) and biological intuition/rules. The algorithm first finds all the related ESTs in the EST database (dbEST) for each predicted exon, and infers the boundaries of one or a series of genes based on the available EST information and biological rules. Then it constructs gene models within each pair of gene boundaries, that are most consistent with the EST information. By exploiting EST information and biological rules, the algorithm can (1) model complicated multiple gene structures, including embedded genes, (2) identify falsely-predicted exons and locate missed exons, and (3) make more accurate exon boundary predictions. The algorithm has been implemented and tested on long genomic sequences with a number of genes. Test results show that very accurate (predicted) gene models can be expected when related ESTs exist for the predicted exons.

  12. Evaluation of Genome Sequencing Quality in Selected Plant Species Using Expressed Sequence Tags

    PubMed Central

    Shangguan, Lingfei; Han, Jian; Kayesh, Emrul; Sun, Xin; Zhang, Changqing; Pervaiz, Tariq; Wen, Xicheng; Fang, Jinggui

    2013-01-01

    Background With the completion of genome sequencing projects for more than 30 plant species, large volumes of genome sequences have been produced and stored in online databases. Advancements in sequencing technologies have reduced the cost and time of whole genome sequencing enabling more and more plants to be subjected to genome sequencing. Despite this, genome sequence qualities of multiple plants have not been evaluated. Methodology/Principal Finding Integrity and accuracy were calculated to evaluate the genome sequence quality of 32 plants. The integrity of a genome sequence is presented by the ratio of chromosome size and genome size (or between scaffold size and genome size), which ranged from 55.31% to nearly 100%. The accuracy of genome sequence was presented by the ratio between matched EST and selected ESTs where 52.93% ∼ 98.28% and 89.02% ∼ 98.85% of the randomly selected clean ESTs could be mapped to chromosome and scaffold sequences, respectively. According to the integrity, accuracy and other analysis of each plant species, thirteen plant species were divided into four levels. Arabidopsis thaliana, Oryza sativa and Zea mays had the highest quality, followed by Brachypodium distachyon, Populus trichocarpa, Vitis vinifera and Glycine max, Sorghum bicolor, Solanum lycopersicum and Fragaria vesca, and Lotus japonicus, Medicago truncatula and Malus × domestica in that order. Assembling the scaffold sequences into chromosome sequences should be the primary task for the remaining nineteen species. Low GC content and repeat DNA influences genome sequence assembly. Conclusion The quality of plant genome sequences was found to be lower than envisaged and thus the rapid development of genome sequencing projects as well as research on bioinformatics tools and the algorithms of genome sequence assembly should provide increased processing and correction of genome sequences that have already been published. PMID:23922843
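
The two quality measures defined in this abstract are simple ratios, sketched below. The function names and the example numbers are hypothetical and are not taken from the paper; only the definitions (assembled size over genome size, matched ESTs over selected ESTs) come from the abstract.

```python
def integrity(assembled_bp, genome_bp):
    """Percent of the estimated genome covered by chromosome (or scaffold) sequence."""
    if genome_bp <= 0:
        raise ValueError("genome size must be positive")
    return 100.0 * assembled_bp / genome_bp

def accuracy(matched_ests, selected_ests):
    """Percent of the randomly selected clean ESTs that map to the assembly."""
    if selected_ests <= 0:
        raise ValueError("at least one EST must be selected")
    return 100.0 * matched_ests / selected_ests

# Hypothetical inputs, for illustration only.
print(f"integrity: {integrity(119_000_000, 125_000_000):.2f}%")
print(f"accuracy:  {accuracy(9_828, 10_000):.2f}%")
```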

  13. [Differentiation, identification and development of database of T. aestivum L. varieties of Ukrainian selection on the basis of sequence-tagged analysis of microsatellite repeats].

    PubMed

    Chebotar', S V; Sivolap, Iu M

    2001-01-01

    Determining the genotype of a variety is very important for the theory and practice of plant breeding and for protecting the rights of a variety's originator. For this reason, attention is focused on molecular markers generated by the polymerase chain reaction. On the basis of STMS analysis, principles are formulated for identifying varieties and developing a database that reflects the molecular-genetic peculiarities of some varieties of the Plant Breeding and Genetics Institute and other Ukrainian breeding organizations. The allelic state at microsatellite loci and their distribution were investigated. Wheat varieties were ranked according to genetic distances, and data on pedigree were compared with the cluster distribution of varieties obtained using computer programs.

  14. Development of peanut expressed sequence tag-based genomic resources and tools

    USDA-ARS's Scientific Manuscript database

    U.S. Peanut Genome Initiative (PGI) has widely recognized the need for peanut genome tools and resources development for mitigating peanut allergens and food safety. Genomics such as Expressed Sequence Tag (EST), microarray technologies, and whole genome sequencing provides robotic tools for profili...

  15. Synthesis and properties of 2'-deoxycytidine triphosphate carrying c-myc tag sequence.

    PubMed

    Hinz, M; Gottschling, D; Eritja, R; Seliger, H

    2000-01-01

    The synthesis of 2'-deoxycytidine triphosphate carrying mercaptoethyl groups at position 4 of cytosine is described. This nucleoside triphosphate was reacted with a maleimido-peptide carrying the c-myc tag-sequence to yield a peptide-nucleoside triphosphate chimera. Primer extension studies showed that the nucleoside triphosphate modified with the peptide sequence is incorporated by DNA polymerases opposite guanine.

  16. Unique archaeal assemblages in the Arctic Ocean unveiled by massively parallel tag sequencing.

    PubMed

    Galand, Pierre E; Casamayor, Emilio O; Kirchman, David L; Potvin, Marianne; Lovejoy, Connie

    2009-07-01

    The Arctic Ocean plays a critical role in controlling nutrient budgets between the Pacific and Atlantic Ocean. Archaea are key players in the nitrogen cycle and in cycling nutrients, but their community composition has been little studied in the Arctic Ocean. Here, we characterize archaeal assemblages from surface and deep Arctic water masses using massively parallel tag sequencing of the V6 region of the 16S rRNA gene. This approach gave a very high coverage of the natural communities, allowing a precise description of archaeal assemblages. This first taxonomic description of archaeal communities by tag sequencing reported so far shows that it is possible to assign an identity below phylum level to most (95%) of the archaeal V6 tags, and shows that tag sequencing is a powerful tool for resolving the diversity and distribution of specific microbes in the environment. Marine group I Crenarchaeota was overall the most abundant group in the Arctic Ocean and comprised between 27% and 63% of all tags. Group III Euryarchaeota were more abundant in deep-water masses and represented the largest archaeal group in the deep Atlantic layer of the central Arctic Ocean. Coastal surface waters, in turn, harbored more group II Euryarchaeota. Moreover, group II sequences that dominated surface waters were different from the group II sequences detected in deep waters, suggesting functional differences in closely related groups. Our results unveiled for the first time an archaeal community dominated by group III Euryarchaeota and show biogeographical traits for marine Arctic Archaea.

  17. Elucidation of the metabolic fate of glucose in the filamentous fungus Trichoderma reesei using expressed sequence tag (EST) analysis and cDNA microarrays.

    PubMed

    Chambergo, Felipe S; Bonaccorsi, Eric D; Ferreira, Ari J S; Ramos, Augusto S P; Ferreira Júnior, José Ribamar; Abrahão-Neto, José; Farah, João P Simon; El-Dorry, Hamza

    2002-04-19

    Despite the intense interest in the metabolic regulation and evolution of the ATP-producing pathways, the long standing question of why most multicellular microorganisms metabolize glucose by respiration rather than fermentation remains unanswered. One such microorganism is the cellulolytic fungus Trichoderma reesei (Hypocrea jecorina). Using EST analysis and cDNA microarrays, we find that in T. reesei expression of the genes encoding the enzymes of the tricarboxylic acid cycle and the proteins of the electron transport chain is programmed in a way that favors the oxidation of pyruvate via the tricarboxylic acid cycle rather than its reduction to ethanol by fermentation. Moreover, the results indicate that acetaldehyde may be channeled into acetate rather than ethanol, thus preventing the regeneration of NAD(+), a pivotal product required for anaerobic metabolism. The studies also point out that the regulatory machinery controlled by glucose was most probably the target of evolutionary pressure that directed the flow of metabolites into respiratory metabolism rather than fermentation. This finding has significant implications for the development of metabolically engineered cellulolytic microorganisms for fuel production from cellulose biomass.

  18. Next generation barcode tagged sequencing for monitoring microbial community dynamics.

    PubMed

    Breakwell, Katy; Tetu, Sasha G; Elbourne, Liam D H

    2014-01-01

    Microbial identification using 16S rDNA variable regions has become increasingly popular over the past decade. The application of next-generation amplicon sequencing to these regions allows microbial communities to be sequenced in far greater depth than previous techniques, as well as allowing for the identification of unculturable or rare organisms within a sample. Multiplexing can be used to sequence multiple samples in tandem through the use of sample-specific identification sequences which are attached to each amplicon, making this a cost-effective method for large-scale microbial identification experiments.
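
The multiplexing idea described here, sample-specific identification sequences attached to each amplicon, can be sketched as a small demultiplexing routine. This is a minimal illustration that assumes the barcode sits at the start of each read and must match exactly; the barcodes, sample names and the `demultiplex` helper are hypothetical, and real pipelines usually also tolerate sequencing errors in the barcode.

```python
from collections import defaultdict

def demultiplex(reads, barcodes):
    """Group reads by sample using an exact barcode match at the read start.

    `barcodes` maps barcode sequence -> sample name. Reads with no
    recognised barcode are collected under "unassigned".
    """
    bins = defaultdict(list)
    for read in reads:
        for barcode, sample in barcodes.items():
            if read.startswith(barcode):
                # Strip the barcode so only the amplicon sequence remains.
                bins[sample].append(read[len(barcode):])
                break
        else:
            bins["unassigned"].append(read)
    return dict(bins)

# Hypothetical barcodes and reads.
barcodes = {"ACGT": "sample_A", "TGCA": "sample_B"}
reads = ["ACGTGGGG", "TGCACCCC", "NNNNAAAA"]
print(demultiplex(reads, barcodes))
```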

  19. Expression sequence tag library derived from peripheral blood mononuclear cells of the chlorocebus sabaeus

    PubMed Central

    2012-01-01

    Background African Green Monkeys (AGM) are amongst the most frequently used nonhuman primate models in clinical and biomedical research; nevertheless, only a few genomic resources exist for this species. Such information would be essential for the development of dedicated new generation technologies in fundamental and pre-clinical research using this model, and would deliver new insights into primate evolution. Results We have exhaustively sequenced an Expression Sequence Tag (EST) library made from a pool of Peripheral Blood Mononuclear Cells from sixteen Chlorocebus sabaeus monkeys. Twelve of them were infected with the Simian Immunodeficiency Virus. The mononuclear cells were either left unstimulated or stimulated in vitro with Concanavalin A, with lipopolysaccharides, or through mixed lymphocyte reaction in order to generate a representative and broad library of expressed sequences in immune cells. We report here 37,787 sequences, which were assembled into 14,410 contigs representing an estimated 12% of the C. sabaeus transcriptome. Using data from primate genome databases, 9,029 assembled sequences from C. sabaeus could be annotated. Sequences have been systematically aligned with ten cDNA references of primate species including Homo sapiens, Pan troglodytes, and Macaca mulatta to identify ortholog transcripts. For 506 transcripts, sequences were quasi-complete. In addition, 6,576 transcript fragments are potentially specific to C. sabaeus or correspond to not yet described primate genes. Conclusions The EST library we provide here will prove useful in gene annotation efforts for future sequencing of the African Green Monkey genomes. Furthermore, this library, which particularly well represents immunological and hematological gene expression, will be an important resource for the comparative analysis of gene expression in clinically relevant nonhuman primate and human research. PMID:22726727

  20. A blackberry (Rubus L.) expressed sequence tag library for the development of simple sequence repeat markers

    PubMed Central

    Lewers, Kim S; Saski, Chris A; Cuthbertson, Brandon J; Henry, David C; Staton, Meg E; Main, Dorrie S; Dhanaraj, Anik L; Rowland, Lisa J; Tomkins, Jeff P

    2008-01-01

    Background The recent development of novel repeat-fruiting types of blackberry (Rubus L.) cultivars, combined with a long history of morphological marker-assisted selection for thornlessness by blackberry breeders, has given rise to increased interest in using molecular markers to facilitate blackberry breeding. Yet no genetic maps, molecular markers, or even sequences exist specifically for cultivated blackberry. The purpose of this study is to begin development of these tools by generating and annotating the first blackberry expressed sequence tag (EST) library, designing primers from the ESTs to amplify regions containing simple sequence repeats (SSR), and testing the usefulness of a subset of the EST-SSRs with two blackberry cultivars. Results A cDNA library of 18,432 clones was generated from expanding leaf tissue of the cultivar Merton Thornless, a progenitor of many thornless commercial cultivars. Among the most abundantly expressed of the 3,000 genes annotated were those involved with energy, cell structure, and defense. From individual sequences containing SSRs, 673 primer pairs were designed. Of a randomly chosen set of 33 primer pairs tested with two blackberry cultivars, 10 detected an average of 1.9 polymorphic PCR products. Conclusion This rate predicts that this library may yield as many as 940 SSR primer pairs detecting 1,786 polymorphisms. This may be sufficient to generate a genetic map that can be used to associate molecular markers with phenotypic traits, making possible molecular marker-assisted breeding to complement existing morphological marker-assisted breeding in blackberry. PMID:18570660

  1. SPlinted Ligation Adapter Tagging (SPLAT), a novel library preparation method for whole genome bisulphite sequencing

    PubMed Central

    Manlig, Erika; Wahlberg, Per

    2017-01-01

    Abstract Sodium bisulphite treatment of DNA combined with next generation sequencing (NGS) is a powerful combination for the interrogation of genome-wide DNA methylation profiles. Library preparation for whole genome bisulphite sequencing (WGBS) is challenging due to side effects of the bisulphite treatment, which leads to extensive DNA damage. Recently, a new generation of methods for bisulphite sequencing library preparation have been devised. They are based on initial bisulphite treatment of the DNA, followed by adaptor tagging of single stranded DNA fragments, and enable WGBS using low quantities of input DNA. In this study, we present a novel approach for quick and cost effective WGBS library preparation that is based on splinted adaptor tagging (SPLAT) of bisulphite-converted single-stranded DNA. Moreover, we validate SPLAT against three commercially available WGBS library preparation techniques, two of which are based on bisulphite treatment prior to adaptor tagging and one is a conventional WGBS method. PMID:27899585

  2. Gene ontology based characterization of expressed sequence tags (ESTs) of Brassica rapa cv. Osome.

    PubMed

    Arasan, Senthil Kumar Thamil; Park, Jong-In; Ahmed, Nasar Uddin; Jung, Hee-Jeong; Lee, In-Ho; Cho, Yong-Gu; Lim, Yong-Pyo; Kang, Kwon-Kyoo; Nou, Ill-Sup

    2013-07-01

    Chinese cabbage (Brassica rapa) is widely recognized for its economic importance and contribution to human nutrition, but abiotic and biotic stresses are the main obstacles to its quality, nutritional status and production. In this study, 3,429 Expressed Sequence Tag (EST) sequences were generated from a B. rapa cv. Osome cDNA library and the unique transcripts were classified functionally using a gene ontology (GO) hierarchy, Kyoto encyclopedia of genes and genomes (KEGG). KEGG orthology and the structural domain data were obtained from the biological database for stress related genes (SRG). EST datasets provided a wide outlook of functional characterization of B. rapa cv. Osome. In silico analysis revealed 83% of ESTs to be well annotated towards reeds one dimensional concept. Clustering of ESTs returned 333 contigs and 2,446 singlets, giving a total of 3,284 putative unigene sequences. This dataset contained 1,017 EST sequences functionally annotated to stress responses, from which the expression of randomly selected SRGs was analyzed against cold, salt, drought, ABA, water and PEG stresses. Most of the SRGs showed differential expression under these stresses. Thus, the EST dataset is very important for discovering potential genes related to stress resistance in Chinese cabbage, and can be a useful resource for genetic engineering of Brassica sp.

  3. Mining and characterization of sequence tagged microsatellites from the brown planthopper Nilaparvata lugens.

    PubMed

    Sun, Jing-Tao; Zhang, Yan-Kai; Ge, Cheng; Hong, Xiao-Yue

    2011-01-01

    The brown planthopper, Nilaparvata lugens (Stål) (Hemiptera: Delphacidae), is an important pest of rice. To better understand the migration pattern and population structure of the Chinese populations of N. lugens, we developed and characterized 12 polymorphic microsatellites from the expressed sequence tags database of N. lugens. The occurrence of these simple sequence repeats was assessed in three populations collected from three provinces of China. The number of alleles per locus ranged from 3 to 13 with an average of 6.5 alleles per locus. The mean observed heterozygosity of the three populations ranged from 0.051 to 0.772 and the expected heterozygosity ranged from 0.074 to 0.766. The sequences of the 12 markers were highly variable. The polymorphism information content of the 12 markers was high and ranged from 0.074 to 0.807 (mean = 0.503). Sequencing of microsatellite alleles revealed that the fragment length differences were mainly due to the variation of the repeat motif. Significant genetic differentiation was detected among the three N. lugens populations as the Fst ranged from 0.034 to 0.273. Principal coordinates analysis also revealed significant genetic differentiation between populations of different years. We conclude that these microsatellite markers will be powerful tools for studying the migration routes of N. lugens.
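
For readers unfamiliar with the statistics quoted above, the sketch below computes expected heterozygosity and polymorphism information content (PIC) from allele frequencies at a single locus, using the common textbook definitions. The abstract does not say which estimators or software the authors used, so this is an assumption, and the allele frequencies are invented for illustration.

```python
def expected_heterozygosity(freqs):
    """He = 1 - sum(p_i^2), without small-sample correction."""
    return 1.0 - sum(p * p for p in freqs)

def polymorphism_information_content(freqs):
    """PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    homozygosity = sum(p * p for p in freqs)
    cross_term = 0.0
    for i in range(len(freqs)):
        for j in range(i + 1, len(freqs)):
            cross_term += 2.0 * freqs[i] ** 2 * freqs[j] ** 2
    return 1.0 - homozygosity - cross_term

# Invented frequencies for a locus with two equally common alleles.
freqs = [0.5, 0.5]
print(expected_heterozygosity(freqs))
print(polymorphism_information_content(freqs))
```

PIC is never larger than the expected heterozygosity at the same locus, because the cross term is subtracted from it.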

  4. Micropreparative capillary gel electrophoresis of DNA: rapid expressed sequence tag library construction.

    PubMed

    Shi, Liang; Khandurina, Julia; Ronai, Zsolt; Li, Bi-Yu; Kwan, Wai King; Wang, Xun; Guttman, András

    2003-01-01

    A capillary gel electrophoresis based automated DNA fraction collection technique was developed to support a novel DNA fragment-pooling strategy for expressed sequence tag (EST) library construction. The cDNA population is first cleaved by BsaJ I and EcoR I restriction enzymes, and then subpooled by selective ligation with specific adapters followed by polymerase chain reaction (PCR) amplification and labeling. Combination of this cDNA fingerprinting method with high-resolution capillary gel electrophoresis separation and precise fractionation of individual cDNA transcript representatives avoids redundant fragment selection and concomitant repetitive sequencing of abundant transcripts. Using a computer-controlled capillary electrophoresis device, the transcript representatives were separated by size and fractions were automatically collected every 30 s into 96-well plates. The high resolving power of the sieving matrix ensured sequencing grade separation of the DNA fragments (i.e., single-base resolution) and successful fraction collection. Performance and precision of the fraction collection procedure were validated by PCR amplification of the collected DNA fragments followed by capillary electrophoresis analysis for size and purity verification. The collected and PCR-amplified transcript representatives, ranging up to several hundred base pairs, were then sequenced to create an EST library.

  5. Identification of Simple Sequence Repeat Biomarkers through Cross-Species Comparison in a Tag Cloud Representation

    PubMed Central

    2014-01-01

    Simple sequence repeats (SSRs) are not only applied as genetic markers in evolutionary studies but they also play an important role in gene regulatory activities. Efficient identification of conserved and exclusive SSRs through cross-species comparison is helpful for understanding the evolutionary mechanisms and associations between specific gene groups and SSR motifs. In this paper, we developed an online cross-species comparative system and integrated it with a tag cloud visualization technique for identifying potential SSR biomarkers within fourteen frequently used model species. Ultraconserved or exclusive SSRs among cross-species orthologous genes could be effectively retrieved and displayed through a friendly interface design. Four different types of testing cases were applied to demonstrate and verify the retrieved SSR biomarker candidates. Through statistical analysis and enhanced tag cloud representation on defined functional related genes and cross-species clusters, the proposed system can correctly represent the patterns, loci, colors, and sizes of identified SSRs in accordance with gene functions, pattern qualities, and conserved characteristics among species. PMID:24800246

  6. Peanut (Arachis hypogaea) Expressed Sequence Tag Project: Progress and Application

    PubMed Central

    Feng, Suping; Wang, Xingjun; Zhang, Xinyou; Dang, Phat M.; Holbrook, C. Corley; Culbreath, Albert K.; Wu, Yaoting; Guo, Baozhu

    2012-01-01

    Many plant species, including peanut, have had ESTs sequenced as an alternative to whole genome sequencing because of genome size and complexity. The US peanut research community held the historic 2004 Atlanta Genomics Workshop and named the EST project a main priority. As of August 2011, the peanut research community had deposited 252,832 ESTs in the public NCBI EST database, and this resource has provided the community with valuable tools and core foundations for various genome-scale experiments ahead of the whole genome sequencing project. These EST resources have been used for marker development, gene cloning, microarray gene expression and genetic map construction. The peanut EST sequence resources have thus been shown to have a wide range of applications and fulfilled an essential role at a time of need. The EST project then contributed to the second historic event, the Peanut Genome Project 2010 Inaugural Meeting, also held in Atlanta, where it was decided to sequence the entire peanut genome. After the completion of peanut whole genome sequencing, ESTs or transcriptome data will continue to play an important role in filling knowledge gaps, identifying particular genes and exploring gene function. PMID:22745594

  7. Mining expressed sequence tag (EST) libraries for cancer-associated genes.

    PubMed

    Schmitt, Armin O

    2010-01-01

    Originally established in the beginning of the 1990s as a direct route to gene finding, expressed sequence tags (ESTs) still lend themselves as a means to analyze gene expression in almost all human tissues. The type of questions that can be addressed using public EST libraries ranges from tissue-specific gene profiling to the comparison between tissues in diseased and healthy states. Thanks to a multitude of web-based online bioinformatics resources, mining in EST libraries is not restricted to experts in the field of data analysis, but can readily be performed by the medical or life scientist. In this chapter, a couple of case studies are presented that guide the scientist to the most useful online resources so that they can conduct their own research.

  8. AGIA Tag System Based on a High Affinity Rabbit Monoclonal Antibody against Human Dopamine Receptor D1 for Protein Analysis

    PubMed Central

    Yano, Tomoya; Takeda, Hiroyuki; Uematsu, Atsushi; Yamanaka, Satoshi; Nomura, Shunsuke; Nemoto, Keiichirou; Iwasaki, Takahiro; Takahashi, Hirotaka; Sawasaki, Tatsuya

    2016-01-01

    Polypeptide tag technology is widely used for protein detection and affinity purification. It consists of two fundamental elements: a peptide sequence and a binder which specifically binds to the peptide tag. In many tag systems, antibodies have been used as binder due to their high affinity and specificity. Recently, we obtained clone Ra48, a high-affinity rabbit monoclonal antibody (mAb) against dopamine receptor D1 (DRD1). Here, we report a novel tag system composed of Ra48 antibody and its epitope sequence. Using a deletion assay, we identified EEAAGIARP in the C-terminal region of DRD1 as the minimal epitope of Ra48 mAb, and we named this sequence the “AGIA” tag, based on its central sequence. The tag sequence does not include the four amino acids, Ser, Thr, Tyr, or Lys, which are susceptible to post-translational modification. We demonstrated performance of this new tag system in biochemical and cell biology applications. SPR analysis demonstrated that the affinity of the Ra48 mAb to the AGIA tag was 4.90 × 10⁻⁹ M. AGIA tag showed remarkably high sensitivity and specificity in immunoblotting. A number of AGIA-fused proteins overexpressed in animal and plant cells were detected by anti-AGIA antibody in immunoblotting and immunostaining with low background, and were immunoprecipitated efficiently. Furthermore, a single amino acid substitution of the second Glu to Asp (AGIA/E2D) enabled competitive dissociation of AGIA/E2D-tagged protein by adding wild-type AGIA peptide. It enabled one-step purification of AGIA/E2D-tagged recombinant proteins by peptide competition under physiological conditions. The sensitivity and specificity of the AGIA system makes it suitable for use in multiple methods for protein analysis. PMID:27271343
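
The sequence-level statements in this abstract are easy to verify directly. The short sketch below checks that the epitope EEAAGIARP contains none of the four residues named (Ser, Thr, Tyr, Lys, in one-letter code S, T, Y, K) and builds the E2D variant by replacing the second residue; the variable names are illustrative only.

```python
tag = "EEAAGIARP"          # minimal Ra48 epitope given in the abstract
excluded = set("STYK")     # Ser, Thr, Tyr, Lys in one-letter code

# Residues from the excluded set that actually occur in the tag.
present = excluded & set(tag)

# E2D variant: the second residue (Glu, E) replaced by Asp (D).
e2d_variant = tag[0] + "D" + tag[2:]

print("excluded residues present:", sorted(present))
print("central motif AGIA starts at residue", tag.index("AGIA") + 1)
print("E2D variant:", e2d_variant)
```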

  9. Strategy for Modular Tagged High-Throughput Amplicon Sequencing

    PubMed Central

    de Cárcer, Daniel Aguirre; Denman, Stuart E.; McSweeney, Chris; Morrison, Mark

    2011-01-01

    The use and validation of a strategy that allows a universal set of bar-coded sequencing primers to be appended to an amplified PCR product is described. The strategy allows a modular approach, in that the same bar code can be used with two or more target-specific primer sets, even simultaneously. PMID:21764953
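
    The sorting step that bar-coded primers make possible can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the barcode sequences, sample names, 6-base barcode length and exact-match rule are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical barcodes and sample names, for illustration only.
BARCODES = {"ACGTAC": "sample_A", "TGCATG": "sample_B"}


def demultiplex(reads, barcodes, barcode_len=6):
    """Group reads by an exact-match barcode prefix and strip the barcode."""
    by_sample = defaultdict(list)
    for read in reads:
        tag, insert = read[:barcode_len], read[barcode_len:]
        if tag in barcodes:
            by_sample[barcodes[tag]].append(insert)
        else:
            # Keep unrecognized reads whole so they can be inspected later.
            by_sample["unassigned"].append(read)
    return dict(by_sample)


reads = ["ACGTACGGATTC", "TGCATGCCAAGT", "ACGTACTTTGGA", "NNNNNNACGTAA"]
result = demultiplex(reads, BARCODES)
```

    Because the barcode lookup is independent of the target-specific part of the read, the same barcode table could in principle be reused across primer sets, which is the modularity the abstract describes.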

  10. Analysis of common bean expressed sequence tags identifies sulfur metabolic pathways active in seed and sulfur-rich proteins highly expressed in the absence of phaseolin and major lectins

    PubMed Central

    2011-01-01

    Background A deficiency in phaseolin and phytohemagglutinin is associated with a near doubling of sulfur amino acid content in genetically related lines of common bean (Phaseolus vulgaris), particularly cysteine, elevated by 70%, and methionine, elevated by 10%. This mostly takes place at the expense of an abundant non-protein amino acid, S-methyl-cysteine. The deficiency in phaseolin and phytohemagglutinin is mainly compensated by increased levels of the 11S globulin legumin and residual lectins. Legumin, albumin-2, defensin and albumin-1 were previously identified as contributing to the increased sulfur amino acid content in the mutant line, on the basis of similarity to proteins from other legumes. Results Profiling of free amino acids in developing seeds of the BAT93 reference genotype revealed a biphasic accumulation of gamma-glutamyl-S-methyl-cysteine, the main soluble form of S-methyl-cysteine, with a lag phase occurring during storage protein accumulation. A collection of 30,147 expressed sequence tags (ESTs) was generated from four developmental stages, corresponding to distinct phases of gamma-glutamyl-S-methyl-cysteine accumulation, and covering the transitions to reserve accumulation and desiccation. Analysis of gene ontology categories indicated the occurrence of multiple sulfur metabolic pathways, including all enzymatic activities responsible for sulfate assimilation, de novo cysteine and methionine biosynthesis. Integration of genomic and proteomic data enabled the identification and isolation of cDNAs coding for legumin, albumin-2, defensin D1 and albumin-1A and -B induced in the absence of phaseolin and phytohemagglutinin. Their deduced amino acid sequences have a higher content of cysteine than methionine, providing an explanation for the preferential increase of cysteine in the mutant line. Conclusion The EST collection provides a foundation to further investigate sulfur metabolism and the differential accumulation of sulfur amino acids in seed

  11. A survey of canine expressed sequence tags and a display of their annotations through a flexible web-based interface.

    PubMed

    Palmer, L E; O'Shaughnessy, A L; Preston, R R; Santos, L; Balija, V S; Nascimento, L U; Zutavern, T L; Henthorn, P S; Hannon, G J; McCombie, W R

    2003-01-01

    We have initially sequenced approximately 8,000 canine expressed sequence tags (ESTs) from several complementary DNA (cDNA) libraries: testes, whole brain, and Madin-Darby canine kidney (MDCK) cells. Analysis of these sequences shows that they provide partial sequence information for about 5%-10% of the canine genes. An analysis pipeline has been created to cluster the ESTs and to map individual ESTs as well as clustered ESTs to both the human genome and the human proteome. Gene ontology (GO) terms have been assigned to the ESTs and clusters based on their top matches to the International Protein Index (IPI) set of human proteins. The data generated is stored in a MySQL relational database for analysis and display. A Web-based Perl script has been written to display the analyzed data to the scientific community.

  12. Diploid Musa acuminata genetic diversity assayed with sequence-tagged microsatellite sites.

    PubMed

    Grapin, A; Noyer, J L; Carreel, F; Dambier, D; Baurens, F C; Lanaud, C; Lagoda, P J

    1998-06-01

    The sequence-tagged microsatellite site (STMS) discrimination potential was explored using nine microsatellite primer pairs. STMS polymorphism was assayed by nonradioactive urea-polyacrylamide gel electrophoresis. Genetic relationships were examined among 59 genotypes of wild or cultivated accessions of diploid Musa acuminata. The organization of the subspecies was confirmed and some clone relationships were clarified.

  13. Increasing ecological inference from high throughput sequencing of fungi in the environment through a tagging approach

    Treesearch

    D. Lee Taylor; Michael G. Booth; Jack W. McFarland; Ian C. Herriott; Niall J. Lennon; Chad Nusbaum; Thomas G. Marr

    2008-01-01

    High throughput sequencing methods are widely used in analyses of microbial diversity but are generally applied to small numbers of samples, which precludes characterization of patterns of microbial diversity across space and time. We have designed a primer-tagging approach that allows pooling and subsequent sorting of numerous samples, which is directed to...

  14. Design of Modular Protein Tags for Orthogonal Covalent Bond Formation at Specific DNA Sequences.

    PubMed

    Nguyen, Thang Minh; Nakata, Eiji; Saimura, Masayuki; Dinh, Huyen; Morii, Takashi

    2017-06-28

    Simultaneous formation of specific covalent linkages at nucleotides in given DNA sequences demands distinct orthogonal reactivity of DNA modification agents. Such highly specific reactions require well-balanced reactivity and affinity of the DNA modification agents. Conjugation of a sequence-specific DNA binding zinc finger protein and a self-ligating protein tag provides a modular adaptor that expedites formation of a covalent bond between the protein tag and a substrate-modified nucleotide at a specific DNA sequence. The modular adaptor stably locates a protein of interest fused to it at the target position on a DNA scaffold in its functional form. Modular adaptors with orthogonal selectivity and fast reaction kinetics to specific DNA sequences enable site-specific location of different protein molecules simultaneously. Three different modular adaptors consisting of zinc finger proteins with distinct DNA sequence specificities and self-ligating protein tags with different substrate specificities achieved orthogonal covalent bond formation at respective sequences on the same DNA scaffold with an overall coassembly yield over 90%. Application of this unique set of orthogonal modular adaptors enabled construction of a cascade reaction of three enzymes from the xylose metabolic pathway on a DNA scaffold.

  15. Intraclade Heterogeneity in Nitrogen Utilization by Marine Prokaryotes Revealed Using Stable Isotope Probing Coupled with Tag Sequencing (Tag-SIP).

    PubMed

    Morando, Michael; Capone, Douglas G

    2016-01-01

    Nitrogen can greatly influence the structure and productivity of microbial communities through its relative availability and form. However, the roles of specific organisms in the uptake of different nitrogen species remain poorly characterized. Most studies seeking to identify agents of assimilation have been correlative, indirectly linking activity measurements (e.g., nitrate uptake) with the presence or absence of biological markers, particularly functional genes and their transcripts. Evidence is accumulating of previously underappreciated functional diversity in major microbial subpopulations, which may confer physiological advantages under certain environmental conditions leading to ecotype divergence. This microdiversity further complicates our view of genetic variation in environmental samples requiring the development of more targeted approaches. Here, next-generation tag sequencing was successfully coupled with stable isotope probing (Tag-SIP) to assess the ability of individual phylotypes to assimilate a specific N source. Our results provide the first direct evidence of nitrate utilization by organisms thought to lack the genes required for this process including the heterotrophic clades SAR11 and the Archaeal Marine Group II. Alternatively, this may suggest the existence of tightly coupled metabolisms with primary assimilators, e.g., symbiosis, or the rapid and efficient scavenging of recently released products by highly active individuals. These results may be connected with global dominance often seen with these clades, likely conferring an advantage over other clades unable to access these resources. We also provide new direct evidence of in situ nitrate utilization by the cyanobacterium Prochlorococcus in support of recent findings. Furthermore, these results revealed widespread functional heterogeneity, i.e., different levels of nitrogen assimilation within clades, likely reflecting niche partitioning by ecotypes.

  16. Intraclade Heterogeneity in Nitrogen Utilization by Marine Prokaryotes Revealed Using Stable Isotope Probing Coupled with Tag Sequencing (Tag-SIP)

    PubMed Central

    Morando, Michael; Capone, Douglas G.

    2016-01-01

    Nitrogen can greatly influence the structure and productivity of microbial communities through its relative availability and form. However, the roles of specific organisms in the uptake of different nitrogen species remain poorly characterized. Most studies seeking to identify agents of assimilation have been correlative, indirectly linking activity measurements (e.g., nitrate uptake) with the presence or absence of biological markers, particularly functional genes and their transcripts. Evidence is accumulating of previously underappreciated functional diversity in major microbial subpopulations, which may confer physiological advantages under certain environmental conditions leading to ecotype divergence. This microdiversity further complicates our view of genetic variation in environmental samples requiring the development of more targeted approaches. Here, next-generation tag sequencing was successfully coupled with stable isotope probing (Tag-SIP) to assess the ability of individual phylotypes to assimilate a specific N source. Our results provide the first direct evidence of nitrate utilization by organisms thought to lack the genes required for this process including the heterotrophic clades SAR11 and the Archaeal Marine Group II. Alternatively, this may suggest the existence of tightly coupled metabolisms with primary assimilators, e.g., symbiosis, or the rapid and efficient scavenging of recently released products by highly active individuals. These results may be connected with global dominance often seen with these clades, likely conferring an advantage over other clades unable to access these resources. We also provide new direct evidence of in situ nitrate utilization by the cyanobacterium Prochlorococcus in support of recent findings. Furthermore, these results revealed widespread functional heterogeneity, i.e., different levels of nitrogen assimilation within clades, likely reflecting niche partitioning by ecotypes. PMID:27994576

  17. Expressed sequence tags reveal genetic diversity and putative virulence factors of the pathogenic oomycete Pythium insidiosum.

    PubMed

    Krajaejun, Theerapong; Khositnithikul, Rommanee; Lerksuthirat, Tassanee; Lowhnoo, Tassanee; Rujirawat, Thidarat; Petchthong, Thanom; Yingyong, Wanta; Suriyaphol, Prapat; Smittipat, Nat; Juthayothin, Tada; Phuntumart, Vipaporn; Sullivan, Thomas D

    2011-07-01

    Oomycetes are unique eukaryotic microorganisms that share a mycelial morphology with fungi. Many oomycetes are pathogenic to plants, and a more limited number are pathogenic to animals. Pythium insidiosum is the only oomycete that is capable of infecting both humans and animals, and causes a life-threatening infectious disease, called "pythiosis". In the majority of pythiosis patients life-long handicaps result from the inevitable radical excision of infected organs, and many die from advanced infection. Better understanding P. insidiosum pathogenesis at molecular levels could lead to new forms of treatment. Genetic and genomic information is lacking for P. insidiosum, so we have undertaken an expressed sequence tag (EST) study, and report on the first dataset of 486 ESTs, assembled into 217 unigenes. Of these, 144 had significant sequence similarity with known genes, including 47 with ribosomal protein homology. Potential virulence factors included genes involved in antioxidation, thermal adaptation, immunomodulation, and iron and sterol binding. Effectors resembling pathogenicity factors of plant-pathogenic oomycetes were also discovered, such as, a CBEL-like protein (possible involvement in host cell adhesion and hemagglutination), a putative RXLR effector (possibly involved in host cell modulation) and elicitin-like (ELL) proteins. Phylogenetic analysis mapped P. insidiosum ELLs to several novel clades of oomycete elicitins (ELIs), and homology modeling predicted that P. insidiosum ELLs should bind sterols. Most of the P. insidiosum ESTs showed homology to sequences in the genome or EST databases of other oomycetes, but one putative gene, with unknown function, was found to be unique to P. insidiosum. The EST dataset reported here represents the first steps in identifying genes of P. insidiosum and beginning transcriptome analysis. This genetic information will facilitate understanding of pathogenic mechanisms of this devastating pathogen. 

  18. Velocity measurement of clay intrusion through a sudden contraction step using a tagging pulse sequence.

    PubMed

    Tsushima, Shohji; Hasegawa, Atsushi; Suekane, Tetsuya; Hirai, Shuichiro; Tanaka, Yoshihiro; Nakasuji, Yoshizumi

    2003-07-01

    Magnetic resonance imaging (MRI) with a spatial tagging sequence was used to measure the velocity distribution of clay that was forced past a sudden contraction. A spatial tagging sequence provided magnetic resonance images of clay that allowed measurement of the velocity distribution in the clay, which can provide profound insights on the deformation process of clay during the intrusion process. The experiments were conducted using a specially-designed vessel that could operate at up to 30 MPa. The vessel offers a rectangular test section with a sudden contraction step that had a ratio of contraction of 2:1. The vessel was installed into a commercial magnetic resonance imaging system and then the fluid motion of clay flowing into the narrow contracted channel was quantitatively investigated to examine behaviors of flowing clay as non-Newtonian fluid. MRI results are compared with those obtained by computational fluid dynamics (CFD) calculation. Velocity distributions obtained from each tag displacement did not agree well with those predicted by CFD results near the contraction step where the fluid accelerated rapidly. However, post-processing of the calculation results, in which virtual tag displacement is calculated, gave better agreement with experiment and enabled us to compare MRI results with CFD results.

  19. Genetic Linkage Maps of the Red Flour Beetle, Tribolium castaneum, Based on Bacterial Artificial Chromosomes and Expressed Sequence Tags

    PubMed Central

    Lorenzen, Marcé D.; Doyungan, Zaldy; Savard, Joel; Snow, Kathy; Crumly, Lindsey R.; Shippy, Teresa D.; Stuart, Jeffrey J.; Brown, Susan J.; Beeman, Richard W.

    2005-01-01

    A genetic linkage map was constructed in a backcross family of the red flour beetle, Tribolium castaneum, based largely on sequences from bacterial artificial chromosome (BAC) ends and untranslated regions from random cDNA's. In most cases, dimorphisms were detected using heteroduplex or single-strand conformational polymorphism analysis after specific PCR amplification. The map incorporates a total of 424 markers, including 190 BACs and 165 cDNA's, as well as 69 genes, transposon insertion sites, sequence-tagged sites, microsatellites, and amplified fragment-length polymorphisms. Mapped loci are distributed along 571 cM, spanning all 10 linkage groups at an average marker separation of 1.3 cM. This genetic map provides a framework for positional cloning and a scaffold for integration of the emerging physical map and genome sequence assembly. The map and corresponding sequences can be accessed through BeetleBase (http://www.bioinformatics.ksu.edu/BeetleBase/). PMID:15834150

  20. Multiplexed metagenome mining using short DNA sequence tags facilitates targeted discovery of epoxyketone proteasome inhibitors.

    PubMed

    Owen, Jeremy G; Charlop-Powers, Zachary; Smith, Alexandra G; Ternei, Melinda A; Calle, Paula Y; Reddy, Boojala Vijay B; Montiel, Daniel; Brady, Sean F

    2015-04-07

    In molecular evolutionary analyses, short DNA sequences are used to infer phylogenetic relationships among species. Here we apply this principle to the study of bacterial biosynthesis, enabling the targeted isolation of previously unidentified natural products directly from complex metagenomes. Our approach uses short natural product sequence tags derived from conserved biosynthetic motifs to profile biosynthetic diversity in the environment and then guide the recovery of gene clusters from metagenomic libraries. The methodology is conceptually simple, requires only a small investment in sequencing, and is not computationally demanding. To demonstrate the power of this approach to natural product discovery we conducted a computational search for epoxyketone proteasome inhibitors within 185 globally distributed soil metagenomes. This led to the identification of 99 unique epoxyketone sequence tags, falling into 6 phylogenetically distinct clades. Complete gene clusters associated with nine unique tags were recovered from four saturating soil metagenomic libraries. Using heterologous expression methodologies, seven potent epoxyketone proteasome inhibitors (clarepoxcins A-E and landepoxcins A and B) were produced from these pathways, including compounds with different warhead structures and a naturally occurring halohydrin prodrug. This study provides a template for the targeted expansion of bacterially derived natural products using the global metagenome.

  1. Sub-wavelength plasmonic readout for direct linear analysis of optically tagged DNA

    NASA Astrophysics Data System (ADS)

    Varsanik, Jonathan; Teynor, William; LeBlanc, John; Clark, Heather; Krogmeier, Jeffrey; Yang, Tian; Crozier, Kenneth; Bernstein, Jonathan

    2010-02-01

    This work describes the development and fabrication of a novel nanofluidic flow-through sensing chip that utilizes a plasmonic resonator to excite fluorescent tags with sub-wavelength resolution. We cover the design of the microfluidic chip and simulation of the plasmonic resonator using Finite Difference Time Domain (FDTD) software. The fabrication methods are presented, with testing procedures and preliminary results. This research is aimed at improving the resolution limits of the Direct Linear Analysis (DLA) technique developed by US Genomics [1]. In DLA, intercalating dyes which tag a specific 8 base-pair sequence are inserted in a DNA sample. This sample is pumped through a nano-fluidic channel, where it is stretched into a linear geometry and interrogated with light which excites the fluorescent tags. The resulting sequence of optical pulses produces a characteristic "fingerprint" of the sample which uniquely identifies any sample of DNA. Plasmonic confinement of light to a 100 nm wide metallic nano-stripe enables resolution of a higher tag density compared to free space optics. Prototype devices have been fabricated and are being tested with fluorophore solutions and tagged DNA. Preliminary results show evanescent coupling to the plasmonic resonator is occurring with 0.1 micron resolution; however, light scattering limits the S/N of the detector. Two methods to reduce scattered light are presented: index matching and curved waveguides.

  2. Job optimization in ATLAS TAG-based distributed analysis

    NASA Astrophysics Data System (ADS)

    Mambelli, M.; Cranshaw, J.; Gardner, R.; Maeno, T.; Malon, D.; Novak, M.

    2010-04-01

    The ATLAS experiment is projected to collect over one billion events/year during the first few years of operation. The efficient selection of events for various physics analyses across all appropriate samples presents a significant technical challenge. ATLAS computing infrastructure leverages the Grid to tackle the analysis across large samples by organizing data into a hierarchical structure and exploiting distributed computing to churn through the computations. This includes events at different stages of processing: RAW, ESD (Event Summary Data), AOD (Analysis Object Data), DPD (Derived Physics Data). Event Level Metadata Tags (TAGs) contain information about each event stored using multiple technologies accessible by POOL and various web services. This allows users to apply selection cuts on quantities of interest across the entire sample to compile a subset of events that are appropriate for their analysis. This paper describes new methods for organizing jobs using the TAGs criteria to analyze ATLAS data. It further compares different access patterns to the event data and explores ways to partition the workload for event selection and analysis. Here analysis is defined as a broader set of event processing tasks including event selection and reduction operations ("skimming", "slimming" and "thinning") as well as DPD making. Specifically it compares analysis with direct access to the events (AOD and ESD data) to access mediated by different TAG-based event selections. We then compare different ways of splitting the processing to maximize performance.

  3. Annotated expressed sequence tags and cDNA microarrays for studies of brain and behavior in the honey bee.

    PubMed

    Whitfield, Charles W; Band, Mark R; Bonaldo, Maria F; Kumar, Charu G; Liu, Lei; Pardinas, Jose R; Robertson, Hugh M; Soares, M Bento; Robinson, Gene E

    2002-04-01

    To accelerate the molecular analysis of behavior in the honey bee (Apis mellifera), we created expressed sequence tag (EST) and cDNA microarray resources for the bee brain. Over 20,000 cDNA clones were partially sequenced from a normalized (and subsequently subtracted) library generated from adult A. mellifera brains. These sequences were processed to identify 15,311 high-quality ESTs representing 8912 putative transcripts. Putative transcripts were functionally annotated (using the Gene Ontology classification system) based on matching gene sequences in Drosophila melanogaster. The brain ESTs represent a broad range of molecular functions and biological processes, with neurobiological classifications particularly well represented. Roughly half of Drosophila genes currently implicated in synaptic transmission and/or behavior are represented in the Apis EST set. Of Apis sequences with open reading frames of at least 450 bp, 24% are highly diverged with no matches to known protein sequences. Additionally, over 100 Apis transcript sequences conserved with other organisms appear to have been lost from the Drosophila genome. DNA microarrays were fabricated with over 7000 EST cDNA clones putatively representing different transcripts. Using probe derived from single bee brain mRNA, microarrays detected gene expression for 90% of Apis cDNAs two standard deviations greater than exogenous control cDNAs. [The sequence data described in this paper have been submitted to the GenBank data library under accession nos. BI502708-BI517278. The sequences are also available at http://titan.biotec.uiuc.edu/bee/honeybee_project.htm.]

  4. Mining for single nucleotide polymorphisms and insertions/deletions in maize expressed sequence tag data.

    PubMed

    Batley, Jacqueline; Barker, Gary; O'Sullivan, Helen; Edwards, Keith J; Edwards, David

    2003-05-01

    We have developed a computer based method to identify candidate single nucleotide polymorphisms (SNPs) and small insertions/deletions from expressed sequence tag data. Using a redundancy-based approach, valid SNPs are distinguished from erroneous sequence by their representation multiple times in an alignment of sequence reads. A second measure of validity was also calculated based on the cosegregation of the SNP pattern between multiple SNP loci in an alignment. The utility of this method was demonstrated by applying it to 102,551 maize (Zea mays) expressed sequence tag sequences. A total of 14,832 candidate polymorphisms were identified with an SNP redundancy score of two or greater. Segregation of these SNPs with haplotype indicates that candidate SNPs with high redundancy and cosegregation confidence scores are likely to represent true SNPs. This was confirmed by validation of 264 candidate SNPs from 27 loci, with a range of redundancy and cosegregation scores, in four inbred maize lines. The SNP transition/transversion ratio and insertion/deletion size frequencies correspond to those observed by direct sequencing methods of SNP discovery and suggest that the majority of predicted SNPs and insertion/deletions identified using this approach represent true genetic variation in maize.
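
    The redundancy idea described above can be illustrated with a short sketch. It is only one plausible reading of the abstract, not the authors' scoring method: it assumes pre-aligned reads of equal length, treats the per-column count of each base as its redundancy, and reports a column when at least two different bases each reach the threshold of two. The function name and the toy reads are hypothetical.

```python
from collections import Counter


def candidate_snps(aligned_reads, min_redundancy=2):
    """Return {column: {base: count}} for alignment columns in which at
    least two different bases each occur min_redundancy or more times."""
    candidates = {}
    for col, column in enumerate(zip(*aligned_reads)):
        # Ignore gaps and ambiguous calls; count only A, C, G, T.
        counts = Counter(base for base in column if base in "ACGT")
        supported = {b: n for b, n in counts.items() if n >= min_redundancy}
        if len(supported) >= 2:
            candidates[col] = supported
    return candidates


# Toy alignment: column 2 has G x3 and A x2 (reported);
# column 7 has a single A, treated as a likely sequencing error.
reads = [
    "ACGTACGT",
    "ACGTACGT",
    "ACATACGT",
    "ACATACGA",
    "ACGTACGT",
]
snps = candidate_snps(reads)
```

    A variant seen in only one read is indistinguishable from a sequencing error, which is why a minimum redundancy of two is the natural cutoff in this kind of filter.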

  5. Evaluation of anonymous and expressed sequence tag derived polymorphic microsatellite markers in the tobacco budworm Heliothis virescens (Lepidoptera: noctuidae)

    USDA-ARS's Scientific Manuscript database

    Polymorphic genetic markers were identified and characterized using a partial genomic library of Heliothis virescens enriched for simple sequence repeats (SSR) and nucleotide sequences of expressed sequence tags (EST). Nucleotide sequences of 192 clones from the partial genomic library yielded 147 u...

  6. Development, characterization and cross species amplification of polymorphic microsatellite markers from expressed sequence tags of turmeric (Curcuma longa L.).

    PubMed

    Siju, S; Dhanya, K; Syamkumar, S; Sasikumar, B; Sheeja, T E; Bhat, A I; Parthasarathy, V A

    2010-02-01

    Expressed sequence tags (ESTs) from turmeric (Curcuma longa L.) were used for the screening of type and frequency of Class I (hypervariable) simple sequence repeats (SSRs). A total of 231 microsatellite repeats were detected from 12,593 EST sequences of turmeric after redundancy elimination. The average density of Class I SSRs amounts to one SSR per 17.96 kb of EST. Mononucleotides were the most abundant class of microsatellite repeat in turmeric ESTs followed by trinucleotides. A robust set of 17 polymorphic EST-SSRs were developed and used for evaluating 20 turmeric accessions. The number of alleles detected ranged from 3 to 8 per locus. The developed markers were also evaluated in 13 related species of C. longa confirming high rate (100%) of cross species transferability. The polymorphic microsatellite markers generated from this study could be used for genetic diversity analysis and resolving the taxonomic confusion prevailing in the genus.
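
    A screen for perfect SSRs of this kind can be sketched with a regular expression. This is an illustrative sketch rather than the tool used in the study: it assumes motifs of 1 to 6 bases and a minimum total repeat length of 20 bp, a commonly used cutoff for Class I SSRs that the abstract itself does not state. The function name and the test sequence are hypothetical.

```python
import re

# A motif of 1-6 bases (shortest unit preferred) followed by one or more
# exact copies of itself.
SSR_PATTERN = re.compile(r"([ACGT]{1,6}?)\1+")


def find_ssrs(sequence, min_length=20):
    """Return (start, motif, copies) for each perfect tandem repeat whose
    total length is at least min_length bases."""
    hits = []
    for match in SSR_PATTERN.finditer(sequence.upper()):
        motif = match.group(1)
        total = len(match.group(0))
        if total >= min_length:
            hits.append((match.start(), motif, total // len(motif)))
    return hits


# (AG)12 spans 24 bp and is reported; (CAT)4 spans 12 bp and is not.
seq = "TTGC" + "AG" * 12 + "CCTA" + "CAT" * 4 + "GG"
ssrs = find_ssrs(seq)
```

    Dividing the total EST length screened by the number of hits gives a density figure of the same form as the "one SSR per 17.96 kb" reported in the abstract.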

  7. Identification of SNP and SSR markers in eggplant using RAD tag sequencing

    PubMed Central

    2011-01-01

    Background The eggplant (Solanum melongena L.) genome is relatively unexplored, especially compared to those of the other major Solanaceae crops tomato and potato. In particular, no SNP markers are publicly available; on the other hand, over 1,000 SSR markers have been developed and are publicly available. We have combined the recently developed Restriction-site Associated DNA (RAD) approach with Illumina DNA sequencing for rapid and mass discovery of both SNP and SSR markers for eggplant. Results RAD tags were generated from the genomic DNA of a pair of eggplant mapping parents, and sequenced to produce ~17.5 Mb of sequences arrangeable into ~78,000 contigs. The resulting non-redundant genomic sequence dataset consisted of ~45,000 sequences, of which ~29% were putative coding sequences and ~70% were in common between the mapping parents. The shared sequences allowed the discovery of ~10,000 SNPs and nearly 1,000 indels, equivalent to a SNP frequency of 0.8 per Kb and an indel frequency of 0.07 per Kb. Over 2,000 of the SNPs are likely to be mappable via the Illumina GoldenGate assay. A subset of 384 SNPs was used to successfully fingerprint a panel of eggplant germplasm, producing a set of informative diversity data. The RAD sequences also included nearly 2,000 putative SSRs, and primer pairs were designed to amplify 1,155 loci. Conclusion The high throughput sequencing of the RAD tags allowed the discovery of a large number of DNA markers, which will prove useful for extending our current knowledge of the genome organization of eggplant, for assisting in marker-aided selection and for carrying out comparative genomic analyses within the Solanaceae family. PMID:21663628

  8. Development of expressed sequence tag and expressed sequence tag–simple sequence repeat marker resources for Musa acuminata

    PubMed Central

    Passos, Marco A. N.; de Oliveira Cruz, Viviane; Emediato, Flavia L.; de Camargo Teixeira, Cristiane; Souza, Manoel T.; Matsumoto, Takashi; Rennó Azevedo, Vânia C.; Ferreira, Claudia F.; Amorim, Edson P.; de Alencar Figueiredo, Lucio Flavio; Martins, Natalia F.; de Jesus Barbosa Cavalcante, Maria; Baurens, Franc-Christophe; da Silva, Orzenil Bonfim; Pappas, Georgios J.; Pignolet, Luc; Abadie, Catherine; Ciampi, Ana Y.; Piffanelli, Pietro; Miller, Robert N. G.

    2012-01-01

    Background and aims Banana (Musa acuminata) is a crop contributing to global food security. Many varieties lack resistance to biotic stresses, due to sterility and narrow genetic background. The objective of this study was to develop an expressed sequence tag (EST) database of transcripts expressed during compatible and incompatible banana–Mycosphaerella fijiensis (Mf) interactions. Black leaf streak disease (BLSD), caused by Mf, is a destructive disease of banana. Microsatellite markers were developed as a resource for crop improvement. Methodology cDNA libraries were constructed from in vitro-infected leaves from BLSD-resistant M. acuminata ssp. burmaniccoides Calcutta 4 (MAC4) and susceptible M. acuminata cv. Cavendish Grande Naine (MACV). Clones were 5′-end Sanger sequenced, ESTs assembled with TGICL and unigenes annotated using BLAST, Blast2GO and InterProScan. Mreps was used to screen for simple sequence repeats (SSRs), with markers evaluated for polymorphism using 20 diploid (AA) M. acuminata accessions contrasting in resistance to Mycosphaerella leaf spot diseases. Principal results A total of 9333 high-quality ESTs were obtained for MAC4 and 3964 for MACV, which assembled into 3995 unigenes. Of these, 2592 displayed homology to genes encoding proteins with known or putative function, and 266 to genes encoding proteins with unknown function. Gene ontology (GO) classification identified 543 GO terms, 2300 unigenes were assigned to EuKaryotic orthologous group categories and 312 mapped to Kyoto Encyclopedia of Genes and Genomes pathways. A total of 624 SSR loci were identified, with trinucleotide repeat motifs the most abundant in MAC4 (54.1 %) and MACV (57.6 %). Polymorphism across M. acuminata accessions was observed with 75 markers. Alleles per polymorphic locus ranged from 2 to 8, totalling 289. The polymorphism information content ranged from 0.08 to 0.81. Conclusions This EST collection offers a resource for studying functional genes, including

  9. 2058 Expressed sequence tags (ESTs) from a human fetal lung cDNA library

    SciTech Connect

    Kazunori, Sudo; Katsuya Chinen; Yusuke Nakamura

    1994-11-15

    ESTs (expressed sequence tags) provide complementary resources for structural and functional analyses of the human genome. The authors have performed single-pass sequencing of 2058 randomly selected, directionally cloned cDNAs isolated from a fetal-lung cDNA library constructed with oligo (dT) primers. Computer analyses of the 5′-end sequences revealed that 60.4% of the clones were considered to be identical to previously reported human genes or ESTs; 9.0% of them showed significant homology to known genes in human, other mammals, or lower organisms; 30.6% showed no homology to any genes or DNA sequences in the public database. These data and reagents will be useful for future investigations of gene expression during prenatal development of human lung. 11 refs., 1 fig., 2 tabs.

  10. Large-scale detection and application of expressed sequence tag single nucleotide polymorphisms in Nicotiana.

    PubMed

    Wang, Y; Zhou, D; Wang, S; Yang, L

    2015-07-14

    Single nucleotide polymorphisms (SNPs) are widespread in the Nicotiana genome. Using an alignment and variation detection method, we developed 20,607,973 SNPs, based on the expressed sequence tag sequences of 10 Nicotiana species. The replacement rate was much higher than the transversion rate in the SNPs, and SNPs were found throughout Nicotiana. In vitro verification indicated that all of the SNPs were high quality and accurate. Evolutionary relationships between 15 varieties were investigated by polymerase chain reaction with a special primer; the specific 302 locus of these sequence results clearly indicated the origin of Zhongyan 100. A database of Nicotiana SNPs (NSNP) was developed to store and search for SNPs in Nicotiana. NSNP is a tool for researchers to develop SNP markers from sequence data.

  11. A physical map of the X chromosome of Drosophila melanogaster: Cosmid contigs and sequence tagged sites

    SciTech Connect

    Madueno, E.; Modolell, J.; Papagiannakis, G.

    1995-04-01

    A physical map of the euchromatic X chromosome of Drosophila melanogaster has been constructed by assembling contiguous arrays of cosmids that were selected by screening a library with DNA isolated from microamplified chromosomal divisions. This map, consisting of 893 cosmids, covers ~64% of the euchromatic part of the chromosome. In addition, 568 sequence tagged sites (STS), in aggregate representing 120 kb of sequenced DNA, were derived from selected cosmids. Most of these STSs, spaced at an average distance of ~35 kb along the euchromatic region of the chromosome, represent DNA tags that can be used as entry points to the fruitfly genome. Furthermore, 42 genes have been placed on the physical map, either through the hybridization of specific probes to the cosmids or through the fact that they were represented among the STSs. These provide a link between the physical and the genetic maps of D. melanogaster. Nine novel genes have been tentatively identified in Drosophila on the basis of matches between STS sequences and sequences from other species. 32 refs., 3 figs., 4 tabs.

  12. A Physical Map of the X Chromosome of Drosophila Melanogaster: Cosmid Contigs and Sequence Tagged Sites

    PubMed Central

    Madueno, E.; Papagiannakis, G.; Rimmington, G.; Saunders, RDC.; Savakis, C.; Siden-Kiamos, I.; Skavdis, G.; Spanos, L.; Trenear, J.; Adam, P.; Ashburner, M.; Benos, P.; Bolshakov, V. N.; Coulson, D.; Glover, D. M.; Herrmann, S.; Kafatos, F. C.; Louis, C.; Majerus, T.; Modolell, J.

    1995-01-01

    A physical map of the euchromatic X chromosome of Drosophila melanogaster has been constructed by assembling contiguous arrays of cosmids that were selected by screening a library with DNA isolated from microamplified chromosomal divisions. This map, consisting of 893 cosmids, covers ~64% of the euchromatic part of the chromosome. In addition, 568 sequence tagged sites (STS), in aggregate representing 120 kb of sequenced DNA, were derived from selected cosmids. Most of these STSs, spaced at an average distance of ~35 kb along the euchromatic region of the chromosome, represent DNA tags that can be used as entry points to the fruitfly genome. Furthermore, 42 genes have been placed on the physical map, either through the hybridization of specific probes to the cosmids or through the fact that they were represented among the STSs. These provide a link between the physical and the genetic maps of D. melanogaster. Nine novel genes have been tentatively identified in Drosophila on the basis of matches between STS sequences and sequences from other species. PMID:7789765

  13. De novo sequencing of unique sequence tags for discovery of post-translational modifications of proteins

    SciTech Connect

    Shen, Yufeng; Tolic, Nikola; Hixson, Kim K.; Purvine, Samuel O.; Anderson, Gordon A.; Smith, Richard D.

    2008-10-15

    De novo sequencing holds promise for discovering protein post-translational modifications; however, such approaches are still in their infancy and are not widely applied in proteomics practice because of their limited reliability. In this work, we describe a de novo sequencing approach for discovery of protein modifications through identification of the UStags (Anal. Chem. 2008, 80, 1871-1882). The de novo information was obtained from Fourier-transform tandem mass spectrometry for peptides and polypeptides in a yeast lysate, and the de novo sequences obtained were filtered to define a more limited set of UStags. The DNA-predicted database protein sequences were then compared to the UStags, and the differences observed across or in the UStags (i.e., the UStags’ prefix and suffix sequences and the UStags themselves) were used to infer the possible sequence modifications. With this de novo-UStag approach, we uncovered some unexpected variances of yeast protein sequences due to amino acid mutations and/or multiple modifications to the predicted protein sequences. Random matching of the de novo sequences to the predicted sequences was examined with the use of two random (false) databases, and ~3% false discovery rates were estimated for the de novo-UStag approach. The factors affecting the reliability (e.g., existence of de novo sequencing noise residues and redundant sequences) and the sensitivity are described. The de novo-UStag approach complements the UStag method previously reported by enabling discovery of new protein modifications.

  14. Mining and gene ontology based annotation of SSR markers from expressed sequence tags of Humulus lupulus.

    PubMed

    Singh, Swati; Gupta, Sanchita; Mani, Ashutosh; Chaturvedi, Anoop

    2012-01-01

    Humulus lupulus, commonly known as hops, is a member of the family Moraceae. Many projects are currently underway, leading to the accumulation of voluminous genomic and expressed sequence tag sequences in public databases. The genetically characterized domains in these databases are limited due to the non-availability of reliable molecular markers. A large body of EST sequence data is available for hops. In the present study, simple sequence repeat markers extracted from EST data are used as molecular markers for genetic characterization. In total, 25,495 EST sequences were examined and assembled to get full-length sequences. The maximum frequency was shown by mononucleotide SSR motifs, i.e. 60.44% in contigs and 62.16% in singletons, whereas the minimum frequencies were observed for hexanucleotide SSRs in contigs (0.09%) and pentanucleotide SSRs in singletons (0.12%). The largest share of trinucleotide motifs code for glutamic acid (GAA), while AT/TA was the most frequent repeat among dinucleotide SSRs. Flanking primer pairs were designed in silico for the SSR-containing sequences. Functional categorization of SSR-containing sequences was done through gene ontology terms such as biological process, cellular component and molecular function.

  15. Contamination of cDNA libraries and expressed sequence-tags databases

    SciTech Connect

    Dean, M.; Allikmets, R.

    1995-11-01

    Partially sequenced cDNAs, or expressed sequence tags (ESTs), are claimed to represent an efficient strategy for characterizing an organism's genes. By necessity, these sequences are incompletely characterized, and examples of contamination of cDNA libraries with sequences from other species have been described. It has been suggested that a Human T-cell cDNA library (Clontech HL1963g) is contaminated by sequences from yeast (Saccharomyces cerevisiae) and an unknown bacterium. We are characterizing human ESTs that represent new members of the ATP-binding cassette transporter super-family. In examining human ESTs generated from the T-cell library, we have encountered one gene that was in fact a yeast sequence (Genbank Z15214 = SSH2 locus) and several genes that do not hybridize to human DNA or RNA. PCR primers from these sequences failed to amplify a product from human, yeast, or Escherichia coli DNA but did produce a product from a Clontech kidney cDNA library (HL1123a). To determine the source of the contamination, we amplified a conserved segment of the 16S rDNA (following a suggestion from Dr. C. Savakis) from the kidney library. The sequence of this product was nearly identical to that of the bacterium Leuconostoc lactis (300 of 304 bp). Leuconostoc species are commonly found in dairy products, fruits, vegetables, and wine and are nonpathogenic to humans. 6 refs., 1 fig.

  16. Development of expressed sequence tag-simple sequence repeat markers for Chrysanthemum morifolium and closely related species.

    PubMed

    Liu, H; Zhang, Q X; Sun, M; Pan, H T; Kong, Z X

    2015-07-13

    With the development of chrysanthemum breeding in recent years, an increasing number of wild species in genera related to Chrysanthemum were introduced to extend the genetic resources and facilitate the genetic improvement of chrysanthemums via hybridization. However, few simple sequence repeat (SSR) markers are available for marker-assisted breeding and population genetic studies of chrysanthemum and closely related species. Expressed sequence tags (ESTs) in public databases and cross-species transferable markers are considered to be a cost-effective means for developing sequence-based markers. In this study, 25 EST-SSRs were successfully developed from Chrysanthemum EST sequences for Chrysanthemum morifolium and closely related species. In total, 4164 unigene sequences were assembled from 7180 ESTs of chrysanthemum in GenBank, which were subsequently used to screen for the presence of microsatellites with the SSRIT software. The screening criteria were 8, 5, 4, and 3 repeating units for di-, tri-, tetra-, and penta- and higher-order nucleotides, respectively. Moreover, 310 SSR loci from 296 sequences were identified, and 198 primer pairs for SSR amplification were designed with the Primer Premier 5.0 software, of which 25 SSR loci showed polymorphic amplification in 52 species and varieties belonging to Chrysanthemum, Ajania, and Opisthopappus. The application of EST-SSR markers to the identification of intergeneric hybrids between Chrysanthemum and Ajania was demonstrated. Therefore, EST-SSRs can be developed for species that lack gene sequences or ESTs by utilizing ESTs of closely related species.
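
    The screening criteria quoted above (at least 8, 5, 4, and 3 repeat units for di-, tri-, tetra-, and penta- or higher-order motifs) can be illustrated with a small regular-expression scan. This is only a sketch of the thresholds, not the SSRIT software used in the study; the function name find_ssrs, the restriction to perfect repeats, and the reading of "higher-order" as motif lengths 5 and 6 are assumptions made for illustration.

```python
import re

# Minimum repeat counts taken from the abstract: di=8, tri=5, tetra=4,
# penta and longer=3. Limiting "higher-order" to lengths 5 and 6 is an
# assumption of this sketch.
MIN_REPEATS = {2: 8, 3: 5, 4: 4, 5: 3, 6: 3}


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for size, min_rep in MIN_REPEATS.items():
        # A motif of `size` bases followed by at least (min_rep - 1) copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit, such as
            # "ATAT", so an (AT)n run is not also reported as tetranucleotide.
            if any(motif == motif[:k] * (size // k)
                   for k in range(1, size) if size % k == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // size))
    return sorted(hits)


example = "GGG" + "AT" * 8 + "CCC" + "GAA" * 5 + "T"
print(find_ssrs(example))
```

    On the toy sequence above, the scan reports the (AT)8 and (GAA)5 runs; a run of only seven AT units falls below the dinucleotide threshold and is not reported.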

  17. Development of microsatellite markers in the tetraploid fern Ceratopteris thalictroides (Parkeriaceae) using RAD tag sequencing.

    PubMed

    Yang, X Y; Long, Z C; Gichira, A W; Guo, Y H; Wang, Q F; Chen, J M

    2016-02-19

    To understand the genetic variability of the tetraploid fern Ceratopteris thalictroides (Parkeriaceae), we described 30 polymorphic microsatellite markers obtained using the restriction site-associated DNA (RAD) tag sequencing technique. A total of 26 individuals were genotyped for each marker. The number of alleles per locus ranged from 4 to 10, and the expected heterozygosity and the Shannon-Wiener index ranged from 0.264 to 0.852 and 0.676 to 2.032, respectively. Because these 30 microsatellite markers exhibit high degrees of genetic variation, they will be useful tools for studying the adaptive genetic variation and sustainable conservation of C. thalictroides.
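
    The two diversity statistics reported above can be computed from allele frequencies at a locus. The sketch below uses the textbook definitions, expected heterozygosity as 1 minus the sum of squared allele frequencies and the Shannon index as the negative sum of p ln p; whether the authors applied a sample-size correction or a different logarithm base is not stated in the abstract, so both choices are assumptions, as is the helper name diversity. Allele dosage in a tetraploid is also ignored here.

```python
import math
from collections import Counter


def diversity(alleles):
    """Return (expected heterozygosity, Shannon index) for one locus.

    `alleles` is a flat list of observed allele calls. Uncorrected
    estimators and the natural logarithm are assumptions of this sketch.
    """
    counts = Counter(alleles)
    total = sum(counts.values())
    freqs = [c / total for c in counts.values()]
    he = 1.0 - sum(p * p for p in freqs)
    shannon = -sum(p * math.log(p) for p in freqs)
    return he, shannon


# Two equally frequent alleles: He = 0.5 and Shannon index = ln 2.
he, shannon = diversity(["A", "A", "B", "B"])
print(round(he, 3), round(shannon, 3))
```

    With ten equally frequent alleles the Shannon index under these definitions is ln 10, about 2.30, which is in line with the reported upper value of 2.032 for loci with up to 10 alleles.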

  18. Real-time single-molecule electronic DNA sequencing by synthesis using polymer-tagged nucleotides on a nanopore array

    PubMed Central

    Fuller, Carl W.; Kumar, Shiv; Porel, Mintu; Chien, Minchen; Bibillo, Arek; Stranges, P. Benjamin; Dorwart, Michael; Tao, Chuanjuan; Li, Zengmin; Guo, Wenjing; Shi, Shundi; Korenblum, Daniel; Trans, Andrew; Aguirre, Anne; Liu, Edward; Harada, Eric T.; Pollard, James; Bhat, Ashwini; Cech, Cynthia; Yang, Alexander; Arnold, Cleoma; Palla, Mirkó; Hovis, Jennifer; Chen, Roger; Morozova, Irina; Kalachikov, Sergey; Russo, James J.; Kasianowicz, John J.; Davis, Randy; Roever, Stefan; Church, George M.; Ju, Jingyue

    2016-01-01

    DNA sequencing by synthesis (SBS) offers a robust platform to decipher nucleic acid sequences. Recently, we reported a single-molecule nanopore-based SBS strategy that accurately distinguishes four bases by electronically detecting and differentiating four different polymer tags attached to the 5′-phosphate of the nucleotides during their incorporation into a growing DNA strand catalyzed by DNA polymerase. Further developing this approach, we report here the use of nucleotides tagged at the terminal phosphate with oligonucleotide-based polymers to perform nanopore SBS on an α-hemolysin nanopore array platform. We designed and synthesized several polymer-tagged nucleotides using tags that produce different electrical current blockade levels and verified they are active substrates for DNA polymerase. A highly processive DNA polymerase was conjugated to the nanopore, and the conjugates were complexed with primer/template DNA and inserted into lipid bilayers over individually addressable electrodes of the nanopore chip. When an incoming complementary-tagged nucleotide forms a tight ternary complex with the primer/template and polymerase, the tag enters the pore, and the current blockade level is measured. The levels displayed by the four nucleotides tagged with four different polymers captured in the nanopore in such ternary complexes were clearly distinguishable and sequence-specific, enabling continuous sequence determination during the polymerase reaction. Thus, real-time single-molecule electronic DNA sequencing data with single-base resolution were obtained. The use of these polymer-tagged nucleotides, combined with polymerase tethering to nanopores and multiplexed nanopore sensors, should lead to new high-throughput sequencing methods. PMID:27091962

  19. Real-time single-molecule electronic DNA sequencing by synthesis using polymer-tagged nucleotides on a nanopore array.

    PubMed

    Fuller, Carl W; Kumar, Shiv; Porel, Mintu; Chien, Minchen; Bibillo, Arek; Stranges, P Benjamin; Dorwart, Michael; Tao, Chuanjuan; Li, Zengmin; Guo, Wenjing; Shi, Shundi; Korenblum, Daniel; Trans, Andrew; Aguirre, Anne; Liu, Edward; Harada, Eric T; Pollard, James; Bhat, Ashwini; Cech, Cynthia; Yang, Alexander; Arnold, Cleoma; Palla, Mirkó; Hovis, Jennifer; Chen, Roger; Morozova, Irina; Kalachikov, Sergey; Russo, James J; Kasianowicz, John J; Davis, Randy; Roever, Stefan; Church, George M; Ju, Jingyue

    2016-05-10

    DNA sequencing by synthesis (SBS) offers a robust platform to decipher nucleic acid sequences. Recently, we reported a single-molecule nanopore-based SBS strategy that accurately distinguishes four bases by electronically detecting and differentiating four different polymer tags attached to the 5'-phosphate of the nucleotides during their incorporation into a growing DNA strand catalyzed by DNA polymerase. Further developing this approach, we report here the use of nucleotides tagged at the terminal phosphate with oligonucleotide-based polymers to perform nanopore SBS on an α-hemolysin nanopore array platform. We designed and synthesized several polymer-tagged nucleotides using tags that produce different electrical current blockade levels and verified they are active substrates for DNA polymerase. A highly processive DNA polymerase was conjugated to the nanopore, and the conjugates were complexed with primer/template DNA and inserted into lipid bilayers over individually addressable electrodes of the nanopore chip. When an incoming complementary-tagged nucleotide forms a tight ternary complex with the primer/template and polymerase, the tag enters the pore, and the current blockade level is measured. The levels displayed by the four nucleotides tagged with four different polymers captured in the nanopore in such ternary complexes were clearly distinguishable and sequence-specific, enabling continuous sequence determination during the polymerase reaction. Thus, real-time single-molecule electronic DNA sequencing data with single-base resolution were obtained. The use of these polymer-tagged nucleotides, combined with polymerase tethering to nanopores and multiplexed nanopore sensors, should lead to new high-throughput sequencing methods.

  20. Evaluation of cleaved amplified polymorphic sequence markers for Chamaecyparis obtusa based on expressed sequence tag information from Cryptomeria japonica.

    PubMed

    Matsumoto, A; Tsumura, Y

    2004-12-01

    We have developed and evaluated sequence-tagged site (STS) primers based on expressed sequence-tag information derived from sugi (Cryptomeria japonica) for use in hinoki (Chamaecyparis obtusa), a species that belongs to a different family (although it appears to be fairly closely related to sugi). Of the 417 C. japonica STS primer pairs we screened, 120 (approximately 30%) were transferable and provided specific PCR amplification products from 16 C. obtusa plus trees. We used haploid megagametophytes to investigate the homology of 80 STS fragments between C. obtusa and C. japonica and to identify orthologous loci. Nearly 90% of the fragments showed high (>70%) degrees of similarity between the species, and 35 STSs indicated homology to entries with the same putative function in a public DNA database. Of the 120 STS fragments amplified, 72 showed restriction fragment length polymorphisms; in addition, the CC2430 primers detected amplicon length polymorphism. We assessed the inheritance pattern of 27 cleaved amplified polymorphic sequence markers, using 20 individuals from the segregation population. All the markers analyzed were consistent with the marker inheritance patterns obtained from the screening panel, and no markers (except CC2716) showed significant (P<0.01) deviation from the expected segregation ratio. In total, 136 polymorphic markers were developed using C. japonica-based STS primers without any sequence modification. In addition, the applicability of STS-based markers developed in one species to other species was found to closely reflect the evolutionary distance between the species, which is roughly concordant with the difference between their rbcL sequences. We plan to use these markers for genetic studies in C. obtusa. Most of the markers should also provide reliable anchor loci for comparative mapping studies of the C. obtusa and C. japonica genomes.

  1. Generation and Analysis of a Large-Scale Expressed Sequence Tag Database from a Full-Length Enriched cDNA Library of Developing Leaves of Gossypium hirsutum L

    PubMed Central

    Pang, Chaoyou; Fan, Shuli; Song, Meizhen; Yu, Shuxun

    2013-01-01

    Background Cotton (Gossypium hirsutum L.) is one of the world’s most economically important crops. However, its entire genome has not been sequenced, and limited resources are available in GenBank for understanding the molecular mechanisms underlying leaf development and senescence. Methodology/Principal Findings In this study, 9,874 high-quality ESTs were generated from a normalized, full-length cDNA library derived from pooled RNA isolated from throughout leaf development during the plant blooming stage. After clustering and assembly of these ESTs, 5,191 unique sequences, representing 1,652 contigs and 3,539 singletons, were obtained. The average unique sequence length was 682 bp. Annotation of these unique sequences revealed that 84.4% showed significant homology to sequences in the NCBI non-redundant protein database, and 57.3% had significant hits to known proteins in the Swiss-Prot database. Comparative analysis indicated that our library added 2,400 ESTs and 991 unique sequences to those known for cotton. The unigenes were functionally characterized by gene ontology annotation. We identified 1,339 and 200 unigenes as potential leaf senescence-related genes and transcription factors, respectively. Moreover, nine genes related to leaf senescence and eleven MYB transcription factors were randomly selected for quantitative real-time PCR (qRT-PCR), which revealed that these genes were regulated differentially during senescence. The qRT-PCR for three GhYLSs revealed that these genes are expressed preferentially in senescent leaves. Conclusions/Significance These EST resources will provide valuable sequence information for gene expression profiling analyses and functional genomics studies to elucidate their roles, as well as for studying the mechanisms of leaf development and senescence in cotton and discovering candidate genes related to important agronomic traits of cotton. These data will also facilitate future whole-genome sequence assembly and annotation

  2. Linguistic Preprocessing and Tagging for Problem Report Trend Analysis

    NASA Technical Reports Server (NTRS)

    Beil, Robert J.; Malin, Jane T.

    2012-01-01

    Mr. Robert Beil, Systems Engineer at Kennedy Space Center (KSC), requested the NASA Engineering and Safety Center (NESC) develop a prototype tool suite that combines complementary software technology used at Johnson Space Center (JSC) and KSC for problem report preprocessing and semantic tag extraction, to improve input to data mining and trend analysis. This document contains the outcome of the assessment and the Findings, Observations and NESC Recommendations.

  3. Digital cloning: identification of human cDNAs homologous to novel kinases through expressed sequence tag database searching.

    PubMed

    Chen, H C; Kung, H J; Robinson, D

    1998-01-01

    Identification of novel kinases based on their sequence conservation within the kinase catalytic domain has relied so far on two major approaches: low-stringency hybridization of cDNA libraries and PCR using degenerate primers. Both of these approaches are at times technically difficult and time-consuming. We have developed a procedure that can significantly reduce the time and effort involved in searching for novel kinases and increase the sensitivity of the analysis. This procedure exploits the computer analysis of a vast resource of human cDNA sequences represented in the expressed sequence tag (EST) database. Seventeen novel human cDNA clones showing significant homology to serine/threonine kinases, including STE-20, CDK- and YAK-related family kinases, were identified by searching the EST database. Further sequence analysis of these novel kinases obtained either directly from EST clones or from PCR-RACE products confirmed their identity as protein kinases. Given the rapid accumulation of the EST database and the advent of powerful computer analysis software, this approach provides a fast, sensitive, and economical way to identify novel kinases as well as other genes from the EST database.

  4. Serial sequencing of isolength RAD tags for cost-efficient genome-wide profiling of genetic and epigenetic variations.

    PubMed

    Wang, Shi; Liu, Pingping; Lv, Jia; Li, Yangping; Cheng, Taoran; Zhang, Lingling; Xia, Yu; Sun, Hongzhen; Hu, Xiaoli; Bao, Zhenmin

    2016-11-01

    Isolength restriction site-associated DNA (isoRAD) sequencing is a very simple but powerful approach that was originally developed for genome-wide genotyping at minimal labor and cost, and it has recently extended its applicability to allow quantification of DNA methylation levels. The isoRAD method is distinct from other genotyping-by-sequencing (GBS) methods because of its use of special restriction enzymes to produce isolength tags (32-36 bp), and sequencing of these uniform tags can bring many benefits. However, the relatively short tags produced by the original protocol are mostly suited to single-end (SE) sequencing (36-50 bp), and therefore they cannot efficiently match the gradually increased sequencing capacity of next-generation sequencing (NGS) platforms. To address this issue, we describe an advanced protocol that allows the preparation of five concatenated isoRAD tags for Illumina paired-end (PE) sequencing (100-150 bp). The configuration of the five concatenated tags is highly flexible, and can be defined by users to work with a desired combination of samples and/or restriction enzymes to suit specific research purposes. In comparison with the original protocol, the advanced protocol has an additional digestion and ligation step, and library preparation can be completed in ∼8 h.

  5. The contribution of 700,000 ORF sequence tags to the definition of the human transcriptome

    PubMed Central

    Camargo, Anamaria A.; Samaia, Helena P. B.; Dias-Neto, Emmanuel; Simão, Daniel F.; Migotto, Italo A.; Briones, Marcelo R. S.; Costa, Fernando F.; Aparecida Nagai, Maria; Verjovski-Almeida, Sergio; Zago, Marco A.; Andrade, Luis Eduardo C.; Carrer, Helaine; El-Dorry, Hamza F. A.; Espreafico, Enilza M.; Habr-Gama, Angelita; Giannella-Neto, Daniel; Goldman, Gustavo H.; Gruber, Arthur; Hackel, Christine; Kimura, Edna T.; Maciel, Rui M. B.; Marie, Suely K. N.; Martins, Elizabeth A. L.; Nóbrega, Marina P.; Paçó-Larson, Maria Luisa; Pardini, Maria Inês M. C.; Pereira, Gonçalo G.; Pesquero, João Bosco; Rodrigues, Vanderlei; Rogatto, Silvia R.; da Silva, Ismael D. C. G.; Sogayar, Mari C.; Sonati, Maria de Fátima; Tajara, Eloiza H.; Valentini, Sandro R.; Alberto, Fernando L.; Amaral, Maria Elisabete J.; Aneas, Ivy; Arnaldi, Liliane A. T.; de Assis, Angela M.; Bengtson, Mário Henrique; Bergamo, Nadia Aparecida; Bombonato, Vanessa; de Camargo, Maria E. R.; Canevari, Renata A.; Carraro, Dirce M.; Cerutti, Janete M.; Corrêa, Maria Lucia C.; Corrêa, Rosana F. R.; Costa, Maria Cristina R.; Curcio, Cyntia; Hokama, Paula O. M.; Ferreira, Ari J. S.; Furuzawa, Gilberto K.; Gushiken, Tsieko; Ho, Paulo L.; Kimura, Elza; Krieger, José E.; Leite, Luciana C. C.; Majumder, Paromita; Marins, Mozart; Marques, Everaldo R.; Melo, Analy S. A.; Melo, Monica; Mestriner, Carlos Alberto; Miracca, Elisabete C.; Miranda, Daniela C.; Nascimento, Ana Lucia T. O.; Nóbrega, Francisco G.; Ojopi, Élida P. B.; Pandolfi, José Rodrigo C.; Pessoa, Luciana G.; Prevedel, Aline C.; Rahal, Paula; Rainho, Claudia A.; Reis, Eduardo M. R.; Ribeiro, Marcelo L.; da Rós, Nancy; de Sá, Renata G.; Sales, Magaly M.; Sant'anna, Simone Cristina; dos Santos, Mariana L.; da Silva, Aline M.; da Silva, Neusa P.; Silva, Wilson A.; da Silveira, Rosana A.; Sousa, Josane F.; Stecconi, Daniella; Tsukumo, Fernando; Valente, Valéria; Soares, Fernando; Moreira, Eloisa S.; Nunes, Diana N.; Correa, Ricardo G.; Zalcberg, Heloisa; Carvalho, Alex F.; Reis, Luis F. L.; Brentani, Ricardo R.; Simpson, Andrew J. G.; de Souza, Sandro J.

    2001-01-01

    Open reading frame expressed sequences tags (ORESTES) differ from conventional ESTs by providing sequence data from the central protein coding portion of transcripts. We generated a total of 696,745 ORESTES sequences from 24 human tissues and used a subset of the data that correspond to a set of 15,095 full-length mRNAs as a means of assessing the efficiency of the strategy and its potential contribution to the definition of the human transcriptome. We estimate that ORESTES sampled over 80% of all highly and moderately expressed, and between 40% and 50% of rarely expressed, human genes. In our most thoroughly sequenced tissue, the breast, the 130,000 ORESTES generated are derived from transcripts from an estimated 70% of all genes expressed in that tissue, with an equally efficient representation of both highly and poorly expressed genes. In this respect, we find that the capacity of the ORESTES strategy both for gene discovery and shotgun transcript sequence generation significantly exceeds that of conventional ESTs. The distribution of ORESTES is such that many human transcripts are now represented by a scaffold of partial sequences distributed along the length of each gene product. The experimental joining of the scaffold components, by reverse transcription–PCR, represents a direct route to transcript finishing that may represent a useful alternative to full-length cDNA cloning. PMID:11593022

  6. The contribution of 700,000 ORF sequence tags to the definition of the human transcriptome.

    PubMed

    Camargo, A A; Samaia, H P; Dias-Neto, E; Simão, D F; Migotto, I A; Briones, M R; Costa, F F; Nagai, M A; Verjovski-Almeida, S; Zago, M A; Andrade, L E; Carrer, H; El-Dorry, H F; Espreafico, E M; Habr-Gama, A; Giannella-Neto, D; Goldman, G H; Gruber, A; Hackel, C; Kimura, E T; Maciel, R M; Marie, S K; Martins, E A; Nobrega, M P; Paco-Larson, M L; Pardini, M I; Pereira, G G; Pesquero, J B; Rodrigues, V; Rogatto, S R; da Silva, I D; Sogayar, M C; Sonati, M F; Tajara, E H; Valentini, S R; Alberto, F L; Amaral, M E; Aneas, I; Arnaldi, L A; de Assis, A M; Bengtson, M H; Bergamo, N A; Bombonato, V; de Camargo, M E; Canevari, R A; Carraro, D M; Cerutti, J M; Correa, M L; Correa, R F; Costa, M C; Curcio, C; Hokama, P O; Ferreira, A J; Furuzawa, G K; Gushiken, T; Ho, P L; Kimura, E; Krieger, J E; Leite, L C; Majumder, P; Marins, M; Marques, E R; Melo, A S; Melo, M B; Mestriner, C A; Miracca, E C; Miranda, D C; Nascimento, A L; Nobrega, F G; Ojopi, E P; Pandolfi, J R; Pessoa, L G; Prevedel, A C; Rahal, P; Rainho, C A; Reis, E M; Ribeiro, M L; da Ros, N; de Sa, R G; Sales, M M; Sant'anna, S C; dos Santos, M L; da Silva, A M; da Silva, N P; Silva, W A; da Silveira, R A; Sousa, J F; Stecconi, D; Tsukumo, F; Valente, V; Soares, F; Moreira, E S; Nunes, D N; Correa, R G; Zalcberg, H; Carvalho, A F; Reis, L F; Brentani, R R; Simpson, A J; de Souza, S J; Melo, M

    2001-10-09

    Open reading frame expressed sequences tags (ORESTES) differ from conventional ESTs by providing sequence data from the central protein coding portion of transcripts. We generated a total of 696,745 ORESTES sequences from 24 human tissues and used a subset of the data that correspond to a set of 15,095 full-length mRNAs as a means of assessing the efficiency of the strategy and its potential contribution to the definition of the human transcriptome. We estimate that ORESTES sampled over 80% of all highly and moderately expressed, and between 40% and 50% of rarely expressed, human genes. In our most thoroughly sequenced tissue, the breast, the 130,000 ORESTES generated are derived from transcripts from an estimated 70% of all genes expressed in that tissue, with an equally efficient representation of both highly and poorly expressed genes. In this respect, we find that the capacity of the ORESTES strategy both for gene discovery and shotgun transcript sequence generation significantly exceeds that of conventional ESTs. The distribution of ORESTES is such that many human transcripts are now represented by a scaffold of partial sequences distributed along the length of each gene product. The experimental joining of the scaffold components, by reverse transcription-PCR, represents a direct route to transcript finishing that may represent a useful alternative to full-length cDNA cloning.

  7. TBestDB: a taxonomically broad database of expressed sequence tags (ESTs)

    PubMed Central

    O'Brien, Emmet A.; Koski, Liisa B.; Zhang, Yue; Yang, LiuSong; Wang, Eric; Gray, Michael W.; Burger, Gertraud; Lang, B. Franz

    2007-01-01

    The TBestDB database contains ∼370 000 clustered expressed sequence tag (EST) sequences from 49 organisms, covering a taxonomically broad range of poorly studied, mainly unicellular eukaryotes, and includes experimental information, consensus sequences, gene annotations and metabolic pathway predictions. Most of these ESTs have been generated by the Protist EST Program, a collaboration among six Canadian research groups. EST sequences are read from trace files up to a minimum quality cut-off, vector and linker sequence is masked, and the ESTs are clustered using phrap. The resulting consensus sequences are automatically annotated by using the AutoFACT program. The datasets are automatically checked for clustering errors due to chimerism and potential cross-contamination between organisms, and suspect data are flagged in or removed from the database. Access to data deposited in TBestDB by individual users can be restricted to those users for a limited period. With this first report on TBestDB, we open the database to the research community for free processing, annotation, interspecies comparisons and GenBank submission of EST data generated in individual laboratories. For instructions on submission to TBestDB, contact tbestdb@bch.umontreal.ca. The database can be queried at . PMID:17202165

  8. Correction of sequence-based artifacts in serial analysis of gene expression.

    PubMed

    Akmaev, Viatcheslav R; Wang, Clarence J

    2004-05-22

    Serial Analysis of Gene Expression (SAGE) is a powerful technology for measuring global gene expression, through rapid generation of large numbers of transcript tags. Beyond their intrinsic value in differential gene expression analysis, SAGE tag collections afford abundant information on the size and shape of the sample transcriptome and can accelerate novel gene discovery. These latter SAGE applications are facilitated by the enhanced method of Long SAGE. A characteristic of sequencing-based methods, such as SAGE and Long SAGE, is the unavoidable occurrence of artifact sequences resulting from sequencing errors. By virtue of their low, random incidence, such tag errors have minimal impact on differential expression analysis. However, to fully exploit the value of large SAGE tag datasets, it is desirable to account for and correct tag artifacts. We present estimates for occurrences of tag errors, and an efficient error correction algorithm. Error rate estimates are based on a stochastic model that includes the polymerase chain reaction and sequencing error contributions. The correction algorithm, SAGEScreen, is a multi-step procedure that addresses ditag processing, estimation of empirical error rates from highly abundant tags, grouping of similar-sequence tags and statistical testing of observed counts. We apply SAGEScreen to Long SAGE libraries and compare error rates for several processing scenarios. Results with simulated tag collections indicate that SAGEScreen corrects 78% of recoverable tag errors and reduces the occurrences of singleton tags. The SAGEScreen software is available for academic users from the first author.

  9. Quantifying perinatal transmission of Hepatitis B viral quasispecies by tag linkage deep sequencing.

    PubMed

    Du, Yushen; Chi, Xiumei; Wang, Chong; Jiang, Jing; Kong, Fei; Yan, Hongqing; Wang, Xiaomei; Li, Jie; Wu, Nicholas C; Dai, Lei; Zhang, Tian-Hao; Shu, Sara; Zhou, Jian; Yoshizawa, Janice M; Li, Xinmin; Bhattacharya, Debika; Wu, Ting-Ting; Niu, Junqi; Sun, Ren

    2017-08-31

    Despite full immunoprophylaxis, mother-to-child transmission (MTCT) of Hepatitis B Virus still occurs in approximately 2-5% of HBsAg positive mothers. Little is known about the bottleneck of HBV transmission and the evolution of viral quasispecies in the context of MTCT. Here we adopted a newly developed tag linkage deep sequencing method and analyzed the quasispecies of four MTCT pairs that broke through immunoprophylaxis. By assigning unique tags to individual viral sequences, we accurately reconstructed HBV haplotypes in a region of 836 bp, which contains the major immune epitopes and drug resistance mutations. The detection limit of minor viral haplotypes reached 0.1% for individual patient samples. Dominance of "a determinant" polymorphisms was observed in two children, which pre-existed as minor quasispecies in maternal samples. In all four pairs of MTCT samples, we consistently observed a significant overlap of viral haplotypes shared between mother and child. We also demonstrate that the data can be potentially useful to estimate the bottleneck effect during HBV MTCT, which provides information to optimize treatment for reducing the frequency of MTCT.
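
    The idea of assigning unique tags to individual viral sequences and rebuilding haplotypes from them can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the (tag, sequence) input layout, and the assumption that reads sharing a tag are already aligned to equal length are all hypothetical choices made for illustration. Reads are grouped by tag, a per-position majority vote gives one consensus per tag, and haplotype frequencies are then counted over tags rather than over raw reads.

```python
from collections import Counter, defaultdict


def consensus_by_tag(reads):
    """Collapse reads that share a tag into one consensus sequence.

    reads: iterable of (tag, sequence) pairs. Hypothetical input layout;
    sequences under the same tag are assumed to be aligned and equal in length.
    """
    groups = defaultdict(list)
    for tag, seq in reads:
        groups[tag].append(seq)

    haplotypes = {}
    for tag, seqs in groups.items():
        # Majority vote at each aligned position suppresses random read errors.
        haplotypes[tag] = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*seqs)
        )
    return haplotypes


def haplotype_frequencies(haplotypes):
    """Frequency of each distinct consensus, counted once per tag."""
    counts = Counter(haplotypes.values())
    total = sum(counts.values())
    return {hap: n / total for hap, n in counts.items()}


reads = [
    ("T1", "ACGT"), ("T1", "ACGT"), ("T1", "ACTT"),  # one read carries an error
    ("T2", "ACGA"), ("T2", "ACGA"),
    ("T3", "ACGT"),
]
haps = consensus_by_tag(reads)
freqs = haplotype_frequencies(haps)
```

    Counting per tag rather than per read is the design point of the sketch: amplification can inflate read counts for one template, while each tag stands for a single original molecule.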

  10. Annotated Expressed Sequence Tags and cDNA Microarrays for Studies of Brain and Behavior in the Honey Bee

    PubMed Central

    Whitfield, Charles W.; Band, Mark R.; Bonaldo, Maria F.; Kumar, Charu G.; Liu, Lei; Pardinas, Jose R.; Robertson, Hugh M.; Soares, M. Bento; Robinson, Gene E.

    2002-01-01

    To accelerate the molecular analysis of behavior in the honey bee (Apis mellifera), we created expressed sequence tag (EST) and cDNA microarray resources for the bee brain. Over 20,000 cDNA clones were partially sequenced from a normalized (and subsequently subtracted) library generated from adult A. mellifera brains. These sequences were processed to identify 15,311 high-quality ESTs representing 8912 putative transcripts. Putative transcripts were functionally annotated (using the Gene Ontology classification system) based on matching gene sequences in Drosophila melanogaster. The brain ESTs represent a broad range of molecular functions and biological processes, with neurobiological classifications particularly well represented. Roughly half of Drosophila genes currently implicated in synaptic transmission and/or behavior are represented in the Apis EST set. Of Apis sequences with open reading frames of at least 450 bp, 24% are highly diverged with no matches to known protein sequences. Additionally, over 100 Apis transcript sequences conserved with other organisms appear to have been lost from the Drosophila genome. DNA microarrays were fabricated with over 7000 EST cDNA clones putatively representing different transcripts. Using probe derived from single bee brain mRNA, microarrays detected gene expression for 90% of Apis cDNAs two standard deviations greater than exogenous control cDNAs. [The sequence data described in this paper have been submitted to Genbank data library under accession nos. BI502708–BI517278. The sequences are also available at http://titan.biotec.uiuc.edu/bee/honeybee_project.htm.] PMID:11932240

  11. Identification of expressed resistance gene analogs from peanut (Arachis hypogaea L.) expressed sequence tags.

    PubMed

    Liu, Zhanji; Feng, Suping; Pandey, Manish K; Chen, Xiaoping; Culbreath, Albert K; Varshney, Rajeev K; Guo, Baozhu

    2013-05-01

    Low genetic diversity makes peanut (Arachis hypogaea L.) very vulnerable to plant pathogens, causing severe yield loss and reduced seed quality. Several hundred partial genomic DNA sequences as nucleotide-binding-site leucine-rich repeat (NBS-LRR) resistance genes (R) have been identified, but only a small portion with expressed transcripts has been found. We aimed to identify resistance gene analogs (RGAs) from peanut expressed sequence tags (ESTs) and to develop polymorphic markers. The protein sequences of 54 known R genes were used to identify homologs from peanut ESTs from public databases. A total of 1,053 ESTs corresponding to six different classes of known R genes were recovered and assembled into 156 contigs and 229 singletons as peanut-expressed RGAs. There were 69 that encoded NBS-LRR proteins, 191 that encoded protein kinases, 82 that encoded LRR-PK/transmembrane proteins, 28 that encoded toxin reductases, 11 that encoded LRR-domain containing proteins and four that encoded TM-domain containing proteins. Twenty-eight simple sequence repeats (SSRs) were identified from 25 peanut expressed RGAs. One SSR polymorphic marker (RGA121) was identified. Two polymerase chain reaction-based markers (Ahsw-1 and Ahsw-2) developed from RGA013 were homologous to the Tomato Spotted Wilt Virus (TSWV) resistance gene. All three markers were mapped on the same linkage group AhIV. These expressed RGAs are the source for RGA-tagged marker development and identification of peanut resistance genes. © 2013 Institute of Botany, Chinese Academy of Sciences.

  12. OSIRIS-REx Touch-And-Go (TAG) Mission Design and Analysis

    NASA Technical Reports Server (NTRS)

    Berry, Kevin; Sutter, Brian; May, Alex; Williams, Ken; Barbee, Brent W.; Beckman, Mark; Williams, Bobby

    2013-01-01

    The Origins Spectral Interpretation Resource Identification Security Regolith Explorer (OSIRIS-REx) mission is a NASA New Frontiers mission launching in 2016 to rendezvous with the near-Earth asteroid (101955) 1999 RQ36 in late 2018. After several months in formation with and orbit about the asteroid, OSIRIS-REx will fly a Touch-And-Go (TAG) trajectory to the asteroid's surface to obtain a regolith sample. This paper describes the mission design of the TAG sequence and the propulsive maneuvers required to achieve the trajectory. This paper also shows preliminary results of orbit covariance analysis and Monte-Carlo analysis that demonstrate the ability to arrive at a targeted location on the surface of RQ36 within a 25 meter radius with 98.3% confidence.

  13. Expressed Sequence Tags With cDNA Termini: Previously Overlooked Resources for Gene Annotation and Transcriptome Exploration in Chlamydomonas reinhardtii

    PubMed Central

    Liang, Chun; Liu, Yuansheng; Liu, Lin; Davis, Adam C.; Shen, Yingjia; Li, Qingshun Quinn

    2008-01-01

    Many of Chlamydomonas reinhardtii expressed sequence tags (ESTs) in GenBank dbEST and community EST assemblies were either over- or undertrimmed in terms of their cDNA termini, which are defined as the diagnostic sequence elements that delineate 3′/5′ ends of mRNA transcripts. Overtrimming represents a loss of directional, positional, and structural information of transcript ends whereas undertrimming causes unclean spurious sequences retained in ESTs that exert deleterious impacts on downstream EST-based applications. We examined 309,278 raw EST sequencing trace files of C. reinhardtii and found that only 57% had cDNA termini that matched the expected structures specified in their cDNA library constructions while satisfying our minimum length requirement for their final clean sequences. Using GMAP, 156,963 individual ESTs were mapped to the genome successfully, with their in silico-verified cDNA termini anchored to the genome. Our data analysis suggested strong macro- and microheterogeneity of 3′/5′ end positions of individual transcripts derived from the same genes in C. reinhardtii. This work annotating differential ends of individual transcripts in the draft genome presents the research community with a new stream of data that will facilitate accurate determination of gene structures, genome annotation, and exploration of the transcriptome and mRNA metabolism in C. reinhardtii. PMID:18493042

  14. Satellite-tagged transcribing sequences in Bubalus bubalis genome undergo programmed modulation in meiocytes: possible implications for transcriptional inactivation.

    PubMed

    Chattopadhyay, M; Gangadharan, S; Kapur, V; Azfer, M A; Prakash, B; Ali, S

    2001-09-01

    We cloned and sequenced a 1378 bp BamHI satellite DNA fraction from the water buffalo Bubalus bubalis and have studied its expression in different tissues. The GC-rich sequences of the resultant contig pDS5 crosshybridize only with bovid DNA and are not conserved evolutionarily. Typing of buffalo genomic DNA using pDS5 with several restriction enzymes revealed multilocus monomorphic bands. Similar typing of cattle, buffalo, goat, sheep, and gaur genomic DNA revealed variations in copy number and allele length giving rise to species-specific band patterns. Expression study of pDS5 in bubaline samples by RNA slot-blot, Northern blot, and RT-PCR showed various levels of signal in all the somatic tissues and germline cells except heart. A GenBank database search revealed homology of pDS5 sequences in the 5' region from nt 1-1261 with collagen gene. An AluI typing analysis of DNA from bubaline semen samples showed consistent loss of two bands. The presence of corresponding bands in somatic tissues suggests a sequence modulation within the pDS5 array in meiocytes during spermatogenesis, which is restored in the somatic cells after fertilization. Modulation of the satellite-tagged transcribing sequence in the meiocytes may be a mechanism of its inactivation.

  15. Generation of expressed sequence tags of random root cDNA clones of Brassica napus by single-run partial sequencing.

    PubMed Central

    Park, Y S; Kwak, J M; Kwon, O Y; Kim, Y S; Lee, D S; Cho, M J; Lee, H H; Nam, H G

    1993-01-01

    Two hundred thirty-seven expressed sequence tags (ESTs) of Brassica napus were generated by single-run partial sequencing of 197 random root cDNA clones. A computer search of these root ESTs revealed that 21 ESTs show significant similarity to the protein-coding sequences in the existing data bases, including five stress- or defense-related genes and four clones related to the genes from other kingdoms. Northern blot analysis of the 10 data base-matched cDNA clones revealed that many of the clones are expressed most abundantly in root but less abundantly in other organs. However, two clones were highly root specific. The results show that generation of the root ESTs by partial sequencing of random cDNA clones along with the expression analysis is an efficient approach to isolate genes that are functional in plant root in a large scale. We also discuss the results of the examination of cDNA libraries and sequencing methods suitable for this approach. PMID:8029332

  16. Behavior Analysis Based on Coordinates of Body Tags

    NASA Astrophysics Data System (ADS)

    Luštrek, Mitja; Kaluža, Boštjan; Dovgan, Erik; Pogorelc, Bogdan; Gams, Matjaž

    This paper describes fall detection, activity recognition and the detection of anomalous gait in the Confidence project. The project aims to prolong the independence of the elderly by detecting falls and other types of behavior indicating a health problem. The behavior will be analyzed based on the coordinates of tags worn on the body. The coordinates will be detected with radio sensors. We describe two Confidence modules. The first one classifies the user's activity into one of six classes, including falling. The second one detects walking anomalies, such as limping, dizziness and hemiplegia. The walking analysis can automatically adapt to each person by using only the examples of normal walking of that person. Both modules employ machine learning: the paper focuses on the features they use and the effect of tag placement and sensor noise on the classification accuracy. Four tags were enough for activity recognition accuracy of over 93% at moderate sensor noise, while six were needed to detect walking anomalies with the accuracy of over 90%.

  17. Application of Cydia pomonella expressed sequence tags: identification and expression of three general odorant binding proteins in codling moth

    USDA-ARS's Scientific Manuscript database

    The codling moth, Cydia pomonella, is one of the most important pests of pome fruits in the world, yet the molecular genetics and physiology of this insect remains poorly understood. A combined assembly of 8340 expressed sequence tags (ESTs) was generated from Roche 454 GS-FLX sequencing of 8 tissu...

  18. A Comprehensive Approach to Clustering of Expressed Human Gene Sequence: The Sequence Tag Alignment and Consensus Knowledge Base

    PubMed Central

    Miller, Robert T.; Christoffels, Alan G.; Gopalakrishnan, Chella; Burke, John; Ptitsyn, Andrey A.; Broveak, Tania R.; Hide, Winston A.

    1999-01-01

    The expressed human genome is being sequenced and analyzed by disparate groups producing disparate data. The majority of the identified coding portion is in the form of expressed sequence tags (ESTs). The need to discover exonic representation and expression forms of full-length cDNAs for each human gene is frustrated by the partial and variable quality nature of this data delivery. A highly redundant human EST data set has been processed into integrated and unified expressed transcript indices that consist of hierarchically organized human transcript consensi reflecting gene expression forms and genetic polymorphism within an index class. The expression index and its intermediate outputs include cleaned transcript sequence, expression, and alignment information and a higher fidelity subset, SANIGENE. The STACK_PACK clustering system has been applied to dbEST release 121598 (GenBank version 110). Sixty-four percent of 1,313,103 Homo sapiens ESTs are condensed into 143,885 tissue level multiple sequence clusters; linking through clone-ID annotations produces 68,701 total assemblies, such that 81% of the original input set is captured in a STACK multiple sequence or linked cluster. Indexing of alignments by substituent EST accession allows browsing of the data structure and its cross-links to UniGene. STACK metaclusters consolidate a greater number of ESTs by a factor of 1.86 with respect to the corresponding UniGene build. Fidelity comparison with genome reference sequence AC004106 demonstrates consensus expression clusters that reflect significantly lower spurious repeat sequence content and capture alternate splicing within a whole body index cluster and three STACK v.2.3 tissue-level clusters. Statistics of a staggered release whole body index build of STACK v.2.0 are presented. PMID:10568754

  19. Analysis of wheat SAGE tags reveals evidence for widespread antisense transcription

    PubMed Central

    Poole, Rebecca L; Barker, Gary LA; Werner, Kay; Biggi, Gaia F; Coghill, Jane; Gibbings, J George; Berry, Simon; Dunwell, Jim M; Edwards, Keith J

    2008-01-01

    Background Serial Analysis of Gene Expression (SAGE) is a powerful tool for genome-wide transcription studies. Unlike microarrays, it has the ability to detect novel forms of RNA such as alternatively spliced and antisense transcripts, without the need for prior knowledge of their existence. One limitation of using SAGE on an organism with a complex genome and lacking detailed sequence information, such as the hexaploid bread wheat Triticum aestivum, is accurate annotation of the tags generated. Without accurate annotation it is impossible to fully understand the dynamic processes involved in such complex polyploid organisms. Hence we have developed and utilised novel procedures to characterise, in detail, SAGE tags generated from the whole grain transcriptome of hexaploid wheat. Results Examination of 71,930 Long SAGE tags generated from six libraries derived from two wheat genotypes grown under two different conditions suggested that SAGE is a reliable and reproducible technique for use in studying the hexaploid wheat transcriptome. However, our results also showed that in poorly annotated and/or poorly sequenced genomes, such as hexaploid wheat, considerably more information can be extracted from SAGE data by carrying out a systematic analysis of both perfect and "fuzzy" (partially matched) tags. This detailed analysis of the SAGE data shows first that while there is evidence of alternative polyadenylation this appears to occur exclusively within the 3' untranslated regions. Secondly, we found no strong evidence for widespread alternative splicing in the developing wheat grain transcriptome. However, analysis of our SAGE data shows that antisense transcripts are probably widespread within the transcriptome and appear to be derived from numerous locations within the genome. 
    Examination of antisense transcripts showing sequence similarity to the Puroindoline a and Puroindoline b genes suggests that such antisense transcripts might have a role in the regulation of...

  20. A chemically synthesized peptoid-based drag-tag enhances free-solution DNA sequencing by capillary electrophoresis.

    PubMed

    Haynes, Russell D; Meagher, Robert J; Barron, Annelise E

    2011-01-01

    We report a capillary-based DNA sequencing read length of 100 bases in 16 min using end-labeled free-solution conjugate electrophoresis (FSCE) with a monodisperse poly-N-substituted glycine (polypeptoid) as a synthetic drag-tag. FSCE enabled rapid separation of single-stranded (ss) DNA sequencing fragments with single-base resolution without the need for a viscous DNA separation matrix. Protein-based drag-tags previously used for FSCE sequencing, for example, streptavidin, are heterogeneous in molar mass (polydisperse); the resultant band-broadening can make it difficult to obtain the single-base resolution necessary for DNA sequencing. In this study, we synthesized and HPLC-purified a 70mer poly-N-(methoxyethyl)glycine (NMEG) drag-tag with a molar mass of ~11 kDa. The NMEG monomers that comprise this peptoid drag-tag are interesting for bioanalytical applications, because the methoxyethyl side chain's chemical structure is reminiscent of the basic monomer unit of polyethylene glycol, a highly biocompatible commercially available polymer, which, however, is not available in monodisperse preparation at an ~11 kDa molar mass. This is the first report of ssDNA separation and of four-color, base-by-base DNA sequencing by FSCE through the use of a chemically synthesized drag-tag. These results show that high-molar mass, chemically synthesized drag-tags based on the polyNMEG structure, if obtained in monodisperse preparation, would serve as ideal drag-tags and could help FSCE reach the commercially relevant read lengths of 100 bases or more.

  1. SpiroESTdb: a transcriptome database and online tool for sparganum expressed sequences tags.

    PubMed

    Kim, Dae-Won; Kim, Dong-Wook; Yoo, Won Gi; Nam, Seong-Hyeuk; Lee, Myoung-Ro; Yang, Hye-Won; Park, Junhyung; Lee, Kyooyeol; Lee, Sanghyun; Cho, Shin-Hyeong; Lee, Won-Ja; Park, Hong-Seog; Ju, Jung-Won

    2012-03-08

    Sparganum (plerocercoid of Spirometra erinacei) is a parasite that possesses the remarkable ability to survive by successfully modifying its physiology and morphology to suit various hosts and can be found in various tissues, even the nervous system. However, surprisingly little is known about the molecular function of genes that are expressed during the course of the parasite life cycle. To begin to decipher the molecular processes underlying gene function, we constructed a database of expressed sequence tags (ESTs) generated from sparganum. SpiroESTdb is a web-based information resource that is built upon the annotation and curation of 5,655 ESTs data. SpiroESTdb provides an integrated platform for expressed sequence data, expression dynamics, functional genes, genetic markers including single nucleotide polymorphisms and tandem repeats, gene ontology and KEGG pathway information. Moreover, SpiroESTdb supports easy access to gene pages, such as (i) curation and query forms, (ii) in silico expression profiling and (iii) BLAST search tools. Comprehensive descriptions of the sparganum content of all sequenced data are available, including summary reports. The contents of SpiroESTdb can be viewed and downloaded from the web (http://pathod.cdc.go.kr/spiroestdb). This integrative web-based database of sequence data, functional annotations and expression profiling data will serve as a useful tool to help understand and expand the characterization of parasitic infections. It can also be used to identify potential industrial drug targets and vaccine candidate genes.

  2. Advances in sequence analysis.

    PubMed

    Califano, A

    2001-06-01

    In its early days, the entire field of computational biology revolved almost entirely around biological sequence analysis. Over the past few years, however, a number of new non-sequence-based areas of investigation have become mainstream, from the analysis of gene expression data from microarrays, to whole-genome association discovery, and to the reverse engineering of gene regulatory pathways. Nonetheless, with the completion of private and public efforts to map the human genome, as well as those of other organisms, sequence data continue to be a veritable mother lode of valuable biological information that can be mined in a variety of contexts. Furthermore, the integration of sequence data with a variety of alternative information is providing valuable and fundamentally new insight into biological processes, as well as an array of new computational methodologies for the analysis of biological data.

  3. Development of polymorphic expressed sequence tag-single sequence repeat markers in the common Chinese cuttlefish, Sepiella maindroni.

    PubMed

    Li, R H; Lu, S K; Zhang, C L; Song, W W; Mu, C K; Wang, C L

    2014-07-25

    The common Chinese cuttlefish (Sepiella maindroni) is one of the popular edible cephalopods consumed across Asia. To facilitate the population genetic investigation of this species, we developed fourteen polymorphic microsatellite markers from expressed sequence tags of S. maindroni. The number of alleles at each locus ranged from 6 to 10 with an average of 7.9 alleles per locus. The ranges of observed and expected heterozygosity were from 0.615 to 0.962 and 0.685 to 0.888, respectively. Four loci were found to deviate significantly from Hardy-Weinberg equilibrium. The polymorphism information content ranged from 0.638 to 0.833. These polymorphic microsatellite loci will be helpful for population genetic, genetic linkage map, and other genetic studies of S. maindroni.
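
    For readers unfamiliar with the per-locus statistics quoted above, the following sketch shows the textbook definitions of expected heterozygosity (1 minus the sum of squared allele frequencies) and polymorphism information content (the Botstein et al. formula). The abstract does not say which estimator or software the authors used, so this is only an assumption-laden illustration; the function names and the example allele frequencies are hypothetical.

```python
from itertools import combinations


def expected_heterozygosity(freqs):
    """He = 1 - sum(p_i^2) for allele frequencies p_i at one locus."""
    return 1.0 - sum(p * p for p in freqs)


def polymorphism_information_content(freqs):
    """PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    homozygosity = sum(p * p for p in freqs)
    cross_term = sum(2 * (p * q) ** 2 for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - cross_term


# Hypothetical locus with two equally frequent alleles.
he = expected_heterozygosity([0.5, 0.5])
pic = polymorphism_information_content([0.5, 0.5])
```

    PIC is never larger than expected heterozygosity for the same frequencies, which matches the pattern of the ranges reported in the abstract.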

  4. Development of expressed sequence tag resources for Vanda Mimi Palmer and data mining for EST-SSR.

    PubMed

    Teh, Seow-Ling; Chan, Wai-Sun; Abdullah, Janna Ong; Namasivayam, Parameswari

    2011-08-01

    Vanda Mimi Palmer (VMP) is a highly sought-after fragrant orchid hybrid in Malaysia. It is economically important in the cosmetic and beauty industries and is also a famous potted ornamental plant. To date, no work on fragrance-related genes of vandaceous orchids has been reported by other research groups, although floral fragrance and volatiles have been extensively analyzed. An expressed sequence tag (EST) resource was developed for VMP principally to mine any potential fragrance-related expressed sequence tag-simple sequence repeat (EST-SSR) for future development as markers in the identification of fragrant vandaceous orchids endemic to Malaysia. Clustering, annotation and assembling of the ESTs identified 1,196 unigenes, comprising 966 singletons and 230 contigs. The VMP dbEST was functionally classified by gene ontology (GO) into three groups: molecular functions (51.2%), cellular components (16.4%) and biological processes (24.6%), while the remaining 7.8% showed no hits with a GO identifier. A total of 112 EST-SSRs (9.4%) were mined, in which at least five units of di-, tri-, tetra-, penta-, or hexa-nucleotide repeats were predicted. The di-nucleotide motif repeats appeared to be the most frequent repeats among the detected SSRs, with the AT/TA types the most abundant among the dimerics, while the AAG/TTC and AGA/TCT types were the most frequent trimerics. The mined EST-SSRs are believed to be useful in the development of EST-SSR markers that are applicable in the screening and characterization of fragrance-related transcripts in closely related species.

  5. Single syllable tongue motion analysis using tagged cine MRI.

    PubMed

    Unay, Devrim; Ozturk, Cengizhan; Stone, Maureen

    2014-01-01

    The complicated muscle activity of the human tongue and the resultant surface shapes can give us important clues about speech motor control and pathological tongue motion. This study uses tagged magnetic resonance imaging to provide a 2D surface deformation analysis of the tongue, as well as a 4D compression-expansion analysis, during utterances of four different syllables (/ba/, /ta/, /sha/ and /ga/). All speech tasks were performed several times to confirm the repeatability of the motion analysis. The results showed that the tongue has unique motion patterns for utterances of different syllables, and these differences, which may not be observed by a simple surface analysis, can be examined thoroughly by a 4D motion model-based analysis of the tongue muscles.

  6. Expressed sequence tags in cultivated peanut (Arachis hypogaea): discovery of genes in seed development and response to Ralstonia solanacearum challenge.

    PubMed

    Huang, Jiaquan; Yan, Liying; Lei, Yong; Jiang, Huifang; Ren, Xiaoping; Liao, Boshou

    2012-11-01

    Although an important oil crop, peanut has only 162,030 expressed sequence tags (ESTs) publicly available, 86,943 of which are from cultivated plants. More ESTs from cultivated peanuts are needed for isolation of stress-resistant, tissue-specific and developmentally important genes. Here, we generated 63,234 ESTs from our 5 constructed peanut cDNA libraries of Ralstonia solanacearum challenged roots, R. solanacearum challenged leaves, and unchallenged cultured peanut roots, leaves and developing seeds. Among these ESTs, there were 14,547 unique sequences with 7,961 tentative consensus sequences and 6,586 singletons. Putative functions for 47.8 % of the sequences were identified, including transcription factors, tissue-specific genes, genes involved in fatty acid biosynthesis and oil formation regulation, and resistance gene analogue genes. Additionally, differentially expressed genes, including those involved in ethylene and jasmonic acid signal transduction pathways, from both peanut leaves and roots, were identified in R. solanacearum challenged samples. This large expression dataset from different peanut tissues will be a valuable source for marker development and gene expression analysis. It will also be helpful for finding candidate genes for fatty acid synthesis and oil formation regulation as well as for studying mechanisms of interactions between the peanut host and R. solanacearum pathogen.

  7. Generation of expressed sequence tags from low-CO2 and high-CO2 adapted cells of Chlamydomonas reinhardtii.

    PubMed

    Asamizu, E; Miura, K; Kucho, K; Inoue, Y; Fukuzawa, H; Ohyama, K; Nakamura, Y; Tabata, S

    2000-10-31

    To characterize genes whose expression is induced in carbon-stress conditions, 12,969 and 13,450 5'-end expressed sequence tags (ESTs) were generated from cells grown in low-CO2 and high-CO2 conditions of the unicellular green alga, Chlamydomonas reinhardtii. These ESTs were clustered into 4436 and 3566 non-redundant EST groups, respectively. Comparison of their sequences with those of 3433 non-redundant ESTs previously generated from the cells under the standard growth condition indicated that 2665 and 1879 EST groups occurred only in the low-CO2 and high-CO2 populations, respectively. It was also noted that 96.2% and 96.0% of the cDNA species respectively obtained from the low-CO2 and high-CO2 conditions had no similar EST sequence deposited in the public databases. The EST species identified only in the low-CO2 treated cells included genes previously reported to be expressed specifically in low-CO2 acclimatized cells, suggesting that the ESTs generated in this study will be a useful source for analysis of genes related to carbon-stress acclimatization. The sequence information and search results of each clone will appear at the web site: http://www.kazusa.or.jp/en/plant/chlamy/EST/.

  8. Genome-wide characterization and selection of expressed sequence tag simple sequence repeat primers for optimized marker distribution and reliability in peach

    USDA-ARS's Scientific Manuscript database

    Expressed sequence tag (EST) simple sequence repeats (SSRs) in Prunus were mined, and flanking primers designed and used for genome-wide characterization and selection of primers to optimize marker distribution and reliability. A total of 12,618 contigs were assembled from 84,727 ESTs, along with 34...

  9. Androgenesis in chickpea: anther culture and expressed sequence tags derived annotation.

    PubMed

    Panchangam, Sameera Sastry; Mallikarjuna, Nalini; Gaur, Pooran M; Suravajhala, Prashanth

    2014-02-01

    The double haploid technique is not routinely used in legume breeding programs, though recent publications report haploid plants via anther culture in chickpea (Cicer arietinum L.). The focus of this study was to develop an efficient and reproducible protocol for the production of double haploids with the application of multiple stress pre-treatments such as centrifugation and osmotic shock for genotypes of interest in chickpea for their direct use in breeding programs. Four genotypes, ICC 4958, WR315, ICCV 95423 and Arearti, were tested in anther culture experiments. The yield was shown to be consistent with 3-5 nucleate microspores and 2-7 celled structures with no further growth. To gain further insight into the molecular mechanism underlying the switch from microsporogenesis to androgenesis, bioinformatics tools were employed. The challenges on the roles of such genes were reviewed while an attempt was made to find putative candidates for androgenesis using Expressed Sequence Tags (EST) and interolog based protein interaction analyses.

  10. Mining of haplotype-based expressed sequence tag single nucleotide polymorphisms in citrus.

    PubMed

    Chen, Chunxian; Gmitter, Fred G

    2013-11-01

    Single nucleotide polymorphisms (SNPs), the most abundant variations in a genome, have been widely used in various studies. Detection and characterization of citrus haplotype-based expressed sequence tag (EST) SNPs will greatly facilitate further utilization of these gene-based resources. In this paper, haplotype-based SNPs were mined out of publicly available citrus expressed sequence tags (ESTs) from different citrus cultivars (genotypes) individually and collectively for comparison. There were a total of 567,297 ESTs belonging to 27 cultivars in varying numbers and consequentially yielding different numbers of haplotype-based quality SNPs. Sweet orange (SO) had the most (213,830) ESTs, generating 11,182 quality SNPs in 3,327 out of 4,228 usable contigs. Summed from all the individually mining results, a total of 25,417 quality SNPs were discovered - 15,010 (59.1%) were transitions (AG and CT), 9,114 (35.9%) were transversions (AC, GT, CG, and AT), and 1,293 (5.0%) were insertion/deletions (indels). A vast majority of SNP-containing contigs consisted of only 2 haplotypes, as expected, but the percentages of 2 haplotype contigs varied widely in these citrus cultivars. BLAST of the 25,417 25-mer SNP oligos to the Clementine reference genome scaffolds revealed 2,947 SNPs had "no hits found", 19,943 had 1 unique hit / alignment, 1,571 had one hit and 2+ alignments per hit, and 956 had 2+ hits and 1+ alignment per hit. Of the total 24,293 scaffold hits, 23,955 (98.6%) were on the main scaffolds 1 to 9, and only 338 were on 87 minor scaffolds. Most alignments had 100% (25/25) or 96% (24/25) nucleotide identities, accounting for 93% of all the alignments. Considering almost all the nucleotide discrepancies in the 24/25 alignments were at the SNP sites, it served well as in silico validation of these SNPs, in addition to and consistent with the rate (81%) validated by sequencing and SNaPshot assay. 
    High-quality EST-SNPs from different citrus genotypes were detected, and

  11. Mining of haplotype-based expressed sequence tag single nucleotide polymorphisms in citrus

    PubMed Central

    2013-01-01

    Background: Single nucleotide polymorphisms (SNPs), the most abundant variations in a genome, have been widely used in various studies. Detection and characterization of citrus haplotype-based expressed sequence tag (EST) SNPs will greatly facilitate further utilization of these gene-based resources. Results: In this paper, haplotype-based SNPs were mined out of publicly available citrus expressed sequence tags (ESTs) from different citrus cultivars (genotypes) individually and collectively for comparison. There were a total of 567,297 ESTs belonging to 27 cultivars in varying numbers and consequentially yielding different numbers of haplotype-based quality SNPs. Sweet orange (SO) had the most (213,830) ESTs, generating 11,182 quality SNPs in 3,327 out of 4,228 usable contigs. Summed from all the individually mining results, a total of 25,417 quality SNPs were discovered – 15,010 (59.1%) were transitions (AG and CT), 9,114 (35.9%) were transversions (AC, GT, CG, and AT), and 1,293 (5.0%) were insertion/deletions (indels). A vast majority of SNP-containing contigs consisted of only 2 haplotypes, as expected, but the percentages of 2 haplotype contigs varied widely in these citrus cultivars. BLAST of the 25,417 25-mer SNP oligos to the Clementine reference genome scaffolds revealed 2,947 SNPs had “no hits found”, 19,943 had 1 unique hit / alignment, 1,571 had one hit and 2+ alignments per hit, and 956 had 2+ hits and 1+ alignment per hit. Of the total 24,293 scaffold hits, 23,955 (98.6%) were on the main scaffolds 1 to 9, and only 338 were on 87 minor scaffolds. Most alignments had 100% (25/25) or 96% (24/25) nucleotide identities, accounting for 93% of all the alignments. Considering almost all the nucleotide discrepancies in the 24/25 alignments were at the SNP sites, it served well as in silico validation of these SNPs, in addition to and consistent with the rate (81%) validated by sequencing and SNaPshot assay. Conclusions: High-quality EST-SNPs from different

  12. Optimal cDNA microarray design using expressed sequence tags for organisms with limited genomic information

    PubMed Central

    Chen, Yian A; Mckillen, David J; Wu, Shuyuan; Jenny, Matthew J; Chapman, Robert; Gross, Paul S; Warr, Gregory W; Almeida, Jonas S

    2004-01-01

    Background: Expression microarrays are increasingly used to characterize environmental responses and host-parasite interactions for many different organisms. Probe selection for cDNA microarrays using expressed sequence tags (ESTs) is challenging due to high sequence redundancy and potential cross-hybridization between paralogous genes. In organisms with limited genomic information, like marine organisms, this challenge is even greater due to annotation uncertainty. No general tool is available for cDNA microarray probe selection for these organisms. Therefore, the goal of the design procedure described here is to select a subset of ESTs that will minimize sequence redundancy and characterize potential cross-hybridization while providing functionally representative probes. Results: Sequence similarity between ESTs, quantified by the E-value of pair-wise alignment, was used as a surrogate for expected hybridization between corresponding sequences. Using this value as a measure of dissimilarity, sequence redundancy reduction was performed by hierarchical cluster analyses. The choice of how many microarray probes to retain was made based on an index developed for this research: a sequence diversity index (SDI) within a sequence diversity plot (SDP). This index tracked the decreasing within-cluster sequence diversity as the number of clusters increased. For a given stage in the agglomeration procedure, the EST having the highest similarity to all the other sequences within each cluster, the centroid EST, was selected as a microarray probe. A small dataset of ESTs from Atlantic white shrimp (Litopenaeus setiferus) was used to test this algorithm so that the detailed results could be examined. The functional representative level of the selected probes was quantified using Gene Ontology (GO) annotations. Conclusions: For organisms with limited genomic information, combining hierarchical clustering methods to analyze ESTs can yield an optimal cDNA microarray design. If

  13. Survey in the sugarcane expressed sequence tag database (SUCEST) for simple sequence repeats.

    PubMed

    Pinto, Luciana Rossini; Oliveira, Karine Miranda; Ulian, Eugênio César; Garcia, Antonio Augusto Franco; de Souza, Anete Pereira

    2004-10-01

    Sugarcane microsatellites or simple sequence repeats (SSR) were developed in an economical and practical way by mining EST databases. A survey in the SUCEST (sugarcane EST) database revealed a total of 2005 clusters out of 43,141 containing SSRs. Of these, 8.2% were dinucleotide, 30.5% were trinucleotide, and 61.3% were tetranucleotide repeats. Except for dinucleotides, the CG-rich motif types were the most common. Differences in abundance of trinucleotide motif types were observed between EST-SSRs and those isolated from sugarcane genomic libraries. Among the different cDNA libraries used for EST sequencing, SSRs were more frequent in the ones derived from leaf roll (LR). Twenty-three out of 30 tested SSRs produced scorable polymorphisms in 18 sugarcane commercial clones. These EST-SSRs showed a moderate level of polymorphism with some SSRs producing unique fingerprints. The number of alleles observed among the 18 clones evaluated varied from 2 to 15, with an average of 6.04 alleles/locus. The polymorphism information content (PIC) values ranged from 0.28 to 0.90 with a mean of 0.66. The EST-SSRs screened over both parents (SP 80-180; SP 80-4966) and 6 F1 individuals produced 52 segregating markers that could potentially be used for sugarcane mapping. The EST-SSRs were found in clusters that had significant homology to proteins involved in important metabolic pathways such as sugar biosynthesis, proving that EST-SSRs are a valuable tool for the construction of a functional sugarcane map.
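
    The PIC values reported above (0.28 to 0.90) summarize how informative each marker is. The abstract does not say which formula was used; the sketch below assumes the widely used definition PIC = 1 - sum(p_i^2) - sum over i<j of 2 p_i^2 p_j^2, where p_i are allele frequencies at one locus. The function name and the example frequencies are hypothetical.

```python
from typing import Sequence


def polymorphism_information_content(freqs: Sequence[float]) -> float:
    """PIC for one locus, assuming PIC = 1 - sum(p^2) - sum_{i<j} 2 p_i^2 p_j^2."""
    if abs(sum(freqs) - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")
    squares = [p * p for p in freqs]
    homozygosity = sum(squares)
    # Identity: sum_{i<j} 2 a_i a_j = (sum a_i)^2 - sum a_i^2, with a_i = p_i^2.
    cross_term = homozygosity ** 2 - sum(s * s for s in squares)
    return 1.0 - homozygosity - cross_term


# Hypothetical locus with two equally frequent alleles.
print(polymorphism_information_content([0.5, 0.5]))
```

    Under this definition a locus with two equally frequent alleles gives 0.375, so values near 0.90 require many alleles at fairly even frequencies.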

  14. Characterization of expressed sequence tags from a full-length enriched cDNA library of Cryptomeria japonica male strobili

    PubMed Central

    Futamura, Norihiro; Totoki, Yasushi; Toyoda, Atsushi; Igasaki, Tomohiro; Nanjo, Tokihiko; Seki, Motoaki; Sakaki, Yoshiyuki; Mari, Adriano; Shinozaki, Kazuo; Shinohara, Kenji

    2008-01-01

    Background: Cryptomeria japonica D. Don is one of the most commercially important conifers in Japan. However, the allergic disease caused by its pollen is a severe public health problem in Japan. Since large-scale analysis of expressed sequence tags (ESTs) in the male strobili of C. japonica should help us to clarify the overall expression of genes during the process of pollen development, we constructed a full-length enriched cDNA library that was derived from male strobili at various developmental stages. Results: We obtained 36,011 expressed sequence tags (ESTs) from either one or both ends of 19,437 clones derived from the cDNA library of C. japonica male strobili at various developmental stages. The 19,437 cDNA clones corresponded to 10,463 transcripts. Approximately 80% of the transcripts resembled ESTs from Pinus and Picea, while approximately 75% had homologs in Arabidopsis. An analysis of homologies between ESTs from C. japonica male strobili and known pollen allergens in the Allergome Database revealed that products of 180 transcripts exhibited significant homology. Approximately 2% of the transcripts appeared to encode transcription factors. We identified twelve genes for MADS-box proteins among these transcription factors. The twelve MADS-box genes were classified as DEF/GLO/GGM13-, AG-, AGL6-, TM3- and TM8-like MIKCC genes and type I MADS-box genes. Conclusion: Our full-length enriched cDNA library derived from C. japonica male strobili provides information on expression of genes during the development of male reproductive organs. We provided potential allergens in C. japonica. We also provided new information about transcription factors including MADS-box genes expressed in male strobili of C. japonica. Large-scale gene discovery using full-length cDNAs is a valuable tool for studies of gymnosperm species. PMID:18691438

  15. Serial number tagging reveals a prominent sequence preference of retrotransposon integration.

    PubMed

    Chatterjee, Atreyi Ghatak; Esnault, Caroline; Guo, Yabin; Hung, Stevephen; McQueen, Philip G; Levin, Henry L

    2014-07-01

    Transposable elements (TE) have both negative and positive impact on the biology of their host. As a result, a balance is struck between the host and the TE that relies on directing integration to specific genome territories. The extraordinary capacity of DNA sequencing can create ultra dense maps of integration that are being used to study the mechanisms that position integration. Unfortunately, the great increase in the numbers of insertion sites detected comes with the cost of not knowing which positions are rare targets and which sustain high numbers of insertions. To address this problem we developed the serial number system, a TE tagging method that measures the frequency of integration at single nucleotide positions. We sequenced 1 million insertions of retrotransposon Tf1 in the genome of Schizosaccharomyces pombe and obtained the first profile of integration with frequencies for each individual position. Integration levels at individual nucleotides varied over two orders of magnitude and revealed that sequence recognition plays a key role in positioning integration. The serial number system is a general method that can be applied to determine precise integration maps for retroviruses and gene therapy vectors.

  16. Quantitative gene expression profiles in real time from expressed sequence tag databases.

    PubMed

    Funari, Vincent A; Voevodski, Konstantin; Leyfer, Dimitry; Yerkes, Laura; Cramer, Donald; Tolan, Dean R

    2010-01-01

    An accumulation of expressed sequence tag (EST) data in the public domain and the availability of bioinformatic programs have made EST gene expression profiling a common practice. However, the utility and validity of using EST databases (e.g., dbEST) has been criticized, particularly for quantitative assessment of gene expression. Problems with EST sequencing errors, library construction, EST annotation, and multiple paralogs make generation of specific and sensitive qualitative and quantitative expression profiles a concern. In addition, most EST-derived expression data exists in previously assembled databases. The Virtual Northern Blot (VNB) (http://tlab.bu.edu/vnb.html) allows generation, evaluation, and optimization of expression profiles in real time, which is especially important for alternatively spliced, novel, or poorly characterized genes. Representative gene families with variable nucleotide sequence identity, tissue specificity, and levels of expression (bcl-xl, aldoA, and cyp2d9) are used to assess the quality of VNB's output. The profiles generated by VNB are more sensitive and specific than those constructed with ESTs listed in preindexed databases at UCSC and NCBI. Moreover, quantitative expression profiles produced by VNB are comparable to quantization obtained from Northern blots and qPCR. The VNB pipeline generates real-time gene expression profiles for single-gene queries that are both qualitatively and quantitatively reliable.

  17. Expressed sequence tags reveal Proctotrupomorpha (minus Chalcidoidea) as sister to Aculeata (Hymenoptera: Insecta).

    PubMed

    Sharanowski, Barbara J; Robbertse, Barbara; Walker, John; Voss, S Randal; Yoder, Ryan; Spatafora, Joseph; Sharkey, Michael J

    2010-10-01

    Hymenoptera is one of the most diverse groups of animals on the planet and has vital importance for ecosystem function as pollinators and parasitoids. Higher-level relationships among Hymenoptera have been notoriously difficult to resolve with both morphological and traditional molecular approaches. Here we examined the utility of expressed sequence tags for resolving relationships among hymenopteran superfamilies. Transcripts were assembled for 6 disparate Hymenopteran taxa with additional sequences added from public databases for a final dataset of 24 genes for 16 taxa and over 10 kb of sequence data. The concatenated dataset recovered a robust and well-supported topology demonstrating the monophyly of Holometabola, Hymenoptera, Apocrita, Aculeata, Ichneumonoidea, and a sister relationship between the two most closely related proctotrupomorphs in the dataset (Cynipoidea+Proctotrupoidea). The data strongly supported a sister relationship between Aculeata and Proctotrupomorpha, contrary to previously proposed hypotheses. Additionally there was strong evidence indicating Ichneumonoidea as sister to Aculeata+Proctotrupomorpha. These relationships were robust to missing data, nucleotide composition biases, low taxonomic sampling, and conflicting signal across gene trees. There was also strong evidence indicating that Chalcidoidea is not contained within Proctotrupomorpha. Copyright 2010 Elsevier Inc. All rights reserved.

  18. Mining for single nucleotide polymorphisms and insertions / deletions in expressed sequence tag libraries of oil palm.

    PubMed

    Riju, Aykkal; Chandrasekar, Arumugam; Arunachalam, Vadivel

    2007-01-01

    The oil palm is a tropical oil-bearing tree. EST-derived SNPs and SSRs are a free by-product of the currently expanding EST (expressed sequence tag) databases. The development of high-throughput methods for the detection of SNPs (single nucleotide polymorphisms) and small indels (insertions/deletions) has led to a revolution in their use as molecular markers. The available oil palm EST sequences (5452) were mined from dbEST at NCBI. The CAP3 program was used to assemble EST sequences into contigs. Candidate SNPs and indel polymorphisms were detected using the Perl script auto_snip version 1.0, which used 576 ESTs for detecting SNP and indel sites. We found 1180 SNP sites and 137 indel polymorphisms, with a frequency of 1.36 SNPs / 100 bp. Among the six tissues from which the EST libraries had been generated, mesocarp had a high frequency of 2.91 SNPs and indels per 100 bp, whereas the zygotic embryos had the lowest frequency, 0.15 per 100 bp. We also used the Shannon index to analyze the proportion of ten possible types of SNP/indels. ESTs from tissues of normal apex showed the highest value of the Shannon index (0.60), whereas abnormal apex had the lowest value (0.02). The present report deals with the use of the Shannon index for comparing SNP/indel frequencies mined from EST libraries and also confirms that the frequency of SNP occurrence in oil palm is adequate for their use as markers for genetic studies.
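
    Two quantities in this abstract are simple to compute: variant density per 100 bp and the Shannon index over variant types. The sketch below is illustrative only. It assumes the textbook form H = -sum(p_i ln p_i) with natural logarithms; the abstract does not state the log base or any normalization, so its values (0.02 to 0.60) may be on a different scale. Function names and the example labels are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable


def shannon_index(variant_types: Iterable[str]) -> float:
    """H = -sum(p_i * ln p_i) over the observed variant-type proportions."""
    counts = Counter(variant_types)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log(n / total) for n in counts.values())


def variants_per_100bp(n_variants: int, total_bp: int) -> float:
    """Density of SNPs/indels expressed per 100 bp of assembled sequence."""
    if total_bp <= 0:
        raise ValueError("total_bp must be positive")
    return 100.0 * n_variants / total_bp


# Hypothetical variant calls for one tissue library.
calls = ["A/G", "A/G", "C/T", "indel"]
print(shannon_index(calls), variants_per_100bp(len(calls), 400))
```

    A library dominated by a single variant type gives an index near zero, while evenly spread types give a larger value, which is the contrast the abstract draws between abnormal and normal apex.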

  19. Characterization of genic microsatellite markers derived from expressed sequence tags in Pacific abalone (Haliotis discus hannai)

    NASA Astrophysics Data System (ADS)

    Li, Qi; Shu, Jing; Zhao, Cui; Liu, Shikai; Kong, Lingfeng; Zheng, Xiaodong

    2010-01-01

    Simple sequence repeat (SSR) markers were developed from the expressed sequence tags (ESTs) of Pacific abalone (Haliotis discus hannai). Repeat motifs were found in 4.95% of the ESTs at a frequency of one repeat every 10.04 kb of EST sequences, after redundancy elimination. Seventeen polymorphic EST-SSRs were developed. The number of alleles per locus varied from 2 to 17, with an average of 6.8 alleles per locus. The expected and observed heterozygosities ranged from 0.159 to 0.928 and from 0.132 to 0.922, respectively. Twelve of the 17 loci (70.6%) were successfully amplified in H. diversicolor. Seventeen loci segregated in three families, with three showing the presence of null alleles (17.6%). The adequate level of variability and low frequency of null alleles observed in H. discus hannai, together with the high rate of transferability across Haliotis species, make this set of EST-SSR markers an important tool for comparative mapping, marker-assisted selection, and evolutionary studies, not only in the Pacific abalone, but also in related species.

  20. Quantitative Gene Expression Profiles in Real Time From Expressed Sequence Tag Databases

    PubMed Central

    FUNARI, VINCENT A.; VOEVODSKI, KONSTANTIN; LEYFER, DIMITRY; YERKES, LAURA; CRAMER, DONALD; TOLAN, DEAN R.

    2010-01-01

    An accumulation of expressed sequence tag (EST) data in the public domain and the availability of bioinformatic programs have made EST gene expression profiling a common practice. However, the utility and validity of using EST databases (e.g., dbEST) has been criticized, particularly for quantitative assessment of gene expression. Problems with EST sequencing errors, library construction, EST annotation, and multiple paralogs make generation of specific and sensitive qualitative and quantitative expression profiles a concern. In addition, most EST-derived expression data exists in previously assembled databases. The Virtual Northern Blot (VNB) (http://tlab.bu.edu/vnb.html) allows generation, evaluation, and optimization of expression profiles in real time, which is especially important for alternatively spliced, novel, or poorly characterized genes. Representative gene families with variable nucleotide sequence identity, tissue specificity, and levels of expression (bcl-xl, aldoA, and cyp2d9) are used to assess the quality of VNB’s output. The profiles generated by VNB are more sensitive and specific than those constructed with ESTs listed in preindexed databases at UCSC and NCBI. Moreover, quantitative expression profiles produced by VNB are comparable to quantization obtained from Northern blots and qPCR. The VNB pipeline generates real-time gene expression profiles for single-gene queries that are both qualitatively and quantitatively reliable. PMID:20635574

  1. Large scale in silico identification of MYB family genes from wheat expressed sequence tags.

    PubMed

    Cai, Hongsheng; Tian, Shan; Dong, Hansong

    2012-10-01

    The MYB proteins constitute one of the largest transcription factor families in plants. Much research has been performed to determine their structures, functions, and evolution, especially in the model plants, Arabidopsis, and rice. However, this transcription factor family has been much less studied in wheat (Triticum aestivum), for which no genome sequence is yet available. Despite this, expressed sequence tags are an important resource that permits opportunities for large scale gene identification. In this study, a total of 218 sequences from wheat were identified and confirmed to be putative MYB proteins, including 1RMYB, R2R3-type MYB, 3RMYB, and 4RMYB types. A total of 36 R2R3-type MYB genes with complete open reading frames were obtained. The putative orthologs were assigned in rice and Arabidopsis based on the phylogenetic tree. Tissue-specific expression pattern analyses confirmed the predicted orthologs, and this meant that gene information could be inferred from the Arabidopsis genes. Moreover, the motifs flanking the MYB domain were analyzed using the MEME web server. The distribution of motifs among wheat MYB proteins was investigated and this facilitated subfamily classification.

  2. Construction and Evaluation of cDNA Libraries for Large-Scale Expressed Sequence Tag Sequencing in Wheat (Triticum aestivum L.)

    PubMed Central

    Zhang, D.; Choi, D. W.; Wanamaker, S.; Fenton, R. D.; Chin, A.; Malatrasi, M.; Turuspekov, Y.; Walia, H.; Akhunov, E. D.; Kianian, P.; Otto, C.; Simons, K.; Deal, K. R.; Echenique, V.; Stamova, B.; Ross, K.; Butler, G. E.; Strader, L.; Verhey, S. D.; Johnson, R.; Altenbach, S.; Kothari, K.; Tanaka, C.; Shah, M. M.; Laudencia-Chingcuanco, D.; Han, P.; Miller, R. E.; Crossman, C. C.; Chao, S.; Lazo, G. R.; Klueva, N.; Gustafson, J. P.; Kianian, S. F.; Dubcovsky, J.; Walker-Simmons, M. K.; Gill, K. S.; Dvořák, J.; Anderson, O. D.; Sorrells, M. E.; McGuire, P. E.; Qualset, C. O.; Nguyen, H. T.; Close, T. J.

    2004-01-01

    A total of 37 original cDNA libraries and 9 derivative libraries enriched for rare sequences were produced from Chinese Spring wheat (Triticum aestivum L.), five other hexaploid wheat genotypes (Cheyenne, Brevor, TAM W101, BH1146, Butte 86), tetraploid durum wheat (T. turgidum L.), diploid wheat (T. monococcum L.), and two other diploid members of the grass tribe Triticeae (Aegilops speltoides Tausch and Secale cereale L.). The emphasis in the choice of plant materials for library construction was reproductive development subjected to environmental factors that ultimately affect grain quality and yield, but roots and other tissues were also included. Partial cDNA expressed sequence tags (ESTs) were examined by various measures to assess the quality of these libraries. All ESTs were processed to remove cloning system sequences and contaminants and then assembled using CAP3. Following these processing steps, this assembly yielded 101,107 sequences derived from 89,043 clones, which defined 16,740 contigs and 33,213 singletons, a total of 49,953 “unigenes.” Analysis of the distribution of these unigenes among the libraries led to the conclusion that the enrichment methods were effective in reducing the most abundant unigenes and to the observation that the most diverse libraries were from tissues exposed to environmental stresses including heat, drought, salinity, or low temperature. PMID:15514038

  3. Cell-free translational screening of an expression sequence tag library of Clonorchis sinensis for novel antigen discovery.

    PubMed

    Kasi, Devi; Catherine, Christy; Lee, Seung-Won; Lee, Kyung-Ho; Kim, Yu Jung; Ro Lee, Myeong; Ju, Jung Won; Kim, Dong-Myung

    2017-01-27

    The rapidly evolving cloning and sequencing technologies have enabled understanding of genomic structure of parasite genomes, opening up new ways of combatting parasite-related diseases. To make the most of the exponentially accumulating genomic data, however, it is crucial to analyze the proteins encoded by these genomic sequences. In this study, we adopted an engineered cell-free protein synthesis system for large-scale expression screening of an expression sequence tag (EST) library of Clonorchis sinensis to identify potential antigens that can be used for diagnosis and treatment of clonorchiasis. To allow high-throughput expression and identification of individual genes comprising the library, a cell-free synthesis reaction was designed such that both the template DNA and the expressed proteins were co-immobilized on the same microbeads, leading to microbead-based linkage of the genotype and phenotype. This reaction configuration allowed streamlined expression, recovery, and analysis of proteins. This approach enabled us to identify 21 antigenic proteins. © 2017 American Institute of Chemical Engineers Biotechnol. Prog., 2017.

  4. Generation of 10,154 expressed sequence tags from a leafy gametophyte of a marine red alga, Porphyra yezoensis.

    PubMed

    Nikaido, I; Asamizu, E; Nakajima, M; Nakamura, Y; Saga, N; Tabata, S

    2000-06-30

    A total of 10,154 5'-end expressed sequence tags (EST) were established from the normalized and size-selected cDNA libraries of a marine red alga, Porphyra yezoensis. Among the ESTs, 2140 were unique species, and the remaining 8014 were grouped into 1127 species. Database search of the 3267 non-redundant ESTs by BLAST algorithm showed that the sequences of 1080 species (33.1%) have similarity to those of registered genes from various organisms including higher plants, mammals, yeasts, and cyanobacteria, while 2187 (66.9%) are novel. Codon usage analysis in the coding regions of 101 non-redundant EST groups showing significant similarity to known genes indicated the higher GC contents at the third position of codons (79.4%) than the first (62.2%) and the second position (45.0%), suggesting that the genome has been exposed to high GC pressure during evolution. The sequence data of individual ESTs are available at the web site http://www.kazusa.or.jp/en/plant/porphyra/EST/.
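
    The codon-position GC figures above (first 62.2%, second 45.0%, third 79.4%) can be derived from in-frame coding sequences by counting G and C separately at each position. A minimal sketch, assuming the input is already an in-frame coding region; the function name and example sequence are hypothetical and not taken from the study.

```python
from typing import Tuple


def gc_by_codon_position(cds: str) -> Tuple[float, float, float]:
    """Return GC percentage at codon positions 1, 2 and 3 of an in-frame CDS."""
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    if usable == 0:
        raise ValueError("need at least one complete codon")
    gc_counts = [0, 0, 0]
    for i in range(usable):
        if seq[i] in "GC":
            gc_counts[i % 3] += 1
    n_codons = usable // 3
    gc1, gc2, gc3 = (100.0 * c / n_codons for c in gc_counts)
    return gc1, gc2, gc3


# Hypothetical two-codon fragment: ATG GCC
print(gc_by_codon_position("ATGGCC"))
```

    A third-position value well above the first two, as reported here, is the usual signature of compositional bias acting mainly on synonymous sites.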

  5. An expressed sequence tag survey of gene expression in the pond snail Lymnaea stagnalis, an intermediate vector of trematodes [corrected].

    PubMed

    Davison, A; Blaxter, M L

    2005-05-01

    The pond snail Lymnaea stagnalis is an intermediate vector for the liver fluke Fasciola hepatica, a common parasite of ruminants and humans. Yet, despite being a disease of medical and economic importance, as well as a potentially useful comparative tool, the genetics of the relationship between Lymnaea and Fasciola has barely been investigated. As a complement to forthcoming F. hepatica expressed sequence tags (ESTs), we generated 1320 ESTs from L. stagnalis central nervous system (CNS) libraries. We estimate that these sequences derive from 771 different genes, of which 374 showed significant similarity to proteins in public databases, and 169 were similar to ESTs from the snail vector Biomphalaria glabrata. These L. stagnalis ESTs will provide insight into the function of the snail CNS, as well as the molecular components of behaviour and response to parasitism. In the future, the comparative analysis of Lymnaea/Fasciola with Biomphalaria/Schistosoma will help to understand both conserved and divergent aspects of the host-parasite relationship. The L. stagnalis ESTs will also assist gene prediction in the forthcoming B. glabrata genome sequence. The dataset is available for searching on the world-wide web at http://zeldia.cap.ed.ac.uk/mollusca.html.

  6. Sampling and pyrosequencing methods for characterizing bacterial communities in the human gut using 16S sequence tags.

    PubMed

    Wu, Gary D; Lewis, James D; Hoffmann, Christian; Chen, Ying-Yu; Knight, Rob; Bittinger, Kyle; Hwang, Jennifer; Chen, Jun; Berkowsky, Ronald; Nessel, Lisa; Li, Hongzhe; Bushman, Frederic D

    2010-07-30

    Intense interest centers on the role of the human gut microbiome in health and disease, but optimal methods for analysis are still under development. Here we present a study of methods for surveying bacterial communities in human feces using 454/Roche pyrosequencing of 16S rRNA gene tags. We analyzed fecal samples from 10 individuals and compared methods for storage, DNA purification and sequence acquisition. To assess reproducibility, we compared samples one cm apart on a single stool specimen for each individual. To analyze storage methods, we compared 1) immediate freezing at -80 degrees C, 2) storage on ice for 24 hours, or 3) storage on ice for 48 hours. For DNA purification methods, we tested three commercial kits and bead beating in hot phenol. Variations due to the different methodologies were compared to variation among individuals using two approaches: one based on presence-absence information for bacterial taxa (unweighted UniFrac) and the other taking into account their relative abundance (weighted UniFrac). In the unweighted analysis relatively little variation was associated with the different analytical procedures, and variation between individuals predominated. In the weighted analysis considerable variation was associated with the purification methods. Particularly notable was improved recovery of Firmicutes sequences using the hot phenol method. We also carried out surveys of the effects of different 454 sequencing methods (FLX versus Titanium) and amplification of different 16S rRNA variable gene segments. Based on our findings we present recommendations for protocols to collect, process and sequence bacterial 16S rDNA from fecal samples. Some major points are 1) if feasible, bead-beating in hot phenol or use of the PSP kit improves recovery; 2) storage methods can be adjusted based on experimental convenience; 3) unweighted (presence-absence) comparisons are less affected by lysis method.

  7. Mining of expressed sequence tag libraries of cacao for microsatellite markers using five computational tools.

    PubMed

    Riju, Aikkal; Rajesh, M K; Sherin, P T P Fasila; Chandrasekar, A; Apshara, S Elain; Arunachalam, Vadivel

    2009-08-01

    Expressed sequence tags (ESTs) provide researchers with a quick and inexpensive route for discovering new genes, data on gene expression and regulation, and also provide genic markers that help in constructing genome maps. Cacao is an important perennial crop of the humid tropics. Cacao EST sequences, as available in the public domain, were downloaded and assembled into contigs. Microsatellites were located in these ESTs and contigs using five software tools (MISA, TRA, TROLL, SSRIT and SSR primer). MISA gave maximum coverage of SSRs in cacao ESTs and contigs, although TRA was able to detect higher order (5-mer) repeats. The frequency of SSRs was one per 26.9 kb in the known set of ESTs. One-third of the repeats in EST-contigs were found to be trimeric. A few rare repeats, such as a 21-mer repeat, were also located. A/T repeats were most abundant among the mononucleotide repeats and the AG/GA/TC/CT type was the most frequent among dimerics. Flanking primers were designed using the Primer3 program and verified experimentally for PCR amplification. The results of the study are made freely available in an online database (http://riju.byethost31.com/cocoa/). Seven primer pairs that amplified genomic DNA isolated from leaves were used to screen a representative set of 12 accessions of cacao.
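
    The microsatellite search described above can be illustrated with a regular-expression scan for perfect tandem repeats. This is only a sketch of the general idea and does not reproduce the algorithm or settings of any tool named in the abstract; the function name and the minimum-repeat thresholds are assumptions chosen for illustration.

```python
import re
from typing import Dict, List, Optional, Tuple


def find_ssrs(seq: str,
              min_repeats: Optional[Dict[int, int]] = None) -> List[Tuple[int, str, int]]:
    """Return (start, motif, repeat_count) for perfect SSRs in a DNA sequence."""
    # Assumed thresholds: motif length -> minimum number of tandem copies.
    thresholds = min_repeats or {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in sorted(thresholds.items()):
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            # Skip motifs that are themselves repeats of a shorter motif (e.g. "AA", "AGAG").
            if any(unit == unit[:k] * (unit_len // k)
                   for k in range(1, unit_len) if unit_len % k == 0):
                continue
            hits.append((m.start(), unit, len(m.group(0)) // unit_len))
    return sorted(hits)


# Hypothetical EST fragment containing an (AG)8 repeat.
print(find_ssrs("GGC" + "AG" * 8 + "TTC"))
```

    Dividing the total length of the scanned sequences by the number of hits gives a density figure of the same kind as the "one per 26.9 kb" quoted above, although the result depends heavily on the thresholds chosen.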

  8. Exploitation of a turbot (Scophthalmus maximus L.) immune-related expressed sequence tag (EST) database for microsatellite screening and validation.

    PubMed

    Navajas-Pérez, R; Robles, F; Molina-Luzón, M J; De La Herrán, R; Alvarez-Dios, J A; Pardo, B G; Vera, M; Bouza, C; Martínez, P

    2012-07-01

    In this study, we identified and characterized 160 microsatellite loci from an expressed sequence tag (EST) database generated from immune-related organs of turbot (Scophthalmus maximus). A final set of 83 new polymorphic microsatellites was validated after the analysis of 40 individuals of Atlantic origin including both wild and farmed individuals. The allele number and the expected heterozygosity ranged from 2 to 18 and from 0.021 to 0.951, respectively. Evidence of null alleles at moderate-high frequencies was detected at six loci using population data. None of the analysed loci showed deviations from Mendelian segregation after the analysis of five full-sib families including approximately 92 individuals/family. The markers are used to consolidate the turbot genetic map, and because they are mostly EST-derived, they will be very useful for comparative genomic studies within flatfishes and with model fish species. Using an in silico approach, we detected significant homologies of microsatellite sequences with the EST databases of the flatfish species with highest genomic resources (Senegalese sole, Atlantic halibut, bastard halibut) in 31% of these turbot markers. The conservation of these microsatellites within Pleuronectiformes will pave the way for anchoring genetic maps of different species and identifying genomic regions related to productive traits.
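
    For readers unfamiliar with the two summary statistics quoted above, the sketch below computes allele number and expected heterozygosity for one diploid locus using the standard formula He = 1 - sum(p_i^2). Whether the authors applied a small-sample correction is not stated in the abstract, so the uncorrected form is an assumption, and the genotypes are invented for illustration.

```python
from collections import Counter


def allele_number(genotypes):
    """Number of distinct alleles observed at a locus."""
    return len({allele for genotype in genotypes for allele in genotype})


def expected_heterozygosity(genotypes):
    """Uncorrected expected heterozygosity: 1 minus the sum of squared allele frequencies."""
    alleles = [allele for genotype in genotypes for allele in genotype]
    total = len(alleles)
    counts = Counter(alleles)
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


# Hypothetical genotypes (allele sizes in bp) for four diploid individuals.
locus = [(100, 100), (100, 104), (104, 104), (100, 104)]
print(allele_number(locus))            # 2
print(expected_heterozygosity(locus))  # 0.5
```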

  9. Proteomic analysis of Trypanosoma cruzi developmental stages using isotope-coded affinity tag reagents.

    PubMed

    Paba, Jaime; Ricart, Carlos A O; Fontes, Wagner; Santana, Jaime M; Teixeira, Antonio R L; Marchese, Jason; Williamson, Brian; Hunt, Tony; Karger, Barry L; Sousa, Marcelo V

    2004-01-01

    Comparative proteome analysis of developmental stages of the human pathogen Trypanosoma cruzi was carried out by isotope-coded affinity tag technology (ICAT) associated with liquid chromatography-mass spectrometry peptide sequencing (LC-MS/MS). Protein extracts of the protozoan trypomastigote and amastigote stages were labeled with heavy (D8) and light (D0) ICAT reagents and subjected to cation exchange and avidin affinity chromatography followed by LC-MS/MS analysis. High confidence sequence information and expression levels for 41 T. cruzi polypeptides, including metabolic enzymes, paraflagellar rod components, tubulins, and heat-shock proteins were reported. Twenty-nine proteins displayed similar levels of expression in both forms of the parasite, nine proteins presented higher levels in trypomastigotes, whereas three were more expressed in amastigotes.

  10. Population Genomics of Parallel Adaptation in Threespine Stickleback using Sequenced RAD Tags

    PubMed Central

    Etter, Paul D.; Stiffler, Nicholas; Johnson, Eric A.; Cresko, William A.

    2010-01-01

    Next-generation sequencing technology provides novel opportunities for gathering genome-scale sequence data in natural populations, laying the empirical foundation for the evolving field of population genomics. Here we conducted a genome scan of nucleotide diversity and differentiation in natural populations of threespine stickleback (Gasterosteus aculeatus). We used Illumina-sequenced RAD tags to identify and type over 45,000 single nucleotide polymorphisms (SNPs) in each of 100 individuals from two oceanic and three freshwater populations. Overall estimates of genetic diversity and differentiation among populations confirm the biogeographic hypothesis that large panmictic oceanic populations have repeatedly given rise to phenotypically divergent freshwater populations. Genomic regions exhibiting signatures of both balancing and divergent selection were remarkably consistent across multiple, independently derived populations, indicating that replicate parallel phenotypic evolution in stickleback may be occurring through extensive, parallel genetic evolution at a genome-wide scale. Some of these genomic regions co-localize with previously identified QTL for stickleback phenotypic variation identified using laboratory mapping crosses. In addition, we have identified several novel regions showing parallel differentiation across independent populations. Annotation of these regions revealed numerous genes that are candidates for stickleback phenotypic evolution and will form the basis of future genetic analyses in this and other organisms. This study represents the first high-density SNP–based genome scan of genetic diversity and differentiation for populations of threespine stickleback in the wild. These data illustrate the complementary nature of laboratory crosses and population genomic scans by confirming the adaptive significance of previously identified genomic regions, elucidating the particular evolutionary and demographic history of such regions in natural

  11. Mining microsatellite markers from public expressed sequence tags databases for the study of threatened plants.

    PubMed

    Lopez, Lua; Barreiro, Rodolfo; Fischer, Markus; Koch, Marcus A

    2015-10-13

    Simple Sequence Repeats (SSRs) are widely used in population genetic studies but their classical development is costly and time-consuming. The ever-increasing available DNA datasets generated by high-throughput techniques offer an inexpensive alternative for SSR discovery. Expressed Sequence Tags (ESTs) have been widely used as an SSR source for plants of economic relevance but their application to non-model species is still modest. Here, we explored the use of publicly available ESTs (GenBank at the National Center for Biotechnology Information-NCBI) for SSR development in non-model plants, focusing on genera listed by the International Union for the Conservation of Nature (IUCN). We also searched two model genera with fully annotated genomes for EST-SSRs, Arabidopsis and Oryza, and used them as controls for genome distribution analyses. Overall, we downloaded 16 031 555 sequences for 258 plant genera which were mined for SSRs and their primers with the help of QDD1. Genome distribution analyses in Oryza and Arabidopsis were done by blasting the sequences with SSR against the Oryza sativa and Arabidopsis thaliana reference genomes implemented in the Basic Local Alignment Search Tool (BLAST) of the NCBI website. Finally, we performed an empirical test to determine the performance of our EST-SSRs in a few individuals from four species of two eudicot genera, Trifolium and Centaurea. We explored a total of 14 498 726 EST sequences from the dbEST database (NCBI) in 257 plant genera from the IUCN Red List. We identified a very large number (17 102) of ready-to-test EST-SSRs in most plant genera (193) at no cost. Overall, dinucleotide and trinucleotide repeats were the prevalent types but the abundance of the various types of repeat differed between taxonomic groups. Control genomes revealed that trinucleotide repeats were mostly located in coding regions while dinucleotide repeats were largely associated with untranslated regions. Our results from the empirical test revealed

  12. Construction of a Lotus japonicus late nodulin expressed sequence tag library and identification of novel nodule-specific genes.

    PubMed Central

    Szczyglowski, K; Hamburger, D; Kapranov, P; de Bruijn, F J

    1997-01-01

    A range of novel expressed sequence tags (ESTs) associated with late developmental events during nodule organogenesis in the legume Lotus japonicus were identified using mRNA differential display; 110 differentially displayed polymerase chain reaction products were cloned and analyzed. Of 88 unique cDNAs obtained, 22 shared significant homology to DNA/protein sequences in the respective databases. This group comprises, among others, a nodule-specific homolog of protein phosphatase 2C, a peptide transporter protein, and a nodule-specific form of cytochrome P450. RNA gel-blot analysis of 16 differentially displayed ESTs confirmed their nodule-specific expression pattern. The kinetics of mRNA accumulation of the majority of the ESTs analyzed were found to resemble the expression pattern observed for the L. japonicus leghemoglobin gene. These results indicate that the newly isolated molecular markers correspond to genes induced during late developmental stages of L. japonicus nodule organogenesis and provide important, novel tools for the study of nodulation. PMID:9276951

  13. Transposon Tc1-derived, sequence-tagged sites in Caenorhabditis elegans as markers for gene mapping

    PubMed Central

    Korswagen, Hendrik C.; Durbin, Richard M.; Smits, Miriam T.; Plasterk, Ronald H. A.

    1996-01-01

    We present an approach to map large numbers of Tc1 transposon insertions in the genome of Caenorhabditis elegans. Strains have been described that contain up to 500 polymorphic Tc1 insertions. From these we have cloned and shotgun sequenced over 2000 Tc1 flanks, resulting in an estimated set of 400 or more distinct Tc1 insertion alleles. Alignment of these sequences revealed a weak Tc1 insertion site consensus sequence that was symmetric around the invariant TA target site and reads CAYATATRTG. The Tc1 flanking sequences were compared with 40 Mbp of a C. elegans genome sequence. We found 151 insertions within the sequenced area, a density of ≈1 Tc1 insertion in every 265 kb. As the rest of the C. elegans genome sequence is obtained, remaining Tc1 alleles will fall into place. These mapped Tc1 insertions can serve two functions: (i) insertions in or near genes can be used to isolate deletion derivatives that have that gene mutated; and (ii) they represent a dense collection of polymorphic sequence-tagged sites. We demonstrate a strategy to use these Tc1 sequence-tagged sites in fine-mapping mutations. PMID:8962114
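
    Two statements in this abstract can be checked directly: that the consensus CAYATATRTG is symmetric around the central TA, and that 151 insertions in 40 Mbp correspond to roughly one insertion per 265 kb. The sketch below reads "symmetric" as "equal to its own reverse complement" under IUPAC ambiguity codes (Y = C or T, R = A or G), which is an interpretation rather than a definition given in the abstract; the helper names are illustrative.

```python
# Complements for the bases and the two IUPAC ambiguity codes used in the consensus.
IUPAC_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C", "R": "Y", "Y": "R", "N": "N"}


def reverse_complement(seq):
    """Reverse complement of a DNA string that may contain R, Y or N."""
    return "".join(IUPAC_COMPLEMENT[base] for base in reversed(seq.upper()))


def is_palindromic(seq):
    """True if the sequence equals its own reverse complement."""
    return seq.upper() == reverse_complement(seq)


consensus = "CAYATATRTG"
print(is_palindromic(consensus))  # True
print(consensus[4:6])             # TA, the invariant target site at the centre

# Density check: 40 Mbp of sequence containing 151 mapped insertions.
kb_per_insertion = 40_000_000 / 151 / 1000
print(round(kb_per_insertion, 1))  # 264.9, i.e. about one insertion per 265 kb
```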

  14. Expressed sequence tags related to nitrogen metabolism in maize inoculated with Azospirillum brasilense.

    PubMed

    Pereira-Defilippi, L; Pereira, E M; Silva, F M; Moro, G V

    2017-05-31

    The relative quantitative real-time expression of two expressed sequence tags (ESTs) coding for key enzymes in nitrogen metabolism in maize, nitrate reductase (ZmNR) and glutamine synthetase (ZmGln1-3), was measured for genotypes inoculated with Azospirillum brasilense. Two commercial single-cross hybrids (AG7098 and 2B707) and two experimental synthetic varieties (V2 and V4) were raised under controlled greenhouse conditions, in six treatment groups corresponding to different forms of inoculation and different levels of nitrogen application by top-dressing. The genotypes presented distinct responses to inoculation with A. brasilense. Increases in the expression of ZmNR were observed for the hybrids, while V4 only displayed a greater level of expression when the plants received nitrogenous fertilization by top-dressing and there was no inoculation. The expression of the ZmGln1-3 EST was induced by A. brasilense in the hybrids and the variety V4. In contrast, the variety V2 did not respond to inoculation.

  15. Identification of cut rose (Rosa hybrida) and rootstock varieties using robust sequence tagged microsatellite site markers.

    PubMed

    Esselink, G D; Smulders, M J M; Vosman, B

    2003-01-01

    In this study a DNA fingerprinting protocol was developed for the identification of rose varieties based on the variability of microsatellites. Microsatellites were isolated from Rosa hybrida L. using enriched small insert libraries. In total 24 polymorphic sequence tagged microsatellite site (STMS) markers with easily scorable allele profiles, from six different linkage groups, were used to characterize 46 Hybrid Tea varieties and 30 rootstock varieties belonging to different species (Rosa canina L., Rosa indica Thory., Rosa chinensis Jacq., Rosa rubiginosa L., and Rosa rubrifolia glauca Pour.). Clones and known flower color mutants were identified as being identical; all other varieties were differentiated by a unique pattern with as few as three STMS markers. The high discriminating power of the loci suggests that a selection of the most-robust STMS markers may be able to differentiate any two varieties within rootstocks or Hybrid Teas except for mutants. The selected STMS markers will be useful as a tool for reference collection management, for assessing essential derivation of varieties and illegal propagation.

  16. Parallel tagged amplicon sequencing of transcriptome-based genetic markers for Triturus newts with the Ion Torrent next-generation sequencing platform.

    PubMed

    Wielstra, B; Duijm, E; Lagler, P; Lammers, Y; Meilink, W R M; Ziermann, J M; Arntzen, J W

    2014-09-01

    Next-generation sequencing is a fast and cost-effective way to obtain sequence data for nonmodel organisms for many markers and for many individuals. We describe a protocol through which we obtain orthologous markers for the crested newts (Amphibia: Salamandridae: Triturus), suitable for analysis of interspecific hybridization. We use transcriptome data of a single Triturus species and design 96 primer pairs that amplify c. 180 bp fragments positioned in 3-prime untranslated regions. Next, these markers are tested with uniplex PCR for a set of species spanning the taxonomical width of the genus Triturus. The 52 markers that consistently show a single band of expected length at gel electrophoreses for all tested crested newt species are then amplified in five multiplex PCRs (with a plexity of ten or eleven) for 132 individual newts: a set of 84 representing the seven (candidate) species and a set of 48 from a presumed hybrid population. After pooling multiplexes per individual, unique tags are ligated to link amplicons to individuals. Subsequently, individuals are pooled equimolar and sequenced on the Ion Torrent next-generation sequencing platform. A bioinformatics pipeline identifies the alleles and recodes these to a genotypic format. Next, we test the utility of our markers. baps allocates the 84 crested newt individuals representing (candidate) species to their expected (candidate) species, confirming the markers are suitable for species delineation. newhybrids, a hybrid index and hiest confirm the 48 individuals from the presumed hybrid population to be genetically admixed, illustrating the potential of the markers to identify interspecific hybridization. We expect the set of markers we designed to provide a high resolving power for analysis of hybridization in Triturus.

  17. Parallel tagged amplicon sequencing of transcriptome-based genetic markers for Triturus newts with the Ion Torrent next-generation sequencing platform

    PubMed Central

    Wielstra, B; Duijm, E; Lagler, P; Lammers, Y; Meilink, W R M; Ziermann, J M; Arntzen, J W

    2014-01-01

    Next-generation sequencing is a fast and cost-effective way to obtain sequence data for nonmodel organisms for many markers and for many individuals. We describe a protocol through which we obtain orthologous markers for the crested newts (Amphibia: Salamandridae: Triturus), suitable for analysis of interspecific hybridization. We use transcriptome data of a single Triturus species and design 96 primer pairs that amplify c. 180 bp fragments positioned in 3-prime untranslated regions. Next, these markers are tested with uniplex PCR for a set of species spanning the taxonomical width of the genus Triturus. The 52 markers that consistently show a single band of expected length at gel electrophoreses for all tested crested newt species are then amplified in five multiplex PCRs (with a plexity of ten or eleven) for 132 individual newts: a set of 84 representing the seven (candidate) species and a set of 48 from a presumed hybrid population. After pooling multiplexes per individual, unique tags are ligated to link amplicons to individuals. Subsequently, individuals are pooled equimolar and sequenced on the Ion Torrent next-generation sequencing platform. A bioinformatics pipeline identifies the alleles and recodes these to a genotypic format. Next, we test the utility of our markers. baps allocates the 84 crested newt individuals representing (candidate) species to their expected (candidate) species, confirming the markers are suitable for species delineation. newhybrids, a hybrid index and hiest confirm the 48 individuals from the presumed hybrid population to be genetically admixed, illustrating the potential of the markers to identify interspecific hybridization. We expect the set of markers we designed to provide a high resolving power for analysis of hybridization in Triturus. PMID:24571307

  18. A Scalable Epitope Tagging Approach for High Throughput ChIP-Seq Analysis.

    PubMed

    Xiong, Xiong; Zhang, Yanxiao; Yan, Jian; Jain, Surbhi; Chee, Sora; Ren, Bing; Zhao, Huimin

    2017-06-16

    Eukaryotic transcriptional factors (TFs) typically recognize short genomic sequences alone or together with other proteins to modulate gene expression. Mapping of TF-DNA interactions in the genome is crucial for understanding the gene regulatory programs in cells. While chromatin immunoprecipitation followed by sequencing (ChIP-Seq) is commonly used for this purpose, its application is severely limited by the availability of suitable antibodies for TFs. To overcome this limitation, we developed an efficient and scalable strategy named cmChIP-Seq that combines the clustered regularly interspaced short palindromic repeats (CRISPR) technology with microhomology mediated end joining (MMEJ) to genetically engineer a TF with an epitope tag. We demonstrated the utility of this tool by applying it to four TFs in a human colorectal cancer cell line. The highly scalable procedure makes this strategy ideal for ChIP-Seq analysis of TFs in diverse species and cell types.

  19. Rediscovering medicinal plants' potential with OMICS: microsatellite survey in expressed sequence tags of eleven traditional plants with potent antidiabetic properties.

    PubMed

    Sahu, Jagajjit; Sen, Priyabrata; Choudhury, Manabendra Dutta; Dehury, Budheswar; Barooah, Madhumita; Modi, Mahendra Kumar; Talukdar, Anupam Das

    2014-05-01

    Herbal medicines and traditionally used medicinal plants present an untapped potential for novel molecular target discovery using systems science and OMICS biotechnology driven strategies. Since up to 40% of the world's poor people have no access to government health services, traditional and folk medicines are often the only therapeutics available to them. In this vein, North East (NE) India is recognized for its rich bioresources. As part of the Indo-Burma hotspot, it is regarded as an epicenter of biodiversity for several plants having myriad traditional uses, including medicinal use. However, the improvement of these valuable bioresources through molecular breeding strategies, for example, using genic microsatellites or Simple Sequence Repeats (SSRs) or Expressed Sequence Tags (ESTs)-derived SSRs has not been fully utilized in large scale to date. In this study, we identified a total of 47,700 microsatellites from 109,609 ESTs of 11 medicinal plants (pineapple, papaya, noyontara, bitter orange, bermuda grass, ratalu, barbados nut, mango, mulberry, lotus, and guduchi) having proven antidiabetic properties. A total of 58,159 primer pairs were designed for the non-redundant 8060 SSR-positive ESTs and putative functions were assigned to 4483 unique contigs. Among the identified microsatellites, excluding mononucleotide repeats, di-/trinucleotides are predominant, among which repeat motifs of AG/CT and AAG/CTT were most abundant. Similarity search of SSR containing ESTs and antidiabetic gene sequences revealed 11 microsatellites linked to antidiabetic genes in five plants. GO term enrichment analysis revealed a total of 80 enriched GO terms widely distributed in 53 biological processes, 17 molecular functions, and 10 cellular components associated with the 11 markers. The present study therefore provides concrete insights into the frequency and distribution of SSRs in important medicinal resources. The microsatellite markers reported here markedly add to the genetic

  20. Rediscovering Medicinal Plants' Potential with OMICS: Microsatellite Survey in Expressed Sequence Tags of Eleven Traditional Plants with Potent Antidiabetic Properties

    PubMed Central

    Sahu, Jagajjit; Sen, Priyabrata; Choudhury, Manabendra Dutta; Dehury, Budheswar; Barooah, Madhumita; Modi, Mahendra Kumar

    2014-01-01

    Herbal medicines and traditionally used medicinal plants present an untapped potential for novel molecular target discovery using systems science and OMICS biotechnology driven strategies. Since up to 40% of the world's poor people have no access to government health services, traditional and folk medicines are often the only therapeutics available to them. In this vein, North East (NE) India is recognized for its rich bioresources. As part of the Indo-Burma hotspot, it is regarded as an epicenter of biodiversity for several plants having myriad traditional uses, including medicinal use. However, the improvement of these valuable bioresources through molecular breeding strategies, for example, using genic microsatellites or Simple Sequence Repeats (SSRs) or Expressed Sequence Tags (ESTs)-derived SSRs has not been fully utilized in large scale to date. In this study, we identified a total of 47,700 microsatellites from 109,609 ESTs of 11 medicinal plants (pineapple, papaya, noyontara, bitter orange, bermuda grass, ratalu, barbados nut, mango, mulberry, lotus, and guduchi) having proven antidiabetic properties. A total of 58,159 primer pairs were designed for the non-redundant 8060 SSR-positive ESTs and putative functions were assigned to 4483 unique contigs. Among the identified microsatellites, excluding mononucleotide repeats, di-/trinucleotides are predominant, among which repeat motifs of AG/CT and AAG/CTT were most abundant. Similarity search of SSR containing ESTs and antidiabetic gene sequences revealed 11 microsatellites linked to antidiabetic genes in five plants. GO term enrichment analysis revealed a total of 80 enriched GO terms widely distributed in 53 biological processes, 17 molecular functions, and 10 cellular components associated with the 11 markers. The present study therefore provides concrete insights into the frequency and distribution of SSRs in important medicinal resources. The microsatellite markers reported here markedly add to

  1. Single nucleotide polymorphism discovery from expressed sequence tags in the waterflea Daphnia magna

    PubMed Central

    2011-01-01

    Background: Daphnia (Crustacea: Cladocera) plays a central role in standing aquatic ecosystems, has a well known ecology and is widely used in population studies and environmental risk assessments. Daphnia magna is, especially in Europe, intensively used to study stress responses of natural populations to pollutants, climate change, and antagonistic interactions with predators and parasites, which have all been demonstrated to induce micro-evolutionary and adaptive responses. Although its ecology and evolutionary biology are intensively studied, little is known about the functional genomics underpinning phenotypic responses to environmental stressors. The aim of the present study was to find genes expressed in the presence of environmental stressors, and target such genes for single nucleotide polymorphism (SNP) marker development. Results: We developed three expressed sequence tag (EST) libraries using clonal lineages of D. magna exposed to ecological stressors, namely fish predation, parasite infection and pesticide exposure. We used these newly developed ESTs and other Daphnia ESTs retrieved from NCBI GenBank to mine for SNP markers targeting synonymous as well as non-synonymous genetic variation. We validated the developed SNPs in six natural populations of D. magna distributed at regional scale. Conclusions: A large proportion (47%) of the produced ESTs are Daphnia lineage specific genes, which are potentially involved in responses to environmental stress rather than to general cellular functions and metabolic activities, or reflect the arthropod's aquatic lifestyle. The characterization of genes expressed under stress and the validation of their SNPs for population genetic study is important for identifying ecologically responsive genes in D. magna. PMID:21668940

  2. Changes on microsatellites of expressed sequence tag of sugarcane (Saccharum spp) during vegetative propagation.

    PubMed

    Augusto, R; Maranho, R C; Mangolin, C A; Filho, J C Bespalhok; Machado, M F P S

    2017-03-08

    The reduction in sugarcane productivity in subsequent cutting stages may be related to a gradual decrease of the allele number and mean observed heterozygosity (HO) in the sugarcane ratoon. This hypothesis was tested by assessing the number of alleles and HO values in 10 expressed sequence tag microsatellites (Est-SSR loci) of the sugarcane varieties RB72454 and RB867515 in different cutting stages. Changes of allele numbers in samples of different cutting stages were observed in seven and six Est-SSR loci of the RB72454 and RB867515 varieties, respectively. Reduction of allele numbers was observed in the samples collected in the fourth and sixth cutting stages of the RB72454 variety. In contrast, an increase of the allele numbers was detected in the samples collected in the fourth, sixth, and seventh cutting stages of the RB867515 variety. Unchanged allele numbers were observed only in EstB41, EstC84, and EstB130 loci of the RB72454 variety, and EstB41, EstC67, EstA68, and EstB130 loci of the RB867515 variety. The variety RB867515 has lower polymorphism and values of HO than the RB72454 variety in different stages of cutting. At the molecular level, in Est-SSR loci, the RB72454 variety showed greater changes in subsequent stages of cutting than RB867515. The similarities and divergences at the molecular level between varieties RB72454 and RB867515 observed in the 10 Est-SSR loci during subsequent cutting stages cannot explain the reduced productivity frequently observed after subsequent cutting stages but showed that phenotypic and physiological changes after each cutting stage are also accompanied by changes at the genomic level.

  3. DeNovoID: a web-based tool for identifying peptides from sequence and mass tags deduced from de novo peptide sequencing by mass spectroscopy.

    PubMed

    Halligan, Brian D; Ruotti, Victor; Twigger, Simon N; Greene, Andrew S

    2005-07-01

    One of the core activities of high-throughput proteomics is the identification of peptides from mass spectra. Some peptides can be identified using spectral matching programs like Sequest or Mascot, but many spectra do not produce high quality database matches. De novo peptide sequencing is an approach to determine partial peptide sequences for some of the unidentified spectra. A drawback of de novo peptide sequencing is that it produces a series of ordered and disordered sequence tags and mass tags rather than a complete, non-degenerate peptide amino acid sequence. This incomplete data is difficult to use in conventional search programs such as BLAST or FASTA. DeNovoID is a program that has been specifically designed to use degenerate amino acid sequence and mass data derived from MS experiments to search a peptide database. Since the algorithm employed depends on the amino acid composition of the peptide and not its sequence, DeNovoID does not have to consider all possible sequences, but rather a smaller number of compositions consistent with a spectrum. DeNovoID also uses a geometric indexing scheme that reduces the number of calculations required to determine the best peptide match in the database. DeNovoID is available at http://proteomics.mcw.edu/denovoid.
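
    The advantage of searching by composition rather than by sequence can be illustrated with a toy lookup in which every peptide is indexed by its sorted residues, so that a disordered set of residues from a spectrum matches any peptide with that composition regardless of order. This is not DeNovoID's actual algorithm or its geometric indexing scheme; the function names and the peptide list are hypothetical.

```python
from collections import defaultdict


def composition_key(residues):
    """Order-independent key: the residues sorted alphabetically."""
    return "".join(sorted(residues.upper()))


def build_index(peptides):
    """Map each composition key to the peptides that share that composition."""
    index = defaultdict(list)
    for peptide in peptides:
        index[composition_key(peptide)].append(peptide)
    return index


def lookup(index, residues):
    """Return peptides whose composition equals the given (unordered) residues."""
    return index.get(composition_key(residues), [])


# Hypothetical peptide database.
database = ["PEPTIDE", "EDITPEP", "SAMPLER"]
index = build_index(database)
print(lookup(index, "TPDEIPE"))  # ['PEPTIDE', 'EDITPEP']
```

    One key covers every ordering of the same residues, which is why a composition-based search needs to consider far fewer candidates than one that enumerates possible sequences.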

  4. Barcoded DNA-Tag Reporters for Multiplex Cis-Regulatory Analysis

    PubMed Central

    Nam, Jongmin; Davidson, Eric H.

    2012-01-01

    Cis-regulatory DNA sequences causally mediate patterns of gene expression, but efficient experimental analysis of these control systems has remained challenging. Here we develop a new version of “barcoded” DNA-tag reporters, “Nanotags”, that permit simultaneous quantitative analysis of up to 130 distinct cis-regulatory modules (CRMs). The activities of these reporters are measured in single experiments by the NanoString RNA counting method and other quantitative procedures. We demonstrate the efficiency of the Nanotag method by simultaneously measuring hourly temporal activities of 126 CRMs from 46 genes in the developing sea urchin embryo, otherwise a virtually impossible task. Nanotags are also used in gene perturbation experiments to reveal cis-regulatory responses of many CRMs at once. Nanotag methodology can be applied to many research areas, ranging from gene regulatory networks to functional and evolutionary genomics. PMID:22563420

  5. Analysis of elite variety tag SNPs reveals an important allele in upland rice.

    PubMed

    Lyu, Jun; Zhang, Shilai; Dong, Yang; He, Weiming; Zhang, Jing; Deng, Xianneng; Zhang, Yesheng; Li, Xin; Li, Baoye; Huang, Wangqi; Wan, Wenting; Yu, Yang; Li, Qiong; Li, Jun; Liu, Xin; Wang, Bo; Tao, Dayun; Zhang, Gengyun; Wang, Jun; Xu, Xun; Hu, Fengyi; Wang, Wen

    2013-01-01

    Elite crop varieties usually fix alleles that occur at low frequencies within non-elite gene pools. Dissecting these alleles for desirable agronomic traits can be accomplished by comparing the genomes of elite varieties with those from non-elite populations. Here we deep-sequence six elite rice varieties and use two large control panels to identify elite variety tag single-nucleotide polymorphism alleles (ETASs). Guided by this preliminary analysis, we comprehensively characterize one protein-altering ETAS in the 9-cis-epoxycarotenoid dioxygenase gene of the IRAT104 upland rice variety. This allele displays a drastic frequency difference between upland and irrigated rice, and a selective sweep is observed around this allele. Functional analysis indicates that in upland rice, this allele is associated with significantly higher abscisic acid levels and denser lateral roots, suggesting its association with upland rice suitability. This report provides a potential strategy to mine rare, agronomically important alleles.

  6. A direct method for regiospecific analysis of TAG using alpha-MAG.

    PubMed

    Turon, F; Bachain, P; Caro, Y; Pina, M; Graille, J

    2002-08-01

    An analytical procedure was developed for regiodistribution analysis of TAG using alpha-MAG prepared by an ethyl magnesium bromide deacylation. In the present communication, the deacylation procedure is shown to lead to representative alpha-MAG, allowing the composition of the native TAG in the alpha-position to be determined directly. The composition in the beta-position can then be estimated from the composition of the alpha-MAG and TAG according to the formula 3 x TAG - 2 x alpha-MAG. The estimates are superior to those obtained using the alpha,beta-DAG and Brockerhoff calculations as they come closer to the theoretical value and have smaller SD. The present procedure, first demonstrated on a synthetic TAG, was then successfully applied to the analysis of borage oil, milkfat, and tuna oil.
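
    The formula 3 x TAG - 2 x alpha-MAG follows from the fact that a triacylglycerol has two alpha positions and one beta position, so for each fatty acid TAG = (2 x alpha + beta) / 3. A minimal sketch of the calculation is given below; the function name is illustrative and the mol% values are invented, not taken from the paper.

```python
def beta_composition(tag, alpha_mag):
    """Estimate the beta-position composition (mol%) for each fatty acid.

    tag       -- fatty acid composition of the intact TAG, mol%
    alpha_mag -- fatty acid composition of the alpha-MAG fraction, mol%
    """
    return {fa: 3 * tag[fa] - 2 * alpha_mag.get(fa, 0.0) for fa in tag}


# Hypothetical compositions in mol%; each column sums to 100.
tag = {"16:0": 30.0, "18:1": 50.0, "18:2": 20.0}
alpha_mag = {"16:0": 40.0, "18:1": 40.0, "18:2": 20.0}

beta = beta_composition(tag, alpha_mag)
print(beta)  # {'16:0': 10.0, '18:1': 70.0, '18:2': 20.0}
```

    With measured data, small negative estimates can arise from analytical error, so results near zero should be read with care.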

  7. Chasing Migration Genes: A Brain Expressed Sequence Tag Resource for Summer and Migratory Monarch Butterflies (Danaus plexippus)

    PubMed Central

    Zhu, Haisun; Casselman, Amy; Reppert, Steven M.

    2008-01-01

    North American monarch butterflies (Danaus plexippus) undergo a spectacular fall migration. In contrast to summer butterflies, migrants are juvenile hormone (JH) deficient, which leads to reproductive diapause and increased longevity. Migrants also utilize time-compensated sun compass orientation to help them navigate to their overwintering grounds. Here, we describe a brain expressed sequence tag (EST) resource to identify genes involved in migratory behaviors. A brain EST library was constructed from summer and migrating butterflies. Of 9,484 unique sequences, 6068 had positive hits with the non-redundant protein database; the EST database likely represents ∼52% of the gene-encoding potential of the monarch genome. The brain transcriptome was cataloged using Gene Ontology and compared to Drosophila. Monarch genes were well represented, including those implicated in behavior. Three genes involved in increased JH activity (allatotropin, juvenile hormone acid methyltransferase, and takeout) were upregulated in summer butterflies, compared to migrants. The locomotion-relevant turtle gene was marginally upregulated in migrants, while the foraging and single-minded genes were not differentially regulated. Many of the genes important for the monarch circadian clock mechanism (involved in sun compass orientation) were in the EST resource, including the newly identified cryptochrome 2. The EST database also revealed a novel Na+/K+ ATPase allele predicted to be more resistant to the toxic effects of milkweed than that reported previously. Potential genetic markers were identified from 3,486 EST contigs and included 1599 double-hit single nucleotide polymorphisms (SNPs) and 98 microsatellite polymorphisms. These data provide a template of the brain transcriptome for the monarch butterfly. Our “snap-shot” analysis of the differential regulation of candidate genes between summer and migratory butterflies suggests that unbiased, comprehensive transcriptional profiling

  8. Chasing migration genes: a brain expressed sequence tag resource for summer and migratory monarch butterflies (Danaus plexippus).

    PubMed

    Zhu, Haisun; Casselman, Amy; Reppert, Steven M

    2008-01-09

    North American monarch butterflies (Danaus plexippus) undergo a spectacular fall migration. In contrast to summer butterflies, migrants are juvenile hormone (JH) deficient, which leads to reproductive diapause and increased longevity. Migrants also utilize time-compensated sun compass orientation to help them navigate to their overwintering grounds. Here, we describe a brain expressed sequence tag (EST) resource to identify genes involved in migratory behaviors. A brain EST library was constructed from summer and migrating butterflies. Of 9,484 unique sequences, 6,068 had positive hits with the non-redundant protein database; the EST database likely represents approximately 52% of the gene-encoding potential of the monarch genome. The brain transcriptome was cataloged using Gene Ontology and compared to Drosophila. Monarch genes were well represented, including those implicated in behavior. Three genes involved in increased JH activity (allatotropin, juvenile hormone acid methyltransferase, and takeout) were upregulated in summer butterflies, compared to migrants. The locomotion-relevant turtle gene was marginally upregulated in migrants, while the foraging and single-minded genes were not differentially regulated. Many of the genes important for the monarch circadian clock mechanism (involved in sun compass orientation) were in the EST resource, including the newly identified cryptochrome 2. The EST database also revealed a novel Na+/K+ ATPase allele predicted to be more resistant to the toxic effects of milkweed than that reported previously. Potential genetic markers were identified from 3,486 EST contigs and included 1,599 double-hit single nucleotide polymorphisms (SNPs) and 98 microsatellite polymorphisms. These data provide a template of the brain transcriptome for the monarch butterfly. Our "snap-shot" analysis of the differential regulation of candidate genes between summer and migratory butterflies suggests that unbiased, comprehensive transcriptional

  9. Metagenomic 16S rDNA Illumina tags are a powerful alternative to amplicon sequencing to explore diversity and structure of microbial communities.

    PubMed

    Logares, Ramiro; Sunagawa, Shinichi; Salazar, Guillem; Cornejo-Castillo, Francisco M; Ferrera, Isabel; Sarmento, Hugo; Hingamp, Pascal; Ogata, Hiroyuki; de Vargas, Colomban; Lima-Mendez, Gipsi; Raes, Jeroen; Poulain, Julie; Jaillon, Olivier; Wincker, Patrick; Kandels-Lewis, Stefanie; Karsenti, Eric; Bork, Peer; Acinas, Silvia G

    2014-09-01

    Sequencing of 16S rDNA polymerase chain reaction (PCR) amplicons is the most common approach for investigating environmental prokaryotic diversity, despite the known biases introduced during PCR. Here we show that 16S rDNA fragments derived from Illumina-sequenced environmental metagenomes (mi tags) are a powerful alternative to 16S rDNA amplicons for investigating the taxonomic diversity and structure of prokaryotic communities. As part of the Tara Oceans global expedition, marine plankton was sampled in three locations, resulting in 29 subsamples for which metagenomes were produced by shotgun Illumina sequencing (ca. 700 Gb). For comparative analyses, a subset of samples was also selected for Roche-454 sequencing using both shotgun (m454 tags; 13 metagenomes, ca. 2.4 Gb) and 16S rDNA amplicon (454 tags; ca. 0.075 Gb) approaches. Our results indicate that by overcoming PCR biases related to amplification and primer mismatch, mi tags may provide more realistic estimates of community richness and evenness than amplicon 454 tags. In addition, mi tags can capture expected beta diversity patterns. Using mi tags is now economically feasible given the dramatic reduction in high-throughput sequencing costs, having the advantage of retrieving simultaneously both taxonomic (Bacteria, Archaea and Eukarya) and functional information from the same microbial community.

  10. Expressed sequence tags (ESTs) from immune tissues of turbot (Scophthalmus maximus) challenged with pathogens

    PubMed Central

    Pardo, Belén G; Fernández, Carlos; Millán, Adrián; Bouza, Carmen; Vázquez-López, Araceli; Vera, Manuel; Alvarez-Dios, José A; Calaza, Manuel; Gómez-Tato, Antonio; Vázquez, María; Cabaleiro, Santiago; Magariños, Beatriz; Lemos, Manuel L; Leiro, José M; Martínez, Paulino

    2008-01-01

    Background: The turbot (Scophthalmus maximus; Scophthalmidae; Pleuronectiformes) is a flatfish species of great relevance for marine aquaculture in Europe. In contrast to other cultured flatfish, very few genomic resources are available in this species. Aeromonas salmonicida and Philasterides dicentrarchi are two pathogens that affect turbot culture, causing serious economic losses to the turbot industry. Little is known about the molecular mechanisms for disease resistance and host-pathogen interactions in this species. In this work, thousands of ESTs for functional genomic studies and potential markers linked to ESTs for mapping (microsatellites and single nucleotide polymorphisms (SNPs)) are provided. This information enabled us to obtain a preliminary view of regulated genes in response to these pathogens and it constitutes the basis for subsequent and more accurate microarray analysis. Results: A total of 12,584 cDNAs partially sequenced from three different cDNA libraries of turbot (Scophthalmus maximus) infected with Aeromonas salmonicida, Philasterides dicentrarchi and from healthy fish were analyzed. Three immune-relevant tissues (liver, spleen and head kidney) were sampled at several time points in the infection process for library construction. The sequences were processed into 9,256 high-quality sequences, which constituted the source for the turbot EST database. Clustering and assembly of these sequences revealed 3,482 different putative transcripts: 1,073 contigs and 2,409 singletons. BLAST searches with public databases detected significant similarity (e-value ≤ 1e-5) in 1,766 (50.7%) sequences and 816 of them (23.4%) could be functionally annotated. Two hundred three of these genes (24.9%), encoding defence/immune-related proteins, were mostly identified for the first time in turbot. Some ESTs showed significant differences in the number of transcripts when comparing the three libraries, suggesting regulation in response to these pathogens.
A total of

  11. Phosphorylation of serine residues in histidine-tag sequences attached to recombinant protein kinases: a cause of heterogeneity in mass and complications in function.

    PubMed

    Du, Ping; Loulakis, Pat; Luo, Chun; Mistry, Anil; Simons, Samuel P; LeMotte, Peter K; Rajamohan, Francis; Rafidi, Kristina; Coleman, Kevin G; Geoghegan, Kieran F; Xie, Zhi

    2005-12-01

    High-level recombinant expression of protein kinases in eukaryotic cells or Escherichia coli commonly gives products that are phosphorylated by autocatalysis or by the action of endogenous kinases. Here, we report that phosphorylation occurred on serine residues adjacent to hexahistidine affinity tags (His-tags) derived from several commercial expression vectors and fused to overexpressed kinases. The result was observed with a variety of recombinant kinases expressed in either insect cells or E. coli. Multiple phosphorylations of His-tagged full-length Aurora A, a protein serine/threonine kinase, were detected by mass spectrometry when it was expressed in insect cells in the presence of okadaic acid, a protein phosphatase inhibitor. Peptide mapping by liquid chromatography-mass spectrometry detected phosphorylations on all three serine residues in an N-terminal tag, alpha-N-acetyl-MHHHHHHSSGLPRGS. The same sequence was also phosphorylated, but only at a low level, when a His-tagged protein tyrosine kinase, Pyk2 was expressed in insect cells and activated in vitro. When catalytic domains of Aurora A and several other protein serine/threonine kinases were expressed in E. coli, serines in the affinity tag sequence GSSHHHHHHSSGLVPRGS were also variably phosphorylated. His-Aurora A with hyperphosphorylation of the serine residues in the tag aggregated and resisted thrombin-catalyzed removal of the tag. Treatment with alkaline phosphatase partly restored sensitivity to thrombin. The same His-tag sequence was also detected bearing alpha-N-d-gluconoylation in addition to multiple phosphorylations. The results show that histidine-tag sequences can receive complicated posttranslational modification, and that the hyperphosphorylation and resulting heterogeneity of the recombinant fusion proteins can interfere with downstream applications.

  12. Massive parallel insertion site sequencing of an arrayed Sinorhizobium meliloti signature-tagged mini-Tn5 transposon mutant library.

    PubMed

    Serrania, Javier; Johner, Tobias; Rupp, Oliver; Goesmann, Alexander; Becker, Anke

    2017-02-21

    Transposon mutagenesis in conjunction with identification of genomic transposon insertion sites is a powerful tool for gene function studies. We have implemented a protocol for parallel determination of transposon insertion sites by Illumina sequencing involving a hierarchical barcoding method that allowed for tracking back insertion sites to individual clones of an arrayed signature-tagged transposon mutant library. This protocol was applied to further characterize a signature-tagged mini-Tn5 mutant library comprising about 12,000 mutants of the symbiotic nitrogen-fixing alphaproteobacterium Sinorhizobium meliloti (Pobigaylo et al., 2006; Appl. Environ. Microbiol. 72, 4329-4337). Previously, insertion sites have been determined for 5000 mutants of this library. Combining an adapter-free, inverse PCR method for sequencing library preparation with next generation sequencing, we identified 4473 novel insertion sites, increasing the total number of transposon mutants with known insertion site to 9562. The number of protein-coding genes that were hit at least once by a transposon increased by 1231 to a total number of 3673 disrupted genes, which represents 59% of the predicted protein-coding genes in S. meliloti.

  13. Miniaturised wireless smart tag for optical chemical analysis applications.

    PubMed

    Steinberg, Matthew D; Kassal, Petar; Tkalčec, Biserka; Murković Steinberg, Ivana

    2014-01-01

    A novel miniaturised photometer has been developed as an ultra-portable and mobile analytical chemical instrument. The low-cost photometer presents a paradigm shift in mobile chemical sensor instrumentation because it is built around a contactless smart card format. The photometer tag is based on the radio-frequency identification (RFID) smart card system, which provides short-range wireless data and power transfer between the photometer and a proximal reader, and which allows the reader to also energise the photometer by near field electromagnetic induction. RFID is set to become a key enabling technology of the Internet-of-Things (IoT), hence devices such as the photometer described here will enable numerous mobile, wearable and vanguard chemical sensing applications in the emerging connected world. In the work presented here, we demonstrate the characterisation of a low-power RFID wireless sensor tag with an LED/photodiode-based photometric input. The performance of the wireless photometer has been tested through two different model analytical applications. The first is photometry in solution, where colour intensity as a function of dye concentration was measured. The second is an ion-selective optode system in which potassium ion concentrations were determined by using previously well characterised bulk optode membranes. The analytical performance of the wireless photometer smart tag is clearly demonstrated by these optical absorption-based analytical experiments, with excellent data agreement to a reference laboratory instrument. © 2013 Elsevier B.V. All rights reserved.

  14. Construction of a yeast artificial chromosome contig spanning the pseudoautosomal region and isolation of 25 new sequence-tagged sites

    SciTech Connect

    Slim, R. (Laboratoire de Cytogenetique et Genetique Oncologiques, Villejuif); Le Paslier, D.; Ougen, P.; Billault, A.; Cohen, D.; Compain, S.; Levilliers, J.; Mintz, L.; Weissenbach, J.; Petit, C.

    1993-06-01

    Thirty-one yeast artificial chromosomes (YACs) from the human pseudoautosomal region were identified by a combination of sequence-tagged site (STS) screenings and colony hybridizations, using a subtelomeric interspersed repetitive element mapping predominantly to the pseudoautosomal region. Twenty-five new pseudoautosomal STSs were generated, of which 4 detected restriction fragment length polymorphisms. A total of 33 STSs were used to assemble the 31 YACs into a single contiguous set of overlapping DNA fragments spanning at least 2.3 megabases of the pseudoautosomal region. In addition, four pseudoautosomal genes including hydroxyindole O-methyltransferase have been positioned on this set of fragments. 48 refs., 1 fig., 3 tabs.

  15. Large-scale identification of odorant-binding proteins and chemosensory proteins from expressed sequence tags in insects

    PubMed Central

    2009-01-01

    Background: Insect odorant binding proteins (OBPs) and chemosensory proteins (CSPs) play an important role in chemical communication of insects. Gene discovery of these proteins is a time-consuming task. In recent years, expressed sequence tags (ESTs) of many insect species have accumulated, thus providing a useful resource for gene discovery. Results: We have developed a computational pipeline to identify OBP and CSP genes from insect ESTs. In total, 752,841 insect ESTs were examined from 54 species covering eight Orders of Insecta. From these ESTs, 142 OBPs and 177 CSPs were identified, of which 117 OBPs and 129 CSPs are new. The complete open reading frames (ORFs) of 88 OBPs and 123 CSPs were obtained by electronic elongation. We randomly chose 26 OBPs from eight species of insects, and 21 CSPs from four species for RT-PCR validation. Twenty-two OBPs and 16 CSPs were confirmed by RT-PCR, supporting the efficiency and reliability of the algorithm. Together with all family members obtained from the NCBI (OBPs) or the UniProtKB (CSPs), 850 OBPs and 237 CSPs were analyzed for their structural characteristics and evolutionary relationship. Conclusions: A large number of new OBPs and CSPs were found, providing the basis for deeper understanding of these proteins. In addition, the conserved motif and evolutionary analysis provide some new insights into the evolution of insect OBPs and CSPs. Motif patterns fine-tune the functions of OBPs and CSPs, leading to minor differences in binding sex pheromones or plant volatiles in different insect Orders. PMID:20034407

  16. Analytic signal phase-based myocardial motion estimation in tagged MRI sequences by a bilinear model and motion compensation.

    PubMed

    Wang, Liang; Basarab, Adrian; Girard, Patrick R; Croisille, Pierre; Clarysse, Patrick; Delachartre, Philippe

    2015-08-01

    Different mathematical tools, such as multidimensional analytic signals, allow for the calculation of 2D spatial phases of real-value images. The motion estimation method proposed in this paper is based on two spatial phases of the 2D analytic signal applied to cardiac sequences. By combining the information of these phases issued from analytic signals of two successive frames, we propose an analytical estimator for 2D local displacements. To improve the accuracy of the motion estimation, a local bilinear deformation model is used within an iterative estimation scheme. The main advantages of our method are: (1) The phase-based method allows the displacement to be estimated with subpixel accuracy and is robust to image intensity variation in time; (2) Preliminary filtering is not required due to the bilinear model. The proposed algorithm, integrating phase-based optical flow motion estimation and the combination of global motion compensation with local bilinear transform, allows spatio-temporal cardiac motion analysis, e.g. strain and dense trajectory estimation over the cardiac cycle. Results from 7 realistic simulated tagged magnetic resonance imaging (MRI) sequences show that our method is more accurate than a state-of-the-art method for cardiac motion analysis and another differential approach from the literature. The motion estimation errors (end point error) of the proposed method are reduced by about 33% compared with those of the two methods. In our work, the frame-to-frame displacements are further accumulated in time, to allow for the calculation of myocardial Lagrangian cardiac strains and point trajectories. Indeed, from the estimated trajectories in time on 11 in vivo data sets (9 patients and 2 healthy volunteers), the myocardial point trajectories belonging to pathological regions are clearly reduced in magnitude compared with the ones from normal regions. Myocardial point trajectories, estimated from our phase-based analytic

  17. Gene expression profiling of coelomic cells and discovery of immune-related genes in the earthworm, Eisenia andrei, using expressed sequence tags.

    PubMed

    Tak, Eun Sik; Cho, Sung-Jin; Park, Soon Cheol

    2015-01-01

    The coelomic cells of the earthworm consist of leukocytes, chloragocytes, and coelomocytes, which play an important role in innate immunity reactions. To gain insight into the expression profiles of coelomic cells of the earthworm, Eisenia andrei, we analyzed 1151 expressed sequence tags (ESTs) derived from the cDNA library of the coelomic cells. Among the 1151 ESTs analyzed, 493 ESTs (42.8%) showed a significant similarity to known genes and represented 164 unique genes, of which 93 ESTs were singletons and 71 ESTs manifested as two or more ESTs. From the 164 unique genes sequenced, we found 24 immune-related and cell defense genes. Furthermore, real-time PCR analysis showed that mRNA levels of lysenin-related proteins in coelomic cells of E. andrei were upregulated after the injection of Bacillus subtilis bacteria. This EST data set provides a valuable resource for future research on the earthworm immune system.

  18. Existence of microsatellites in expressed sequence tags of common carp ( Cyprinus carpio L.) available in GenBank dbEST database

    NASA Astrophysics Data System (ADS)

    Hu, Jingjie; Wang, Xiaolong; Hu, Xiaoli; Bao, Zhenmin

    2006-01-01

    Common carp expressed sequence tags (ESTs) were analyzed for the existence of microsatellites, or simple sequence repeats (SSRs). In the NCBI dbEST database, a total of 10612 sequences were registered before December 31, 2004. A complete search of 2-6 nucleotide microsatellites resulted in the identification of 513 SSR-containing ESTs, accounting for 4.8% of the total. Cluster analysis indicated that 73 sequences of SSR-containing ESTs fell into 27 groups and the remaining 440 ESTs were independent. A total of 467 unique SSR-containing ESTs were identified. These EST-SSRs contained a variety of simple sequence types, and di- and tri-nucleotide repeats were the most abundant, accounting for 42.1% and 27.9% of the whole, respectively. Of the dinucleotide repeats, CA/TG was the most abundant, followed by GA/TC. BLASTx search showed that 38.1% of the SSR loci could be associated with genes or proteins of known or unknown function. BLASTx searches of SSR-containing ESTs also showed high frequencies (98/179) of hits on zebrafish sequences.

  19. Existence of microsatellites in expressed sequence tags of common carp ( Cyprinus carpio L.) available in GenBank dbEST database

    NASA Astrophysics Data System (ADS)

    Jingjie, Hu; Xiaolong, Wang; Xiaoli, Hu; Zhenmin, Bao

    2006-01-01

    Common carp expressed sequence tags (ESTs) were analyzed for the existence of microsatellites, or simple sequence repeats (SSRs). In the NCBI dbEST database, a total of 10612 sequences were registered before December 31, 2004. A complete search of 2-6 nucleotide microsatellites resulted in the identification of 513 SSR-containing ESTs, accounting for 4.8% of the total. Cluster analysis indicated that 73 sequences of SSR-containing ESTs fell into 27 groups and the remaining 440 ESTs were independent. A total of 467 unique SSR-containing ESTs were identified. These EST-SSRs contained a variety of simple sequence types, and di- and tri-nucleotide repeats were the most abundant, accounting for 42.1% and 27.9% of the whole, respectively. Of the dinucleotide repeats, CA/TG was the most abundant, followed by GA/TC. BLASTx search showed that 38.1% of the SSR loci could be associated with genes or proteins of known or unknown function. BLASTx searches of SSR-containing ESTs also showed high frequencies (98/179) of hits on zebrafish sequences.
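
    The search for 2-6 nucleotide microsatellites described in the abstract above can be illustrated with a short Python sketch. The abstract does not state the minimum repeat numbers or the software used, so the thresholds in MIN_REPEATS and the function names are assumptions chosen only for illustration.

```python
import re

# Hypothetical thresholds (not from the abstract): minimum number of
# tandem copies required for each motif length from 2 to 6 nt.
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4, 6: 4}


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit."""
    k = len(motif)
    for d in range(1, k):
        if k % d == 0 and motif == motif[:d] * (k // d):
            return False
    return True


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, copies) tuples for SSRs found in seq."""
    seq = seq.upper()
    hits = []
    for k, n in sorted(min_repeats.items()):
        # One k-mer followed by at least n - 1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):  # skip e.g. CACA, which is just (CA)n
                hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


# Toy EST carrying a (CA)8 and an (AGC)5 repeat.
est = "GGATC" + "CA" * 8 + "TTGAT" + "AGC" * 5 + "GT"
print(find_ssrs(est))
```

    Counting hits by motif length over a set of ESTs would give the kind of di-/tri-nucleotide proportions reported in the abstract, although the exact figures depend on the thresholds chosen.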

  20. Exploring the Structure of Library and Information Science Web Space Based on Multivariate Analysis of Social Tags

    ERIC Educational Resources Information Center

    Joo, Soohyung; Kipp, Margaret E. I.

    2015-01-01

    Introduction: This study examines the structure of Web space in the field of library and information science using multivariate analysis of social tags from the Website, Delicious.com. A few studies have examined mathematical modelling of tags, mainly examining tagging in terms of tripartite graphs, pattern tracing and descriptive statistics. This…

  1. Expressed sequence tags and molecular cloning and characterization of gene encoding pinoresinol/lariciresinol reductase from Podophyllum hexandrum.

    PubMed

    Wankhede, Dhammaprakash Pandhari; Biswas, Dipul Kumar; Rajkumar, Subramani; Sinha, Alok Krishna

    2013-12-01

    Podophyllotoxin, an aryltetralin lignan, is the source of important anticancer drugs etoposide, teniposide, and etopophos. Roots/rhizome of Podophyllum hexandrum form one of the most important sources of podophyllotoxin. In order to understand genes involved in podophyllotoxin biosynthesis, two suppression subtractive hybridization libraries were synthesized, one each from root/rhizome and leaves using high and low podophyllotoxin-producing plants of P. hexandrum. Sequencing of clones identified a total of 1,141 Expressed Sequence Tags (ESTs) resulting in 354 unique ESTs. Several unique ESTs showed sequence similarity to the genes involved in metabolism, stress/defense responses, and signalling pathways. A few ESTs also showed high sequence similarity with genes which were shown to be involved in podophyllotoxin biosynthesis in other plant species such as pinoresinol/lariciresinol reductase. A full length coding sequence of pinoresinol/lariciresinol reductase (PLR) has been cloned from P. hexandrum which was found to encode protein with 311 amino acids and show sequence similarity with PLR from Forsythia intermedia and Linum spp. Spatial and stress-inducible expression pattern of PhPLR and other known genes of podophyllotoxin biosynthesis, secoisolariciresinol dehydrogenase (PhSDH), and dirigent protein oxidase (PhDPO) have been studied. All the three genes showed wounding and methyl jasmonate-inducible expression pattern. The present work would form a basis for further studies to understand genomics of podophyllotoxin biosynthesis in P. hexandrum.

  2. Modified PCR methods for 3' end amplification from serial analysis of gene expression (SAGE) tags.

    PubMed

    Xu, Wang-Jie; Wang, Zhao-Xia; Qiao, Zhong-Dong

    2009-05-01

    Serial analysis of gene expression (SAGE) is a powerful technique to study gene expression at the genome level. However, the shortness of SAGE tags hinders further study of SAGE library data, limiting extensive application of the SAGE method in gene expression studies. This problem can be solved by extending the SAGE tags to 3' cDNAs. Therefore, several methods based on PCR have been developed to generate a longer 3' cDNA fragment corresponding to a SAGE tag. The list of modified methods is extensive, and includes rapid RT-PCR analysis of unknown SAGE tags (RAST-PCR), generation of longer cDNA fragments from SAGE tags for gene identification (GLGI), a high-throughput GLGI procedure, reverse SAGE (rSAGE), two-step analysis of unknown SAGE tags (TSAT-PCR), etc. These procedures are constantly being updated because they have characteristics and advantages that can be shared. Development of these methods has promoted the widespread use of the SAGE technique, and has accelerated the speed of studies of large-scale gene expression.

  3. Thirty-four Musa (Musaceae) expressed sequence tag-derived microsatellite markers transferred to Musella lasiocarpa.

    PubMed

    Li, W J; Ma, H; Li, Z H; Wan, Y M; Liu, X X; Zhou, C L

    2012-08-06

    We assembled 31,308 publicly available Musa EST sequences into 21,129 unigenes; 4944 of them contained 5416 SSR motifs. In all, 238 unigenes flanking SSRs were randomly selected for primer design and then tested for amplification in Musella lasiocarpa. Seventy-eight primer pairs were found to be transferable to this species, and 49 displayed polymorphism. A set of 34 polymorphic SSR markers was analyzed in 24 individuals from four wild M. lasiocarpa populations. The mean number of alleles per locus was 3.0, ranging from 2 to 7. The observed and expected heterozygosities per marker ranged from 0.087 to 0.875 (mean 0.503) and from 0.294 to 0.788 (mean 0.544), respectively. These markers will be of practical use for genetic diversity and quantitative trait loci analysis of M. lasiocarpa.
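
    As an illustration of the per-locus statistics reported above, the following Python sketch computes observed and expected heterozygosity from diploid genotypes. The abstract does not say which estimator was used, so the simple expected heterozygosity (one minus the sum of squared allele frequencies, without small-sample correction) is an assumption, and the genotype data are invented.

```python
from collections import Counter


def heterozygosity(genotypes):
    """Return (Ho, He) for one locus.

    genotypes: list of (allele_a, allele_b) tuples, one per individual.
    Ho is the fraction of heterozygous individuals; He is
    1 - sum(p_i^2) over allele frequencies p_i (uncorrected form).
    """
    n = len(genotypes)
    ho = sum(a != b for a, b in genotypes) / n
    counts = Counter(allele for g in genotypes for allele in g)
    total = 2 * n  # two allele copies per diploid individual
    he = 1.0 - sum((c / total) ** 2 for c in counts.values())
    return ho, he


# Invented genotypes at one SSR locus (allele sizes in bp).
locus = [(152, 152), (152, 156), (156, 156), (152, 156)]
ho, he = heterozygosity(locus)
print(f"Ho = {ho:.3f}, He = {he:.3f}")
```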

  4. DSAP: deep-sequencing small RNA analysis pipeline.

    PubMed

    Huang, Po-Jung; Liu, Yi-Chung; Lee, Chi-Ching; Lin, Wei-Chen; Gan, Richie Ruei-Chi; Lyu, Ping-Chiang; Tang, Petrus

    2010-07-01

    DSAP is an automated multiple-task web service designed to provide a total solution to analyzing deep-sequencing small RNA datasets generated by next-generation sequencing technology. DSAP uses a tab-delimited file as an input format, which holds the unique sequence reads (tags) and their corresponding number of copies generated by the Solexa sequencing platform. The input data will go through four analysis steps in DSAP: (i) cleanup: removal of adaptors and poly-A/T/C/G/N nucleotides; (ii) clustering: grouping of cleaned sequence tags into unique sequence clusters; (iii) non-coding RNA (ncRNA) matching: sequence homology mapping against a transcribed sequence library from the ncRNA database Rfam (http://rfam.sanger.ac.uk/); and (iv) known miRNA matching: detection of known miRNAs in miRBase (http://www.mirbase.org/) based on sequence homology. The expression levels corresponding to matched ncRNAs and miRNAs are summarized in multi-color clickable bar charts linked to external databases. DSAP is also capable of displaying miRNA expression levels from different jobs using a log(2)-scaled color matrix. Furthermore, a cross-species comparative function is also provided to show the distribution of identified miRNAs in different species as deposited in miRBase. DSAP is available at http://dsap.cgu.edu.tw.
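
    The first two DSAP steps listed above (cleanup and clustering of a tab-delimited tag/count file) might be sketched in Python as follows. This is not DSAP's code: the adapter string, the minimum length, and the choice to discard pure homopolymer tags are all assumptions made for illustration.

```python
import re
from collections import Counter

# Placeholder 3' adapter sequence; the real adapter depends on the library kit.
ADAPTER = "TCGTATGCCGTCTTCTGCTTG"

# A tag made of a single repeated base (poly-A/T/C/G/N) is discarded.
POLY_RUN = re.compile(r"^(A+|C+|G+|T+|N+)$")


def clean_tag(seq, adapter=ADAPTER, min_len=15):
    """Trim the adapter; return None for short or homopolymer tags."""
    seq = seq.upper()
    idx = seq.find(adapter)
    if idx != -1:
        seq = seq[:idx]
    if len(seq) < min_len or POLY_RUN.match(seq):
        return None
    return seq


def cluster_tags(lines):
    """Group cleaned tags into unique sequences, summing their copy numbers.

    lines: iterable of 'sequence<TAB>count' strings.
    """
    clusters = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        seq, count = line.split("\t")
        cleaned = clean_tag(seq)
        if cleaned is not None:
            clusters[cleaned] += int(count)
    return clusters


reads = [
    "TGAGGTAGTAGGTTGTATAGTT" + ADAPTER + "\t10",  # tag with adapter
    "TGAGGTAGTAGGTTGTATAGTT\t5",                   # same tag, no adapter
    "AAAAAAAAAAAAAAAAAAAAAA\t7",                   # poly-A, discarded
    "ACGT\t3",                                     # too short, discarded
]
print(cluster_tags(reads))
```

    The resulting clusters would then be the input to the homology-matching steps (iii) and (iv), which are not sketched here.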

  5. Development of Expressed Sequence Tag (EST)-based Cleaved Amplified Polymorphic Sequence (CAPS) markers of tea plant and their application to cultivar identification.

    PubMed

    Ujihara, Tomomi; Taniguchi, Fumiya; Tanaka, Jun-Ichi; Hayashi, Nobuyuki

    2011-03-09

    To develop cleaved amplified polymorphic sequence (CAPS) markers for cultivar identification of the tea leaf, 5 primer pairs designed on the basis of genes that encode proteins related to nitrogen assimilation and 26 primer pairs based on expressed sequence tag (EST) sequences of the root of tea plant were screened. From combinations of primer pair and restriction enzyme that showed polymorphism among tea plants, 16 markers were selected and applied to DNA fingerprinting of Japanese tea cultivars. Sixty-three cultivars, except for a bud sport (Kiraka) and its original cultivar (Yabukita) and a pair that was the progeny of the same crossing parent (Harumoegi and Sakimidori), were distinguished from one another. By combining the 16 markers with previously developed CAPS markers and observing the physical appearance, 67 cultivars were distinguishable. The cultivars involve approximately 95% of total tea cultivating area in Japan; therefore, about 95% of tea leaves produced in Japan can be authenticated by labeling their cultivars.

  6. Sequence analysis of diacylglycerol acyltransferases

    USDA-ARS's Scientific Manuscript database

    Diacylglycerol acyltransferases (DGATs) catalyze the final step of triacylglycerol (TAG) biosynthesis in eukaryotes. DGATs esterify sn-1,2-diacylglycerol with a long-chain fatty acyl-CoA. Plants and animals deficient in DGATs accumulate less TAG and over-expression of DGATs increases TAG. DGAT knock...

  7. Evidence from sequence-tagged-site markers of a recent progenitor-derivative species pair in conifers

    PubMed Central

    Perron, Martin; Perry, Daniel J.; Andalo, Christophe; Bousquet, Jean

    2000-01-01

    Black spruce (Picea mariana [B.S.P.] Mill.) and red spruce (Picea rubens Sarg.) are two conifer species known to hybridize naturally in northeastern North America. We hypothesized that there is a progenitor-derivative relationship between these two taxa and conducted a genetic investigation by using sequence-tagged-site markers of expressed genes. Based on the 26 sequence-tagged-site loci assayed in this study, the unbiased genetic identity between the two taxa was quite high with a value of 0.920. The mean number of polymorphic loci, the mean number of alleles per polymorphic locus, and the average observed heterozygosity were lower in red spruce (P = 35%, AP = 2.1, Ho = 0.069) than in black spruce (P = 54%, AP = 2.9, Ho = 0.103). No unique alleles were found in red spruce, and the observed patterns of allele distribution indicated that the genetic diversity of red spruce was essentially a subset of that found in black spruce. When considered in combination with ecological evidence and simulation results, these observations clearly support the existence of a progenitor-derivative relationship and suggest that the reduced level of genetic diversity in red spruce may result from allopatric speciation through glaciation-induced isolation of a preexisting black spruce population during the Pleistocene era. Our observations signal a need for a thorough reexamination of several conifer species complexes in which natural hybridization is known to occur. PMID:11016967

  8. Parallel Tagged Next-Generation Sequencing on Pooled Samples – A New Approach for Population Genetics in Ecology and Conservation

    PubMed Central

    Zavodna, Monika; Grueber, Catherine E.; Gemmell, Neil J.

    2013-01-01

    Next-generation sequencing (NGS) on pooled samples has already been broadly applied in human medical diagnostics and plant and animal breeding. However, thus far it has been only sparingly employed in ecology and conservation, where it may serve as a useful diagnostic tool for rapid assessment of species genetic diversity and structure at the population level. Here we undertake a comprehensive evaluation of the accuracy, practicality and limitations of parallel tagged amplicon NGS on pooled population samples for estimating species population diversity and structure. We obtained 16S and Cyt b data from 20 populations of Leiopelma hochstetteri, a frog species of conservation concern in New Zealand, using two approaches – parallel tagged NGS on pooled population samples and individual Sanger sequenced samples. Data from each approach were then used to estimate two standard population genetic parameters, nucleotide diversity (π) and population differentiation (FST), that enable population genetic inference in a species conservation context. We found a positive correlation between our two approaches for population genetic estimates, showing that the pooled population NGS approach is a reliable, rapid and appropriate method for population genetic inference in an ecological and conservation context. Our experimental design also allowed us to identify both the strengths and weaknesses of the pooled population NGS approach and outline some guidelines and suggestions that might be considered when planning future projects. PMID:23637841

  9. Parallel tagged next-generation sequencing on pooled samples - a new approach for population genetics in ecology and conservation.

    PubMed

    Zavodna, Monika; Grueber, Catherine E; Gemmell, Neil J

    2013-01-01

    Next-generation sequencing (NGS) on pooled samples has already been broadly applied in human medical diagnostics and plant and animal breeding. However, thus far it has been only sparingly employed in ecology and conservation, where it may serve as a useful diagnostic tool for rapid assessment of species genetic diversity and structure at the population level. Here we undertake a comprehensive evaluation of the accuracy, practicality and limitations of parallel tagged amplicon NGS on pooled population samples for estimating species population diversity and structure. We obtained 16S and Cyt b data from 20 populations of Leiopelma hochstetteri, a frog species of conservation concern in New Zealand, using two approaches - parallel tagged NGS on pooled population samples and individual Sanger sequenced samples. Data from each approach were then used to estimate two standard population genetic parameters, nucleotide diversity (π) and population differentiation (FST), that enable population genetic inference in a species conservation context. We found a positive correlation between our two approaches for population genetic estimates, showing that the pooled population NGS approach is a reliable, rapid and appropriate method for population genetic inference in an ecological and conservation context. Our experimental design also allowed us to identify both the strengths and weaknesses of the pooled population NGS approach and outline some guidelines and suggestions that might be considered when planning future projects.

  10. Transcriptome analysis of the medulla tissue from cattle in response to bovine spongiform encephalopathy using digital gene expression tag profiling.

    PubMed

    Basu, Urmila; Almeida, Luciane; Olson, N Eric; Meng, Yan; Williams, John L; Moore, Stephen S; Guan, Le Luo

    2011-01-01

    Bovine spongiform encephalopathy (BSE) is a transmissible, fatal neurodegenerative disorder of cattle produced by prions. The use of massively parallel sequencing for comparison of gene expression in bovine control and infected tissues may help to elucidate the molecular mechanisms associated with this disease. In this study, tag profiling Solexa sequencing was used for transcriptome analysis of bovine brain tissues. Replicate libraries were prepared from mRNA isolated from control and infected (challenged with 100 g of BSE-infected brain) medulla tissues 45 mo after infection. For each library, 5-6 million sequence reads were generated and approximately 67-70% of the reads were mapped against the Bovine Genome database to approximately 13,700-14,120 transcripts (each having at least one read). About 42-47% of the total reads mapped uniquely. Using the GeneSifter software package, 190 differentially expressed (DE) genes were identified (>2.0-fold change, p < .01): 73 upregulated and 117 downregulated. Seventy-nine DE genes had functions described in the Gene Ontology (GO) database and 16 DE genes were involved in 38 different pathways described in the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. Digital expression analysis by tag profiling may be a powerful approach to comprehensive transcriptome analysis to identify changes associated with disease progression, leading to a better understanding of the underlying mechanism of pathogenesis of BSE.

  11. Open-reading-frame sequence tags (OSTs) support the existence of at least 17,300 genes in C. elegans.

    PubMed

    Reboul, J; Vaglio, P; Tzellas, N; Thierry-Mieg, N; Moore, T; Jackson, C; Shin-i, T; Kohara, Y; Thierry-Mieg, D; Thierry-Mieg, J; Lee, H; Hitti, J; Doucette-Stamm, L; Hartley, J L; Temple, G F; Brasch, M A; Vandenhaute, J; Lamesch, P E; Hill, D E; Vidal, M

    2001-03-01

    The genome sequences of Caenorhabditis elegans, Drosophila melanogaster and Arabidopsis thaliana have been predicted to contain 19,000, 13,600 and 25,500 genes, respectively. Before this information can be fully used for evolutionary and functional studies, several issues need to be addressed. First, the gene number estimates obtained in silico and not yet supported by any experimental data need to be verified. For example, it seems biologically paradoxical that C. elegans would have 50% more genes than Drosophila. Second, intron/exon predictions need to be tested experimentally. Third, complete sets of open reading frames (ORFs), or "ORFeomes," need to be cloned into various expression vectors. To address these issues simultaneously, we have designed and applied to C. elegans the following strategy. Predicted ORFs are amplified by PCR from a highly representative cDNA library using ORF-specific primers, cloned by Gateway recombination cloning and then sequenced to generate ORF sequence tags (OSTs) as a way to verify identity and splicing. In a sample (n=1,222) of the nearly 10,000 genes predicted ab initio (that is, for which no expressed sequence tag (EST) is available so far), at least 70% were verified by OSTs. We also observed that 27% of these experimentally confirmed genes have a structure different from that predicted by GeneFinder. We now have experimental evidence that supports the existence of at least 17,300 genes in C. elegans. Hence we suggest that gene counts based primarily on ESTs may underestimate the number of genes in human and in other organisms.

  12. Accurate and unambiguous tag-to-gene mapping in serial analysis of gene expression

    PubMed Central

    Malig, Rodrigo; Varela, Cristian; Agosin, Eduardo; Melo, Francisco

    2006-01-01

    Background In this study, we present a robust and reliable computational method for tag-to-gene assignment in serial analysis of gene expression (SAGE). The method relies on current genome information and annotation, incorporation of several new features, and key improvements over alternative methods, all of which are important to determine gene expression levels more accurately. The method provides a complete annotation of potential virtual SAGE tags within a genome, along with an estimation of their confidence for experimental observation that ranks tags that present multiple matches in the genome. Results We applied this method to the Saccharomyces cerevisiae genome, producing the most thorough and accurate annotation of potential virtual SAGE tags that is available today for this organism. The usefulness of this method is exemplified by the significant reduction of ambiguous cases in existing experimental SAGE data. In addition, we report new insights from the analysis of existing SAGE data. First, we found that experimental SAGE tags mapping onto introns, intron-exon boundaries, and non-coding RNA elements are observed in all available SAGE data. Second, a significant fraction of experimental SAGE tags was found to map onto genomic regions currently annotated as intergenic. Third, a significant number of existing experimental SAGE tags for yeast has been derived from truncated cDNAs, which are synthesized through oligo-d(T) priming to internal poly-(A) regions during reverse transcription. Conclusion We conclude that an accurate and unambiguous tag mapping process is essential to increase the quality and the amount of information that can be extracted from SAGE experiments. This is supported by the results obtained here and also by the large impact that the erroneous interpretation of these data could have on downstream applications. PMID:17083742
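
    To make the idea of a virtual SAGE tag and of an ambiguous tag-to-gene match concrete, here is a small sketch. It is not the authors' method: it assumes the common SAGE convention of an NlaIII anchoring site (CATG) with a 10-base tag taken immediately downstream of the 3'-most site, and the gene names and sequences are hypothetical.

```python
from collections import defaultdict


def virtual_sage_tag(transcript, anchor="CATG", tag_len=10):
    """Return the tag just downstream of the 3'-most anchor site, or None."""
    seq = transcript.upper()
    pos = seq.rfind(anchor)  # 3'-most occurrence of the anchoring site
    if pos == -1:
        return None
    start = pos + len(anchor)
    tag = seq[start:start + tag_len]
    # Discard tags truncated by the end of the transcript.
    return tag if len(tag) == tag_len else None


def tag_index(transcripts):
    """Map each virtual tag to the list of genes that would produce it."""
    index = defaultdict(list)
    for gene, seq in transcripts.items():
        tag = virtual_sage_tag(seq)
        if tag is not None:
            index[tag].append(gene)
    return dict(index)


# Hypothetical toy transcripts; geneA and geneB share the same virtual tag.
toy = {
    "geneA": "GGCATGAAACCCGGGTTTCATGACGTACGTTAGGCAAAAAA",
    "geneB": "TTTCATGACGTACGTTACCC",
    "geneC": "CATGTTTTTTTTTTGG",
}
index = tag_index(toy)
ambiguous = {tag: genes for tag, genes in index.items() if len(genes) > 1}
```

    Tags that land in `ambiguous` are the multiple-match cases that a ranking or confidence estimate, such as the one described in the abstract, would need to resolve.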

  13. Expressed sequence tags from normalized cDNA libraries prepared from gill and hypodermal tissues of the blue crab, Callinectes sapidus.

    PubMed

    Coblentz, Francie E; Towle, David W; Shafer, Thomas H

    2006-06-01

    Expressed sequence tags (ESTs) were produced from two normalized cDNA libraries from the blue crab, Callinectes sapidus. The gill library represented pooled RNA from respiratory and transporting gills after acclimation to either high or low salinity. The hypodermis library was from arthrodial and dorsal tissue from both pre- and post-molt crabs. Random clones were single-pass sequenced from the 5'-ends, resulting in 11,761 high quality ESTs averaging 652 bases. All the ESTs were assembled using Paracel Transcript Assembler software, producing 2176 potential transcripts: 883 contigs and 1293 singlets. Of these, 1235 (56.7%) were sequenced only from the gill library, while 578 (26.6%) were exclusively hypodermal. There were 363 contigs containing ESTs from both tissues (16.7% of the putative transcripts). All contigs and singlets were compared to the public protein database using BLASTx, and descriptions of the three most similar proteins for each were recorded. Additional annotations included an Interpro analysis of protein domains and a listing of Gene Ontology (GO) categories inferred from similar proteins in GO-annotated databases. All sequences are available on a web page (http://firedev.bear.uncw.edu:8080/shaferlab/). The annotations can be searched, and BLAST alignment of user-inputted sequences against the putative transcripts is possible. In addition, the ESTs have been submitted to GenBank.

  14. Detrimental effect of the 6 His C-terminal tag on YedY enzymatic activity and influence of the TAT signal sequence on YedY synthesis

    PubMed Central

    2013-01-01

    Background YedY, a molybdoenzyme belonging to the sulfite oxidase family, is found in most Gram-negative bacteria. It contains a twin-arginine signal sequence that is cleaved after its translocation into the periplasm. Despite a weak reductase activity with substrates such as dimethyl sulfoxide or trimethylamine N-oxide, its natural substrate and its role in the cell remain unknown. Although sequence conservation of the YedY family displays a strictly conserved hydrophobic C-terminal residue, all known studies on Escherichia coli YedY have been performed with an enzyme containing a 6 histidine-tag at the C-terminus which could hamper enzyme activity. Results In this study, we demonstrate that the tag fused to the C-terminus of Rhodobacter sphaeroides YedY is detrimental to the enzyme’s reductase activity and results in an eight-fold decrease in catalytic efficiency. Nonetheless this C-terminal tag does not influence the properties of the molybdenum active site, as assayed by EPR spectroscopy. When a cleavable His-tag was fused to the N-terminus of the mature enzyme in the absence of the signal sequence, YedY was expressed and folded with its cofactor. However, when the signal sequence was added upstream of the N-ter tag, the amount of enzyme produced was approximately ten-fold higher. Conclusion Our study thus underscores the risk of using a C-terminus tagged enzyme while studying YedY, and presents an alternative strategy to express signal sequence-containing enzymes with an N-terminal tag. It brings new insights into molybdoenzyme maturation in R. sphaeroides showing that for some enzymes, maturation can occur in the absence of the signal sequence but that its presence is required for high expression of active enzyme. PMID:24180491

  15. Precipitation recycling in West Africa - regional modeling, evaporation tagging and atmospheric water budget analysis

    NASA Astrophysics Data System (ADS)

    Arnault, Joel; Kunstmann, Harald; Knoche, Hans-Richard

    2015-04-01

    Many numerical studies have shown that the West African monsoon is highly sensitive to the state of the land surface. It is however questionable to what extent a local change of land surface properties would affect the local climate, especially with respect to precipitation. This issue is traditionally addressed with the concept of precipitation recycling, defined as the contribution of local surface evaporation to local precipitation. For this study the West African monsoon has been simulated with the Weather Research and Forecasting (WRF) model using explicit convection, for the domain (1°S-21°N, 18°W-14°E) at a spatial resolution of 10 km, for the period January-October 2013, and using ERA-Interim reanalyses as driving data. This WRF configuration has been selected for its ability to simulate monthly precipitation amounts and daily histograms close to TRMM (Tropical Rainfall Measuring Mission) data. In order to investigate precipitation recycling in this WRF simulation, surface evaporation tagging has been implemented in the WRF source code as well as the budget of total and tagged atmospheric water. Surface evaporation tagging consists in duplicating all water species and the respective prognostic equations in the source code. Then, tagged water species are set to zero at the lateral boundaries of the simulated domain (no inflow of tagged water vapor), and tagged surface evaporation is considered only in a specified region. All the source terms of the prognostic equations of total and tagged water species are finally saved in the outputs for the budget analysis. This allows quantifying the respective contribution of total and tagged atmospheric water to atmospheric precipitation processes. The WRF simulation with surface evaporation tagging and budgets has been conducted two times, first with a 100 km2 tagged region (11-12°N, 1-2°W), and second with a 1000 km2 tagged region (7-16°N, 6°W -3°E). In this presentation we will investigate hydro

  16. Pilot survey of expressed sequence tags (ESTs) from the asexual blood stages of Plasmodium vivax in human patients

    PubMed Central

    Merino, Emilio F; Fernandez-Becerra, Carmen; Madeira, Alda MBN; Machado, Ariane L; Durham, Alan; Gruber, Arthur; Hall, Neil; del Portillo, Hernando A

    2003-01-01

    Background Plasmodium vivax is the most widely distributed human malaria, responsible for 70–80 million clinical cases each year and large socio-economical burdens for countries such as Brazil where it is the most prevalent species. Unfortunately, due to the impossibility of growing this parasite in continuous in vitro culture, research on P. vivax remains largely neglected. Methods A pilot survey of expressed sequence tags (ESTs) from the asexual blood stages of P. vivax was performed. To do so, 1,184 clones from a cDNA library constructed with parasites obtained from 10 different human patients in the Brazilian Amazon were sequenced. Sequences were automatically processed to remove contaminants and low quality reads. A total of 806 sequences with an average length of 586 bp met such criteria and their clustering revealed 666 distinct events. The consensus sequence of each cluster and the unique sequences of the singlets were used in similarity searches against different databases that included P. vivax, Plasmodium falciparum, Plasmodium yoelii, Plasmodium knowlesi, Apicomplexa and the GenBank non-redundant database. An E-value of <10^-30 was used to define a significant database match. ESTs were manually assigned a gene ontology (GO) terminology. Results A total of 769 ESTs could be assigned a putative identity based upon sequence similarity to known proteins in GenBank. Moreover, 292 ESTs were annotated and a GO terminology was assigned to 164 of them. Conclusion These are the first ESTs reported for P. vivax and, as such, they represent a valuable resource to assist in the annotation of the P. vivax genome currently being sequenced. Moreover, since the GC-content of the P. vivax genome is strikingly different from that of P. falciparum, these ESTs will help in the validation of gene predictions for P. vivax and to create a gene index of this malaria parasite. PMID:12914668

  17. Protein identities from 'Graphocephala atropunctata' expressed sequence tags: Expanding leafhopper vector biology

    USDA-ARS's Scientific Manuscript database

    Heat shock proteins and 44 protein sequences from the blue-green sharpshooter, BGSS, were produced and identified. The sequences were submitted and published under accession numbers: DQ445499-DQ445542, in the National Center for Biotechnology Information, NCBI, Public Database. The blue-green sharps...

  18. Analysis of C. elegans muscle transcriptome using trans-splicing-based RNA tagging (SRT)

    PubMed Central

    Ma, Xiaopeng; Zhan, Ge; Sleumer, Monica C.; Chen, Siyu; Liu, Weihong; Zhang, Michael Q.; Liu, Xiao

    2016-01-01

    Current approaches to profiling tissue-specific gene expression in C. elegans require delicate manipulation and are difficult under certain conditions, e.g. from dauer or aging worms. We have developed an easy and robust method for tissue-specific RNA-seq by taking advantage of the endogenous trans-splicing process. In this method, transgenic worms are generated in which a spliced leader (SL) RNA gene is fused with a sequence tag and driven by a tissue-specific promoter. Only in the tissue of interest, the tagged SL RNA gene is transcribed and then trans-spliced onto mRNAs. The tag allows enrichment and sequencing of mRNAs from that tissue only. As a proof of principle, we profiled the muscle transcriptome, which showed high coverage and efficient enrichment of muscle specific genes, with low background noise. To demonstrate the robustness of our method, we profiled muscle gene expression in dauer larvae and aging worms, revealing gene expression changes consistent with the physiology of these stages. The resulting muscle transcriptome also revealed 461 novel RNA transcripts, likely muscle-expressed long non-coding RNAs. In summary, the splicing-based RNA tagging (SRT) method provides a convenient and robust tool to profile trans-spliced genes and identify novel transcripts in a tissue-specific manner, with a low false positive rate. PMID:27557708

  19. Application of Cydia pomonella expressed sequence tags: Identification and expression of three general odorant binding proteins in codling moth.

    PubMed

    Garczynski, Stephen F; Coates, Brad S; Unruh, Thomas R; Schaeffer, Scott; Jiwan, Derick; Koepke, Tyson; Dhingra, Amit

    2013-10-01

    The codling moth, Cydia pomonella, is one of the most important pests of pome fruits in the world, yet the molecular genetics and the physiology of this insect remain poorly understood. A combined assembly of 8 341 expressed sequence tags was generated from Roche 454 GS-FLX sequencing of eight tissue-specific cDNA libraries. Putative chemosensory proteins (12) and odorant binding proteins (OBPs) (18) were annotated, which included three putative general OBP (GOBP), one more than typically reported for other Lepidoptera. To further characterize CpomGOBPs, we cloned cDNA copies of their transcripts and determined their expression patterns in various tissues. Cloning and sequencing of the 698 nt transcript for CpomGOBP1 resulted in the prediction of a 163 amino acid coding region, and subsequent RT-PCR indicated that the transcripts were mainly expressed in antennae and mouthparts. The 1 289 nt (160 amino acid) CpomGOBP2 and the novel 702 nt (169 amino acid) CpomGOBP3 transcripts are mainly expressed in antennae, mouthparts, and female abdomen tips. These results indicate that next generation sequencing is useful for the identification of novel transcripts of interest, and that codling moth expresses a transcript encoding for a new member of the GOBP subfamily.

  20. DG-CST (Disease Gene Conserved Sequence Tags), a database of human–mouse conserved elements associated to disease genes

    PubMed Central

    Boccia, Angelo; Petrillo, Mauro; di Bernardo, Diego; Guffanti, Alessandro; Mignone, Flavio; Confalonieri, Stefano; Luzi, Lucilla; Pesole, Graziano; Paolella, Giovanni; Ballabio, Andrea; Banfi, Sandro

    2005-01-01

    The identification and study of evolutionarily conserved genomic sequences that surround disease-related genes is a valuable tool to gain insight into the functional role of these genes and to better elucidate the pathogenetic mechanisms of disease. We created the DG-CST (Disease Gene Conserved Sequence Tags) database for the identification and detailed annotation of human–mouse conserved genomic sequences that are localized within or in the vicinity of human disease-related genes. CSTs are defined as sequences that show at least 70% identity between human and mouse over a length of at least 100 bp. The database contains CST data relative to over 1088 genes responsible for monogenetic human genetic diseases or involved in the susceptibility to multifactorial/polygenic diseases. DG-CST is accessible via the internet at http://dgcst.ceinge.unina.it/ and may be searched using both simple and complex queries. A graphic browser allows direct visualization of the CSTs and related annotations within the context of the relative gene and its transcripts. PMID:15608249
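
    The CST definition above (at least 70% identity over at least 100 bp) can be sketched as a simple filter. This is an illustration only: it assumes the human and mouse sequences are supplied as an already computed gapped alignment of equal length, and it counts identity over alignment columns, which may differ from how the database itself computes identity. The function names and toy sequences are hypothetical.

```python
def percent_identity(aln_a, aln_b):
    """Percent of alignment columns where both sequences carry the same base."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    if not aln_a:
        raise ValueError("empty alignment")
    # A column counts as a match only if the bases agree and it is not a gap.
    matches = sum(a == b and a != "-" for a, b in zip(aln_a, aln_b))
    return 100.0 * matches / len(aln_a)


def is_cst(aln_a, aln_b, min_identity=70.0, min_length=100):
    """Apply the two thresholds quoted in the abstract to one aligned pair."""
    return (
        len(aln_a) >= min_length
        and percent_identity(aln_a, aln_b) >= min_identity
    )


# Hypothetical 120-column alignment: the first 30 columns are mismatches.
human = "ACGT" * 30
mouse = "N" * 30 + human[30:]
qualifies = is_cst(human, mouse)  # 90 of 120 columns match
```

    Here 90 of 120 columns match (75%), so the pair passes both thresholds; shortening the alignment below 100 columns or raising the mismatch count above 36 would make it fail.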

  1. Probing essential oil biosynthesis and secretion by functional evaluation of expressed sequence tags from mint glandular trichomes.

    PubMed

    Lange, B M; Wildung, M R; Stauber, E J; Sanchez, C; Pouchnik, D; Croteau, R

    2000-03-14

    Functional genomics approaches, which use combined computational and expression-based analyses of large amounts of sequence information, are emerging as powerful tools to accelerate the comprehensive understanding of cellular metabolism in specialized tissues and whole organisms. As part of an ongoing effort to identify genes of essential oil (monoterpene) biosynthesis, we have obtained sequence information from 1,316 randomly selected cDNA clones, or expressed sequence tags (ESTs), from a peppermint (Mentha x piperita) oil gland secretory cell cDNA library. After bioinformatic selection, candidate genes putatively involved in essential oil biosynthesis and secretion have been subcloned into suitable expression vectors for functional evaluation in Escherichia coli. On the basis of published and preliminary data on the functional properties of these clones, it is estimated that the ESTs involved in essential oil metabolism represent about 25% of the described sequences. An additional 7% of the recognized genes code for proteins involved in transport processes, and a subset of these is likely involved in the secretion of essential oil terpenes from the site of synthesis to the storage cavity of the oil glands. The integrated approaches reported here represent an essential step toward the development of a metabolic map of oil glands and provide a valuable resource for defining molecular targets for the genetic engineering of essential oil formation.

  2. Parallel tagged amplicon sequencing reveals major lineages and phylogenetic structure in the North American tiger salamander (Ambystoma tigrinum) species complex.

    PubMed

    O'Neill, Eric M; Schwartz, Rachel; Bullock, C Thomas; Williams, Joshua S; Shaffer, H Bradley; Aguilar-Miguel, X; Parra-Olea, Gabriela; Weisrock, David W

    2013-01-01

    Modern analytical methods for population genetics and phylogenetics are expected to provide more accurate results when data from multiple genome-wide loci are analysed. We present the results of an initial application of parallel tagged sequencing (PTS) on a next-generation platform to sequence thousands of barcoded PCR amplicons generated from 95 nuclear loci and 93 individuals sampled across the range of the tiger salamander (Ambystoma tigrinum) species complex. To manage the bioinformatic processing of this large data set (344 330 reads), we developed a pipeline that sorts PTS data by barcode and locus, identifies high-quality variable nucleotides and yields phased haplotype sequences for each individual at each locus. Our sequencing and bioinformatic strategy resulted in a genome-wide data set with relatively low levels of missing data and a wide range of nucleotide variation. STRUCTURE analyses of these data in a genotypic format resulted in strongly supported assignments for the majority of individuals into nine geographically defined genetic clusters. Species tree analyses of the most variable loci using a multi-species coalescent model resulted in strong support for most branches in the species tree; however, analyses including more than 50 loci produced parameter sampling trends that indicated a lack of convergence on the posterior distribution. Overall, these results demonstrate the potential for amplicon-based PTS to rapidly generate large-scale data for population genetic and phylogenetic-based research.
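
    The first pipeline step described above, sorting reads by barcode and locus, might look roughly like the sketch below. It is not the authors' pipeline: it assumes each read begins with an individual-specific barcode followed directly by a locus-specific primer, uses exact prefix matching with no tolerance for sequencing errors, and all barcodes, primers and reads are hypothetical.

```python
from collections import defaultdict


def match_prefix(seq, table):
    """Return (label, remainder) for the first prefix in table that seq starts with."""
    for prefix, label in table.items():
        if seq.startswith(prefix):
            return label, seq[len(prefix):]
    return None


def sort_reads(reads, barcodes, primers):
    """Bin reads by (individual, locus); collect reads that cannot be assigned."""
    bins = defaultdict(list)
    unassigned = []
    for read in reads:
        bc_hit = match_prefix(read, barcodes)
        # Only look for a primer once a barcode has been recognised.
        locus_hit = match_prefix(bc_hit[1], primers) if bc_hit else None
        if locus_hit is None:
            unassigned.append(read)
            continue
        bins[(bc_hit[0], locus_hit[0])].append(locus_hit[1])
    return dict(bins), unassigned


# Hypothetical barcodes, primers and reads for illustration.
barcodes = {"AAGG": "ind01", "CCTT": "ind02"}
primers = {"GATTACA": "locusA", "TTGACCA": "locusB"}
reads = [
    "AAGGGATTACAACGTACGT",
    "CCTTTTGACCAGGGTTT",
    "AAGGTTGACCACCCAAA",
    "GGGGGATTACAACGT",
]
bins, unassigned = sort_reads(reads, barcodes, primers)
```

    Each bin then holds the trimmed amplicon sequences for one individual at one locus, which is the unit on which variant calling and haplotype phasing would operate.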

  3. Application of Cydia pomonella expressed sequence tags: Identification and expression of three general odorant binding proteins in codling moth

    PubMed Central

    Garczynski, Stephen F.; Coates, Brad S.; Unruh, Thomas R.; Schaeffer, Scott; Jiwan, Derick; Koepke, Tyson; Dhingra, Amit

    2014-01-01

    The codling moth, Cydia pomonella, is one of the most important pests of pome fruits in the world, yet the molecular genetics and the physiology of this insect remain poorly understood. A combined assembly of 8 341 expressed sequence tags was generated from Roche 454 GS-FLX sequencing of eight tissue-specific cDNA libraries. Putative chemosensory proteins (12) and odorant binding proteins (OBPs) (18) were annotated, which included three putative general OBP (GOBP), one more than typically reported for other Lepidoptera. To further characterize CpomGOBPs, we cloned cDNA copies of their transcripts and determined their expression patterns in various tissues. Cloning and sequencing of the 698 nt transcript for CpomGOBP1 resulted in the prediction of a 163 amino acid coding region, and subsequent RT-PCR indicated that the transcripts were mainly expressed in antennae and mouthparts. The 1 289 nt (160 amino acid) CpomGOBP2 and the novel 702 nt (169 amino acid) CpomGOBP3 transcripts are mainly expressed in antennae, mouthparts, and female abdomen tips. These results indicate that next generation sequencing is useful for the identification of novel transcripts of interest, and that codling moth expresses a transcript encoding for a new member of the GOBP subfamily. PMID:23956229

  4. Transient Analysis Generator /TAG/ simulates behavior of large class of electrical networks

    NASA Technical Reports Server (NTRS)

    Thomas, W. J.

    1967-01-01

    Transient Analysis Generator program simulates both transient and dc steady-state behavior of a large class of electrical networks. It generates a special analysis program for each circuit described in an easily understood and manipulated programming language. A generator or preprocessor and a simulation system make up the TAG system.

  5. In silico identification of miRNAs and their targets from the expressed sequence tags of Raphanus sativus

    PubMed Central

    Muvva, Charuvaka; Tewari, Lata; Aruna, Kasoju; Ranjit, Pabbati; MD, Zahoorullah S; MD, K A Matheen; Veeramachaneni, Hemanth

    2012-01-01

    MicroRNAs (miRNAs) are a novel, growing family of endogenous, small, non-coding, single-stranded RNA molecules directly involved in regulating gene expression at the posttranscriptional level. High conservation of miRNAs in plants provides the foundation for identification of new miRNAs in other plant species through homology alignment. Here, previously known plant miRNAs were BLASTed against the Expressed Sequence Tag (EST) database of Raphanus sativus, and according to a series of filtering criteria, a total of 48 miRNAs belonging to 9 miRNA families were identified, and 16 potential target genes of them were subsequently predicted, most of which seemed to encode transcription factors or enzymes participating in regulation of development, growth and other physiological processes. Overall, our findings lay the foundation for further research on miRNA function in R. sativus. PMID:22359443

  6. Image analysis for DNA sequencing

    NASA Astrophysics Data System (ADS)

    Palaniappan, Kannappan; Huang, Thomas S.

    1991-07-01

    There is a great deal of interest in automating the process of DNA (deoxyribonucleic acid) sequencing to support the analysis of genomic DNA such as the Human and Mouse Genome projects. In one class of gel-based sequencing protocols, autoradiograph images are generated in the final step and usually require manual interpretation to reconstruct the DNA sequence represented by the image. The need to handle a large volume of sequence information necessitates automation of the manual autoradiograph reading step through image analysis in order to reduce the length of time required to obtain sequence data and reduce transcription errors. Various adaptive image enhancement, segmentation and alignment methods were applied to autoradiograph images. The methods are adaptive to the local characteristics of the image such as noise, background signal, or presence of edges. Once the two-dimensional data is converted to a set of aligned one-dimensional profiles, waveform analysis is used to determine the location of each band, which represents one nucleotide in the sequence. Different classification strategies including a rule-based approach are investigated to map the profile signals, augmented with the original two-dimensional image data as necessary, to textual DNA sequence information.

  7. Improved measurement of brain deformation during mild head acceleration using a novel tagged MRI sequence.

    PubMed

    Knutsen, Andrew K; Magrath, Elizabeth; McEntee, Julie E; Xing, Fangxu; Prince, Jerry L; Bayly, Philip V; Butman, John A; Pham, Dzung L

    2014-11-07

    In vivo measurements of human brain deformation during mild acceleration are needed to help validate computational models of traumatic brain injury and to understand the factors that govern the mechanical response of the brain. Tagged magnetic resonance imaging is a powerful, noninvasive technique to track tissue motion in vivo which has been used to quantify brain deformation in live human subjects. However, these prior studies required from 72 to 144 head rotations to generate deformation data for a single image slice, precluding its use to investigate the entire brain in a single subject. Here, a novel method is introduced that significantly reduces temporal variability in the acquisition and improves the accuracy of displacement estimates. Optimization of the acquisition parameters in a gelatin phantom and three human subjects leads to a reduction in the number of rotations from 72 to 144 to as few as 8 for a single image slice. The ability to estimate accurate, well-resolved, fields of displacement and strain in far fewer repetitions will enable comprehensive studies of acceleration-induced deformation throughout the human brain in vivo.

  8. Improved measurement of brain deformation during mild head acceleration using a novel tagged MRI sequence

    PubMed Central

    Knutsen, Andrew K.; Magrath, Elizabeth; McEntee, Julie E.; Xing, Fangxu; Prince, Jerry L.; Bayly, Philip V.; Butman, John A.; Pham, Dzung L.

    2014-01-01

    In vivo measurements of human brain deformation during mild acceleration are needed to help validate computational models of traumatic brain injury and to understand the factors that govern the mechanical response of the brain. Tagged magnetic resonance imaging is a powerful, noninvasive technique to track tissue motion in vivo which has been used to quantify brain deformation in live human subjects. However, these prior studies required from 72 to 144 head rotations to generate deformation data for a single image slice, precluding its use to investigate the entire brain in a single subject. Here, a novel method is introduced that significantly reduces temporal variability in the acquisition and improves the accuracy of displacement estimates. Optimization of the acquisition parameters in a gelatin phantom and three human subjects leads to a reduction in the number of rotations from 72–144 to as few as 8 for a single image slice. The ability to estimate accurate, well-resolved, fields of displacement and strain in far fewer repetitions will enable comprehensive studies of acceleration-induced deformation throughout the human brain in vivo. PMID:25287113

  9. Toward a physical map of Drosophila buzzatii. Use of randomly amplified polymorphic DNA polymorphisms and sequence-tagged site landmarks.

    PubMed Central

    Laayouni, H; Santos, M; Fontdevila, A

    2000-01-01

    We present a physical map based on RAPD polymorphic fragments and sequence-tagged sites (STSs) for the repleta group species Drosophila buzzatii. One hundred forty-four RAPD markers have been used as probes for in situ hybridization to the polytene chromosomes, and positive results allowing the precise localization of 108 RAPDs were obtained. Of these, 73 behave as effectively unique markers for physical map construction, and in 9 additional cases the probes gave two hybridization signals, each on a different chromosome. Most markers (68%) are located on chromosomes 2 and 4, which partially agrees with previous estimates on the distribution of genetic variation over chromosomes. One RAPD maps close to the proximal breakpoint of inversion 2z(3) but is not included within the inverted fragment. However, it was possible to conclude from this RAPD that the distal breakpoint of 2z(3) had previously been wrongly assigned. A total of 39 cytologically mapped RAPDs were converted to STSs and yielded an aggregate sequence of 28,431 bp. Thirty-six RAPDs (25%) did not produce any detectable hybridization signal, and we obtained the DNA sequence from three of them. Further prospects toward obtaining a more developed genetic map than the one currently available for D. buzzatii are discussed. PMID:11102375

  10. In silico identification of conserved microRNAs and their target transcripts from expressed sequence tags of three earthworm species.

    PubMed

    Gong, Ping; Xie, Fuliang; Zhang, Baohong; Perkins, Edward J

    2010-12-01

    MicroRNAs are a recently identified class of small regulatory RNAs that target more than 30% of protein-coding genes. Mounting evidence shows that miRNAs play a critical role in many biological processes, including developmental timing, tissue differentiation, and response to chemical exposure. In this study, we applied a computational approach to analyze expressed sequence tags, and identified 32 miRNAs belonging to 22 miRNA families, in three earthworm species Eisenia fetida, Eisenia andrei, and Lumbricus rubellus. These newly identified earthworm miRNAs possess a difference of 2-4 nucleotides from their homologous counterparts in Caenorhabditis elegans. They also share similar features with other known animal miRNAs, for instance, the nucleotide U being dominant in both mature and pre-miRNA sequences, particularly in the first position of mature miRNA sequences at the 5' end. The newly identified earthworm miRNAs putatively regulate mRNA genes that are involved in many important biological processes and pathways related to development, growth, locomotion, and reproduction as well as response to stresses, particularly oxidative stress. Future efforts will focus on experimental validation of their presence and target mRNA genes to further elucidate their biological functions in earthworms.

  11. Developing expressed sequence tag libraries and the discovery of simple sequence repeat markers for two species of raspberry (Rubus L.)

    USDA-ARS's Scientific Manuscript database

    Background: Due to a relatively high level of codominant inheritance and transferability within and among taxonomic groups, simple sequence repeat (SSR) markers are important elements in comparative mapping and delineation of genomic regions associated with traits of economic importance. Expressed S...

  12. Exploiting expressed sequence tag databases for the development and characterization of gene-derived simple sequence repeat markers in the opium poppy (Papaver somniferum L.) for forensic applications.

    PubMed

    Lee, Eun Jung; Jin, Gang Nam; Lee, Kyung Lyong; Han, Myun Soo; Lee, Yang Han; Yang, Moon Sik

    2011-09-01

    Simple sequence repeat (SSR) markers in the opium poppy (Papaver somniferum L.) were identified from an expressed sequence tag (EST) database comprised of 20,340 sequences. In total, 2780 SSR-containing sequences were identified. The most frequent microsatellite had an AT/TA motif (37%). Twenty-two opium poppy EST-SSR markers were presently developed and polymorphisms of six markers (psom 2, 4, 12, 13, 17, and 22) were utilized in 135 individuals under narcotic control investigation. An average of three alleles per locus (range: 2-5 alleles) with a mean heterozygosity of 0.167 was detected. Six loci identified 29 unique profiles in 135 individuals. The EST-SSR markers exhibited small degrees of genetic differentiation (fixation index = 0.727, p < 0.001). Other variable markers will be needed to facilitate the forensic identification of the opium poppy for future cases. To determine the potential for cross-species amplification, six markers were tested in five Papaver genera species and two Eschscholzia genera. The psom 4 and psom 17 primer pair was transferable. This is the first study to report SSR markers of the opium poppy.

  13. Random Tagging Genotyping by Sequencing (rtGBS), an Unbiased Approach to Locate Restriction Enzyme Sites across the Target Genome

    PubMed Central

    Hilario, Elena; Barron, Lorna; Deng, Cecilia H.; Datson, Paul M.; Davy, Marcus W.; Storey, Roy D.

    2015-01-01

    Genotyping by sequencing (GBS) is a restriction enzyme based targeted approach developed to reduce the genome complexity and discover genetic markers when a priori sequence information is unavailable. Sufficient coverage at each locus is essential to distinguish heterozygous from homozygous sites accurately. The number of GBS samples able to be pooled in one sequencing lane is limited by the number of restriction sites present in the genome and the read depth required at each site per sample for accurate calling of single-nucleotide polymorphisms. Loci bias was observed using a slight modification of the Elshire et al. method: some restriction enzyme sites were represented in higher proportions while others were poorly represented or absent. This bias could be due to the quality of genomic DNA, the endonuclease and ligase reaction efficiency, the distance between restriction sites, the preferential amplification of small library restriction fragments, or bias towards cluster formation of small amplicons during the sequencing process. To overcome these issues, we have developed a GBS method based on randomly tagging genomic DNA (rtGBS). By randomly landing on the genome, we can, with less bias, find restriction sites that are far apart, and undetected by the standard GBS (stdGBS) method. The study comprises two types of biological replicates: six different kiwifruit plants and two independent DNA extractions per plant; and three types of technical replicates: four samples of each DNA extraction, stdGBS vs. rtGBS methods, and two independent library amplifications, each sequenced in separate lanes. A statistically significant unbiased distribution of restriction fragment size by rtGBS showed that this method targeted 49% (39,145) of BamH I sites shared with the reference genome, compared to only 14% (11,513) by stdGBS. PMID:26633193

  14. Protein identities - Graphocephala atropunctata expressed sequenced tags: expanding leafhopper vector biology

    USDA-ARS's Scientific Manuscript database

    A small heat shock protein was isolated and sequenced from the Blue-green sharpshooter, BGSS, Graphocephala atropunctata (Signoret) (Hemiptera: Cicadellidae). The BGSS has been the native vector of Pierce’s disease in vineyards in California for nearly a century. The importance of this vector spec...

  15. Identification of expressed resistance gene analogs from peanut (Arachis hypogaea L.) expressed sequence tags

    USDA-ARS's Scientific Manuscript database

    Cultivated peanut is an important source of protein and oil. However, low genetic diversity makes peanut vulnerable to many diseases. Several hundred partial genomic DNA sequences targeting nucleotide-binding-site leucine-rich repeat (NBS-LRR) resistance (R) genes have been reported. Only a small...

  16. Tag Questions across Irish English and British English: A Corpus Analysis of Form and Function

    ERIC Educational Resources Information Center

    Barron, Anne; Pandarova, Irina; Muderack, Karoline

    2015-01-01

    The present study, situated in the area of variational pragmatics, contrasts tag question (TQ) use in Ireland and Great Britain using spoken data from the Irish and British components of the International Corpus of English (ICE). Analysis is on the formal and functional level and also investigates form-functional relationships. Findings reveal…

  18. FAST: FAST Analysis of Sequences Toolbox

    PubMed Central

    Lawrence, Travis J.; Kauffman, Kyle T.; Amrine, Katherine C. H.; Carper, Dana L.; Lee, Raymond S.; Becich, Peter J.; Canales, Claudia J.; Ardell, David H.

    2015-01-01

    FAST (FAST Analysis of Sequences Toolbox) provides simple, powerful open source command-line tools to filter, transform, annotate and analyze biological sequence data. Modeled after the GNU (GNU's Not Unix) Textutils such as grep, cut, and tr, FAST tools such as fasgrep, fascut, and fastr make it easy to rapidly prototype expressive bioinformatic workflows in a compact and generic command vocabulary. Compact combinatorial encoding of data workflows with FAST commands can simplify the documentation and reproducibility of bioinformatic protocols, supporting better transparency in biological data science. Interface self-consistency and conformity with conventions of GNU, Matlab, Perl, BioPerl, R, and GenBank help make FAST easy and rewarding to learn. FAST automates numerical, taxonomic, and text-based sorting, selection and transformation of sequence records and alignment sites based on content, index ranges, descriptive tags, annotated features, and in-line calculated analytics, including composition and codon usage. Automated content- and feature-based extraction of sites and support for molecular population genetic statistics make FAST useful for molecular evolutionary analysis. FAST is portable, easy to install and secure thanks to the relative maturity of its Perl and BioPerl foundations, with stable releases posted to CPAN. Development as well as a publicly accessible Cookbook and Wiki are available on the FAST GitHub repository at https://github.com/tlawrence3/FAST. The default data exchange format in FAST is Multi-FastA (specifically, a restriction of BioPerl FastA format). Sanger and Illumina 1.8+ FastQ formatted files are also supported. FAST makes it easier for non-programmer biologists to interactively investigate and control biological data at the speed of thought. PMID:26042145
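
    The record-filtering idea described above (select sequence records by their descriptive text, then compute a simple in-line statistic such as composition) can be illustrated with a short Python sketch. This is not FAST's implementation or command-line interface; the function names read_fasta, grep_records, and gc_fraction are hypothetical and chosen only for this illustration, and the parser assumes plain Multi-FastA text.

```python
import re
from typing import Iterator, List, Tuple


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from Multi-FastA text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def grep_records(text: str, pattern: str) -> List[Tuple[str, str]]:
    """Keep records whose header matches the regular expression `pattern`."""
    regex = re.compile(pattern)
    return [(h, s) for h, s in read_fasta(text) if regex.search(h)]


def gc_fraction(seq: str) -> float:
    """Fraction of G and C characters in `seq` (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)
```

    For example, grep_records(text, "kinase") would keep only the records whose description line mentions "kinase", and gc_fraction could then be applied to each retained sequence.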

  19. Expressed sequences tags of the anther smut fungus, Microbotryum violaceum, identify mating and pathogenicity genes

    PubMed Central

    Yockteng, Roxana; Marthey, Sylvain; Chiapello, Hélène; Gendrault, Annie; Hood, Michael E; Rodolphe, François; Devier, Benjamin; Wincker, Patrick; Dossat, Carole; Giraud, Tatiana

    2007-01-01

    Background: The basidiomycete fungus Microbotryum violaceum is responsible for the anther-smut disease in many plants of the Caryophyllaceae family and is a model in genetics and evolutionary biology. Infection is initiated by dikaryotic hyphae produced after the conjugation of two haploid sporidia of opposite mating type. This study describes M. violaceum ESTs corresponding to nuclear genes expressed during conjugation and early hyphal production. Results: A normalized cDNA library generated 24,128 sequences, which were assembled into 7,765 unique genes; 25.2% of them displayed significant similarity to annotated proteins from other organisms, 74.3% a weak similarity to the same set of known proteins, and 0.5% were orphans. We identified putative pheromone receptors and genes that in other fungi are involved in the mating process. We also identified many sequences similar to genes known to be involved in pathogenicity in other fungi. The M. violaceum EST database, MICROBASE, is available on the Web and provides access to the sequences, assembled contigs, annotations and programs to compare similarities against MICROBASE. Conclusion: This study provides a basis for cloning the mating type locus, for further investigation of pathogenicity genes in the anther smut fungi, and for comparative genomics. PMID:17692127

  20. DNA sequence-based "bar codes" for tracking the origins of expressed sequence tags from a maize cDNA library constructed using multiple mRNA sources.

    PubMed

    Qiu, Fang; Guo, Ling; Wen, Tsui-Jung; Liu, Feng; Ashlock, Daniel A; Schnable, Patrick S

    2003-10-01

    To enhance gene discovery, expressed sequence tag (EST) projects often make use of cDNA libraries produced using diverse mixtures of mRNAs. As such, expression data are lost because the origins of the resulting ESTs cannot be determined. Alternatively, multiple libraries can be prepared, each from a more restricted source of mRNAs. Although this approach allows the origins of ESTs to be determined, it requires the production of multiple libraries. A hybrid approach is reported here. A cDNA library was prepared using 21 different pools of maize (Zea mays) mRNAs. DNA sequence "bar codes" were added during first-strand cDNA synthesis to uniquely identify the mRNA source pool from which individual cDNAs were derived. Using a decoding algorithm that included error correction, it was possible to identify the source mRNA pool of more than 97% of the ESTs. The frequency at which a bar code is represented in an EST contig should be proportional to the abundance of the corresponding mRNA in the source pool. Consistent with this, all ESTs derived from several genes (zein and adh1) that are known to be exclusively expressed in kernels or preferentially expressed under anaerobic conditions, respectively, were exclusively tagged with bar codes associated with mRNA pools prepared from kernel and anaerobically treated seedlings, respectively. Hence, by allowing for the retention of expression data, the bar coding of cDNA libraries can enhance the value of EST projects.
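
    The decoding step described above (assign each EST to its source mRNA pool from its bar code, tolerating sequencing errors) can be sketched in Python as nearest-bar-code matching by Hamming distance. This is only an illustration under stated assumptions: the abstract does not give the actual bar code sequences or the authors' decoding algorithm, so the three 6-nt bar codes, the pool names, and the one-mismatch tolerance below are all hypothetical.

```python
from collections import Counter

# Hypothetical 6-nt bar codes; the 21 bar codes used in the study are not
# given in the abstract. These three differ pairwise at every position.
BARCODES = {
    "kernel": "ACGTAC",
    "anaerobic_seedling": "TGCATG",
    "root": "GATCCA",
}


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def decode(tag: str, barcodes=BARCODES, max_errors: int = 1):
    """Return the pool whose bar code is uniquely closest to `tag`, else None."""
    distances = sorted((hamming(tag, code), pool) for pool, code in barcodes.items())
    best_dist, best_pool = distances[0]
    # Reject tags that are too far from every bar code or tie between two pools.
    ambiguous = len(distances) > 1 and distances[1][0] == best_dist
    if best_dist > max_errors or ambiguous:
        return None
    return best_pool


def pool_counts(tags):
    """Count decoded pools for the tags of one EST contig, skipping undecodable tags."""
    return Counter(pool for pool in map(decode, tags) if pool is not None)
```

    Counting decoded pools per contig, as in pool_counts, corresponds to the abstract's expectation that bar code frequency in a contig tracks the abundance of the corresponding mRNA in each source pool.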

  1. High-Throughput T7 LIC Vector for Introducing C-Terminus Poly-Histidine Tags with Variable Lengths without Extra Sequences

    PubMed Central

    Lee, Jonas; Kim, Sung-Hou

    2008-01-01

    Immobilized metal ion affinity chromatography (IMAC) has become one of the most popular protein purification methods for recombinant proteins with a hexa-histidine tag (His-tag) placed at the C- or N-terminus of proteins. Nevertheless, there are always difficult proteins that show weak binding to the metal chelating resin and thus low purity. These difficulties are often overcome by increasing the His-tag to 8 or 10 histidines. Despite their success, there are only a few expression vectors available to easily clone and test different His-tag lengths. Therefore, we have modified Escherichia coli T7 expression vector pET21a to accommodate ligation-independent cloning (LIC) that will allow easy and efficient parallel cloning of target genes with different His-tag lengths using a single insert. Unlike most LIC vectors available commercially, our vectors will not translate unwanted extra sequences by engineering the N-terminal linker to anneal before the open reading frame, and the C-terminal linker to anneal as a His-tag. PMID:18824233

  2. High-throughput T7 LIC vector for introducing C-terminal poly-histidine tags with variable lengths without extra sequences.

    PubMed

    Lee, Jonas; Kim, Sung-Hou

    2009-01-01

    Immobilized metal ion affinity chromatography (IMAC) has become one of the most popular protein purification methods for recombinant proteins with a hexa-histidine tag (His-tag) placed at the C- or N-terminus of proteins. Nevertheless, there are always difficult proteins that show weak binding to the metal chelating resin and thus low purity. These difficulties are often overcome by increasing the His-tag to 8 or 10 histidines. Despite their success, there are only a few expression vectors available to easily clone and test different His-tag lengths. Therefore, we have modified Escherichia coli T7 expression vector pET21a to accommodate ligation-independent cloning (LIC) that will allow easy and efficient parallel cloning of target genes with different His-tag lengths using a single insert. Unlike most LIC vectors available commercially, our vectors will not translate unwanted extra sequences by engineering the N-terminal linker to anneal before the open reading frame, and the C-terminal linker to anneal as a His-tag.

  3. Identification of anhydrobiosis-related genes from an expressed sequence tag database in the cryptobiotic midge Polypedilum vanderplanki (Diptera; Chironomidae).

    PubMed

    Cornette, Richard; Kanamori, Yasushi; Watanabe, Masahiko; Nakahara, Yuichi; Gusev, Oleg; Mitsumasu, Kanako; Kadono-Okuda, Keiko; Shimomura, Michihiko; Mita, Kazuei; Kikawada, Takahiro; Okuda, Takashi

    2010-11-12

    Some organisms are able to survive the loss of almost all their body water content, entering a latent state known as anhydrobiosis. The sleeping chironomid (Polypedilum vanderplanki) lives in the semi-arid regions of Africa, and its larvae can survive desiccation in an anhydrobiotic form during the dry season. To unveil the molecular mechanisms of this resistance to desiccation, an anhydrobiosis-related Expressed Sequence Tag (EST) database was obtained from the sequences of three cDNA libraries constructed from P. vanderplanki larvae after 0, 12, and 36 h of desiccation. The database contained 15,056 ESTs distributed into 4,807 UniGene clusters. ESTs were classified according to gene ontology categories, and putative expression patterns were deduced for all clusters on the basis of the number of clones in each library; expression patterns were confirmed by real-time PCR for selected genes. Among up-regulated genes, antioxidants, late embryogenesis abundant (LEA) proteins, and heat shock proteins (Hsps) were identified as important groups for anhydrobiosis. Genes related to trehalose metabolism and various transporters were also strongly induced by desiccation. Those results suggest that the oxidative stress response plays a central role in successful anhydrobiosis. Similarly, protein denaturation and aggregation may be prevented by marked up-regulation of Hsps and the anhydrobiosis-specific LEA proteins. A third major feature is the predicted increase in trehalose synthesis and in the expression of various transporter proteins allowing the distribution of trehalose and other solutes to all tissues.

  4. Mining of assembled expressed sequence tag (EST) data for protein families: application to the G protein-coupled receptor superfamily.

    PubMed

    Conklin, D; Yee, D P; Millar, R; Engelbrecht, J; Vissing, H

    2000-02-01

    The availability of large expressed sequence tag (EST) databases has led to a revolution in the way new genes are identified. Mining of these databases using known protein sequences as queries is a powerful technique for discovering orthologous and paralogous genes. The scientist is often confronted, however, by an enormous amount of search output owing to the inherent redundancy of EST data. In addition, high search sensitivity often cannot be achieved using only a single member of a protein superfamily as a query. In this paper a technique for addressing both of these issues is described. Assembled EST databases are queried with every member of a protein superfamily, the results are integrated and false positives are pruned from the set. The result is a set of assemblies enriched in members of the protein superfamily under consideration. The technique is applied to the G protein-coupled receptor (GPCR) superfamily in the construction of a GPCR Resource. A novel full-length human GPCR identified from the GPCR Resource is presented, illustrating the utility of the method.

  5. A semiautomated approach to gene discovery through expressed sequence tag data mining: discovery of new human transporter genes.

    PubMed

    Brown, Shoshana; Chang, Jean L; Sadée, Wolfgang; Babbitt, Patricia C

    2003-01-01

    Identification and functional characterization of the genes in the human genome remain a major challenge. A principal source of publicly available information used for this purpose is the National Center for Biotechnology Information database of expressed sequence tags (dbEST), which contains over 4 million human ESTs. To extract the information buried in this data more effectively, we have developed a semiautomated method to mine dbEST for uncharacterized human genes. Starting with a single protein input sequence, a family of related proteins from all species is compiled. This entire family is then used to mine the human EST database for new gene candidates. Evaluation of putative new gene candidates in the context of a family of characterized proteins provides a framework for inference of the structure and function of the new genes. When applied to a test data set of 28 families within the major facilitator superfamily (MFS) of membrane transporters, our protocol found 73 previously characterized human MFS genes and 43 new MFS gene candidates. Development of this approach provided insights into the problems and pitfalls of automated data mining using public databases.

  6. Uncovering the Salt Response of Soybean by Unraveling Its Wild and Cultivated Functional Genomes Using Tag Sequencing

    PubMed Central

    Ali, Zulfiqar; Zhang, Da Yong; Xu, Zhao Long; Xu, Ling; Yi, Jin Xin; He, Xiao Lan; Huang, Yi Hong; Liu, Xiao Qing; Khan, Asif Ali; Trethowan, Richard M.; Ma, Hong Xiang

    2012-01-01

    Soil salinity has very adverse effects on growth and yield of crop plants. Several salt tolerant wild accessions and cultivars are reported in soybean. Functional genomes of salt tolerant Glycine soja and a salt sensitive genotype of Glycine max were investigated to understand the mechanism of salt tolerance in soybean. For this purpose, four libraries were constructed for Tag sequencing on the Illumina platform. We identified around 490 salt responsive genes which included a number of transcription factors, signaling proteins, translation factors and structural genes like transporters, multidrug resistance proteins, antiporters, chaperones, aquaporins etc. The gene expression levels and ratio of up/down-regulated genes were greater in tolerant plants. Translation related genes remained stable or showed slightly higher expression in tolerant plants under salinity stress. Further analyses of sequenced data and the annotations for gene ontology and pathways indicated that soybean adapts to salt stress through ABA biosynthesis and regulation of translation and signal transduction of structural genes. Manipulation of these pathways may mitigate the effect of salt stress thus enhancing salt tolerance. PMID:23209559

  7. Identification of Anhydrobiosis-related Genes from an Expressed Sequence Tag Database in the Cryptobiotic Midge Polypedilum vanderplanki (Diptera; Chironomidae)

    PubMed Central

    Cornette, Richard; Kanamori, Yasushi; Watanabe, Masahiko; Nakahara, Yuichi; Gusev, Oleg; Mitsumasu, Kanako; Kadono-Okuda, Keiko; Shimomura, Michihiko; Mita, Kazuei; Kikawada, Takahiro; Okuda, Takashi

    2010-01-01

    Some organisms are able to survive the loss of almost all their body water content, entering a latent state known as anhydrobiosis. The sleeping chironomid (Polypedilum vanderplanki) lives in the semi-arid regions of Africa, and its larvae can survive desiccation in an anhydrobiotic form during the dry season. To unveil the molecular mechanisms of this resistance to desiccation, an anhydrobiosis-related Expressed Sequence Tag (EST) database was obtained from the sequences of three cDNA libraries constructed from P. vanderplanki larvae after 0, 12, and 36 h of desiccation. The database contained 15,056 ESTs distributed into 4,807 UniGene clusters. ESTs were classified according to gene ontology categories, and putative expression patterns were deduced for all clusters on the basis of the number of clones in each library; expression patterns were confirmed by real-time PCR for selected genes. Among up-regulated genes, antioxidants, late embryogenesis abundant (LEA) proteins, and heat shock proteins (Hsps) were identified as important groups for anhydrobiosis. Genes related to trehalose metabolism and various transporters were also strongly induced by desiccation. Those results suggest that the oxidative stress response plays a central role in successful anhydrobiosis. Similarly, protein denaturation and aggregation may be prevented by marked up-regulation of Hsps and the anhydrobiosis-specific LEA proteins. A third major feature is the predicted increase in trehalose synthesis and in the expression of various transporter proteins allowing the distribution of trehalose and other solutes to all tissues. PMID:20833722

  8. Integration of Expressed Sequence Tag Data Flanking Predicted RNA Secondary Structures Facilitates Novel Non-Coding RNA Discovery

    PubMed Central

    Krzyzanowski, Paul M.; Price, Feodor D.; Muro, Enrique M.; Rudnicki, Michael A.; Andrade-Navarro, Miguel A.

    2011-01-01

    Many computational methods have been used to predict novel non-coding RNAs (ncRNAs), but none, to our knowledge, have explicitly investigated the impact of integrating existing cDNA-based Expressed Sequence Tag (EST) data that flank structural RNA predictions. To determine whether flanking EST data can assist in microRNA (miRNA) prediction, we identified genomic sites encoding putative miRNAs by combining functional RNA predictions with flanking ESTs data in a model consistent with miRNAs undergoing cleavage during maturation. In both human and mouse genomes, we observed that the inclusion of flanking ESTs adjacent to and not overlapping predicted miRNAs significantly improved the performance of various methods of miRNA prediction, including direct high-throughput sequencing of small RNA libraries. We analyzed the expression of hundreds of miRNAs predicted to be expressed during myogenic differentiation using a customized microarray and identified several known and predicted myogenic miRNA hairpins. Our results indicate that integrating ESTs flanking structural RNA predictions improves the quality of cleaved miRNA predictions and suggest that this strategy can be used to predict other non-coding RNAs undergoing cleavage during maturation. PMID:21698286

  9. Identification and characterization of 43 microsatellite markers derived from expressed sequence tags of the sea cucumber (Apostichopus japonicus)

    NASA Astrophysics Data System (ADS)

    Jiang, Qun; Li, Qi; Yu, Hong; Kong, Lingfeng

    2011-06-01

    The sea cucumber Apostichopus japonicus is a commercially and ecologically important species in China. A total of 3056 potential unigenes were generated after assembling 7597 A. japonicus expressed sequence tags (ESTs) downloaded from GenBank. Two hundred and fifty microsatellite-containing ESTs (8.18%) and 299 simple sequence repeats (SSRs) were detected. The average density of SSRs was 1 per 7.403 kb of EST after redundancy elimination. Di-nucleotide repeat motifs appeared to be the most abundant type with a percentage of 69.90%. Of the 126 primer pairs designed, 90 amplified the expected products and 43 showed polymorphism in 30 individuals tested. The number of alleles per locus ranged from 2 to 26 with an average of 7.0 alleles, and the observed and expected heterozygosities varied from 0.067 to 1.000 and from 0.066 to 0.959, respectively. These new EST-derived microsatellite markers would provide sufficient polymorphism for population genetic studies and genome mapping of this sea cucumber species.

  10. Construction of a Genetic Linkage Map Based on Amplified Fragment Length Polymorphism Markers and Development of Sequence-Tagged Site Markers for Marker-Assisted Selection of the Sporeless Trait in the Oyster Mushroom (Pleurotus eryngii)

    PubMed Central

    Ueda, Jun; Obatake, Yasushi; Murakami, Shigeyuki; Fukumasa, Yukitaka; Matsumoto, Teruyuki

    2012-01-01

    A large number of spores from fruiting bodies can lead to allergic reactions and other problems during the cultivation of edible mushrooms, including Pleurotus eryngii (DC.) Quél. A cultivar harboring a sporulation-deficient (sporeless) mutation would be useful for preventing these problems, but traditional breeding requires extensive time and labor. In this study, using a sporeless P. eryngii strain, we constructed a genetic linkage map to introduce a molecular breeding program like marker-assisted selection. Based on the segregation of 294 amplified fragment length polymorphism markers, two mating type factors, and the sporeless trait, the linkage map consisted of 11 linkage groups with a total length of 837.2 centimorgans (cM). The gene region responsible for the sporeless trait was located in linkage group IX with 32 amplified fragment length polymorphism markers and the B mating type factor. We also identified eight markers closely linked (within 1.2 cM) to the sporeless locus using bulked-segregant analysis-based amplified fragment length polymorphism. One such amplified fragment length polymorphism marker was converted into two sequence-tagged site markers, SD488-I and SD488-II. Using 14 wild isolates, sequence-tagged site analysis indicated the potential usefulness of the combination of two sequence-tagged site markers in cross-breeding of the sporeless strain. It also suggested that a map constructed for P. eryngii has adequate accuracy for marker-assisted selection. PMID:22210222

  11. In silico mining for simple sequence repeat loci in a pineapple expressed sequence tag database and cross-species amplification of EST-SSR markers across Bromeliaceae.

    PubMed

    Wöhrmann, Tina; Weising, Kurt

    2011-08-01

    A collection of 5,659 expressed sequence tags (ESTs) from pineapple [Ananas comosus (L.) Merr.] was screened for simple sequence repeats (EST-SSRs) with motif lengths between 1 and 6 bp. Lower thresholds of 15, 7 and 5 repeat units were used to define microsatellites of the mono-, di-, and tri- to hexanucleotide repeat type, respectively. Based on these criteria, 696 SSRs were identified among 3,389 EST unigenes, together representing 2,840 kb. This corresponds to an average density of one SSR every 4.1 kb of non-redundant EST sequences. Dinucleotide repeats were most abundant (38.4% of all SSRs) followed by trinucleotide repeats (38.1%). Flanking primer pairs were designed for 537 EST-SSR loci, and 49 of these were screened for their functionality in 12 accessions of A. comosus, 14 accessions of 5 additional Ananas species and 1 species of Pseudananas. Distinct PCR products of the expected size range were obtained with 36 primer pairs. Eighteen loci analyzed in more detail were all polymorphic in pineapple, and primer pairs flanking these loci also generated PCR products from a wide range of genera and species from six subfamilies of the Bromeliaceae. The potential to reveal polymorphism in a heterologous target species was demonstrated in Deuterocohnia brevifolia (subfamily Pitcairnioideae).
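
    The screening criteria stated above (motif lengths of 1 to 6 bp, with minimum repeat counts of 15, 7, and 5 for mono-, di-, and tri- to hexanucleotide motifs) can be sketched as a regular-expression search in Python. This is a minimal illustration, not the software used in the study, which the abstract does not name; it reports perfect repeats only, and the function names find_ssrs and kb_per_ssr are hypothetical.

```python
import re

# Minimum repeat counts from the abstract: 15 (mono-), 7 (di-),
# and 5 (tri- to hexanucleotide motifs).
MIN_REPEATS = {1: 15, 2: 7, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif: str) -> bool:
    """True if `motif` is not itself a repetition of a shorter unit (e.g. 'ATAT')."""
    n = len(motif)
    return not any(
        n % d == 0 and motif == motif[:d] * (n // d) for d in range(1, n)
    )


def find_ssrs(seq: str):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs in `seq`."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        # A k-mer followed by at least (min_rep - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):
                hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


def kb_per_ssr(total_kb: float, n_ssrs: int) -> float:
    """Average length of sequence (kb) per SSR."""
    return total_kb / n_ssrs
```

    With the figures reported in the abstract, kb_per_ssr(2840, 696) evaluates to about 4.08, consistent with the stated density of one SSR every 4.1 kb.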

  12. Analyses of Expressed Sequence Tags from the Maize Foliar Pathogen Cercospora Zeae-Maydis Identifying Novel Genes Expressed during Vegetative, Infectious, & Reproductive Growth

    USDA-ARS's Scientific Manuscript database

    The fungus Cercospora zeae-maydis is an aggressive foliar pathogen of maize that causes substantial yield losses annually throughout the western hemisphere. To learn more about the molecular regulation of pathogenesis in C. zeae-maydis, we generated a collection of expressed sequence tags (ESTs) and...

  13. Development of high-density linkage map and tagging leaf spot resistance in pearl millet using genotyping-by-sequencing markers

    USDA-ARS's Scientific Manuscript database

    Pearl millet is an important forage and grain crop in many parts of the world. Genome mapping studies are a prerequisite for tagging agronomically important traits. Genotyping-by-Sequencing (GBS) markers can be used to build high density linkage maps even in species lacking a reference genome. A re...

  14. Digital Gene Expression Tag Profiling Analysis of the Gene Expression Patterns Regulating the Early Stage of Mouse Spermatogenesis

    PubMed Central

    Meng, Lijun; Liu, Meiling; Zhao, Lina; Hu, Fen; Ding, Cunbao; Wang, Yang; He, Baoling; Pan, Yuxin; Fang, Wei; Chen, Jing; Hu, Songnian; Jia, Mengchun

    2013-01-01

    Detailed characterization of the gene expression patterns in spermatogonia and primary spermatocytes is critical to understand the processes which occur prior to meiosis during normal spermatogenesis. The genome-wide expression profiles of mouse type B spermatogonia and primary spermatocytes were investigated using the Solexa/Illumina digital gene expression (DGE) system, a tag based high-throughput transcriptome sequencing method, and the developmental processes which occur during early spermatogenesis were systematically analyzed. Gene expression patterns vary significantly between mouse type B spermatogonia and primary spermatocytes. The functional analysis revealed that genes related to junction assembly, regulation of the actin cytoskeleton and pluripotency were most significantly differently expressed. Pathway analysis indicated that the Wnt non-canonical signaling pathway played a central role and interacted with the actin filament organization pathway during the development of spermatogonia. This study provides a foundation for further analysis of the gene expression patterns and signaling pathways which regulate the molecular mechanisms of early spermatogenesis. PMID:23554914

  15. Comparative analysis of cleavable diazobenzene-based affinity tags for bioorthogonal chemical proteomics

    PubMed Central

    Yang, Yu-Ying; Grammel, Markus; Raghavan, Anuradha S.; Charron, Guillaume

    2011-01-01

    SUMMARY The advances in bioorthogonal ligation methods have provided new opportunities for proteomic analysis of newly synthesized proteins, posttranslational modifications and specific enzyme families using azide/alkyne-functionalized chemical reporters and activity-based probes. Efficient enrichment and elution of azide/alkyne-labeled proteins with selectively cleavable affinity tags is essential for protein identification and quantification applications. Here we report the synthesis and comparative analysis of Na2S2O4-cleavable diazobenzene-based affinity tags for bioorthogonal chemical proteomics. We demonstrated that ortho-hydroxyl substituent is required for efficient diazobenzene-bond cleavage and show that these cleavable affinity tags can be used to identify newly synthesized proteins in bacteria targeted by amino acid chemical reporters as well as their sites of modification on endogenously expressed proteins. The diazobenzene-based affinity tags are compatible with in-gel, in-solution and on-bead enrichment strategies and should afford useful tools for diverse bioorthogonal proteomic applications. PMID:21095571

  16. Analysis and design of power efficient semi-passive RFID tag

    NASA Astrophysics Data System (ADS)

    Wenyi, Che; Shuo, Guan; Xiao, Wang; Tingwen, Xiong; Jingtian, Xi; Xi, Tan; Na, Yan; Hao, Min

    2010-07-01

    The analysis and design of a semi-passive radio frequency identification (RFID) tag is presented. By studying the power transmission link of the backscatter RFID system and exploiting a power conversion efficiency model for a multi-stage AC-DC charge pump, a calculation method for the semi-passive tag's read range is proposed. According to different read range limitation factors, an intuitive way to define the specifications of the tag's power budget and backscatter modulation index is given. A test chip is implemented in SMIC 0.18 μm standard CMOS technology under the guidance of theoretical analysis. The main building blocks are the threshold-compensated charge pump and a low-power wake-up circuit using the power-triggering wake-up mode. The proposed semi-passive tag is fully compatible with the EPC C1G2 standard. It has a compact chip size of 0.54 mm², and is adaptable to batteries with a 1.2 to 2.4 V output voltage.

  17. Parasites as biological tags of fish stocks: a meta-analysis of their discriminatory power.

    PubMed

    Poulin, Robert; Kamiya, Tsukushi

    2015-01-01

    The use of parasites as biological tags to discriminate among marine fish stocks has become a widely accepted method in fisheries management. Here, we first link this approach to its unstated ecological foundation, the decay in the similarity of the species composition of assemblages as a function of increasing distance between them, a phenomenon almost universal in nature. We explain how distance decay of similarity can influence the use of parasites as biological tags. Then, we perform a meta-analysis of 61 uses of parasites as tags of marine fish populations in multivariate discriminant analyses, obtained from 29 articles. Our main finding is that across all studies, the observed overall probability of correct classification of fish based on parasite data was about 71%. This corresponds to a two-fold improvement over the rate of correct classification expected by chance alone, and the average effect size (Zr = 0·463) computed from the original values was also indicative of a medium-to-large effect. However, none of the moderator variables included in the meta-analysis had a significant effect on the proportion of correct classification; these moderators included the total number of fish sampled, the number of parasite species used in the discriminant analysis, the number of localities from which fish were sampled, the minimum and maximum distance between any pair of sampling localities, etc. Therefore, there are no clear-cut situations in which the use of parasites as tags is more useful than others. Finally, we provide recommendations for the future usage of parasites as tags for stock discrimination, to ensure that future applications of the method achieve statistical rigour and a high discriminatory power.
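
    As a small worked illustration of the effect size quoted above, and assuming that Zr denotes a Fisher z-transformed correlation coefficient (the abstract does not define it), the value can be converted back to the correlation scale. The function names are hypothetical.

```python
import math


def fisher_z_to_r(z):
    """Back-transform a Fisher z value to a correlation coefficient."""
    return math.tanh(z)


def r_to_fisher_z(r):
    """Fisher z-transform of a correlation coefficient."""
    return math.atanh(r)


# Zr = 0.463 is the average effect size reported in the abstract.
r_equivalent = fisher_z_to_r(0.463)  # about 0.43
```

    Under that assumption, the correlation of roughly 0.43 lies between the conventional benchmarks for medium (0.3) and large (0.5) effects, in line with the abstract's description.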

  18. Identification and Validation of Expressed Sequence Tags from Pigeonpea (Cajanus cajan L.) Root

    PubMed Central

    Kumar, Ravi Ranjan; Yadav, Shailesh; Joshi, Shourabh; Bhandare, Prithviraj P.; Patil, Vinod Kumar; Kulkarni, Pramod B.; Sonkawade, Swati; Naik, G. R.

    2014-01-01

    Pigeonpea (Cajanus cajan (L.) Millsp.) is an important food legume crop of rain-fed agriculture in the arid and semiarid tropics of the world. It has a deep and extensive root system which serves a number of important physiological and metabolic functions in plant development and growth. In order to identify genes associated with pigeonpea root, ESTs were generated from the root tissues of pigeonpea (GRG-295 genotype) using a normalized cDNA library. A total of 105 high-quality ESTs were generated by sequencing 250 random clones, which resulted in 72 unigenes comprising 25 contigs and 47 singlets. The ESTs were assigned to 9 functional categories on the basis of their putative function. In order to validate the possible expression of transcripts, four genes, namely, S-adenosylmethionine synthetase, phosphoglycerate kinase, serine carboxypeptidase, and methionine aminopeptidase, were further analyzed by reverse transcriptase PCR. The possible role of the identified transcripts and their functions associated with root will also be a valuable resource for functional genomics studies in legume crops. PMID:24895494

  19. Development of microsatellite markers based on expressed sequence tags in Asparagus cochinchinensis (Asparagaceae)

    PubMed Central

    Kim, Bo-Yun; Park, Han-Sol; Lee, Jung-Hoon; Kwak, Myounghai; Kim, Young-Dong

    2017-01-01

    Premise of the study: Transcriptome-derived simple sequence repeat (SSR) markers were developed in Asparagus cochinchinensis (Asparagaceae). Due to its application in traditional medicine, its wild populations are threatened by over-collection even in protected areas, requiring immediate conservation efforts. Methods and Results: Based on transcriptome data of A. cochinchinensis, 96 primer pairs with two to seven alleles per locus were selected for initial validation; of those, 27 primer pairs amplified across all samples, resulting in 15 polymorphic and 12 monomorphic microsatellite markers. The usefulness of these markers was assessed in 60 individuals representing three populations of A. cochinchinensis. Observed and expected heterozygosity values ranged from 0.050 to 0.950 and 0.049 to 0.626, respectively. Cross-species amplification of the 27 markers was tested in the related species A. rigidulus and A. schoberioides. Conclusions: These polymorphic, transcriptome-derived SSR markers can be used as molecular markers to study population genetics and ecological conservation in A. cochinchinensis and related taxa. PMID:28439480
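
    The observed and expected heterozygosity values reported above follow standard definitions, which can be sketched as below. This is an illustration only: the genotype data and function names are hypothetical, and expected heterozygosity is computed as 1 minus the sum of squared allele frequencies, without the small-sample correction that some population genetics packages apply.

```python
from collections import Counter


def observed_heterozygosity(genotypes):
    """Fraction of diploid individuals carrying two different alleles."""
    het = sum(1 for a, b in genotypes if a != b)
    return het / len(genotypes)


def expected_heterozygosity(genotypes):
    """1 - sum(p_i^2) over allele frequencies p_i (no sample-size correction)."""
    alleles = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(alleles.values())
    return 1.0 - sum((count / total) ** 2 for count in alleles.values())


# Hypothetical allele sizes (bp) at one SSR locus for four individuals.
example = [(150, 150), (150, 154), (154, 154), (150, 154)]
ho = observed_heterozygosity(example)
he = expected_heterozygosity(example)
```

    In this toy data set two of four individuals are heterozygous and both alleles have frequency 0.5, so the observed and expected values coincide.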

  20. A new view of insect-crustacean relationships II. Inferences from expressed sequence tags and comparisons with neural cladistics.

    PubMed

    Andrew, David R

    2011-05-01

    The enormous diversity of Arthropoda has complicated attempts by systematists to deduce the history of this group in terms of phylogenetic relationships and phenotypic change. Traditional hypotheses regarding the relationships of the major arthropod groups (Chelicerata, Myriapoda, Crustacea, and Hexapoda) focus on suites of morphological characters, whereas phylogenomics relies on large amounts of molecular sequence data to infer evolutionary relationships. The present discussion is based on expressed sequence tags (ESTs) that provide large numbers of short molecular sequences and so provide an abundant source of sequence data for phylogenetic inference. This study presents well-supported phylogenies of diverse arthropod and metazoan outgroup taxa obtained from publicly-available databases. An in-house bioinformatics pipeline has been used to compile and align conserved orthologs from each taxon for maximum likelihood inferences. This approach resolves many currently accepted hypotheses regarding internal relationships between the major groups of Arthropoda, including monophyletic Hexapoda, Tetraconata (Crustacea + Hexapoda), Myriapoda, and Chelicerata sensu lato (Pycnogonida + Euchelicerata). "Crustacea" is a paraphyletic group with some taxa more closely related to the monophyletic Hexapoda. These results support studies that have utilized more restricted EST data for phylogenetic inference, yet they differ in important regards from recently published phylogenies employing nuclear protein-coding sequences. The present results do not, however, depart from other phylogenies that resolve Branchiopoda as the crustacean sister group of Hexapoda. Like other molecular phylogenies, EST-derived phylogenies alone are unable to resolve morphological convergences or evolved reversals and thus omit what may be crucial events in the history of life. For example, molecular data are unable to resolve whether a Hexapod-Branchiopod sister relationship infers a branchiopod

  1. Novel Y-chromosomal microdeletions associated with non-obstructive azoospermia uncovered by high throughput sequencing of sequence-tagged sites (STSs).

    PubMed

    Liu, Xiao; Li, Zesong; Su, Zheng; Zhang, Junjie; Li, Honggang; Xie, Jun; Xu, Hanshi; Jiang, Tao; Luo, Liya; Zhang, Ruifang; Zeng, Xiaojing; Xu, Huaiqian; Huang, Yi; Mou, Lisha; Hu, Jingchu; Qian, Weiping; Zeng, Yong; Zhang, Xiuqing; Xiong, Chengliang; Yang, Huanming; Kristiansen, Karsten; Cai, Zhiming; Wang, Jun; Gui, Yaoting

    2016-02-24

    Y-chromosomal microdeletion (YCM) serves as an important genetic factor in non-obstructive azoospermia (NOA). Multiplex polymerase chain reaction (PCR) is routinely used to detect YCMs by tracing sequence-tagged sites (STSs) in the Y chromosome. Here we introduce a novel methodology in which we sequence 1,787 (post-filtering) STSs distributed across the entire male-specific Y chromosome (MSY) in parallel to uncover known and novel YCMs. We validated this approach with 766 Chinese men with NOA and 683 ethnically matched healthy individuals and detected 481 and 98 STSs that were deleted in the NOA and control group, representing a substantial portion of novel YCMs which significantly influenced the functions of spermatogenic genes. The NOA patients tended to carry more and rarer deletions that were enriched in nearby intragenic regions. Haplogroup O2* was revealed to be a protective lineage for NOA, in which the enrichment of b1/b3 deletion in haplogroup C was also observed. In summary, our work provides a new high-resolution portrait of deletions in the Y chromosome.

  2. Novel Y-chromosomal microdeletions associated with non-obstructive azoospermia uncovered by high throughput sequencing of sequence-tagged sites (STSs)

    PubMed Central

    Liu, Xiao; Li, Zesong; Su, Zheng; Zhang, Junjie; Li, Honggang; Xie, Jun; Xu, Hanshi; Jiang, Tao; Luo, Liya; Zhang, Ruifang; Zeng, Xiaojing; Xu, Huaiqian; Huang, Yi; Mou, Lisha; Hu, Jingchu; Qian, Weiping; Zeng, Yong; Zhang, Xiuqing; Xiong, Chengliang; Yang, Huanming; Kristiansen, Karsten; Cai, Zhiming; Wang, Jun; Gui, Yaoting

    2016-01-01

    Y-chromosomal microdeletion (YCM) serves as an important genetic factor in non-obstructive azoospermia (NOA). Multiplex polymerase chain reaction (PCR) is routinely used to detect YCMs by tracing sequence-tagged sites (STSs) in the Y chromosome. Here we introduce a novel methodology in which we sequence 1,787 (post-filtering) STSs distributed across the entire male-specific Y chromosome (MSY) in parallel to uncover known and novel YCMs. We validated this approach with 766 Chinese men with NOA and 683 ethnically matched healthy individuals and detected 481 and 98 STSs that were deleted in the NOA and control group, representing a substantial portion of novel YCMs which significantly influenced the functions of spermatogenic genes. The NOA patients tended to carry more and rarer deletions that were enriched in nearby intragenic regions. Haplogroup O2* was revealed to be a protective lineage for NOA, in which the enrichment of b1/b3 deletion in haplogroup C was also observed. In summary, our work provides a new high-resolution portrait of deletions in the Y chromosome. PMID:26907467

  3. Rapid analysis of protein expression and solubility with the SpyTag-SpyCatcher system.

    PubMed

    Dovala, Dustin; Sawyer, William S; Rath, Christopher M; Metzger, Louis E

    2016-01-01

    Successful isolation of well-folded and active protein often first requires the creation of many constructs. These are needed to assess the effects of truncations, insertions, mutations, and the presence and position of different affinity tags. Determining which constructs yield the highest expression and solubility requires the investigator to express and partially purify each construct, and, in the case of low-expressing proteins, to follow the protein using time-consuming Western blots. Even then, many proteins form soluble aggregates, which may only be apparent after more extensive purification via size exclusion chromatography. In this work, we have utilized a covalent bond-forming tag/domain pair, known as SpyTag/SpyCatcher, to rapidly and specifically attach a fluorescent label to proteins of interest in cellular lysates. Once labeled, tagged proteins can easily be followed via SDS-PAGE and fluorescence size exclusion chromatography (F-SEC) to assess expression levels, solubility, and monodispersity without the need for purification. These techniques enable rapid and facile analysis of proteins, which may greatly facilitate optimization of protein expression constructs. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.

  4. WEBSAGE: a web tool for visual analysis of differentially expressed human SAGE tags.

    PubMed

    Pylouster, Jean; Sénamaud-Beaufort, Catherine; Saison-Behmoaras, Tula Ester

    2005-07-01

    The serial analysis of gene expression (SAGE) is a powerful method to compare gene expression of mRNA populations. To provide quantitative expression levels on a genome-wide scale, the Cancer Genome Anatomy Project (CGAP) uses SAGE. Over 7 million SAGE tags, from 171 human cell types have been assembled. The growing number of laboratories involved in SAGE research necessitates the use of software that provides statistical analysis of raw data, allowing the rapid visualization and interpretation of results. We have created the first simple tool that performs statistical analysis on SAGE data, identifies the tags differentially expressed and shows the results in a scatter plot. It is freely available and accessible at http://bioserv.rpbs.jussieu.fr/websage/index.php.

  5. Comparison of direct boiling method with commercial kits for extracting fecal microbiome DNA by Illumina sequencing of 16S rRNA tags.

    PubMed

    Peng, Xin; Yu, Ke-Qiang; Deng, Guan-Hua; Jiang, Yun-Xia; Wang, Yu; Zhang, Guo-Xia; Zhou, Hong-Wei

    2013-12-01

    Low cost and high throughput capacity are major advantages of using next generation sequencing (NGS) techniques to determine metagenomic 16S rRNA tag sequences. These methods have significantly changed our view of microorganisms in the fields of human health and environmental science. However, DNA extraction using commercial kits has shortcomings of high cost and time constraint. In the present study, we evaluated the determination of fecal microbiomes using a direct boiling method compared with 5 different commercial extraction methods, e.g., Qiagen and MO BIO kits. Principal coordinate analysis (PCoA) using UniFrac distances and clustering showed that direct boiling of a wide range of feces concentrations gave a similar pattern of bacterial communities as those obtained from most of the commercial kits, with the exception of the MO BIO method. Fecal concentration by boiling method affected the estimation of α-diversity indices, otherwise results were generally comparable between boiling and commercial methods. The operational taxonomic units (OTUs) determined through direct boiling showed highly consistent frequencies with those determined through most of the commercial methods. Even those for the MO BIO kit were also obtained by the direct boiling method with high confidence. The present study suggested that direct boiling could be used to determine the fecal microbiome and using this method would significantly reduce the cost and improve the efficiency of the sample preparation for studying gut microbiome diversity.

  6. Deformation analysis of 3D tagged cardiac images using an optical flow method

    PubMed Central

    2010-01-01

    Background This study proposes and validates a method of measuring 3D strain in myocardium using a 3D Cardiovascular Magnetic Resonance (CMR) tissue-tagging sequence and a 3D optical flow method (OFM). Methods Initially, a 3D tag MR sequence was developed and the parameters of the sequence and 3D OFM were optimized using phantom images with simulated deformation. This method then was validated in-vivo and utilized to quantify normal sheep left ventricular functions. Results Optimizing imaging and OFM parameters in the phantom study produced sub-pixel root-mean square error (RMS) between the estimated and known displacements in the x (RMSx = 0.62 pixels (0.43 mm)), y (RMSy = 0.64 pixels (0.45 mm)) and z (RMSz = 0.68 pixels (1 mm)) direction, respectively. In-vivo validation demonstrated excellent correlation between the displacement measured by manually tracking tag intersections and that generated by 3D OFM (R ≥ 0.98). Technique performance was maintained even with 20% Gaussian noise added to the phantom images. Furthermore, 3D tracking of 3D cardiac motions resulted in a 51% decrease in in-plane tracking error as compared to 2D tracking. The in-vivo function studies showed that maximum wall thickening was greatest in the lateral wall, and increased from both apex and base towards the mid-ventricular region. Regional deformation patterns are in agreement with previous studies on LV function. Conclusion A novel method was developed to measure 3D LV wall deformation rapidly with high in-plane and through-plane resolution from one 3D cine acquisition. PMID:20353600

  7. Moving Away from the Reference Genome: Evaluating a Peptide Sequencing Tagging Approach for Single Amino Acid Polymorphism Identifications in the Genus Populus

    SciTech Connect

    Abraham, Paul E; Adams, Rachel M; Tuskan, Gerald A; Hettich, Robert {Bob} L

    2013-01-01

    The genetic diversity across natural populations of the model organism, Populus, is extensive, containing a single nucleotide polymorphism roughly every 200 base pairs. When deviations from the reference genome occur in coding regions, they can impact protein sequences. Rather than relying on a static reference database to profile protein expression, we employed a peptide sequence tagging (PST) approach capable of decoding the plasticity of the Populus proteome. Using shotgun proteomics data from two genotypes of P. trichocarpa, a tag-based approach enabled the detection of 6,653 unexpected sequence variants. Through manual validation, our study investigated how the most abundant chemical modification (methionine oxidation) could masquerade as a sequence variant (Ala→Ser) when few site-determining ions existed. In fact, precise localization of an oxidation site for peptides with more than one potential placement was indeterminate for 70% of the MS/MS spectra. We demonstrate that additional fragment ions made available by high energy collisional dissociation enhance the robustness of the peptide sequence tagging approach (81% of oxidation events could be exclusively localized to a methionine). We are confident that augmenting fragmentation processes for a PST approach will further improve the identification of single amino acid polymorphisms in Populus and potentially other species as well.

  8. Sequence Tag Site and Host Range Assays Demonstrate that Radopholus similis and R. citrophilus are not Reproductively Isolated.
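
    The ambiguity between methionine oxidation and an alanine-to-serine substitution can be illustrated with a short mass calculation. This sketch is not from the study; it uses standard monoisotopic atomic masses and hypothetical helper names to show that both events add one oxygen atom to a peptide.

```python
# Standard monoisotopic atomic masses in daltons.
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}

# Elemental composition of the amino acid residues (residue = amino acid - H2O).
RESIDUES = {
    "Ala": {"C": 3, "H": 5, "N": 1, "O": 1},
    "Ser": {"C": 3, "H": 5, "N": 1, "O": 2},
}


def residue_mass(name):
    """Monoisotopic mass of a residue from its elemental composition."""
    return sum(MONO[element] * count for element, count in RESIDUES[name].items())


# Mass shift caused by replacing Ala with Ser in a peptide.
ala_to_ser_shift = residue_mass("Ser") - residue_mass("Ala")

# Mass shift caused by oxidation (addition of one oxygen atom).
oxidation_shift = MONO["O"]
```

    Because the two shifts are identical (about 15.995 Da), precursor mass alone cannot separate the two explanations; only fragment ions that localize the shift to a specific residue can, which is the role of the site-determining ions mentioned in the abstract.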

    PubMed

    Kaplan, D T; Vanderspool, M C; Opperman, C H

    1997-12-01

    Males of citrus-parasitic Radopholus citrophilus (FL1) were mated with non-citrus-parasitic R. similis (FL5) females. Progeny inherited a 2.4-kb sequence tag site (DK#1) and the ability to reproduce in citrus from the paternal parent (FL1); both traits were absent in the maternal line (FL5). The hybrid progeny produced offspring in roots of citrus seedlings over an 8-month period and therefore were considered reproductively viable. Genomic DNA hybridization studies indicated that one or more copies of DK#1 were present in R. citrophilus FL1. It is not likely that DK#1 represents a citrus parasitism gene because it was amplified from some burrowing nematode isolates that did not parasitize citrus and because DK#1 contains no open reading frames. Inability to reliably test individual nematodes for their ability to parasitize citrus was a constraint to obtaining F2 data required for definitive genetic characterization of citrus parasitism in burrowing nematodes, and alternate approaches will be required. Although the physical relationship of DK#1 and the citrus parasitism locus remains undefined, results of controlled mating studies using these parameters as genetic markers enabled us to identify hybrid F1 progeny. Therefore, R. similis and R. citrophilus are not sibling species since gene flow between the two does not appear to be restricted via geographic isolation (sympatric in Florida) or by genetics.

  9. Water stress-responsive genes in loblolly pine (Pinus taeda) roots identified by analyses of expressed sequence tag libraries.

    PubMed

    Lorenz, W Walter; Sun, Feng; Liang, Chun; Kolychev, Dmitri; Wang, Haiming; Zhao, Xin; Cordonnier-Pratt, Marie-Michele; Pratt, Lee H; Dean, Jeffrey F D

    2006-01-01

    Drought stress is the principal cause of seedling mortality in pine forests of the southeastern United States and in many other forested regions around the globe. As part of a larger effort to discover loblolly pine genes, this study subjected rooted cuttings of three unrelated pine genotypes to three watering regimens. Expressed sequence tags (ESTs) were obtained from both the 3' and 5' ends of 12,918 randomly selected cDNAs generated from root tissues. These ESTs were clustered to identify 6,765 unique transcripts (UniScripts) derived from 6,202 putative unique genes (UniGenes-S). Tentative annotations were assigned on the basis of BLASTX comparisons to the Protein Information Resource Nonredundant Reference (PIR-NREF) database. Expression levels of 42 UniScripts varied with high statistical significance with respect to treatment. Many of them resembled gene products shown to be important for drought tolerance in other species, including dehydrins, endochitinases, cytochrome P450 enzymes, pathogenesis-related proteins and various late-embryogenesis abundant (LEA) gene products. Similarly, expression levels of 110 UniScripts varied with high statistical significance among genotypes, indicating that gene expression patterns in this species are much more dependent on genotype than on treatment. Most of the water stress-induced pine UniScripts that appeared to encode products resembling drought tolerance factors in other species were most highly induced in a single genotype, suggesting that particularly useful adaptive alleles for drought tolerance might exist within the collection of cDNAs characterized from this genotype. Mining and visualizing the complete data set, as well as downloading of both EST and UniScript contig sequences, are possible using MAGIC Gene Discovery at http://fungen.org/genediscovery/.

  10. HPLC-APCI-MS analysis of triacylglycerols (TAGs) in historical pharmaceutical ointments from the eighteenth century.

    PubMed

    Saliu, Francesco; Modugno, Francesca; Orlandi, Marco; Colombini, Maria Perla

    2011-10-01

    The lipid fractions of residues from historical pharmaceutical ointments were analysed by reversed-phase liquid chromatography coupled with atmospheric pressure chemical ionization and mass spectrometer detection. The residues were contained in a series of historical apothecary jars, dating from the eighteenth century and conserved at the "Aboca Museum" in Sansepolcro (Arezzo, Italy) and at the pharmacy of the "Real Cartuja de Valldemossa" in Palma de Majorca (Spain). The analytical protocol was set up using a comparative study based on the evaluation of triacylglycerol (TAG) compositions in raw natural lipid materials and in laboratory-reproduced ointments. These ointments were prepared following pharmaceutical recipes reported in historical treatises and used as reference materials. The reference materials were also subjected to stress treatments in order to evaluate the modification occurring in the TAG profiles as an effect of ageing. TAGs were successfully detected in the reproduced formulations even in mixtures of up to ten ingredients and after harsh degradative treatments, and also in real historical samples. No particular interferences were detected from other non-lipid ingredients of the formulations. The TAG compositions detected in the historical ointments indicated a predominant use of olive oil and pig adipose material as lipid ingredients. The detection of a high level of tristearine and myristyl-palmitoyl-stearyl glycerol in two of the samples suggested the presence of a fatty material of a different origin (maybe a ruminant). On the basis of the positional isomer ratio, sn-PPO/sn-POP, it was possible to hypothesize an exclusive use of pig fat in one sample. We also evaluated the application of principal component analysis of TAG profiles as an approach for the multivariate statistical comparison of the reference and historical ointments.

  11. Analysis and verification of a proposed antenna design for an implantable RFID Tag at 915 MHz

    NASA Astrophysics Data System (ADS)

    Bakore, Rahul

    This work focused on the design and analysis of an antenna to be used with an RFID tag that is implanted in human brain tissue. The goal is to maximize the power transferred between the external RFID measurement system and the implanted RFID tag while minimizing the power dissipated within the surrounding tissue. The commercial computational electromagnetics software package COMSOL, based on the finite element method (FEM), was used for the design process. The COMSOL models were validated against additional simulations using the FEKO commercial package, based on the method of moments (MOM), as well as against measurements of test antenna structures radiating in a bulk homogeneous medium. The proposed antenna geometry is compatible with human tissue and viable for use in an implantable RFID tag. The proposed antenna is a planar folded dipole made from a gold conductor, which is a biocompatible material. The metal thickness is 1 micrometer and the overall antenna dimensions are 22 mm × 3.5 mm. The antenna structure also includes a dielectric substrate and an acrylic coating. The antenna impedance is 28 + j201.5 Ω at 915 MHz. The inductive reactance is high enough to compensate for the capacitive reactance of the RFID tag, and the antenna resistance is close to the effective chip resistance, providing a conjugate match. This antenna fulfills the criteria for minimizing the power dissipation within the human tissue. Also, a radiation efficiency of 87% is achieved with this antenna at 915 MHz. A quality factor greater than 10 is achieved, which is sufficient to turn on the diodes in the electronic circuit of the RFID tag due to the high DC voltage obtained.
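
    The conjugate match mentioned above can be quantified with the standard power transmission coefficient between an antenna and a chip, tau = 4·Ra·Rc / |Za + Zc|². The sketch below is illustrative: the antenna impedance is the value given in the abstract, while the chip impedances are assumptions, since the abstract does not state the chip impedance.

```python
def power_transmission_coefficient(z_antenna, z_chip):
    """Fraction of available antenna power delivered to the chip (0 to 1)."""
    return 4 * z_antenna.real * z_chip.real / abs(z_antenna + z_chip) ** 2


# Antenna impedance at 915 MHz, from the abstract.
Z_ANTENNA = complex(28, 201.5)

# Ideal case: chip impedance is the complex conjugate of the antenna impedance.
tau_matched = power_transmission_coefficient(Z_ANTENNA, Z_ANTENNA.conjugate())

# Hypothetical chip impedance, chosen only to show a slight mismatch.
Z_CHIP_ASSUMED = complex(20, -190)
tau_assumed = power_transmission_coefficient(Z_ANTENNA, Z_CHIP_ASSUMED)
```

    A perfect conjugate match gives a coefficient of 1; the assumed, slightly mismatched chip still receives roughly 92% of the available power, which shows why a close rather than exact match can be acceptable.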

  12. Microsatellite markers derived from Quercus mongolica var. crispula (Fagaceae) inner bark expressed sequence tags.

    PubMed

    Ueno, Saneyoshi; Taguchi, Yuriko; Tsumura, Yoshihiko

    2008-04-01

    In reforestation programs the genetic composition and diversity of populations that could be used as sources of planting material needs to be carefully considered to maximize the chances of successful establishment. For such purposes genetic analyses that include the identification of functional genes are required. In this study, we constructed a cDNA library from inner bark of Quercus mongolica (which is widely distributed in Japan) and collected 3385 ESTs. After constructing 2140 unigenes, 274 microsatellites were found within them. The most frequent microsatellite had AG motif (48%) and the next most common was AAG motif (12%). There were no CG repeats in the unigenes. In total, 20 EST-SSR markers were developed, polymorphisms of which were described by using eight individuals from eight populations over the species' distributional range. The number of alleles per locus (Na) and observed heterozygosity (H(o)) ranged from 2 to 12, and from 0.25 to 1.00, respectively. Cross-species amplification was successful for 19 loci in eight individuals of Q. serrata and for 20 loci in eight individuals of Q. dentata, with values of Na and H(o) comparable to those of Q. mongolica. The EST-SSR markers characterized in this study should facilitate the analysis of genetic diversity in future studies.

  13. Analysis of myocardial motion using generalized spline models and tagged magnetic resonance images

    NASA Astrophysics Data System (ADS)

    Chen, Fang; Rose, Stephen E.; Wilson, Stephen J.; Veidt, Martin; Bennett, Cameron J.; Doddrell, David M.

    2000-06-01

    Heart wall motion abnormalities are very sensitive indicators of common heart diseases, such as myocardial infarction and ischemia. Regional strain analysis is especially important in diagnosing local abnormalities and mechanical changes in the myocardium. In this work, we present a complete method for the analysis of cardiac motion and the evaluation of regional strain in the left ventricular wall. The method is based on generalized spline models and tagged magnetic resonance images (MRI) of the left ventricle. The method combines dynamic tracking of tag deformation, simulation of cardiac movement, and accurate computation of the regional strain distribution. More specifically, the analysis of cardiac motion is performed in three stages. First, material points within the myocardium are tracked over time using a semi-automated snake-based tag tracking algorithm developed for this purpose. This procedure is repeated along three orthogonal axes so as to generate a set of one-dimensional sample measurements of the displacement field. The 3D displacement field is then reconstructed from this sample set by using a generalized vector spline model. The spline reconstruction of the displacement field is explicitly expressed as a linear combination of a spline kernel function associated with each sample point and a polynomial term. Finally, the strain tensor (linear or nonlinear) with three direct components and three shear components is calculated by applying a differential operator directly to the displacement function. The proposed method is computationally effective and easy to perform on tagged MR images. A preliminary study has shown potential advantages of using this method for the analysis of myocardial motion and the quantification of regional strain.
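
    The final stage described above, computing a linear or nonlinear strain tensor from the displacement field, can be sketched with the standard continuum-mechanics definitions: the linear strain is ½(H + Hᵀ) and the Green-Lagrange strain is ½(H + Hᵀ + HᵀH), where H is the displacement gradient. This is a generic illustration, not the authors' spline implementation; the function names and the sample gradient are hypothetical.

```python
def transpose(m):
    """Transpose a 3x3 matrix given as nested lists."""
    return [[m[j][i] for j in range(3)] for i in range(3)]


def matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def linear_strain(grad_u):
    """Small-strain tensor: 0.5 * (H + H^T)."""
    gt = transpose(grad_u)
    return [[0.5 * (grad_u[i][j] + gt[i][j]) for j in range(3)]
            for i in range(3)]


def green_lagrange_strain(grad_u):
    """Nonlinear (Green-Lagrange) strain: 0.5 * (H + H^T + H^T H)."""
    gt = transpose(grad_u)
    quad = matmul(gt, grad_u)
    return [[0.5 * (grad_u[i][j] + gt[i][j] + quad[i][j]) for j in range(3)]
            for i in range(3)]


if __name__ == "__main__":
    # Hypothetical displacement gradient at one material point.
    H = [[0.10, 0.02, 0.0],
         [0.00, 0.00, 0.0],
         [0.00, 0.00, 0.0]]
    print(linear_strain(H))
    print(green_lagrange_strain(H))
```

    Both tensors are symmetric, so each has three direct (diagonal) and three shear (off-diagonal) components, matching the six components mentioned in the abstract.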

  14. Fine Mutational Analysis of 2B8 and 3H7 Tag Epitopes with Corresponding Specific Monoclonal Antibodies.

    PubMed

    Kim, Tae-Lim; Cho, Man-Ho; Sangsawang, Kanidta; Bhoo, Seong Hee

    2016-06-30

    Bacteriophytochromes are phytochrome-like light-sensing photoreceptors that use biliverdin as a chromophore. To study the biochemical properties of the Deinococcus radiodurans bacteriophytochrome (DrBphP) protein, two anti-DrBphP mouse monoclonal antibodies (2B8 and 3H7) were generated. Their specific epitopes were identified in our previous report. We present here fine epitope mapping of these two antibodies by using truncation and substitution of original epitope sequences in order to identify minimized epitope peptides. The previously reported original epitope sequences for 2B8 and 3H7 were truncated from both sides. Our analysis showed that the minimal peptide sequence lengths for 2B8 and 3H7 antibodies were nine amino acids (RDPLPFFPP) and six amino acids (PGEIEE), respectively. We further characterized these peptides in order to investigate their reactivity after single deletion and single substitution of the original peptides. We found that single-substituted 2B8 epitope (RDPLPAFPP) and dual-substituted 3H7 epitope (PGEIAD) showed significantly increased reactivity. These two antibodies with high reactivity for the short modified peptide sequences are valuable for developing new peptide tags for protein research.

  15. Fine Mutational Analysis of 2B8 and 3H7 Tag Epitopes with Corresponding Specific Monoclonal Antibodies

    PubMed Central

    Kim, Tae-Lim; Cho, Man-Ho; Sangsawang, Kanidta; Bhoo, Seong Hee

    2016-01-01

    Bacteriophytochromes are phytochrome-like light-sensing photoreceptors that use biliverdin as a chromophore. To study the biochemical properties of the Deinococcus radiodurans bacteriophytochrome (DrBphP) protein, two anti-DrBphP mouse monoclonal antibodies (2B8 and 3H7) were generated. Their specific epitopes were identified in our previous report. We present here fine epitope mapping of these two antibodies by using truncation and substitution of original epitope sequences in order to identify minimized epitope peptides. The previously reported original epitope sequences for 2B8 and 3H7 were truncated from both sides. Our analysis showed that the minimal peptide sequence lengths for 2B8 and 3H7 antibodies were nine amino acids (RDPLPFFPP) and six amino acids (PGEIEE), respectively. We further characterized these peptides in order to investigate their reactivity after single deletion and single substitution of the original peptides. We found that single-substituted 2B8 epitope (RDPLPAFPP) and dual-substituted 3H7 epitope (PGEIAD) showed significantly increased reactivity. These two antibodies with high reactivity for the short modified peptide sequences are valuable for developing new peptide tags for protein research. PMID:27137090

  16. Annotating nonspecific SAGE tags with microarray data.

    PubMed

    Ge, Xijin; Jung, Yong-Chul; Wu, Qingfa; Kibbe, Warren A; Wang, San Ming

    2006-01-01

    SAGE (serial analysis of gene expression) detects transcripts by extracting short tags from the transcripts. Because of the limited length, many SAGE tags are shared by transcripts from different genes. Relying on sequence information in the general gene expression database has limited power to solve this problem due to the highly heterogeneous nature of the deposited sequences. Considering that the complexity of gene expression at a single tissue level should be much simpler than that in the general expression database, we reasoned that by restricting gene expression to tissue level, the accuracy of gene annotation for the nonspecific SAGE tags should be significantly improved. To test the idea, we developed a tissue-specific SAGE annotation database based on microarray data. This database contains microarray expression information represented as UniGene clusters for 73 normal human tissues and 18 cancer tissues and cell lines. The nonspecific SAGE tag is first matched to the database by the same tissue type used by both SAGE and microarray analysis; then the multiple UniGene clusters assigned to the nonspecific SAGE tag are searched in the database under the matched tissue type. The UniGene cluster present solely or at higher expression levels in the database is annotated to represent the specific gene for the nonspecific SAGE tags. The accuracy of gene annotation by this database was largely confirmed by experimental data. Our study shows that microarray data provide a useful source for annotating the nonspecific SAGE tags.
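
    The annotation rule described above (match the tissue, then keep the candidate UniGene cluster that is present alone or at the highest expression level) can be sketched as a small lookup. All names, cluster identifiers, and expression values below are hypothetical placeholders; the structure of the actual database is not described in the abstract.

```python
def annotate_tag(candidate_clusters, tissue, tissue_expression):
    """Pick the candidate cluster expressed solely or most highly in a tissue.

    candidate_clusters: clusters sharing one nonspecific SAGE tag.
    tissue_expression: {tissue: {cluster: expression_level}} (hypothetical).
    Returns None when no candidate is detected in the matched tissue.
    """
    levels = tissue_expression.get(tissue, {})
    detected = {c: levels[c] for c in candidate_clusters if c in levels}
    if not detected:
        return None
    # A single detected cluster wins trivially; otherwise take the highest.
    return max(detected, key=detected.get)


if __name__ == "__main__":
    # Hypothetical microarray-derived expression levels per tissue.
    expression = {
        "liver": {"Hs.0001": 850.0, "Hs.0002": 40.0},
        "brain": {"Hs.0003": 120.0},
    }
    shared_tag_clusters = ["Hs.0001", "Hs.0002", "Hs.0003"]
    print(annotate_tag(shared_tag_clusters, "liver", expression))
    print(annotate_tag(shared_tag_clusters, "brain", expression))
```

    Restricting the lookup to a single tissue is what reduces ambiguity: a tag shared by three clusters in the general database may have only one plausible source in a given tissue.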

  17. The dynamics of the bacterial diversity in the redox transition and anoxic zones of the Cariaco Basin assessed by parallel tag sequencing.

    PubMed

    Rodriguez-Mora, Maria J; Scranton, Mary I; Taylor, Gordon T; Chistoserdov, Andrei Y

    2015-09-01

    Massively parallel tag sequencing was applied to describe the bacterial diversity in the redox transition and anoxic zones of the Cariaco Basin. In total, 14 samples from the Cariaco Basin were collected over a period of eight years from two stations. A total of 244 357 unique bacterial V6 amplicons were sequenced. The total number of operational taxonomic units (OTUs) found in this study was 4692, with a range of 511-1491 OTUs per sample. Approximately 95% of the OTUs found in the redox transition zone and anoxic layers of Cariaco are represented by less than 50 amplicons suggesting that only about 5% of the bacterial OTUs are responsible for the bulk of the microbial processes in the basin redox transition and anoxic zones. The same dominant OTUs were observed across all eight years of sampling although periodic fluctuations in their proportion were apparent. No distinctive differences were observed between the bacterial communities from the redox transition and anoxic layers of the Cariaco Basin water column. The largest proportion of amplicons belongs to Gammaproteobacteria represented mostly by sulfide oxidizers, followed by Marine Group A (originally described as SAR406; Gordon and Giovannoni 1996), a group of uncultured bacteria hypothesized to be involved in metal reduction, and sulfate-reducing Deltaproteobacteria. Gammaproteobacteria, Deltaproteobacteria and Marine Group A make up 67-90% of all V6 amplicons sequenced in this study. This strongly suggests that the basin's microbial communities are actively involved in the sulfur-related metabolism and coupling of the sulfur and carbon cycles. According to detrended canonical correspondence analysis, ecological factors such as chemoautotrophy, nitrate and oxidized and reduced sulfur compounds influence the structuring and distribution of the Cariaco microbial communities.

  18. Species-diagnostic single-nucleotide polymorphism and sequence-tagged site markers for the parasitic wasp genus Nasonia (Hymenoptera: Pteromalidae).

    PubMed

    Niehuis, O; Judson, A K; Werren, J H; Hunter, W B; Dang, P M; Dowd, S E; Grillenberger, B; Beukeboom, L W; Gadau, J

    2007-08-01

    Wasps of the genus Nasonia are important biological control agents of house flies and related filth flies, which are major vectors of human pathogens. Species of Nasonia (Hymenoptera: Pteromalidae) are not easily differentiated from one another by morphological characters, and molecular markers for their reliable identification have been missing so far. Here, we report eight single-nucleotide polymorphism and three sequence-tagged site markers derived from expressed sequenced tag libraries for the two closely related and regionally sympatric species N. giraulti and N. vitripennis. We studied variation of these markers in natural populations of the two species, and we mapped them in the Nasonia genome. The markers are species-diagnostic and evenly spread over all five chromosomes. They are ideal for rapid species identification and hybrid recognition, and they can be used to map economically relevant quantitative trait loci in the Nasonia genome.

  19. Analysis of the early-flowering mechanisms and generation of T-DNA tagging lines in Kitaake, a model rice cultivar.

    PubMed

    Kim, Song Lim; Choi, Minkyung; Jung, Ki-Hong; An, Gynheung

    2013-11-01

    As an extremely early flowering cultivar, rice cultivar Kitaake is a suitable model system for molecular studies. Expression analyses revealed that transcript levels of the flowering repressor Ghd7 were decreased while those of its downstream genes, Ehd1, Hd3a, and RFT1, were increased. Sequencing the known flowering-regulator genes revealed mutations in Ghd7 and OsPRR37 that cause early translation termination and amino acid substitutions, respectively. Genetic analysis of F2 progeny from a cross between cv. Kitaake and cv. Dongjin indicated that those mutations additively contribute to the early-flowering phenotype in cv. Kitaake. Because the short life cycle facilitates genetics research, this study generated 10 000 T-DNA tagging lines and deduced 6758 flanking sequence tags (FSTs), in which 3122 were genic and 3636 were intergenic. Among the genic lines, 367 (11.8%) were inserted into new genes that were not previously tagged. Because the lines were generated by T-DNA that contained the promoterless GUS reporter gene, which had an intron with triple splicing donors/acceptors in the right border region, a high efficiency of GUS expression was shown in various organs. Sequencing of the GUS-positive lines demonstrated that the third splicing donor and the first splicing acceptor of the vector were extensively used. The FST data have now been released into the public domain for seed distribution and facilitation of rice research.

  20. Synthesis of oligodeoxynucleotides containing N4-mercaptoethylcytosine and their use in the preparation of oligonucleotide-peptide conjugates carrying c-myc tag-sequence.

    PubMed

    Gottschling, D; Seliger, H; Tarrasón, G; Piulats, J; Eritja, R

    1998-01-01

    The preparation and properties of oligodeoxynucleotides containing mercaptoethyl groups at position N-4 of cytosine are described. The resulting thiol-oligodeoxynucleotides were reacted with a maleimido-peptide carrying the c-myc tag-sequence. The peptide-oligonucleotide conjugate is specifically recognized by an anti c-myc monoclonal antibody, thus constituting a labeling system with sensitivity similar to other existing methods of nonradioactive labeling.

  1. SAGExplore: a web server for unambiguous tag mapping in serial analysis of gene expression oriented to gene discovery and annotation.

    PubMed

    Norambuena, Tomás; Malig, Rodrigo; Melo, Francisco

    2007-07-01

    We describe a web server for the accurate mapping of experimental tags in serial analysis of gene expression (SAGE). The core of the server relies on a database of genomic virtual tags built by a recently described method that attempts to reduce the amount of ambiguous assignments for those tags that are not unique in the genome. The method provides a complete annotation of potential virtual SAGE tags within a genome, along with an estimation of their confidence for experimental observation that ranks tags that present multiple matches in the genome. The output of the server consists of a table in HTML format that contains links to a graphic representation of the results and to some external servers and databases, facilitating the tasks of analysis of gene expression and gene discovery. Also, a table in tab delimited text format is produced, allowing the user to export the results into custom databases and software for further analysis. The current server version provides the most accurate and complete SAGE tag mapping source that is available for the yeast organism. In the near future, this server will also allow the accurate mapping of experimental SAGE-tags from other model organisms such as human, mouse, frog and fly. The server is freely available on the web at: http://dna.bio.puc.cl/SAGExplore.html.
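
    To make the notion of a virtual tag concrete, the sketch below extracts the tag that a conventional SAGE protocol would be expected to produce from a transcript: the 10 bases immediately downstream of the 3'-most CATG (NlaIII) site. It then indexes which transcripts share each tag. This is a textbook simplification and not the server's genomic method or its confidence ranking; the anchoring enzyme, tag length, function names, and sequences are assumptions for illustration.

```python
from collections import defaultdict

ANCHOR = "CATG"   # NlaIII recognition site (assumed anchoring enzyme)
TAG_LEN = 10      # classic short-SAGE tag length (assumption)


def virtual_tag(transcript: str):
    """Return the tag downstream of the 3'-most anchor site, or None."""
    seq = transcript.upper()
    pos = seq.rfind(ANCHOR)
    if pos == -1:
        return None
    start = pos + len(ANCHOR)
    tag = seq[start:start + TAG_LEN]
    return tag if len(tag) == TAG_LEN else None


def tag_index(transcripts):
    """Map each virtual tag to the list of transcripts that would yield it."""
    index = defaultdict(list)
    for name, seq in transcripts.items():
        tag = virtual_tag(seq)
        if tag is not None:
            index[tag].append(name)
    return dict(index)


if __name__ == "__main__":
    # Hypothetical transcripts; geneA and geneB share the same tag.
    transcripts = {
        "geneA": "GGCATGAAACCCGGGTTTAA",
        "geneB": "TTCATGAAACCCGGGTCC",
        "geneC": "ACGTACGT",
    }
    for tag, names in tag_index(transcripts).items():
        status = "ambiguous" if len(names) > 1 else "unique"
        print(tag, names, status)
```

    Tags that map to more than one entry are the ambiguous assignments that a ranking scheme such as the one described above is meant to resolve.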

  2. Tissue expression map of a large number of expressed sequence tags and its application to in silico screening of stress response genes in common wheat.

    PubMed

    Mochida, Keiichi; Kawaura, Kanako; Shimosaka, Etsuo; Kawakami, Naoto; Shin-I, Tadasu; Kohara, Yuji; Yamazaki, Yukiko; Ogihara, Yasunari

    2006-09-01

    In order to assess global changes in gene expression patterns in stress-induced tissues, we conducted large-scale analysis of expressed sequence tags (ESTs) in common wheat. Twenty-one cDNA libraries derived from stress-induced tissues, such as callus, as well as liquid cultures and abiotic stress conditions (temperature treatment, desiccation, photoperiod, moisture and ABA) were constructed. Several thousand colonies were randomly selected from each of these 21 cDNA libraries and sequenced from both the 5' and 3' ends. By computing abundantly expressed ESTs, correlated expression patterns of genes across the tissues were monitored. Furthermore, the relationships between gene expression profiles among the stress-induced tissues were inferred from the gene expression patterns. Multi-dimensional analysis of EST data is analogous to microarray experiments. As an example, genes specifically induced and/or suppressed by cold acclimation and heat-shock treatments were selected in silico. In comparison to the control expression level, 490 genes showing fivefold induction and 218 genes showing suppression were selected. These selected genes were annotated with the BLAST search. Furthermore, gene ontology annotation was conducted for these genes with the InterPro search. Because genes regulated in response to temperature treatment were successfully selected, this method can be applied to other stress-treated tissues. The method was then applied to screen genes responding to abiotic stresses such as drought and ABA treatments. In silico selection of screened genes from virtual display should provide a powerful tool for functional plant genomics.
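
    The in silico selection step can be illustrated by comparing library-normalized EST counts between a treated and a control library and keeping genes that change at least fivefold. This is a minimal sketch under stated assumptions: the function name, the pseudocount of 1 used to avoid division by zero, and the counts are hypothetical, and the abstract does not specify the authors' normalization.

```python
def classify_by_fold_change(treated, control, treated_total, control_total,
                            fold=5.0, pseudocount=1.0):
    """Label genes as 'induced' or 'suppressed' by EST-count fold change.

    treated/control: {gene: EST count}; totals are library sizes.
    The pseudocount is an illustrative choice, not taken from the source.
    """
    result = {}
    for gene in sorted(set(treated) | set(control)):
        t = (treated.get(gene, 0) + pseudocount) / treated_total
        c = (control.get(gene, 0) + pseudocount) / control_total
        ratio = t / c
        if ratio >= fold:
            result[gene] = "induced"
        elif ratio <= 1.0 / fold:
            result[gene] = "suppressed"
    return result


if __name__ == "__main__":
    # Hypothetical EST counts from a cold-treated and a control library.
    treated_counts = {"g1": 29, "g2": 0, "g3": 10}
    control_counts = {"g1": 2, "g2": 14, "g3": 9}
    print(classify_by_fold_change(treated_counts, control_counts, 1000, 1000))
```

    Dividing by library size matters because the libraries were sampled to different depths; without it, a larger library would appear to induce every gene.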

  3. Novel Cysteine Tags for the Sequencing of Non-Tryptic Disulfide Peptides of Anurans: ESI-MS Study of Fragmentation Efficiency

    NASA Astrophysics Data System (ADS)

    Samgina, Tatyana Y.; Vorontsov, Egor A.; Gorshkov, Vladimir A.; Artemenko, Konstantin A.; Nifant'ev, Ilya E.; Kanawati, Basem; Schmitt-Kopplin, Philippe; Zubarev, Roman A.; Lebedev, Albert T.

    2011-12-01

    Mass spectrometry faces considerable difficulties in de novo sequencing of long non-tryptic peptides with S-S bonds. Long disulfide-containing peptides brevinins 1E and 2Ec from frog Rana ridibunda were reduced and alkylated with nine novel and three known derivatizing agents. Eight of the novel reagents are maleimide derivatives. Modified samples were subjected to MS/MS studies on FT-ICR and Orbitrap mass spectrometers using CAD/HCD or ECD/ETD techniques. Procedures, fragmentation patterns, and sequence coverage for two peptides modified with 12 tags are described. ECD/ETD and CAD fragmentation revealed complementary sequence information. Higher-energy collisionally activated dissociation (HCD) sufficiently enhanced y-ions formation for brevinin 1E, but not for brevinin 2Ec. Some novel tags [ N-benzylmaleimide, N-(2,6-dimethylphenyl)maleimide] along with known N-phenylmaleimide and iodoacetic acid showed high total sequence coverage taking into account combined ETD and HCD fragmentation. Moreover, modification of long (34 residues) brevinin 2Ec with N-benzylmaleimide or N-(2,6-dimethylphenyl)maleimide yielded high sequence coverage and full C-terminal sequence determination with ECD alone.

  4. A capture-recapture survival analysis model for radio-tagged animals

    USGS Publications Warehouse

    Pollock, K.H.; Bunck, C.M.; Winterstein, S.R.; Chen, C.-L.; North, P.M.; Nichols, J.D.

    1995-01-01

    In recent years, survival analysis of radio-tagged animals has developed using methods based on the Kaplan-Meier method used in medical and engineering applications (Pollock et al., 1989a,b). An important assumption of this approach is that all tagged animals with a functioning radio can be relocated at each sampling time with probability 1. This assumption may not always be reasonable in practice. In this paper, we show how a general capture-recapture model can be derived which allows for some probability (less than one) for animals to be relocated. This model is not simply a Jolly-Seber model because it is possible to relocate both dead and live animals, unlike when traditional tagging is used. The model can also be viewed as a generalization of the Kaplan-Meier procedure, thus linking the Jolly-Seber and Kaplan-Meier approaches to survival estimation. We present maximum likelihood estimators and discuss testing between submodels. We also discuss model assumptions and their validity in practice. An example is presented based on canvasback data collected by G. M. Haramis of Patuxent Wildlife Research Center, Laurel, Maryland, USA.
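
    For reference, the Kaplan-Meier (product-limit) estimator that the capture-recapture model generalizes can be written in a few lines: survival through each sampling occasion is the running product of (1 - deaths / animals at risk). The sketch below implements only this baseline, which assumes every tagged animal is relocated at each occasion; it does not implement the paper's capture-recapture model, and the function name and counts are hypothetical.

```python
def kaplan_meier(occasions):
    """Product-limit survival estimates.

    occasions: list of (animals_at_risk, deaths) per sampling occasion.
    Returns the estimated survival probability after each occasion.
    """
    survival = 1.0
    curve = []
    for at_risk, deaths in occasions:
        if at_risk > 0:
            survival *= 1.0 - deaths / at_risk
        curve.append(survival)
    return curve


if __name__ == "__main__":
    # Hypothetical radio-tracking data: (at risk, deaths) per occasion.
    data = [(20, 2), (18, 0), (18, 3)]
    for occasion, s in enumerate(kaplan_meier(data), start=1):
        print(occasion, round(s, 3))
```

    When some animals cannot be relocated, the at-risk counts above are no longer known exactly, which is the situation the relocation probability in the generalized model is meant to handle.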

  5. Multiplex analysis of sphingolipids using amine-reactive tags (iTRAQ).

    PubMed

    Nabetani, Takuji; Makino, Asami; Hullin-Matsuda, Françoise; Hirakawa, Taka-Aki; Takeoka, Shinji; Okino, Nozomu; Ito, Makoto; Kobayashi, Toshihide; Hirabayashi, Yoshio

    2011-06-01

    Ceramides play a crucial role in divergent signaling events, including differentiation, senescence, proliferation, and apoptosis. Ceramides are a minor lipid component in terms of content; thus, highly sensitive detection is required for accurate quantification. The recently developed isobaric tags for relative and absolute quantitation (iTRAQ) method enables a precise comparison of both protein and aminophospholipids. However, iTRAQ tagging had not been applied to the determination of sphingolipids. Here we report a method for the simultaneous measurement of multiple ceramide and monohexosylceramide samples using iTRAQ tags. Samples were hydrolyzed with sphingolipid ceramide N-deacylase (SCDase) to expose the free amino group of the sphingolipids, to which the N-hydroxysuccinimide group of iTRAQ reagent was conjugated. The reaction was performed in the presence of a cleavable detergent, 3-[3-(1,1-bisalkyloxyethyl)pyridine-1-yl]propane-1-sulfonate (PPS) to both improve the hydrolysis and ensure the accuracy of the mass spectrometry analysis performed after iTRAQ labeling. This method was successfully applied to the profiling of ceramides and monohexosylceramides in sphingomyelinase-treated Madin Darby canine kidney (MDCK) cells and apoptotic Jurkat cells.

  6. Multiplex analysis of sphingolipids using amine-reactive tags (iTRAQ)

    PubMed Central

    Nabetani, Takuji; Makino, Asami; Hullin-Matsuda, Françoise; Hirakawa, Taka-aki; Takeoka, Shinji; Okino, Nozomu; Ito, Makoto; Kobayashi, Toshihide; Hirabayashi, Yoshio

    2011-01-01

    Ceramides play a crucial role in divergent signaling events, including differentiation, senescence, proliferation, and apoptosis. Ceramides are a minor lipid component in terms of content; thus, highly sensitive detection is required for accurate quantification. The recently developed isobaric tags for relative and absolute quantitation (iTRAQ) method enables a precise comparison of both protein and aminophospholipids. However, iTRAQ tagging had not been applied to the determination of sphingolipids. Here we report a method for the simultaneous measurement of multiple ceramide and monohexosylceramide samples using iTRAQ tags. Samples were hydrolyzed with sphingolipid ceramide N-deacylase (SCDase) to expose the free amino group of the sphingolipids, to which the N-hydroxysuccinimide group of iTRAQ reagent was conjugated. The reaction was performed in the presence of a cleavable detergent, 3-[3-(1,1-bisalkyloxyethyl)pyridine-1-yl]propane-1-sulfonate (PPS) to both improve the hydrolysis and ensure the accuracy of the mass spectrometry analysis performed after iTRAQ labeling. This method was successfully applied to the profiling of ceramides and monohexosylceramides in sphingomyelinase-treated Madin Darby canine kidney (MDCK) cells and apoptotic Jurkat cells. PMID:21487068

  7. Functional categorization of unique expressed sequence tags obtained from the yeast-like growth phase of the elm pathogen Ophiostoma novo-ulmi

    PubMed Central

    2011-01-01

    Background: The highly aggressive pathogenic fungus Ophiostoma novo-ulmi continues to be a serious threat to the American elm (Ulmus americana) in North America. Extensive studies have been conducted in North America to understand the mechanisms of virulence of this introduced pathogen and its evolving population structure, with a view to identifying potential strategies for the control of Dutch elm disease. As part of a larger study to examine the genomes of economically important Ophiostoma spp. and the genetic basis of virulence, we have constructed an expressed sequence tag (EST) library using total RNA extracted from the yeast-like growth phase of O. novo-ulmi (isolate H327). Results: A total of 4,386 readable EST sequences were annotated by determining their closest matches to known or theoretical sequences in public databases by BLASTX analysis. Searches matched 2,093 sequences to entries found in Genbank, including 1,761 matches with known proteins and 332 matches with unknown (hypothetical/predicted) proteins. Known proteins included a collection of 880 unique transcripts which were categorized to obtain a functional profile of the transcriptome and to evaluate physiological function. These assignments yielded 20 primary functional categories (FunCat), the largest including Metabolism (FunCat 01, 20.28% of total), Sub-cellular localization (70, 10.23%), Protein synthesis (12, 10.14%), Transcription (11, 8.27%), Biogenesis of cellular components (42, 8.15%), Cellular transport, facilitation and routes (20, 6.08%), Classification unresolved (98, 5.80%), Cell rescue, defence and virulence (32, 5.31%) and the unclassified category, or known sequences of unknown metabolic function (99, 7.5%). A list of specific transcripts of interest was compiled to initiate an evaluation of their impact upon strain virulence in subsequent studies. Conclusions: This is the first large-scale study of the O. novo-ulmi transcriptome. The expression profile obtained from the yeast

  8. Preparing and Analyzing Expressed Sequence Tags (ESTs) Library for the Mammary Tissue of Local Turkish Kivircik Sheep.

    PubMed

    Ozdemir Ozgenturk, Nehir; Omeroglu Ulu, Zehra; Ulu, Salih; Un, Cemal; Ozdem Oztabak, Kemal; Altunatmaz, Kemal

    2017-01-01

    The Kivircik is an important local Turkish sheep breed, valued for its meat quality and milk productivity. The aim of this study was to analyze gene expression profiles of both prenatal and postnatal stages for the Kivircik sheep. To this end, two cDNA libraries were constructed from mammary gland tissue of the same Kivircik sheep at the prenatal and postnatal stages. A total of 3072 colonies randomly selected from the two libraries were sequenced to develop a sheep EST collection. We used the Phred/Phrap programs to analyze the raw ESTs, and readable EST sequences were assembled with the CAP3 software. Putative functions of all unique sequences were determined, and statistical analysis was performed, with the Geneious software. A total of 422 ESTs with over 80% similarity to known sequences of other organisms in NCBI were classified by Gene Ontology (GO) category using the Panther database. By comparing gene expression profiles, we observed some putative genes that may be related to reproductive performance or play important roles in milk synthesis and secretion. A total of 2414 ESTs have been deposited in the NCBI GenBank database (GW996847-GW999260). EST data in this study have provided a new source of information for functional genome studies of sheep.

  9. Preparing and Analyzing Expressed Sequence Tags (ESTs) Library for the Mammary Tissue of Local Turkish Kivircik Sheep

    PubMed Central

    Omeroglu Ulu, Zehra; Ulu, Salih; Un, Cemal; Ozdem Oztabak, Kemal; Altunatmaz, Kemal

    2017-01-01

    The Kivircik is an important local Turkish sheep breed, valued for its meat quality and milk productivity. The aim of this study was to analyze gene expression profiles of both prenatal and postnatal stages for the Kivircik sheep. To this end, two cDNA libraries were constructed from mammary gland tissue of the same Kivircik sheep at the prenatal and postnatal stages. A total of 3072 colonies randomly selected from the two libraries were sequenced to develop a sheep EST collection. We used the Phred/Phrap programs to analyze the raw ESTs, and readable EST sequences were assembled with the CAP3 software. Putative functions of all unique sequences were determined, and statistical analysis was performed, with the Geneious software. A total of 422 ESTs with over 80% similarity to known sequences of other organisms in NCBI were classified by Gene Ontology (GO) category using the Panther database. By comparing gene expression profiles, we observed some putative genes that may be related to reproductive performance or play important roles in milk synthesis and secretion. A total of 2414 ESTs have been deposited in the NCBI GenBank database (GW996847–GW999260). EST data in this study have provided a new sourc