Sample records for tag est databases

  1. Rapid in silico cloning of genes using expressed sequence tags (ESTs).

    PubMed

    Gill, R W; Sanseau, P

    2000-01-01

    Expressed sequence tags (ESTs) are short single-pass DNA sequences obtained from either end of cDNA clones. These ESTs are derived from a vast number of cDNA libraries obtained from different species. Human ESTs are the bulk of the data and have been widely used to identify new members of gene families, as markers on the human chromosomes, to discover polymorphism sites and to compare expression patterns in different tissues or pathologies states. Information strategies have been devised to query EST databases. Since most of the analysis is performed with a computer, the term "in silico" strategy has been coined. In this chapter we will review the current status of EST databases, the pros and cons of EST-type data and describe possible strategies to retrieve meaningful information.

  2. A large scale analysis of cDNA in Arabidopsis thaliana: generation of 12,028 non-redundant expressed sequence tags from normalized and size-selected cDNA libraries.

    PubMed

    Asamizu, E; Nakamura, Y; Sato, S; Tabata, S

    2000-06-30

    For comprehensive analysis of genes expressed in the model dicotyledonous plant, Arabidopsis thaliana, expressed sequence tags (ESTs) were accumulated. Normalized and size-selected cDNA libraries were constructed from aboveground organs, flower buds, roots, green siliques and liquid-cultured seedlings, respectively, and a total of 14,026 5'-end ESTs and 39,207 3'-end ESTs were obtained. The 3'-end ESTs could be clustered into 12,028 non-redundant groups. Similarity search of the non-redundant ESTs against the public non-redundant protein database indicated that 4816 groups show similarity to genes of known function, 1864 to hypothetical genes, and the remaining 5348 are novel sequences. Gene coverage by the non-redundant ESTs was analyzed using the annotated genomic sequences of approximately 10 Mb on chromosomes 3 and 5. A total of 923 regions were hit by at least one EST, among which only 499 regions were hit by the ESTs deposited in the public database. The result indicates that the EST source generated in this project complements the EST data in the public database and facilitates new gene discovery.

  3. The use of sequence-based SSR mining for the development of a vast collection of microsatellites in Aquilegia Formosa

    Treesearch

    Brandon Schlautman; Vera Pfeiffer; Juan Zalapa; Johanne Brunet

    2014-01-01

    Numerous microsatellite markers were developed for Aquilegia formosafrom sequences deposited within the Expressed Sequence Tag (EST), Genomic Survey Sequence (GSS), and Nucleotide databases in NCBI. Microsatellites (SSRs) were identified and primers were designed for 9 SSR containing sequences in the Nucleotide database, 3803 sequences in the EST...

  4. Analysis of SSR information in EST resources of sugarcane

    USDA-ARS?s Scientific Manuscript database

    Expressed sequence tags ( ESTs) offer the opportunity to exploit single, low -copy, conserved sequence motifs for the development of simple sequence repeats ( SSRs). The total of 262 113 ESTs of sugarcane (Saccharum officinarum) in the database of NCBI were downloaded and analyzed, which resulted in...

  5. EST-PAC a web package for EST annotation and protein sequence prediction

    PubMed Central

    Strahm, Yvan; Powell, David; Lefèvre, Christophe

    2006-01-01

    With the decreasing cost of DNA sequencing technology and the vast diversity of biological resources, researchers increasingly face the basic challenge of annotating a larger number of expressed sequences tags (EST) from a variety of species. This typically consists of a series of repetitive tasks, which should be automated and easy to use. The results of these annotation tasks need to be stored and organized in a consistent way. All these operations should be self-installing, platform independent, easy to customize and amenable to using distributed bioinformatics resources available on the Internet. In order to address these issues, we present EST-PAC a web oriented multi-platform software package for expressed sequences tag (EST) annotation. EST-PAC provides a solution for the administration of EST and protein sequence annotations accessible through a web interface. Three aspects of EST annotation are automated: 1) searching local or remote biological databases for sequence similarities using Blast services, 2) predicting protein coding sequence from EST data and, 3) annotating predicted protein sequences with functional domain predictions. In practice, EST-PAC integrates the BLASTALL suite, EST-Scan2 and HMMER in a relational database system accessible through a simple web interface. EST-PAC also takes advantage of the relational database to allow consistent storage, powerful queries of results and, management of the annotation process. The system allows users to customize annotation strategies and provides an open-source data-management environment for research and education in bioinformatics. PMID:17147782

  6. Update of the Diatom EST Database: a new tool for digital transcriptomics

    PubMed Central

    Maheswari, Uma; Mock, Thomas; Armbrust, E. Virginia; Bowler, Chris

    2009-01-01

    The Diatom Expressed Sequence Tag (EST) Database was constructed to provide integral access to ESTs from these ecologically and evolutionarily interesting microalgae. It has now been updated with 130 000 Phaeodactylum tricornutum ESTs from 16 cDNA libraries and 77 000 Thalassiosira pseudonana ESTs from seven libraries, derived from cells grown in different nutrient and stress regimes. The updated relational database incorporates results from statistical analyses such as log-likelihood ratios and hierarchical clustering, which help to identify differentially expressed genes under different conditions, and allow similarities in gene expression in different libraries to be investigated in a functional context. The database also incorporates links to the recently sequenced genomes of P. tricornutum and T. pseudonana, enabling an easy cross-talk between the expression pattern of diatom orthologs and the genome browsers. These improvements will facilitate exploration of diatom responses to conditions of ecological relevance and will aid gene function identification of diatom-specific genes and in silico gene prediction in this largely unexplored class of eukaryotes. The updated Diatom EST Database is available at http://www.biologie.ens.fr/diatomics/EST3. PMID:19029140

  7. openSputnik--a database to ESTablish comparative plant genomics using unsaturated sequence collections.

    PubMed

    Rudd, Stephen

    2005-01-01

    The public expressed sequence tag collections are continually being enriched with high-quality sequences that represent an ever-expanding range of taxonomically diverse plant species. While these sequence collections provide biased insight into the populations of expressed genes available within individual species and their associated tissues, the information is conceivably of wider relevance in a comparative context. When we consider the available expressed sequence tag (EST) collections of summer 2004, most of the major plant taxonomic clades are at least superficially represented. Investigation of the five million available plant ESTs provides a wealth of information that has applications in modelling the routes of plant genome evolution and the identification of lineage-specific genes and gene families. Over four million ESTs from over 50 distinct plant species have been collated within an EST analysis pipeline called openSputnik. The ESTs were resolved down into approximately one million unigene sequences. These have been annotated using orthology-based annotation transfer from reference plant genomes and using a variety of contemporary bioinformatics methods to assign peptide, structural and functional attributes. The openSputnik database is available at http://sputnik.btk.fi.

  8. Analysis and functional annotation of expressed sequence tags from the fall armyworm Spodoptera frugiperda

    PubMed Central

    Deng, Youping; Dong, Yinghua; Thodima, Venkata; Clem, Rollie J; Passarelli, A Lorena

    2006-01-01

    Background Little is known about the genome sequences of lepidopteran insects, although this group of insects has been studied extensively in the fields of endocrinology, development, immunity, and pathogen-host interactions. In addition, cell lines derived from Spodoptera frugiperda and other lepidopteran insects are routinely used for baculovirus foreign gene expression. This study reports the results of an expressed sequence tag (EST) sequencing project in cells from the lepidopteran insect S. frugiperda, the fall armyworm. Results We have constructed an EST database using two cDNA libraries from the S. frugiperda-derived cell line, SF-21. The database consists of 2,367 ESTs which were assembled into 244 contigs and 951 singlets for a total of 1,195 unique sequences. Conclusion S. frugiperda is an agriculturally important pest insect and genomic information will be instrumental for establishing initial transcriptional profiling and gene function studies, and for obtaining information about genes manipulated during infections by insect pathogens such as baculoviruses. PMID:17052344

  9. An Ambystoma mexicanum EST sequencing project: analysis of 17,352 expressed sequence tags from embryonic and regenerating blastema cDNA libraries

    PubMed Central

    Habermann, Bianca; Bebin, Anne-Gaelle; Herklotz, Stephan; Volkmer, Michael; Eckelt, Kay; Pehlke, Kerstin; Epperlein, Hans Henning; Schackert, Hans Konrad; Wiebe, Glenis; Tanaka, Elly M

    2004-01-01

    Background The ambystomatid salamander, Ambystoma mexicanum (axolotl), is an important model organism in evolutionary and regeneration research but relatively little sequence information has so far been available. This is a major limitation for molecular studies on caudate development, regeneration and evolution. To address this lack of sequence information we have generated an expressed sequence tag (EST) database for A. mexicanum. Results Two cDNA libraries, one made from stage 18-22 embryos and the other from day-6 regenerating tail blastemas, generated 17,352 sequences. From the sequenced ESTs, 6,377 contigs were assembled that probably represent 25% of the expressed genes in this organism. Sequence comparison revealed significant homology to entries in the NCBI non-redundant database. Further examination of this gene set revealed the presence of genes involved in important cell and developmental processes, including cell proliferation, cell differentiation and cell-cell communication. On the basis of these data, we have performed phylogenetic analysis of key cell-cycle regulators. Interestingly, while cell-cycle proteins such as the cyclin B family display expected evolutionary relationships, the cyclin-dependent kinase inhibitor 1 gene family shows an unusual evolutionary behavior among the amphibians. Conclusions Our analysis reveals the importance of a comprehensive sequence set from a representative of the Caudata and illustrates that the EST sequence database is a rich source of molecular, developmental and regeneration studies. To aid in data mining, the ESTs have been organized into an easily searchable database that is freely available online. PMID:15345051

  10. Development of an Expressed Sequence Tag (EST) Resource for Wheat (Triticum aestivum L.)

    PubMed Central

    Lazo, G. R.; Chao, S.; Hummel, D. D.; Edwards, H.; Crossman, C. C.; Lui, N.; Matthews, D. E.; Carollo, V. L.; Hane, D. L.; You, F. M.; Butler, G. E.; Miller, R. E.; Close, T. J.; Peng, J. H.; Lapitan, N. L. V.; Gustafson, J. P.; Qi, L. L.; Echalier, B.; Gill, B. S.; Dilbirligi, M.; Randhawa, H. S.; Gill, K. S.; Greene, R. A.; Sorrells, M. E.; Akhunov, E. D.; Dvořák, J.; Linkiewicz, A. M.; Dubcovsky, J.; Hossain, K. G.; Kalavacharla, V.; Kianian, S. F.; Mahmoud, A. A.; Miftahudin; Ma, X.-F.; Conley, E. J.; Anderson, J. A.; Pathan, M. S.; Nguyen, H. T.; McGuire, P. E.; Qualset, C. O.; Anderson, O. D.

    2004-01-01

    This report describes the rationale, approaches, organization, and resource development leading to a large-scale deletion bin map of the hexaploid (2n = 6x = 42) wheat genome (Triticum aestivum L.). Accompanying reports in this issue detail results from chromosome bin-mapping of expressed sequence tags (ESTs) representing genes onto the seven homoeologous chromosome groups and a global analysis of the entire mapped wheat EST data set. Among the resources developed were the first extensive public wheat EST collection (113,220 ESTs). Described are protocols for sequencing, sequence processing, EST nomenclature, and the assembly of ESTs into contigs. These contigs plus singletons (unassembled ESTs) were used for selection of distinct sequence motif unigenes. Selected ESTs were rearrayed, validated by 5′ and 3′ sequencing, and amplified for probing a series of wheat aneuploid and deletion stocks. Images and data for all Southern hybridizations were deposited in databases and were used by the coordinators for each of the seven homoeologous chromosome groups to validate the mapping results. Results from this project have established the foundation for future developments in wheat genomics. PMID:15514037

  11. SolEST database: a "one-stop shop" approach to the study of Solanaceae transcriptomes.

    PubMed

    D'Agostino, Nunzio; Traini, Alessandra; Frusciante, Luigi; Chiusano, Maria Luisa

    2009-11-30

    Since no genome sequences of solanaceous plants have yet been completed, expressed sequence tag (EST) collections represent a reliable tool for broad sampling of Solanaceae transcriptomes, an attractive route for understanding Solanaceae genome functionality and a powerful reference for the structural annotation of emerging Solanaceae genome sequences. We describe the SolEST database http://biosrv.cab.unina.it/solestdb which integrates different EST datasets from both cultivated and wild Solanaceae species and from two species of the genus Coffea. Background as well as processed data contained in the database, extensively linked to external related resources, represent an invaluable source of information for these plant families. Two novel features differentiate SolEST from other resources: i) the option of accessing and then visualizing Solanaceae EST/TC alignments along the emerging tomato and potato genome sequences; ii) the opportunity to compare different Solanaceae assemblies generated by diverse research groups in the attempt to address a common complaint in the SOL community. Different databases have been established worldwide for collecting Solanaceae ESTs and are related in concept, content and utility to the one presented herein. However, the SolEST database has several distinguishing features that make it appealing for the research community and facilitates a "one-stop shop" for the study of Solanaceae transcriptomes.

  12. Cloning, analysis and functional annotation of expressed sequence tags from the Earthworm Eisenia fetida

    PubMed Central

    Pirooznia, Mehdi; Gong, Ping; Guan, Xin; Inouye, Laura S; Yang, Kuan; Perkins, Edward J; Deng, Youping

    2007-01-01

    Background Eisenia fetida, commonly known as red wiggler or compost worm, belongs to the Lumbricidae family of the Annelida phylum. Little is known about its genome sequence although it has been extensively used as a test organism in terrestrial ecotoxicology. In order to understand its gene expression response to environmental contaminants, we cloned 4032 cDNAs or expressed sequence tags (ESTs) from two E. fetida libraries enriched with genes responsive to ten ordnance related compounds using suppressive subtractive hybridization-PCR. Results A total of 3144 good quality ESTs (GenBank dbEST accession number EH669363–EH672369 and EL515444–EL515580) were obtained from the raw clone sequences after cleaning. Clustering analysis yielded 2231 unique sequences including 448 contigs (from 1361 ESTs) and 1783 singletons. Comparative genomic analysis showed that 743 or 33% of the unique sequences shared high similarity with existing genes in the GenBank nr database. Provisional function annotation assigned 830 Gene Ontology terms to 517 unique sequences based on their homology with the annotated genomes of four model organisms Drosophila melanogaster, Mus musculus, Saccharomyces cerevisiae, and Caenorhabditis elegans. Seven percent of the unique sequences were further mapped to 99 Kyoto Encyclopedia of Genes and Genomes pathways based on their matching Enzyme Commission numbers. All the information is stored and retrievable at a highly performed, web-based and user-friendly relational database called EST model database or ESTMD version 2. Conclusion The ESTMD containing the sequence and annotation information of 4032 E. fetida ESTs is publicly accessible at . PMID:18047730

  13. In silico analysis of expressed sequence tags from Trichostrongylus vitrinus (Nematoda): comparison of the automated ESTExplorer workflow platform with conventional database searches.

    PubMed

    Nagaraj, Shivashankar H; Gasser, Robin B; Nisbet, Alasdair J; Ranganathan, Shoba

    2008-01-01

    The analysis of expressed sequence tags (EST) offers a rapid and cost effective approach to elucidate the transcriptome of an organism, but requires several computational methods for assembly and annotation. Researchers frequently analyse each step manually, which is laborious and time consuming. We have recently developed ESTExplorer, a semi-automated computational workflow system, in order to achieve the rapid analysis of EST datasets. In this study, we evaluated EST data analysis for the parasitic nematode Trichostrongylus vitrinus (order Strongylida) using ESTExplorer, compared with database matching alone. We functionally annotated 1776 ESTs obtained via suppressive-subtractive hybridisation from T. vitrinus, an important parasitic trichostrongylid of small ruminants. Cluster and comparative genomic analyses of the transcripts using ESTExplorer indicated that 290 (41%) sequences had homologues in Caenorhabditis elegans, 329 (42%) in parasitic nematodes, 202 (28%) in organisms other than nematodes, and 218 (31%) had no significant match to any sequence in the current databases. Of the C. elegans homologues, 90 were associated with 'non-wildtype' double-stranded RNA interference (RNAi) phenotypes, including embryonic lethality, maternal sterility, sterile progeny, larval arrest and slow growth. We could functionally classify 267 (38%) sequences using the Gene Ontologies (GO) and establish pathway associations for 230 (33%) sequences using the Kyoto Encyclopedia of Genes and Genomes (KEGG). Further examination of this EST dataset revealed a number of signalling molecules, proteases, protease inhibitors, enzymes, ion channels and immune-related genes. In addition, we identified 40 putative secreted proteins that could represent potential candidates for developing novel anthelmintics or vaccines. We further compared the automated EST sequence annotations, using ESTExplorer, with database search results for individual T. vitrinus ESTs. ESTExplorer reliably and rapidly annotated 301 ESTs, with pathway and GO information, eliminating 60 low quality hits from database searches. We evaluated the efficacy of ESTExplorer in analysing EST data, and demonstrate that computational tools can be used to accelerate the process of gene discovery in EST sequencing projects. The present study has elucidated sets of relatively conserved and potentially novel genes for biological investigation, and the annotated EST set provides further insight into the molecular biology of T. vitrinus, towards the identification of novel drug targets.

  14. Sequence analysis of 497 mouse brain ESTs expressed in the substantia nigra

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Stewart, G.J.; Savioz, A.; Davies, R.W.

    1997-01-15

    The use of subtracted, region-specific cDNA libraries combined with single-pass cDNA sequencing allows the discovery of novel genes and facilitates molecular description of the tissue or region involved. We report the sequence of 497 mouse expressed sequence tags (ESTs) from two subtracted libraries enriched for cDNAs expressed in the substantia nigra, a brain region with important roles in movement control and Parkinson disease. Of these, 238 ESTs give no database matches and therefore derive from novel genes. A further 115 ESTs show sequence similarity to ESTs from other organisms, which themselves do not yield any significant database matches to genesmore » of known function. Fifty-six ESTs show sequence similarity to previously identified genes whose mouse homologues have not been reported. The total number of ESTs reported that are new for the mouse is 407, which, together with the 90 ESTs corresponding to known mouse genes or cDNAs, contributes to the molecular description of the substantia nigra. 21 refs., 4 tabs.« less

  15. Candidate gene database and transcript map for peach, a model species for fruit trees.

    PubMed

    Horn, Renate; Lecouls, Anne-Claire; Callahan, Ann; Dandekar, Abhaya; Garay, Lilibeth; McCord, Per; Howad, Werner; Chan, Helen; Verde, Ignazio; Main, Doreen; Jung, Sook; Georgi, Laura; Forrest, Sam; Mook, Jennifer; Zhebentyayeva, Tatyana; Yu, Yeisoo; Kim, Hye Ran; Jesudurai, Christopher; Sosinski, Bryon; Arús, Pere; Baird, Vance; Parfitt, Dan; Reighard, Gregory; Scorza, Ralph; Tomkins, Jeffrey; Wing, Rod; Abbott, Albert Glenn

    2005-05-01

    Peach (Prunus persica) is a model species for the Rosaceae, which includes a number of economically important fruit tree species. To develop an extensive Prunus expressed sequence tag (EST) database for identifying and cloning the genes important to fruit and tree development, we generated 9,984 high-quality ESTs from a peach cDNA library of developing fruit mesocarp. After assembly and annotation, a putative peach unigene set consisting of 3,842 ESTs was defined. Gene ontology (GO) classification was assigned based on the annotation of the single "best hit" match against the Swiss-Prot database. No significant homology could be found in the GenBank nr databases for 24.3% of the sequences. Using core markers from the general Prunus genetic map, we anchored bacterial artificial chromosome (BAC) clones on the genetic map, thereby providing a framework for the construction of a physical and transcript map. A transcript map was developed by hybridizing 1,236 ESTs from the putative peach unigene set and an additional 68 peach cDNA clones against the peach BAC library. Hybridizing ESTs to genetically anchored BACs immediately localized 11.2% of the ESTs on the genetic map. ESTs showed a clustering of expressed genes in defined regions of the linkage groups. [The data were built into a regularly updated Genome Database for Rosaceae (GDR), available at (http://www.genome.clemson.edu/gdr/).].

  16. Analysis of expressed sequence tags from Actinidia: applications of a cross species EST database for gene discovery in the areas of flavor, health, color and ripening

    PubMed Central

    Crowhurst, Ross N; Gleave, Andrew P; MacRae, Elspeth A; Ampomah-Dwamena, Charles; Atkinson, Ross G; Beuning, Lesley L; Bulley, Sean M; Chagne, David; Marsh, Ken B; Matich, Adam J; Montefiori, Mirco; Newcomb, Richard D; Schaffer, Robert J; Usadel, Björn; Allan, Andrew C; Boldingh, Helen L; Bowen, Judith H; Davy, Marcus W; Eckloff, Rheinhart; Ferguson, A Ross; Fraser, Lena G; Gera, Emma; Hellens, Roger P; Janssen, Bart J; Klages, Karin; Lo, Kim R; MacDiarmid, Robin M; Nain, Bhawana; McNeilage, Mark A; Rassam, Maysoon; Richardson, Annette C; Rikkerink, Erik HA; Ross, Gavin S; Schröder, Roswitha; Snowden, Kimberley C; Souleyre, Edwige JF; Templeton, Matt D; Walton, Eric F; Wang, Daisy; Wang, Mindy Y; Wang, Yanming Y; Wood, Marion; Wu, Rongmei; Yauk, Yar-Khing; Laing, William A

    2008-01-01

    Background Kiwifruit (Actinidia spp.) are a relatively new, but economically important crop grown in many different parts of the world. Commercial success is driven by the development of new cultivars with novel consumer traits including flavor, appearance, healthful components and convenience. To increase our understanding of the genetic diversity and gene-based control of these key traits in Actinidia, we have produced a collection of 132,577 expressed sequence tags (ESTs). Results The ESTs were derived mainly from four Actinidia species (A. chinensis, A. deliciosa, A. arguta and A. eriantha) and fell into 41,858 non redundant clusters (18,070 tentative consensus sequences and 23,788 EST singletons). Analysis of flavor and fragrance-related gene families (acyltransferases and carboxylesterases) and pathways (terpenoid biosynthesis) is presented in comparison with a chemical analysis of the compounds present in Actinidia including esters, acids, alcohols and terpenes. ESTs are identified for most genes in color pathways controlling chlorophyll degradation and carotenoid biosynthesis. In the health area, data are presented on the ESTs involved in ascorbic acid and quinic acid biosynthesis showing not only that genes for many of the steps in these pathways are represented in the database, but that genes encoding some critical steps are absent. In the convenience area, genes related to different stages of fruit softening are identified. Conclusion This large EST resource will allow researchers to undertake the tremendous challenge of understanding the molecular basis of genetic diversity in the Actinidia genus as well as provide an EST resource for comparative fruit genomics. The various bioinformatics analyses we have undertaken demonstrates the extent of coverage of ESTs for genes encoding different biochemical pathways in Actinidia. PMID:18655731

  17. Mining and gene ontology based annotation of SSR markers from expressed sequence tags of Humulus lupulus

    PubMed Central

    Singh, Swati; Gupta, Sanchita; Mani, Ashutosh; Chaturvedi, Anoop

    2012-01-01

    Humulus lupulus is commonly known as hops, a member of the family moraceae. Currently many projects are underway leading to the accumulation of voluminous genomic and expressed sequence tag sequences in public databases. The genetically characterized domains in these databases are limited due to non-availability of reliable molecular markers. The large data of EST sequences are available in hops. The simple sequence repeat markers extracted from EST data are used as molecular markers for genetic characterization, in the present study. 25,495 EST sequences were examined and assembled to get full-length sequences. Maximum frequency distribution was shown by mononucleotide SSR motifs i.e. 60.44% in contig and 62.16% in singleton where as minimum frequency are observed for hexanucleotide SSR in contig (0.09%) and pentanucleotide SSR in singletons (0.12%). Maximum trinucleotide motifs code for Glutamic acid (GAA) while AT/TA were the most frequent repeat of dinucleotide SSRs. Flanking primer pairs were designed in-silico for the SSR containing sequences. Functional categorization of SSRs containing sequences was done through gene ontology terms like biological process, cellular component and molecular function. PMID:22368382

  18. Analysis of Expressed Sequence Tags (EST) in Date Palm.

    PubMed

    Al-Faifi, Sulieman A; Migdadi, Hussein M; Algamdi, Salem S; Khan, Mohammad Altaf; Al-Obeed, Rashid S; Ammar, Megahed H; Jakse, Jerenj

    2017-01-01

    Expressed sequence tags (EST) were generated from a normalized cDNA library of the date palm Sukkari cv. to understand the high-quality and better field performance of this well-known commercial cultivar. A total of 6943 high-quality ESTs were generated, out of them 6671 are submitted to the GenBank dbEST (LIBEST_028537). The generated ESTs were assembled into 6362 unigenes, consisting of 494 (14.4%) contigs and 5868 (84.53%) singletons. The functional annotation shows that the majority of the ESTs are associated with binding (44%), catalytic (40%), transporter (5%), and structural molecular (5%) activities. The blastx results show that 73% of unigenes are significantly similar to known plant genes and 27% are novel. The latter could be of particular interest in date palm genetic studies. Further analysis shows that some ESTs are categorized as stress/defense- and fruit development-related genes. These newly generated ESTs could significantly enhance date palm EST databases in the public domain and are available to scientists and researchers across the globe. This knowledge will facilitate the discovery of candidate genes that govern important developmental and agronomical traits in date palm. It will provide important resources for developing genetic tools, comparative genomics, and genome evolution among date palm cultivars.

  19. Pilot survey of expressed sequence tags (ESTs) from the asexual blood stages of Plasmodium vivax in human patients.

    PubMed

    Merino, Emilio F; Fernandez-Becerra, Carmen; Madeira, Alda M B N; Machado, Ariane L; Durham, Alan; Gruber, Arthur; Hall, Neil; del Portillo, Hernando A

    2003-07-21

    Plasmodium vivax is the most widely distributed human malaria, responsible for 70-80 million clinical cases each year and large socio-economical burdens for countries such as Brazil where it is the most prevalent species. Unfortunately, due to the impossibility of growing this parasite in continuous in vitro culture, research on P. vivax remains largely neglected. A pilot survey of expressed sequence tags (ESTs) from the asexual blood stages of P. vivax was performed. To do so, 1,184 clones from a cDNA library constructed with parasites obtained from 10 different human patients in the Brazilian Amazon were sequenced. Sequences were automatedly processed to remove contaminants and low quality reads. A total of 806 sequences with an average length of 586 bp met such criteria and their clustering revealed 666 distinct events. The consensus sequence of each cluster and the unique sequences of the singlets were used in similarity searches against different databases that included P. vivax, Plasmodium falciparum, Plasmodium yoelii, Plasmodium knowlesi, Apicomplexa and the GenBank non-redundant database. An E-value of <10(-30) was used to define a significant database match. ESTs were manually assigned a gene ontology (GO) terminology A total of 769 ESTs could be assigned a putative identity based upon sequence similarity to known proteins in GenBank. Moreover, 292 ESTs were annotated and a GO terminology was assigned to 164 of them. These are the first ESTs reported for P. vivax and, as such, they represent a valuable resource to assist in the annotation of the P. vivax genome currently being sequenced. Moreover, since the GC-content of the P. vivax genome is strikingly different from that of P. falciparum, these ESTs will help in the validation of gene predictions for P. vivax and to create a gene index of this malaria parasite.

  20. ESTuber db: an online database for Tuber borchii EST sequences.

    PubMed

    Lazzari, Barbara; Caprera, Andrea; Cosentino, Cristian; Stella, Alessandra; Milanesi, Luciano; Viotti, Angelo

    2007-03-08

    The ESTuber database (http://www.itb.cnr.it/estuber) includes 3,271 Tuber borchii expressed sequence tags (EST). The dataset consists of 2,389 sequences from an in-house prepared cDNA library from truffle vegetative hyphae, and 882 sequences downloaded from GenBank and representing four libraries from white truffle mycelia and ascocarps at different developmental stages. An automated pipeline was prepared to process EST sequences using public software integrated by in-house developed Perl scripts. Data were collected in a MySQL database, which can be queried via a php-based web interface. Sequences included in the ESTuber db were clustered and annotated against three databases: the GenBank nr database, the UniProtKB database and a third in-house prepared database of fungi genomic sequences. An algorithm was implemented to infer statistical classification among Gene Ontology categories from the ontology occurrences deduced from the annotation procedure against the UniProtKB database. Ontologies were also deduced from the annotation of more than 130,000 EST sequences from five filamentous fungi, for intra-species comparison purposes. Further analyses were performed on the ESTuber db dataset, including tandem repeats search and comparison of the putative protein dataset inferred from the EST sequences to the PROSITE database for protein patterns identification. All the analyses were performed both on the complete sequence dataset and on the contig consensus sequences generated by the EST assembly procedure. The resulting web site is a resource of data and links related to truffle expressed genes. The Sequence Report and Contig Report pages are the web interface core structures which, together with the Text search utility and the Blast utility, allow easy access to the data stored in the database.

  1. ESTs and EST-linked polymorphisms for genetic mapping and phylogenetic reconstruction in the guppy, Poecilia reticulata

    PubMed Central

    Dreyer, Christine; Hoffmann, Margarete; Lanz, Christa; Willing, Eva-Maria; Riester, Markus; Warthmann, Norman; Sprecher, Andrea; Tripathi, Namita; Henz, Stefan R; Weigel, Detlef

    2007-01-01

    Background The guppy, Poecilia reticulata, is a well-known model organism for studying inheritance and variation of male ornamental traits as well as adaptation to different river habitats. However, genomic resources for studying this important model were not previously widely available. Results With the aim of generating molecular markers for genetic mapping of the guppy, cDNA libraries were constructed from embryos and different adult organs to generate expressed sequence tags (ESTs). About 18,000 ESTs were annotated according to BLASTN and BLASTX results and the sequence information from the 3' UTRs was exploited to generate PCR primers for re-sequencing of genomic DNA from different wild type strains. By comparison of EST-linked genomic sequences from at least four different ecotypes, about 1,700 polymorphisms were identified, representing about 400 distinct genes. Two interconnected MySQL databases were built to organize the ESTs and markers, respectively. A robust phylogeny of the guppy was reconstructed, based on 10 different nuclear genes. Conclusion Our EST and marker databases provide useful tools for genetic mapping and phylogenetic studies of the guppy. PMID:17686157

  2. Exploiting EST databases for the development and characterisation of 3425 gene-tagged CISP markers in biofuel crop sugarcane and their transferability in cereals and orphan tropical grasses.

    PubMed

    Chandra, Amaresh; Jain, Radha; Solomon, Sushil; Shrivastava, Shiksha; Roy, Ajoy K

    2013-02-04

    Sugarcane is an important cash crop, providing 70% of the global raw sugar as well as raw material for biofuel production. Genetic analysis is hindered in sugarcane because of its large and complex polyploid genome and lack of sufficiently informative gene-tagged markers. Modern genomics has produced large amount of ESTs, which can be exploited to develop molecular markers based on comparative analysis with EST datasets of related crops and whole rice genome sequence, and accentuate their cross-technical functionality in orphan crops like tropical grasses. Utilising 246,180 Saccharum officinarum EST sequences vis-à-vis its comparative analysis with ESTs of sorghum and barley and the whole rice genome sequence, we have developed 3425 novel gene-tagged markers - namely, conserved-intron scanning primers (CISP) - using the web program GeMprospector. Rice orthologue annotation results indicated homology of 1096 sequences with expressed proteins, 491 with hypothetical proteins. The remaining 1838 were miscellaneous in nature. A total of 367 primer-pairs were tested in diverse panel of samples. The data indicate amplification of 41% polymorphic bands leading to 0.52 PIC and 3.50 MI with a set of sugarcane varieties and Saccharum species. In addition, a moderate technical functionality of a set of such markers with orphan tropical grasses (22%) and fodder cum cereal oat (33%) is observed. Developed gene-tagged CISP markers exhibited considerable technical functionality with varieties of sugarcane and unexplored species of tropical grasses. These markers would thus be particularly useful in identifying the economical traits in sugarcane and developing conservation strategies for orphan tropical grasses.

  3. Chasing Migration Genes: A Brain Expressed Sequence Tag Resource for Summer and Migratory Monarch Butterflies (Danaus plexippus)

    PubMed Central

    Zhu, Haisun; Casselman, Amy; Reppert, Steven M.

    2008-01-01

    North American monarch butterflies (Danaus plexippus) undergo a spectacular fall migration. In contrast to summer butterflies, migrants are juvenile hormone (JH) deficient, which leads to reproductive diapause and increased longevity. Migrants also utilize time-compensated sun compass orientation to help them navigate to their overwintering grounds. Here, we describe a brain expressed sequence tag (EST) resource to identify genes involved in migratory behaviors. A brain EST library was constructed from summer and migrating butterflies. Of 9,484 unique sequences, 6068 had positive hits with the non-redundant protein database; the EST database likely represents ∼52% of the gene-encoding potential of the monarch genome. The brain transcriptome was cataloged using Gene Ontology and compared to Drosophila. Monarch genes were well represented, including those implicated in behavior. Three genes involved in increased JH activity (allatotropin, juvenile hormone acid methyltransfersase, and takeout) were upregulated in summer butterflies, compared to migrants. The locomotion-relevant turtle gene was marginally upregulated in migrants, while the foraging and single-minded genes were not differentially regulated. Many of the genes important for the monarch circadian clock mechanism (involved in sun compass orientation) were in the EST resource, including the newly identified cryptochrome 2. The EST database also revealed a novel Na+/K+ ATPase allele predicted to be more resistant to the toxic effects of milkweed than that reported previously. Potential genetic markers were identified from 3,486 EST contigs and included 1599 double-hit single nucleotide polymorphisms (SNPs) and 98 microsatellite polymorphisms. These data provide a template of the brain transcriptome for the monarch butterfly. Our “snap-shot” analysis of the differential regulation of candidate genes between summer and migratory butterflies suggests that unbiased, comprehensive transcriptional profiling will inform the molecular basis of migration. The identified SNPs and microsatellite polymorphisms can be used as genetic markers to address questions of population and subspecies structure. PMID:18183285

  4. Analysis and Functional Annotation of an Expressed Sequence Tag Collection for Tropical Crop Sugarcane

    PubMed Central

    Vettore, André L.; da Silva, Felipe R.; Kemper, Edson L.; Souza, Glaucia M.; da Silva, Aline M.; Ferro, Maria Inês T.; Henrique-Silva, Flavio; Giglioti, Éder A.; Lemos, Manoel V.F.; Coutinho, Luiz L.; Nobrega, Marina P.; Carrer, Helaine; França, Suzelei C.; Bacci, Maurício; Goldman, Maria Helena S.; Gomes, Suely L.; Nunes, Luiz R.; Camargo, Luis E.A.; Siqueira, Walter J.; Van Sluys, Marie-Anne; Thiemann, Otavio H.; Kuramae, Eiko E.; Santelli, Roberto V.; Marino, Celso L.; Targon, Maria L.P.N.; Ferro, Jesus A.; Silveira, Henrique C.S.; Marini, Danyelle C.; Lemos, Eliana G.M.; Monteiro-Vitorello, Claudia B.; Tambor, José H.M.; Carraro, Dirce M.; Roberto, Patrícia G.; Martins, Vanderlei G.; Goldman, Gustavo H.; de Oliveira, Regina C.; Truffi, Daniela; Colombo, Carlos A.; Rossi, Magdalena; de Araujo, Paula G.; Sculaccio, Susana A.; Angella, Aline; Lima, Marleide M.A.; de Rosa, Vicente E.; Siviero, Fábio; Coscrato, Virginia E.; Machado, Marcos A.; Grivet, Laurent; Di Mauro, Sonia M.Z.; Nobrega, Francisco G.; Menck, Carlos F.M.; Braga, Marilia D.V.; Telles, Guilherme P.; Cara, Frank A.A.; Pedrosa, Guilherme; Meidanis, João; Arruda, Paulo

    2003-01-01

    To contribute to our understanding of the genome complexity of sugarcane, we undertook a large-scale expressed sequence tag (EST) program. More than 260,000 cDNA clones were partially sequenced from 26 standard cDNA libraries generated from different sugarcane tissues. After the processing of the sequences, 237,954 high-quality ESTs were identified. These ESTs were assembled into 43,141 putative transcripts. Of the assembled sequences, 35.6% presented no matches with existing sequences in public databases. A global analysis of the whole SUCEST data set indicated that 14,409 assembled sequences (33% of the total) contained at least one cDNA clone with a full-length insert. Annotation of the 43,141 assembled sequences associated almost 50% of the putative identified sugarcane genes with protein metabolism, cellular communication/signal transduction, bioenergetics, and stress responses. Inspection of the translated assembled sequences for conserved protein domains revealed 40,821 amino acid sequences with 1415 Pfam domains. Reassembling the consensus sequences of the 43,141 transcripts revealed a 22% redundancy in the first assembling. This indicated that possibly 33,620 unique genes had been identified and indicated that >90% of the sugarcane expressed genes were tagged. PMID:14613979

  5. Expressed sequence tag (EST) analysis of two subspecies of Metarhizium anisopliae reveals a plethora of secreted proteins with potential activity in insect hosts.

    PubMed

    Freimoser, Florian M; Screen, Steven; Bagga, Savita; Hu, Gang; St Leger, Raymond J

    2003-01-01

    Expressed sequence tag (EST) libraries for Metarhizium anisopliae, the causative agent of green muscardine disease, were developed from the broad host-range pathogen Metarhizium anisopliae sf. anisopliae and the specific grasshopper pathogen, M. anisopliae sf. acridum. Approximately 1,700 5' end sequences from each subspecies were generated from cDNA libraries representing fungi grown under conditions that maximize secretion of cuticle-degrading enzymes. Both subspecies had ESTs for virtually all pathogenicity-related genes cloned to date from M. anisopliae, but many novel genes encoding potential virulence factors were also tagged. Enzymes with potential targets in the insect host included proteases, chitinases, phospholipases, lipases, esterases, phosphatases and enzymes producing toxic secondary metabolites. A diverse array of proteases composed 36 % of all M. anisopliae sf. anisopliae ESTs. Eighty percent of the ESTs that could be clustered into functional groups had significant matches (E<10(-5)) in other ascomycete fungi. These included genes reported to have specific roles in pathogens with plant or vertebrate hosts. Many of the remaining ESTs had their best BLAST match among animal, plant and bacterial sequences. These include genes with plant and microbial counterparts that produce potent antimicrobials. The abundance of transcripts discovered for different functional groups varied between the two subspecies of M. anisopliae in a manner consistent with ecological adaptations of the two pathogens. By hastening gene discovery this project has enhanced development of improved mycoinsecticides. In addition, the M. anisopliae ESTs represent a significant contribution to the extensive database of sequences from ascomycetes that are saprophytes or plant and vertebrate pathogens. Comparative analyses of these sequences is providing important information about the biology and evolutionary history of this clade.

  6. Construction of a Lotus japonicus late nodulin expressed sequence tag library and identification of novel nodule-specific genes.

    PubMed Central

    Szczyglowski, K; Hamburger, D; Kapranov, P; de Bruijn, F J

    1997-01-01

    A range of novel expressed sequence tags (ESTs) associated with late developmental events during nodule organogenesis in the legume Lotus japonicus were identified using mRNA differential display; 110 differentially displayed polymerase chain reaction products were cloned and analyzed. Of 88 unique cDNAs obtained, 22 shared significant homology to DNA/protein sequences in the respective databases. This group comprises, among others, a nodule-specific homolog of protein phosphatase 2C, a peptide transporter protein, and a nodule-specific form of cytochrome P450. RNA gel-blot analysis of 16 differentially displayed ESTs confirmed their nodule-specific expression pattern. The kinetics of mRNA accumulation of the majority of the ESTs analyzed were found to resemble the expression pattern observed for the L. japonicus leghemoglobin gene. These results indicate that the newly isolated molecular markers correspond to genes induced during late developmental stages of L. japonicus nodule organogenesis and provide important, novel tools for the study of nodulation. PMID:9276951

  7. Identification of true EST alignments for recognising transcribed regions.

    PubMed

    Ma, Chuang; Wang, Jia; Li, Lun; Duan, Mo-Jie; Zhou, Yan-Hong

    2011-01-01

    Transcribed regions can be determined by aligning Expressed Sequence Tags (ESTs) with genome sequences. The kernel of this strategy is to effectively distinguish true EST alignments from spurious ones. In this study, three measures including Direction Check, Identity Check and Terminal Check were introduced to more effectively eliminate spurious EST alignments. On the basis of these introduced measures and other widely used measures, a computational tool, named ESTCleanser, has been developed to identify true EST alignments for obtaining reliable transcribed regions. The performance of ESTCleanser has been evaluated on the well-annotated human ENCyclopedia of DNA Elements (ENCODE) regions using human ESTs in the dbEST database. The evaluation results show that the accuracy of ESTCleanser at exon and intron levels is more remarkably enhanced than that of UCSC-spliced EST alignments. This work would be helpful to EST-based researches on finding new genes, complementing genome annotation, recognising alternative splicing events and Single Nucleotide Polymorphisms (SNPs), etc.

  8. Generation and analysis of expressed sequence tags from a cDNA library of the fruiting body of Ganoderma lucidum

    PubMed Central

    2010-01-01

    Background Little genomic or trancriptomic information on Ganoderma lucidum (Lingzhi) is known. This study aims to discover the transcripts involved in secondary metabolite biosynthesis and developmental regulation of G. lucidum using an expressed sequence tag (EST) library. Methods A cDNA library was constructed from the G. lucidum fruiting body. Its high-quality ESTs were assembled into unique sequences with contigs and singletons. The unique sequences were annotated according to sequence similarities to genes or proteins available in public databases. The detection of simple sequence repeats (SSRs) was preformed by online analysis. Results A total of 1,023 clones were randomly selected from the G. lucidum library and sequenced, yielding 879 high-quality ESTs. These ESTs showed similarities to a diverse range of genes. The sequences encoding squalene epoxidase (SE) and farnesyl-diphosphate synthase (FPS) were identified in this EST collection. Several candidate genes, such as hydrophobin, MOB2, profilin and PHO84 were detected for the first time in G. lucidum. Thirteen (13) potential SSR-motif microsatellite loci were also identified. Conclusion The present study demonstrates a successful application of EST analysis in the discovery of transcripts involved in the secondary metabolite biosynthesis and the developmental regulation of G. lucidum. PMID:20230644

  9. RNA-Seq and molecular docking reveal multi-level pesticide resistance in the bed bug

    PubMed Central

    2012-01-01

    Background Bed bugs (Cimex lectularius) are hematophagous nocturnal parasites of humans that have attained high impact status due to their worldwide resurgence. The sudden and rampant resurgence of C. lectularius has been attributed to numerous factors including frequent international travel, narrower pest management practices, and insecticide resistance. Results We performed a next-generation RNA sequencing (RNA-Seq) experiment to find differentially expressed genes between pesticide-resistant (PR) and pesticide-susceptible (PS) strains of C. lectularius. A reference transcriptome database of 51,492 expressed sequence tags (ESTs) was created by combining the databases derived from de novo assembled mRNA-Seq tags (30,404 ESTs) and our previous 454 pyrosequenced database (21,088 ESTs). The two-way GLMseq analysis revealed ~15,000 highly significant differentially expressed ESTs between the PR and PS strains. Among the top 5,000 differentially expressed ESTs, 109 putative defense genes (cuticular proteins, cytochrome P450s, antioxidant genes, ABC transporters, glutathione S-transferases, carboxylesterases and acetyl cholinesterase) involved in penetration resistance and metabolic resistance were identified. Tissue and development-specific expression of P450 CYP3 clan members showed high mRNA levels in the cuticle, Malpighian tubules, and midgut; and in early instar nymphs, respectively. Lastly, molecular modeling and docking of a candidate cytochrome P450 (CYP397A1V2) revealed the flexibility of the deduced protein to metabolize a broad range of insecticide substrates including DDT, deltamethrin, permethrin, and imidacloprid. Conclusions We developed significant molecular resources for C. lectularius putatively involved in metabolic resistance as well as those participating in other modes of insecticide resistance. RNA-Seq profiles of PR strains combined with tissue-specific profiles and molecular docking revealed multi-level insecticide resistance in C. lectularius. Future research that is targeted towards RNA interference (RNAi) on the identified metabolic targets such as cytochrome P450s and cuticular proteins could lay the foundation for a better understanding of the genetic basis of insecticide resistance in C. lectularius. PMID:22226239

  10. ESTminer: a Web interface for mining EST contig and cluster databases.

    PubMed

    Huang, Yecheng; Pumphrey, Janie; Gingle, Alan R

    2005-03-01

    ESTminer is a Web application and database schema for interactive mining of expressed sequence tag (EST) contig and cluster datasets. The Web interface contains a query frame that allows the selection of contigs/clusters with specific cDNA library makeup or a threshold number of members. The results are displayed as color-coded tree nodes, where the color indicates the fractional size of each cDNA library component. The nodes are expandable, revealing library statistics as well as EST or contig members, with links to sequence data, GenBank records or user configurable links. Also, the interface allows 'queries within queries' where the result set of a query is further filtered by the subsequent query. ESTminer is implemented in Java/JSP and the package, including MySQL and Oracle schema creation scripts, is available from http://cggc.agtec.uga.edu/Data/download.asp agingle@uga.edu.

  11. Exploring root symbiotic programs in the model legume Medicago truncatula using EST analysis.

    PubMed

    Journet, Etienne-Pascal; van Tuinen, Diederik; Gouzy, Jérome; Crespeau, Hervé; Carreau, Véronique; Farmer, Mary-Jo; Niebel, Andreas; Schiex, Thomas; Jaillon, Olivier; Chatagnier, Odile; Godiard, Laurence; Micheli, Fabienne; Kahn, Daniel; Gianinazzi-Pearson, Vivienne; Gamas, Pascal

    2002-12-15

    We report on a large-scale expressed sequence tag (EST) sequencing and analysis program aimed at characterizing the sets of genes expressed in roots of the model legume Medicago truncatula during interactions with either of two microsymbionts, the nitrogen-fixing bacterium Sinorhizobium meliloti or the arbuscular mycorrhizal fungus Glomus intraradices. We have designed specific tools for in silico analysis of EST data, in relation to chimeric cDNA detection, EST clustering, encoded protein prediction, and detection of differential expression. Our 21 473 5'- and 3'-ESTs could be grouped into 6359 EST clusters, corresponding to distinct virtual genes, along with 52 498 other M.truncatula ESTs available in the dbEST (NCBI) database that were recruited in the process. These clusters were manually annotated, using a specifically developed annotation interface. Analysis of EST cluster distribution in various M.truncatula cDNA libraries, supported by a refined R test to evaluate statistical significance and by 'electronic northern' representation, enabled us to identify a large number of novel genes predicted to be up- or down-regulated during either symbiotic root interaction. These in silico analyses provide a first global view of the genetic programs for root symbioses in M.truncatula. A searchable database has been built and can be accessed through a public interface.

  12. Microsatellite DNA in genomic survey sequences and UniGenes of loblolly pine

    Treesearch

    Craig S Echt; Surya Saha; Dennis L Deemer; C Dana Nelson

    2011-01-01

    Genomic DNA sequence databases are a potential and growing resource for simple sequence repeat (SSR) marker development in loblolly pine (Pinus taeda L.). Loblolly pine also has many expressed sequence tags (ESTs) available for microsatellite (SSR) marker development. We compared loblolly pine SSR densities in genome survey sequences (GSSs) to those in non-redundant...

  13. Identification of Anhydrobiosis-related Genes from an Expressed Sequence Tag Database in the Cryptobiotic Midge Polypedilum vanderplanki (Diptera; Chironomidae)*

    PubMed Central

    Cornette, Richard; Kanamori, Yasushi; Watanabe, Masahiko; Nakahara, Yuichi; Gusev, Oleg; Mitsumasu, Kanako; Kadono-Okuda, Keiko; Shimomura, Michihiko; Mita, Kazuei; Kikawada, Takahiro; Okuda, Takashi

    2010-01-01

    Some organisms are able to survive the loss of almost all their body water content, entering a latent state known as anhydrobiosis. The sleeping chironomid (Polypedilum vanderplanki) lives in the semi-arid regions of Africa, and its larvae can survive desiccation in an anhydrobiotic form during the dry season. To unveil the molecular mechanisms of this resistance to desiccation, an anhydrobiosis-related Expressed Sequence Tag (EST) database was obtained from the sequences of three cDNA libraries constructed from P. vanderplanki larvae after 0, 12, and 36 h of desiccation. The database contained 15,056 ESTs distributed into 4,807 UniGene clusters. ESTs were classified according to gene ontology categories, and putative expression patterns were deduced for all clusters on the basis of the number of clones in each library; expression patterns were confirmed by real-time PCR for selected genes. Among up-regulated genes, antioxidants, late embryogenesis abundant (LEA) proteins, and heat shock proteins (Hsps) were identified as important groups for anhydrobiosis. Genes related to trehalose metabolism and various transporters were also strongly induced by desiccation. Those results suggest that the oxidative stress response plays a central role in successful anhydrobiosis. Similarly, protein denaturation and aggregation may be prevented by marked up-regulation of Hsps and the anhydrobiosis-specific LEA proteins. A third major feature is the predicted increase in trehalose synthesis and in the expression of various transporter proteins allowing the distribution of trehalose and other solutes to all tissues. PMID:20833722

  14. Generation and analysis of expressed sequence tags from six developing xylem libraries in Pinus radiata D. Don

    PubMed Central

    Li, Xinguo; Wu, Harry X; Dillon, Shannon K; Southerton, Simon G

    2009-01-01

    Background Wood is a major renewable natural resource for the timber, fibre and bioenergy industry. Pinus radiata D. Don is the most important commercial plantation tree species in Australia and several other countries; however, genomic resources for this species are very limited in public databases. Our primary objective was to sequence a large number of expressed sequence tags (ESTs) from genes involved in wood formation in radiata pine. Results Six developing xylem cDNA libraries were constructed from earlywood and latewood tissues sampled at juvenile (7 yrs), transition (11 yrs) and mature (30 yrs) ages, respectively. These xylem tissues represent six typical development stages in a rotation period of radiata pine. A total of 6,389 high quality ESTs were collected from 5,952 cDNA clones. Assembly of 5,952 ESTs from 5' end sequences generated 3,304 unigenes including 952 contigs and 2,352 singletons. About 97.0% of the 5,952 ESTs and 96.1% of the unigenes have matches in the UniProt and TIGR databases. Of the 3,174 unigenes with matches, 42.9% were not assigned GO (Gene Ontology) terms and their functions are unknown or unclassified. More than half (52.1%) of the 5,952 ESTs have matches in the Pfam database and represent 772 known protein families. About 18.0% of the 5,952 ESTs matched cell wall related genes in the MAIZEWALL database, representing all 18 categories, 91 of all 174 families and possibly 557 genes. Fifteen cell wall-related genes are ranked in the 30 most abundant genes, including CesA, tubulin, AGP, SAMS, actin, laccase, CCoAMT, MetE, phytocyanin, pectate lyase, cellulase, SuSy, expansin, chitinase and UDP-glucose dehydrogenase. Based on the PlantTFDB database 41 of the 64 transcription factor families in the poplar genome were identified as being involved in radiata pine wood formation. Comparative analysis of GO term abundance revealed a distinct transcriptome in juvenile earlywood formation compared to other stages of wood development. Conclusion The first large scale genomic resource in radiata pine was generated from six developing xylem cDNA libraries. Cell wall-related genes and transcription factors were identified. Juvenile earlywood has a distinct transcriptome, which is likely to contribute to the undesirable properties of juvenile wood in radiata pine. The publicly available resource of radiata pine will also be valuable for gene function studies and comparative genomics in forest trees. PMID:19159482

  15. Expressed sequence tags from the plant trypanosomatid Phytomonas serpens.

    PubMed

    Pappas, Georgios J; Benabdellah, Karim; Zingales, Bianca; González, Antonio

    2005-08-01

    We have generated 2190 expressed sequence tags (ESTs) from a cDNA library of the plant trypanosomatid Phytomonas serpens. Upon processing and clustering the set of 1893 accepted sequences was reduced to 697 clusters consisting of 452 singletons and 245 contigs. Functional categories were assigned based on BLAST searches against a database of the eukaryotic orthologous groups of proteins (KOG). Thirty six percent of the generated sequences showed no hits against the KOG database and 39.6% presented similarity to the KOG classes corresponding to translation, ribosomal structure and biogenesis. The most populated cluster contained 45 ESTs homologous to members of the glucose transporter family. This fact can be immediately correlated to the reported Phytomonas dependence on anaerobic glycolytic ATP production due to the lack of cytochrome-mediated respiratory chain. In this context, not only a number of enzymes of the glycolytic pathway were identified but also of the Krebs cycle as well as specific components of the respiratory chain. The data here reported, including a few hundred unique sequences and the description of tandemly repeated motifs and putative transcript stability motifs at untranslated mRNA ends, represent an initial approach to overcome the lack of information on the molecular biology of this organism.

  16. ESTree db: a Tool for Peach Functional Genomics

    PubMed Central

    Lazzari, Barbara; Caprera, Andrea; Vecchietti, Alberto; Stella, Alessandra; Milanesi, Luciano; Pozzi, Carlo

    2005-01-01

    Background The ESTree db represents a collection of Prunus persica expressed sequenced tags (ESTs) and is intended as a resource for peach functional genomics. A total of 6,155 successful EST sequences were obtained from four in-house prepared cDNA libraries from Prunus persica mesocarps at different developmental stages. Another 12,475 peach EST sequences were downloaded from public databases and added to the ESTree db. An automated pipeline was prepared to process EST sequences using public software integrated by in-house developed Perl scripts and data were collected in a MySQL database. A php-based web interface was developed to query the database. Results The ESTree db version as of April 2005 encompasses 18,630 sequences representing eight libraries. Contig assembly was performed with CAP3. Putative single nucleotide polymorphism (SNP) detection was performed with the AutoSNP program and a search engine was implemented to retrieve results. All the sequences and all the contig consensus sequences were annotated both with blastx against the GenBank nr db and with GOblet against the viridiplantae section of the Gene Ontology db. Links to NiceZyme (Expasy) and to the KEGG metabolic pathways were provided. A local BLAST utility is available. A text search utility allows querying and browsing the database. Statistics were provided on Gene Ontology occurrences to assign sequences to Gene Ontology categories. Conclusion The resulting database is a comprehensive resource of data and links related to peach EST sequences. The Sequence Report and Contig Report pages work as the web interface core structures, giving quick access to data related to each sequence/contig. PMID:16351742

  17. ESTree db: a tool for peach functional genomics.

    PubMed

    Lazzari, Barbara; Caprera, Andrea; Vecchietti, Alberto; Stella, Alessandra; Milanesi, Luciano; Pozzi, Carlo

    2005-12-01

    The ESTree db http://www.itb.cnr.it/estree/ represents a collection of Prunus persica expressed sequenced tags (ESTs) and is intended as a resource for peach functional genomics. A total of 6,155 successful EST sequences were obtained from four in-house prepared cDNA libraries from Prunus persica mesocarps at different developmental stages. Another 12,475 peach EST sequences were downloaded from public databases and added to the ESTree db. An automated pipeline was prepared to process EST sequences using public software integrated by in-house developed Perl scripts and data were collected in a MySQL database. A php-based web interface was developed to query the database. The ESTree db version as of April 2005 encompasses 18,630 sequences representing eight libraries. Contig assembly was performed with CAP3. Putative single nucleotide polymorphism (SNP) detection was performed with the AutoSNP program and a search engine was implemented to retrieve results. All the sequences and all the contig consensus sequences were annotated both with blastx against the GenBank nr db and with GOblet against the viridiplantae section of the Gene Ontology db. Links to NiceZyme (Expasy) and to the KEGG metabolic pathways were provided. A local BLAST utility is available. A text search utility allows querying and browsing the database. Statistics were provided on Gene Ontology occurrences to assign sequences to Gene Ontology categories. The resulting database is a comprehensive resource of data and links related to peach EST sequences. The Sequence Report and Contig Report pages work as the web interface core structures, giving quick access to data related to each sequence/contig.

  18. MAGIC database and interfaces: an integrated package for gene discovery and expression.

    PubMed

    Cordonnier-Pratt, Marie-Michèle; Liang, Chun; Wang, Haiming; Kolychev, Dmitri S; Sun, Feng; Freeman, Robert; Sullivan, Robert; Pratt, Lee H

    2004-01-01

    The rapidly increasing rate at which biological data is being produced requires a corresponding growth in relational databases and associated tools that can help laboratories contend with that data. With this need in mind, we describe here a Modular Approach to a Genomic, Integrated and Comprehensive (MAGIC) Database. This Oracle 9i database derives from an initial focus in our laboratory on gene discovery via production and analysis of expressed sequence tags (ESTs), and subsequently on gene expression as assessed by both EST clustering and microarrays. The MAGIC Gene Discovery portion of the database focuses on information derived from DNA sequences and on its biological relevance. In addition to MAGIC SEQ-LIMS, which is designed to support activities in the laboratory, it contains several additional subschemas. The latter include MAGIC Admin for database administration, MAGIC Sequence for sequence processing as well as sequence and clone attributes, MAGIC Cluster for the results of EST clustering, MAGIC Polymorphism in support of microsatellite and single-nucleotide-polymorphism discovery, and MAGIC Annotation for electronic annotation by BLAST and BLAT. The MAGIC Microarray portion is a MIAME-compliant database with two components at present. These are MAGIC Array-LIMS, which makes possible remote entry of all information into the database, and MAGIC Array Analysis, which provides data mining and visualization. Because all aspects of interaction with the MAGIC Database are via a web browser, it is ideally suited not only for individual research laboratories but also for core facilities that serve clients at any distance.

  19. Generation and Analysis of Expressed Sequence Tags from Olea europaea L.

    PubMed Central

    Ozdemir Ozgenturk, Nehir; Oruç, Fatma; Sezerman, Ugur; Kuçukural, Alper; Vural Korkut, Senay; Toksoz, Feriha; Un, Cemal

    2010-01-01

    Olive (Olea europaea L.) is an important source of edible oil which was originated in Near-East region. In this study, two cDNA libraries were constructed from young olive leaves and immature olive fruits for generation of ESTs to discover the novel genes and search the function of unknown genes of olive. The randomly selected 3840 colonies were sequenced for EST collection from both libraries. Readable 2228 sequences for olive leaf and 1506 sequences for olive fruit were assembled into 205 and 69 contigs, respectively, whereas 2478 were singletons. Putative functions of all 2752 differentially expressed unique sequences were designated by gene homology based on BLAST and annotated using BLAST2GO. While 1339 ESTs show no homology to the database, 2024 ESTs have homology (under 80%) with hypothetical proteins, putative proteins, expressed proteins, and unknown proteins in NCBI-GenBank. 635 EST's unique genes sequence have been identified by over 80% homology to known function in other species which were not previously described in Olea family. Only 3.1% of total EST's was shown similarity with olive database existing in NCBI. This generated EST's data and consensus sequences were submitted to NCBI as valuable source for functional genome studies of olive. PMID:21197085

  20. Preparing and Analyzing Expressed Sequence Tags (ESTs) Library for the Mammary Tissue of Local Turkish Kivircik Sheep

    PubMed Central

    Omeroglu Ulu, Zehra; Ulu, Salih; Un, Cemal; Ozdem Oztabak, Kemal; Altunatmaz, Kemal

    2017-01-01

    Kivircik sheep is an important local Turkish sheep according to its meat quality and milk productivity. The aim of this study was to analyze gene expression profiles of both prenatal and postnatal stages for the Kivircik sheep. Therefore, two different cDNA libraries, which were taken from the same Kivircik sheep mammary gland tissue at prenatal and postnatal stages, were constructed. Total 3072 colonies which were randomly selected from the two libraries were sequenced for developing a sheep ESTs collection. We used Phred/Phrap computer programs for analysis of the raw EST and readable EST sequences were assembled with the CAP3 software. Putative functions of all unique sequences and statistical analysis were determined by Geneious software. Total 422 ESTs have over 80% similarity to known sequences of other organisms in NCBI classified by Panther database for the Gene Ontology (GO) category. By comparing gene expression profiles, we observed some putative genes that may be relative to reproductive performance or play important roles in milk synthesis and secretion. A total of 2414 ESTs have been deposited to the NCBI GenBank database (GW996847–GW999260). EST data in this study have provided a new source of information to functional genome studies of sheep. PMID:28239610

  1. Identification of single nucleotide polymorphism in ginger using expressed sequence tags

    PubMed Central

    Chandrasekar, Arumugam; Riju, Aikkal; Sithara, Kandiyl; Anoop, Sahadevan; Eapen, Santhosh J

    2009-01-01

    Ginger (Zingiber officinale Rosc) (Family: Zingiberaceae) is a herbaceous perennial, the rhizomes of which are used as a spice. Ginger is a plant which is well known for its medicinal applications. Recently EST-derived SNPs are a free by-product of the currently expanding EST (Expressed Sequence Tag) databases. The development of high-throughput methods for the detection of SNPs (Single Nucleotide Polymorphism) and small indels (insertion/deletion) has led to a revolution in their use as molecular markers. Available (38139) Ginger EST sequences were mined from dbEST of NCBI. CAP3 program was used to assemble EST sequences into contigs. Candidate SNPs and Indel polymorphisms were detected using the perl script AutoSNP version 1.0 which has used 31905 ESTs for detecting SNPs and Indel sites. We found 64026 SNP sites and 7034 indel polymorphisms with frequency of 0.84 SNPs / 100 bp. Among the three tissues from which the EST libraries had been generated, Rhizomes had high frequency of 1.08 SNPs/indels per 100 bp whereas the leaves had lowest frequency of 0.63 per 100 bp and root is showing relative frequency 0.82/100bp. Transitions and transversion ratio is 0.90. In overall detected SNP, transversion is high when compare to transition. These detected SNPs can be used as markers for genetic studies. Availability The results of the present study hosted in our webserver www.spices.res.in/spicesnip PMID:20198184

  2. Complementary DNA sequencing and identification of mRNAs from the venomous gland of Agkistrodon piscivorus leucostoma.

    PubMed

    Jia, Ying; Cantu, Bruno A; Sánchez, Elda E; Pérez, John C

    2008-06-15

    To advance our knowledge on the snake venom composition and transcripts expressed in venom gland at the molecular level, we constructed a cDNA library from the venom gland of Agkistrodon piscivorus leucostoma for the generation of expressed sequence tags (ESTs) database. From the randomly sequenced 2112 independent clones, we have obtained ESTs for 1309 (62%) cDNAs, which showed significant deduced amino acid sequence similarity (scores >80) to previously characterized proteins in National Center for Biotechnology Information (NCBI) database. Ribosomal proteins make up 47 clones (2%) and the remaining 756 (36%) cDNAs represent either unknown identity or show BLASTX sequence identity scores of <80 with known GenBank accessions. The most highly expressed gene encoding phospholipase A(2) (PLA(2)) accounting for 35% of A. p. leucostoma venom gland cDNAs was identified and further confirmed by crude venom applied to sodium dodecyl sulfate/polyacrylamide gel electrophoresis (SDS-PAGE) electrophoresis and protein sequencing. A total of 180 representative genes were obtained from the sequence assemblies and deposited to EST database. Clones showing sequence identity to disintegrins, thrombin-like enzymes, hemorrhagic toxins, fibrinogen clotting inhibitors and plasminogen activators were also identified in our EST database. These data can be used to develop a research program that will help us identify genes encoding proteins that are of medical importance or proteins involved in the mechanisms of the toxin venom.

  3. Generation and analysis of expressed sequence tags from the bone marrow of Chinese Sika deer.

    PubMed

    Yao, Baojin; Zhao, Yu; Zhang, Mei; Li, Juan

    2012-03-01

    Sika deer is one of the best-known and highly valued animals of China. Despite its economic, cultural, and biological importance, there has not been a large-scale sequencing project for Sika deer to date. With the ultimate goal of sequencing the complete genome of this organism, we first established a bone marrow cDNA library for Sika deer and generated a total of 2,025 reads. After processing the sequences, 2,017 high-quality expressed sequence tags (ESTs) were obtained. These ESTs were assembled into 1,157 unigenes, including 238 contigs and 919 singletons. Comparative analyses indicated that 888 (76.75%) of the unigenes had significant matches to sequences in the non-redundant protein database, In addition to highly expressed genes, such as stearoyl-CoA desaturase, cytochrome c oxidase, adipocyte-type fatty acid-binding protein, adiponectin and thymosin beta-4, we also obtained vascular endothelial growth factor-A and heparin-binding growth-associated molecule, both of which are of great importance for angiogenesis research. There were 244 (21.09%) unigenes with no significant match to any sequence in current protein or nucleotide databases, and these sequences may represent genes with unknown function in Sika deer. Open reading frame analysis of the sequences was performed using the getorf program. In addition, the sequences were functionally classified using the gene ontology hierarchy, clusters of orthologous groups of proteins and Kyoto encyclopedia of genes and genomes databases. Analysis of ESTs described in this paper provides an important resource for the transcriptome exploration of Sika deer, and will also facilitate further studies on functional genomics, gene discovery and genome annotation of Sika deer.

  4. Integration of transcriptomic and proteomic data from a single wheat cultivar provides new tools for understanding the roles of individual alpha gliadin proteins in flour quality and celiac disease

    USDA-ARS?s Scientific Manuscript database

    One-hundred-thirty-six expressed sequence tags (ESTs) encoding alpha gliadins from Triticum aestivum cv Butte 86 were identified in public databases and assembled into 19 contigs. Consensus sequences for 12 of the contigs encoded complete alpha gliadin proteins, but only two were identical to protei...

  5. PipeOnline 2.0: automated EST processing and functional data sorting.

    PubMed

    Ayoubi, Patricia; Jin, Xiaojing; Leite, Saul; Liu, Xianghui; Martajaja, Jeson; Abduraham, Abdurashid; Wan, Qiaolan; Yan, Wei; Misawa, Eduardo; Prade, Rolf A

    2002-11-01

    Expressed sequence tags (ESTs) are generated and deposited in the public domain, as redundant, unannotated, single-pass reactions, with virtually no biological content. PipeOnline automatically analyses and transforms large collections of raw DNA-sequence data from chromatograms or FASTA files by calling the quality of bases, screening and removing vector sequences, assembling and rewriting consensus sequences of redundant input files into a unigene EST data set and finally through translation, amino acid sequence similarity searches, annotation of public databases and functional data. PipeOnline generates an annotated database, retaining the processed unigene sequence, clone/file history, alignments with similar sequences, and proposed functional classification, if available. Functional annotation is automatic and based on a novel method that relies on homology of amino acid sequence multiplicity within GenBank records. Records are examined through a function ordered browser or keyword queries with automated export of results. PipeOnline offers customization for individual projects (MyPipeOnline), automated updating and alert service. PipeOnline is available at http://stress-genomics.org.

  6. Analysis of expressed sequence tags of the cyclically parthenogenetic rotifer Brachionus plicatilis.

    PubMed

    Suga, Koushirou; Welch, David Mark; Tanaka, Yukari; Sakakura, Yoshitaka; Hagiwara, Atsushi

    2007-08-01

    Rotifers are among the most common non-arthropod animals and are the most experimentally tractable members of the basal assemblage of metazoan phyla known as Gnathifera. The monogonont rotifer Brachionus plicatilis is a developing model system for ecotoxicology, aquatic ecology, cryptic speciation, and the evolution of sex, and is an important food source for finfish aquaculture. However, basic knowledge of the genome and transcriptome of any rotifer species has been lacking. We generated and partially sequenced a cDNA library from B. plicatilis and constructed a database of over 2300 expressed sequence tags corresponding to more than 450 transcripts. About 20% of the transcripts had no significant similarity to database sequences by BLAST; most of these contained open reading frames of significant length but few had recognized Pfam motifs. Sixteen transcripts accounted for 25% of the ESTs; four of these had no significant similarity to BLAST or Pfam databases. Putative up- and downstream untranslated regions are relatively short and AT rich. In contrast to bdelloid rotifers, there was no evidence of a conserved trans-spliced leader sequence among the transcripts and most genes were single-copy. Despite the small size of this EST project it revealed several important features of the rotifer transcriptome and of individual monogonont genes. Because there is little genomic data for Gnathifera, the transcripts we found with no known function may represent genes that are species-, class-, phylum- or even superphylum-specific; the fact that some are among the most highly expressed indicates their importance. The absence of trans-spliced leader exons in this monogonont species contrasts with their abundance in bdelloid rotifers and indicates that the presence of this phenomenon can vary at the subphylum level. Our EST database provides a relatively large quantity of transcript-level data for B. plicatilis, and more generally of rotifers and other gnathiferan phyla, and can be browsed and searched at gmod.mbl.edu.

  7. Analysis of Expressed Sequence Tags of the Cyclically Parthenogenetic Rotifer Brachionus plicatilis

    PubMed Central

    Suga, Koushirou; Mark Welch, David; Tanaka, Yukari; Sakakura, Yoshitaka; Hagiwara, Atsushi

    2007-01-01

    Background Rotifers are among the most common non-arthropod animals and are the most experimentally tractable members of the basal assemblage of metazoan phyla known as Gnathifera. The monogonont rotifer Brachionus plicatilis is a developing model system for ecotoxicology, aquatic ecology, cryptic speciation, and the evolution of sex, and is an important food source for finfish aquaculture. However, basic knowledge of the genome and transcriptome of any rotifer species has been lacking. Methodology/Principal Findings We generated and partially sequenced a cDNA library from B. plicatilis and constructed a database of over 2300 expressed sequence tags corresponding to more than 450 transcripts. About 20% of the transcripts had no significant similarity to database sequences by BLAST; most of these contained open reading frames of significant length but few had recognized Pfam motifs. Sixteen transcripts accounted for 25% of the ESTs; four of these had no significant similarity to BLAST or Pfam databases. Putative up- and downstream untranslated regions are relatively short and AT rich. In contrast to bdelloid rotifers, there was no evidence of a conserved trans-spliced leader sequence among the transcripts and most genes were single-copy. Conclusions/Significance Despite the small size of this EST project it revealed several important features of the rotifer transcriptome and of individual monogonont genes. Because there is little genomic data for Gnathifera, the transcripts we found with no known function may represent genes that are species-, class-, phylum- or even superphylum-specific; the fact that some are among the most highly expressed indicates their importance. The absence of trans-spliced leader exons in this monogonont species contrasts with their abundance in bdelloid rotifers and indicates that the presence of this phenomenon can vary at the subphylum level. Our EST database provides a relatively large quantity of transcript-level data for B. plicatilis, and more generally of rotifers and other gnathiferan phyla, and can be browsed and searched at gmod.mbl.edu. PMID:17668053

  8. Analysis of expressed sequence tags from the Ulva prolifera (Chlorophyta)

    NASA Astrophysics Data System (ADS)

    Niu, Jianfeng; Hu, Haiyan; Hu, Songnian; Wang, Guangce; Peng, Guang; Sun, Song

    2010-01-01

    In 2008, a green tide broke out before the sailing competition of the 29th Olympic Games in Qingdao. The causative species was determined to be Enteromorpha prolifera ( Ulva prolifera O. F. Müller), a familiar green macroalga along the coastline of China. Rapid accumulation of a large biomass of floating U. prolifera prompted research on different aspects of this species. In this study, we constructed a nonnormalized cDNA library from the thalli of U. prolifera and acquired 10 072 high-quality expressed sequence tags (ESTs). These ESTs were assembled into 3 519 nonredundant gene groups, including 1 446 clusters and 2 073 singletons. After annotation with the nr database, a large number of genes were found to be related with chloroplast and ribosomal protein, GO functional classification showed 1 418 ESTs participated in photosynthesis and 1 359 ESTs were responsible for the generation of precursor metabolites and energy. In addition, rather comprehensive carbon fixation pathways were found in U. prolifera using KEGG. Some stress-related and signal transduction-related genes were also found in this study. All the evidences displayed that U. prolifera had substance and energy foundation for the intense photosynthesis and the rapid proliferation. Phylogenetic analysis of cytochrome c oxidase subunit I revealed that this green-tide causative species is most closely affiliated to Pseudendoclonium akinetum (Ulvophyceae).

  9. Developing expressed sequence tag libraries and the discovery of simple sequence repeat markers for two species of raspberry (Rubus L.).

    PubMed

    Bushakra, Jill M; Lewers, Kim S; Staton, Margaret E; Zhebentyayeva, Tetyana; Saski, Christopher A

    2015-10-26

    Due to a relatively high level of codominant inheritance and transferability within and among taxonomic groups, simple sequence repeat (SSR) markers are important elements in comparative mapping and delineation of genomic regions associated with traits of economic importance. Expressed sequence tags (ESTs) are a source of SSRs that can be used to develop markers to facilitate plant breeding and for more basic research across genera and higher plant orders. Leaf and meristem tissue from 'Heritage' red raspberry (Rubus idaeus) and 'Bristol' black raspberry (R. occidentalis) were utilized for RNA extraction. After conversion to cDNA and library construction, ESTs were sequenced, quality verified, assembled and scanned for SSRs.  Primers flanking the SSRs were designed and a subset tested for amplification, polymorphism and transferability across species. ESTs containing SSRs were functionally annotated using the GenBank non-redundant (nr) database and further classified using the gene ontology database. To accelerate development of EST-SSRs in the genus Rubus (Rosaceae), 1149 and 2358 cDNA sequences were generated from red raspberry and black raspberry, respectively. The cDNA sequences were screened using rigorous filtering criteria which resulted in the identification of 121 and 257 SSR loci for red and black raspberry, respectively. Primers were designed from the surrounding sequences resulting in 131 and 288 primer pairs, respectively, as some sequences contained more than one SSR locus. Sequence analysis revealed that the SSR-containing genes span a diversity of functions and share more sequence identity with strawberry genes than with other Rosaceous species. This resource of Rubus-specific, gene-derived markers will facilitate the construction of linkage maps composed of transferable markers for studying and manipulating important traits in this economically important genus.

  10. Serial analysis of gene expression in the silkworm, Bombyx mori.

    PubMed

    Huang, Jianhua; Miao, Xuexia; Jin, Weirong; Couble, Pierre; Mita, Kasuei; Zhang, Yong; Liu, Wenbin; Zhuang, Leijun; Shen, Yan; Keime, Celine; Gandrillon, Olivier; Brouilly, Patrick; Briolay, Jerome; Zhao, Guoping; Huang, Yongping

    2005-08-01

    The silkworm Bombyx mori is one of the most economically important insects and serves as a model for Lepidoptera insects. We used serial analysis of gene expression (SAGE) to derive profiles of expressed genes during the developmental life cycle of the silkworm and to create a reference for understanding silkworm metamorphosis. We generated four SAGE libraries, one from each of the four developmental stages of the silkworm. In total we obtained 257,964 SAGE tags, of which 39,485 were unique tags. Sorted by copy number, 14.1% of the unique tags were detected at a median to high level (five or more copies), 24.2% at lower levels (two to four copies), and 61.7% as single copies. Using a basic local alignment search tool on the EST database, 35% of the tags matched known silkworm expressed sequence tags. SAGE demonstrated that a number of the genes were up- or down-regulated during the four developmental phases of the egg, larva, pupa, and adult. Furthermore, we found that the generation of longer cDNA fragments from SAGE tags constituted the most efficient method of gene identification, which facilitated the analysis of a large number of unknown genes.

  11. Characterization of the Kenaf (Hibiscus cannabinus) Global Transcriptome Using Illumina Paired-End Sequencing and Development of EST-SSR Markers

    PubMed Central

    Li, Hui; Li, Defang; Chen, Anguo; Tang, Huijuan; Li, Jianjun; Huang, Siqi

    2016-01-01

    Kenaf (Hibiscus cannabinus L.) is an economically important natural fiber crop grown worldwide. However, only 20 expressed tag sequences (ESTs) for kenaf are available in public databases. The aim of this study was to develop large-scale simple sequence repeat (SSR) markers to lay a solid foundation for the construction of genetic linkage maps and marker-assisted breeding in kenaf. We used Illumina paired-end sequencing technology to generate new EST-simple sequences and MISA software to mine SSR markers. We identified 71,318 unigenes with an average length of 1143 nt and annotated these unigenes using four different protein databases. Overall, 9324 complementary pairs were designated as EST-SSR markers, and their quality was validated using 100 randomly selected SSR markers. In total, 72 primer pairs reproducibly amplified target amplicons, and 61 of these primer pairs detected significant polymorphism among 28 kenaf accessions. Thus, in this study, we have developed large-scale SSR markers for kenaf, and this new resource will facilitate construction of genetic linkage maps, investigation of fiber growth and development in kenaf, and also be of value to novel gene discovery and functional genomic studies. PMID:26960153

  12. Next-Generation Sequencing of the Chrysanthemum nankingense (Asteraceae) Transcriptome Permits Large-Scale Unigene Assembly and SSR Marker Discovery

    PubMed Central

    Wang, Haibin; Jiang, Jiafu; Chen, Sumei; Qi, Xiangyu; Peng, Hui; Li, Pirui; Song, Aiping; Guan, Zhiyong; Fang, Weimin; Liao, Yuan; Chen, Fadi

    2013-01-01

    Background Simple sequence repeats (SSRs) are ubiquitous in eukaryotic genomes. Chrysanthemum is one of the largest genera in the Asteraceae family. Only few Chrysanthemum expressed sequence tag (EST) sequences have been acquired to date, so the number of available EST-SSR markers is very low. Methodology/Principal Findings Illumina paired-end sequencing technology produced over 53 million sequencing reads from C. nankingense mRNA. The subsequent de novo assembly yielded 70,895 unigenes, of which 45,789 (64.59%) unigenes showed similarity to the sequences in NCBI database. Out of 45,789 sequences, 107 have hits to the Chrysanthemum Nr protein database; 679 and 277 sequences have hits to the database of Helianthus and Lactuca species, respectively. MISA software identified a large number of putative EST-SSRs, allowing 1,788 primer pairs to be designed from the de novo transcriptome sequence and a further 363 from archival EST sequence. Among 100 primer pairs randomly chosen, 81 markers have amplicons and 20 are polymorphic for genotypes analysis in Chrysanthemum. The results showed that most (but not all) of the assays were transferable across species and that they exposed a significant amount of allelic diversity. Conclusions/Significance SSR markers acquired by transcriptome sequencing are potentially useful for marker-assisted breeding and genetic analysis in the genus Chrysanthemum and its related genera. PMID:23626799

  13. Gene discovery in Boophilus microplus, the cattle tick: the transcriptomes of ovaries, salivary glands, and hemocytes.

    PubMed

    Santos, Isabel K F de Miranda; Valenzuela, Jesus G; Ribeiro, José Marcos C; de Castro, Marilia; Costa, Juliana Nardelli; Costa, Ana Maria; da Silva, Edson Ramiro; Neto, Olavo Bilac Rego; Rocha, Clarisse; Daffre, Sirlei; Ferreira, Beatriz R; da Silva, João Santana; Szabó, Matias Pablo; Bechara, Gervasio Henrique

    2004-10-01

    The quest for new control strategies for ticks can profit from high throughput genomics. In order to identify genes that are involved in oogenesis and development, in defense, and in hematophagy, the transcriptomes of ovaries, hemocytes, and salivary glands from rapidly ingurgitating females, and of salivary glands from males of Boophilus microplus were PCR amplified, and the expressed sequence tags (EST) of random clones were mass sequenced. So far, more than 1,344 EST have been generated for these tissues, with approximately 30% novelty, depending on the the tissue studied. To date approximately 760 nucleotide sequences from B. microplus are deposited in the NCBI database. Mass sequencing of partial cDNAs of parasite genes can build up this scant database and rapidly generate a large quantity of useful information about potential targets for immunobiological or chemical control.

  14. Characteristics of the Lotus japonicus gene repertoire deduced from large-scale expressed sequence tag (EST) analysis.

    PubMed

    Asamizu, Erika; Nakamura, Yasukazu; Sato, Shusei; Tabata, Satoshi

    2004-02-01

    To perform a comprehensive analysis of genes expressed in a model legume, Lotus japonicus, a total of 74472 3'-end expressed sequence tags (EST) were generated from cDNA libraries produced from six different organs. Clustering of sequences was performed with an identity criterion of 95% for 50 bases, and a total of 20457 non-redundant sequences, 8503 contigs and 11954 singletons were generated. EST sequence coverage was analyzed by using the annotated L. japonicus genomic sequence and 1093 of the 1889 predicted protein-encoding genes (57.9%) were hit by the EST sequence(s). Gene content was compared to several plant species. Among the 8503 contigs, 471 were identified as sequences conserved only in leguminous species and these included several disease resistance-related genes. This suggested that in legumes, these genes may have evolved specifically to resist pathogen attack. The rate of gene sequence divergence was assessed by comparing similarity level and functional category based on the Gene Ontology (GO) annotation of Arabidopsis genes. This revealed that genes encoding ribosomal proteins, as well as those related to translation, photosynthesis, and cellular structure were more abundantly represented in the highly conserved class, and that genes encoding transcription factors and receptor protein kinases were abundantly represented in the less conserved class. To make the sequence information and the cDNA clones available to the research community, a Web database with useful services was created at http://www.kazusa.or.jp/en/plant/lotus/EST/.

  15. Transcriptome analysis of the desert locust central nervous system: production and annotation of a Schistocerca gregaria EST database.

    PubMed

    Badisco, Liesbeth; Huybrechts, Jurgen; Simonet, Gert; Verlinden, Heleen; Marchal, Elisabeth; Huybrechts, Roger; Schoofs, Liliane; De Loof, Arnold; Vanden Broeck, Jozef

    2011-03-21

    The desert locust (Schistocerca gregaria) displays a fascinating type of phenotypic plasticity, designated as 'phase polyphenism'. Depending on environmental conditions, one genome can be translated into two highly divergent phenotypes, termed the solitarious and gregarious (swarming) phase. Although many of the underlying molecular events remain elusive, the central nervous system (CNS) is expected to play a crucial role in the phase transition process. Locusts have also proven to be interesting model organisms in a physiological and neurobiological research context. However, molecular studies in locusts are hampered by the fact that genome/transcriptome sequence information available for this branch of insects is still limited. We have generated 34,672 raw expressed sequence tags (EST) from the CNS of desert locusts in both phases. These ESTs were assembled in 12,709 unique transcript sequences and nearly 4,000 sequences were functionally annotated. Moreover, the obtained S. gregaria EST information is highly complementary to the existing orthopteran transcriptomic data. Since many novel transcripts encode neuronal signaling and signal transduction components, this paper includes an overview of these sequences. Furthermore, several transcripts being differentially represented in solitarious and gregarious locusts were retrieved from this EST database. The findings highlight the involvement of the CNS in the phase transition process and indicate that this novel annotated database may also add to the emerging knowledge of concomitant neuronal signaling and neuroplasticity events. In summary, we met the need for novel sequence data from desert locust CNS. To our knowledge, we hereby also present the first insect EST database that is derived from the complete CNS. The obtained S. gregaria EST data constitute an important new source of information that will be instrumental in further unraveling the molecular principles of phase polyphenism, in further establishing locusts as valuable research model organisms and in molecular evolutionary and comparative entomology.

  16. Analysis of expressed sequence tags from the four main developmental stages of Trypanosoma congolense

    PubMed Central

    Helm, Jared R.; Hertz-Fowler, Christiane; Aslett, Martin; Berriman, Matthew; Sanders, Mandy; Quail, Michael A.; Soares, Marcelo B.; Bonaldo, Maria F.; Sakurai, Tatsuya; Inoue, Noboru; Donelson, John E.

    2009-01-01

    Trypanosoma congolense is one of the most economically important pathogens of livestock in Africa. Culture-derived parasites of each of the three main insect stages of the T. congolense life cycle, i.e., the procyclic, epimastigote and metacyclic stages, and bloodstream stage parasites isolated from infected mice, were used to construct stage-specific cDNA libraries and expressed sequence tags (ESTs or cDNA clones) in each library were sequenced. Thirteen EST clusters encoding different variant surface glycoproteins (VSGs) were detected in the metacyclic library and twenty-six VSG EST clusters were found in the bloodstream library, six of which are shared by the metacyclic library. Rare VSG ESTs are present in the epimastigote library, and none were detected in the procyclic library. ESTs encoding enzymes that catalyze oxidative phosphorylation and amino acid metabolism are about twice as abundant in the procyclic and epimastigote stages as in the metacyclic and bloodstream stages. In contrast, ESTs encoding enzymes involved in glycolysis, the citric acid cycle and nucleotide metabolism are about the same in all four developmental stages. Cysteine proteases, kinases and phosphatases are the most abundant enzyme groups represented by the ESTs. All four libraries contain T. congolense-specific expressed sequences not present in the T. brucei and T. cruzi genomes. Normalized cDNA libraries were constructed from the metacyclic and bloodstream stages, and found to be further enriched for T. congolense-specific ESTs. Given that cultured T. congolense offers an experimental advantage over other African trypanosome species, these ESTs provide a basis for further investigation of the molecular properties of these four developmental stages, especially the epimastigote and metacyclic stages for which it is difficult to obtain large quantities of organisms. The T. congolense EST databases are available at: http://www.sanger.ac.uk/Projects/T_congolense/EST_index.shtml. PMID:19559733

  17. ocsESTdb: a database of oil crop seed EST sequences for comparative analysis and investigation of a global metabolic network and oil accumulation metabolism.

    PubMed

    Ke, Tao; Yu, Jingyin; Dong, Caihua; Mao, Han; Hua, Wei; Liu, Shengyi

    2015-01-21

    Oil crop seeds are important sources of fatty acids (FAs) for human and animal nutrition. Despite their importance, there is a lack of an essential bioinformatics resource on gene transcription of oil crops from a comparative perspective. In this study, we developed ocsESTdb, the first database of expressed sequence tag (EST) information on seeds of four large-scale oil crops with an emphasis on global metabolic networks and oil accumulation metabolism that target the involved unigenes. A total of 248,522 ESTs and 106,835 unigenes were collected from the cDNA libraries of rapeseed (Brassica napus), soybean (Glycine max), sesame (Sesamum indicum) and peanut (Arachis hypogaea). These unigenes were annotated by a sequence similarity search against databases including TAIR, NR protein database, Gene Ontology, COG, Swiss-Prot, TrEMBL and Kyoto Encyclopedia of Genes and Genomes (KEGG). Five genome-scale metabolic networks that contain different numbers of metabolites and gene-enzyme reaction-association entries were analysed and constructed using Cytoscape and yEd programs. Details of unigene entries, deduced amino acid sequences and putative annotation are available from our database to browse, search and download. Intuitive and graphical representations of EST/unigene sequences, functional annotations, metabolic pathways and metabolic networks are also available. ocsESTdb will be updated regularly and can be freely accessed at http://ocri-genomics.org/ocsESTdb/ . ocsESTdb may serve as a valuable and unique resource for comparative analysis of acyl lipid synthesis and metabolism in oilseed plants. It also may provide vital insights into improving oil content in seeds of oil crop species by transcriptional reconstruction of the metabolic network.

  18. Newt-omics: a comprehensive repository for omics data from the newt Notophthalmus viridescens

    PubMed Central

    Bruckskotten, Marc; Looso, Mario; Reinhardt, Richard; Braun, Thomas; Borchardt, Thilo

    2012-01-01

    Notophthalmus viridescens, a member of the salamander family is an excellent model organism to study regenerative processes due to its unique ability to replace lost appendages and to repair internal organs. Molecular insights into regenerative events have been severely hampered by the lack of genomic, transcriptomic and proteomic data, as well as an appropriate database to store such novel information. Here, we describe ‘Newt-omics’ (http://newt-omics.mpi-bn.mpg.de), a database, which enables researchers to locate, retrieve and store data sets dedicated to the molecular characterization of newts. Newt-omics is a transcript-centred database, based on an Expressed Sequence Tag (EST) data set from the newt, covering ∼50 000 Sanger sequenced transcripts and a set of high-density microarray data, generated from regenerating hearts. Newt-omics also contains a large set of peptides identified by mass spectrometry, which was used to validate 13 810 ESTs as true protein coding. Newt-omics is open to implement additional high-throughput data sets without changing the database structure. Via a user-friendly interface Newt-omics allows access to a huge set of molecular data without the need for prior bioinformatical expertise. PMID:22039101

  19. Analysis and functional annotation of expressed sequence tags (ESTs) from multiple tissues of oil palm (Elaeis guineensis Jacq.)

    PubMed Central

    Ho, Chai-Ling; Kwan, Yen-Yen; Choi, Mei-Chooi; Tee, Sue-Sean; Ng, Wai-Har; Lim, Kok-Ang; Lee, Yang-Ping; Ooi, Siew-Eng; Lee, Weng-Wah; Tee, Jin-Ming; Tan, Siang-Hee; Kulaveerasingam, Harikrishna; Alwee, Sharifah Shahrul Rabiah Syed; Abdullah, Meilina Ong

    2007-01-01

    Background Oil palm is the second largest source of edible oil which contributes to approximately 20% of the world's production of oils and fats. In order to understand the molecular biology involved in in vitro propagation, flowering, efficient utilization of nitrogen sources and root diseases, we have initiated an expressed sequence tag (EST) analysis on oil palm. Results In this study, six cDNA libraries from oil palm zygotic embryos, suspension cells, shoot apical meristems, young flowers, mature flowers and roots, were constructed. We have generated a total of 14537 expressed sequence tags (ESTs) from these libraries, from which 6464 tentative unique contigs (TUCs) and 2129 singletons were obtained. Approximately 6008 of these tentative unique genes (TUGs) have significant matches to the non-redundant protein database, from which 2361 were assigned to one or more Gene Ontology categories. Predominant transcripts and differentially expressed genes were identified in multiple oil palm tissues. Homologues of genes involved in many aspects of flower development were also identified among the EST collection, such as CONSTANS-like, AGAMOUS-like (AGL)2, AGL20, LFY-like, SQUAMOSA, SQUAMOSA binding protein (SBP) etc. Majority of them are the first representatives in oil palm, providing opportunities to explore the cause of epigenetic homeotic flowering abnormality in oil palm, given the importance of flowering in fruit production. The transcript levels of two flowering-related genes, EgSBP and EgSEP were analysed in the flower tissues of various developmental stages. Gene homologues for enzymes involved in oil biosynthesis, utilization of nitrogen sources, and scavenging of oxygen radicals, were also uncovered among the oil palm ESTs. Conclusion The EST sequences generated will allow comparative genomic studies between oil palm and other monocotyledonous and dicotyledonous plants, development of gene-targeted markers for the reference genetic map, design and fabrication of DNA array for future studies of oil palm. The outcomes of such studies will contribute to oil palm improvements through the establishment of breeding program using marker-assisted selection, development of diagnostic assays using gene targeted markers, and discovery of candidate genes related to important agronomic traits of oil palm. PMID:17953740

  20. An expressed sequence tag (EST) data mining strategy succeeding in the discovery of new G-protein coupled receptors.

    PubMed

    Wittenberger, T; Schaller, H C; Hellebrand, S

    2001-03-30

    We have developed a comprehensive expressed sequence tag database search method and used it for the identification of new members of the G-protein coupled receptor superfamily. Our approach proved to be especially useful for the detection of expressed sequence tag sequences that do not encode conserved parts of a protein, making it an ideal tool for the identification of members of divergent protein families or of protein parts without conserved domain structures in the expressed sequence tag database. At least 14 of the expressed sequence tags found with this strategy are promising candidates for new putative G-protein coupled receptors. Here, we describe the sequence and expression analysis of five new members of this receptor superfamily, namely GPR84, GPR86, GPR87, GPR90 and GPR91. We also studied the genomic structure and chromosomal localization of the respective genes applying in silico methods. A cluster of six closely related G-protein coupled receptors was found on the human chromosome 3q24-3q25. It consists of four orphan receptors (GPR86, GPR87, GPR91, and H963), the purinergic receptor P2Y1, and the uridine 5'-diphosphoglucose receptor KIAA0001. It seems likely that these receptors evolved from a common ancestor and therefore might have related ligands. In conclusion, we describe a data mining procedure that proved to be useful for the identification and first characterization of new genes and is well applicable for other gene families. Copyright 2001 Academic Press.

  1. GarlicESTdb: an online database and mining tool for garlic EST sequences.

    PubMed

    Kim, Dae-Won; Jung, Tae-Sung; Nam, Seong-Hyeuk; Kwon, Hyuk-Ryul; Kim, Aeri; Chae, Sung-Hwa; Choi, Sang-Haeng; Kim, Dong-Wook; Kim, Ryong Nam; Park, Hong-Seog

    2009-05-18

    Allium sativum., commonly known as garlic, is a species in the onion genus (Allium), which is a large and diverse one containing over 1,250 species. Its close relatives include chives, onion, leek and shallot. Garlic has been used throughout recorded history for culinary, medicinal use and health benefits. Currently, the interest in garlic is highly increasing due to nutritional and pharmaceutical value including high blood pressure and cholesterol, atherosclerosis and cancer. For all that, there are no comprehensive databases available for Expressed Sequence Tags(EST) of garlic for gene discovery and future efforts of genome annotation. That is why we developed a new garlic database and applications to enable comprehensive analysis of garlic gene expression. GarlicESTdb is an integrated database and mining tool for large-scale garlic (Allium sativum) EST sequencing. A total of 21,595 ESTs collected from an in-house cDNA library were used to construct the database. The analysis pipeline is an automated system written in JAVA and consists of the following components: automatic preprocessing of EST reads, assembly of raw sequences, annotation of the assembled sequences, storage of the analyzed information into MySQL databases, and graphic display of all processed data. A web application was implemented with the latest J2EE (Java 2 Platform Enterprise Edition) software technology (JSP/EJB/JavaServlet) for browsing and querying the database, for creation of dynamic web pages on the client side, and for mapping annotated enzymes to KEGG pathways, the AJAX framework was also used partially. The online resources, such as putative annotation, single nucleotide polymorphisms (SNP) and tandem repeat data sets, can be searched by text, explored on the website, searched using BLAST, and downloaded. To archive more significant BLAST results, a curation system was introduced with which biologists can easily edit best-hit annotation information for others to view. The GarlicESTdb web application is freely available at http://garlicdb.kribb.re.kr. GarlicESTdb is the first incorporated online information database of EST sequences isolated from garlic that can be freely accessed and downloaded. It has many useful features for interactive mining of EST contigs and datasets from each library, including curation of annotated information, expression profiling, information retrieval, and summary of statistics of functional annotation. Consequently, the development of GarlicESTdb will provide a crucial contribution to biologists for data-mining and more efficient experimental studies.

  2. Inventory of high-abundance mRNAs in skeletal muscle of normal men.

    PubMed

    Welle, S; Bhatt, K; Thornton, C A

    1999-05-01

    G42875rial analysis of gene expression (SAGE) method was used to generate a catalog of 53,875 short (14 base) expressed sequence tags from polyadenylated RNA obtained from vastus lateralis muscle of healthy young men. Over 12,000 unique tags were detected. The frequency of occurrence of each tag reflects the relative abundance of the corresponding mRNA. The mRNA species that were detected 10 or more times, each comprising >/=0.02% of the mRNA population, accounted for 64% of the mRNA mass but <10% of the total number of mRNA species detected. Almost all of the abundant tags matched mRNA or EST sequences cataloged in GenBank. Mitochondrial transcripts accounted for approximately 20% of the polyadenylated RNA. Transcripts encoding proteins of the myofibrils were the most abundant nuclear-encoded mRNAs. Transcripts encoding ribosomal proteins, and those encoding proteins involved in energy metabolism, also were very abundant. The database can be used as a reference for investigations of alterations in gene expression associated with conditions that influence muscle function, such as muscular dystrophies, aging, and exercise.

  3. Prediction of EST functional relationships via literature mining with user-specified parameters.

    PubMed

    Wang, Hei-Chia; Huang, Tian-Hsiang

    2009-04-01

    The massive amount of expressed sequence tags (ESTs) gathered over recent years has triggered great interest in efficient applications for genomic research. In particular, EST functional relationships can be used to determine a possible gene network for biological processes of interest. In recent years, many researchers have tried to determine EST functional relationships by analyzing the biological literature. However, it has been challenging to find efficient prediction methods. Moreover, an annotated EST is usually associated with many functions, so successful methods must be able to distinguish between relevant and irrelevant functions based on user specifications. This paper proposes a method to discover functional relationships between ESTs of interest by analyzing literature from the Medical Literature Analysis and Retrieval System Online, with user-specified parameters for selecting keywords. This method performs better than the multiple kernel documents method in setting up a specific threshold for gathering materials. The method is also able to uncover known functional relationships, as shown by a comparison with the Kyoto Encyclopedia of Genes and Genomes database. The reliable EST relationships predicted by the proposed method can help to construct gene networks for specific biological functions of interest.

  4. The barley EST DNA Replication and Repair Database (bEST-DRRD) as a tool for the identification of the genes involved in DNA replication and repair.

    PubMed

    Gruszka, Damian; Marzec, Marek; Szarejko, Iwona

    2012-06-14

    The high level of conservation of genes that regulate DNA replication and repair indicates that they may serve as a source of information on the origin and evolution of the species and makes them a reliable system for the identification of cross-species homologs. Studies that had been conducted to date shed light on the processes of DNA replication and repair in bacteria, yeast and mammals. However, there is still much to be learned about the process of DNA damage repair in plants. These studies, which were conducted mainly using bioinformatics tools, enabled the list of genes that participate in various pathways of DNA repair in Arabidopsis thaliana (L.) Heynh to be outlined; however, information regarding these mechanisms in crop plants is still very limited. A similar, functional approach is particularly difficult for a species whose complete genomic sequences are still unavailable. One of the solutions is to apply ESTs (Expressed Sequence Tags) as the basis for gene identification. For the construction of the barley EST DNA Replication and Repair Database (bEST-DRRD), presented here, the Arabidopsis nucleotide and protein sequences involved in DNA replication and repair were used to browse for and retrieve the deposited sequences, derived from four barley (Hordeum vulgare L.) sequence databases, including the "Barley Genome version 0.05" database (encompassing ca. 90% of barley coding sequences) and from two databases covering the complete genomes of two monocot models: Oryza sativa L. and Brachypodium distachyon L. in order to identify homologous genes. Sequences of the categorised Arabidopsis queries are used for browsing the repositories, which are located on the ViroBLAST platform. The bEST-DRRD is currently used in our project during the identification and validation of the barley genes involved in DNA repair. The presented database provides information about the Arabidopsis genes involved in DNA replication and repair, their expression patterns and models of protein interactions. It was designed and established to provide an open-access tool for the identification of monocot homologs of known Arabidopsis genes that are responsible for DNA-related processes. The barley genes identified in the project are currently being analysed to validate their function.

  5. Analysis of expressed sequence tags for Frankliniella occidentalis, the western flower thrips.

    PubMed

    Rotenberg, D; Whitfield, A E

    2010-08-01

    Thrips are members of the insect order Thysanoptera and Frankliniella occidentalis (the western flower thrips) is the most economically important pest within this order. F. occidentalis is both a direct pest of crops and an efficient vector of plant viruses, including Tomato spotted wilt virus (TSWV). Despite the world-wide importance of thrips in agriculture, there is little knowledge of the F. occidentalis genome or gene functions at this time. A normalized cDNA library was constructed from first instar thrips and 13 839 expressed sequence tags (ESTs) were obtained. Our EST data assembled into 894 contigs and 11 806 singletons (12 700 nonredundant sequences). We found that 31% of these sequences had significant similarity (E< or = 10(-10)) to protein sequences in the National Center for Biotechnology Information nonredundant (nr) protein database, and 25% were functionally annotated using Blast 2GO. We identified 74 sequences with putative homology to proteins associated with insect innate immunity. Sixteen sequences had significant similarity to proteins associated with small RNA-mediated gene silencing pathways (RNA interference; RNAi), including the antiviral pathway (short interfering RNA-mediated pathway). Our EST collection provides new sequence resources for characterizing gene functions in F. occidentalis and other thrips species with regards to vital biological processes, studying the mechanism of interactions with the viruses harboured and transmitted by the vector, and identifying new insect gene-centred targets for plant disease and insect control.

  6. The characterisation of novel secreted Ly-6 proteins from rat urine by the combined use of two-dimensional gel electrophoresis, microbore high performance liquid chromatography and expressed sequence tag data.

    PubMed

    Southan, Christopher; Cutler, Paul; Birrell, Helen; Connell, John; Fantom, Kenneth G M; Sims, Matthew; Shaikh, Narjis; Schneider, Klaus

    2002-02-01

    A proteomic study of rat urine was undertaken using two-dimensional gel electrophoresis, microbore high performance liquid chromatography, mass spectrometry and N-terminal sequencing. Five known urinary proteins were identified but two novel peptide fragments matched a large number of rat expressed sequence tags (ESTs) from a liver library. By combining protein chemical and nucleotide data, two 101-residue open reading frames with 90% amino acid identity were determined, rat urinary protein 1 (RUP-1) and RUP-2. The data established signal peptide removal and provided evidence for N-glycosylation. A third related sequence, rat spleen protein (RSP-1) was confirmed from EST searches. These three proteins have been submitted to SWISS-PROT as P81827, P81828 and Q9QXN2, respectively. A fourth novel homologue was found in porcine and bovine ESTs from embryo libraries. Alignment with known homologues showed conserved cysteine positions characteristic of a secreted subfamily of Ly-6 proteins. In two cases, antineoplastic urinary protein and caltrin, these homologues have unverified functional annotations. The RUP sequences showed high scoring matches to three unrelated rat mRNAs subsequently established to be chimeric. Two of these share extended sectional identity to RUP-1 but the third may represent another novel Ly-6 homologue. These chimeras have caused serious annotation errors in secondary databases.

  7. ESAP plus: a web-based server for EST-SSR marker development.

    PubMed

    Ponyared, Piyarat; Ponsawat, Jiradej; Tongsima, Sissades; Seresangtakul, Pusadee; Akkasaeng, Chutipong; Tantisuwichwong, Nathpapat

    2016-12-22

    Simple sequence repeats (SSRs) have become widely used as molecular markers in plant genetic studies due to their abundance, high allelic variation at each locus and simplicity to analyze using conventional PCR amplification. To study plants with unknown genome sequence, SSR markers from Expressed Sequence Tags (ESTs), which can be obtained from the plant mRNA (converted to cDNA), must be utilized. With the advent of high-throughput sequencing technology, huge EST sequence data have been generated and are now accessible from many public databases. However, SSR marker identification from a large in-house or public EST collection requires a computational pipeline that makes use of several standard bioinformatic tools to design high quality EST-SSR primers. Some of these computational tools are not users friendly and must be tightly integrated with reference genomic databases. A web-based bioinformatic pipeline, called EST Analysis Pipeline Plus (ESAP Plus), was constructed for assisting researchers to develop SSR markers from a large EST collection. ESAP Plus incorporates several bioinformatic scripts and some useful standard software tools necessary for the four main procedures of EST-SSR marker development, namely 1) pre-processing, 2) clustering and assembly, 3) SSR mining and 4) SSR primer design. The proposed pipeline also provides two alternative steps for reducing EST redundancy and identifying SSR loci. Using public sugarcane ESTs, ESAP Plus automatically executed the aforementioned computational pipeline via a simple web user interface, which was implemented using standard PHP, HTML, CSS and Java scripts. With ESAP Plus, users can upload raw EST data and choose various filtering options and parameters to analyze each of the four main procedures through this web interface. All input EST data and their predicted SSR results will be stored in the ESAP Plus MySQL database. Users will be notified via e-mail when the automatic process is completed and they can download all the results through the web interface. ESAP Plus is a comprehensive and convenient web-based bioinformatic tool for SSR marker development. ESAP Plus offers all necessary EST-SSR development processes with various adjustable options that users can easily use to identify SSR markers from a large EST collection. With familiar web interface, users can upload the raw EST using the data submission page and visualize/download the corresponding EST-SSR information from within ESAP Plus. ESAP Plus can handle considerably large EST datasets. This EST-SSR discovery tool can be accessed directly from: http://gbp.kku.ac.th/esap_plus/ .

  8. Exploiting a wheat EST database to assess genetic diversity

    PubMed Central

    2010-01-01

    Expressed sequence tag (EST) markers have been used to assess variety and genetic diversity in wheat (Triticum aestivum). In this study, 1549 ESTs from wheat infested with yellow rust were used to examine the genetic diversity of six susceptible and resistant wheat cultivars. The aim of using these cultivars was to improve the competitiveness of public wheat breeding programs through the intensive use of modern, particularly marker-assisted, selection technologies. The F2 individuals derived from cultivar crosses were screened for resistance to yellow rust at the seedling stage in greenhouses and adult stage in the field to identify DNA markers genetically linked to resistance. Five hundred and sixty ESTs were assembled into 136 contigs and 989 singletons. BlastX search results showed that 39 (29%) contigs and 96 (10%) singletons were homologous to wheat genes. The database-matched contigs and singletons were assigned to eight functional groups related to protein synthesis, photosynthesis, metabolism and energy, stress proteins, transporter proteins, protein breakdown and recycling, cell growth and division and reactive oxygen scavengers. PCR analyses with primers based on the contigs and singletons showed that the most polymorphic functional categories were photosynthesis (contigs) and metabolism and energy (singletons). EST analysis revealed considerable genetic variability among the Turkish wheat cultivars resistant and susceptible to yellow rust disease and allowed calculation of the mean genetic distance between cultivars, with the greatest similarity (0.725) being between Harmankaya99 and Sönmez2001, and the lowest (0.622) between Aytin98 and Izgi01. PMID:21637582

  9. Exploiting a wheat EST database to assess genetic diversity.

    PubMed

    Karakas, Ozge; Gurel, Filiz; Uncuoglu, Ahu Altinkut

    2010-10-01

    Expressed sequence tag (EST) markers have been used to assess variety and genetic diversity in wheat (Triticum aestivum). In this study, 1549 ESTs from wheat infested with yellow rust were used to examine the genetic diversity of six susceptible and resistant wheat cultivars. The aim of using these cultivars was to improve the competitiveness of public wheat breeding programs through the intensive use of modern, particularly marker-assisted, selection technologies. The F(2) individuals derived from cultivar crosses were screened for resistance to yellow rust at the seedling stage in greenhouses and adult stage in the field to identify DNA markers genetically linked to resistance. Five hundred and sixty ESTs were assembled into 136 contigs and 989 singletons. BlastX search results showed that 39 (29%) contigs and 96 (10%) singletons were homologous to wheat genes. The database-matched contigs and singletons were assigned to eight functional groups related to protein synthesis, photosynthesis, metabolism and energy, stress proteins, transporter proteins, protein breakdown and recycling, cell growth and division and reactive oxygen scavengers. PCR analyses with primers based on the contigs and singletons showed that the most polymorphic functional categories were photosynthesis (contigs) and metabolism and energy (singletons). EST analysis revealed considerable genetic variability among the Turkish wheat cultivars resistant and susceptible to yellow rust disease and allowed calculation of the mean genetic distance between cultivars, with the greatest similarity (0.725) being between Harmankaya99 and Sönmez2001, and the lowest (0.622) between Aytin98 and Izgi01.

  10. Compressing DNA sequence databases with coil.

    PubMed

    White, W Timothy J; Hendy, Michael D

    2008-05-20

    Publicly available DNA sequence databases such as GenBank are large, and are growing at an exponential rate. The sheer volume of data being dealt with presents serious storage and data communications problems. Currently, sequence data is usually kept in large "flat files," which are then compressed using standard Lempel-Ziv (gzip) compression - an approach which rarely achieves good compression ratios. While much research has been done on compressing individual DNA sequences, surprisingly little has focused on the compression of entire databases of such sequences. In this study we introduce the sequence database compression software coil. We have designed and implemented a portable software package, coil, for compressing and decompressing DNA sequence databases based on the idea of edit-tree coding. coil is geared towards achieving high compression ratios at the expense of execution time and memory usage during compression - the compression time represents a "one-off investment" whose cost is quickly amortised if the resulting compressed file is transmitted many times. Decompression requires little memory and is extremely fast. We demonstrate a 5% improvement in compression ratio over state-of-the-art general-purpose compression tools for a large GenBank database file containing Expressed Sequence Tag (EST) data. Finally, coil can efficiently encode incremental additions to a sequence database. coil presents a compelling alternative to conventional compression of flat files for the storage and distribution of DNA sequence databases having a narrow distribution of sequence lengths, such as EST data. Increasing compression levels for databases having a wide distribution of sequence lengths is a direction for future work.

  11. Compressing DNA sequence databases with coil

    PubMed Central

    White, W Timothy J; Hendy, Michael D

    2008-01-01

    Background Publicly available DNA sequence databases such as GenBank are large, and are growing at an exponential rate. The sheer volume of data being dealt with presents serious storage and data communications problems. Currently, sequence data is usually kept in large "flat files," which are then compressed using standard Lempel-Ziv (gzip) compression – an approach which rarely achieves good compression ratios. While much research has been done on compressing individual DNA sequences, surprisingly little has focused on the compression of entire databases of such sequences. In this study we introduce the sequence database compression software coil. Results We have designed and implemented a portable software package, coil, for compressing and decompressing DNA sequence databases based on the idea of edit-tree coding. coil is geared towards achieving high compression ratios at the expense of execution time and memory usage during compression – the compression time represents a "one-off investment" whose cost is quickly amortised if the resulting compressed file is transmitted many times. Decompression requires little memory and is extremely fast. We demonstrate a 5% improvement in compression ratio over state-of-the-art general-purpose compression tools for a large GenBank database file containing Expressed Sequence Tag (EST) data. Finally, coil can efficiently encode incremental additions to a sequence database. Conclusion coil presents a compelling alternative to conventional compression of flat files for the storage and distribution of DNA sequence databases having a narrow distribution of sequence lengths, such as EST data. Increasing compression levels for databases having a wide distribution of sequence lengths is a direction for future work. PMID:18489794

  12. PolyA_DB 3 catalogs cleavage and polyadenylation sites identified by deep sequencing in multiple genomes

    PubMed Central

    Wang, Ruijia; Nambiar, Ram; Zheng, Dinghai

    2018-01-01

    Abstract PolyA_DB is a database cataloging cleavage and polyadenylation sites (PASs) in several genomes. Previous versions were based mainly on expressed sequence tags (ESTs), which had a limited amount and could lead to inaccurate PAS identification due to the presence of internal A-rich sequences in transcripts. Here, we present an updated version of the database based solely on deep sequencing data. First, PASs are mapped by the 3′ region extraction and deep sequencing (3′READS) method, ensuring unequivocal PAS identification. Second, a large volume of data based on diverse biological samples increases PAS coverage by 3.5-fold over the EST-based version and provides PAS usage information. Third, strand-specific RNA-seq data are used to extend annotated 3′ ends of genes to obtain more thorough annotations of alternative polyadenylation (APA) sites. Fourth, conservation information of PAS across mammals sheds light on significance of APA sites. The database (URL: http://www.polya-db.org/v3) currently holds PASs in human, mouse, rat and chicken, and has links to the UCSC genome browser for further visualization and for integration with other genomic data. PMID:29069441

  13. Systematic sequencing of mRNA from the Antarctic krill (Euphausia superba) and first tissue specific transcriptional signature

    PubMed Central

    De Pittà, Cristiano; Bertolucci, Cristiano; Mazzotta, Gabriella M; Bernante, Filippo; Rizzo, Giorgia; De Nardi, Barbara; Pallavicini, Alberto; Lanfranchi, Gerolamo; Costa, Rodolfo

    2008-01-01

    Background Little is known about the genome sequences of Euphausiacea (krill) although these crustaceans are abundant components of the pelagic ecosystems in all oceans and used for aquaculture and pharmaceutical industry. This study reports the results of an expressed sequence tag (EST) sequencing project from different tissues of Euphausia superba (the Antarctic krill). Results We have constructed and sequenced five cDNA libraries from different Antarctic krill tissues: head, abdomen, thoracopods and photophores. We have identified 1.770 high-quality ESTs which were assembled into 216 overlapping clusters and 801 singletons resulting in a total of 1.017 non-redundant sequences. Quantitative RT-PCR analysis was performed to quantify and validate the expression levels of ten genes presenting different EST countings in krill tissues. In addition, bioinformatic screening of the non-redundant E. superba sequences identified 69 microsatellite containing ESTs. Clusters, consensuses and related similarity and gene ontology searches were organized in a dedicated E. superba database . Conclusion We defined the first tissue transcriptional signatures of E. superba based on functional categorization among the examined tissues. The analyses of annotated transcripts showed a higher similarity with genes from insects with respect to Malacostraca possibly as an effect of the limited number of Malacostraca sequences in the public databases. Our catalogue provides for the first time a genomic tool to investigate the biology of the Antarctic krill. PMID:18226200

  14. Construction of a cDNA library from female adult of Toxocara canis, and analysis of EST and immune-related genes expressions.

    PubMed

    Zhou, Rongqiong; Xia, Qingyou; Huang, Hancheng; Lai, Min; Wang, Zhenxin

    2011-10-01

    Toxocara canis is a widespread intestinal nematode parasite of dogs, which can also cause disease in humans. We employed an expressed sequence tag (EST) strategy in order to study gene-expression including development, digestion and reproduction of T. canis. ESTs provided a rapid way to identify genes, particularly in organisms for which we have very little molecular information. In this study, a cDNA library was constructed from a female adult of T. canis and 215 high-quality ESTs from 5'-ends of the cDNA clones representing 79 unigenes were obtained. The titer of the primary cDNA library was 1.83×10(6)pfu/mL with a recombination rate of 99.33%. Most of the sequences ranged from 300 to 900bp with an average length of 656bp. Cluster analysis of these ESTs allowed identification of 79 unique sequences containing 28 contigs and 51 singletons. BLASTX searches revealed that 18 unigenes (22.78% of the total) or 70 ESTs (32.56% of the total) were novel genes that had no significant matches to any protein sequences in the public databases. The rest of the 61 unigenes (77.22% of the total) or 145 ESTs (67.44% of the total) were closely matched to the known genes or sequences deposited in the public databases. These genes were classified into seven groups based on their known or putative biological functions. We also confirmed the gene expression patterns of several immune-related genes using RT-PCR examination. This work will provide a valuable resource for the further investigations in the stage-, sex- and tissue-specific gene transcription or expression. Copyright © 2011. Published by Elsevier Inc.

  15. Porcine transcriptome analysis based on 97 non-normalized cDNA libraries and assembly of 1,021,891 expressed sequence tags

    PubMed Central

    Gorodkin, Jan; Cirera, Susanna; Hedegaard, Jakob; Gilchrist, Michael J; Panitz, Frank; Jørgensen, Claus; Scheibye-Knudsen, Karsten; Arvin, Troels; Lumholdt, Steen; Sawera, Milena; Green, Trine; Nielsen, Bente J; Havgaard, Jakob H; Rosenkilde, Carina; Wang, Jun; Li, Heng; Li, Ruiqiang; Liu, Bin; Hu, Songnian; Dong, Wei; Li, Wei; Yu, Jun; Wang, Jian; Stærfeldt, Hans-Henrik; Wernersson, Rasmus; Madsen, Lone B; Thomsen, Bo; Hornshøj, Henrik; Bujie, Zhan; Wang, Xuegang; Wang, Xuefei; Bolund, Lars; Brunak, Søren; Yang, Huanming; Bendixen, Christian; Fredholm, Merete

    2007-01-01

    Background Knowledge of the structure of gene expression is essential for mammalian transcriptomics research. We analyzed a collection of more than one million porcine expressed sequence tags (ESTs), of which two-thirds were generated in the Sino-Danish Pig Genome Project and one-third are from public databases. The Sino-Danish ESTs were generated from one normalized and 97 non-normalized cDNA libraries representing 35 different tissues and three developmental stages. Results Using the Distiller package, the ESTs were assembled to roughly 48,000 contigs and 73,000 singletons, of which approximately 25% have a high confidence match to UniProt. Approximately 6,000 new porcine gene clusters were identified. Expression analysis based on the non-normalized libraries resulted in the following findings. The distribution of cluster sizes is scaling invariant. Brain and testes are among the tissues with the greatest number of different expressed genes, whereas tissues with more specialized function, such as developing liver, have fewer expressed genes. There are at least 65 high confidence housekeeping gene candidates and 876 cDNA library-specific gene candidates. We identified differential expression of genes between different tissues, in particular brain/spinal cord, and found patterns of correlation between genes that share expression in pairs of libraries. Finally, there was remarkable agreement in expression between specialized tissues according to Gene Ontology categories. Conclusion This EST collection, the largest to date in pig, represents an essential resource for annotation, comparative genomics, assembly of the pig genome sequence, and further porcine transcription studies. PMID:17407547

  16. Gene discovery in Eimeria tenella by immunoscreening cDNA expression libraries of sporozoites and schizonts with chicken intestinal antibodies.

    PubMed

    Réfega, Susana; Girard-Misguich, Fabienne; Bourdieu, Christiane; Péry, Pierre; Labbé, Marie

    2003-04-02

    Specific antibodies were produced ex vivo from intestinal culture of Eimeria tenella infected chickens. The specificity of these intestinal antibodies was tested against different parasite stages. These antibodies were used to immunoscreen first generation schizont and sporozoite cDNA libraries permitting the identification of new E. tenella antigens. We obtained a total of 119 cDNA clones which were subjected to sequence analysis. The sequences coding for the proteins inducing local immune responses were compared with nucleotide or protein databases and with expressed sequence tags (ESTs) databases. We identified new Eimeria genes coding for heat shock proteins, a ribosomal protein, a pyruvate kinase and a pyridoxine kinase. Specific features of other sequences are discussed.

  17. Two EST-derived marker systems for cultivar identification in tree peony.

    PubMed

    Zhang, J J; Shu, Q Y; Liu, Z A; Ren, H X; Wang, L S; De Keyser, E

    2012-02-01

    Tree peony (Paeonia suffruticosa Andrews), a woody deciduous shrub, belongs to the section Moutan DC. in the genus of Paeonia of the Paeoniaceae family. To increase the efficiency of breeding, two EST-derived marker systems were developed based on a tree peony expressed sequence tag (EST) database. Using target region amplification polymorphism (TRAP), 19 of 39 primer pairs showed good amplification for 56 accessions with amplicons ranging from 120 to 3,000 bp long, among which 99.3% were polymorphic. In contrast, 7 of 21 primer pairs demonstrated adequate amplification with clear bands for simple sequence repeats (SSRs) developed from ESTs, and a total of 33 alleles were found in 56 accessions. The similarity matrices generated by TRAP and EST-SSR markers were compared, and the Mantel test (r = 0.57778, P = 0.0020) showed a moderate correlation between the two types of molecular markers. TRAP markers were suitable for DNA fingerprinting and EST-SSR markers were more appropriate for discriminating synonyms (the same cultivars with different names due to limited information exchanged among different geographic areas). The two sets of EST-derived markers will be used further for genetic linkage map construction and quantitative trait locus detection in tree peony.

  18. [Multiplexing mapping of human cDNAs]. Final report, September 1, 1991--February 28, 1994

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Not Available

    Using PCR with automated product analysis, 329 human brain cDNA sequences have been assigned to individual human chromosomes. Primers were designed from single-pass cDNA sequences expressed sequence tags (ESTs). Primers were used in PCR reactions with DNA from somatic cell hybrid mapping panels as templates, often with multiplexing. Many ESTs mapped match sequence database records. To evaluate of these matches, the position of the primers relative to the matching region (In), the BLAST scores and the Poisson probability values of the EST/sequence record match were determined. In cases where the gene product was stringently identified by the sequence match hadmore » already been mapped, the gene locus determined by EST was consistent with the previous position which strongly supports the validity of assigning unknown genes to human chromosomes based on the EST sequence matches. In the present cases mapping the ESTs to a chromosome can also be considered to have mapped the known gene product: rolipram-sensitive cAMP phosphodiesterase, chromosome 1; protein phosphatase 2A{beta}, chromosome 4; alpha-catenin, chromosome 5; the ELE1 oncogene, chromosome 10q11.2 or q2.1-q23; MXII protein, chromosome l0q24-qter; ribosomal protein L18a homologue, chromosome 14; ribosomal protein L3, chromosome 17; and moesin, Xp11-cen. There were also ESTs mapped that were closely related to non-human sequence records. These matches therefore can be considered to identify human counterparts of known gene products, or members of known gene families. Examples of these include membrane proteins, translation-associated proteins, structural proteins, and enzymes. These data then demonstrate that single pass sequence information is sufficient to design PCR primers useful for assigning cDNA sequences to human chromosomes. When the EST sequence matches previous sequence database records, the chromosome assignments of the EST can be used to make preliminary assignments of the human gene to a chromosome.« less

  19. Transposon tagging and the study of root development in Arabidopsis

    NASA Technical Reports Server (NTRS)

    Tsugeki, R.; Olson, M. L.; Fedoroff, N. V.

    1998-01-01

    The maize Ac-Ds transposable element family has been used as the basis of transposon mutagenesis systems that function in a variety of plants, including Arabidopsis. We have developed modified transposons and methods which simplify the detection, cloning and analysis of insertion mutations. We have identified and are analyzing two plant lines in which genes expressed either in the root cap cells or in the quiescent cells, cortex/endodermal initial cells and columella cells of the root cap have been tagged with a transposon carrying a reporter gene. A gene expressed in root cap cells tagged with an enhancer-trap Ds was isolated and its corresponding EST cDNA was identified. Nucleotide and deduced amino acid sequences of the gene show no significant similarity to other genes in the database. Genetic ablation experiments have been done by fusing a root cap-specific promoter to the diphtheria toxin A-chain gene and introducing the fusion construct into Arabidopsis plants. We find that in addition to eliminating gravitropism, root cap ablation inhibits elongation of roots by lowering root meristematic activities.

  20. Mining SNPs from EST sequences using filters and ensemble classifiers.

    PubMed

    Wang, J; Zou, Q; Guo, M Z

    2010-05-04

    Abundant single nucleotide polymorphisms (SNPs) provide the most complete information for genome-wide association studies. However, due to the bottleneck of manual discovery of putative SNPs and the inaccessibility of the original sequencing reads, it is essential to develop a more efficient and accurate computational method for automated SNP detection. We propose a novel computational method to rapidly find true SNPs in public-available EST (expressed sequence tag) databases; this method is implemented as SNPDigger. EST sequences are clustered and aligned. SNP candidates are then obtained according to a measure of redundant frequency. Several new informative biological features, such as the structural neighbor profiles and the physical position of the SNP, were extracted from EST sequences, and the effectiveness of these features was demonstrated. An ensemble classifier, which employs a carefully selected feature set, was included for the imbalanced training data. The sensitivity and specificity of our method both exceeded 80% for human genetic data in the cross validation. Our method enables detection of SNPs from the user's own EST dataset and can be used on species for which there is no genome data. Our tests showed that this method can effectively guide SNP discovery in ESTs and will be useful to avoid and save the cost of biological analyses.

  1. NemaPath: online exploration of KEGG-based metabolic pathways for nematodes

    PubMed Central

    Wylie, Todd; Martin, John; Abubucker, Sahar; Yin, Yong; Messina, David; Wang, Zhengyuan; McCarter, James P; Mitreva, Makedonka

    2008-01-01

    Background Nematode.net is a web-accessible resource for investigating gene sequences from parasitic and free-living nematode genomes. Beyond the well-characterized model nematode C. elegans, over 500,000 expressed sequence tags (ESTs) and nearly 600,000 genome survey sequences (GSSs) have been generated from 36 nematode species as part of the Parasitic Nematode Genomics Program undertaken by the Genome Center at Washington University School of Medicine. However, these sequencing data are not present in most publicly available protein databases, which only include sequences in Swiss-Prot. Swiss-Prot, in turn, relies on GenBank/Embl/DDJP for predicted proteins from complete genomes or full-length proteins. Description Here we present the NemaPath pathway server, a web-based pathway-level visualization tool for navigating putative metabolic pathways for over 30 nematode species, including 27 parasites. The NemaPath approach consists of two parts: 1) a backend tool to align and evaluate nematode genomic sequences (curated EST contigs) against the annotated Kyoto Encyclopedia of Genes and Genomes (KEGG) protein database; 2) a web viewing application that displays annotated KEGG pathway maps based on desired confidence levels of primary sequence similarity as defined by a user. NemaPath also provides cross-referenced access to nematode genome information provided by other tools available on Nematode.net, including: detailed NemaGene EST cluster information; putative translations; GBrowse EST cluster views; links from nematode data to external databases for corresponding synonymous C. elegans counterparts, subject matches in KEGG's gene database, and also KEGG Ontology (KO) identification. Conclusion The NemaPath server hosts metabolic pathway mappings for 30 nematode species and is available on the World Wide Web at . The nematode source sequences used for the metabolic pathway mappings are available via FTP , as provided by the Genome Center at Washington University School of Medicine. PMID:18983679

  2. MELOGEN: an EST database for melon functional genomics

    PubMed Central

    Gonzalez-Ibeas, Daniel; Blanca, José; Roig, Cristina; González-To, Mireia; Picó, Belén; Truniger, Verónica; Gómez, Pedro; Deleu, Wim; Caño-Delgado, Ana; Arús, Pere; Nuez, Fernando; Garcia-Mas, Jordi; Puigdomènech, Pere; Aranda, Miguel A

    2007-01-01

    Background Melon (Cucumis melo L.) is one of the most important fleshy fruits for fresh consumption. Despite this, few genomic resources exist for this species. To facilitate the discovery of genes involved in essential traits, such as fruit development, fruit maturation and disease resistance, and to speed up the process of breeding new and better adapted melon varieties, we have produced a large collection of expressed sequence tags (ESTs) from eight normalized cDNA libraries from different tissues in different physiological conditions. Results We determined over 30,000 ESTs that were clustered into 16,637 non-redundant sequences or unigenes, comprising 6,023 tentative consensus sequences (contigs) and 10,614 unclustered sequences (singletons). Many potential molecular markers were identified in the melon dataset: 1,052 potential simple sequence repeats (SSRs) and 356 single nucleotide polymorphisms (SNPs) were found. Sixty-nine percent of the melon unigenes showed a significant similarity with proteins in databases. Functional classification of the unigenes was carried out following the Gene Ontology scheme. In total, 9,402 unigenes were mapped to one or more ontology. Remarkably, the distributions of melon and Arabidopsis unigenes followed similar tendencies, suggesting that the melon dataset is representative of the whole melon transcriptome. Bioinformatic analyses primarily focused on potential precursors of melon micro RNAs (miRNAs) in the melon dataset, but many other genes potentially controlling disease resistance and fruit quality traits were also identified. Patterns of transcript accumulation were characterised by Real-Time-qPCR for 20 of these genes. Conclusion The collection of ESTs characterised here represents a substantial increase on the genetic information available for melon. A database (MELOGEN) which contains all EST sequences, contig images and several tools for analysis and data mining has been created. This set of sequences constitutes also the basis for an oligo-based microarray for melon that is being used in experiments to further analyse the melon transcriptome. PMID:17767721

  3. Genome-wide characterization and selection of expressed sequence tag simple sequence repeat primers for optimized marker distribution and reliability in peach

    USDA-ARS?s Scientific Manuscript database

    Expressed sequence tag (EST) simple sequence repeats (SSRs) in Prunus were mined, and flanking primers designed and used for genome-wide characterization and selection of primers to optimize marker distribution and reliability. A total of 12,618 contigs were assembled from 84,727 ESTs, along with 34...

  4. NEIBank: Genomics and bioinformatics resources for vision research

    PubMed Central

    Peterson, Katherine; Gao, James; Buchoff, Patee; Jaworski, Cynthia; Bowes-Rickman, Catherine; Ebright, Jessica N.; Hauser, Michael A.; Hoover, David

    2008-01-01

    NEIBank is an integrated resource for genomics and bioinformatics in vision research. It includes expressed sequence tag (EST) data and sequence-verified cDNA clones for multiple eye tissues of several species, web-based access to human eye-specific SAGE data through EyeSAGE, and comprehensive, annotated databases of known human eye disease genes and candidate disease gene loci. All expression- and disease-related data are integrated in EyeBrowse, an eye-centric genome browser. NEIBank provides a comprehensive overview of current knowledge of the transcriptional repertoires of eye tissues and their relation to pathology. PMID:18648525

  5. Analysis of expressed sequence tags from Maize mosaic rhabdovirus-infected gut tissues of Peregrinus maidis reveals the presence of key components of insect innate immunity.

    PubMed

    Whitfield, A E; Rotenberg, D; Aritua, V; Hogenhout, S A

    2011-04-01

    The corn planthopper, Peregrinus maidis, causes direct feeding damage to plants and transmits Maize mosaic rhabdovirus (MMV) in a persistent-propagative manner. MMV must cross several insect tissue layers for successful transmission to occur, and the gut serves as an important barrier for rhabdovirus transmission. In order to facilitate the identification of proteins that may interact with MMV either by facilitating acquisition or responding to virus infection, we generated and analysed the gut transcriptome of P. maidis. From two normalized cDNA libraries, we generated a P. maidis gut transcriptome composed of 20,771 expressed sequence tags (ESTs). Assembly of the sequences yielded 1860 contigs and 14,032 singletons, and biological roles were assigned to 5793 (36%). Comparison of P. maidis ESTs with other insect amino acid sequences revealed that P. maidis shares greatest sequence similarity with another hemipteran, the brown planthopper Nilaparvata lugens. We identified 202 P. maidis transcripts with putative homology to proteins associated with insect innate immunity, including those implicated in the Toll, Imd, JAK/STAT, Jnk and the small-interfering RNA-mediated pathways. Sequence comparisons between our P. maidis gut EST collection and the currently available National Center for Biotechnology Information EST database collection for Ni. lugens revealed that a pathogen recognition receptor in the Imd pathway, peptidoglycan recognition protein-long class (PGRP-LC), is present in these two members of the family Delphacidae; however, these recognition receptors are lacking in the model hemipteran Acyrthosiphon pisum. In addition, we identified sequences in the P. maidis gut transcriptome that share significant amino acid sequence similarities with the rhabdovirus receptor molecule, acetylcholine receptor (AChR), found in other hosts. This EST analysis sheds new light on immune response pathways in hemipteran guts that will be useful for further dissecting innate defence response pathways to rhabdovirus infection. © 2011 The Authors. Insect Molecular Biology © 2011 The Royal Entomological Society.

  6. De Novo Transcriptome Sequencing Reveals Important Molecular Networks and Metabolic Pathways of the Plant, Chlorophytum borivilianum

    PubMed Central

    Kalra, Shikha; Puniya, Bhanwar Lal; Kulshreshtha, Deepika; Kumar, Sunil; Kaur, Jagdeep; Ramachandran, Srinivasan; Singh, Kashmir

    2013-01-01

    Chlorophytum borivilianum, an endangered medicinal plant species is highly recognized for its aphrodisiac properties provided by saponins present in the plant. The transcriptome information of this species is limited and only few hundred expressed sequence tags (ESTs) are available in the public databases. To gain molecular insight of this plant, high throughput transcriptome sequencing of leaf RNA was carried out using Illumina's HiSeq 2000 sequencing platform. A total of 22,161,444 single end reads were retrieved after quality filtering. Available (e.g., De-Bruijn/Eulerian graph) and in-house developed bioinformatics tools were used for assembly and annotation of transcriptome. A total of 101,141 assembled transcripts were obtained, with coverage size of 22.42 Mb and average length of 221 bp. Guanine-cytosine (GC) content was found to be 44%. Bioinformatics analysis, using non-redundant proteins, gene ontology (GO), enzyme commission (EC) and kyoto encyclopedia of genes and genomes (KEGG) databases, extracted all the known enzymes involved in saponin and flavonoid biosynthesis. Few genes of the alkaloid biosynthesis, along with anticancer and plant defense genes, were also discovered. Additionally, several cytochrome P450 (CYP450) and glycosyltransferase unique sequences were also found. We identified simple sequence repeat motifs in transcripts with an abundance of di-nucleotide simple sequence repeat (SSR; 43.1%) markers. Large scale expression profiling through Reads per Kilobase per Million mapped reads (RPKM) showed major genes involved in different metabolic pathways of the plant. Genes, expressed sequence tags (ESTs) and unique sequences from this study provide an important resource for the scientific community, interested in the molecular genetics and functional genomics of C. borivilianum. PMID:24376689

  7. Isolation and characterization of a Ds-tagged rice (Oryza sativa L.) GA-responsive dwarf mutant defective in an early step of the gibberellin biosynthesis pathway.

    PubMed

    Margis-Pinheiro, Marcia; Zhou, Xue-Rong; Zhu, Qian-Hao; Dennis, Elizabeth S; Upadhyaya, Narayana M

    2005-03-01

    We have isolated a severe dwarf transposon (Ds) insertion mutant in rice (Oryza sativa L.), which could be differentiated early in the seedling stage by reduced shoot growth and dark green leaves, and later by severe dwarfism and failure to initiate flowering. These mutants, however, showed normal seed germination and root growth. One of the sequences flanking Ds, rescued from the mutant, was of a chromosome 4-located putative ent-kaurene synthase (KS) gene, encoding the enzyme catalyzing the second step of the gibberellin (GA) biosynthesis pathway. Dwarf mutants were always homozygous for this Ds insertion and no normal plants homozygous for this mutation were recovered in the segregating progeny, indicating that the Ds insertion mutation is recessive. As mutations in three recently reported rice GA-responsive dwarf mutant alleles and the dwarf mutation identified in this study mapped to the same locus, we designate the corresponding gene OsKS1. The osks1 mutant seedlings were responsive to exogenous gibberellin (GA3). OsKS1 transcripts of about 2.3 kb were detected in leaves and stem of wild-type plants, but not in germinating seeds or roots, suggesting that OsKS1 is not involved in germination or root growth. There are at least five OsKS1-like genes in the rice genome, four of which are also represented in rice expressed sequence tag (EST) databases. All OsKS1-like genes are transcribed with different expression patterns. ESTs corresponding to all six OsKS genes are represented in other cereal databases including barley, wheat and maize, suggesting that they are biologically active.

  8. De Novo transcriptome sequencing reveals important molecular networks and metabolic pathways of the plant, Chlorophytum borivilianum.

    PubMed

    Kalra, Shikha; Puniya, Bhanwar Lal; Kulshreshtha, Deepika; Kumar, Sunil; Kaur, Jagdeep; Ramachandran, Srinivasan; Singh, Kashmir

    2013-01-01

    Chlorophytum borivilianum, an endangered medicinal plant species is highly recognized for its aphrodisiac properties provided by saponins present in the plant. The transcriptome information of this species is limited and only few hundred expressed sequence tags (ESTs) are available in the public databases. To gain molecular insight of this plant, high throughput transcriptome sequencing of leaf RNA was carried out using Illumina's HiSeq 2000 sequencing platform. A total of 22,161,444 single end reads were retrieved after quality filtering. Available (e.g., De-Bruijn/Eulerian graph) and in-house developed bioinformatics tools were used for assembly and annotation of transcriptome. A total of 101,141 assembled transcripts were obtained, with coverage size of 22.42 Mb and average length of 221 bp. Guanine-cytosine (GC) content was found to be 44%. Bioinformatics analysis, using non-redundant proteins, gene ontology (GO), enzyme commission (EC) and kyoto encyclopedia of genes and genomes (KEGG) databases, extracted all the known enzymes involved in saponin and flavonoid biosynthesis. Few genes of the alkaloid biosynthesis, along with anticancer and plant defense genes, were also discovered. Additionally, several cytochrome P450 (CYP450) and glycosyltransferase unique sequences were also found. We identified simple sequence repeat motifs in transcripts with an abundance of di-nucleotide simple sequence repeat (SSR; 43.1%) markers. Large scale expression profiling through Reads per Kilobase per Million mapped reads (RPKM) showed major genes involved in different metabolic pathways of the plant. Genes, expressed sequence tags (ESTs) and unique sequences from this study provide an important resource for the scientific community, interested in the molecular genetics and functional genomics of C. borivilianum.

  9. Annotated ESTs from various tissues of the brown planthopper Nilaparvata lugens: a genomic resource for studying agricultural pests.

    PubMed

    Noda, Hiroaki; Kawai, Sawako; Koizumi, Yoko; Matsui, Kageaki; Zhang, Qiang; Furukawa, Shigetoyo; Shimomura, Michihiko; Mita, Kazuei

    2008-03-03

    The brown planthopper (BPH), Nilaparvata lugens (Hemiptera, Delphacidae), is a serious insect pests of rice plants. Major means of BPH control are application of agricultural chemicals and cultivation of BPH resistant rice varieties. Nevertheless, BPH strains that are resistant to agricultural chemicals have developed, and BPH strains have appeared that are virulent against the resistant rice varieties. Expressed sequence tag (EST) analysis and related applications are useful to elucidate the mechanisms of resistance and virulence and to reveal physiological aspects of this non-model insect, with its poorly understood genetic background. More than 37,000 high-quality ESTs, excluding sequences of mitochondrial genome, microbial genomes, and rDNA, have been produced from 18 libraries of various BPH tissues and stages. About 10,200 clusters have been made from whole EST sequences, with average EST size of 627 bp. Among the top ten most abundantly expressed genes, three are unique and show no homology in BLAST searches. The actin gene was highly expressed in BPH, especially in the thorax. Tissue-specifically expressed genes were extracted based on the expression frequency among the libraries. An EST database is available at our web site. The EST library will provide useful information for transcriptional analyses, proteomic analyses, and gene functional analyses of BPH. Moreover, specific genes for hemimetabolous insects will be identified. The microarray fabricated based on the EST information will be useful for finding genes related to agricultural and biological problems related to this pest.

  10. The ATLAS TAGS database distribution and management - Operational challenges of a multi-terabyte distributed database

    NASA Astrophysics Data System (ADS)

    Viegas, F.; Malon, D.; Cranshaw, J.; Dimitrov, G.; Nowak, M.; Nairz, A.; Goossens, L.; Gallas, E.; Gamboa, C.; Wong, A.; Vinek, E.

    2010-04-01

    The TAG files store summary event quantities that allow a quick selection of interesting events. This data will be produced at a nominal rate of 200 Hz, and is uploaded into a relational database for access from websites and other tools. The estimated database volume is 6TB per year, making it the largest application running on the ATLAS relational databases, at CERN and at other voluntary sites. The sheer volume and high rate of production makes this application a challenge to data and resource management, in many aspects. This paper will focus on the operational challenges of this system. These include: uploading the data from files to the CERN's and remote sites' databases; distributing the TAG metadata that is essential to guide the user through event selection; controlling resource usage of the database, from the user query load to the strategy of cleaning and archiving of old TAG data.

  11. Analysis of expressed sequence tags from a single wheat cultivar facilitates interpretation of tandem mass spectrometry data and discrimination of gamma gliadin proteins that may play different functional roles in flour

    USDA-ARS?s Scientific Manuscript database

    The complement of gamma gliadin genes expressed in the wheat cultivar Butte 86 was evaluated by analyzing publicly available expressed sequence tag (EST) data. Eleven contigs were assembled from 153 Butte 86 ESTs. Nine of the contigs encoded full-length proteins and four of the proteins contained an...

  12. An EST dataset for Metasequoia glyptostroboides buds: the first EST resource for molecular genomics studies in Metasequoia.

    PubMed

    Zhao, Ying; Thammannagowda, Shivegowda; Staton, Margaret; Tang, Sha; Xia, Xinli; Yin, Weilun; Liang, Haiying

    2013-03-01

    The "living fossil" Metasequoia glyptostroboides Hu et Cheng, commonly known as dawn redwood or Chinese redwood, is the only living species in the genus and is valued for its essential oil and crude extracts that have great potential for anti-fungal activity. Despite its paleontological significance and economical value as a rare relict species, genomic resources of Metasequoia are very limited. In order to gain insight into the molecular mechanisms behind the formation of reproductive buds and the transition from vegetative phase to reproductive phase in Metasequoia, we performed sequencing of expressed sequence tags from Metasequoia vegetative buds and female buds. By using the 454 pyrosequencing technology, a total of 1,571,764 high-quality reads were generated, among which 733,128 were from vegetative buds and 775,636 were from female buds. These EST reads were clustered and assembled into 114,124 putative unique transcripts (PUTs) with an average length of 536 bp. The 97,565 PUTs that were at least 100 bp in length were functionally annotated by a similarity search against public databases and assigned with Gene Ontology (GO) terms. A total of 59 known floral gene families and 190 isotigs involved in hormone regulation were captured in the dataset. Furthermore, a set of PUTs differentially expressed in vegetative and reproductive buds, as well as SSR motifs and high confidence SNPs, were identified. This is the first large-scale expressed sequence tags ever generated in Metasequoia and the first evidence for floral genes in this critically endangered deciduous conifer species.

  13. Integration of deep transcriptome and proteome analyses reveals the components of alkaloid metabolism in opium poppy cell cultures

    PubMed Central

    2010-01-01

    Background Papaver somniferum (opium poppy) is the source for several pharmaceutical benzylisoquinoline alkaloids including morphine, the codeine and sanguinarine. In response to treatment with a fungal elicitor, the biosynthesis and accumulation of sanguinarine is induced along with other plant defense responses in opium poppy cell cultures. The transcriptional induction of alkaloid metabolism in cultured cells provides an opportunity to identify components of this process via the integration of deep transcriptome and proteome databases generated using next-generation technologies. Results A cDNA library was prepared for opium poppy cell cultures treated with a fungal elicitor for 10 h. Using 454 GS-FLX Titanium pyrosequencing, 427,369 expressed sequence tags (ESTs) with an average length of 462 bp were generated. Assembly of these sequences yielded 93,723 unigenes, of which 23,753 were assigned Gene Ontology annotations. Transcripts encoding all known sanguinarine biosynthetic enzymes were identified in the EST database, 5 of which were represented among the 50 most abundant transcripts. Liquid chromatography-tandem mass spectrometry (LC-MS/MS) of total protein extracts from cell cultures treated with a fungal elicitor for 50 h facilitated the identification of 1,004 proteins. Proteins were fractionated by one-dimensional SDS-PAGE and digested with trypsin prior to LC-MS/MS analysis. Query of an opium poppy-specific EST database substantially enhanced peptide identification. Eight out of 10 known sanguinarine biosynthetic enzymes and many relevant primary metabolic enzymes were represented in the peptide database. Conclusions The integration of deep transcriptome and proteome analyses provides an effective platform to catalogue the components of secondary metabolism, and to identify genes encoding uncharacterized enzymes. The establishment of corresponding transcript and protein databases generated by next-generation technologies in a system with a well-defined metabolite profile facilitates an improved linkage between genes, enzymes, and pathway components. The proteome database represents the most relevant alkaloid-producing enzymes, compared with the much deeper and more complete transcriptome library. The transcript database contained full-length mRNAs encoding most alkaloid biosynthetic enzymes, which is a key requirement for the functional characterization of novel gene candidates. PMID:21083930

  14. Development of expressed sequence tag-simple sequence repeat markers for genetic characterization and population structure analysis of Praxelis clematidea (Asteraceae).

    PubMed

    Wang, Q Z; Huang, M; Downie, S R; Chen, Z X

    2016-05-23

    Invasive plants tend to spread aggressively in new habitats and an understanding of their genetic diversity and population structure is useful for their management. In this study, expressed sequence tag-simple sequence repeat (EST-SSR) markers were developed for the invasive plant species Praxelis clematidea (Asteraceae) from 5548 Stevia rebaudiana (Asteraceae) expressed sequence tags (ESTs). A total of 133 microsatellite-containing ESTs (2.4%) were identified, of which 56 (42.1%) were hexanucleotide repeat motifs and 50 (37.6%) were trinucleotide repeat motifs. Of the 24 primer pairs designed from these 133 ESTs, 7 (29.2%) resulted in significant polymorphisms. The number of alleles per locus ranged from 5 to 9. The relatively high genetic diversity (H = 0.2667, I = 0.4212, and P = 100%) of P. clematidea was related to high gene flow (Nm = 1.4996) among populations. The coefficient of population differentiation (GST = 0.2500) indicated that most genetic variation occurred within populations. A Mantel test suggested that there was significant correlation between genetic distance and geographical distribution (r = 0.3192, P = 0.012). These results further support the transferability of EST-SSR markers between closely related genera of the same family.

  15. Generation of a foveomacular transcriptome

    PubMed Central

    Bernstein, Steven; Wong, Paul W.

    2014-01-01

    Purpose Organizing molecular biologic data is a growing challenge since the rate of data accumulation is steadily increasing. Information relevant to a particular biologic query can be difficult to extract from the comprehensive databases currently available. We present a data collection and organization model designed to ameliorate these problems and applied it to generate an expressed sequence tag (EST)–based foveomacular transcriptome. Methods Using Perl, MySQL, EST libraries, screening, and human foveomacular gene expression as a model system, we generated a foveomacular transcriptome database enriched for molecularly relevant data. Results Using foveomacula as a gene expression model tissue, we identified and organized 6,056 genes expressed in that tissue. Of those identified genes, 3,480 had not been previously described as expressed in the foveomacula. Internal experimental controls as well as comparison of our data set to published data sets suggest we do not yet have a complete description of the foveomacula transcriptome. Conclusions We present an organizational method designed to amplify the utility of data pertinent to a specific research interest. Our method is generic enough to be applicable to a variety of conditions yet focused enough to allow for specialized study. PMID:24991187

  16. Needles in the EST Haystack: Large-Scale Identification and Analysis of Excretory-Secretory (ES) Proteins in Parasitic Nematodes Using Expressed Sequence Tags (ESTs)

    PubMed Central

    Nagaraj, Shivashankar H.; Gasser, Robin B.; Ranganathan, Shoba

    2008-01-01

    Background Parasitic nematodes of humans, other animals and plants continue to impose a significant public health and economic burden worldwide, due to the diseases they cause. Promising antiparasitic drug and vaccine candidates have been discovered from excreted or secreted (ES) proteins released from the parasite and exposed to the immune system of the host. Mining the entire expressed sequence tag (EST) data available from parasitic nematodes represents an approach to discover such ES targets. Methods and Findings In this study, we predicted, using EST2Secretome, a novel, high-throughput, computational workflow system, 4,710 ES proteins from 452,134 ESTs derived from 39 different species of nematodes, parasitic in animals (including humans) or plants. In total, 2,632, 786, and 1,292 ES proteins were predicted for animal-, human-, and plant-parasitic nematodes. Subsequently, we systematically analysed ES proteins using computational methods. Of these 4,710 proteins, 2,490 (52.8%) had orthologues in Caenorhabditis elegans, whereas 621 (13.8%) appeared to be novel, currently having no significant match to any molecule available in public databases. Of the C. elegans homologues, 267 had strong “loss-of-function” phenotypes by RNA interference (RNAi) in this nematode. We could functionally classify 1,948 (41.3%) sequences using the Gene Ontology (GO) terms, establish pathway associations for 573 (12.2%) sequences using Kyoto Encyclopaedia of Genes and Genomes (KEGG), and identify protein interaction partners for 1,774 (37.6%) molecules. We also mapped 758 (16.1%) proteins to protein domains including the nematode-specific protein family “transthyretin-like” and “chromadorea ALT,” considered as vaccine candidates against filariasis in humans. Conclusions We report the large-scale analysis of ES proteins inferred from EST data for a range of parasitic nematodes. This set of ES proteins provides an inventory of known and novel members of ES proteins as a foundation for studies focused on understanding the biology of parasitic nematodes and their interactions with their hosts, as well as for the development of novel drugs or vaccines for parasite intervention and control. PMID:18820748

  17. Bioinformatics and reanalysis of subtracted expressed sequence tags from the human ciliary body: Identification of novel biological functions.

    PubMed

    Escribano, Julio; Coca-Prados, Miguel

    2002-08-28

    The ciliary body is largely known for its major roles in the regulation of aqueous humor secretion, intraocular pressure, and accommodation of the lens. In this review article we applied bioinformatics to re-examine hundreds of expressed sequence tags (ESTs) previously isolated by subtractive hybridization from a human ciliary body library [1]. The DNA sequences of these clones have been recently added to the web site of NEIBank. DNA sequence comparisons of subtracted ESTs were performed against all entries in the last available release of the non-redundant database containing GenBank, EMBL, DDBJ and PDB sequences using the BlastN program accessed through NCBI's BLAST services on the internet (NCBI). Sequences were also compared and mapped using the Blast search program provided through the Internet by the Human Genome Project (UCSC). A total number of 284 independent ESTs were classified in 17 functional groups. Analysis of their relationships allowed to define the expression of five major groups of known genes: (i) protein synthesis, folding, secretion and degradation (20%); (ii) energy supply and biosynthesis (12%); (iii) contractility and cytoskeleton structure (6%); (iv) cellular signaling and cell cycle regulation (7%); and (v) nerve cell related tasks (2%), including neuropeptide processing and putative non-visual phototransduction and circadian rhythm control. The largest group contain unidentified sequences, a total of 105 sequences, accounting for 37% of ESTs. The unidentified sequences show similarity to genomic non-coding regions, or genes of unknown function. The most highly represented EST, correspond to myocilin, a gene involved in glaucoma. The data also confirms the secretory functions of the ciliary epithelium, and its high metabolism; the presence of a neuroendocrine peptidergic system presumably involved in the regulation of the intraocular pressure and/or aqueous humor secretion. Additional genes may be related to a non-visual phototransduction cascade and/or to circadian rhythms. Overall this initial group of subtracted ESTs can lead to uncover novel physiological functions of the ciliary body in normal and in disease, as well as novel candidate genes for ocular diseases.

  18. Bioinformatic mining of EST-SSR loci in the Pacific oyster, Crassostrea gigas.

    PubMed

    Wang, Y; Ren, R; Yu, Z

    2008-06-01

    A set of expressed sequence tag-simple sequence repeat (EST-SSR) markers of the Pacific oyster, Crassostrea gigas, was developed through bioinformatic mining of the GenBank public database. As of June 30, 2007, a total of 5132 EST sequences from GenBank were downloaded and screened for di-, tri- and tetra-nucleotide repeats, with criteria set at a minimum of 5, 4 and 4 repeats for the three categories of SSRs respectively. Seventeen polymorphic microsatellite markers were characterized. Allele numbers ranged from 3 to 10, and the observed and expected heterozygosity values varied from 0.125 to 0.770 and from 0.113 to 0.732 respectively. Eleven loci were at Hardy-Weinberg equilibrium (HWE); the other six loci showed significant departure from HWE (P < 0.01), suggesting possible presence of null alleles. Pairwise check of linkage disequilibrium (LD) indicated that 11 of 136 pairs of loci showed significant LD (P < 0.01), likely due to HWE present in single markers. Cross-species amplification was examined for five other Crassostrea species and reasonable results were obtained, promising usefulness of these markers in oyster genetics.

  19. Informatic selection of a neural crest-melanocyte cDNA set for microarray analysis

    PubMed Central

    Loftus, S. K.; Chen, Y.; Gooden, G.; Ryan, J. F.; Birznieks, G.; Hilliard, M.; Baxevanis, A. D.; Bittner, M.; Meltzer, P.; Trent, J.; Pavan, W.

    1999-01-01

    With cDNA microarrays, it is now possible to compare the expression of many genes simultaneously. To maximize the likelihood of finding genes whose expression is altered under the experimental conditions, it would be advantageous to be able to select clones for tissue-appropriate cDNA sets. We have taken advantage of the extensive sequence information in the dbEST expressed sequence tag (EST) database to identify a neural crest-derived melanocyte cDNA set for microarray analysis. Analysis of characterized genes with dbEST identified one library that contained ESTs representing 21 neural crest-expressed genes (library 198). The distribution of the ESTs corresponding to these genes was biased toward being derived from library 198. This is in contrast to the EST distribution profile for a set of control genes, characterized to be more ubiquitously expressed in multiple tissues (P < 1 × 10−9). From library 198, a subset of 852 clustered ESTs were selected that have a library distribution profile similar to that of the 21 neural crest-expressed genes. Microarray analysis demonstrated the majority of the neural crest-selected 852 ESTs (Mel1 array) were differentially expressed in melanoma cell lines compared with a non-neural crest kidney epithelial cell line (P < 1 × 10−8). This was not observed with an array of 1,238 ESTs that was selected without library origin bias (P = 0.204). This study presents an approach for selecting tissue-appropriate cDNAs that can be used to examine the expression profiles of developmental processes and diseases. PMID:10430933

  20. Discovery of Neuropeptides in the Nematode Ascaris suum by Database Mining and Tandem Mass Spectrometry

    PubMed Central

    Jarecki, Jessica L.; Frey, Brian L.; Smith, Lloyd M.; Stretton, Antony O.

    2011-01-01

    Liquid chromatography coupled with tandem mass spectrometry (LC-MS/MS) was used to discover peptides in extracts of the large parasitic nematode Ascaris suum. This required the assembly of a new database of known and predicted peptides. In addition to those already sequenced, peptides were either previously predicted to be processed from precursor proteins identified in an A. suum library of expressed sequence tags (ESTs), or newly predicted from a library of A. suum genome survey sequences (GSSs). The predicted MS/MS fragmentation patterns of this collection of real and putative peptides were compared with the actual fragmentation patterns found in the MS/MS spectra of peptides fractionated by MS; this enabled individual peptides to be sequenced. Many previously identified peptides were found, and 21 novel peptides were discovered. Thus, this approach is very useful, despite the fact that the available GSS database is still preliminary, having only 1X coverage. PMID:21524146

  1. Generation of a total of 6483 expressed sequence tags from 60 day-old bovine whole fetus and fetal placenta.

    PubMed

    Oishi, M; Gohma, H; Lejukole, H Y; Taniguchi, Y; Yamada, T; Suzuki, K; Shinkai, H; Uenishi, H; Yasue, H; Sasaki, Y

    2004-05-01

    Expressed sequence tags (ESTs) generated based on characterization of clones isolated randomly from cDNA libraries are used to study gene expression profiles in specific tissues and to provide useful information for characterizing tissue physiology. In this study, two directionally cloned cDNA libraries were constructed from 60 day-old bovine whole fetus and fetal placenta. We have characterized 5357 and 1126 clones, and then identified 3464 and 795 unique sequences for the fetus and placenta cDNA libraries: 1851 and 504 showed homology to already identified genes, and 1613 and 291 showed no significant matches to any of the sequences in DNA databases, respectively. Further, we found 94 unique sequences overlapping in both the fetus and the placenta, leading to a catalog of 4165 genes expressed in 60 day-old fetus and placenta. The catalog is used to examine expression profile of genes in 60 day-old bovine fetus and placenta.

  2. AntiHunter 2.0: increased speed and sensitivity in searching BLAST output for EST antisense transcripts.

    PubMed

    Lavorgna, Giovanni; Triunfo, Riccardo; Santoni, Federico; Orfanelli, Ugo; Noci, Sara; Bulfone, Alessandro; Zanetti, Gianluigi; Casari, Giorgio

    2005-07-01

    An increasing number of eukaryotic and prokaryotic genes are being found to have natural antisense transcripts (NATs). There is also growing evidence to suggest that antisense transcription could play a key role in many human diseases. Consequently, there have been several recent attempts to set up computational procedures aimed at identifying novel NATs. Our group has developed the AntiHunter program for the identification of expressed sequence tag (EST) antisense transcripts from BLAST output. In order to perform an analysis, the program requires a genomic sequence plus an associated list of transcript names and coordinates of the genomic region. After masking the repeated regions, the program carries out a BLASTN search of this sequence in the selected EST database, reporting via email the EST entries that reveal an antisense transcript according to the user-supplied list. Here, we present the newly developed version 2.0 of the AntiHunter tool. Several improvements have been added to this version of the program in order to increase its ability to detect a larger number of antisense ESTs. As a result, AntiHunter can now detect, on average, >45% more antisense ESTs with little or no increase in the percentage of the false positives. We also raised the maximum query size to 3 Mb (previously 1 Mb). Moreover, we found that a reasonable trade-off between the program search sensitivity and the maximum allowed size of the input-query sequence could be obtained by querying the database with the MEGABLAST program, rather than by using the BLAST one. We now offer this new opportunity to users, i.e. if choosing the MEGABLAST option, users can input a query sequence up to 30 Mb long, thus considerably improving the possibility to analyze longer query regions. The AntiHunter tool is freely available at http://bioinfo.crs4.it/AH2.0.

  3. Generation and Analysis of a Large-Scale Expressed Sequence Tag Database from a Full-Length Enriched cDNA Library of Developing Leaves of Gossypium hirsutum L

    PubMed Central

    Pang, Chaoyou; Fan, Shuli; Song, Meizhen; Yu, Shuxun

    2013-01-01

    Background Cotton (Gossypium hirsutum L.) is one of the world’s most economically-important crops. However, its entire genome has not been sequenced, and limited resources are available in GenBank for understanding the molecular mechanisms underlying leaf development and senescence. Methodology/Principal Findings In this study, 9,874 high-quality ESTs were generated from a normalized, full-length cDNA library derived from pooled RNA isolated from throughout leaf development during the plant blooming stage. After clustering and assembly of these ESTs, 5,191 unique sequences, representative 1,652 contigs and 3,539 singletons, were obtained. The average unique sequence length was 682 bp. Annotation of these unique sequences revealed that 84.4% showed significant homology to sequences in the NCBI non-redundant protein database, and 57.3% had significant hits to known proteins in the Swiss-Prot database. Comparative analysis indicated that our library added 2,400 ESTs and 991 unique sequences to those known for cotton. The unigenes were functionally characterized by gene ontology annotation. We identified 1,339 and 200 unigenes as potential leaf senescence-related genes and transcription factors, respectively. Moreover, nine genes related to leaf senescence and eleven MYB transcription factors were randomly selected for quantitative real-time PCR (qRT-PCR), which revealed that these genes were regulated differentially during senescence. The qRT-PCR for three GhYLSs revealed that these genes express express preferentially in senescent leaves. Conclusions/Significance These EST resources will provide valuable sequence information for gene expression profiling analyses and functional genomics studies to elucidate their roles, as well as for studying the mechanisms of leaf development and senescence in cotton and discovering candidate genes related to important agronomic traits of cotton. These data will also facilitate future whole-genome sequence assembly and annotation in G. hirsutum and comparative genomics among Gossypium species. PMID:24146870

  4. Expressed sequence tag analysis of functional genes associated with adventitious rooting in Liriodendron hybrids.

    PubMed

    Zhong, Y D; Sun, X Y; Liu, E Y; Li, Y Q; Gao, Z; Yu, F X

    2016-06-24

    Liriodendron hybrids (Liriodendron chinense x L. tulipifera) are important landscaping and afforestation hardwood trees. To date, little genomic research on adventitious rooting has been reported in these hybrids, as well as in the genus Liriodendron. In the present study, we used adventitious roots to construct the first cDNA library for Liriodendron hybrids. A total of 5176 expressed sequence tags (ESTs) were generated and clustered into 2921 unigenes. Among these unigenes, 2547 had significant homology to the non-redundant protein database representing a wide variety of putative functions. Homologs of these genes regulated many aspects of adventitious rooting, including those for auxin signal transduction and root hair development. Results of quantitative real-time polymerase chain reaction showed that AUX1, IRE, and FB1 were highly expressed in adventitious roots and the expression of AUX1, ARF1, NAC1, RHD1, and IRE increased during the development of adventitious roots. Additionally, 181 simple sequence repeats were identified from 166 ESTs and more than 91.16% of these were dinucleotide and trinucleotide repeats. To the best of our knowledge, the present study reports the identification of the genes associated with adventitious rooting in the genus Liriodendron for the first time and provides a valuable resource for future genomic studies. Expression analysis of selected genes could allow us to identify regulatory genes that may be essential for adventitious rooting.

  5. Analysis of expressed sequence tags from a NaHCO(3)-treated alkali-tolerant plant, Chloris virgata.

    PubMed

    Nishiuchi, Shunsaku; Fujihara, Kazumasa; Liu, Shenkui; Takano, Tetsuo

    2010-04-01

    Chloris virgata Swartz (C. virgata) is a gramineous wild plant that can survive in saline-alkali areas in northeast China. To examine the tolerance mechanisms of C. virgata, we constructed a cDNA library from whole plants of C. virgata that had been treated with 100 mM NaHCO(3) for 24 h and sequenced 3168 randomly selected clones. Most (2590) of the expressed sequence tags (ESTs) showed significant similarity to sequences in the NCBI database. Of the 2590 genes, 1893 were unique. Gene Ontology (GO) Slim annotations were obtained for 1081 ESTs by BLAST2GO and it was found that 75 genes of them were annotated with GO terms "response to stress", "response to abiotic stimulus", and "response to biotic stimulus", indicating these genes were likely to function in tolerance mechanism of C. virgata. In a separate experiment, 24 genes that are known from previous studies to be associated with abiotic stress tolerance were further examined by real-time RT-PCR to see how their expressions were affected by NaHCO(3) stress. NaHCO(3) treatment up-regulated the expressions of pathogenesis-related gene (DC998527), Win1 precursor gene (DC998617), catalase gene (DC999385), ribosome inactivating protein 1 (DC999555), Na(+)/H(+) antiporter gene (DC998043), and two-component regulator gene (DC998236). Copyright 2010 Elsevier Masson SAS. All rights reserved.

  6. Expressed sequence tags from the flower pathogen Claviceps purpurea.

    PubMed

    Oeser, Birgitt; Beaussart, François; Haarmann, Thomas; Lorenz, Nicole; Nathues, Eva; Rolke, Yvonne; Scheffer, Jan; Weiner, January; Tudzynski, Paul

    2009-09-01

    SUMMARY The ascomycete Claviceps purpurea (ergot) is a biotrophic flower pathogen of rye and other grasses. The deleterious toxic effects of infected rye seeds on humans and grazing animals have been known since the Middle Ages. To gain further insight into the molecular basis of this disease, we generated about 10 000 expressed sequence tags (ESTs)-about 25% originating from axenic fungal culture and about 75% from tissues collected 6-20 days after infection of rye spikes. The pattern of axenic vs. in planta gene expression was compared. About 200 putative plant genes were identified within the in planta library. A high percentage of these were predicted to function in plant defence against the ergot fungus and other pathogens, for example pathogenesis-related proteins. Potential fungal pathogenicity and virulence genes were found via comparison with the pathogen-host interaction database (PHI-base; http://www.phi-base.org) and with genes known to be highly expressed in the haustoria of the bean rust fungus. Comparative analysis of Claviceps and two other fungal flower pathogens (necrotrophic Fusarium graminearum and biotrophic Ustilago maydis) highlighted similarities and differences in their lifestyles, for example all three fungi have signalling components and cell wall-degrading enzymes in their arsenal. In summary, the analysis of axenic and in planta ESTs yielded a collection of candidate genes to be evaluated for functional roles in this plant-microbe interaction.

  7. Transcriptome analysis of carnation (Dianthus caryophyllus L.) based on next-generation sequencing technology.

    PubMed

    Tanase, Koji; Nishitani, Chikako; Hirakawa, Hideki; Isobe, Sachiko; Tabata, Satoshi; Ohmiya, Akemi; Onozaki, Takashi

    2012-07-02

    Carnation (Dianthus caryophyllus L.), in the family Caryophyllaceae, can be found in a wide range of colors and is a model system for studies of flower senescence. In addition, it is one of the most important flowers in the global floriculture industry. However, few genomics resources, such as sequences and markers are available for carnation or other members of the Caryophyllaceae. To increase our understanding of the genetic control of important characters in carnation, we generated an expressed sequence tag (EST) database for a carnation cultivar important in horticulture by high-throughput sequencing using 454 pyrosequencing technology. We constructed a normalized cDNA library and a 3'-UTR library of carnation, obtaining a total of 1,162,126 high-quality reads. These reads were assembled into 300,740 unigenes consisting of 37,844 contigs and 262,896 singlets. The contigs were searched against an Arabidopsis sequence database, and 61.8% (23,380) of them had at least one BLASTX hit. These contigs were also annotated with Gene Ontology (GO) and were found to cover a broad range of GO categories. Furthermore, we identified 17,362 potential simple sequence repeats (SSRs) in 14,291 of the unigenes. We focused on gene discovery in the areas of flower color and ethylene biosynthesis. Transcripts were identified for almost every gene involved in flower chlorophyll and carotenoid metabolism and in anthocyanin biosynthesis. Transcripts were also identified for every step in the ethylene biosynthesis pathway. We present the first large-scale sequence data set for carnation, generated using next-generation sequencing technology. The large EST database generated from these sequences is an informative resource for identifying genes involved in various biological processes in carnation and provides an EST resource for understanding the genetic diversity of this plant.

  8. Transcriptome analysis of carnation (Dianthus caryophyllus L.) based on next-generation sequencing technology

    PubMed Central

    2012-01-01

    Background Carnation (Dianthus caryophyllus L.), in the family Caryophyllaceae, can be found in a wide range of colors and is a model system for studies of flower senescence. In addition, it is one of the most important flowers in the global floriculture industry. However, few genomics resources, such as sequences and markers are available for carnation or other members of the Caryophyllaceae. To increase our understanding of the genetic control of important characters in carnation, we generated an expressed sequence tag (EST) database for a carnation cultivar important in horticulture by high-throughput sequencing using 454 pyrosequencing technology. Results We constructed a normalized cDNA library and a 3’-UTR library of carnation, obtaining a total of 1,162,126 high-quality reads. These reads were assembled into 300,740 unigenes consisting of 37,844 contigs and 262,896 singlets. The contigs were searched against an Arabidopsis sequence database, and 61.8% (23,380) of them had at least one BLASTX hit. These contigs were also annotated with Gene Ontology (GO) and were found to cover a broad range of GO categories. Furthermore, we identified 17,362 potential simple sequence repeats (SSRs) in 14,291 of the unigenes. We focused on gene discovery in the areas of flower color and ethylene biosynthesis. Transcripts were identified for almost every gene involved in flower chlorophyll and carotenoid metabolism and in anthocyanin biosynthesis. Transcripts were also identified for every step in the ethylene biosynthesis pathway. Conclusions We present the first large-scale sequence data set for carnation, generated using next-generation sequencing technology. The large EST database generated from these sequences is an informative resource for identifying genes involved in various biological processes in carnation and provides an EST resource for understanding the genetic diversity of this plant. PMID:22747974

  9. Annotated ESTs from various tissues of the brown planthopper Nilaparvata lugens: A genomic resource for studying agricultural pests

    PubMed Central

    Noda, Hiroaki; Kawai, Sawako; Koizumi, Yoko; Matsui, Kageaki; Zhang, Qiang; Furukawa, Shigetoyo; Shimomura, Michihiko; Mita, Kazuei

    2008-01-01

    Background The brown planthopper (BPH), Nilaparvata lugens (Hemiptera, Delphacidae), is a serious insect pests of rice plants. Major means of BPH control are application of agricultural chemicals and cultivation of BPH resistant rice varieties. Nevertheless, BPH strains that are resistant to agricultural chemicals have developed, and BPH strains have appeared that are virulent against the resistant rice varieties. Expressed sequence tag (EST) analysis and related applications are useful to elucidate the mechanisms of resistance and virulence and to reveal physiological aspects of this non-model insect, with its poorly understood genetic background. Results More than 37,000 high-quality ESTs, excluding sequences of mitochondrial genome, microbial genomes, and rDNA, have been produced from 18 libraries of various BPH tissues and stages. About 10,200 clusters have been made from whole EST sequences, with average EST size of 627 bp. Among the top ten most abundantly expressed genes, three are unique and show no homology in BLAST searches. The actin gene was highly expressed in BPH, especially in the thorax. Tissue-specifically expressed genes were extracted based on the expression frequency among the libraries. An EST database is available at our web site. Conclusion The EST library will provide useful information for transcriptional analyses, proteomic analyses, and gene functional analyses of BPH. Moreover, specific genes for hemimetabolous insects will be identified. The microarray fabricated based on the EST information will be useful for finding genes related to agricultural and biological problems related to this pest. PMID:18315884

  10. Characterization of expressed sequence tags (ESTs) of pigeonpea (Cajanus cajan L.) and functional validation of selected genes for abiotic stress tolerance in Arabidopsis thaliana.

    PubMed

    Priyanka, B; Sekhar, K; Sunita, T; Reddy, V D; Rao, Khareedu Venkateswara

    2010-03-01

    Pigeonpea, a major grain legume crop with remarkable drought tolerance traits, has been used for the isolation of stress-responsive genes. Herein, we report generation of ESTs, transcript profiles of selected genes and validation of candidate genes obtained from the subtracted cDNA libraries of pigeonpea plants subjected to PEG/water-deficit stress conditions. Cluster analysis of 124 selected ESTs yielded 75 high-quality ESTs. Homology searches disclosed that 55 ESTs share significant similarity with the known/putative proteins or ESTs available in the databases. These ESTs were characterized and genes relevant to the specific physiological processes were identified. Of the 75 ESTs obtained from the cDNA libraries of drought-stressed plants, 20 ESTs proved to be unique to the pigeonpea. These sequences are envisaged to serve as a potential source of stress-inducible genes of the drought stress-response transcriptome, and hence may be used for deciphering the mechanism of drought tolerance of the pigeonpea. Expression profiles of selected genes revealed increased levels of m-RNA transcripts in pigeonpea plants subjected to different abiotic stresses. Transgenic Arabidopsis lines, expressing Cajanus cajan hybrid-proline-rich protein (CcHyPRP), C. cajan cyclophilin (CcCYP) and C. cajan cold and drought regulatory (CcCDR) genes, exhibited marked tolerance, increased plant biomass and enhanced photosynthetic rates under PEG/NaCl/cold/heat stress conditions. This study represents the first report dealing with the isolation of drought-specific ESTs, transcriptome analysis and functional validation of drought-responsive genes of the pigeonpea. These genes, as such, hold promise for engineering crop plants bestowed with tolerance to major abiotic stresses.

  11. Insilico profiling of microRNAs in Korean ginseng (Panax ginseng Meyer)

    PubMed Central

    Mathiyalagan, Ramya; Subramaniyam, Sathiyamoorthy; Natarajan, Sathishkumar; Kim, Yeon Ju; Sun, Myung Suk; Kim, Se Young; Kim, Yu-Jin; Yang, Deok Chun

    2013-01-01

    MicroRNAs (miRNAs) are a class of recently discovered non-coding small RNA molecules, on average approximately 21 nucleotides in length, which underlie numerous important biological roles in gene regulation in various organisms. The miRNA database (release 18) has 18,226 miRNAs, which have been deposited from different species. Although miRNAs have been identified and validated in many plant species, no studies have been reported on discovering miRNAs in Panax ginseng Meyer, which is a traditionally known medicinal plant in oriental medicine, also known as Korean ginseng. It has triterpene ginseng saponins called ginsenosides, which are responsible for its various pharmacological activities. Predicting conserved miRNAs by homology-based analysis with available expressed sequence tag (EST) sequences can be powerful, if the species lacks whole genome sequence information. In this study by using the EST based computational approach, 69 conserved miRNAs belonging to 44 miRNA families were identified in Korean ginseng. The digital gene expression patterns of predicted conserved miRNAs were analyzed by deep sequencing using small RNA sequences of flower buds, leaves, and lateral roots. We have found that many of the identified miRNAs showed tissue specific expressions. Using the insilico method, 346 potential targets were identified for the predicted 69 conserved miRNAs by searching the ginseng EST database, and the predicted targets were mainly involved in secondary metabolic processes, responses to biotic and abiotic stress, and transcription regulator activities, as well as a variety of other metabolic processes. PMID:23717176

  12. Analysis of expressed sequence tags (ESTs) from cocoa (Theobroma cacao L) upon infection with Phytophthora megakarya.

    PubMed

    Naganeeswaran, Sudalaimuthu Asari; Subbian, Elain Apshara; Ramaswamy, Manimekalai

    2012-01-01

    Phytophthora megakarya, the causative agent of cacao black pod disease in West African countries causes an extensive loss of yield. In this study we have analyzed 4 libraries of ESTs derived from Phytophthora megakarya infected cocoa leaf and pod tissues. Totally 6379 redundant sequences were retrieved from ESTtik database and EST processing was performed using seqclean tool. Clustering and assembling using CAP3 generated 3333 non-redundant (907 contigs and 2426 singletons) sequences. The primary sequence analysis of 3333 non-redundant sequences showed that the GC percentage was 42.7 and the sequence length ranged from 101 - 2576 nucleotides. Further, functional analysis (Blast, Interproscan, Gene ontology and KEGG search) were executed and 1230 orthologous genes were annotated. Totally 272 enzymes corresponding to 114 metabolic pathways were identified. Functional annotation revealed that most of the sequences are related to molecular function, stress response and biological processes. The annotated enzymes are aldehyde dehydrogenase (E.C: 1.2.1.3), catalase (E.C: 1.11.1.6), acetyl-CoA C-acetyltransferase (E.C: 2.3.1.9), threonine ammonia-lyase (E.C: 4.3.1.19), acetolactate synthase (E.C: 2.2.1.6), O-methyltransferase (E.C: 2.1.1.68) which play an important role in amino acid biosynthesis and phenyl propanoid biosynthesis. All this information was stored in MySQL database management system to be used in future for reconstruction of biotic stress response pathway in cocoa.

  13. Exploring the Host Parasitism of the Migratory Plant-Parasitic Nematode Ditylenchus destuctor by Expressed Sequence Tags Analysis

    PubMed Central

    Peng, Huan; Gao, Bing-li; Kong, Ling-an; Yu, Qing; Huang, Wen-kun; He, Xu-feng; Long, Hai-bo; Peng, De-liang

    2013-01-01

    The potato rot nematode, Ditylenchus destructor, is a very destructive nematode pest on many agriculturally important crops worldwide, but the molecular characterization of its parasitism of plant has been limited. The effectors involved in nematode parasitism of plant for several sedentary endo-parasitic nematodes such as Heterodera glycines, Globodera rostochiensis and Meloidogyne incognita have been identified and extensively studied over the past two decades. Ditylenchus destructor, as a migratory plant parasitic nematode, has different feeding behavior, life cycle and host response. Comparing the transcriptome and parasitome among different types of plant-parasitic nematodes is the way to understand more fully the parasitic mechanism of plant nematodes. We undertook the approach of sequencing expressed sequence tags (ESTs) derived from a mixed stage cDNA library of D. destructor. This is the first study of D. destructor ESTs. A total of 9800 ESTs were grouped into 5008 clusters including 3606 singletons and 1402 multi-member contigs, representing a catalog of D. destructor genes. Implementing a bioinformatics' workflow, we found 1391 clusters have no match in the available gene database; 31 clusters only have similarities to genes identified from D. africanus, the most closely related species to D. destructor; 1991 clusters were annotated using Gene Ontology (GO); 1550 clusters were assigned enzyme commission (EC) numbers; and 1211 clusters were mapped to 181 KEGG biochemical pathways. 22 ESTs had similarities to reported nematode effectors. Interestedly, most of the effectors identified in this study are involved in host cell wall degradation or modification, such as 1,4-beta-glucanse, 1,3-beta-glucanse, pectate lyase, chitinases and expansin, or host defense suppression such as calreticulin, annexin and venom allergen-like protein. This result implies that the migratory plant-parasitic nematode D. destructor secrets similar effectors to those of sedentary plant nematodes. Finally we further characterized the two D. destructor expansin proteins. PMID:23922743

  14. Characterization of expressed sequence tag-derived simple sequence repeat markers for Aspergillus flavus: emphasis on variability of isolates from the southern United States.

    PubMed

    Wang, Xinwang; Wadl, Phillip A; Wood-Jones, Alicia; Windham, Gary; Trigiano, Robert N; Scruggs, Mary; Pilgrim, Candace; Baird, Richard

    2012-12-01

    Simple sequence repeat (SSR) markers were developed from Aspergillus flavus expressed sequence tag (EST) database to conduct an analysis of genetic relationships of Aspergillus isolates from numerous host species and geographical regions, but primarily from the United States. Twenty-nine primers were designed from 362 tri-nucleotide EST-SSR sequences. Eighteen polymorphic loci were used to genotype 96 Aspergillus species isolates. The number of alleles detected per locus ranged from 2 to 24 with a mean of 8.2 alleles. Haploid diversity ranged from 0.28 to 0.91. Genetic distance matrix was used to perform principal coordinates analysis (PCA) and to generate dendrograms using unweighted pair group method with arithmetic mean (UPGMA). Two principal coordinates explained more than 75 % of the total variation among the isolates. One clade was identified for A. flavus isolates (n = 87) with the other Aspergillus species (n = 7) using PCA, but five distinct clusters were present when the others taxa were excluded from the analysis. Six groups were noted when the EST-SSR data were compared using UPGMA. However, the latter PCA or UPGMA comparison resulted in no direct associations with host species, geographical region or aflatoxin production. Furthermore, there was no direct correlation to visible morphological features such as sclerotial types. The isolates from Mississippi Delta region, which contained the largest percentage of isolates, did not show any unusual clustering except for isolates K32, K55, and 199. Further studies of these three isolates are warranted to evaluate their pathogenicity, aflatoxin production potential, additional gene sequences (e.g., RPB2), and morphological comparisons.

  15. A first step in understanding an invasive weed through its genes: an EST analysis of invasive Centaurea maculosa.

    PubMed

    Broz, Amanda K; Broeckling, Corey D; He, Ji; Dai, Xinbin; Zhao, Patrick X; Vivanco, Jorge M

    2007-05-24

    The economic and biological implications of plant invasion are overwhelming; however, the processes by which plants become successful invaders are not well understood. Limited genetic resources are available for most invasive and weedy species, making it difficult to study molecular and genetic aspects that may be associated with invasion. As an initial step towards understanding the molecular mechanisms by which plants become invasive, we have generated a normalized Expressed Sequence Tag (EST) library comprising seven invasive populations of Centaurea maculosa, an invasive aster in North America. Seventy-seven percent of the 4423 unique transcripts showed significant similarity to existing proteins in the NCBI database and could be grouped based on gene ontology assignments. The C. maculosa EST library represents an initial step towards looking at gene-specific expression in this species, and will pave the way for creation of other resources such as microarray chips that can help provide a view of global gene expression in invasive C. maculosa and its native counterparts. To our knowledge, this is the first published set of ESTs derived from an invasive weed that will be targeted to study invasive behavior. Understanding the genetic basis of evolution for increased invasiveness in exotic plants is critical to understanding the mechanisms through which exotic invasions occur.

  16. Development of EST Intron-Targeting SNP Markers for Panax ginseng and Their Application to Cultivar Authentication.

    PubMed

    Wang, Hongtao; Li, Guisheng; Kwon, Woo-Saeng; Yang, Deok-Chun

    2016-06-04

    Panax ginseng is one of the most valuable medicinal plants in the Orient. The low level of genetic variation has limited the application of molecular markers for cultivar authentication and marker-assisted selection in cultivated ginseng. To exploit DNA polymorphism within ginseng cultivars, ginseng expressed sequence tags (ESTs) were searched against the potential intron polymorphism (PIP) database to predict the positions of introns. Intron-flanking primers were then designed in conserved exon regions and used to amplify across the more variable introns. Sequencing results showed that single nucleotide polymorphisms (SNPs), as well as indels, were detected in four EST-derived introns, and SNP markers specific to "Gopoong" and "K-1" were first reported in this study. Based on cultivar-specific SNP sites, allele-specific polymerase chain reaction (PCR) was conducted and proved to be effective for the authentication of ginseng cultivars. Additionally, the combination of a simple NaOH-Tris DNA isolation method and real-time allele-specific PCR assay enabled the high throughput selection of cultivars from ginseng fields. The established real-time allele-specific PCR assay should be applied to molecular authentication and marker assisted selection of P. ginseng cultivars, and the EST intron-targeting strategy will provide a potential approach for marker development in species without whole genomic DNA sequence information.

  17. Development and characterization of novel EST-SSR markers and their application for genetic diversity analysis of Jerusalem artichoke (Helianthus tuberosus L.).

    PubMed

    Mornkham, T; Wangsomnuk, P P; Mo, X C; Francisco, F O; Gao, L Z; Kurzweil, H

    2016-10-24

    Jerusalem artichoke (Helianthus tuberosus L.) is a perennial tuberous plant and a traditional inulin-rich crop in Thailand. It has become the most important source of inulin and has great potential for use in chemical and food industries. In this study, expressed sequence tag (EST)-based simple sequence repeat (SSR) markers were developed from 40,362 Jerusalem artichoke ESTs retrieved from the NCBI database. Among 23,691 non-redundant identified ESTs, 1949 SSR motifs harboring 2 to 6 nucleotides with varied repeat motifs were discovered from 1676 assembled sequences. Seventy-nine primer pairs were generated from EST sequences harboring SSR motifs. Our results show that 43 primers are polymorphic for the six studied populations, while the remaining 36 were either monomorphic or failed to amplify. These 43 SSR loci exhibited a high level of genetic diversity among populations, with allele numbers varying from 2 to 7, with an average of 3.95 alleles per loci. Heterozygosity ranged from 0.096 to 0.774, with an average of 0.536; polymorphic index content ranged from 0.096 to 0.854, with an average of 0.568. Principal component analysis and neighbor-joining analysis revealed that the six populations could be divided into six clusters. Our results indicate that these newly characterized EST-SSR markers may be useful in the exploration of genetic diversity and range expansion of the Jerusalem artichoke, and in cross-species application for the genus Helianthus.

  18. Investigation of SnSPR1, a novel and abundant surface protein of Sarcocystis neurona merozoites.

    PubMed

    Zhang, Deqing; Howe, Daniel K

    2008-04-15

    An expressed sequence tag (EST) sequencing project has produced over 15,000 partial cDNA sequences from the equine pathogen Sarcocystis neurona. While many of the sequences are clear homologues of previously characterized genes, a significant number of the S. neurona ESTs do not exhibit similarity to anything in the extensive sequence databases that have been generated. In an effort to characterize parasite proteins that are novel to S. neurona, a seemingly unique gene was selected for further investigation based on its abundant representation in the collection of ESTs and the predicted presence of a signal peptide and glycolipid anchor addition on the encoded protein. The gene was expressed in E. coli, and monospecific polyclonal antiserum against the recombinant protein was produced by immunization of a rabbit. Characterization of the native protein in S. neurona merozoites and schizonts revealed that it is a low molecular weight surface protein that is expressed throughout intracellular development of the parasite. The protein was designated Surface Protein 1 (SPR1) to reflect its display on the outer surface of merozoites and to distinguish it from the ubiquitous SAG/SRS surface antigens of the heteroxenous Coccidia. Interestingly, infection assays in the presence of the polyclonal antiserum suggested that SnSPR1 plays some role in attachment and/or invasion of host cells by S. neurona merozoites. The work described herein represents a general template for selecting and characterizing the various unidentified gene sequences that are plentiful in the EST databases for S. neurona and other apicomplexans. Furthermore, this study illustrates the value of investigating these novel sequences since it can offer new candidates for diagnostic or vaccine development while also providing greater insight into the biology of these parasites.

  19. An ordered EST catalogue and gene expression profiles of cassava (Manihot esculenta) at key growth stages.

    PubMed

    Li, You-Zhi; Pan, Ying-Hua; Sun, Chang-Bin; Dong, Hai-Tao; Luo, Xing-Lu; Wang, Zhi-Qiang; Tang, Ji-Liang; Chen, Baoshan

    2010-12-01

    A cDNA library was constructed from the root tissues of cassava variety Huanan 124 at the root bulking stage. A total of 9,600 cDNA clones from the library were sequenced with single-pass from the 5'-terminus to establish a catalogue of expressed sequence tags (ESTs). Assembly of the resulting EST sequences resulted in 2,878 putative unigenes. Blastn analysis showed that 62.6% of the unigenes matched with known cassava ESTs and the rest had no 'hits' against the cassava database in the integrative PlantGDB database. Blastx analysis showed that 1,715 (59.59%) of the unigenes matched with one or more GenBank protein entries and 1,163 (40.41%) had no 'hits'. A cDNA microarray with 2,878 unigenes was developed and used to analyze gene expression profiling of Huanan 124 at key growth stages including seedling, formation of root system, root bulking, and starch maturity. Array data analysis revealed that (1) the higher ratio of up-regulated ribosome-related genes was accompanied by a high ratio of up-regulated ubiquitin, proteasome-related and protease genes in cassava roots; (2) starch formation and degradation simultaneously occur at the early stages of root development but starch degradation is declined partially due to decrease in UDP-glucose dehydrogenase activity with root maturity; (3) starch may also be synthesized in situ in roots; (4) starch synthesis, translocation, and accumulation are also associated probably with signaling pathways that parallel Wnt, LAM, TCS and ErbB signaling pathways in animals; (5) constitutive expression of stress-responsive genes may be due to the adaptation of cassava to harsh environments during long-term evolution.

  20. Differential transferability of EST-SSR primers developed from diploid species Pseudoroegneria spicata, Thinopyrum bessarabicum, and Th. elongatum

    USDA-ARS?s Scientific Manuscript database

    Simple sequence repeat technology based on expressed sequence tag (EST-SSR) is a useful genomic tool for genome mapping, characterizing plant species relationships, elucidating genome evolution, and tracing genes on alien chromosome segments. EST-SSR primers developed from three perennial diploid T...

  1. De Novo Transcriptome Sequencing Analysis of cDNA Library and Large-Scale Unigene Assembly in Japanese Red Pine (Pinus densiflora)

    PubMed Central

    Liu, Le; Zhang, Shijie; Lian, Chunlan

    2015-01-01

    Japanese red pine (Pinus densiflora) is extensively cultivated in Japan, Korea, China, and Russia and is harvested for timber, pulpwood, garden, and paper markets. However, genetic information and molecular markers were very scarce for this species. In this study, over 51 million sequencing clean reads from P. densiflora mRNA were produced using Illumina paired-end sequencing technology. It yielded 83,913 unigenes with a mean length of 751 bp, of which 54,530 (64.98%) unigenes showed similarity to sequences in the NCBI database. Among which the best matches in the NCBI Nr database were Picea sitchensis (41.60%), Amborella trichopoda (9.83%), and Pinus taeda (4.15%). A total of 1953 putative microsatellites were identified in 1784 unigenes using MISA (MicroSAtellite) software, of which the tri-nucleotide repeats were most abundant (50.18%) and 629 EST-SSR (expressed sequence tag- simple sequence repeats) primer pairs were successfully designed. Among 20 EST-SSR primer pairs randomly chosen, 17 markers yielded amplification products of the expected size in P. densiflora. Our results will provide a valuable resource for gene-function analysis, germplasm identification, molecular marker-assisted breeding and resistance-related gene(s) mapping for pine for P. densiflora. PMID:26690126

  2. Gene Polymorphism Studies in a Teaching Laboratory

    NASA Astrophysics Data System (ADS)

    Shultz, Jeffry

    2009-02-01

    I present a laboratory procedure for illustrating transcription, post-transcriptional modification, gene conservation, and comparative genetics for use in undergraduate biology education. Students are individually assigned genes in a targeted biochemical pathway, for which they design and test polymerase chain reaction (PCR) primers. In this example, students used genes annotated for the steroid biosynthesis pathway in soybean. The authoritative Kyoto encyclopedia of genes and genomes (KEGG) interactive database and other online resources were used to design primers based first on soybean expressed sequence tags (ESTs), then on ESTs from an alternate organism if soybean sequence was unavailable. Students designed a total of 50 gene-based primer pairs (37 soybean, 13 alternative) and tested these for polymorphism state and similarity between two soybean and two pea lines. Student assessment was based on acquisition of laboratory skills and successful project completion. This simple procedure illustrates conservation of genes and is not limited to soybean or pea. Cost per student estimates are included, along with a detailed protocol and flow diagram of the procedure.

  3. Gambling on a shortcut to genome sequencing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Roberts, L.

    1991-06-21

    Almost from the start of the Human Genome Project, a debate has been raging over whether to sequence the entire human genome, all 3 billion bases, or just the genes - a mere 2% or 3% of the genome, and by far the most interesting part. In England, Sydney Brenner convinced the Medical Research Council (MRC) to start with the expressed genes, or complementary DNAs. But the US stance has been that the entire sequence is essential if we are to understand the blueprint of man. Craig Venter of the National Institute of Neurological Disorders and Stroke says that focusingmore » on the expressed genes may be even more useful than expected. His strategy involves randomly selecting clones from cDNA libraries which theoretically contain all the genes that are switched on at a particular time in a particular tissue. Then the researchers sequence just a short stretch of each clone, about 400 to 500 bases, to create can expressed sequence tag or EST. The sequences of these ESTs are then stored in a database. Using that information, other researchers can then recreate that EST by using polymerase chain reaction techniques.« less

  4. Comparison of transcripts in Phalaenopsis bellina and Phalaenopsis equestris (Orchidaceae) flowers to deduce monoterpene biosynthesis pathway.

    PubMed

    Hsiao, Yu-Yun; Tsai, Wen-Chieh; Kuoh, Chang-Sheng; Huang, Tian-Hsiang; Wang, Hei-Chia; Wu, Tian-Shung; Leu, Yann-Lii; Chen, Wen-Huei; Chen, Hong-Hwa

    2006-07-13

    Floral scent is one of the important strategies for ensuring fertilization and for determining seed or fruit set. Research on plant scents has hampered mainly by the invisibility of this character, its dynamic nature, and complex mixtures of components that are present in very small quantities. Most progress in scent research, as in other areas of plant biology, has come from the use of molecular and biochemical techniques. Although volatile components have been identified in several orchid species, the biosynthetic pathways of orchid flower fragrance are far from understood. We investigated how flower fragrance was generated in certain Phalaenopsis orchids by determining the chemical components of the floral scent, identifying floral expressed-sequence-tags (ESTs), and deducing the pathways of floral scent biosynthesis in Phalaneopsis bellina by bioinformatics analysis. The main chemical components in the P. bellina flower were shown by gas chromatography-mass spectrometry to be monoterpenoids, benzenoids and phenylpropanoids. The set of floral scent producing enzymes in the biosynthetic pathway from glyceraldehyde-3-phosphate (G3P) to geraniol and linalool were recognized through data mining of the P. bellina floral EST database (dbEST). Transcripts preferentially expressed in P. bellina were distinguished by comparing the scent floral dbEST to that of a scentless species, P. equestris, and included those encoding lipoxygenase, epimerase, diacylglycerol kinase and geranyl diphosphate synthase. In addition, EST filtering results showed that transcripts encoding signal transduction and Myb transcription factors and methyltransferase, in addition to those for scent biosynthesis, were detected by in silico hybridization of the P. bellina unigene database against those of the scentless species, rice and Arabidopsis. Altogether, we pinpointed 66% of the biosynthetic steps from G3P to geraniol, linalool and their derivatives. This systems biology program combined chemical analysis, genomics and bioinformatics to elucidate the scent biosynthesis pathway and identify the relevant genes. It integrates the forward and reverse genetic approaches to knowledge discovery by which researchers can study non-model plants.

  5. Expression of castor LPAT2 enhances ricinoleic acid content at the sn-2 position of triacylglycerols in lesquerella seed

    USDA-ARS?s Scientific Manuscript database

    Lesquerella (Physaria fendelri) is a potential crop for hydroxy fatty acid (HFA) production. Its seed triacylglcerols (TAGs) contain 55–60% lesquerolic acid (20:1OH), mostly at the sn-1 and the sn-3 positions of TAG. Castor (Ricinus communis) TAGs contain 90% of ricinoleic acid (18:1OH) which is est...

  6. Gene discovery in EST sequences from the wheat leaf rust fungus Puccinia triticina sexual spores, asexual spores and haustoria, compared to other rust and corn smut fungi

    PubMed Central

    2011-01-01

    Background Rust fungi are biotrophic basidiomycete plant pathogens that cause major diseases on plants and trees world-wide, affecting agriculture and forestry. Their biotrophic nature precludes many established molecular genetic manipulations and lines of research. The generation of genomic resources for these microbes is leading to novel insights into biology such as interactions with the hosts and guiding directions for breakthrough research in plant pathology. Results To support gene discovery and gene model verification in the genome of the wheat leaf rust fungus, Puccinia triticina (Pt), we have generated Expressed Sequence Tags (ESTs) by sampling several life cycle stages. We focused on several spore stages and isolated haustorial structures from infected wheat, generating 17,684 ESTs. We produced sequences from both the sexual (pycniospores, aeciospores and teliospores) and asexual (germinated urediniospores) stages of the life cycle. From pycniospores and aeciospores, produced by infecting the alternate host, meadow rue (Thalictrum speciosissimum), 4,869 and 1,292 reads were generated, respectively. We generated 3,703 ESTs from teliospores produced on the senescent primary wheat host. Finally, we generated 6,817 reads from haustoria isolated from infected wheat as well as 1,003 sequences from germinated urediniospores. Along with 25,558 previously generated ESTs, we compiled a database of 13,328 non-redundant sequences (4,506 singlets and 8,822 contigs). Fungal genes were predicted using the EST version of the self-training GeneMarkS algorithm. To refine the EST database, we compared EST sequences by BLASTN to a set of 454 pyrosequencing-generated contigs and Sanger BAC-end sequences derived both from the Pt genome, and to ESTs and genome reads from wheat. A collection of 6,308 fungal genes was identified and compared to sequences of the cereal rusts, Puccinia graminis f. sp. tritici (Pgt) and stripe rust, P. striiformis f. sp. tritici (Pst), and poplar leaf rust Melampsora species, and the corn smut fungus, Ustilago maydis (Um). While extensive homologies were found, many genes appeared novel and species-specific; over 40% of genes did not match any known sequence in existing databases. Focusing on spore stages, direct comparison to Um identified potential functional homologs, possibly allowing heterologous functional analysis in that model fungus. Many potentially secreted protein genes were identified by similarity searches against genes and proteins of Pgt and Melampsora spp., revealing apparent orthologs. Conclusions The current set of Pt unigenes contributes to gene discovery in this major cereal pathogen and will be invaluable for gene model verification in the genome sequence. PMID:21435244

  7. Development, characterization and cross species amplification of polymorphic microsatellite markers from expressed sequence tags of turmeric (Curcuma longa L.).

    PubMed

    Siju, S; Dhanya, K; Syamkumar, S; Sasikumar, B; Sheeja, T E; Bhat, A I; Parthasarathy, V A

    2010-02-01

    Expressed sequence tags (ESTs) from turmeric (Curcuma longa L.) were used for the screening of type and frequency of Class I (hypervariable) simple sequence repeats (SSRs). A total of 231 microsatellite repeats were detected from 12,593 EST sequences of turmeric after redundancy elimination. The average density of Class I SSRs accounts to one SSR per 17.96 kb of EST. Mononucleotides were the most abundant class of microsatellite repeat in turmeric ESTs followed by trinucleotides. A robust set of 17 polymorphic EST-SSRs were developed and used for evaluating 20 turmeric accessions. The number of alleles detected ranged from 3 to 8 per loci. The developed markers were also evaluated in 13 related species of C. longa confirming high rate (100%) of cross species transferability. The polymorphic microsatellite markers generated from this study could be used for genetic diversity analysis and resolving the taxonomic confusion prevailing in the genus.

  8. SEAN: SNP prediction and display program utilizing EST sequence clusters.

    PubMed

    Huntley, Derek; Baldo, Angela; Johri, Saurabh; Sergot, Marek

    2006-02-15

    SEAN is an application that predicts single nucleotide polymorphisms (SNPs) using multiple sequence alignments produced from expressed sequence tag (EST) clusters. The algorithm uses rules of sequence identity and SNP abundance to determine the quality of the prediction. A Java viewer is provided to display the EST alignments and predicted SNPs.

  9. Development and in silico evaluation of large-scale metabolite identification methods using functional group detection for metabolomics

    PubMed Central

    Mitchell, Joshua M.; Fan, Teresa W.-M.; Lane, Andrew N.; Moseley, Hunter N. B.

    2014-01-01

    Large-scale identification of metabolites is key to elucidating and modeling metabolism at the systems level. Advances in metabolomics technologies, particularly ultra-high resolution mass spectrometry (MS) enable comprehensive and rapid analysis of metabolites. However, a significant barrier to meaningful data interpretation is the identification of a wide range of metabolites including unknowns and the determination of their role(s) in various metabolic networks. Chemoselective (CS) probes to tag metabolite functional groups combined with high mass accuracy provide additional structural constraints for metabolite identification and quantification. We have developed a novel algorithm, Chemically Aware Substructure Search (CASS) that efficiently detects functional groups within existing metabolite databases, allowing for combined molecular formula and functional group (from CS tagging) queries to aid in metabolite identification without a priori knowledge. Analysis of the isomeric compounds in both Human Metabolome Database (HMDB) and KEGG Ligand demonstrated a high percentage of isomeric molecular formulae (43 and 28%, respectively), indicating the necessity for techniques such as CS-tagging. Furthermore, these two databases have only moderate overlap in molecular formulae. Thus, it is prudent to use multiple databases in metabolite assignment, since each major metabolite database represents different portions of metabolism within the biosphere. In silico analysis of various CS-tagging strategies under different conditions for adduct formation demonstrate that combined FT-MS derived molecular formulae and CS-tagging can uniquely identify up to 71% of KEGG and 37% of the combined KEGG/HMDB database vs. 41 and 17%, respectively without adduct formation. This difference between database isomer disambiguation highlights the strength of CS-tagging for non-lipid metabolite identification. However, unique identification of complex lipids still needs additional information. PMID:25120557

  10. Identification of human chromosome 22 transcribed sequences with ORF expressed sequence tags

    PubMed Central

    de Souza, Sandro J.; Camargo, Anamaria A.; Briones, Marcelo R. S.; Costa, Fernando F.; Nagai, Maria Aparecida; Verjovski-Almeida, Sergio; Zago, Marco A.; Andrade, Luis Eduardo C.; Carrer, Helaine; El-Dorry, Hamza F. A.; Espreafico, Enilza M.; Habr-Gama, Angelita; Giannella-Neto, Daniel; Goldman, Gustavo H.; Gruber, Arthur; Hackel, Christine; Kimura, Edna T.; Maciel, Rui M. B.; Marie, Suely K. N.; Martins, Elizabeth A. L.; Nóbrega, Marina P.; Paçó-Larson, Maria Luisa; Pardini, Maria Inês M. C.; Pereira, Gonçalo G.; Pesquero, João Bosco; Rodrigues, Vanderlei; Rogatto, Silvia R.; da Silva, Ismael D. C. G.; Sogayar, Mari C.; de Fátima Sonati, Maria; Tajara, Eloiza H.; Valentini, Sandro R.; Acencio, Marcio; Alberto, Fernando L.; Amaral, Maria Elisabete J.; Aneas, Ivy; Bengtson, Mário Henrique; Carraro, Dirce M.; Carvalho, Alex F.; Carvalho, Lúcia Helena; Cerutti, Janete M.; Corrêa, Maria Lucia C.; Costa, Maria Cristina R.; Curcio, Cyntia; Gushiken, Tsieko; Ho, Paulo L.; Kimura, Elza; Leite, Luciana C. C.; Maia, Gustavo; Majumder, Paromita; Marins, Mozart; Matsukuma, Adriana; Melo, Analy S. A.; Mestriner, Carlos Alberto; Miracca, Elisabete C.; Miranda, Daniela C.; Nascimento, Ana Lucia T. O.; Nóbrega, Francisco G.; Ojopi, Élida P. B.; Pandolfi, José Rodrigo C.; Pessoa, Luciana Gilbert; Rahal, Paula; Rainho, Claudia A.; da Ro's, Nancy; de Sá, Renata G.; Sales, Magaly M.; da Silva, Neusa P.; Silva, Tereza C.; da Silva, Wilson; Simão, Daniel F.; Sousa, Josane F.; Stecconi, Daniella; Tsukumo, Fernando; Valente, Valéria; Zalcberg, Heloisa; Brentani, Ricardo R.; Reis, Luis F. L.; Dias-Neto, Emmanuel; Simpson, Andrew J. G.

    2000-01-01

    Transcribed sequences in the human genome can be identified with confidence only by alignment with sequences derived from cDNAs synthesized from naturally occurring mRNAs. We constructed a set of 250,000 cDNAs that represent partial expressed gene sequences and that are biased toward the central coding regions of the resulting transcripts. They are termed ORF expressed sequence tags (ORESTES). The 250,000 ORESTES were assembled into 81,429 contigs. Of these, 1,181 (1.45%) were found to match sequences in chromosome 22 with at least one ORESTES contig for 162 (65.6%) of the 247 known genes, for 67 (44.6%) of the 150 related genes, and for 45 of the 148 (30.4%) EST-predicted genes on this chromosome. Using a set of stringent criteria to validate our sequences, we identified a further 219 previously unannotated transcribed sequences on chromosome 22. Of these, 171 were in fact also defined by EST or full length cDNA sequences available in GenBank but not utilized in the initial annotation of the first human chromosome sequence. Thus despite representing less than 15% of all expressed human sequences in the public databases at the time of the present analysis, ORESTES sequences defined 48 transcribed sequences on chromosome 22 not defined by other sequences. All of the transcribed sequences defined by ORESTES coincided with DNA regions predicted as encoding exons by genscan. (http://genes.mit.edu/GENSCAN.html). PMID:11070084

  11. A Tool for Conditions Tag Management in ATLAS

    NASA Astrophysics Data System (ADS)

    Sharmazanashvili, A.; Batiashvili, G.; Gvaberidze, G.; Shekriladze, L.; Formica, A.; Atlas Collaboration

    2014-06-01

    ATLAS Conditions data include about 2 TB in a relational database and 400 GB of files referenced from the database. Conditions data is entered and retrieved using COOL, the API for accessing data in the LCG Conditions Database infrastructure. It is managed using an ATLAS-customized python based tool set. Conditions data are required for every reconstruction and simulation job, so access to them is crucial for all aspects of ATLAS data taking and analysis, as well as by preceding tasks to derive optimal corrections to reconstruction. Optimized sets of conditions for processing are accomplished using strict version control on those conditions: a process which assigns COOL Tags to sets of conditions, and then unifies those conditions over data-taking intervals into a COOL Global Tag. This Global Tag identifies the set of conditions used to process data so that the underlying conditions can be uniquely identified with 100% reproducibility should the processing be executed again. Understanding shifts in the underlying conditions from one tag to another and ensuring interval completeness for all detectors for a set of runs to be processed is a complex task, requiring tools beyond the above mentioned python utilities. Therefore, a JavaScript /PHP based utility called the Conditions Tag Browser (CTB) has been developed. CTB gives detector and conditions experts the possibility to navigate through the different databases and COOL folders; explore the content of given tags and the differences between them, as well as their extent in time; visualize the content of channels associated with leaf tags. This report describes the structure and PHP/ JavaScript classes of functions of the CTB.

  12. Genetic variation patterns of American chestnut populations at EST-SSRs

    Treesearch

    Oliver Gailing; C. Dana Nelson

    2017-01-01

    The objective of this study is to analyze patterns of genetic variation at genic expressed sequence tag - simple sequence repeats (EST-SSRs) and at chloroplast DNA markers in populations of American chestnut (Castanea dentata Borkh.) to assist in conservation and breeding efforts. Allelic diversity at EST-SSRs decreased significantly from southwest to northeast along...

  13. An elm EST database for identifying leaf beetle egg-induced defense genes

    PubMed Central

    2012-01-01

    Background Plants can defend themselves against herbivorous insects prior to the onset of larval feeding by responding to the eggs laid on their leaves. In the European field elm (Ulmus minor), egg laying by the elm leaf beetle ( Xanthogaleruca luteola) activates the emission of volatiles that attract specialised egg parasitoids, which in turn kill the eggs. Little is known about the transcriptional changes that insect eggs trigger in plants and how such indirect defense mechanisms are orchestrated in the context of other biological processes. Results Here we present the first large scale study of egg-induced changes in the transcriptional profile of a tree. Five cDNA libraries were generated from leaves of (i) untreated control elms, and elms treated with (ii) egg laying and feeding by elm leaf beetles, (iii) feeding, (iv) artificial transfer of egg clutches, and (v) methyl jasmonate. A total of 361,196 ESTs expressed sequence tags (ESTs) were identified which clustered into 52,823 unique transcripts (Unitrans) and were stored in a database with a public web interface. Among the analyzed Unitrans, 73% could be annotated by homology to known genes in the UniProt (Plant) database, particularly to those from Vitis, Ricinus, Populus and Arabidopsis. Comparative in silico analysis among the different treatments revealed differences in Gene Ontology term abundances. Defense- and stress-related gene transcripts were present in high abundance in leaves after herbivore egg laying, but transcripts involved in photosynthesis showed decreased abundance. Many pathogen-related genes and genes involved in phytohormone signaling were expressed, indicative of jasmonic acid biosynthesis and activation of jasmonic acid responsive genes. Cross-comparisons between different libraries based on expression profiles allowed the identification of genes with a potential relevance in egg-induced defenses, as well as other biological processes, including signal transduction, transport and primary metabolism. Conclusion Here we present a dataset for a large-scale study of the mechanisms of plant defense against insect eggs in a co-evolved, natural ecological plant–insect system. The EST database analysis provided here is a first step in elucidating the transcriptional responses of elm to elm leaf beetle infestation, and adds further to our knowledge on insect egg-induced transcriptomic changes in plants. The sequences identified in our comparative analysis give many hints about novel defense mechanisms directed towards eggs. PMID:22702658

  14. An elm EST database for identifying leaf beetle egg-induced defense genes.

    PubMed

    Büchel, Kerstin; McDowell, Eric; Nelson, Will; Descour, Anne; Gershenzon, Jonathan; Hilker, Monika; Soderlund, Carol; Gang, David R; Fenning, Trevor; Meiners, Torsten

    2012-06-15

    Plants can defend themselves against herbivorous insects prior to the onset of larval feeding by responding to the eggs laid on their leaves. In the European field elm (Ulmus minor), egg laying by the elm leaf beetle ( Xanthogaleruca luteola) activates the emission of volatiles that attract specialised egg parasitoids, which in turn kill the eggs. Little is known about the transcriptional changes that insect eggs trigger in plants and how such indirect defense mechanisms are orchestrated in the context of other biological processes. Here we present the first large scale study of egg-induced changes in the transcriptional profile of a tree. Five cDNA libraries were generated from leaves of (i) untreated control elms, and elms treated with (ii) egg laying and feeding by elm leaf beetles, (iii) feeding, (iv) artificial transfer of egg clutches, and (v) methyl jasmonate. A total of 361,196 ESTs expressed sequence tags (ESTs) were identified which clustered into 52,823 unique transcripts (Unitrans) and were stored in a database with a public web interface. Among the analyzed Unitrans, 73% could be annotated by homology to known genes in the UniProt (Plant) database, particularly to those from Vitis, Ricinus, Populus and Arabidopsis. Comparative in silico analysis among the different treatments revealed differences in Gene Ontology term abundances. Defense- and stress-related gene transcripts were present in high abundance in leaves after herbivore egg laying, but transcripts involved in photosynthesis showed decreased abundance. Many pathogen-related genes and genes involved in phytohormone signaling were expressed, indicative of jasmonic acid biosynthesis and activation of jasmonic acid responsive genes. Cross-comparisons between different libraries based on expression profiles allowed the identification of genes with a potential relevance in egg-induced defenses, as well as other biological processes, including signal transduction, transport and primary metabolism. Here we present a dataset for a large-scale study of the mechanisms of plant defense against insect eggs in a co-evolved, natural ecological plant-insect system. The EST database analysis provided here is a first step in elucidating the transcriptional responses of elm to elm leaf beetle infestation, and adds further to our knowledge on insect egg-induced transcriptomic changes in plants. The sequences identified in our comparative analysis give many hints about novel defense mechanisms directed towards eggs.

  15. A first step in understanding an invasive weed through its genes: an EST analysis of invasive Centaurea maculosa

    PubMed Central

    Broz, Amanda K; Broeckling, Corey D; He, Ji; Dai, Xinbin; Zhao, Patrick X; Vivanco, Jorge M

    2007-01-01

    Background The economic and biological implications of plant invasion are overwhelming; however, the processes by which plants become successful invaders are not well understood. Limited genetic resources are available for most invasive and weedy species, making it difficult to study molecular and genetic aspects that may be associated with invasion. Results As an initial step towards understanding the molecular mechanisms by which plants become invasive, we have generated a normalized Expressed Sequence Tag (EST) library comprising seven invasive populations of Centaurea maculosa, an invasive aster in North America. Seventy-seven percent of the 4423 unique transcripts showed significant similarity to existing proteins in the NCBI database and could be grouped based on gene ontology assignments. Conclusion The C. maculosa EST library represents an initial step towards looking at gene-specific expression in this species, and will pave the way for creation of other resources such as microarray chips that can help provide a view of global gene expression in invasive C. maculosa and its native counterparts. To our knowledge, this is the first published set of ESTs derived from an invasive weed that will be targeted to study invasive behavior. Understanding the genetic basis of evolution for increased invasiveness in exotic plants is critical to understanding the mechanisms through which exotic invasions occur. PMID:17524143

  16. Development of EST Intron-Targeting SNP Markers for Panax ginseng and Their Application to Cultivar Authentication

    PubMed Central

    Wang, Hongtao; Li, Guisheng; Kwon, Woo-Saeng; Yang, Deok-Chun

    2016-01-01

    Panax ginseng is one of the most valuable medicinal plants in the Orient. The low level of genetic variation has limited the application of molecular markers for cultivar authentication and marker-assisted selection in cultivated ginseng. To exploit DNA polymorphism within ginseng cultivars, ginseng expressed sequence tags (ESTs) were searched against the potential intron polymorphism (PIP) database to predict the positions of introns. Intron-flanking primers were then designed in conserved exon regions and used to amplify across the more variable introns. Sequencing results showed that single nucleotide polymorphisms (SNPs), as well as indels, were detected in four EST-derived introns, and SNP markers specific to “Gopoong” and “K-1” were first reported in this study. Based on cultivar-specific SNP sites, allele-specific polymerase chain reaction (PCR) was conducted and proved to be effective for the authentication of ginseng cultivars. Additionally, the combination of a simple NaOH-Tris DNA isolation method and real-time allele-specific PCR assay enabled the high throughput selection of cultivars from ginseng fields. The established real-time allele-specific PCR assay should be applied to molecular authentication and marker assisted selection of P. ginseng cultivars, and the EST intron-targeting strategy will provide a potential approach for marker development in species without whole genomic DNA sequence information. PMID:27271615

  17. Teasing apart a three-way symbiosis: transcriptome analyses of Curvularia protuberata in response to viral infection and heat stress.

    PubMed

    Morsy, Mustafa R; Oswald, Jennifer; He, Ji; Tang, Yuhong; Roossinck, Marilyn J

    2010-10-15

    The fungus Curvularia protuberata carries a dsRNA virus, Curvularia thermal tolerance virus, and develops a three-way symbiotic relationship with plants to enable their survival in extreme soil temperatures. To learn about the genome of C. protuberata and possible mechanisms of heat tolerance a collection of expressed sequence tags (ESTs) were developed from two subtracted cDNA libraries from mycelial cultures grown under control and heat stress conditions. We analyzed 4207 ESTs that were assembled into 1926 unique transcripts. Of the unique transcripts, 1347 (70%) had sequence similarity with GenBank entries using BLASTX while the rest represented unknown proteins with no matches in the databases. The majority of ESTs with known similarities were homologues to fungal genes. The EST collection presents a rich source of heat stress and viral induced genes of a fungal endophyte that is involved in a symbiotic relationship with plants. Expression profile analyses of some candidate genes suggest possible involvement of osmoprotectants such as trehalose, glycine betaine, and taurine in the heat stress response. The fungal pigment melanin, and heat shock proteins also may be involved in the thermotolerance of C. protuberata in culture. The results assist in understanding the molecular basis of thermotolerance of the three-way symbiosis. Further studies will confirm or refute the involvement of these pathways in stress tolerance. Copyright © 2010 Elsevier Inc. All rights reserved.

  18. MytiBase: a knowledgebase of mussel (M. galloprovincialis) transcribed sequences

    PubMed Central

    Venier, Paola; De Pittà, Cristiano; Bernante, Filippo; Varotto, Laura; De Nardi, Barbara; Bovo, Giuseppe; Roch, Philippe; Novoa, Beatriz; Figueras, Antonio; Pallavicini, Alberto; Lanfranchi, Gerolamo

    2009-01-01

    Background Although Bivalves are among the most studied marine organisms due to their ecological role, economic importance and use in pollution biomonitoring, very little information is available on the genome sequences of mussels. This study reports the functional analysis of a large-scale Expressed Sequence Tag (EST) sequencing from different tissues of Mytilus galloprovincialis (the Mediterranean mussel) challenged with toxic pollutants, temperature and potentially pathogenic bacteria. Results We have constructed and sequenced seventeen cDNA libraries from different Mediterranean mussel tissues: gills, digestive gland, foot, anterior and posterior adductor muscle, mantle and haemocytes. A total of 24,939 clones were sequenced from these libraries generating 18,788 high-quality ESTs which were assembled into 2,446 overlapping clusters and 4,666 singletons resulting in a total of 7,112 non-redundant sequences. In particular, a high-quality normalized cDNA library (Nor01) was constructed as determined by the high rate of gene discovery (65.6%). Bioinformatic screening of the non-redundant M. galloprovincialis sequences identified 159 microsatellite-containing ESTs. Clusters, consensuses, related similarities and gene ontology searches have been organized in a dedicated, searchable database . Conclusion We defined the first species-specific catalogue of M. galloprovincialis ESTs including 7,112 unique transcribed sequences. Putative microsatellite markers were identified. This annotated catalogue represents a valuable platform for expression studies, marker validation and genetic linkage analysis for investigations in the biology of Mediterranean mussels. PMID:19203376

  19. Identification and characterization of gene-based SSR markers in date palm (Phoenix dactylifera L.).

    PubMed

    Zhao, Yongli; Williams, Roxanne; Prakash, C S; He, Guohao

    2012-12-15

    Date palm (Phoenix dactylifera L.) is an important tree in the Middle East and North Africa due to the nutritional value of its fruit. Molecular Breeding would accelerate genetic improvement of fruit tree through marker assisted selection. However, the lack of molecular markers in date palm restricts the application of molecular breeding. In this study, we analyzed 28,889 EST sequences from the date palm genome database to identify simple-sequence repeats (SSRs) and to develop gene-based markers, i.e. expressed sequence tag-SSRs (EST-SSRs). We identified 4,609 ESTs as containing SSRs, among which, trinucleotide motifs (69.7%) were the most common, followed by tetranucleotide (10.4%) and dinucleotide motifs (9.6%). The motif AG (85.7%) was most abundant in dinucleotides, while motifs AGG (26.8%), AAG (19.3%), and AGC (16.1%) were most common among trinucleotides. A total of 4,967 primer pairs were designed for EST-SSR markers from the computational data. In a follow up laboratory study, we tested a sample of 20 random selected primer pairs for amplification and polymorphism detection using genomic DNA from date palm cultivars. Nearly one-third of these primer pairs detected DNA polymorphism to differentiate the twelve date palm cultivars used. Functional categorization of EST sequences containing SSRs revealed that 3,108 (67.4%) of such ESTs had homology with known proteins. Date palm EST sequences exhibits a good resource for developing gene-based markers. These genic markers identified in our study may provide a valuable genetic and genomic tool for further genetic research and varietal development in date palm, such as diversity study, QTL mapping, and molecular breeding.

  20. Characterization of genic microsatellite markers derived from expressed sequence tags in Pacific abalone ( Haliotis discus hannai)

    NASA Astrophysics Data System (ADS)

    Li, Qi; Shu, Jing; Zhao, Cui; Liu, Shikai; Kong, Lingfeng; Zheng, Xiaodong

    2010-01-01

    Simple sequence repeat (SSR) markers were developed from the expressed sequence tags (ESTs) of Pacific abalone ( Haliotis discus hannai). Repeat motifs were found in 4.95% of the ESTs at a frequency of one repeat every 10.04 kb of EST sequences, after redundancy elimination. Seventeen polymorphic EST-SSRs were developed. The number of alleles per locus varied from 2-17, with an average of 6.8 alleles per locus. The expected and observed heterozygosities ranged from 0.159 to 0.928 and from 0.132 to 0.922, respectively. Twelve of the 17 loci (70.6%) were successfully amplified in H. diversicolor. Seventeen loci segregated in three families, with three showing the presence of null alleles (17.6%). The adequate level of variability and low frequency of null alleles observed in H. discus hannai, together with the high rate of transportability across Haliotis species, make this set of EST-SSR markers an important tool for comparative mapping, marker-assisted selection, and evolutionary studies, not only in the Pacific abalone, but also in related species.

  1. Sequencing, Analysis, and Annotation of Expressed Sequence Tags for Camelus dromedarius

    PubMed Central

    Al-Swailem, Abdulaziz M.; Shehata, Maher M.; Abu-Duhier, Faisel M.; Al-Yamani, Essam J.; Al-Busadah, Khalid A.; Al-Arawi, Mohammed S.; Al-Khider, Ali Y.; Al-Muhaimeed, Abdullah N.; Al-Qahtani, Fahad H.; Manee, Manee M.; Al-Shomrani, Badr M.; Al-Qhtani, Saad M.; Al-Harthi, Amer S.; Akdemir, Kadir C.; Otu, Hasan H.

    2010-01-01

    Despite its economical, cultural, and biological importance, there has not been a large scale sequencing project to date for Camelus dromedarius. With the goal of sequencing complete DNA of the organism, we first established and sequenced camel EST libraries, generating 70,272 reads. Following trimming, chimera check, repeat masking, cluster and assembly, we obtained 23,602 putative gene sequences, out of which over 4,500 potentially novel or fast evolving gene sequences do not carry any homology to other available genomes. Functional annotation of sequences with similarities in nucleotide and protein databases has been obtained using Gene Ontology classification. Comparison to available full length cDNA sequences and Open Reading Frame (ORF) analysis of camel sequences that exhibit homology to known genes show more than 80% of the contigs with an ORF>300 bp and ∼40% hits extending to the start codons of full length cDNAs suggesting successful characterization of camel genes. Similarity analyses are done separately for different organisms including human, mouse, bovine, and rat. Accompanying web portal, CAGBASE (http://camel.kacst.edu.sa/), hosts a relational database containing annotated EST sequences and analysis tools with possibility to add sequences from public domain. We anticipate our results to provide a home base for genomic studies of camel and other comparative studies enabling a starting point for whole genome sequencing of the organism. PMID:20502665

  2. Identification and characterization of 43 microsatellite markers derived from expressed sequence tags of the sea cucumber ( Apostichopus japonicus)

    NASA Astrophysics Data System (ADS)

    Jiang, Qun; Li, Qi; Yu, Hong; Kong, Lingfeng

    2011-06-01

    The sea cucumber Apostichopus japonicus is a commercially and ecologically important species in China. A total of 3056 potential unigenes were generated after assembling 7597 A. japonicus expressed sequence tags (ESTs) downloaded from Gen-Bank. Two hundred and fifty microsatellite-containing ESTs (8.18%) and 299 simple sequence repeats (SSRs) were detected. The average density of SSRs was 1 per 7.403 kb of EST after redundancy elimination. Di-nucleotide repeat motifs appeared to be the most abundant type with a percentage of 69.90%. Of the 126 primer pairs designed, 90 amplified the expected products and 43 showed polymorphism in 30 individuals tested. The number of alleles per locus ranged from 2 to 26 with an average of 7.0 alleles, and the observed and expected heterozygosities varied from 0.067 to 1.000 and from 0.066 to 0.959, respectively. These new EST-derived microsatellite markers would provide sufficient polymorphism for population genetic studies and genome mapping of this sea cucumber species.

  3. Identification of genes differentially expressed by calorie restriction in the rotifer (Brachionus plicatilis).

    PubMed

    Oo, Aung Kyaw Swar; Kaneko, Gen; Hirayama, Makoto; Kinoshita, Shigeharu; Watabe, Shugo

    2010-01-01

    A monogonont rotifer Brachionus plicatilis has been widely used as a model organism for physiological, ecological studies and for ecotoxicology. Because of the availability of parthenogenetic mode of reproduction as well as its versatility to be used as live food in aquaculture, the population dynamic studies using the rotifer have become more important and acquired the priority over those using other species. Although many studies have been conducted to identify environmental factors that influence rotifer populations, the molecular mechanisms involved still remain to be elucidated. In this study, gene(s) differentially expressed by calorie restriction in the rotifer was analyzed, where a calorie-restricted group was fed 3 h day(-1) and a well-fed group fed ad libitum. A subtracted cDNA library from the calorie-restricted rotifer was constructed using suppression subtractive hybridization (SSH). One hundred sixty-three expressed sequence tags (ESTs) were identified, which included 109 putative genes with a high identity to known genes in the publicly available database as well as 54 unknown ESTs. After assembling, a total of 38 different genes were obtained among 109 ESTs. Further validation of expression by semi-quantitative reverse transcription-PCR showed that 29 out of the 38 genes obtained by SSH were up regulated by calorie restriction.

  4. tropiTree: An NGS-Based EST-SSR Resource for 24 Tropical Tree Species

    PubMed Central

    Russell, Joanne R.; Hedley, Peter E.; Cardle, Linda; Dancey, Siobhan; Morris, Jenny; Booth, Allan; Odee, David; Mwaura, Lucy; Omondi, William; Angaine, Peter; Machua, Joseph; Muchugi, Alice; Milne, Iain; Kindt, Roeland; Jamnadass, Ramni; Dawson, Ian K.

    2014-01-01

    The development of genetic tools for non-model organisms has been hampered by cost, but advances in next-generation sequencing (NGS) have created new opportunities. In ecological research, this raises the prospect for developing molecular markers to simultaneously study important genetic processes such as gene flow in multiple non-model plant species within complex natural and anthropogenic landscapes. Here, we report the use of bar-coded multiplexed paired-end Illumina NGS for the de novo development of expressed sequence tag-derived simple sequence repeat (EST-SSR) markers at low cost for a range of 24 tree species. Each chosen tree species is important in complex tropical agroforestry systems where little is currently known about many genetic processes. An average of more than 5,000 EST-SSRs was identified for each of the 24 sequenced species, whereas prior to analysis 20 of the species had fewer than 100 nucleotide sequence citations. To make results available to potential users in a suitable format, we have developed an open-access, interactive online database, tropiTree (http://bioinf.hutton.ac.uk/tropiTree), which has a range of visualisation and search facilities, and which is a model for the efficient presentation and application of NGS data. PMID:25025376

  5. Curation of microarray oligonucleotides and corresponding ESTs/cDNAs used for gene expression analysis in zebra finches.

    PubMed

    Lovell, Peter V; Huizinga, Nicole A; Getachew, Abel; Mees, Brianna; Friedrich, Samantha R; Wirthlin, Morgan; Mello, Claudio V

    2018-05-18

    Zebra finches are a major model organism for investigating mechanisms of vocal learning, a trait that enables spoken language in humans. The development of cDNA collections with expressed sequence tags (ESTs) and microarrays has allowed for extensive molecular characterizations of circuitry underlying vocal learning and production. However, poor database curation can lead to errors in transcriptome and bioinformatics analyses, limiting the impact of these resources. Here we used genomic alignments and synteny analysis for orthology verification to curate and reannotate ~ 35% of the oligonucleotides and corresponding ESTs/cDNAs that make-up Agilent microarrays for gene expression analysis in finches. We found that: (1) 5475 out of 43,084 oligos (a) failed to align to the zebra finch genome, (b) aligned to multiple loci, or (c) aligned to Chr_un only, and thus need to be flagged until a better genome assembly is available, or (d) reflect cloning artifacts; (2) Out of 9635 valid oligos examined further, 3120 were incorrectly named, including 1533 with no known orthologs; and (3) 2635 oligos required name update. The resulting curated dataset provides a reference for correcting gene identification errors in previous finch microarrays studies, and avoiding such errors in future studies.

  6. Evaluation of anonymous and expressed sequence tag derived polymorphic microsatellite markers in the tobacco budworm Heliothis virescens (Lepidoptera: noctuidae)

    USDA-ARS?s Scientific Manuscript database

    Polymorphic genetic markers were identified and characterized using a partial genomic library of Heliothis virescens enriched for simple sequence repeats (SSR) and nucleotide sequences of expressed sequence tags (EST). Nucleotide sequences of 192 clones from the partial genomic library yielded 147 u...

  7. Sequence tagging reveals unexpected modifications in toxicoproteomics

    PubMed Central

    Dasari, Surendra; Chambers, Matthew C.; Codreanu, Simona G.; Liebler, Daniel C.; Collins, Ben C.; Pennington, Stephen R.; Gallagher, William M.; Tabb, David L.

    2010-01-01

    Toxicoproteomic samples are rich in posttranslational modifications (PTMs) of proteins. Identifying these modifications via standard database searching can incur significant performance penalties. Here we describe the latest developments in TagRecon, an algorithm that leverages inferred sequence tags to identify modified peptides in toxicoproteomic data sets. TagRecon identifies known modifications more effectively than the MyriMatch database search engine. TagRecon outperformed state of the art software in recognizing unanticipated modifications from LTQ, Orbitrap, and QTOF data sets. We developed user-friendly software for detecting persistent mass shifts from samples. We follow a three-step strategy for detecting unanticipated PTMs in samples. First, we identify the proteins present in the sample with a standard database search. Next, identified proteins are interrogated for unexpected PTMs with a sequence tag-based search. Finally, additional evidence is gathered for the detected mass shifts with a refinement search. Application of this technology on toxicoproteomic data sets revealed unintended cross-reactions between proteins and sample processing reagents. Twenty five proteins in rat liver showed signs of oxidative stress when exposed to potentially toxic drugs. These results demonstrate the value of mining toxicoproteomic data sets for modifications. PMID:21214251

  8. The characterization of a new set of EST-derived simple sequence repeat (SSR) markers as a resource for the genetic analysis of Phaseolus vulgaris

    PubMed Central

    2011-01-01

    Background Over recent years, a growing effort has been made to develop microsatellite markers for the genomic analysis of the common bean (Phaseolus vulgaris) to broaden the knowledge of the molecular genetic basis of this species. The availability of large sets of expressed sequence tags (ESTs) in public databases has given rise to an expedient approach for the identification of SSRs (Simple Sequence Repeats), specifically EST-derived SSRs. In the present work, a battery of new microsatellite markers was obtained from a search of the Phaseolus vulgaris EST database. The diversity, degree of transferability and polymorphism of these markers were tested. Results From 9,583 valid ESTs, 4,764 had microsatellite motifs, from which 377 were used to design primers, and 302 (80.11%) showed good amplification quality. To analyze transferability, a group of 167 SSRs were tested, and the results showed that they were 82% transferable across at least one species. The highest amplification rates were observed between the species from the Phaseolus (63.7%), Vigna (25.9%), Glycine (19.8%), Medicago (10.2%), Dipterix (6%) and Arachis (1.8%) genera. The average PIC (Polymorphism Information Content) varied from 0.53 for genomic SSRs to 0.47 for EST-SSRs, and the average number of alleles per locus was 4 and 3, respectively. Among the 315 newly tested SSRs in the BJ (BAT93 X Jalo EEP558) population, 24% (76) were polymorphic. The integration of these segregant loci into a framework map composed of 123 previously obtained SSR markers yielded a total of 199 segregant loci, of which 182 (91.5%) were mapped to 14 linkage groups, resulting in a map length of 1,157 cM. Conclusions A total of 302 newly developed EST-SSR markers, showing good amplification quality, are available for the genetic analysis of Phaseolus vulgaris. These markers showed satisfactory rates of transferability, especially between species that have great economic and genomic values. Their diversity was comparable to genomic SSRs, and they were incorporated in the common bean reference genetic map, which constitutes an important contribution to and advance in Phaseolus vulgaris genomic research. PMID:21554695

  9. Expressed sequence tags (ESTs) from immune tissues of turbot (Scophthalmus maximus) challenged with pathogens

    PubMed Central

    Pardo, Belén G; Fernández, Carlos; Millán, Adrián; Bouza, Carmen; Vázquez-López, Araceli; Vera, Manuel; Alvarez-Dios, José A; Calaza, Manuel; Gómez-Tato, Antonio; Vázquez, María; Cabaleiro, Santiago; Magariños, Beatriz; Lemos, Manuel L; Leiro, José M; Martínez, Paulino

    2008-01-01

    Background The turbot (Scophthalmus maximus; Scophthalmidae; Pleuronectiformes) is a flatfish species of great relevance for marine aquaculture in Europe. In contrast to other cultured flatfish, very few genomic resources are available in this species. Aeromonas salmonicida and Philasterides dicentrarchi are two pathogens that affect turbot culture causing serious economic losses to the turbot industry. Little is known about the molecular mechanisms for disease resistance and host-pathogen interactions in this species. In this work, thousands of ESTs for functional genomic studies and potential markers linked to ESTs for mapping (microsatellites and single nucleotide polymorphisms (SNPs)) are provided. This information enabled us to obtain a preliminary view of regulated genes in response to these pathogens and it constitutes the basis for subsequent and more accurate microarray analysis. Results A total of 12584 cDNAs partially sequenced from three different cDNA libraries of turbot (Scophthalmus maximus) infected with Aeromonas salmonicida, Philasterides dicentrarchi and from healthy fish were analyzed. Three immune-relevant tissues (liver, spleen and head kidney) were sampled at several time points in the infection process for library construction. The sequences were processed into 9256 high-quality sequences, which constituted the source for the turbot EST database. Clustering and assembly of these sequences, revealed 3482 different putative transcripts, 1073 contigs and 2409 singletons. BLAST searches with public databases detected significant similarity (e-value ≤ 1e-5) in 1766 (50.7%) sequences and 816 of them (23.4%) could be functionally annotated. Two hundred three of these genes (24.9%), encoding for defence/immune-related proteins, were mostly identified for the first time in turbot. Some ESTs showed significant differences in the number of transcripts when comparing the three libraries, suggesting regulation in response to these pathogens. A total of 191 microsatellites, with 104 having sufficient flanking sequences for primer design, and 1158 putative SNPs were identified from these EST resources in turbot. Conclusion A collection of 9256 high-quality ESTs was generated representing 3482 unique turbot sequences. A large proportion of defence/immune-related genes were identified, many of them regulated in response to specific pathogens. Putative microsatellites and SNPs were identified. These genome resources constitute the basis to develop a microarray for functional genomics studies and marker validation for genetic linkage and QTL analysis in turbot. PMID:18817567

  10. The First Molecular Identification of an Olive Collection Applying Standard Simple Sequence Repeats and Novel Expressed Sequence Tag Markers.

    PubMed

    Mousavi, Soraya; Mariotti, Roberto; Regni, Luca; Nasini, Luigi; Bufacchi, Marina; Pandolfi, Saverio; Baldoni, Luciana; Proietti, Primo

    2017-01-01

    Germplasm collections of tree crop species represent fundamental tools for conservation of diversity and key steps for its characterization and evaluation. For the olive tree, several collections were created all over the world, but only few of them have been fully characterized and molecularly identified. The olive collection of Perugia University (UNIPG), established in the years' 60, represents one of the first attempts to gather and safeguard olive diversity, keeping together cultivars from different countries. In the present study, a set of 370 olive trees previously uncharacterized was screened with 10 standard simple sequence repeats (SSRs) and nine new EST-SSR markers, to correctly and thoroughly identify all genotypes, verify their representativeness of the entire cultivated olive variation, and validate the effectiveness of new markers in comparison to standard genotyping tools. The SSR analysis revealed the presence of 59 genotypes, corresponding to 72 well known cultivars, 13 of them resulting exclusively present in this collection. The new EST-SSRs have shown values of diversity parameters quite similar to those of best standard SSRs. When compared to hundreds of Mediterranean cultivars, the UNIPG olive accessions were splitted into the three main populations (East, Center and West Mediterranean), confirming that the collection has a good representativeness of the entire olive variability. Furthermore, Bayesian analysis, performed on the 59 genotypes of the collection by the use of both sets of markers, have demonstrated their splitting into four clusters, with a well balanced membership obtained by EST respect to standard SSRs. The new OLEST ( Olea expressed sequence tags) SSR markers resulted as effective as the best standard markers. The information obtained from this study represents a high valuable tool for ex situ conservation and management of olive genetic resources, useful to build a common database from worldwide olive cultivar collections, also based on recently developed markers.

  11. Exploiting rice-sorghum synteny for targeted development of EST-SSRs to enrich the sorghum genetic linkage map.

    PubMed

    Ramu, P; Kassahun, B; Senthilvel, S; Ashok Kumar, C; Jayashree, B; Folkertsma, R T; Reddy, L Ananda; Kuruvinashetti, M S; Haussmann, B I G; Hash, C T

    2009-11-01

    The sequencing and detailed comparative functional analysis of genomes of a number of select botanical models open new doors into comparative genomics among the angiosperms, with potential benefits for improvement of many orphan crops that feed large populations. In this study, a set of simple sequence repeat (SSR) markers was developed by mining the expressed sequence tag (EST) database of sorghum. Among the SSR-containing sequences, only those sharing considerable homology with rice genomic sequences across the lengths of the 12 rice chromosomes were selected. Thus, 600 SSR-containing sorghum EST sequences (50 homologous sequences on each of the 12 rice chromosomes) were selected, with the intention of providing coverage for corresponding homologous regions of the sorghum genome. Primer pairs were designed and polymorphism detection ability was assessed using parental pairs of two existing sorghum mapping populations. About 28% of these new markers detected polymorphism in this 4-entry panel. A subset of 55 polymorphic EST-derived SSR markers were mapped onto the existing skeleton map of a recombinant inbred population derived from cross N13 x E 36-1, which is segregating for Striga resistance and the stay-green component of terminal drought tolerance. These new EST-derived SSR markers mapped across all 10 sorghum linkage groups, mostly to regions expected based on prior knowledge of rice-sorghum synteny. The ESTs from which these markers were derived were then mapped in silico onto the aligned sorghum genome sequence, and 88% of the best hits corresponded to linkage-based positions. This study demonstrates the utility of comparative genomic information in targeted development of markers to fill gaps in linkage maps of related crop species for which sufficient genomic tools are not available.

  12. Identification of new stress-induced microRNA and their targets in wheat using computational approach.

    PubMed

    Pandey, Bharati; Gupta, Om Prakash; Pandey, Dev Mani; Sharma, Indu; Sharma, Pradeep

    2013-05-01

    MicroRNAs (miRNAs) are a class of short endogenous non-coding small RNA molecules of about 18-22 nucleotides in length. Their main function is to downregulate gene expression in different manners like translational repression, mRNA cleavage and epigenetic modification. Computational predictions have raised the number of miRNAs in wheat significantly using an EST based approach. Hence, a combinatorial approach which is amalgamation of bioinformatics software and perl script was used to identify new miRNA to add to the growing database of wheat miRNA. Identification of miRNAs was initiated by mining the EST (Expressed Sequence Tags) database available at National Center for Biotechnology Information. In this investigation, 4677 mature microRNA sequences belonging to 50 miRNA families from different plant species were used to predict miRNA in wheat. A total of five abiotic stress-responsive new miRNAs were predicted and named Ta-miR5653, Ta-miR855, Ta-miR819k, Ta-miR3708 and Ta-miR5156. In addition, four previously identified miRNA, i.e., Ta-miR1122, miR1117, Ta-miR1134 and Ta-miR1133 were predicted in newly identified EST sequence and 14 potential target genes were subsequently predicted, most of which seems to encode ubiquitin carrier protein, serine/threonine protein kinase, 40S ribosomal protein, F-box/kelch-repeat protein, BTB/POZ domain-containing protein, transcription factors which are involved in growth, development, metabolism and stress response. Our result has increased the number of miRNAs in wheat, which should be useful for further investigation into the biological functions and evolution of miRNAs in wheat and other plant species.

  13. Analysis of the Nicotiana tabacum Stigma/Style Transcriptome Reveals Gene Expression Differences between Wet and Dry Stigma Species1[W][OA

    PubMed Central

    Quiapim, Andréa C.; Brito, Michael S.; Bernardes, Luciano A.S.; daSilva, Idalete; Malavazi, Iran; DePaoli, Henrique C.; Molfetta-Machado, Jeanne B.; Giuliatti, Silvana; Goldman, Gustavo H.; Goldman, Maria Helena S.

    2009-01-01

    The success of plant reproduction depends on pollen-pistil interactions occurring at the stigma/style. These interactions vary depending on the stigma type: wet or dry. Tobacco (Nicotiana tabacum) represents a model of wet stigma, and its stigmas/styles express genes to accomplish the appropriate functions. For a large-scale study of gene expression during tobacco pistil development and preparation for pollination, we generated 11,216 high-quality expressed sequence tags (ESTs) from stigmas/styles and created the TOBEST database. These ESTs were assembled in 6,177 clusters, from which 52.1% are pistil transcripts/genes of unknown function. The 21 clusters with the highest number of ESTs (putative higher expression levels) correspond to genes associated with defense mechanisms or pollen-pistil interactions. The database analysis unraveled tobacco sequences homologous to the Arabidopsis (Arabidopsis thaliana) genes involved in specifying pistil identity or determining normal pistil morphology and function. Additionally, 782 independent clusters were examined by macroarray, revealing 46 stigma/style preferentially expressed genes. Real-time reverse transcription-polymerase chain reaction experiments validated the pistil-preferential expression for nine out of 10 genes tested. A search for these 46 genes in the Arabidopsis pistil data sets demonstrated that only 11 sequences, with putative equivalent molecular functions, are expressed in this dry stigma species. The reverse search for the Arabidopsis pistil genes in the TOBEST exposed a partial overlap between these dry and wet stigma transcriptomes. The TOBEST represents the most extensive survey of gene expression in the stigmas/styles of wet stigma plants, and our results indicate that wet and dry stigmas/styles express common as well as distinct genes in preparation for the pollination process. PMID:19052150

  14. MoccaDB - an integrative database for functional, comparative and diversity studies in the Rubiaceae family

    PubMed Central

    Plechakova, Olga; Tranchant-Dubreuil, Christine; Benedet, Fabrice; Couderc, Marie; Tinaut, Alexandra; Viader, Véronique; De Block, Petra; Hamon, Perla; Campa, Claudine; de Kochko, Alexandre; Hamon, Serge; Poncet, Valérie

    2009-01-01

    Background In the past few years, functional genomics information has been rapidly accumulating on Rubiaceae species and especially on those belonging to the Coffea genus (coffee trees). An increasing number of expressed sequence tag (EST) data and EST- or genomic-derived microsatellite markers have been generated, together with Conserved Ortholog Set (COS) markers. This considerably facilitates comparative genomics or map-based genetic studies through the common use of orthologous loci across different species. Similar genomic information is available for e.g. tomato or potato, members of the Solanaceae family. Since both Rubiaceae and Solanaceae belong to the Euasterids I (lamiids) integration of information on genetic markers would be possible and lead to more efficient analyses and discovery of key loci involved in important traits such as fruit development, quality, and maturation, or adaptation. Our goal was to develop a comprehensive web data source for integrated information on validated orthologous markers in Rubiaceae. Description MoccaDB is an online MySQL-PHP driven relational database that houses annotated and/or mapped microsatellite markers in Rubiaceae. In its current release, the database stores 638 markers that have been defined on 259 ESTs and 379 genomic sequences. Marker information was retrieved from 11 published works, and completed with original data on 132 microsatellite markers validated in our laboratory. DNA sequences were derived from three Coffea species/hybrids. Microsatellite markers were checked for similarity, in vitro tested for cross-amplification and diversity/polymorphism status in up to 38 Rubiaceae species belonging to the Cinchonoideae and Rubioideae subfamilies. Functional annotation was provided and some markers associated with described metabolic pathways were also integrated. Users can search the database for marker, sequence, map or diversity information through multi-option query forms. The retrieved data can be browsed and downloaded, along with protocols used, using a standard web browser. MoccaDB also integrates bioinformatics tools (CMap viewer and local BLAST) and hyperlinks to related external data sources (NCBI GenBank and PubMed, SOL Genomic Network database). Conclusion We believe that MoccaDB will be extremely useful for all researchers working in the areas of comparative and functional genomics and molecular evolution, in general, and population analysis and association mapping of Rubiaceae and Solanaceae species, in particular. PMID:19788737

  15. Sequence analysis reveals genomic factors affecting EST-SSR primer performance and polymorphism

    USDA-ARS?s Scientific Manuscript database

    Search for simple sequence repeat (SSR) motifs and design of flanking primers in expressed sequence tag (EST) sequences can be easily done at a large scale using bioinformatics programs. However, failed amplification and/or detection, along with lack of polymorphism, is often seen among randomly sel...

  16. Comparison of transcripts in Phalaenopsis bellina and Phalaenopsis equestris (Orchidaceae) flowers to deduce monoterpene biosynthesis pathway

    PubMed Central

    Hsiao, Yu-Yun; Tsai, Wen-Chieh; Kuoh, Chang-Sheng; Huang, Tian-Hsiang; Wang, Hei-Chia; Wu, Tian-Shung; Leu, Yann-Lii; Chen, Wen-Huei; Chen, Hong-Hwa

    2006-01-01

    Background Floral scent is one of the important strategies for ensuring fertilization and for determining seed or fruit set. Research on plant scents has hampered mainly by the invisibility of this character, its dynamic nature, and complex mixtures of components that are present in very small quantities. Most progress in scent research, as in other areas of plant biology, has come from the use of molecular and biochemical techniques. Although volatile components have been identified in several orchid species, the biosynthetic pathways of orchid flower fragrance are far from understood. We investigated how flower fragrance was generated in certain Phalaenopsis orchids by determining the chemical components of the floral scent, identifying floral expressed-sequence-tags (ESTs), and deducing the pathways of floral scent biosynthesis in Phalaneopsis bellina by bioinformatics analysis. Results The main chemical components in the P. bellina flower were shown by gas chromatography-mass spectrometry to be monoterpenoids, benzenoids and phenylpropanoids. The set of floral scent producing enzymes in the biosynthetic pathway from glyceraldehyde-3-phosphate (G3P) to geraniol and linalool were recognized through data mining of the P. bellina floral EST database (dbEST). Transcripts preferentially expressed in P. bellina were distinguished by comparing the scent floral dbEST to that of a scentless species, P. equestris, and included those encoding lipoxygenase, epimerase, diacylglycerol kinase and geranyl diphosphate synthase. In addition, EST filtering results showed that transcripts encoding signal transduction and Myb transcription factors and methyltransferase, in addition to those for scent biosynthesis, were detected by in silico hybridization of the P. bellina unigene database against those of the scentless species, rice and Arabidopsis. Altogether, we pinpointed 66% of the biosynthetic steps from G3P to geraniol, linalool and their derivatives. Conclusion This systems biology program combined chemical analysis, genomics and bioinformatics to elucidate the scent biosynthesis pathway and identify the relevant genes. It integrates the forward and reverse genetic approaches to knowledge discovery by which researchers can study non-model plants. PMID:16836766

  17. Semantic Annotation of Complex Text Structures in Problem Reports

    NASA Technical Reports Server (NTRS)

    Malin, Jane T.; Throop, David R.; Fleming, Land D.

    2011-01-01

    Text analysis is important for effective information retrieval from databases where the critical information is embedded in text fields. Aerospace safety depends on effective retrieval of relevant and related problem reports for the purpose of trend analysis. The complex text syntax in problem descriptions has limited statistical text mining of problem reports. The presentation describes an intelligent tagging approach that applies syntactic and then semantic analysis to overcome this problem. The tags identify types of problems and equipment that are embedded in the text descriptions. The power of these tags is illustrated in a faceted searching and browsing interface for problem report trending that combines automatically generated tags with database code fields and temporal information.

  18. EST databases and web tools for EST projects.

    PubMed

    Shen, Yao-Qing; O'Brien, Emmet; Koski, Liisa; Lang, B Franz; Burger, Gertraud

    2009-01-01

    This chapter outlines key considerations for constructing and implementing an EST database. Instead of showing the technological details step by step, emphasis is put on the design of an EST database suited to the specific needs of EST projects and how to choose the most suitable tools. Using TBestDB as an example, we illustrate the essential factors to be considered for database construction and the steps for data population and annotation. This process employs technologies such as PostgreSQL, Perl, and PHP to build the database and interface, and tools such as AutoFACT for data processing and annotation. We discuss these in comparison to other available technologies and tools, and explain the reasons for our choices.

  19. Identification of differentially expressed genes in cucumber (Cucumis sativus L.) root under waterlogging stress by digital gene expression profile.

    PubMed

    Qi, Xiao-Hua; Xu, Xue-Wen; Lin, Xiao-Jian; Zhang, Wen-Jie; Chen, Xue-Hao

    2012-03-01

    High-throughput tag-sequencing (Tag-seq) analysis based on the Solexa Genome Analyzer platform was applied to analyze the gene expression profiling of cucumber plant at 5 time points over a 24h period of waterlogging treatment. Approximately 5.8 million total clean sequence tags per library were obtained with 143013 distinct clean tag sequences. Approximately 23.69%-29.61% of the distinct clean tags were mapped unambiguously to the unigene database, and 53.78%-60.66% of the distinct clean tags were mapped to the cucumber genome database. Analysis of the differentially expressed genes revealed that most of the genes were down-regulated in the waterlogging stages, and the differentially expressed genes mainly linked to carbon metabolism, photosynthesis, reactive oxygen species generation/scavenging, and hormone synthesis/signaling. Finally, quantitative real-time polymerase chain reaction using nine genes independently verified the tag-mapped results. This present study reveals the comprehensive mechanisms of waterlogging-responsive transcription in cucumber. Copyright © 2011 Elsevier Inc. All rights reserved.

  20. Application of Cydia pomonella expressed sequence tags: identification and expression of three general odorant binding proteins in codling moth

    USDA-ARS?s Scientific Manuscript database

    The codling moth, Cydia pomonella, is one of the most important pests of pome fruits in the world, yet the molecular genetics and physiology of this insect remains poorly understood. A combined assembly of 8340 expressed sequence tags (ESTs) was generated from Roche 454 GS-FLX sequencing of 8 tissu...

  1. Fish and chips: Various methodologies demonstrate utility of a 16,006-gene salmonid microarray

    PubMed Central

    von Schalburg, Kristian R; Rise, Matthew L; Cooper, Glenn A; Brown, Gordon D; Gibbs, A Ross; Nelson, Colleen C; Davidson, William S; Koop, Ben F

    2005-01-01

    Background We have developed and fabricated a salmonid microarray containing cDNAs representing 16,006 genes. The genes spotted on the array have been stringently selected from Atlantic salmon and rainbow trout expressed sequence tag (EST) databases. The EST databases presently contain over 300,000 sequences from over 175 salmonid cDNA libraries derived from a wide variety of tissues and different developmental stages. In order to evaluate the utility of the microarray, a number of hybridization techniques and screening methods have been developed and tested. Results We have analyzed and evaluated the utility of a microarray containing 16,006 (16K) salmonid cDNAs in a variety of potential experimental settings. We quantified the amount of transcriptome binding that occurred in cross-species, organ complexity and intraspecific variation hybridization studies. We also developed a methodology to rapidly identify and confirm the contents of a bacterial artificial chromosome (BAC) library containing Atlantic salmon genomic DNA. Conclusion We validate and demonstrate the usefulness of the 16K microarray over a wide range of teleosts, even for transcriptome targets from species distantly related to salmonids. We show the potential of the use of the microarray in a variety of experimental settings through hybridization studies that examine the binding of targets derived from different organs and tissues. Intraspecific variation in transcriptome expression is evaluated and discussed. Finally, BAC hybridizations are demonstrated as a rapid and accurate means to identify gene content. PMID:16164747

  2. The European general thoracic surgery database project.

    PubMed

    Falcoz, Pierre Emmanuel; Brunelli, Alessandro

    2014-05-01

    The European Society of Thoracic Surgeons (ESTS) Database is a free registry created by ESTS in 2001. The current online version was launched in 2007. It runs currently on a Dendrite platform with extensive data security and frequent backups. The main features are a specialty-specific, procedure-specific, prospectively maintained, periodically audited and web-based electronic database, designed for quality control and performance monitoring, which allows for the collection of all general thoracic procedures. Data collection is the "backbone" of the ESTS database. It includes many risk factors, processes of care and outcomes, which are specially designed for quality control and performance audit. The user can download and export their own data and use them for internal analyses and quality control audits. The ESTS database represents the gold standard of clinical data collection for European General Thoracic Surgery. Over the past years, the ESTS database has achieved many accomplishments. In particular, the database hit two major milestones: it now includes more than 235 participating centers and 70,000 surgical procedures. The ESTS database is a snapshot of surgical practice that aims at improving patient care. In other words, data capture should become integral to routine patient care, with the final objective of improving quality of care within Europe.

  3. Development of a EST dataset and characterization of EST-SSRs in a traditional Chinese medicinal plant, Epimedium sagittatum (Sieb. Et Zucc.) Maxim

    PubMed Central

    2010-01-01

    Background Epimedium sagittatum (Sieb. Et Zucc.) Maxim, a traditional Chinese medicinal plant species, has been used extensively as genuine medicinal materials. Certain Epimedium species are endangered due to commercial overexploition, while sustainable application studies, conservation genetics, systematics, and marker-assisted selection (MAS) of Epimedium is less-studied due to the lack of molecular markers. Here, we report a set of expressed sequence tags (ESTs) and simple sequence repeats (SSRs) identified in these ESTs for E. sagittatum. Results cDNAs of E. sagittatum are sequenced using 454 GS-FLX pyrosequencing technology. The raw reads are cleaned and assembled into a total of 76,459 consensus sequences comprising of 17,231 contigs and 59,228 singlets. About 38.5% (29,466) of the consensus sequences significantly match to the non-redundant protein database (E-value < 1e-10), 22,295 of which are further annotated using Gene Ontology (GO) terms. A total of 2,810 EST-SSRs is identified from the Epimedium EST dataset. Trinucleotide SSR is the dominant repeat type (55.2%) followed by dinucleotide (30.4%), tetranuleotide (7.3%), hexanucleotide (4.9%), and pentanucleotide (2.2%) SSR. The dominant repeat motif is AAG/CTT (23.6%) followed by AG/CT (19.3%), ACC/GGT (11.1%), AT/AT (7.5%), and AAC/GTT (5.9%). Thirty-two SSR-ESTs are randomly selected and primer pairs are synthesized for testing the transferability across 52 Epimedium species. Eighteen primer pairs (85.7%) could be successfully transferred to Epimedium species and sixteen of those show high genetic diversity with 0.35 of observed heterozygosity (Ho) and 0.65 of expected heterozygosity (He) and high number of alleles per locus (11.9). Conclusion A large EST dataset with a total of 76,459 consensus sequences is generated, aiming to provide sequence information for deciphering secondary metabolism, especially for flavonoid pathway in Epimedium. A total of 2,810 EST-SSRs is identified from EST dataset and ~1580 EST-SSR markers are transferable. E. sagittatum EST-SSR transferability to the major Epimedium germplasm is up to 85.7%. Therefore, this EST dataset and EST-SSRs will be a powerful resource for further studies such as taxonomy, molecular breeding, genetics, genomics, and secondary metabolism in Epimedium species. PMID:20141623

  4. The Human EST Ontology Explorer: a tissue-oriented visualization system for ontologies distribution in human EST collections.

    PubMed

    Merelli, Ivan; Caprera, Andrea; Stella, Alessandra; Del Corvo, Marcello; Milanesi, Luciano; Lazzari, Barbara

    2009-10-15

    The NCBI dbEST currently contains more than eight million human Expressed Sequenced Tags (ESTs). This wide collection represents an important source of information for gene expression studies, provided it can be inspected according to biologically relevant criteria. EST data can be browsed using different dedicated web resources, which allow to investigate library specific gene expression levels and to make comparisons among libraries, highlighting significant differences in gene expression. Nonetheless, no tool is available to examine distributions of quantitative EST collections in Gene Ontology (GO) categories, nor to retrieve information concerning library-dependent EST involvement in metabolic pathways. In this work we present the Human EST Ontology Explorer (HEOE) http://www.itb.cnr.it/ptp/human_est_explorer, a web facility for comparison of expression levels among libraries from several healthy and diseased tissues. The HEOE provides library-dependent statistics on the distribution of sequences in the GO Direct Acyclic Graph (DAG) that can be browsed at each GO hierarchical level. The tool is based on large-scale BLAST annotation of EST sequences. Due to the huge number of input sequences, this BLAST analysis was performed with the aid of grid computing technology, which is particularly suitable to address data parallel task. Relying on the achieved annotation, library-specific distributions of ESTs in the GO Graph were inferred. A pathway-based search interface was also implemented, for a quick evaluation of the representation of libraries in metabolic pathways. EST processing steps were integrated in a semi-automatic procedure that relies on Perl scripts and stores results in a MySQL database. A PHP-based web interface offers the possibility to simultaneously visualize, retrieve and compare data from the different libraries. Statistically significant differences in GO categories among user selected libraries can also be computed. The HEOE provides an alternative and complementary way to inspect EST expression levels with respect to approaches currently offered by other resources. Furthermore, BLAST computation on the whole human EST dataset was a suitable test of grid scalability in the context of large-scale bioinformatics analysis. The HEOE currently comprises sequence analysis from 70 non-normalized libraries, representing a comprehensive overview on healthy and unhealthy tissues. As the analysis procedure can be easily applied to other libraries, the number of represented tissues is intended to increase.

  5. A study of alternative splicing in the pig

    PubMed Central

    2010-01-01

    Background Since at least half of the genes in mammalian genomes are subjected to alternative splicing, alternative pre-mRNA splicing plays an important contribution to the complexity of the mammalian proteome. Expressed sequence tags (ESTs) provide evidence of a great number of possible alternative isoforms. With the EST resource for the domestic pig now containing more than one million porcine ESTs, it is possible to identify alternative splice forms of the individual transcripts in this species from the EST data with some confidence. Results The pig EST data generated by the Sino-Danish Pig Genome project has been assembled with publicly available ESTs and made available in the PigEST database. Using the Distiller package 2,515 EST clusters with candidate alternative isoforms were identified in the EST data with high confidence. In agreement with general observations in human and mouse, we find putative splice variants in about 30% of the contigs with more than 50 ESTs. Based on the criteria that a minimum of two EST sequences confirmed each splice event, a list of 100 genes with the most distinct tissue-specific alternative splice events was generated from the list of candidates. To confirm the tissue specificity of the splice events, 10 genes with functional annotation were randomly selected from which 16 individual splice events were chosen for experimental verification by quantitative PCR (qPCR). Six genes were shown to have tissue specific alternatively spliced transcripts with expression patterns matching those of the EST data. The remaining four genes had tissue-restricted expression of alternative spliced transcripts. Five out of the 16 splice events that were experimentally verified were found to be putative pig specific. Conclusions In accordance with human and rodent studies we estimate that approximately 30% of the porcine genes undergo alternative splicing. We found a good correlation between EST predicted tissue-specificity and experimentally validated splice events in different porcine tissue. This study indicates that a cluster size of around 50 ESTs is optimal for in silico detection of alternative splicing. Although based on a limited number of splice events, the study supports the notion that alternative splicing could have an important impact on species differentiation since 31% of the splice events studied appears to be species specific. PMID:20444244

  6. Annotation and sequence diversity of transposable elements in common bean (Phaseolus vulgaris).

    PubMed

    Gao, Dongying; Abernathy, Brian; Rohksar, Daniel; Schmutz, Jeremy; Jackson, Scott A

    2014-01-01

    Common bean (Phaseolus vulgaris) is an important legume crop grown and consumed worldwide. With the availability of the common bean genome sequence, the next challenge is to annotate the genome and characterize functional DNA elements. Transposable elements (TEs) are the most abundant component of plant genomes and can dramatically affect genome evolution and genetic variation. Thus, it is pivotal to identify TEs in the common bean genome. In this study, we performed a genome-wide transposon annotation in common bean using a combination of homology and sequence structure-based methods. We developed a 2.12-Mb transposon database which includes 791 representative transposon sequences and is available upon request or from www.phytozome.org. Of note, nearly all transposons in the database are previously unrecognized TEs. More than 5,000 transposon-related expressed sequence tags (ESTs) were detected which indicates that some transposons may be transcriptionally active. Two Ty1-copia retrotransposon families were found to encode the envelope-like protein which has rarely been identified in plant genomes. Also, we identified an extra open reading frame (ORF) termed ORF2 from 15 Ty3-gypsy families that was located between the ORF encoding the retrotransposase and the 3'LTR. The ORF2 was in opposite transcriptional orientation to retrotransposase. Sequence homology searches and phylogenetic analysis suggested that the ORF2 may have an ancient origin, but its function is not clear. These transposon data provide a useful resource for understanding the genome organization and evolution and may be used to identify active TEs for developing transposon-tagging system in common bean and other related genomes.

  7. Markers and mapping revisited: finding your gene.

    PubMed

    Jones, Neil; Ougham, Helen; Thomas, Howard; Pasakinskiene, Izolda

    2009-01-01

    This paper is an update of our earlier review (Jones et al., 1997, Markers and mapping: we are all geneticists now. New Phytologist 137: 165-177), which dealt with the genetics of mapping, in terms of recombination as the basis of the procedure, and covered some of the first generation of markers, including restriction fragment length polymorphisms (RFLPs), random amplified polymorphic DNA (RAPDs), simple sequence repeats (SSRs) and quantitative trait loci (QTLs). In the intervening decade there have been numerous developments in marker science with many new systems becoming available, which are herein described: cleavage amplification polymorphism (CAP), sequence-specific amplification polymorphism (S-SAP), inter-simple sequence repeat (ISSR), sequence tagged site (STS), sequence characterized amplification region (SCAR), selective amplification of microsatellite polymorphic loci (SAMPL), single nucleotide polymorphism (SNP), expressed sequence tag (EST), sequence-related amplified polymorphism (SRAP), target region amplification polymorphism (TRAP), microarrays, diversity arrays technology (DArT), single-strand conformation polymorphism (SSCP), denaturing gradient gel electrophoresis (DGGE), temperature gradient gel electrophoresis (TGGE) and methylation-sensitive PCR. In addition there has been an explosion of knowledge and databases in the area of genomics and bioinformatics. The number of flowering plant ESTs is c. 19 million and counting, with all the opportunity that this provides for gene-hunting, while the survey of bioinformatics and computer resources points to a rapid growth point for future activities in unravelling and applying the burst of new information on plant genomes. A case study is presented on tracking down a specific gene (stay-green (SGR), a post-transcriptional senescence regulator) using the full suite of mapping tools and comparative mapping resources. We end with a brief speculation on how genome analysis may progress into the future of this highly dynamic arena of plant science.

  8. Peanut gene expression profiling in developing seeds at different reproduction stages during Aspergillus parasiticus infection

    PubMed Central

    Guo, Baozhu; Chen, Xiaoping; Dang, Phat; Scully, Brian T; Liang, Xuanqiang; Holbrook, C Corley; Yu, Jiujiang; Culbreath, Albert K

    2008-01-01

    Background Peanut (Arachis hypogaea L.) is an important crop economically and nutritionally, and is one of the most susceptible host crops to colonization of Aspergillus parasiticus and subsequent aflatoxin contamination. Knowledge from molecular genetic studies could help to devise strategies in alleviating this problem; however, few peanut DNA sequences are available in the public database. In order to understand the molecular basis of host resistance to aflatoxin contamination, a large-scale project was conducted to generate expressed sequence tags (ESTs) from developing seeds to identify resistance-related genes involved in defense response against Aspergillus infection and subsequent aflatoxin contamination. Results We constructed six different cDNA libraries derived from developing peanut seeds at three reproduction stages (R5, R6 and R7) from a resistant and a susceptible cultivated peanut genotypes, 'Tifrunner' (susceptible to Aspergillus infection with higher aflatoxin contamination and resistant to TSWV) and 'GT-C20' (resistant to Aspergillus with reduced aflatoxin contamination and susceptible to TSWV). The developing peanut seed tissues were challenged by A. parasiticus and drought stress in the field. A total of 24,192 randomly selected cDNA clones from six libraries were sequenced. After removing vector sequences and quality trimming, 21,777 high-quality EST sequences were generated. Sequence clustering and assembling resulted in 8,689 unique EST sequences with 1,741 tentative consensus EST sequences (TCs) and 6,948 singleton ESTs. Functional classification was performed according to MIPS functional catalogue criteria. The unique EST sequences were divided into twenty-two categories. A similarity search against the non-redundant protein database available from NCBI indicated that 84.78% of total ESTs showed significant similarity to known proteins, of which 165 genes had been previously reported in peanuts. There were differences in overall expression patterns in different libraries and genotypes. A number of sequences were expressed throughout all of the libraries, representing constitutive expressed sequences. In order to identify resistance-related genes with significantly differential expression, a statistical analysis to estimate the relative abundance (R) was used to compare the relative abundance of each gene transcripts in each cDNA library. Thirty six and forty seven unique EST sequences with threshold of R > 4 from libraries of 'GT-C20' and 'Tifrunner', respectively, were selected for examination of temporal gene expression patterns according to EST frequencies. Nine and eight resistance-related genes with significant up-regulation were obtained in 'GT-C20' and 'Tifrunner' libraries, respectively. Among them, three genes were common in both genotypes. Furthermore, a comparison of our EST sequences with other plant sequences in the TIGR Gene Indices libraries showed that the percentage of peanut EST matched to Arabidopsis thaliana, maize (Zea mays), Medicago truncatula, rapeseed (Brassica napus), rice (Oryza sativa), soybean (Glycine max) and wheat (Triticum aestivum) ESTs ranged from 33.84% to 79.46% with the sequence identity ≥ 80%. These results revealed that peanut ESTs are more closely related to legume species than to cereal crops, and more homologous to dicot than to monocot plant species. Conclusion The developed ESTs can be used to discover novel sequences or genes, to identify resistance-related genes and to detect the differences among alleles or markers between these resistant and susceptible peanut genotypes. Additionally, this large collection of cultivated peanut EST sequences will make it possible to construct microarrays for gene expression studies and for further characterization of host resistance mechanisms. It will be a valuable genomic resource for the peanut community. The 21,777 ESTs have been deposited to the NCBI GenBank database with accession numbers ES702769 to ES724546. PMID:18248674

  9. Gene discovery using next-generation pyrosequencing to develop ESTs for Phalaenopsis orchids

    PubMed Central

    2011-01-01

    Background Orchids are one of the most diversified angiosperms, but few genomic resources are available for these non-model plants. In addition to the ecological significance, Phalaenopsis has been considered as an economically important floriculture industry worldwide. We aimed to use massively parallel 454 pyrosequencing for a global characterization of the Phalaenopsis transcriptome. Results To maximize sequence diversity, we pooled RNA from 10 samples of different tissues, various developmental stages, and biotic- or abiotic-stressed plants. We obtained 206,960 expressed sequence tags (ESTs) with an average read length of 228 bp. These reads were assembled into 8,233 contigs and 34,630 singletons. The unigenes were searched against the NCBI non-redundant (NR) protein database. Based on sequence similarity with known proteins, these analyses identified 22,234 different genes (E-value cutoff, e-7). Assembled sequences were annotated with Gene Ontology, Gene Family and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. Among these annotations, over 780 unigenes encoding putative transcription factors were identified. Conclusion Pyrosequencing was effective in identifying a large set of unigenes from Phalaenopsis. The informative EST dataset we developed constitutes a much-needed resource for discovery of genes involved in various biological processes in Phalaenopsis and other orchid species. These transcribed sequences will narrow the gap between study of model organisms with many genomic resources and species that are important for ecological and evolutionary studies. PMID:21749684

  10. Cloning and characterization of a novel oocyte-specific gene encoding an F-Box protein in rainbow trout (Oncorhynchus mykiss)

    USDA-ARS?s Scientific Manuscript database

    Oocyte-specific genes play critical roles in oogenesis, folliculogenesis and early embryonic development. Through analysis of expressed sequence tags (ESTs) from a rainbow trout oocyte cDNA library, we identified a novel transcript which is represented by multiple ESTs derived only from the oocyte c...

  11. Cloning and characterization of a novel oocyte-specific gene encoding an F-Box protein in rainbow trout (Oncorhynchus mykiss)

    USDA-ARS?s Scientific Manuscript database

    Oocyte-specific genes play critical roles in oogenesis, folliculogenesis and early embryonic development. Through analysis of expressed sequence tags (ESTs) from a rainbow trout oocyte cDNA library, we identified a novel transcript which is represented by ESTs only from the oocyte library. The novel...

  12. LSGermOPA, a custom OPA of 384 EST-derived SNPs for high-throughput lettuce (Lactuca sativa L.) germplasm fingerprinting

    USDA-ARS?s Scientific Manuscript database

    We assessed the genetic diversity and population structure among 148 cultivated lettuce (Lactuca sativa L.) accessions using the high-throughput GoldenGate assay and 384 EST (Expressed Sequence Tag)-derived SNP (single nucleotide polymorphism) markers. A custom OPA (Oligo Pool All), LSGermOPA was fo...

  13. ASFinder: a tool for genome-wide identification of alternatively splicing transcripts from EST-derived sequences.

    PubMed

    Min, Xiang Jia

    2013-01-01

    Expressed Sequence Tags (ESTs) are a rich resource for identifying Alternatively Splicing (AS) genes. The ASFinder webserver is designed to identify AS isoforms from EST-derived sequences. Two approaches are implemented in ASFinder. If no genomic sequences are provided, the server performs a local BLASTN to identify AS isoforms from ESTs having both ends aligned but an internal segment unaligned. Otherwise, ASFinder uses SIM4 to map ESTs to the genome, then the overlapping ESTs that are mapped to the same genomic locus and have internal variable exon/intron boundaries are identified as AS isoforms. The tool is available at http://proteomics.ysu.edu/tools/ASFinder.html.

  14. Mapping genes to human chromosome 19

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Connolly, Sarah

    1996-05-01

    For this project, 22 Expressed Sequence Tags (ESTs) were fine mapped to regions of human chromosome 19. An EST is a short DNA sequence that occurs once in the genome and corresponds to a single expressed gene. {sup 32}P-radiolabeled probes were made by polymerase chain reaction for each EST and hybridized to filters containing a chromosome 19-specific cosmid library. The location of the ESTs on the chromosome was determined by the location of the ordered cosmid to which the EST hybridized. Of the 22 ESTs that were sublocalized, 6 correspond to known genes, and 16 correspond to anonymous genes. Thesemore » localized ESTs may serve as potential candidates for disease genes, as well as markers for future physical mapping.« less

  15. Expressed sequence tag (EST) analysis of the pine wood nematode Bursaphelenchus xylophilus and B. mucronatus.

    PubMed

    Kikuchi, Taisei; Aikawa, Takuya; Kosaka, Hajime; Pritchard, Leighton; Ogura, Nobuo; Jones, John T

    2007-09-01

    Most Bursaphelenchus species feed on fungi that colonise dead or dying trees. However, Bursaphelenchus xylophilus is unique in that in addition to feeding on fungi it has the capacity to be a parasite of live pine trees. We present an analysis of over 13,000 expressed sequence tags (ESTs) from B. xylophilus and, by way of contrast, over 3000 ESTs from a closely related species that does not parasitise plants as readily; B. mucronatus. Four libraries from B. xylophilus, from a variety of life stages including fungal feeding nematodes, nematodes extracted from plants and dauer-like stage nematodes, and one library from B. mucronatus were constructed and used to generate ESTs. Contig analysis showed that the 13,327 B. xylophilus ESTs could be grouped into 2110 contigs and 4377 singletons giving a total of 6487 identified genes. Similarly the 3193 B. mucronatus ESTs yielded a total of 2219 identified genes from 425 contigs and 1794 singletons. A variety of proteins potentially important in the parasitic process of B. xylophilus and B. mucronatus, including plant and fungal cell wall degrading enzymes and a novel gene potentially encoding a expansin-like protein that may disrupt non-covalent bonds in the plant cell wall were identified in the libraries. Additionally several gene candidates potentially involved in dauer entry or maintenance were also identified in the EST dataset. The EST sequences from this study will provide a solid base for future research on the biology, pathogenicity and evolutionary history of this nematode group.

  16. Gene expression profiling via LongSAGE in a non-model plant species: a case study in seeds of Brassica napus

    PubMed Central

    Obermeier, Christian; Hosseini, Bashir; Friedt, Wolfgang; Snowdon, Rod

    2009-01-01

    Background Serial analysis of gene expression (LongSAGE) was applied for gene expression profiling in seeds of oilseed rape (Brassica napus ssp. napus). The usefulness of this technique for detailed expression profiling in a non-model organism was demonstrated for the highly complex, neither fully sequenced nor annotated genome of B. napus by applying a tag-to-gene matching strategy based on Brassica ESTs and the annotated proteome of the closely related model crucifer A. thaliana. Results Transcripts from 3,094 genes were detected at two time-points of seed development, 23 days and 35 days after pollination (DAP). Differential expression showed a shift from gene expression involved in diverse developmental processes including cell proliferation and seed coat formation at 23 DAP to more focussed metabolic processes including storage protein accumulation and lipid deposition at 35 DAP. The most abundant transcripts at 23 DAP were coding for diverse protease inhibitor proteins and proteases, including cysteine proteases involved in seed coat formation and a number of lipid transfer proteins involved in embryo pattern formation. At 35 DAP, transcripts encoding napin, cruciferin and oleosin storage proteins were most abundant. Over both time-points, 18.6% of the detected genes were matched by Brassica ESTs identified by LongSAGE tags in antisense orientation. This suggests a strong involvement of antisense transcript expression in regulatory processes during B. napus seed development. Conclusion This study underlines the potential of transcript tagging approaches for gene expression profiling in Brassica crop species via EST matching to annotated A. thaliana genes. Limits of tag detection for low-abundance transcripts can today be overcome by ultra-high throughput sequencing approaches, so that tag-based gene expression profiling may soon become the method of choice for global expression profiling in non-model species. PMID:19575793

  17. Functional analysis of Pacific oyster (Crassostrea gigas) β-thymosin: Focus on antimicrobial activity.

    PubMed

    Nam, Bo-Hye; Seo, Jung-Kil; Lee, Min Jeong; Kim, Young-Ok; Kim, Dong-Gyun; An, Cheul Min; Park, Nam Gyu

    2015-07-01

    An antimicrobial peptide, ∼5 kDa in size, was isolated and purified in its active form from the mantle of the Pacific oyster Crassostrea gigas by C18 reversed-phase high-performance liquid chromatography. Matrix-assisted laser desorption ionisation time-of-flight analysis revealed 4656.4 Da of the purified and unreduced peptide. A comparison of the N-terminal amino acid sequence of oyster antimicrobial peptide with deduced amino acid sequences in our local expressed sequence tag (EST) database of C. gigas (unpublished data) revealed that the oyster antimicrobial peptide sequence entirely matched the deduced amino acid sequence of an EST clone (HM-8_A04), which was highly homologous with the β-thymosin of other species. The cDNA possessed a 126-bp open reading frame that encoded a protein of 41 amino acids. To confirm the antimicrobial activity of C. gigas β-thymosin, we overexpressed a recombinant β-thymosin (rcgTβ) using a pET22 expression plasmid in an Escherichia coli system. The antimicrobial activity of rcgTβ was evaluated and demonstrated using a bacterial growth inhibition test in both liquid and solid cultures. Copyright © 2015 Elsevier Ltd. All rights reserved.

  18. Sugar transporter genes of the brown planthopper, Nilaparvata lugens: A facilitated glucose/fructose transporter.

    PubMed

    Kikuta, Shingo; Kikawada, Takahiro; Hagiwara-Komoda, Yuka; Nakashima, Nobuhiko; Noda, Hiroaki

    2010-11-01

    The brown planthopper (BPH), Nilaparvata lugens, attacks rice plants and feeds on their phloem sap, which contains large amounts of sugars. The main sugar component of phloem sap is sucrose, a disaccharide composed of glucose and fructose. Sugars appear to be incorporated into the planthopper body by sugar transporters in the midgut. A total of 93 expressed sequence tags (ESTs) for putative sugar transporters were obtained from a BPH EST database, and 18 putative sugar transporter genes (Nlst1-18) were identified. The most abundantly expressed of these genes was Nlst1. This gene has previously been identified in the BPH as the glucose transporter gene NlHT1, which belongs to the major facilitator superfamily. Nlst1, 4, 6, 9, 12, 16, and 18 were highly expressed in the midgut, and Nlst2, 7, 8, 10, 15, 17, and 18 were highly expressed during the embryonic stages. Functional analyses were performed using Xenopus oocytes expressing NlST1 or 6. This showed that NlST6 is a facilitative glucose/fructose transporter that mediates sugar uptake from rice phloem sap in the BPH midgut in a manner similar to NlST1. Copyright © 2010 Elsevier Ltd. All rights reserved.

  19. Identification of differentially expressed genes in the oviduct of two rabbit lines divergently selected for uterine capacity using suppression subtractive hybridization.

    PubMed

    Ballester, M; Castelló, A; Peiró, R; Argente, M J; Santacreu, M A; Folch, J M

    2013-06-01

    Suppressive subtractive hybridization libraries from oviduct at 62 h post-mating of two lines of rabbits divergently selected for uterine capacity were generated to identify differentially expressed genes. A total of 438 singletons and 126 contigs were obtained by cluster assembly and sequence alignment of 704 expressed sequence tags (ESTs), of which 54% showed homology to known proteins of the non-redundant NCBI databases. Differential screening by dot blot validated 71 ESTs, of which 47 showed similarity to known genes. Transcripts of genes were functionally annotated in the molecular function and the biological process gene ontology categories using the BLAST2GO software and were assigned to reproductive developmental process, immune response, amino acid metabolism and degradation, response to stress and apoptosis terms. Finally, three interesting genes, PGR, HSD17B4 and ERO1L, were identified as overexpressed in the low line using RT-qPCR. Our study provides a list of candidate genes that can be useful to understanding the molecular mechanisms underlying the phenotypic differences observed in early embryo survival and development traits. © 2012 The Authors, Animal Genetics © 2012 Stichting International Foundation for Animal Genetics.

  20. Mass fingerprinting of the venom and transcriptome of venom gland of scorpion Centruroides tecomanus.

    PubMed

    Valdez-Velázquez, Laura L; Quintero-Hernández, Verónica; Romero-Gutiérrez, Maria Teresa; Coronas, Fredy I V; Possani, Lourival D

    2013-01-01

    Centruroides tecomanus is a Mexican scorpion endemic of the State of Colima, that causes human fatalities. This communication describes a proteome analysis obtained from milked venom and a transcriptome analysis from a cDNA library constructed from two pairs of venom glands of this scorpion. High perfomance liquid chromatography separation of soluble venom produced 80 fractions, from which at least 104 individual components were identified by mass spectrometry analysis, showing to contain molecular masses from 259 to 44,392 Da. Most of these components are within the expected molecular masses for Na(+)- and K(+)-channel specific toxic peptides, supporting the clinical findings of intoxication, when humans are stung by this scorpion. From the cDNA library 162 clones were randomly chosen, from which 130 sequences of good quality were identified and were clustered in 28 contigs containing, each, two or more expressed sequence tags (EST) and 49 singlets with only one EST. Deduced amino acid sequence analysis from 53% of the total ESTs showed that 81% (24 sequences) are similar to known toxic peptides that affect Na(+)-channel activity, and 19% (7 unique sequences) are similar to K(+)-channel especific toxins. Out of the 31 sequences, at least 8 peptides were confirmed by direct Edman degradation, using components isolated directly from the venom. The remaining 19%, 4%, 4%, 15% and 5% of the ESTs correspond respectively to proteins involved in cellular processes, antimicrobial peptides, venom components, proteins without defined function and sequences without similarity in databases. Among the cloned genes are those similar to metalloproteinases.

  1. The construction of an EST database for Bombyx mori and its application

    PubMed Central

    Mita, Kazuei; Morimyo, Mitsuoki; Okano, Kazuhiro; Koike, Yoshiko; Nohata, Junko; Kawasaki, Hideki; Kadono-Okuda, Keiko; Yamamoto, Kimiko; Suzuki, Masataka G.; Shimada, Toru; Goldsmith, Marian R.; Maeda, Susumu

    2003-01-01

    To build a foundation for the complete genome analysis of Bombyx mori, we have constructed an EST database. Because gene expression patterns deeply depend on tissues as well as developmental stages, we analyzed many cDNA libraries prepared from various tissues and different developmental stages to cover the entire set of Bombyx genes. So far, the Bombyx EST database contains 35,000 ESTs from 36 cDNA libraries, which are grouped into ≈11,000 nonredundant ESTs with the average length of 1.25 kb. The comparison with FlyBase suggests that the present EST database, SilkBase, covers >55% of all genes of Bombyx. The fraction of library-specific ESTs in each cDNA library indicates that we have not yet reached saturation, showing the validity of our strategy for constructing an EST database to cover all genes. To tackle the coming saturation problem, we have checked two methods, subtraction and normalization, to increase coverage and decrease the number of housekeeping genes, resulting in a 5–11% increase of library-specific ESTs. The identification of a number of genes and comprehensive cloning of gene families have already emerged from the SilkBase search. Direct links of SilkBase with FlyBase and WormBase provide ready identification of candidate Lepidoptera-specific genes. PMID:14614147

  2. Tags Help Make Libraries Del.icio.us: Social Bookmarking and Tagging Boost Participation

    ERIC Educational Resources Information Center

    Rethlefsen, Melissa L.

    2007-01-01

    Traditional library web products, whether online public access catalogs, library databases, or even library web sites, have long been rigidly controlled and difficult to use. Patrons regularly prefer Google's simple interface. Now social bookmarking and tagging tools help librarians bridge the gap between the library's need to offer authoritative,…

  3. Essential proteins and possible therapeutic targets of Wolbachia endosymbiont and development of FiloBase-a comprehensive drug target database for Lymphatic filariasis

    NASA Astrophysics Data System (ADS)

    Sharma, Om Prakash; Kumar, Muthuvel Suresh

    2016-01-01

    Lymphatic filariasis (Lf) is one of the oldest and most debilitating tropical diseases. Millions of people are suffering from this prevalent disease. It is estimated to infect over 120 million people in at least 80 nations of the world through the tropical and subtropical regions. More than one billion people are in danger of getting affected with this life-threatening disease. Several studies were suggested its emerging limitations and resistance towards the available drugs and therapeutic targets for Lf. Therefore, better medicine and drug targets are in demand. We took an initiative to identify the essential proteins of Wolbachia endosymbiont of Brugia malayi, which are indispensable for their survival and non-homologous to human host proteins. In this current study, we have used proteome subtractive approach to screen the possible therapeutic targets for wBm. In addition, numerous literatures were mined in the hunt for potential drug targets, drugs, epitopes, crystal structures, and expressed sequence tag (EST) sequences for filarial causing nematodes. Data obtained from our study were presented in a user friendly database named FiloBase. We hope that information stored in this database may be used for further research and drug development process against filariasis. URL: http://filobase.bicpu.edu.in.

  4. Large-Scale Concatenation cDNA Sequencing

    PubMed Central

    Yu, Wei; Andersson, Björn; Worley, Kim C.; Muzny, Donna M.; Ding, Yan; Liu, Wen; Ricafrente, Jennifer Y.; Wentland, Meredith A.; Lennon, Greg; Gibbs, Richard A.

    1997-01-01

    A total of 100 kb of DNA derived from 69 individual human brain cDNA clones of 0.7–2.0 kb were sequenced by concatenated cDNA sequencing (CCS), whereby multiple individual DNA fragments are sequenced simultaneously in a single shotgun library. The method yielded accurate sequences and a similar efficiency compared with other shotgun libraries constructed from single DNA fragments (>20 kb). Computer analyses were carried out on 65 cDNA clone sequences and their corresponding end sequences to examine both nucleic acid and amino acid sequence similarities in the databases. Thirty-seven clones revealed no DNA database matches, 12 clones generated exact matches (≥98% identity), and 16 clones generated nonexact matches (57%–97% identity) to either known human or other species genes. Of those 28 matched clones, 8 had corresponding end sequences that failed to identify similarities. In a protein similarity search, 27 clone sequences displayed significant matches, whereas only 20 of the end sequences had matches to known protein sequences. Our data indicate that full-length cDNA insert sequences provide significantly more nucleic acid and protein sequence similarity matches than expressed sequence tags (ESTs) for database searching. [All 65 cDNA clone sequences described in this paper have been submitted to the GenBank data library under accession nos. U79240–U79304.] PMID:9110174

  5. Measurement of tag confidence in user generated contents retrieval

    NASA Astrophysics Data System (ADS)

    Lee, Sihyoung; Min, Hyun-Seok; Lee, Young Bok; Ro, Yong Man

    2009-01-01

    As online image sharing services are becoming popular, the importance of correctly annotated tags is being emphasized for precise search and retrieval. Tags created by user along with user-generated contents (UGC) are often ambiguous due to the fact that some tags are highly subjective and visually unrelated to the image. They cause unwanted results to users when image search engines rely on tags. In this paper, we propose a method of measuring tag confidence so that one can differentiate confidence tags from noisy tags. The proposed tag confidence is measured from visual semantics of the image. To verify the usefulness of the proposed method, experiments were performed with UGC database from social network sites. Experimental results showed that the image retrieval performance with confidence tags was increased.

  6. Tag Content Access Control with Identity-based Key Exchange

    NASA Astrophysics Data System (ADS)

    Yan, Liang; Rong, Chunming

    2010-09-01

    Radio Frequency Identification (RFID) technology that used to identify objects and users has been applied to many applications such retail and supply chain recently. How to prevent tag content from unauthorized readout is a core problem of RFID privacy issues. Hash-lock access control protocol can make tag to release its content only to reader who knows the secret key shared between them. However, in order to get this shared secret key required by this protocol, reader needs to communicate with a back end database. In this paper, we propose to use identity-based secret key exchange approach to generate the secret key required for hash-lock access control protocol. With this approach, not only back end database connection is not needed anymore, but also tag cloning problem can be eliminated at the same time.

  7. Application of the accurate mass and time tag approach in studies of the human blood lipidome

    PubMed Central

    Ding, Jie; Sorensen, Christina M.; Jaitly, Navdeep; Jiang, Hongliang; Orton, Daniel J.; Monroe, Matthew E.; Moore, Ronald J.; Smith, Richard D.; Metz, Thomas O.

    2008-01-01

    We report a preliminary demonstration of the accurate mass and time (AMT) tag approach for lipidomics. Initial data-dependent LC-MS/MS analyses of human plasma, erythrocyte, and lymphocyte lipids were performed in order to identify lipid molecular species in conjunction with complementary accurate mass and isotopic distribution information. Identified lipids were used to populate initial lipid AMT tag databases containing 250 and 45 entries for those species detected in positive and negative electrospray ionization (ESI) modes, respectively. The positive ESI database was then utilized to identify human plasma, erythrocyte, and lymphocyte lipids in high-throughput LC-MS analyses based on the AMT tag approach. We were able to define the lipid profiles of human plasma, erythrocytes, and lymphocytes based on qualitative and quantitative differences in lipid abundance. PMID:18502191

  8. Construction of a Full-Length Enriched cDNA Library and Preliminary Analysis of Expressed Sequence Tags from Bengal Tiger Panthera tigris tigris

    PubMed Central

    Liu, Changqing; Liu, Dan; Guo, Yu; Lu, Taofeng; Li, Xiangchen; Zhang, Minghai; Ma, Jianzhang; Ma, Yuehui; Guan, Weijun

    2013-01-01

    In this study, a full-length enriched cDNA library was successfully constructed from Bengal tiger, Panthera tigris tigris, the most well-known wild Animal. Total RNA was extracted from cultured Bengal tiger fibroblasts in vitro. The titers of primary and amplified libraries were 1.28 × 106 pfu/mL and 1.56 × 109 pfu/mL respectively. The percentage of recombinants from unamplified library was 90.2% and average length of exogenous inserts was 0.98 kb. A total of 212 individual ESTs with sizes ranging from 356 to 1108 bps were then analyzed. The BLASTX score revealed that 48.1% of the sequences were classified as a strong match, 45.3% as nominal and 6.6% as a weak match. Among the ESTs with known putative function, 26.4% ESTs were found to be related to all kinds of metabolisms, 19.3% ESTs to information storage and processing, 11.3% ESTs to posttranslational modification, protein turnover, chaperones, 11.3% ESTs to transport, 9.9% ESTs to signal transducer/cell communication, 9.0% ESTs to structure protein, 3.8% ESTs to cell cycle, and only 6.6% ESTs classified as novel genes. By EST sequencing, a full-length gene coding ferritin was identified and characterized. The recombinant plasmid pET32a-TAT-Ferritin was constructed, coded for the TAT-Ferritin fusion protein with two 6× His-tags in N and C-terminal. After BCA assay, the concentration of soluble Trx-TAT-Ferritin recombinant protein was 2.32 ± 0.12 mg/mL. These results demonstrated that the reliability and representativeness of the cDNA library attained to the requirements of a standard cDNA library. This library provided a useful platform for the functional genome and transcriptome research of Bengal tigers. PMID:23708105

  9. Construction of a full-length enriched cDNA library and preliminary analysis of expressed sequence tags from Bengal Tiger Panthera tigris tigris.

    PubMed

    Liu, Changqing; Liu, Dan; Guo, Yu; Lu, Taofeng; Li, Xiangchen; Zhang, Minghai; Ma, Jianzhang; Ma, Yuehui; Guan, Weijun

    2013-05-24

    In this study, a full-length enriched cDNA library was successfully constructed from Bengal tiger, Panthera tigris tigris, the most well-known wild Animal. Total RNA was extracted from cultured Bengal tiger fibroblasts in vitro. The titers of primary and amplified libraries were 1.28 × 106 pfu/mL and 1.56 × 109 pfu/mL respectively. The percentage of recombinants from unamplified library was 90.2% and average length of exogenous inserts was 0.98 kb. A total of 212 individual ESTs with sizes ranging from 356 to 1108 bps were then analyzed. The BLASTX score revealed that 48.1% of the sequences were classified as a strong match, 45.3% as nominal and 6.6% as a weak match. Among the ESTs with known putative function, 26.4% ESTs were found to be related to all kinds of metabolisms, 19.3% ESTs to information storage and processing, 11.3% ESTs to posttranslational modification, protein turnover, chaperones, 11.3% ESTs to transport, 9.9% ESTs to signal transducer/cell communication, 9.0% ESTs to structure protein, 3.8% ESTs to cell cycle, and only 6.6% ESTs classified as novel genes. By EST sequencing, a full-length gene coding ferritin was identified and characterized. The recombinant plasmid pET32a-TAT-Ferritin was constructed, coded for the TAT-Ferritin fusion protein with two 6× His-tags in N and C-terminal. After BCA assay, the concentration of soluble Trx-TAT-Ferritin recombinant protein was 2.32 ± 0.12 mg/mL. These results demonstrated that the reliability and representativeness of the cDNA library attained to the requirements of a standard cDNA library. This library provided a useful platform for the functional genome and transcriptome research of Bengal tigers.

  10. The Alveolate Perkinsus marinus: Biological Insights from EST Gene Discovery

    PubMed Central

    2010-01-01

    Background Perkinsus marinus, a protozoan parasite of the eastern oyster Crassostrea virginica, has devastated natural and farmed oyster populations along the Atlantic and Gulf coasts of the United States. It is classified as a member of the Perkinsozoa, a recently established phylum considered close to the ancestor of ciliates, dinoflagellates, and apicomplexans, and a key taxon for understanding unique adaptations (e.g. parasitism) within the Alveolata. Despite intense parasite pressure, no disease-resistant oysters have been identified and no effective therapies have been developed to date. Results To gain insight into the biological basis of the parasite's virulence and pathogenesis mechanisms, and to identify genes encoding potential targets for intervention, we generated >31,000 5' expressed sequence tags (ESTs) derived from four trophozoite libraries generated from two P. marinus strains. Trimming and clustering of the sequence tags yielded 7,863 unique sequences, some of which carry a spliced leader. Similarity searches revealed that 55% of these had hits in protein sequence databases, of which 1,729 had their best hit with proteins from the chromalveolates (E-value ≤ 1e-5). Some sequences are similar to those proven to be targets for effective intervention in other protozoan parasites, and include not only proteases, antioxidant enzymes, and heat shock proteins, but also those associated with relict plastids, such as acetyl-CoA carboxylase and methyl erythrithol phosphate pathway components, and those involved in glycan assembly, protein folding/secretion, and parasite-host interactions. Conclusions Our transcriptome analysis of P. marinus, the first for any member of the Perkinsozoa, contributes new insight into its biology and taxonomic position. It provides a very informative, albeit preliminary, glimpse into the expression of genes encoding functionally relevant proteins as potential targets for chemotherapy, and evidence for the presence of a relict plastid. Further, although P. marinus sequences display significant similarity to those from both apicomplexans and dinoflagellates, the presence of trans-spliced transcripts confirms the previously established affinities with the latter. The EST analysis reported herein, together with the recently completed sequence of the P. marinus genome and the development of transfection methodology, should result in improved intervention strategies against dermo disease. PMID:20374649

  11. Uneven distribution of expressed sequence tag loci on maize pachytene chromosomes

    PubMed Central

    Anderson, Lorinda K.; Lai, Ann; Stack, Stephen M.; Rizzon, Carene; Gaut, Brandon S.

    2006-01-01

    Examining the relationships among DNA sequence, meiotic recombination, and chromosome structure at a genome-wide scale has been difficult because only a few markers connect genetic linkage maps with physical maps. Here, we have positioned 1195 genetically mapped expressed sequence tag (EST) markers onto the 10 pachytene chromosomes of maize by using a newly developed resource, the RN-cM map. The RN-cM map charts the distribution of crossing over in the form of recombination nodules (RNs) along synaptonemal complexes (SCs, pachytene chromosomes) and allows genetic cM distances to be converted into physical micrometer distances on chromosomes. When this conversion is made, most of the EST markers used in the study are located distally on the chromosomes in euchromatin. ESTs are significantly clustered on chromosomes, even when only euchromatic chromosomal segments are considered. Gene density and recombination rate (as measured by EST and RN frequencies, respectively) are strongly correlated. However, crossover frequencies for telomeric intervals are much higher than was expected from their EST frequencies. For pachytene chromosomes, EST density is about fourfold higher in euchromatin compared with heterochromatin, while DNA density is 1.4 times higher in heterochromatin than in euchromatin. Based on DNA density values and the fraction of pachytene chromosome length that is euchromatic, we estimate that ∼1500 Mbp of the maize genome is in euchromatin. This overview of the organization of the maize genome will be useful in examining genome and chromosome evolution in plants. PMID:16339046

  12. In silico mining and characterization of simple sequence repeats from gilthead sea bream (Sparus aurata) expressed sequence tags (EST-SSRs); PCR amplification, polymorphism evaluation and multiplexing and cross-species assays.

    PubMed

    Vogiatzi, Emmanouella; Lagnel, Jacques; Pakaki, Victoria; Louro, Bruno; Canario, Adelino V M; Reinhardt, Richard; Kotoulas, Georgios; Magoulas, Antonios; Tsigenopoulos, Costas S

    2011-06-01

    We screened for simple sequence repeats (SSRs) found in ESTs derived from an EST-database development project ('Marine Genomics Europe' Network of Excellence). Different motifs of di-, tri-, tetra-, penta- and hexanucleotide SSRs were evaluated for variation in length and position in the expressed sequences, relative abundance and distribution in gilthead sea bream (Sparus aurata). We found 899 ESTs that harbor 997 SSRs (4.94%). On average, one SSR was found per 2.95 kb of EST sequence and the dinucleotide SSRs are the most abundant accounting for 47.6% of the total number. EST-SSRs were used as template for primer design. 664 primer pairs could be successfully identified and a subset of 206 pairs of primers was synthesized, PCR-tested and visualized on ethidium bromide stained agarose gels. The main objective was to further assess the potential of EST-SSRs as informative markers and investigate their cross-species amplification in sixteen teleost fish species: seven sparid species and nine other species from different families. Approximately 78% of the primer pairs gave PCR products of expected size in gilthead sea bream, and as expected, the rate of successful amplification of sea bream EST-SSRs was higher in sparids, lower in other perciforms and even lower in species of the Clupeiform and Gadiform orders. We finally determined the polymorphism and the heterozygosity of 63 markers in a wild gilthead sea bream population; fifty-eight loci were found to be polymorphic with the expected heterozygosity and the number of alleles ranging from 0.089 to 0.946 and from 2 to 27, respectively. These tools and markers are expected to enhance the available genetic linkage map in gilthead sea bream, to assist comparative mapping and genome analyses for this species and further with other model fish species and finally to help advance genetic analysis for cultivated and wild populations and accelerate breeding programs. Copyright © 2011 Elsevier B.V. All rights reserved.

  13. Analysis of expressed sequence tags from Prunus mume flower and fruit and development of simple sequence repeat markers

    PubMed Central

    2010-01-01

    Background Expressed Sequence Tag (EST) has been a cost-effective tool in molecular biology and represents an abundant valuable resource for genome annotation, gene expression, and comparative genomics in plants. Results In this study, we constructed a cDNA library of Prunus mume flower and fruit, sequenced 10,123 clones of the library, and obtained 8,656 expressed sequence tag (EST) sequences with high quality. The ESTs were assembled into 4,473 unigenes composed of 1,492 contigs and 2,981 singletons and that have been deposited in NCBI (accession IDs: GW868575 - GW873047), among which 1,294 unique ESTs were with known or putative functions. Furthermore, we found 1,233 putative simple sequence repeats (SSRs) in the P. mume unigene dataset. We randomly tested 42 pairs of PCR primers flanking potential SSRs, and 14 pairs were identified as true-to-type SSR loci and could amplify polymorphic bands from 20 individual plants of P. mume. We further used the 14 EST-SSR primer pairs to test the transferability on peach and plum. The result showed that nearly 89% of the primer pairs produced target PCR bands in the two species. A high level of marker polymorphism was observed in the plum species (65%) and low in the peach (46%), and the clustering analysis of the three species indicated that these SSR markers were useful in the evaluation of genetic relationships and diversity between and within the Prunus species. Conclusions We have constructed the first cDNA library of P. mume flower and fruit, and our data provide sets of molecular biology resources for P. mume and other Prunus species. These resources will be useful for further study such as genome annotation, new gene discovery, gene functional analysis, molecular breeding, evolution and comparative genomics between Prunus species. PMID:20626882

  14. Identification of triacylglycerol using automated annotation of high resolution multistage mass spectral trees.

    PubMed

    Wang, Xiupin; Peng, Qingzhi; Li, Peiwu; Zhang, Qi; Ding, Xiaoxia; Zhang, Wen; Zhang, Liangxiao

    2016-10-12

    High complexity of identification for non-target triacylglycerols (TAGs) is a major challenge in lipidomics analysis. To identify non-target TAGs, a powerful tool named accurate MS(n) spectrometry generating so-called ion trees is used. In this paper, we presented a technique for efficient structural elucidation of TAGs on MS(n) spectral trees produced by LTQ Orbitrap MS(n), which was implemented as an open source software package, or TIT. The TIT software was used to support automatic annotation of non-target TAGs on MS(n) ion trees from a self-built fragment ion database. This database includes 19108 simulate TAG molecules from a random combination of fatty acids and corresponding 500582 self-built multistage fragment ions (MS ≤ 3). Our software can identify TAGs using a "stage-by-stage elimination" strategy. By utilizing the MS(1) accurate mass and referenced RKMD, the TIT software can discriminate unique elemental composition candidates. The regiospecific isomers of fatty acyl chains will be distinguished using MS(2) and MS(3) fragment spectra. We applied the algorithm to the selection of 45 TAG standards and demonstrated that the molecular ions could be 100% correctly assigned. Therefore, the TIT software could be applied to TAG identification in complex biological samples such as mouse plasma extracts. Copyright © 2016 Elsevier B.V. All rights reserved.

  15. Separation and identification of Musa acuminate Colla (banana) leaf proteins by two-dimensional gel electrophoresis and mass spectrometry.

    PubMed

    Lu, Y; Qi, Y X; Zhang, H; Zhang, H Q; Pu, J J; Xie, Y X

    2013-12-19

    To establish a proteomic reference map of Musa acuminate Colla (banana) leaf, we separated and identified leaf proteins using two-dimensional polyacrylamide gel electrophoresis (2D-PAGE) and mass spectrometry (MS). Tryptic digests of 44 spots were subjected to peptide mass fingerprinting (PMF) by matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) MS. Three spots that were not identified by MALDI-TOF MS analysis were identified by searching against the NCBInr, SwissProt, and expressed sequence tag (EST) databases. We identified 41 unique proteins. The majority of the identified leaf proteins were found to be involved in energy metabolism. The results indicate that 2D-PAGE is a sensitive and powerful technique for the separation and identification of Musa leaf proteins. A summary of the identified proteins and their putative functions is discussed.

  16. Citrus sinensis annotation project (CAP): a comprehensive database for sweet orange genome.

    PubMed

    Wang, Jia; Chen, Dijun; Lei, Yang; Chang, Ji-Wei; Hao, Bao-Hai; Xing, Feng; Li, Sen; Xu, Qiang; Deng, Xiu-Xin; Chen, Ling-Ling

    2014-01-01

    Citrus is one of the most important and widely grown fruit crop with global production ranking firstly among all the fruit crops in the world. Sweet orange accounts for more than half of the Citrus production both in fresh fruit and processed juice. We have sequenced the draft genome of a double-haploid sweet orange (C. sinensis cv. Valencia), and constructed the Citrus sinensis annotation project (CAP) to store and visualize the sequenced genomic and transcriptome data. CAP provides GBrowse-based organization of sweet orange genomic data, which integrates ab initio gene prediction, EST, RNA-seq and RNA-paired end tag (RNA-PET) evidence-based gene annotation. Furthermore, we provide a user-friendly web interface to show the predicted protein-protein interactions (PPIs) and metabolic pathways in sweet orange. CAP provides comprehensive information beneficial to the researchers of sweet orange and other woody plants, which is freely available at http://citrus.hzau.edu.cn/.

  17. Using population mixtures to optimize the utility of genomic databases: linkage disequilibrium and association study design in India.

    PubMed

    Pemberton, T J; Jakobsson, M; Conrad, D F; Coop, G; Wall, J D; Pritchard, J K; Patel, P I; Rosenberg, N A

    2008-07-01

    When performing association studies in populations that have not been the focus of large-scale investigations of haplotype variation, it is often helpful to rely on genomic databases in other populations for study design and analysis - such as in the selection of tag SNPs and in the imputation of missing genotypes. One way of improving the use of these databases is to rely on a mixture of database samples that is similar to the population of interest, rather than using the single most similar database sample. We demonstrate the effectiveness of the mixture approach in the application of African, European, and East Asian HapMap samples for tag SNP selection in populations from India, a genetically intermediate region underrepresented in genomic studies of haplotype variation.

  18. RNA-Seq Reveals Dynamic Changes of Gene Expression in Key Stages of Intestine Regeneration in the Sea Cucumber Apostichopus japonicas

    PubMed Central

    Sun, Lina; Yang, Hongsheng; Chen, Muyan; Ma, Deyou; Lin, Chenggang

    2013-01-01

    Background Sea cucumbers (Holothuroidea; Echinodermata) have the capacity to regenerate lost tissues and organs. Although the histological and cytological aspects of intestine regeneration have been extensively studied, little is known of the genetic mechanisms involved. There has, however, been a renewed effort to develop a database of Expressed Sequence Tags (ESTs) in Apostichopus japonicus, an economically-important species that occurs in China. This is important for studies on genetic breeding, molecular markers and special physiological phenomena. We have also constructed a library of ESTs obtained from the regenerative body wall and intestine of A. japonicus. The database has increased to ∼30000 ESTs. Results We used RNA-Seq to determine gene expression profiles associated with intestinal regeneration in A. japonicus at 3, 7, 14 and 21 days post evisceration (dpe). This was compared to profiles obtained from a normally-functioning intestine. Approximately 5 million (M) reads were sequenced in every library. Over 2400 up-regulated genes (>10%) and over 1000 down-regulated genes (∼5%) were observed at 3 and 7dpe (log2Ratio≥1, FDR≤0.001). Specific “Go terms” revealed that the DEGs (Differentially Expressed Genes) performed an important function at every regeneration stage. Besides some expected pathways (for example, Ribosome and Spliceosome pathway term), the “Notch signaling pathway,” the “ECM-receptor interaction” and the “Cytokine-cytokine receptor interaction” were significantly enriched. We also investigated the expression profiles of developmental genes, ECM-associated genes and Cytoskeletal genes. Twenty of the most important differentially expressed genes (DEGs) were verified by Real-time PCR, which resulted in a trend concordance of almost 100% between the two techniques. Conclusion Our studies demonstrated dynamic changes in global gene expression during intestine regeneration and presented a series of candidate genes and enriched pathways that contribute to intestine regeneration in sea cucumbers. This provides a foundation for future studies on the genetics/molecular mechanisms associated with intestine regeneration. PMID:23936330

  19. DOE Office of Scientific and Technical Information (OSTI.GOV)

    B. Gardiner; L.Graton; J.Longo

    Classified removable electronic media (CREM) are tracked in several different ways at the Laboratory. To ensure greater security for CREM, we are creating a single, Laboratory-wide system to track CREM. We are researching technology that can be used to electronically tag and detect CREM, designing a database to track the movement of CREM, and planning to test the system at several locations around the Laboratory. We focus on affixing ''smart tags'' to items we want to track and installing gates at pedestrian portals to detect the entry or exit of tagged items. By means of an enterprise database, the systemmore » will track the entry and exit of tagged items into and from CREM storage vaults, vault-type rooms, access corridors, or boundaries of secure areas, as well as the identity of the person carrying an item. We are considering several options for tracking items that can give greater security, but at greater expense.« less

  20. Gapped Spectral Dictionaries and Their Applications for Database Searches of Tandem Mass Spectra*

    PubMed Central

    Jeong, Kyowon; Kim, Sangtae; Bandeira, Nuno; Pevzner, Pavel A.

    2011-01-01

    Generating all plausible de novo interpretations of a peptide tandem mass (MS/MS) spectrum (Spectral Dictionary) and quickly matching them against the database represent a recently emerged alternative approach to peptide identification. However, the sizes of the Spectral Dictionaries quickly grow with the peptide length making their generation impractical for long peptides. We introduce Gapped Spectral Dictionaries (all plausible de novo interpretations with gaps) that can be easily generated for any peptide length thus addressing the limitation of the Spectral Dictionary approach. We show that Gapped Spectral Dictionaries are small thus opening a possibility of using them to speed-up MS/MS searches. Our MS-GappedDictionary algorithm (based on Gapped Spectral Dictionaries) enables proteogenomics applications (such as searches in the six-frame translation of the human genome) that are prohibitively time consuming with existing approaches. MS-GappedDictionary generates gapped peptides that occupy a niche between accurate but short peptide sequence tags and long but inaccurate full length peptide reconstructions. We show that, contrary to conventional wisdom, some high-quality spectra do not have good peptide sequence tags and introduce gapped tags that have advantages over the conventional peptide sequence tags in MS/MS database searches. PMID:21444829

  1. A high-density genetic map of Arachis duranensis, a diploid ancestor of cultivated peanut

    PubMed Central

    2012-01-01

    Background Cultivated peanut (Arachis hypogaea) is an allotetraploid species whose ancestral genomes are most likely derived from the A-genome species, A. duranensis, and the B-genome species, A. ipaensis. The very recent (several millennia) evolutionary origin of A. hypogaea has imposed a bottleneck for allelic and phenotypic diversity within the cultigen. However, wild diploid relatives are a rich source of alleles that could be used for crop improvement and their simpler genomes can be more easily analyzed while providing insight into the structure of the allotetraploid peanut genome. The objective of this research was to establish a high-density genetic map of the diploid species A. duranensis based on de novo generated EST databases. Arachis duranensis was chosen for mapping because it is the A-genome progenitor of cultivated peanut and also in order to circumvent the confounding effects of gene duplication associated with allopolyploidy in A. hypogaea. Results More than one million expressed sequence tag (EST) sequences generated from normalized cDNA libraries of A. duranensis were assembled into 81,116 unique transcripts. Mining this dataset, 1236 EST-SNP markers were developed between two A. duranensis accessions, PI 475887 and Grif 15036. An additional 300 SNP markers also were developed from genomic sequences representing conserved legume orthologs. Of the 1536 SNP markers, 1054 were placed on a genetic map. In addition, 598 EST-SSR markers identified in A. hypogaea assemblies were included in the map along with 37 disease resistance gene candidate (RGC) and 35 other previously published markers. In total, 1724 markers spanning 1081.3 cM over 10 linkage groups were mapped. Gene sequences that provided mapped markers were annotated using similarity searches in three different databases, and gene ontology descriptions were determined using the Medicago Gene Atlas and TAIR databases. Synteny analysis between A. duranensis, Medicago and Glycine revealed significant stretches of conserved gene clusters spread across the peanut genome. A higher level of colinearity was detected between A. duranensis and Glycine than with Medicago. Conclusions The first high-density, gene-based linkage map for A. duranensis was generated that can serve as a reference map for both wild and cultivated Arachis species. The markers developed here are valuable resources for the peanut, and more broadly, to the legume research community. The A-genome map will have utility for fine mapping in other peanut species and has already had application for mapping a nematode resistance gene that was introgressed into A. hypogaea from A. cardenasii. PMID:22967170

  2. Ontology and diversity of transcript-associated microsatellites mined from a globe artichoke EST database

    PubMed Central

    Scaglione, Davide; Acquadro, Alberto; Portis, Ezio; Taylor, Christopher A; Lanteri, Sergio; Knapp, Steven J

    2009-01-01

    Background The globe artichoke (Cynara cardunculus var. scolymus L.) is a significant crop in the Mediterranean basin. Despite its commercial importance and its both dietary and pharmaceutical value, knowledge of its genetics and genomics remains scant. Microsatellite markers have become a key tool in genetic and genomic analysis, and we have exploited recently acquired EST (expressed sequence tag) sequence data (Composite Genome Project - CGP) to develop an extensive set of microsatellite markers. Results A unigene assembly was created from over 36,000 globe artichoke EST sequences, containing 6,621 contigs and 12,434 singletons. Over 12,000 of these unigenes were functionally assigned on the basis of homology with Arabidopsis thaliana reference proteins. A total of 4,219 perfect repeats, located within 3,308 unigenes was identified and the gene ontology (GO) analysis highlighted some GO term's enrichments among different classes of microsatellites with respect to their position. Sufficient flanking sequence was available to enable the design of primers to amplify 2,311 of these microsatellites, and a set of 300 was tested against a DNA panel derived from 28 C. cardunculus genotypes. Consistent amplification and polymorphism was obtained from 236 of these assays. Their polymorphic information content (PIC) ranged from 0.04 to 0.90 (mean 0.66). Between 176 and 198 of the assays were informative in at least one of the three available mapping populations. Conclusion EST-based microsatellites have provided a large set of de novo genetic markers, which show significant amounts of polymorphism both between and within the three taxa of C. cardunculus. They are thus well suited as assays for phylogenetic analysis, the construction of genetic maps, marker-assisted breeding, transcript mapping and other genomic applications in the species. PMID:19785740

  3. Discovery and mapping of a new expressed sequence tag-single nucleotide polymorphism and simple sequence repeat panel for large-scale genetic studies and breeding of Theobroma cacao L.

    PubMed Central

    Allegre, Mathilde; Argout, Xavier; Boccara, Michel; Fouet, Olivier; Roguet, Yolande; Bérard, Aurélie; Thévenin, Jean Marc; Chauveau, Aurélie; Rivallan, Ronan; Clement, Didier; Courtois, Brigitte; Gramacho, Karina; Boland-Augé, Anne; Tahi, Mathias; Umaharan, Pathmanathan; Brunel, Dominique; Lanaud, Claire

    2012-01-01

    Theobroma cacao is an economically important tree of several tropical countries. Its genetic improvement is essential to provide protection against major diseases and improve chocolate quality. We discovered and mapped new expressed sequence tag-single nucleotide polymorphism (EST-SNP) and simple sequence repeat (SSR) markers and constructed a high-density genetic map. By screening 149 650 ESTs, 5246 SNPs were detected in silico, of which 1536 corresponded to genes with a putative function, while 851 had a clear polymorphic pattern across a collection of genetic resources. In addition, 409 new SSR markers were detected on the Criollo genome. Lastly, 681 new EST-SNPs and 163 new SSRs were added to the pre-existing 418 co-dominant markers to construct a large consensus genetic map. This high-density map and the set of new genetic markers identified in this study are a milestone in cocoa genomics and for marker-assisted breeding. The data are available at http://tropgenedb.cirad.fr. PMID:22210604

  4. A resource of large-scale molecular markers for monitoring Agropyron cristatum chromatin introgression in wheat background based on transcriptome sequences.

    PubMed

    Zhang, Jinpeng; Liu, Weihua; Lu, Yuqing; Liu, Qunxing; Yang, Xinming; Li, Xiuquan; Li, Lihui

    2017-09-20

    Agropyron cristatum is a wild grass of the tribe Triticeae and serves as a gene donor for wheat improvement. However, very few markers can be used to monitor A. cristatum chromatin introgressions in wheat. Here, we reported a resource of large-scale molecular markers for tracking alien introgressions in wheat based on transcriptome sequences. By aligning A. cristatum unigenes with the Chinese Spring reference genome sequences, we designed 9602 A. cristatum expressed sequence tag-sequence-tagged site (EST-STS) markers for PCR amplification and experimental screening. As a result, 6063 polymorphic EST-STS markers were specific for the A. cristatum P genome in the single-receipt wheat background. A total of 4956 randomly selected polymorphic EST-STS markers were further tested in eight wheat variety backgrounds, and 3070 markers displaying stable and polymorphic amplification were validated. These markers covered more than 98% of the A. cristatum genome, and the marker distribution density was approximately 1.28 cM. An application case of all EST-STS markers was validated on the A. cristatum 6 P chromosome. These markers were successfully applied in the tracking of alien A. cristatum chromatin. Altogether, this study provided a universal method of large-scale molecular marker development to monitor wild relative chromatin in wheat.

  5. A statistical method for assessing peptide identification confidence in accurate mass and time tag proteomics

    PubMed Central

    Stanley, Jeffrey R.; Adkins, Joshua N.; Slysz, Gordon W.; Monroe, Matthew E.; Purvine, Samuel O.; Karpievitch, Yuliya V.; Anderson, Gordon A.; Smith, Richard D.; Dabney, Alan R.

    2011-01-01

    Current algorithms for quantifying peptide identification confidence in the accurate mass and time (AMT) tag approach assume that the AMT tags themselves have been correctly identified. However, there is uncertainty in the identification of AMT tags, as this is based on matching LC-MS/MS fragmentation spectra to peptide sequences. In this paper, we incorporate confidence measures for the AMT tag identifications into the calculation of probabilities for correct matches to an AMT tag database, resulting in a more accurate overall measure of identification confidence for the AMT tag approach. The method is referred to as Statistical Tools for AMT tag Confidence (STAC). STAC additionally provides a Uniqueness Probability (UP) to help distinguish between multiple matches to an AMT tag and a method to calculate an overall false discovery rate (FDR). STAC is freely available for download as both a command line and a Windows graphical application. PMID:21692516

  6. Large-Scale Collection and Analysis of Full-Length cDNAs from Brachypodium distachyon and Integration with Pooideae Sequence Resources

    PubMed Central

    Mochida, Keiichi; Uehara-Yamaguchi, Yukiko; Takahashi, Fuminori; Yoshida, Takuhiro; Sakurai, Tetsuya; Shinozaki, Kazuo

    2013-01-01

    A comprehensive collection of full-length cDNAs is essential for correct structural gene annotation and functional analyses of genes. We constructed a mixed full-length cDNA library from 21 different tissues of Brachypodium distachyon Bd21, and obtained 78,163 high quality expressed sequence tags (ESTs) from both ends of ca. 40,000 clones (including 16,079 contigs). We updated gene structure annotations of Brachypodium genes based on full-length cDNA sequences in comparison with the latest publicly available annotations. About 10,000 non-redundant gene models were supported by full-length cDNAs; ca. 6,000 showed some transcription unit modifications. We also found ca. 580 novel gene models, including 362 newly identified in Bd21. Using the updated transcription start sites, we searched a total of 580 plant cis-motifs in the −3 kb promoter regions and determined a genome-wide Brachypodium promoter architecture. Furthermore, we integrated the Brachypodium full-length cDNAs and updated gene structures with available sequence resources in wheat and barley in a web-accessible database, the RIKEN Brachypodium FL cDNA database. The database represents a “one-stop” information resource for all genomic information in the Pooideae, facilitating functional analysis of genes in this model grass plant and seamless knowledge transfer to the Triticeae crops. PMID:24130698

  7. Expressed sequences tags of the anther smut fungus, Microbotryum violaceum, identify mating and pathogenicity genes

    PubMed Central

    Yockteng, Roxana; Marthey, Sylvain; Chiapello, Hélène; Gendrault, Annie; Hood, Michael E; Rodolphe, François; Devier, Benjamin; Wincker, Patrick; Dossat, Carole; Giraud, Tatiana

    2007-01-01

    Background The basidiomycete fungus Microbotryum violaceum is responsible for the anther-smut disease in many plants of the Caryophyllaceae family and is a model in genetics and evolutionary biology. Infection is initiated by dikaryotic hyphae produced after the conjugation of two haploid sporidia of opposite mating type. This study describes M. violaceum ESTs corresponding to nuclear genes expressed during conjugation and early hyphal production. Results A normalized cDNA library generated 24,128 sequences, which were assembled into 7,765 unique genes; 25.2% of them displayed significant similarity to annotated proteins from other organisms, 74.3% a weak similarity to the same set of known proteins, and 0.5% were orphans. We identified putative pheromone receptors and genes that in other fungi are involved in the mating process. We also identified many sequences similar to genes known to be involved in pathogenicity in other fungi. The M. violaceum EST database, MICROBASE, is available on the Web and provides access to the sequences, assembled contigs, annotations and programs to compare similarities against MICROBASE. Conclusion This study provides a basis for cloning the mating type locus, for further investigation of pathogenicity genes in the anther smut fungi, and for comparative genomics. PMID:17692127

  8. In silico comparative analysis of SSR markers in plants

    PubMed Central

    2011-01-01

    Background The adverse environmental conditions impose extreme limitation to growth and plant development, restricting the genetic potential and reflecting on plant yield losses. The progress obtained by classic plant breeding methods aiming at increasing abiotic stress tolerances have not been enough to cope with increasing food demands. New target genes need to be identified to reach this goal, which requires extensive studies of the related biological mechanisms. Comparative analyses in ancestral plant groups can help to elucidate yet unclear biological processes. Results In this study, we surveyed the occurrence patterns of expressed sequence tag-derived microsatellite markers for model plants. A total of 13,133 SSR markers were discovered using the SSRLocator software in non-redundant EST databases made for all eleven species chosen for this study. The dimer motifs are more frequent in lower plant species, such as green algae and mosses, and the trimer motifs are more frequent for the majority of higher plant groups, such as monocots and dicots. With this in silico study we confirm several microsatellite plant survey results made with available bioinformatics tools. Conclusions The comparative studies of EST-SSR markers among all plant lineages is well suited for plant evolution studies as well as for future studies of transferability of molecular markers. PMID:21247422

  9. Genes up-regulated during red coloration in UV-B irradiated lettuce leaves.

    PubMed

    Park, Jong-Sug; Choung, Myoung-Gun; Kim, Jung-Bong; Hahn, Bum-Soo; Kim, Jong-Bum; Bae, Shin-Chul; Roh, Kyung-Hee; Kim, Yong-Hwan; Cheon, Choong-Ill; Sung, Mi-Kyung; Cho, Kang-Jin

    2007-04-01

    Molecular analysis of gene expression differences between green and red lettuce leaves was performed using the SSH method. BlastX comparisons of subtractive expressed sequence tags (ESTs) indicated that 7.6% of clones encoded enzymes involved in secondary metabolism. Such clones had a particularly high abundance of flavonoid-metabolism proteins (6.5%). Following SSH, 566 clones were rescreened for differential gene expression using dot-blot hybridization. Of these, 53 were found to overexpressed during red coloration. The up-regulated expression of six genes was confirmed by Northern blot analyses. The expression of chalcone synthase (CHS), flavanone 3-hydroxylase (F3H), and dihydroflavonol 4-reductase (DFR) genes showed a positive correlation with anthocyanin accumulation in UV-B-irradiated lettuce leaves; flavonoid 3',5'-hydroxylase (F3',5'H) and anthocyanidin synthase (ANS) were expressed continuously in both samples. These results indicated that the genes CHS, F3H, and DFR coincided with increases in anthocyanin accumulation during the red coloration of lettuce leaves. This study show a relationship between red coloration and the expression of up-regulated genes in lettuce. The subtractive cDNA library and EST database described in this study represent a valuable resource for further research for secondary metabolism in the vegetable crops.

  10. Evolutionary characterization of pig interferon-inducible transmembrane gene family and member expression dynamics in tracheobronchial lymph nodes of pigs infected with swine respiratory disease viruses.

    PubMed

    Miller, Laura C; Jiang, Zhihua; Sang, Yongming; Harhay, Gregory P; Lager, Kelly M

    2014-06-15

    Studies have found that a cluster of duplicated gene loci encoding the interferon-inducible transmembrane proteins (IFITMs) family have antiviral activity against several viruses, including influenza A virus. The gene family has 5 and 7 members in humans and mice, respectively. Here, we confirm the current annotation of pig IFITM1, IFITM2, IFITM3, IFITM5, IFITM1L1 and IFITM1L4, manually annotated IFITM1L2, IFITM1L3, IFITM5L, IFITM3L1 and IFITM3L2, and provide expressed sequence tag (EST) and/or mRNA evidence, not contained with the NCBI Reference Sequence database (RefSeq), for the existence of IFITM6, IFITM7 and a new IFITM1-like (IFITM1LN) gene in pigs. Phylogenic analyses showed seven porcine IFITM genes with highly conserved human/mouse orthologs known to have anti-viral activity. Digital Gene Expression Tag Profiling (DGETP) of swine tracheobronchial lymph nodes (TBLN) of pigs infected with swine influenza virus (SIV), porcine pseudorabies virus, porcine reproductive and respiratory syndrome virus or porcine circovirus type 2 over 14 days post-inoculation (dpi) showed that gene expression abundance differs dramatically among pig IFITM family members, ranging from 0 to over 3000 tags per million. In particular, SIV up-regulated IFITM1 by 5.9 fold at 3 dpi. Bayesian framework further identified pig IFITM1 and IFITM3 as differentially expressed genes in the overall transcriptome analysis. In addition to being a component of protein complexes involved in homotypic adhesion, the IFITM1 is also associated with pathways related to regulation of cell proliferation and IFITM3 is involved in immune responses. Published by Elsevier B.V.

  11. ApiEST-DB: analyzing clustered EST data of the apicomplexan parasites.

    PubMed

    Li, Li; Crabtree, Jonathan; Fischer, Steve; Pinney, Deborah; Stoeckert, Christian J; Sibley, L David; Roos, David S

    2004-01-01

    ApiEST-DB (http://www.cbil.upenn.edu/paradbs-servlet/) provides integrated access to publicly available EST data from protozoan parasites in the phylum Apicomplexa. The database currently incorporates a total of nearly 100,000 ESTs from several parasite species of clinical and/or veterinary interest, including Eimeria tenella, Neospora caninum, Plasmodium falciparum, Sarcocystis neurona and Toxoplasma gondii. To facilitate analysis of these data, EST sequences were clustered and assembled to form consensus sequences for each organism, and these assemblies were then subjected to automated annotation via similarity searches against protein and domain databases. The underlying relational database infrastructure, Genomics Unified Schema (GUS), enables complex biologically based queries, facilitating validation of gene models, identification of alternative splicing, detection of single nucleotide polymorphisms, identification of stage-specific genes and recognition of phylogenetically conserved and phylogenetically restricted sequences.

  12. Generation and Analysis of the Expressed Sequence Tags from the Mycelium of Ganoderma lucidum

    PubMed Central

    Huang, Yen-Hua; Wu, Hung-Yi; Wu, Keh-Ming; Liu, Tze-Tze; Liou, Ruey-Fen; Tsai, Shih-Feng; Shiao, Ming-Shi; Ho, Low-Tone; Tzean, Shean-Shong; Yang, Ueng-Cheng

    2013-01-01

    Ganoderma lucidum (G. lucidum) is a medicinal mushroom renowned in East Asia for its potential biological effects. To enable a systematic exploration of the genes associated with the various phenotypes of the fungus, the genome consortium of G. lucidum has carried out an expressed sequence tag (EST) sequencing project. Using a Sanger sequencing based approach, 47,285 ESTs were obtained from in vitro cultures of G. lucidum mycelium of various durations. These ESTs were further clustered and merged into 7,774 non-redundant expressed loci. The features of these expressed contigs were explored in terms of over-representation, alternative splicing, and natural antisense transcripts. Our results provide an invaluable information resource for exploring the G. lucidum transcriptome and its regulation. Many cases of the genes over-represented in fast-growing dikaryotic mycelium are closely related to growth, such as cell wall and bioactive compound synthesis. In addition, the EST-genome alignments containing putative cassette exons and retained introns were manually curated and then used to make inferences about the predominating splice-site recognition mechanism of G. lucidum. Moreover, a number of putative antisense transcripts have been pinpointed, from which we noticed that two cases are likely to reveal hitherto undiscovered biological pathways. To allow users to access the data and the initial analysis of the results of this project, a dedicated web site has been created at http://csb2.ym.edu.tw/est/. PMID:23658685

  13. Expressed sequence tag based identification and expression analysis of some cold inducible elements in seabuckthorn (Hippophae rhamnoides L.).

    PubMed

    Ghangal, Rajesh; Raghuvanshi, Saurabh; Sharma, Prakash C

    2012-02-01

    A cDNA library was constructed from the mature leaves of seabuckthorn (Hippophae rhamnoides). Expressed Sequence Tags (ESTs) were generated by single pass sequencing of 4500 cDNA clones. We submitted 3412 ESTs to dbEST of NCBI. Clustering of these ESTs yielded 1665 unigenes comprising of 345 contigs and 1320 singletons. Out of 1665 unigenes, 1278 unigenes were annotated by similarity search while the remaining 387 unannotated unigenes were considered as organism specific. Gene Ontology (GO) analysis of the unigene dataset showed 691 unigenes related to biological processes, 727 to molecular functions and 588 to cellular component category. On the basis of similarity search and GO annotation, 43 unigenes were found responsive to biotic and abiotic stresses. To validate this observation, 13 genes that are known to be associated with cold stress tolerance from previous studies in Arabidopsis and 3 novel transcripts were examined by Real time RT-PCR to understand the change in expression pattern under cold/freeze stress. In silico study of occurrence of microsatellites in these ESTs revealed the presence of 62 Simple Sequence Repeats (SSRs), some of which are being explored to assess genetic diversity among seabuckthorn collections. This is the first report of generation of transcriptome data providing information about genes involved in managing plant abiotic stress in seabuckthorn, a plant known for its enormous medicinal and ecological value. Copyright © 2011 Elsevier Masson SAS. All rights reserved.

  14. In silico identification and comparative analysis of differentially expressed genes in human and mouse tissues

    PubMed Central

    Pao, Sheng-Ying; Lin, Win-Li; Hwang, Ming-Jing

    2006-01-01

    Background Screening for differentially expressed genes on the genomic scale and comparative analysis of the expression profiles of orthologous genes between species to study gene function and regulation are becoming increasingly feasible. Expressed sequence tags (ESTs) are an excellent source of data for such studies using bioinformatic approaches because of the rich libraries and tremendous amount of data now available in the public domain. However, any large-scale EST-based bioinformatics analysis must deal with the heterogeneous, and often ambiguous, tissue and organ terms used to describe EST libraries. Results To deal with the issue of tissue source, in this work, we carefully screened and organized more than 8 million human and mouse ESTs into 157 human and 108 mouse tissue/organ categories, to which we applied an established statistic test using different thresholds of the p value to identify genes differentially expressed in different tissues. Further analysis of the tissue distribution and level of expression of human and mouse orthologous genes showed that tissue-specific orthologs tended to have more similar expression patterns than those lacking significant tissue specificity. On the other hand, a number of orthologs were found to have significant disparity in their expression profiles, hinting at novel functions, divergent regulation, or new ortholog relationships. Conclusion Comprehensive statistics on the tissue-specific expression of human and mouse genes were obtained in this very large-scale, EST-based analysis. These statistical results have been organized into a database, freely accessible at our website , for easy searching of human and mouse tissue-specific genes and for investigating gene expression profiles in the context of comparative genomics. Comparative analysis showed that, although highly tissue-specific genes tend to exhibit similar expression profiles in human and mouse, there are significant exceptions, indicating that orthologous genes, while sharing basic genomic properties, could result in distinct phenotypes. PMID:16626500

  15. Comparing the hierarchy of author given tags and repository given tags in a large document archive

    NASA Astrophysics Data System (ADS)

    Tibély, Gergely; Pollner, Péter; Palla, Gergely

    2016-10-01

    Folksonomies - large databases arising from collaborative tagging of items by independent users - are becoming an increasingly important way of categorizing information. In these systems users can tag items with free words, resulting in a tripartite item-tag-user network. Although there are no prescribed relations between tags, the way users think about the different categories presumably has some built in hierarchy, in which more special concepts are descendants of some more general categories. Several applications would benefit from the knowledge of this hierarchy. Here we apply a recent method to check the differences and similarities of hierarchies resulting from tags given by independent individuals and from tags given by a centrally managed repository system. The results from our method showed substantial differences between the lower part of the hierarchies, and in contrast, a relatively high similarity at the top of the hierarchies.

  16. MEPD: a Medaka gene expression pattern database

    PubMed Central

    Henrich, Thorsten; Ramialison, Mirana; Quiring, Rebecca; Wittbrodt, Beate; Furutani-Seiki, Makoto; Wittbrodt, Joachim; Kondoh, Hisato

    2003-01-01

    The Medaka Expression Pattern Database (MEPD) stores and integrates information of gene expression during embryonic development of the small freshwater fish Medaka (Oryzias latipes). Expression patterns of genes identified by ESTs are documented by images and by descriptions through parameters such as staining intensity, category and comments and through a comprehensive, hierarchically organized dictionary of anatomical terms. Sequences of the ESTs are available and searchable through BLAST. ESTs in the database are clustered upon entry and have been blasted against public data-bases. The BLAST results are updated regularly, stored within the database and searchable. The MEPD is a project within the Medaka Genome Initiative (MGI) and entries will be interconnected to integrated genomic map databases. MEPD is accessible through the WWW at http://medaka.dsp.jst.go.jp/MEPD. PMID:12519950

  17. The transcriptome of Lutzomyia longipalpis (Diptera: Psychodidae) male reproductive organs.

    PubMed

    Azevedo, Renata V D M; Dias, Denise B S; Bretãs, Jorge A C; Mazzoni, Camila J; Souza, Nataly A; Albano, Rodolpho M; Wagner, Glauber; Davila, Alberto M R; Peixoto, Alexandre A

    2012-01-01

    It has been suggested that genes involved in the reproductive biology of insect disease vectors are potential targets for future alternative methods of control. Little is known about the molecular biology of reproduction in phlebotomine sand flies and there is no information available concerning genes that are expressed in male reproductive organs of Lutzomyia longipalpis, the main vector of American visceral leishmaniasis and a species complex. We generated 2678 high quality ESTs ("Expressed Sequence Tags") of L. longipalpis male reproductive organs that were grouped in 1391 non-redundant sequences (1136 singlets and 255 clusters). BLAST analysis revealed that only 57% of these sequences share similarity with a L. longipalpis female EST database. Although no more than 36% of the non-redundant sequences showed similarity to protein sequences deposited in databases, more than half of them presented the best-match hits with mosquito genes. Gene ontology analysis identified subsets of genes involved in biological processes such as protein biosynthesis and DNA replication, which are probably associated with spermatogenesis. A number of non-redundant sequences were also identified as putative male reproductive gland proteins (mRGPs), also known as male accessory gland protein genes (Acps). The transcriptome analysis of L. longipalpis male reproductive organs is one step further in the study of the molecular basis of the reproductive biology of this important species complex. It has allowed the identification of genes potentially involved in spermatogenesis as well as putative mRGPs sequences, which have been studied in many insect species because of their effects on female post-mating behavior and physiology and their potential role in sexual selection and speciation. These data open a number of new avenues for further research in the molecular and evolutionary reproductive biology of sand flies.

  18. Acclimation to different depths by the marine angiosperm Posidonia oceanica: transcriptomic and proteomic profiles

    PubMed Central

    Dattolo, Emanuela; Gu, Jenny; Bayer, Philipp E.; Mazzuca, Silvia; Serra, Ilia A.; Spadafora, Antonia; Bernardo, Letizia; Natali, Lucia; Cavallini, Andrea; Procaccini, Gabriele

    2013-01-01

    For seagrasses, seasonal and daily variations in light and temperature represent the mains factors driving their distribution along the bathymetric cline. Changes in these environmental factors, due to climatic and anthropogenic effects, can compromise their survival. In a framework of conservation and restoration, it becomes crucial to improve our knowledge about the physiological plasticity of seagrass species along environmental gradients. Here, we aimed to identify differences in transcriptomic and proteomic profiles, involved in the acclimation along the depth gradient in the seagrass Posidonia oceanica, and to improve the available molecular resources in this species, which is an important requisite for the application of eco-genomic approaches. To do that, from plant growing in shallow (−5 m) and deep (−25 m) portions of a single meadow, (i) we generated two reciprocal Expressed Sequences Tags (EST) libraries using a Suppressive Subtractive Hybridization (SSH) approach, to obtain depth/specific transcriptional profiles, and (ii) we identified proteins differentially expressed, using the highly innovative USIS mass spectrometry methodology, coupled with 1D-SDS electrophoresis and labeling free approach. Mass spectra were searched in the open source Global Proteome Machine (GPM) engine against plant databases and with the X!Tandem algorithm against a local database. Transcriptional analysis showed both quantitative and qualitative differences between depths. EST libraries had only the 3% of transcripts in common. A total of 315 peptides belonging to 64 proteins were identified by mass spectrometry. ATP synthase subunits were among the most abundant proteins in both conditions. Both approaches identified genes and proteins in pathways related to energy metabolism, transport and genetic information processing, that appear to be the most involved in depth acclimation in P. oceanica. Their putative rules in acclimation to depth were discussed. PMID:23785376

  19. Acclimation to different depths by the marine angiosperm Posidonia oceanica: transcriptomic and proteomic profiles.

    PubMed

    Dattolo, Emanuela; Gu, Jenny; Bayer, Philipp E; Mazzuca, Silvia; Serra, Ilia A; Spadafora, Antonia; Bernardo, Letizia; Natali, Lucia; Cavallini, Andrea; Procaccini, Gabriele

    2013-01-01

    For seagrasses, seasonal and daily variations in light and temperature represent the mains factors driving their distribution along the bathymetric cline. Changes in these environmental factors, due to climatic and anthropogenic effects, can compromise their survival. In a framework of conservation and restoration, it becomes crucial to improve our knowledge about the physiological plasticity of seagrass species along environmental gradients. Here, we aimed to identify differences in transcriptomic and proteomic profiles, involved in the acclimation along the depth gradient in the seagrass Posidonia oceanica, and to improve the available molecular resources in this species, which is an important requisite for the application of eco-genomic approaches. To do that, from plant growing in shallow (-5 m) and deep (-25 m) portions of a single meadow, (i) we generated two reciprocal Expressed Sequences Tags (EST) libraries using a Suppressive Subtractive Hybridization (SSH) approach, to obtain depth/specific transcriptional profiles, and (ii) we identified proteins differentially expressed, using the highly innovative USIS mass spectrometry methodology, coupled with 1D-SDS electrophoresis and labeling free approach. Mass spectra were searched in the open source Global Proteome Machine (GPM) engine against plant databases and with the X!Tandem algorithm against a local database. Transcriptional analysis showed both quantitative and qualitative differences between depths. EST libraries had only the 3% of transcripts in common. A total of 315 peptides belonging to 64 proteins were identified by mass spectrometry. ATP synthase subunits were among the most abundant proteins in both conditions. Both approaches identified genes and proteins in pathways related to energy metabolism, transport and genetic information processing, that appear to be the most involved in depth acclimation in P. oceanica. Their putative rules in acclimation to depth were discussed.

  20. A Hybrid Probabilistic Model for Unified Collaborative and Content-Based Image Tagging.

    PubMed

    Zhou, Ning; Cheung, William K; Qiu, Guoping; Xue, Xiangyang

    2011-07-01

    The increasing availability of large quantities of user contributed images with labels has provided opportunities to develop automatic tools to tag images to facilitate image search and retrieval. In this paper, we present a novel hybrid probabilistic model (HPM) which integrates low-level image features and high-level user provided tags to automatically tag images. For images without any tags, HPM predicts new tags based solely on the low-level image features. For images with user provided tags, HPM jointly exploits both the image features and the tags in a unified probabilistic framework to recommend additional tags to label the images. The HPM framework makes use of the tag-image association matrix (TIAM). However, since the number of images is usually very large and user-provided tags are diverse, TIAM is very sparse, thus making it difficult to reliably estimate tag-to-tag co-occurrence probabilities. We developed a collaborative filtering method based on nonnegative matrix factorization (NMF) for tackling this data sparsity issue. Also, an L1 norm kernel method is used to estimate the correlations between image features and semantic concepts. The effectiveness of the proposed approach has been evaluated using three databases containing 5,000 images with 371 tags, 31,695 images with 5,587 tags, and 269,648 images with 5,018 tags, respectively.

  1. EST-derived SNP discovery and selective pressure analysis in Pacific white shrimp ( Litopenaeus vannamei)

    NASA Astrophysics Data System (ADS)

    Liu, Chengzhang; Wang, Xia; Xiang, Jianhai; Li, Fuhua

    2012-09-01

    Pacific white shrimp has become a major aquaculture and fishery species worldwide. Although a large scale EST resource has been publicly available since 2008, the data have not yet been widely used for SNP discovery or transcriptome-wide assessment of selective pressure. In this study, a set of 155 411 expressed sequence tags (ESTs) from the NCBI database were computationally analyzed and 17 225 single nucleotide polymorphisms (SNPs) were predicted, including 9 546 transitions, 5 124 transversions and 2 481 indels. Among the 7 298 SNP substitutions located in functionally annotated contigs, 58.4% (4 262) are non-synonymous SNPs capable of introducing amino acid mutations. Two hundred and fifty nonsynonymous SNPs in genes associated with economic traits have been identified as candidates for markers in selective breeding. Diversity estimates among the synonymous nucleotides were on average 3.49 times greater than those in non-synonymous, suggesting negative selection. Distribution of non-synonymous to synonymous substitutions (Ka/Ks) ratio ranges from 0 to 4.01, (average 0.42, median 0.26), suggesting that the majority of the affected genes are under purifying selection. Enrichment analysis identified multiple gene ontology categories under positive or negative selection. Categories involved in innate immune response and male gamete generation are rich in positively selected genes, which is similar to reports in Drosophila and primates. This work is the first transcriptome-wide assessment of selective pressure in a Penaeid shrimp species. The functionally annotated SNPs provide a valuable resource of potential molecular markers for selective breeding.

  2. In-silico and in-vivo analyses of EST databases unveil conserved miRNAs from Carthamus tinctorius and Cynara cardunculus

    PubMed Central

    2012-01-01

    Background MicroRNAs (miRNAs) are small RNAs (21-24 bp) providing an RNA-based system of gene regulation highly conserved in plants and animals. In plants, miRNAs control mRNA degradation or restrain translation, affecting development and responses to stresses. Plant miRNAs show imperfect but extensive complementarity to mRNA targets, making their computational prediction possible, useful when data mining is applied on different species. In this study we used a comparative approach to identify both miRNAs and their targets, in artichoke and safflower. Results Two complete expressed sequence tags (ESTs) datasets from artichoke (3.6·104 entries) and safflower (4.2·104), were analysed with a bioinformatic pipeline and in vitro experiments, identifying 17 potential miRNAs. For each EST, using RNAhybrid program and 953 non redundant miRNA mature sequences, available in mirBase as reference, we searched matching putative targets. 8730 out of 42011 ESTs from safflower and 7145 of 36323 ESTs from artichoke showed at least one predicted miRNA target. BLAST analysis showed that 75% of all ESTs shared at least a common homologous region (E-value < 10-4) and about 50% of these displayed 400 bp or longer aligned sequences as conserved homologous/orthologous (COS) regions. 960 and 890 ESTs of safflower and artichoke organized in COS shared 79 different miRNA targets, considered functionally conserved, and statistically significant when compared with random sequences (signal to noise ratio > 2 and specificity ≥ 0.85). Four highly significant miRNAs selected from in silico data were experimentally validated in globe artichoke leaves. Conclusions Mature miRNAs and targets were predicted within EST sequences of safflower and artichoke. Most of the miRNA targets appeared highly/moderately conserved, highlighting an important and conserved function. In this study we introduce a stringent parameter for the comparative sequence analysis, represented by the identification of the same target in the COS region. After statistical analysis 79 targets, found on the COS regions and belonging to 60 miRNA families, have a signal to noise ratio > 2, with ≥ 0.85 specificity. The putative miRNAs identified belong to 55 dicotyledon plants and to 24 families only in monocotyledon. PMID:22536958

  3. Analysis of archaeological triacylglycerols by high resolution nanoESI, FT-ICR MS and IRMPD MS/MS: Application to 5th century BC-4th century AD oil lamps from Olbia (Ukraine)

    NASA Astrophysics Data System (ADS)

    Garnier, Nicolas; Rolando, Christian; Høtje, Jakob Munk; Tokarski, Caroline

    2009-07-01

    This work presents the precise identification of triacylglycerols (TAGs) extracted from archaeological samples using a methodology based on nanoelectrospray and Fourier transform mass spectrometry. The archaeological TAG identification needs adapted sample preparation protocols to trace samples in advanced degradation state. More precisely, the proposed preparation procedure includes extraction of the lipid components from finely grinded ceramic using dichloromethane/methanol mixture with additional ultrasonication treatment, and TAG purification by solid phase extraction on a diol cartridge. Focusing on the analytical approach, the implementation of "in-house" species-dependent TAG database was investigated using MS and InfraRed Multiphoton Dissociation (IRMPD) MS/MS spectra; several vegetal oils, dairy products and animal fats were studied. The high mass accuracy of the Fourier transform analyzer ([Delta]m below 2.5 ppm) provides easier data interpretation, and allows distinction between products of different origins. In details, the IRMPD spectra of the lithiated TAGs reveal fragmentation reactions including loss of free neutral fatty acid and loss of fatty acid as [alpha],[beta]-unsaturated moieties. Based on the developed preparation procedure and on the constituted database, TAG extracts from 5th century BC to 4th century AD Olbia lamps were analyzed. The structural information obtained succeeds in identifying that bovine/ovine fats were used as fuel used in these archaeological Olbia lamps.

  4. Identification of tissue-specific, abiotic stress-responsive gene expression patterns in wine grape (Vitis vinifera L.) based on curation and mining of large-scale EST data sets

    PubMed Central

    2011-01-01

    Background Abiotic stresses, such as water deficit and soil salinity, result in changes in physiology, nutrient use, and vegetative growth in vines, and ultimately, yield and flavor in berries of wine grape, Vitis vinifera L. Large-scale expressed sequence tags (ESTs) were generated, curated, and analyzed to identify major genetic determinants responsible for stress-adaptive responses. Although roots serve as the first site of perception and/or injury for many types of abiotic stress, EST sequencing in root tissues of wine grape exposed to abiotic stresses has been extremely limited to date. To overcome this limitation, large-scale EST sequencing was conducted from root tissues exposed to multiple abiotic stresses. Results A total of 62,236 expressed sequence tags (ESTs) were generated from leaf, berry, and root tissues from vines subjected to abiotic stresses and compared with 32,286 ESTs sequenced from 20 public cDNA libraries. Curation to correct annotation errors, clustering and assembly of the berry and leaf ESTs with currently available V. vinifera full-length transcripts and ESTs yielded a total of 13,278 unique sequences, with 2302 singletons and 10,976 mapped to V. vinifera gene models. Of these, 739 transcripts were found to have significant differential expression in stressed leaves and berries including 250 genes not described previously as being abiotic stress responsive. In a second analysis of 16,452 ESTs from a normalized root cDNA library derived from roots exposed to multiple, short-term, abiotic stresses, 135 genes with root-enriched expression patterns were identified on the basis of their relative EST abundance in roots relative to other tissues. Conclusions The large-scale analysis of relative EST frequency counts among a diverse collection of 23 different cDNA libraries from leaf, berry, and root tissues of wine grape exposed to a variety of abiotic stress conditions revealed distinct, tissue-specific expression patterns, previously unrecognized stress-induced genes, and many novel genes with root-enriched mRNA expression for improving our understanding of root biology and manipulation of rootstock traits in wine grape. mRNA abundance estimates based on EST library-enriched expression patterns showed only modest correlations between microarray and quantitative, real-time reverse transcription-polymerase chain reaction (qRT-PCR) methods highlighting the need for deep-sequencing expression profiling methods. PMID:21592389

  5. OrthoSelect: a protocol for selecting orthologous groups in phylogenomics.

    PubMed

    Schreiber, Fabian; Pick, Kerstin; Erpenbeck, Dirk; Wörheide, Gert; Morgenstern, Burkhard

    2009-07-16

    Phylogenetic studies using expressed sequence tags (EST) are becoming a standard approach to answer evolutionary questions. Such studies are usually based on large sets of newly generated, unannotated, and error-prone EST sequences from different species. A first crucial step in EST-based phylogeny reconstruction is to identify groups of orthologous sequences. From these data sets, appropriate target genes are selected, and redundant sequences are eliminated to obtain suitable sequence sets as input data for tree-reconstruction software. Generating such data sets manually can be very time consuming. Thus, software tools are needed that carry out these steps automatically. We developed a flexible and user-friendly software pipeline, running on desktop machines or computer clusters, that constructs data sets for phylogenomic analyses. It automatically searches assembled EST sequences against databases of orthologous groups (OG), assigns ESTs to these predefined OGs, translates the sequences into proteins, eliminates redundant sequences assigned to the same OG, creates multiple sequence alignments of identified orthologous sequences and offers the possibility to further process this alignment in a last step by excluding potentially homoplastic sites and selecting sufficiently conserved parts. Our software pipeline can be used as it is, but it can also be adapted by integrating additional external programs. This makes the pipeline useful for non-bioinformaticians as well as to bioinformatic experts. The software pipeline is especially designed for ESTs, but it can also handle protein sequences. OrthoSelect is a tool that produces orthologous gene alignments from assembled ESTs. Our tests show that OrthoSelect detects orthologs in EST libraries with high accuracy. In the absence of a gold standard for orthology prediction, we compared predictions by OrthoSelect to a manually created and published phylogenomic data set. Our tool was not only able to rebuild the data set with a specificity of 98%, but it detected four percent more orthologous sequences. Furthermore, the results OrthoSelect produces are in absolut agreement with the results of other programs, but our tool offers a significant speedup and additional functionality, e.g. handling of ESTs, computing sequence alignments, and refining them. To our knowledge, there is currently no fully automated and freely available tool for this purpose. Thus, OrthoSelect is a valuable tool for researchers in the field of phylogenomics who deal with large quantities of EST sequences. OrthoSelect is written in Perl and runs on Linux/Mac OS X. The tool can be downloaded at (http://gobics.de/fabian/orthoselect.php).

  6. Martin Karplus and Computer Modeling for Chemical Systems

    Science.gov Websites

    &D Nuggets Database dropdown arrow Search Tag Cloud Browse Reports Database Help Finding Aids with two-time Nobel laureate Linus Pauling, whom Karplus described as an important early influence. He

  7. Yuan T. Lee and Molecular Beam Studies

    Science.gov Websites

    &D Nuggets Database dropdown arrow Search Tag Cloud Browse Reports Database Help Finding Aids : The Influence of Yuan T. Lee, Journal of Chemical Physics, Volume 125, Issue 13, pp. 132302-132302-19

  8. The Porcelain Crab Transcriptome and PCAD, the Porcelain Crab Microarray and Sequence Database

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tagmount, Abderrahmane; Wang, Mei; Lindquist, Erika

    2010-01-27

    Background: With the emergence of a completed genome sequence of the freshwater crustacean Daphnia pulex, construction of genomic-scale sequence databases for additional crustacean sequences are important for comparative genomics and annotation. Porcelain crabs, genus Petrolisthes, have been powerful crustacean models for environmental and evolutionary physiology with respect to thermal adaptation and understanding responses of marine organisms to climate change. Here, we present a large-scale EST sequencing and cDNA microarray database project for the porcelain crab Petrolisthes cinctipes. Methodology/Principal Findings: A set of ~;;30K unique sequences (UniSeqs) representing ~;;19K clusters were generated from ~;;98K high quality ESTs from a set ofmore » tissue specific non-normalized and mixed-tissue normalized cDNA libraries from the porcelain crab Petrolisthes cinctipes. Homology for each UniSeq was assessed using BLAST, InterProScan, GO and KEGG database searches. Approximately 66percent of the UniSeqs had homology in at least one of the databases. All EST and UniSeq sequences along with annotation results and coordinated cDNA microarray datasets have been made publicly accessible at the Porcelain Crab Array Database (PCAD), a feature-enriched version of the Stanford and Longhorn Array Databases.Conclusions/Significance: The EST project presented here represents the third largest sequencing effort for any crustacean, and the largest effort for any crab species. Our assembly and clustering results suggest that our porcelain crab EST data set is equally diverse to the much larger EST set generated in the Daphnia pulex genome sequencing project, and thus will be an important resource to the Daphnia research community. Our homology results support the pancrustacea hypothesis and suggest that Malacostraca may be ancestral to Branchiopoda and Hexapoda. Our results also suggest that our cDNA microarrays cover as much of the transcriptome as can reasonably be captured in EST library sequencing approaches, and thus represent a rich resource for studies of environmental genomics.« less

  9. VitisExpDB: a database resource for grape functional genomics.

    PubMed

    Doddapaneni, Harshavardhan; Lin, Hong; Walker, M Andrew; Yao, Jiqiang; Civerolo, Edwin L

    2008-02-28

    The family Vitaceae consists of many different grape species that grow in a range of climatic conditions. In the past few years, several studies have generated functional genomic information on different Vitis species and cultivars, including the European grape vine, Vitis vinifera. Our goal is to develop a comprehensive web data source for Vitaceae. VitisExpDB is an online MySQL-PHP driven relational database that houses annotated EST and gene expression data for V. vinifera and non-vinifera grape species and varieties. Currently, the database stores approximately 320,000 EST sequences derived from 8 species/hybrids, their annotation (BLAST top match) details and Gene Ontology based structured vocabulary. Putative homologs for each EST in other species and varieties along with information on their percent nucleotide identities, phylogenetic relationship and common primers can be retrieved. The database also includes information on probe sequence and annotation features of the high density 60-mer gene expression chip consisting of approximately 20,000 non-redundant set of ESTs. Finally, the database includes 14 processed global microarray expression profile sets. Data from 12 of these expression profile sets have been mapped onto metabolic pathways. A user-friendly web interface with multiple search indices and extensively hyperlinked result features that permit efficient data retrieval has been developed. Several online bioinformatics tools that interact with the database along with other sequence analysis tools have been added. In addition, users can submit their ESTs to the database. The developed database provides genomic resource to grape community for functional analysis of genes in the collection and for the grape genome annotation and gene function identification. The VitisExpDB database is available through our website http://cropdisease.ars.usda.gov/vitis_at/main-page.htm.

  10. VitisExpDB: A database resource for grape functional genomics

    PubMed Central

    Doddapaneni, Harshavardhan; Lin, Hong; Walker, M Andrew; Yao, Jiqiang; Civerolo, Edwin L

    2008-01-01

    Background The family Vitaceae consists of many different grape species that grow in a range of climatic conditions. In the past few years, several studies have generated functional genomic information on different Vitis species and cultivars, including the European grape vine, Vitis vinifera. Our goal is to develop a comprehensive web data source for Vitaceae. Description VitisExpDB is an online MySQL-PHP driven relational database that houses annotated EST and gene expression data for V. vinifera and non-vinifera grape species and varieties. Currently, the database stores ~320,000 EST sequences derived from 8 species/hybrids, their annotation (BLAST top match) details and Gene Ontology based structured vocabulary. Putative homologs for each EST in other species and varieties along with information on their percent nucleotide identities, phylogenetic relationship and common primers can be retrieved. The database also includes information on probe sequence and annotation features of the high density 60-mer gene expression chip consisting of ~20,000 non-redundant set of ESTs. Finally, the database includes 14 processed global microarray expression profile sets. Data from 12 of these expression profile sets have been mapped onto metabolic pathways. A user-friendly web interface with multiple search indices and extensively hyperlinked result features that permit efficient data retrieval has been developed. Several online bioinformatics tools that interact with the database along with other sequence analysis tools have been added. In addition, users can submit their ESTs to the database. Conclusion The developed database provides genomic resource to grape community for functional analysis of genes in the collection and for the grape genome annotation and gene function identification. The VitisExpDB database is available through our website . PMID:18307813

  11. Cytogenetic and molecular markers for detecting Aegilops uniaristata chromosomes in a wheat background.

    PubMed

    Gong, Wenping; Li, Guangrong; Zhou, Jianping; Li, Genying; Liu, Cheng; Huang, Chengyan; Zhao, Zhendong; Yang, Zujun

    2014-09-01

    Aegilops uniaristata has many agronomically useful traits that can be used for wheat breeding. So far, a Triticum turgidum - Ae. uniaristata amphiploid and one set of Chinese Spring (CS) - Ae. uniaristata addition lines have been produced. To guide Ae. uniaristata chromatin transformation from these lines into cultivated wheat through chromosome engineering, reliable cytogenetic and molecular markers specific for Ae. uniaristata chromosomes need to be developed. Standard C-banding shows that C-bands mainly exist in the centromeric regions of Ae. uniaristata but rarely at the distal ends. Fluorescence in situ hybridization (FISH) using (GAA)8 as a probe showed that the hybridization signal of chromosomes 1N-7N are different, thus (GAA)8 can be used to identify all Ae. uniaristata chromosomes in wheat background simultaneously. Moreover, a total of 42 molecular markers specific for Ae. uniaristata chromosomes were developed by screening expressed sequence tag - sequence tagged site (EST-STS), expressed sequence tag - simple sequence repeat (EST-SSR), and PCR-based landmark unique gene (PLUG) primers. The markers were subsequently localized using the CS - Ae. uniaristata addition lines and different wheat cultivars as controls. The cytogenetic and molecular markers developed herein will be helpful for screening and identifying wheat - Ae. uniaristata progeny.

  12. The Eimeria Transcript DB: an integrated resource for annotated transcripts of protozoan parasites of the genus Eimeria

    PubMed Central

    Rangel, Luiz Thibério; Novaes, Jeniffer; Durham, Alan M.; Madeira, Alda Maria B. N.; Gruber, Arthur

    2013-01-01

    Parasites of the genus Eimeria infect a wide range of vertebrate hosts, including chickens. We have recently reported a comparative analysis of the transcriptomes of Eimeria acervulina, Eimeria maxima and Eimeria tenella, integrating ORESTES data produced by our group and publicly available Expressed Sequence Tags (ESTs). All cDNA reads have been assembled, and the reconstructed transcripts have been submitted to a comprehensive functional annotation pipeline. Additional studies included orthology assignment across apicomplexan parasites and clustering analyses of gene expression profiles among different developmental stages of the parasites. To make all this body of information publicly available, we constructed the Eimeria Transcript Database (EimeriaTDB), a web repository that provides access to sequence data, annotation and comparative analyses. Here, we describe the web interface, available sequence data sets and query tools implemented on the site. The main goal of this work is to offer a public repository of sequence and functional annotation data of reconstructed transcripts of parasites of the genus Eimeria. We believe that EimeriaTDB will represent a valuable and complementary resource for the Eimeria scientific community and for those researchers interested in comparative genomics of apicomplexan parasites. Database URL: http://www.coccidia.icb.usp.br/eimeriatdb/ PMID:23411718

  13. Discovering genes associated with dormancy in the monogonont rotifer Brachionus plicatilis

    PubMed Central

    Denekamp, Nadav Y; Thorne, Michael AS; Clark, Melody S; Kube, Michael; Reinhardt, Richard; Lubzens, Esther

    2009-01-01

    Background Microscopic monogonont rotifers, including the euryhaline species Brachionus plicatilis, are typically found in water bodies where environmental factors restrict population growth to short periods lasting days or months. The survival of the population is ensured via the production of resting eggs that show a remarkable tolerance to unfavorable conditions and remain viable for decades. The aim of this study was to generate Expressed Sequence Tags (ESTs) for molecular characterisation of processes associated with the formation of resting eggs, their survival during dormancy and hatching. Results Four normalized and four subtractive libraries were constructed to provide a resource for rotifer transcriptomics associated with resting-egg formation, storage and hatching. A total of 47,926 sequences were assembled into 18,000 putative transcripts and analyzed using both Blast and GO annotation. About 28–55% (depending on the library) of the clones produced significant matches against the Swissprot and Trembl databases. Genes known to be associated with desiccation tolerance during dormancy in other organisms were identified in the EST libraries. These included genes associated with antioxidant activity, low molecular weight heat shock proteins and Late Embryonic Abundant (LEA) proteins. Real-time PCR confirmed that LEA transcripts, small heat-shock proteins and some antioxidant genes were upregulated in resting eggs, therefore suggesting that desiccation tolerance is a characteristic feature of resting eggs even though they do not necessarily fully desiccate during dormancy. The role of trehalose in resting-egg formation and survival remains unclear since there was no significant difference between resting-egg producing females and amictic females in the expression of the tps-1 gene. In view of the absence of vitellogenin transcripts, matches to lipoprotein lipase proteins suggest that, similar to the situation in dipterans, these proteins may serve as the yolk proteins in rotifers. Conclusion The 47,926 ESTs expand significantly the current sequence resource of B. plicatilis. It describes, for the first time, genes putatively associated with resting eggs and will serve as a database for future global expression experiments, particularly for the further identification of dormancy related genes. PMID:19284654

  14. Discovering genes associated with dormancy in the monogonont rotifer Brachionus plicatilis.

    PubMed

    Denekamp, Nadav Y; Thorne, Michael A S; Clark, Melody S; Kube, Michael; Reinhardt, Richard; Lubzens, Esther

    2009-03-13

    Microscopic monogonont rotifers, including the euryhaline species Brachionus plicatilis, are typically found in water bodies where environmental factors restrict population growth to short periods lasting days or months. The survival of the population is ensured via the production of resting eggs that show a remarkable tolerance to unfavorable conditions and remain viable for decades. The aim of this study was to generate Expressed Sequence Tags (ESTs) for molecular characterisation of processes associated with the formation of resting eggs, their survival during dormancy and hatching. Four normalized and four subtractive libraries were constructed to provide a resource for rotifer transcriptomics associated with resting-egg formation, storage and hatching. A total of 47,926 sequences were assembled into 18,000 putative transcripts and analyzed using both Blast and GO annotation. About 28-55% (depending on the library) of the clones produced significant matches against the Swissprot and Trembl databases. Genes known to be associated with desiccation tolerance during dormancy in other organisms were identified in the EST libraries. These included genes associated with antioxidant activity, low molecular weight heat shock proteins and Late Embryonic Abundant (LEA) proteins. Real-time PCR confirmed that LEA transcripts, small heat-shock proteins and some antioxidant genes were upregulated in resting eggs, therefore suggesting that desiccation tolerance is a characteristic feature of resting eggs even though they do not necessarily fully desiccate during dormancy. The role of trehalose in resting-egg formation and survival remains unclear since there was no significant difference between resting-egg producing females and amictic females in the expression of the tps-1 gene. In view of the absence of vitellogenin transcripts, matches to lipoprotein lipase proteins suggest that, similar to the situation in dipterans, these proteins may serve as the yolk proteins in rotifers. The 47,926 ESTs expand significantly the current sequence resource of B. plicatilis. It describes, for the first time, genes putatively associated with resting eggs and will serve as a database for future global expression experiments, particularly for the further identification of dormancy related genes.

  15. Gene expression analysis of flax seed development

    PubMed Central

    2011-01-01

    Background Flax, Linum usitatissimum L., is an important crop whose seed oil and stem fiber have multiple industrial applications. Flax seeds are also well-known for their nutritional attributes, viz., omega-3 fatty acids in the oil and lignans and mucilage from the seed coat. In spite of the importance of this crop, there are few molecular resources that can be utilized toward improving seed traits. Here, we describe flax embryo and seed development and generation of comprehensive genomic resources for the flax seed. Results We describe a large-scale generation and analysis of expressed sequences in various tissues. Collectively, the 13 libraries we have used provide a broad representation of genes active in developing embryos (globular, heart, torpedo, cotyledon and mature stages) seed coats (globular and torpedo stages) and endosperm (pooled globular to torpedo stages) and genes expressed in flowers, etiolated seedlings, leaves, and stem tissue. A total of 261,272 expressed sequence tags (EST) (GenBank accessions LIBEST_026995 to LIBEST_027011) were generated. These EST libraries included transcription factor genes that are typically expressed at low levels, indicating that the depth is adequate for in silico expression analysis. Assembly of the ESTs resulted in 30,640 unigenes and 82% of these could be identified on the basis of homology to known and hypothetical genes from other plants. When compared with fully sequenced plant genomes, the flax unigenes resembled poplar and castor bean more than grape, sorghum, rice or Arabidopsis. Nearly one-fifth of these (5,152) had no homologs in sequences reported for any organism, suggesting that this category represents genes that are likely unique to flax. Digital analyses revealed gene expression dynamics for the biosynthesis of a number of important seed constituents during seed development. Conclusions We have developed a foundational database of expressed sequences and collection of plasmid clones that comprise even low-expressed genes such as those encoding transcription factors. This has allowed us to delineate the spatio-temporal aspects of gene expression underlying the biosynthesis of a number of important seed constituents in flax. Flax belongs to a taxonomic group of diverse plants and the large sequence database will allow for evolutionary studies as well. PMID:21529361

  16. Anchoring 9,371 Maize Expressed Sequence Tagged Unigenes to the Bacterial Artificial Chromosome Contig Map by Two-Dimensional Overgo Hybridization1

    PubMed Central

    Gardiner, Jack; Schroeder, Steven; Polacco, Mary L.; Sanchez-Villeda, Hector; Fang, Zhiwei; Morgante, Michele; Landewe, Tim; Fengler, Kevin; Useche, Francisco; Hanafey, Michael; Tingey, Scott; Chou, Hugh; Wing, Rod; Soderlund, Carol; Coe, Edward H.

    2004-01-01

    Our goal is to construct a robust physical map for maize (Zea mays) comprehensively integrated with the genetic map. We have used a two-dimensional 24 × 24 overgo pooling strategy to anchor maize expressed sequence tagged (EST) unigenes to 165,888 bacterial artificial chromosomes (BACs) on high-density filters. A set of 70,716 public maize ESTs seeded derivation of 10,723 EST unigene assemblies. From these assemblies, 10,642 overgo sequences of 40 bp were applied as hybridization probes. BAC addresses were obtained for 9,371 overgo probes, representing an 88% success rate. More than 96% of the successful overgo probes identified two or more BACs, while 5% identified more than 50 BACs. The majority of BACs identified (79%) were hybridized with one or two overgos. A small number of BACs hybridized with eight or more overgos, suggesting that these BACs must be gene rich. Approximately 5,670 overgos identified BACs assembled within one contig, indicating that these probes are highly locus specific. A total of 1,795 megabases (Mb; 87%) of the total 2,050 Mb in BAC contigs were associated with one or more overgos, which are serving as sequence-tagged sites for single nucleotide polymorphism development. Overgo density ranged from less than one overgo per megabase to greater than 20 overgos per megabase. The majority of contigs (52%) hit by overgos contained three to nine overgos per megabase. Analysis of approximately 1,022 Mb of genetically anchored BAC contigs indicates that 9,003 of the total 13,900 overgo-contig sites are genetically anchored. Our results indicate overgos are a powerful approach for generating gene-specific hybridization probes that are facilitating the assembly of an integrated genetic and physical map for maize. PMID:15020742

  17. Bringing the fathead minnow (Pimephales promelas) into the ...

    EPA Pesticide Factsheets

    The fathead minnow (Pimephales promelas) is a well-established ecotoxicological model organism that has been widely used for regulatory ecotoxicity testing and research for over a half century. Throughout this time, a lot of knowledge has been gained about the fathead minnow’s biological responses to various xenobiotics. However, despite its importance as a model organism, the fathead minnow still has few publicly available gene sequences. Recently, Burns et al. (2015; Environ. Toxicol. Chem. 35:212) described the sequencing and de-novo assembly of the fathead minnow genome. Two draft genome assemblies are now publicly available on the GenBank database. However, on their own the draft assemblies remain of limited use to researchers who are primarily interested in the functional units of the genome, i.e. the genes. In the present study, an annotation pipeline, consisting of gene prediction, evidence alignment, and data synthesis, was applied to the fathead minnow SOAPdenovo assembly. Ab initio gene prediction was performed using AUGUSTUS, which provided a starting point of 43,345 gene predictions. Fathead minnow Expressed Sequence Tags (ESTs) and zebrafish protein-coding sequences (CDSs) were then aligned to the assembly using the corresponding spliced alignment methods of the program Exonerate. Of the over 240,000 EST alignments, 73% were successfully aligned with 90% or greater sequence identity and query coverage. Similarly, 39% of nearly 45,000 zebrafish co

  18. Purification, crystallization and preliminary crystallographic analysis of Est25: a ketoprofen-specific hormone-sensitive lipase

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kim, SeungBum; Joo, Sangbum; Yoon, Hyun C.

    2007-07-01

    Est25, a ketoprofen-specific hormone-sensitive lipase from a metagenomic library, was crystallized and diffraction data were collected to 1.49 Å resolution. Ketoprofen, a nonsteroidal anti-inflammatory drug, inhibits the synthesis of prostaglandin. A novel hydrolase (Est25) with high ketoprofen specificity has previously been identified using a metagenomic library from environmental samples. Recombinant Est25 protein with a histidine tag at the N-terminus was expressed in Escherichia coli and purified in a homogenous form. Est25 was crystallized from 2.4 M sodium malonate pH 7.0 and X-ray diffraction data were collected to 1.49 Å using synchrotron radiation. The crystals belong to the monoclinic space groupmore » C2, with unit-cell parameters a = 197.8, b = 95.2, c = 99.4 Å, β = 97.1°.« less

  19. A comprehensive resource of drought- and salinity- responsive ESTs for gene discovery and marker development in chickpea (Cicer arietinum L.)

    PubMed Central

    2009-01-01

    Background Chickpea (Cicer arietinum L.), an important grain legume crop of the world is seriously challenged by terminal drought and salinity stresses. However, very limited number of molecular markers and candidate genes are available for undertaking molecular breeding in chickpea to tackle these stresses. This study reports generation and analysis of comprehensive resource of drought- and salinity-responsive expressed sequence tags (ESTs) and gene-based markers. Results A total of 20,162 (18,435 high quality) drought- and salinity- responsive ESTs were generated from ten different root tissue cDNA libraries of chickpea. Sequence editing, clustering and assembly analysis resulted in 6,404 unigenes (1,590 contigs and 4,814 singletons). Functional annotation of unigenes based on BLASTX analysis showed that 46.3% (2,965) had significant similarity (≤1E-05) to sequences in the non-redundant UniProt database. BLASTN analysis of unique sequences with ESTs of four legume species (Medicago, Lotus, soybean and groundnut) and three model plant species (rice, Arabidopsis and poplar) provided insights on conserved genes across legumes as well as novel transcripts for chickpea. Of 2,965 (46.3%) significant unigenes, only 2,071 (32.3%) unigenes could be functionally categorised according to Gene Ontology (GO) descriptions. A total of 2,029 sequences containing 3,728 simple sequence repeats (SSRs) were identified and 177 new EST-SSR markers were developed. Experimental validation of a set of 77 SSR markers on 24 genotypes revealed 230 alleles with an average of 4.6 alleles per marker and average polymorphism information content (PIC) value of 0.43. Besides SSR markers, 21,405 high confidence single nucleotide polymorphisms (SNPs) in 742 contigs (with ≥ 5 ESTs) were also identified. Recognition sites for restriction enzymes were identified for 7,884 SNPs in 240 contigs. Hierarchical clustering of 105 selected contigs provided clues about stress- responsive candidate genes and their expression profile showed predominance in specific stress-challenged libraries. Conclusion Generated set of chickpea ESTs serves as a resource of high quality transcripts for gene discovery and development of functional markers associated with abiotic stress tolerance that will be helpful to facilitate chickpea breeding. Mapping of gene-based markers in chickpea will also add more anchoring points to align genomes of chickpea and other legume species. PMID:19912666

  20. DICOM Data Warehouse: Part 2.

    PubMed

    Langer, Steve G

    2016-06-01

    In 2010, the DICOM Data Warehouse (DDW) was launched as a data warehouse for DICOM meta-data. Its chief design goals were to have a flexible database schema that enabled it to index standard patient and study information, modality specific tags (public and private), and create a framework to derive computable information (derived tags) from the former items. Furthermore, it was to map the above information to an internally standard lexicon that enables a non-DICOM savvy programmer to write standard SQL queries and retrieve the equivalent data from a cohort of scanners, regardless of what tag that data element was found in over the changing epochs of DICOM and ensuing migration of elements from private to public tags. After 5 years, the original design has scaled astonishingly well. Very little has changed in the database schema. The knowledge base is now fluent in over 90 device types. Also, additional stored procedures have been written to compute data that is derivable from standard or mapped tags. Finally, an early concern is that the system would not be able to address the variability DICOM-SR objects has been addressed. As of this writing the system is indexing 300 MR, 600 CT, and 2000 other (XA, DR, CR, MG) imaging studies per day. The only remaining issue to be solved is the case for tags that were not prospectively indexed-and indeed, this final challenge may lead to a noSQL, big data, approach in a subsequent version.

  1. Exploiting the Brachypodium Tool Box in cereal and grass research

    USDA-ARS?s Scientific Manuscript database

    It is now a decade since Brachypodium distachyon was suggested as a model species for temperate grasses and cereals. Since then transformation protocols, large expressed sequence tag (EST) populations, tools for forward and reverse genetic screens, highly refined cytogenetic probes, germplasm coll...

  2. Genotyping variability of computationally categorized peach microsatellite markers

    USDA-ARS?s Scientific Manuscript database

    Numerous expressed sequence tag (EST) simple sequence repeat (SSR) primers can be easily mined out. The obstacle to develop them into usable markers is how to optimally select downsized subsets of the primers for genotyping, which accordingly reduces amplification failure and monomorphism often occu...

  3. Expressed sequence tags related to nitrogen metabolism in maize inoculated with Azospirillum brasilense.

    PubMed

    Pereira-Defilippi, L; Pereira, E M; Silva, F M; Moro, G V

    2017-05-31

    The relative quantitative real-time expression of two expressed sequence tags (ESTs) codifying for key enzymes in nitrogen metabolism in maize, nitrate reductase (ZmNR), and glutamine synthetase (ZmGln1-3) was performed for genotypes inoculated with Azospirillum brasilense. Two commercial single-cross hybrids (AG7098 and 2B707) and two experimental synthetic varieties (V2 and V4) were raised under controlled greenhouse conditions, in six treatment groups corresponding to different forms of inoculation and different levels of nitrogen application by top-dressing. The genotypes presented distinct responses to inoculation with A. brasilense. Increases in the expression of ZmNR were observed for the hybrids, while V4 only displayed a greater level of expression when the plants received nitrogenous fertilization by top-dressing and there was no inoculation. The expression of the ZmGln1-3EST was induced by A. brasilense in the hybrids and the variety V4. In contrast, the variety V2 did not respond to inoculation.

  4. Isolation and characterization of novel EST-derived genic markers in Pisum sativum (Fabaceae)1

    PubMed Central

    Jain, Shalu; McPhee, Kevin E.

    2013-01-01

    • Premise of the study: Novel markers were developed for pea (Pisum sativum) from pea expressed sequence tags (ESTs) having significant homology to Medicago truncatula gene sequences to investigate genetic diversity, linkage mapping, and cross-species transferability. • Methods and Results: Seventy-seven EST-derived genic markers were developed through comparative mapping between M. truncatula and P. sativum in which 75 markers produced PCR products and 33 were polymorphic among 16 pea genotypes. • Conclusions: The novel markers described here will be useful for future genetic studies of P. sativum; their amplification in lentil (Lens culinaris) demonstrates their potential for use in closely related species. PMID:25202494

  5. Extending Immunological Profiling in the Gilthead Sea Bream, Sparus aurata, by Enriched cDNA Library Analysis, Microarray Design and Initial Studies upon the Inflammatory Response to PAMPs.

    PubMed

    Boltaña, Sebastian; Castellana, Barbara; Goetz, Giles; Tort, Lluis; Teles, Mariana; Mulero, Victor; Novoa, Beatriz; Figueras, Antonio; Goetz, Frederick W; Gallardo-Escarate, Cristian; Planas, Josep V; Mackenzie, Simon

    2017-02-03

    This study describes the development and validation of an enriched oligonucleotide-microarray platform for Sparus aurata (SAQ) to provide a platform for transcriptomic studies in this species. A transcriptome database was constructed by assembly of gilthead sea bream sequences derived from public repositories of mRNA together with reads from a large collection of expressed sequence tags (EST) from two extensive targeted cDNA libraries characterizing mRNA transcripts regulated by both bacterial and viral challenge. The developed microarray was further validated by analysing monocyte/macrophage activation profiles after challenge with two Gram-negative bacterial pathogen-associated molecular patterns (PAMPs; lipopolysaccharide (LPS) and peptidoglycan (PGN)). Of the approximately 10,000 EST sequenced, we obtained a total of 6837 EST longer than 100 nt, with 3778 and 3059 EST obtained from the bacterial-primed and from the viral-primed cDNA libraries, respectively. Functional classification of contigs from the bacterial- and viral-primed cDNA libraries by Gene Ontology (GO) showed that the top five represented categories were equally represented in the two libraries: metabolism (approximately 24% of the total number of contigs), carrier proteins/membrane transport (approximately 15%), effectors/modulators and cell communication (approximately 11%), nucleoside, nucleotide and nucleic acid metabolism (approximately 7.5%) and intracellular transducers/signal transduction (approximately 5%). Transcriptome analyses using this enriched oligonucleotide platform identified differential shifts in the response to PGN and LPS in macrophage-like cells, highlighting responsive gene-cassettes tightly related to PAMP host recognition. As observed in other fish species, PGN is a powerful activator of the inflammatory response in S. aurata macrophage-like cells. We have developed and validated an oligonucleotide microarray (SAQ) that provides a platform enriched for the study of gene expression in S. aurata with an emphasis upon immunity and the immune response.

  6. Mining of haplotype-based expressed sequence tag single nucleotide polymorphisms in citrus

    PubMed Central

    2013-01-01

    Background Single nucleotide polymorphisms (SNPs), the most abundant variations in a genome, have been widely used in various studies. Detection and characterization of citrus haplotype-based expressed sequence tag (EST) SNPs will greatly facilitate further utilization of these gene-based resources. Results In this paper, haplotype-based SNPs were mined out of publicly available citrus expressed sequence tags (ESTs) from different citrus cultivars (genotypes) individually and collectively for comparison. There were a total of 567,297 ESTs belonging to 27 cultivars in varying numbers and consequentially yielding different numbers of haplotype-based quality SNPs. Sweet orange (SO) had the most (213,830) ESTs, generating 11,182 quality SNPs in 3,327 out of 4,228 usable contigs. Summed from all the individually mining results, a total of 25,417 quality SNPs were discovered – 15,010 (59.1%) were transitions (AG and CT), 9,114 (35.9%) were transversions (AC, GT, CG, and AT), and 1,293 (5.0%) were insertion/deletions (indels). A vast majority of SNP-containing contigs consisted of only 2 haplotypes, as expected, but the percentages of 2 haplotype contigs varied widely in these citrus cultivars. BLAST of the 25,417 25-mer SNP oligos to the Clementine reference genome scaffolds revealed 2,947 SNPs had “no hits found”, 19,943 had 1 unique hit / alignment, 1,571 had one hit and 2+ alignments per hit, and 956 had 2+ hits and 1+ alignment per hit. Of the total 24,293 scaffold hits, 23,955 (98.6%) were on the main scaffolds 1 to 9, and only 338 were on 87 minor scaffolds. Most alignments had 100% (25/25) or 96% (24/25) nucleotide identities, accounting for 93% of all the alignments. Considering almost all the nucleotide discrepancies in the 24/25 alignments were at the SNP sites, it served well as in silico validation of these SNPs, in addition to and consistent with the rate (81%) validated by sequencing and SNaPshot assay. Conclusions High-quality EST-SNPs from different citrus genotypes were detected, and compared to estimate the heterozygosity of each genome. All the SNP oligo sequences were aligned with the Clementine citrus genome to determine their distribution and uniqueness and for in silico validation, in addition to SNaPshot and sequencing validation of selected SNPs. PMID:24175923

  7. An annotated cDNA library of juvenile Euprymna scolopes with and without colonization by the symbiont Vibrio fischeri

    PubMed Central

    Chun, Carlene K; Scheetz, Todd E; Bonaldo, Maria de Fatima; Brown, Bartley; Clemens, Anik; Crookes-Goodson, Wendy J; Crouch, Keith; DeMartini, Tad; Eyestone, Mari; Goodson, Michael S; Janssens, Bernadette; Kimbell, Jennifer L; Koropatnick, Tanya A; Kucaba, Tamara; Smith, Christina; Stewart, Jennifer J; Tong, Deyan; Troll, Joshua V; Webster, Sarahrose; Winhall-Rice, Jane; Yap, Cory; Casavant, Thomas L; McFall-Ngai, Margaret J; Soares, M Bento

    2006-01-01

    Background Biologists are becoming increasingly aware that the interaction of animals, including humans, with their coevolved bacterial partners is essential for health. This growing awareness has been a driving force for the development of models for the study of beneficial animal-bacterial interactions. In the squid-vibrio model, symbiotic Vibrio fischeri induce dramatic developmental changes in the light organ of host Euprymna scolopes over the first hours to days of their partnership. We report here the creation of a juvenile light-organ specific EST database. Results We generated eleven cDNA libraries from the light organ of E. scolopes at developmentally significant time points with and without colonization by V. fischeri. Single pass 3' sequencing efforts generated 42,564 expressed sequence tags (ESTs) of which 35,421 passed our quality criteria and were then clustered via the UIcluster program into 13,962 nonredundant sequences. The cDNA clones representing these nonredundant sequences were sequenced from the 5' end of the vector and 58% of these resulting sequences overlapped significantly with the associated 3' sequence to generate 8,067 contigs with an average sequence length of 1,065 bp. All sequences were annotated with BLASTX (E-value < -03) and Gene Ontology (GO). Conclusion Both the number of ESTs generated from each library and GO categorizations are reflective of the activity state of the light organ during these early stages of symbiosis. Future analyses of the sequences identified in these libraries promise to provide valuable information not only about pathways involved in colonization and early development of the squid light organ, but also about pathways conserved in response to bacterial colonization across the animal kingdom. PMID:16780587

  8. Transcriptome and Proteome Exploration to Provide a Resource for the Study of Agrocybe aegerita

    PubMed Central

    Jiang, Shuai; Chen, Yijie; Yin, Yalin; Pan, Yongfu; Yu, Guojun; Li, Yamu; Wong, Barry Hon Cheung; Liang, Yi; Sun, Hui

    2013-01-01

    Background Agrocybe aegerita, the black poplar mushroom, has been highly valued as a functional food for its medicinal and nutritional benefits. Several bioactive extracts from A. aegerita have been found to exhibit antitumor and antioxidant activities. However, limited genetic resources for A. aegerita have hindered exploration of this species. Methodology/Principal Findings To facilitate the research on A. aegerita, we established a deep survey of the transcriptome and proteome of this mushroom. We applied high-throughput sequencing technology (Illumina) to sequence A. aegerita transcriptomes from mycelium and fruiting body. The raw clean reads were de novo assembled into a total of 36,134 expressed sequences tags (ESTs) with an average length of 663 bp. These ESTs were annotated and classified according to Gene Ontology (GO), Clusters of Orthologous Groups (COG), and Kyoto Encyclopedia of Genes and Genomes (KEGG) metabolic pathways. Gene expression profile analysis showed that 18,474 ESTs were differentially expressed, with 10,131 up-regulated in mycelium and 8,343 up-regulated in fruiting body. Putative genes involved in polysaccharide and steroid biosynthesis were identified from A. aegerita transcriptome, and these genes were differentially expressed at the two stages of A. aegerita. Based on one-dimensional gel electrophoresis (1-DGE) coupled with electrospray ionization liquid chromatography tandem MS (LC-ESI-MS/MS), we identified a total of 309 non-redundant proteins. And many metabolic enzymes involved in glycolysis were identified in the protein database. Conclusions/Significance This is the first study on transcriptome and proteome analyses of A. aegerita. The data in this study serve as a resource of A. aegerita transcripts and proteins, and offer clues to the applications of this mushroom in nutrition, pharmacy and industry. PMID:23418592

  9. Transcriptome and proteome exploration to provide a resource for the study of Agrocybe aegerita.

    PubMed

    Wang, Man; Gu, Bianli; Huang, Jie; Jiang, Shuai; Chen, Yijie; Yin, Yalin; Pan, Yongfu; Yu, Guojun; Li, Yamu; Wong, Barry Hon Cheung; Liang, Yi; Sun, Hui

    2013-01-01

    Agrocybe aegerita, the black poplar mushroom, has been highly valued as a functional food for its medicinal and nutritional benefits. Several bioactive extracts from A. aegerita have been found to exhibit antitumor and antioxidant activities. However, limited genetic resources for A. aegerita have hindered exploration of this species. To facilitate the research on A. aegerita, we established a deep survey of the transcriptome and proteome of this mushroom. We applied high-throughput sequencing technology (Illumina) to sequence A. aegerita transcriptomes from mycelium and fruiting body. The raw clean reads were de novo assembled into a total of 36,134 expressed sequences tags (ESTs) with an average length of 663 bp. These ESTs were annotated and classified according to Gene Ontology (GO), Clusters of Orthologous Groups (COG), and Kyoto Encyclopedia of Genes and Genomes (KEGG) metabolic pathways. Gene expression profile analysis showed that 18,474 ESTs were differentially expressed, with 10,131 up-regulated in mycelium and 8,343 up-regulated in fruiting body. Putative genes involved in polysaccharide and steroid biosynthesis were identified from A. aegerita transcriptome, and these genes were differentially expressed at the two stages of A. aegerita. Based on one-dimensional gel electrophoresis (1-DGE) coupled with electrospray ionization liquid chromatography tandem MS (LC-ESI-MS/MS), we identified a total of 309 non-redundant proteins. And many metabolic enzymes involved in glycolysis were identified in the protein database. This is the first study on transcriptome and proteome analyses of A. aegerita. The data in this study serve as a resource of A. aegerita transcripts and proteins, and offer clues to the applications of this mushroom in nutrition, pharmacy and industry.

  10. PlantGI: a database for searching gene indices in agricultural plants developed at NIAB, Korea

    PubMed Central

    Kim, Chang Kug; Choi, Ji Weon; Park, DongSuk; Kang, Man Jung; Seol, Young-Joo; Hyun, Do Yoon; Hahn, Jang Ho

    2008-01-01

    The Plant Gene Index (PlantGI) database is developed as a web-based search system with search capabilities for keywords to provide information on gene indices specifically for agricultural plants. The database contains specific Gene Index information for ten agricultural species, namely, rice, Chinese cabbage, wheat, maize, soybean, barley, mushroom, Arabidopsis, hot pepper and tomato. PlantGI differs from other Gene Index databases in being specific to agricultural plant species and thus complements services from similar other developments. The database includes options for interactive mining of EST CONTIGS and assembled EST data for user specific keyword queries. The current version of PlantGI contains a total of 34,000 EST CONTIGS data for rice (8488 records), wheat (8560 records), maize (4570 records), soybean (3726 records), barley (3417 records), Chinese cabbage (3602 records), tomato (1236 records), hot pepper (998 records), mushroom (130 records) and Arabidopsis (8 records). Availability The database is available for free at http://www.niab.go.kr/nabic/. PMID:18685722

  11. Transcriptome characterization and polymorphism detection between subspecies of big sagebrush (Artemisia tridentata)

    PubMed Central

    2011-01-01

    Background Big sagebrush (Artemisia tridentata) is one of the most widely distributed and ecologically important shrub species in western North America. This species serves as a critical habitat and food resource for many animals and invertebrates. Habitat loss due to a combination of disturbances followed by establishment of invasive plant species is a serious threat to big sagebrush ecosystem sustainability. Lack of genomic data has limited our understanding of the evolutionary history and ecological adaptation in this species. Here, we report on the sequencing of expressed sequence tags (ESTs) and detection of single nucleotide polymorphism (SNP) and simple sequence repeat (SSR) markers in subspecies of big sagebrush. Results cDNA of A. tridentata sspp. tridentata and vaseyana were normalized and sequenced using the 454 GS FLX Titanium pyrosequencing technology. Assembly of the reads resulted in 20,357 contig consensus sequences in ssp. tridentata and 20,250 contigs in ssp. vaseyana. A BLASTx search against the non-redundant (NR) protein database using 29,541 consensus sequences obtained from a combined assembly resulted in 21,436 sequences with significant blast alignments (≤ 1e-15). A total of 20,952 SNPs and 119 polymorphic SSRs were detected between the two subspecies. SNPs were validated through various methods including sequence capture. Validation of SNPs in different individuals uncovered a high level of nucleotide variation in EST sequences. EST sequences of a third, tetraploid subspecies (ssp. wyomingensis) obtained by Illumina sequencing were mapped to the consensus sequences of the combined 454 EST assembly. Approximately one-third of the SNPs between sspp. tridentata and vaseyana identified in the combined assembly were also polymorphic within the two geographically distant ssp. wyomingensis samples. Conclusion We have produced a large EST dataset for Artemisia tridentata, which contains a large sample of the big sagebrush leaf transcriptome. SNP mapping among the three subspecies suggest the origin of ssp. wyomingensis via mixed ancestry. A large number of SNP and SSR markers provide the foundation for future research to address questions in big sagebrush evolution, ecological genetics, and conservation using genomic approaches. PMID:21767398

  12. Analysis of 10,000 ESTs from lymphocytes of the cynomolgus monkey to improve our understanding of its immune system

    PubMed Central

    Chen, Wei-Hua; Wang, Xue-Xia; Lin, Wei; He, Xiao-Wei; Wu, Zhen-Qiang; Lin, Ying; Hu, Song-Nian; Wang, Xiao-Ning

    2006-01-01

    Background The cynomolgus monkey (Macaca fascicularis) is one of the most widely used surrogate animal models for an increasing number of human diseases and vaccines, especially immune-system-related ones. Towards a better understanding of the gene expression background upon its immunogenetics, we constructed a cDNA library from Epstein-Barr virus (EBV)-transformed B lymphocytes of a cynomolgus monkey and sequenced 10,000 randomly picked clones. Results After processing, 8,312 high-quality expressed sequence tags (ESTs) were generated and assembled into 3,728 unigenes. Annotations of these uniquely expressed transcripts demonstrated that out of the 2,524 open reading frame (ORF) positive unigenes (mitochondrial and ribosomal sequences were not included), 98.8% shared significant similarities (E-value less than 1e-10) with the NCBI nucleotide (nt) database, while only 67.7% (E-value less than 1e-5) did so with the NCBI non-redundant protein (nr) database. Further analysis revealed that 90.0% of the unigenes that shared no similarities to the nr database could be assigned to human chromosomes, in which 75 did not match significantly to any cynomolgus monkey and human ESTs. The mapping regions to known human genes on the human genome were described in detail. The protein family and domain analysis revealed that the first, second and fourth of the most abundantly expressed protein families were all assigned to immunoglobulin and major histocompatibility complex (MHC)-related proteins. The expression profiles of these genes were compared with that of homologous genes in human blood, lymph nodes and a RAMOS cell line, which demonstrated expression changes after transformation with EBV. The degree of sequence similarity of the MHC class I and II genes to the human reference sequences was evaluated. The results indicated that class I molecules showed weak amino acid identities (<90%), while class II showed slightly higher ones. Conclusion These results indicated that the genes expressed in the cynomolgus monkey could be used to identify novel protein-coding genes and revise those incomplete or incorrect annotations in the human genome by comparative methods, since the old world monkeys and humans share high similarities at the molecular level, especially within coding regions. The identification of multiple genes involved in the immune response, their sequence variations to the human homologues, and their responses to EBV infection could provide useful information to improve our understanding of the cynomolgus monkey immune system. PMID:16618371

  13. Wearable Tracking Tags Test Privacy Boundaries at the U. of Washington

    ERIC Educational Resources Information Center

    Dotinga, Randy

    2008-01-01

    Tags such as the radio-frequency identifications or RFIDs are devices that make it possible for individuals to be tracked and their location reported back to a database. The devices--chips with radio antennas--emit signals, and tracking them reveals the movement of people or things. Many stores use the technology to catch shoplifters at exits. To…

  14. A high-resolution whole genome radiation hybrid map of human chromosome 17q22-q25.3 across the genes for GH and TK

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Foster, J.W.; Schafer, A.J.; Critcher, R.

    1996-04-15

    We have constructed a whole genome radiation hybrid (WG-RH) map across a region of human chromosome 17q, from growth hormone (GH) to thymidine kinase (TK). A panel of 128 WG-RH hybrid cell lines generated by X-irradiation and fusion has been tested for the retention of 39 sequence-tagged site (STS) markers by the polymerase chain reaction. This genome mapping technique has allowed the integration of existing VNTR and microsatellite markers with additional new markers and existing STS markers previously mapped to this region by other means. The WG-RH map includes eight expressed sequence tag (EST) and three anonymous markers developed formore » this study, together with 23 anonymous microsatellites and five existing ESTs. Analysis of these data resulted in a high-density comprehensive map across this region of the genome. A subset of these markers has been used to produce a framework map consisting of 20 loci ordered with odds greater than 1000:1. The markers are of sufficient density to build a YAC contig across this region based on marker content. We have developed sequence tags for both ends of a 2.1-Mb YAC and mapped these using the WG-RH panel, allowing a direct comparison of cRay{sub 6000} to physical distance. 31 refs., 3 figs., 2 tabs.« less

  15. Mass spectrometry-based protein identification by integrating de novo sequencing with database searching.

    PubMed

    Wang, Penghao; Wilson, Susan R

    2013-01-01

    Mass spectrometry-based protein identification is a very challenging task. The main identification approaches include de novo sequencing and database searching. Both approaches have shortcomings, so an integrative approach has been developed. The integrative approach firstly infers partial peptide sequences, known as tags, directly from tandem spectra through de novo sequencing, and then puts these sequences into a database search to see if a close peptide match can be found. However the current implementation of this integrative approach has several limitations. Firstly, simplistic de novo sequencing is applied and only very short sequence tags are used. Secondly, most integrative methods apply an algorithm similar to BLAST to search for exact sequence matches and do not accommodate sequence errors well. Thirdly, by applying these methods the integrated de novo sequencing makes a limited contribution to the scoring model which is still largely based on database searching. We have developed a new integrative protein identification method which can integrate de novo sequencing more efficiently into database searching. Evaluated on large real datasets, our method outperforms popular identification methods.

  16. Visual feature extraction and establishment of visual tags in the intelligent visual internet of things

    NASA Astrophysics Data System (ADS)

    Zhao, Yiqun; Wang, Zhihui

    2015-12-01

    The Internet of things (IOT) is a kind of intelligent networks which can be used to locate, track, identify and supervise people and objects. One of important core technologies of intelligent visual internet of things ( IVIOT) is the intelligent visual tag system. In this paper, a research is done into visual feature extraction and establishment of visual tags of the human face based on ORL face database. Firstly, we use the principal component analysis (PCA) algorithm for face feature extraction, then adopt the support vector machine (SVM) for classifying and face recognition, finally establish a visual tag for face which is already classified. We conducted a experiment focused on a group of people face images, the result show that the proposed algorithm have good performance, and can show the visual tag of objects conveniently.

  17. A SSR-based genetic linkage map of cultivated peanut (Arachis hypogaea L.)

    USDA-ARS?s Scientific Manuscript database

    The objective of this study was to construct a molecular linkage map of cultivated tetraploid peanut using simple sequence repeat (SSR) markers derived primarily from peanut genomic sequences, expressed sequence tags (ESTs), and by "data mining" sequences released in GenBank. Three recombinant inbre...

  18. Gene expression profiling of the plant pathogenic basidiomycetous fungus Rhizoctonia solani AG 4 reveals putative virulence factors

    USDA-ARS?s Scientific Manuscript database

    Rhizoctonia solani is a ubiquitous basidiomycetous soilborne fungal pathogen causing damping off of seedlings, aerial blights and postharvest diseases. To gain insight into the molecular mechanisms of pathogenesis a global approach based on analysis of expressed sequence tags (ESTs) was undertaken. ...

  19. Characterization of transcriptome dynamics during watermelon fruit development: sequencing, assembly, annotation and gene expression profiles

    PubMed Central

    2011-01-01

    Background Cultivated watermelon [Citrullus lanatus (Thunb.) Matsum. & Nakai var. lanatus] is an important agriculture crop world-wide. The fruit of watermelon undergoes distinct stages of development with dramatic changes in its size, color, sweetness, texture and aroma. In order to better understand the genetic and molecular basis of these changes and significantly expand the watermelon transcript catalog, we have selected four critical stages of watermelon fruit development and used Roche/454 next-generation sequencing technology to generate a large expressed sequence tag (EST) dataset and a comprehensive transcriptome profile for watermelon fruit flesh tissues. Results We performed half Roche/454 GS-FLX run for each of the four watermelon fruit developmental stages (immature white, white-pink flesh, red flesh and over-ripe) and obtained 577,023 high quality ESTs with an average length of 302.8 bp. De novo assembly of these ESTs together with 11,786 watermelon ESTs collected from GenBank produced 75,068 unigenes with a total length of approximately 31.8 Mb. Overall 54.9% of the unigenes showed significant similarities to known sequences in GenBank non-redundant (nr) protein database and around two-thirds of them matched proteins of cucumber, the most closely-related species with a sequenced genome. The unigenes were further assigned with gene ontology (GO) terms and mapped to biochemical pathways. More than 5,000 SSRs were identified from the EST collection. Furthermore we carried out digital gene expression analysis of these ESTs and identified 3,023 genes that were differentially expressed during watermelon fruit development and ripening, which provided novel insights into watermelon fruit biology and a comprehensive resource of candidate genes for future functional analysis. We then generated profiles of several interesting metabolites that are important to fruit quality including pigmentation and sweetness. Integrative analysis of metabolite and digital gene expression profiles helped elucidating molecular mechanisms governing these important quality-related traits during watermelon fruit development. Conclusion We have generated a large collection of watermelon ESTs, which represents a significant expansion of the current transcript catalog of watermelon and a valuable resource for future studies on the genomics of watermelon and other closely-related species. Digital expression analysis of this EST collection allowed us to identify a large set of genes that were differentially expressed during watermelon fruit development and ripening, which provide a rich source of candidates for future functional analysis and represent a valuable increase in our knowledge base of watermelon fruit biology. PMID:21936920

  20. Characterization of transcriptome dynamics during watermelon fruit development: sequencing, assembly, annotation and gene expression profiles.

    PubMed

    Guo, Shaogui; Liu, Jingan; Zheng, Yi; Huang, Mingyun; Zhang, Haiying; Gong, Guoyi; He, Hongju; Ren, Yi; Zhong, Silin; Fei, Zhangjun; Xu, Yong

    2011-09-21

    Cultivated watermelon [Citrullus lanatus (Thunb.) Matsum. & Nakai var. lanatus] is an important agriculture crop world-wide. The fruit of watermelon undergoes distinct stages of development with dramatic changes in its size, color, sweetness, texture and aroma. In order to better understand the genetic and molecular basis of these changes and significantly expand the watermelon transcript catalog, we have selected four critical stages of watermelon fruit development and used Roche/454 next-generation sequencing technology to generate a large expressed sequence tag (EST) dataset and a comprehensive transcriptome profile for watermelon fruit flesh tissues. We performed half Roche/454 GS-FLX run for each of the four watermelon fruit developmental stages (immature white, white-pink flesh, red flesh and over-ripe) and obtained 577,023 high quality ESTs with an average length of 302.8 bp. De novo assembly of these ESTs together with 11,786 watermelon ESTs collected from GenBank produced 75,068 unigenes with a total length of approximately 31.8 Mb. Overall 54.9% of the unigenes showed significant similarities to known sequences in GenBank non-redundant (nr) protein database and around two-thirds of them matched proteins of cucumber, the most closely-related species with a sequenced genome. The unigenes were further assigned with gene ontology (GO) terms and mapped to biochemical pathways. More than 5,000 SSRs were identified from the EST collection. Furthermore we carried out digital gene expression analysis of these ESTs and identified 3,023 genes that were differentially expressed during watermelon fruit development and ripening, which provided novel insights into watermelon fruit biology and a comprehensive resource of candidate genes for future functional analysis. We then generated profiles of several interesting metabolites that are important to fruit quality including pigmentation and sweetness. Integrative analysis of metabolite and digital gene expression profiles helped elucidating molecular mechanisms governing these important quality-related traits during watermelon fruit development. We have generated a large collection of watermelon ESTs, which represents a significant expansion of the current transcript catalog of watermelon and a valuable resource for future studies on the genomics of watermelon and other closely-related species. Digital expression analysis of this EST collection allowed us to identify a large set of genes that were differentially expressed during watermelon fruit development and ripening, which provide a rich source of candidates for future functional analysis and represent a valuable increase in our knowledge base of watermelon fruit biology.

  1. An integrated PCR colony hybridization approach to screen cDNA libraries for full-length coding sequences.

    PubMed

    Pollier, Jacob; González-Guzmán, Miguel; Ardiles-Diaz, Wilson; Geelen, Danny; Goossens, Alain

    2011-01-01

    cDNA-Amplified Fragment Length Polymorphism (cDNA-AFLP) is a commonly used technique for genome-wide expression analysis that does not require prior sequence knowledge. Typically, quantitative expression data and sequence information are obtained for a large number of differentially expressed gene tags. However, most of the gene tags do not correspond to full-length (FL) coding sequences, which is a prerequisite for subsequent functional analysis. A medium-throughput screening strategy, based on integration of polymerase chain reaction (PCR) and colony hybridization, was developed that allows in parallel screening of a cDNA library for FL clones corresponding to incomplete cDNAs. The method was applied to screen for the FL open reading frames of a selection of 163 cDNA-AFLP tags from three different medicinal plants, leading to the identification of 109 (67%) FL clones. Furthermore, the protocol allows for the use of multiple probes in a single hybridization event, thus significantly increasing the throughput when screening for rare transcripts. The presented strategy offers an efficient method for the conversion of incomplete expressed sequence tags (ESTs), such as cDNA-AFLP tags, to FL-coding sequences.

  2. A SAGE based approach to human glomerular endothelium: defining the transcriptome, finding a novel molecule and highlighting endothelial diversity.

    PubMed

    Sengoelge, Guerkan; Winnicki, Wolfgang; Kupczok, Anne; von Haeseler, Arndt; Schuster, Michael; Pfaller, Walter; Jennings, Paul; Weltermann, Ansgar; Blake, Sophia; Sunder-Plassmann, Gere

    2014-08-27

    Large scale transcript analysis of human glomerular microvascular endothelial cells (HGMEC) has never been accomplished. We designed this study to define the transcriptome of HGMEC and facilitate a better characterization of these endothelial cells with unique features. Serial analysis of gene expression (SAGE) was used for its unbiased approach to quantitative acquisition of transcripts. We generated a HGMEC SAGE library consisting of 68,987 transcript tags. Then taking advantage of large public databases and advanced bioinformatics we compared the HGMEC SAGE library with a SAGE library of non-cultured ex vivo human glomeruli (44,334 tags) which contained endothelial cells. The 823 tags common to both which would have the potential to be expressed in vivo were subsequently checked against 822,008 tags from 16 non-glomerular endothelial SAGE libraries. This resulted in 268 transcript tags differentially overexpressed in HGMEC compared to non-glomerular endothelia. These tags were filtered using a set of criteria: never before shown in kidney or any type of endothelial cell, absent in all nephron regions except the glomerulus, more highly expressed than statistically expected in HGMEC. Neurogranin, a direct target of thyroid hormone action which had been thought to be brain specific and never shown in endothelial cells before, fulfilled these criteria. Its expression in glomerular endothelium in vitro and in vivo was then verified by real-time-PCR, sequencing and immunohistochemistry. Our results represent an extensive molecular characterization of HGMEC beyond a mere database, underline the endothelial heterogeneity, and propose neurogranin as a potential link in the kidney-thyroid axis.

  3. The rehydration transcriptome of the desiccation-tolerant bryophyte Tortula ruralis: transcript classification and analysis

    PubMed Central

    Oliver, Melvin J; Dowd, Scot E; Zaragoza, Joaquin; Mauget, Steven A; Payton, Paxton R

    2004-01-01

    Background The cellular response of plants to water-deficits has both economic and evolutionary importance directly affecting plant productivity in agriculture and plant survival in the natural environment. Genes induced by water-deficit stress have been successfully enumerated in plants that are relatively sensitive to cellular dehydration, however we have little knowledge as to the adaptive role of these genes in establishing tolerance to water loss at the cellular level. Our approach to address this problem has been to investigate the genetic responses of plants that are capable of tolerating extremes of dehydration, in particular the desiccation-tolerant bryophyte, Tortula ruralis. To establish a sound basis for characterizing the Tortula genome in regards to desiccation tolerance, we analyzed 10,368 expressed sequence tags (ESTs) from rehydrated rapid-dried Tortula gametophytes, a stage previously determined to exhibit the maximum stress induced change in gene expression. Results The 10, 368 ESTs formed 5,563 EST clusters (contig groups representing individual genes) of which 3,321 (59.7%) exhibited similarity to genes present in the public databases and 2,242 were categorized as unknowns based on protein homology scores. The 3,321 clusters were classified by function using the Gene Ontology (GO) hierarchy and the KEGG database. The results indicate that the transcriptome contains a diverse population of transcripts that reflects, as expected, a period of metabolic upheaval in the gametophyte cells. Much of the emphasis within the transcriptome is centered on the protein synthetic machinery, ion and metabolite transport, and membrane biosynthesis and repair. Rehydrating gametophytes also have an abundance of transcripts that code for enzymes involved in oxidative stress metabolism and phosphorylating activities. The functional classifications reflect a remarkable consistency with what we have previously established with regards to the metabolic activities that are important in the recovery of the gametophytes from desiccation. A comparison of the GO distribution of Tortula clusters with an identical analysis of 9,981 clusters from the desiccation sensitive bryophyte species Physcomitrella patens, revealed, and accentuated, the differences between stressed and unstressed transcriptomes. Cross species sequence comparisons indicated that on the whole the Tortula clusters were more closely related to those from Physcomitrella than Arabidopsis (complete genome BLASTx comparison) although because of the differences in the databases there were more high scoring matches to the Arabidopsis sequences. The most abundant transcripts contained within the Tortula ESTs encode Late Embryogenesis Abundant (LEA) proteins that are normally associated with drying plant tissues. This suggests that LEAs may also play a role in recovery from desiccation when water is reintroduced into a dried tissue. Conclusion The establishment of a rehydration EST collection for Tortula ruralis, an important plant model for plant stress responses and vegetative desiccation tolerance, is an important step in understanding the genome level response to cellular dehydration. The type of transcript analysis performed here has laid the foundation for more detailed functional and genome level analyses of the genes involved in desiccation tolerance in plants. PMID:15546486

  4. Introduction to the DISRUPT postprandial database: subjects, studies and methodologies.

    PubMed

    Jackson, Kim G; Clarke, Dave T; Murray, Peter; Lovegrove, Julie A; O'Malley, Brendan; Minihane, Anne M; Williams, Christine M

    2010-03-01

    Dysregulation of lipid and glucose metabolism in the postprandial state are recognised as important risk factors for the development of cardiovascular disease and type 2 diabetes. Our objective was to create a comprehensive, standardised database of postprandial studies to provide insights into the physiological factors that influence postprandial lipid and glucose responses. Data were collated from subjects (n = 467) taking part in single and sequential meal postprandial studies conducted by researchers at the University of Reading, to form the DISRUPT (DIetary Studies: Reading Unilever Postprandial Trials) database. Subject attributes including age, gender, genotype, menopausal status, body mass index, blood pressure and a fasting biochemical profile, together with postprandial measurements of triacylglycerol (TAG), non-esterified fatty acids, glucose, insulin and TAG-rich lipoprotein composition are recorded. A particular strength of the studies is the frequency of blood sampling, with on average 10-13 blood samples taken during each postprandial assessment, and the fact that identical test meal protocols were used in a number of studies, allowing pooling of data to increase statistical power. The DISRUPT database is the most comprehensive postprandial metabolism database that exists worldwide and preliminary analysis of the pooled sequential meal postprandial dataset has revealed both confirmatory and novel observations with respect to the impact of gender and age on the postprandial TAG response. Further analysis of the dataset using conventional statistical techniques along with integrated mathematical models and clustering analysis will provide a unique opportunity to greatly expand current knowledge of the aetiology of inter-individual variability in postprandial lipid and glucose responses.

  5. Tri-Clustered Tensor Completion for Social-Aware Image Tag Refinement.

    PubMed

    Tang, Jinhui; Shu, Xiangbo; Qi, Guo-Jun; Li, Zechao; Wang, Meng; Yan, Shuicheng; Jain, Ramesh

    2017-08-01

    Social image tag refinement, which aims to improve tag quality by automatically completing the missing tags and rectifying the noise-corrupted ones, is an essential component for social image search. Conventional approaches mainly focus on exploring the visual and tag information, without considering the user information, which often reveals important hints on the (in)correct tags of social images. Towards this end, we propose a novel tri-clustered tensor completion framework to collaboratively explore these three kinds of information to improve the performance of social image tag refinement. Specifically, the inter-relations among users, images and tags are modeled by a tensor, and the intra-relations between users, images and tags are explored by three regularizations respectively. To address the challenges of the super-sparse and large-scale tensor factorization that demands expensive computing and memory cost, we propose a novel tri-clustering method to divide the tensor into a certain number of sub-tensors by simultaneously clustering users, images and tags into a bunch of tri-clusters. And then we investigate two strategies to complete these sub-tensors by considering (in)dependence between the sub-tensors. Experimental results on a real-world social image database demonstrate the superiority of the proposed method compared with the state-of-the-art methods.

  6. Large-scale identification of odorant-binding proteins and chemosensory proteins from expressed sequence tags in insects

    PubMed Central

    2009-01-01

    Background Insect odorant binding proteins (OBPs) and chemosensory proteins (CSPs) play an important role in chemical communication of insects. Gene discovery of these proteins is a time-consuming task. In recent years, expressed sequence tags (ESTs) of many insect species have accumulated, thus providing a useful resource for gene discovery. Results We have developed a computational pipeline to identify OBP and CSP genes from insect ESTs. In total, 752,841 insect ESTs were examined from 54 species covering eight Orders of Insecta. From these ESTs, 142 OBPs and 177 CSPs were identified, of which 117 OBPs and 129 CSPs are new. The complete open reading frames (ORFs) of 88 OBPs and 123 CSPs were obtained by electronic elongation. We randomly chose 26 OBPs from eight species of insects, and 21 CSPs from four species for RT-PCR validation. Twenty two OBPs and 16 CSPs were confirmed by RT-PCR, proving the efficiency and reliability of the algorithm. Together with all family members obtained from the NCBI (OBPs) or the UniProtKB (CSPs), 850 OBPs and 237 CSPs were analyzed for their structural characteristics and evolutionary relationship. Conclusions A large number of new OBPs and CSPs were found, providing the basis for deeper understanding of these proteins. In addition, the conserved motif and evolutionary analysis provide some new insights into the evolution of insect OBPs and CSPs. Motif pattern fine-tune the functions of OBPs and CSPs, leading to the minor difference in binding sex pheromone or plant volatiles in different insect Orders. PMID:20034407

  7. PineElm_SSRdb: a microsatellite marker database identified from genomic, chloroplast, mitochondrial and EST sequences of pineapple (Ananas comosus (L.) Merrill).

    PubMed

    Chaudhary, Sakshi; Mishra, Bharat Kumar; Vivek, Thiruvettai; Magadum, Santoshkumar; Yasin, Jeshima Khan

    2016-01-01

    Simple Sequence Repeats or microsatellites are resourceful molecular genetic markers. There are only few reports of SSR identification and development in pineapple. Complete genome sequence of pineapple available in the public domain can be used to develop numerous novel SSRs. Therefore, an attempt was made to identify SSRs from genomic, chloroplast, mitochondrial and EST sequences of pineapple which will help in deciphering genetic makeup of its germplasm resources. A total of 359511 SSRs were identified in pineapple (356385 from genome sequence, 45 from chloroplast sequence, 249 in mitochondrial sequence and 2832 from EST sequences). The list of EST-SSR markers and their details are available in the database. PineElm_SSRdb is an open source database available for non-commercial academic purpose at http://app.bioelm.com/ with a mapping tool which can develop circular maps of selected marker set. This database will be of immense use to breeders, researchers and graduates working on Ananas spp. and to others working on cross-species transferability of markers, investigating diversity, mapping and DNA fingerprinting.

  8. Analysis of expressed sequence tags generated from full-length enriched cDNA libraries of melon

    PubMed Central

    2011-01-01

    Background Melon (Cucumis melo), an economically important vegetable crop, belongs to the Cucurbitaceae family which includes several other important crops such as watermelon, cucumber, and pumpkin. It has served as a model system for sex determination and vascular biology studies. However, genomic resources currently available for melon are limited. Result We constructed eleven full-length enriched and four standard cDNA libraries from fruits, flowers, leaves, roots, cotyledons, and calluses of four different melon genotypes, and generated 71,577 and 22,179 ESTs from full-length enriched and standard cDNA libraries, respectively. These ESTs, together with ~35,000 ESTs available in public domains, were assembled into 24,444 unigenes, which were extensively annotated by comparing their sequences to different protein and functional domain databases, assigning them Gene Ontology (GO) terms, and mapping them onto metabolic pathways. Comparative analysis of melon unigenes and other plant genomes revealed that 75% to 85% of melon unigenes had homologs in other dicot plants, while approximately 70% had homologs in monocot plants. The analysis also identified 6,972 gene families that were conserved across dicot and monocot plants, and 181, 1,192, and 220 gene families specific to fleshy fruit-bearing plants, the Cucurbitaceae family, and melon, respectively. Digital expression analysis identified a total of 175 tissue-specific genes, which provides a valuable gene sequence resource for future genomics and functional studies. Furthermore, we identified 4,068 simple sequence repeats (SSRs) and 3,073 single nucleotide polymorphisms (SNPs) in the melon EST collection. Finally, we obtained a total of 1,382 melon full-length transcripts through the analysis of full-length enriched cDNA clones that were sequenced from both ends. Analysis of these full-length transcripts indicated that sizes of melon 5' and 3' UTRs were similar to those of tomato, but longer than many other dicot plants. Codon usages of melon full-length transcripts were largely similar to those of Arabidopsis coding sequences. Conclusion The collection of melon ESTs generated from full-length enriched and standard cDNA libraries is expected to play significant roles in annotating the melon genome. The ESTs and associated analysis results will be useful resources for gene discovery, functional analysis, marker-assisted breeding of melon and closely related species, comparative genomic studies and for gaining insights into gene expression patterns. PMID:21599934

  9. Improving data management and dissemination in web based information systems by semantic enrichment of descriptive data aspects

    NASA Astrophysics Data System (ADS)

    Gebhardt, Steffen; Wehrmann, Thilo; Klinger, Verena; Schettler, Ingo; Huth, Juliane; Künzer, Claudia; Dech, Stefan

    2010-10-01

    The German-Vietnamese water-related information system for the Mekong Delta (WISDOM) project supports business processes in Integrated Water Resources Management in Vietnam. Multiple disciplines bring together earth and ground based observation themes, such as environmental monitoring, water management, demographics, economy, information technology, and infrastructural systems. This paper introduces the components of the web-based WISDOM system including data, logic and presentation tier. It focuses on the data models upon which the database management system is built, including techniques for tagging or linking metadata with the stored information. The model also uses ordered groupings of spatial, thematic and temporal reference objects to semantically tag datasets to enable fast data retrieval, such as finding all data in a specific administrative unit belonging to a specific theme. A spatial database extension is employed by the PostgreSQL database. This object-oriented database was chosen over a relational database to tag spatial objects to tabular data, improving the retrieval of census and observational data at regional, provincial, and local areas. While the spatial database hinders processing raster data, a "work-around" was built into WISDOM to permit efficient management of both raster and vector data. The data model also incorporates styling aspects of the spatial datasets through styled layer descriptions (SLD) and web mapping service (WMS) layer specifications, allowing retrieval of rendered maps. Metadata elements of the spatial data are based on the ISO19115 standard. XML structured information of the SLD and metadata are stored in an XML database. The data models and the data management system are robust for managing the large quantity of spatial objects, sensor observations, census and document data. The operational WISDOM information system prototype contains modules for data management, automatic data integration, and web services for data retrieval, analysis, and distribution. The graphical user interfaces facilitate metadata cataloguing, data warehousing, web sensor data analysis and thematic mapping.

  10. A dehydration-inducible gene in the truffle Tuber borchii identifies a novel group of dehydrins

    PubMed Central

    Abba', Simona; Ghignone, Stefano; Bonfante, Paola

    2006-01-01

    Background The expressed sequence tag M6G10 was originally isolated from a screening for differentially expressed transcripts during the reproductive stage of the white truffle Tuber borchii. mRNA levels for M6G10 increased dramatically during fruiting body maturation compared to the vegetative mycelial stage. Results Bioinformatics tools, phylogenetic analysis and expression studies were used to support the hypothesis that this sequence, named TbDHN1, is the first dehydrin (DHN)-like coding gene isolated in fungi. Homologs of this gene, all defined as "coding for hypothetical proteins" in public databases, were exclusively found in ascomycetous fungi and in plants. Although complete (or almost complete) fungal genomes and EST collections of some Basidiomycota and Glomeromycota are already available, DHN-like proteins appear to be represented only in Ascomycota. A new and previously uncharacterized conserved signature pattern was identified and proposed to Uniprot database as the main distinguishing feature of this new group of DHNs. Expression studies provide experimental evidence of a transcript induction of TbDHN1 during cellular dehydration. Conclusion Expression pattern and sequence similarities to known plant DHNs indicate that TbDHN1 is the first characterized DHN-like protein in fungi. The high similarity of TbDHN1 with homolog coding sequences implies the existence of a novel fungal/plant group of LEA Class II proteins characterized by a previously undescribed signature pattern. PMID:16512918

  11. Extracting tag hierarchies.

    PubMed

    Tibély, Gergely; Pollner, Péter; Vicsek, Tamás; Palla, Gergely

    2013-01-01

    Tagging items with descriptive annotations or keywords is a very natural way to compress and highlight information about the properties of the given entity. Over the years several methods have been proposed for extracting a hierarchy between the tags for systems with a "flat", egalitarian organization of the tags, which is very common when the tags correspond to free words given by numerous independent people. Here we present a complete framework for automated tag hierarchy extraction based on tag occurrence statistics. Along with proposing new algorithms, we are also introducing different quality measures enabling the detailed comparison of competing approaches from different aspects. Furthermore, we set up a synthetic, computer generated benchmark providing a versatile tool for testing, with a couple of tunable parameters capable of generating a wide range of test beds. Beside the computer generated input we also use real data in our studies, including a biological example with a pre-defined hierarchy between the tags. The encouraging similarity between the pre-defined and reconstructed hierarchy, as well as the seemingly meaningful hierarchies obtained for other real systems indicate that tag hierarchy extraction is a very promising direction for further research with a great potential for practical applications. Tags have become very prevalent nowadays in various online platforms ranging from blogs through scientific publications to protein databases. Furthermore, tagging systems dedicated for voluntary tagging of photos, films, books, etc. with free words are also becoming popular. The emerging large collections of tags associated with different objects are often referred to as folksonomies, highlighting their collaborative origin and the "flat" organization of the tags opposed to traditional hierarchical categorization. Adding a tag hierarchy corresponding to a given folksonomy can very effectively help narrowing or broadening the scope of search. Moreover, recommendation systems could also benefit from a tag hierarchy.

  12. Extracting Tag Hierarchies

    PubMed Central

    Tibély, Gergely; Pollner, Péter; Vicsek, Tamás; Palla, Gergely

    2013-01-01

    Tagging items with descriptive annotations or keywords is a very natural way to compress and highlight information about the properties of the given entity. Over the years several methods have been proposed for extracting a hierarchy between the tags for systems with a "flat", egalitarian organization of the tags, which is very common when the tags correspond to free words given by numerous independent people. Here we present a complete framework for automated tag hierarchy extraction based on tag occurrence statistics. Along with proposing new algorithms, we are also introducing different quality measures enabling the detailed comparison of competing approaches from different aspects. Furthermore, we set up a synthetic, computer generated benchmark providing a versatile tool for testing, with a couple of tunable parameters capable of generating a wide range of test beds. Beside the computer generated input we also use real data in our studies, including a biological example with a pre-defined hierarchy between the tags. The encouraging similarity between the pre-defined and reconstructed hierarchy, as well as the seemingly meaningful hierarchies obtained for other real systems indicate that tag hierarchy extraction is a very promising direction for further research with a great potential for practical applications. Tags have become very prevalent nowadays in various online platforms ranging from blogs through scientific publications to protein databases. Furthermore, tagging systems dedicated for voluntary tagging of photos, films, books, etc. with free words are also becoming popular. The emerging large collections of tags associated with different objects are often referred to as folksonomies, highlighting their collaborative origin and the “flat” organization of the tags opposed to traditional hierarchical categorization. Adding a tag hierarchy corresponding to a given folksonomy can very effectively help narrowing or broadening the scope of search. Moreover, recommendation systems could also benefit from a tag hierarchy. PMID:24391901

  13. Hermann Muller and Mutations in Drosophila

    Science.gov Websites

    &D Nuggets Database dropdown arrow Search Tag Cloud Browse Reports Database Help Finding Aids Studies on Drosophila; DOE Technical Report; 1958 The Influence of Radiation in Altering the Incidence of September 15, 1960 to September 14, 1961; DOE Technical Report; 1960 The Influence of Radiation in Altering

  14. Identification and characterization of expressed resistance gene analogs (RGSs) from peanut (Arachis hypogaea L.) expressed sequence tags (ESTs)

    USDA-ARS?s Scientific Manuscript database

    Cultivated peanut (Arachis hypogaea L.) is an important food legume grown worldwide for providing edible oil and protein. However, due to scarcity of genetic diversity, peanut is very vulnerable to a variety of pathogens, such as rust (Puccinia arachidis Speg.), early leaf spot (Cercospora arachidic...

  15. Analysis and functional classification of transcripts from the nematode Meloidogyne incognita

    PubMed Central

    McCarter, James P; Dautova Mitreva, Makedonka; Martin, John; Dante, Mike; Wylie, Todd; Rao, Uma; Pape, Deana; Bowers, Yvette; Theising, Brenda; Murphy, Claire V; Kloek, Andrew P; Chiapelli, Brandi J; Clifton, Sandra W; Bird, David Mck; Waterston, Robert H

    2003-01-01

    Background Plant parasitic nematodes are major pathogens of most crops. Molecular characterization of these species as well as the development of new techniques for control can benefit from genomic approaches. As an entrée to characterizing plant parasitic nematode genomes, we analyzed 5,700 expressed sequence tags (ESTs) from second-stage larvae (L2) of the root-knot nematode Meloidogyne incognita. Results From these, 1,625 EST clusters were formed and classified by function using the Gene Ontology (GO) hierarchy and the Kyoto KEGG database. L2 larvae, which represent the infective stage of the life cycle before plant invasion, express a diverse array of ligand-binding proteins and abundant cytoskeletal proteins. L2 are structurally similar to Caenorhabditis elegans dauer larva and the presence of transcripts encoding glyoxylate pathway enzymes in the M. incognita clusters suggests that root-knot nematode larvae metabolize lipid stores while in search of a host. Homology to other species was observed in 79% of translated cluster sequences, with the C. elegans genome providing more information than any other source. In addition to identifying putative nematode-specific and Tylenchida-specific genes, sequencing revealed previously uncharacterized horizontal gene transfer candidates in Meloidogyne with high identity to rhizobacterial genes including homologs of nodL acetyltransferase and novel cellulases. Conclusions With sequencing from plant parasitic nematodes accelerating, the approaches to transcript characterization described here can be applied to more extensive datasets and also provide a foundation for more complex genome analyses. PMID:12702207

  16. A "shotgun" method for tracing the birth locations of sheep from flock tags, applied to scrapie surveillance in Great Britain.

    PubMed

    Birch, Colin P D; Del Rio Vilas, Victor J; Chikukwa, Ambrose C

    2010-09-01

    Movement records are often used to identify animal sample provenance by retracing the movements of individuals. Here we present an alternative method, which uses the same identity tags and movement records as are used to retrace movements, but ignores individual movement paths. The first step uses a simple query to identify the most likely birth holding for every identity tag included in a database recording departures from agricultural holdings. The second step rejects a proportion of the birth holding locations to leave a list of birth holding locations that are relatively reliable. The method was used to trace the birth locations of sheep sampled for scrapie in abattoirs, or on farm as fallen stock. Over 82% of the sheep sampled in the fallen stock survey died at the holding of birth. This lack of movement may be an important constraint on scrapie transmission. These static sheep provided relatively reliable birth locations, which were used to define criteria for selecting reliable traces. The criteria rejected 16.8% of fallen stock traces and 11.9% of abattoir survey traces. Two tests provided estimates that selection reduced error in fallen stock traces from 11.3% to 3.2%, and in abattoir survey traces from 8.1% to 1.8%. This method generated 14,591 accepted traces of fallen stock from samples taken during 2002-2005 and 83,136 accepted traces from abattoir samples. The absence or ambiguity of flock tag records at the time of slaughter prevented the tracing of 16-24% of abattoir samples during 2002-2004, although flock tag records improved in 2005. The use of internal scoring to generate and evaluate results from the database query, and the confirmation of results by comparison with other database fields, are analogous to methods used in web search engines. Such methods may have wide application in tracing samples and in adding value to biological datasets. Crown Copyright 2010. Published by Elsevier B.V. All rights reserved.

  17. An integrated genetic and physical map of the autosomal recessive polycystic kidney disease region

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lens, X.M.; Onuchic, L.F.; Daoust, M.

    1997-05-01

    Autosomal recessive polycystic kidney disease is one of the most common hereditary renal cystic diseases in children. Genetic studies have recently assigned the only known locus for this disorder, PKHD1, to chromosome 6p21-p12. We have generated a YAC contig that spans {approximately}5 cM of this region, defined by the markers D6S1253-D6S295, and have mapped 43 sequence-tagged sites (STS) within this interval. This set includes 20 novel STSs, which define 12 unique positions in the region, and three ESTs. A minimal set of two YACs spans the segment D6S465-D6S466, which contains PKHD1, and estimates of their sizes based on information inmore » public databases suggest that the size of the critical region is <3.1 Mb. Twenty-eight STSs map to this interval, giving an average STS density of <1/150 kb. These resources will be useful for establishing a complete trancription map of the PKHD1 region. 10 refs., 1 fig., 1 tab.« less

  18. Transcriptome analysis of eyestalk and hemocytes in the ridgetail white prawn Exopalaemon carinicauda: assembly, annotation and marker discovery.

    PubMed

    Li, Jitao; Li, Jian; Chen, Ping; Liu, Ping; He, Yuying

    2015-01-01

    The ridgetail white prawn Exopalaemon carinicauda is one of major economic mariculture species in eastern China. The deficiency of genomic and transcriptomic data is becoming the bottleneck of further researches on its good traits. In the present study, 454 pyrosequencing was undertaken to investigate the transcriptome profiles of E. carinicauda. A collection of 1,028,710 sequence reads (459.59 Mb) obtained from cDNA prepared from eyestalk and hemocytes was assembled into 162,056 expressed sequence tags (ESTs). Of these, 29.88 % of 48,428 contigs and 70.12 % of 113,628 singlets possessed high similarities to sequences in the GenBank non-redundant database, with most significant (E value <1e(-10)) unigenes matches occurring with crustacean and insect sequences. KEGG analysis of unigenes identified putative members of biological pathways related to growth and immunity. In addition, we obtained a total of putative 125,112 SNPs and 13,467 microsatellites. These results will contribute to the understanding of the genome makeup and provide useful information for future functional genomic research in E. carinicauda.

  19. Focused Logistics, Joint Vision 2010: A Joint Logistics Roadmap

    DTIC Science & Technology

    2010-01-01

    AIS). AIT devices include bar codes for individual items, optical memory cards for multipacks and containers, radio frequency tags for containers and...Fortezza Card and Firewall technologies are being developed to prevent unau- thorized access. As for infrastructure, DISA has already made significant in...radio frequency tags and optical memory cards , to continuously update the JTAV database. By September 1998, DSS will be deployed in all wholesale

  20. Integrated Management and Visualization of Electronic Tag Data with Tagbase

    PubMed Central

    Lam, Chi Hin; Tsontos, Vardis M.

    2011-01-01

    Electronic tags have been used widely for more than a decade in studies of diverse marine species. However, despite significant investment in tagging programs and hardware, data management aspects have received insufficient attention, leaving researchers without a comprehensive toolset to manage their data easily. The growing volume of these data holdings, the large diversity of tag types and data formats, and the general lack of data management resources are not only complicating integration and synthesis of electronic tagging data in support of resource management applications but potentially threatening the integrity and longer-term access to these valuable datasets. To address this critical gap, Tagbase has been developed as a well-rounded, yet accessible data management solution for electronic tagging applications. It is based on a unified relational model that accommodates a suite of manufacturer tag data formats in addition to deployment metadata and reprocessed geopositions. Tagbase includes an integrated set of tools for importing tag datasets into the system effortlessly, and provides reporting utilities to interactively view standard outputs in graphical and tabular form. Data from the system can also be easily exported or dynamically coupled to GIS and other analysis packages. Tagbase is scalable and has been ported to a range of database management systems to support the needs of the tagging community, from individual investigators to large scale tagging programs. Tagbase represents a mature initiative with users at several institutions involved in marine electronic tagging research. PMID:21750734

  1. Characterization of EST-derived and non-EST simple sequence repeats in an F₁ hybrid population of Vitis vinifera L.

    PubMed

    Kayesh, E; Bilkish, N; Liu, G S; Chen, W; Leng, X P; Fang, J G

    2014-03-31

    Among different classes of molecular markers, expressed sequence tags (ESTs) are a new resource for developing simple sequence repeat (SSR) functional markers for genotyping and genetic mapping in F1 hybrid populations of Vitis vinifera L. Recently, because of the availability of an enormous amount of data for ESTs in the public domain, the emphasis has shifted from genomic SSRs to EST-SSRs, which belong to transcribed regions of the genome and may have a role in gene expression or function. The objective of this study was to assess the polymorphisms among 94 F1 hybrids from "Early Rose" and "Red Globe" using 25 EST-derived and 25 non-EST SSR markers. A total collection of 362,375 grape ESTs that were retrieved from the National Center for Biotechnology Information (NCBI) and 2522 EST-SSR sequences were identified. From them, 205 primer pairs were randomly selected, including 176 pairs that were EST-derived and 29 non-EST SSR primer pairs, for polymerase chain reaction amplification. A total of 131 alleles were amplified using 50 pairs of primers; 78 alleles were amplified using EST-derived SSR primers and 53 were from non-EST SSR primers. At most, 6 and 5 alleles were amplified by EST-derived and non-EST SSR primers, respectively. The EST-derived SSR markers showed a maximum polymorphic information content (PIC) value of 1 and a minimum of 0.33 while non-EST SSR markers had maximum and minimum PIC values of 1 and 0.25, respectively. The average PIC value was 0.56 for EST-derived SSR markers and 0.45 for non-EST SSR markers.

  2. DOE Research and Development Accomplishments Website Policies/Important

    Science.gov Websites

    Links RSS Archive Videos XML DOE R&D Accomplishments DOE R&D Accomplishments searchQuery × Find searchQuery x Find DOE R&D Acccomplishments Navigation dropdown arrow The Basics Stories Snapshots R&D Nuggets Database dropdown arrow Search Tag Cloud Browse Reports Database Help

  3. Tomato Expression Database (TED): a suite of data presentation and analysis tools

    PubMed Central

    Fei, Zhangjun; Tang, Xuemei; Alba, Rob; Giovannoni, James

    2006-01-01

    The Tomato Expression Database (TED) includes three integrated components. The Tomato Microarray Data Warehouse serves as a central repository for raw gene expression data derived from the public tomato cDNA microarray. In addition to expression data, TED stores experimental design and array information in compliance with the MIAME guidelines and provides web interfaces for researchers to retrieve data for their own analysis and use. The Tomato Microarray Expression Database contains normalized and processed microarray data for ten time points with nine pair-wise comparisons during fruit development and ripening in a normal tomato variety and nearly isogenic single gene mutants impacting fruit development and ripening. Finally, the Tomato Digital Expression Database contains raw and normalized digital expression (EST abundance) data derived from analysis of the complete public tomato EST collection containing >150 000 ESTs derived from 27 different non-normalized EST libraries. This last component also includes tools for the comparison of tomato and Arabidopsis digital expression data. A set of query interfaces and analysis, and visualization tools have been developed and incorporated into TED, which aid users in identifying and deciphering biologically important information from our datasets. TED can be accessed at . PMID:16381976

  4. Tomato Expression Database (TED): a suite of data presentation and analysis tools.

    PubMed

    Fei, Zhangjun; Tang, Xuemei; Alba, Rob; Giovannoni, James

    2006-01-01

    The Tomato Expression Database (TED) includes three integrated components. The Tomato Microarray Data Warehouse serves as a central repository for raw gene expression data derived from the public tomato cDNA microarray. In addition to expression data, TED stores experimental design and array information in compliance with the MIAME guidelines and provides web interfaces for researchers to retrieve data for their own analysis and use. The Tomato Microarray Expression Database contains normalized and processed microarray data for ten time points with nine pair-wise comparisons during fruit development and ripening in a normal tomato variety and nearly isogenic single gene mutants impacting fruit development and ripening. Finally, the Tomato Digital Expression Database contains raw and normalized digital expression (EST abundance) data derived from analysis of the complete public tomato EST collection containing >150,000 ESTs derived from 27 different non-normalized EST libraries. This last component also includes tools for the comparison of tomato and Arabidopsis digital expression data. A set of query interfaces and analysis, and visualization tools have been developed and incorporated into TED, which aid users in identifying and deciphering biologically important information from our datasets. TED can be accessed at http://ted.bti.cornell.edu.

  5. Generation and Analysis of Expressed Sequence Tags (ESTs) from Halophyte Atriplex canescens to Explore Salt-Responsive Related Genes

    PubMed Central

    Li, Jingtao; Sun, Xinhua; Yu, Gang; Jia, Chengguo; Liu, Jinliang; Pan, Hongyu

    2014-01-01

    Little information is available on gene expression profiling of halophyte A. canescens. To elucidate the molecular mechanism for stress tolerance in A. canescens, a full-length complementary DNA library was generated from A. canescens exposed to 400 mM NaCl, and provided 343 high-quality ESTs. In an evaluation of 343 valid EST sequences in the cDNA library, 197 unigenes were assembled, among which 190 unigenes (83.1% ESTs) were identified according to their significant similarities with proteins of known functions. All the 343 EST sequences have been deposited in the dbEST GenBank under accession numbers JZ535802 to JZ536144. According to Arabidopsis MIPS functional category and GO classifications, we identified 193 unigenes of the 311 annotations EST, representing 72 non-redundant unigenes sharing similarities with genes related to the defense response. The sets of ESTs obtained provide a rich genetic resource and 17 up-regulated genes related to salt stress resistance were identified by qRT-PCR. Six of these genes may contribute crucially to earlier and later stage salt stress resistance. Additionally, among the 343 unigenes sequences, 22 simple sequence repeats (SSRs) were also identified contributing to the study of A. canescens resources. PMID:24960361

  6. An expressed sequence tag (EST) library for Drosophila serrata, a model system for sexual selection and climatic adaptation studies.

    PubMed

    Frentiu, Francesca D; Adamski, Marcin; McGraw, Elizabeth A; Blows, Mark W; Chenoweth, Stephen F

    2009-01-21

    The native Australian fly Drosophila serrata belongs to the highly speciose montium subgroup of the melanogaster species group. It has recently emerged as an excellent model system with which to address a number of important questions, including the evolution of traits under sexual selection and traits involved in climatic adaptation along latitudinal gradients. Understanding the molecular genetic basis of such traits has been limited by a lack of genomic resources for this species. Here, we present the first expressed sequence tag (EST) collection for D. serrata that will enable the identification of genes underlying sexually-selected phenotypes and physiological responses to environmental change and may help resolve controversial phylogenetic relationships within the montium subgroup. A normalized cDNA library was constructed from whole fly bodies at several developmental stages, including larvae and adults. Assembly of 11,616 clones sequenced from the 3' end allowed us to identify 6,607 unique contigs, of which at least 90% encoded peptides. Partial transcripts were discovered from a variety of genes of evolutionary interest by BLASTing contigs against the 12 Drosophila genomes currently sequenced. By incorporating into the cDNA library multiple individuals from populations spanning a large portion of the geographical range of D. serrata, we were able to identify 11,057 putative single nucleotide polymorphisms (SNPs), with 278 different contigs having at least one "double hit" SNP that is highly likely to be a real polymorphism. At least 394 EST-associated microsatellite markers, representing 355 different contigs, were also found, providing an additional set of genetic markers. The assembled EST library is available online at http://www.chenowethlab.org/serrata/index.cgi. We have provided the first gene collection and largest set of polymorphic genetic markers, to date, for the fly D. serrata. The EST collection will provide much needed genomic resources for this model species and facilitate comparative evolutionary studies within the montium subgroup of the D. melanogaster lineage.

  7. Characterization of expressed resistance gene analogs (RGAs) from peanut expressed sequence tags (ESTs)

    USDA-ARS?s Scientific Manuscript database

    Cultivated peanut (Arachis hypogaea L.) is one of the most important food legume crops grown worldwide, and is a major source for edible oil and protein. However, due to low genetic variation, peanut is very vulnerable to a variety of pathogens, such as early leaf spot, late leaf spot, rust and Toma...

  8. Characterization of EST-based SSR loci in the spruce budworm, Choristoneura fumiferana (Lepidoptera: Tortricidae)

    Treesearch

    B.M.T. Brunet; D. Doucet; B.R. Sturtevant; F.A.H. Sperling

    2013-01-01

    After identifying 114 microsatellite loci from Choristoneura fumiferana expressed sequence tags, 87 loci were assayed in a panel of 11 wild-caught individuals, giving 29 polymorphic loci. Further analysis of 20 of these loci on 31 individuals collected from a single population in northern Minnesota identified 14 in Hardy-Weinberg equilibrium.

  9. Association analysis of bacterial leaf spot resistance and SNP markers derived from expressed sequence tags (ESTs) in lettuce (Lactuca sativa L.)

    USDA-ARS?s Scientific Manuscript database

    Bacterial leaf spot of lettuce, caused by Xanthomonas campestris pv. vitians, is a devastating disease of lettuce worldwide. Since there are no chemicals available for effective control of the disease, host-plant resistance is highly desirable to protect lettuce production. A total of 179 lettuce ge...

  10. Genome Comparisons Reveal a Dominant Mechanism of Chromosome Number Reduction in Grasses and Accelerated Genome Evolution in Triticeae

    USDA-ARS?s Scientific Manuscript database

    Single nucleotide polymorphism was employed in the construction of a high-resolution, expressed sequence tag (EST) map of Aegilops tauschii, the diploid source of the wheat D genome. Comparison of the map with the rice and sorghum genome sequences revealed 50 inversions and translocations; 2, 8, and...

  11. Comparative expression profiling in grape (Vitis vinifera) berries derived from frequency analysis of ESTs and MPSS signatures.

    PubMed

    Iandolino, Alberto; Nobuta, Kan; da Silva, Francisco Goes; Cook, Douglas R; Meyers, Blake C

    2008-05-12

    Vitis vinifera (V. vinifera) is the primary grape species cultivated for wine production, with an industry valued annually in the billions of dollars worldwide. In order to sustain and increase grape production, it is necessary to understand the genetic makeup of grape species. Here we performed mRNA profiling using Massively Parallel Signature Sequencing (MPSS) and combined it with available Expressed Sequence Tag (EST) data. These tag-based technologies, which do not require a priori knowledge of genomic sequence, are well-suited for transcriptional profiling. The sequence depth of MPSS allowed us to capture and quantify almost all the transcripts at a specific stage in the development of the grape berry. The number and relative abundance of transcripts from stage II grape berries was defined using Massively Parallel Signature Sequencing (MPSS). A total of 2,635,293 17-base and 2,259,286 20-base signatures were obtained, representing at least 30,737 and 26,878 distinct sequences. The average normalized abundance per signature was approximately 49 TPM (Transcripts Per Million). Comparisons of the MPSS signatures with available Vitis species' ESTs and a unigene set demonstrated that 6,430 distinct contigs and 2,190 singletons have a perfect match to at least one MPSS signature. Among the matched sequences, ESTs were identified from tissues other than berries or from berries at different developmental stages. Additional MPSS signatures not matching to known grape ESTs can extend our knowledge of the V. vinifera transcriptome, particularly when these data are used to assist in annotation of whole genome sequences from Vitis vinifera. The MPSS data presented here not only achieved a higher level of saturation than previous EST based analyses, but in doing so, expand the known set of transcripts of grape berries during the unique stage in development that immediately precedes the onset of ripening. The MPSS dataset also revealed evidence of antisense expression not previously reported in grapes but comparable to that reported in other plant species. Finally, we developed a novel web-based, public resource for utilization of the grape MPSS data [1].

  12. High polymorphism in Est-SSR loci for cellulose synthase and β-amylase of sugarcane varieties (Saccharum spp.) used by the industrial sector for ethanol production.

    PubMed

    Augusto, Raphael; Maranho, Rone Charles; Mangolin, Claudete Aparecida; Pires da Silva Machado, Maria de Fátima

    2015-01-01

    High and low polymorphisms in simple sequence repeats of expressed sequence tag (EST-SSR) for specific proteins and enzymes, such as β-amylase, cellulose synthase, xyloglucan endotransglucosylase, fructose 1,6-bisphosphate aldolase, and fructose 1,6-bisphosphatase, were used to illustrate the genetic divergence within and between varieties of sugarcane (Saccharum spp.) and to guide the technological paths to optimize ethanol production from lignocellulose biomass. The varieties RB72454, RB867515, RB92579, and SP813250 on the second stage of cutting, all grown in the state of Paraná (PR), and the varieties RB92579 and SP813250 cultured in the PR state and in Northeastern Brazil, state of Pernambuco (PE), were analyzed using five EST-SSR primers for EstC66, EstC67, EstC68, EstC69, and EstC91 loci. Genetic divergence was evident in the EstC67 and EstC69 loci for β-amylase and cellulose synthase, respectively, among the four sugarcane varieties. An extremely high level of genetic differentiation was also detected in the EstC67 locus from the RB82579 and SP813250 varieties cultured in the PR and PE states. High polymorphism in SSR of the cellulose synthase locus may explain the high variability of substrates used in pretreatment and enzymatic hydrolysis processes, which has been an obstacle to effective industrial adaptations.

  13. Petaminer: Using ROOT for efficient data storage in MySQL database

    NASA Astrophysics Data System (ADS)

    Cranshaw, J.; Malon, D.; Vaniachine, A.; Fine, V.; Lauret, J.; Hamill, P.

    2010-04-01

    High Energy and Nuclear Physics (HENP) experiments store Petabytes of event data and Terabytes of calibration data in ROOT files. The Petaminer project is developing a custom MySQL storage engine to enable the MySQL query processor to directly access experimental data stored in ROOT files. Our project is addressing the problem of efficient navigation to PetaBytes of HENP experimental data described with event-level TAG metadata, which is required by data intensive physics communities such as the LHC and RHIC experiments. Physicists need to be able to compose a metadata query and rapidly retrieve the set of matching events, where improved efficiency will facilitate the discovery process by permitting rapid iterations of data evaluation and retrieval. Our custom MySQL storage engine enables the MySQL query processor to directly access TAG data stored in ROOT TTrees. As ROOT TTrees are column-oriented, reading them directly provides improved performance over traditional row-oriented TAG databases. Leveraging the flexible and powerful SQL query language to access data stored in ROOT TTrees, the Petaminer approach enables rich MySQL index-building capabilities for further performance optimization.

  14. Development of simple sequence repeat markers and diversity analysis in alfalfa (Medicago sativa L.).

    PubMed

    Wang, Zan; Yan, Hongwei; Fu, Xinnian; Li, Xuehui; Gao, Hongwen

    2013-04-01

    Efficient and robust molecular markers are essential for molecular breeding in plant. Compared to dominant and bi-allelic markers, multiple alleles of simple sequence repeat (SSR) markers are particularly informative and superior in genetic linkage map and QTL mapping in autotetraploid species like alfalfa. The objective of this study was to enrich SSR markers directly from alfalfa expressed sequence tags (ESTs). A total of 12,371 alfalfa ESTs were retrieved from the National Center for Biotechnology Information. Total 774 SSR-containing ESTs were identified from 716 ESTs. On average, one SSR was found per 7.7 kb of EST sequences. Tri-nucleotide repeats (48.8 %) was the most abundant motif type, followed by di-(26.1 %), tetra-(11.5 %), penta-(9.7 %), and hexanucleotide (3.9 %). One hundred EST-SSR primer pairs were successfully designed and 29 exhibited polymorphism among 28 alfalfa accessions. The allele number per marker ranged from two to 21 with an average of 6.8. The PIC values ranged from 0.195 to 0.896 with an average of 0.608, indicating a high level of polymorphism of the EST-SSR markers. Based on the 29 EST-SSR markers, assessment of genetic diversity was conducted and found that Medicago sativa ssp. sativa was clearly different from the other subspecies. The high transferability of those EST-SSR markers was also found for relative species.

  15. Development and use of EST-SSR markers for assessing genetic diversity in the brown planthopper (Nilaparvata lugens Stål).

    PubMed

    Jing, S; Liu, B; Peng, L; Peng, X; Zhu, L; Fu, Q; He, G

    2012-02-01

    To assess genetic diversity in populations of the brown planthopper (Nilaparvata lugens Stål) (Homoptera: Delphacidae), we have developed and applied microsatellite, or simple sequence repeat (SSR), markers from expressed sequence tags (ESTs). We found that the brown planthopper clusters of ESTs were rich in SSRs with unique frequencies and distributions of SSR motifs. Three hundred and fifty-one EST-SSR markers were developed and yielded clear bands from samples of four brown planthopper populations. High cross-species transferability of these markers was detected in the closely related planthopper N. muiri. The newly developed EST-SSR markers provided sufficient resolution to distinguish within and among biotypes. Analyses based on SSR data revealed host resistance-based genetic differentiation among different brown planthopper populations; the genetic diversity of populations feeding on susceptible rice varieties was lower than that of populations feeding on resistant rice varieties. This is the first large-scale development of brown planthopper SSR markers, which will be useful for future molecular genetics and genomics studies of this serious agricultural pest.

  16. Characterization and comparison of EST-SSR and TRAP markers for genetic analysis of the Japanese persimmon Diospyros kaki.

    PubMed

    Luo, C; Zhang, F; Zhang, Q L; Guo, D Y; Luo, Z R

    2013-01-09

    We developed and characterized expressed sequence tags (ESTs)-simple sequence repeats (SSRs) and targeted region amplified polymorphism (TRAP) markers to examine genetic relationships in the persimmon genus Diospyros gene pool. In total, we characterized 14 EST-SSR primer pairs and 36 TRAP primer combinations, which were amplified across 20 germplasms of 4 species in the genus Diospyros. We used various genetic parameters, including effective multiplex ratio (EMR), diversity index (DI), and marker index (MI), to test the utility of these markers. TRAP markers gave higher EMR (24.85) but lower DI (0.33), compared to EST-SSRs (EMR = 3.65, DI = 0.34). TRAP gave a very high MI (8.08), which was about 8 times than the MI of EST-SSR (1.25). These markers were utilized for phylogenetic inference of 20 genotypes of Diospyros kaki Thunb. and allied species, with a result that all kaki genotypes clustered closely and 3 allied species formed an independent group. These markers could be further exploited for large-scale genetic relationship inference.

  17. A blackberry (Rubus L.) expressed sequence tag library for the development of simple sequence repeat markers

    PubMed Central

    Lewers, Kim S; Saski, Chris A; Cuthbertson, Brandon J; Henry, David C; Staton, Meg E; Main, Dorrie S; Dhanaraj, Anik L; Rowland, Lisa J; Tomkins, Jeff P

    2008-01-01

    Background The recent development of novel repeat-fruiting types of blackberry (Rubus L.) cultivars, combined with a long history of morphological marker-assisted selection for thornlessness by blackberry breeders, has given rise to increased interest in using molecular markers to facilitate blackberry breeding. Yet no genetic maps, molecular markers, or even sequences exist specifically for cultivated blackberry. The purpose of this study is to begin development of these tools by generating and annotating the first blackberry expressed sequence tag (EST) library, designing primers from the ESTs to amplify regions containing simple sequence repeats (SSR), and testing the usefulness of a subset of the EST-SSRs with two blackberry cultivars. Results A cDNA library of 18,432 clones was generated from expanding leaf tissue of the cultivar Merton Thornless, a progenitor of many thornless commercial cultivars. Among the most abundantly expressed of the 3,000 genes annotated were those involved with energy, cell structure, and defense. From individual sequences containing SSRs, 673 primer pairs were designed. Of a randomly chosen set of 33 primer pairs tested with two blackberry cultivars, 10 detected an average of 1.9 polymorphic PCR products. Conclusion This rate predicts that this library may yield as many as 940 SSR primer pairs detecting 1,786 polymorphisms. This may be sufficient to generate a genetic map that can be used to associate molecular markers with phenotypic traits, making possible molecular marker-assisted breeding to compliment existing morphological marker-assisted breeding in blackberry. PMID:18570660

  18. Comparative Analysis of Expressed Genes from Cacao Meristems Infected by Moniliophthora perniciosa

    PubMed Central

    Gesteira, Abelmon S.; Micheli, Fabienne; Carels, Nicolas; Da Silva, Aline C.; Gramacho, Karina P.; Schuster, Ivan; Macêdo, Joci N.; Pereira, Gonçalo A. G.; Cascardo, Júlio C. M.

    2007-01-01

    Background and Aims Witches' broom disease is caused by the hemibiotrophic basidiomycete Moniliophthora perniciosa, and is one of the most important diseases of cacao in the western hemisphere. Because very little is known about the global process of such disease development, expressed sequence tags (ESTs) were used to identify genes expressed during the Theobroma cacao–Moniliophthora perniciosa interaction. Methods Two cDNA libraries corresponding to the resistant (RT) and susceptible (SP) cacao–M. perniciosa interactions were constructed from total RNA, using the DB SMART Creator cDNA library kit (Clontech). Clones were randomly selected, sequenced from the 5′ end and analysed using bioinformatics tools including in silico analysis of the differential gene expression. Key Results A total of 6884 ESTs were generated from the RT and SP cDNA libraries. These ESTs were composed of 2585 singlets and 341 contigs for a total of 2926 non-redundant sequences. The redundancy of the libraries was low and their specificity high when compared with the few other cacao libraries already published. Sequence analysis allowed the assignment of a putative functional category for 54 % of sequences, whereas approx. 22 % of sequences corresponded to unknown function and approx. 24 % of sequences did not show any significant similarity with other proteins present in the database. Despite the similar overall distribution of the sequences in functional categories between the two libraries, qualitative differences were observed. Genes involved during the defence response to pathogen infection or in programmed cell death were identified, such as pathogenesis related-proteins, trypsin inhibitor or oxalate oxidase, and some of them showed an in silico differential expression between the resistant and the susceptible interactions. Conclusions As far as is known this is the first EST resource from the cacao–M. perniciosa interaction and it is believed that it will provide a significant contribution to the understanding of the molecular mechanisms of the resistance and susceptibility of cacao to M. perniciosa, to develop strategies to control witches broom, and as a source of polymorphism for molecular marker development and marker-assisted selection. PMID:17557832

  19. Histoplasma capsulatum proteome response to decreased iron availability

    PubMed Central

    Winters, Michael S; Spellman, Daniel S; Chan, Qilin; Gomez, Francisco J; Hernandez, Margarita; Catron, Brittany; Smulian, Alan G; Neubert, Thomas A; Deepe, George S

    2008-01-01

    Background A fundamental pathogenic feature of the fungus Histoplasma capsulatum is its ability to evade innate and adaptive immune defenses. Once ingested by macrophages the organism is faced with several hostile environmental conditions including iron limitation. H. capsulatum can establish a persistent state within the macrophage. A gap in knowledge exists because the identities and number of proteins regulated by the organism under host conditions has yet to be defined. Lack of such knowledge is an important problem because until these proteins are identified it is unlikely that they can be targeted as new and innovative treatment for histoplasmosis. Results To investigate the proteomic response by H. capsulatum to decreasing iron availability we have created H. capsulatum protein/genomic databases compatible with current mass spectrometric (MS) search engines. Databases were assembled from the H. capsulatum G217B strain genome using gene prediction programs and expressed sequence tag (EST) libraries. Searching these databases with MS data generated from two dimensional (2D) in-gel digestions of proteins resulted in over 50% more proteins identified compared to searching the publicly available fungal databases alone. Using 2D gel electrophoresis combined with statistical analysis we discovered 42 H. capsulatum proteins whose abundance was significantly modulated when iron concentrations were lowered. Altered proteins were identified by mass spectrometry and database searching to be involved in glycolysis, the tricarboxylic acid cycle, lysine metabolism, protein synthesis, and one protein sequence whose function was unknown. Conclusion We have created a bioinformatics platform for H. capsulatum and demonstrated the utility of a proteomic approach by identifying a shift in metabolism the organism utilizes to cope with the hostile conditions provided by the host. We have shown that enzyme transcripts regulated by other fungal pathogens in response to lowering iron availability are also regulated in H. capsulatum at the protein level. We also identified H. capsulatum proteins sensitive to iron level reductions which have yet to be connected to iron availability in other pathogens. These data also indicate the complexity of the response by H. capsulatum to nutritional deprivation. Finally, we demonstrate the importance of a strain specific gene/protein database for H. capsulatum proteomic analysis. PMID:19108728

  20. An ovary transcriptome for all maturational stages of the striped bass (Morone saxatilis), a highly advanced perciform fish.

    PubMed

    Reading, Benjamin J; Chapman, Robert W; Schaff, Jennifer E; Scholl, Elizabeth H; Opperman, Charles H; Sullivan, Craig V

    2012-02-21

    The striped bass and its relatives (genus Morone) are important fisheries and aquaculture species native to estuaries and rivers of the Atlantic coast and Gulf of Mexico in North America. To open avenues of gene expression research on reproduction and breeding of striped bass, we generated a collection of expressed sequence tags (ESTs) from a complementary DNA (cDNA) library representative of their ovarian transcriptome. Sequences of a total of 230,151 ESTs (51,259,448 bp) were acquired by Roche 454 pyrosequencing of cDNA pooled from ovarian tissues obtained at all stages of oocyte growth, at ovulation (eggs), and during preovulatory atresia. Quality filtering of ESTs allowed assembly of 11,208 high-quality contigs ≥ 100 bp, including 2,984 contigs 500 bp or longer (average length 895 bp). Blastx comparisons revealed 5,482 gene orthologues (E-value < 10-3), of which 4,120 (36.7% of total contigs) were annotated with Gene Ontology terms (E-value < 10-6). There were 5,726 remaining unknown unique sequences (51.1% of total contigs). All of the high-quality EST sequences are available in the National Center for Biotechnology Information (NCBI) Short Read Archive (GenBank: SRX007394). Informative contigs were considered to be abundant if they were assembled from groups of ESTs comprising ≥ 0.15% of the total short read sequences (≥ 345 reads/contig). Approximately 52.5% of these abundant contigs were predicted to have predominant ovary expression through digital differential display in silico comparisons to zebrafish (Danio rerio) UniGene orthologues. Over 1,300 Gene Ontology terms from Biological Process classes of Reproduction, Reproductive process, and Developmental process were assigned to this collection of annotated contigs. This first large reference sequence database available for the ecologically and economically important temperate basses (genus Morone) provides a foundation for gene expression studies in these species. The predicted predominance of ovary gene expression and assignment of directly relevant Gene Ontology classes suggests a powerful utility of this dataset for analysis of ovarian gene expression related to fundamental questions of oogenesis. Additionally, a high definition Agilent 60-mer oligo ovary 'UniClone' microarray with 8 × 15,000 probe format has been designed based on this striped bass transcriptome (eArray Group: Striper Group, Design ID: 029004).

  1. Cloning and characterization of a basic Cysteine-like protease (Cathespsin L1) expressed in the gut of larval Diaprepes abbreviatus L. (Coleoptera: Curculionidae)

    USDA-ARS?s Scientific Manuscript database

    Diaprepes abbreviatus is an important pest that causes extensive damage to citrus in the USA. Analysis of an expressed sequence tag (EST) library from the digestive tract of larvae and adult D. abbreviatus identified cathepsins as major putative digestive enzymes. One class, sharing amino acid seque...

  2. Micropreparative capillary gel electrophoresis of DNA: rapid expressed sequence tag library construction.

    PubMed

    Shi, Liang; Khandurina, Julia; Ronai, Zsolt; Li, Bi-Yu; Kwan, Wai King; Wang, Xun; Guttman, András

    2003-01-01

    A capillary gel electrophoresis based automated DNA fraction collection technique was developed to support a novel DNA fragment-pooling strategy for expressed sequence tag (EST) library construction. The cDNA population is first cleaved by BsaJ I and EcoR I restriction enzymes, and then subpooled by selective ligation with specific adapters followed by polymerase chain reaction (PCR) amplification and labeling. Combination of this cDNA fingerprinting method with high-resolution capillary gel electrophoresis separation and precise fractionation of individual cDNA transcript representatives avoids redundant fragment selection and concomitant repetitive sequencing of abundant transcripts. Using a computer-controlled capillary electrophoresis device the transcript representatives were separated by their size and fractions were automatically collected in every 30 s into 96-well plates. The high resolving power of the sieving matrix ensured sequencing grade separation of the DNA fragments (i.e., single-base resolution) and successful fraction collection. Performance and precision of the fraction collection procedure was validated by PCR amplification of the collected DNA fragments followed by capillary electrophoresis analysis for size and purity verification. The collected and PCR-amplified transcript representatives, ranging up to several hundred base pairs, were then sequenced to create an EST library.

  3. Construction of cDNA library and preliminary analysis of expressed sequence tags from Siberian tiger

    PubMed Central

    Liu, Chang-Qing; Lu, Tao-Feng; Feng, Bao-Gang; Liu, Dan; Guan, Wei-Jun; Ma, Yue-Hui

    2010-01-01

    In this study we successfully constructed a full-length cDNA library from Siberian tiger, Panthera tigris altaica, the most well-known wild Animal. Total RNA was extracted from cultured Siberian tiger fibroblasts in vitro. The titers of primary and amplified libraries were 1.30×106 pfu/ml and 1.62×109 pfu/ml respectively. The proportion of recombinants from unamplified library was 90.5% and average length of exogenous inserts was 1.13 kb. A total of 282 individual ESTs with sizes ranging from 328 to 1,142bps were then analyzed the BLASTX score revealed that 53.9% of the sequences were classified as strong match, 38.6% as nominal and 7.4% as weak match. 28.0% of them were found to be related to enzyme/catalytic protein, 20.9% ESTs to metabolism, 13.1% ESTs to transport, 12.1% ESTs to signal transducer/cell communication, 9.9% ESTs to structure protein, 3.9% ESTs to immunity protein/defense metabolism, 3.2% ESTs to cell cycle, and 8.9 ESTs classified as novel genes. These results demonstrated that the reliability and representativeness of the cDNA library attained to the requirements of a standard cDNA library. This library provided a useful platform for the functional genomic research of Siberian tigers. PMID:20941376

  4. Database-independent Protein Sequencing (DiPS) Enables Full-length de Novo Protein and Antibody Sequence Determination.

    PubMed

    Savidor, Alon; Barzilay, Rotem; Elinger, Dalia; Yarden, Yosef; Lindzen, Moshit; Gabashvili, Alexandra; Adiv Tal, Ophir; Levin, Yishai

    2017-06-01

    Traditional "bottom-up" proteomic approaches use proteolytic digestion, LC-MS/MS, and database searching to elucidate peptide identities and their parent proteins. Protein sequences absent from the database cannot be identified, and even if present in the database, complete sequence coverage is rarely achieved even for the most abundant proteins in the sample. Thus, sequencing of unknown proteins such as antibodies or constituents of metaproteomes remains a challenging problem. To date, there is no available method for full-length protein sequencing, independent of a reference database, in high throughput. Here, we present Database-independent Protein Sequencing, a method for unambiguous, rapid, database-independent, full-length protein sequencing. The method is a novel combination of non-enzymatic, semi-random cleavage of the protein, LC-MS/MS analysis, peptide de novo sequencing, extraction of peptide tags, and their assembly into a consensus sequence using an algorithm named "Peptide Tag Assembler." As proof-of-concept, the method was applied to samples of three known proteins representing three size classes and to a previously un-sequenced, clinically relevant monoclonal antibody. Excluding leucine/isoleucine and glutamic acid/deamidated glutamine ambiguities, end-to-end full-length de novo sequencing was achieved with 99-100% accuracy for all benchmarking proteins and the antibody light chain. Accuracy of the sequenced antibody heavy chain, including the entire variable region, was also 100%, but there was a 23-residue gap in the constant region sequence. © 2017 by The American Society for Biochemistry and Molecular Biology, Inc.

  5. Construction and characterization of normalized cDNA libraries by 454 pyrosequencing and estimation of DNA methylation levels in three distantly related termite species.

    PubMed

    Hayashi, Yoshinobu; Shigenobu, Shuji; Watanabe, Dai; Toga, Kouhei; Saiki, Ryota; Shimada, Keisuke; Bourguignon, Thomas; Lo, Nathan; Hojo, Masaru; Maekawa, Kiyoto; Miura, Toru

    2013-01-01

    In termites, division of labor among castes, categories of individuals that perform specialized tasks, increases colony-level productivity and is the key to their ecological success. Although molecular studies on caste polymorphism have been performed in termites, we are far from a comprehensive understanding of the molecular basis of this phenomenon. To facilitate future molecular studies, we aimed to construct expressed sequence tag (EST) libraries covering wide ranges of gene repertoires in three representative termite species, Hodotermopsis sjostedti, Reticulitermes speratus and Nasutitermes takasagoensis. We generated normalized cDNA libraries from whole bodies, except for guts containing microbes, of almost all castes, sexes and developmental stages and sequenced them with the 454 GS FLX titanium system. We obtained >1.2 million quality-filtered reads yielding >400 million bases for each of the three species. Isotigs, which are analogous to individual transcripts, and singletons were produced by assembling the reads and annotated using public databases. Genes related to juvenile hormone, which plays crucial roles in caste differentiation of termites, were identified from the EST libraries by BLAST search. To explore the potential for DNA methylation, which plays an important role in caste differentiation of honeybees, tBLASTn searches for DNA methyltransferases (dnmt1, dnmt2 and dnmt3) and methyl-CpG binding domain (mbd) were performed against the EST libraries. All four of these genes were found in the H. sjostedti library, while all except dnmt3 were found in R. speratus and N. takasagoensis. The ratio of the observed to the expected CpG content (CpG O/E), which is a proxy for DNA methylation level, was calculated for the coding sequences predicted from the isotigs and singletons. In all of the three species, the majority of coding sequences showed depletion of CpG O/E (less than 1), and the distributions of CpG O/E were bimodal, suggesting the presence of DNA methylation.

  6. Construction and Characterization of Normalized cDNA Libraries by 454 Pyrosequencing and Estimation of DNA Methylation Levels in Three Distantly Related Termite Species

    PubMed Central

    Hayashi, Yoshinobu; Shigenobu, Shuji; Watanabe, Dai; Toga, Kouhei; Saiki, Ryota; Shimada, Keisuke; Bourguignon, Thomas; Lo, Nathan; Hojo, Masaru; Maekawa, Kiyoto; Miura, Toru

    2013-01-01

    In termites, division of labor among castes, categories of individuals that perform specialized tasks, increases colony-level productivity and is the key to their ecological success. Although molecular studies on caste polymorphism have been performed in termites, we are far from a comprehensive understanding of the molecular basis of this phenomenon. To facilitate future molecular studies, we aimed to construct expressed sequence tag (EST) libraries covering wide ranges of gene repertoires in three representative termite species, Hodotermopsis sjostedti , Reticulitermessperatus and Nasutitermestakasagoensis . We generated normalized cDNA libraries from whole bodies, except for guts containing microbes, of almost all castes, sexes and developmental stages and sequenced them with the 454 GS FLX titanium system. We obtained >1.2 million quality-filtered reads yielding >400 million bases for each of the three species. Isotigs, which are analogous to individual transcripts, and singletons were produced by assembling the reads and annotated using public databases. Genes related to juvenile hormone, which plays crucial roles in caste differentiation of termites, were identified from the EST libraries by BLAST search. To explore the potential for DNA methylation, which plays an important role in caste differentiation of honeybees, tBLASTn searches for DNA methyltransferases (dnmt1, dnmt2 and dnmt3) and methyl-CpG binding domain (mbd) were performed against the EST libraries. All four of these genes were found in the H . sjostedti library, while all except dnmt3 were found in R . speratus and N . takasagoensis . The ratio of the observed to the expected CpG content (CpG O/E), which is a proxy for DNA methylation level, was calculated for the coding sequences predicted from the isotigs and singletons. In all of the three species, the majority of coding sequences showed depletion of CpG O/E (less than 1), and the distributions of CpG O/E were bimodal, suggesting the presence of DNA methylation. PMID:24098800

  7. A new set of ESTs and cDNA clones from full-length and normalized libraries for gene discovery and functional characterization in citrus

    PubMed Central

    Marques, M Carmen; Alonso-Cantabrana, Hugo; Forment, Javier; Arribas, Raquel; Alamar, Santiago; Conejero, Vicente; Perez-Amador, Miguel A

    2009-01-01

    Background Interpretation of ever-increasing raw sequence information generated by modern genome sequencing technologies faces multiple challenges, such as gene function analysis and genome annotation. Indeed, nearly 40% of genes in plants encode proteins of unknown function. Functional characterization of these genes is one of the main challenges in modern biology. In this regard, the availability of full-length cDNA clones may fill in the gap created between sequence information and biological knowledge. Full-length cDNA clones facilitate functional analysis of the corresponding genes enabling manipulation of their expression in heterologous systems and the generation of a variety of tagged versions of the native protein. In addition, the development of full-length cDNA sequences has the power to improve the quality of genome annotation. Results We developed an integrated method to generate a new normalized EST collection enriched in full-length and rare transcripts of different citrus species from multiple tissues and developmental stages. We constructed a total of 15 cDNA libraries, from which we isolated 10,898 high-quality ESTs representing 6142 different genes. Percentages of redundancy and proportion of full-length clones range from 8 to 33, and 67 to 85, respectively, indicating good efficiency of the approach employed. The new EST collection adds 2113 new citrus ESTs, representing 1831 unigenes, to the collection of citrus genes available in the public databases. To facilitate functional analysis, cDNAs were introduced in a Gateway-based cloning vector for high-throughput functional analysis of genes in planta. Herein, we describe the technical methods used in the library construction, sequence analysis of clones and the overexpression of CitrSEP, a citrus homolog to the Arabidopsis SEP3 gene, in Arabidopsis as an example of a practical application of the engineered Gateway vector for functional analysis. Conclusion The new EST collection denotes an important step towards the identification of all genes in the citrus genome. Furthermore, public availability of the cDNA clones generated in this study, and not only their sequence, enables testing of the biological function of the genes represented in the collection. Expression of the citrus SEP3 homologue, CitrSEP, in Arabidopsis results in early flowering, along with other phenotypes resembling the over-expression of the Arabidopsis SEPALLATA genes. Our findings suggest that the members of the SEP gene family play similar roles in these quite distant plant species. PMID:19747386

  8. Identification of genes differentially expressed in Mikania micrantha during Cuscuta campestris infection by suppression subtractive hybridization.

    PubMed

    Li, Dong-Mei; Staehelin, Christian; Zhang, Yi-Shun; Peng, Shao-Lin

    2009-09-01

    The influence of Cuscuta campestris on its host Mikania micrantha has been studied with respect to biomass accumulation, physiology and ecology. Molecular events of this parasitic plant-plant interaction are poorly understood, however. In this study, we identified novel genes from M. micrantha induced by C. campestris infection. Genes expressed upon parasitization by C. campestris at early post-penetration stages were investigated by construction and characterization of subtracted cDNA libraries from shoots and stems of M. micrantha. Three hundred and three presumably up-regulated expressed sequence tags (ESTs) were identified and classified in functional categories, such as "metabolism", "cell defence and stress", "transcription factor", "signal transduction", "transportation" and "photosynthesis". In shoots and stems of infected M. micrantha, genes associated with defence responses and cell wall modifications were induced, confirming similar data from other parasitic plant-plant interactions. However, gene expression profiles in infected shoots and stems were found to be different. Compared to infected shoots, more genes induced in response to biotic and abiotic stress factors were identified in infected stems. Furthermore, database comparisons revealed a notable number of M. micrantha ESTs that matched genes with unknown function. Expression analysis by quantitative real-time RT-PCR of 21 genes (from different functional categories) showed significantly increased levels for 13 transcripts in response to C. campestris infection. In conclusion, this study provides an overview of genes from parasitized M. micrantha at early post-penetration stages. The acquired data form the basis for a molecular understanding of host reactions in response to parasitic plants.

  9. Sequencing analysis of 20,000 full-length cDNA clones from cassava reveals lineage specific expansions in gene families related to stress response

    PubMed Central

    Sakurai, Tetsuya; Plata, Germán; Rodríguez-Zapata, Fausto; Seki, Motoaki; Salcedo, Andrés; Toyoda, Atsushi; Ishiwata, Atsushi; Tohme, Joe; Sakaki, Yoshiyuki; Shinozaki, Kazuo; Ishitani, Manabu

    2007-01-01

    Background Cassava, an allotetraploid known for its remarkable tolerance to abiotic stresses is an important source of energy for humans and animals and a raw material for many industrial processes. A full-length cDNA library of cassava plants under normal, heat, drought, aluminum and post harvest physiological deterioration conditions was built; 19968 clones were sequence-characterized using expressed sequence tags (ESTs). Results The ESTs were assembled into 6355 contigs and 9026 singletons that were further grouped into 10577 scaffolds; we found 4621 new cassava sequences and 1521 sequences with no significant similarity to plant protein databases. Transcripts of 7796 distinct genes were captured and we were able to assign a functional classification to 78% of them while finding more than half of the enzymes annotated in metabolic pathways in Arabidopsis. The annotation of sequences that were not paired to transcripts of other species included many stress-related functional categories showing that our library is enriched with stress-induced genes. Finally, we detected 230 putative gene duplications that include key enzymes in reactive oxygen species signaling pathways and could play a role in cassava stress response features. Conclusion The cassava full-length cDNA library here presented contains transcripts of genes involved in stress response as well as genes important for different areas of cassava research. This library will be an important resource for gene discovery, characterization and cloning; in the near future it will aid the annotation of the cassava genome. PMID:18096061

  10. Expressed sequence tags from the oomycete fish pathogen Saprolegnia parasitica reveal putative virulence factors

    PubMed Central

    Torto-Alalibo, Trudy; Tian, Miaoying; Gajendran, Kamal; Waugh, Mark E; van West, Pieter; Kamoun, Sophien

    2005-01-01

    Background The oomycete Saprolegnia parasitica is one of the most economically important fish pathogens. There is a dramatic recrudescence of Saprolegnia infections in aquaculture since the use of the toxic organic dye malachite green was banned in 2002. Little is known about the molecular mechanisms underlying pathogenicity in S. parasitica and other animal pathogenic oomycetes. In this study we used a genomics approach to gain a first insight into the transcriptome of S. parasitica. Results We generated 1510 expressed sequence tags (ESTs) from a mycelial cDNA library of S. parasitica. A total of 1279 consensus sequences corresponding to 525944 base pairs were assembled. About half of the unigenes showed similarities to known protein sequences or motifs. The S. parasitica sequences tended to be relatively divergent from Phytophthora sequences. Based on the sequence alignments of 18 conserved proteins, the average amino acid identity between S. parasitica and three Phytophthora species was 77% compared to 93% within Phytophthora. Several S. parasitica cDNAs, such as those with similarity to fungal type I cellulose binding domain proteins, PAN/Apple module proteins, glycosyl hydrolases, proteases, as well as serine and cysteine protease inhibitors, were predicted to encode secreted proteins that could function in virulence. Some of these cDNAs were more similar to fungal proteins than to other eukaryotic proteins confirming that oomycetes and fungi share some virulence components despite their evolutionary distance Conclusion We provide a first glimpse into the gene content of S. parasitica, a reemerging oomycete fish pathogen. These resources will greatly accelerate research on this important pathogen. The data is available online through the Oomycete Genomics Database [1]. PMID:16076392

  11. SalmonDB: a bioinformatics resource for Salmo salar and Oncorhynchus mykiss

    PubMed Central

    Di Génova, Alex; Aravena, Andrés; Zapata, Luis; González, Mauricio; Maass, Alejandro; Iturra, Patricia

    2011-01-01

    SalmonDB is a new multiorganism database containing EST sequences from Salmo salar, Oncorhynchus mykiss and the whole genome sequence of Danio rerio, Gasterosteus aculeatus, Tetraodon nigroviridis, Oryzias latipes and Takifugu rubripes, built with core components from GMOD project, GOPArc system and the BioMart project. The information provided by this resource includes Gene Ontology terms, metabolic pathways, SNP prediction, CDS prediction, orthologs prediction, several precalculated BLAST searches and domains. It also provides a BLAST server for matching user-provided sequences to any of the databases and an advanced query tool (BioMart) that allows easy browsing of EST databases with user-defined criteria. These tools make SalmonDB database a valuable resource for researchers searching for transcripts and genomic information regarding S. salar and other salmonid species. The database is expected to grow in the near feature, particularly with the S. salar genome sequencing project. Database URL: http://genomicasalmones.dim.uchile.cl/ PMID:22120661

  12. SalmonDB: a bioinformatics resource for Salmo salar and Oncorhynchus mykiss.

    PubMed

    Di Génova, Alex; Aravena, Andrés; Zapata, Luis; González, Mauricio; Maass, Alejandro; Iturra, Patricia

    2011-01-01

    SalmonDB is a new multiorganism database containing EST sequences from Salmo salar, Oncorhynchus mykiss and the whole genome sequence of Danio rerio, Gasterosteus aculeatus, Tetraodon nigroviridis, Oryzias latipes and Takifugu rubripes, built with core components from GMOD project, GOPArc system and the BioMart project. The information provided by this resource includes Gene Ontology terms, metabolic pathways, SNP prediction, CDS prediction, orthologs prediction, several precalculated BLAST searches and domains. It also provides a BLAST server for matching user-provided sequences to any of the databases and an advanced query tool (BioMart) that allows easy browsing of EST databases with user-defined criteria. These tools make SalmonDB database a valuable resource for researchers searching for transcripts and genomic information regarding S. salar and other salmonid species. The database is expected to grow in the near feature, particularly with the S. salar genome sequencing project. Database URL: http://genomicasalmones.dim.uchile.cl/

  13. Creating and virtually screening databases of fluorescently-labelled compounds for the discovery of target-specific molecular probes

    NASA Astrophysics Data System (ADS)

    Kamstra, Rhiannon L.; Dadgar, Saedeh; Wigg, John; Chowdhury, Morshed A.; Phenix, Christopher P.; Floriano, Wely B.

    2014-11-01

    Our group has recently demonstrated that virtual screening is a useful technique for the identification of target-specific molecular probes. In this paper, we discuss some of our proof-of-concept results involving two biologically relevant target proteins, and report the development of a computational script to generate large databases of fluorescence-labelled compounds for computer-assisted molecular design. The virtual screening of a small library of 1,153 fluorescently-labelled compounds against two targets, and the experimental testing of selected hits reveal that this approach is efficient at identifying molecular probes, and that the screening of a labelled library is preferred over the screening of base compounds followed by conjugation of confirmed hits. The automated script for library generation explores the known reactivity of commercially available dyes, such as NHS-esters, to create large virtual databases of fluorescence-tagged small molecules that can be easily synthesized in a laboratory. A database of 14,862 compounds, each tagged with the ATTO680 fluorophore was generated with the automated script reported here. This library is available for downloading and it is suitable for virtual ligand screening aiming at the identification of target-specific fluorescent molecular probes.

  14. Lower Granite Dam Smolt Monitoring Program, 2003-2004 Annual Report.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mensik, Fred; Rapp, Shawn; Ross, Doug

    2004-08-01

    The 2003 fish collection season at Lower Granite Dam Juvenile Fish Facility (LGR) was characterized by water temperatures, total flows and spill that were below the five year average, low levels of debris, and increased smolt collection numbers compared to 2002 with the exception of unclipped sockeye/kokanee. There were 6,183,825 juvenile salmonids collected. Of these, 6,054,167 were transported to release sites below Bonneville Dam, 5,957,885 by barge and 96,282 by truck. An additional 102,340 fish were bypassed back to the river, primarily due to research projects with another 62,122 bypassed through the PIT-tag bypass system. According to the PTAGIS database,more » 152,268 PIT-tagged fish were detected at Lower Granite Dam. Of these, Smolt Monitoring Staff recorded 345 PIT-tagged raceway and sample mortalities. Of the 6,183,825 total fish collected, 113,290 were PIT-tagged or radio tagged and 380 were sacrificed by researchers. The collection included 836,885 fish that had hatchery marks other than clipped fins (elastomer, freeze brands or Coded Wire Tags). An estimated 54,857 incidental fish were collected with an additional 8,730 adult salmonids removed from the separator.« less

  15. Using kittens to unlock photo-sharing website datasets for environmental applications

    NASA Astrophysics Data System (ADS)

    Gascoin, Simon

    2016-04-01

    Mining photo-sharing websites is a promising approach to complement in situ and satellite observations of the environment, however a challenge is to deal with the large degree of noise inherent to online social datasets. Here I explored the value of the Flickr image hosting website database to monitor the snow cover in the Pyrenees. Using the Flickr application programming interface (API) I queried all the public images metadata tagged at least with one of the following words: "snow", "neige", "nieve", "neu" (snow in French, Spanish and Catalan languages). The search was limited to the geo-tagged pictures taken in the Pyrenees area. However, the number of public pictures available in the Flickr database for a given time interval depends on several factors, including the Flickr website popularity and the development of digital photography. Thus, I also searched for all Flickr images tagged with "chat", "gat" or "gato" (cat in French, Spanish and Catalan languages). The tag "cat" was not considered in order to exclude the results from North America where Flickr got popular earlier than in Europe. The number of "cat" images per month was used to fit a model of the number of images uploaded in Flickr with time. This model was used to remove this trend in the numbers of snow-tagged photographs. The resulting time series was compared to a time series of the snow cover area derived from the MODIS satellite over the same region. Both datasets are well correlated; in particular they exhibit the same seasonal evolution, although the inter-annual variabilities are less similar. I will also discuss which other factors may explain the main discrepancies in order to further decrease the noise in the Flickr dataset.

  16. Development and Application of a Salmonid EST Database and cDNA Microarray: Data Mining and Interspecific Hybridization Characteristics

    PubMed Central

    Rise, Matthew L.; von Schalburg, Kristian R.; Brown, Gordon D.; Mawer, Melanie A.; Devlin, Robert H.; Kuipers, Nathanael; Busby, Maura; Beetz-Sargent, Marianne; Alberto, Roberto; Gibbs, A. Ross; Hunt, Peter; Shukin, Robert; Zeznik, Jeffrey A.; Nelson, Colleen; Jones, Simon R.M.; Smailus, Duane E.; Jones, Steven J.M.; Schein, Jacqueline E.; Marra, Marco A.; Butterfield, Yaron S.N.; Stott, Jeff M.; Ng, Siemon H.S.; Davidson, William S.; Koop, Ben F.

    2004-01-01

    We report 80,388 ESTs from 23 Atlantic salmon (Salmo salar) cDNA libraries (61,819 ESTs), 6 rainbow trout (Oncorhynchus mykiss) cDNA libraries (14,544 ESTs), 2 chinook salmon (Oncorhynchus tshawytscha) cDNA libraries (1317 ESTs), 2 sockeye salmon (Oncorhynchus nerka) cDNA libraries (1243 ESTs), and 2 lake whitefish (Coregonus clupeaformis) cDNA libraries (1465 ESTs). The majority of these are 3′ sequences, allowing discrimination between paralogs arising from a recent genome duplication in the salmonid lineage. Sequence assembly reveals 28,710 different S. salar, 8981 O. mykiss, 1085 O. tshawytscha, 520 O. nerka, and 1176 C. clupeaformis putative transcripts. We annotate the submitted portion of our EST database by molecular function. Higher- and lower-molecular-weight fractions of libraries are shown to contain distinct gene sets, and higher rates of gene discovery are associated with higher-molecular weight libraries. Pyloric caecum library group annotations indicate this organ may function in redox control and as a barrier against systemic uptake of xenobiotics. A microarray is described, containing 7356 salmonid elements representing 3557 different cDNAs. Analyses of cross-species hybridizations to this cDNA microarray indicate that this resource may be used for studies involving all salmonids. PMID:14962987

  17. Review on SAW RFID tags.

    PubMed

    Plessky, Victor P; Reindl, Leonhard M

    2010-03-01

    SAW tags were invented more than 30 years ago, but only today are the conditions united for mass application of this technology. The devices in the 2.4-GHz ISM band can be routinely produced with optical lithography, high-resolution radar systems can be built up using highly sophisticated, but low-cost RF-chips, and the Internet is available for global access to the tag databases. The "Internet of Things," or I-o-T, will demand trillions of cheap tags and sensors. The SAW tags can overcome semiconductor-based analogs in many aspects: they can be read at a distance of a few meters with readers radiating power levels 2 to 3 orders lower, they are cheap, and they can operate in robust environments. Passive SAW tags are easily combined with sensors. Even the "anti-collision" problem (i.e., the simultaneous reading of many nearby tags) has adequate solutions for many practical applications. In this paper, we discuss the state-of-the-art in the development of SAW tags. The design approaches will be reviewed and optimal tag designs, as well as encoding methods, will be demonstrated. We discuss ways to reduce the size and cost of these devices. A few practical examples of tags using a time-position coding with 10(6) different codes will be demonstrated. Phase-coded devices can additionally increase the number of codes at the expense of a reduction of reading distance. We also discuss new and exciting perspectives of using ultra wide band (UWB) technology for SAW-tag systems. The wide frequency band available for this standard provides a great opportunity for SAW tags to be radically reduced in size to about 1 x 1 mm(2) while keeping a practically infinite number of possible different codes. Finally, the reader technology will be discussed, as well as detailed comparison made between SAW tags and IC-based semiconductor device.

  18. Systematic Analysis of Arabidopsis Organelles and a Protein Localization Database for Facilitating Fluorescent Tagging of Full-Length Arabidopsis Proteins1[W

    PubMed Central

    Li, Shijun; Ehrhardt, David W.; Rhee, Seung Y.

    2006-01-01

    Cells are organized into a complex network of subcellular compartments that are specialized for various biological functions. Subcellular location is an important attribute of protein function. To facilitate systematic elucidation of protein subcellular location, we analyzed experimentally verified protein localization data of 1,300 Arabidopsis (Arabidopsis thaliana) proteins. The 1,300 experimentally verified proteins are distributed among 40 different compartments, with most of the proteins localized to four compartments: mitochondria (36%), nucleus (28%), plastid (17%), and cytosol (13.3%). About 19% of the proteins are found in multiple compartments, in which a high proportion (36.4%) is localized to both cytosol and nucleus. Characterization of the overrepresented Gene Ontology molecular functions and biological processes suggests that the Golgi apparatus and peroxisome may play more diverse functions but are involved in more specialized processes than other compartments. To support systematic empirical determination of protein subcellular localization using a technology called fluorescent tagging of full-length proteins, we developed a database and Web application to provide preselected green fluorescent protein insertion position and primer sequences for all Arabidopsis proteins to study their subcellular localization and to store experimentally verified protein localization images, videos, and their annotations of proteins generated using the fluorescent tagging of full-length proteins technology. The database can be searched, browsed, and downloaded using a Web browser at http://aztec.stanford.edu/gfp/. The software can also be downloaded from the same Web site for local installation. PMID:16617091

  19. An Optimized Informatics Pipeline for Mass Spectrometry-Based Peptidomics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wu, Chaochao; Monroe, Matthew E.; Xu, Zhe

    2015-12-26

    Comprehensive MS analysis of peptidome, the intracellular and intercellular products of protein degradation, has the potential to provide novel insights on endogenous proteolytic processing and their utility in disease diagnosis and prognosis. Along with the advances in MS instrumentation, a plethora of proteomics data analysis tools have been applied for direct use in peptidomics; however an evaluation of the currently available informatics pipelines for peptidomics data analysis has yet to be reported. In this study, we set off by evaluating the results of several popular MS/MS database search engines including MS-GF+, SEQUEST and MS-Align+ for peptidomics data analysis, followed bymore » identification and label-free quantification using the well-established accurate mass and time (AMT) tag and newly developed informed quantification (IQ) approaches, both based on direct LC-MS analysis. Our result demonstrated that MS-GF+ outperformed both SEQUEST and MS-Align+ in identifying peptidome peptides. Using a database established from the MS-GF+ peptide identifications, both the AMT tag and IQ approaches provided significantly deeper peptidome coverage and less missing value for each individual data set than the MS/MS methods, while achieving robust label-free quantification. Besides having an excellent correlation with the AMT tag quantification results, IQ also provided slightly higher peptidome coverage than AMT. Taken together, we propose an optimal informatics pipeline combining MS-GF+ for initial database searching with IQ (or AMT) for identification and label-free quantification for high-throughput, comprehensive and quantitative peptidomics analysis.« less

  20. Development and characterization of microsatellite markers for the Pacific abalone ( Haliotis discus) via EST database mining

    NASA Astrophysics Data System (ADS)

    Zhan, Aibin; Bao, Zhenmin; Wang, Mingling; Chang, Dan; Yuan, Jian; Wang, Xiaolong; Hu, Xiaoli; Liang, Chengzhu; Hu, Jingjie

    2008-05-01

    The EST database of the Pacific abalone ( Haliotis discus) was mined for developing microsatellite markers. A total of 1476 EST sequences were registered in GenBank when data mining was performed. Fifty sequences (approximately 3.4%) were found to contain one or more microsatellites. Based on the length and GC content of the flanking regions, cluster analysis and BLASTN, 13 microsatellite-containing ESTs were selected for PCR primer design. The results showed that 10 out of 13 primer pairs could amplify scorable PCR products and showed polymorphism. The number of alleles ranged from 2 to 13 and the values of H o and H e varied from 0.1222 to 0.8611 and 0.2449 to 0.9311, respectively. No significant linkage disequilibrium (LD) between any pairs of these loci was found, and 6 of 10 loci conformed to the Hardy-Weinberg equilibrium (HWE). These EST-SSRs are therefore potential tools for studies of intraspecies variation and hybrid identification.

  1. Association study of IL10, IL1beta, and IL1RN and schizophrenia using tag SNPs from a comprehensive database: suggestive association with rs16944 at IL1beta.

    PubMed

    Shirts, Brian H; Wood, Joel; Yolken, Robert H; Nimgaonkar, Vishwajit L

    2006-12-01

    Genetic association studies of several candidate cytokine genes have been motivated by evidence of immune dysfunction among patients with schizophrenia. Intriguing but inconsistent associations have been reported with polymorphisms of three positional candidate genes, namely IL1beta, IL1RN, and IL10. We used comprehensive sequencing data from the Seattle SNPs database to select tag SNPs that represent all common polymorphisms in the Caucasian population at these loci. Associations with 28 tag SNPs were evaluated in 478 cases and 501 unscreened control individuals, while accounting for population sub-structure using the genomic control method. The samples were also stratified by gender, diagnostic category, and exposure to infectious agents. Significant association was not detected after correcting for multiple comparisons. However, meta-analysis of our data combined with previously published association studies of rs16944 (IL1beta -511) suggests that the C allele confers modest risk for schizophrenia among individuals reporting Caucasian ancestry, but not Asians (Caucasians, n=819 cases, 1292 controls; p=0.0013, OR=1.24, 95% CI 1.09, 1.41).

  2. Identification and characterization of microRNAs and their target genes from Nile tilapia (Oreochromis niloticus).

    PubMed

    Huang, Yong; Ma, Xiu Ying; Yang, You Bing; Ren, Hong Tao; Sun, Xi Hong; Wang, Li Rui

    MicroRNAs (miRNAs) are a class of small single-stranded, endogenous 21-22 nt non-coding RNAs that regulate their target mRNA levels by causing either inactivation or degradation of the mRNAs. In recent years, miRNA genes have been identified from mammals, insects, worms, plants, and viruses. In this research, bioinformatics approaches were used to predict potential miRNAs and their targets in Nile tilapia from the expressed sequence tag (EST) and genomic survey sequence (GSS) database, respectively, based on the conservation of miRNAs in many animal species. A total of 19 potential miRNAs were detected following a range of strict filtering criteria. To test the validity of the bioinformatics method, seven predicted Nile tilapia miRNA genes were selected for further biological validation, and their mature miRNA transcripts were successfully detected by stem-loop RT-PCR experiments. Using these potential miRNAs, we found 56 potential targets in this species. Most of the target mRNAs appear to be involved in development, metabolism, signal transduction, transcription regulation and stress responses. Overall, our findings will provide an important foundation for further research on miRNAs function in the Nile tilapia.

  3. Genes expressed during the development and ripening of watermelon fruit.

    PubMed

    Levi, A; Davis, A; Hernandez, A; Wechter, P; Thimmapuram, J; Trebitsh, T; Tadmor, Y; Katzir, N; Portnoy, V; King, S

    2006-11-01

    A normalized cDNA library was constructed using watermelon flesh mRNA from three distinct developmental time-points and was subtracted by hybridization with leaf cDNA. Random cDNA clones of the watermelon flesh subtraction library were sequenced from the 5' end in order to identify potentially informative genes associated with fruit setting, development, and ripening. One-thousand and forty-six 5'-end sequences (expressed sequence tags; ESTs) were assembled into 832 non-redundant sequences, designated as "EST-unigenes". Of these 832 "EST-unigenes", 254 ( approximately 30%) have no significant homology to sequences published so far for other plant species. Additionally, 168 "EST-unigenes" ( approximately 20%) correspond to genes with unknown function, whereas 410 "EST-unigenes" ( approximately 50%) correspond to genes with known function in other plant species. These "EST-unigenes" are mainly associated with metabolism, membrane transport, cytoskeleton synthesis and structure, cell wall formation and cell division, signal transduction, nucleic acid binding and transcription factors, defense and stress response, and secondary metabolism. This study provides the scientific community with novel genetic information for watermelon as well as an expanded pool of genes associated with fruit development in watermelon. These genes will be useful targets in future genetic and functional genomic studies of watermelon and its development.

  4. Expressed sequence tags from heat-shocked seagrass Zostera noltii (Hornemann) from its southern distribution range.

    PubMed

    Massa, Sónia I; Pearson, Gareth A; Aires, Tânia; Kube, Michael; Olsen, Jeanine L; Reinhardt, Richard; Serrão, Ester A; Arnaud-Haond, Sophie

    2011-09-01

    Predicted global climate change threatens the distributional ranges of species worldwide. We identified genes expressed in the intertidal seagrass Zostera noltii during recovery from a simulated low tide heat-shock exposure. Five Expressed Sequence Tag (EST) libraries were compared, corresponding to four recovery times following sub-lethal temperature stress, and a non-stressed control. We sequenced and analyzed 7009 sequence reads from 30min, 2h, 4h and 24h after the beginning of the heat-shock (AHS), and 1585 from the control library, for a total of 8594 sequence reads. Among 51 Tentative UniGenes (TUGs) exhibiting significantly different expression between libraries, 19 (37.3%) were identified as 'molecular chaperones' and were over-expressed following heat-shock, while 12 (23.5%) were 'photosynthesis TUGs' generally under-expressed in heat-shocked plants. A time course analysis of expression showed a rapid increase in expression of the molecular chaperone class, most of which were heat-shock proteins; which increased from 2 sequence reads in the control library to almost 230 in the 30min AHS library, followed by a slow decrease during further recovery. In contrast, 'photosynthesis TUGs' were under-expressed 30min AHS compared with the control library, and declined progressively with recovery time in the stress libraries, with a total of 29 sequence reads 24h AHS, compared with 125 in the control. A total of 4734 TUGs were screened for EST-Single Sequence Repeats (EST-SSRs) and 86 microsatellites were identified. Copyright © 2011 Elsevier B.V. All rights reserved.

  5. Chromosome-specific physical localisation of expressed sequence tag loci in Corchorus olitorius L.

    PubMed

    Joshi, A; Das, S K; Samanta, P; Paria, P; Sen, S K; Basu, A

    2014-11-01

    Jute (Corchorus spp.), as a natural fibre-producing species, ranks next only to cotton. Inadequate understanding of its genetic architecture is a major lacuna for genetic improvement of this crop in terms of yield and quality. Establishment of a physical map provides a genomic tool that helps in positional cloning of valuable genes. In this report, an attempt was initiated to study association and localisation of single copy expressed sequence tag (EST) loci in the genome of Corchorus olitorius. The chromosome-specific association of EST was determined based on the appearance of an extra signal for a single copy cDNA probe in mitotic interphase nuclei of specific trisomic(s) for fluorescence in situ hybridisation, and validated using a cDNA fragment of the 26S rRNA gene (600 bp) as molecular probe. The probe exhibited three signals in meiotic interphase nuclei of trisomic 5, instead of two as observed in diploids and other trisomics, indicating its association with chromosome 5. Subsequent hybridisation of the same probe on the pachytene chromosomes of diploids confirmed that 26S rRNA occupies the terminal end of the short arm of chromosome 5 in C. olitorius. Subsequently, chromosome-specific association of 63 single copy EST and their physical localisation were determined on chromosomes 2, 4, 5 and 7. The study describes chromosome-specific physical localisation of genes in jute. The approach used here could be a step towards construction of genome-wide physical maps for any recalcitrant plant species like jute. © 2014 German Botanical Society and The Royal Botanical Society of the Netherlands.

  6. Conservation and divergence of ADAM family proteins in the Xenopus genome

    PubMed Central

    2010-01-01

    Background Members of the disintegrin metalloproteinase (ADAM) family play important roles in cellular and developmental processes through their functions as proteases and/or binding partners for other proteins. The amphibian Xenopus has long been used as a model for early vertebrate development, but genome-wide analyses for large gene families were not possible until the recent completion of the X. tropicalis genome sequence and the availability of large scale expression sequence tag (EST) databases. In this study we carried out a systematic analysis of the X. tropicalis genome and uncovered several interesting features of ADAM genes in this species. Results Based on the X. tropicalis genome sequence and EST databases, we identified Xenopus orthologues of mammalian ADAMs and obtained full-length cDNA clones for these genes. The deduced protein sequences, synteny and exon-intron boundaries are conserved between most human and X. tropicalis orthologues. The alternative splicing patterns of certain Xenopus ADAM genes, such as adams 22 and 28, are similar to those of their mammalian orthologues. However, we were unable to identify an orthologue for ADAM7 or 8. The Xenopus orthologue of ADAM15, an active metalloproteinase in mammals, does not contain the conserved zinc-binding motif and is hence considered proteolytically inactive. We also found evidence for gain of ADAM genes in Xenopus as compared to other species. There is a homologue of ADAM10 in Xenopus that is missing in most mammals. Furthermore, a single scaffold of X. tropicalis genome contains four genes encoding ADAM28 homologues, suggesting genome duplication in this region. Conclusions Our genome-wide analysis of ADAM genes in X. tropicalis revealed both conservation and evolutionary divergence of these genes in this amphibian species. On the one hand, all ADAMs implicated in normal development and health in other species are conserved in X. tropicalis. On the other hand, some ADAM genes and ADAM protease activities are absent, while other novel ADAM proteins in this species are predicted by this study. The conservation and unique divergence of ADAM genes in Xenopus probably reflect the particular selective pressures these amphibian species faced during evolution. PMID:20630080

  7. Rediscovering Medicinal Plants' Potential with OMICS: Microsatellite Survey in Expressed Sequence Tags of Eleven Traditional Plants with Potent Antidiabetic Properties

    PubMed Central

    Sahu, Jagajjit; Sen, Priyabrata; Choudhury, Manabendra Dutta; Dehury, Budheswar; Barooah, Madhumita; Modi, Mahendra Kumar

    2014-01-01

    Abstract Herbal medicines and traditionally used medicinal plants present an untapped potential for novel molecular target discovery using systems science and OMICS biotechnology driven strategies. Since up to 40% of the world's poor people have no access to government health services, traditional and folk medicines are often the only therapeutics available to them. In this vein, North East (NE) India is recognized for its rich bioresources. As part of the Indo-Burma hotspot, it is regarded as an epicenter of biodiversity for several plants having myriad traditional uses, including medicinal use. However, the improvement of these valuable bioresources through molecular breeding strategies, for example, using genic microsatellites or Simple Sequence Repeats (SSRs) or Expressed Sequence Tags (ESTs)-derived SSRs has not been fully utilized in large scale to date. In this study, we identified a total of 47,700 microsatellites from 109,609 ESTs of 11 medicinal plants (pineapple, papaya, noyontara, bitter orange, bermuda brass, ratalu, barbados nut, mango, mulberry, lotus, and guduchi) having proven antidiabetic properties. A total of 58,159 primer pairs were designed for the non-redundant 8060 SSR-positive ESTs and putative functions were assigned to 4483 unique contigs. Among the identified microsatellites, excluding mononucleotide repeats, di-/trinucleotides are predominant, among which repeat motifs of AG/CT and AAG/CTT were most abundant. Similarity search of SSR containing ESTs and antidiabetic gene sequences revealed 11 microsatellites linked to antidiabetic genes in five plants. GO term enrichment analysis revealed a total of 80 enriched GO terms widely distributed in 53 biological processes, 17 molecular functions, and 10 cellular components associated with the 11 markers. The present study therefore provides concrete insights into the frequency and distribution of SSRs in important medicinal resources. The microsatellite markers reported here markedly add to the genetic stock for cross transferability in these plants and the literature on biomarkers and novel drug discovery for common chronic diseases such as diabetes. PMID:24802971

  8. Sorghum Expressed Sequence Tags Identify Signature Genes for Drought, Pathogenesis, and Skotomorphogenesis from a Milestone Set of 16,801 Unique Transcripts1[w

    PubMed Central

    Pratt, Lee H.; Liang, Chun; Shah, Manish; Sun, Feng; Wang, Haiming; Reid, St. Patrick; Gingle, Alan R.; Paterson, Andrew H.; Wing, Rod; Dean, Ralph; Klein, Robert; Nguyen, Henry T.; Ma, Hong-mei; Zhao, Xin; Morishige, Daryl T.; Mullet, John E.; Cordonnier-Pratt, Marie-Michèle

    2005-01-01

    Improved knowledge of the sorghum transcriptome will enhance basic understanding of how plants respond to stresses and serve as a source of genes of value to agriculture. Toward this goal, Sorghum bicolor L. Moench cDNA libraries were prepared from light- and dark-grown seedlings, drought-stressed plants, Colletotrichum-infected seedlings and plants, ovaries, embryos, and immature panicles. Other libraries were prepared with meristems from Sorghum propinquum (Kunth) Hitchc. that had been photoperiodically induced to flower, and with rhizomes from S. propinquum and johnsongrass (Sorghum halepense L. Pers.). A total of 117,682 expressed sequence tags (ESTs) were obtained representing both 3′ and 5′ sequences from about half that number of cDNA clones. A total of 16,801 unique transcripts, representing tentative UniScripts (TUs), were identified from 55,783 3′ ESTs. Of these TUs, 9,032 are represented by two or more ESTs. Collectively, these libraries were predicted to contain a total of approximately 31,000 TUs. Individual libraries, however, were predicted to contain no more than about 6,000 to 9,000, with the exception of light-grown seedlings, which yielded an estimate of close to 13,000. In addition, each library exhibits about the same level of complexity with respect to both the number of TUs preferentially expressed in that library and the frequency with which two or more ESTs is found in only that library. These results indicate that the sorghum genome is expressed in highly selective fashion in the individual organs and in response to the environmental conditions surveyed here. Close to 2,000 differentially expressed TUs were identified among the cDNA libraries examined, of which 775 were differentially expressed at a confidence level of 98%. From these 775 TUs, signature genes were identified defining drought, Colletotrichum infection, skotomorphogenesis (etiolation), ovary, immature panicle, and embryo. PMID:16169961

  9. Rediscovering medicinal plants' potential with OMICS: microsatellite survey in expressed sequence tags of eleven traditional plants with potent antidiabetic properties.

    PubMed

    Sahu, Jagajjit; Sen, Priyabrata; Choudhury, Manabendra Dutta; Dehury, Budheswar; Barooah, Madhumita; Modi, Mahendra Kumar; Talukdar, Anupam Das

    2014-05-01

    Herbal medicines and traditionally used medicinal plants present an untapped potential for novel molecular target discovery using systems science and OMICS biotechnology driven strategies. Since up to 40% of the world's poor people have no access to government health services, traditional and folk medicines are often the only therapeutics available to them. In this vein, North East (NE) India is recognized for its rich bioresources. As part of the Indo-Burma hotspot, it is regarded as an epicenter of biodiversity for several plants having myriad traditional uses, including medicinal use. However, the improvement of these valuable bioresources through molecular breeding strategies, for example, using genic microsatellites or Simple Sequence Repeats (SSRs) or Expressed Sequence Tags (ESTs)-derived SSRs has not been fully utilized in large scale to date. In this study, we identified a total of 47,700 microsatellites from 109,609 ESTs of 11 medicinal plants (pineapple, papaya, noyontara, bitter orange, bermuda brass, ratalu, barbados nut, mango, mulberry, lotus, and guduchi) having proven antidiabetic properties. A total of 58,159 primer pairs were designed for the non-redundant 8060 SSR-positive ESTs and putative functions were assigned to 4483 unique contigs. Among the identified microsatellites, excluding mononucleotide repeats, di-/trinucleotides are predominant, among which repeat motifs of AG/CT and AAG/CTT were most abundant. Similarity search of SSR containing ESTs and antidiabetic gene sequences revealed 11 microsatellites linked to antidiabetic genes in five plants. GO term enrichment analysis revealed a total of 80 enriched GO terms widely distributed in 53 biological processes, 17 molecular functions, and 10 cellular components associated with the 11 markers. The present study therefore provides concrete insights into the frequency and distribution of SSRs in important medicinal resources. The microsatellite markers reported here markedly add to the genetic stock for cross transferability in these plants and the literature on biomarkers and novel drug discovery for common chronic diseases such as diabetes.

  10. An Expressed Sequence Tag collection from the male antennae of the Noctuid moth Spodoptera littoralis: a resource for olfactory and pheromone detection research

    PubMed Central

    2011-01-01

    Background Nocturnal insects such as moths are ideal models to study the molecular bases of olfaction that they use, among examples, for the detection of mating partners and host plants. Knowing how an odour generates a neuronal signal in insect antennae is crucial for understanding the physiological bases of olfaction, and also could lead to the identification of original targets for the development of olfactory-based control strategies against herbivorous moth pests. Here, we describe an Expressed Sequence Tag (EST) project to characterize the antennal transcriptome of the noctuid pest model, Spodoptera littoralis, and to identify candidate genes involved in odour/pheromone detection. Results By targeting cDNAs from male antennae, we biased gene discovery towards genes potentially involved in male olfaction, including pheromone reception. A total of 20760 ESTs were obtained from a normalized library and were assembled in 9033 unigenes. 6530 were annotated based on BLAST analyses and gene prediction software identified 6738 ORFs. The unigenes were compared to the Bombyx mori proteome and to ESTs derived from Lepidoptera transcriptome projects. We identified a large number of candidate genes involved in odour and pheromone detection and turnover, including 31 candidate chemosensory receptor genes, but also genes potentially involved in olfactory modulation. Conclusions Our project has generated a large collection of antennal transcripts from a Lepidoptera. The normalization process, allowing enrichment in low abundant genes, proved to be particularly relevant to identify chemosensory receptors in a species for which no genomic data are available. Our results also suggest that olfactory modulation can take place at the level of the antennae itself. These EST resources will be invaluable for exploring the mechanisms of olfaction and pheromone detection in S. littoralis, and for ultimately identifying original targets to fight against moth herbivorous pests. PMID:21276261

  11. In silico identification and characterization of conserved miRNAs and their target genes in sweet potato (Ipomoea batatas L.) Expressed Sequence Tags (ESTs)

    PubMed Central

    Dehury, Budheswar; Panda, Debashis; Sahu, Jagajjit; Sahu, Mousumi; Sarma, Kishore; Barooah, Madhumita; Sen, Priyabrata; Modi, Mahendra Kumar

    2013-01-01

    The endogenous small non-coding micro RNAs (miRNAs), which are typically ~21–24 nt nucleotides, play a crucial role in regulating the intrinsic normal growth of cells and development of the plants as well as in maintaining the integrity of genomes. These small non-coding RNAs function as the universal specificity factors in post-transcriptional gene silencing. Discovering miRNAs, identifying their targets, and further inferring miRNA functions is a routine process to understand normal biological processes of miRNAs and their roles in the development of plants. Comparative genomics based approach using expressed sequence tags (EST) and genome survey sequences (GSS) offer a cost-effective platform for identification and characterization of miRNAs and their target genes in plants. Despite the fact that sweet potato (Ipomoea batatas L.) is an important staple food source for poor small farmers throughout the world, the role of miRNA in various developmental processes remains largely unknown. In this paper, we report the computational identification of miRNAs and their target genes in sweet potato from their ESTs. Using comparative genomics-based approach, 8 potential miRNA candidates belonging to miR168, miR2911, and miR156 families were identified from 23 406 ESTs in sweet potato. A total of 42 target genes were predicted and their probable functions were illustrated. Most of the newly identified miRNAs target transcription factors as well as genes involved in plant growth and development, signal transduction, metabolism, defense, and stress response. The identification of miRNAs and their targets is expected to accelerate the pace of miRNA discovery, leading to an improved understanding of the role of miRNA in development and physiology of sweet potato, as well as stress response. PMID:24067297

  12. Gene Discovery in the Apicomplexa as Revealed by EST Sequencing and Assembly of a Comparative Gene Database

    PubMed Central

    Li, Li; Brunk, Brian P.; Kissinger, Jessica C.; Pape, Deana; Tang, Keliang; Cole, Robert H.; Martin, John; Wylie, Todd; Dante, Mike; Fogarty, Steven J.; Howe, Daniel K.; Liberator, Paul; Diaz, Carmen; Anderson, Jennifer; White, Michael; Jerome, Maria E.; Johnson, Emily A.; Radke, Jay A.; Stoeckert, Christian J.; Waterston, Robert H.; Clifton, Sandra W.; Roos, David S.; Sibley, L. David

    2003-01-01

    Large-scale EST sequencing projects for several important parasites within the phylum Apicomplexa were undertaken for the purpose of gene discovery. Included were several parasites of medical importance (Plasmodium falciparum, Toxoplasma gondii) and others of veterinary importance (Eimeria tenella, Sarcocystis neurona, and Neospora caninum). A total of 55,192 ESTs, deposited into dbEST/GenBank, were included in the analyses. The resulting sequences have been clustered into nonredundant gene assemblies and deposited into a relational database that supports a variety of sequence and text searches. This database has been used to compare the gene assemblies using BLAST similarity comparisons to the public protein databases to identify putative genes. Of these new entries, ∼15%–20% represent putative homologs with a conservative cutoff of p < 10−9, thus identifying many conserved genes that are likely to share common functions with other well-studied organisms. Gene assemblies were also used to identify strain polymorphisms, examine stage-specific expression, and identify gene families. An interesting class of genes that are confined to members of this phylum and not shared by plants, animals, or fungi, was identified. These genes likely mediate the novel biological features of members of the Apicomplexa and hence offer great potential for biological investigation and as possible therapeutic targets. [The sequence data from this study have been submitted to dbEST division of GenBank under accession nos.: Toxoplasma gondii: –, –, –, –, – , –, –, –, –. Plasmodium falciparum: –, –, –, –. Sarcocystis neurona: , , , , , , , , , , , , , –, –, –, –, –. Eimeria tenella: –, –, –, –, –, –, –, –, – , –, –, –, –, –, –, –, –, –, –, –. Neospora caninum: –, –, , – , –, –.] PMID:12618375

  13. Find the fish: using PROC SQL to build a relational database

    USGS Publications Warehouse

    Fabrizio, Mary C.; Nelson, Scott N.

    1995-01-01

    Reliable estimates of abundance and survival, gained through mark-recapture studies, are necessary to better understand how to manage and restore lake trout populations in the Great Lakes. Working with a 24-year data set from a mark-recapture study conducted in Lake Superior, we attempted to disclose information on tag shedding by examining recaptures of double-tagged fish. The data set consisted of 64,288 observations on fish which had been marked with one or more tags; a subset of these fish had been marked with two tags at initial capture. Although DATA and PROC statements could be used to obtain some of the information we sought, these statements could not be used to extract a complete set of results from the double-tagging experiments. We therefore used SQL processing to create three tables representing the same information but in a fully normalized relational structure. In addition, we created indices to efficiently examine complex relationships among the individual capture records. This approach allowed us to obtain all the information necessary to estimate tag retention through subsequent modeling. We believe that our success with SQL was due in large part to its ability to simultaneosly scan the same table more than once and to permit consideration of other tables in sub-queries.

  14. Genomic analysis of expressed sequence tags in American black bear Ursus americanus

    PubMed Central

    2010-01-01

    Background Species of the bear family (Ursidae) are important organisms for research in molecular evolution, comparative physiology and conservation biology, but relatively little genetic sequence information is available for this group. Here we report the development and analyses of the first large scale Expressed Sequence Tag (EST) resource for the American black bear (Ursus americanus). Results Comprehensive analyses of molecular functions, alternative splicing, and tissue-specific expression of 38,757 black bear EST sequences were conducted using the dog genome as a reference. We identified 18 genes, involved in functions such as lipid catabolism, cell cycle, and vesicle-mediated transport, that are showing rapid evolution in the bear lineage Three genes, Phospholamban (PLN), cysteine glycine-rich protein 3 (CSRP3) and Troponin I type 3 (TNNI3), are related to heart contraction, and defects in these genes in humans lead to heart disease. Two genes, biphenyl hydrolase-like (BPHL) and CSRP3, contain positively selected sites in bear. Global analysis of evolution rates of hibernation-related genes in bear showed that they are largely conserved and slowly evolving genes, rather than novel and fast-evolving genes. Conclusion We provide a genomic resource for an important mammalian organism and our study sheds new light on the possible functions and evolution of bear genes. PMID:20338065

  15. ESTs from Seeds to Assist the Selective Breeding of Jatropha curcas L. for Oil and Active Compounds

    PubMed Central

    Gomes, Kleber A; Almeida, Tiago C; Gesteira, Abelmon S; Lôbo, Ivon P; Guimarães, Ana Carolina R; de Miranda, Antonio B; Van Sluys, Marie-Anne; da Cruz, Rosenira S; Cascardo, Júlio CM; Carels, Nicolas

    2010-01-01

    We report here on the characterization of a cDNA library from seeds of Jatropha curcas L. at three stages of fruit maturation before yellowing. We sequenced a total of 2200 clones and obtained a set of 931 non-redundant sequences (unigenes) after trimming and quality control, ie, 140 contigs and 791 singlets with PHRED quality ≥10. We found low levels of sequence redundancy and extensive metabolic coverage by homology comparison to GO. After comparison of 5841 non-redundant ESTs from a total of 13193 reads from GenBank with KEGG, we identified tags with nucleotide variations among J. curcas accessions for genes of fatty acid, terpene, alkaloid, quinone and hormone pathways of biosynthesis. More specifically, the expression level of four genes (palmitoyl-acyl carrier protein thioesterase, 3-ketoacyl-CoA thiolase B, lysophosphatidic acid acyltransferase and geranyl pyrophosphate synthase) measured by real-time PCR proved to be significantly different between leaves and fruits. Since the nucleotide polymorphism of these tags is associated to higher level of gene expression in fruits compared to leaves, we propose this approach to speed up the search for quantitative traits in selective breeding of J. curcas. We also discuss its potential utility for the selective breeding of economically important traits in J. curcas. PMID:26217103

  16. Genomic analysis of expressed sequence tags in American black bear Ursus americanus.

    PubMed

    Zhao, Sen; Shao, Chunxuan; Goropashnaya, Anna V; Stewart, Nathan C; Xu, Yichi; Tøien, Øivind; Barnes, Brian M; Fedorov, Vadim B; Yan, Jun

    2010-03-26

    Species of the bear family (Ursidae) are important organisms for research in molecular evolution, comparative physiology and conservation biology, but relatively little genetic sequence information is available for this group. Here we report the development and analyses of the first large scale Expressed Sequence Tag (EST) resource for the American black bear (Ursus americanus). Comprehensive analyses of molecular functions, alternative splicing, and tissue-specific expression of 38,757 black bear EST sequences were conducted using the dog genome as a reference. We identified 18 genes, involved in functions such as lipid catabolism, cell cycle, and vesicle-mediated transport, that are showing rapid evolution in the bear lineage Three genes, Phospholamban (PLN), cysteine glycine-rich protein 3 (CSRP3) and Troponin I type 3 (TNNI3), are related to heart contraction, and defects in these genes in humans lead to heart disease. Two genes, biphenyl hydrolase-like (BPHL) and CSRP3, contain positively selected sites in bear. Global analysis of evolution rates of hibernation-related genes in bear showed that they are largely conserved and slowly evolving genes, rather than novel and fast-evolving genes. We provide a genomic resource for an important mammalian organism and our study sheds new light on the possible functions and evolution of bear genes.

  17. D-Tagatose Is a Promising Sweetener to Control Glycaemia: A New Functional Food.

    PubMed

    Guerrero-Wyss, Marion; Durán Agüero, Samuel; Angarita Dávila, Lisse

    2018-01-01

    The objective of the current research was to review and update evidence on the dietary effect of the consumption of tagatose in type 2 diabetes, as well as to elucidate the current approach that exists on its production and biotechnological utility in functional food for diabetics. Articles published before July 1, 2017, were included in the databases PubMed, EBSCO, Google Scholar, and Scielo, including the terms "Tagatose", "Sweeteners", "Diabetes Mellitus type 2", "Sweeteners", "D-Tag". D-Tagatose (D-tag) is an isomer of fructose which is approximately 90% sweeter than sucrose. Preliminary studies in animals and preclinical studies showed that D-tag decreased glucose levels, which generated great interest in the scientific community. Recent studies indicate that tagatose has low glycemic index, a potent hypoglycemic effect, and eventually could be associated with important benefits for the treatment of obesity. The authors concluded that D-tag is promising as a sweetener without major adverse effects observed in these clinical studies.

  18. Development of EST-SSR markers for Elaeocarpus photiniifolia (Elaeocarpaceae), an endemic taxon of the Bonin Islands.

    PubMed

    Sugai, Kyoko; Setsuko, Suzuki; Uchiyama, Kentaro; Murakami, Noriaki; Kato, Hidetoshi; Yoshimaru, Hiroshi

    2012-02-01

    Expressed sequence tag (EST)-derived microsatellite markers were developed for Elaeocarpus photiniifolia, an endemic taxon of the Bonin Islands. Initially, a complementary DNA (cDNA) library was constructed by de novo pyrosequencing of total RNA extracted from a seedling. A total of 267 primer pairs were designed from the library. Of the 48 tested loci, 25 loci were polymorphic among 41 individuals representing the entire geographical range of the species, with the number of alleles per locus and expected heterozygosity ranging from two to 14 and 0.09 to 0.86, respectively. Most loci were transferable to a related species, E. sylvestris. The developed markers will be useful for evaluating the genetic structure of E. photiniifolia.

  19. Genomic resources in fruit plants: an assessment of current status.

    PubMed

    Rai, Manoj K; Shekhawat, N S

    2015-01-01

    The availability of many genomic resources such as genome sequences, functional genomics resources including microarrays and RNA-seq, sufficient numbers of molecular markers, express sequence tags (ESTs) and high-density genetic maps is causing a rapid acceleration of genetics and genomic research of many fruit plants. This is leading to an increase in our knowledge of the genes that are linked to many horticultural and agronomically important traits. Recently, some progress has also been made on the identification and functional analysis of miRNAs in some fruit plants. This is one of the most active research fields in plant sciences. The last decade has witnessed development of genomic resources in many fruit plants such as apple, banana, citrus, grapes, papaya, pears, strawberry etc.; however, many of them are still not being exploited. Furthermore, owing to lack of resources, infrastructure and research facilities in many lesser-developed countries, development of genomic resources in many underutilized or less-studied fruit crops, which grow in these countries, is limited. Thus, research emphasis should be given to those fruit crops for which genomic resources are relatively scarce. The development of genomic databases of these less-studied fruit crops will enable biotechnologists to identify target genes that underlie key horticultural and agronomical traits. This review presents an overview of the current status of the development of genomic resources in fruit plants with the main emphasis being on genome sequencing, EST resources, functional genomics resources including microarray and RNA-seq, identification of quantitative trait loci and construction of genetic maps as well as efforts made on the identification and functional analysis of miRNAs in fruit plants.

  20. Sarcocystis neurona Merozoites Express a Family of Immunogenic Surface Antigens That Are Orthologues of the Toxoplasma gondii Surface Antigens (SAGs) and SAG-Related Sequences†

    PubMed Central

    Howe, Daniel K.; Gaji, Rajshekhar Y.; Mroz-Barrett, Meaghan; Gubbels, Marc-Jan; Striepen, Boris; Stamper, Shelby

    2005-01-01

    Sarcocystis neurona is a member of the Apicomplexa that causes myelitis and encephalitis in horses but normally cycles between the opossum and small mammals. Analysis of an S. neurona expressed sequence tag (EST) database revealed four paralogous proteins that exhibit clear homology to the family of surface antigens (SAGs) and SAG-related sequences of Toxoplasma gondii. The primary peptide sequences of the S. neurona proteins are consistent with the two-domain structure that has been described for the T. gondii SAGs, and each was predicted to have an amino-terminal signal peptide and a carboxyl-terminal glycolipid anchor addition site, suggesting surface localization. All four proteins were confirmed to be membrane associated and displayed on the surface of S. neurona merozoites. Due to their surface localization and homology to T. gondii surface antigens, these S. neurona proteins were designated SnSAG1, SnSAG2, SnSAG3, and SnSAG4. Consistent with their homology, the SnSAGs elicited a robust immune response in infected and immunized animals, and their conserved structure further suggests that the SnSAGs similarly serve as adhesins for attachment to host cells. Whether the S. neurona SAG family is as extensive as the T. gondii SAG family remains unresolved, but it is probable that additional SnSAGs will be revealed as more S. neurona ESTs are generated. The existence of an SnSAG family in S. neurona indicates that expression of multiple related surface antigens is not unique to the ubiquitous organism T. gondii. Instead, the SAG gene family is a common trait that presumably has an essential, conserved function(s). PMID:15664946

  1. Sarcocystis neurona merozoites express a family of immunogenic surface antigens that are orthologues of the Toxoplasma gondii surface antigens (SAGs) and SAG-related sequences.

    PubMed

    Howe, Daniel K; Gaji, Rajshekhar Y; Mroz-Barrett, Meaghan; Gubbels, Marc-Jan; Striepen, Boris; Stamper, Shelby

    2005-02-01

    Sarcocystis neurona is a member of the Apicomplexa that causes myelitis and encephalitis in horses but normally cycles between the opossum and small mammals. Analysis of an S. neurona expressed sequence tag (EST) database revealed four paralogous proteins that exhibit clear homology to the family of surface antigens (SAGs) and SAG-related sequences of Toxoplasma gondii. The primary peptide sequences of the S. neurona proteins are consistent with the two-domain structure that has been described for the T. gondii SAGs, and each was predicted to have an amino-terminal signal peptide and a carboxyl-terminal glycolipid anchor addition site, suggesting surface localization. All four proteins were confirmed to be membrane associated and displayed on the surface of S. neurona merozoites. Due to their surface localization and homology to T. gondii surface antigens, these S. neurona proteins were designated SnSAG1, SnSAG2, SnSAG3, and SnSAG4. Consistent with their homology, the SnSAGs elicited a robust immune response in infected and immunized animals, and their conserved structure further suggests that the SnSAGs similarly serve as adhesins for attachment to host cells. Whether the S. neurona SAG family is as extensive as the T. gondii SAG family remains unresolved, but it is probable that additional SnSAGs will be revealed as more S. neurona ESTs are generated. The existence of an SnSAG family in S. neurona indicates that expression of multiple related surface antigens is not unique to the ubiquitous organism T. gondii. Instead, the SAG gene family is a common trait that presumably has an essential, conserved function(s).

  2. Identification of Odor-Processing Genes in the Emerald Ash Borer, Agrilus planipennis

    PubMed Central

    Mamidala, Praveen; Wijeratne, Asela J.; Wijeratne, Saranga; Poland, Therese; Qazi, Sohail S.; Doucet, Daniel; Cusson, Michel; Beliveau, Catherine; Mittapalli, Omprakash

    2013-01-01

    Background Insects rely on olfaction to locate food, mates, and suitable oviposition sites for successful completion of their life cycle. Agrilus planipennis Fairmaire (emerald ash borer) is a serious invasive insect pest that has killed tens of millions of North American ash (Fraxinus spp) trees and threatens the very existence of the genus Fraxinus. Adult A. planipennis are attracted to host volatiles and conspecifics; however, to date no molecular knowledge exists on olfaction in A. planipennis. Hence, we undertook an antennae-specific transcriptomic study to identify the repertoire of odor processing genes involved in A. planipennis olfaction. Methodology and Principal Findings We acquired 139,085 Roche/454 GS FLX transcriptomic reads that were assembled into 30,615 high quality expressed sequence tags (ESTs), including 3,249 isotigs and 27,366 non-isotigs (contigs and singletons). Intriguingly, the majority of the A. planipennis antennal transcripts (59.72%) did not show similarity with sequences deposited in the non-redundant database of GenBank, potentially representing novel genes. Functional annotation and KEGG analysis revealed pathways associated with signaling and detoxification. Several odor processing genes (9 odorant binding proteins, 2 odorant receptors, 1 sensory neuron membrane protein and 134 odorant/xenobiotic degradation enzymes, including cytochrome P450s, glutathione-S-transferases; esterases, etc.) putatively involved in olfaction processes were identified. Quantitative PCR of candidate genes in male and female A. planipennis in different developmental stages revealed developmental- and sex-biased expression patterns. Conclusions and Significance The antennal ESTs derived from A. planipennis constitute a rich molecular resource for the identification of genes potentially involved in the olfaction process of A. planipennis. These findings should help in understanding the processing of antennally-active compounds (e.g. 7-epi-sesquithujene) previously identified in this serious invasive pest. PMID:23424668

  3. Gene-expression analysis of cold-stress response in the sexually transmitted protist Trichomonas vaginalis.

    PubMed

    Fang, Yi-Kai; Huang, Kuo-Yang; Huang, Po-Jung; Lin, Rose; Chao, Mei; Tang, Petrus

    2015-12-01

    Trichomonas vaginalis is the etiologic agent of trichomoniasis, the most common nonviral sexually transmitted disease in the world. This infection affects millions of individuals worldwide annually. Although direct sexual contact is the most common mode of transmission, increasing evidence indicates that T. vaginalis can survive in the external environment and can be transmitted by contaminated utensils. We found that the growth of T. vaginalis under cold conditions is greatly inhibited, but recovers after placing these stressed cells at the normal cultivation temperature of 37 °C. However, the mechanisms by which T. vaginalis regulates this adaptive process are unclear. An expressed sequence tag (EST) database generated from a complementary DNA library of T. vaginalis messenger RNAs expressed under cold-culture conditions (4 °C, TvC) was compared with a previously published normal-cultured EST library (37 °C, TvE) to assess the cold-stress responses of T. vaginalis. A total of 9780 clones were sequenced from the TvC library and were mapped to 2934 genes in the T. vaginalis genome. A total of 1254 genes were expressed in both the TvE and TvC libraries, and 1680 genes were only found in the TvC library. A functional analysis showed that cold temperature has effects on many cellular mechanisms, including increased H2O2 tolerance, activation of the ubiquitin-proteasome system, induction of iron-sulfur cluster assembly, and reduced energy metabolism and enzyme expression. The current study is the first large-scale transcriptomic analysis in cold-stressed T. vaginalis and the results enhance our understanding of this important protist. Copyright © 2014. Published by Elsevier B.V.

  4. XET Activity is Found Near Sites of Growth and Cell Elongation in Bryophytes and Some Green Algae: New Insights into the Evolution of Primary Cell Wall Elongation

    PubMed Central

    Van Sandt, Vicky S. T.; Stieperaere, Herman; Guisez, Yves; Verbelen, Jean-Pierre; Vissenberg, Kris

    2007-01-01

    Background and Aims In angiosperms xyloglucan endotransglucosylase (XET)/hydrolase (XTH) is involved in reorganization of the cell wall during growth and development. The location of oligo-xyloglucan transglucosylation activity and the presence of XTH expressed sequence tags (ESTs) in the earliest diverging extant plants, i.e. in bryophytes and algae, down to the Phaeophyta was examined. The results provide information on the presence of an XET growth mechanism in bryophytes and algae and contribute to the understanding of the evolution of cell wall elongation in general. Methods Representatives of the different plant lineages were pressed onto an XET test paper and assayed. XET or XET-related activity was visualized as the incorporation of fluorescent signal. The Physcomitrella genome database was screened for the presence of XTHs. In addition, using the 3′ RACE technique searches were made for the presence of possible XTH ESTs in the Charophyta. Key Results XET activity was found in the three major divisions of bryophytes at sites corresponding to growing regions. In the Physcomitrella genome two putative XTH-encoding cDNA sequences were identified that contain all domains crucial for XET activity. Furthermore, XET activity was located at the sites of growth in Chara (Charophyta) and Ulva (Chlorophyta) and a putative XTH ancestral enzyme in Chara was identified. No XET activity was identified in the Rhodophyta or Phaeophyta. Conclusions XET activity was shown to be present in all major groups of green plants. These data suggest that an XET-related growth mechanism originated before the evolutionary divergence of the Chlorobionta and open new insights in the evolution of the mechanisms of primary cell wall expansion. PMID:17098750

  5. Development of a set of SNP markers present in expressed genes of the apple.

    PubMed

    Chagné, David; Gasic, Ksenija; Crowhurst, Ross N; Han, Yuepeng; Bassett, Heather C; Bowatte, Deepa R; Lawrence, Timothy J; Rikkerink, Erik H A; Gardiner, Susan E; Korban, Schuyler S

    2008-11-01

    Molecular markers associated with gene coding regions are useful tools for bridging functional and structural genomics. Due to their high abundance in plant genomes, single nucleotide polymorphisms (SNPs) are present within virtually all genomic regions, including most coding sequences. The objective of this study was to develop a set of SNPs for the apple by taking advantage of the wealth of genomics resources available for the apple, including a large collection of expressed sequenced tags (ESTs). Using bioinformatics tools, a search for SNPs within an EST database of approximately 350,000 sequences developed from a variety of apple accessions was conducted. This resulted in the identification of a total of 71,482 putative SNPs. As the apple genome is reported to be an ancient polyploid, attempts were made to verify whether those SNPs detected in silico were attributable either to allelic polymorphisms or to gene duplication or paralogous or homeologous sequence variations. To this end, a set of 464 PCR primer pairs was designed, PCR was amplified using two subsets of plants, and the PCR products were sequenced. The SNPs retrieved from these sequences were then mapped onto apple genetic maps, including a newly constructed map of a Royal Gala x A689-24 cross and a Malling 9 x Robusta 5, map using a bin mapping strategy. The SNP genotyping was performed using the high-resolution melting (HRM) technique. A total of 93 new markers containing 210 coding SNPs were successfully mapped. This new set of SNP markers for the apple offers new opportunities for understanding the genetic control of important horticultural traits using quantitative trait loci (QTL) or linkage disequilibrium analysis. These also serve as useful markers for aligning physical and genetic maps, and as potential transferable markers across the Rosaceae family.

  6. Tissue-specific transcriptomics of the exotic invasive insect pest emerald ash borer (Agrilus planipennis).

    PubMed

    Mittapalli, Omprakash; Bai, Xiaodong; Mamidala, Praveen; Rajarapu, Swapna Priya; Bonello, Pierluigi; Herms, Daniel A

    2010-10-28

    The insect midgut and fat body represent major tissue interfaces that deal with several important physiological functions including digestion, detoxification and immune response. The emerald ash borer (Agrilus planipennis), is an exotic invasive insect pest that has killed millions of ash trees (Fraxinus spp.) primarily in the Midwestern United States and Ontario, Canada. However, despite its high impact status little knowledge exists for A. planipennis at the molecular level. Newer-generation Roche-454 pyrosequencing was used to obtain 126,185 reads for the midgut and 240,848 reads for the fat body, which were assembled into 25,173 and 37,661 high quality expressed sequence tags (ESTs) for the midgut and the fat body of A. planipennis larvae, respectively. Among these ESTs, 36% of the midgut and 38% of the fat body sequences showed similarity to proteins in the GenBank nr database. A high number of the midgut sequences contained chitin-binding peritrophin (248)and trypsin (98) domains; while the fat body sequences showed high occurrence of cytochrome P450s (85) and protein kinase (123) domains. Further, the midgut transcriptome of A. planipennis revealed putative microbial transcripts encoding for cell-wall degrading enzymes such as polygalacturonases and endoglucanases. A significant number of SNPs (137 in midgut and 347 in fat body) and microsatellite loci (317 in midgut and 571 in fat body) were predicted in the A. planipennis transcripts. An initial assessment of cytochrome P450s belonging to various CYP clades revealed distinct expression patterns at the tissue level. To our knowledge this study is one of the first to illuminate tissue-specific gene expression in an invasive insect of high ecological and economic consequence. These findings will lay the foundation for future gene expression and functional studies in A. planipennis.

  7. Identification of odor-processing genes in the emerald ash borer, Agrilus planipennis.

    PubMed

    Mamidala, Praveen; Wijeratne, Asela J; Wijeratne, Saranga; Poland, Therese; Qazi, Sohail S; Doucet, Daniel; Cusson, Michel; Beliveau, Catherine; Mittapalli, Omprakash

    2013-01-01

    Insects rely on olfaction to locate food, mates, and suitable oviposition sites for successful completion of their life cycle. Agrilus planipennis Fairmaire (emerald ash borer) is a serious invasive insect pest that has killed tens of millions of North American ash (Fraxinus spp) trees and threatens the very existence of the genus Fraxinus. Adult A. planipennis are attracted to host volatiles and conspecifics; however, to date no molecular knowledge exists on olfaction in A. planipennis. Hence, we undertook an antennae-specific transcriptomic study to identify the repertoire of odor processing genes involved in A. planipennis olfaction. We acquired 139,085 Roche/454 GS FLX transcriptomic reads that were assembled into 30,615 high quality expressed sequence tags (ESTs), including 3,249 isotigs and 27,366 non-isotigs (contigs and singletons). Intriguingly, the majority of the A. planipennis antennal transcripts (59.72%) did not show similarity with sequences deposited in the non-redundant database of GenBank, potentially representing novel genes. Functional annotation and KEGG analysis revealed pathways associated with signaling and detoxification. Several odor processing genes (9 odorant binding proteins, 2 odorant receptors, 1 sensory neuron membrane protein and 134 odorant/xenobiotic degradation enzymes, including cytochrome P450s, glutathione-S-transferases; esterases, etc.) putatively involved in olfaction processes were identified. Quantitative PCR of candidate genes in male and female A. planipennis in different developmental stages revealed developmental- and sex-biased expression patterns. The antennal ESTs derived from A. planipennis constitute a rich molecular resource for the identification of genes potentially involved in the olfaction process of A. planipennis. These findings should help in understanding the processing of antennally-active compounds (e.g. 7-epi-sesquithujene) previously identified in this serious invasive pest.

  8. Genome-wide identification and functional annotation of miRNAs in anti-inflammatory plant and their cross-kingdom regulation in Homo sapiens.

    PubMed

    Sharma, Ankita; Sahu, Sarika; Kumari, Pooja; Gopi, Soundhara Rajan; Malhotra, Rajesh; Biswas, Sagarika

    2017-05-01

    MicroRNAs (miRNAs) are newly discovered non-coding small (~17-24 nucleotide) RNAs that regulate gene expression of its target mRNA at the post-transcriptional levels. In this study, total 12,593 ESTs of Curcuma longa were taken from database of expressed sequence tags (dbEST) and clustered into 2821 contigs using EGassembler web server. Precursor miRNAs (pre-miRNAs) were predicted from these contigs that folded into stem-loop structure using MFold server. Thirty-four mature C. longa miRNAs (clo-miRNAs) were identified from pre-miRNAs having targets involved in various important functions of plant such as self-defence, growth and development, alkaloid metabolic pathway and ethylene signalling process. Sequence analysis of identified clo-miRNAs indicated that 56% miRNAs belong to ORF and 44% belong to non-ORF region. clo-mir-5 and clo-mir-6 were established as the conserved miRNAs, whereas clo-mir-20 was predicted to be the most stable miRNA. Phylogenetic analysis carried out by molecular evolutionary genetics analysis (MEGA) software indicated close evolutionary relationship of clo-mir-5075 with osa-MIR5075. Further, identified clo-miRNAs were checked for their cross-kingdom regulatory potential. clo-mir-14 was found to regulate various gene transcripts in humans that has been further investigated for its biostability in foetal bovine serum (FBS). The results indicated higher degree of stability of clo-mir-14 (48 h) in FBS. Thus, contribution of this miRNA to the cellular immune response during the inflamed condition of rheumatoid arthritis and adequate stability may make it a good choice for the therapeutic agent in near future.

  9. ESTs Analysis Reveals Putative Genes Involved in Symbiotic Seed Germination in Dendrobium officinale

    PubMed Central

    Zhao, Ming-Ming; Zhang, Gang; Zhang, Da-Wei; Hsiao, Yu-Yun; Guo, Shun-Xing

    2013-01-01

    Dendrobium officinale (Orchidaceae) is one of the world’s most endangered plants with great medicinal value. In nature, D . officinale seeds must establish symbiotic relationships with fungi to germinate. However, the molecular events involved in the interaction between fungus and plant during this process are poorly understood. To isolate the genes involved in symbiotic germination, a suppression subtractive hybridization (SSH) cDNA library of symbiotically germinated D . officinale seeds was constructed. From this library, 1437 expressed sequence tags (ESTs) were clustered to 1074 Unigenes (including 902 singletons and 172 contigs), which were searched against the NCBI non-redundant (NR) protein database (E-value cutoff, e-5). Based on sequence similarity with known proteins, 579 differentially expressed genes in D . officinale were identified and classified into different functional categories by Gene Ontology (GO), Clusters of orthologous Groups of proteins (COGs) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. The expression levels of 15 selected genes emblematic of symbiotic germination were confirmed via real-time quantitative PCR. These genes were classified into various categories, including defense and stress response, metabolism, transcriptional regulation, transport process and signal transduction pathways. All transcripts were upregulated in the symbiotically germinated seeds (SGS). The functions of these genes in symbiotic germination were predicted. Furthermore, two fungus-induced calcium-dependent protein kinases (CDPKs), which were upregulated 6.76- and 26.69-fold in SGS compared with un-germinated seeds (UGS), were cloned from D . officinale and characterized for the first time. This study provides the first global overview of genes putatively involved in D . officinale symbiotic seed germination and provides a foundation for further functional research regarding symbiotic relationships in orchids. PMID:23967335

  10. ESTs analysis reveals putative genes involved in symbiotic seed germination in Dendrobium officinale.

    PubMed

    Zhao, Ming-Ming; Zhang, Gang; Zhang, Da-Wei; Hsiao, Yu-Yun; Guo, Shun-Xing

    2013-01-01

    Dendrobiumofficinale (Orchidaceae) is one of the world's most endangered plants with great medicinal value. In nature, D. officinale seeds must establish symbiotic relationships with fungi to germinate. However, the molecular events involved in the interaction between fungus and plant during this process are poorly understood. To isolate the genes involved in symbiotic germination, a suppression subtractive hybridization (SSH) cDNA library of symbiotically germinated D. officinale seeds was constructed. From this library, 1437 expressed sequence tags (ESTs) were clustered to 1074 Unigenes (including 902 singletons and 172 contigs), which were searched against the NCBI non-redundant (NR) protein database (E-value cutoff, e(-5)). Based on sequence similarity with known proteins, 579 differentially expressed genes in D. officinale were identified and classified into different functional categories by Gene Ontology (GO), Clusters of orthologous Groups of proteins (COGs) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. The expression levels of 15 selected genes emblematic of symbiotic germination were confirmed via real-time quantitative PCR. These genes were classified into various categories, including defense and stress response, metabolism, transcriptional regulation, transport process and signal transduction pathways. All transcripts were upregulated in the symbiotically germinated seeds (SGS). The functions of these genes in symbiotic germination were predicted. Furthermore, two fungus-induced calcium-dependent protein kinases (CDPKs), which were upregulated 6.76- and 26.69-fold in SGS compared with un-germinated seeds (UGS), were cloned from D. officinale and characterized for the first time. This study provides the first global overview of genes putatively involved in D. officinale symbiotic seed germination and provides a foundation for further functional research regarding symbiotic relationships in orchids.

  11. Genic Microsatellite Markers in Brassica rapa: Development, Characterization, Mapping, and Their Utility in Other Cultivated and Wild Brassica Relatives

    PubMed Central

    Ramchiary, Nirala; Nguyen, Van Dan; Li, Xiaonan; Hong, Chang Pyo; Dhandapani, Vignesh; Choi, Su Ryun; Yu, Ge; Piao, Zhong Yun; Lim, Yong Pyo

    2011-01-01

    Genic microsatellite markers, also known as functional markers, are preferred over anonymous markers as they reveal the variation in transcribed genes among individuals. In this study, we developed a total of 707 expressed sequence tag-derived simple sequence repeat markers (EST-SSRs) and used for development of a high-density integrated map using four individual mapping populations of B. rapa. This map contains a total of 1426 markers, consisting of 306 EST-SSRs, 153 intron polymorphic markers, 395 bacterial artificial chromosome-derived SSRs (BAC-SSRs), and 572 public SSRs and other markers covering a total distance of 1245.9 cM of the B. rapa genome. Analysis of allelic diversity in 24 B. rapa germplasm using 234 mapped EST-SSR markers showed amplification of 2 alleles by majority of EST-SSRs, although amplification of alleles ranging from 2 to 8 was found. Transferability analysis of 167 EST-SSRs in 35 species belonging to cultivated and wild brassica relatives showed 42.51% (Sysimprium leteum) to 100% (B. carinata, B. juncea, and B. napus) amplification. Our newly developed EST-SSRs and high-density linkage map based on highly transferable genic markers would facilitate the molecular mapping of quantitative trait loci and the positional cloning of specific genes, in addition to marker-assisted selection and comparative genomic studies of B. rapa with other related species. PMID:21768136

  12. A Guided Tour of Saada

    NASA Astrophysics Data System (ADS)

    Michel, L.; Motch, C.; Nguyen Ngoc, H.; Pineau, F. X.

    2009-09-01

    Saada (http://amwdb.u-strasbg.fr/saada) is a tool for helping astronomers build local archives without writing any code (Michel et al. 2004). Databases created by Saada can host collections of heterogeneous data files. These data collections can also be published in the VO. An overview of the main Saada features is presented in this demo: creation of a basic database, creation of relationships, data searches using SaadaQL, metadata tagging, and use of VO services.

  13. Identification and validation of Asteraceae miRNAs by the expressed sequence tag analysis.

    PubMed

    Monavar Feshani, Aboozar; Mohammadi, Saeed; Frazier, Taylor P; Abbasi, Abbas; Abedini, Raha; Karimi Farsad, Laleh; Ehya, Farveh; Salekdeh, Ghasem Hosseini; Mardi, Mohsen

    2012-02-10

    MicroRNAs (miRNAs) are small non-coding RNA molecules that play a vital role in the regulation of gene expression. Despite their identification in hundreds of plant species, few miRNAs have been identified in the Asteraceae, a large family that comprises approximately one tenth of all flowering plants. In this study, we used the expressed sequence tag (EST) analysis to identify potential conserved miRNAs and their putative target genes in the Asteraceae. We applied quantitative Real-Time PCR (qRT-PCR) to confirm the expression of eight potential miRNAs in Carthamus tinctorius and Helianthus annuus. We also performed qRT-PCR analysis to investigate the differential expression pattern of five newly identified miRNAs during five different cotyledon growth stages in safflower. Using these methods, we successfully identified and characterized 151 potentially conserved miRNAs, belonging to 26 miRNA families, in 11 genus of Asteraceae. EST analysis predicted that the newly identified conserved Asteraceae miRNAs target 130 total protein-coding ESTs in sunflower and safflower, as well as 433 additional target genes in other plant species. We experimentally confirmed the existence of seven predicted miRNAs, (miR156, miR159, miR160, miR162, miR166, miR396, and miR398) in safflower and sunflower seedlings. We also observed that five out of eight miRNAs are differentially expressed during cotyledon development. Our results indicate that miRNAs may be involved in the regulation of gene expression during seed germination and the formation of the cotyledons in the Asteraceae. The findings of this study might ultimately help in the understanding of miRNA-mediated gene regulation in important crop species. Copyright © 2011 Elsevier B.V. All rights reserved.

  14. Diversity analysis in Cannabis sativa based on large-scale development of expressed sequence tag-derived simple sequence repeat markers.

    PubMed

    Gao, Chunsheng; Xin, Pengfei; Cheng, Chaohua; Tang, Qing; Chen, Ping; Wang, Changbiao; Zang, Gonggu; Zhao, Lining

    2014-01-01

    Cannabis sativa L. is an important economic plant for the production of food, fiber, oils, and intoxicants. However, lack of sufficient simple sequence repeat (SSR) markers has limited the development of cannabis genetic research. Here, large-scale development of expressed sequence tag simple sequence repeat (EST-SSR) markers was performed to obtain more informative genetic markers, and to assess genetic diversity in cannabis (Cannabis sativa L.). Based on the cannabis transcriptome, 4,577 SSRs were identified from 3,624 ESTs. From there, a total of 3,442 complementary primer pairs were designed as SSR markers. Among these markers, trinucleotide repeat motifs (50.99%) were the most abundant, followed by hexanucleotide (25.13%), dinucleotide (16.34%), tetranucloetide (3.8%), and pentanucleotide (3.74%) repeat motifs, respectively. The AAG/CTT trinucleotide repeat (17.96%) was the most abundant motif detected in the SSRs. One hundred and seventeen EST-SSR markers were randomly selected to evaluate primer quality in 24 cannabis varieties. Among these 117 markers, 108 (92.31%) were successfully amplified and 87 (74.36%) were polymorphic. Forty-five polymorphic primer pairs were selected to evaluate genetic diversity and relatedness among the 115 cannabis genotypes. The results showed that 115 varieties could be divided into 4 groups primarily based on geography: Northern China, Europe, Central China, and Southern China. Moreover, the coefficient of similarity when comparing cannabis from Northern China with the European group cannabis was higher than that when comparing with cannabis from the other two groups, owing to a similar climate. This study outlines the first large-scale development of SSR markers for cannabis. These data may serve as a foundation for the development of genetic linkage, quantitative trait loci mapping, and marker-assisted breeding of cannabis.

  15. Diversity Analysis in Cannabis sativa Based on Large-Scale Development of Expressed Sequence Tag-Derived Simple Sequence Repeat Markers

    PubMed Central

    Cheng, Chaohua; Tang, Qing; Chen, Ping; Wang, Changbiao; Zang, Gonggu; Zhao, Lining

    2014-01-01

    Cannabis sativa L. is an important economic plant for the production of food, fiber, oils, and intoxicants. However, lack of sufficient simple sequence repeat (SSR) markers has limited the development of cannabis genetic research. Here, large-scale development of expressed sequence tag simple sequence repeat (EST-SSR) markers was performed to obtain more informative genetic markers, and to assess genetic diversity in cannabis (Cannabis sativa L.). Based on the cannabis transcriptome, 4,577 SSRs were identified from 3,624 ESTs. From there, a total of 3,442 complementary primer pairs were designed as SSR markers. Among these markers, trinucleotide repeat motifs (50.99%) were the most abundant, followed by hexanucleotide (25.13%), dinucleotide (16.34%), tetranucloetide (3.8%), and pentanucleotide (3.74%) repeat motifs, respectively. The AAG/CTT trinucleotide repeat (17.96%) was the most abundant motif detected in the SSRs. One hundred and seventeen EST-SSR markers were randomly selected to evaluate primer quality in 24 cannabis varieties. Among these 117 markers, 108 (92.31%) were successfully amplified and 87 (74.36%) were polymorphic. Forty-five polymorphic primer pairs were selected to evaluate genetic diversity and relatedness among the 115 cannabis genotypes. The results showed that 115 varieties could be divided into 4 groups primarily based on geography: Northern China, Europe, Central China, and Southern China. Moreover, the coefficient of similarity when comparing cannabis from Northern China with the European group cannabis was higher than that when comparing with cannabis from the other two groups, owing to a similar climate. This study outlines the first large-scale development of SSR markers for cannabis. These data may serve as a foundation for the development of genetic linkage, quantitative trait loci mapping, and marker-assisted breeding of cannabis. PMID:25329551

  16. Transcriptome analysis of Bupleurum chinense focusing on genes involved in the biosynthesis of saikosaponins

    PubMed Central

    2011-01-01

    Abstract Background Bupleurum chinense DC. is a widely used traditional Chinese medicinal plant. Saikosaponins are the major bioactive constituents of B. chinense, but relatively little is known about saikosaponin biosynthesis. The 454 pyrosequencing technology provides a promising opportunity for finding novel genes that participate in plant metabolism. Consequently, this technology may help to identify the candidate genes involved in the saikosaponin biosynthetic pathway. Results One-quarter of the 454 pyrosequencing runs produced a total of 195, 088 high-quality reads, with an average read length of 356 bases (NCBI SRA accession SRA039388). A de novo assembly generated 24, 037 unique sequences (22, 748 contigs and 1, 289 singletons), 12, 649 (52.6%) of which were annotated against three public protein databases using a basic local alignment search tool (E-value ≤1e-10). All unique sequences were compared with NCBI expressed sequence tags (ESTs) (237) and encoding sequences (44) from the Bupleurum genus, and with a Sanger-sequenced EST dataset (3, 111). The 23, 173 (96.4%) unique sequences obtained in the present study represent novel Bupleurum genes. The ESTs of genes related to saikosaponin biosynthesis were found to encode known enzymes that catalyze the formation of the saikosaponin backbone; 246 cytochrome P450 (P450s) and 102 glycosyltransferases (GTs) unique sequences were also found in the 454 dataset. Full length cDNAs of 7 P450s and 7 uridine diphosphate GTs (UGTs) were verified by reverse transcriptase polymerase chain reaction or by cloning using 5' and/or 3' rapid amplification of cDNA ends. Two P450s and three UGTs were identified as the most likely candidates involved in saikosaponin biosynthesis. This finding was based on the coordinate up-regulation of their expression with β-AS in methyl jasmonate-treated adventitious roots and on their similar expression patterns with β-AS in various B. chinense tissues. Conclusions A collection of high-quality ESTs for B. chinense obtained by 454 pyrosequencing is provided here for the first time. These data should aid further research on the functional genomics of B. chinense and other Bupleurum species. The candidate genes for enzymes involved in saikosaponin biosynthesis, especially the P450s and UGTs, that were revealed provide a substantial foundation for follow-up research on the metabolism and regulation of the saikosaponins. PMID:22047182

  17. Regulation of Tumor Progression by Mgat5-Dependent Glycosylation

    DTIC Science & Technology

    2002-07-01

    PfA, P. found in mammals are also conserved in this nematode (9-13). vudgaris leucoaggiutinin EST, expressed sequence tag; FACS, fluores- cence...this paper, we establish that the pCSYK-L116RtogeneratepEGFP-L116R. The introduced segment was nematode orthologue is functionally equivalent to that...into the between 30 and 100 pl. Enzyme sources were nematode microsomal EcoRV site of pZErO-2 (Invitrogen). Independent recombinants were membranes

  18. Characterization, crystallization and preliminary X-ray diffraction analysis of an (S)-specific esterase (pfEstA) from Pseudomonas fluorescens KCTC 1767: enantioselectivity for potential industrial applications.

    PubMed

    Kim, Seulgi; Ngo, Tri Duc; Kim, Kyeong Kyu; Kim, T Doohun

    2012-11-01

    The structures and reaction mechanisms of enantioselective hydrolases, which can be used in industrial applications such as biotransformations, are largely unknown. Here, the X-ray crystallographic study of a novel (S)-specific esterase (pfEstA) from Pseudomonas fluorescens KCTC 1767, which can be used in the production of (S)-ketoprofen, is described. Multiple sequence alignments with other hydrolases revealed that pfEstA contains a conserved Ser67 within the S-X-X-K motif as well as a highly conserved Tyr156. Recombinant protein containing an N-terminal His tag was expressed in Escherichia coli, purified to homogeneity and characterized using SDS-PAGE, MALDI-TOF MS and enantioselective analysis. pfEstA was crystallized using a solution consisting of 1 M sodium citrate, 0.1 M CHES pH 9.5, and X-ray diffraction data were collected to a resolution of 1.9 Å with an Rmerge of 7.9%. The crystals of pfEstA belonged to space group P2(1)2(1)2(1), with unit-cell parameters a=65.31, b=82.13, c=100.41 Å, α=β=γ=90°.

  19. NABIC marker database: A molecular markers information network of agricultural crops.

    PubMed

    Kim, Chang-Kug; Seol, Young-Joo; Lee, Dong-Jun; Jeong, In-Seon; Yoon, Ung-Han; Lee, Gang-Seob; Hahn, Jang-Ho; Park, Dong-Suk

    2013-01-01

    In 2013, National Agricultural Biotechnology Information Center (NABIC) reconstructs a molecular marker database for useful genetic resources. The web-based marker database consists of three major functional categories: map viewer, RSN marker and gene annotation. It provides 7250 marker locations, 3301 RSN marker property, 3280 molecular marker annotation information in agricultural plants. The individual molecular marker provides information such as marker name, expressed sequence tag number, gene definition and general marker information. This updated marker-based database provides useful information through a user-friendly web interface that assisted in tracing any new structures of the chromosomes and gene positional functions using specific molecular markers. The database is available for free at http://nabic.rda.go.kr/gere/rice/molecularMarkers/

  20. Characterization and Amplification of Gene-Based Simple Sequence Repeat (SSR) Markers in Date Palm.

    PubMed

    Zhao, Yongli; Keremane, Manjunath; Prakash, Channapatna S; He, Guohao

    2017-01-01

    The paucity of molecular markers limits the application of genetic and genomic research in date palm (Phoenix dactylifera L.). Availability of expressed sequence tag (EST) sequences in date palm may provide a good resource for developing gene-based markers. This study characterizes a substantial fraction of transcriptome sequences containing simple sequence repeats (SSRs) from the EST sequences in date palm. The EST sequences studied are mainly homologous to those of Elaeis guineensis and Musa acuminata. A total of 911 gene-based SSR markers, characterized with functional annotations, have provided a useful basis not only for discovering candidate genes and understanding genetic basis of traits of interest but also for developing genetic and genomic tools for molecular research in date palm, such as diversity study, quantitative trait locus (QTL) mapping, and molecular breeding. The procedures of DNA extraction, polymerase chain reaction (PCR) amplification of these gene-based SSR markers, and gel electrophoresis of PCR products are described in this chapter.

  1. Development of EST-derived microsatellite markers in the aquatic macrophyte Ranunculus bungei (Ranunculaceae)1

    PubMed Central

    Wu, Zhigang; Wu, Jinwei; Wang, Yalin; Hou, Hongwei

    2017-01-01

    Premise of the study: Microsatellite or simple sequence repeat (SSR) markers were developed to investigate the influence of ecological factors on gene flow and spatial genetic structuring of the submerged plant Ranunculus bungei (Ranunculaceae), which is regarded as an important species for understanding how plants adapt to an aquatic environment. Methods and Results: Twenty-two microsatellite loci were identified from an expressed sequence tag (EST) library. The number of alleles per locus ranged from one to five, and the expected heterozygosity varied from 0.0 to 0.5 in four Chinese populations of R. bungei. Fourteen loci were polymorphic and significantly deviated from Hardy–Weinberg equilibrium. All of the loci were found to be amplifiable in two other species of Ranunculus section Batrachium, and cross-amplification in six riparian and aquatic species of Ranunculaceae was also partially successful. Conclusions: These novel EST-SSR markers will be useful for ecological and evolutionary studies of R. bungei as well as related species. PMID:28791205

  2. Assembly of 500,000 inter-specific catfish expressed sequence tags and large scale gene-associated marker development for whole genome association studies

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Catfish Genome Consortium; Wang, Shaolin; Peatman, Eric

    2010-03-23

    Background-Through the Community Sequencing Program, a catfish EST sequencing project was carried out through a collaboration between the catfish research community and the Department of Energy's Joint Genome Institute. Prior to this project, only a limited EST resource from catfish was available for the purpose of SNP identification. Results-A total of 438,321 quality ESTs were generated from 8 channel catfish (Ictalurus punctatus) and 4 blue catfish (Ictalurus furcatus) libraries, bringing the number of catfish ESTs to nearly 500,000. Assembly of all catfish ESTs resulted in 45,306 contigs and 66,272 singletons. Over 35percent of the unique sequences had significant similarities tomore » known genes, allowing the identification of 14,776 unique genes in catfish. Over 300,000 putative SNPs have been identified, of which approximately 48,000 are high-quality SNPs identified from contigs with at least four sequences and the minor allele presence of at least two sequences in the contig. The EST resource should be valuable for identification of microsatellites, genome annotation, large-scale expression analysis, and comparative genome analysis. Conclusions-This project generated a large EST resource for catfish that captured the majority of the catfish transcriptome. The parallel analysis of ESTs from two closely related Ictalurid catfishes should also provide powerful means for the evaluation of ancient and recent gene duplications, and for the development of high-density microarrays in catfish. The inter- and intra-specific SNPs identified from all catfish EST dataset assembly will greatly benefit the catfish introgression breeding program and whole genome association studies.« less

  3. Web servicing the biological office.

    PubMed

    Szugat, Martin; Güttler, Daniel; Fundel, Katrin; Sohler, Florian; Zimmer, Ralf

    2005-09-01

    Biologists routinely use Microsoft Office applications for standard analysis tasks. Despite ubiquitous internet resources, information needed for everyday work is often not directly and seamlessly available. Here we describe a very simple and easily extendable mechanism using Web Services to enrich standard MS Office applications with internet resources. We demonstrate its capabilities by providing a Web-based thesaurus for biological objects, which maps names to database identifiers and vice versa via an appropriate synonym list. The client application ProTag makes these features available in MS Office applications using Smart Tags and Add-Ins. http://services.bio.ifi.lmu.de/prothesaurus/

  4. Time-Tag Generation Script

    NASA Technical Reports Server (NTRS)

    Jackson, Dan E.

    2010-01-01

    Time-Tag Generation Script (TTaGS) is an application program, written in the AWK scripting language, for generating commands for aiming one Ku-band antenna and two S-band antennas for communicating with spacecraft. TTaGS saves between 2 and 4 person-hours per every 24 hours by automating the repetitious process of building between 150 and 180 antenna-control commands. TTaGS reads a text database of communication satellite schedules and a text database of satellite rise and set times and cross-references items in the two databases. It then compares the scheduled start and stop with the geometric rise and set to compute the times to execute antenna control commands. While so doing, TTaGS determines whether to generate commands for guidance, navigation, and control computers to tell them which satellites to track. To help prevent Ku-band irradiation of the Earth, TTaGS accepts input from the user about horizon tolerance and accordingly restricts activation and effects deactivation of the transmitter. TTaGS can be modified easily to enable tracking of additional satellites and for such other tasks as reading Sun-rise/set tables to generate commands to point the solar photovoltaic arrays of the International Space Station at the Sun.

  5. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Abbott, Jennifer; Sandberg, Tami

    The Wind-Wildlife Impacts Literature Database (WILD), formerly known as the Avian Literature Database, was created in 1997. The goal of the database was to begin tracking the research that detailed the potential impact of wind energy development on birds. The Avian Literature Database was originally housed on a proprietary platform called Livelink ECM from Open- Text and maintained by in-house technical staff. The initial set of records was added by library staff. A vital part of the newly launched Drupal-based WILD database is the Bibliography module. Many of the resources included in the database have digital object identifiers (DOI). Themore » bibliographic information for any item that has a DOI can be imported into the database using this module. This greatly reduces the amount of manual data entry required to add records to the database. The content available in WILD is international in scope, which can be easily discerned by looking at the tags available in the browse menu.« less

  6. FirstSearch and NetFirst--Web and Dial-up Access: Plus Ca Change, Plus C'est la Meme Chose?

    ERIC Educational Resources Information Center

    Koehler, Wallace; Mincey, Danielle

    1996-01-01

    Compares and evaluates the differences between OCLC's dial-up and World Wide Web FirstSearch access methods and their interfaces with the underlying databases. Also examines NetFirst, OCLC's new Internet catalog, the only Internet tracking database from a "traditional" database service. (Author/PEN)

  7. Bioengineering recombinant diacylglycerol acyltransferases

    USDA-ARS?s Scientific Manuscript database

    Diacylglycerol acyltransferases (DGATs) catalyze the last and rate-limiting step of triacylglycerol (TAG) biosynthesis in eukaryotic organisms. At least 115 DGAT sequences are identified from 69 organisms in the GenBank databases. Only a few papers have been published in the last 28 years on the exp...

  8. GDR (Genome Database for Rosaceae): integrated web-database for Rosaceae genomics and genetics data

    PubMed Central

    Jung, Sook; Staton, Margaret; Lee, Taein; Blenda, Anna; Svancara, Randall; Abbott, Albert; Main, Dorrie

    2008-01-01

    The Genome Database for Rosaceae (GDR) is a central repository of curated and integrated genetics and genomics data of Rosaceae, an economically important family which includes apple, cherry, peach, pear, raspberry, rose and strawberry. GDR contains annotated databases of all publicly available Rosaceae ESTs, the genetically anchored peach physical map, Rosaceae genetic maps and comprehensively annotated markers and traits. The ESTs are assembled to produce unigene sets of each genus and the entire Rosaceae. Other annotations include putative function, microsatellites, open reading frames, single nucleotide polymorphisms, gene ontology terms and anchored map position where applicable. Most of the published Rosaceae genetic maps can be viewed and compared through CMap, the comparative map viewer. The peach physical map can be viewed using WebFPC/WebChrom, and also through our integrated GDR map viewer, which serves as a portal to the combined genetic, transcriptome and physical mapping information. ESTs, BACs, markers and traits can be queried by various categories and the search result sites are linked to the mapping visualization tools. GDR also provides online analysis tools such as a batch BLAST/FASTA server for the GDR datasets, a sequence assembly server and microsatellite and primer detection tools. GDR is available at http://www.rosaceae.org. PMID:17932055

  9. OrganismTagger: detection, normalization and grounding of organism entities in biomedical documents.

    PubMed

    Naderi, Nona; Kappler, Thomas; Baker, Christopher J O; Witte, René

    2011-10-01

    Semantic tagging of organism mentions in full-text articles is an important part of literature mining and semantic enrichment solutions. Tagged organism mentions also play a pivotal role in disambiguating other entities in a text, such as proteins. A high-precision organism tagging system must be able to detect the numerous forms of organism mentions, including common names as well as the traditional taxonomic groups: genus, species and strains. In addition, such a system must resolve abbreviations and acronyms, assign the scientific name and if possible link the detected mention to the NCBI Taxonomy database for further semantic queries and literature navigation. We present the OrganismTagger, a hybrid rule-based/machine learning system to extract organism mentions from the literature. It includes tools for automatically generating lexical and ontological resources from a copy of the NCBI Taxonomy database, thereby facilitating system updates by end users. Its novel ontology-based resources can also be reused in other semantic mining and linked data tasks. Each detected organism mention is normalized to a canonical name through the resolution of acronyms and abbreviations and subsequently grounded with an NCBI Taxonomy database ID. In particular, our system combines a novel machine-learning approach with rule-based and lexical methods for detecting strain mentions in documents. On our manually annotated OT corpus, the OrganismTagger achieves a precision of 95%, a recall of 94% and a grounding accuracy of 97.5%. On the manually annotated corpus of Linnaeus-100, the results show a precision of 99%, recall of 97% and grounding accuracy of 97.4%. The OrganismTagger, including supporting tools, resources, training data and manual annotations, as well as end user and developer documentation, is freely available under an open-source license at http://www.semanticsoftware.info/organism-tagger. witte@semanticsoftware.info.

  10. Expression profiling and cross-species RNA interference (RNAi) of desiccation-induced transcripts in the anhydrobiotic nematode Aphelenchus avenae

    PubMed Central

    2010-01-01

    Background Some organisms can survive extreme desiccation by entering a state of suspended animation known as anhydrobiosis. The free-living mycophagous nematode Aphelenchus avenae can be induced to enter anhydrobiosis by pre-exposure to moderate reductions in relative humidity (RH) prior to extreme desiccation. This preconditioning phase is thought to allow modification of the transcriptome by activation of genes required for desiccation tolerance. Results To identify such genes, a panel of expressed sequence tags (ESTs) enriched for sequences upregulated in A. avenae during preconditioning was created. A subset of 30 genes with significant matches in databases, together with a number of apparently novel sequences, were chosen for further study. Several of the recognisable genes are associated with water stress, encoding, for example, two new hydrophilic proteins related to the late embryogenesis abundant (LEA) protein family. Expression studies confirmed EST panel members to be upregulated by evaporative water loss, and the majority of genes was also induced by osmotic stress and cold, but rather fewer by heat. We attempted to use RNA interference (RNAi) to demonstrate the importance of this gene set for anhydrobiosis, but found A. avenae to be recalcitrant with the techniques used. Instead, therefore, we developed a cross-species RNAi procedure using A. avenae sequences in another anhydrobiotic nematode, Panagrolaimus superbus, which is amenable to gene silencing. Of 20 A. avenae ESTs screened, a significant reduction in survival of desiccation in treated P. superbus populations was observed with two sequences, one of which was novel, while the other encoded a glutathione peroxidase. To confirm a role for glutathione peroxidases in anhydrobiosis, RNAi with cognate sequences from P. superbus was performed and was also shown to reduce desiccation tolerance in this species. Conclusions This study has identified and characterised the expression profiles of members of the anhydrobiotic gene set in A. avenae. It also demonstrates the potential of RNAi for the analysis of anhydrobiosis and provides the first genetic data to underline the importance of effective antioxidant systems in metazoan desiccation tolerance. PMID:20085654

  11. Transcriptomic resources for the medicinal legume Mucuna pruriens: de novo transcriptome assembly, annotation, identification and validation of EST-SSR markers.

    PubMed

    Sathyanarayana, N; Pittala, Ranjith Kumar; Tripathi, Pankaj Kumar; Chopra, Ratan; Singh, Heikham Russiachand; Belamkar, Vikas; Bhardwaj, Pardeep Kumar; Doyle, Jeff J; Egan, Ashley N

    2017-05-25

    The medicinal legume Mucuna pruriens (L.) DC. has attracted attention worldwide as a source of the anti-Parkinson's drug L-Dopa. It is also a popular green manure cover crop that offers many agronomic benefits including high protein content, nitrogen fixation and soil nutrients. The plant currently lacks genomic resources and there is limited knowledge on gene expression, metabolic pathways, and genetics of secondary metabolite production. Here, we present transcriptomic resources for M. pruriens, including a de novo transcriptome assembly and annotation, as well as differential transcript expression analyses between root, leaf, and pod tissues. We also develop microsatellite markers and analyze genetic diversity and population structure within a set of Indian germplasm accessions. One-hundred ninety-one million two hundred thirty-three thousand two hundred forty-two bp cleaned reads were assembled into 67,561 transcripts with mean length of 626 bp and N50 of 987 bp. Assembled sequences were annotated using BLASTX against public databases with over 80% of transcripts annotated. We identified 7,493 simple sequence repeat (SSR) motifs, including 787 polymorphic repeats between the parents of a mapping population. 134 SSRs from expressed sequenced tags (ESTs) were screened against 23 M. pruriens accessions from India, with 52 EST-SSRs retained after quality control. Population structure analysis using a Bayesian framework implemented in fastSTRUCTURE showed nearly similar groupings as with distance-based (neighbor-joining) and principal component analyses, with most of the accessions clustering per geographical origins. Pair-wise comparison of transcript expression in leaves, roots and pods identified 4,387 differentially expressed transcripts with the highest number occurring between roots and leaves. Differentially expressed transcripts were enriched with transcription factors and transcripts annotated as belonging to secondary metabolite pathways. The M. pruriens transcriptomic resources generated in this study provide foundational resources for gene discovery and development of molecular markers. Polymorphic SSRs identified can be used for genetic diversity, marker-trait analyses, and development of functional markers for crop improvement. The results of differential expression studies can be used to investigate genes involved in L-Dopa synthesis and other key metabolic pathways in M. pruriens.

  12. Transcriptome survey of the anhydrobiotic tardigrade Milnesium tardigradum in comparison with Hypsibius dujardini and Richtersius coronifer

    PubMed Central

    2010-01-01

    Background The phenomenon of desiccation tolerance, also called anhydrobiosis, involves the ability of an organism to survive the loss of almost all cellular water without sustaining irreversible damage. Although there are several physiological, morphological and ecological studies on tardigrades, only limited DNA sequence information is available. Therefore, we explored the transcriptome in the active and anhydrobiotic state of the tardigrade Milnesium tardigradum which has extraordinary tolerance to desiccation and freezing. In this study, we present the first overview of the transcriptome of M. tardigradum and its response to desiccation and discuss potential parallels to stress responses in other organisms. Results We sequenced a total of 9984 expressed sequence tags (ESTs) from two cDNA libraries from the eutardigrade M. tardigradum in its active and inactive, anhydrobiotic (tun) stage. Assembly of these ESTs resulted in 3283 putative unique transcripts, whereof ~50% showed significant sequence similarity to known genes. The resulting unigenes were functionally annotated using the Gene Ontology (GO) vocabulary. A GO term enrichment analysis revealed several GOs that were significantly underrepresented in the inactive stage. Furthermore we compared the putative unigenes of M. tardigradum with ESTs from two other eutardigrade species that are available from public sequence databases, namely Richtersius coronifer and Hypsibius dujardini. The processed sequences of the three tardigrade species revealed similar functional content and the M. tardigradum dataset contained additional sequences from tardigrades not present in the other two. Conclusions This study describes novel sequence data from the tardigrade M. tardigradum, which significantly contributes to the available tardigrade sequence data and will help to establish this extraordinary tardigrade as a model for studying anhydrobiosis. Functional comparison of active and anhydrobiotic tardigrades revealed a differential distribution of Gene Ontology terms associated with chromatin structure and the translation machinery, which are underrepresented in the inactive animals. These findings imply a widespread metabolic response of the animals on dehydration. The collective tardigrade transcriptome data will serve as a reference for further studies and support the identification and characterization of genes involved in the anhydrobiotic response. PMID:20226016

  13. Lung cancer signature biomarkers: tissue specific semantic similarity based clustering of digital differential display (DDD) data.

    PubMed

    Srivastava, Mousami; Khurana, Pankaj; Sugadev, Ragumani

    2012-11-02

    The tissue-specific Unigene Sets derived from more than one million expressed sequence tags (ESTs) in the NCBI, GenBank database offers a platform for identifying significantly and differentially expressed tissue-specific genes by in-silico methods. Digital differential display (DDD) rapidly creates transcription profiles based on EST comparisons and numerically calculates, as a fraction of the pool of ESTs, the relative sequence abundance of known and novel genes. However, the process of identifying the most likely tissue for a specific disease in which to search for candidate genes from the pool of differentially expressed genes remains difficult. Therefore, we have used 'Gene Ontology semantic similarity score' to measure the GO similarity between gene products of lung tissue-specific candidate genes from control (normal) and disease (cancer) sets. This semantic similarity score matrix based on hierarchical clustering represents in the form of a dendrogram. The dendrogram cluster stability was assessed by multiple bootstrapping. Multiple bootstrapping also computes a p-value for each cluster and corrects the bias of the bootstrap probability. Subsequent hierarchical clustering by the multiple bootstrapping method (α = 0.95) identified seven clusters. The comparative, as well as subtractive, approach revealed a set of 38 biomarkers comprising four distinct lung cancer signature biomarker clusters (panel 1-4). Further gene enrichment analysis of the four panels revealed that each panel represents a set of lung cancer linked metastasis diagnostic biomarkers (panel 1), chemotherapy/drug resistance biomarkers (panel 2), hypoxia regulated biomarkers (panel 3) and lung extra cellular matrix biomarkers (panel 4). Expression analysis reveals that hypoxia induced lung cancer related biomarkers (panel 3), HIF and its modulating proteins (TGM2, CSNK1A1, CTNNA1, NAMPT/Visfatin, TNFRSF1A, ETS1, SRC-1, FN1, APLP2, DMBT1/SAG, AIB1 and AZIN1) are significantly down regulated. All down regulated genes in this panel were highly up regulated in most other types of cancers. These panels of proteins may represent signature biomarkers for lung cancer and will aid in lung cancer diagnosis and disease monitoring as well as in the prediction of responses to therapeutics.

  14. Transcriptome survey of the anhydrobiotic tardigrade Milnesium tardigradum in comparison with Hypsibius dujardini and Richtersius coronifer.

    PubMed

    Mali, Brahim; Grohme, Markus A; Förster, Frank; Dandekar, Thomas; Schnölzer, Martina; Reuter, Dirk; Wełnicz, Weronika; Schill, Ralph O; Frohme, Marcus

    2010-03-12

    The phenomenon of desiccation tolerance, also called anhydrobiosis, involves the ability of an organism to survive the loss of almost all cellular water without sustaining irreversible damage. Although there are several physiological, morphological and ecological studies on tardigrades, only limited DNA sequence information is available. Therefore, we explored the transcriptome in the active and anhydrobiotic state of the tardigrade Milnesium tardigradum which has extraordinary tolerance to desiccation and freezing. In this study, we present the first overview of the transcriptome of M. tardigradum and its response to desiccation and discuss potential parallels to stress responses in other organisms. We sequenced a total of 9984 expressed sequence tags (ESTs) from two cDNA libraries from the eutardigrade M. tardigradum in its active and inactive, anhydrobiotic (tun) stage. Assembly of these ESTs resulted in 3283 putative unique transcripts, whereof approximately 50% showed significant sequence similarity to known genes. The resulting unigenes were functionally annotated using the Gene Ontology (GO) vocabulary. A GO term enrichment analysis revealed several GOs that were significantly underrepresented in the inactive stage. Furthermore we compared the putative unigenes of M. tardigradum with ESTs from two other eutardigrade species that are available from public sequence databases, namely Richtersius coronifer and Hypsibius dujardini. The processed sequences of the three tardigrade species revealed similar functional content and the M. tardigradum dataset contained additional sequences from tardigrades not present in the other two. This study describes novel sequence data from the tardigrade M. tardigradum, which significantly contributes to the available tardigrade sequence data and will help to establish this extraordinary tardigrade as a model for studying anhydrobiosis. Functional comparison of active and anhydrobiotic tardigrades revealed a differential distribution of Gene Ontology terms associated with chromatin structure and the translation machinery, which are underrepresented in the inactive animals. These findings imply a widespread metabolic response of the animals on dehydration. The collective tardigrade transcriptome data will serve as a reference for further studies and support the identification and characterization of genes involved in the anhydrobiotic response.

  15. The contribution of 700,000 ORF sequence tags to the definition of the human transcriptome

    PubMed Central

    Camargo, Anamaria A.; Samaia, Helena P. B.; Dias-Neto, Emmanuel; Simão, Daniel F.; Migotto, Italo A.; Briones, Marcelo R. S.; Costa, Fernando F.; Aparecida Nagai, Maria; Verjovski-Almeida, Sergio; Zago, Marco A.; Andrade, Luis Eduardo C.; Carrer, Helaine; El-Dorry, Hamza F. A.; Espreafico, Enilza M.; Habr-Gama, Angelita; Giannella-Neto, Daniel; Goldman, Gustavo H.; Gruber, Arthur; Hackel, Christine; Kimura, Edna T.; Maciel, Rui M. B.; Marie, Suely K. N.; Martins, Elizabeth A. L.; Nóbrega, Marina P.; Paçó-Larson, Maria Luisa; Pardini, Maria Inês M. C.; Pereira, Gonçalo G.; Pesquero, João Bosco; Rodrigues, Vanderlei; Rogatto, Silvia R.; da Silva, Ismael D. C. G.; Sogayar, Mari C.; Sonati, Maria de Fátima; Tajara, Eloiza H.; Valentini, Sandro R.; Alberto, Fernando L.; Amaral, Maria Elisabete J.; Aneas, Ivy; Arnaldi, Liliane A. T.; de Assis, Angela M.; Bengtson, Mário Henrique; Bergamo, Nadia Aparecida; Bombonato, Vanessa; de Camargo, Maria E. R.; Canevari, Renata A.; Carraro, Dirce M.; Cerutti, Janete M.; Corrêa, Maria Lucia C.; Corrêa, Rosana F. R.; Costa, Maria Cristina R.; Curcio, Cyntia; Hokama, Paula O. M.; Ferreira, Ari J. S.; Furuzawa, Gilberto K.; Gushiken, Tsieko; Ho, Paulo L.; Kimura, Elza; Krieger, José E.; Leite, Luciana C. C.; Majumder, Paromita; Marins, Mozart; Marques, Everaldo R.; Melo, Analy S. A.; Melo, Monica; Mestriner, Carlos Alberto; Miracca, Elisabete C.; Miranda, Daniela C.; Nascimento, Ana Lucia T. O.; Nóbrega, Francisco G.; Ojopi, Élida P. B.; Pandolfi, José Rodrigo C.; Pessoa, Luciana G.; Prevedel, Aline C.; Rahal, Paula; Rainho, Claudia A.; Reis, Eduardo M. R.; Ribeiro, Marcelo L.; da Rós, Nancy; de Sá, Renata G.; Sales, Magaly M.; Sant'anna, Simone Cristina; dos Santos, Mariana L.; da Silva, Aline M.; da Silva, Neusa P.; Silva, Wilson A.; da Silveira, Rosana A.; Sousa, Josane F.; Stecconi, Daniella; Tsukumo, Fernando; Valente, Valéria; Soares, Fernando; Moreira, Eloisa S.; Nunes, Diana N.; Correa, Ricardo G.; Zalcberg, Heloisa; Carvalho, Alex F.; Reis, Luis F. L.; Brentani, Ricardo R.; Simpson, Andrew J. G.; de Souza, Sandro J.

    2001-01-01

    Open reading frame expressed sequences tags (ORESTES) differ from conventional ESTs by providing sequence data from the central protein coding portion of transcripts. We generated a total of 696,745 ORESTES sequences from 24 human tissues and used a subset of the data that correspond to a set of 15,095 full-length mRNAs as a means of assessing the efficiency of the strategy and its potential contribution to the definition of the human transcriptome. We estimate that ORESTES sampled over 80% of all highly and moderately expressed, and between 40% and 50% of rarely expressed, human genes. In our most thoroughly sequenced tissue, the breast, the 130,000 ORESTES generated are derived from transcripts from an estimated 70% of all genes expressed in that tissue, with an equally efficient representation of both highly and poorly expressed genes. In this respect, we find that the capacity of the ORESTES strategy both for gene discovery and shotgun transcript sequence generation significantly exceeds that of conventional ESTs. The distribution of ORESTES is such that many human transcripts are now represented by a scaffold of partial sequences distributed along the length of each gene product. The experimental joining of the scaffold components, by reverse transcription–PCR, represents a direct route to transcript finishing that may represent a useful alternative to full-length cDNA cloning. PMID:11593022

  16. The contribution of 700,000 ORF sequence tags to the definition of the human transcriptome.

    PubMed

    Camargo, A A; Samaia, H P; Dias-Neto, E; Simão, D F; Migotto, I A; Briones, M R; Costa, F F; Nagai, M A; Verjovski-Almeida, S; Zago, M A; Andrade, L E; Carrer, H; El-Dorry, H F; Espreafico, E M; Habr-Gama, A; Giannella-Neto, D; Goldman, G H; Gruber, A; Hackel, C; Kimura, E T; Maciel, R M; Marie, S K; Martins, E A; Nobrega, M P; Paco-Larson, M L; Pardini, M I; Pereira, G G; Pesquero, J B; Rodrigues, V; Rogatto, S R; da Silva, I D; Sogayar, M C; Sonati, M F; Tajara, E H; Valentini, S R; Alberto, F L; Amaral, M E; Aneas, I; Arnaldi, L A; de Assis, A M; Bengtson, M H; Bergamo, N A; Bombonato, V; de Camargo, M E; Canevari, R A; Carraro, D M; Cerutti, J M; Correa, M L; Correa, R F; Costa, M C; Curcio, C; Hokama, P O; Ferreira, A J; Furuzawa, G K; Gushiken, T; Ho, P L; Kimura, E; Krieger, J E; Leite, L C; Majumder, P; Marins, M; Marques, E R; Melo, A S; Melo, M B; Mestriner, C A; Miracca, E C; Miranda, D C; Nascimento, A L; Nobrega, F G; Ojopi, E P; Pandolfi, J R; Pessoa, L G; Prevedel, A C; Rahal, P; Rainho, C A; Reis, E M; Ribeiro, M L; da Ros, N; de Sa, R G; Sales, M M; Sant'anna, S C; dos Santos, M L; da Silva, A M; da Silva, N P; Silva, W A; da Silveira, R A; Sousa, J F; Stecconi, D; Tsukumo, F; Valente, V; Soares, F; Moreira, E S; Nunes, D N; Correa, R G; Zalcberg, H; Carvalho, A F; Reis, L F; Brentani, R R; Simpson, A J; de Souza, S J; Melo, M

    2001-10-09

    Open reading frame expressed sequences tags (ORESTES) differ from conventional ESTs by providing sequence data from the central protein coding portion of transcripts. We generated a total of 696,745 ORESTES sequences from 24 human tissues and used a subset of the data that correspond to a set of 15,095 full-length mRNAs as a means of assessing the efficiency of the strategy and its potential contribution to the definition of the human transcriptome. We estimate that ORESTES sampled over 80% of all highly and moderately expressed, and between 40% and 50% of rarely expressed, human genes. In our most thoroughly sequenced tissue, the breast, the 130,000 ORESTES generated are derived from transcripts from an estimated 70% of all genes expressed in that tissue, with an equally efficient representation of both highly and poorly expressed genes. In this respect, we find that the capacity of the ORESTES strategy both for gene discovery and shotgun transcript sequence generation significantly exceeds that of conventional ESTs. The distribution of ORESTES is such that many human transcripts are now represented by a scaffold of partial sequences distributed along the length of each gene product. The experimental joining of the scaffold components, by reverse transcription-PCR, represents a direct route to transcript finishing that may represent a useful alternative to full-length cDNA cloning.

  17. Population structure of pigs determined by single nucleotide polymorphisms observed in assembled expressed sequence tags.

    PubMed

    Matsumoto, Toshimi; Okumura, Naohiko; Uenishi, Hirohide; Hayashi, Takeshi; Hamasima, Noriyuki; Awata, Takashi

    2012-01-01

    We have collected more than 190000 porcine expressed sequence tags (ESTs) from full-length complementary DNA (cDNA) libraries and identified more than 2800 single nucleotide polymorphisms (SNPs). In this study, we tentatively chose 222 SNPs observed in assembled ESTs to study pigs of different breeds; 104 were selected by comparing the cDNA sequences of a Meishan pig and samples of three-way cross pigs (Landrace, Large White, and Duroc: LWD), and 118 were selected from LWD samples. To evaluate the genetic variation between the chosen SNPs from pig breeds, we determined the genotypes for 192 pig samples (11 pig groups) from our DNA reference panel with matrix-assisted laser desorption ionization time-of-flight mass spectrometry. Of the 222 reference SNPs, 186 were successfully genotyped. A neighbor-joining tree showed that the pig groups were classified into two large clusters, namely, Euro-American and East Asian pig populations. F-statistics and the analysis of molecular variance of Euro-American pig groups revealed that approximately 25% of the genetic variations occurred because of intergroup differences. As the F(IS) values were less than the F(ST) values(,) the clustering, based on the Bayesian inference, implied that there was strong genetic differentiation among pig groups and less divergence within the groups in our samples. © 2011 The Authors. Animal Science Journal © 2011 Japanese Society of Animal Science.

  18. Bioinformatic identification and expression analysis of banana microRNAs and their targets.

    PubMed

    Chai, Juan; Feng, Renjun; Shi, Hourui; Ren, Mengyun; Zhang, Yindong; Wang, Jingyi

    2015-01-01

    MicroRNAs (miRNAs) represent a class of endogenous non-coding small RNAs that play important roles in multiple biological processes by degrading targeted mRNAs or repressing mRNA translation. Thousands of miRNAs have been identified in many plant species, whereas only a limited number of miRNAs have been predicted in M. acuminata (A genome) and M. balbisiana (B genome). Here, previously known plant miRNAs were BLASTed against the Expressed Sequence Tag (EST) and Genomic Survey Sequence (GSS), a database of banana genes. A total of 32 potential miRNAs belonging to 13 miRNAs families were detected using a range of filtering criteria. 244 miRNA:target pairs were subsequently predicted, most of which encode transcription factors or enzymes that participate in the regulation of development, growth, metabolism, and other physiological processes. In order to validate the predicted miRNAs and the mutual relationship between miRNAs and their target genes, qRT-PCR was applied to detect the tissue-specific expression levels of 12 putative miRNAs and 6 target genes in roots, leaves, flowers, and fruits. This study provides some important information about banana pre-miRNAs, mature miRNAs, and miRNA target genes and these findings can be applied to future research of miRNA functions.

  19. Bioinformatic Identification and Expression Analysis of Banana MicroRNAs and Their Targets

    PubMed Central

    Shi, Hourui; Ren, Mengyun; Zhang, Yindong; Wang, Jingyi

    2015-01-01

    MicroRNAs (miRNAs) represent a class of endogenous non-coding small RNAs that play important roles in multiple biological processes by degrading targeted mRNAs or repressing mRNA translation. Thousands of miRNAs have been identified in many plant species, whereas only a limited number of miRNAs have been predicted in M. acuminata (A genome) and M. balbisiana (B genome). Here, previously known plant miRNAs were BLASTed against the Expressed Sequence Tag (EST) and Genomic Survey Sequence (GSS), a database of banana genes. A total of 32 potential miRNAs belonging to 13 miRNAs families were detected using a range of filtering criteria. 244 miRNA:target pairs were subsequently predicted, most of which encode transcription factors or enzymes that participate in the regulation of development, growth, metabolism, and other physiological processes. In order to validate the predicted miRNAs and the mutual relationship between miRNAs and their target genes, qRT-PCR was applied to detect the tissue-specific expression levels of 12 putative miRNAs and 6 target genes in roots, leaves, flowers, and fruits. This study provides some important information about banana pre-miRNAs, mature miRNAs, and miRNA target genes and these findings can be applied to future research of miRNA functions. PMID:25856313

  20. Text Detection and Translation from Natural Scenes

    DTIC Science & Technology

    2001-06-01

    is no explicit tags around Chinese words. A module for Chinese word segmentation is included in the system. This segmentor uses a word- frequency ... list to make segmentation decisions. We tested the EBMT based method using randomly selected 50 signs from our database, assuming perfect sign

  1. Detecting Spatial Patterns of Natural Hazards from the Wikipedia Knowledge Base

    NASA Astrophysics Data System (ADS)

    Fan, J.; Stewart, K.

    2015-07-01

    The Wikipedia database is a data source of immense richness and variety. Included in this database are thousands of geotagged articles, including, for example, almost real-time updates on current and historic natural hazards. This includes usercontributed information about the location of natural hazards, the extent of the disasters, and many details relating to response, impact, and recovery. In this research, a computational framework is proposed to detect spatial patterns of natural hazards from the Wikipedia database by combining topic modeling methods with spatial analysis techniques. The computation is performed on the Neon Cluster, a high performance-computing cluster at the University of Iowa. This work uses wildfires as the exemplar hazard, but this framework is easily generalizable to other types of hazards, such as hurricanes or flooding. Latent Dirichlet Allocation (LDA) modeling is first employed to train the entire English Wikipedia dump, transforming the database dump into a 500-dimension topic model. Over 230,000 geo-tagged articles are then extracted from the Wikipedia database, spatially covering the contiguous United States. The geo-tagged articles are converted into an LDA topic space based on the topic model, with each article being represented as a weighted multidimension topic vector. By treating each article's topic vector as an observed point in geographic space, a probability surface is calculated for each of the topics. In this work, Wikipedia articles about wildfires are extracted from the Wikipedia database, forming a wildfire corpus and creating a basis for the topic vector analysis. The spatial distribution of wildfire outbreaks in the US is estimated by calculating the weighted sum of the topic probability surfaces using a map algebra approach, and mapped using GIS. To provide an evaluation of the approach, the estimation is compared to wildfire hazard potential maps created by the USDA Forest service.

  2. Serial analysis of gene expression in a rat lung model of asthma.

    PubMed

    Yin, Lei-Miao; Jiang, Gong-Hao; Wang, Yu; Wang, Yan; Liu, Yan-Yan; Jin, Wei-Rong; Zhang, Zen; Xu, Yu-Dong; Yang, Yong-Qing

    2008-11-01

    The pathogenesis and molecular mechanism underlying asthma remain undetermined. The purpose of this study was to identify genes and pathways involved in the early airway response (EAR) phase of asthma by using serial analysis of gene expression (SAGE). Two SAGE tag libraries of lung tissues derived from a rat model of asthma and controls were generated. Bioinformatic analyses were carried out using the Database for Annotation, Visualization and IntegratedDiscovery Functional Annotation Tool, Gene Ontology (GO) TreeMachine and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis. A total of 26 552 SAGE tags of asthmatic rat lung were obtained, of which 12 221 were unique tags. Of the unique tags, 55.5% were matched with known genes. By comparison of the two libraries, 186 differentially expressed tags (P < 0.05) were identified, of which 103 were upregulated and 83 were downregulated. Using the bioinformatic tools these genes were classified into 23 functional groups, 15 KEGG pathways and 37 enriched GO categories. The bioinformatic analyses of gene distribution, enriched categories and the involvement of specific pathways in the SAGE libraries have provided information on regulatory networks of the EAR phase of asthma. Analyses of the regulated genes of interest may inform new hypotheses, increase our understanding of the disease and provide a foundation for future research.

  3. On the designing of a tamper resistant prescription RFID access control system.

    PubMed

    Safkhani, Masoumeh; Bagheri, Nasour; Naderi, Majid

    2012-12-01

    Recently, Chen et al. have proposed a novel tamper resistant prescription RFID access control system, published in the Journal of Medical Systems. In this paper we consider the security of the proposed protocol and identify some existing weaknesses. The main attack is a reader impersonation attack which allows an active adversary to impersonate a legitimate doctor, e.g. the patient's doctor, to access the patient's tag and change the patient prescription. The presented attack is quite efficient. To impersonate a doctor, the adversary should eavesdrop one session between the doctor and the patient's tag and then she can impersonate the doctor with the success probability of '1'. In addition, we present efficient reader-tag to back-end database impersonation, de-synchronization and traceability attacks against the protocol. Finally, we propose an improved version of protocol which is more efficient compared to the original protocol while provides the desired security against the presented attacks.

  4. Generation, annotation and analysis of ESTs from Trichoderma harzianum CECT 2413

    PubMed Central

    Vizcaíno, Juan Antonio; González, Francisco Javier; Suárez, M Belén; Redondo, José; Heinrich, Julian; Delgado-Jarana, Jesús; Hermosa, Rosa; Gutiérrez, Santiago; Monte, Enrique; Llobell, Antonio; Rey, Manuel

    2006-01-01

    Background The filamentous fungus Trichoderma harzianum is used as biological control agent of several plant-pathogenic fungi. In order to study the genome of this fungus, a functional genomics project called "TrichoEST" was developed to give insights into genes involved in biological control activities using an approach based on the generation of expressed sequence tags (ESTs). Results Eight different cDNA libraries from T. harzianum strain CECT 2413 were constructed. Different growth conditions involving mainly different nutrient conditions and/or stresses were used. We here present the analysis of the 8,710 ESTs generated. A total of 3,478 unique sequences were identified of which 81.4% had sequence similarity with GenBank entries, using the BLASTX algorithm. Using the Gene Ontology hierarchy, we performed the annotation of 51.1% of the unique sequences and compared its distribution among the gene libraries. Additionally, the InterProScan algorithm was used in order to further characterize the sequences. The identification of the putatively secreted proteins was also carried out. Later, based on the EST abundance, we examined the highly expressed genes and a hydrophobin was identified as the gene expressed at the highest level. We compared our collection of ESTs with the previous collections obtained from Trichoderma species and we also compared our sequence set with different complete eukaryotic genomes from several animals, plants and fungi. Accordingly, the presence of similar sequences in different kingdoms was also studied. Conclusion This EST collection and its annotation provide a significant resource for basic and applied research on T. harzianum, a fungus with a high biotechnological interest. PMID:16872539

  5. EstA from Arthrobacter nitroguajacolicus Rü61a, a thermo- and solvent-tolerant carboxylesterase related to class C beta-lactamases.

    PubMed

    Schütte, Marcus; Fetzner, Susanne

    2007-03-01

    The estA gene encoding a novel cytoplasmic carboxylesterase from Arthrobacter nitroguajacolicus Rü61a was expressed in Escherichia coli. Sequence analysis and secondary structure predictions suggested that EstA belongs to the family VIII esterases, which are related to class C beta-lactamases. The S-x-x-K motif that in beta-lactamases contains the catalytic nucleophile, and a putative active-site tyrosine residue are conserved in EstA. The native molecular mass of hexahistidine-tagged (His6) EstA, purified by metal chelate affinity chromatography, was estimated to be 95 kDa by gel filtration, whereas the His6EstA peptide has a calculated molecular mass of 42.1 kDa. The enzyme catalyzes the hydrolysis of short-chain phenylacyl esters and triglycerides, and shows weak activity toward 2-hydroxy- and 2-nitroacetanilide. Its catalytic activity was inhibited by the serine-specific effector phenylmethylsulfonyl fluoride, and by Cd2+ and Hg2+ ions. Maximum activity of His6EstA was observed at a pH of 9.5 and a temperature of 50 degrees C to 60 degrees C. The enzyme was fairly thermostable. After 19 days at 50 degrees C and after 24 hours at 60 degrees C, its residual relative esterase activity toward phenylacetate was still 53% and 30%, respectively. Exposure of His6EstA to buffer-solvent mixtures showed that the enzyme was inactivated by several high log P (hydrophobic) solvents, whereas it showed remarkable stability and activity in up to 30% (by volume) of polar (low log P) organic solvents such as dimethylsulfoxide, methanol, acetonitrile, acetone, and propanol.

  6. A Bayesian nonparametric method for prediction in EST analysis

    PubMed Central

    Lijoi, Antonio; Mena, Ramsés H; Prünster, Igor

    2007-01-01

    Background Expressed sequence tags (ESTs) analyses are a fundamental tool for gene identification in organisms. Given a preliminary EST sample from a certain library, several statistical prediction problems arise. In particular, it is of interest to estimate how many new genes can be detected in a future EST sample of given size and also to determine the gene discovery rate: these estimates represent the basis for deciding whether to proceed sequencing the library and, in case of a positive decision, a guideline for selecting the size of the new sample. Such information is also useful for establishing sequencing efficiency in experimental design and for measuring the degree of redundancy of an EST library. Results In this work we propose a Bayesian nonparametric approach for tackling statistical problems related to EST surveys. In particular, we provide estimates for: a) the coverage, defined as the proportion of unique genes in the library represented in the given sample of reads; b) the number of new unique genes to be observed in a future sample; c) the discovery rate of new genes as a function of the future sample size. The Bayesian nonparametric model we adopt conveys, in a statistically rigorous way, the available information into prediction. Our proposal has appealing properties over frequentist nonparametric methods, which become unstable when prediction is required for large future samples. EST libraries, previously studied with frequentist methods, are analyzed in detail. Conclusion The Bayesian nonparametric approach we undertake yields valuable tools for gene capture and prediction in EST libraries. The estimators we obtain do not feature the kind of drawbacks associated with frequentist estimators and are reliable for any size of the additional sample. PMID:17868445

  7. Subtractive transcriptome analysis of leaf and rhizome reveals differentially expressed transcripts in Panax sokpayensis.

    PubMed

    Gurung, Bhusan; Bhardwaj, Pardeep K; Talukdar, Narayan C

    2016-11-01

    In the present study, suppression subtractive hybridization (SSH) strategy was used to identify rare and differentially expressed transcripts in leaf and rhizome tissues of Panax sokpayensis. Out of 1102 randomly picked clones, 513 and 374 high quality expressed sequenced tags (ESTs) were generated from leaf and rhizome subtractive libraries, respectively. Out of them, 64.92 % ESTs from leaf and 69.26 % ESTs from rhizome SSH libraries were assembled into different functional categories, while others were of unknown function. In particular, ESTs encoding galactinol synthase 2, ribosomal RNA processing Brix domain protein, and cell division cycle protein 20.1, which are involved in plant growth and development, were most abundant in the leaf SSH library. Other ESTs encoding protein KIAA0664 homologue, ubiquitin-activating enzyme e11, and major latex protein, which are involved in plant immunity and defense response, were most abundant in the rhizome SSH library. Subtractive ESTs also showed similarity with genes involved in ginsenoside biosynthetic pathway, namely farnesyl pyrophosphate synthase, squalene synthase, and dammarenediol synthase. Expression profiles of selected ESTs validated the quality of libraries and confirmed their differential expression in the leaf, stem, and rhizome tissues. In silico comparative analyses revealed that around 13.75 % of unigenes from the leaf SSH library were not represented in the available leaf transcriptome of Panax ginseng. Similarly, around 18.12, 23.75, 25, and 6.25 % of unigenes from the rhizome SSH library were not represented in available root/rhizome transcriptomes of P. ginseng, Panax notoginseng, Panax quinquefolius, and Panax vietnamensis, respectively, indicating a major fraction of novel ESTs. Therefore, these subtractive transcriptomes provide valuable resources for gene discovery in P. sokpayensis and would complement the available transcriptomes from other Panax species.

  8. Transcriptome-wide analysis of WRKY transcription factors in wheat and their leaf rust responsive expression profiling.

    PubMed

    Satapathy, Lopamudra; Singh, Dharmendra; Ranjan, Prashant; Kumar, Dhananjay; Kumar, Manish; Prabhu, Kumble Vinod; Mukhopadhyay, Kunal

    2014-12-01

    WRKY, a plant-specific transcription factor family, has important roles in pathogen defense, abiotic cues and phytohormone signaling, yet little is known about their roles and molecular mechanism of function in response to rust diseases in wheat. We identified 100 TaWRKY sequences using wheat Expressed Sequence Tag database of which 22 WRKY sequences were novel. Identified proteins were characterized based on their zinc finger motifs and phylogenetic analysis clustered them into six clades consisting of class IIc and class III WRKY proteins. Functional annotation revealed major functions in metabolic and cellular processes in control plants; whereas response to stimuli, signaling and defense in pathogen inoculated plants, their major molecular function being binding to DNA. Tag-based expression analysis of the identified genes revealed differential expression between mock and Puccinia triticina inoculated wheat near isogenic lines. Gene expression was also performed with six rust-related microarray experiments at Gene Expression Omnibus database. TaWRKY10, 15, 17 and 56 were common in both tag-based and microarray-based differential expression analysis and could be representing rust specific WRKY genes. The obtained results will bestow insight into the functional characterization of WRKY transcription factors responsive to leaf rust pathogenesis that can be used as candidate genes in molecular breeding programs to improve biotic stress tolerance in wheat.

  9. Insights into rubber biosynthesis from transcriptome analysis of Hevea brasiliensis latex.

    PubMed

    Chow, Keng-See; Wan, Kiew-Lian; Isa, Mohd Noor Mat; Bahari, Azlina; Tan, Siang-Hee; Harikrishna, K; Yeang, Hoong-Yeet

    2007-01-01

    Hevea brasiliensis is the most widely cultivated species for commercial production of natural rubber (cis-polyisoprene). In this study, 10,040 expressed sequence tags (ESTs) were generated from the latex of the rubber tree, which represents the cytoplasmic content of a single cell type, in order to analyse the latex transcription profile with emphasis on rubber biosynthesis-related genes. A total of 3,441 unique transcripts (UTs) were obtained after quality editing and assembly of EST sequences. Functional classification of UTs according to the Gene Ontology convention showed that 73.8% were related to genes of unknown function. Among highly expressed ESTs, a significant proportion encoded proteins related to rubber biosynthesis and stress or defence responses. Sequences encoding rubber particle membrane proteins (RPMPs) belonging to three protein families accounted for 12% of the ESTs. Characterization of these ESTs revealed nine RPMP variants (7.9-27 kDa) including the 14 kDa REF (rubber elongation factor) and 22 kDa SRPP (small rubber particle protein). The expression of multiple RPMP isoforms in latex was shown using antibodies against REF and SRPP. Both EST and quantitative reverse transcription-PCR (QRT-PCR) analyses demonstrated REF and SRPP to be the most abundant transcripts in latex. Besides rubber biosynthesis, comparative sequence analysis showed that the RPMPs are highly similar to sequences in the plant kingdom having stress-related functions. Implications of the RPMP function in cis-polyisoprene biosynthesis in the context of transcript abundance and differential gene expression are discussed.

  10. Expressed sequence tag analysis of adult human optic nerve for NEIBank: Identification of cell type and tissue markers

    PubMed Central

    Bernstein, Steven L; Guo, Yan; Peterson, Katherine; Wistow, Graeme

    2009-01-01

    Background The optic nerve is a pure white matter central nervous system (CNS) tract with an isolated blood supply, and is widely used in physiological studies of white matter response to various insults. We examined the gene expression profile of human optic nerve (ON) and, through the NEIBANK online resource, to provide a resource of sequenced verified cDNA clones. An un-normalized cDNA library was constructed from pooled human ON tissues and was used in expressed sequence tag (EST) analysis. Location of an abundant oligodendrocyte marker was examined by immunofluorescence. Quantitative real time polymerase chain reaction (qRT-PCR) and Western analysis were used to compare levels of expression for key calcium channel protein genes and protein product in primate and rodent ON. Results Our analyses revealed a profile similar in many respects to other white matter related tissues, but significantly different from previously available ON cDNA libraries. The previous libraries were found to include specific markers for other eye tissues, suggesting contamination. Immune/inflammatory markers were abundant in the new ON library. The oligodendrocyte marker QKI was abundant at the EST level. Immunofluorescence revealed that this protein is a useful oligodendrocyte cell-type marker in rodent and primate ONs. L-type calcium channel EST abundance was found to be particularly low. A qRT-PCR-based comparative mammalian species analysis reveals that L-type calcium channel expression levels are significantly lower in primate than in rodent ON, which may help account for the class-specific difference in responsiveness to calcium channel blocking agents. Several known eye disease genes are abundantly expressed in ON. Many genes associated with normal axonal function, mRNAs associated with axonal transport, inflammation and neuroprotection are observed. Conclusion We conclude that the new cDNA library is a faithful representation of human ON and EST data provide an initial overview of gene expression patterns in this tissue. The data provide clues for tissue-specific and species-specific properties of human ON that will help in design of therapeutic models. PMID:19778450

  11. An Expressed Sequence Tag (EST)-enriched genetic map of turbot (Scophthalmus maximus): a useful framework for comparative genomics across model and farmed teleosts

    PubMed Central

    2012-01-01

    Background The turbot (Scophthalmus maximus) is a relevant species in European aquaculture. The small turbot genome provides a source for genomics strategies to use in order to understand the genetic basis of productive traits, particularly those related to sex, growth and pathogen resistance. Genetic maps represent essential genomic screening tools allowing to localize quantitative trait loci (QTL) and to identify candidate genes through comparative mapping. This information is the backbone to develop marker-assisted selection (MAS) programs in aquaculture. Expressed sequenced tag (EST) resources have largely increased in turbot, thus supplying numerous type I markers suitable for extending the previous linkage map, which was mostly based on anonymous loci. The aim of this study was to construct a higher-resolution turbot genetic map using EST-linked markers, which will turn out to be useful for comparative mapping studies. Results A consensus gene-enriched genetic map of the turbot was constructed using 463 SNP and microsatellite markers in nine reference families. This map contains 438 markers, 180 EST-linked, clustered at 24 linkage groups. Linkage and comparative genomics evidences suggested additional linkage group fusions toward the consolidation of turbot map according to karyotype information. The linkage map showed a total length of 1402.7 cM with low average intermarker distance (3.7 cM; ~2 Mb). A global 1.6:1 female-to-male recombination frequency (RF) ratio was observed, although largely variable among linkage groups and chromosome regions. Comparative sequence analysis revealed large macrosyntenic patterns against model teleost genomes, significant hits decreasing from stickleback (54%) to zebrafish (20%). Comparative mapping supported particular chromosome rearrangements within Acanthopterygii and aided to assign unallocated markers to specific turbot linkage groups. Conclusions The new gene-enriched high-resolution turbot map represents a useful genomic tool for QTL identification, positional cloning strategies, and future genome assembling. This map showed large synteny conservation against model teleost genomes. Comparative genomics and data mining from landmarks will provide straightforward access to candidate genes, which will be the basis for genetic breeding programs and evolutionary studies in this species. PMID:22747677

  12. Expressed sequence tag analysis of adult human lens for the NEIBank Project: over 2000 non-redundant transcripts, novel genes and splice variants.

    PubMed

    Wistow, Graeme; Bernstein, Steven L; Wyatt, M Keith; Behal, Amita; Touchman, Jeffrey W; Bouffard, Gerald; Smith, Don; Peterson, Katherine

    2002-06-15

    To explore the expression profile of the human lens and to provide a resource for microarray studies, expressed sequence tag (EST) analysis has been performed on cDNA libraries from adult lenses. A cDNA library was constructed from two adult (40 year old) human lenses. Over two thousand clones were sequenced from the unamplified, un-normalized library. The library was then normalized and a further 2200 sequences were obtained. All the data were analyzed using GRIST (GRouping and Identification of Sequence Tags), a procedure for gene identification and clustering. The lens library (by) contains a low percentage of non-mRNA contaminants and a high fraction (over 75%) of apparently full length cDNA clones. Approximately 2000 reads from the unamplified library yields 810 clusters, potentially representing individual genes expressed in the lens. After normalization, the content of crystallins and other abundant cDNAs is markedly reduced and a similar number of reads from this library (fs) yields 1455 unique groups of which only two thirds correspond to named genes in GenBank. Among the most abundant cDNAs is one for a novel gene related to glutamine synthetase, which was designated "lengsin" (LGS). Analyses of ESTs also reveal examples of alternative transcripts, including a major alternative splice form for the lens specific membrane protein MP19. Variant forms for other transcripts, including those encoding the apoptosis inhibitor Livin and the armadillo repeat protein ARVCF, are also described. The lens cDNA libraries are a resource for gene discovery, full length cDNAs for functional studies and microarrays. The discovery of an abundant, novel transcript, lengsin, and a major novel splice form of MP19 reflect the utility of unamplified libraries constructed from dissected tissue. Many novel transcripts and splice forms are represented, some of which may be candidates for genetic diseases.

  13. Construction of an Ostrea edulis database from genomic and expressed sequence tags (ESTs) obtained from Bonamia ostreae infected haemocytes: Development of an immune-enriched oligo-microarray.

    PubMed

    Pardo, Belén G; Álvarez-Dios, José Antonio; Cao, Asunción; Ramilo, Andrea; Gómez-Tato, Antonio; Planas, Josep V; Villalba, Antonio; Martínez, Paulino

    2016-12-01

    The flat oyster, Ostrea edulis, is one of the main farmed oysters, not only in Europe but also in the United States and Canada. Bonamiosis due to the parasite Bonamia ostreae has been associated with high mortality episodes in this species. This parasite is an intracellular protozoan that infects haemocytes, the main cells involved in oyster defence. Due to the economical and ecological importance of flat oyster, genomic data are badly needed for genetic improvement of the species, but they are still very scarce. The objective of this study is to develop a sequence database, OedulisDB, with new genomic and transcriptomic resources, providing new data and convenient tools to improve our knowledge of the oyster's immune mechanisms. Transcriptomic and genomic sequences were obtained using 454 pyrosequencing and compiled into an O. edulis database, OedulisDB, consisting of two sets of 10,318 and 7159 unique sequences that represent the oyster's genome (WG) and de novo haemocyte transcriptome (HT), respectively. The flat oyster transcriptome was obtained from two strains (naïve and tolerant) challenged with B. ostreae, and from their corresponding non-challenged controls. Approximately 78.5% of 5619 HT unique sequences were successfully annotated by Blast search using public databases. A total of 984 sequences were identified as being related to immune response and several key immune genes were identified for the first time in flat oyster. Additionally, transcriptome information was used to design and validate the first oligo-microarray in flat oyster enriched with immune sequences from haemocytes. Our transcriptomic and genomic sequencing and subsequent annotation have largely increased the scarce resources available for this economically important species and have enabled us to develop an OedulisDB database and accompanying tools for gene expression analysis. This study represents the first attempt to characterize in depth the O. edulis haemocyte transcriptome in response to B. ostreae through massively sequencing and has aided to improve our knowledge of the immune mechanisms of flat oyster. The validated oligo-microarray and the establishment of a reference transcriptome will be useful for large-scale gene expression studies in this species. Copyright © 2016 Elsevier Ltd. All rights reserved.

  14. Analysis of expressed sequence tags (ESTs) from avocado seed (Persea americana var. drymifolia) reveals abundant expression of the gene encoding the antimicrobial peptide snakin.

    PubMed

    Guzmán-Rodríguez, Jaquelina J; Ibarra-Laclette, Enrique; Herrera-Estrella, Luis; Ochoa-Zarzosa, Alejandra; Suárez-Rodríguez, Luis María; Rodríguez-Zapata, Luis C; Salgado-Garciglia, Rafael; Jimenez-Moraila, Beatriz; López-Meza, Joel E; López-Gómez, Rodolfo

    2013-09-01

    Avocado is one of the most important fruits in the world. Avocado "native mexicano" (Persea americana var. drymifolia) seeds are widely used in the propagation of this plant and are the primary source of rootstocks globally for a variety of avocado cultivars, such as the Hass avocado. Here, we report the isolation of 5005 ESTs from the 5' ends of P. americana var. drymifolia seed cDNA clones representing 1584 possible unigenes. These avocado seed ESTs were compared with the avocado flower EST library, and we detected several genes that are expressed either in both tissues or only in the seed. The snakin gene, which encodes an element of the innate immune response in plants, was one of those most frequently found among the seed ESTs, and this suggests that it is abundantly expressed in the avocado seed. We expressed the snakin gene in a heterologous system, namely the bovine endothelial cell line BVE-E6E7. Conditioned media from transfected BVE-E6E7 cells showed antimicrobial activity against strains of Escherichia coli and Staphylococcus aureus. This is the first study of the function of the snakin gene in plant seed tissue, and our observations suggest that this gene might play a protective role in the avocado seed. Copyright © 2013 Elsevier Masson SAS. All rights reserved.

  15. Development of Genic and Genomic SSR Markers of Robusta Coffee (Coffea canephora Pierre Ex A. Froehner)

    PubMed Central

    Hendre, Prasad S.; Aggarwal, Ramesh K.

    2014-01-01

    Coffee breeding and improvement efforts can be greatly facilitated by availability of a large repository of simple sequence repeats (SSRs) based microsatellite markers, which provides efficiency and high-resolution in genetic analyses. This study was aimed to improve SSR availability in coffee by developing new genic−/genomic-SSR markers using in-silico bioinformatics and streptavidin-biotin based enrichment approach, respectively. The expressed sequence tag (EST) based genic microsatellite markers (EST-SSRs) were developed using the publicly available dataset of 13,175 unigene ESTs, which showed a distribution of 1 SSR/3.4 kb of coffee transcriptome. Genomic SSRs, on the other hand, were developed from an SSR-enriched small-insert partial genomic library of robusta coffee. In total, 69 new SSRs (44 EST-SSRs and 25 genomic SSRs) were developed and validated as suitable genetic markers. Diversity analysis of selected coffee genotypes revealed these to be highly informative in terms of allelic diversity and PIC values, and eighteen of these markers (∼27%) could be mapped on a robusta linkage map. Notably, the markers described here also revealed a very high cross-species transferability. In addition to the validated markers, we have also designed primer pairs for 270 putative EST-SSRs, which are expected to provide another ca. 200 useful genetic markers considering the high success rate (88%) of marker conversion of similar pairs tested/validated in this study. PMID:25461752

  16. Endo-beta-N-acetylglucosaminidase, an enzyme involved in processing of free oligosaccharides in the cytosol.

    PubMed

    Suzuki, Tadashi; Yano, Keiichi; Sugimoto, Seiji; Kitajima, Ken; Lennarz, William J; Inoue, Sadako; Inoue, Yasuo; Emori, Yasufumi

    2002-07-23

    Formation of oligosaccharides occurs both in the cytosol and in the lumen of the endoplasmic reticulum (ER). Luminal oligosaccharides are transported into the cytosol to ensure that they do not interfere with proper functioning of the glycan-dependent quality control machinery in the lumen of the ER for newly synthesized glycoproteins. Once in the cytosol, free oligosaccharides are catabolized, possibly to maximize the reutilization of the component sugars. An endo-beta-N-acetylglucosaminidase (ENGase) is a key enzyme involved in the processing of free oligosaccharides in the cytosol. This enzyme activity has been widely described in animal cells, but the gene encoding this enzyme activity has not been reported. Here, we report the identification of the gene encoding human cytosolic ENGase. After 11 steps, the enzyme was purified 150,000-fold to homogeneity from hen oviduct, and several internal amino acid sequences were analyzed. Based on the internal sequence and examination of expressed sequence tag (EST) databases, we identified the human orthologue of the purified protein. The human protein consists of 743 aa and has no apparent signal sequence, supporting the idea that this enzyme is localized in the cytosol. By expressing the cDNA of the putative human ENGase in COS-7 cells, the enzyme activity in the soluble fraction was enhanced 100-fold over the basal level, confirming that the human gene identified indeed encodes for ENGase. Careful gene database surveys revealed the occurrence of ENGase homologues in Drosophila melanogaster, Caenorhabditis elegans, and Arabidopsis thaliana, indicating the broad occurrence of ENGase in higher eukaryotes. This gene was expressed in a variety of human tissues, suggesting that this enzyme is involved in basic biological processes in eukaryotic cells.

  17. gPhoton: Time-tagged GALEX photon events analysis tools

    NASA Astrophysics Data System (ADS)

    Million, Chase C.; Fleming, S. W.; Shiao, B.; Loyd, P.; Seibert, M.; Smith, M.

    2016-03-01

    Written in Python, gPhoton calibrates and sky-projects the ~1.1 trillion ultraviolet photon events detected by the microchannel plates on the Galaxy Evolution Explorer Spacecraft (GALEX), archives these events in a publicly accessible database at the Mikulski Archive for Space Telescopes (MAST), and provides tools for working with the database to extract scientific results, particularly over short time domains. The software includes a re-implementation of core functionality of the GALEX mission calibration pipeline to produce photon list files from raw spacecraft data as well as a suite of command line tools to generate calibrated light curves, images, and movies from the MAST database.

  18. The first set of EST resource for gene discovery and marker development in pigeonpea (Cajanus cajan L.).

    PubMed

    Raju, Nikku L; Gnanesh, Belaghihalli N; Lekha, Pazhamala; Jayashree, Balaji; Pande, Suresh; Hiremath, Pavana J; Byregowda, Munishamappa; Singh, Nagendra K; Varshney, Rajeev K

    2010-03-11

    Pigeonpea (Cajanus cajan (L.) Millsp) is one of the major grain legume crops of the tropics and subtropics, but biotic stresses [Fusarium wilt (FW), sterility mosaic disease (SMD), etc.] are serious challenges for sustainable crop production. Modern genomic tools such as molecular markers and candidate genes associated with resistance to these stresses offer the possibility of facilitating pigeonpea breeding for improving biotic stress resistance. Availability of limited genomic resources, however, is a serious bottleneck to undertake molecular breeding in pigeonpea to develop superior genotypes with enhanced resistance to above mentioned biotic stresses. With an objective of enhancing genomic resources in pigeonpea, this study reports generation and analysis of comprehensive resource of FW- and SMD- responsive expressed sequence tags (ESTs). A total of 16 cDNA libraries were constructed from four pigeonpea genotypes that are resistant and susceptible to FW ('ICPL 20102' and 'ICP 2376') and SMD ('ICP 7035' and 'TTB 7') and a total of 9,888 (9,468 high quality) ESTs were generated and deposited in dbEST of GenBank under accession numbers GR463974 to GR473857 and GR958228 to GR958231. Clustering and assembly analyses of these ESTs resulted into 4,557 unique sequences (unigenes) including 697 contigs and 3,860 singletons. BLASTN analysis of 4,557 unigenes showed a significant identity with ESTs of different legumes (23.2-60.3%), rice (28.3%), Arabidopsis (33.7%) and poplar (35.4%). As expected, pigeonpea ESTs are more closely related to soybean (60.3%) and cowpea ESTs (43.6%) than other plant ESTs. Similarly, BLASTX similarity results showed that only 1,603 (35.1%) out of 4,557 total unigenes correspond to known proteins in the UniProt database (or= 5 sequences detected 102 single nucleotide polymorphisms (SNPs) in 37 contigs. As an example, a set of 10 contigs were used for confirming in silico predicted SNPs in a set of four genotypes using wet lab experiments. Occurrence of SNPs were confirmed for all the 6 contigs for which scorable and sequenceable amplicons were generated. PCR amplicons were not obtained in case of 4 contigs. Recognition sites for restriction enzymes were identified for 102 SNPs in 37 contigs that indicates possibility of assaying SNPs in 37 genes using cleaved amplified polymorphic sequences (CAPS) assay. The pigeonpea EST dataset generated here provides a transcriptomic resource for gene discovery and development of functional markers associated with biotic stress resistance. Sequence analyses of this dataset have showed conservation of a considerable number of pigeonpea transcripts across legume and model plant species analysed as well as some putative pigeonpea specific genes. Validation of identified biotic stress responsive genes should provide candidate genes for allele mining as well as candidate markers for molecular breeding.

  19. The first set of EST resource for gene discovery and marker development in pigeonpea (Cajanus cajan L.)

    PubMed Central

    2010-01-01

    Background Pigeonpea (Cajanus cajan (L.) Millsp) is one of the major grain legume crops of the tropics and subtropics, but biotic stresses [Fusarium wilt (FW), sterility mosaic disease (SMD), etc.] are serious challenges for sustainable crop production. Modern genomic tools such as molecular markers and candidate genes associated with resistance to these stresses offer the possibility of facilitating pigeonpea breeding for improving biotic stress resistance. Availability of limited genomic resources, however, is a serious bottleneck to undertake molecular breeding in pigeonpea to develop superior genotypes with enhanced resistance to above mentioned biotic stresses. With an objective of enhancing genomic resources in pigeonpea, this study reports generation and analysis of comprehensive resource of FW- and SMD- responsive expressed sequence tags (ESTs). Results A total of 16 cDNA libraries were constructed from four pigeonpea genotypes that are resistant and susceptible to FW ('ICPL 20102' and 'ICP 2376') and SMD ('ICP 7035' and 'TTB 7') and a total of 9,888 (9,468 high quality) ESTs were generated and deposited in dbEST of GenBank under accession numbers GR463974 to GR473857 and GR958228 to GR958231. Clustering and assembly analyses of these ESTs resulted into 4,557 unique sequences (unigenes) including 697 contigs and 3,860 singletons. BLASTN analysis of 4,557 unigenes showed a significant identity with ESTs of different legumes (23.2-60.3%), rice (28.3%), Arabidopsis (33.7%) and poplar (35.4%). As expected, pigeonpea ESTs are more closely related to soybean (60.3%) and cowpea ESTs (43.6%) than other plant ESTs. Similarly, BLASTX similarity results showed that only 1,603 (35.1%) out of 4,557 total unigenes correspond to known proteins in the UniProt database (≤ 1E-08). Functional categorization of the annotated unigenes sequences showed that 153 (3.3%) genes were assigned to cellular component category, 132 (2.8%) to biological process, and 132 (2.8%) in molecular function. Further, 19 genes were identified differentially expressed between FW- responsive genotypes and 20 between SMD- responsive genotypes. Generated ESTs were compiled together with 908 ESTs available in public domain, at the time of analysis, and a set of 5,085 unigenes were defined that were used for identification of molecular markers in pigeonpea. For instance, 3,583 simple sequence repeat (SSR) motifs were identified in 1,365 unigenes and 383 primer pairs were designed. Assessment of a set of 84 primer pairs on 40 elite pigeonpea lines showed polymorphism with 15 (28.8%) markers with an average of four alleles per marker and an average polymorphic information content (PIC) value of 0.40. Similarly, in silico mining of 133 contigs with ≥ 5 sequences detected 102 single nucleotide polymorphisms (SNPs) in 37 contigs. As an example, a set of 10 contigs were used for confirming in silico predicted SNPs in a set of four genotypes using wet lab experiments. Occurrence of SNPs were confirmed for all the 6 contigs for which scorable and sequenceable amplicons were generated. PCR amplicons were not obtained in case of 4 contigs. Recognition sites for restriction enzymes were identified for 102 SNPs in 37 contigs that indicates possibility of assaying SNPs in 37 genes using cleaved amplified polymorphic sequences (CAPS) assay. Conclusion The pigeonpea EST dataset generated here provides a transcriptomic resource for gene discovery and development of functional markers associated with biotic stress resistance. Sequence analyses of this dataset have showed conservation of a considerable number of pigeonpea transcripts across legume and model plant species analysed as well as some putative pigeonpea specific genes. Validation of identified biotic stress responsive genes should provide candidate genes for allele mining as well as candidate markers for molecular breeding. PMID:20222972

  20. UGTA Photograph Database

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    NSTec Environmental Restoration

    One of the advantages of the Nevada Test Site (NTS) is that most of the geologic and hydrologic features such as hydrogeologic units (HGUs), hydrostratigraphic units (HSUs), and faults, which are important aspects of flow and transport modeling, are exposed at the surface somewhere in the vicinity of the NTS and thus are available for direct observation. However, due to access restrictions and the remote locations of many of the features, most Underground Test Area (UGTA) participants cannot observe these features directly in the field. Fortunately, National Security Technologies, LLC, geologists and their predecessors have photographed many of these featuresmore » through the years. During fiscal year 2009, work was done to develop an online photograph database for use by the UGTA community. Photographs were organized, compiled, and imported into Adobe® Photoshop® Elements 7. The photographs were then assigned keyword tags such as alteration type, HGU, HSU, location, rock feature, rock type, and stratigraphic unit. Some fully tagged photographs were then selected and uploaded to the UGTA website. This online photograph database provides easy access for all UGTA participants and can help “ground truth” their analytical and modeling tasks. It also provides new participants a resource to more quickly learn the geology and hydrogeology of the NTS.« less

  1. NCBI Bookshelf: books and documents in life sciences and health care

    PubMed Central

    Hoeppner, Marilu A.

    2013-01-01

    Bookshelf (http://www.ncbi.nlm.nih.gov/books/) is a full-text electronic literature resource of books and documents in life sciences and health care at the National Center for Biotechnology Information (NCBI). Created in 1999 with a single book as an encyclopedic reference for resources such as PubMed and GenBank, it has grown to its current size of >1300 titles. Unlike other NCBI databases, such as GenBank and Gene, which have a strict data structure, books come in all forms; they are diverse in publication types, formats, sizes and authoring models. The Bookshelf data format is XML tagged in the NCBI Book DTD (Document Type Definition), modeled after the National Library of Medicine journal article DTDs. The book DTD has been used for systematically tagging the diverse data formats of books, a move that has set the foundation for the growth of this resource. Books at NCBI followed the route of journal articles in the PubMed Central project, using the PubMed Central architectural framework, workflows and processes. Through integration with other NCBI molecular databases, books at NCBI can be used to provide reference information for biological data and facilitate its discovery. This article describes Bookshelf at NCBI: its growth, data handling and retrieval and integration with molecular databases. PMID:23203889

  2. NCBI Bookshelf: books and documents in life sciences and health care.

    PubMed

    Hoeppner, Marilu A

    2013-01-01

    Bookshelf (http://www.ncbi.nlm.nih.gov/books/) is a full-text electronic literature resource of books and documents in life sciences and health care at the National Center for Biotechnology Information (NCBI). Created in 1999 with a single book as an encyclopedic reference for resources such as PubMed and GenBank, it has grown to its current size of >1300 titles. Unlike other NCBI databases, such as GenBank and Gene, which have a strict data structure, books come in all forms; they are diverse in publication types, formats, sizes and authoring models. The Bookshelf data format is XML tagged in the NCBI Book DTD (Document Type Definition), modeled after the National Library of Medicine journal article DTDs. The book DTD has been used for systematically tagging the diverse data formats of books, a move that has set the foundation for the growth of this resource. Books at NCBI followed the route of journal articles in the PubMed Central project, using the PubMed Central architectural framework, workflows and processes. Through integration with other NCBI molecular databases, books at NCBI can be used to provide reference information for biological data and facilitate its discovery. This article describes Bookshelf at NCBI: its growth, data handling and retrieval and integration with molecular databases.

  3. iTAG: Integrating a Cloud Based, Collaborative Animal Tracking Network into the GCOOS data portal in the Gulf of Mexico

    NASA Astrophysics Data System (ADS)

    Kirkpatrick, B. A.; Currier, R. D.; Simoniello, C.

    2016-02-01

    The tagging and tracking of aquatic animals using acoustic telemetry hardware has traditionally been the purview of individual researchers that specialize in single species. Concerns over data privacy and unauthorized use of receiver arrays have prevented the construction of large-scale, multi-species, multi-institution, multi-researcher collaborative acoustic arrays. We have developed a toolset to build the new portal using the Flask microframework, Python language, and Twitter bootstrap. Initial feedback has been overwhelmingly positive. The privacy policy has been praised for its granularity: principal investigators can choose between three levels of privacy for all data and hardware: Completely private - viewable only by the PI Visible to iTAG members Visible to the general public At the time of this writing iTAG is still in the beta stage, but the feedback received to date indicates that with the proper design and security features, and an iterative cycle of feedback from potential members, constructing a collaborative acoustic tracking network system is possible. Initial usage will be limited to the entry and searching for `orphan/mystery' tags, with the integration of historical array deployments and data following shortly thereafter. We have also been working with staff from the Ocean Tracking Network to allow for integration of the two systems. The database schema of iTAG is based on the marine metadata convention for acoustic telemetry. This should permit machine-to-machine data exchange between iTAG and OTN. The integration of animal telemetry data into the GCOOS portal will allow researchers to easily access the physiochemical oceanography data, thus allowing for a more in depth understanding of animal response and usage patterns.

  4. Metatranscriptomics of the marine sponge Geodia barretti: tackling phylogeny and function of its microbial community.

    PubMed

    Radax, Regina; Rattei, Thomas; Lanzen, Anders; Bayer, Christoph; Rapp, Hans Tore; Urich, Tim; Schleper, Christa

    2012-05-01

    Geodia barretti is a marine cold-water sponge harbouring high numbers of microorganisms. Significant rates of nitrification have been observed in this sponge, indicating a substantial contribution to nitrogen turnover in marine environments with high sponge cover. In order to get closer insights into the phylogeny and function of the active microbial community and the interaction with its host G. barretti, a metatranscriptomic approach was employed, using the simultaneous analysis of rRNA and mRNA. Of the 262 298 RNA-tags obtained by pyrosequencing, 92% were assigned to ribosomal RNA (ribo-tags). A total of 109 325 SSU rRNA ribo-tags revealed a detailed picture of the community, dominated by group SAR202 of Chloroflexi, candidate phylum Poribacteria and Acidobacteria, which was different in its composition from that obtained in clone libraries prepared form the same samples. Optimized assembly strategies allowed the reconstruction of full-length rRNA sequences from the short ribo-tags for more detailed phylogenetic studies of the dominant taxa. Cells of several phyla were visualized by FISH analyses for confirmation. Of the remaining 21 325 RNA-tags, 10 023 were assigned to mRNA-tags, based on similarities to genes in the databases. A wide range of putative functional gene transcripts from over 10 different phyla were identified among the bacterial mRNA-tags. The most abundant mRNAs were those encoding key metabolic enzymes of nitrification from ammonia-oxidizing archaea as well as candidate genes involved in related processes. Our analysis demonstrates the potential and limits of using a combined rRNA and mRNA approach to explore the microbial community profile, phylogenetic assignments and metabolic activities of a complex, but little explored microbial community. © 2012 Society for Applied Microbiology and Blackwell Publishing Ltd.

  5. Get It Together: Integrating Data with XML.

    ERIC Educational Resources Information Center

    Miller, Ron

    2003-01-01

    Discusses the use of XML for data integration to move data across different platforms, including across the Internet, from a variety of sources. Topics include flexibility; standards; organizing databases; unstructured data and the use of meta tags to encode it with XML information; cost effectiveness; and eliminating client software licenses.…

  6. Transcriptomic analysis of grain amaranth (Amaranthus hypochondriacus) using 454 pyrosequencing: comparison with A. tuberculatus, expression profiling in stems and in response to biotic and abiotic stress

    PubMed Central

    2011-01-01

    Background Amaranthus hypochondriacus, a grain amaranth, is a C4 plant noted by its ability to tolerate stressful conditions and produce highly nutritious seeds. These possess an optimal amino acid balance and constitute a rich source of health-promoting peptides. Although several recent studies, mostly involving subtractive hybridization strategies, have contributed to increase the relatively low number of grain amaranth expressed sequence tags (ESTs), transcriptomic information of this species remains limited, particularly regarding tissue-specific and biotic stress-related genes. Thus, a large scale transcriptome analysis was performed to generate stem- and (a)biotic stress-responsive gene expression profiles in grain amaranth. Results A total of 2,700,168 raw reads were obtained from six 454 pyrosequencing runs, which were assembled into 21,207 high quality sequences (20,408 isotigs + 799 contigs). The average sequence length was 1,064 bp and 930 bp for isotigs and contigs, respectively. Only 5,113 singletons were recovered after quality control. Contigs/isotigs were further incorporated into 15,667 isogroups. All unique sequences were queried against the nr, TAIR, UniRef100, UniRef50 and Amaranthaceae EST databases for annotation. Functional GO annotation was performed with all contigs/isotigs that produced significant hits with the TAIR database. Only 8,260 sequences were found to be homologous when the transcriptomes of A. tuberculatus and A. hypochondriacus were compared, most of which were associated with basic house-keeping processes. Digital expression analysis identified 1,971 differentially expressed genes in response to at least one of four stress treatments tested. These included several multiple-stress-inducible genes that could represent potential candidates for use in the engineering of stress-resistant plants. The transcriptomic data generated from pigmented stems shared similarity with findings reported in developing stems of Arabidopsis and black cottonwood (Populus trichocarpa). Conclusions This study represents the first large-scale transcriptomic analysis of A. hypochondriacus, considered to be a highly nutritious and stress-tolerant crop. Numerous genes were found to be induced in response to (a)biotic stress, many of which could further the understanding of the mechanisms that contribute to multiple stress-resistance in plants, a trait that has potential biotechnological applications in agriculture. PMID:21752295

  7. DNA sequence chromatogram browsing using JAVA and CORBA.

    PubMed

    Parsons, J D; Buehler, E; Hillier, L

    1999-03-01

    DNA sequence chromatograms (traces) are the primary data source for all large-scale genomic and expressed sequence tags (ESTs) sequencing projects. Access to the sequencing trace assists many later analyses, for example contig assembly and polymorphism detection, but obtaining and using traces is problematic. Traces are not collected and published centrally, they are much larger than the base calls derived from them, and viewing them requires the interactivity of a local graphical client with local data. To provide efficient global access to DNA traces, we developed a client/server system based on flexible Java components integrated into other applications including an applet for use in a WWW browser and a stand-alone trace viewer. Client/server interaction is facilitated by CORBA middleware which provides a well-defined interface, a naming service, and location independence. [The software is packaged as a Jar file available from the following URL: http://www.ebi.ac.uk/jparsons. Links to working examples of the trace viewers can be found at http://corba.ebi.ac.uk/EST. All the Washington University mouse EST traces are available for browsing at the same URL.

  8. Insights into the Melipona scutellaris (Hymenoptera, Apidae, Meliponini) fat body transcriptome.

    PubMed

    de Sousa, Cristina Soares; Serrão, José Eduardo; Bonetti, Ana Maria; Amaral, Isabel Marques Rodrigues; Kerr, Warwick Estevam; Maranhão, Andréa Queiroz; Ueira-Vieira, Carlos

    2013-07-01

    The insect fat body is a multifunctional organ analogous to the vertebrate liver. The fat body is involved in the metabolism of juvenile hormone, regulation of environmental stress, production of immunity regulator-like proteins in cells and protein storage. However, very little is known about the molecular mechanisms involved in fat body physiology in stingless bees. In this study, we analyzed the transcriptome of the fat body from the stingless bee Melipona scutellaris. In silico analysis of a set of cDNA library sequences yielded 1728 expressed sequence tags (ESTs) and 997 high-quality sequences that were assembled into 29 contigs and 117 singlets. The BLAST X tool showed that 86% of the ESTs shared similarity with Apis mellifera (honeybee) genes. The M. scutellaris fat body ESTs encoded proteins with roles in numerous physiological processes, including anti-oxidation, phosphorylation, metabolism, detoxification, transmembrane transport, intracellular transport, cell proliferation, protein hydrolysis and protein synthesis. This is the first report to describe a transcriptomic analysis of specific organs of M. scutellaris. Our findings provide new insights into the physiological role of the fat body in stingless bees.

  9. Insights into the Melipona scutellaris (Hymenoptera, Apidae, Meliponini) fat body transcriptome

    PubMed Central

    de Sousa, Cristina Soares; Serrão, José Eduardo; Bonetti, Ana Maria; Amaral, Isabel Marques Rodrigues; Kerr, Warwick Estevam; Maranhão, Andréa Queiroz; Ueira-Vieira, Carlos

    2013-01-01

    The insect fat body is a multifunctional organ analogous to the vertebrate liver. The fat body is involved in the metabolism of juvenile hormone, regulation of environmental stress, production of immunity regulator-like proteins in cells and protein storage. However, very little is known about the molecular mechanisms involved in fat body physiology in stingless bees. In this study, we analyzed the transcriptome of the fat body from the stingless bee Melipona scutellaris. In silico analysis of a set of cDNA library sequences yielded 1728 expressed sequence tags (ESTs) and 997 high-quality sequences that were assembled into 29 contigs and 117 singlets. The BLAST X tool showed that 86% of the ESTs shared similarity with Apis mellifera (honeybee) genes. The M. scutellaris fat body ESTs encoded proteins with roles in numerous physiological processes, including anti-oxidation, phosphorylation, metabolism, detoxification, transmembrane transport, intracellular transport, cell proliferation, protein hydrolysis and protein synthesis. This is the first report to describe a transcriptomic analysis of specific organs of M. scutellaris. Our findings provide new insights into the physiological role of the fat body in stingless bees. PMID:23885214

  10. Identification and Validation of Expressed Sequence Tags from Pigeonpea (Cajanus cajan L.) Root

    PubMed Central

    Kumar, Ravi Ranjan; Yadav, Shailesh; Joshi, Shourabh; Bhandare, Prithviraj P.; Patil, Vinod Kumar; Kulkarni, Pramod B.; Sonkawade, Swati; Naik, G. R.

    2014-01-01

    Pigeonpea (Cajanus cajan (L) Millsp.) is an important food legume crop of rain fed agriculture in the arid and semiarid tropics of the world. It has deep and extensive root system which serves a number of important physiological and metabolic functions in plant development and growth. In order to identify genes associated with pigeonpea root, ESTs were generated from the root tissues of pigeonpea (GRG-295 genotype) by normalized cDNA library. A total of 105 high quality ESTs were generated by sequencing of 250 random clones which resulted in 72 unigenes comprising 25 contigs and 47 singlets. The ESTs were assigned to 9 functional categories on the basis of their putative function. In order to validate the possible expression of transcripts, four genes, namely, S-adenosylmethionine synthetase, phosphoglycerate kinase, serine carboxypeptidase, and methionine aminopeptidase, were further analyzed by reverse transcriptase PCR. The possible role of the identified transcripts and their functions associated with root will also be a valuable resource for the functional genomics study in legume crop. PMID:24895494

  11. Sequence and RT-PCR expression analysis of two peroxidases from Arabidopsis thaliana belonging to a novel evolutionary branch of plant peroxidases.

    PubMed

    Kjaersgård, I V; Jespersen, H M; Rasmussen, S K; Welinder, K G

    1997-03-01

    cDNA clones encoding two new Arabidopsis thaliana peroxidases, ATP 1a and ATP 2a, have been identified by searching the Arabidopsis database of expressed sequence tags (dbEST). They represent a novel branch of hitherto uncharacterized plant peroxidases which is only 35% identical in amino acid sequence to the well characterized group of basic plant peroxidases represented by the horseradish (Armoracia rusticana) isoperoxidases HRP C, HRP E5 and the similar Arabidopsis isoperoxidases ATP Ca, ATP Cb, and ATP Ea. However ATP 1a is 87% identical in amino acid sequence to a peroxidase encoded by an mRNA isolated from cotton (Gossypium hirsutum). As cotton and Arabidopsis belong to rather diverse families (Malvaceae and Crucifereae, respectively), in contrast with Arabidopsis and horseradish (both Crucifereae), the high degree of sequence identity indicates that this novel type of peroxidase, albeit of unknown function, is likely to be widespread in plant species. The atp 1 and atp 2 types of cDNA sequences were the most redundant among the 28 different isoperoxidases identified among about 200 peroxidase encoding ESTs. Interestingly, 8 out of totally 38 EST sequences coding for ATP 1 showed three identical nucleotide substitutions. This variant form is designated ATP 1b. Similarly, six out of totally 16 EST sequences coding for ATP 2 showed a number of deletions and nucleotide changes. This variant form is designated ATP 2b. The selected EST clones are full-length and contain coding regions of 993 nucleotides for atp 1a, and 984 nucleotides for atp 2a. These regions show 61% DNA sequence identity. The predicted mature proteins ATP 1a, and ATP 2a are 57% identical in sequence and contain the structurally and functionally important residues, characteristic of the plant peroxidase superfamily. However, they do show two differences of importance to peroxidase catalysis: (1) the asparagine residue linked with the active site distal histidine via hydrogen bonding is absent; (2) an N-glycosylation site is located right at the entrance to the heme channel. The reverse transcriptase polymerase chain reaction (RT-PCR) was used to identify mRNAs coding for ATP 1a/b and ATP 2a/b in germinating seeds, seedlings, roots, leaves, stems, flowers and cell suspension culture using elongation factor 1alpha (EF-1alpha) for the first time as a positive control. Both mRNAs were transcribed at levels comparable to EF-1alpha in all plant tissues investigated which were more than two days old, and in cell suspension culture. In addition, the mRNA coding for ATP 1a/b was found in two day old germinating seeds. The abundant transcription of ATP 1a/b and ATP 2a/b is in line with their many entries in dbEST, and indicates essential roles for these novel peroxidases.

  12. Gene-based SSR markers for common bean (Phaseolus vulgaris L.) derived from root and leaf tissue ESTs: an integration of the BMc series.

    PubMed

    Blair, Matthew W; Hurtado, Natalia; Chavarro, Carolina M; Muñoz-Torres, Monica C; Giraldo, Martha C; Pedraza, Fabio; Tomkins, Jeff; Wing, Rod

    2011-03-22

    Sequencing of cDNA libraries for the development of expressed sequence tags (ESTs) as well as for the discovery of simple sequence repeats (SSRs) has been a common method of developing microsatellites or SSR-based markers. In this research, our objective was to further sequence and develop common bean microsatellites from leaf and root cDNA libraries derived from the Andean gene pool accession G19833 and the Mesoamerican gene pool accession DOR364, mapping parents of a commonly used reference map. The root libraries were made from high and low phosphorus treated plants. A total of 3,123 EST sequences from leaf and root cDNA libraries were screened and used for direct simple sequence repeat discovery. From these EST sequences we found 184 microsatellites; the majority containing tri-nucleotide motifs, many of which were GC rich (ACC, AGC and AGG in particular). Di-nucleotide motif microsatellites were about half as common as the tri-nucleotide motif microsatellites but most of these were AGn microsatellites with a moderate number of ATn microsatellites in root ESTs followed by few ACn and no GCn microsatellites. Out of the 184 new SSR loci, 120 new microsatellite markers were developed in the BMc (Bean Microsatellites from cDNAs) series and these were evaluated for their capacity to distinguish bean diversity in a germplasm panel of 18 genotypes. We developed a database with images of the microsatellites and their polymorphism information content (PIC), which averaged 0.310 for polymorphic markers. The present study produced information about microsatellite frequency in root and leaf tissues of two important genotypes for common bean genomics: namely G19833, the Andean genotype selected for whole genome shotgun sequencing from race Peru, and DOR364 a race Mesoamerica subgroup 2 genotype that is a small-red seeded, released variety in Central America. Both race Peru and Mesoamerica subgroup 2 (small red beans) have been understudied in comparison to race Nueva Granada and Mesoamerica subgroup 1 (black beans) both with regards to gene expression and as sources of markers. However, we found few differences between SSR type and frequency between the G19833 leaf and DOR364 root tissue-derived ESTs. Overall, our work adds to the analysis of microsatellite frequency evaluation for common bean and provides a new set of 120 BMc markers which combined with the 248 previously developed BMc markers brings the total in this series to 368 markers. Once we include BMd markers, which are derived from GenBank sequences, the current total of gene-based markers from our laboratory surpasses 500 markers. These markers are basic for studies of the transcriptome of common bean and can form anchor points for genetic mapping studies in the future.

  13. A Database for Compounding Stress Intensity Factors.

    DTIC Science & Technology

    1985-04-01

    8217"’.’’" . ." ’" " " " "".. .... .... .... . .. .... . : ,.’. .. MICROCOP RE O UT O ," EST CHART...8217 - .. . , ., . . ,. . .,\\ .- . , , . -, . .- . . . , * TR 85046 o UNLIMITED __ ROYAL AIRCRAFT ESTABLISHMENT O Technical Report 85046 April 1985 A DATABASE FOR COMPOUNDING STRESS INTENSITY FACTORS L W by...DATABASE FOR COMPOUNDING STRESS INTENSITY FACTORS by A. M. Prior . ..- - . .. _ D. P. Rooke D. J. Cartwright* SUMMARY . The compounding method enables

  14. Tagging of Test Tubes with Electronic p-Chips for Use in Biorepositories.

    PubMed

    Mandecki, Wlodek; Kopacka, Wesley M; Qian, Ziye; Ertwine, Von; Gedzberg, Katie; Gruda, Maryann; Reinhardt, David; Rodriguez, Efrain

    2017-08-01

    A system has been developed to electronically tag and track test tubes used in biorepositories. The system is based on a light-activated microtransponder, also known as a "p-Chip." One of the pressing problems with storing and retrieving biological samples at low temperatures is the difficulty of reliably reading the identification (ID) number that links each storage tube with the database containing sample details. Commonly used barcodes are not always reliable at low temperatures because of poor adhesion of the label to the test tube and problems with reading under conditions of frost and ice accumulation. Traditional radio frequency identification (RFID) tags are not cost effective and are too large for this application. The system described herein consists of the p-Chip, p-Chip-tagged test tubes, two ID readers (for single tubes or for racks of tubes), and software. We also describe a robot that is configured for retrofitting legacy test tubes in biorepositories with p-Chips while maintaining the temperature of the sample below -50°C at all times. The main benefits of the p-Chip over other RFID devices are its small size (600 × 600 × 100 μm) that allows even very small tubes or vials to be tagged, low cost due to the chip's unitary construction, durability, and the ability to read the ID through frost and ice.

  15. Construction of new EST-SSRs for Fusarium resistant wheat breeding.

    PubMed

    Yumurtaci, Aysen; Sipahi, Hulya; Al-Abdallat, Ayed; Jighly, Abdulqader; Baum, Michael

    2017-06-01

    Surveying Fusarium resistance in wheat with easy applicable molecular markers such as simple sequence repeats (SSRs) is a prerequest for molecular breeding. Expressed sequence tags (ESTs) are one of the main sources for development of new SSR candidates. Therefore, 18.292 publicly available wheat ESTs were mined and genotyping of newly developed 55 EST-SSR derived primer pairs produced clear fragments in ten wheat cultivars carrying different levels of Fusarium resistance. Among the proved markers, 23 polymorphic EST-SSRs were obtained and related alleles were mostly found on B and D genome. Based on the fragment profiling and similarity analysis, a 327bp amplicon, which was a product of contig 1207 (chromosome 5BL), was detected only in Fusarium head blight (FHB) resistant cultivars (CM82036 and Sumai) and the amino acid sequences showed a similarity to pathogen related proteins. Another FHB resistance related EST-SSR, Contig 556 (chromosome 1BL) produced a 151bp fragment in Sumai and was associated to wax2-like protein. A polymorphic 204bp fragment, derived from Contig 578 (chromosome 1DL), was generated from root rot (FRR) resistant cultivars (2-49; Altay2000 and Sunco). A total of 98 alleles were displayed with an average of 1.8 alleles per locus and the polymorphic information content (PIC) ranged from 0.11 to 0.78. Dendrogram tree with two main and five sub-groups were displayed the highest genetic relationship between FRR resistant cultivars (2-49 and Altay2000), FRR sensitive cultivars (Seri82 and Scout66) and FHB resistant cultivars (CM82036 and Sumai). Thus, exploitation of these candidate EST-SSRs may help to genotype other wheat sources for Fusarium resistance. Copyright © 2017 Elsevier Ltd. All rights reserved.

  16. The gene space in wheat: the complete γ-gliadin gene family from the wheat cultivar Chinese Spring.

    PubMed

    Anderson, Olin D; Huo, Naxin; Gu, Yong Q

    2013-06-01

    The complete set of unique γ-gliadin genes is described for the wheat cultivar Chinese Spring using a combination of expressed sequence tag (EST) and Roche 454 DNA sequences. Assemblies of Chinese Spring ESTs yielded 11 different γ-gliadin gene sequences. Two of the sequences encode identical polypeptides and are assumed to be the result of a recent gene duplication. One gene has a 3' coding mutation that changes the reading frame in the final eight codons. A second assembly of Chinese Spring γ-gliadin sequences was generated using Roche 454 total genomic DNA sequences. The 454 assembly confirmed the same 11 active genes as the EST assembly plus two pseudogenes not represented by ESTs. These 13 γ-gliadin sequences represent the complete unique set of γ-gliadin genes for cv Chinese Spring, although not ruled out are additional genes that are exact duplications of these 13 genes. A comparison with the ESTs of two other hexaploid cultivars (Butte 86 and Recital) finds that the most active genes are present in all three cultivars, with exceptions likely due to too few ESTs for detection in Butte 86 and Recital. A comparison of the numbers of ESTs per gene indicates differential levels of expression within the γ-gliadin gene family. Genome assignments were made for 6 of the 13 Chinese Spring γ-gliadin genes, i.e., one assignment from a match to two γ-gliadin genes found within a tetraploid wheat A genome BAC and four genes that match four distinct γ-gliadin sequences assembled from Roche 454 sequences from Aegilops tauschii, the hexaploid wheat D-genome ancestor.

  17. Development and Characterization of 1,906 EST-SSR Markers from Unigenes in Jute (Corchorus spp.)

    PubMed Central

    Zhang, Liwu; Li, Yanru; Tao, Aifen; Fang, Pingping; Qi, Jianmin

    2015-01-01

    Jute, comprising white and dark jute, is the second important natural fiber crop after cotton worldwide. However, the lack of expressed sequence tag-derived simple sequence repeat (EST-SSR) markers has resulted in a large gap in the improvement of jute. Previously, de novo 48,914 unigenes from white jute were assembled. In this study, 1,906 EST-SSRs were identified from these assembled uingenes. Among these markers, di-, tri- and tetra-nucleotide repeat types were the abundant types (12.0%, 56.9% and 21.6% respectively). The AG-rich or GA-rich nucleotide repeats were the predominant. Subsequently, a sample of 116 SSRs, located in genes encoding transcription factors and cellulose synthases, were selected to survey polymorphisms among12 diverse jute accessions. Of these, 83.6% successfully amplified at least one fragment and detected polymorphism among the 12diverse genotypes, indicating that the newly developed SSRs are of good quality. Furthermore, the genetic similarity coefficients of all the 12 accessions were evaluated using 97 polymorphic SSRs. The cluster analysis divided the jute accessions into two main groups with genetic similarity coefficient of 0.61. These EST-SSR markers not only enrich molecular markers of jute genome, but also facilitate genetic and genomic researches in jute. PMID:26512891

  18. Genome-Wide Analysis of Differentially Expressed Genes Relevant to Rhizome Formation in Lotus Root (Nelumbo nucifera Gaertn)

    PubMed Central

    Yin, Jingjing; Li, Liangjun; Chen, Xuehao

    2013-01-01

    Lotus root is a popular wetland vegetable which produces edible rhizome. At the molecular level, the regulation of rhizome formation is very complex, which has not been sufficiently addressed in research. In this study, to identify differentially expressed genes (DEGs) in lotus root, four libraries (L1 library: stolon stage, L2 library: initial swelling stage, L3 library: middle swelling stage, L4: later swelling stage) were constructed from the rhizome development stages. High-throughput tag-sequencing technique was used which is based on Solexa Genome Analyzer Platform. Approximately 5.0 million tags were sequenced, and 4542104, 4474755, 4777919, and 4750348 clean tags including 151282, 137476, 215872, and 166005 distinct tags were obtained after removal of low quality tags from each library respectively. More than 43% distinct tags were unambiguous tags mapping to the reference genes, and 40% were unambiguous tag-mapped genes. From L1, L2, L3, and L4, total 20471, 18785, 23448, and 21778 genes were annotated, after mapping their functions in existing databases. Profiling of gene expression in L1/L2, L2/L3, and L3/L4 libraries were different among most of the selected 20 DEGs. Most of the DEGs in L1/L2 libraries were relevant to fiber development and stress response, while in L2/L3 and L3/L4 libraries, major of the DEGs were involved in metabolism of energy and storage. All up-regulated transcriptional factors in four libraries and 14 important rhizome formation-related genes in four libraries were also identified. In addition, the expression of 9 genes from identified DEGs was performed by qRT-PCR method. In a summary, this study provides a comprehensive understanding of gene expression during the rhizome formation in lotus root. PMID:23840598

  19. Expressed sequence tags (ESTs) and phylogenetic analysis of floral genes from a paleoherb species, Asarum caudigerum.

    PubMed

    Zhao, Yinhe; Wang, Guoying; Zhang, Jinpeng; Yang, Junbo; Peng, Shang; Gao, Lianming; Li, Chengyun; Hu, Jinyong; Li, Dezhu; Gao, Lizhi

    2006-07-01

    Asarum caudigerum (Aristolochiaceae) is an important species of paleoherb in relation to understanding the origin and evolution of angiosperm flowers, due to its basal position in the angiosperms. The aim of this study was to isolate floral-related genes from A. caudigerum, and to infer evolutionary relationships among florally expression-related genes, to further illustrate the origin and diversification of flowers in angiosperms. A subtracted floral cDNA library was constructed from floral buds using suppression subtractive hybridization (SSH). The cDNA of floral buds and leaves at the seedling stage were used as a tester and a driver, respectively. To further identify the function of putative MADS-box transcription factors, phylogenetic trees were reconstructed in order to infer evolutionary relationships within the MADS-box gene family. In the forward-subtracted floral cDNA library, 1920 clones were randomly sequenced, from which 567 unique expressed sequence tags (ESTs) were obtained. Among them, 127 genes failed to show significant similarity to any published sequences in GenBank and thus are putatively novel genes. Phylogenetic analysis indicated that a total of 29 MADS-box transcription factors were members of the APETALA3(AP3) subfamily, while nine others were putative MADS-box transcription factors that formed a cluster with MADS-box genes isolated from Amborella, the basal-most angiosperm, and those from the gymnosperms. This suggests that the origin of A. caudigerum is intermediate between the angiosperms and gymnosperms.

  20. Est10: A Novel Alkaline Esterase Isolated from Bovine Rumen Belonging to the New Family XV of Lipolytic Enzymes

    PubMed Central

    Rodríguez, María Cecilia; Loaces, Inés; Amarelle, Vanesa; Senatore, Daniella; Iriarte, Andrés; Fabiano, Elena; Noya, Francisco

    2015-01-01

    A metagenomic fosmid library from bovine rumen was used to identify clones with lipolytic activity. One positive clone was isolated. The gene responsible for the observed phenotype was identified by in vitro transposon mutagenesis and sequencing and was named est10. The 367 amino acids sequence harbors a signal peptide, the conserved secondary structure arrangement of alpha/beta hydrolases, and a GHSQG pentapeptide which is characteristic of esterases and lipases. Homology based 3D-modelling confirmed the conserved spatial orientation of the serine in a nucleophilic elbow. By sequence comparison, Est10 is related to hydrolases that are grouped into the non-specific Pfam family DUF3089 and to other characterized esterases that were recently classified into the new family XV of lipolytic enzymes. Est10 was heterologously expressed in Escherichia coli as a His-tagged fusion protein, purified and biochemically characterized. Est10 showed maximum activity towards C4 aliphatic chains and undetectable activity towards C10 and longer chains which prompted its classification as an esterase. However, it was able to efficiently catalyze the hydrolysis of aryl esters such as methyl phenylacetate and phenyl acetate. The optimum pH of this enzyme is 9.0, which is uncommon for esterases, and it exhibits an optimal temperature at 40°C. The activity of Est10 was inhibited by metal ions, detergents, chelating agents and additives. We have characterized an alkaline esterase produced by a still unidentified bacterium belonging to a recently proposed new family of esterases. PMID:25973851

  1. Construction of a full-length cDNA Library from Chinese oak silkworm pupa and identification of a KK-42-binding protein gene in relation to pupa-diapause termination.

    PubMed

    Li, Yu-Ping; Xia, Run-Xi; Wang, Huan; Li, Xi-Sheng; Liu, Yan-Qun; Wei, Zhao-Jun; Lu, Cheng; Xiang, Zhong-Huai

    2009-06-24

    In this study we successfully constructed a full-length cDNA library from Chinese oak silkworm, Antheraea pernyi, the most well-known wild silkworm used for silk production and insect food. Total RNA was extracted from a single fresh female pupa at the diapause stage. The titer of the library was 5 x 10(5) cfu/ml and the proportion of recombinant clones was approximately 95%. Expressed sequence tag (EST) analysis was used to characterize the library. A total of 175 clustered ESTs consisting of 24 contigs and 151 singlets were generated from 250 effective sequences. Of the 175 unigenes, 97 (55.4%) were known genes but only five from A. pernyi, 37 (21.2%) were known ESTs without function annotation, and 41 (23.4%) were novel ESTs. By EST sequencing, a gene coding KK-42-binding protein in A. pernyi (named as ApKK42-BP; GenBank accession no. FJ744151) was identified and characterized. Protein sequence analysis showed that ApKK42-BP was not a membrane protein but an extracellular protein with a signal peptide at position 1-18, and contained two putative conserved domains, abhydro_lipase and abhydrolase_1, suggesting it may be a member of lipase superfamily. Expression analysis based on number of ESTs showed that ApKK42-BP was an abundant gene in the period of diapause stage, suggesting it may also be involved in pupa-diapause termination.

  2. Construction of a full-length cDNA Library from Chinese oak silkworm pupa and identification of a KK-42-binding protein gene in relation to pupa-diapause termination

    PubMed Central

    Li, Yu-Ping; Xia, Run-Xi; Wang, Huan; Li, Xi-Sheng; Liu, Yan-Qun; Wei, Zhao-Jun; Lu, Cheng; Xiang, Zhong-Huai

    2009-01-01

    In this study we successfully constructed a full-length cDNA library from Chinese oak silkworm, Antheraea pernyi, the most well-known wild silkworm used for silk production and insect food. Total RNA was extracted from a single fresh female pupa at the diapause stage. The titer of the library was 5 × 105 cfu/ml and the proportion of recombinant clones was approximately 95%. Expressed sequence tag (EST) analysis was used to characterize the library. A total of 175 clustered ESTs consisting of 24 contigs and 151 singlets were generated from 250 effective sequences. Of the 175 unigenes, 97 (55.4%) were known genes but only five from A. pernyi, 37 (21.2%) were known ESTs without function annotation, and 41 (23.4%) were novel ESTs. By EST sequencing, a gene coding KK-42-binding protein in A. pernyi (named as ApKK42-BP; GenBank accession no. FJ744151) was identified and characterized. Protein sequence analysis showed that ApKK42-BP was not a membrane protein but an extracellular protein with a signal peptide at position 1-18, and contained two putative conserved domains, abhydro_lipase and abhydrolase_1, suggesting it may be a member of lipase superfamily. Expression analysis based on number of ESTs showed that ApKK42-BP was an abundant gene in the period of diapause stage, suggesting it may also be involved in pupa-diapause termination. PMID:19564928

  3. Identification and biochemical characterization of a GDSL-motif carboxylester hydrolase from Carica papaya latex.

    PubMed

    Abdelkafi, Slim; Ogata, Hiroyuki; Barouh, Nathalie; Fouquet, Benjamin; Lebrun, Régine; Pina, Michel; Scheirlinckx, Frantz; Villeneuve, Pierre; Carrière, Frédéric

    2009-11-01

    An esterase (CpEst) showing high specific activities on tributyrin and short chain vinyl esters was obtained from Carica papaya latex after an extraction step with zwitterionic detergent and sonication, followed by gel filtration chromatography. Although the protein could not be purified to complete homogeneity due to its presence in high molecular mass aggregates, a major protein band with an apparent molecular mass of 41 kDa was obtained by SDS-PAGE. This material was digested with trypsin and the amino acid sequences of the tryptic peptides were determined by LC/ESI/MS/MS. These sequences were used to identify a partial cDNA (679 bp) from expressed sequence tags (ESTs) of C. papaya. Based upon EST sequences, a full-length gene was identified in the genome of C. papaya, with an open reading frame of 1029 bp encoding a protein of 343 amino acid residues, with a theoretical molecular mass of 38 kDa. From sequence analysis, CpEst was identified as a GDSL-motif carboxylester hydrolase belonging to the SGNH protein family and four potential N-glycosylation sites were identified. The putative catalytic triad was localised (Ser(35)-Asp(307)-His(310)) with the nucleophile serine being part of the GDSL-motif. A 3D-model of CpEst was built from known X-ray structures and sequence alignments and the catalytic triad was found to be exposed at the surface of the molecule, thus confirming the results of CpEst inhibition by tetrahydrolipstatin suggesting a direct accessibility of the inhibitor to the active site.

  4. Est10: A Novel Alkaline Esterase Isolated from Bovine Rumen Belonging to the New Family XV of Lipolytic Enzymes.

    PubMed

    Rodríguez, María Cecilia; Loaces, Inés; Amarelle, Vanesa; Senatore, Daniella; Iriarte, Andrés; Fabiano, Elena; Noya, Francisco

    2015-01-01

    A metagenomic fosmid library from bovine rumen was used to identify clones with lipolytic activity. One positive clone was isolated. The gene responsible for the observed phenotype was identified by in vitro transposon mutagenesis and sequencing and was named est10. The 367 amino acids sequence harbors a signal peptide, the conserved secondary structure arrangement of alpha/beta hydrolases, and a GHSQG pentapeptide which is characteristic of esterases and lipases. Homology based 3D-modelling confirmed the conserved spatial orientation of the serine in a nucleophilic elbow. By sequence comparison, Est10 is related to hydrolases that are grouped into the non-specific Pfam family DUF3089 and to other characterized esterases that were recently classified into the new family XV of lipolytic enzymes. Est10 was heterologously expressed in Escherichia coli as a His-tagged fusion protein, purified and biochemically characterized. Est10 showed maximum activity towards C4 aliphatic chains and undetectable activity towards C10 and longer chains which prompted its classification as an esterase. However, it was able to efficiently catalyze the hydrolysis of aryl esters such as methyl phenylacetate and phenyl acetate. The optimum pH of this enzyme is 9.0, which is uncommon for esterases, and it exhibits an optimal temperature at 40 °C. The activity of Est10 was inhibited by metal ions, detergents, chelating agents and additives. We have characterized an alkaline esterase produced by a still unidentified bacterium belonging to a recently proposed new family of esterases.

  5. Intelligent Text Retrieval and Knowledge Acquisition from Texts for NASA Applications: Preprocessing Issues

    NASA Technical Reports Server (NTRS)

    2002-01-01

    A system that retrieves problem reports from a NASA database is described. The database is queried with natural language questions. Part-of-speech tags are first assigned to each word in the question using a rule based tagger. A partial parse of the question is then produced with independent sets of deterministic finite state a utomata. Using partial parse information, a look up strategy searches the database for problem reports relevant to the question. A bigram stemmer and irregular verb conjugates have been incorporated into the system to improve accuracy. The system is evaluated by a set of fifty five questions posed by NASA engineers. A discussion of future research is also presented.

  6. Transcriptome analysis of Schistosoma mansoni larval development using serial analysis of gene expression (SAGE).

    PubMed

    Taft, A S; Vermeire, J J; Bernier, J; Birkeland, S R; Cipriano, M J; Papa, A R; McArthur, A G; Yoshino, T P

    2009-04-01

    Infection of the snail, Biomphalaria glabrata, by the free-swimming miracidial stage of the human blood fluke, Schistosoma mansoni, and its subsequent development to the parasitic sporocyst stage is critical to establishment of viable infections and continued human transmission. We performed a genome-wide expression analysis of the S. mansoni miracidia and developing sporocyst using Long Serial Analysis of Gene Expression (LongSAGE). Five cDNA libraries were constructed from miracidia and in vitro cultured 6- and 20-day-old sporocysts maintained in sporocyst medium (SM) or in SM conditioned by previous cultivation with cells of the B. glabrata embryonic (Bge) cell line. We generated 21 440 SAGE tags and mapped 13 381 to the S. mansoni gene predictions (v4.0e) either by estimating theoretical 3' UTR lengths or using existing 3' EST sequence data. Overall, 432 transcripts were found to be differentially expressed amongst all 5 libraries. In total, 172 tags were differentially expressed between miracidia and 6-day conditioned sporocysts and 152 were differentially expressed between miracidia and 6-day unconditioned sporocysts. In addition, 53 and 45 tags, respectively, were differentially expressed in 6-day and 20-day cultured sporocysts, due to the effects of exposure to Bge cell-conditioned medium.

  7. A database of expressed genes from the new world screwworm, Cochliomyia hominivorax (Coquerel) (Diptera: Calliphoridae)

    USDA-ARS?s Scientific Manuscript database

    We used an expressed sequence tag and 454 pyrosequencing approach to initiate a study of the genome of the New World Screwworm, Cochliomyia hominivorax (Coquerel). Two normalized cDNA libraries were constructed from RNA isolated from embryos and 2nd instar larvae from the Panama 95 strain. Approxima...

  8. Identification and characterization of TCRgamma and TCRdelta chains in channel catfish, Ictalurus punctatus

    USDA-ARS?s Scientific Manuscript database

    Channel catfish, Ictalurus punctatus, T cell receptors (TCR) gamma and delta were identified by mining of expressed sequence tag databases and full length sequences were obtained by 5'-RACE and RT-PCR protocols. cDNAs for each of these TCR chains encode typical variable (V), (diversity; D), joining ...

  9. Web 2.0 in the Professional LIS Literature: An Exploratory Analysis

    ERIC Educational Resources Information Center

    Aharony, Noa

    2011-01-01

    This paper presents a statistical descriptive analysis and a thorough content analysis of descriptors and journal titles extracted from the Library and Information Science Abstracts (LISA) database, focusing on the subject of Web 2.0 and its main applications: blog, wiki, social network and tags.The primary research questions include: whether the…

  10. Towards the understanding of the cocoa transcriptome: Production and analysis of an exhaustive dataset of ESTs of Theobroma cacao L. generated from various tissues and under various conditions

    PubMed Central

    Argout, Xavier; Fouet, Olivier; Wincker, Patrick; Gramacho, Karina; Legavre, Thierry; Sabau, Xavier; Risterucci, Ange Marie; Da Silva, Corinne; Cascardo, Julio; Allegre, Mathilde; Kuhn, David; Verica, Joseph; Courtois, Brigitte; Loor, Gaston; Babin, Regis; Sounigo, Olivier; Ducamp, Michel; Guiltinan, Mark J; Ruiz, Manuel; Alemanno, Laurence; Machado, Regina; Phillips, Wilberth; Schnell, Ray; Gilmour, Martin; Rosenquist, Eric; Butler, David; Maximova, Siela; Lanaud, Claire

    2008-01-01

    Background Theobroma cacao L., is a tree originated from the tropical rainforest of South America. It is one of the major cash crops for many tropical countries. T. cacao is mainly produced on smallholdings, providing resources for 14 million farmers. Disease resistance and T. cacao quality improvement are two important challenges for all actors of cocoa and chocolate production. T. cacao is seriously affected by pests and fungal diseases, responsible for more than 40% yield losses and quality improvement, nutritional and organoleptic, is also important for consumers. An international collaboration was formed to develop an EST genomic resource database for cacao. Results Fifty-six cDNA libraries were constructed from different organs, different genotypes and different environmental conditions. A total of 149,650 valid EST sequences were generated corresponding to 48,594 unigenes, 12,692 contigs and 35,902 singletons. A total of 29,849 unigenes shared significant homology with public sequences from other species. Gene Ontology (GO) annotation was applied to distribute the ESTs among the main GO categories. A specific information system (ESTtik) was constructed to process, store and manage this EST collection allowing the user to query a database. To check the representativeness of our EST collection, we looked for the genes known to be involved in two different metabolic pathways extensively studied in other plant species and important for T. cacao qualities: the flavonoid and the terpene pathways. Most of the enzymes described in other crops for these two metabolic pathways were found in our EST collection. A large collection of new genetic markers was provided by this ESTs collection. Conclusion This EST collection displays a good representation of the T. cacao transcriptome, suitable for analysis of biochemical pathways based on oligonucleotide microarrays derived from these ESTs. It will provide numerous genetic markers that will allow the construction of a high density gene map of T. cacao. This EST collection represents a unique and important molecular resource for T. cacao study and improvement, facilitating the discovery of candidate genes for important T. cacao trait variation. PMID:18973681

  11. Towards the understanding of the cocoa transcriptome: Production and analysis of an exhaustive dataset of ESTs of Theobroma cacao L. generated from various tissues and under various conditions.

    PubMed

    Argout, Xavier; Fouet, Olivier; Wincker, Patrick; Gramacho, Karina; Legavre, Thierry; Sabau, Xavier; Risterucci, Ange Marie; Da Silva, Corinne; Cascardo, Julio; Allegre, Mathilde; Kuhn, David; Verica, Joseph; Courtois, Brigitte; Loor, Gaston; Babin, Regis; Sounigo, Olivier; Ducamp, Michel; Guiltinan, Mark J; Ruiz, Manuel; Alemanno, Laurence; Machado, Regina; Phillips, Wilberth; Schnell, Ray; Gilmour, Martin; Rosenquist, Eric; Butler, David; Maximova, Siela; Lanaud, Claire

    2008-10-30

    Theobroma cacao L., is a tree originated from the tropical rainforest of South America. It is one of the major cash crops for many tropical countries. T. cacao is mainly produced on smallholdings, providing resources for 14 million farmers. Disease resistance and T. cacao quality improvement are two important challenges for all actors of cocoa and chocolate production. T. cacao is seriously affected by pests and fungal diseases, responsible for more than 40% yield losses and quality improvement, nutritional and organoleptic, is also important for consumers. An international collaboration was formed to develop an EST genomic resource database for cacao. Fifty-six cDNA libraries were constructed from different organs, different genotypes and different environmental conditions. A total of 149,650 valid EST sequences were generated corresponding to 48,594 unigenes, 12,692 contigs and 35,902 singletons. A total of 29,849 unigenes shared significant homology with public sequences from other species.Gene Ontology (GO) annotation was applied to distribute the ESTs among the main GO categories.A specific information system (ESTtik) was constructed to process, store and manage this EST collection allowing the user to query a database.To check the representativeness of our EST collection, we looked for the genes known to be involved in two different metabolic pathways extensively studied in other plant species and important for T. cacao qualities: the flavonoid and the terpene pathways. Most of the enzymes described in other crops for these two metabolic pathways were found in our EST collection.A large collection of new genetic markers was provided by this ESTs collection. This EST collection displays a good representation of the T. cacao transcriptome, suitable for analysis of biochemical pathways based on oligonucleotide microarrays derived from these ESTs. It will provide numerous genetic markers that will allow the construction of a high density gene map of T. cacao. This EST collection represents a unique and important molecular resource for T. cacao study and improvement, facilitating the discovery of candidate genes for important T. cacao trait variation.

  12. Acoustic telemetry observation systems: challenges encountered and overcome in the Laurentian Great Lakes

    USGS Publications Warehouse

    Krueger, Charles C.; Holbrook, Christopher; Binder, Thomas R.; Vandergoot, Christopher; Hayden, Todd A.; Hondorp, Darryl W.; Nate, Nancy; Paige, Kelli; Riley, Stephen; Fisk, Aaron T.; Cooke, Steven J.

    2017-01-01

    The Great Lakes Acoustic Telemetry Observation System (GLATOS), organized in 2012, aims to advance and improve conservation and management of Great Lakes fishes by providing information on behavior, habitat use, and population dynamics. GLATOS faced challenges during establishment, including a funding agency-imposed urgency to initiate projects, a lack of telemetry expertise, and managing a flood of data. GLATOS now connects 190+ investigators, provides project consultation, maintains a web-based data portal, contributes data to Ocean Tracking Network’s global database, loans equipment, and promotes science transfer to managers. The GLATOS database currently has 50+ projects, 39 species tagged, 8000+ fish released, and 150+ million tag detections. Lessons learned include (1) seek advice from others experienced in telemetry; (2) organize networks prior to when shared data is urgently needed; (3) establish a data management system so that all receivers can contribute to every project; (4) hold annual meetings to foster relationships; (5) involve fish managers to ensure relevancy; and (6) staff require full-time commitment to lead and coordinate projects and to analyze data and publish results.

  13. Hypergraph topological quantities for tagged social networks.

    PubMed

    Zlatić, Vinko; Ghoshal, Gourab; Caldarelli, Guido

    2009-09-01

    Recent years have witnessed the emergence of a new class of social networks, which require us to move beyond previously employed representations of complex graph structures. A notable example is that of the folksonomy, an online process where users collaboratively employ tags to resources to impart structure to an otherwise undifferentiated database. In a recent paper, we proposed a mathematical model that represents these structures as tripartite hypergraphs and defined basic topological quantities of interest. In this paper, we extend our model by defining additional quantities such as edge distributions, vertex similarity and correlations as well as clustering. We then empirically measure these quantities on two real life folksonomies, the popular online photo sharing site Flickr and the bookmarking site CiteULike. We find that these systems share similar qualitative features with the majority of complex networks that have been previously studied. We propose that the quantities and methodology described here can be used as a standard tool in measuring the structure of tagged networks.

  14. Hypergraph topological quantities for tagged social networks

    NASA Astrophysics Data System (ADS)

    Zlatić, Vinko; Ghoshal, Gourab; Caldarelli, Guido

    2009-09-01

    Recent years have witnessed the emergence of a new class of social networks, which require us to move beyond previously employed representations of complex graph structures. A notable example is that of the folksonomy, an online process where users collaboratively employ tags to resources to impart structure to an otherwise undifferentiated database. In a recent paper, we proposed a mathematical model that represents these structures as tripartite hypergraphs and defined basic topological quantities of interest. In this paper, we extend our model by defining additional quantities such as edge distributions, vertex similarity and correlations as well as clustering. We then empirically measure these quantities on two real life folksonomies, the popular online photo sharing site Flickr and the bookmarking site CiteULike. We find that these systems share similar qualitative features with the majority of complex networks that have been previously studied. We propose that the quantities and methodology described here can be used as a standard tool in measuring the structure of tagged networks.

  15. D-Tagatose Is a Promising Sweetener to Control Glycaemia: A New Functional Food

    PubMed Central

    Angarita Dávila, Lisse

    2018-01-01

    The objective of the current research was to review and update evidence on the dietary effect of the consumption of tagatose in type 2 diabetes, as well as to elucidate the current approach that exists on its production and biotechnological utility in functional food for diabetics. Articles published before July 1, 2017, were included in the databases PubMed, EBSCO, Google Scholar, and Scielo, including the terms “Tagatose”, “Sweeteners”, “Diabetes Mellitus type 2”, “Sweeteners”, “D-Tag”. D-Tagatose (D-tag) is an isomer of fructose which is approximately 90% sweeter than sucrose. Preliminary studies in animals and preclinical studies showed that D-tag decreased glucose levels, which generated great interest in the scientific community. Recent studies indicate that tagatose has low glycemic index, a potent hypoglycemic effect, and eventually could be associated with important benefits for the treatment of obesity. The authors concluded that D-tag is promising as a sweetener without major adverse effects observed in these clinical studies. PMID:29546070

  16. Part-based deep representation for product tagging and search

    NASA Astrophysics Data System (ADS)

    Chen, Keqing

    2017-06-01

    Despite previous studies, tagging and indexing the product images remain challenging due to the large inner-class variation of the products. In the traditional methods, the quantized hand-crafted features such as SIFTs are extracted as the representation of the product images, which are not discriminative enough to handle the inner-class variation. For discriminative image representation, this paper firstly presents a novel deep convolutional neural networks (DCNNs) architect true pre-trained on a large-scale general image dataset. Compared to the traditional features, our DCNNs representation is of more discriminative power with fewer dimensions. Moreover, we incorporate the part-based model into the framework to overcome the negative effect of bad alignment and cluttered background and hence the descriptive ability of the deep representation is further enhanced. Finally, we collect and contribute a well-labeled shoe image database, i.e., the TBShoes, on which we apply the part-based deep representation for product image tagging and search, respectively. The experimental results highlight the advantages of the proposed part-based deep representation.

  17. Comparative analysis of expressed sequence tags of conifers and angiosperms reveals sequences specifically conserved in conifers.

    PubMed

    Ujino-Ihara, Tokuko; Kanamori, Hiroyuki; Yamane, Hiroko; Taguchi, Yuriko; Namiki, Nobukazu; Mukai, Yuzuru; Yoshimura, Kensuke; Tsumura, Yoshihiko

    2005-12-01

    To identify and characterize lineage-specific genes of conifers, two sets of ESTs (with 12791 and 5902 ESTs, representing 5373 and 3018 gene transcripts, respectively) were generated from the Cupressaceae species Cryptomeria japonica and Chamaecyparis obtusa. These transcripts were compared with non-redundant sets of genes generated from Pinaceae species, other gymnosperms and angiosperms. About 6% of tentative unique genes (Unigenes) of C. japonica and C. obtusa had homologs in other conifers but not angiosperms, and about 70% had apparent homologs in angiosperms. The calculated GC contents of orthologous genes showed that GC contents of coniferous genes are likely to be lower than those of angiosperms. Comparisons of the numbers of homologous genes in each species suggest that copy numbers of genes may be correlated between diverse seed plants. This correlation suggests that the multiplicity of such genes may have arisen before the divergence of gymnosperms and angiosperms.

  18. Development of 23 novel polymorphic EST-SSR markers for the endangered relict conifer Metasequoia glyptostroboides.

    PubMed

    Jin, Yuqing; Bi, Quanxin; Guan, Wenbin; Mao, Jian-Feng

    2015-09-01

    Metasequoia glyptostroboides is an endangered relict conifer species endemic to China. In this study, expressed sequence tag-simple sequence repeat (EST-SSR) markers were developed using transcriptome mining for future genetic and functional studies. We collected 97,565 unigene sequences generated by 454 pyrosequencing. A bioinformatics analysis identified 2087 unique and putative microsatellites, from which 96 novel microsatellite markers were developed. Fifty-three of the 96 primer sets successfully amplified clear fragments of the expected sizes; 23 of those loci were polymorphic. The number of alleles per locus ranged from two to eight, with an average of three, and the observed and expected heterozygosity values ranged from 0 to 1.0 and 0.117 to 0.813, respectively. These microsatellite loci will enrich the genetic resources to develop functional studies and conservation strategies for this endangered relict species.

  19. De novo transcriptomic analysis and development of EST-SSRs for Sorbus pohuashanensis (Hance) Hedl.

    PubMed Central

    Guan, Xuelian; Fu, Qiang; Zhang, Ze; Hu, Zenghui; Zheng, Jian; Lu, Yizeng; Li, Wei

    2017-01-01

    Sorbus pohuashanensis is a native tree species of northern China that is used for a variety of ecological purposes. The species is often grown as an ornamental landscape tree because of its beautiful form, silver flowers in early summer, attractive pinnate leaves in summer, and red leaves and fruits in autumn. However, development and further utilization of the species are hindered by the lack of comprehensive genetic information, which impedes research into its genetics and molecular biology. Recent advances in de novo transcriptome sequencing (RNA-seq) technology have provided an effective means to obtain genomic information from non-model species. Here, we applied RNA-seq for sequencing S. pohuashanensis leaves and obtained a total of 137,506 clean reads. After assembly, 96,213 unigenes with an average length of 770 bp were obtained. We found that 64.5% of the unigenes could be annotated using bioinformatics tools to analyze gene function and alignment with the NCBI database. Overall, 59,089 unigenes were annotated using the Nr database(non-redundant protein database), 35,225 unigenes were annotated using the GO (Gene Ontology categories) database, and 33,168 unigenes were annotated using COG (Cluster of Orthologous Groups). Analysis of the unigenes using the KEGG (Kyoto Encyclopedia of Genes and Genomes) database indicated that 13,953 unigenes were involved in 322 metabolic pathways. Finally, simple sequence repeat (SSR) site detection identified 6,604 unigenes that included EST-SSRs and a total of 7,473 EST-SSRs in the unigene sequences. Fifteen polymorphic SSRs were screened and found to be of use for future genetic research. These unigene sequences will provide important genetic resources for genetic improvement and investigation of biochemical processes in S. pohuashanensis. PMID:28614366

  20. Assessing Anthropogenic Impacts on Tunas, Sharks and Billfishes with Direct Observations of Human Fishers on the High Seas

    NASA Astrophysics Data System (ADS)

    Block, B.; Ferretti, F.; White, T.; De Leo, G.; Hazen, E. L.; Bograd, S. J.

    2016-12-01

    Anthropogenic impacts on marine predators have been examined within exclusive economic zones, but few data sets have enabled assessing human fishing impacts on the high seas. By combining large electronic tagging databases archiving mobile predator movements (e.g. Tagging of Pacific Pelagics, TAG A Giant, Animal Telemetry Network) with the global fishing catch and fishing effort data, from satellite tracks of vessels on the high seas (AIS), a better understanding of human use and exploitation at a global scale can be obtained. This capacity to combine the movements of mobile ocean predators (tunas, sharks, billfishes) with analyses of their human predator's behaviors, via examination of the global fishing fleet activities is unprecedented due to the new access researchers are garnering to these big satellite derived AIS databases. Global Fishing Watch is one example of such a data provider, that now makes accessible, the AIS data from the global community of maritime vessels, and has developed along with researchers new algorithms that delineate distinct types of fishing vessel behaviors (longline, purse seiner) and effort. When combined with satellite tagging data of mobile apex predators, oceanographic preferences, records of fishing fleets catches, targeted species and economic drivers of fisheries, new quantitative insights can be gained about the catch reporting of fleets, and the pelagic species targeted at a global scale. Research communities can now also examine how humans behave on the high seas, and potentially improve how fish stocks, such as tunas, billfishes, and sharks are exploited. The capacity to gather information on diverse human fishing fleets and behaviors remotely, should provide a wealth of new tools that can potentially be applied toward the resource management efforts surrounding these global fishing fleets. This type of information is essential for prioritizing regions of conservation concern for megaufauna swimming in our oceans.

  1. Transcriptional profiling reveals the expression of novel genes in response to various stimuli in the human dermatophyte Trichophyton rubrum

    PubMed Central

    2010-01-01

    Background Cutaneous mycoses are common human infections among healthy and immunocompromised hosts, and the anthropophilic fungus Trichophyton rubrum is the most prevalent microorganism isolated from such clinical cases worldwide. The aim of this study was to determine the transcriptional profile of T. rubrum exposed to various stimuli in order to obtain insights into the responses of this pathogen to different environmental challenges. Therefore, we generated an expressed sequence tag (EST) collection by constructing one cDNA library and nine suppression subtractive hybridization libraries. Results The 1388 unigenes identified in this study were functionally classified based on the Munich Information Center for Protein Sequences (MIPS) categories. The identified proteins were involved in transcriptional regulation, cellular defense and stress, protein degradation, signaling, transport, and secretion, among other functions. Analysis of these unigenes revealed 575 T. rubrum sequences that had not been previously deposited in public databases. Conclusion In this study, we identified novel T. rubrum genes that will be useful for ORF prediction in genome sequencing and facilitating functional genome analysis. Annotation of these expressed genes revealed metabolic adaptations of T. rubrum to carbon sources, ambient pH shifts, and various antifungal drugs used in medical practice. Furthermore, challenging T. rubrum with cytotoxic drugs and ambient pH shifts extended our understanding of the molecular events possibly involved in the infectious process and resistance to antifungal drugs. PMID:20144196

  2. OntoMate: a text-mining tool aiding curation at the Rat Genome Database

    PubMed Central

    Liu, Weisong; Laulederkind, Stanley J. F.; Hayman, G. Thomas; Wang, Shur-Jen; Nigam, Rajni; Smith, Jennifer R.; De Pons, Jeff; Dwinell, Melinda R.; Shimoyama, Mary

    2015-01-01

    The Rat Genome Database (RGD) is the premier repository of rat genomic, genetic and physiologic data. Converting data from free text in the scientific literature to a structured format is one of the main tasks of all model organism databases. RGD spends considerable effort manually curating gene, Quantitative Trait Locus (QTL) and strain information. The rapidly growing volume of biomedical literature and the active research in the biological natural language processing (bioNLP) community have given RGD the impetus to adopt text-mining tools to improve curation efficiency. Recently, RGD has initiated a project to use OntoMate, an ontology-driven, concept-based literature search engine developed at RGD, as a replacement for the PubMed (http://www.ncbi.nlm.nih.gov/pubmed) search engine in the gene curation workflow. OntoMate tags abstracts with gene names, gene mutations, organism name and most of the 16 ontologies/vocabularies used at RGD. All terms/ entities tagged to an abstract are listed with the abstract in the search results. All listed terms are linked both to data entry boxes and a term browser in the curation tool. OntoMate also provides user-activated filters for species, date and other parameters relevant to the literature search. Using the system for literature search and import has streamlined the process compared to using PubMed. The system was built with a scalable and open architecture, including features specifically designed to accelerate the RGD gene curation process. With the use of bioNLP tools, RGD has added more automation to its curation workflow. Database URL: http://rgd.mcw.edu PMID:25619558

  3. Transcriptional Profiles of Mating-Responsive Genes from Testes and Male Accessory Glands of the Mediterranean Fruit Fly, Ceratitis capitata

    PubMed Central

    Scolari, Francesca; Gomulski, Ludvik M.; Ribeiro, José M. C.; Siciliano, Paolo; Meraldi, Alice; Falchetto, Marco; Bonomi, Angelica; Manni, Mosè; Gabrieli, Paolo; Malovini, Alberto; Bellazzi, Riccardo; Aksoy, Serap; Gasperi, Giuliano; Malacrida, Anna R.

    2012-01-01

    Background Insect seminal fluid is a complex mixture of proteins, carbohydrates and lipids, produced in the male reproductive tract. This seminal fluid is transferred together with the spermatozoa during mating and induces post-mating changes in the female. Molecular characterization of seminal fluid proteins in the Mediterranean fruit fly, Ceratitis capitata, is limited, although studies suggest that some of these proteins are biologically active. Methodology/Principal Findings We report on the functional annotation of 5914 high quality expressed sequence tags (ESTs) from the testes and male accessory glands, to identify transcripts encoding putative secreted peptides that might elicit post-mating responses in females. The ESTs were assembled into 3344 contigs, of which over 33% produced no hits against the nr database, and thus may represent novel or rapidly evolving sequences. Extraction of the coding sequences resulted in a total of 3371 putative peptides. The annotated dataset is available as a hyperlinked spreadsheet. Four hundred peptides were identified with putative secretory activity, including odorant binding proteins, protease inhibitor domain-containing peptides, antigen 5 proteins, mucins, and immunity-related sequences. Quantitative RT-PCR-based analyses of a subset of putative secretory protein-encoding transcripts from accessory glands indicated changes in their abundance after one or more copulations when compared to virgin males of the same age. These changes in abundance, particularly evident after the third mating, may be related to the requirement to replenish proteins to be transferred to the female. Conclusions/Significance We have developed the first large-scale dataset for novel studies on functions and processes associated with the reproductive biology of Ceratitis capitata. The identified genes may help study genome evolution, in light of the high adaptive potential of the medfly. In addition, studies of male recovery dynamics in terms of accessory gland gene expression profiles and correlated remating inhibition mechanisms may permit the improvement of pest management approaches. PMID:23071645

  4. Mining the enzymes involved in the detoxification of reactive oxygen species (ROS) in sugarcane.

    PubMed

    Kurama, Eiko E; Fenille, Roseli C; Rosa, Vicente E; Rosa, Daniel D; Ulian, Eugenio C

    2002-07-01

    Summary Adopting the sequencing of expressed sequence tags (ESTs) of a sugarcane database derived from libraries induced and not induced by pathogens, we identified EST clusters homologous to genes corresponding to enzymes involved in the detoxification of reactive oxygen species. The predicted amino acids of these enzymes are superoxide dismutases (SODs), glutathione-S-transferase (GST), glutathione peroxidase (GPX), and catalases. Three MnSOD mitochondrial precursors and 10 CuZnSOD were identified in sugarcane: the MnSOD mitochondrial precursor is 96% similar to the maize MnSOD mitochondrial precursor and, of the 10 CuZnSOD identified, seven were 98% identical to maize cytosolic CuZnSOD4 and one was 67% identical to putative peroxisomal CuZnSOD from Arabidopsis. Three homologues to class Phi GST were 87-88% identical to GST III from maize. Five GPX homologues were identified: three were homologous to cytosolic GPX from barley, one was 88% identical to phospholipid hydroperoxide glutathione peroxidase (PHGPX) from rice, and the last was 71% identical to GPX from A. thaliana. Three enzymes similar to maize catalase were identified in sugarcane: two were similar to catalase isozyme 3 and catalase chain 3 from maize, which are mitochondrial, and one was similar to catalase isozyme 1 from maize, whose location is peroxisomal subcellular. All enzymes were induced in all sugarcane libraries (flower, seed, root, callus, leaves) and also in the pathogen-induced libraries, except for CuZnSOD whose cDNA was detected in none of the libraries induced by pathogens (Acetobacter diazotroficans and Herbaspirillum rubrisubalbicans). The expression of the enzymes SOD, GST, GPX, and catalases involved in the detoxification was examined using reverse transcriptase-polymerase chain reaction in cDNA from leaves of sugarcane under biotic stress conditions, inoculated with Puccinia melanocephala, the causal agent of sugarcane rust disease.

  5. Tissue-Specific Transcriptomics of the Exotic Invasive Insect Pest Emerald Ash Borer (Agrilus planipennis)

    PubMed Central

    Mittapalli, Omprakash; Bai, Xiaodong; Bonello, Pierluigi; Herms, Daniel A.

    2010-01-01

    Background The insect midgut and fat body represent major tissue interfaces that deal with several important physiological functions including digestion, detoxification and immune response. The emerald ash borer (Agrilus planipennis), is an exotic invasive insect pest that has killed millions of ash trees (Fraxinus spp.) primarily in the Midwestern United States and Ontario, Canada. However, despite its high impact status little knowledge exists for A. planipennis at the molecular level. Methodology and Principal Findings Newer-generation Roche-454 pyrosequencing was used to obtain 126,185 reads for the midgut and 240,848 reads for the fat body, which were assembled into 25,173 and 37,661 high quality expressed sequence tags (ESTs) for the midgut and the fat body of A. planipennis larvae, respectively. Among these ESTs, 36% of the midgut and 38% of the fat body sequences showed similarity to proteins in the GenBank nr database. A high number of the midgut sequences contained chitin-binding peritrophin (248)and trypsin (98) domains; while the fat body sequences showed high occurrence of cytochrome P450s (85) and protein kinase (123) domains. Further, the midgut transcriptome of A. planipennis revealed putative microbial transcripts encoding for cell-wall degrading enzymes such as polygalacturonases and endoglucanases. A significant number of SNPs (137 in midgut and 347 in fat body) and microsatellite loci (317 in midgut and 571 in fat body) were predicted in the A. planipennis transcripts. An initial assessment of cytochrome P450s belonging to various CYP clades revealed distinct expression patterns at the tissue level. Conclusions and Significance To our knowledge this study is one of the first to illuminate tissue-specific gene expression in an invasive insect of high ecological and economic consequence. These findings will lay the foundation for future gene expression and functional studies in A. planipennis. PMID:21060843

  6. Transcriptome Characterization of Cymbidium sinense 'Dharma' Using 454 Pyrosequencing and Its Application in the Identification of Genes Associated with Leaf Color Variation.

    PubMed

    Zhu, Genfa; Yang, Fengxi; Shi, Shanshan; Li, Dongmei; Wang, Zhen; Liu, Hailin; Huang, Dan; Wang, Caiyun

    2015-01-01

    The highly variable leaf color of Cymbidium sinense significantly improves its horticultural and economic value, and makes it highly desirable in the flower markets in China and Southeast Asia. However, little is understood about the molecular mechanism underlying leaf-color variations. In this study, we found the content of photosynthetic pigments, especially chlorophyll degradation metabolite in the leaf-color mutants is distinguished significantly from that in the wild type of Cymbidium sinense 'Dharma'. To further determine the candidate genes controlling leaf-color variations, we first sequenced the global transcriptome using 454 pyrosequencing. More than 0.7 million expressed sequence tags (ESTs) with an average read length of 445.9 bp were generated and assembled into 103,295 isotigs representing 68,460 genes. Of these isotigs, 43,433 were significantly aligned to known proteins in the public database, of which 29,299 could be categorized into 42 functional groups in the gene ontology system, 10,079 classified into 23 functional classifications in the clusters of orthologous groups system, and 23,092 assigned to 139 clusters of specific metabolic pathways in the Kyoto Encyclopedia of Genes and Genomes. Among these annotations, 95 isotigs were designated as involved in chlorophyll metabolism. On this basis, we identified 16 key enzyme-encoding genes in the chlorophyll metabolism pathway, the full length cDNAs and expressions of which were further confirmed. Expression pattern indicated that the key enzyme-encoding genes for chlorophyll degradation were more highly expressed in the leaf color mutants, as was consistent with their lower chlorophyll contents. This study is the first to supply an informative 454 EST dataset for Cymbidium sinense 'Dharma' and to identify original leaf color-associated genes, which provide important resources to facilitate gene discovery for molecular breeding, marketable trait discovery, and investigating various biological process in this species.

  7. Transcriptome Characterization of Cymbidium sinense 'Dharma' Using 454 Pyrosequencing and Its Application in the Identification of Genes Associated with Leaf Color Variation

    PubMed Central

    Shi, Shanshan; Li, Dongmei; Wang, Zhen; Liu, Hailin; Huang, Dan; Wang, Caiyun

    2015-01-01

    The highly variable leaf color of Cymbidium sinense significantly improves its horticultural and economic value, and makes it highly desirable in the flower markets in China and Southeast Asia. However, little is understood about the molecular mechanism underlying leaf-color variations. In this study, we found the content of photosynthetic pigments, especially chlorophyll degradation metabolite in the leaf-color mutants is distinguished significantly from that in the wild type of Cymbidium sinense 'Dharma'. To further determine the candidate genes controlling leaf-color variations, we first sequenced the global transcriptome using 454 pyrosequencing. More than 0.7 million expressed sequence tags (ESTs) with an average read length of 445.9 bp were generated and assembled into 103,295 isotigs representing 68,460 genes. Of these isotigs, 43,433 were significantly aligned to known proteins in the public database, of which 29,299 could be categorized into 42 functional groups in the gene ontology system, 10,079 classified into 23 functional classifications in the clusters of orthologous groups system, and 23,092 assigned to 139 clusters of specific metabolic pathways in the Kyoto Encyclopedia of Genes and Genomes. Among these annotations, 95 isotigs were designated as involved in chlorophyll metabolism. On this basis, we identified 16 key enzyme-encoding genes in the chlorophyll metabolism pathway, the full length cDNAs and expressions of which were further confirmed. Expression pattern indicated that the key enzyme-encoding genes for chlorophyll degradation were more highly expressed in the leaf color mutants, as was consistent with their lower chlorophyll contents. This study is the first to supply an informative 454 EST dataset for Cymbidium sinense 'Dharma' and to identify original leaf color-associated genes, which provide important resources to facilitate gene discovery for molecular breeding, marketable trait discovery, and investigating various biological process in this species. PMID:26042676

  8. Generation, Annotation, and Analysis of a Large-Scale Expressed Sequence Tag Library from Arabidopsis pumila to Explore Salt-Responsive Genes.

    PubMed

    Huang, Xianzhong; Yang, Lifei; Jin, Yuhuan; Lin, Jun; Liu, Fang

    2017-01-01

    Arabidopsis pumila is an ephemeral plant, and a close relative of the model plant Arabidopsis thaliana , but it possesses higher photosynthetic efficiency, higher propagation rate, and higher salinity tolerance compared to those A. thaliana , thus providing a candidate plant system for gene mining for environmental adaption and salt tolerance. However, A. pumila is an under-explored resource for understanding the genetic mechanisms underlying abiotic stress adaptation. To improve our understanding of the molecular and genetic mechanisms of salt stress adaptation, more than 19,900 clones randomly selected from a cDNA library constructed previously from leaf tissue exposed to high-salinity shock were sequenced. A total of 16,014 high-quality expressed sequence tags (ESTs) were generated, which have been deposited in the dbEST GenBank under accession numbers JZ932319 to JZ948332. Clustering and assembly of these ESTs resulted in the identification of 8,835 unique sequences, consisting of 2,469 contigs and 6,366 singletons. The blastx results revealed 8,011 unigenes with significant similarity to known genes, while only 425 unigenes remained uncharacterized. Functional classification demonstrated an abundance of unigenes involved in binding, catalytic, structural or transporter activities, and in pathways of energy, carbohydrate, amino acid, or lipid metabolism. At least seven main classes of genes were related to salt-tolerance among the 8,835 unigenes. Many previously reported salt tolerance genes were also manifested in this library, for example VP1, H + -ATPase, NHX1, SOS2, SOS3, NAC, MYB, ERF, LEA, P5CS1 . In addition, 251 transcription factors were identified from the library, classified into 42 families. Lastly, changes in expression of the 12 most abundant unigenes, 12 transcription factor genes, and 19 stress-related genes in the first 24 h of exposure to high-salinity stress conditions were monitored by qRT-PCR. The large-scale EST library obtained in this study provides first-hand information on gene sequences expressed in young leaves of A. pumila exposed to salt shock. The rapid discovery of known or unknown genes related to salinity stress response in A. pumila will facilitate the understanding of complex adaptive mechanisms for ephemerals.

  9. Expressed sequence tag analysis of guinea pig (Cavia porcellus) eye tissues for NEIBank

    PubMed Central

    Simpanya, Mukoma F.; Wistow, Graeme; Gao, James; David, Larry L.; Giblin, Frank J.

    2008-01-01

    Purpose To characterize gene expression patterns in guinea pig ocular tissues and identify orthologs of human genes from NEIBank expressed sequence tags. Methods RNA was extracted from dissected eye tissues of 2.5-month-old guinea pigs to make three unamplified and unnormalized cDNA libraries in the pCMVSport-6 vector for the lens, retina, and eye minus lens and retina. Over 4,000 clones were sequenced from each library and were analyzed using GRIST for clustering and gene identification. Lens crystallin EST data were validated using two-dimensional electrophoresis (2-DE), matrix assisted laser desorption (MALDI), and electrospray ionization mass spectrometry (ESIMS). Results Combined data from the three libraries generated a total of 6,694 distinctive gene clusters, with each library having between 1,000 and 3,000 clusters. Approximately 60% of the total gene clusters were novel cDNA sequences and had significant homologies to other mammalian sequences in GenBank. Complete cDNA sequences were obtained for many guinea pig lens proteins, including αA/αAinsert-, γN-, and γS-crystallins, lengsin and GRIFIN. The ratio of αA- to αB-crystallin on 2-DE gels was 8: 1 in the lens nucleus and 6.5: 1 in the cortex. Analysis of ESTs, genome sequence, and proteins (by MALDI), did not reveal any evidence for the presence of γD-, γE-, and γF-crystallin in the guinea pig. Predicted masses of many guinea pig lens crystallins were confirmed by ESIMS analysis. For the retina, orthologs of human phototransduction genes were found, such as Rhodopsin, S-antigen (Sag, Arrestin), and Transducin. The guinea-pig ortholog of NRL, a key rod photoreceptor-specific transcription factor, was also represented in EST data. In the ‘rest-of-eye’ library, the most abundant transcripts included decorin and keratin 12, representative of the cornea. Conclusions Genomic analysis of guinea pig eye tissues provides sequence-verified clones for future studies. Guinea pig orthologs of many human eye specific genes were identified. Guinea pig gene structures were similar to their human and rodent gene counterparts. Surprisingly, no orthologs of γD-, γE-, and γF-crystallin were found in EST, proteomic, or the current guinea pig genome data. PMID:19104676

  10. Improvement of DHRA-DMDC Physical Access Software DBIDS Using Cloud Computing Technology: A Case Study

    DTIC Science & Technology

    2012-06-01

    technology originally developed on the Java platform. The Hibernate framework supports rapid development of a data access layer without requiring a...31 viii 2. Hibernate ................................................................................ 31 3. Database Design...protect from security threats; o Easy aggregate management operations via file tags; 2. Hibernate We recommend using Hibernate technology for object

  11. DOE Research and Development Accomplishments Nobel Chemists Associated with

    Science.gov Websites

    the DOE and Predecessors RSS Archive Videos XML DOE R&D Accomplishments DOE R&D Accomplishments searchQuery × Find searchQuery x Find DOE R&D Acccomplishments Navigation dropdown arrow The Blog Archive SC Stories Snapshots R&D Nuggets Database dropdown arrow Search Tag Cloud Browse

  12. Ga2O3 photocatalyzed on-line tagging of cysteine to facilitate peptide mass fingerprinting.

    PubMed

    Qiao, Liang; Su, Fangzheng; Bi, Hongyan; Girault, Hubert H; Liu, Baohong

    2011-09-01

    β-Ga(2)O(3) is a wide-band-gap semiconductor having strong oxidation ability under light irradiation. Herein, the steel target plates modified with β-Ga(2)O(3) nanoparticles have been developed to carry out in-source photo-catalytic oxidative reactions for online peptide tagging during laser desorption/ionization mass spectrometry (LDI-MS) analysis. Under UV laser irradiation, β-Ga(2)O(3) can catalyze the photo-oxidation of 2-methoxyhydroquinone added to a sample mixture to 2-methoxy benzoquinone that can further react with the thiol groups of cysteine residues by Michael addition reaction. The tagging process leads to appearance of pairs of peaks with an m/z shift of 138.1Th. This online labelling strategy is demonstrated to be sensitive and efficient with a detection-limit at femtomole level. Using the strategy, the information on cysteine content in peptides can be obtained together with peptide mass, therefore constraining the database searching for an advanced identification of cysteine-containing proteins from protein mixtures. The current peptide online tagging method can be important for specific analysis of cysteine-containing proteins especially the low-abundant ones that cannot be completely isolated from other high-abundant non-cysteine-proteins. Copyright © 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  13. Transcriptome Sequencing of Hevea brasiliensis for Development of Microsatellite Markers and Construction of a Genetic Linkage Map

    PubMed Central

    Triwitayakorn, Kanokporn; Chatkulkawin, Pornsupa; Kanjanawattanawong, Supanath; Sraphet, Supajit; Yoocha, Thippawan; Sangsrakru, Duangjai; Chanprasert, Juntima; Ngamphiw, Chumpol; Jomchai, Nukoon; Therawattanasuk, Kanikar; Tangphatsornruang, Sithichoke

    2011-01-01

    To obtain more information on the Hevea brasiliensis genome, we sequenced the transcriptome from the vegetative shoot apex yielding 2 311 497 reads. Clustering and assembly of the reads produced a total of 113 313 unique sequences, comprising 28 387 isotigs and 84 926 singletons. Also, 17 819 expressed sequence tag (EST)-simple sequence repeats (SSRs) were identified from the data set. To demonstrate the use of this EST resource for marker development, primers were designed for 430 of the EST-SSRs. Three hundred and twenty-three primer pairs were amplifiable in H. brasiliensis clones. Polymorphic information content values of selected 47 SSRs among 20 H. brasiliensis clones ranged from 0.13 to 0.71, with an average of 0.51. A dendrogram of genetic similarities between the 20 H. brasiliensis clones using these 47 EST-SSRs suggested two distinct groups that correlated well with clone pedigree. These novel EST-SSRs together with the published SSRs were used for the construction of an integrated parental linkage map of H. brasiliensis based on 81 lines of an F1 mapping population. The map consisted of 97 loci, consisting of 37 novel EST-SSRs and 60 published SSRs, distributed on 23 linkage groups and covered 842.9 cM with a mean interval of 11.9 cM and ∼4 loci per linkage group. Although the numbers of linkage groups exceed the haploid number (18), but with several common markers between homologous linkage groups with the previous map indicated that the F1 map in this study is appropriate for further study in marker-assisted selection. PMID:22086998

  14. Evolution of the unspliced transcriptome.

    PubMed

    Engelhardt, Jan; Stadler, Peter F

    2015-08-20

    Despite their abundance, unspliced EST data have received little attention as a source of information on non-coding RNAs. Very little is know, therefore, about the genomic distribution of unspliced non-coding transcripts and their relationship with the much better studied regularly spliced products. In particular, their evolution has remained virtually unstudied. We systematically study the evidence on unspliced transcripts available in EST annotation tracks for human and mouse, comprising 104,980 and 66,109 unspliced EST clusters, respectively. Roughly one third of these are located totally inside introns of known genes (TINs) and another third overlaps exonic regions (PINs). Eleven percent are "intergenic", far away from any annotated gene. Direct evidence for the independent transcription of many PINs and TINs is obtained from CAGE tag and chromatin data. We predict more than 2000 3'UTR-associated RNA candidates for each human and mouse. Fifteen to twenty percent of the unspliced EST cluster are conserved between human and mouse. With the exception of TINs, the sequences of unspliced EST clusters evolve significantly slower than genomic background. Furthermore, like spliced lincRNAs, they show highly tissue-specific expression patterns. Unspliced long non-coding RNAs are an important, rapidly evolving, component of mammalian transcriptomes. Their analysis is complicated by their preferential association with complex transcribed loci that usually also harbor a plethora of spliced transcripts. Unspliced EST data, although typically disregarded in transcriptome analysis, can be used to gain insights into this rarely investigated transcriptome component. The frequently postulated connection between lack of splicing and nuclear retention and the surprising overlap of chromatin-associated transcripts suggests that this class of transcripts might be involved in chromatin organization and possibly other mechanisms of epigenetic control.

  15. In-silico mining, type and frequency analysis of genic microsatellites of finger millet (Eleusine coracana (L.) Gaertn.): a comparative genomic analysis of NBS-LRR regions of finger millet with rice.

    PubMed

    Kalyana Babu, B; Pandey, Dinesh; Agrawal, P K; Sood, Salej; Kumar, Anil

    2014-05-01

    In recent years, the increased availability of the DNA sequences has given the possibility to develop and explore the expressed sequence tags (ESTs) derived SSR markers. In the present study, a total of 1956 ESTs of finger millet were used to find the microsatellite type, distribution, frequency and developed a total of 545 primer pairs from the ESTs of finger millet. Thirty-two EST sequences had more than two microsatellites and 1357 sequences did not have any SSR repeats. The most frequent type of repeats was trimeric motif, however the second place was occupied by dimeric motif followed by tetra-, hexa- and penta repeat motifs. The most common dimer repeat motif was GA and in case of trimeric SSRs, it was CGG. The EST sequences of NBS-LRR region of finger millet and rice showed higher synteny and were found on nearly same positions on the rice chromosome map. A total of eight, out of 15 EST based SSR primers were polymorphic among the selected resistant and susceptible finger millet genotypes. The primer FMBLEST5 could able to differentiate them into resistant and susceptible genotypes. The alleles specific to the resistant and susceptible genotypes were sequenced using the ABI 3130XL genetic analyzer and found similarity to NBS-LRR regions of rice and finger millet and contained the characteristic kinase-2 and kinase 3a motifs of plant R-genes belonged to NBS-LRR region. The In-silico and comparative analysis showed that the genes responsible for blast resistance can be identified, mapped and further introgressed through molecular breeding approaches for enhancing the blast resistance in finger millet.

  16. Outlier Loci and Selection Signatures of Simple Sequence Repeats (SSRs) in Flax (Linum usitatissimum L.).

    PubMed

    Soto-Cerda, Braulio J; Cloutier, Sylvie

    2013-01-01

    Genomic microsatellites (gSSRs) and expressed sequence tag-derived SSRs (EST-SSRs) have gained wide application for elucidating genetic diversity and population structure in plants. Both marker systems are assumed to be selectively neutral when making demographic inferences, but this assumption is rarely tested. In this study, three neutrality tests were assessed for identifying outlier loci among 150 SSRs (85 gSSRs and 65 EST-SSRs) that likely influence estimates of population structure in three differentiated flax sub-populations ( F ST  = 0.19). Moreover, the utility of gSSRs, EST-SSRs, and the combined sets of SSRs was also evaluated in assessing genetic diversity and population structure in flax. Six outlier loci were identified by at least two neutrality tests showing footprints of balancing selection. After removing the outlier loci, the STRUCTURE analysis and the dendrogram topology of EST-SSRs improved. Conversely, gSSRs and combined SSRs results did not change significantly, possibly as a consequence of the higher number of neutral loci assessed. Taken together, the genetic structure analyses established the superiority of gSSRs to determine the genetic relationships among flax accessions, although the combined SSRs produced the best results. Genetic diversity parameters did not differ statistically ( P  > 0.05) between gSSRs and EST-SSRs, an observation partially explained by the similar number of repeat motifs. Our study provides new insights into the ability of gSSRs and EST-SSRs to measure genetic diversity and structure in flax and confirms the importance of testing for the occurrence of outlier loci to properly assess natural and breeding populations, particularly in studies considering only few loci.

  17. ScienceCentral: open access full-text archive of scientific journals based on Journal Article Tag Suite regardless of their languages.

    PubMed

    Huh, Sun

    2013-01-01

    ScienceCentral, a free or open access, full-text archive of scientific journal literature at the Korean Federation of Science and Technology Societies, was under test in September 2013. Since it is a Journal Article Tag Suite-based full text database, extensible markup language files of all languages can be presented, according to Unicode Transformation Format 8-bit encoding. It is comparable to PubMed Central: however, there are two distinct differences. First, its scope comprises all science fields; second, it accepts all language journals. Launching ScienceCentral is the first step for free access or open access academic scientific journals of all languages to leap to the world, including scientific journals from Croatia.

  18. Transcript Profile of the Response of Two Soybean Genotypes to Potassium Deficiency

    PubMed Central

    Hao, QingNan; Sha, AiHua; Shan, ZhiHui; Chen, LiMiao; Zhou, Rong; Zhi, HaiJian; Zhou, XinAn

    2012-01-01

    The macronutrient potassium (K) is essential to plant growth and development. Crop yield potential is often affected by lack of soluble K. The molecular regulation mechanism of physiological and biochemical responses to K starvation in soybean roots and shoots is not fully understood. In the present study, two soybean varieties were subjected to low-K stress conditions: a low-K-tolerant variety (You06-71) and a low-K-sensitive variety (HengChun04-11). Eight libraries were generated for analysis: 2 genotypes ×2 tissues (roots and shoots) ×2 time periods [short term (0.5 to 12 h) and long term (3 to 12 d)]. RNA derived from the roots and shoots of these two varieties across two periods (short term and long term) were sequenced and the transcriptomes were compared using high-throughput tag-sequencing. To this end, a large number of clean tags (tags used for analysis after removal of dirty tags) corresponding to distinct tags (all types of clean tags) were identified in eight libraries (L1, You06-71-root short term; L2, HengChun04-11-root short term; L3, You06-71-shoot short term; L4, HengChun04-11-shoot short term; L5, You06-71-root long term; L6, HengChun04-11-root long term; L7, You06-71-shoot long term; L8, HengChun04-11-shoot long term). All clean tags were mapped to the available soybean (Glycine max) transcript database (http://www.soybase.org). Many genes showed substantial differences in expression across the libraries. In total, 5,440 transcripts involved in 118 KEGG pathways were either up- or down-regulated. Fifteen genes were randomly selected and their expression levels were confirmed using quantitative RT-PCR. Our results provide preliminary information on the molecular mechanism of potassium absorption and transport under low-K stress conditions in different soybean tissues. PMID:22792192

  19. Transcriptome Analysis of Gene Expression during Chinese Water Chestnut Storage Organ Formation

    PubMed Central

    Chen, Sainan; Wang, Yan; Yu, Meizhen; Chen, Xuehao; Li, Liangjun; Yin, Jingjing

    2016-01-01

    The product organ (storage organ; corm) of the Chinese water chestnut has become a very popular food in Asian countries because of its unique nutritional value. Corm formation is a complex biological process, and extensive whole genome analysis of transcripts during corm development has not been carried out. In this study, four corm libraries at different developmental stages were constructed, and gene expression was identified using a high-throughput tag sequencing technique. Approximately 4.9 million tags were sequenced, and 4,371,386, 4,372,602, 4,782,494, and 5,276,540 clean tags, including 119,676, 110,701, 100,089, and 101,239 distinct tags, respectively, were obtained after removal of low-quality tags from each library. More than 39% of the distinct tags were unambiguous and could be mapped to reference genes, while 40% were unambiguous tag-mapped genes. After mapping their functions in existing databases, a total of 11,592, 10,949, 10,585, and 7,111 genes were annotated from the B1, B2, B3, and B4 libraries, respectively. Analysis of the differentially expressed genes (DEGs) in B1/B2, B2/B3, and B3/B4 libraries showed that most of the DEGs at the B1/B2 stages were involved in carbohydrate and hormone metabolism, while the majority of DEGs were involved in energy metabolism and carbohydrate metabolism at the B2/B3 and B3/B4 stages. All of the upregulated transcription factors and 9 important genes related to product organ formation in the above four stages were also identified. The expression changes of nine of the identified DEGs were validated using a quantitative PCR approach. This study provides a comprehensive understanding of gene expression during corm formation in the Chinese water chestnut. PMID:27716802

  20. Development of highly polymorphic EST-SSR markers and segregation in F₁ hybrid population of Vitis vinifera L.

    PubMed

    Kayesh, E; Zhang, Y Y; Liu, G S; Bilkish, N; Sun, X; Leng, X P; Fang, J G

    2013-09-23

    The objectives of this investigation were to develop and validate the expressed sequence tag (EST)-simple sequence repeat (SSR) markers from large EST sequences, and to study the segregation and distribution of SSRs within two grapevine parental lines. In total, 94 F₁ lines crossed between "Early Rose" and "Red Globe" were studied. Approximately 2100 EST-SSR sequences of Vitis vinifera L. were searched for SSRs and analyzed for the design of polymerase chain reaction (PCR) primers amplifying the SSR-rich regions. Trinucleotide repeats were found to be the most abundant, followed by other nucleotide repeats. A total of 182 SSR primer pairs were first developed for the study on the parental polymorphism. Among the 182 SSR primers, 142 primer pairs (78%) could amplify the anticipated PCR products, among which only 52 primer pairs (36.62%) showed polymorphism between the two parents. These polymorphic bands were further surveyed among the 94 F₁ lines, and the results showed that a total of 162 bands were amplified, and 98 of them were polymorphic in both parents (60.86% polymorphism), with an average of 1.88 polymorphic DNA bands for each primer pair. After testing with the chi-square test, 33 of the clearly amplified polymorphic bands followed a 3:1 ratio, and 37 followed a 1:1 ratio. The rest showed distorted segregation ratios.

  1. Transcriptome Analysis of the Scleractinian Coral Stylophora pistillata

    PubMed Central

    Salmon-Divon, Mali; Katzenellenbogen, Mark; Tambutté, Sylvie; Bertucci, Anthony; Hoegh-Guldberg, Ove; Deleury, Emeline; Allemand, Denis; Levy, Oren

    2014-01-01

    The principal architects of coral reefs are the scleractinian corals; these species are divided in two major clades referred to as “robust” and “complex” corals. Although the molecular diversity of the “complex” clade has received considerable attention, with several expressed sequence tag (EST) libraries and a complete genome sequence having been constructed, the “robust” corals have received far less attention, despite the fact that robust corals have been prominent focal points for ecological and physiological studies. Filling this gap affords important opportunities to extend these studies and to improve our understanding of the differences between the two major clades. Here, we present an EST library from Stylophora pistillata (Esper 1797) and systematically analyze the assembled transcripts compared to putative homologs from the complete proteomes of six well-characterized metazoans: Nematostella vectensis, Hydra magnipapillata, Caenorhabditis elegans, Drosophila melanogaster, Strongylocentrotus purpuratus, Ciona intestinalis and Homo sapiens. Furthermore, comparative analyses of the Stylophora pistillata ESTs were performed against several Cnidaria from the Scleractinia, Actiniaria and Hydrozoa, as well as against other stony corals separately. Functional characterization of S. pistillata transcripts into KOG/COG categories and further description of Wnt and bone morphogenetic protein (BMP) signaling pathways showed that the assembled EST library provides sufficient data and coverage. These features of this new library suggest considerable opportunities for extending our understanding of the molecular and physiological behavior of “robust” corals. PMID:24551124

  2. Gene identification and analysis of transcripts differentially regulated in fracture healing by EST sequencing in the domestic sheep.

    PubMed

    Hecht, Jochen; Kuhl, Heiner; Haas, Stefan A; Bauer, Sebastian; Poustka, Albert J; Lienau, Jasmin; Schell, Hanna; Stiege, Asita C; Seitz, Volkhard; Reinhardt, Richard; Duda, Georg N; Mundlos, Stefan; Robinson, Peter N

    2006-07-05

    The sheep is an important model animal for testing novel fracture treatments and other medical applications. Despite these medical uses and the well known economic and cultural importance of the sheep, relatively little research has been performed into sheep genetics, and DNA sequences are available for only a small number of sheep genes. In this work we have sequenced over 47 thousand expressed sequence tags (ESTs) from libraries developed from healing bone in a sheep model of fracture healing. These ESTs were clustered with the previously available 10 thousand sheep ESTs to a total of 19087 contigs with an average length of 603 nucleotides. We used the newly identified sequences to develop RT-PCR assays for 78 sheep genes and measured differential expression during the course of fracture healing between days 7 and 42 postfracture. All genes showed significant shifts at one or more time points. 23 of the genes were differentially expressed between postfracture days 7 and 10, which could reflect an important role for these genes for the initiation of osteogenesis. The sequences we have identified in this work are a valuable resource for future studies on musculoskeletal healing and regeneration using sheep and represent an important head-start for genomic sequencing projects for Ovis aries, with partial or complete sequences being made available for over 5,800 previously unsequenced sheep genes.

  3. Construction of an Integrated High Density Simple Sequence Repeat Linkage Map in Cultivated Strawberry (Fragaria × ananassa) and its Applicability

    PubMed Central

    Isobe, Sachiko N.; Hirakawa, Hideki; Sato, Shusei; Maeda, Fumi; Ishikawa, Masami; Mori, Toshiki; Yamamoto, Yuko; Shirasawa, Kenta; Kimura, Mitsuhiro; Fukami, Masanobu; Hashizume, Fujio; Tsuji, Tomoko; Sasamoto, Shigemi; Kato, Midori; Nanri, Keiko; Tsuruoka, Hisano; Minami, Chiharu; Takahashi, Chika; Wada, Tsuyuko; Ono, Akiko; Kawashima, Kumiko; Nakazaki, Naomi; Kishida, Yoshie; Kohara, Mitsuyo; Nakayama, Shinobu; Yamada, Manabu; Fujishiro, Tsunakazu; Watanabe, Akiko; Tabata, Satoshi

    2013-01-01

    The cultivated strawberry (Fragaria× ananassa) is an octoploid (2n = 8x = 56) of the Rosaceae family whose genomic architecture is still controversial. Several recent studies support the AAA′A′BBB′B′ model, but its complexity has hindered genetic and genomic analysis of this important crop. To overcome this difficulty and to assist genome-wide analysis of F. × ananassa, we constructed an integrated linkage map by organizing a total of 4474 of simple sequence repeat (SSR) markers collected from published Fragaria sequences, including 3746 SSR markers [Fragaria vesca expressed sequence tag (EST)-derived SSR markers] derived from F. vesca ESTs, 603 markers (F. × ananassa EST-derived SSR markers) from F. × ananassa ESTs, and 125 markers (F. × ananassa transcriptome-derived SSR markers) from F. × ananassa transcripts. Along with the previously published SSR markers, these markers were mapped onto five parent-specific linkage maps derived from three mapping populations, which were then assembled into an integrated linkage map. The constructed map consists of 1856 loci in 28 linkage groups (LGs) that total 2364.1 cM in length. Macrosynteny at the chromosome level was observed between the LGs of F. × ananassa and the genome of F. vesca. Variety distinction on 129 F. × ananassa lines was demonstrated using 45 selected SSR markers. PMID:23248204

  4. Partial DNA sequencing of Douglas-fir cDNAs used in RFLP mapping

    Treesearch

    K.D. Jermstad; D.L. Bassoni; C.S. Kinlaw; D.B. Neale

    1998-01-01

    DNA sequences from 87 Douglas-fir (Pseudotsuga menziesii [Mirb.] Franco) cDNA RFLP probes were determined. Sequences were submitted to the GenBank dbEST database and searched for similarity against nucleotide and protein databases using the BLASTn and BLASTx programs. Twenty-one sequences (24%) were assigned putative functions; 18 of which...

  5. Genetic diversity analysis in Malaysian giant prawns using expressed sequence tag microsatellite markers for stock improvement program.

    PubMed

    Atin, K H; Christianus, A; Fatin, N; Lutas, A C; Shabanimofrad, M; Subha, B

    2017-08-17

    The Malaysian giant prawn is among the most commonly cultured species of the genus Macrobrachium. Stocks of giant prawns from four rivers in Peninsular Malaysia have been used for aquaculture over the past 25 years, which has led to repeated harvesting, restocking, and transplantation between rivers. Consequently, a stock improvement program is now important to avoid the depletion of wild stocks and the loss of genetic diversity. However, the success of such an improvement program depends on our knowledge of the genetic variation of these base populations. The aim of the current study was to estimate genetic variation and differentiation of these riverine sources using novel expressed sequence tag-microsatellite (EST-SSR) markers, which not only are informative on genetic diversity but also provide information on immune and metabolic traits. Our findings indicated that the tested stocks have inbreeding depression due to a significant deficiency in heterozygotes, and F IS was estimated as 0.15538 to 0.31938. An F-statistics analysis suggested that the stocks are composed of one large panmictic population. Among the four locations, stocks from Johor, in the southern region of the peninsular, showed higher allelic and genetic diversity than the other stocks. To overcome inbreeding problems, the Johor population could be used as a base population in a stock improvement program by crossing to the other populations. The study demonstrated that EST-SSR markers can be incorporated in future marker assisted breeding to aid the proper management of the stocks by breeders and stakeholders in Malaysia.

  6. Generation of expressed sequence tags for discovery of genes responsible for floral traits of Chrysanthemum morifolium by next-generation sequencing technology.

    PubMed

    Sasaki, Katsutomo; Mitsuda, Nobutaka; Nashima, Kenji; Kishimoto, Kyutaro; Katayose, Yuichi; Kanamori, Hiroyuki; Ohmiya, Akemi

    2017-09-04

    Chrysanthemum morifolium is one of the most economically valuable ornamental plants worldwide. Chrysanthemum is an allohexaploid plant with a large genome that is commercially propagated by vegetative reproduction. New cultivars with different floral traits, such as color, morphology, and scent, have been generated mainly by classical cross-breeding and mutation breeding. However, only limited genetic resources and their genome information are available for the generation of new floral traits. To obtain useful information about molecular bases for floral traits of chrysanthemums, we read expressed sequence tags (ESTs) of chrysanthemums by high-throughput sequencing using the 454 pyrosequencing technology. We constructed normalized cDNA libraries, consisting of full-length, 3'-UTR, and 5'-UTR cDNAs derived from various tissues of chrysanthemums. These libraries produced a total number of 3,772,677 high-quality reads, which were assembled into 213,204 contigs. By comparing the data obtained with those of full genome-sequenced species, we confirmed that our chrysanthemum contig set contained the majority of all expressed genes, which was sufficient for further molecular analysis in chrysanthemums. We confirmed that our chrysanthemum EST set (contigs) contained a number of contigs that encoded transcription factors and enzymes involved in pigment and aroma compound metabolism that was comparable to that of other species. This information can serve as an informative resource for identifying genes involved in various biological processes in chrysanthemums. Moreover, the findings of our study will contribute to a better understanding of the floral characteristics of chrysanthemums including the myriad cultivars at the molecular level.

  7. Broad host range vectors for expression of proteins with (Twin-) Strep-tag, His-tag and engineered, export optimized yellow fluorescent protein

    PubMed Central

    2013-01-01

    Background In current protein research, a limitation still is the production of active recombinant proteins or native protein associations to assess their function. Especially the localization and analysis of protein-complexes or the identification of modifications and small molecule interaction partners by co-purification experiments requires a controllable expression of affinity- and/or fluorescence tagged variants of a protein of interest in its native cellular background. Advantages of periplasmic and/or homologous expressions can frequently not be realized due to a lack of suitable tools. Instead, experiments are often limited to the heterologous production in one of the few well established expression strains. Results Here, we introduce a series of new RK2 based broad host range expression plasmids for inducible production of affinity- and fluorescence tagged proteins in the cytoplasm and periplasm of a wide range of Gram negative hosts which are designed to match the recently suggested modular Standard European Vector Architecture and database. The vectors are equipped with a yellow fluorescent protein variant which is engineered to fold and brightly fluoresce in the bacterial periplasm following Sec-mediated export, as shown from fractionation and imaging studies. Expression of Strep-tag®II and Twin-Strep-tag® fusion proteins in Pseudomonas putida KT2440 is demonstrated for various ORFs. Conclusion The broad host range constructs we have produced enable good and controlled expression of affinity tagged protein variants for single-step purification and qualify for complex co-purification experiments. Periplasmic export variants enable production of affinity tagged proteins and generation of fusion proteins with a novel engineered Aequorea-based yellow fluorescent reporter protein variant with activity in the periplasm of the tested Gram-negative model bacteria Pseudomonas putida KT2440 and Escherichia coli K12 for production, localization or co-localization studies. In addition, the new tools facilitate metabolic engineering and yield assessment for cytoplasmic or periplasmic protein production in a number of different expression hosts when yields in one initially selected are insufficient. PMID:23687945

  8. Protein Information Resource: a community resource for expert annotation of protein data

    PubMed Central

    Barker, Winona C.; Garavelli, John S.; Hou, Zhenglin; Huang, Hongzhan; Ledley, Robert S.; McGarvey, Peter B.; Mewes, Hans-Werner; Orcutt, Bruce C.; Pfeiffer, Friedhelm; Tsugita, Akira; Vinayaka, C. R.; Xiao, Chunlin; Yeh, Lai-Su L.; Wu, Cathy

    2001-01-01

    The Protein Information Resource, in collaboration with the Munich Information Center for Protein Sequences (MIPS) and the Japan International Protein Information Database (JIPID), produces the most comprehensive and expertly annotated protein sequence database in the public domain, the PIR-International Protein Sequence Database. To provide timely and high quality annotation and promote database interoperability, the PIR-International employs rule-based and classification-driven procedures based on controlled vocabulary and standard nomenclature and includes status tags to distinguish experimentally determined from predicted protein features. The database contains about 200 000 non-redundant protein sequences, which are classified into families and superfamilies and their domains and motifs identified. Entries are extensively cross-referenced to other sequence, classification, genome, structure and activity databases. The PIR web site features search engines that use sequence similarity and database annotation to facilitate the analysis and functional identification of proteins. The PIR-Inter­national databases and search tools are accessible on the PIR web site at http://pir.georgetown.edu/ and at the MIPS web site at http://www.mips.biochem.mpg.de. The PIR-International Protein Sequence Database and other files are also available by FTP. PMID:11125041

  9. Identification of Candidate Genes Responsible for Stem Pith Production Using Expression Analysis in Solid-Stemmed Wheat.

    PubMed

    Oiestad, A J; Martin, J M; Cook, J; Varella, A C; Giroux, M J

    2017-07-01

    The wheat stem sawfly (WSS) is an economically important pest of wheat in the Northern Great Plains. The primary means of WSS control is resistance associated with the single quantitative trait locus (QTL) , which controls most stem solidness variation. The goal of this study was to identify stem solidness candidate genes via RNA-seq. This study made use of 28 single nucleotide polymorphism (SNP) makers derived from expressed sequence tags (ESTs) linked to contained within a 5.13 cM region. Allele specific expression of EST markers was examined in stem tissue for solid and hollow-stemmed pairs of two spring wheat near isogenic lines (NILs) differing for the QTL. Of the 28 ESTs, 13 were located within annotated genes and 10 had detectable stem expression. Annotated genes corresponding to four of the ESTs were differentially expressed between solid and hollow-stemmed NILs and represent possible stem solidness gene candidates. Further examination of the 5.13 cM region containing the 28 EST markers identified 260 annotated genes. Twenty of the 260 linked genes were up-regulated in hollow NIL stems, while only seven genes were up-regulated in solid NIL stems. An -methyltransferase within the region of interest was identified as a candidate based on differential expression between solid and hollow-stemmed NILs and putative function. Further study of these candidate genes may lead to the identification of the gene(s) controlling stem solidness and an increased ability to select for wheat stem solidness and manage WSS. Copyright © 2017 Crop Science Society of America.

  10. Determination of the genetic diversity of vegetable soybean [Glycine max (L.) Merr.] using EST-SSR markers*

    PubMed Central

    Zhang, Gu-wen; Xu, Sheng-chun; Mao, Wei-hua; Hu, Qi-zan; Gong, Ya-ming

    2013-01-01

    The development of expressed sequence tag-derived simple sequence repeats (EST-SSRs) provided a useful tool for investigating plant genetic diversity. In the present study, 22 polymorphic EST-SSRs from grain soybean were identified and used to assess the genetic diversity in 48 vegetable soybean accessions. Among the 22 EST-SSR loci, tri-nucleotides were the most abundant repeats, accounting for 50.00% of the total motifs. GAA was the most common motif among tri-nucleotide repeats, with a frequency of 18.18%. Polymorphic analysis identified a total of 71 alleles, with an average of 3.23 per locus. The polymorphism information content (PIC) values ranged from 0.144 to 0.630, with a mean of 0.386. Observed heterozygosity (H o) values varied from 0.0196 to 1.0000, with an average of 0.6092, while the expected heterozygosity (H e) values ranged from 0.1502 to 0.6840, with a mean value of 0.4616. Principal coordinate analysis and phylogenetic tree analysis indicated that the accessions could be assigned to different groups based to a large extent on their geographic distribution, and most accessions from China were clustered into the same groups. These results suggest that Chinese vegetable soybean accessions have a narrow genetic base. The results of this study indicate that EST-SSRs from grain soybean have high transferability to vegetable soybean, and that these new markers would be helpful in taxonomy, molecular breeding, and comparative mapping studies of vegetable soybean in the future. PMID:23549845

  11. High-resolution melt analysis to identify and map sequence-tagged site anchor points onto linkage maps: a white lupin (Lupinus albus) map as an exemplar.

    PubMed

    Croxford, Adam E; Rogers, Tom; Caligari, Peter D S; Wilkinson, Michael J

    2008-01-01

    * The provision of sequence-tagged site (STS) anchor points allows meaningful comparisons between mapping studies but can be a time-consuming process for nonmodel species or orphan crops. * Here, the first use of high-resolution melt analysis (HRM) to generate STS markers for use in linkage mapping is described. This strategy is rapid and low-cost, and circumvents the need for labelled primers or amplicon fractionation. * Using white lupin (Lupinus albus, x = 25) as a case study, HRM analysis was applied to identify 91 polymorphic markers from expressed sequence tag (EST)-derived and genomic libraries. Of these, 77 generated STS anchor points in the first fully resolved linkage map of the species. The map also included 230 amplified fragment length polymorphisms (AFLP) loci, spanned 1916 cM (84.2% coverage) and divided into the expected 25 linkage groups. * Quantitative trait loci (QTL) analyses performed on the population revealed genomic regions associated with several traits, including the agronomically important time to flowering (tf), alkaloid synthesis and stem height (Ph). Use of HRM-STS markers also allowed us to make direct comparisons between our map and that of the related crop, Lupinus angustifolius, based on the conversion of RFLP, microsatellite and single nucleotide polymorphism (SNP) markers into HRM markers.

  12. ESTimating plant phylogeny: lessons from partitioning

    PubMed Central

    de la Torre, Jose EB; Egan, Mary G; Katari, Manpreet S; Brenner, Eric D; Stevenson, Dennis W; Coruzzi, Gloria M; DeSalle, Rob

    2006-01-01

    Background While Expressed Sequence Tags (ESTs) have proven a viable and efficient way to sample genomes, particularly those for which whole-genome sequencing is impractical, phylogenetic analysis using ESTs remains difficult. Sequencing errors and orthology determination are the major problems when using ESTs as a source of characters for systematics. Here we develop methods to incorporate EST sequence information in a simultaneous analysis framework to address controversial phylogenetic questions regarding the relationships among the major groups of seed plants. We use an automated, phylogenetically derived approach to orthology determination called OrthologID generate a phylogeny based on 43 process partitions, many of which are derived from ESTs, and examine several measures of support to assess the utility of EST data for phylogenies. Results A maximum parsimony (MP) analysis resulted in a single tree with relatively high support at all nodes in the tree despite rampant conflict among trees generated from the separate analysis of individual partitions. In a comparison of broader-scale groupings based on cellular compartment (ie: chloroplast, mitochondrial or nuclear) or function, only the nuclear partition tree (based largely on EST data) was found to be topologically identical to the tree based on the simultaneous analysis of all data. Despite topological conflict among the broader-scale groupings examined, only the tree based on morphological data showed statistically significant differences. Conclusion Based on the amount of character support contributed by EST data which make up a majority of the nuclear data set, and the lack of conflict of the nuclear data set with the simultaneous analysis tree, we conclude that the inclusion of EST data does provide a viable and efficient approach to address phylogenetic questions within a parsimony framework on a genomic scale, if problems of orthology determination and potential sequencing errors can be overcome. In addition, approaches that examine conflict and support in a simultaneous analysis framework allow for a more precise understanding of the evolutionary history of individual process partitions and may be a novel way to understand functional aspects of different kinds of cellular classes of gene products. PMID:16776834

  13. Digitization and the Creation of Virtual Libraries: The Princeton University Image Card Catalog--Reaping the Benefits of Imaging.

    ERIC Educational Resources Information Center

    Henthorne, Eileen

    1995-01-01

    Describes a project at the Princeton University libraries that converted the pre-1981 public card catalog, using digital imaging and optical character recognition technology, to fully tagged and indexed records of text in MARC format that are available on an online database and will be added to the online catalog. (LRW)

  14. Ubiquitous-health (U-Health) monitoring systems for elders and caregivers

    NASA Astrophysics Data System (ADS)

    Moon, Gyu; Lim, Kyung-won; Yoo, Young-min; An, Hye-min; Lee, Ki Seop; Szu, Harold

    2011-06-01

    This paper presents two aordable low-tack system for household biomedical wellness monitoring. The rst system, JIKIMI (pronounced caregiver in Korean), is a remote monitoring system that analyzes the behavior patterns of elders that live alone. JIKIMI is composed of an in-house sensing system, a set of wireless sensor nodes containing a pyroelectric infrared sensor to detect the motion of elders, an emergency button and a magnetic sensor that detects the opening and closing of doors. The system is also equipped with a server system, which is comprised of a database and web server. The server provides the mechanism for web-based monitoring to caregivers. The second system, Reader of Bottle Information (ROBI), is an assistant system which advises the contents of bottles for elders. ROBI is composed of bottles that have connected RFID tags and an advice system, which is composed of a wireless RFID reader, a gateway and a remote database server. The RFID tags are connected to the caps of the bottles are used in conjunction with the advice system These systems have been in use for three years and have proven to be useful for caregivers to provide more ecient and eective care services.

  15. DOE Office of Scientific and Technical Information (OSTI.GOV)

    SacconePhD, Scott F; Chesler, Elissa J; Bierut, Laura J

    Commercial SNP microarrays now provide comprehensive and affordable coverage of the human genome. However, some diseases have biologically relevant genomic regions that may require additional coverage. Addiction, for example, is thought to be influenced by complex interactions among many relevant genes and pathways. We have assembled a list of 486 biologically relevant genes nominated by a panel of experts on addiction. We then added 424 genes that showed evidence of association with addiction phenotypes through mouse QTL mappings and gene co-expression analysis. We demonstrate that there are a substantial number of SNPs in these genes that are not well representedmore » by commercial SNP platforms. We address this problem by introducing a publicly available SNP database for addiction. The database is annotated using numeric prioritization scores indicating the extent of biological relevance. The scores incorporate a number of factors such as SNP/gene functional properties (including synonymy and promoter regions), data from mouse systems genetics and measures of human/mouse evolutionary conservation. We then used HapMap genotyping data to determine if a SNP is tagged by a commercial microarray through linkage disequilibrium. This combination of biological prioritization scores and LD tagging annotation will enable addiction researchers to supplement commercial SNP microarrays to ensure comprehensive coverage of biologically relevant regions.« less

  16. E-microsatellite markers for Centella asiatica (Gotu Kola) genome: validation and cross-transferability in Apiaceae family for plant omics research and development.

    PubMed

    Sahu, Jagajjit; Das Talukdar, Anupam; Devi, Kamalakshi; Choudhury, Manabendra Dutta; Barooah, Madhumita; Modi, Mahendra Kumar; Sen, Priyabrata

    2015-01-01

    Abstract Centella asiatica (Gotu Kola) is a plant that grows in tropical swampy regions of the world and has important medicinal and culinary use. It is often considered as part of Ayurvedic medicine, traditional African medicine, and traditional Chinese medicine. The unavailability of genomics resources is significantly impeding its genetic improvement. To date, no attempt has been made to develop Expressed Sequence Tags (ESTs) derived Simple Sequence Repeat (SSR) markers (eSSRs) from the Centella genome. Hence, the present study aimed to develop eSSRs and their further experimental validation and cross-transferability of these markers in different genera of the Apiaceae family to which Centella belongs. An in-house pipeline was developed for the entire analyses by combining bioinformatics tools and perl scripts. A total of 4443 C. asiatica EST sequences from dbEST were processed, which generated 2617 nonredundant high quality EST sequences consisting 441 contigs and 2176 singletons. Out of 1776.5 kb of examined sequences, 417 (15.9%) ESTs containing 686 SSRs were detected with a density of one SSR per 2.59 kb. The gene ontology study revealed 282 functional domains involved in various processes, components, and functions, out of which 64 ESTs were found to have both SSRs and functional domains. Out of 603 designed EST-SSR primers, 18 pairs of primers were selected for validation based on the optimum parameter value. Reproducible amplification was obtained for six primer pairs in C. asiatica that were further tested for cross-transferability in nine other important genera/species of the Apiaceae family. Cross-transferability of the EST-SSR markers among the species were examined and Centella javanica showed highest transferability (83.3%). The study revealed six highly polymorphic EST-SSR primers with an average PIC value of 0.95. In conclusion, these EST-SSR markers hold a big promise for the genomics analysis of Centella asiatica, to facilitate comparative map-based analyses across other related species within the Apiaceae family, and future marker-assisted breeding programs. To the best of our knowledge, this is the first report of development of EST-SSRs in Centella asiatica by in silico approaches, which offers a veritable potential in further use in plant omics research and development.

  17. A Solution on Identification and Rearing Files Insmallhold Pig Farming

    NASA Astrophysics Data System (ADS)

    Xiong, Benhai; Fu, Runting; Lin, Zhaohui; Luo, Qingyao; Yang, Liang

    In order to meet government supervision of pork production safety as well as consumeŕs right to know what they buy, this study adopts animal identification, mobile PDA reader, GPRS and other information technologies, and put forward a data collection method to set up rearing files of pig in smallhold pig farming, and designs related metadata structures and its mobile database, and develops a mobile PDA embedded system to collect individual information of pig and uploading into the remote central database, and finally realizes mobile links to the a specific website. The embedded PDA can identify both a special pig bar ear tag appointed by the Ministry of Agricultural and a general data matrix bar ear tag designed by this study by mobile reader, and can record all kinds of inputs data including bacterins, feed additives, animal drugs and even some forbidden medicines and submitted them to the center database through GPRS. At the same time, the remote center database can be maintained by mobile PDA and GPRS, and finally reached pork tracking from its origin to consumption and its tracing through turn-over direction. This study has suggested a feasible technology solution how to set up network pig electronic rearing files involved smallhold pig farming based on farmer and the solution is proved practical through its application in the Tianjińs pork quality traceability system construction. Although some individual techniques have some adverse effects on the system running such as GPRS transmitting speed now, these will be resolved with the development of communication technology. The full implementation of the solution around China will supply technical supports in guaranteeing the quality and safety of pork production supervision and meet consumer demand.

  18. Impact of medical director certification on nursing home quality of care.

    PubMed

    Rowland, Frederick N; Cowles, Mick; Dickstein, Craig; Katz, Paul R

    2009-07-01

    This study tests the research hypothesis that certified medical directors are able to use their training, education, and knowledge to positively influence quality of care in US nursing homes. F-tag numbers were identified within the State Operations Manual that reflect dimensions of quality thought to be impacted by the medical director. A weighting system was developed based on the "scope and severity" level at which the nursing homes were cited for these specific tag numbers. Then homes led by certified medical directors were compared with homes led by medical directors not known to be certified. DATA/PARTICIPANTS: Data were obtained from the Centers for Medicare & Medicaid Services' Online Survey Certification and Reporting database for nursing homes. Homes with a certified medical director (547) were identified from the database of the American Medical Directors Association. The national survey database was used to compute a "standardized quality score" (zero representing best possible score and 1.0 representing average score) for each home, and the homes with certified medical directors compared with the other homes in the database. Regression analysis was then used to attempt to identify the most important contributors to measured quality score differences between the homes. The standardized quality score of facilities with certified medical directors (n=547) was 0.8958 versus 1.0037 for facilities without certified medical directors (n=15,230) (lower number represents higher quality). When nursing facility characteristics were added to the regression equation, the presence of a certified medical director accounted for up to 15% improvement in quality. The presence of certified medical directors is an independent predictor of quality in US nursing homes.

  19. Integrating RFID technique to design mobile handheld inventory management system

    NASA Astrophysics Data System (ADS)

    Huang, Yo-Ping; Yen, Wei; Chen, Shih-Chung

    2008-04-01

    An RFID-based mobile handheld inventory management system is proposed in this paper. Differing from the manual inventory management method, the proposed system works on the personal digital assistant (PDA) with an RFID reader. The system identifies electronic tags on the properties and checks the property information in the back-end database server through a ubiquitous wireless network. The system also provides a set of functions to manage the back-end inventory database and assigns different levels of access privilege according to various user categories. In the back-end database server, to prevent improper or illegal accesses, the server not only stores the inventory database and user privilege information, but also keeps track of the user activities in the server including the login and logout time and location, the records of database accessing, and every modification of the tables. Some experimental results are presented to verify the applicability of the integrated RFID-based mobile handheld inventory management system.

  20. Identification of a pair of phospholipid:diacylglycerol acyltransferases from developing flax (Linum usitatissimum L.) seed catalyzing the selective production of trilinolenin.

    PubMed

    Pan, Xue; Siloto, Rodrigo M P; Wickramarathna, Aruna D; Mietkiewska, Elzbieta; Weselake, Randall J

    2013-08-16

    The oil from flax (Linum usitatissimum L.) has high amounts of α-linolenic acid (ALA; 18:3(cis)(Δ9,12,15)) and is one of the richest sources of omega-3 polyunsaturated fatty acids (ω-3-PUFAs). To produce ∼57% ALA in triacylglycerol (TAG), it is likely that flax contains enzymes that can efficiently transfer ALA to TAG. To test this hypothesis, we conducted a systematic characterization of TAG-synthesizing enzymes from flax. We identified several genes encoding acyl-CoA:diacylglycerol acyltransferases (DGATs) and phospholipid:diacylglycerol acyltransferases (PDATs) from the flax genome database. Due to recent genome duplication, duplicated gene pairs have been identified for all genes except DGAT2-2. Analysis of gene expression indicated that two DGAT1, two DGAT2, and four PDAT genes were preferentially expressed in flax embryos. Yeast functional analysis showed that DGAT1, DGAT2, and two PDAT enzymes restored TAG synthesis when produced recombinantly in yeast H1246 strain. The activity of particular PDAT enzymes (LuPDAT1 and LuPDAT2) was stimulated by the presence of ALA. Further seed-specific expression of flax genes in Arabidopsis thaliana indicated that DGAT1, PDAT1, and PDAT2 had significant effects on seed oil phenotype. Overall, this study indicated the existence of unique PDAT enzymes from flax that are able to preferentially catalyze the synthesis of TAG containing ALA acyl moieties. The identified LuPDATs may have practical applications for increasing the accumulation of ALA and other polyunsaturated fatty acids in oilseeds for food and industrial applications.

  1. Multiple kernels learning-based biological entity relationship extraction method.

    PubMed

    Dongliang, Xu; Jingchang, Pan; Bailing, Wang

    2017-09-20

    Automatic extracting protein entity interaction information from biomedical literature can help to build protein relation network and design new drugs. There are more than 20 million literature abstracts included in MEDLINE, which is the most authoritative textual database in the field of biomedicine, and follow an exponential growth over time. This frantic expansion of the biomedical literature can often be difficult to absorb or manually analyze. Thus efficient and automated search engines are necessary to efficiently explore the biomedical literature using text mining techniques. The P, R, and F value of tag graph method in Aimed corpus are 50.82, 69.76, and 58.61%, respectively. The P, R, and F value of tag graph kernel method in other four evaluation corpuses are 2-5% higher than that of all-paths graph kernel. And The P, R and F value of feature kernel and tag graph kernel fuse methods is 53.43, 71.62 and 61.30%, respectively. The P, R and F value of feature kernel and tag graph kernel fuse methods is 55.47, 70.29 and 60.37%, respectively. It indicated that the performance of the two kinds of kernel fusion methods is better than that of simple kernel. In comparison with the all-paths graph kernel method, the tag graph kernel method is superior in terms of overall performance. Experiments show that the performance of the multi-kernels method is better than that of the three separate single-kernel method and the dual-mutually fused kernel method used hereof in five corpus sets.

  2. Transcriptome analysis in Concholepas concholepas (Gastropoda, Muricidae): mining and characterization of new genomic and molecular markers.

    PubMed

    Cárdenas, Leyla; Sánchez, Roland; Gomez, Daniela; Fuenzalida, Gonzalo; Gallardo-Escárate, Cristián; Tanguy, Arnaud

    2011-09-01

    The marine gastropod Concholepas concholepas, locally known as the "loco", is the main target species of the benthonic Chilean fisheries. Genetic and genomic tools are necessary to study the genome of this species in order to understand the molecular basis of its development, growth, and other key traits to improve the management strategies and to identify local adaptation to prevent loss of biodiversity. Here, we use pyrosequencing technologies to generate the first transcriptomic database from adult specimens of the loco. After trimming, a total of 140,756 Expressed Sequence Tag sequences were achieved. Clustering and assembly analysis identified 19,219 contigs and 105,435 singleton sequences. BlastN analysis showed a significant identity with Expressed Sequence Tags of different gastropod species available in public databases. Similarly, BlastX results showed that only 895 out of the total 124,654 had significant hits and may represent novel genes for marine gastropods. From this database, simple sequence repeat motifs were also identified and a total of 38 primer pairs were designed and tested to assess their potential as informative markers and to investigate their cross-species amplification in different related gastropod species. This dataset represents the first publicly available 454 data for a marine gastropod endemic to the southeastern Pacific coast, providing a valuable transcriptomic resource for future efforts of gene discovery and development of functional markers in other marine gastropods. Copyright © 2011 Elsevier B.V. All rights reserved.

  3. PExFInS: An Integrative Post-GWAS Explorer for Functional Indels and SNPs

    PubMed Central

    Cheng, Zhongshan; Chu, Hin; Fan, Yanhui; Li, Cun; Song, You-Qiang; Zhou, Jie; Yuen, Kwok-Yung

    2015-01-01

    Expression quantitative trait loci (eQTLs) mapping and linkage disequilibrium (LD) analysis have been widely employed to interpret findings of genome-wide association studies (GWAS). With the availability of deep sequencing data of 423 lymphoblastoid cell lines (LCLs) from six global populations and the microarray expression data, we performed eQTL analysis, identified more than 228 K SNP cis-eQTLs and 21 K indel cis-eQTLs and generated a LCL cis-eQTL database. We demonstrate that the percentages of population-shared and population-specific cis-eQTLs are comparable; while indel cis-eQTLs in the population-specific subsection make more contribution to gene expression variations than those in the population-shared subsection. We found cis-eQTLs, especially the population-shared cis-eQTLs are significantly enriched toward transcription start site. Moreover, the National Human Genome Research Institute cataloged GWAS SNPs are enriched for LCL cis-eQTLs. Specifically, 32.8% GWAS SNPs are LCL cis-eQTLs, among which 12.5% can be tagged by indel cis-eQTLs, suggesting the fundamental contribution of indel cis-eQTLs to GWAS association signals. To search for functional indels and SNPs tagging GWAS SNPs, a pipeline Post-GWAS Explorer for Functional Indels and SNPs (PExFInS) has been developed, integrating LD analysis, functional annotation from public databases, cis-eQTL mapping with our LCL cis-eQTL database and other published cis-eQTL datasets. PMID:26612672

  4. Bioinformatics and expressional analysis of cDNA clones from floral buds

    NASA Astrophysics Data System (ADS)

    Pawełkowicz, Magdalena Ewa; Skarzyńska, Agnieszka; Cebula, Justyna; Hincha, Dirck; ZiÄ bska, Karolina; PlÄ der, Wojciech; Przybecki, Zbigniew

    2017-08-01

    The application of genomic approaches may serve as an initial step in understanding the complexity of biochemical network and cellular processes responsible for regulation and execution of many developmental tasks. The molecular mechanism of sex expression in cucumber is still not elucidated. A study of differential expression was conducted to identify genes involved in sex determination and floral organ morphogenesis. Herein, we present generation of expression sequence tags (EST) obtained by differential hybridization (DH) and subtraction technique (cDNA-DSC) and their characteristic features such as molecular function, involvement in biology processes, expression and mapping position on the genome.

  5. Development, deployment and operations of ATLAS databases

    NASA Astrophysics Data System (ADS)

    Vaniachine, A. V.; Schmitt, J. G. v. d.

    2008-07-01

    In preparation for ATLAS data taking, a coordinated shift from development towards operations has occurred in ATLAS database activities. In addition to development and commissioning activities in databases, ATLAS is active in the development and deployment (in collaboration with the WLCG 3D project) of the tools that allow the worldwide distribution and installation of databases and related datasets, as well as the actual operation of this system on ATLAS multi-grid infrastructure. We describe development and commissioning of major ATLAS database applications for online and offline. We present the first scalability test results and ramp-up schedule over the initial LHC years of operations towards the nominal year of ATLAS running, when the database storage volumes are expected to reach 6.1 TB for the Tag DB and 1.0 TB for the Conditions DB. ATLAS database applications require robust operational infrastructure for data replication between online and offline at Tier-0, and for the distribution of the offline data to Tier-1 and Tier-2 computing centers. We describe ATLAS experience with Oracle Streams and other technologies for coordinated replication of databases in the framework of the WLCG 3D services.

  6. ScienceCentral: open access full-text archive of scientific journals based on Journal Article Tag Suite regardless of their languages

    PubMed Central

    Huh, Sun

    2013-01-01

    ScienceCentral, a free or open access, full-text archive of scientific journal literature at the Korean Federation of Science and Technology Societies, was under test in September 2013. Since it is a Journal Article Tag Suite-based full text database, extensible markup language files of all languages can be presented, according to Unicode Transformation Format 8-bit encoding. It is comparable to PubMed Central: however, there are two distinct differences. First, its scope comprises all science fields; second, it accepts all language journals. Launching ScienceCentral is the first step for free access or open access academic scientific journals of all languages to leap to the world, including scientific journals from Croatia. PMID:24266292

  7. Building Extraction Based on Openstreetmap Tags and Very High Spatial Resolution Image in Urban Area

    NASA Astrophysics Data System (ADS)

    Kang, L.; Wang, Q.; Yan, H. W.

    2018-04-01

    How to derive contour of buildings from VHR images is the essential problem for automatic building extraction in urban area. To solve this problem, OSM data is introduced to offer vector contour information of buildings which is hard to get from VHR images. First, we import OSM data into database. The line string data of OSM with tags of building, amenity, office etc. are selected and combined into completed contours; Second, the accuracy of contours of buildings is confirmed by comparing with the real buildings in Google Earth; Third, maximum likelihood classification is conducted with the confirmed building contours, and the result demonstrates that the proposed approach is effective and accurate. The approach offers a new way for automatic interpretation of VHR images.

  8. The alternative oxidase family of Vitis vinifera reveals an attractive model to study the importance of genomic design.

    PubMed

    Costa, José Hélio; de Melo, Dirce Fernandes; Gouveia, Zélia; Cardoso, Hélia Guerra; Peixe, Augusto; Arnholdt-Schmitt, Birgit

    2009-12-01

    'Genomic design' refers to the structural organization of gene sequences. Recently, the role of intron sequences for gene regulation is being better understood. Further, introns possess high rates of polymorphism that are considered as the major source for speciation. In molecular breeding, the length of gene-specific introns is recognized as a tool to discriminate genotypes with diverse traits of agronomic interest. 'Economy selection' and 'time-economy selection' have been proposed as models for explaining why highly expressed genes typically contain small introns. However, in contrast to these theories, plant-specific selection reveals that highly expressed genes contain introns that are large. In the presented research, 'wet'Aox gene identification from grapevine is advanced by a bioinformatics approach to study the species-specific organization of Aox gene structures in relation to available expressed sequence tag (EST) data. Two Aox1 and one Aox2 gene sequences have been identified in Vitis vinifera using grapevine cultivars from Portugal and Germany. Searching the complete genome sequence data of two grapevine cultivars confirmed that V. vinifera alternative oxidase (Aox) is encoded by a small multigene family composed of Aox1a, Aox1b and Aox2. An analysis of EST distribution revealed high expression of the VvAox2 gene. A relationship between the atypical long primary transcript of VvAox2 (in comparison to other plant Aox genes) and its expression level is suggested. V. vinifera Aox genes contain four exons interrupted by three introns except for Aox1a which contains an additional intron in the 3'-UTR. The lengths of primary Aox transcripts were estimated for each gene in two V. vinifera varieties: PN40024 and Pinot Noir. In both varieties, Aox1a and Aox1b contained small introns that corresponded to primary transcript lengths ranging from 1501 to 1810 bp. The Aox2 of PN40024 (12 329 bp) was longer than that from Pinot Noir (7279 bp) because of selection against a transposable-element insertion that is 5028 bp in size. An EST database basic local alignment search tool (BLAST) search of GenBank revealed the following ESTs percentages for each gene: Aox1a (26.2%), Aox1b (11.9%) and Aox2 (61.9%). Aox1a was expressed in fruits and roots, Aox1b expression was confined to flowers and Aox2 was ubiquitously expressed. These data for V. vinifera show that atypically long Aox intron lengths are related to high levels of gene expression. Furthermore, it is shown for the first time that two grapevine cultivars can be distinguished by Aox intron length polymorphism.

  9. The inclusion of an online journal in PubMed central - a difficult path.

    PubMed

    Grech, Victor

    2016-01-01

    The indexing of a journal in a prominent database (such as PubMed) is an important imprimatur. Journals accepted for inclusion in PubMed Central (PMC) are automatically indexed in PubMed but must provide the entire contents of their publications as XML-tagged (Extensible Markup Language) data files compliant with PubMed's document type definition (DTD). This paper describes the various attempts that the journal Images in Paediatric Cardiology made in its efforts to convert the journal contents (including all of the extant backlog) to PMC-compliant XML for archiving and indexing in PubMed after the journal was accepted for inclusion by the database.

  10. Generation and analysis of expressed sequence tags in the extreme large genomes Lilium and Tulipa.

    PubMed

    Shahin, Arwa; van Kaauwen, Martijn; Esselink, Danny; Bargsten, Joachim W; van Tuyl, Jaap M; Visser, Richard G F; Arens, Paul

    2012-11-20

    Bulbous flowers such as lily and tulip (Liliaceae family) are monocot perennial herbs that are economically very important ornamental plants worldwide. However, there are hardly any genetic studies performed and genomic resources are lacking. To build genomic resources and develop tools to speed up the breeding in both crops, next generation sequencing was implemented. We sequenced and assembled transcriptomes of four lily and five tulip genotypes using 454 pyro-sequencing technology. Successfully, we developed the first set of 81,791 contigs with an average length of 514 bp for tulip, and enriched the very limited number of 3,329 available ESTs (Expressed Sequence Tags) for lily with 52,172 contigs with an average length of 555 bp. The contigs together with singletons covered on average 37% of lily and 39% of tulip estimated transcriptome. Mining lily and tulip sequence data for SSRs (Simple Sequence Repeats) showed that di-nucleotide repeats were twice more abundant in UTRs (UnTranslated Regions) compared to coding regions, while tri-nucleotide repeats were equally spread over coding and UTR regions. Two sets of single nucleotide polymorphism (SNP) markers suitable for high throughput genotyping were developed. In the first set, no SNPs flanking the target SNP (50 bp on either side) were allowed. In the second set, one SNP in the flanking regions was allowed, which resulted in a 2 to 3 fold increase in SNP marker numbers compared with the first set. Orthologous groups between the two flower bulbs: lily and tulip (12,017 groups) and among the three monocot species: lily, tulip, and rice (6,900 groups) were determined using OrthoMCL. Orthologous groups were screened for common SNP markers and EST-SSRs to study synteny between lily and tulip, which resulted in 113 common SNP markers and 292 common EST-SSR. Lily and tulip contigs generated were annotated and described according to Gene Ontology terminology. Two transcriptome sets were built that are valuable resources for marker development, comparative genomic studies and candidate gene approaches. Next generation sequencing of leaf transcriptome is very effective; however, deeper sequencing and using more tissues and stages is advisable for extended comparative studies.

  11. Piloting a livestock identification and traceability system in the northern Tanzania-Narok-Nairobi trade route.

    PubMed

    Mutua, Florence; Kihara, Absolomon; Rogena, Jason; Ngwili, Nicholas; Aboge, Gabriel; Wabacha, James; Bett, Bernard

    2018-02-01

    We designed and piloted a livestock identification and traceability system (LITS) along the Northern Tanzania-Narok-Nairobi beef value chain. Animals were randomly selected and identified at the primary markets using uniquely coded ear tags. Data on identification, ownership, source (village), and the site of recruitment (primary market) were collected and posted to an online database. Similar data were collected in all the markets where tagged animals passed through until they got to defined slaughterhouses. Meat samples were collected during slaughter and later analyzed for tetracycline and diminazene residues using high-performance liquid chromatography (HPLC). Follow up surveys were done to assess the pilot system. The database captured a total of 4260 records from 741 cattle. Cattle recruited in the primary markets in Narok (n = 1698) either came from farms (43.8%), local markets (37.7%), or from markets in Tanzania (18.5%). Soit Sambu market was the main source of animals entering the market from Tanzania (54%; n = 370). Most tagged cattle (72%, n = 197) were slaughtered at the Ewaso Ng'iro slaughterhouse in Narok. Lesions observed (5%; n = 192) were related to either hydatidosis or fascioliasis. The mean diminazene aceturate residue level was 320.78 ± 193.48 ppb. We used the traceability system to identify sources of animals with observable high drug residue levels in tissues. Based on the findings from this study, we discuss opportunities for LITS-as a tool for surveillance for both animal health and food safety, and outline challenges of its deployment in a local beef value chain-such as limited incentives for uptake.

  12. Lipopolysaccharide-induced innate immune factors in the bottlenose dolphin (Tursiops truncatus) detected in expression sequence tag analysis.

    PubMed

    Ohishi, Kazue; Shishido, Reiko; Iwata, Yasunao; Saitoh, Masafumi; Takenaka, Ryota; Ohtsu, Dai; Okutsu, Kenji; Maruyama, Tadashi

    2011-11-01

    EST analysis based on the megaclone-megasorting method was performed using leukocytes from the bottlenose dolphin (Tursiops truncatus) with or without LPS stimulation. A total of 849 upregulated and 384 downregulated EST clones were sequenced, annotated, and functionally classified. Ferritin heavy peptide I was the most abundant upregulated transcript, suggesting that LPS stimulation induced high production of reactive oxygen species, which were sequestered in ferritin. Among the immune factors, the transcripts coding for an IL-1Ra, homologs to bovine serum amyloid A3, and canine intercellular adhesion molecule-1 were highly expressed. Markedly downregulated transcripts of immune factors were those for homologs of calcium-binding proteins belonging to the S100 family, S100A12, S100A8, and S100A6. Time-course experiments on the expression of some immune factors including IL-1Ra suggested that these factors interact and control cetacean innate immunity. © 2011 The Societies and Blackwell Publishing Asia Pty Ltd.

  13. Differential Gene Expression from Midguts of Refractory and Susceptible Lines of the Mosquito, Aedes aegypti, Infected with Dengue-2 Virus

    PubMed Central

    Barón, Olga L.; Ursic-Bedoya, Raul J.; Lowenberger, Carl A.; Ocampo, Clara B.

    2010-01-01

    Suppressive subtractive hybridization was used to evaluate the differential expression of midgut genes of feral populations of Aedes aegypti (Diptera: Culicidae) from Colombia that are naturally refractory or susceptible to Dengue-2 virus infection. A total of 165 differentially expressed sequence tags (ESTs) were identified in the subtracted libraries. The analysis showed a higher number of differentially expressed genes in the susceptible Ae. aegypti individuals than the refractory mosquitoes. The functional annotation of ESTs revealed a broad response in the susceptible library that included immune molecules, metabolic molecules and transcription factors. In the refractory strain, there was the presence of a trypsin inhibitor gene, which could play a role in the infection. These results serve as a template for more detailed studies aiming to characterize the genetic components of refractoriness, which in turn can be used to devise new approaches to combat transmission of dengue fever. PMID:20572793

  14. From biomedicine to natural history research: EST resources for ambystomatid salamanders

    PubMed Central

    Putta, Srikrishna; Smith, Jeramiah J; Walker, John A; Rondet, Mathieu; Weisrock, David W; Monaghan, James; Samuels, Amy K; Kump, Kevin; King, David C; Maness, Nicholas J; Habermann, Bianca; Tanaka, Elly; Bryant, Susan V; Gardiner, David M; Parichy, David M; Voss, S Randal

    2004-01-01

    Background Establishing genomic resources for closely related species will provide comparative insights that are crucial for understanding diversity and variability at multiple levels of biological organization. We developed ESTs for Mexican axolotl (Ambystoma mexicanum) and Eastern tiger salamander (A. tigrinum tigrinum), species with deep and diverse research histories. Results Approximately 40,000 quality cDNA sequences were isolated for these species from various tissues, including regenerating limb and tail. These sequences and an existing set of 16,030 cDNA sequences for A. mexicanum were processed to yield 35,413 and 20,599 high quality ESTs for A. mexicanum and A. t. tigrinum, respectively. Because the A. t. tigrinum ESTs were obtained primarily from a normalized library, an approximately equal number of contigs were obtained for each species, with 21,091 unique contigs identified overall. The 10,592 contigs that showed significant similarity to sequences from the human RefSeq database reflected a diverse array of molecular functions and biological processes, with many corresponding to genes expressed during spinal cord injury in rat and fin regeneration in zebrafish. To demonstrate the utility of these EST resources, we searched databases to identify probes for regeneration research, characterized intra- and interspecific nucleotide polymorphism, saturated a human – Ambystoma synteny group with marker loci, and extended PCR primer sets designed for A. mexicanum / A. t. tigrinum orthologues to a related tiger salamander species. Conclusions Our study highlights the value of developing resources in traditional model systems where the likelihood of information transfer to multiple, closely related taxa is high, thus simultaneously enabling both laboratory and natural history research. PMID:15310388

  15. Cell-free translational screening of an expression sequence tag library of Clonorchis sinensis for novel antigen discovery.

    PubMed

    Kasi, Devi; Catherine, Christy; Lee, Seung-Won; Lee, Kyung-Ho; Kim, Yu Jung; Ro Lee, Myeong; Ju, Jung Won; Kim, Dong-Myung

    2017-05-01

    The rapidly evolving cloning and sequencing technologies have enabled understanding of genomic structure of parasite genomes, opening up new ways of combatting parasite-related diseases. To make the most of the exponentially accumulating genomic data, however, it is crucial to analyze the proteins encoded by these genomic sequences. In this study, we adopted an engineered cell-free protein synthesis system for large-scale expression screening of an expression sequence tag (EST) library of Clonorchis sinensis to identify potential antigens that can be used for diagnosis and treatment of clonorchiasis. To allow high-throughput expression and identification of individual genes comprising the library, a cell-free synthesis reaction was designed such that both the template DNA and the expressed proteins were co-immobilized on the same microbeads, leading to microbead-based linkage of the genotype and phenotype. This reaction configuration allowed streamlined expression, recovery, and analysis of proteins. This approach enabled us to identify 21 antigenic proteins. © 2017 American Institute of Chemical Engineers Biotechnol. Prog., 33:832-837, 2017. © 2017 American Institute of Chemical Engineers.

  16. In silico search, characterization and validation of new EST-SSR markers in the genus Prunus.

    PubMed

    Sorkheh, Karim; Prudencio, Angela S; Ghebinejad, Azim; Dehkordi, Mehrana Kohei; Erogul, Deniz; Rubio, Manuel; Martínez-Gómez, Pedro

    2016-07-07

    Simple sequence repeats (SSRs) are defined as sequence repeat units between 1 and 6 bp that occur in both coding and non-coding regions abundant in eukaryotic genomes, which may affect the expression of genes. In this study, expressed sequence tags (ESTs) of eight Prunus species were analyzed for in silico mining of EST-SSRs, protein annotation, and open reading frames (ORFs), and the identification of codon repetitions. A total of 316 SSRs were identified using MISA software. Dinucleotide SSR motifs (26.31 %) were found to be the most abundant type of repeats, followed by tri- (14.58 %), tetra- (0.53 %), and penta- (0.27 %) nucleotide motifs. An attempt was made to design primer pairs for 316 identified SSRs but these were successful for only 175 SSR sequences. The positions of SSRs with respect to ORFs were detected, and annotation of sequences containing SSRs was performed to assign function to each sequence. SSRs were also characterized (in terms of position in the reference genome and associated gene) using the two available Prunus reference genomes (mei and peach). Finally, 38 SSR markers were validated across peach, almond, plum, and apricot genotypes. This validation showed a higher transferability level of EST-SSR developed in P. mume (mei) in comparison with the rest of species analyzed. Findings will aid analysis of functionally important molecular markers and facilitate the analysis of genetic diversity.

  17. Exploiting proteomic data for genome annotation and gene model validation in Aspergillus niger.

    PubMed

    Wright, James C; Sugden, Deana; Francis-McIntyre, Sue; Riba-Garcia, Isabel; Gaskell, Simon J; Grigoriev, Igor V; Baker, Scott E; Beynon, Robert J; Hubbard, Simon J

    2009-02-04

    Proteomic data is a potentially rich, but arguably unexploited, data source for genome annotation. Peptide identifications from tandem mass spectrometry provide prima facie evidence for gene predictions and can discriminate over a set of candidate gene models. Here we apply this to the recently sequenced Aspergillus niger fungal genome from the Joint Genome Institutes (JGI) and another predicted protein set from another A.niger sequence. Tandem mass spectra (MS/MS) were acquired from 1d gel electrophoresis bands and searched against all available gene models using Average Peptide Scoring (APS) and reverse database searching to produce confident identifications at an acceptable false discovery rate (FDR). 405 identified peptide sequences were mapped to 214 different A.niger genomic loci to which 4093 predicted gene models clustered, 2872 of which contained the mapped peptides. Interestingly, 13 (6%) of these loci either had no preferred predicted gene model or the genome annotators' chosen "best" model for that genomic locus was not found to be the most parsimonious match to the identified peptides. The peptides identified also boosted confidence in predicted gene structures spanning 54 introns from different gene models. This work highlights the potential of integrating experimental proteomics data into genomic annotation pipelines much as expressed sequence tag (EST) data has been. A comparison of the published genome from another strain of A.niger sequenced by DSM showed that a number of the gene models or proteins with proteomics evidence did not occur in both genomes, further highlighting the utility of the method.

  18. Cloning and analysis of DnaJ family members in the silkworm, Bombyx mori.

    PubMed

    Li, Yinü; Bu, Cuiyu; Li, Tiantian; Wang, Shibao; Jiang, Feng; Yi, Yongzhu; Yang, Huipeng; Zhang, Zhifang

    2016-01-15

    Heat shock proteins (Hsps) are involved in a variety of critical biological functions, including protein folding, degradation, and translocation and macromolecule assembly, act as molecular chaperones during periods of stress by binding to other proteins. Using expressed sequence tag (EST) and silkworm (Bombyx mori) transcriptome databases, we identified 27 cDNA sequences encoding the conserved J domain, which is found in DnaJ-type Hsps. Of the 27 J domain-containing sequences, 25 were complete cDNA sequences. We divided them into three types according to the number and presence of conserved domains. By analyzing the gene structures, intron numbers, and conserved domains and constructing a phylogenetic tree, we found that the DnaJ family had undergone convergent evolution, obtaining new domains to expand the diversity of its family members. The acquisition of the new DnaJ domains most likely occurred prior to the evolutionary divergence of prokaryotes and eukaryotes. The expression of DnaJ genes in the silkworm was generally higher in the fat body. The tissue distribution of DnaJ1 proteins was detected by western blotting, demonstrating that in the fifth-instar larvae, the DnaJ1 proteins were expressed at their highest levels in hemocytes, followed by the fat body and head. We also found that the DnaJ1 transcripts were likely differentially translated in different tissues. Using immunofluorescence cytochemistry, we revealed that in the blood cells, DnaJ1 was mainly localized in the cytoplasm. Copyright © 2015 Elsevier B.V. All rights reserved.

  19. Proteome Analysis of Watery Saliva Secreted by Green Rice Leafhopper, Nephotettix cincticeps

    PubMed Central

    Hattori, Makoto; Komatsu, Setsuko; Noda, Hiroaki; Matsumoto, Yukiko

    2015-01-01

    The green rice leafhopper, Nephotettix cincticeps, is a vascular bundle feeder that discharges watery and gelling saliva during the feeding process. To understand the potential functions of saliva for successful and safe feeding on host plants, we analyzed the complexity of proteinaceous components in the watery saliva of N. cincticeps. Salivary proteins were collected from a sucrose diet that adult leafhoppers had fed on through a membrane of stretched parafilm. Protein concentrates were separated using SDS-PAGE under reducing and non-reducing conditions. Six proteins were identified by a gas-phase protein sequencer and two proteins were identified using LC-MS/MS analysis with reference to expressed sequence tag (EST) databases of this species. Full -length cDNAs encoding these major proteins were obtained by rapid amplification of cDNA ends-PCR (RACE-PCR) and degenerate PCR. Furthermore, gel-free proteome analysis that was performed to cover the broad range of salivary proteins with reference to the latest RNA-sequencing data from the salivary gland of N. cincticeps, yielded 63 additional protein species. Out of 71 novel proteins identified from the watery saliva, about 60 % of those were enzymes or other functional proteins, including GH5 cellulase, transferrin, carbonic anhydrases, aminopeptidase, regucalcin, and apolipoprotein. The remaining proteins appeared to be unique and species- specific. This is the first study to identify and characterize the proteins in watery saliva of Auchenorrhyncha species, especially sheath-producing, vascular bundle-feeders. PMID:25909947

  20. Global characterization of Artemisia annua glandular trichome transcriptome using 454 pyrosequencing

    PubMed Central

    Wang, Wei; Wang, Yejun; Zhang, Qing; Qi, Yan; Guo, Dianjing

    2009-01-01

    Background Glandular trichomes produce a wide variety of commercially important secondary metabolites in many plant species. The most prominent anti-malarial drug artemisinin, a sesquiterpene lactone, is produced in glandular trichomes of Artemisia annua. However, only limited genomic information is currently available in this non-model plant species. Results We present a global characterization of A. annua glandular trichome transcriptome using 454 pyrosequencing. Sequencing runs using two normalized cDNA collections from glandular trichomes yielded 406,044 expressed sequence tags (average length = 210 nucleotides), which assembled into 42,678 contigs and 147,699 singletons. Performing a second sequencing run only increased the number of genes identified by ~30%, indicating that massively parallel pyrosequencing provides deep coverage of the A. annua trichome transcriptome. By BLAST search against the NCBI non-redundant protein database, putative functions were assigned to over 28,573 unigenes, including previously undescribed enzymes likely involved in sesquiterpene biosynthesis. Comparison with ESTs derived from trichome collections of other plant species revealed expressed genes in common functional categories across different plant species. RT-PCR analysis confirmed the expression of selected unigenes and novel transcripts in A. annua glandular trichomes. Conclusion The presence of contigs corresponding to enzymes for terpenoids and flavonoids biosynthesis suggests important metabolic activity in A. annua glandular trichomes. Our comprehensive survey of genes expressed in glandular trichome will facilitate new gene discovery and shed light on the regulatory mechanism of artemisinin metabolism and trichome function in A. annua. PMID:19818120

Top