Science.gov

Sample records for sequence tags analysis

  1. Expressed sequence tags: analysis and annotation.

    PubMed

    Parkinson, John; Blaxter, Mark

    2004-01-01

    Expressed sequence tags (ESTs) present a special set of problems for bioinformatic analysis. They are partial and error-prone, and large datasets can have significant internal redundancy. To facilitate analysis of small EST datasets from in-house projects, we present an integrated "pipeline" of tools that take EST data from sequence trace to database submission. These tools also can be used to provide clustering of ESTs into putative genes and to annotate these genes with preliminary sequence similarity searches. The systems are written to use the public-domain LINUX environment and other openly available analytical tools. PMID:15153624

  2. Analysis of expressed sequence tags from the Ulva prolifera (Chlorophyta)

    NASA Astrophysics Data System (ADS)

    Niu, Jianfeng; Hu, Haiyan; Hu, Songnian; Wang, Guangce; Peng, Guang; Sun, Song

    2010-01-01

    In 2008, a green tide broke out before the sailing competition of the 29th Olympic Games in Qingdao. The causative species was determined to be Enteromorpha prolifera (Ulva prolifera O. F. Müller), a familiar green macroalga along the coastline of China. Rapid accumulation of a large biomass of floating U. prolifera prompted research on different aspects of this species. In this study, we constructed a nonnormalized cDNA library from the thalli of U. prolifera and acquired 10 072 high-quality expressed sequence tags (ESTs). These ESTs were assembled into 3 519 nonredundant gene groups, including 1 446 clusters and 2 073 singletons. After annotation with the nr database, a large number of genes were found to be related to chloroplast and ribosomal proteins. GO functional classification showed that 1 418 ESTs participated in photosynthesis and 1 359 ESTs were responsible for the generation of precursor metabolites and energy. In addition, rather comprehensive carbon fixation pathways were found in U. prolifera using KEGG. Some stress-related and signal transduction-related genes were also found in this study. All the evidence showed that U. prolifera has the substance and energy foundation for intense photosynthesis and rapid proliferation. Phylogenetic analysis of cytochrome c oxidase subunit I revealed that this green-tide causative species is most closely affiliated to Pseudendoclonium akinetum (Ulvophyceae).

  3. Expressed sequence tag analysis in tef (Eragrostis tef (Zucc) Trotter).

    PubMed

    Yu, Ju-Kyung; Sun, Qi; Rota, Mauricio La; Edwards, Hugh; Tefera, Hailu; Sorrells, Mark E

    2006-04-01

    Tef (Eragrostis tef (Zucc.) Trotter) is the most important cereal crop in Ethiopia; however, there is very little DNA sequence information available for this species. Expressed sequence tags (ESTs) were generated from 4 cDNA libraries: seedling leaf, seedling root, and inflorescence of E. tef and seedling leaf of Eragrostis pilosa, a wild relative of E. tef. Clustering of 3603 sequences produced 530 clusters and 1890 singletons, resulting in 2420 tef unigenes. Approximately 3/4 of tef unigenes matched protein or nucleotide sequences in public databases. Annotation of unigenes associated 68% of the putative tef genes with gene ontology categories. Identification of the translated unigenes for conserved protein domains revealed 389 protein family domains (Pfam), the most frequent of which was protein kinase. A total of 170 ESTs containing simple sequence repeats (EST-SSRs) were identified and 80 EST-SSR markers were developed. In addition, 19 single-nucleotide polymorphism (SNP) and (or) insertion-deletion (indel) and 34 intron fragment length polymorphism (IFLP) markers were developed. The EST database and molecular markers generated in this study will be valuable resources for further tef genetic research. PMID:16699556
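The record above reports ESTs containing simple sequence repeats (EST-SSRs) but does not say how they were detected. The following is a minimal sketch of one common approach, a regular-expression scan for tandemly repeated motifs. The function name `find_ssrs` and the thresholds (at least 6 dinucleotide or 5 trinucleotide repeats) are illustrative assumptions, not the authors' criteria.

```python
import re

def find_ssrs(seq, min_repeats=None):
    """Return sorted (start, motif, repeat_count) tuples for perfect tandem repeats.

    min_repeats maps motif length to the minimum number of repeats.
    The default thresholds are assumptions chosen for illustration.
    """
    if min_repeats is None:
        min_repeats = {2: 6, 3: 5}
    seq = seq.upper()
    hits = []
    for size, n in sorted(min_repeats.items()):
        # Group 2 is the motif; \2 must follow at least n-1 more times.
        pattern = re.compile(r"(([ACGT]{%d})\2{%d,})" % (size, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(2)
            # Skip motifs that are repeats of a shorter unit, e.g. "AA" or "TTT".
            if any(motif == motif[:k] * (size // k)
                   for k in range(1, size) if size % k == 0):
                continue
            hits.append((m.start(), motif, len(m.group(1)) // size))
    return sorted(hits)

# Toy sequence: an (AG)7 repeat and a (CAT)5 repeat separated by spacers.
example = "GGC" + "AG" * 7 + "TTC" + "CAT" * 5 + "GA"
print(find_ssrs(example))  # [(3, 'AG', 7), (20, 'CAT', 5)]
```

Start positions are 0-based. Dedicated SSR tools usually also handle longer motifs and imperfect repeats, which this sketch deliberately leaves out.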

  4. Analysis of the dermatophyte Trichophyton rubrum expressed sequence tags

    PubMed Central

    Wang, Lingling; Ma, Li; Leng, Wenchuan; Liu, Tao; Yu, Lu; Yang, Jian; Yang, Li; Zhang, Wenliang; Zhang, Qian; Dong, Jie; Xue, Ying; Zhu, Yafang; Xu, Xingye; Wan, Zhe; Ding, Guohui; Yu, Fudong; Tu, Kang; Li, Yixue; Li, Ruoyu; Shen, Yan; Jin, Qi

    2006-01-01

    Background Dermatophytes are the primary causative agent of dermatophytoses, a disease that affects billions of individuals worldwide. Trichophyton rubrum is the most common of the superficial fungi. Although T. rubrum is a recognized pathogen for humans, little is known about how its transcriptional pattern is related to development of the fungus and establishment of disease. It is therefore necessary to identify genes whose expression is relevant to growth, metabolism and virulence of T. rubrum. Results We generated 10 cDNA libraries covering nearly the entire growth phase and used them to isolate 11,085 unique expressed sequence tags (ESTs), including 3,816 contigs and 7,269 singletons. Comparisons with the GenBank non-redundant (NR) protein database revealed putative functions or matched homologs from other organisms for 7,764 (70%) of the ESTs. The remaining 3,321 (30%) of ESTs were only weakly similar or not similar to known sequences, suggesting that these ESTs represent novel genes. Conclusion The present data provide a comprehensive view of fungal physiological processes including metabolism, sexual and asexual growth cycles, signal transduction and pathogenic mechanisms. PMID:17032460
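The counts in the record above can be cross-checked with simple arithmetic: contigs plus singletons should equal the number of unique ESTs, and the annotated and unannotated sets should add up to the same total. A small sketch of that check follows; the helper name `summarize_clustering` is hypothetical.

```python
def summarize_clustering(contigs, singletons, annotated):
    """Derive totals and the annotated percentage from reported clustering counts."""
    unique = contigs + singletons
    return {
        "unique": unique,
        "annotated_pct": round(100 * annotated / unique),
        "unannotated": unique - annotated,
    }

# Figures reported for the T. rubrum EST set.
print(summarize_clustering(contigs=3816, singletons=7269, annotated=7764))
# {'unique': 11085, 'annotated_pct': 70, 'unannotated': 3321}
```

The derived values match the 11,085 unique ESTs, the 70% with putative functions, and the 3,321 remaining ESTs stated in the abstract.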

  5. Analysis of expressed sequence tags (ESTs) from Agrostis species obtained using sequence related amplified polymorphism.

    PubMed

    Dinler, Gizem; Budak, Hikmet

    2008-10-01

    Bentgrass (Agrostis spp.), a genus of the Poaceae family, consists of more than 200 species and is mainly used in athletic fields and golf courses. Creeping bentgrass (A. stolonifera L.) is the most commonly used species in maintaining golf courses, followed by colonial bentgrass (A. capillaris L.) and velvet bentgrass (A. canina L.). The presence and nature of sequence related amplified polymorphism (SRAP) at the cDNA level were investigated. We isolated 80 unique cDNA fragment bands from these species using 56 SRAP primer combinations. Sequence analysis of cDNA clones and analysis of putative translation products revealed that some encoded amino acid sequences were similar to proteins involved in DNA synthesis, transcription, and signal transduction. The cytosolic glyceraldehyde-3-phosphate dehydrogenase (GAPDH) gene (GenBank accession no. EB812822) was also identified from velvet bentgrass, and the corresponding protein sequence is further analyzed due to its critical role in many cellular processes. The partial peptide sequence obtained was 112 amino acids long, presenting a high degree of homology to parts of the N-terminal and C-terminal regions of cytosolic phosphorylating GAPDH (GapC). The existence of common expressed sequence tags (ESTs) revealed by a minimum evolutionary dendrogram among the Agrostis ESTs indicated the usefulness of SRAP for comparative genome analysis of transcribed genes in the grass species. PMID:18726683

  6. Expressed sequence tags: an overview.

    PubMed

    Parkinson, John; Blaxter, Mark

    2009-01-01

    Expressed sequence tags (ESTs) are fragments of mRNA sequences derived through single sequencing reactions performed on randomly selected clones from cDNA libraries. To date, over 45 million ESTs have been generated from over 1400 different species of eukaryotes. For the most part, EST projects are used to either complement existing genome projects or serve as low-cost alternatives for purposes of gene discovery. However, with improvements in accuracy and coverage, they are beginning to find application in fields such as phylogenetics, transcript profiling and proteomics. This volume provides practical details on the generation and analysis of ESTs. Chapters are presented which cover creation of cDNA libraries; generation and processing of sequence data; bioinformatics analysis of ESTs; and their application to phylogenetics and transcript profiling. PMID:19277571

  7. Sequencing, analysis, and annotation of expressed sequence tags for Camelus dromedarius.

    PubMed

    Al-Swailem, Abdulaziz M; Shehata, Maher M; Abu-Duhier, Faisel M; Al-Yamani, Essam J; Al-Busadah, Khalid A; Al-Arawi, Mohammed S; Al-Khider, Ali Y; Al-Muhaimeed, Abdullah N; Al-Qahtani, Fahad H; Manee, Manee M; Al-Shomrani, Badr M; Al-Qhtani, Saad M; Al-Harthi, Amer S; Akdemir, Kadir C; Inan, Mehmet S; Otu, Hasan H

    2010-01-01

    Despite its economical, cultural, and biological importance, there has not been a large scale sequencing project to date for Camelus dromedarius. With the goal of sequencing complete DNA of the organism, we first established and sequenced camel EST libraries, generating 70,272 reads. Following trimming, chimera check, repeat masking, cluster and assembly, we obtained 23,602 putative gene sequences, out of which over 4,500 potentially novel or fast evolving gene sequences do not carry any homology to other available genomes. Functional annotation of sequences with similarities in nucleotide and protein databases has been obtained using Gene Ontology classification. Comparison to available full length cDNA sequences and Open Reading Frame (ORF) analysis of camel sequences that exhibit homology to known genes show more than 80% of the contigs with an ORF>300 bp and approximately 40% hits extending to the start codons of full length cDNAs suggesting successful characterization of camel genes. Similarity analyses are done separately for different organisms including human, mouse, bovine, and rat. Accompanying web portal, CAGBASE (http://camel.kacst.edu.sa/), hosts a relational database containing annotated EST sequences and analysis tools with possibility to add sequences from public domain. We anticipate our results to provide a home base for genomic studies of camel and other comparative studies enabling a starting point for whole genome sequencing of the organism. PMID:20502665
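The abstract above counts contigs carrying an open reading frame (ORF) longer than 300 bp without describing the ORF caller. As a rough illustration only, the sketch below finds the longest ATG-to-stop ORF in the three forward frames. The function name `longest_orf`, the restriction to the forward strand, and the requirement for a start codon are all simplifying assumptions. ESTs are often partial, so real pipelines may also accept ORFs without a start codon and scan the reverse strand.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def longest_orf(seq):
    """Length in bp of the longest ATG..stop ORF on the forward strand.

    The returned length includes both the start and the stop codon.
    Returns 0 if no complete ORF is found.
    """
    seq = seq.upper()
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon of a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None  # look for the next ORF in this frame
    return best

# Toy contig: ATG, 100 alanine codons, then a stop codon, in frame 2.
contig = "CC" + "ATG" + "GCT" * 100 + "TAA" + "G"
print(longest_orf(contig))        # 306
print(longest_orf(contig) > 300)  # True
```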

  8. Sequencing, Analysis, and Annotation of Expressed Sequence Tags for Camelus dromedarius

    PubMed Central

    Al-Swailem, Abdulaziz M.; Shehata, Maher M.; Abu-Duhier, Faisel M.; Al-Yamani, Essam J.; Al-Busadah, Khalid A.; Al-Arawi, Mohammed S.; Al-Khider, Ali Y.; Al-Muhaimeed, Abdullah N.; Al-Qahtani, Fahad H.; Manee, Manee M.; Al-Shomrani, Badr M.; Al-Qhtani, Saad M.; Al-Harthi, Amer S.; Akdemir, Kadir C.; Otu, Hasan H.

    2010-01-01

    Despite its economical, cultural, and biological importance, there has not been a large scale sequencing project to date for Camelus dromedarius. With the goal of sequencing complete DNA of the organism, we first established and sequenced camel EST libraries, generating 70,272 reads. Following trimming, chimera check, repeat masking, cluster and assembly, we obtained 23,602 putative gene sequences, out of which over 4,500 potentially novel or fast evolving gene sequences do not carry any homology to other available genomes. Functional annotation of sequences with similarities in nucleotide and protein databases has been obtained using Gene Ontology classification. Comparison to available full length cDNA sequences and Open Reading Frame (ORF) analysis of camel sequences that exhibit homology to known genes show more than 80% of the contigs with an ORF>300 bp and ∼40% hits extending to the start codons of full length cDNAs suggesting successful characterization of camel genes. Similarity analyses are done separately for different organisms including human, mouse, bovine, and rat. Accompanying web portal, CAGBASE (http://camel.kacst.edu.sa/), hosts a relational database containing annotated EST sequences and analysis tools with possibility to add sequences from public domain. We anticipate our results to provide a home base for genomic studies of camel and other comparative studies enabling a starting point for whole genome sequencing of the organism. PMID:20502665

  9. Generation and analysis of expressed sequence tags from the ciliate protozoan parasite Ichthyophthirius multifiliis

    PubMed Central

    Abernathy, Jason W; Xu, Peng; Li, Ping; Xu, De-Hai; Kucuktas, Huseyin; Klesius, Phillip; Arias, Covadonga; Liu, Zhanjiang

    2007-01-01

    Background The ciliate protozoan Ichthyophthirius multifiliis (Ich) is an important parasite of freshwater fish that causes 'white spot disease' leading to significant losses. A genomic resource for large-scale studies of this parasite has been lacking. To study gene expression involved in Ich pathogenesis and virulence, our goal was to generate expressed sequence tags (ESTs) for the development of a powerful microarray platform for the analysis of global gene expression in this species. Here, we initiated a project to sequence and analyze over 10,000 ESTs. Results We sequenced 10,368 EST clones using a normalized cDNA library made from pooled samples of the trophont, tomont, and theront life-cycle stages, and generated 9,769 sequences (94.2% success rate). Post-sequencing processing led to 8,432 high quality sequences. Clustering analysis of these ESTs allowed identification of 4,706 unique sequences containing 976 contigs and 3,730 singletons. These unique sequences represent over two million base pairs (~10% of Plasmodium falciparum genome, a phylogenetically related protozoan). BLASTX searches produced 2,518 significant (E-value < 10^-5) hits and further Gene Ontology (GO) analysis annotated 1,008 of these genes. The ESTs were analyzed comparatively against the genomes of the related protozoa Tetrahymena thermophila and P. falciparum, allowing putative identification of additional genes. All the EST sequences were deposited by dbEST in GenBank (GenBank: EG957858–EG966289). Gene discovery and annotations are presented and discussed. Conclusion This set of ESTs represents a significant proportion of the Ich transcriptome, and provides a material basis for the development of microarrays useful for gene expression studies concerning Ich development, pathogenesis, and virulence. PMID:17577414
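The significance cutoff quoted above (E-value below 1e-5) is typically applied when post-processing BLAST results. The sketch below assumes the standard 12-column BLAST+ tabular output, in which the query ID is the first field and the E-value the eleventh. The abstract does not state which output format the authors used, and the function name and sequence IDs here are hypothetical.

```python
def significant_queries(tabular_lines, max_evalue=1e-5):
    """Return the set of query IDs with at least one hit below max_evalue.

    Expects tab-separated lines with the E-value in column 11
    (index 10). Blank lines and '#' comment lines are skipped.
    """
    queries = set()
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if float(fields[10]) < max_evalue:
            queries.add(fields[0])
    return queries

# Hypothetical hits: est_001 passes the cutoff, est_002 does not.
rows = [
    "est_001\tprotA\t78.5\t120\t25\t1\t3\t362\t10\t129\t2e-30\t118",
    "est_002\tprotB\t35.0\t60\t39\t0\t1\t180\t5\t64\t0.003\t34.1",
    "est_001\tprotC\t40.2\t90\t50\t2\t10\t279\t1\t90\t1e-4\t45.0",
]
print(sorted(significant_queries(rows)))  # ['est_001']
```

Counting queries rather than individual hits corresponds to reporting how many ESTs have at least one significant match.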

  10. Cloning, analysis and functional annotation of expressed sequence tags from the Earthworm Eisenia fetida

    PubMed Central

    Pirooznia, Mehdi; Gong, Ping; Guan, Xin; Inouye, Laura S; Yang, Kuan; Perkins, Edward J; Deng, Youping

    2007-01-01

    Background Eisenia fetida, commonly known as red wiggler or compost worm, belongs to the Lumbricidae family of the Annelida phylum. Little is known about its genome sequence although it has been extensively used as a test organism in terrestrial ecotoxicology. In order to understand its gene expression response to environmental contaminants, we cloned 4032 cDNAs or expressed sequence tags (ESTs) from two E. fetida libraries enriched with genes responsive to ten ordnance related compounds using suppressive subtractive hybridization-PCR. Results A total of 3144 good quality ESTs (GenBank dbEST accession numbers EH669363–EH672369 and EL515444–EL515580) were obtained from the raw clone sequences after cleaning. Clustering analysis yielded 2231 unique sequences including 448 contigs (from 1361 ESTs) and 1783 singletons. Comparative genomic analysis showed that 743 or 33% of the unique sequences shared high similarity with existing genes in the GenBank nr database. Provisional function annotation assigned 830 Gene Ontology terms to 517 unique sequences based on their homology with the annotated genomes of four model organisms Drosophila melanogaster, Mus musculus, Saccharomyces cerevisiae, and Caenorhabditis elegans. Seven percent of the unique sequences were further mapped to 99 Kyoto Encyclopedia of Genes and Genomes pathways based on their matching Enzyme Commission numbers. All the information is stored in, and retrievable from, a high-performance, web-based, user-friendly relational database called EST model database or ESTMD version 2. Conclusion The ESTMD containing the sequence and annotation information of 4032 E. fetida ESTs is publicly accessible at . PMID:18047730

  11. Analysis of early hepatic stage schistosomula gene expression by subtractive expressed sequence tags library.

    PubMed

    Wang, Xinzhi; Gobert, Geoffrey N; Feng, XinGang; Fu, Zhiqiang; Jin, Yamei; Peng, Jinbiao; Lin, Jiaojiao

    2009-07-01

    Schistosome parasites have a complex lifecycle requiring two hosts and aquatic phases of development. The schistosomula is a key phase of parasite development within the mammalian host; however, relatively little is understood about the molecular processes underlying this stage. In this study, 5723 subtractive expressed sequence tags (ESTs) were randomly selected from a 7-day hepatic schistosomula-enriched library constructed using the suppression subtractive hybridization method. Sequence analysis of these ESTs identified 1762 unique genes (contigs). Among them, 989 contigs were annotated with known genes, 311 contigs were homologous to established genes, 101 contigs were similar to established genes, 72 contigs were weakly similar to established genes and 289 sequences did not match any published sequences. Genes related to metabolism, cellular development, immune evasion and host-parasite interactions were identified as enriched in the hepatic schistosomula stage. Poorly annotated but stage-specific genes identified in future work may represent new drug or vaccine targets for the control of schistosomiasis. PMID:19428674

  12. Expressed sequence tag analysis of the emu (Dromaius novaehollandiae) pituitary by 454 GS Junior pyrosequencing.

    PubMed

    Kim, Ji Eun; Leung, Frederick C; Jiang, Jingwei; Kwok, Amy H Y; Bennett, Darin C; Cheng, Kimberly M

    2013-01-01

    Emus (Dromaius novaehollandiae) are farmed for their oil for pharmaceutical and cosmetic uses. This emu pituitary expressed sequence tag study was undertaken to identify novel transcripts in the emu pituitary to propel their identification and functional studies. By mapping reads derived from the Roche 454 GS Junior pyrosequencer to 8 reference species (human, mouse, chicken, zebra finch, fruit fly, turkey, round worm, and Carolina anole lizard) from the UniGene database, a total of 81,788 reads (53,312 mapped reads) were obtained and assembled with Reference Sequence (RefSeq). We annotated 6,676 potential emu genes by referencing 7 species (excluding lizard) and identified 1,232 potential genes common among 3 species (human, mouse, and chicken) with complete available reference genomes. Gene Ontology analysis revealed 376 Gene Ontology terms showing, with the highest counts, their involvements in biological processes, metabolism, and cellular components. These potential genes were detected to associate with 20 pathways including mitogen-activated protein kinase, insulin, neurotrophin signaling pathways, and carbohydrate digestion and absorption pathway. We also revealed a panel of tissue-specific genes including regulator of G-protein signaling protein (RGS), glucagon-like peptide receptor (GLPR), and growth hormone-inducible transmembrane protein (GHITM). Additionally, fatty acid binding protein (FABP), fatty acid desaturase (FAS), and stearoyl-coenzyme A desaturase (SCD), key enzyme genes in fat metabolism, were found to be also expressed in emu pituitary. This expressed sequence tag study represents the first step in functional characterization of emu pituitary gene expression and SNP identification for the improvement of fat production in the emu. PMID:23243234

  13. Comparative gene expression in the symbiotic and aposymbiotic Aiptasia pulchella by expressed sequence tag analysis.

    PubMed

    Kuo, Jimmy; Chen, Ming-Chyuan; Lin, Chorng-Horng; Fang, Lee-Shing

    2004-05-21

    Intracellular symbiotic relationships are prevalent between cnidarians, such as corals and sea anemones, and their photosynthetic dinoflagellate symbionts. However, there is little understanding of how genes are expressed when the symbiotic relationship is established. To characterize genes involved in this association, the endosymbiosis between the sea anemone Aiptasia pulchella and the dinoflagellate zooxanthellae Symbiodinium spp. was employed as a model. Two complementary DNA (cDNA) libraries were constructed from RNA isolated from symbiotic and aposymbiotic A. pulchella. Using single-pass sequencing of cDNA clones, a total of 870 expressed sequence tag (EST) clones were generated from the two libraries: 474 from the symbiotic animal and 396 from the aposymbiotic animal. The initial ESTs consisted of 143 clusters and 231 singletons. A BLASTX search revealed that 147 unique genes had similarities with protein sequences available from databases; 120 of these clones were categorized according to their putative function. However, many ESTs could not be assigned a function. The putative roles of some of the identified genes relative to endosymbiosis are discussed. This is the first report of the use of EST analysis to examine gene expression in symbiotic and aposymbiotic states of cnidarians. The systematic analysis of ESTs from this study provides a useful database for future investigations of the molecular mechanisms involved in algal-cnidarian symbiosis. PMID:15110770

  14. Genome-wide analysis of immune system genes by expressed sequence Tag profiling.

    PubMed

    Giallourakis, Cosmas C; Benita, Yair; Molinie, Benoit; Cao, Zhifang; Despo, Orion; Pratt, Henry E; Zukerberg, Lawrence R; Daly, Mark J; Rioux, John D; Xavier, Ramnik J

    2013-06-01

    Profiling studies of mRNA and microRNA, particularly microarray-based studies, have been extensively used to create compendia of genes that are preferentially expressed in the immune system. In some instances, functional studies have been subsequently pursued. Recent efforts such as the Encyclopedia of DNA Elements have demonstrated the benefit of coupling RNA sequencing analysis with information from expressed sequence tags (ESTs) for transcriptomic analysis. However, the full characterization and identification of transcripts that function as modulators of human immune responses remains incomplete. In this study, we demonstrate that an integrated analysis of human ESTs provides a robust platform to identify the immune transcriptome. Beyond recovering a reference set of immune-enriched genes and providing large-scale cross-validation of previous microarray studies, we discovered hundreds of novel genes preferentially expressed in the immune system, including noncoding RNAs. As a result, we have established the Immunogene database, representing an integrated EST road map of gene expression in human immune cells, which can be used to further investigate the function of coding and noncoding genes in the immune system. Using this approach, we have uncovered a unique metabolic gene signature of human macrophages and identified PRDM15 as a novel overexpressed gene in human lymphomas. Thus, we demonstrate the utility of EST profiling as a basis for further deconstruction of physiologic and pathologic immune processes. PMID:23616578

  15. Expressed sequence tag analysis of functional genes associated with adventitious rooting in Liriodendron hybrids.

    PubMed

    Zhong, Y D; Sun, X Y; Liu, E Y; Li, Y Q; Gao, Z; Yu, F X

    2016-01-01

    Liriodendron hybrids (Liriodendron chinense x L. tulipifera) are important landscaping and afforestation hardwood trees. To date, little genomic research on adventitious rooting has been reported in these hybrids, as well as in the genus Liriodendron. In the present study, we used adventitious roots to construct the first cDNA library for Liriodendron hybrids. A total of 5176 expressed sequence tags (ESTs) were generated and clustered into 2921 unigenes. Among these unigenes, 2547 had significant homology to the non-redundant protein database representing a wide variety of putative functions. Homologs of these genes regulated many aspects of adventitious rooting, including those for auxin signal transduction and root hair development. Results of quantitative real-time polymerase chain reaction showed that AUX1, IRE, and FB1 were highly expressed in adventitious roots and the expression of AUX1, ARF1, NAC1, RHD1, and IRE increased during the development of adventitious roots. Additionally, 181 simple sequence repeats were identified from 166 ESTs and more than 91.16% of these were dinucleotide and trinucleotide repeats. To the best of our knowledge, the present study reports the identification of the genes associated with adventitious rooting in the genus Liriodendron for the first time and provides a valuable resource for future genomic studies. Expression analysis of selected genes could allow us to identify regulatory genes that may be essential for adventitious rooting. PMID:27420958

  16. Expressed sequence tag analysis of the erythrocytic stage of Plasmodium berghei.

    PubMed

    Seok, Ji-Woong; Lee, Yong-Seok; Moon, Eun-Kyung; Lee, Jung-Yub; Jha, Bijay Kumar; Kong, Hyun-Hee; Chung, Dong-Il; Hong, Yeonchul

    2011-09-01

    Rodent malaria parasites, such as Plasmodium berghei, are practical and useful model organisms for human malaria research because of their similarities to human malaria parasites in terms of structure, physiology, and life cycle. Exploiting the available genetic sequence information, we constructed a cDNA library from the erythrocytic stages of P. berghei and analyzed the expressed sequence tags (ESTs). A total of 10,040 ESTs were generated and assembled into 2,462 clusters. These EST clusters were compared against public protein databases and 48 putative new transcripts, most of which were hypothetical proteins with unknown function, were identified. Genes encoding ribosomal or membrane proteins and purine nucleotide phosphorylases were highly abundant clusters in P. berghei. Protein domain analyses and the Gene Ontology functional categorization revealed translation/protein folding, metabolism, protein degradation, and multiple families of variant antigens to be the most prevalent categories. The ESTs collected here and their bioinformatic analysis will be useful resources for identifying drug targets and vaccine candidates and for validating gene predictions of P. berghei. PMID:22072821

  17. Transcriptome analysis of expressed sequence tags from the venom glands of the fish Thalassophryne nattereri.

    PubMed

    Magalhães, G S; Junqueira-de-Azevedo, I L M; Lopes-Ferreira, M; Lorenzini, D M; Ho, P L; Moura-da-Silva, A M

    2006-06-01

    Thalassophryne nattereri (niquim) is a venomous fish found on the northern and northeastern coasts of Brazil. Every year, hundreds of humans are affected by the poison, which causes excruciating local pain, edema, and necrosis, and can lead to permanent disabilities. In experimental models, T. nattereri venom induces edema and nociception, which are correlated to human symptoms and dependent on venom kininogenase activity; myotoxicity; impairment of blood flow; platelet lysis and cytotoxicity on endothelial cells. These effects were observed with minute amounts of venom. To characterize the primary structure of T. nattereri venom toxins, a list of transcripts within the venom gland was made using the expressed sequence tag (EST) strategy. Here we report the analysis of 775 ESTs that were obtained from a directional cDNA library of T. nattereri venom gland. Of these ESTs, 527 (68%) were related to sequences previously described. These were categorized into 10 groups according to their biological functions. Sequences involved in gene and protein expression accounted for 14.3% of the ESTs, reflecting the important role of protein synthesis in this gland. Other groups included proteins engaged in the assembly of disulfide bonds (0.5%), chaperones involved in the folding of nascent proteins (1.4%), and sequences related to clusterin (1.5%), as well as transcripts related to calcium binding proteins (1.0%). We detected a large cluster (1.3%) related to cocaine- and amphetamine-regulated transcript (CART), a peptide involved in the regulation of food intake. Surprisingly, several retrotransposon-like sequences (1.0%) were found in the library. It may be that their presence accounts for some of the variation in venom toxins. The toxin category (18.8%) included natterins (18%), which are a new group of kininogenases recently described by our group, and a group of C-type lectins (0.8%). In addition, a considerable number of sequences (32%) was not related to sequences in the

  18. Alternative splicing and expression profile analysis of expressed sequence tags in domestic pig.

    PubMed

    Zhang, Liang; Tao, Lin; Ye, Lin; He, Ling; Zhu, Yuan-Zhong; Zhu, Yue-Dong; Zhou, Yan

    2007-02-01

    The domestic pig (Sus scrofa domestica) is one of the most important mammals to humans. Alternative splicing is a cellular mechanism in eukaryotes that greatly increases the diversity of gene products. Expressed sequence tags (ESTs) have been widely used for gene discovery, expression profile analysis, and alternative splicing detection. In this study, a total of 712,905 ESTs extracted from 101 different non-normalized EST libraries of the domestic pig were analyzed. These EST libraries cover the nervous system, digestive system, immune system, and meat production related tissues from embryo, newborn, and adult pigs, contributing to the analysis of alternative splicing variants as well as expression profiles in tissues at various stages. A modified approach was designed to cluster and assemble large EST datasets, aiming to detect alternative splicing together with the EST abundance of each splicing variant. Much effort was made to classify alternative splicing into different types and to apply different filters to each type to get more reliable results. Finally, a total of 1,223 genes with an average of 2.8 splicing variants were detected among 16,540 unique genes. The overview of expression profiles changes when alternative splicing is taken into account. PMID:17572361

  19. Comprehensive analysis of expressed sequence tags from cultivated and wild radish (Raphanus spp.)

    PubMed Central

    2013-01-01

    Background Radish (Raphanus sativus L., 2n = 2× = 18) is an economically important vegetable crop worldwide. A large collection of radish expressed sequence tags (ESTs) has been generated but remains largely uncharacterized. Results In this study, approximately 315,000 ESTs derived from 22 Raphanus cDNA libraries from 18 different genotypes were analyzed, for the purpose of gene and marker discovery and to evaluate large-scale genome duplication and phylogenetic relationships among Raphanus spp. The ESTs were assembled into 85,083 unigenes, of which 90%, 65%, 89% and 89% had homologous sequences in the GenBank nr, SwissProt, TrEMBL and Arabidopsis protein databases, respectively. A total of 66,194 (78%) could be assigned at least one gene ontology (GO) term. Comparative analysis identified 5,595 gene families unique to radish that were significantly enriched with genes related to small molecule metabolism, as well as 12,899 specific to the Brassicaceae that were enriched with genes related to seed oil body biogenesis and responses to phytohormones. The analysis further indicated that the divergence of radish and Brassica rapa occurred approximately 8.9-14.9 million years ago (MYA), following a whole-genome duplication event (12.8-21.4 MYA) in their common ancestor. An additional whole-genome duplication event in radish occurred at 5.1-8.4 MYA, after its divergence from B. rapa. A total of 13,570 simple sequence repeats (SSRs) and 28,758 high-quality single nucleotide polymorphisms (SNPs) were also identified. Using a subset of SNPs, the phylogenetic relationships of eight different accessions of Raphanus were inferred. Conclusion Comprehensive analysis of radish ESTs provided new insights into radish genome evolution and the phylogenetic relationships of different radish accessions. Moreover, the radish EST sequences and the associated SSR and SNP markers described in this study represent a valuable resource for radish functional genomics studies and

  20. pISTil: a pipeline for yeast two-hybrid Interaction Sequence Tags identification and analysis

    PubMed Central

    Pellet, Johann; Meyniel, Laurène; Vidalain, Pierre-Olivier; de Chassey, Benoît; Tafforeau, Lionel; Lotteau, Vincent; Rabourdin-Combe, Chantal; Navratil, Vincent

    2009-01-01

    Background High-throughput screening of protein-protein interactions opens new systems biology perspectives for the comprehensive understanding of cell physiology in normal and pathological conditions. In this context, the yeast two-hybrid system appears as a promising approach to efficiently reconstruct protein interaction networks at the proteome-wide scale. This protein interaction screening method generates a large amount of raw sequence data, i.e. the ISTs (Interaction Sequence Tags), which urgently need appropriate tools for their systematic and standardised analysis. Findings We developed pISTil, a bioinformatics pipeline combined with a user-friendly web-interface: (i) to establish a standardised system to analyse and to annotate ISTs generated by two-hybrid technologies with high performance and flexibility and (ii) to provide high-quality protein-protein interaction datasets for systems-level approaches. This pipeline has been validated on a large dataset comprising more than 11,000 ISTs. As a case study, a detailed analysis of ISTs obtained from yeast two-hybrid screens of Hepatitis C Virus proteins against human cDNA libraries is also provided. Conclusion We have developed pISTil, an open source pipeline made of a collection of several applications governed by a Perl script. The pISTil pipeline is intended for laboratories with IT expertise in system administration, scripting and database management that wish to automatically process large amounts of IST data for accurate reconstruction of protein interaction networks in a systems biology perspective. pISTil is publicly available for download at . PMID:19874608

  1. Expressed sequence tag analysis in Cycas, the most primitive living seed plant

    PubMed Central

    Brenner, Eric D; Stevenson, Dennis W; McCombie, Richard W; Katari, Manpreet S; Rudd, Stephen A; Mayer, Klaus FX; Palenchar, Peter M; Runko, Suzan J; Twigg, Richard W; Dai, Guangwei; Martienssen, Rob A; Benfey, Phillip N; Coruzzi, Gloria M

    2003-01-01

    Background Cycads are ancient seed plants (living fossils) with origins in the Paleozoic. Cycads are sometimes considered a 'missing link' as they exhibit characteristics intermediate between vascular non-seed plants and the more derived seed plants. Cycads have also been implicated as the source of 'Guam's dementia', possibly due to the production of S(+)-beta-methyl-alpha, beta-diaminopropionic acid (BMAA), which is an agonist of animal glutamate receptors. Results A total of 4,200 expressed sequence tags (ESTs) were created from Cycas rumphii and clustered into 2,458 contigs, of which 1,764 had low-stringency BLAST similarity to other plant genes. Among those cycad contigs with similarity to plant genes, 1,718 cycad 'hits' are to angiosperms, 1,310 match genes in gymnosperms and 734 match lower (non-seed) plants. Forty-six contigs were found that matched only genes in lower plants and gymnosperms. Upon obtaining the complete sequence from the clones of 37/46 contigs, 14 still matched only gymnosperms. Among those cycad contigs common to higher plants, ESTs were discovered that correspond to those involved in development and signaling in present-day flowering plants. We purified a cycad EST for a glutamate receptor (GLR)-like gene, as well as ESTs potentially involved in the synthesis of the GLR agonist BMAA. Conclusions Analysis of cycad ESTs has uncovered conserved and potentially novel genes. Furthermore, the presence of a glutamate receptor agonist, as well as a glutamate receptor-like gene in cycads, supports the hypothesis that such neuroactive plant products are not merely herbivore deterrents but may also serve a role in plant signaling. PMID:14659015

  2. Identification of candidates for cyclotide biosynthesis and cyclisation by expressed sequence tag analysis of Oldenlandia affinis

    PubMed Central

    2010-01-01

    Background Cyclotides are a family of circular peptides that exhibit a range of biological activities, including anti-bacterial, cytotoxic, anti-HIV activities, and are proposed to function in plant defence. Their high stability has motivated their development as scaffolds for the stabilisation of peptide drugs. Oldenlandia affinis is a member of the Rubiaceae (coffee) family from which 18 cyclotides have been sequenced to date, but the details of their processing from precursor proteins have only begun to be elucidated. To increase the speed at which genes involved in cyclotide biosynthesis and processing are being discovered, an expressed sequence tag (EST) project was initiated to survey the transcript profile of O. affinis and to propose some future directions of research on in vivo protein cyclisation. Results Using flow cytometry the holoploid genome size (1C-value) of O. affinis was estimated to be 4,210 - 4,284 Mbp, one of the largest genomes of the Rubiaceae family. High-quality ESTs were identified, 1,117 in total, from leaf cDNAs and assembled into 502 contigs, comprising 202 consensus sequences and 300 singletons. ESTs encoding the cyclotide precursors for kalata B1 (Oak1) and kalata B2 (Oak4) were among the 20 most abundant ESTs. In total, 31 ESTs encoded cyclotide precursors, representing a distinct commitment of 2.8% of the O. affinis transcriptome to cyclotide biosynthesis. The high expression levels of cyclotide precursor transcripts are consistent with the abundance of mature cyclic peptides in O. affinis. A new cyclotide precursor named Oak5 was isolated and represents the first cDNA for the bracelet class of cyclotides in O. affinis. Clones encoding enzymes potentially involved in processing cyclotides were also identified and include enzymes involved in oxidative folding and proteolytic processing. Conclusion The EST library generated in this study provides a valuable resource for the study of the cyclisation of plant peptides. Further analysis

  3. Myocardial tagging by Cardiovascular Magnetic Resonance: evolution of techniques--pulse sequences, analysis algorithms, and applications

    PubMed Central

    2011-01-01

    Cardiovascular magnetic resonance (CMR) tagging has been established as an essential technique for measuring regional myocardial function. It allows quantification of local intramyocardial motion measures, e.g. strain and strain rate. CMR tagging was invented in the late 1980s, when the technique allowed, for the first time, visualization of transmural myocardial movement without implanted physical markers. This new idea opened the door for a series of developments and improvements that continue up to the present time. Different tagging techniques are currently available that are more extensive, improved, and sophisticated than they were twenty years ago. Each of these techniques has different versions for improved resolution, signal-to-noise ratio (SNR), scan time, anatomical coverage, three-dimensional capability, and image quality. The tagging techniques covered in this article can be broadly divided into two main categories: 1) Basic techniques, which include magnetization saturation, spatial modulation of magnetization (SPAMM), delay alternating with nutations for tailored excitation (DANTE), and complementary SPAMM (CSPAMM); and 2) Advanced techniques, which include harmonic phase (HARP), displacement encoding with stimulated echoes (DENSE), and strain encoding (SENC). Although most of these techniques were developed by separate groups and evolved from different backgrounds, they are in fact closely related to each other, and they can be interpreted from more than one perspective. Some of these techniques even followed parallel paths of development, as illustrated in the article. As each technique has its own advantages, some efforts have been made to combine different techniques for improved image quality or composite information acquisition. In this review, different developments in pulse sequences and related image processing techniques are described along with the necessities that led to their invention, which makes this

  4. Not All Sequence Tags Are Created Equal: Designing and Validating Sequence Identification Tags Robust to Indels

    PubMed Central

    Faircloth, Brant C.; Glenn, Travis C.

    2012-01-01

    Ligating adapters with unique synthetic oligonucleotide sequences (sequence tags) onto individual DNA samples before massively parallel sequencing is a popular and efficient way to obtain sequence data from many individual samples. Tag sequences should be numerous and sufficiently different to ensure sequencing, replication, and oligonucleotide synthesis errors do not cause tags to be unrecoverable or confused. However, many design approaches only protect against substitution errors during sequencing and extant tag sets contain too few tag sequences. We developed an open-source software package to validate sequence tags for conformance to two distance metrics and design sequence tags robust to indel and substitution errors. We use this software package to evaluate several commercial and non-commercial sequence tag sets, design several large sets (maxcount = 7,198) of edit metric sequence tags having different lengths and degrees of error correction, and integrate a subset of these edit metric tags to polymerase chain reaction (PCR) primers and sequencing adapters. We validate a subset of these edit metric tagged PCR primers and sequencing adapters by sequencing on several platforms and subsequent comparison to commercially available alternatives. We find that several commonly used sets of sequence tags or design methodologies used to produce sequence tags do not meet the minimum expectations of their underlying distance metric, and we find that PCR primers and sequencing adapters incorporating edit metric sequence tags designed by our software package perform as well as their commercial counterparts. We suggest that researchers evaluate sequence tags prior to use or evaluate tags that they have been using. The sequence tag sets we design improve on extant sets because they are large, valid across the set, and robust to the suite of substitution, insertion, and deletion errors affecting massively parallel sequencing workflows on all currently used platforms
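    The difference between tags that only tolerate substitutions and tags that also tolerate indels can be illustrated with a minimal sketch. It assumes the two distance metrics are Hamming distance and Levenshtein (edit) distance; the abstract itself names only the "edit metric". The function names and the three 6-nt tags are hypothetical and are not taken from the authors' software package.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length tags."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length tags")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting substitutions, insertions and deletions."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x from a
                curr[j - 1] + 1,         # insert y into a
                prev[j - 1] + (x != y),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def min_pairwise(tags, metric):
    """Smallest distance between any two tags in the set."""
    return min(metric(a, b) for a, b in combinations(tags, 2))


# Hypothetical tags, made up for illustration only.
tags = ["ACGTAC", "CGTACA", "TTGGCC"]
print(min_pairwise(tags, hamming))      # 4: looks robust if only substitutions occur
print(min_pairwise(tags, levenshtein))  # 2: much weaker once indels are allowed
```

    In this made-up set the minimum Hamming distance is 4 but the minimum edit distance is only 2, because one tag is a one-base shift of another. A set checked only against substitutions can therefore overstate its robustness to indels.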

  5. Desiccation survival in an Antarctic nematode: molecular analysis using expressed sequenced tags

    PubMed Central

    Adhikari, Bishwo N; Wall, Diana H; Adams, Byron J

    2009-01-01

    Background Nematodes are the dominant soil animals in Antarctic Dry Valleys and are capable of surviving desiccation and freezing in an anhydrobiotic state. Genes induced by desiccation stress have been successfully enumerated in nematodes; however we have little knowledge of gene regulation by Antarctic nematodes which can survive multiple environmental stresses. To address this problem we investigated the genetic responses of a nematode species, Plectus murrayi, that is capable of tolerating Antarctic environmental extremes, in particular desiccation and freezing. In this study, we provide the first insight into the desiccation induced transcriptome of an Antarctic nematode through cDNA library construction and suppressive subtractive hybridization. Results We obtained 2,486 expressed sequence tags (ESTs) from 2,586 clones derived from the cDNA library of desiccated P. murrayi. The 2,486 ESTs formed 1,387 putative unique transcripts of which 523 (38%) had matches in the model-nematode Caenorhabditis elegans, 107 (7%) in nematodes other than C. elegans, 153 (11%) in non-nematode organisms and 605 (44%) had no significant match to any sequences in the current databases. The 1,387 unique transcripts were functionally classified by using Gene Ontology (GO) hierarchy and the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. The results indicate that the transcriptome contains a group of transcripts from diverse functional areas. The subtractive library of desiccated nematodes showed 80 transcripts differentially expressed during desiccation stress, of which 28% were metabolism related, 19% were involved in environmental information processing, 28% involved in genetic information processing and 21% were novel transcripts. Expression profiling of 14 selected genes by quantitative Real-time PCR showed 9 genes significantly up-regulated, 3 down-regulated and 2 continuously expressed in response to desiccation. Conclusion The establishment of a desiccation EST

  6. Generation and analysis of end sequence database for T-DNA tagging lines in rice.

    PubMed

    An, Suyoung; Park, Sunhee; Jeong, Dong-Hoon; Lee, Dong-Yeon; Kang, Hong-Gyu; Yu, Jung-Hwa; Hur, Junghe; Kim, Sung-Ryul; Kim, Young-Hea; Lee, Miok; Han, Soonki; Kim, Soo-Jin; Yang, Jungwon; Kim, Eunjoo; Wi, Soo Jin; Chung, Hoo Sun; Hong, Jong-Pil; Choe, Vitnary; Lee, Hak-Kyung; Choi, Jung-Hee; Nam, Jongmin; Kim, Seong-Ryong; Park, Phun-Bum; Park, Ky Young; Kim, Woo Taek; Choe, Sunghwa; Lee, Chin-Bum; An, Gynheung

    2003-12-01

    We analyzed 6749 lines tagged by the gene trap vector pGA2707. This resulted in the isolation of 3793 genomic sequences flanking the T-DNA. Among the insertions, 1846 T-DNAs were integrated into genic regions, and 1864 were located in intergenic regions. Frequencies were also higher at the beginning and end of the coding regions and upstream near the ATG start codon. The overall GC content at the insertion sites was close to that measured from the entire rice (Oryza sativa) genome. Functional classification of these 1846 tagged genes showed a distribution similar to that observed for all the genes in the rice chromosomes. This indicates that T-DNA insertion is not biased toward a particular class of genes. There were 764, 327, and 346 T-DNA insertions in chromosomes 1, 4 and 10, respectively. Insertions were not evenly distributed; frequencies were higher at the ends of the chromosomes and lower near the centromere. At certain sites, the frequency was higher than in the surrounding regions. This sequence database will be valuable in identifying knockout mutants for elucidating gene function in rice. This resource is available to the scientific community at http://www.postech.ac.kr/life/pfg/risd. PMID:14630961

  7. Transcriptome analysis of Loxosceles laeta (Araneae, Sicariidae) spider venomous gland using expressed sequence tags

    PubMed Central

    Fernandes-Pedrosa, Matheus de F; Junqueira-de-Azevedo, Inácio de LM; Gonçalves-de-Andrade, Rute M; Kobashi, Leonardo S; Almeida, Diego D; Ho, Paulo L; Tambourgi, Denise V

    2008-01-01

    Background The bite of spiders belonging to the genus Loxosceles can induce a variety of clinical symptoms, including dermonecrosis, thrombosis, vascular leakage, haemolysis, and persistent inflammation. In order to examine the transcripts expressed in the venom gland of the Loxosceles laeta spider and to unveil the potential of its products on cellular structure and functional aspects, we generated 3,008 expressed sequence tags (ESTs) from a cDNA library. Results All ESTs were grouped into 1,357 clusters; 16.4% of the total ESTs belong to recognized toxin-coding sequences, with sphingomyelinases D being the most abundant transcripts; 14.5% include "possible toxins", whose transcripts correspond to metalloproteinases, serine proteinases, hyaluronidases, lipases, C-lectins, cysteine peptidases and inhibitors. Thirty-three percent of the ESTs are similar to cellular transcripts, most of which are molecules involved in gene and protein expression, reflecting the specialization of this tissue for protein synthesis. In addition, a considerable number of sequences, 25%, have no significant similarity to any known sequence. Conclusion This study provides the first global view of gene expression in the venom gland of L. laeta, indicating the molecular bases of its venom composition. PMID:18547439

  8. Pyrosequence analysis of expressed sequence tags for Manduca sexta hemolymph proteins involved in immune responses.

    PubMed

    Zou, Zhen; Najar, Fares; Wang, Yang; Roe, Bruce; Jiang, Haobo

    2008-06-01

    The tobacco hornworm Manduca sexta is widely used as a model organism to investigate the biochemical basis of insect physiological processes but little transcriptome information is available. To get a broad view of the larval hemolymph proteins, particularly those related to immunity, we synthesized and sequenced cDNA fragments from a mixture of eight total RNA samples: fat body and hemocytes from larvae injected with killed bacteria, fat body, hemocytes, integument and trachea from naïve larvae, and fat body and hemocytes from wandering larvae. Using massively parallel pyrosequencing, we obtained 95,458 M. sexta expressed sequence tags (ESTs) at an average size of 185 bp per read. A majority of the sequences (69,429 reads) could be assembled into 7231 contigs with an average size of 300 bp, 1178 of which had significant similarity with Drosophila genes from various functional groups. Only approximately 8% (606) of the contigs matched known M. sexta cDNA sequences, representing 186 of the 375 unique NCBI entries. The remaining 6625 contigs represented newly discovered cDNA segments from this well-studied biochemical model insect. A search of the 7231 contigs using Tribolium castaneum, Drosophila melanogaster, and Bombyx mori immunity-related sequences revealed 424 cDNA contigs with significant similarity (E-value < 1 × 10^-5). These included 218 previously unknown M. sexta sequences coding for putative defense molecules such as pattern recognition receptors, serine proteinases, serpins, Spätzle, Toll-like receptors, intracellular signaling molecules, and antimicrobial peptides. PMID:18510979

  9. Functional annotation of an expressed sequence tag library from Haliotis diversicolor and analysis of its plant-like sequences.

    PubMed

    Jiang, Jing-Zhe; Zhang, Wei; Guo, Zhi-Xun; Cai, Chen-Chen; Su, You-Lu; Wang, Rui-Xuan; Wang, Jiang-Yong

    2011-09-01

    The small abalone, Haliotis diversicolor, is a widely distributed and cultured species in the subtropical coastal area of China. To identify and classify functional genes of this important species, a normalized expressed sequence tag (EST) library, including 7069 high-quality ESTs from the total body of H. diversicolor, was analyzed. A total of 4781 unigenes were assembled and 2991 novel abalone genes were identified. The GC content, codon and amino acid usage of the transcriptome were analyzed. For the accurate annotation of the abalone library, different influencing factors were evaluated. The gene ontology (GO) database provided a higher annotation rate (69.6%), and sequences longer than 800 bp were easily subjected to a BLAST search. The taxonomy of the BLAST results showed that lancelet and invertebrates are most closely related to abalone. Sixty-seven identified plant-like genes were further examined by reverse transcription-polymerase chain reaction (RT-PCR) and sequencing; only seven of these were real transcripts in abalone. Phylogenetic trees were also constructed to illustrate the positions of two Cystatin sequences and one Calmodulin protein sequence identified in abalone. To perform functional classification, three different databases (GO, KEGG and COG) were used and 60 immune or disease-related unigenes were determined. This work has greatly enlarged the known gene pool of H. diversicolor and will have important implications for future molecular and genetic analyses in this organism. PMID:21867971

  10. High-Throughput Tag-Sequencing Analysis of Early Events Induced by Ochratoxin A in HepG-2 Cells.

    PubMed

    Zhang, Yu; Qi, Xiaozhe; Zheng, Juanjuan; Luo, YunBo; Huang, Kunlun; Xu, Wentao

    2016-01-01

    Ochratoxin A (OTA) is produced by fungi of the genera Aspergillus and Penicillium. OTA has displayed hepatotoxicity in mammals. Although recent studies have indicated that OTA influences liver function, little is known regarding its impact on differential early liver toxicity. In this study, we report high-throughput tag-sequencing (Tag-seq) analysis of the transcriptome using the Solexa Analyzer platform after 4 h of OTA treatment on HepG-2 cells. The analyses of differentially expressed genes revealed substantial changes. A total of 21,449 genes were identified and quantified, with 2726 displaying significantly altered expression levels. Expression level data were then integrated with a network of gene-gene interactions and biological pathways to obtain a systems-level view of changes in the transcriptome that occur with OTA resistance. Our data suggest that OTA exposure leads to an imbalance in zinc finger expression and shed light on splicing factor and mitochondrial-based mechanisms. PMID:26377828

  11. Assembly of a gene sequence tag microarray by reversible biotin-streptavidin capture for transcript analysis of Arabidopsis thaliana

    PubMed Central

    Wirta, Valtteri; Holmberg, Anders; Lukacs, Morten; Nilsson, Peter; Hilson, Pierre; Uhlén, Mathias; Bhalerao, Rishikesh P; Lundeberg, Joakim

    2005-01-01

    Background Transcriptional profiling using microarrays has developed into a key molecular tool for the elucidation of gene function and gene regulation. Microarray platforms based on either oligonucleotides or purified amplification products have been utilised in parallel to produce large amounts of data. Irrespective of the platform examined, the availability of genome sequence or a large number of representative expressed sequence tags (ESTs) is, however, a pre-requisite for the design and selection of specific and high-quality microarray probes. This is of great importance for organisms, such as Arabidopsis thaliana, with a high number of duplicated genes, as cross-hybridisation signals between evolutionarily related genes cannot be distinguished from true signals unless the probes are carefully designed to be specific. Results We present an alternative solid-phase purification strategy suitable for efficient preparation of short, biotinylated and highly specific probes suitable for large-scale expression profiling. Twenty-one thousand Arabidopsis thaliana gene sequence tags were amplified and subsequently purified using the described technology. The use of the arrays is exemplified by analysis of gene expression changes caused by a four-hour indole-3-acetic acid (auxin) treatment. A total of 270 genes were identified as differentially expressed (120 up-regulated and 150 down-regulated), including several previously known auxin-affected genes, but also several previously uncharacterised genes. Conclusions The described solid-phase procedure can be used to prepare gene sequence tag microarrays based on short and specific amplified probes, facilitating the analysis of more than 21 000 Arabidopsis transcripts. PMID:15689241

  12. Transcriptome analysis in the midgut of the earthworm (Eisenia andrei) using expressed sequence tags.

    PubMed

    Lee, Myung Sik; Cho, Sung Jin; Tak, Eun Sik; Lee, Jong Ae; Cho, Hyun Ju; Park, Bum Joon; Shin, Chuog; Kim, Dae Kyong; Park, Soon Cheol

    2005-03-25

    In order to gain insight into the expression profiles of the earthworm midgut, we analyzed 1106 expressed sequence tags (ESTs) derived from the earthworm midgut cDNA library. Among the 1106 ESTs analyzed, 557 (50.4%) ESTs showed significant similarity to known genes and represented 229 unique genes, of which 166 were singletons and 63 were represented by two or more ESTs. While 552 ESTs (49.9%) were sequenced only once, 230 ESTs (20.8%) appeared two to five times and 324 ESTs (29.3%) were sequenced more than five times. Considering this redundancy of expression, it is likely that the gene expression profile of the earthworm midgut would be polarized. The expression of globin-related proteins, including ferritin and linker chain, and fibrinolytic enzymes appeared to account for 10.1% and 4.7% of the total ESTs analyzed in this study, respectively. This suggests that the prime functions of the midgut in the earthworm would be associated with protein hydrolysis as well as globin formation. Among the recognized protein-coding genes, the gene category involved in protein synthesis appeared to be the largest one, accounting for 15.6% of the expression in the midgut, followed by gene categories associated with energy (11.2%), homeostasis (10.8%), metabolism (3.6%), cytoskeleton (2.5%), and protein fate (1.4%). With regard to functional aspects, the most abundantly expressed genes were associated with respiratory pigment (10.1%), cellular respiration (8.6%), and fibrin hydrolysis (4.7%). In addition, we were able to identify novel ESTs in the earthworm, which were related to the innate immune system, including destabilase, a possible antagonist of transglutaminase. PMID:15708003
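    The redundancy figures quoted in this abstract can be re-derived with a few lines of arithmetic. The sketch below uses only the counts reported above; the class labels and variable names are our own and are not part of the study.

```python
# Counts taken from the abstract; labels and names are illustrative only.
total_ests = 1106
redundancy_classes = {
    "sequenced once": 552,
    "sequenced 2-5 times": 230,
    "sequenced >5 times": 324,
}

# The three classes should account for every EST analyzed.
assert sum(redundancy_classes.values()) == total_ests

# Share of each class as a percentage of all ESTs, to one decimal place.
shares = {
    label: round(100 * count / total_ests, 1)
    for label, count in redundancy_classes.items()
}
for label, pct in shares.items():
    print(f"{label}: {pct}%")

# Fraction of ESTs with significant similarity to known genes.
similar_share = round(100 * 557 / total_ests, 1)
print(f"similar to known genes: {similar_share}%")
```

    The computed shares (49.9%, 20.8%, 29.3% and 50.4%) agree with the percentages given in the abstract.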

  13. Analysis of expressed sequence tags from Musa acuminata ssp. burmannicoides, var. Calcutta 4 (AA) leaves submitted to temperature stresses.

    PubMed

    Santos, C M R; Martins, N F; Hörberg, H M; de Almeida, E R P; Coelho, M C F; Togawa, R C; da Silva, F R; Caetano, A R; Miller, R N G; Souza, M T

    2005-05-01

    In order to discover genes expressed in leaves of Musa acuminata ssp. burmannicoides var. Calcutta 4 (AA), from plants submitted to temperature stress, we produced and characterized two full-length enriched cDNA libraries. Total RNA from plants subjected to temperatures ranging from 5 °C to 25 °C and from 25 °C to 45 °C was used to produce a COLD and a HOT cDNA library, respectively. We sequenced 1,440 clones from each library. Following quality analysis and vector trimming, we assembled 2,286 sequences from both libraries into 1,019 putative transcripts, consisting of 217 clusters and 802 singletons, which we denoted Musa acuminata assembled expressed sequence tagged (EST) sequences (MaAES). Of these MaAES, 22.87% showed no matches with existing sequences in public databases. A global analysis of the MaAES data set indicated that 10% of the sequenced cDNAs are present in both cDNA libraries, while 42% and 48% are present only in the COLD or in the HOT libraries, respectively. Annotation of the MaAES data set categorized them into 22 functional classes. Of the 2,286 high-quality sequences, 715 (31.28%) originated from full-length cDNA clones and resulted in a set of 149 genes. PMID:15841358

  14. Immune gene discovery by expressed sequence tag (EST) analysis of hemocytes in the ridgetail white prawn Exopalaemon carinicauda

    PubMed Central

    Duan, Yafei; Liu, Ping; Li, Jitao; Li, Jian; Chen, Ping

    2013-01-01

    The ridgetail white prawn Exopalaemon carinicauda is one of the most important commercial species in eastern China. However, little information on immune genes in E. carinicauda has been reported. To identify distinctive genes associated with immunity, an expressed sequence tag (EST) library was constructed from hemocytes of E. carinicauda. A total of 3411 clones were sequenced, yielding 2853 ESTs with an average sequence length of 436 bp. The cluster and assembly analysis yielded 1053 unique sequences including 329 contigs and 724 singletons. BLAST analysis identified 593 (56.3%) of the unique sequences as orthologs of genes from other organisms (E-value < 1e-5). Based on the COG and Gene Ontology (GO), 593 unique sequences were classified. Through comparison with previous studies, 153 genes assembled from 367 ESTs have been identified as possibly involved in defense or immune functions. These genes are grouped into seven categories according to their putative functions in the shrimp immune system: antimicrobial peptides, prophenoloxidase activating system, antioxidant defense systems, chaperone proteins, clottable proteins, pattern recognition receptors and other immune-related genes. According to EST abundance, the major immune-related genes were thioredoxin (141, 4.94% of all ESTs) and calmodulin (14, 0.49% of all ESTs). The EST sequences of E. carinicauda hemocytes provide important information on the immune system and lay the groundwork for development of molecular markers related to disease resistance in prawn species. PMID:23092732

  15. Analysis of expressed sequence tags from the blue-green sharpshooter, Graphocephala atropunctata

    Technology Transfer Automated Retrieval System (TEKTRAN)

    We used a metagenomic approach and identified and sequenced 6,836 genetic sequences isolated from adult blue-green sharpshooters, BGSS, Graphocephala atropunctata. These results provided over 70% of the mitochondrial genome sequence which is being completed. The BGSS is endemic to southern Californ...

  16. Analysis of expressed sequence tags (ESTs) from a normalized cDNA library and isolation of EST simple sequence repeats from the invasive cotton mealybug Phenacoccus solenopsis.

    PubMed

    Li, Hui; Lang, Kun-Ling; Fu, Hai-Bin; Shen, Chang-Peng; Wan, Fang-Hao; Chu, Dong

    2015-12-01

    The cotton mealybug, Phenacoccus solenopsis Tinsley, is a serious and invasive pest. At present, genetic resources for studying P. solenopsis are limited, and this negatively affects genetic research on the organism and, consequently, translational work to improve management of this pest. In the present study, expressed sequence tags (ESTs) were analyzed from a normalized complementary DNA library of P. solenopsis. In addition, EST-derived microsatellite loci (also known as simple sequence repeats or SSRs) were isolated and characterized. A total of 1107 high-quality ESTs were acquired from the library. Clustering and assembly analysis resulted in 785 unigenes, which were classified functionally into 23 categories according to the Gene Ontology database. Seven EST-based SSR markers were developed in this study and are expected to be useful in characterizing how this invasive species was introduced, as well as providing insights into its genetic microevolution. PMID:25380551
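    Isolating EST-derived SSRs starts with scanning each sequence for short motifs repeated in tandem. The abstract does not state which tool or thresholds were used, so the following is only a minimal sketch under assumed criteria: perfect repeats, motifs of 2 to 6 bases, and at least 5 copies. The function name `find_ssrs`, its parameters and the test sequence are hypothetical.

```python
import re


def find_ssrs(seq: str, min_repeats: int = 5, max_motif: int = 6):
    """Return (start, motif, repeat_count) for perfect tandem repeats.

    Thresholds are illustrative assumptions, not the study's criteria.
    """
    seq = seq.upper()
    # Lazy motif capture so the shortest repeating unit is tried first,
    # followed by at least (min_repeats - 1) further copies of that unit.
    pattern = re.compile(
        r"([ACGT]{2,%d}?)\1{%d,}" % (max_motif, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue  # skip homopolymer runs such as AAAAAAAAAA
        hits.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return hits


# Made-up sequence containing (AG)6 and (CAT)5 for illustration.
example = "GGCTA" + "AG" * 6 + "TTCGA" + "CAT" * 5 + "GC"
print(find_ssrs(example))  # [(5, 'AG', 6), (22, 'CAT', 5)]
```

    Primer design around each hit, which is what turns a detected repeat into a usable marker, is a separate step and is not covered here.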

  17. Expressed Sequence Tags Analysis and Design of Simple Sequence Repeats Markers from a Full-Length cDNA Library in Perilla frutescens (L.)

    PubMed Central

    Seong, Eun Soo; Yoo, Ji Hye; Choi, Jae Hoo; Kim, Chang Heum; Jeon, Mi Ran; Kang, Byeong Ju; Lee, Jae Geun; Choi, Seon Kang; Ghimire, Bimal Kumar; Yu, Chang Yeon

    2015-01-01

    Perilla frutescens is valuable as a medicinal plant as well as a natural medicine and functional food. However, comparative genomics analyses of P. frutescens are limited due to a lack of gene annotations and characterization. A full-length cDNA library from P. frutescens leaves was constructed to identify functional gene clusters and probable EST-SSR markers via analysis of 1,056 expressed sequence tags. Unigene assembly was performed using basic local alignment search tool (BLAST) homology searches and annotated Gene Ontology (GO). A total of 18 simple sequence repeats (SSRs) were designed as primer pairs. This study is the first to report comparative genomics and EST-SSR markers from P. frutescens; these will help gene discovery and provide an important source for functional genomics and molecular genetic research in this interesting medicinal plant. PMID:26664999

  18. Gene discovery and expression profile analysis through sequencing of expressed sequence tags from different developmental stages of the chytridiomycete Blastocladiella emersonii.

    PubMed

    Ribichich, Karina F; Salem-Izacc, Silvia M; Georg, Raphaela C; Vêncio, Ricardo Z N; Navarro, Luci D; Gomes, Suely L

    2005-02-01

    Blastocladiella emersonii is an aquatic fungus of the chytridiomycete class which diverged early from the fungal lineage and is notable for the morphogenetic processes which occur during its life cycle. Its particular taxonomic position makes this fungus an interesting system to be considered when investigating phylogenetic relationships and studying the biology of lower fungi. To contribute to the understanding of the complexity of the B. emersonii genome, we present here a survey of expressed sequence tags (ESTs) from various stages of the fungal development. Nearly 20,000 cDNA clones from 10 different libraries were partially sequenced from their 5' end, yielding 16,984 high-quality ESTs. These ESTs were assembled into 4,873 putative transcripts, of which 48% presented no matches with existing sequences in public databases. As a result of Gene Ontology (GO) project annotation, 1,680 ESTs (35%) were classified into biological processes of the GO structure, with transcription and RNA processing, protein biosynthesis, and transport as prevalent processes. We also report full-length sequences, useful for construction of molecular phylogenies, and several ESTs that showed high similarity with known proteins, some of which were not previously described in fungi. Furthermore, we analyzed the expression profile (digital Northern analysis) of each transcript throughout the life cycle of the fungus using Bayesian statistics. The in silico approach was validated by Northern blot analysis with good agreement between the two methodologies. PMID:15701807

  19. Analysis and RT-PCR identification of viral sequences in peanut (Arachis hypogaea L.) expressed sequence tags from different peanut tissues

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Peanut plants grown in the field have been naturally infected with different viruses resulting in economic yield loss in the southeastern US, such as tomato spotted wilt tospovirus (TSWV) in peanuts. The objectives of this study were to investigate peanut sequences of expressed sequence tags (EST) f...

  20. Generation and Analysis of Expressed Sequence Tags from Olea europaea L.

    PubMed Central

    Ozdemir Ozgenturk, Nehir; Oruç, Fatma; Sezerman, Ugur; Kuçukural, Alper; Vural Korkut, Senay; Toksoz, Feriha; Un, Cemal

    2010-01-01

    Olive (Olea europaea L.), which originated in the Near East region, is an important source of edible oil. In this study, two cDNA libraries were constructed from young olive leaves and immature olive fruits to generate ESTs for discovering novel genes and investigating the function of unknown olive genes. A total of 3840 randomly selected colonies from both libraries were sequenced for EST collection. The 2228 readable sequences from olive leaf and 1506 from olive fruit were assembled into 205 and 69 contigs, respectively, whereas 2478 were singletons. Putative functions of all 2752 differentially expressed unique sequences were assigned by gene homology based on BLAST and annotated using BLAST2GO. While 1339 ESTs showed no homology to the database, 2024 ESTs had homology (under 80%) with hypothetical proteins, putative proteins, expressed proteins, and unknown proteins in NCBI-GenBank. In addition, 635 unique EST sequences were identified by over 80% homology to genes of known function in other species that had not previously been described in the Olea family. Only 3.1% of the total ESTs showed similarity to olive sequences already in NCBI. The EST data and consensus sequences generated here were submitted to NCBI as a valuable resource for functional genomic studies of olive. PMID:21197085

  1. Transcriptome analysis of the phytopathogenic fungus Rhizoctonia solani AG1-IB 7/3/14 applying high-throughput sequencing of expressed sequence tags (ESTs).

    PubMed

    Wibberg, Daniel; Jelonek, Lukas; Rupp, Oliver; Kröber, Magdalena; Goesmann, Alexander; Grosch, Rita; Pühler, Alfred; Schlüter, Andreas

    2014-01-01

    Rhizoctonia solani is a soil-borne plant pathogenic fungus of the phylum Basidiomycota. It affects a wide range of agriculturally important crops and hence is responsible for economically relevant crop losses. Transcriptome analysis of the bottom rot pathogen R. solani AG1-IB (isolate 7/3/14) by applying high-throughput sequencing and bioinformatics methods addressing Expressed Sequence Tag (EST) data interpretation provided new insights into expressed genes of this fungus. Two normalized cDNA libraries representing different cultivation conditions of the fungus were sequenced on the 454 FLX (Roche) system. Subsequent to cDNA sequence assembly and quality control, ESTs were analysed applying advanced bioinformatics methods. More than 14 000 transcript isoforms originating from approximately 10 000 predictable R. solani AG1-IB 7/3/14 genes are represented in each dataset. Comparative analyses revealed several differentially expressed genes depending on the growth conditions applied. Determinants with predicted functions in recognition processes between the fungus and the host plant were identified. Moreover, many R. solani AG1-IB ESTs were predicted to encode putative cellulose, pectin, and lignin degrading enzymes. Furthermore, genes playing a possible role in mitogen-activated protein (MAP) kinase cascades, 4-aminobutyric acid (GABA) metabolism, melanin synthesis, plant defence antagonism, phytotoxin, and mycotoxin synthesis were detected. PMID:25209639

  2. Analysis and functional annotation of expressed sequence tags from the Asian longhorned beetle, Anoplophora glabripennis

    Technology Transfer Automated Retrieval System (TEKTRAN)

    We identified 600 genetic sequences of which ~380 were uniquely identified to the Asian longhorned beetle (ALB), Anoplophora glabripennis, (Coleoptera) which is one of the most serious invasive forest insect pests discovered in North America in recent years. Despite the substantial impact of this p...

  3. Comparative analysis and functional annotation of a large expressed sequence tag collection of apple

    Technology Transfer Automated Retrieval System (TEKTRAN)

    A total of 34 apple cDNA libraries were constructed from root, leaf, bud, shoot, flower, and fruit tissues, at varying developmental stages and/or under biotic or abiotic stress conditions, and of several genotypes. From these libraries, 190,425 clones were partially sequenced from the 5’ end and 4...

  4. Analysis of expressed sequence tags from Uromyces appendiculatus hyphae and haustoria and their comparison to sequences from other rust fungi

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Two separate cDNA libraries were prepared for RNA extracted from bean rust (Uromyces appendiculatus) hyphae and haustoria isolated from infected bean leaves (Phaseolus vulgaris cv Pint 111) between 2 and 8 dpi. Approximately 13,000 clones were sequenced from both ends and the sequences assem...

  5. SSH Analysis of Endosperm Transcripts and Characterization of Heat Stress Regulated Expressed Sequence Tags in Bread Wheat

    PubMed Central

    Goswami, Suneha; Kumar, Ranjeet R.; Dubey, Kavita; Singh, Jyoti P.; Tiwari, Sachidanand; Kumar, Ashok; Smita, Shuchi; Mishra, Dwijesh C.; Kumar, Sanjeev; Grover, Monendra; Padaria, Jasdeep C.; Kala, Yugal K.; Singh, Gyanendra P.; Pathak, Himanshu; Chinnusamy, Viswanathan; Rai, Anil; Praveen, Shelly; Rai, Raj D.

    2016-01-01

    Heat stress (HS) is one of the major problems in agriculturally important cereal crops, especially wheat. Here, we have constructed a subtracted cDNA library from the endosperm of HS-treated (42°C for 2 h) wheat cv. HD2985 by suppression subtractive hybridization (SSH). We identified ~550 recombinant clones ranging from 200 to 500 bp with an average size of 300 bp. Sanger sequencing was performed on 205 positive clones to generate the differentially expressed sequence tags (ESTs). Most of the ESTs were observed to be localized on the long arm of chromosome 2A and associated with heat stress tolerance and metabolic pathways. Identified ESTs were subjected to BLAST searches against the Ensemble, TriFLD, and TIGR databases, and the predicted CDS were translated and aligned with the protein sequences available in the pfam and InterProScan 5 databases to predict the differentially expressed proteins (DEPs). We observed eight different types of post-translational modifications (PTMs) in the DEPs corresponding to the cloned ESTs: 147 sites with phosphorylation, 21 sites with sumoylation, 237 with palmitoylation, 96 sites with S-nitrosylation, 3066 calpain cleavage sites, and 103 tyrosine nitration sites, predicted to sense the heat stress and regulate the expression of stress genes. Twelve DEPs were observed to have transmembrane helices (TMH) in their structure, predicted to play the role of sensors of HS. Quantitative real-time PCR of randomly selected ESTs showed very high relative expression of HSP17 under HS; up-regulation was greater in wheat cv. HD2985 (thermotolerant) than in HD2329 (thermosusceptible) during grain-filling. The abundance of transcripts was further validated through northern blot analysis. The ESTs and their corresponding DEPs can be used as molecular markers for screening or in targeted precision breeding programs. PTMs identified in the DEPs can be used to elucidate the thermotolerance mechanism of wheat, a novel step toward the development of

  6. SSH Analysis of Endosperm Transcripts and Characterization of Heat Stress Regulated Expressed Sequence Tags in Bread Wheat.

    PubMed

    Goswami, Suneha; Kumar, Ranjeet R; Dubey, Kavita; Singh, Jyoti P; Tiwari, Sachidanand; Kumar, Ashok; Smita, Shuchi; Mishra, Dwijesh C; Kumar, Sanjeev; Grover, Monendra; Padaria, Jasdeep C; Kala, Yugal K; Singh, Gyanendra P; Pathak, Himanshu; Chinnusamy, Viswanathan; Rai, Anil; Praveen, Shelly; Rai, Raj D

    2016-01-01

    Heat stress (HS) is one of the major problems in agriculturally important cereal crops, especially wheat. Here, we have constructed a subtracted cDNA library from the endosperm of HS-treated (42°C for 2 h) wheat cv. HD2985 by suppression subtractive hybridization (SSH). We identified ~550 recombinant clones ranging from 200 to 500 bp with an average size of 300 bp. Sanger sequencing was performed on 205 positive clones to generate the differentially expressed sequence tags (ESTs). Most of the ESTs were observed to be localized on the long arm of chromosome 2A and associated with heat stress tolerance and metabolic pathways. Identified ESTs were subjected to BLAST searches against the Ensemble, TriFLD, and TIGR databases, and the predicted CDS were translated and aligned with the protein sequences available in the pfam and InterProScan 5 databases to predict the differentially expressed proteins (DEPs). We observed eight different types of post-translational modifications (PTMs) in the DEPs corresponding to the cloned ESTs: 147 sites with phosphorylation, 21 sites with sumoylation, 237 with palmitoylation, 96 sites with S-nitrosylation, 3066 calpain cleavage sites, and 103 tyrosine nitration sites, predicted to sense the heat stress and regulate the expression of stress genes. Twelve DEPs were observed to have transmembrane helices (TMH) in their structure, predicted to play the role of sensors of HS. Quantitative real-time PCR of randomly selected ESTs showed very high relative expression of HSP17 under HS; up-regulation was greater in wheat cv. HD2985 (thermotolerant) than in HD2329 (thermosusceptible) during grain-filling. The abundance of transcripts was further validated through northern blot analysis. The ESTs and their corresponding DEPs can be used as molecular markers for screening or in targeted precision breeding programs. PTMs identified in the DEPs can be used to elucidate the thermotolerance mechanism of wheat, a novel step toward the development of "climate-smart" wheat

  7. Transcriptomic analysis of the venom gland of the red-headed krait (Bungarus flaviceps) using expressed sequence tags

    PubMed Central

    2010-01-01

    Background: The Red-headed krait (Bungarus flaviceps, Squamata: Serpentes: Elapidae) is a medically important venomous snake that inhabits South-East Asia. Although the venoms of most species of the snake genus Bungarus have been well characterized, a detailed compositional analysis of B. flaviceps is currently lacking. Results: Here, we have sequenced 845 expressed sequence tags (ESTs) from the venom gland of a B. flaviceps. Of the transcripts, 74.8% were putative toxins; 20.6% were cellular; and 4.6% were unknown. The main venom protein families identified were three-finger toxins (3FTxs), Kunitz-type serine protease inhibitors (including chain B of β-bungarotoxin), phospholipase A2 (including chain A of β-bungarotoxin), natriuretic peptide (NP), CRISPs, and C-type lectin. Conclusion: The 3FTxs were found to be the major component of the venom (39%). We found eight groups of unique 3FTxs and most of them were different from the well-characterized 3FTxs. We found three groups of Kunitz-type serine protease inhibitors (SPIs); one group was comparable to the classical SPIs and the other two groups to chain B of β-bungarotoxins (with or without the extra cysteine) based on sequence identity. The latter group may be functional equivalents of dendrotoxins in Bungarus venoms. The natriuretic peptide (NP) found is the first NP for any Asian elapid, and distantly related to Australian elapid NPs. Our study identifies several unique toxins in B. flaviceps venom, which may help in understanding the evolution of venom toxins and the pathophysiological symptoms induced after envenomation. PMID:20350308

  8. Sequence tagging reveals unexpected modifications in toxicoproteomics

    PubMed Central

    Dasari, Surendra; Chambers, Matthew C.; Codreanu, Simona G.; Liebler, Daniel C.; Collins, Ben C.; Pennington, Stephen R.; Gallagher, William M.; Tabb, David L.

    2010-01-01

    Toxicoproteomic samples are rich in posttranslational modifications (PTMs) of proteins. Identifying these modifications via standard database searching can incur significant performance penalties. Here we describe the latest developments in TagRecon, an algorithm that leverages inferred sequence tags to identify modified peptides in toxicoproteomic data sets. TagRecon identifies known modifications more effectively than the MyriMatch database search engine. TagRecon outperformed state of the art software in recognizing unanticipated modifications from LTQ, Orbitrap, and QTOF data sets. We developed user-friendly software for detecting persistent mass shifts from samples. We follow a three-step strategy for detecting unanticipated PTMs in samples. First, we identify the proteins present in the sample with a standard database search. Next, identified proteins are interrogated for unexpected PTMs with a sequence tag-based search. Finally, additional evidence is gathered for the detected mass shifts with a refinement search. Application of this technology on toxicoproteomic data sets revealed unintended cross-reactions between proteins and sample processing reagents. Twenty five proteins in rat liver showed signs of oxidative stress when exposed to potentially toxic drugs. These results demonstrate the value of mining toxicoproteomic data sets for modifications. PMID:21214251
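
    The sequence-tag step described above can be illustrated with a small sketch. It is not TagRecon's algorithm: it only shows how a short inferred tag, located in a candidate peptide, lets the mass observed in front of the tag be compared with the theoretical mass, so that any difference shows up as a candidate mass shift. The function names and the example peptide are hypothetical; the residue masses are standard monoisotopic values.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}

def prefix_mass(peptide, end):
    """Sum of residue masses for peptide[:end]."""
    return sum(MONO[aa] for aa in peptide[:end])

def mass_shifts(peptide, tag, observed_prefix_mass):
    """Return (tag position, observed minus theoretical prefix mass) pairs.

    A difference near zero suggests an unmodified prefix; a persistent
    non-zero difference is a candidate modification mass.
    """
    hits = []
    start = peptide.find(tag)
    while start != -1:
        hits.append((start, observed_prefix_mass - prefix_mass(peptide, start)))
        start = peptide.find(tag, start + 1)
    return hits

# Hypothetical example: the tag "TID" follows a prefix that is about
# 15.995 Da heavier than "PEP", consistent with a single oxidation.
shifts = mass_shifts("PEPTIDEK", "TID", observed_prefix_mass=339.1430)
```

    A real search would also check the mass after the tag and tolerate instrument error, which this sketch omits.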

  9. An expressed sequence tag database of T-cell-enriched activated chicken splenocytes: sequence analysis of 5251 clones.

    PubMed

    Tirunagaru, V G; Sofer, L; Cui, J; Burnside, J

    2000-06-01

    The cDNA and gene sequences of many mammalian cytokines and their receptors are known. However, corresponding information on avian cytokines is limited due to the lack of cross-species activity at the functional level or strong homology at the molecular level. To improve the efficiency of identifying cytokines and novel chicken genes, a directionally cloned cDNA library from T-cell-enriched activated chicken splenocytes was constructed, and the partial sequence of 5251 clones was obtained. Sequence clustering indicates that 2357 (42%) of the clones are present as a single copy, and 2961 are distinct clones, demonstrating the high level of complexity of this library. Comparisons of the sequence data with known DNA sequences in GenBank indicate that approximately 25% of the clones match known chicken genes, 39% have similarity to known genes in other species, and 11% had no match to any sequence in the database. Several previously uncharacterized chicken cytokines and their receptors were present in our library. This collection provides a useful database for cataloging genes expressed in T cells and a valuable resource for future investigations of gene expression in avian immunology. A chicken EST Web site (http://udgenome.ags.udel.edu/chickest/chick.htm) has been created to provide access to the data, and a set of unique sequences has been deposited with GenBank (Accession Nos. AI979741-AI982511). Our new Web site (http://www.chickest.udel.edu) will be active as of March 3, 2000, and will also provide keyword-searching capabilities for BLASTX and BLASTN hits of all our clones. PMID:10860659

  10. Construction of cDNA library and preliminary analysis of expressed sequence tags from Siberian tiger

    PubMed Central

    Liu, Chang-Qing; Lu, Tao-Feng; Feng, Bao-Gang; Liu, Dan; Guan, Wei-Jun; Ma, Yue-Hui

    2010-01-01

    In this study we successfully constructed a full-length cDNA library from the Siberian tiger, Panthera tigris altaica, the most well-known wild animal. Total RNA was extracted from Siberian tiger fibroblasts cultured in vitro. The titers of the primary and amplified libraries were 1.30×10^6 pfu/ml and 1.62×10^9 pfu/ml, respectively. The proportion of recombinants in the unamplified library was 90.5% and the average length of exogenous inserts was 1.13 kb. A total of 282 individual ESTs with sizes ranging from 328 to 1,142 bp were then analyzed. The BLASTX scores revealed that 53.9% of the sequences were classified as strong matches, 38.6% as nominal matches and 7.4% as weak matches. Of the ESTs, 28.0% were related to enzyme/catalytic protein, 20.9% to metabolism, 13.1% to transport, 12.1% to signal transducer/cell communication, 9.9% to structure protein, 3.9% to immunity protein/defense metabolism, and 3.2% to cell cycle, and 8.9% were classified as novel genes. These results demonstrated that the reliability and representativeness of the cDNA library met the requirements of a standard cDNA library. This library provides a useful platform for the functional genomic research of Siberian tigers. PMID:20941376

  11. Generation and analysis of expressed sequence tags from Trypanosoma cruzi trypomastigote and amastigote cDNA libraries.

    PubMed

    Agüero, Fernán; Abdellah, Karim Ben; Tekiel, Valeria; Sánchez, Daniel O; González, Antonio

    2004-08-01

    We have generated 2771 expressed sequence tags (ESTs) from two cDNA libraries of Trypanosoma cruzi CL-Brener. The libraries were constructed from trypomastigotes and amastigotes, using a spliced leader primer to synthesize the cDNA second strand, thus selecting for full-length cDNAs. Since the libraries were neither normalized nor pre-screened, we compared the representation of transcripts between the two using a statistical test and identified a subset of transcripts that show apparent differential representation. A non-redundant set of 1619 reconstructed transcripts was generated by sequence clustering. This dataset was used to perform similarity searches against protein and nucleotide databases. Based on these searches, 339 sequences could be assigned a putative identity. One thousand one-hundred and sixteen sequences in the non-redundant clustered dataset (68.8%) are new expression tags, not represented in the T. cruzi epimastigote ESTs that are in the public databases. Additional information is provided online at http://genoma.unsam.edu.ar/projects/tram. To the best of our knowledge these are the first ESTs reported for the life cycle stages of T. cruzi that occur in the vertebrate host. PMID:15478800
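
    The abstract does not name the statistical test used to compare transcript representation between the two libraries. As one plausible illustration only, the sketch below applies a two-sided Fisher's exact test to a 2x2 table of EST counts; the function name and the counts are assumptions, not values from the study.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    a: ESTs of the transcript in library 1, b: all other ESTs in library 1
    c: ESTs of the transcript in library 2, d: all other ESTs in library 2
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def table_probability(x):
        # Hypergeometric probability of seeing x transcript ESTs in library 1.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = table_probability(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum every table that is no more likely than the observed one.
    return sum(
        p
        for p in (table_probability(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )

# Hypothetical counts: 1 of 10 ESTs in one library versus 11 of 14 in the other.
p_value = fisher_exact_two_sided(1, 9, 11, 3)
```

    In a real comparison across hundreds of transcripts, the resulting p-values would also need a multiple-testing correction.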

  12. Expressed sequence tags of the peanut pod nematode Ditylenchus africanus: the first transcriptome analysis of an Anguinid nematode

    PubMed Central

    Haegeman, Annelies; Jacob, Joachim; Vanholme, Bartel; Kyndt, Tina; Mitreva, Makedonka; Gheysen, Godelieve

    2009-01-01

    In this study, 4847 expressed sequence tags (ESTs) from mixed stages of the migratory plant-parasitic nematode Ditylenchus africanus (peanut pod nematode) were investigated. It is the first molecular survey of a nematode which belongs to the family of the Anguinidae (order Rhabditida, superfamily Sphaerularioidea). The sequences were clustered into 2596 unigenes, of which 43% did not show any homology to known protein, nucleotide, nematode EST or plant-parasitic nematode genome sequences. Gene ontology mapping revealed that most putative proteins are involved in developmental and reproductive processes. In addition, unigenes involved in oxidative stress as well as in anhydrobiosis, such as LEA (late embryogenesis abundant protein) and trehalose-6-phosphate synthase, were identified. Other tags showed homology to genes previously described as being involved in parasitism (expansin, SEC-2, calreticulin, 14-3-3b and various allergen proteins). In situ hybridization revealed that the expression of a putative expansin and a venom allergen protein was restricted to the gland cell area of the nematode, in agreement with their presumed role in parasitism. Furthermore, 7 putative novel candidate parasitism genes were identified based on the prediction of a signal peptide in the corresponding protein sequence and homologous ESTs exclusively in parasitic nematodes. These genes are interesting for further research and functional characterization. Finally, 34 unigenes were retained as good target candidates for future RNAi experiments, because of their nematode-specific nature and observed lethal phenotypes of Caenorhabditis elegans homologs. PMID:19383517

  13. Identification of novel highly expressed genes in pancreatic ductal adenocarcinomas through a bioinformatics analysis of expressed sequence tags.

    PubMed

    Cao, Dengfeng; Hustinx, Steven R; Sui, Guoping; Bala, P; Sato, Norihiro; Martin, Sean; Maitra, Anirban; Murphy, Kathleen M; Cameron, John L; Yeo, Charles J; Kern, Scott E; Goggins, Michael; Pandey, Akhilesh; Hruban, Ralph H

    2004-11-01

    In most microarray experiments, a significant fraction of the differentially expressed mRNAs identified correspond to expressed sequence tags (ESTs) and are generally discarded from further analyses. We used careful bioinformatics analyses to characterize those ESTs that were found to be highly overexpressed in a series of pancreatic adenocarcinomas. cDNA was prepared from 60 non-neoplastic samples (normal pancreas [n = 20], normal colon [n = 10], or normal duodenal mucosa [n = 30]) and from 64 pancreatic cancers (resected cancers [n = 50] or cancer cell lines [n = 14]) and hybridized to the complete Affymetrix Human Genome U133 GeneChip(R) set (arrays U133A and B) for simultaneous analysis of 45,000 fragments corresponding to 33,000 known genes and 6,000 ESTs. The GeneExpress(R) software system Fold Change Analysis Tool was used and 60 ESTs were identified that were expressed at levels at least 3-fold greater in the pancreatic cancers than in normal tissues. Searches against the human genomic sequence and comparative genomic analysis of human and mouse genomes were carried out using the basic local alignment search tool (BLAST) programs BLASTN and BLASTX to identify protein coding genes corresponding to the ESTs. Subsequently, in order to pick the most relevant candidate genes for a more detailed analysis, we looked for domains/motifs in the open reading frames using SMART and Pfam programs. We were able to definitively map 43 of the 60 ESTs to known or novel genes, and 15 of the ESTs could be localized in close proximity to a gene in the human genome although we were unable to establish that the EST was indeed derived from those genes. The differential expression of a subset of genes was confirmed at the protein level by immunohistochemical labeling of tissue microarrays (inhibin beta A [INHBA] and CD29) and/or at the transcript level by RT-PCR (INHBA, AKAP12, ELK3, FOXQ1, EIF5A2, and EFNA5). We conclude that bioinformatics tools can be used to characterize
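
    The 3-fold selection criterion described above can be sketched as a simple filter. How the GeneExpress Fold Change Analysis Tool actually computes fold change is not stated in the abstract; the version below assumes a ratio of mean signal intensities, and all probe names, sample names, and values are invented for illustration.

```python
from statistics import mean

def overexpressed(expression, tumor_samples, normal_samples, min_fold=3.0):
    """Return {probe: fold change} for probes at least min_fold higher in tumors.

    expression maps probe id -> {sample id: signal intensity}. Fold change is
    taken here as mean tumor signal divided by mean normal signal.
    """
    hits = {}
    for probe, values in expression.items():
        tumor = mean(values[s] for s in tumor_samples)
        normal = mean(values[s] for s in normal_samples)
        if normal > 0 and tumor / normal >= min_fold:
            hits[probe] = tumor / normal
    return hits

# Hypothetical toy data: two tumor and two normal samples, two EST probes.
expression = {
    "EST_A": {"T1": 900, "T2": 1100, "N1": 250, "N2": 350},
    "EST_B": {"T1": 400, "T2": 600, "N1": 400, "N2": 500},
}
hits = overexpressed(expression, ["T1", "T2"], ["N1", "N2"])
```

    Only EST_A passes here, because its mean tumor signal (1000) is more than three times its mean normal signal (300).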

  14. Complementary DNA sequencing: Expressed sequence tags and human genome project

    SciTech Connect

    Adams, M.D.; Kelley, J.M.; Gocayne, J.D.; Dubnick, M.; Wu, A.; Olde, B.; Moreno, R.F.; Kerlavage, A.R.; McCombie, W.R.; Venter, J.C.; Polymeropoulos, M.H.; Hong Xiao; Merril, C.R.

    1991-06-21

    Automated partial DNA sequencing was conducted on more than 600 randomly selected human brain complementary DNA (cDNA) clones to generate expressed sequence tags (ESTs). ESTs have applications in the discovery of new human genes, mapping of the human genome, and identification of coding regions in genomic sequences. Of the sequences generated, 337 represent new genes, including 48 with significant similarity to genes from other organisms, such as a yeast RNA polymerase II subunit; Drosophila kinesin, Notch, and Enhancer of split; and a murine tyrosine kinase receptor. Forty-six ESTs were mapped to chromosomes after amplification by the polymerase chain reaction. This fast approach to cDNA characterization will facilitate the tagging of most human genes in a few years at a fraction of the cost of complete genomic sequencing, provide new genetic markers, and serve as a resource in diverse biological research fields.

  15. Expressed sequence tag analysis of khat (Catha edulis) provides a putative molecular biochemical basis for the biosynthesis of phenylpropylamino alkaloids.

    PubMed

    Hagel, Jillian M; Krizevski, Raz; Kilpatrick, Korey; Sitrit, Yaron; Marsolais, Frédéric; Lewinsohn, Efraim; Facchini, Peter J

    2011-10-01

    Khat (Catha edulis Forsk.) is a flowering perennial shrub cultivated for its neurostimulant properties resulting mainly from the occurrence of (S)-cathinone in young leaves. The biosynthesis of (S)-cathinone and the related phenylpropylamino alkaloids (1S,2S)-cathine and (1R,2S)-norephedrine is not well characterized in plants. We prepared a cDNA library from young khat leaves and sequenced 4,896 random clones, generating an expressed sequence tag (EST) library of 3,293 unigenes. Putative functions were assigned to > 98% of the ESTs, providing a key resource for gene discovery. Candidates potentially involved at various stages of phenylpropylamino alkaloid biosynthesis from L-phenylalanine to (1S,2S)-cathine were identified. PMID:22215969

  16. Expressed sequence tag analysis of khat (Catha edulis) provides a putative molecular biochemical basis for the biosynthesis of phenylpropylamino alkaloids

    PubMed Central

    Hagel, Jillian M.; Krizevski, Raz; Kilpatrick, Korey; Sitrit, Yaron; Marsolais, Frédéric; Lewinsohn, Efraim; Facchini, Peter J.

    2011-01-01

    Khat (Catha edulis Forsk.) is a flowering perennial shrub cultivated for its neurostimulant properties resulting mainly from the occurrence of (S)-cathinone in young leaves. The biosynthesis of (S)-cathinone and the related phenylpropylamino alkaloids (1S,2S)-cathine and (1R,2S)-norephedrine is not well characterized in plants. We prepared a cDNA library from young khat leaves and sequenced 4,896 random clones, generating an expressed sequence tag (EST) library of 3,293 unigenes. Putative functions were assigned to > 98% of the ESTs, providing a key resource for gene discovery. Candidates potentially involved at various stages of phenylpropylamino alkaloid biosynthesis from L-phenylalanine to (1S,2S)-cathine were identified. PMID:22215969

  17. Gene Discovery through Expressed Sequence Tag Sequencing in Trypanosoma cruzi

    PubMed Central

    Verdun, Ramiro E.; Di Paolo, Nelson; Urmenyi, Turan P.; Rondinelli, Edson; Frasch, Alberto C. C.; Sanchez, Daniel O.

    1998-01-01

    Analysis of expressed sequence tags (ESTs) constitutes a useful approach for gene identification that, in the case of human pathogens, might result in the identification of new targets for chemotherapy and vaccine development. As part of the Trypanosoma cruzi genome project, we have partially sequenced the 5′ ends of 1,949 clones to generate ESTs. The clones were randomly selected from a normalized CL Brener epimastigote cDNA library. A total of 14.6% of the clones were homologous to previously identified T. cruzi genes, while 18.4% had significant matches to genes from other organisms in the database. A total of 67% of the ESTs had no matches in the database, and thus, some of them might be T. cruzi-specific genes. Functional groups of those sequences with matches in the database were constructed according to their putative biological functions. The two largest categories were protein synthesis (23.3%) and cell surface molecules (10.8%). The information reported in this paper should be useful for researchers in the field to analyze genes and proteins of their own interest. PMID:9784549

  18. Analysis of expressed sequence tags from the anamorphic basidiomycetous yeast, Pseudozyma antarctica, which produces glycolipid biosurfactants, mannosylerythritol lipids.

    PubMed

    Morita, Tomotake; Konishi, Masaaki; Fukuoka, Tokuma; Imura, Tomohiro; Kitamoto, Dai

    2006-07-15

    Pseudozyma antarctica T-34 secretes a large amount of biosurfactants (BS), mannosylerythritol lipids (MEL), from different carbon sources such as hydrocarbons and vegetable oils. The detailed biosynthetic pathway of MEL remained unknown due to lack of genetic information on the anamorphic basidiomycetous yeasts, including the genus Pseudozyma. Here, in order to obtain genetic information on P. antarctica T-34, we constructed a cDNA library from yeast cells producing MEL from soybean oil and identified the genes expressed through the creation of an expressed sequence tag (EST) library. We generated 398 ESTs, assembled into 146 contiguous sequences. Based upon a BLAST search similarity cut-off of E [...] sequences in the protein database; 60.3% of all contiguous sequences shared significant identities to hypothetical proteins of Ustilago maydis, which is a smut fungus and BS producer. Based on the gene expression study using real-time reverse transcriptase-PCR, the predicted genes, such as mannosyltransferase and acyltransferase, were demonstrated to be highly involved in MEL biosynthesis in soybean oil-grown cells. PMID:16845679

  19. Multiplexed genotyping with sequence-tagged molecular inversion probes.

    PubMed

    Hardenbol, Paul; Banér, Johan; Jain, Maneesh; Nilsson, Mats; Namsaraev, Eugeni A; Karlin-Neumann, George A; Fakhrai-Rad, Hossein; Ronaghi, Mostafa; Willis, Thomas D; Landegren, Ulf; Davis, Ronald W

    2003-06-01

    We report on the development of molecular inversion probe (MIP) genotyping, an efficient technology for large-scale single nucleotide polymorphism (SNP) analysis. This technique uses MIPs to produce inverted sequences, which undergo a unimolecular rearrangement and are then amplified by PCR using common primers and analyzed using universal sequence tag DNA microarrays, resulting in highly specific genotyping. With this technology, multiplex analysis of more than 1,000 probes in a single tube can be done using standard laboratory equipment. Genotypes are generated with a high call rate (95%) and high accuracy (>99%) as determined by independent sequencing. PMID:12730666

  20. Chromatin Interaction Analysis with Paired-End Tag Sequencing (ChIA-PET) for Mapping Chromatin Interactions and Understanding Transcription Regulation

    PubMed Central

    Poh, Huay Mei; Peh, Su Qin; Ong, Chin Thing; Zhang, Jingyao; Ruan, Xiaoan; Ruan, Yijun

    2012-01-01

    Genomes are organized into three-dimensional structures, adopting higher-order conformations inside the micron-sized nuclear spaces [7, 2, 12]. Such architectures are not random and involve interactions between gene promoters and regulatory elements [13]. The binding of transcription factors to specific regulatory sequences brings about a network of transcription regulation and coordination [1, 14]. Chromatin Interaction Analysis by Paired-End Tag Sequencing (ChIA-PET) was developed to identify these higher-order chromatin structures [5, 6]. Cells are fixed and interacting loci are captured by covalent DNA-protein cross-links. To minimize non-specific noise and reduce complexity, as well as to increase the specificity of the chromatin interaction analysis, chromatin immunoprecipitation (ChIP) is used against specific protein factors to enrich chromatin fragments of interest before proximity ligation. Ligation involving half-linkers subsequently forms covalent links between pairs of DNA fragments tethered together within individual chromatin complexes. The flanking MmeI restriction enzyme sites in the half-linkers allow extraction of paired end tag-linker-tag constructs (PETs) upon MmeI digestion. As the half-linkers are biotinylated, these PET constructs are purified using streptavidin-magnetic beads. The purified PETs are ligated with next-generation sequencing adaptors and a catalog of interacting fragments is generated via next-generation sequencers such as the Illumina Genome Analyzer. Mapping and bioinformatics analysis is then performed to identify ChIP-enriched binding sites and ChIP-enriched chromatin interactions [8]. We have produced a video to demonstrate critical aspects of the ChIA-PET protocol, especially the preparation of ChIP as the quality of ChIP plays a major role in the outcome of a ChIA-PET library. As the protocols are very long, only the critical steps are shown in the video. PMID:22564980
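
    The tag-linker-tag structure of a PET read described above suggests a simple first parsing step before mapping: locate the linker and take the flanking tags. The sketch below is an illustration under stated assumptions, not the published ChIA-PET pipeline. The linker sequence is a placeholder, and the 20 bp tag length is an assumption based on MmeI cutting roughly 20 bases from its recognition site.

```python
def split_pet(read, linker, tag_len=20):
    """Split a tag-linker-tag read into its two genomic tags.

    Returns (left_tag, right_tag), or None when the linker is missing or
    either flank is shorter than tag_len.
    """
    i = read.find(linker)
    if i == -1:
        return None
    left = read[max(0, i - tag_len):i]
    right = read[i + len(linker):i + len(linker) + tag_len]
    if len(left) < tag_len or len(right) < tag_len:
        return None
    return left, right

# Placeholder linker and tags, for illustration only.
LINKER = "GTTGGATAAGATATCGC"
left_tag = "ACGT" * 5
right_tag = "TTGCA" * 4
read = "AAAAA" + left_tag + LINKER + right_tag + "GG"
tags = split_pet(read, LINKER)
```

    Each tag of a pair would then be aligned to the genome separately; pairs whose tags map far apart are the candidates for long-range chromatin interactions.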

  1. Development of expressed sequence tag-simple sequence repeat markers for genetic characterization and population structure analysis of Praxelis clematidea (Asteraceae).

    PubMed

    Wang, Q Z; Huang, M; Downie, S R; Chen, Z X

    2016-01-01

    Invasive plants tend to spread aggressively in new habitats and an understanding of their genetic diversity and population structure is useful for their management. In this study, expressed sequence tag-simple sequence repeat (EST-SSR) markers were developed for the invasive plant species Praxelis clematidea (Asteraceae) from 5548 Stevia rebaudiana (Asteraceae) expressed sequence tags (ESTs). A total of 133 microsatellite-containing ESTs (2.4%) were identified, of which 56 (42.1%) were hexanucleotide repeat motifs and 50 (37.6%) were trinucleotide repeat motifs. Of the 24 primer pairs designed from these 133 ESTs, 7 (29.2%) resulted in significant polymorphisms. The number of alleles per locus ranged from 5 to 9. The relatively high genetic diversity (H = 0.2667, I = 0.4212, and P = 100%) of P. clematidea was related to high gene flow (Nm = 1.4996) among populations. The coefficient of population differentiation (GST = 0.2500) indicated that most genetic variation occurred within populations. A Mantel test suggested that there was significant correlation between genetic distance and geographical distribution (r = 0.3192, P = 0.012). These results further support the transferability of EST-SSR markers between closely related genera of the same family. PMID:27323082
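
    The abstract does not say how gene flow was estimated from population differentiation. A commonly used estimator (assumed here, not confirmed by the source) derives Nm directly from GST, and it reproduces the reported value to within rounding:

```latex
N_m = \frac{1 - G_{ST}}{2\,G_{ST}}
    = \frac{1 - 0.2500}{2 \times 0.2500}
    = 1.5
```

    Under this assumed estimator, the reported Nm = 1.4996 would correspond to an unrounded GST of about 0.25005, which is consistent with the published GST = 0.2500.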

  2. Generation and Analysis of Expressed Sequence Tags from Chimonanthus praecox (Wintersweet) Flowers for Discovering Stress-Responsive and Floral Development-Related Genes

    PubMed Central

    Sui, Shunzhao; Luo, Jianghui; Ma, Jing; Zhu, Qinlong; Lei, Xinghua; Li, Mingyang

    2012-01-01

    A complementary DNA library was constructed from the flowers of Chimonanthus praecox, an ornamental perennial shrub blossoming in winter in China. Eight hundred sixty-seven high-quality expressed sequence tag sequences with an average read length of 673.8 bp were acquired. A nonredundant set of 479 unigenes, including 94 contigs and 385 singletons, was identified after the expressed sequence tags were clustered and assembled. BLAST analysis against the nonredundant protein database and nonredundant nucleotide database revealed that 405 unigenes shared significant homology with known genes. The homologous unigenes were categorized according to Gene Ontology hierarchies (biological, cellular, and molecular). By BLAST analysis and Gene Ontology annotation, 95 unigenes involved in stress and defense and 19 unigenes related to floral development were identified based on existing knowledge. Twelve genes, of which 9 were annotated as “cold response,” were examined by real-time RT-PCR to understand the changes in expression patterns under cold stress and to validate the findings. Fourteen genes, including 11 genes related to floral development, were also detected by real-time RT-PCR to validate the expression patterns in the blooming process and in different tissues. This study provides a useful basis for the genomic analysis of C. praecox. PMID:22536115

  3. TagCleaner: Identification and removal of tag sequences from genomic and metagenomic datasets

    PubMed Central

    2010-01-01

    Background Sequencing metagenomes that were pre-amplified with primer-based methods requires the removal of the additional tag sequences from the datasets. The sequenced reads can contain deletions or insertions due to sequencing limitations, and the primer sequence may contain ambiguous bases. Furthermore, the tag sequence may be unavailable or incorrectly reported. Because of the potential for downstream inaccuracies introduced by unwanted sequence contaminations, it is important to use reliable tools for pre-processing sequence data. Results TagCleaner is a web application developed to automatically identify and remove known or unknown tag sequences allowing insertions and deletions in the dataset. TagCleaner is designed to filter the trimmed reads for duplicates, short reads, and reads with high rates of ambiguous sequences. An additional screening for and splitting of fragment-to-fragment concatenations that gave rise to artificial concatenated sequences can increase the quality of the dataset. Users may modify the different filter parameters according to their own preferences. Conclusions TagCleaner is a publicly available web application that is able to automatically detect and efficiently remove tag sequences from metagenomic datasets. It is easily configurable and provides a user-friendly interface. The interactive web interface facilitates export functionality for subsequent data processing, and is available at http://edwards.sdsu.edu/tagcleaner. PMID:20573248
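
    The pre-processing described above (trim a known tag, then filter short, ambiguous, and duplicate reads) can be sketched in a few lines of Python. This is an illustrative sketch only, not TagCleaner's implementation: the function names, the default thresholds, and the restriction to a 5' tag matched by mismatches alone (no insertions or deletions, no ambiguous bases in the tag) are all assumptions made for brevity.

```python
def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def trim_5prime_tag(read: str, tag: str, max_mismatches: int = 1) -> str:
    """Remove `tag` from the start of `read` if it matches within the mismatch budget.

    Simplification (assumption): only substitutions are tolerated; indels are not handled.
    """
    prefix = read[:len(tag)]
    if len(prefix) == len(tag) and hamming(prefix, tag) <= max_mismatches:
        return read[len(tag):]
    return read


def clean_reads(reads, tag, max_mismatches=1, min_length=10, max_n_fraction=0.1):
    """Trim the tag, then drop short reads, N-rich reads, and exact duplicates.

    All thresholds are hypothetical defaults chosen for the example.
    """
    seen = set()
    kept = []
    for read in reads:
        trimmed = trim_5prime_tag(read.upper(), tag.upper(), max_mismatches)
        if not trimmed or len(trimmed) < min_length:
            continue  # too short after trimming
        if trimmed.count("N") / len(trimmed) > max_n_fraction:
            continue  # too many ambiguous bases
        if trimmed in seen:
            continue  # exact duplicate of a read already kept
        seen.add(trimmed)
        kept.append(trimmed)
    return kept


example_reads = [
    "ACGTACTTTTGGGGCCCCAA",  # exact tag, kept after trimming
    "ACCTACTTTTGGGGCCCCAA",  # tag with one mismatch, duplicate after trimming
    "ACGTACTT",              # too short after trimming
    "GGGGGGTTTTNNNNNNNNCC",  # no tag, too many N bases
]
cleaned = clean_reads(example_reads, tag="ACGTAC")
```

    A production tool would also need gapped alignment of the tag and detection of unknown tags, both of which the abstract attributes to TagCleaner but which this sketch omits.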

  4. Identification and isolation of full-length cDNA sequences by sequencing and analysis of expressed sequence tags from guarana (Paullinia cupana).

    PubMed

    Figueirêdo, L C; Faria-Campos, A C; Astolfi-Filho, S; Azevedo, J L

    2011-01-01

    The current intense production of biological data, generated by sequencing techniques, has created an ever-growing volume of unanalyzed data. We reevaluated data produced by the guarana (Paullinia cupana) transcriptome sequencing project to identify cDNA clones with complete coding sequences (full-length clones) and complete sequences of genes of biotechnological interest, contributing to the knowledge of biological characteristics of this organism. We analyzed 15,490 ESTs of guarana in search of clones with complete coding regions. A total of 12,402 sequences were analyzed using BLAST, and 4697 full-length clones were identified, responsible for the production of 2297 different proteins. Eighty-four clones were identified as full-length for N-methyltransferase and 18 were sequenced in both directions to obtain the complete genome sequence, and confirm the search made in silico for full-length clones. Phylogenetic analyses were made with the complete genome sequences of three clones, which showed only 0.017% dissimilarity; these are phylogenetically close to the caffeine synthase of Theobroma cacao. The search for full-length clones allowed the identification of numerous clones that had the complete coding region, demonstrating this to be an efficient and useful tool in the process of biological data mining. The sequencing of the complete coding region of identified full-length clones corroborated the data from the in silico search, strengthening its efficiency and utility. PMID:21732283

  5. Exploring the Host Parasitism of the Migratory Plant-Parasitic Nematode Ditylenchus destuctor by Expressed Sequence Tags Analysis

    PubMed Central

    Peng, Huan; Gao, Bing-li; Kong, Ling-an; Yu, Qing; Huang, Wen-kun; He, Xu-feng; Long, Hai-bo; Peng, De-liang

    2013-01-01

    The potato rot nematode, Ditylenchus destructor, is a very destructive nematode pest on many agriculturally important crops worldwide, but the molecular characterization of its parasitism of plants has been limited. The effectors involved in plant parasitism by several sedentary endo-parasitic nematodes, such as Heterodera glycines, Globodera rostochiensis and Meloidogyne incognita, have been identified and extensively studied over the past two decades. Ditylenchus destructor, as a migratory plant-parasitic nematode, has a different feeding behavior, life cycle and host response. Comparing the transcriptome and parasitome among different types of plant-parasitic nematodes is a way to understand more fully the parasitic mechanism of plant nematodes. We sequenced expressed sequence tags (ESTs) derived from a mixed-stage cDNA library of D. destructor. This is the first study of D. destructor ESTs. A total of 9800 ESTs were grouped into 5008 clusters including 3606 singletons and 1402 multi-member contigs, representing a catalog of D. destructor genes. Using a bioinformatics workflow, we found that 1391 clusters have no match in the available gene database; 31 clusters only have similarities to genes identified from D. africanus, the most closely related species to D. destructor; 1991 clusters were annotated using Gene Ontology (GO); 1550 clusters were assigned enzyme commission (EC) numbers; and 1211 clusters were mapped to 181 KEGG biochemical pathways. 22 ESTs had similarities to reported nematode effectors. Interestingly, most of the effectors identified in this study are involved in host cell wall degradation or modification, such as 1,4-beta-glucanase, 1,3-beta-glucanase, pectate lyase, chitinases and expansin, or host defense suppression such as calreticulin, annexin and venom allergen-like protein. This result implies that the migratory plant-parasitic nematode D. destructor secretes similar effectors to those of sedentary

  6. Exploring the host parasitism of the migratory plant-parasitic nematode Ditylenchus destuctor by expressed sequence tags analysis.

    PubMed

    Peng, Huan; Gao, Bing-li; Kong, Ling-an; Yu, Qing; Huang, Wen-kun; He, Xu-feng; Long, Hai-bo; Peng, De-liang

    2013-01-01

    The potato rot nematode, Ditylenchus destructor, is a very destructive nematode pest on many agriculturally important crops worldwide, but the molecular characterization of its parasitism of plants has been limited. The effectors involved in plant parasitism by several sedentary endo-parasitic nematodes, such as Heterodera glycines, Globodera rostochiensis and Meloidogyne incognita, have been identified and extensively studied over the past two decades. Ditylenchus destructor, as a migratory plant-parasitic nematode, has a different feeding behavior, life cycle and host response. Comparing the transcriptome and parasitome among different types of plant-parasitic nematodes is a way to understand more fully the parasitic mechanism of plant nematodes. We sequenced expressed sequence tags (ESTs) derived from a mixed-stage cDNA library of D. destructor. This is the first study of D. destructor ESTs. A total of 9800 ESTs were grouped into 5008 clusters including 3606 singletons and 1402 multi-member contigs, representing a catalog of D. destructor genes. Using a bioinformatics workflow, we found that 1391 clusters have no match in the available gene database; 31 clusters only have similarities to genes identified from D. africanus, the most closely related species to D. destructor; 1991 clusters were annotated using Gene Ontology (GO); 1550 clusters were assigned enzyme commission (EC) numbers; and 1211 clusters were mapped to 181 KEGG biochemical pathways. 22 ESTs had similarities to reported nematode effectors. Interestingly, most of the effectors identified in this study are involved in host cell wall degradation or modification, such as 1,4-beta-glucanase, 1,3-beta-glucanase, pectate lyase, chitinases and expansin, or host defense suppression such as calreticulin, annexin and venom allergen-like protein. This result implies that the migratory plant-parasitic nematode D. destructor secretes similar effectors to those of sedentary

  7. Expressed sequence-tag analysis of ovaries of Brachiaria brizantha reveals genes associated with the early steps of embryo sac differentiation of apomictic plants.

    PubMed

    Silveira, Erica Duarte; Guimarães, Larissa Arrais; de Alencar Dusi, Diva Maria; da Silva, Felipe Rodrigues; Martins, Natália Florencio; do Carmo Costa, Marcos Mota; Alves-Ferreira, Márcio; de Campos Carneiro, Vera Tavares

    2012-02-01

    In apomixis, an asexual mode of plant reproduction through seeds, an unreduced megagametophyte is formed due to circumvented or altered meiosis. The embryo develops autonomously from the unreduced egg cell, independently of fertilization. Brachiaria is a genus of tropical forage grasses that reproduces sexually or by apomixis. A limited number of studies have reported the sequencing of apomixis-related genes, and only a few Brachiaria sequences have been deposited in GenBank databases. This work presents sequencing and expression analyses of expressed sequence tags (ESTs) of the Brachiaria genus and points to transcripts from ovaries with preferential expression at megasporogenesis in apomictic plants. Of the 11 differentially expressed sequences from immature ovaries of sexual and apomictic Brachiaria brizantha obtained from macroarray analysis, 9 were preferentially detected in ovaries of apomicts, as confirmed by RT-qPCR. A putative involvement in the early steps of Panicum-type embryo sac differentiation is suggested for four sequences from B. brizantha ovaries: BbrizHelic, BbrizRan, BbrizSec13 and BbrizSti1. Two of these, BbrizSti1 and BbrizHelic, with similarity to a gene coding for a stress-induced protein and a helicase, respectively, are preferentially expressed in the early stages of apomictic ovary development, especially in the nucellus, at a stage prior to the differentiation of aposporous initials, as verified by in situ hybridization. PMID:22068439

  8. Adult midgut expressed sequence tags from the tsetse fly Glossina morsitans morsitans and expression analysis of putative immune response genes

    PubMed Central

    Lehane, M J; Aksoy, S; Gibson, W; Kerhornou, A; Berriman, M; Hamilton, J; Soares, M B; Bonaldo, M F; Lehane, S; Hall, N

    2003-01-01

    Background Tsetse flies transmit African trypanosomiasis leading to half a million cases annually. Trypanosomiasis in animals (nagana) remains a massive brake on African agricultural development. While trypanosome biology is widely studied, knowledge of tsetse flies is very limited, particularly at the molecular level. This is a serious impediment to investigations of tsetse-trypanosome interactions. We have undertaken an expressed sequence tag (EST) project on the adult tsetse midgut, the major organ system for establishment and early development of trypanosomes. Results A total of 21,427 ESTs were produced from the midgut of adult Glossina morsitans morsitans and grouped into 8,876 clusters or singletons potentially representing unique genes. Putative functions were ascribed to 4,035 of these by homology. Of these, a remarkable 3,884 had their most significant matches in the Drosophila protein database. We selected 68 genes with putative immune-related functions, macroarrayed them and determined their expression profiles following bacterial or trypanosome challenge. In both infections many genes are downregulated, suggesting a malaise response in the midgut. Trypanosome and bacterial challenge result in upregulation of different genes, suggesting that different recognition pathways are involved in the two responses. The most notable block of genes upregulated in response to trypanosome challenge are a series of Toll and Imd genes and a series of genes involved in oxidative stress responses. Conclusions The project increases the number of known Glossina genes by two orders of magnitude. Identification of putative immunity genes and their preliminary characterization provides a resource for the experimental dissection of tsetse-trypanosome interactions. PMID:14519198

  9. Extending RAD tag analysis to microbial ecology: a comparison between MultiLocus Sequence Typing and 2b-RAD to investigate Listeria monocytogenes genetic structure.

    PubMed

    Pauletto, Marianna; Carraro, Lisa; Babbucci, Massimiliano; Lucchini, Rosaria; Bargelloni, Luca; Cardazzo, Barbara

    2016-05-01

    The advent of next-generation sequencing (NGS) has dramatically changed bacterial typing technologies, increasing our ability to differentiate bacterial isolates. Although it is now possible to sequence a bacterial genome in a few days and at reasonable cost, most genetic analyses do not require whole-genome sequencing, which also remains impractical for large population samples due to the cost of individual library preparation and bioinformatics. More traditional sequencing approaches, however, such as MultiLocus Sequence Typing (MLST), are quite laborious and time-consuming, especially for large-scale analyses. In this study, a genotyping approach based on restriction site-associated (RAD) tag sequencing, 2b-RAD, was applied to characterize Listeria monocytogenes strains. To verify the feasibility of the method, an in silico analysis was performed on 30 available complete genomes. For the same set of strains, in silico MLST analysis was conducted as well. Subsequently, 2b-RAD and MLST analyses were experimentally carried out on 58 isolates collected from food samples or food-processing sites. The obtained results demonstrate that 2b-RAD predicts MLST types and often provides more detailed information on population structure than MLST. Moreover, the majority of variants differentiating identical sequence type isolates mapped against accessory fragments, thus providing additional information to characterize strains. Although MLST still represents a reliable typing method, large-scale studies on molecular epidemiology and public health, as well as bacterial phylogenetics, population genetics and biosafety, could benefit from a low-cost and fast-turnaround approach such as the 2b-RAD analysis proposed here. PMID:26613186

  10. Analysis of expression sequence tags from a full-length-enriched cDNA library of developing sesame seeds (Sesamum indicum)

    PubMed Central

    2011-01-01

    Background Sesame (Sesamum indicum) is one of the most important oilseed crops, with high oil content and rich nutrient value. However, genetic improvement efforts in sesame have not benefited from molecular biology technology because DNA and RNA sequence resources are poor. In this study, we carried out large-scale sequencing of expressed sequence tags (ESTs) from developing sesame seeds and further analyzed genes related to seed storage products. Results A normalized and full-length enriched cDNA library from 5- to 30-day-old immature seeds was constructed and randomly sequenced, generating 41,248 expressed sequence tags (ESTs), which then formed 4,713 contigs and 27,708 singletons, with 44.9% of uniESTs being putative full-length open reading frames. Approximately 26,091 of these uniESTs have significant matches to counterparts in the Nr database of GenBank, and 21,628 of them were assigned to one or more Gene Ontology (GO) terms. Homologous genes involved in oil biosynthesis were identified, including some conserved transcription factors regulating oil biosynthesis such as LEAFY COTYLEDON1 (LEC1), PICKLE (PKL), and WRINKLED1 (WRI1), and the majority of them were found for the first time in sesame seeds. One hundred and seventeen ESTs were identified as possibly involved in the biosynthesis of the sesame lignans sesamin and sesamolin. In total, 9,347 putative functional genes from developing seeds were identified, which account for one third of the total genes in the sesame genome. Further analysis of the uniESTs identified 1,949 non-redundant simple sequence repeats (SSRs). Conclusions This study has provided an overview of genes expressed during sesame seed development. This collection of sesame full-length cDNAs covered a wide variety of genes in seeds, in particular, candidate genes involved in biosynthesis of sesame oils and lignans. These EST sequences enriched with full length will contribute to comparative genomic studies on sesame and other oilseed plants

  11. Diversity analysis in Cannabis sativa based on large-scale development of expressed sequence tag-derived simple sequence repeat markers.

    PubMed

    Gao, Chunsheng; Xin, Pengfei; Cheng, Chaohua; Tang, Qing; Chen, Ping; Wang, Changbiao; Zang, Gonggu; Zhao, Lining

    2014-01-01

    Cannabis sativa L. is an important economic plant for the production of food, fiber, oils, and intoxicants. However, lack of sufficient simple sequence repeat (SSR) markers has limited the development of cannabis genetic research. Here, large-scale development of expressed sequence tag simple sequence repeat (EST-SSR) markers was performed to obtain more informative genetic markers, and to assess genetic diversity in cannabis (Cannabis sativa L.). Based on the cannabis transcriptome, 4,577 SSRs were identified from 3,624 ESTs. From there, a total of 3,442 complementary primer pairs were designed as SSR markers. Among these markers, trinucleotide repeat motifs (50.99%) were the most abundant, followed by hexanucleotide (25.13%), dinucleotide (16.34%), tetranucleotide (3.8%), and pentanucleotide (3.74%) repeat motifs, respectively. The AAG/CTT trinucleotide repeat (17.96%) was the most abundant motif detected in the SSRs. One hundred and seventeen EST-SSR markers were randomly selected to evaluate primer quality in 24 cannabis varieties. Among these 117 markers, 108 (92.31%) were successfully amplified and 87 (74.36%) were polymorphic. Forty-five polymorphic primer pairs were selected to evaluate genetic diversity and relatedness among the 115 cannabis genotypes. The results showed that 115 varieties could be divided into 4 groups primarily based on geography: Northern China, Europe, Central China, and Southern China. Moreover, the coefficient of similarity when comparing cannabis from Northern China with the European group cannabis was higher than that when comparing with cannabis from the other two groups, owing to a similar climate. This study outlines the first large-scale development of SSR markers for cannabis. These data may serve as a foundation for the development of genetic linkage, quantitative trait loci mapping, and marker-assisted breeding of cannabis. PMID:25329551
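
    Identifying SSRs in EST sequences and tallying them by motif length, as reported above, can be illustrated with a short regular-expression scan. The abstract does not name the detection tool or its criteria, so the minimum repeat counts in `MIN_REPEATS` and the function name `find_ssrs` are hypothetical choices for this sketch rather than the authors' settings.

```python
import re
from collections import Counter

# Hypothetical thresholds: motif length -> minimum number of tandem repeats.
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4, 6: 4}


def is_compound_unit(unit: str) -> bool:
    """True if the unit is itself a repeat of a shorter unit (e.g. ATAT or AAA)."""
    n = len(unit)
    return any(n % k == 0 and unit == unit[:k] * (n // k) for k in range(1, n))


def find_ssrs(seq: str):
    """Return (start, motif, repeat_count) for each perfect SSR, sorted by start."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # A captured unit followed by at least (min_rep - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            unit = match.group(1)
            if is_compound_unit(unit):
                continue  # report it under its shortest motif only
            hits.append((match.start(), unit, len(match.group(0)) // unit_len))
    return sorted(hits)


est = "GGC" + "AAG" * 6 + "TTCA" + "AT" * 7 + "GCGC"
ssrs = find_ssrs(est)
motif_length_counts = Counter(len(motif) for _, motif, _ in ssrs)
```

    Dividing each entry of `motif_length_counts` by the total number of SSRs gives motif-class percentages of the kind quoted in the abstract. The sketch finds perfect repeats only and does not merge a motif with its reverse complement (for example AAG with CTT).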

  12. Diversity Analysis in Cannabis sativa Based on Large-Scale Development of Expressed Sequence Tag-Derived Simple Sequence Repeat Markers

    PubMed Central

    Cheng, Chaohua; Tang, Qing; Chen, Ping; Wang, Changbiao; Zang, Gonggu; Zhao, Lining

    2014-01-01

    Cannabis sativa L. is an important economic plant for the production of food, fiber, oils, and intoxicants. However, lack of sufficient simple sequence repeat (SSR) markers has limited the development of cannabis genetic research. Here, large-scale development of expressed sequence tag simple sequence repeat (EST-SSR) markers was performed to obtain more informative genetic markers, and to assess genetic diversity in cannabis (Cannabis sativa L.). Based on the cannabis transcriptome, 4,577 SSRs were identified from 3,624 ESTs. From there, a total of 3,442 complementary primer pairs were designed as SSR markers. Among these markers, trinucleotide repeat motifs (50.99%) were the most abundant, followed by hexanucleotide (25.13%), dinucleotide (16.34%), tetranucleotide (3.8%), and pentanucleotide (3.74%) repeat motifs, respectively. The AAG/CTT trinucleotide repeat (17.96%) was the most abundant motif detected in the SSRs. One hundred and seventeen EST-SSR markers were randomly selected to evaluate primer quality in 24 cannabis varieties. Among these 117 markers, 108 (92.31%) were successfully amplified and 87 (74.36%) were polymorphic. Forty-five polymorphic primer pairs were selected to evaluate genetic diversity and relatedness among the 115 cannabis genotypes. The results showed that 115 varieties could be divided into 4 groups primarily based on geography: Northern China, Europe, Central China, and Southern China. Moreover, the coefficient of similarity when comparing cannabis from Northern China with the European group cannabis was higher than that when comparing with cannabis from the other two groups, owing to a similar climate. This study outlines the first large-scale development of SSR markers for cannabis. These data may serve as a foundation for the development of genetic linkage, quantitative trait loci mapping, and marker-assisted breeding of cannabis. PMID:25329551

  13. Development of Microsatellite Markers Derived from Expressed Sequence Tags of Polyporales for Genetic Diversity Analysis of Endangered Polyporus umbellatus.

    PubMed

    Zhang, Yuejin; Chen, Yuanyuan; Wang, Ruihong; Zeng, Ailin; Deyholos, Michael K; Shu, Jia; Guo, Hongbo

    2015-01-01

    A large set of EST sequences of Polyporales was screened in this investigation in order to identify EST-SSR markers for various applications. The distribution of EST sequences and SSRs in five families of Polyporales was analyzed. Mononucleotide repeats were the most abundant type, followed by trinucleotide repeats. Among the five families, Ganodermataceae contained the most SSR markers, followed by Coriolaceae. Functional prediction of SSR marker-containing EST sequences in Ganoderma lucidum yielded three main groups, namely, cellular component, biological process, and molecular function. Thirty EST-SSR primers were designed to evaluate the genetic diversity of 13 natural Polyporus umbellatus accessions. Twenty-one EST-SSRs were polymorphic, with an average PIC value of 0.33 and a transferability rate of 71%. These 13 P. umbellatus accessions showed relatively high genetic diversity. The expected heterozygosity, Nei's gene diversity, and Shannon information index were 0.41, 0.39, and 0.57, respectively. Both the UPGMA dendrogram and principal coordinate analysis (PCA) gave the same clustering result, dividing the 13 accessions into three or four groups. PMID:26146636

  14. Expressed sequence tag analysis and development of gene associated markers in a near-isogenic plant system of Eragrostis curvula.

    PubMed

    Cervigni, Gerardo D L; Paniego, Norma; Díaz, Marina; Selva, Juan P; Zappacosta, Diego; Zanazzi, Darío; Landerreche, Iñaki; Martelotto, Luciano; Felitti, Silvina; Pessino, Silvina; Spangenberg, Germán; Echenique, Viviana

    2008-05-01

    Eragrostis curvula (Schrad.) Nees is a forage grass native to the semiarid regions of Southern Africa, which reproduces mainly by pseudogamous diplosporous apomixis. A collection of ESTs was generated from four cDNA libraries, three of them obtained from panicles of near-isogenic lines with different ploidy levels and reproductive modes, and one obtained from 12 days-old plant leaves. A total of 12,295 high-quality ESTs were clustered and assembled, rendering 8,864 unigenes, including 1,490 contigs and 7,394 singletons, with a genome coverage of 22%. A total of 7,029 (79.11%) unigenes were functionally categorized by BLASTX analysis against sequences deposited in public databases, but only 37.80% could be classified according to Gene Ontology. Sequence comparison against the cereals genes indexes (GI) revealed 50% significant hits. A total of 254 EST-SSRs were detected from 219 singletons and 35 from contigs. Di- and tri- motifs were similarly represented with percentages of 38.95 and 40.16%, respectively. In addition, 190 SNPs and Indels were detected in 18 contigs generated from 3 to 4 libraries. The ESTs and the molecular markers obtained in this study will provide valuable resources for a wide range of applications including gene identification, genetic mapping, cultivar identification, analysis of genetic diversity, phenotype mapping and marker assisted selection. PMID:18196464

  15. Identification of salt-induced genes from Salicornia brachiata, an extreme halophyte through expressed sequence tags analysis.

    PubMed

    Jha, Bhavanath; Agarwal, Pradeep K; Reddy, Palakolanu Sudhakar; Lal, Sanjay; Sopory, Sudhir K; Reddy, Malireddy K

    2009-04-01

    Salinity severely affects plant growth and development, causing crop loss worldwide. We have isolated a large number of salt-induced genes as well as unknown and hypothetical genes from Salicornia brachiata Roxb. (Amaranthaceae). This is the first description of the identification of genes responding to salinity stress in this extreme halophyte. Salicornia accumulates salt in its pith and survives even at 2 M NaCl under field conditions. To isolate salt-responsive genes, cDNA subtractive hybridization was performed between control and 500 mM NaCl-treated plants. Out of the 1200 recombinant clones, 930 sequences were submitted to the NCBI database (GenBank accession: EB484528 to EB485289 and EC906125 to EC906292). 789 ESTs matched different genes in the NCBI database. 4.8% of ESTs belonged to the stress-tolerant gene category and approximately 29% of ESTs showed no homology with known functional gene sequences and were thus classified as unknown or hypothetical. The detection of a large number of ESTs with unknown putative function in this species makes it an interesting contribution. The 90 unknown and hypothetical genes were selected to study their differential regulation by reverse Northern analysis to identify their role in salinity tolerance. Interestingly, both up- and down-regulation at 500 mM NaCl were observed (21 and 10 genes, respectively). Northern analysis of two important salt-tolerance genes, ASR1 (Abscisic acid stress ripening gene) and plasma membrane H+ATPase, showed a basal level of transcripts in the control condition and an increase with NaCl treatment. The ASR1 gene was extended to full length using 5' RACE, and its potential role in imparting salt tolerance is being studied. PMID:19556705

  16. A sequence-tagged site map of human chromosome 11.

    PubMed

    Smith, M W; Clark, S P; Hutchinson, J S; Wei, Y H; Churukian, A C; Daniels, L B; Diggle, K L; Gen, M W; Romo, A J; Lin, Y

    1993-09-01

    We report the construction of 370 sequence-tagged sites (STSs) that are detectable by PCR amplification under sets of standardized conditions and that have been regionally mapped to human chromosome 11. DNA sequences were determined by sequencing directly from cosmid templates using primers complementary to T3 and T7 promoters present in the cloning vector. Oligonucleotide PCR primers were predicted by computer and tested using a battery of genomic DNAs. Cosmids were regionally localized on chromosome 11 by using fluorescence in situ hybridization or by analyzing a somatic cell hybrid panel. Additional STSs corresponding to known genes and markers on chromosome 11 were also produced under the same series of standardized conditions. The resulting STSs provide uniform coverage of chromosome 11 with an average spacing of 340 kb. The DNA sequence determined for use in STS production corresponds to about 0.1% (116 kb) of chromosome 11 and has been analyzed for the presence of repetitive sequences, similarities to known genes and motifs, and possible exons. Computer analysis of this sequence has identified and therefore mapped at least eight new genes on chromosome 11. PMID:8244387

  17. Computational exploration of microRNAs from expressed sequence tags of Humulus lupulus, target predictions and expression analysis.

    PubMed

    Mishra, Ajay Kumar; Duraisamy, Ganesh Selvaraj; Týcová, Anna; Matoušek, Jaroslav

    2015-12-01

    Among computationally predicted and experimentally validated plant miRNAs, several are conserved across species boundaries in the plant kingdom. In this study, a combined experimental and in silico approach was adopted for the identification and characterization of miRNAs in Humulus lupulus (hop), which is widely cultivated for use by the brewing industry and is also used as a medicinal herb. A total of 22 miRNAs belonging to 17 miRNA families were identified in hop by a comparative computational approach and EST-based homology search according to a series of filtering criteria. Selected miRNAs were validated by end-point PCR and quantitative reverse transcription-polymerase chain reaction (qRT-PCR), which confirmed the existence of conserved miRNAs in hop. Based on the characteristic that miRNAs exhibit perfect or nearly perfect complementarity with their targeted mRNA sequences, a total of 47 potential miRNA targets were identified in hop. Strikingly, the majority of predicted targets were transcription factors that could regulate hop growth and development, including leaf, root and even cone development. Moreover, the identified miRNAs may also be involved in other cellular and metabolic processes, such as stress response, signal transduction, and other physiological processes. Cis-regulatory elements relevant to biotic and abiotic stress, plant hormone response, and flavonoid biosynthesis were identified in the promoter regions of those miRNA genes. Overall, findings from this study will pave the way for further research on miRNAs and their functions in hop, and show a path for the prediction and analysis of miRNAs in species whose genomes are not available. PMID:26476128

  18. OligoTag: a program for designing sets of tags for next-generation sequencing of multiplexed samples.

    PubMed

    Coissac, Eric

    2012-01-01

    Next-generation sequencing systems allow high-throughput production of DNA sequence data. But this technology is more adapted for analyzing a small number of samples needing a huge amount of sequences rather than a large number of samples needing a small number of sequences. One solution to this problem is sample multiplexing. To achieve this, one can add a small tag at the extremities of the sequenced DNA molecules. These tags will be identified using bioinformatics tools after the sequencing step to sort sequences among samples. The rules to apply for selecting a good set of tags adapted to each situation are described in this chapter. Depending on the number of samples to tag and on the required quality of assignation, different solutions are possible. The software oligoTag, a part of OBITools that computes these sets of tags, is presented with some example sets of tags. PMID:22665273
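
    One common rule for choosing a tag set is to require a minimum number of differences between any two tags, so that a sequencing error does not turn one tag into another. The sketch below shows that idea with a simple greedy search. It is not the oligoTag algorithm, and the function names and parameter values are assumptions chosen for illustration.

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length tags differ."""
    return sum(x != y for x, y in zip(a, b))


def greedy_tag_set(length: int, min_distance: int, max_tags: int) -> list[str]:
    """Greedily collect tags that are pairwise at least min_distance apart."""
    chosen: list[str] = []
    # Candidates are enumerated in lexicographic order over the DNA alphabet.
    for candidate in ("".join(p) for p in product("ACGT", repeat=length)):
        if all(hamming(candidate, tag) >= min_distance for tag in chosen):
            chosen.append(candidate)
            if len(chosen) == max_tags:
                break
    return chosen


# Eight 5-nt tags, any two of which differ at three or more positions.
tags = greedy_tag_set(length=5, min_distance=3, max_tags=8)
```

    With a minimum distance of three, a single substitution in a tag still leaves it closer to its original than to any other tag in the set. A production tool would typically add further filters, such as limits on homopolymer runs.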

  19. Identification of human chromosome 22 transcribed sequences with ORF expressed sequence tags

    PubMed Central

    de Souza, Sandro J.; Camargo, Anamaria A.; Briones, Marcelo R. S.; Costa, Fernando F.; Nagai, Maria Aparecida; Verjovski-Almeida, Sergio; Zago, Marco A.; Andrade, Luis Eduardo C.; Carrer, Helaine; El-Dorry, Hamza F. A.; Espreafico, Enilza M.; Habr-Gama, Angelita; Giannella-Neto, Daniel; Goldman, Gustavo H.; Gruber, Arthur; Hackel, Christine; Kimura, Edna T.; Maciel, Rui M. B.; Marie, Suely K. N.; Martins, Elizabeth A. L.; Nóbrega, Marina P.; Paçó-Larson, Maria Luisa; Pardini, Maria Inês M. C.; Pereira, Gonçalo G.; Pesquero, João Bosco; Rodrigues, Vanderlei; Rogatto, Silvia R.; da Silva, Ismael D. C. G.; Sogayar, Mari C.; de Fátima Sonati, Maria; Tajara, Eloiza H.; Valentini, Sandro R.; Acencio, Marcio; Alberto, Fernando L.; Amaral, Maria Elisabete J.; Aneas, Ivy; Bengtson, Mário Henrique; Carraro, Dirce M.; Carvalho, Alex F.; Carvalho, Lúcia Helena; Cerutti, Janete M.; Corrêa, Maria Lucia C.; Costa, Maria Cristina R.; Curcio, Cyntia; Gushiken, Tsieko; Ho, Paulo L.; Kimura, Elza; Leite, Luciana C. C.; Maia, Gustavo; Majumder, Paromita; Marins, Mozart; Matsukuma, Adriana; Melo, Analy S. A.; Mestriner, Carlos Alberto; Miracca, Elisabete C.; Miranda, Daniela C.; Nascimento, Ana Lucia T. O.; Nóbrega, Francisco G.; Ojopi, Élida P. B.; Pandolfi, José Rodrigo C.; Pessoa, Luciana Gilbert; Rahal, Paula; Rainho, Claudia A.; da Ro's, Nancy; de Sá, Renata G.; Sales, Magaly M.; da Silva, Neusa P.; Silva, Tereza C.; da Silva, Wilson; Simão, Daniel F.; Sousa, Josane F.; Stecconi, Daniella; Tsukumo, Fernando; Valente, Valéria; Zalcberg, Heloisa; Brentani, Ricardo R.; Reis, Luis F. L.; Dias-Neto, Emmanuel; Simpson, Andrew J. G.

    2000-01-01

    Transcribed sequences in the human genome can be identified with confidence only by alignment with sequences derived from cDNAs synthesized from naturally occurring mRNAs. We constructed a set of 250,000 cDNAs that represent partial expressed gene sequences and that are biased toward the central coding regions of the resulting transcripts. They are termed ORF expressed sequence tags (ORESTES). The 250,000 ORESTES were assembled into 81,429 contigs. Of these, 1,181 (1.45%) were found to match sequences in chromosome 22 with at least one ORESTES contig for 162 (65.6%) of the 247 known genes, for 67 (44.6%) of the 150 related genes, and for 45 of the 148 (30.4%) EST-predicted genes on this chromosome. Using a set of stringent criteria to validate our sequences, we identified a further 219 previously unannotated transcribed sequences on chromosome 22. Of these, 171 were in fact also defined by EST or full length cDNA sequences available in GenBank but not utilized in the initial annotation of the first human chromosome sequence. Thus despite representing less than 15% of all expressed human sequences in the public databases at the time of the present analysis, ORESTES sequences defined 48 transcribed sequences on chromosome 22 not defined by other sequences. All of the transcribed sequences defined by ORESTES coincided with DNA regions predicted as encoding exons by GENSCAN (http://genes.mit.edu/GENSCAN.html). PMID:11070084

  20. Generation and analysis of expressed sequence tags (ESTs) for marker development in yam (Dioscorea alata L.)

    Technology Transfer Automated Retrieval System (TEKTRAN)

    A total of 44,757 EST sequences, 1,705 EST-SSR and 104 SNP markers were generated from the cDNA libraries of the resistant and susceptible genotypes. We have developed a comprehensive annotated transcriptome data set in yam to enrich the EST information in public databases. These EST resources prov...

  1. Generation and Analysis of Expressed Sequence Tags (ESTs) from Halophyte Atriplex canescens to Explore Salt-Responsive Related Genes

    PubMed Central

    Li, Jingtao; Sun, Xinhua; Yu, Gang; Jia, Chengguo; Liu, Jinliang; Pan, Hongyu

    2014-01-01

    Little information is available on gene expression profiling of the halophyte A. canescens. To elucidate the molecular mechanism for stress tolerance in A. canescens, a full-length complementary DNA library was generated from A. canescens exposed to 400 mM NaCl, and provided 343 high-quality ESTs. In an evaluation of 343 valid EST sequences in the cDNA library, 197 unigenes were assembled, among which 190 unigenes (83.1% ESTs) were identified according to their significant similarities with proteins of known functions. All the 343 EST sequences have been deposited in the dbEST GenBank under accession numbers JZ535802 to JZ536144. According to Arabidopsis MIPS functional category and GO classifications, we identified 193 unigenes of the 311 annotated ESTs, representing 72 non-redundant unigenes sharing similarities with genes related to the defense response. The sets of ESTs obtained provide a rich genetic resource, and 17 up-regulated genes related to salt stress resistance were identified by qRT-PCR. Six of these genes may contribute crucially to earlier and later stage salt stress resistance. Additionally, among the 343 unigene sequences, 22 simple sequence repeats (SSRs) were also identified, contributing to the study of A. canescens resources. PMID:24960361

  2. Identification of Tuber borchii Vittad. mycelium proteins separated by two-dimensional polyacrylamide gel electrophoresis using amino acid analysis and sequence tagging.

    PubMed

    Vallorani, L; Bernardini, F; Sacconi, C; Pierleoni, R; Pieretti, B; Piccoli, G; Buffalini, M; Stocchi, V

    2000-11-01

    This paper reports the first results in the proteome analysis of Tuber borchii Vittad. mycelium, an ectomycorrhizal fungus poorly defined genetically, but known for its generation of edible fruit bodies known as white truffles. Employing isoelectric focusing on immobilized pH gradients, followed by sodium dodecyl sulfate-polyacrylamide gel electrophoresis, we obtained an electropherogram presenting over 800 spots within the window of isoelectric points (pI) 3.5-9 and a molecular mass of 10-200 kDa. Different reducing agents were tested in the sample preparation buffers, and the standard lysis buffer plus 2% w/v polyvinylpolypyrrolidone allowed the best solubilization and resolution of the proteins. The T. borchii proteins separated in micropreparative gels were electroblotted onto polyvinylidene difluoride membranes and visualized by Coomassie staining. Twenty-three proteins were excised and analyzed by the combination of amino acid and N-terminal analysis. One protein was identified by matching its amino acid composition, estimated isoelectric point and molecular mass against the SWISS-PROT and EMBL databases. Four spots were successfully tagged by Edman microsequencing but no homologous sequences were found in databases. PMID:11271490

  3. Analysis of bacterial and archaeal diversity in coastal microbial mats using massive parallel 16S rRNA gene tag sequencing

    PubMed Central

    Bolhuis, Henk; Stal, Lucas J

    2011-01-01

    Coastal microbial mats are small-scale and largely closed ecosystems in which a plethora of different functional groups of microorganisms are responsible for the biogeochemical cycling of the elements. Coastal microbial mats play an important role in coastal protection and morphodynamics through stabilization of the sediments and by initiating the development of salt-marshes. Little is known about the bacterial and especially archaeal diversity and how it contributes to the ecological functioning of coastal microbial mats. Here, we analyzed three different types of coastal microbial mats that are located along a tidal gradient and can be characterized as marine (ST2), brackish (ST3) and freshwater (ST3) systems. The mats were sampled during three different seasons and subjected to massive parallel tag sequencing of the V6 region of the 16S rRNA genes of Bacteria and Archaea. Sequence analysis revealed that the mats are among the most diverse marine ecosystems studied so far and consist of several novel taxonomic levels ranging from classes to species. The diversity between the different mat types was far more pronounced than the changes between the different seasons at one location. The archaeal community of these mats has not been studied before and showed a strong reaction to a short period of drought during summer, resulting in a massive increase in halobacterial sequences, whereas the bacterial community was barely affected. We concluded that the community composition and the microbial diversity were intrinsic to the mat type and depend on the location along the tidal gradient, indicating a relation with salinity. PMID:21544102

  4. A molecular analysis of desiccation tolerance mechanisms in the anhydrobiotic nematode Panagrolaimus superbus using expressed sequenced tags

    PubMed Central

    2012-01-01

    Background Some organisms can survive extreme desiccation by entering into a state of suspended animation known as anhydrobiosis. Panagrolaimus superbus is a free-living anhydrobiotic nematode that can survive rapid environmental desiccation. The mechanisms that P. superbus uses to combat the potentially lethal effects of cellular dehydration may include the constitutive and inducible expression of protective molecules, along with behavioural and/or morphological adaptations that slow the rate of cellular water loss. In addition, inducible repair and revival programmes may also be required for successful rehydration and recovery from anhydrobiosis. Results To identify constitutively expressed candidate anhydrobiotic genes we obtained 9,216 ESTs from an unstressed mixed stage population of P. superbus. We derived 4,009 unigenes from these ESTs. These unigene annotations and sequences can be accessed at http://www.nematodes.org/nembase4/species_info.php?species=PSC. We manually annotated a set of 187 constitutively expressed candidate anhydrobiotic genes from P. superbus. Notable among those is a putative lineage expansion of the lea (late embryogenesis abundant) gene family. The most abundantly expressed sequence was a member of the nematode specific sxp/ral-2 family that is highly expressed in parasitic nematodes and secreted onto the surface of the nematodes' cuticles. There were 2,059 novel unigenes (51.7% of the total), 149 of which are predicted to encode intrinsically disordered proteins lacking a fixed tertiary structure. One unigene may encode an exo-β-1,3-glucanase (GHF5 family), most similar to a sequence from Phytophthora infestans. GHF5 enzymes have been reported from several species of plant parasitic nematodes, with horizontal gene transfer (HGT) from bacteria proposed to explain their evolutionary origin. This P. superbus sequence represents another possible HGT event within the Nematoda. The expression of five of the 19 putative stress response

  5. Multiple tag labeling method for DNA sequencing

    DOEpatents

    Mathies, R.A.; Huang, X.C.; Quesada, M.A.

    1995-07-25

    A DNA sequencing method is described which uses single lane or channel electrophoresis. Sequencing fragments are separated in the lane and detected using a laser-excited, confocal fluorescence scanner. Each set of DNA sequencing fragments is separated in the same lane and then distinguished using a binary coding scheme employing only two different fluorescent labels. Also described is a method of using radioisotope labels. 5 figs.

  6. Multiple tag labeling method for DNA sequencing

    DOEpatents

    Mathies, Richard A.; Huang, Xiaohua C.; Quesada, Mark A.

    1995-01-01

    A DNA sequencing method is described which uses single lane or channel electrophoresis. Sequencing fragments are separated in said lane and detected using a laser-excited, confocal fluorescence scanner. Each set of DNA sequencing fragments is separated in the same lane and then distinguished using a binary coding scheme employing only two different fluorescent labels. Also described is a method of using radio-isotope labels.

  7. Analysis of expressed sequence tags from Actinidia: applications of a cross species EST database for gene discovery in the areas of flavor, health, color and ripening

    PubMed Central

    Crowhurst, Ross N; Gleave, Andrew P; MacRae, Elspeth A; Ampomah-Dwamena, Charles; Atkinson, Ross G; Beuning, Lesley L; Bulley, Sean M; Chagne, David; Marsh, Ken B; Matich, Adam J; Montefiori, Mirco; Newcomb, Richard D; Schaffer, Robert J; Usadel, Björn; Allan, Andrew C; Boldingh, Helen L; Bowen, Judith H; Davy, Marcus W; Eckloff, Rheinhart; Ferguson, A Ross; Fraser, Lena G; Gera, Emma; Hellens, Roger P; Janssen, Bart J; Klages, Karin; Lo, Kim R; MacDiarmid, Robin M; Nain, Bhawana; McNeilage, Mark A; Rassam, Maysoon; Richardson, Annette C; Rikkerink, Erik HA; Ross, Gavin S; Schröder, Roswitha; Snowden, Kimberley C; Souleyre, Edwige JF; Templeton, Matt D; Walton, Eric F; Wang, Daisy; Wang, Mindy Y; Wang, Yanming Y; Wood, Marion; Wu, Rongmei; Yauk, Yar-Khing; Laing, William A

    2008-01-01

    Background Kiwifruit (Actinidia spp.) are a relatively new, but economically important crop grown in many different parts of the world. Commercial success is driven by the development of new cultivars with novel consumer traits including flavor, appearance, healthful components and convenience. To increase our understanding of the genetic diversity and gene-based control of these key traits in Actinidia, we have produced a collection of 132,577 expressed sequence tags (ESTs). Results The ESTs were derived mainly from four Actinidia species (A. chinensis, A. deliciosa, A. arguta and A. eriantha) and fell into 41,858 non redundant clusters (18,070 tentative consensus sequences and 23,788 EST singletons). Analysis of flavor and fragrance-related gene families (acyltransferases and carboxylesterases) and pathways (terpenoid biosynthesis) is presented in comparison with a chemical analysis of the compounds present in Actinidia including esters, acids, alcohols and terpenes. ESTs are identified for most genes in color pathways controlling chlorophyll degradation and carotenoid biosynthesis. In the health area, data are presented on the ESTs involved in ascorbic acid and quinic acid biosynthesis showing not only that genes for many of the steps in these pathways are represented in the database, but that genes encoding some critical steps are absent. In the convenience area, genes related to different stages of fruit softening are identified. Conclusion This large EST resource will allow researchers to undertake the tremendous challenge of understanding the molecular basis of genetic diversity in the Actinidia genus as well as provide an EST resource for comparative fruit genomics. The various bioinformatics analyses we have undertaken demonstrate the extent of coverage of ESTs for genes encoding different biochemical pathways in Actinidia. PMID:18655731

  8. Porcine transcriptome analysis based on 97 non-normalized cDNA libraries and assembly of 1,021,891 expressed sequence tags

    PubMed Central

    Gorodkin, Jan; Cirera, Susanna; Hedegaard, Jakob; Gilchrist, Michael J; Panitz, Frank; Jørgensen, Claus; Scheibye-Knudsen, Karsten; Arvin, Troels; Lumholdt, Steen; Sawera, Milena; Green, Trine; Nielsen, Bente J; Havgaard, Jakob H; Rosenkilde, Carina; Wang, Jun; Li, Heng; Li, Ruiqiang; Liu, Bin; Hu, Songnian; Dong, Wei; Li, Wei; Yu, Jun; Wang, Jian; Stærfeldt, Hans-Henrik; Wernersson, Rasmus; Madsen, Lone B; Thomsen, Bo; Hornshøj, Henrik; Bujie, Zhan; Wang, Xuegang; Wang, Xuefei; Bolund, Lars; Brunak, Søren; Yang, Huanming; Bendixen, Christian; Fredholm, Merete

    2007-01-01

    Background Knowledge of the structure of gene expression is essential for mammalian transcriptomics research. We analyzed a collection of more than one million porcine expressed sequence tags (ESTs), of which two-thirds were generated in the Sino-Danish Pig Genome Project and one-third are from public databases. The Sino-Danish ESTs were generated from one normalized and 97 non-normalized cDNA libraries representing 35 different tissues and three developmental stages. Results Using the Distiller package, the ESTs were assembled to roughly 48,000 contigs and 73,000 singletons, of which approximately 25% have a high confidence match to UniProt. Approximately 6,000 new porcine gene clusters were identified. Expression analysis based on the non-normalized libraries resulted in the following findings. The distribution of cluster sizes is scaling invariant. Brain and testes are among the tissues with the greatest number of different expressed genes, whereas tissues with more specialized function, such as developing liver, have fewer expressed genes. There are at least 65 high confidence housekeeping gene candidates and 876 cDNA library-specific gene candidates. We identified differential expression of genes between different tissues, in particular brain/spinal cord, and found patterns of correlation between genes that share expression in pairs of libraries. Finally, there was remarkable agreement in expression between specialized tissues according to Gene Ontology categories. Conclusion This EST collection, the largest to date in pig, represents an essential resource for annotation, comparative genomics, assembly of the pig genome sequence, and further porcine transcription studies. PMID:17407547

  9. Construction of a cDNA library and preliminary analysis of expressed sequence tags in Piper hainanense.

    PubMed

    Fan, R; Ling, P; Hao, C Y; Li, F P; Huang, L F; Wu, B D; Wu, H S

    2015-01-01

    Black pepper is a perennial climbing vine. It is widely cultivated because its berries can be utilized not only as a spice in food but also for medicinal use. This study aimed to construct a standardized, high-quality cDNA library to facilitate identification of new Piper hainanense transcripts. For this, 262 unigenes were used to generate raw reads. The average length of these 262 unigenes was 774.8 bp. Of these, 94 genes (35.9%) were newly identified, according to the NCBI protein database. Thus, identification of new genes may broaden the molecular knowledge of P. hainanense on the basis of Clusters of Orthologous Groups and Gene Ontology categories. In addition, certain basic genes linked to physiological processes were found, which can contribute to disease resistance and thereby to the breeding of black pepper. A total of 26 unigenes were found to be SSR markers. Dinucleotide SSR was the main repeat motif, accounting for 61.54%, followed by trinucleotide SSR (23.07%). Eight primer pairs successfully amplified DNA fragments and detected significant amounts of polymorphism among twenty-one Piper germplasms. These results present novel sequence information for P. hainanense, which can serve as the foundation for further genetic research on this species. PMID:26505424
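
    Searching unigenes for dinucleotide and trinucleotide SSRs, as reported in this abstract, can be sketched with regular expressions. The abstract does not state which tool or repeat thresholds were used, so the minimum repeat counts, the function name and the test sequence below are assumptions for illustration only.

```python
import re

# Assumed thresholds: motif length -> minimum number of tandem repeats.
MIN_REPEATS = {2: 6, 3: 5}


def find_ssrs(sequence: str) -> list[tuple[int, str, int]]:
    """Return (start, motif, repeat_count) for each perfect SSR found."""
    hits = []
    seq = sequence.upper()
    for unit_len, min_rep in MIN_REPEATS.items():
        # A motif of unit_len bases followed by at least min_rep - 1 copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if len(set(motif)) == 1:
                continue  # skip homopolymer runs such as AAAAAA
            hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return hits


# Hypothetical sequence with one (AG)7 and one (CAT)5 repeat.
example = "GGC" + "AG" * 7 + "TTC" + "CAT" * 5 + "G"
ssrs = find_ssrs(example)
```

    Counting the hits per motif length across all unigenes would give the kind of dinucleotide and trinucleotide proportions quoted in the abstract.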

  10. Generation and analysis of expressed sequence tags (ESTs) of Camelina sativa to mine drought stress-responsive genes.

    PubMed

    Kanth, Bashistha Kumar; Kumari, Shipra; Choi, Seo Hee; Ha, Hye-Jeong; Lee, Geung-Joo

    2015-11-01

    Camelina sativa is an oil-producing crop belonging to the family Brassicaceae. Due to its exceptionally high content of omega fatty acids, it is commercially grown around the world for edible oil, biofuel, and animal feed. Commonly referred to as 'false flax' or gold-of-pleasure, Camelina sativa has attracted interest as a biofuel feedstock. The species can grow on marginal land due to its superior drought tolerance and low requirement for agricultural inputs. This crop has remained unexploited due to very limited transcriptomic and genomic data. Use of gene-specific molecular markers is an important strategy for new cultivar development in breeding programs. In this study, Illumina paired-end sequencing technology and bioinformatics tools were used to obtain expression profiles of genes responding to drought stress in Camelina sativa BN14. A total of more than 60,000 loci were assembled, corresponding to approximately 275 K transcripts. When the species was exposed to 10 kPa drought stress, 100 kPa drought stress, and rehydrated conditions, a total of 107, 2,989, and 982 genes, respectively, were up-regulated, while 146, 3,659, and 1,189 genes, respectively, were down-regulated compared to the control condition. Some unknown genes were found to be highly expressed under drought conditions, together with some already reported gene families such as senescence-associated genes, CAP160, and LEA under 100 kPa soil water condition, cysteine protease, 2OG, Fe(II)-dependent oxygenase, and RAD-like 1 under rehydrated condition. These genes will be further validated and mapped to determine their function and loci. This EST library can be applied to develop gene-specific molecular markers and discover genes responsible for drought tolerance in Camelina species. PMID:26410535

  11. Applying thiouracil (TU)-tagging for mouse transcriptome analysis

    PubMed Central

    Gay, Leslie; Karfilis, Kate V.; Miller, Michael R.; Doe, Chris Q.; Stankunas, Kryn

    2014-01-01

    Transcriptional profiling is a powerful approach to study mouse development, physiology, and disease models. Here, we describe a protocol for mouse thiouracil-tagging (TU-tagging), a transcriptome analysis technology that includes in vivo covalent labeling, purification, and analysis of cell type-specific RNA. TU-tagging enables 1) the isolation of RNA from a given cell population of a complex tissue, avoiding transcriptional changes induced by cell isolation trauma, and 2) the identification of actively transcribed RNAs and not pre-existing transcripts. Therefore, in contrast to other cell-specific transcriptional profiling methods based on purification of tagged ribosomes or nuclei, TU-tagging provides a direct examination of transcriptional regulation. We describe how to: 1) deliver 4-thiouracil to transgenic mice to thio-label cell lineage-specific transcripts, 2) purify TU-tagged RNA and prepare libraries for Illumina sequencing, and 3) follow a straight-forward bioinformatics workflow to identify cell type-enriched or differentially expressed genes. Tissue containing TU-tagged RNA can be obtained in one day, RNA-Seq libraries generated within two days, and, following sequencing, an initial bioinformatics analysis completed in one additional day. PMID:24457332

  12. Needles in the EST Haystack: Large-Scale Identification and Analysis of Excretory-Secretory (ES) Proteins in Parasitic Nematodes Using Expressed Sequence Tags (ESTs)

    PubMed Central

    Nagaraj, Shivashankar H.; Gasser, Robin B.; Ranganathan, Shoba

    2008-01-01

    Background Parasitic nematodes of humans, other animals and plants continue to impose a significant public health and economic burden worldwide, due to the diseases they cause. Promising antiparasitic drug and vaccine candidates have been discovered from excreted or secreted (ES) proteins released from the parasite and exposed to the immune system of the host. Mining the entire expressed sequence tag (EST) data available from parasitic nematodes represents an approach to discover such ES targets. Methods and Findings In this study, we predicted, using EST2Secretome, a novel, high-throughput, computational workflow system, 4,710 ES proteins from 452,134 ESTs derived from 39 different species of nematodes, parasitic in animals (including humans) or plants. In total, 2,632, 786, and 1,292 ES proteins were predicted for animal-, human-, and plant-parasitic nematodes. Subsequently, we systematically analysed ES proteins using computational methods. Of these 4,710 proteins, 2,490 (52.8%) had orthologues in Caenorhabditis elegans, whereas 621 (13.8%) appeared to be novel, currently having no significant match to any molecule available in public databases. Of the C. elegans homologues, 267 had strong “loss-of-function” phenotypes by RNA interference (RNAi) in this nematode. We could functionally classify 1,948 (41.3%) sequences using the Gene Ontology (GO) terms, establish pathway associations for 573 (12.2%) sequences using Kyoto Encyclopaedia of Genes and Genomes (KEGG), and identify protein interaction partners for 1,774 (37.6%) molecules. We also mapped 758 (16.1%) proteins to protein domains including the nematode-specific protein family “transthyretin-like” and “chromadorea ALT,” considered as vaccine candidates against filariasis in humans. Conclusions We report the large-scale analysis of ES proteins inferred from EST data for a range of parasitic nematodes. This set of ES proteins provides an inventory of known and novel members of ES proteins as a

  13. Identification of potential vaccine and drug target candidates by expressed sequence tag analysis and immunoscreening of Onchocerca volvulus larval cDNA libraries.

    PubMed

    Lizotte-Waniewski, M; Tawe, W; Guiliano, D B; Lu, W; Liu, J; Williams, S A; Lustigman, S

    2000-06-01

    The search for appropriate vaccine candidates and drug targets against onchocerciasis has so far been confronted with several limitations due to the unavailability of biological material, appropriate molecular resources, and knowledge of the parasite biology. To identify targets for vaccine or chemotherapy development we have undertaken two approaches. First, cDNA expression libraries were constructed from life cycle stages that are critical for establishment of Onchocerca volvulus infection, the third-stage larvae (L3) and the molting L3. A gene discovery effort was then initiated by random expressed sequence tag analysis of 5,506 cDNA clones. Cluster analyses showed that many of the transcripts were up-regulated and/or stage specific in either one or both of the cDNA libraries when compared to the microfilariae, L2, and both adult stages of the parasite. Homology searches against the GenBank database facilitated the identification of several genes of interest, such as proteinases, proteinase inhibitors, antioxidant or detoxification enzymes, and neurotransmitter receptors, as well as structural and housekeeping genes. Other O. volvulus genes showed homology only to predicted genes from the free-living nematode Caenorhabditis elegans or were entirely novel. Some of the novel proteins contain potential secretory leaders. Secondly, by immunoscreening the molting L3 cDNA library with a pool of human sera from putatively immune individuals, we identified six novel immunogenic proteins that otherwise would not have been identified as potential vaccinogens using the gene discovery effort. This study lays a solid foundation for a better understanding of the biology of O. volvulus as well as for the identification of novel targets for filaricidal agents and/or vaccines against onchocerciasis based on immunological and rational hypothesis-driven research. PMID:10816503

  14. A score system for quality evaluation of RNA sequence tags: an improvement for gene expression profiling

    PubMed Central

    Pinheiro, Daniel G; Galante, Pedro AF; de Souza, Sandro J; Zago, Marco A; Silva, Wilson A

    2009-01-01

    Background High-throughput molecular approaches for gene expression profiling, such as Serial Analysis of Gene Expression (SAGE), Massively Parallel Signature Sequencing (MPSS) or Sequencing-by-Synthesis (SBS) represent powerful techniques that provide global transcription profiles of different cell types through sequencing of short fragments of transcripts, denominated sequence tags. These techniques have improved our understanding about the relationships between these expression profiles and cellular phenotypes. Despite this, more reliable datasets are still necessary. In this work, we present a web-based tool named S3T: Score System for Sequence Tags, to index sequenced tags in accordance with their reliability. This is made through a series of evaluations based on a defined rule set. S3T allows the identification/selection of tags, considered more reliable for further gene expression analysis. Results This methodology was applied to a public SAGE dataset. In order to compare data before and after filtering, a hierarchical clustering analysis was performed in samples from the same type of tissue, in distinct biological conditions, using these two datasets. Our results provide evidence suggesting that it is possible to find more congruous clusters after using the S3T scoring system. Conclusion These results substantiate the proposed application to generate more reliable data. This is a significant contribution for determination of global gene expression profiles. The library analysis with S3T is freely available at . S3T source code and datasets can also be downloaded from the aforementioned website. PMID:19500384

  15. Global Transcriptome Analysis of the Tentacle of the Jellyfish Cyanea capillata Using Deep Sequencing and Expressed Sequence Tags: Insight into the Toxin- and Degenerative Disease-Related Transcripts

    PubMed Central

    Liu, Dan; Wang, Qianqian; Ruan, Zengliang; He, Qian; Zhang, Liming

    2015-01-01

    Background Jellyfish contain diverse toxins and other bioactive components. However, large-scale identification of novel toxins and bioactive components from jellyfish has been hampered by the low efficiency of traditional isolation and purification methods. Results We performed de novo transcriptome sequencing of the tentacle tissue of the jellyfish Cyanea capillata. A total of 51,304,108 reads were obtained and assembled into 50,536 unigenes. Of these, 21,357 unigenes had homologues in public databases, but the remaining unigenes had no significant matches due to the limited sequence information available and species-specific novel sequences. Functional annotation of the unigenes also revealed general gene expression profile characteristics in the tentacle of C. capillata. A primary goal of this study was to identify putative toxin transcripts. As expected, we screened many transcripts encoding proteins similar to several well-known toxin families including phospholipases, metalloproteases, serine proteases and serine protease inhibitors. In addition, some transcripts also resembled molecules with potential toxic activities, including cnidarian CfTX-like toxins with hemolytic activity, plancitoxin-1, venom toxin-like peptide-6, histamine-releasing factor, neprilysin, dipeptidyl peptidase 4, vascular endothelial growth factor A, angiotensin-converting enzyme-like and endothelin-converting enzyme 1-like proteins. Most of these molecules have not been previously reported in jellyfish. Interestingly, we also characterized a number of transcripts with similarities to proteins relevant to several degenerative diseases, including Huntington’s, Alzheimer’s and Parkinson’s diseases. This is the first description of degenerative disease-associated genes in jellyfish. Conclusion We obtained a well-categorized and annotated transcriptome of C. capillata tentacle that will be an important and valuable resource for further understanding of jellyfish at the molecular

  16. Tag jumps illuminated--reducing sequence-to-sample misidentifications in metabarcoding studies.

    PubMed

    Schnell, Ida Baerholm; Bohmann, Kristine; Gilbert, M Thomas P

    2015-11-01

    Metabarcoding of environmental samples on second-generation sequencing platforms has rapidly become a valuable tool for ecological studies. A fundamental assumption of this approach is the reliance on being able to track tagged amplicons back to the samples from which they originated. In this study, we address the problem of sequences in metabarcoding sequencing outputs with false combinations of used tags (tag jumps). Unless these sequences can be identified and excluded from downstream analyses, tag jumps creating sequences with false, but already used tag combinations, can cause incorrect assignment of sequences to samples and artificially inflate diversity. In this study, we document and investigate tag jumping in metabarcoding studies on Illumina sequencing platforms by amplifying mixed-template extracts obtained from bat droppings and leech gut contents with tagged generic arthropod and mammal primers, respectively. We found that an average of 2.6% and 2.1% of sequences had tag combinations, which could be explained by tag jumping in the leech and bat diet study, respectively. We suggest that tag jumping can happen during blunt-ending of pools of tagged amplicons during library build and as a consequence of chimera formation during bulk amplification of tagged amplicons during library index PCR. We argue that tag jumping and contamination between libraries represents a considerable challenge for Illumina-based metabarcoding studies, and suggest measures to avoid false assignment of tag jumping-derived sequences to samples. PMID:25740652
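
    A minimal sketch of the bookkeeping this abstract implies, assuming each read has already been demultiplexed into a (forward tag, reverse tag) pair: reads whose pair was never assigned to a sample are set aside, and their share of the total gives a rough lower bound on the tag-jump rate. The function names, tag labels and toy reads are hypothetical and are not taken from the study. Jumps that land on an already used combination cannot be detected this way.

```python
def split_by_tag_combination(reads, used_combinations):
    """Partition reads by whether their (forward, reverse) tag pair was assigned to a sample."""
    used = set(used_combinations)
    kept, unexpected = [], []
    for read_id, fwd_tag, rev_tag in reads:
        target = kept if (fwd_tag, rev_tag) in used else unexpected
        target.append((read_id, fwd_tag, rev_tag))
    return kept, unexpected


def unexpected_fraction(reads, used_combinations):
    """Fraction of reads carrying a tag pair that was never used in the experiment."""
    kept, unexpected = split_by_tag_combination(reads, used_combinations)
    total = len(kept) + len(unexpected)
    return len(unexpected) / total if total else 0.0


# Hypothetical toy data: two samples, four reads, one read with an unused pair.
used_pairs = [("F1", "R1"), ("F2", "R2")]
reads = [
    ("r1", "F1", "R1"),
    ("r2", "F2", "R2"),
    ("r3", "F1", "R2"),  # combination never assigned to a sample
    ("r4", "F2", "R2"),
]
kept, unexpected = split_by_tag_combination(reads, used_pairs)
print(len(kept), len(unexpected), unexpected_fraction(reads, used_pairs))
```

    Leaving some tag combinations deliberately unused makes this kind of check more informative, because every read that shows up with such a pair is then known to be an artefact.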

  17. Development of expressed sequence tag-based microsatellite markers for the critically endangered Isoëtes sinensis (Isoetaceae) based on transcriptome analysis.

    PubMed

    Gichira, A W; Long, Z C; Wang, Q F; Chen, J M; Liao, K

    2016-01-01

    Isoëtes sinensis is a critically endangered quillwort. To facilitate studies on the conservation genetics of this species, we developed expressed sequence tag-simple sequence repeat (EST-SSR) markers. A total of 50,063 unigenes were predicted by transcriptome sequencing, 5294 (10.6%) of which significantly matched 3011 Gene Ontology annotations and 2363 were assigned to Kyoto Encyclopedia of Genes and Genomes metabolic pathways. Most of these (2297) were involved in metabolism. A total of 1982 SSR motifs were identified, with trinucleotides being the dominant repeat motif, and 1438 (72.6%) SSR primers were designed. Eighteen randomly selected primer pairs were used to genotype 24 I. sinensis accessions, which confirmed the suitability of these novel markers for molecular studies of I. sinensis. The heterozygosity index value ranged between 0.0799 and 0.9106, while the Shannon-Wiener diversity index value ranged between 0.1732 and 2.5589. The EST-SSRs reported in this study are linked to genic sequences, and are therefore ideal for investigating the evolutionary history of I. sinensis. These markers, together with the large EST dataset generated in this study, will greatly facilitate conservation genetic studies of I. sinensis. PMID:27525847
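
    The two per-locus statistics quoted above can be illustrated with a short sketch. It assumes that the "heterozygosity index" is the expected heterozygosity, 1 - sum(p_i^2), and that the Shannon-Wiener index uses natural logarithms, -sum(p_i ln p_i); the abstract does not state either choice. The genotypes are hypothetical allele sizes for a diploid locus, not data from the study.

```python
import math


def allele_frequencies(genotypes):
    """Allele frequencies at one locus from diploid genotypes given as (allele, allele) pairs."""
    counts = {}
    for pair in genotypes:
        for allele in pair:
            counts[allele] = counts.get(allele, 0) + 1
    total = sum(counts.values())
    return [c / total for c in counts.values()]


def expected_heterozygosity(freqs):
    """Expected heterozygosity: 1 minus the sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def shannon_index(freqs):
    """Shannon-Wiener diversity with natural logarithms (the log base is an assumption)."""
    return -sum(p * math.log(p) for p in freqs if p > 0)


# Hypothetical genotypes at one EST-SSR locus (allele sizes in bp).
genotypes = [(150, 150), (150, 154), (154, 154), (150, 154)]
freqs = allele_frequencies(genotypes)
print(expected_heterozygosity(freqs), shannon_index(freqs))
```

    With two equally frequent alleles, the sketch gives an expected heterozygosity of 0.5 and a Shannon-Wiener value of ln 2.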

  18. Generation and analysis of a 29,745 unique Expressed Sequence Tags from the Pacific oyster (Crassostrea gigas) assembled into a publicly accessible database: the GigasDatabase

    PubMed Central

    2009-01-01

    Background Although bivalves are among the most-studied marine organisms because of their ecological role and economic importance, very little information is available on the genome sequences of oyster species. This report documents three large-scale cDNA sequencing projects for the Pacific oyster Crassostrea gigas initiated to provide a large number of expressed sequence tags that were subsequently compiled in a publicly accessible database. This resource allowed for the identification of a large number of transcripts and provides valuable information for ongoing investigations of tissue-specific and stimulus-dependent gene expression patterns. These data are crucial for constructing comprehensive DNA microarrays, identifying single nucleotide polymorphisms and microsatellites in coding regions, and for identifying genes when the entire genome sequence of C. gigas becomes available. Description In the present paper, we report the production of 40,845 high-quality ESTs that identify 29,745 unique transcribed sequences consisting of 7,940 contigs and 21,805 singletons. All of these new sequences, together with existing public sequence data, have been compiled into a publicly available website, http://public-contigbrowser.sigenae.org:9090/Crassostrea_gigas/index.html. Approximately 43% of the unique ESTs had significant matches against the SwissProt database and 27% were annotated using Gene Ontology terms. In addition, we identified a total of 208 in silico microsatellites from the ESTs, with 173 having sufficient flanking sequence for primer design. We also identified a total of 7,530 putative in silico single-nucleotide polymorphisms using existing and newly generated EST resources for the Pacific oyster. Conclusion A publicly available database has been populated with 29,745 unique sequences for the Pacific oyster Crassostrea gigas. The database provides many tools to search cleaned and assembled ESTs. The user may input and submit several filters, such as

  19. HIV-1 Quasispecies Delineation by Tag Linkage Deep Sequencing

    PubMed Central

    Wu, Nicholas C.; De La Cruz, Justin; Al-Mawsawi, Laith Q.; Olson, C. Anders; Qi, Hangfei; Luan, Harding H.; Nguyen, Nguyen; Du, Yushen; Le, Shuai; Wu, Ting-Ting; Li, Xinmin; Lewis, Martha J.; Yang, Otto O.; Sun, Ren

    2014-01-01

    Trade-offs between throughput, read length, and error rates in high-throughput sequencing limit certain applications such as monitoring viral quasispecies. Here, we describe a molecular-based tag linkage method that allows assemblage of short sequence reads into long DNA fragments. It enables haplotype phasing with high accuracy and sensitivity to interrogate individual viral sequences in a quasispecies. This approach is demonstrated to deduce ∼2000 unique 1.3 kb viral sequences from HIV-1 quasispecies in vivo and after passaging ex vivo with a detection limit of ∼0.005% to ∼0.001%. Reproducibility of the method is validated quantitatively and qualitatively by a technical replicate. This approach can improve monitoring of the genetic architecture and evolution dynamics in any quasispecies population. PMID:24842159

  20. A blackberry (Rubus L.) expressed sequence tag library for the development of simple sequence repeat markers

    Technology Transfer Automated Retrieval System (TEKTRAN)

    A blackberry (Rubus L.) expressed sequence tag (EST) library was produced for developing simple sequence repeat (SSR) markers from the tetraploid blackberry cultivar, Merton Thornless, the source of the thornless trait in commercial cultivars. RNA was extracted from young expanding leaves and used f...

  1. Genomic and cDNA sequence tags of the hyperthermophilic archaeon Pyrobaculum aerophilum.

    PubMed Central

    Völkl, P; Markiewicz, P; Baikalov, C; Fitz-Gibbon, S; Stetter, K O; Miller, J H

    1996-01-01

    The hyperthermophilic archaeon Pyrobaculum aerophilum grows optimally at 100 degrees C with a doubling time of 180 min. It is a member of the phylogenetically ancient Thermoproteales order, but differs significantly from all other members by its facultatively aerobic metabolism. Due to its simple cultivation requirements and its nearly 100% plating efficiency, it was chosen as a model organism for studying the genome organization of hyperthermophilic ancient archaea. With a DNA G+C content of 52 mol%, sequence analysis was straightforward. At least some of the mRNA of P. aerophilum carried poly-A tails, facilitating the construction of a cDNA library. A total of 245 sequence tags from a poly-A primed cDNA library and 55 sequence tags from a genomic library containing 1-2 kb Sau3AI fragments were analyzed, and the corresponding amino acid sequences were compared with protein sequences from databases. Fourteen percent of the cDNA and >9% of genomic DNA sequence tags revealed significant similarities to proteins in the databases. Matches were obtained to proteins from archaeal, bacterial and eukaryal sources. Some sequences showed greatest similarity to eukaryal rather than to bacterial versions of proteins; other matches were found to proteins which had previously only been found in eukaryotes. PMID:8948626

  2. Improved Sequence Tag Generation Method for Peptide Identification in Tandem Mass Spectrometry

    PubMed Central

    Cao, Xia; Nesvizhskii, Alexey I.

    2013-01-01

    The sequence tag-based peptide identification methods are a promising alternative to the traditional database search approach. However, a more comprehensive analysis, optimization, and comparison with established methods are necessary before these methods can gain widespread use in the proteomics community. Using the InsPecT open source code base (Tanner et al., Anal Chem. 2005, 77:4626–39), we present an improved sequence tag generation method that directly incorporates multi-charged fragment ion peaks present in many tandem mass spectra of higher charge states. We also investigate the performance of sequence tagging under different settings using control datasets generated on five different types of mass spectrometers, as well as using a complex phosphopeptide-enriched sample. We also demonstrate that additional modeling of InsPecT search scores using a semi-parametric approach incorporating the accuracy of the precursor ion mass measurement provides additional improvement in the ability to discriminate between correct and incorrect peptide identifications. The overall superior performance of the sequence tag-based peptide identification method is demonstrated by comparison with a commonly used SEQUEST/PeptideProphet approach. PMID:18785767

  3. Construction of a Full-Length Enriched cDNA Library and Preliminary Analysis of Expressed Sequence Tags from Bengal Tiger Panthera tigris tigris

    PubMed Central

    Liu, Changqing; Liu, Dan; Guo, Yu; Lu, Taofeng; Li, Xiangchen; Zhang, Minghai; Ma, Jianzhang; Ma, Yuehui; Guan, Weijun

    2013-01-01

    In this study, a full-length enriched cDNA library was successfully constructed from Bengal tiger, Panthera tigris tigris, the most well-known wild animal. Total RNA was extracted from cultured Bengal tiger fibroblasts in vitro. The titers of primary and amplified libraries were 1.28 × 10^6 pfu/mL and 1.56 × 10^9 pfu/mL, respectively. The percentage of recombinants from the unamplified library was 90.2% and the average length of exogenous inserts was 0.98 kb. A total of 212 individual ESTs with sizes ranging from 356 to 1108 bp were then analyzed. The BLASTX score revealed that 48.1% of the sequences were classified as a strong match, 45.3% as nominal and 6.6% as a weak match. Among the ESTs with known putative function, 26.4% were found to be related to metabolism of all kinds, 19.3% to information storage and processing, 11.3% to posttranslational modification, protein turnover and chaperones, 11.3% to transport, 9.9% to signal transduction/cell communication, 9.0% to structural proteins, and 3.8% to the cell cycle, and only 6.6% were classified as novel genes. By EST sequencing, a full-length gene encoding ferritin was identified and characterized. The recombinant plasmid pET32a-TAT-Ferritin was constructed, which encoded the TAT-Ferritin fusion protein with two 6× His-tags at the N- and C-termini. After BCA assay, the concentration of soluble Trx-TAT-Ferritin recombinant protein was 2.32 ± 0.12 mg/mL. These results demonstrated that the reliability and representativeness of the cDNA library met the requirements of a standard cDNA library. This library provided a useful platform for the functional genome and transcriptome research of Bengal tigers. PMID:23708105

  4. Construction of a full-length enriched cDNA library and preliminary analysis of expressed sequence tags from Bengal Tiger Panthera tigris tigris.

    PubMed

    Liu, Changqing; Liu, Dan; Guo, Yu; Lu, Taofeng; Li, Xiangchen; Zhang, Minghai; Ma, Jianzhang; Ma, Yuehui; Guan, Weijun

    2013-01-01

    In this study, a full-length enriched cDNA library was successfully constructed from Bengal tiger, Panthera tigris tigris, the most well-known wild animal. Total RNA was extracted from cultured Bengal tiger fibroblasts in vitro. The titers of primary and amplified libraries were 1.28 × 10^6 pfu/mL and 1.56 × 10^9 pfu/mL, respectively. The percentage of recombinants from the unamplified library was 90.2% and the average length of exogenous inserts was 0.98 kb. A total of 212 individual ESTs with sizes ranging from 356 to 1108 bp were then analyzed. The BLASTX score revealed that 48.1% of the sequences were classified as a strong match, 45.3% as nominal and 6.6% as a weak match. Among the ESTs with known putative function, 26.4% were found to be related to metabolism of all kinds, 19.3% to information storage and processing, 11.3% to posttranslational modification, protein turnover and chaperones, 11.3% to transport, 9.9% to signal transduction/cell communication, 9.0% to structural proteins, and 3.8% to the cell cycle, and only 6.6% were classified as novel genes. By EST sequencing, a full-length gene encoding ferritin was identified and characterized. The recombinant plasmid pET32a-TAT-Ferritin was constructed, which encoded the TAT-Ferritin fusion protein with two 6× His-tags at the N- and C-termini. After BCA assay, the concentration of soluble Trx-TAT-Ferritin recombinant protein was 2.32 ± 0.12 mg/mL. These results demonstrated that the reliability and representativeness of the cDNA library met the requirements of a standard cDNA library. This library provided a useful platform for the functional genome and transcriptome research of Bengal tigers. PMID:23708105

  5. CREST--classification resources for environmental sequence tags.

    PubMed

    Lanzén, Anders; Jørgensen, Steffen L; Huson, Daniel H; Gorfer, Markus; Grindhaug, Svenn Helge; Jonassen, Inge; Øvreås, Lise; Urich, Tim

    2012-01-01

    Sequencing of taxonomic or phylogenetic markers is becoming a fast and efficient method for studying environmental microbial communities. This has resulted in a steadily growing collection of marker sequences, most notably of the small-subunit (SSU) ribosomal RNA gene, and an increased understanding of microbial phylogeny, diversity and community composition patterns. However, to utilize these large datasets together with new sequencing technologies, a reliable and flexible system for taxonomic classification is critical. We developed CREST (Classification Resources for Environmental Sequence Tags), a set of resources and tools for generating and utilizing custom taxonomies and reference datasets for classification of environmental sequences. CREST uses an alignment-based classification method with the lowest common ancestor algorithm. It also uses explicit rank similarity criteria to reduce false positives and identify novel taxa. We implemented this method in a web server, a command line tool and the graphical user interfaced program MEGAN. Further, we provide the SSU rRNA reference database and taxonomy SilvaMod, derived from the publicly available SILVA SSURef, for classification of sequences from bacteria, archaea and eukaryotes. Using cross-validation and environmental datasets, we compared the performance of CREST and SilvaMod to the RDP Classifier. We also utilized Greengenes as a reference database, both with CREST and the RDP Classifier. These analyses indicate that CREST performs better than alignment-free methods with higher recall rate (sensitivity) as well as precision, and with the ability to accurately identify most sequences from novel taxa. Classification using SilvaMod performed better than with Greengenes, particularly when applied to environmental sequences. CREST is freely available under a GNU General Public License (v3) from http://apps.cbu.uib.no/crest and http://lcaclassifier.googlecode.com. PMID:23145153
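
    The lowest common ancestor step named in this abstract can be sketched as the longest shared prefix of the hits' lineages, optionally trimmed by per-rank similarity cutoffs. This is an illustration of the general idea only, not CREST's implementation: the function names, the lineages and the cutoff values are all hypothetical.

```python
def lowest_common_ancestor(lineages):
    """Return the longest root-to-leaf prefix shared by every lineage."""
    if not lineages:
        return []
    shared = []
    for ranks in zip(*lineages):  # walk the ranks level by level
        if all(r == ranks[0] for r in ranks):
            shared.append(ranks[0])
        else:
            break
    return shared


def cap_by_identity(lineage, identity, min_identity_by_depth):
    """Trim a lineage to the deepest rank whose (hypothetical) identity cutoff is met."""
    depth = 0
    for cutoff in min_identity_by_depth[:len(lineage)]:
        if identity >= cutoff:
            depth += 1
        else:
            break
    return lineage[:depth]


# Hypothetical lineages of the best alignment hits for one environmental read.
hit_lineages = [
    ["Bacteria", "Proteobacteria", "Gammaproteobacteria", "Enterobacterales"],
    ["Bacteria", "Proteobacteria", "Gammaproteobacteria", "Pseudomonadales"],
    ["Bacteria", "Proteobacteria", "Alphaproteobacteria"],
]
# Hypothetical minimum identity required to assign at each successive depth.
cutoffs = [0.0, 0.80, 0.90, 0.97]

assignment = lowest_common_ancestor(hit_lineages)
print(assignment)
print(cap_by_identity(hit_lineages[0], 0.85, cutoffs))
```

    Capping the assignment depth by similarity is one way to avoid forcing a read from a novel taxon into a known low-level group; the actual criteria used by CREST are described in the paper.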

  6. Expressed sequence tags from the plant trypanosomatid Phytomonas serpens.

    PubMed

    Pappas, Georgios J; Benabdellah, Karim; Zingales, Bianca; González, Antonio

    2005-08-01

    We have generated 2190 expressed sequence tags (ESTs) from a cDNA library of the plant trypanosomatid Phytomonas serpens. Upon processing and clustering, the set of 1893 accepted sequences was reduced to 697 clusters consisting of 452 singletons and 245 contigs. Functional categories were assigned based on BLAST searches against a database of the eukaryotic orthologous groups of proteins (KOG). Thirty-six percent of the generated sequences showed no hits against the KOG database and 39.6% presented similarity to the KOG classes corresponding to translation, ribosomal structure and biogenesis. The most populated cluster contained 45 ESTs homologous to members of the glucose transporter family. This fact can be immediately correlated to the reported Phytomonas dependence on anaerobic glycolytic ATP production due to the lack of a cytochrome-mediated respiratory chain. In this context, not only were a number of enzymes of the glycolytic pathway identified, but also enzymes of the Krebs cycle as well as specific components of the respiratory chain. The data reported here, including a few hundred unique sequences and the description of tandemly repeated motifs and putative transcript stability motifs at untranslated mRNA ends, represent an initial approach to overcome the lack of information on the molecular biology of this organism. PMID:15869816

  7. Initiation of a Sarcocystis neurona expressed sequence tag (EST) sequencing project: a preliminary report.

    PubMed

    Howe, D K

    2001-02-26

    To accelerate genetic and molecular characterization of Sarcocystis neurona, the primary causative agent of equine protozoal myeloencephalitis (EPM), a sequencing project has been initiated that will generate approximately 7000-8000 expressed sequence tags (ESTs) from this apicomplexan parasite. Poly(A)(+) RNA was isolated from culture-derived S. neurona merozoites, and a cDNA library was constructed in a unidirectional lambda phage cloning vector. Sixty phage clones were randomly picked from the library, and the cDNA inserts were amplified from these clones using the T3 and T7 primers that flank the multi-cloning site of the lambda vector. This analysis demonstrated that 100% (60/60) of the clones selected from this library contained recombinant cDNA inserts ranging in size from 0.4 to 4.0 kilobases (kb) with an average size of 1.23kb. Single-pass sequencing from the 5' end of the 60 amplified cDNAs produced high-quality nucleotide sequence from 53 of the clones. Comparison of these ESTs to the current gene databases revealed significant matches for 10 of the ESTs, six of which are similar to sequences from other Apicomplexa (i.e., Toxoplasma gondii). Importantly, none of the ESTs were of obvious mammalian origin, thus indicating that the cDNAs in this library were derived primarily from parasite mRNA and not from mRNA of the bovine turbinate host cells. Collectively, these data indicate that the described cDNA library will provide an excellent substrate for generating a portion of the ESTs that are planned from S. neurona. This sequencing project will greatly hasten gene discovery for this protozoan pathogen thereby enhancing efforts towards the development of improved diagnostics, treatments, and preventatives for EPM. In addition, the S. neurona ESTs will represent a significant contribution to the extensive database of sequences from the Apicomplexa. Comparative analyses of these apicomplexan sequences will likely offer a multitude of important information

  8. Peptides derivatized with bicyclic quaternary ammonium ionization tags. Sequencing via tandem mass spectrometry.

    PubMed

    Setner, Bartosz; Rudowska, Magdalena; Klem, Ewelina; Cebrat, Marek; Szewczuk, Zbigniew

    2014-10-01

    Improving the sensitivity of detection and fragmentation of peptides to provide reliable sequencing of peptides is an important goal of mass spectrometric analysis. Peptides derivatized with bicyclic quaternary ammonium ionization tags, 1-azabicyclo[2.2.2]octane (ABCO) or 1,4-diazabicyclo[2.2.2]octane (DABCO), are characterized by an increased detection sensitivity in electrospray ionization mass spectrometry (ESI-MS) and longer retention times on reverse-phase (RP) chromatography columns. The improvement of the detection limit was observed even for peptides dissolved in 10 mM NaCl. Collision-induced dissociation tandem mass spectrometry of quaternary ammonium salt derivatives of peptides showed dominant a- and b-type ions, allowing facile sequencing of peptides. The bicyclic ionization tags are stable in collision-induced dissociation experiments, and the resulting fragmentation pattern is not significantly influenced by either acidic or basic amino acid residues in the peptide sequence. The results obtained indicate the general usefulness of the bicyclic quaternary ammonium ionization tags for ESI-MS/MS sequencing of peptides. PMID:25303389

  9. Analysis of Expressed Sequence Tags from Chinese Bayberry Fruit (Myrica rubra Sieb. and Zucc.) at Different Ripening Stages and Their Association with Fruit Quality Development

    PubMed Central

    Zhu, Changqing; Feng, Chao; Li, Xian; Xu, Changjie; Sun, Chongde; Chen, Kunsong

    2013-01-01

    A total of 2000 EST sequences were produced from cDNA libraries generated from Chinese bayberry fruit (Myrica rubra Sieb. and Zucc. cv. “Biqi”) at four different ripening stages. After cluster and assembly analysis of the datasets by UniProt, 395 unigenes were identified, and their presumed functions were assigned to 14 putative cellular roles. Furthermore, a sequence BLAST was done for the top ten highly expressed genes in the ESTs, and genes associated with disease/defense and anthocyanin accumulation were analyzed. Gene-encoding elements associated with ethylene biosynthesis and signal transductions, in addition to other senescence-regulating proteins, as well as those associated with quality formation during fruit ripening, were also identified. Their possible roles were subsequently discussed. PMID:23377019

  10. Characterization of Expressed Sequence Tag-Derived Simple Sequence Repeat Markers for Aspergillus flavus: Emphasis on Variability of Isolates from the Southern United States

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Simple Sequence Repeat (SSR) markers were developed from Aspergillus flavus expressed sequence tag (EST) database to conduct an analysis of genetic relationships of Aspergillus isolates from numerous host species and geographical regions, but primarily from the United States. Twenty-nine primers wer...

  11. Identification of antimicrobial peptides from teleosts and anurans in expressed sequence tag databases using conserved signal sequences.

    PubMed

    Tessera, Valentina; Guida, Filomena; Juretić, Davor; Tossi, Alessandro

    2012-03-01

    The problem of multidrug resistance requires the efficient and accurate identification of new classes of antimicrobial agents. Endogenous antimicrobial peptides produced by most organisms are a promising source of such molecules. We have exploited the high conservation of signal sequences in teleost and anuran antimicrobial peptides to search cDNA (expressed sequence tag) databases for likely candidates. Subject sequences were then analysed for the presence of potential antimicrobial peptides based on physicochemical properties (amphipathic helical structure, cationicity) and use of the D-descriptor model to predict the therapeutic index (relation between the minimum inhibitory concentration and the concentration giving 50% haemolysis). This analysis also suggested mutations to probe the role of the primary structure in determining potency and selectivity. Selected sequences were chemically synthesized and the antimicrobial activity of the peptides was confirmed. In particular, a short (21-residue) sequence, likely of sticklefish origin, showed potent activity and it was possible to tune the spectrum of action and/or selectivity by combining three directed mutations. Membrane permeabilization studies on both bacterial and host cells indicate that the mode of action was prevalently membranolytic. This method opens up the possibility for more effective searching of the vast and continuously growing expressed sequence tag databases for novel antimicrobial peptides, which are likely abundant, and the efficient identification of the most promising candidates among them. PMID:22188679

  12. Analysis of STIS time-tag data

    NASA Technical Reports Server (NTRS)

    Lindler, Don J.; Gull, Theodore R.; Kraemer, Steven B.; Hulbert, Stephen J.

    1997-01-01

    Very high time resolution data can be obtained from the Space Telescope Imaging Spectrograph (STIS) Multi-Anode Microchannel Array (MAMA) detectors using the time-tag observing mode. In this mode, the photon events are not accumulated onboard the spacecraft. Instead, each event is recorded internally and transmitted to the ground as an X and Y location with an event time. Event times are recorded in units of 125 microseconds. Analysis of STIS Crab Pulsar data demonstrates that a time resolution approaching 125 microseconds can be achieved. Furthermore, the time-tag observing mode has been demonstrated to be a very powerful diagnostic tool and can be used to increase the resolution of both imaging and spectral data.
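
    As an illustration of the event format described above, the sketch below treats each photon event as an (x, y, tick) triple, with the tick counted in units of 125 microseconds, and bins the events into a light curve. The event list, the function names and the 1 ms bin width are hypothetical; this is not the STIS pipeline.

```python
TICK_SECONDS = 125e-6  # one time-tag unit, as stated in the abstract


def ticks_to_seconds(ticks):
    """Convert integer event times in 125-microsecond units to seconds."""
    return [t * TICK_SECONDS for t in ticks]


def bin_events(event_ticks, ticks_per_bin):
    """Count events in consecutive bins, each ticks_per_bin ticks wide, starting at tick 0."""
    if not event_ticks:
        return []
    n_bins = max(event_ticks) // ticks_per_bin + 1
    counts = [0] * n_bins
    for t in event_ticks:
        counts[t // ticks_per_bin] += 1
    return counts


# Hypothetical events: (x pixel, y pixel, time in 125-microsecond ticks).
events = [(10, 20, 0), (11, 20, 3), (10, 21, 8), (12, 22, 9), (10, 20, 17)]
ticks = [t for _, _, t in events]

# 8 ticks per bin corresponds to a bin width of 1 ms.
light_curve = bin_events(ticks, ticks_per_bin=8)
print(light_curve)  # [2, 2, 1]
```

    Because each event keeps its own time stamp, the bin width can be chosen after the observation, down to the single-tick limit.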

  13. Perceptual learning of contrast discrimination under roving: the role of semantic sequence in stimulus tagging.

    PubMed

    Cong, Lin-Juan; Zhang, Jun-Yun

    2014-01-01

    Perceptual learning may occur when multiple contrasts are practiced in a fixed, but not in a roving (random), temporal sequence. However, learning may escape roving disruption when each contrast is assigned a letter tag (i.e., A, B, C, D). Because these letter tags carry not only stimulus identity information, but also semantic sequence information, here we investigated whether the semantic sequence information is necessary for learning of tagged contrasts under the roving condition. We found that assigning number tags (i.e., 1, 2, 3, 4), which also contained both identity and semantic sequence information, to four roving contrasts enabled significant learning of discrimination of each contrast, confirming previous data. However, learning became insignificant when the contrast tags were replaced with Greek letters, which were familiar to our Chinese observers except for their sequence, or with Chinese characters that carried no sequence information. In addition, assigning orientation tags, which carried no sequence information either, to roving contrasts was ineffective as well because learning occurred only with sequenced but not roving contrasts. These results suggest that semantic sequence information is necessary for stimulus tagging to effectively enable perceptual learning of multiple contrast discrimination under roving. PMID:25368338

  14. TagSmart: analysis and visualization for yeast mutant fitness data measured by tag microarrays

    PubMed Central

    Kim, Chulyun; Kim, Sangkyum; Dorer, Russell; Xie, Dan; Han, Jiawei; Zhong, Sheng

    2007-01-01

    Background A nearly complete collection of gene-deletion mutants (96% of annotated open reading frames) of the yeast Saccharomyces cerevisiae has been systematically constructed. Tag microarrays are widely used to measure the fitness of each mutant in a mutant mixture. The tag array experiments can have a complex experimental design, such as time course measurements and drug treatment with multiple dosages. Results TagSmart is a web application for analysis and visualization of Saccharomyces cerevisiae mutant fitness data measured by tag microarrays. It implements a robust statistical approach to assess the concentration differences among S. cerevisiae mutant strains. It also provides an interactive environment for data analysis and visualization. TagSmart has the following advantages over previously described analysis procedures: 1) it is user-friendly software rather than merely a description of an analytical procedure; 2) it can handle complicated experimental designs, such as multiple time points and treatment with multiple dosages; 3) it has higher sensitivity and specificity; 4) it allows users to mask out "bad" tags in the analysis. Two biological tests were performed to illustrate the performance of TagSmart. First, we generated titration mixtures of mutant strains, in which the relative concentration of each strain was controlled. We used tag microarrays to measure the numbers of tag copies in each titration mixture. The data were analyzed with TagSmart and the result showed high precision and recall. Second, TagSmart was applied to a dataset in which heterozygous deletion strain mixture pools were treated with a new drug, Cincreasin. TagSmart identified 53 mutant strains as sensitive to Cincreasin treatment. We individually tested each identified mutant, and found that 52 of the 53 predicted mutants were indeed sensitive to Cincreasin. Conclusion TagSmart is provided "as is" to analyze tag array data produced by Affymetrix and Agilent arrays. TagSmart web

  15. Cardiac motion estimation by joint alignment of tagged MRI sequences.

    PubMed

    Oubel, E; De Craene, M; Hero, A O; Pourmorteza, A; Huguet, M; Avegliano, G; Bijnens, B H; Frangi, A F

    2012-01-01

    Image registration has been proposed as an automatic method for recovering cardiac displacement fields from tagged Magnetic Resonance Imaging (tMRI) sequences. Initially performed as a set of pairwise registrations, these techniques have evolved to the use of 3D+t deformation models, requiring metrics of joint image alignment (JA). However, only linear combinations of cost functions defined with respect to the first frame have been used. In this paper, we have applied k-Nearest Neighbors Graphs (kNNG) estimators of the α-entropy (H(α)) to measure the joint similarity between frames, and to combine the information provided by different cardiac views in a unified metric. Experiments performed on six subjects showed a significantly higher accuracy (p<0.05) with respect to a standard pairwise alignment (PA) approach in terms of mean positional error and variance with respect to manually placed landmarks. The developed method was used to study strains in patients with myocardial infarction, showing a consistency between strain, infarction location, and coronary occlusion. This paper also presents an interesting clinical application of graph-based metric estimators, showing their value for solving practical problems found in medical imaging. PMID:22000567

  16. Studies of a Biochemical Factory: Tomato Trichome Deep Expressed Sequence Tag Sequencing and Proteomics

    PubMed Central

    Schilmiller, Anthony L.; Miner, Dennis P.; Larson, Matthew; McDowell, Eric; Gang, David R.; Wilkerson, Curtis; Last, Robert L.

    2010-01-01

    Shotgun proteomics analysis allows hundreds of proteins to be identified and quantified from a single sample at relatively low cost. Extensive DNA sequence information is a prerequisite for shotgun proteomics, and it is ideal to have sequence for the organism being studied rather than from related species or accessions. While this requirement has limited the set of organisms that are candidates for this approach, next generation sequencing technologies make it feasible to obtain deep DNA sequence coverage from any organism. As part of our studies of specialized (secondary) metabolism in tomato (Solanum lycopersicum) trichomes, 454 sequencing of cDNA was combined with shotgun proteomics analyses to obtain in-depth profiles of genes and proteins expressed in leaf and stem glandular trichomes of 3-week-old plants. The expressed sequence tag and proteomics data sets combined with metabolite analysis led to the discovery and characterization of a sesquiterpene synthase that produces β-caryophyllene and α-humulene from E,E-farnesyl diphosphate in trichomes of leaf but not of stem. This analysis demonstrates the utility of combining high-throughput cDNA sequencing with proteomics experiments in a target tissue. These data can be used for dissection of other biochemical processes in these specialized epidermal cells. PMID:20431087

  17. Analysis of expressed sequence tags from a single wheat cultivar facilitates interpretation of tandem mass spectrometry data and discrimination of gamma gliadin proteins that may play different functional roles in flour

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The complement of gamma gliadin genes expressed in the wheat cultivar Butte 86 was evaluated by analyzing publicly available expressed sequence tag (EST) data. Eleven contigs were assembled from 153 Butte 86 ESTs. Nine of the contigs encoded full-length proteins and four of the proteins contained an...

  18. Peanut (Arachis hypogaea) expressed sequence tag (EST) project: Progress and application.

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Millions of expressed sequence tag (EST) sequences from several hundred plant species have been deposited in public EST databases. Many plant ESTs have been sequenced as an alternative to whole genome sequences, including peanut because of the genome size and complexity. The US peanut research commu...

  19. Grouping and identification of sequence tags (GRIST): bioinformatics tools for the NEIBank database.

    PubMed

    Wistow, Graeme; Bernstein, Steven L; Touchman, Jeffrey W; Bouffard, Gerald; Wyatt, M Keith; Peterson, Katherine; Behal, Amita; Gao, James; Buchoff, Patee; Smith, Don

    2002-06-15

    NEIBank is a project to develop and organize genomics and bioinformatics resources for the eye. As part of this effort, tools have been developed for bioinformatics analysis and web based display of data from expressed sequence tag (EST) analyses. EST sequences are identified and formed into groups or clusters representing related transcripts from the same gene. This is carried out by a rules-based procedure called GRIST (GRouping and Identification of Sequence Tags) that uses sequence match parameters derived from BLAST programs. Linked procedures are used to eliminate non-mRNA contaminants. All data are assembled in a relational database and assembled for display as web pages with annotations and links to other informatics resources. Genome projects generate huge amounts of data that need to be classified and organized to become easily accessible to the research community. GRIST provides a useful tool for assembling and displaying the results of EST analyses. The NEIBank web site contains a growing set of pages cataloging the known transcriptional repertoire of eye tissues, derived from new NEIBank cDNA libraries and from eye-related data deposited in the dbEST section of GenBank. PMID:12107414
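
    The grouping step described above, in which ESTs are placed in the same cluster when their pairwise BLAST matches satisfy a set of rules, can be sketched as single-linkage clustering with a union-find structure. This is only an illustrative sketch, not the GRIST procedure itself: the function name `cluster_ests`, the tuple layout of `hits`, and the identity and overlap thresholds are all assumptions introduced here.

```python
from collections import defaultdict


def cluster_ests(est_ids, hits, min_identity=95.0, min_overlap=100):
    """Group ESTs by single linkage over pairwise sequence matches.

    hits: iterable of (query_id, subject_id, percent_identity, overlap_length),
    e.g. parsed from tabular BLAST output. The thresholds are hypothetical
    placeholders, not the rules actually used by GRIST.
    """
    parent = {est: est for est in est_ids}

    def find(x):
        # Follow parent links to the cluster representative (path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for query, subject, identity, overlap in hits:
        # Only matches that pass both rules link two ESTs together.
        if identity >= min_identity and overlap >= min_overlap:
            root_q, root_s = find(query), find(subject)
            if root_q != root_s:
                parent[root_s] = root_q

    groups = defaultdict(list)
    for est in est_ids:
        groups[find(est)].append(est)
    return sorted(sorted(members) for members in groups.values())


ests = ["e1", "e2", "e3", "e4"]
hits = [("e1", "e2", 98.0, 250), ("e2", "e3", 96.5, 120), ("e3", "e4", 80.0, 300)]
clusters = cluster_ests(ests, hits)
print(clusters)
```

    In this toy input, e1, e2 and e3 are chained together by two passing matches, while e4 stays a singleton because its only match falls below the identity threshold.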

  20. Paired-end genomic signature tags: a method for the functional analysis of genomes and epigenomes.

    PubMed

    Dunn, John J; McCorkle, Sean R; Everett, Logan; Anderson, Carl W

    2007-01-01

    Because paired-end genomic signature tags are sequence-based, they have the potential to become an alternate tool to tiled microarray hybridization as a method for genome-wide localization of transcription factors and other sequence-specific DNA binding proteins. As outlined here the method also can be used for global analysis of DNA methylation. One advantage of this approach is the ability to easily switch between different genome types without having to fabricate a new microarray for each and every DNA type. However, the method does have some disadvantages. Among the most rate-limiting steps of our PE-GST protocol are the need to concatemerize the diTAGs, size fractionate them and then clone them prior to sequencing. This is usually followed by additional steps to amplify and size select for long (≥ 500) concatemer inserts prior to sequencing. These time-consuming steps are important for standard DNA sequencing as they increase efficiency approximately 20-30-fold since each amplified concatemer can now provide information on multiple tags; the limitation on data acquisition is read length during sequencing. However, the development of new sequencing methods such as Life Sciences' 454 new nanotechnology-based sequencing instrument (41) could increase tag sequencing efficiency by several orders of magnitude (≥ 100,000 diTAG reads/run), which is sufficient to provide in-depth global analysis of all ChIP PE-GSTs in a single run. This is because the lengths of our paired-end diTAGs (approximately 60 bp) fall well within the region of high accuracy for read lengths on this instrument. In principle, sequence analysis of diTAGs could begin as soon as they are generated, thereby completely bypassing the need for the concatemerization, sizing, downstream cloning steps and sequencing template purification. In addition, our protocol places any one of several unique four-base long nucleotide sequences, such as GATC, between each and every diTAG pair, which could

  1. Genomic Sequence or Signature Tags (GSTs) from the Genome Group at Brookhaven National Laboratory (BNL)

    DOE Data Explorer

    Dunn, John J.; McCorkle, Sean R.; Praissman, Laura A.; Hind, Geoffrey; Van der Lelie, Daniel; Bahou, Wadie F.; Gnatenko, Dmitri V.; Krause, Maureen K.

    Genomic Signature Tags (GSTs) are the products of a method we have developed for identifying and quantitatively analyzing genomic DNAs. The DNA is initially fragmented with a type II restriction enzyme. An oligonucleotide adaptor containing a recognition site for MmeI, a type IIS restriction enzyme, is then used to release 21-bp tags from fixed positions in the DNA relative to the sites recognized by the fragmenting enzyme. These tags are PCR-amplified, purified, concatenated and then cloned and sequenced. The tag sequences and abundances are used to create a high resolution GST sequence profile of the genomic DNA. [Quoted from Genomic Signature Tags (GSTs): A System for Profiling Genomic DNA, Dunn, John J.; McCorkle, Sean R.; Praissman, Laura A.; Hind, Geoffrey; Van der Lelie, Daniel; Bahou, Wadie F.; Gnatenko, Dmitri V.; Krause, Maureen K., Revised 9/13/2002]
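
    The idea of taking fixed-length tags at fixed positions relative to a restriction site, and then profiling the DNA by tag abundance, can be illustrated in silico. The sketch below is a simplification under stated assumptions: the site `CATG` is only a placeholder for whichever fragmenting enzyme is used, the tag is taken immediately downstream of the site on one strand, and the function name `extract_tags` is hypothetical. It does not model the adaptor ligation or the actual MmeI cut geometry.

```python
from collections import Counter


def extract_tags(dna, site="CATG", tag_length=21):
    """Count fixed-length tags found immediately downstream of each site.

    'site' is a placeholder recognition sequence (an assumption); tags that
    would run past the end of the sequence are skipped.
    """
    dna = dna.upper()
    tags = Counter()
    start = dna.find(site)
    while start != -1:
        tag_start = start + len(site)
        tag = dna[tag_start:tag_start + tag_length]
        if len(tag) == tag_length:
            tags[tag] += 1
        # Continue one base further so overlapping sites are not missed.
        start = dna.find(site, start + 1)
    return tags


example = "CATG" + "A" * 21 + "CATG" + "A" * 21 + "CATG" + "ACGT"
profile = extract_tags(example)
print(profile.most_common())
```

    The resulting counter plays the role of a tag profile: each key is a tag sequence and each value its abundance in the input.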

  2. Expressed sequence tags from a NaCl-treated Suaeda salsa cDNA library.

    PubMed

    Zhang, L; Ma, X L; Zhang, Q; Ma, C L; Wang, P P; Sun, Y F; Zhao, Y X; Zhang, H

    2001-04-18

    Past efforts to improve plant tolerance to osmotic stress have had limited success owing to the genetic complexity of stress responses. The first step towards cataloging and categorizing genetically complex abiotic stress responses is the rapid discovery of genes by the large-scale partial sequencing of randomly selected cDNA clones or expressed sequence tags (ESTs). Suaeda salsa, which can survive seawater-level salinity, is a favorite halophytic model for salt-tolerance research. We constructed a NaCl-treated cDNA library of Suaeda salsa and sequenced 1048 randomly selected clones, out of which 1016 clones produced readable sequences (773 showed homology to previously identified genes, 227 matched unknown protein coding regions, 16 anomalous sequences or sequences of bacterial origin were excluded from further analysis). By sequence analysis we identified 492 unique clones: 315 showed homology to previously identified genes, 177 matched unknown protein coding regions (101 of which have been found before in other organisms and 76 are completely novel). All our EST data are available on the Internet. We believe that our dbEST and the associated DNA materials will be a useful source to scientists engaged in stress-tolerance studies. PMID:11313146

  3. Insilico analysis of three different tag polypeptides with dual roles in scFv antibodies.

    PubMed

    Mohammadi, Mozafar; Nejatollahi, Foroogh; Sakhteman, Amirhossein; Zarei, Neda

    2016-08-01

    Single chain fragment variable (scFv) antibodies are composed of variable heavy (VH) and variable light (VL) domains that are joined by a polypeptide linker. Typically, a [(Gly4Ser)n] sequence is used as a linker to retain the integrity of the antigen-binding domain. Due to its low immunogenicity, this sequence cannot be used as a tag for scFv detection and purification. Several lines of evidence have shown that the addition of an N- or C-terminal tag for scFv detection and purification results in decreased expression and binding capacity of this antibody fragment. In this study, we substituted the traditional linker (GGGGS) with His-tag, C-myc or E-tag sequences through molecular modeling. Stability and integrity of all models were assessed by molecular dynamics (MD) simulation. Based on MD simulation analysis, the model containing the E-tag sequence as a linker showed greater stability than the other models. The results suggest that E-tag not only can be substituted for the traditional linker but also eliminates the need for an additional tag for scFv detection and purification. PMID:27113782

  4. Express Sequence Tag Analysis - Identification of Anseriformes Trypsin Genes from Full-Length cDNA Library of the Duck (Anas platyrhynchos) and Characterization of Their Structure and Function.

    PubMed

    Yu, Haining; Cai, Shasha; Gao, Jiuxiang; Wang, Chen; Qiao, Xue; Wang, Hui; Feng, Lan; Wang, Yipeng

    2016-02-01

    Trypsins are key proteins in animal protein digestion, breaking down the peptide bonds on the carboxyl side of lysine and arginine residues; hence they have been used widely in various biotechnological processes. In the current study, a full-length cDNA library with a capacity of 5×10^5 CFU/ml from the duck (Anas platyrhynchos) was constructed. Using expressed sequence tag (EST) sequencing, genes coding two trypsins were identified and two full-length trypsin cDNAs were then obtained by rapid amplification of cDNA ends (RACE)-PCR. Using Blast, they were classified into the trypsin I and II subfamilies, but both encoded a signal peptide, an activation peptide, and a 223-a.a. mature protein located in the C-terminus. The two deduced mature proteins were designated as trypsin-IAP and trypsin-IIAP, and their theoretical isoelectric points (pI) and molecular weights (MW) were 7.99/23466.4 Da and 4.65/24066.0 Da, respectively. Molecular characterizations of genes were further performed by detailed bioinformatics analysis. Phylogenetic analysis revealed that trypsin-IIAP has an evolution pattern distinct from trypsin-IAP, suggesting its evolutionary advantage. Then the duck trypsin-IIAP was expressed in an Escherichia coli system, and its kinetic parameters were measured. The three dimensional structures of trypsin-IAP and trypsin-IIAP were predicted by homology modeling, and the conserved residues required for functionality were identified. Two loops controlling the specificity of the trypsin and the substrate-binding pocket represented in the model are almost identical in primary sequences and backbone tertiary structures of the trypsin families. PMID:27260395
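
    The cleavage rule stated at the start of this abstract (peptide bonds are broken on the carboxyl side of lysine and arginine) can be illustrated with a minimal in silico digestion. This is a sketch under simplifying assumptions: it cuts after every K or R, ignores the commonly applied exception for a following proline, and does not model missed cleavages. The function name `tryptic_digest` and the example sequences are hypothetical.

```python
def tryptic_digest(protein):
    """Split a one-letter protein sequence after every lysine (K) or arginine (R).

    Simplified rule: no proline exception and no missed cleavages.
    """
    peptides = []
    current = []
    for residue in protein.upper():
        current.append(residue)
        if residue in "KR":
            # The bond on the carboxyl side of K/R is cut, so the peptide ends here.
            peptides.append("".join(current))
            current = []
    if current:
        # Trailing residues after the last K/R form the C-terminal peptide.
        peptides.append("".join(current))
    return peptides


fragments = tryptic_digest("MKWVTFRSLLK")
print(fragments)
```

    Each returned peptide other than the last ends in K or R, which is the signature that database search tools rely on when matching tryptic peptides.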

  5. Degradation of C-terminal tag sequences on domain antibodies purified from E. coli supernatant

    PubMed Central

    Lykkemark, Simon; Mandrup, Ole Aalund; Friis, Niels Anton; Kristensen, Peter

    2014-01-01

    Expression of recombinant proteins often takes advantage of peptide tags expressed in fusion to allow easy detection and purification of the expressed proteins. However, as the fusion peptides most often are flexible appendages at the N- or C-terminal, proteolytic cleavage may result in removal of the tag sequence. Here, we evaluated the functionality and stability of 14 different combinations of commonly used tags for purification and detection of recombinant antibody fragments. The tag sequences were inserted in fusion with the c-terminal end of a domain antibody based on the HEL4 scaffold in a phagemid vector. This particular antibody fragment was able to refold on the membrane after blotting, allowing us to detect c-terminal tag breakdown by use of protein A in combination with detection of the tags in the specific constructs. The degradation of the c-terminal tags suggested specific sites to be particularly prone to proteolytic cleavage, leaving some of the tag combinations partially or completely degraded. This specific work illustrates the importance of tag design with regard to recombinant antibody expression in E. coli, but also aids the more general understanding of protein expression. PMID:25426869

  6. Degradation of C-terminal tag sequences on domain antibodies purified from E. coli supernatant.

    PubMed

    Lykkemark, Simon; Mandrup, Ole Aalund; Friis, Niels Anton; Kristensen, Peter

    2014-01-01

    Expression of recombinant proteins often takes advantage of peptide tags expressed in fusion to allow easy detection and purification of the expressed proteins. However, as the fusion peptides most often are flexible appendages at the N- or C-terminal, proteolytic cleavage may result in removal of the tag sequence. Here, we evaluated the functionality and stability of 14 different combinations of commonly used tags for purification and detection of recombinant antibody fragments. The tag sequences were inserted in fusion with the c-terminal end of a domain antibody based on the HEL4 scaffold in a phagemid vector. This particular antibody fragment was able to refold on the membrane after blotting, allowing us to detect c-terminal tag breakdown by use of protein A in combination with detection of the tags in the specific constructs. The degradation of the c-terminal tags suggested specific sites to be particularly prone to proteolytic cleavage, leaving some of the tag combinations partially or completely degraded. This specific work illustrates the importance of tag design with regard to recombinant antibody expression in E. coli, but also aids the more general understanding of protein expression. PMID:25426869

  7. Characterization of Expressed Sequence Tags From a Gallus gallus Pineal Gland cDNA Library

    PubMed Central

    Hartman, Stefanie; Touchton, Greg; Wynn, Jessica; Geng, Tuoyu; Chong, Nelson W.

    2005-01-01

    The pineal gland is the circadian oscillator in the chicken, regulating diverse functions ranging from egg laying to feeding. Here, we describe the isolation and characterization of expressed sequence tags (ESTs) isolated from a chicken pineal gland cDNA library. A total of 192 unique sequences were analysed and submitted to GenBank; 6% of the ESTs matched neither GenBank cDNA sequences nor the newly assembled chicken genomic DNA sequence, three ESTs aligned with sequences designated to be on the Z_random, while one matched a W chromosome sequence and could be useful in cataloguing functionally important genes on this sex chromosome. Additionally, single nucleotide polymorphisms (SNPs) were identified and validated in 10 ESTs that showed 98% or higher sequence similarity to known chicken genes. Here, we have described resources that may be useful in comparative and functional genomic analysis of genes expressed in an important organ, the pineal gland, in a model and agriculturally important organism. PMID:18629218

  8. Primer and platform effects on 16S rRNA tag sequencing

    DOE PAGESBeta

    Tremblay, Julien; Singh, Kanwar; Fern, Alison; Kirton, Edward S.; He, Shaomei; Woyke, Tanja; Lee, Janey; Chen, Feng; Dangl, Jeffery L.; Tringe, Susannah G.

    2015-08-04

    Sequencing of 16S rRNA gene tags is a popular method for profiling and comparing microbial communities. The protocols and methods used, however, vary considerably with regard to amplification primers, sequencing primers, sequencing technologies; as well as quality filtering and clustering. How results are affected by these choices, and whether data produced with different protocols can be meaningfully compared, is often unknown. Here we compare results obtained using three different amplification primer sets (targeting V4, V6–V8, and V7–V8) and two sequencing technologies (454 pyrosequencing and Illumina MiSeq) using DNA from a mock community containing a known number of species as well as complex environmental samples whose PCR-independent profiles were estimated using shotgun sequencing. We find that paired-end MiSeq reads produce higher quality data and enabled the use of more aggressive quality control parameters over 454, resulting in a higher retention rate of high quality reads for downstream data analysis. While primer choice considerably influences quantitative abundance estimations, sequencing platform has relatively minor effects when matched primers are used. In conclusion, beta diversity metrics are surprisingly robust to both primer and sequencing platform biases.

  9. Primer and platform effects on 16S rRNA tag sequencing

    SciTech Connect

    Tremblay, Julien; Singh, Kanwar; Fern, Alison; Kirton, Edward S.; He, Shaomei; Woyke, Tanja; Lee, Janey; Chen, Feng; Dangl, Jeffery L.; Tringe, Susannah G.

    2015-08-04

    Sequencing of 16S rRNA gene tags is a popular method for profiling and comparing microbial communities. The protocols and methods used, however, vary considerably with regard to amplification primers, sequencing primers, sequencing technologies; as well as quality filtering and clustering. How results are affected by these choices, and whether data produced with different protocols can be meaningfully compared, is often unknown. Here we compare results obtained using three different amplification primer sets (targeting V4, V6–V8, and V7–V8) and two sequencing technologies (454 pyrosequencing and Illumina MiSeq) using DNA from a mock community containing a known number of species as well as complex environmental samples whose PCR-independent profiles were estimated using shotgun sequencing. We find that paired-end MiSeq reads produce higher quality data and enabled the use of more aggressive quality control parameters over 454, resulting in a higher retention rate of high quality reads for downstream data analysis. While primer choice considerably influences quantitative abundance estimations, sequencing platform has relatively minor effects when matched primers are used. In conclusion, beta diversity metrics are surprisingly robust to both primer and sequencing platform biases.

  10. Expressed sequence tags of Chinese cabbage flower bud cDNA.

    PubMed Central

    Lim, C O; Kim, H Y; Kim, M G; Lee, S I; Chung, W S; Park, S H; Hwang, I; Cho, M J

    1996-01-01

    We randomly selected and partially sequenced cDNA clones from a library of Chinese cabbage (Brassica campestris L. ssp. pekinensis) flower bud cDNAs. Out of 1216 expressed sequence tags (ESTs), 904 cDNA clones were unique or nonredundant. Five hundred eighty-eight clones (48.4%) had sequence homology to functionally defined genes at the peptide level. Only 5 clones encoded known flower-specific proteins. Among the cDNAs with no similarity to known protein sequences (628), 184 clones had significant similarity to nucleotide sequences registered in the databases. Among these 184 clones, 142 exhibited similarities at the nucleotide level only with plant ESTs. Also, sequence similarities were evident between these 142 ESTs and their matching ESTs when compared using the deduced amino acid sequences. Therefore, it is possible that the anonymous ESTs encode plant-specific ubiquitous proteins. Our extensive EST analysis of genes expressed in floral organs not only contributes to the understanding of the dynamics of genome expression patterns in floral organs but also adds data to the repertoire of all genomic genes. PMID:8787028

  11. Primer and platform effects on 16S rRNA tag sequencing

    PubMed Central

    Tremblay, Julien; Singh, Kanwar; Fern, Alison; Kirton, Edward S.; He, Shaomei; Woyke, Tanja; Lee, Janey; Chen, Feng; Dangl, Jeffery L.; Tringe, Susannah G.

    2015-01-01

    Sequencing of 16S rRNA gene tags is a popular method for profiling and comparing microbial communities. The protocols and methods used, however, vary considerably with regard to amplification primers, sequencing primers, sequencing technologies; as well as quality filtering and clustering. How results are affected by these choices, and whether data produced with different protocols can be meaningfully compared, is often unknown. Here we compare results obtained using three different amplification primer sets (targeting V4, V6–V8, and V7–V8) and two sequencing technologies (454 pyrosequencing and Illumina MiSeq) using DNA from a mock community containing a known number of species as well as complex environmental samples whose PCR-independent profiles were estimated using shotgun sequencing. We find that paired-end MiSeq reads produce higher quality data and enabled the use of more aggressive quality control parameters over 454, resulting in a higher retention rate of high quality reads for downstream data analysis. While primer choice considerably influences quantitative abundance estimations, sequencing platform has relatively minor effects when matched primers are used. Beta diversity metrics are surprisingly robust to both primer and sequencing platform biases. PMID:26300854

  12. Generation and analysis of a large-scale expressed sequence tags from a full-length enriched cDNA library of Siberian tiger (Panthera tigris altaica).

    PubMed

    Guo, Yu; Liu, Changqing; Lu, Taofeng; Liu, Dan; Bai, Chunyu; Li, Xiangchen; Ma, Yuehui; Guan, Weijun

    2014-05-15

    In this study, a full-length enriched cDNA library was successfully constructed from Siberian tiger, the world's most endangered species. The titers of primary and amplified libraries were 1.28×10^6 pfu/mL and 1.59×10^10 pfu/mL respectively. The proportion of recombinants from the unamplified library was 91.3% and the average length of exogenous inserts was 1.06 kb. A total of 279 individual ESTs with sizes ranging from 316 to 1258 bp were then analyzed. Furthermore, 204 unigenes were successfully annotated and involved in 49 functions of the GO classification; cell (175, 85.5%), cellular process (165, 80.9%), and binding (152, 74.5%) were the dominant terms. 198 unigenes were assigned to 156 KEGG pathways, and the pathways with the most representation were metabolic pathways (18, 9.1%). The proportion pattern of each COG subcategory was similar among Panthera tigris altaica, P. tigris tigris and Homo sapiens; the general function prediction only cluster (44, 15.8%) represented the largest group, followed by translation, ribosomal structure and biogenesis (33, 11.8%), and replication, recombination and repair (24, 8.6%), and only 7.2% of ESTs were classified as novel genes. Moreover, the recombinant plasmid pET32a-TAT-COL6A2 was constructed, coding for the Trx-TAT-COL6A2 fusion protein with two 6× His-tags at the N- and C-termini. By BCA assay, the concentration of soluble Trx-TAT-COL6A2 recombinant protein was 2.64±0.18 mg/mL. This library will provide a useful platform for functional genome and transcriptome research of P. tigris and other felid animals in the future. PMID:24630959

  13. Identification of reproduction-related genes and SSR-markers through expressed sequence tags analysis of a monsoon breeding carp rohu, Labeo rohita (Hamilton).

    PubMed

    Sahu, Dinesh K; Panda, Soumya P; Panda, Sujata; Das, Paramananda; Meher, Prem K; Hazra, Rupenangshu K; Peatman, Eric; Liu, Zhanjiang J; Eknath, Ambekar E; Nandi, Samiran

    2013-07-15

    Labeo rohita (Ham.) also called rohu is the most important freshwater aquaculture species on the Indian sub continent. Monsoon dependent breeding restricts its seed production beyond season indicating a strong genetic control about which very limited information is available. Additionally, few genomic resources are publicly available for this species. Here we sought to identify reproduction-relevant genes from normalized cDNA libraries of the brain-pituitary-gonad-liver (BPGL-axis) tissues of adult L. rohita collected during post preparatory phase. 6161 random clones sequenced (Sanger-based) from these libraries produced 4642 (75.34%) high-quality sequences. They were assembled into 3631 (78.22%) unique sequences composed of 709 contigs and 2922 singletons. A total of 182 unique sequences were found to be associated with reproduction-related genes, mainly under the GO term categories of reproduction, neuro-peptide hormone activity, hormone and receptor binding, receptor activity, signal transduction, embryonic development, cell-cell signaling, cell death and anti-apoptosis process. Several important reproduction-related genes reported here for the first time in L. rohita are zona pellucida sperm-binding protein 3, aquaporin-12, spermine oxidase, sperm associated antigen 7, testis expressed 261, progesterone receptor membrane component, Neuropeptide Y and Pro-opiomelanocortin. Quantitative RT-PCR-based analyses of 8 known and 8 unknown transcripts during preparatory and post-spawning phase showed increased expression level of most of the transcripts during preparatory phase (except Neuropeptide Y) in comparison to post-spawning phase indicating possible roles in initiation of gonad maturation. Expression of unknown transcripts was also found in prolific breeder common carp and tilapia, but levels of expression were much higher in seasonal breeder rohu. 3631 unique sequences contained 236 (6.49%) putative microsatellites with the AG (28.16%) repeat as the most

  14. Phylogeny of Saccharina and Laminaria (Laminariaceae, Laminariales, Phaeophyta) in sequence-tagged-site markers

    NASA Astrophysics Data System (ADS)

    Qu, Jieqiong; Zhang, Jing; Wang, Xumin; Chi, Shan; Liu, Cui; Liu, Tao

    2014-01-01

    Laminaria and Saccharina have recently been recognized as two independent clades from the former genus Laminaria. Traditional morphological taxonomy is being challenged by molecular evidence from both nucleus and plastid. Intensive work is in great demand from the perspective of genome colinearity. In this study, 118 sequence-tagged site (STS) markers were screened for phylogenetic analyses, 29 based on genome sequences, while 89 were based on expressed sequence tag (EST) sequences. EST-based STS marker development (29.37%) had an efficiency twice as high as genome-sequence-based development (9.48%) as a result of high conservation of gene transcripts among the relative species. S. ochotensis, S. religiosa, S. japonica, and L. hyperborea showed great homogeneity in all 118 STS markers. Our result supports the view that the diversification between the genera Saccharina and Laminaria was a more recent event and that Saccharina and Laminaria shared high phylogenetic affinity. However, when it came to the single nucleotide polymorphism (SNP) level among the 41 SNPs, L. hyperborea owned 29 unique SNPs against 12 within the left three Saccharina species and 12 of the 13 indels were supposedly unique for L. hyperborea, indicated by its high variability. Originating from homologous ancestors, species between the recently diverged genera Laminaria and Saccharina may have taken in enough mutations at the SNP level only, in spite of different evolutionary strategies for better adaptation to the environment. Our study lays a solid foundation from a new perspective, although more accurate phylogenetic analysis is still needed to clarify the evolutionary traces between the genera Saccharina and Laminaria.

  15. [Differentiation, identification and development of database of T. aestivum L. varieties of Ukrainian selection on the basis of sequence-tagged analysis of microsatellite repeats].

    PubMed

    Chebotar', S V; Sivolap, Iu M

    2001-01-01

    Determination of the variety genotype is very important for the development of the theory and practice of plant breeding and for protecting the rights of a variety originator. For this reason, attention is focused on molecular markers generated by polymerase chain reaction. On the basis of STMS analysis, principles are formulated for the identification of varieties and for the development of a database that reflects the molecular-genetic peculiarities of some varieties of the Plant Breeding and Genetics Institute and other Ukrainian breeding organizations. The allelic state at microsatellite loci and their distribution were investigated. Wheat varieties were ranked according to genetic distances, and data on pedigree were compared with the cluster distribution of varieties obtained using computer programs. PMID:11944322

  16. Development of peanut EST (expressed sequence tag)-based genomic resources and tools

    Technology Transfer Automated Retrieval System (TEKTRAN)

    U.S. Peanut Genome Initiative (PGI) has widely recognized the need for peanut genome tools and resources development for mitigating peanut allergens and food safety. Genomics such as Expressed Sequence Tag (EST), microarray technologies, and whole genome sequencing provides robotic tools for profili...

  17. Development of peanut expressed sequence tag-based genomic resources and tools

    Technology Transfer Automated Retrieval System (TEKTRAN)

    U.S. Peanut Genome Initiative (PGI) has widely recognized the need for peanut genome tools and resources development for mitigating peanut allergens and food safety. Genomics such as Expressed Sequence Tag (EST), microarray technologies, and whole genome sequencing provides robotic tools for profili...

  18. TAG Sequence Identification of Genomic Regions Using TAGdb.

    PubMed

    Ruperao, Pradeep

    2016-01-01

    Second-generation sequencing (SGS) technology has enabled the sequencing of genomes and identification of genes. However, large complex plant genomes remain particularly difficult for de novo assembly. Access to the vast quantity of raw sequence data may facilitate discoveries; however the volume of this data makes access difficult. This chapter discusses the Web-based tool TAGdb that enables researchers to identify paired read second-generation DNA sequence data that share identity with a submitted query sequence. The identified reads can be used for PCR amplification of genomic regions to identify genes and promoters without the need for genome assembly. PMID:26519409

  19. Comparison of Sequencing (Barcode Region) and Sequence-Tagged-Site PCR for Blastocystis Subtyping

    PubMed Central

    2013-01-01

    Blastocystis is the most common nonfungal microeukaryote of the human intestinal tract and comprises numerous subtypes (STs), nine of which have been found in humans (ST1 to ST9). While efforts continue to explore the relationship between human health status and subtypes, no consensus regarding subtyping methodology exists. It has been speculated that differences detected in subtype distribution in various cohorts may to some extent reflect different approaches. Blastocystis subtypes have been determined primarily in one of two ways: (i) sequencing of small subunit rRNA gene (SSU-rDNA) PCR products and (ii) PCR with subtype-specific sequence-tagged-site (STS) diagnostic primers. Here, STS primers were evaluated against a panel of samples (n = 58) already subtyped by SSU-rDNA sequencing (barcode region), including subtypes for which STS primers are not available, and a small panel of DNAs from four other eukaryotes often present in feces (n = 18). Although the STS primers appeared to be highly specific, their sensitivity was only moderate, and the results indicated that some infections may go undetected when this method is used. False-negative STS results were not linked exclusively to certain subtypes or alleles, and evidence of substantial genetic variation in STS loci was obtained. Since the majority of DNAs included here were extracted from feces, it is possible that STS primers may generally work better with DNAs extracted from Blastocystis cultures. In conclusion, due to its higher applicability and sensitivity, and since sequence information is useful for other forms of research, SSU-rDNA barcoding is recommended as the method of choice for Blastocystis subtyping. PMID:23115257

  20. Gene ontology based characterization of expressed sequence tags (ESTs) of Brassica rapa cv. Osome.

    PubMed

    Arasan, Senthil Kumar Thamil; Park, Jong-In; Ahmed, Nasar Uddin; Jung, Hee-Jeong; Lee, In-Ho; Cho, Yong-Gu; Lim, Yong-Pyo; Kang, Kwon-Kyoo; Nou, Ill-Sup

    2013-07-01

    Chinese cabbage (Brassica rapa) is widely recognized for its economic importance and contribution to human nutrition, but abiotic and biotic stresses are the main obstacles to its quality, nutritional status and production. In this study, 3,429 expressed sequence tag (EST) sequences were generated from a B. rapa cv. Osome cDNA library, and the unique transcripts were classified functionally using a gene ontology (GO) hierarchy and the Kyoto Encyclopedia of Genes and Genomes (KEGG). KEGG orthology and the structural domain data were obtained from the biological database for stress related genes (SRG). EST datasets provided a wide outlook of functional characterization of B. rapa cv. Osome. In silico analysis revealed 83% of ESTs to be well annotated towards reeds one dimensional concept. Clustering of ESTs returned 333 contigs and 2,446 singlets, giving a total of 3,284 putative unigene sequences. This dataset contained 1,017 EST sequences functionally annotated to stress responses, from which the expression of randomly selected SRGs was analyzed against cold, salt, drought, ABA, water and PEG stresses. Most of the SRGs showed differential expression against these stresses. Thus, the EST dataset is very important for discovering potential genes related to stress resistance in Chinese cabbage, and can be a useful resource for genetic engineering of Brassica sp. PMID:23898551

  1. Identification of Simple Sequence Repeat Biomarkers through Cross-Species Comparison in a Tag Cloud Representation

    PubMed Central

    2014-01-01

    Simple sequence repeats (SSRs) are not only applied as genetic markers in evolutionary studies but they also play an important role in gene regulatory activities. Efficient identification of conserved and exclusive SSRs through cross-species comparison is helpful for understanding the evolutionary mechanisms and associations between specific gene groups and SSR motifs. In this paper, we developed an online cross-species comparative system and integrated it with a tag cloud visualization technique for identifying potential SSR biomarkers within fourteen frequently used model species. Ultraconserved or exclusive SSRs among cross-species orthologous genes could be effectively retrieved and displayed through a friendly interface design. Four different types of testing cases were applied to demonstrate and verify the retrieved SSR biomarker candidates. Through statistical analysis and enhanced tag cloud representation on defined functional related genes and cross-species clusters, the proposed system can correctly represent the patterns, loci, colors, and sizes of identified SSRs in accordance with gene functions, pattern qualities, and conserved characteristics among species. PMID:24800246
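
    The distinction between conserved and exclusive SSRs drawn above can be illustrated with plain set operations. This is a minimal sketch, not the published system: the function name, the species labels and the motif sets are all hypothetical, and it assumes SSR motifs have already been extracted per species for one orthologous gene group.

```python
def conserved_and_exclusive(motifs_by_species):
    """Split SSR motifs into those shared by all species and those unique to one.

    motifs_by_species: dict mapping a species label to an iterable of motifs
    (must contain at least one species).
    """
    species = list(motifs_by_species)
    # Conserved: motifs present in every species.
    conserved = set.intersection(*(set(motifs_by_species[s]) for s in species))
    # Exclusive: motifs seen in one species and in none of the others.
    exclusive = {}
    for s in species:
        others = set().union(*(set(motifs_by_species[o]) for o in species if o != s))
        exclusive[s] = set(motifs_by_species[s]) - others
    return conserved, exclusive


# Hypothetical motif sets, for illustration only.
example = {
    "human": {"CAG", "GT", "AAT"},
    "mouse": {"CAG", "GT"},
    "zebrafish": {"CAG", "TTA"},
}
conserved, exclusive = conserved_and_exclusive(example)
print(conserved)
print(exclusive)
```

    A motif shared by only some of the species (GT in this example) falls into neither group, which is why the two results are reported separately.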

  2. AGIA Tag System Based on a High Affinity Rabbit Monoclonal Antibody against Human Dopamine Receptor D1 for Protein Analysis

    PubMed Central

    Yano, Tomoya; Takeda, Hiroyuki; Uematsu, Atsushi; Yamanaka, Satoshi; Nomura, Shunsuke; Nemoto, Keiichirou; Iwasaki, Takahiro; Takahashi, Hirotaka; Sawasaki, Tatsuya

    2016-01-01

    Polypeptide tag technology is widely used for protein detection and affinity purification. It consists of two fundamental elements: a peptide sequence and a binder which specifically binds to the peptide tag. In many tag systems, antibodies have been used as the binder due to their high affinity and specificity. Recently, we obtained clone Ra48, a high-affinity rabbit monoclonal antibody (mAb) against dopamine receptor D1 (DRD1). Here, we report a novel tag system composed of Ra48 antibody and its epitope sequence. Using a deletion assay, we identified EEAAGIARP in the C-terminal region of DRD1 as the minimal epitope of Ra48 mAb, and we named this sequence the “AGIA” tag, based on its central sequence. The tag sequence does not include the four amino acids, Ser, Thr, Tyr, or Lys, which are susceptible to post-translational modification. We demonstrated performance of this new tag system in biochemical and cell biology applications. SPR analysis demonstrated that the affinity of the Ra48 mAb to the AGIA tag was 4.90 × 10⁻⁹ M. AGIA tag showed remarkably high sensitivity and specificity in immunoblotting. A number of AGIA-fused proteins overexpressed in animal and plant cells were detected by anti-AGIA antibody in immunoblotting and immunostaining with low background, and were immunoprecipitated efficiently. Furthermore, a single amino acid substitution of the second Glu to Asp (AGIA/E2D) enabled competitive dissociation of AGIA/E2D-tagged protein by adding wild-type AGIA peptide. It enabled one-step purification of AGIA/E2D-tagged recombinant proteins by peptide competition under physiological conditions. The sensitivity and specificity of the AGIA system make it suitable for use in multiple methods for protein analysis. PMID:27271343

  3. AGIA Tag System Based on a High Affinity Rabbit Monoclonal Antibody against Human Dopamine Receptor D1 for Protein Analysis.

    PubMed

    Yano, Tomoya; Takeda, Hiroyuki; Uematsu, Atsushi; Yamanaka, Satoshi; Nomura, Shunsuke; Nemoto, Keiichirou; Iwasaki, Takahiro; Takahashi, Hirotaka; Sawasaki, Tatsuya

    2016-01-01

    Polypeptide tag technology is widely used for protein detection and affinity purification. It consists of two fundamental elements: a peptide sequence and a binder which specifically binds to the peptide tag. In many tag systems, antibodies have been used as the binder due to their high affinity and specificity. Recently, we obtained clone Ra48, a high-affinity rabbit monoclonal antibody (mAb) against dopamine receptor D1 (DRD1). Here, we report a novel tag system composed of Ra48 antibody and its epitope sequence. Using a deletion assay, we identified EEAAGIARP in the C-terminal region of DRD1 as the minimal epitope of Ra48 mAb, and we named this sequence the "AGIA" tag, based on its central sequence. The tag sequence does not include the four amino acids, Ser, Thr, Tyr, or Lys, which are susceptible to post-translational modification. We demonstrated performance of this new tag system in biochemical and cell biology applications. SPR analysis demonstrated that the affinity of the Ra48 mAb to the AGIA tag was 4.90 × 10⁻⁹ M. AGIA tag showed remarkably high sensitivity and specificity in immunoblotting. A number of AGIA-fused proteins overexpressed in animal and plant cells were detected by anti-AGIA antibody in immunoblotting and immunostaining with low background, and were immunoprecipitated efficiently. Furthermore, a single amino acid substitution of the second Glu to Asp (AGIA/E2D) enabled competitive dissociation of AGIA/E2D-tagged protein by adding wild-type AGIA peptide. It enabled one-step purification of AGIA/E2D-tagged recombinant proteins by peptide competition under physiological conditions. The sensitivity and specificity of the AGIA system make it suitable for use in multiple methods for protein analysis. PMID:27271343

  4. New aldehyde tag sequences identified by screening formylglycine generating enzymes in vitro and in vivo.

    PubMed

    Rush, Jason S; Bertozzi, Carolyn R

    2008-09-17

    Formylglycine generating enzyme (FGE) performs a critical posttranslational modification of type I sulfatases, converting cysteine within the motif CxPxR to the aldehyde-bearing residue formylglycine (FGly). This concise motif can be installed within heterologous proteins as a genetically encoded "aldehyde tag" for site-specific labeling with aminooxy- or hydrazide-functionalized probes. In this report, we screened FGEs from M. tuberculosis and S. coelicolor against synthetic peptide libraries and identified new substrate sequences that diverge from the canonical motif. We found that E. coli's FGE-like activity is similarly promiscuous, enabling the use of novel aldehyde tag sequences for in vivo modification of recombinant proteins. PMID:18722427
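
    The canonical CxPxR motif named above (Cys, any residue, Pro, any residue, Arg) is easy to locate in a protein sequence with a regular expression. The sketch below is illustrative only: the function name and the example sequence are hypothetical, and it searches only for the canonical motif, not for the divergent substrate sequences the report identifies.

```python
import re

# CxPxR: Cys, any residue, Pro, any residue, Arg.
# The lookahead lets overlapping occurrences be reported as well.
ALDEHYDE_TAG = re.compile(r"(?=(C.P.R))")


def find_aldehyde_tags(protein):
    """Return (0-based position, matched pentapeptide) for each CxPxR site."""
    return [(m.start(), m.group(1)) for m in ALDEHYDE_TAG.finditer(protein.upper())]


# Hypothetical sequence carrying a single CxPxR site, for illustration only.
example = "MGSLCTPSRGSAAK"
print(find_aldehyde_tags(example))
```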

  5. Peanut (Arachis hypogaea) Expressed Sequence Tag Project: Progress and Application

    PubMed Central

    Feng, Suping; Wang, Xingjun; Zhang, Xinyou; Dang, Phat M.; Holbrook, C. Corley; Culbreath, Albert K.; Wu, Yaoting; Guo, Baozhu

    2012-01-01

    ESTs have been sequenced for many plants, including peanut, as an alternative to whole-genome sequencing because of genome size and complexity. The US peanut research community held the historic 2004 Atlanta Genomics Workshop and named the EST project as a main priority. As of August 2011, the peanut research community had deposited 252,832 ESTs in the public NCBI EST database, and this resource has provided the community with valuable tools and core foundations for various genome-scale experiments ahead of the whole genome sequencing project. These EST resources have been used for marker development, gene cloning, microarray gene expression and genetic map construction. The peanut EST sequence resources have thus been shown to have a wide range of applications and have fulfilled their essential role at the time of need. The EST project then contributed to a second historic event, the Peanut Genome Project 2010 Inaugural Meeting, also held in Atlanta, where it was decided to sequence the entire peanut genome. After the completion of peanut whole genome sequencing, ESTs or the transcriptome will continue to play an important role in filling knowledge gaps, identifying particular genes and exploring gene function. PMID:22745594

  6. Comparative mapping of expressed sequence tags containing microsatellites in rainbow trout (Oncorhynchus mykiss)

    PubMed Central

    Rexroad, Caird E; Rodriguez, Maria F; Coulibaly, Issa; Gharbi, Karim; Danzmann, Roy G; DeKoning, Jenefer; Phillips, Ruth; Palti, Yniv

    2005-01-01

    Background: Comparative genomics, through the integration of genetic maps from species of interest with whole genome sequences of other species, will facilitate the identification of genes affecting phenotypes of interest. The development of microsatellite markers from expressed sequence tags will serve to increase marker densities on current salmonid genetic maps and initiate in silico comparative maps with species whose genomes have been fully sequenced. Results: Eighty-nine polymorphic microsatellite markers were generated for rainbow trout, of which at least 74 amplify in other salmonids. Fifty-five have been associated with functional annotation and 30 were mapped on existing genetic maps. Homologous sequences were identified for 20 of the EST-containing microsatellites to identify comparative assignments within the tetraodon, mouse, and/or human genomes. Conclusion: The addition of microsatellite markers constructed from expressed sequence tag data will facilitate the development of high-density genetic maps for rainbow trout and comparative maps with other salmonids and better studied species. PMID:15836796

  7. Analysis of common bean expressed sequence tags identifies sulfur metabolic pathways active in seed and sulfur-rich proteins highly expressed in the absence of phaseolin and major lectins

    PubMed Central

    2011-01-01

    Background: A deficiency in phaseolin and phytohemagglutinin is associated with a near doubling of sulfur amino acid content in genetically related lines of common bean (Phaseolus vulgaris), particularly cysteine, elevated by 70%, and methionine, elevated by 10%. This mostly takes place at the expense of an abundant non-protein amino acid, S-methyl-cysteine. The deficiency in phaseolin and phytohemagglutinin is mainly compensated by increased levels of the 11S globulin legumin and residual lectins. Legumin, albumin-2, defensin and albumin-1 were previously identified as contributing to the increased sulfur amino acid content in the mutant line, on the basis of similarity to proteins from other legumes. Results: Profiling of free amino acids in developing seeds of the BAT93 reference genotype revealed a biphasic accumulation of gamma-glutamyl-S-methyl-cysteine, the main soluble form of S-methyl-cysteine, with a lag phase occurring during storage protein accumulation. A collection of 30,147 expressed sequence tags (ESTs) was generated from four developmental stages, corresponding to distinct phases of gamma-glutamyl-S-methyl-cysteine accumulation, and covering the transitions to reserve accumulation and desiccation. Analysis of gene ontology categories indicated the occurrence of multiple sulfur metabolic pathways, including all enzymatic activities responsible for sulfate assimilation, de novo cysteine and methionine biosynthesis. Integration of genomic and proteomic data enabled the identification and isolation of cDNAs coding for legumin, albumin-2, defensin D1 and albumin-1A and -B induced in the absence of phaseolin and phytohemagglutinin. Their deduced amino acid sequences have a higher content of cysteine than methionine, providing an explanation for the preferential increase of cysteine in the mutant line. Conclusion: The EST collection provides a foundation to further investigate sulfur metabolism and the differential accumulation of sulfur amino acids in seed

  8. De novo sequencing of unique sequence tags for discovery of post-translational modifications of proteins.

    PubMed

    Shen, Yufeng; Tolić, Nikola; Hixson, Kim K; Purvine, Samuel O; Anderson, Gordon A; Smith, Richard D

    2008-10-15

    De novo sequencing is a spectrum analysis approach for mass spectrometry data to discover post-translational modifications in proteins; however, such an approach is still in its infancy and is still not widely applied to proteomic practices due to its limited reliability. In this work, we describe a de novo sequencing approach for the discovery of protein modifications based on identification of the proteome UStags (Shen, Y.; Tolić, N.; Hixson, K. K.; Purvine, S. O.; Pasa-Tolić, L.; Qian, W. J.; Adkins, J. N.; Moore, R. J.; Smith, R. D. Anal. Chem. 2008, 80, 1871-1882). The de novo information was obtained from Fourier-transform tandem mass spectrometry data for peptides and polypeptides from a yeast lysate, and the de novo sequences obtained were selected based on filter levels designed to provide a limited yet high quality subset of UStags. The DNA-predicted database protein sequences were then compared to the UStags, and the differences observed across or in the UStags (i.e., the UStags' prefix and suffix sequences and the UStags themselves) were used to infer possible sequence modifications. With this de novo-UStag approach, we uncovered some unexpected variances within several yeast protein sequences due to amino acid mutations and/or multiple modifications to the predicted protein sequences. To determine false discovery rates, two random (false) databases were independently used for sequence matching, and ~3% false discovery rates were estimated for the de novo-UStag approach. The factors affecting the reliability (e.g., existence of de novo sequencing noise residues and redundant sequences) and the sensitivity of the approach were investigated and described. The combined de novo-UStag approach complements the UStag method previously reported by enabling the discovery of new protein modifications. PMID:18783246
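
    The false discovery rate estimate described above, based on matching against random (false) databases, can be sketched as a simple ratio. This is an assumption about the calculation rather than the authors' stated procedure: the function names are hypothetical, the counts are invented for illustration, and the ratio of decoy matches to target matches is only one common way of expressing such an estimate.

```python
def estimate_fdr(target_matches, decoy_matches):
    """Approximate the FDR as decoy-database matches over target-database matches."""
    if target_matches <= 0:
        raise ValueError("target_matches must be positive")
    return decoy_matches / target_matches


def mean_fdr(target_matches, decoy_counts):
    """Average the estimate over several independently searched decoy databases."""
    rates = [estimate_fdr(target_matches, d) for d in decoy_counts]
    return sum(rates) / len(rates)


# Hypothetical counts, chosen only for illustration (not taken from the study).
print(mean_fdr(1000, [28, 32]))
```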

  9. Characterization of genome-wide ordered sequence-tagged Mycobacterium mutant libraries by Cartesian Pooling-Coordinate Sequencing

    PubMed Central

    Vandewalle, Kristof; Festjens, Nele; Plets, Evelyn; Vuylsteke, Marnik; Saeys, Yvan; Callewaert, Nico

    2015-01-01

    Reverse genetics research approaches require the availability of methods to rapidly generate specific mutants. Alternatively, where these methods are lacking, the construction of pre-characterized libraries of mutants can be extremely valuable. However, this can be complex, expensive and time consuming. Here, we describe a robust, easy to implement parallel sequencing-based method (Cartesian Pooling-Coordinate Sequencing or CP-CSeq) that reports both on the identity as well as on the location of sequence-tagged biological entities in well-plate archived clone collections. We demonstrate this approach using a transposon insertion mutant library of the Mycobacterium bovis BCG vaccine strain, providing the largest resource of mutants in any strain of the M. tuberculosis complex. The method is applicable to any entity for which sequence-tagged identification is possible. PMID:25960123
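
    The idea of recovering both identity and location from pooled sequencing can be illustrated with a simplified decoding step. The abstract does not spell out the pooling design, so this sketch assumes a three-axis scheme (one pool per plate, per row and per column) and hypothetical names throughout. A tag is assigned a well only if it appears in exactly one pool on every axis.

```python
def decode_locations(pool_hits):
    """Assign tags to wells from pooled sequencing results.

    pool_hits maps an axis name to {pool label: set of tags seen in that pool}.
    Returns (resolved, ambiguous): resolved maps a tag to one label per axis,
    ambiguous holds tags missing from an axis or seen in several pools of it.
    """
    seen = {}  # tag -> {axis -> set of pool labels containing the tag}
    for axis, pools in pool_hits.items():
        for label, tags in pools.items():
            for tag in tags:
                seen.setdefault(tag, {}).setdefault(axis, set()).add(label)

    resolved, ambiguous = {}, set()
    for tag, axes in seen.items():
        if all(len(axes.get(axis, ())) == 1 for axis in pool_hits):
            resolved[tag] = tuple(next(iter(axes[axis])) for axis in pool_hits)
        else:
            ambiguous.add(tag)
    return resolved, ambiguous


# Hypothetical pool contents, for illustration only.
hits = {
    "plate": {"P1": {"tagA", "tagB"}, "P2": {"tagC", "tagB"}},
    "row": {"A": {"tagA", "tagB"}, "B": {"tagC"}},
    "column": {"1": {"tagA"}, "2": {"tagB", "tagC"}},
}
print(decode_locations(hits))
```

    In this example tagB is found in two plate pools, so it cannot be placed from the coordinates alone and is reported as ambiguous.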

  10. Microbial Diversity in Deep-sea Methane Seep Sediments Presented by SSU rRNA Gene Tag Sequencing

    PubMed Central

    Nunoura, Takuro; Takaki, Yoshihiro; Kazama, Hiromi; Hirai, Miho; Ashi, Juichiro; Imachi, Hiroyuki; Takai, Ken

    2012-01-01

    Microbial community structures in methane seep sediments in the Nankai Trough were analyzed by tag-sequencing analysis for the small subunit (SSU) rRNA gene using a newly developed primer set. The dominant members of Archaea were Deep-sea Hydrothermal Vent Euryarchaeotic Group 6 (DHVEG 6), Marine Group I (MGI) and Deep Sea Archaeal Group (DSAG), and those in Bacteria were Alpha-, Gamma-, Delta- and Epsilonproteobacteria, Chloroflexi, Bacteroidetes, Planctomycetes and Acidobacteria. Diversity and richness were examined by 8,709 and 7,690 tag-sequences from sediments at 5 and 25 cm below the seafloor (cmbsf), respectively. The estimated diversity and richness in the methane seep sediment are as high as those in soil and deep-sea hydrothermal environments, although the tag-sequences obtained in this study were not sufficient to show whole microbial diversity in this analysis. We also compared the diversity and richness of each taxon/division between the sediments from the two depths, and found that the diversity and richness of some taxa/divisions varied significantly along with the depth. PMID:22510646
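
    Diversity and richness of a tag-sequence dataset are typically summarized from per-taxon read counts. The abstract does not say which estimators were used, so the sketch below assumes two common choices, the Shannon index and the bias-corrected Chao1 richness estimator, and uses hypothetical counts.

```python
import math


def shannon_index(counts):
    """Shannon diversity H' = -sum(p * ln p) over taxa with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def chao1(counts):
    """Bias-corrected Chao1: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)).

    F1 and F2 are the numbers of taxa observed exactly once and exactly twice.
    """
    observed = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)
    f2 = sum(1 for c in counts if c == 2)
    return observed + f1 * (f1 - 1) / (2 * (f2 + 1))


# Hypothetical per-taxon tag counts for one sample, for illustration only.
sample = [10, 5, 2, 1, 1]
print(shannon_index(sample), chao1(sample))
```

    Because Chao1 depends on the number of rare taxa, an undersampled community yields an estimate well above the observed count, which is consistent with the caveat above that the tag-sequences obtained were not sufficient to show whole diversity.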

  11. Mining and comparison of haplotype-based expressed sequence tag single nucleotide polymorphisms among citrus cultivars

    Technology Transfer Automated Retrieval System (TEKTRAN)

    In this paper, haplotype-based SNPs were mined out of publicly available citrus expressed sequence tags (ESTs) from different citrus cultivars (genotypes) individually and collectively for comparison. There were a total of 567,297 ESTs belonging to 27 cultivars in varying numbers and consequentially...

  12. Patterns of gene expression in microarrays and expressed sequence tags from normal and cataractous lenses.

    PubMed

    Sousounis, Konstantinos; Tsonis, Panagiotis A

    2012-01-01

    In this contribution, we have examined the patterns of gene expression in normal and cataractous lenses as presented in five different papers using microarrays and expressed sequence tags. The purpose was to evaluate unique and common patterns of gene expression during development, aging and cataracts. PMID:23244575

  13. Seventy microsatellite markers from Persea americana Miller (avocado) expressed sequence tags

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Expressed sequence tags (ESTs) for Persea americana Mill. were investigated to expand upon the number of informative microsatellite markers available for avocado. Seventy informative loci were discovered using twenty-four P. americana var. americana Mill. accessions. The number of alleles detected r...

  14. CoffeebEST: an integrated resource for Coffea spp expressed sequence tags.

    PubMed

    Paschoal, A R; Fernandes, E D M; Silva, J C; Lopes, F M; Pereira, L F P; Domingues, D S

    2014-01-01

    Coffee is one of the most important commodities in the world, and its production relies mainly on two species, Coffea arabica and Coffea canephora. Although there are diverse transcriptome datasets available for coffee trees, few research groups have exploited the potential knowledge contained in these data, especially with respect to fruit and seed development. Here, we present a comparative analysis of the transcriptomes of Coffea arabica and Coffea canephora with a focus on fruit development using publicly available expressed sequence tags (ESTs). Most of the fruit and seed EST data have been obtained from C. canephora. Therefore, we performed a fruit EST analysis of the 5 developmental stages of this species (18, 22, 30, 42, and 46 weeks after flowering) comprising 29,009 sequences. We compared C. canephora fruit ESTs to reference unigenes of C. canephora (7710 contigs and 8955 singletons) and C. arabica (15,656 contigs and 16,351 singletons). Additional analyses included functional annotation based on Gene Ontology, as well as an annotation using PlantCyc, a curated plant protein database. The Coffee Bean EST (CoffeebEST) is a public database available at http://bioinfo-02.cp.utfpr.edu.br/. This database represents an additional resource for the coffee scientific community, offering a user-friendly collection of information for non-specialists in coffee molecular biology to support experimental research on comparative and functional genomics. PMID:25526212

  15. Identification of genes related to Parkinson's disease using expressed sequence tags.

    PubMed

    Kim, Jeong-Min; Lee, Kyu-Hwa; Jeon, Yeo-Jin; Oh, Jung-Hwa; Jeong, So-Young; Song, In-Sung; Kim, Jin-Man; Lee, Dong-Seok; Kim, Nam-Soon

    2006-12-31

    In a search for novel target genes related to Parkinson's disease (PD), two full-length cDNA libraries were constructed from a normal human substantia nigra (SN) and a PD patient's SN. The gene expression profiles of the two libraries were compared using expressed sequence tag (EST) frequency. Data for the differentially expressed genes were verified by quantitative real-time RT-PCR, immunohistochemical analysis and a cell death assay. Among the 76 genes identified with a significant difference (P > 0.9), 21 upregulated genes and 13 downregulated genes were confirmed to be differentially expressed in human PD tissues and/or in an MPTP-treated mouse model by quantitative real-time RT-PCR. Among those genes, an immunohistochemical analysis of alpha-tubulin, including TUBA3 and TUBA6, in the MPTP mouse model showed that the protein levels are downregulated, as are the RNA levels. In addition, MBP, PBP and GNAS were confirmed to accelerate cell death, whereas SPP1 and TUBA3 were found to retard this process. Using an analysis of EST frequency, it was possible to identify a large number of genes related to human PD. These new genes, MBP, PBP, GNAS, SPP1 and TUBA3 in particular, represent potential biomarkers for PD and could serve as useful targets for elucidating the molecular mechanisms associated with PD. PMID:17213182

  16. Expressed sequence tags from Atta laevigata and identification of candidate genes for the control of pest leaf-cutting ants

    PubMed Central

    2011-01-01

    Background: Leafcutters are the most highly evolved of the Neotropical ants in the tribe Attini and are model systems for studying caste formation, labor division and symbiosis with microorganisms. Some species of leafcutters are agricultural pests controlled by chemicals which affect other animals and accumulate in the environment. Aiming to provide a genetic basis for the study of leafcutters and for the development of more specific and environmentally friendly methods for the control of pest leafcutters, we generated expressed sequence tag data from Atta laevigata, one of the pest ants with broad geographic distribution in South America. Results: The analysis of the expressed sequence tags allowed us to characterize 2,006 unique sequences in Atta laevigata. Sixteen of these genes had a high number of transcripts and are likely positively selected for a high level of gene expression, being responsible for three basic biological functions: energy conservation through redox reactions in mitochondria; cytoskeleton and muscle structuring; regulation of gene expression and metabolism. Based on the leafcutters' lifestyle and reports of genes involved in key processes of other social insects, we identified 146 sequences as potential targets for controlling pest leafcutters. The targets are responsible for antixenobiosis, development and longevity, immunity, resistance to pathogens, pheromone function, cell signaling, behavior, polysaccharide metabolism and arginine kinase activity. Conclusion: The generation and analysis of expressed sequence tags from Atta laevigata have provided an important genetic basis for future studies on the biology of leaf-cutting ants and may contribute to the development of a more specific and environmentally friendly method for the control of agricultural pest leafcutters. PMID:21682882

  17. Expressed sequence tags reveal genetic diversity and putative virulence factors of the pathogenic oomycete Pythium insidiosum.

    PubMed

    Krajaejun, Theerapong; Khositnithikul, Rommanee; Lerksuthirat, Tassanee; Lowhnoo, Tassanee; Rujirawat, Thidarat; Petchthong, Thanom; Yingyong, Wanta; Suriyaphol, Prapat; Smittipat, Nat; Juthayothin, Tada; Phuntumart, Vipaporn; Sullivan, Thomas D

    2011-07-01

    Oomycetes are unique eukaryotic microorganisms that share a mycelial morphology with fungi. Many oomycetes are pathogenic to plants, and a more limited number are pathogenic to animals. Pythium insidiosum is the only oomycete that is capable of infecting both humans and animals, and causes a life-threatening infectious disease, called "pythiosis". In the majority of pythiosis patients life-long handicaps result from the inevitable radical excision of infected organs, and many die from advanced infection. Better understanding P. insidiosum pathogenesis at molecular levels could lead to new forms of treatment. Genetic and genomic information is lacking for P. insidiosum, so we have undertaken an expressed sequence tag (EST) study, and report on the first dataset of 486 ESTs, assembled into 217 unigenes. Of these, 144 had significant sequence similarity with known genes, including 47 with ribosomal protein homology. Potential virulence factors included genes involved in antioxidation, thermal adaptation, immunomodulation, and iron and sterol binding. Effectors resembling pathogenicity factors of plant-pathogenic oomycetes were also discovered, such as, a CBEL-like protein (possible involvement in host cell adhesion and hemagglutination), a putative RXLR effector (possibly involved in host cell modulation) and elicitin-like (ELL) proteins. Phylogenetic analysis mapped P. insidiosum ELLs to several novel clades of oomycete elicitins (ELIs), and homology modeling predicted that P. insidiosum ELLs should bind sterols. Most of the P. insidiosum ESTs showed homology to sequences in the genome or EST databases of other oomycetes, but one putative gene, with unknown function, was found to be unique to P. insidiosum. The EST dataset reported here represents the first steps in identifying genes of P. insidiosum and beginning transcriptome analysis. This genetic information will facilitate understanding of pathogenic mechanisms of this devastating pathogen. PMID:21724174

  18. Sub-wavelength plasmonic readout for direct linear analysis of optically tagged DNA

    NASA Astrophysics Data System (ADS)

    Varsanik, Jonathan; Teynor, William; LeBlanc, John; Clark, Heather; Krogmeier, Jeffrey; Yang, Tian; Crozier, Kenneth; Bernstein, Jonathan

    2010-02-01

    This work describes the development and fabrication of a novel nanofluidic flow-through sensing chip that utilizes a plasmonic resonator to excite fluorescent tags with sub-wavelength resolution. We cover the design of the microfluidic chip and simulation of the plasmonic resonator using Finite Difference Time Domain (FDTD) software. The fabrication methods are presented, with testing procedures and preliminary results. This research is aimed at improving the resolution limits of the Direct Linear Analysis (DLA) technique developed by US Genomics [1]. In DLA, intercalating dyes which tag a specific 8 base-pair sequence are inserted in a DNA sample. This sample is pumped through a nano-fluidic channel, where it is stretched into a linear geometry and interrogated with light which excites the fluorescent tags. The resulting sequence of optical pulses produces a characteristic "fingerprint" of the sample which uniquely identifies any sample of DNA. Plasmonic confinement of light to a 100 nm wide metallic nano-stripe enables resolution of a higher tag density compared to free space optics. Prototype devices have been fabricated and are being tested with fluorophore solutions and tagged DNA. Preliminary results show that evanescent coupling to the plasmonic resonator is occurring with 0.1 micron resolution; however, light scattering limits the S/N of the detector. Two methods to reduce scattered light are presented: index matching and curved waveguides.

  19. Velocity measurement of clay intrusion through a sudden contraction step using a tagging pulse sequence.

    PubMed

    Tsushima, Shohji; Hasegawa, Atsushi; Suekane, Tetsuya; Hirai, Shuichiro; Tanaka, Yoshihiro; Nakasuji, Yoshizumi

    2003-07-01

    Magnetic resonance imaging (MRI) with a spatial tagging sequence was used to measure the velocity distribution of clay that was forced past a sudden contraction. The spatial tagging sequence provided magnetic resonance images of clay that allowed measurement of the velocity distribution in the clay, which can provide insight into the deformation of clay during intrusion. The experiments were conducted using a specially designed vessel that could operate at up to 30 MPa. The vessel offers a rectangular test section with a sudden contraction step that had a contraction ratio of 2:1. The vessel was installed in a commercial MRI system, and the motion of clay flowing into the narrow contracted channel was then quantitatively investigated to examine the behavior of flowing clay as a non-Newtonian fluid. MRI results are compared with those obtained by computational fluid dynamics (CFD) calculation. Velocity distributions obtained from each tag displacement did not agree well with those predicted by CFD near the contraction step, where the fluid accelerated rapidly. However, post-processing of the calculation results, in which virtual tag displacement is calculated, gave better agreement with experiment and enabled us to compare MRI results with CFD results. PMID:12915199

  20. Multiplexed metagenome mining using short DNA sequence tags facilitates targeted discovery of epoxyketone proteasome inhibitors

    PubMed Central

    Owen, Jeremy G.; Charlop-Powers, Zachary; Smith, Alexandra G.; Ternei, Melinda A.; Calle, Paula Y.; Reddy, Boojala Vijay B.; Montiel, Daniel; Brady, Sean F.

    2015-01-01

    In molecular evolutionary analyses, short DNA sequences are used to infer phylogenetic relationships among species. Here we apply this principle to the study of bacterial biosynthesis, enabling the targeted isolation of previously unidentified natural products directly from complex metagenomes. Our approach uses short natural product sequence tags derived from conserved biosynthetic motifs to profile biosynthetic diversity in the environment and then guide the recovery of gene clusters from metagenomic libraries. The methodology is conceptually simple, requires only a small investment in sequencing, and is not computationally demanding. To demonstrate the power of this approach to natural product discovery we conducted a computational search for epoxyketone proteasome inhibitors within 185 globally distributed soil metagenomes. This led to the identification of 99 unique epoxyketone sequence tags, falling into 6 phylogenetically distinct clades. Complete gene clusters associated with nine unique tags were recovered from four saturating soil metagenomic libraries. Using heterologous expression methodologies, seven potent epoxyketone proteasome inhibitors (clarepoxcins A–E and landepoxcins A and B) were produced from these pathways, including compounds with different warhead structures and a naturally occurring halohydrin prodrug. This study provides a template for the targeted expansion of bacterially derived natural products using the global metagenome. PMID:25831524

  1. Identification of genes encoding Schistosoma mansoni antigens using an antigenic sequence tag strategy.

    PubMed

    Zouain, C S; Azevedo, V A; Franco, G R; Pena, S D; Goes, A M

    1998-12-01

    Another approach for the identification of genes that code for antigenic products is described using an antigenic sequence tag (AST) strategy. A Schistosoma mansoni adult worm cDNA library was screened with affinity chromatography-purified immunoglobulins from infected human sera and a mild oxidation treatment with sodium periodate. From 1 or both ends of 30 cDNA clones, 30 ASTs were obtained. Of these, 22 were previously known Sm antigens. One clone had matches with entries for other organisms in the databases and 6 had homology with Sm-expressed sequence tags (EST) entries. These clones, together with another 1 that had no significant database matches, were considered new antigenic genes in S. mansoni. The strategy proved to be efficient for the identification of genes that could be used for immunological studies and evaluation as vaccine candidates. PMID:9920341

  2. Evaluation of anonymous and expressed sequence tag derived polymorphic microsatellite markers in the tobacco budworm Heliothis virescens (Lepidoptera: noctuidae)

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Polymorphic genetic markers were identified and characterized using a partial genomic library of Heliothis virescens enriched for simple sequence repeats (SSR) and nucleotide sequences of expressed sequence tags (EST). Nucleotide sequences of 192 clones from the partial genomic library yielded 147 u...

  3. Transferability of microsatellite and sequence tagged site markers in Oryza species.

    PubMed

    Brondani, Claudio; Rangel, Paulo Hideo Nakano; Borba, Tereza Cristina Oliveira; Brondani, Rosana Pereira Vianello

    2003-01-01

    The genus Oryza comprises 22 species which are potentially useful as a source of genetic variability that can be introgressed into the worldwide cultivated rice, Oryza sativa. Molecular markers are useful tools for monitoring gene introgressions and for detecting polymorphism among species. In this study, cross-amplification was estimated among 28 accessions of 16 Oryza species, representing the genomes AA, BB, CC, BBCC and CCDD, using 59 microsatellite (OG, OS and RM series) and 15 STS (Sequence Tagged Sites) markers. All markers amplified at least one Oryza species, indicating different levels of transferability across species. Markers based on microsatellite sequences amplified 37 % of the accessions, with an average of 6.58 alleles per locus and an average polymorphism information content (PIC) of 70 %. For STS markers, the amplification level was 53.3 %, and the average number of alleles and PIC values were 1.6 and 10 %, respectively. These results showed that although the STS markers detected a reduced level of genetic diversity, the transferability was higher, indicating that they can be used for genetic analysis when evaluating less genetically related species of Oryza. Among the microsatellite markers, an analysis of species with an AA genome showed that the OG markers produced the highest level of polymorphic loci (54.6 %), followed by RM markers (48 %). Highly polymorphic and transferable molecular markers in Oryza can be useful for exploiting the genetic resources of this genus, for detecting allelic variants in loci associated with important agronomic traits, and for monitoring alleles introgressed from wild relatives to cultivated rice. PMID:14641482

  4. Sixteen Polymorphic Simple Sequence Repeat Markers from Expressed Sequence Tags of the Chinese Mitten Crab Eriocheir sinensis

    PubMed Central

    Gao, Xiang-Gang; Li, Hong-Jun; Li, Yun-Feng; Sui, Li-Jun; Zhu, Bao; Liang, Yu; Liu, Wei-Dong; He, Chong-Bo

    2010-01-01

    The Chinese mitten crab (Eriocheir sinensis) is an economically important aquaculture species in China. In this study, we developed and evaluated simple sequence repeat markers from expressed sequence tags of E. sinensis. Among the 40 wild E. sinensis individuals tested, 16 loci were polymorphic. The number of alleles per locus ranged from two to ten. The observed heterozygosity ranged from 0.0667 to 0.9667, whereas the expected heterozygosity ranged from 0.0661 to 0.9051. These markers have the potential for use in genetic studies of population structure and intraspecific variation in E. sinensis. PMID:21152289
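
    The observed and expected heterozygosity values reported above can be illustrated with a minimal sketch. It assumes diploid genotypes at a single locus and the simple expected heterozygosity 1 - sum(p_i^2) without a small-sample correction; the abstract does not say which estimator or software the authors used, and the function name and input layout are hypothetical.

```python
from collections import Counter

def heterozygosity(genotypes):
    """Return (observed, expected) heterozygosity for one locus.

    genotypes: list of (allele_a, allele_b) tuples, one per diploid individual
    (a hypothetical input layout chosen for this sketch).
    """
    n = len(genotypes)
    # Observed heterozygosity: fraction of individuals carrying two different alleles.
    observed = sum(1 for a, b in genotypes if a != b) / n
    # Allele frequencies are counted over all 2n allele copies.
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = 2 * n
    # Expected heterozygosity under Hardy-Weinberg: 1 - sum of squared frequencies.
    expected = 1.0 - sum((c / total) ** 2 for c in counts.values())
    return observed, expected
```

    A locus where every sampled individual is homozygous for the same allele gives 0 for both values; more alleles at even frequencies push the expected value towards 1.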

  5. Identification and Characterization of Microsatellites in Expressed Sequence Tags and Their Cross Transferability in Different Plants

    PubMed Central

    Haq, Shamshad ul; Jain, Rohit; Sharma, Meenakshi; Kachhwaha, Sumita; Kothari, S. L.

    2014-01-01

    Expressed sequence tags (EST) are a potential source for the development of genic microsatellite markers, gene discovery, comparative genomics, and other genomic studies. In the present study, 7630 ESTs were examined from NCBI for SSR identification and characterization. A total of 263 SSRs were identified with an average density of one SSR/4.2 kb (3.4% frequency). Analysis revealed that trinucleotide repeats (47.52%) were most abundant followed by tetranucleotide (19.77%), dinucleotide (19.01%), pentanucleotide (9.12%), and hexanucleotide repeats (4.56%). Functional annotation was done through homology search and gene ontology, and 35 EST-SSRs were selected. Primer pairs were designed for evaluation of cross transferability and polymorphism among 11 plants belonging to five different families. A total of 402 alleles were generated at 155 loci with an average of 2.6 alleles/locus and the polymorphic information content (PIC) ranged from 0.15 to 0.92 with an average of 0.75. The cross transferability ranged from 34.84% to 98.06% in different plants, with an average of 67.86%. Thus, the validation study of the 35 annotated EST-SSR markers, which correspond to particular metabolic activity, revealed polymorphism and evolutionary nature in different families of Angiospermic plants. PMID:25389527
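
    The polymorphic information content (PIC) quoted above is commonly computed from allele frequencies with the formula of Botstein et al. (1980), PIC = 1 - sum(p_i^2) - sum over i<j of 2 p_i^2 p_j^2. The sketch below assumes that definition; the abstract does not state which formula or software was used, and the function name is hypothetical.

```python
def polymorphic_information_content(freqs):
    """PIC for one locus from a list of allele frequencies that sum to 1."""
    squares = [p * p for p in freqs]
    sum_sq = sum(squares)
    # sum_{i<j} 2 * p_i^2 * p_j^2 equals (sum p_i^2)^2 - sum p_i^4.
    cross = sum_sq * sum_sq - sum(s * s for s in squares)
    return 1.0 - sum_sq - cross
```

    For two alleles at equal frequency this gives 0.375, which is the maximum for a biallelic locus; values such as the 0.75 average reported above require several alleles per locus.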

  6. Sequencing degraded DNA from non-destructively sampled museum specimens for RAD-tagging and low-coverage shotgun phylogenetics.

    PubMed

    Tin, Mandy Man-Ying; Economo, Evan Philip; Mikheyev, Alexander Sergeyevich

    2014-01-01

    Ancient and archival DNA samples are valuable resources for the study of diverse historical processes. In particular, museum specimens provide access to biotas distant in time and space, and can provide insights into ecological and evolutionary changes over time. However, archival specimens are difficult to handle; they are often fragile and irreplaceable, and typically contain only short segments of denatured DNA. Here we present a set of tools for processing such samples for state-of-the-art genetic analysis. First, we report a protocol for minimally destructive DNA extraction of insect museum specimens, which produced sequenceable DNA from all of the samples assayed. The 11 specimens analyzed had fragmented DNA, rarely exceeding 100 bp in length, and could not be amplified by conventional PCR targeting the mitochondrial cytochrome oxidase I gene. Our approach made these samples amenable to analysis with commonly used next-generation sequencing-based molecular analytic tools, including RAD-tagging and shotgun genome re-sequencing. First, we used museum ant specimens from three species, each with its own reference genome, for RAD-tag mapping. We were able to use the degraded DNA sequences, which were sequenced in full, to identify duplicate reads and filter them prior to base calling. Second, we re-sequenced six Hawaiian Drosophila species, with millions of years of divergence, but with only a single available reference genome. Despite a shallow coverage of 0.37 ± 0.42 per base, we could recover a sufficient number of overlapping SNPs to fully resolve the species tree, which was consistent with earlier karyotypic studies, and previous molecular studies, at least in the regions of the tree that these studies could resolve. Although developed for use with degraded DNA, all of these techniques are readily applicable to more recent tissue, and are suitable for liquid handling automation. PMID:24828244

  7. Identification of SNP and SSR markers in eggplant using RAD tag sequencing

    PubMed Central

    2011-01-01

    Background The eggplant (Solanum melongena L.) genome is relatively unexplored, especially compared to those of the other major Solanaceae crops tomato and potato. In particular, no SNP markers are publicly available; on the other hand, over 1,000 SSR markers were developed and publicly available. We have combined the recently developed Restriction-site Associated DNA (RAD) approach with Illumina DNA sequencing for rapid and mass discovery of both SNP and SSR markers for eggplant. Results RAD tags were generated from the genomic DNA of a pair of eggplant mapping parents, and sequenced to produce ~17.5 Mb of sequences arrangeable into ~78,000 contigs. The resulting non-redundant genomic sequence dataset consisted of ~45,000 sequences, of which ~29% were putative coding sequences and ~70% were in common between the mapping parents. The shared sequences allowed the discovery of ~10,000 SNPs and nearly 1,000 indels, equivalent to a SNP frequency of 0.8 per Kb and an indel frequency of 0.07 per Kb. Over 2,000 of the SNPs are likely to be mappable via the Illumina GoldenGate assay. A subset of 384 SNPs was used to successfully fingerprint a panel of eggplant germplasm, producing a set of informative diversity data. The RAD sequences also included nearly 2,000 putative SSRs, and primer pairs were designed to amplify 1,155 loci. Conclusion The high throughput sequencing of the RAD tags allowed the discovery of a large number of DNA markers, which will prove useful for extending our current knowledge of the genome organization of eggplant, for assisting in marker-aided selection and for carrying out comparative genomic analyses within the Solanaceae family. PMID:21663628

  8. [Isolation and expression of novel expressed sequence tags (ESTs) from ovarian follicles of Shaoxing ducks].

    PubMed

    Shu, Gang; Chen, Jie; Ni, Ying-Dong; Zhou, Yu-Chuan; Zhao, Ru-Qian

    2004-10-01

    Three expressed sequence tags (ESTs), SXDF0201 (271 bp), SXDF0202 (200 bp) and SXDF0203 (173 bp), were isolated from ovarian follicles of Shaoxing ducks by using silver staining mRNA differential display. GenBank/BLAST analysis revealed that SXDF0201 was not homologous to any of the published sequences from all species, indicating that it was a novel EST, and it was then registered in GenBank (GenBank Accession No.: CB072629), while SXDF0202 and SXDF0203 were found to be highly homologous to seven known chicken ESTs and chicken mRNA for gizzard smooth muscle myosin heavy chain. 5'-RACE was employed to extend SXDF0201 to 544 bp, which was confirmed as novel in a BLAST search. The temporal and spatial expression of SXDF0201 and SXDF0202 were also investigated with semi-quantitative RT-PCR. The results showed that: both SXDF0201 and SXDF0202 were found to be expressed in hypothalamus, pituitary, muscle, liver, and fat tissues of Shaoxing ducks; SXDF0201 was expressed significantly higher in ovaries of 30-day-old Shaoxing ducks compared with that of 60-day-old (P < 0.05) and 90-day-old (P = 0.015), but the expression of SXDF0202 showed no difference throughout the ovarian development; granulosa layers expressed higher SXDF0201 than theca layers in almost all hierarchical follicles; the expression of SXDF0202 in granulosa layers increased along with follicular maturation (P < 0.01) from Fw to F3 follicles, but decreased dramatically to the lowest in F1 follicles (P < 0.01). In theca layers, the highest expression of SXDF0202 was found in Fw follicles (P < 0.01). PMID:15552044

  9. Development of expressed sequence tag and expressed sequence tag–simple sequence repeat marker resources for Musa acuminata

    PubMed Central

    Passos, Marco A. N.; de Oliveira Cruz, Viviane; Emediato, Flavia L.; de Camargo Teixeira, Cristiane; Souza, Manoel T.; Matsumoto, Takashi; Rennó Azevedo, Vânia C.; Ferreira, Claudia F.; Amorim, Edson P.; de Alencar Figueiredo, Lucio Flavio; Martins, Natalia F.; de Jesus Barbosa Cavalcante, Maria; Baurens, Franc-Christophe; da Silva, Orzenil Bonfim; Pappas, Georgios J.; Pignolet, Luc; Abadie, Catherine; Ciampi, Ana Y.; Piffanelli, Pietro; Miller, Robert N. G.

    2012-01-01

    Background and aims Banana (Musa acuminata) is a crop contributing to global food security. Many varieties lack resistance to biotic stresses, due to sterility and narrow genetic background. The objective of this study was to develop an expressed sequence tag (EST) database of transcripts expressed during compatible and incompatible banana–Mycosphaerella fijiensis (Mf) interactions. Black leaf streak disease (BLSD), caused by Mf, is a destructive disease of banana. Microsatellite markers were developed as a resource for crop improvement. Methodology cDNA libraries were constructed from in vitro-infected leaves from BLSD-resistant M. acuminata ssp. burmaniccoides Calcutta 4 (MAC4) and susceptible M. acuminata cv. Cavendish Grande Naine (MACV). Clones were 5′-end Sanger sequenced, ESTs assembled with TGICL and unigenes annotated using BLAST, Blast2GO and InterProScan. Mreps was used to screen for simple sequence repeats (SSRs), with markers evaluated for polymorphism using 20 diploid (AA) M. acuminata accessions contrasting in resistance to Mycosphaerella leaf spot diseases. Principal results A total of 9333 high-quality ESTs were obtained for MAC4 and 3964 for MACV, which assembled into 3995 unigenes. Of these, 2592 displayed homology to genes encoding proteins with known or putative function, and 266 to genes encoding proteins with unknown function. Gene ontology (GO) classification identified 543 GO terms, 2300 unigenes were assigned to EuKaryotic orthologous group categories and 312 mapped to Kyoto Encyclopedia of Genes and Genomes pathways. A total of 624 SSR loci were identified, with trinucleotide repeat motifs the most abundant in MAC4 (54.1 %) and MACV (57.6 %). Polymorphism across M. acuminata accessions was observed with 75 markers. Alleles per polymorphic locus ranged from 2 to 8, totalling 289. The polymorphism information content ranged from 0.08 to 0.81. Conclusions This EST collection offers a resource for studying functional genes, including

  10. Gene expression profile in the anterior regeneration of the earthworm using expressed sequence tags.

    PubMed

    Cho, Sung-Jin; Lee, Myung Sik; Tak, Eun Sik; Lee, Eun; Koh, Ki Seok; Ahn, Chi Hyun; Park, Soon Cheol

    2009-01-01

    In order to gain insight into the gene expression profiles associated with anterior regeneration of the earthworm, Perionyx excavatus, we analyzed 1,159 expressed sequence tags (ESTs) derived from a cDNA library of early anterior regenerated tissue. Among the 1,159 ESTs analyzed, 622 (53.7%) ESTs showed significant similarity to known genes and represented 338 genes, of which 233 ESTs were singletons and 105 ESTs manifested as two or more ESTs. While 663 ESTs (57.2%) were sequenced only once, 308 ESTs (26.6%) appeared 2 to 5 times, and 188 ESTs (16.2%) were sequenced more than 5 times. A total of 803 genes were categorized into 15 groups according to their biological functions. Among the 1,159 ESTs sequenced, we found several genes encoding signaling molecules, such as Notch and Distal-less. The ESTs used in this study should provide a resource for future research in earthworm regeneration. PMID:19129665

  11. Large-scale detection and application of expressed sequence tag single nucleotide polymorphisms in Nicotiana.

    PubMed

    Wang, Y; Zhou, D; Wang, S; Yang, L

    2015-01-01

    Single nucleotide polymorphisms (SNPs) are widespread in the Nicotiana genome. Using an alignment and variation detection method, we developed 20,607,973 SNPs, based on the expressed sequence tag sequences of 10 Nicotiana species. The replacement rate was much higher than the transversion rate in the SNPs, and SNPs widely exist in Nicotiana. In vitro verification indicated that all of the SNPs were high quality and accurate. Evolutionary relationships between 15 varieties were investigated by polymerase chain reaction with a special primer; the specific 302 locus of these sequence results clearly indicated the origin of Zhongyan 100. A database of Nicotiana SNPs (NSNP) was developed to store and search for SNPs in Nicotiana. NSNP is a tool for researchers to develop SNP markers from sequence data. PMID:26214460

  12. A physical map of the X chromosome of Drosophila melanogaster: Cosmid contigs and sequence tagged sites

    SciTech Connect

    Madueno, E.; Modolell, J.; Papagiannakis, G.

    1995-04-01

    A physical map of the euchromatic X chromosome of Drosophila melanogaster has been constructed by assembling contiguous arrays of cosmids that were selected by screening a library with DNA isolated from microamplified chromosomal divisions. This map, consisting of 893 cosmids, covers ~64% of the euchromatic part of the chromosome. In addition, 568 sequence tagged sites (STS), in aggregate representing 120 kb of sequenced DNA, were derived from selected cosmids. Most of these STSs, spaced at an average distance of ~35 kb along the euchromatic region of the chromosome, represent DNA tags that can be used as entry points to the fruitfly genome. Furthermore, 42 genes have been placed on the physical map, either through the hybridization of specific probes to the cosmids or through the fact that they were represented among the STSs. These provide a link between the physical and the genetic maps of D. melanogaster. Nine novel genes have been tentatively identified in Drosophila on the basis of matches between STS sequences and sequences from other species. 32 refs., 3 figs., 4 tabs.

  13. De novo sequencing of unique sequence tags for discovery of post-translational modifications of proteins

    SciTech Connect

    Shen, Yufeng; Tolic, Nikola; Hixson, Kim K.; Purvine, Samuel O.; Anderson, Gordon A.; Smith, Richard D.

    2008-10-15

    De novo sequencing holds promise for discovering protein post-translational modifications; however, such approaches are still in their infancy and are not widely applied in proteomics practice because of their limited reliability. In this work, we describe a de novo sequencing approach for discovery of protein modifications through identification of the UStags (Anal. Chem. 2008, 80, 1871-1882). The de novo information was obtained from Fourier-transform tandem mass spectrometry for peptides and polypeptides in a yeast lysate, and the de novo sequences obtained were filtered to define a more limited set of UStags. The DNA-predicted database protein sequences were then compared to the UStags, and the differences observed across or in the UStags (i.e., the UStags’ prefix and suffix sequences and the UStags themselves) were used to infer the possible sequence modifications. With this de novo-UStag approach, we uncovered some unexpected variances of yeast protein sequences due to amino acid mutations and/or multiple modifications to the predicted protein sequences. Random matching of the de novo sequences to the predicted sequences was examined with use of two random (false) databases, and ~3% false discovery rates were estimated for the de novo-UStag approach. The factors affecting the reliability (e.g., existence of de novo sequencing noise residues and redundant sequences) and the sensitivity are described. The de novo-UStag approach complements the UStag method previously reported by enabling discovery of new protein modifications.

  14. Comprehensive analyses of prostate gene expression: convergence of expressed sequence tag databases, transcript profiling and proteomics.

    PubMed

    Nelson, P S; Han, D; Rochon, Y; Corthals, G L; Lin, B; Monson, A; Nguyen, V; Franza, B R; Plymate, S R; Aebersold, R; Hood, L

    2000-05-01

    Several methods have been developed for the comprehensive analysis of gene expression in complex biological systems. Generally these procedures assess either a portion of the cellular transcriptome or a portion of the cellular proteome. Each approach has distinct conceptual and methodological advantages and disadvantages. We have investigated the application of both methods to characterize the gene expression pathway mediated by androgens and the androgen receptor in prostate cancer cells. This pathway is of critical importance for the development and progression of prostate cancer. Of clinical importance, modulation of androgens remains the mainstay of treatment for patients with advanced disease. To facilitate global gene expression studies we have first sought to define the prostate transcriptome by assembling and annotating prostate-derived expressed sequence tags (ESTs). A total of 55000 prostate ESTs were assembled into a set of 15953 clusters putatively representing 15953 distinct transcripts. These clusters were used to construct cDNA microarrays suitable for examining the androgen-response pathway at the level of transcription. The expression of 20 genes was found to be induced by androgens. This cohort included known androgen-regulated genes such as prostate-specific antigen (PSA) and several novel complementary DNAs (cDNAs). Protein expression profiles of androgen-stimulated prostate cancer cells were generated by two-dimensional electrophoresis (2-DE). Mass spectrometric analysis of androgen-regulated proteins in these cells identified the metastasis-suppressor gene NDKA/nm23, a finding that may explain a marked reduction in metastatic potential when these cells express a functional androgen receptor pathway. PMID:10870968

  15. Development of expressed sequence tag-simple sequence repeat markers for Chrysanthemum morifolium and closely related species.

    PubMed

    Liu, H; Zhang, Q X; Sun, M; Pan, H T; Kong, Z X

    2015-01-01

    With the development of chrysanthemum breeding in recent years, an increasing number of wild species in genera related to Chrysanthemum were introduced to extend the genetic resources and facilitate the genetic improvement of chrysanthemums via hybridization. However, few simple sequence repeat (SSR) markers are available for marker-assisted breeding and population genetic studies of chrysanthemum and closely related species. Expressed sequence tags (ESTs) in public databases and cross-species transferable markers are considered to be a cost-effective means for developing sequence-based markers. In this study, 25 EST-SSRs were successfully developed from Chrysanthemum EST sequences for Chrysanthemum morifolium and closely related species. In total, 4164 unigene sequences were assembled from 7180 ESTs of chrysanthemum in GenBank, which were subsequently used to screen for the presence of microsatellites with the SSRIT software. The screening criteria were 8, 5, 4, and 3 repeating units for di-, tri-, tetra-, and penta- and higher-order nucleotides, respectively. Moreover, 310 SSR loci from 296 sequences were identified, and 198 primer pairs for SSR amplification were designed with the Primer Premier 5.0 software, of which 25 SSR loci showed polymorphic amplification in 52 species and varieties belonging to Chrysanthemum, Ajania, and Opisthopappus. The application of EST-SSR markers to the identification of intergeneric hybrids between Chrysanthemum and Ajania was demonstrated. Therefore, EST-SSRs can be developed for species that lack gene sequences or ESTs by utilizing ESTs of closely related species. PMID:26214436
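
    The screening criteria stated above (at least 8, 5, 4 and 3 repeat units for di-, tri-, tetra- and penta- or higher-order motifs) can be expressed as a short regular-expression scan. This is a sketch of the thresholds only, not the SSRIT program the authors used; treating "higher-order" as hexanucleotide motifs only, and skipping motifs that are themselves repeats of a shorter unit, are assumptions made here, and the function name is hypothetical.

```python
import re

# Minimum repeat counts per motif length, taken from the abstract.
# Capping "higher-order" at hexanucleotides is an assumption of this sketch.
MIN_REPEATS = {2: 8, 3: 5, 4: 4, 5: 3, 6: 3}

def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for SSRs meeting the thresholds."""
    seq = seq.upper()
    hits = []
    for motif_len, min_rep in MIN_REPEATS.items():
        # A motif of motif_len bases followed by at least (min_rep - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs built from a shorter unit (e.g. "ATAT" is really "AT").
            if any(motif == motif[:k] * (motif_len // k)
                   for k in range(1, motif_len) if motif_len % k == 0):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // motif_len))
    return sorted(hits)
```

    Start positions are 0-based. Mononucleotide runs are ignored, in line with the criteria listed in the abstract, which begin at dinucleotide motifs.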

  16. Generation and Analysis of a Large-Scale Expressed Sequence Tag Database from a Full-Length Enriched cDNA Library of Developing Leaves of Gossypium hirsutum L

    PubMed Central

    Pang, Chaoyou; Fan, Shuli; Song, Meizhen; Yu, Shuxun

    2013-01-01

    Background Cotton (Gossypium hirsutum L.) is one of the world’s most economically important crops. However, its entire genome has not been sequenced, and limited resources are available in GenBank for understanding the molecular mechanisms underlying leaf development and senescence. Methodology/Principal Findings In this study, 9,874 high-quality ESTs were generated from a normalized, full-length cDNA library derived from pooled RNA isolated from throughout leaf development during the plant blooming stage. After clustering and assembly of these ESTs, 5,191 unique sequences, representing 1,652 contigs and 3,539 singletons, were obtained. The average unique sequence length was 682 bp. Annotation of these unique sequences revealed that 84.4% showed significant homology to sequences in the NCBI non-redundant protein database, and 57.3% had significant hits to known proteins in the Swiss-Prot database. Comparative analysis indicated that our library added 2,400 ESTs and 991 unique sequences to those known for cotton. The unigenes were functionally characterized by gene ontology annotation. We identified 1,339 and 200 unigenes as potential leaf senescence-related genes and transcription factors, respectively. Moreover, nine genes related to leaf senescence and eleven MYB transcription factors were randomly selected for quantitative real-time PCR (qRT-PCR), which revealed that these genes were regulated differentially during senescence. The qRT-PCR for three GhYLSs revealed that these genes are expressed preferentially in senescent leaves. Conclusions/Significance These EST resources will provide valuable sequence information for gene expression profiling analyses and functional genomics studies to elucidate their roles, as well as for studying the mechanisms of leaf development and senescence in cotton and discovering candidate genes related to important agronomic traits of cotton. These data will also facilitate future whole-genome sequence assembly and annotation.

  17. Real-time single-molecule electronic DNA sequencing by synthesis using polymer-tagged nucleotides on a nanopore array.

    PubMed

    Fuller, Carl W; Kumar, Shiv; Porel, Mintu; Chien, Minchen; Bibillo, Arek; Stranges, P Benjamin; Dorwart, Michael; Tao, Chuanjuan; Li, Zengmin; Guo, Wenjing; Shi, Shundi; Korenblum, Daniel; Trans, Andrew; Aguirre, Anne; Liu, Edward; Harada, Eric T; Pollard, James; Bhat, Ashwini; Cech, Cynthia; Yang, Alexander; Arnold, Cleoma; Palla, Mirkó; Hovis, Jennifer; Chen, Roger; Morozova, Irina; Kalachikov, Sergey; Russo, James J; Kasianowicz, John J; Davis, Randy; Roever, Stefan; Church, George M; Ju, Jingyue

    2016-05-10

    DNA sequencing by synthesis (SBS) offers a robust platform to decipher nucleic acid sequences. Recently, we reported a single-molecule nanopore-based SBS strategy that accurately distinguishes four bases by electronically detecting and differentiating four different polymer tags attached to the 5'-phosphate of the nucleotides during their incorporation into a growing DNA strand catalyzed by DNA polymerase. Further developing this approach, we report here the use of nucleotides tagged at the terminal phosphate with oligonucleotide-based polymers to perform nanopore SBS on an α-hemolysin nanopore array platform. We designed and synthesized several polymer-tagged nucleotides using tags that produce different electrical current blockade levels and verified they are active substrates for DNA polymerase. A highly processive DNA polymerase was conjugated to the nanopore, and the conjugates were complexed with primer/template DNA and inserted into lipid bilayers over individually addressable electrodes of the nanopore chip. When an incoming complementary-tagged nucleotide forms a tight ternary complex with the primer/template and polymerase, the tag enters the pore, and the current blockade level is measured. The levels displayed by the four nucleotides tagged with four different polymers captured in the nanopore in such ternary complexes were clearly distinguishable and sequence-specific, enabling continuous sequence determination during the polymerase reaction. Thus, real-time single-molecule electronic DNA sequencing data with single-base resolution were obtained. The use of these polymer-tagged nucleotides, combined with polymerase tethering to nanopores and multiplexed nanopore sensors, should lead to new high-throughput sequencing methods. PMID:27091962

  18. Real-time single-molecule electronic DNA sequencing by synthesis using polymer-tagged nucleotides on a nanopore array

    PubMed Central

    Fuller, Carl W.; Kumar, Shiv; Porel, Mintu; Chien, Minchen; Bibillo, Arek; Stranges, P. Benjamin; Dorwart, Michael; Tao, Chuanjuan; Li, Zengmin; Guo, Wenjing; Shi, Shundi; Korenblum, Daniel; Trans, Andrew; Aguirre, Anne; Liu, Edward; Harada, Eric T.; Pollard, James; Bhat, Ashwini; Cech, Cynthia; Yang, Alexander; Arnold, Cleoma; Palla, Mirkó; Hovis, Jennifer; Chen, Roger; Morozova, Irina; Kalachikov, Sergey; Russo, James J.; Kasianowicz, John J.; Davis, Randy; Roever, Stefan; Church, George M.; Ju, Jingyue

    2016-01-01

    DNA sequencing by synthesis (SBS) offers a robust platform to decipher nucleic acid sequences. Recently, we reported a single-molecule nanopore-based SBS strategy that accurately distinguishes four bases by electronically detecting and differentiating four different polymer tags attached to the 5′-phosphate of the nucleotides during their incorporation into a growing DNA strand catalyzed by DNA polymerase. Further developing this approach, we report here the use of nucleotides tagged at the terminal phosphate with oligonucleotide-based polymers to perform nanopore SBS on an α-hemolysin nanopore array platform. We designed and synthesized several polymer-tagged nucleotides using tags that produce different electrical current blockade levels and verified they are active substrates for DNA polymerase. A highly processive DNA polymerase was conjugated to the nanopore, and the conjugates were complexed with primer/template DNA and inserted into lipid bilayers over individually addressable electrodes of the nanopore chip. When an incoming complementary-tagged nucleotide forms a tight ternary complex with the primer/template and polymerase, the tag enters the pore, and the current blockade level is measured. The levels displayed by the four nucleotides tagged with four different polymers captured in the nanopore in such ternary complexes were clearly distinguishable and sequence-specific, enabling continuous sequence determination during the polymerase reaction. Thus, real-time single-molecule electronic DNA sequencing data with single-base resolution were obtained. The use of these polymer-tagged nucleotides, combined with polymerase tethering to nanopores and multiplexed nanopore sensors, should lead to new high-throughput sequencing methods. PMID:27091962

  19. Linguistic Preprocessing and Tagging for Problem Report Trend Analysis

    NASA Technical Reports Server (NTRS)

    Beil, Robert J.; Malin, Jane T.

    2012-01-01

    Mr. Robert Beil, Systems Engineer at Kennedy Space Center (KSC), requested the NASA Engineering and Safety Center (NESC) develop a prototype tool suite that combines complementary software technology used at Johnson Space Center (JSC) and KSC for problem report preprocessing and semantic tag extraction, to improve input to data mining and trend analysis. This document contains the outcome of the assessment and the Findings, Observations and NESC Recommendations.

  20. The contribution of 700,000 ORF sequence tags to the definition of the human transcriptome

    PubMed Central

    Camargo, Anamaria A.; Samaia, Helena P. B.; Dias-Neto, Emmanuel; Simão, Daniel F.; Migotto, Italo A.; Briones, Marcelo R. S.; Costa, Fernando F.; Aparecida Nagai, Maria; Verjovski-Almeida, Sergio; Zago, Marco A.; Andrade, Luis Eduardo C.; Carrer, Helaine; El-Dorry, Hamza F. A.; Espreafico, Enilza M.; Habr-Gama, Angelita; Giannella-Neto, Daniel; Goldman, Gustavo H.; Gruber, Arthur; Hackel, Christine; Kimura, Edna T.; Maciel, Rui M. B.; Marie, Suely K. N.; Martins, Elizabeth A. L.; Nóbrega, Marina P.; Paçó-Larson, Maria Luisa; Pardini, Maria Inês M. C.; Pereira, Gonçalo G.; Pesquero, João Bosco; Rodrigues, Vanderlei; Rogatto, Silvia R.; da Silva, Ismael D. C. G.; Sogayar, Mari C.; Sonati, Maria de Fátima; Tajara, Eloiza H.; Valentini, Sandro R.; Alberto, Fernando L.; Amaral, Maria Elisabete J.; Aneas, Ivy; Arnaldi, Liliane A. T.; de Assis, Angela M.; Bengtson, Mário Henrique; Bergamo, Nadia Aparecida; Bombonato, Vanessa; de Camargo, Maria E. R.; Canevari, Renata A.; Carraro, Dirce M.; Cerutti, Janete M.; Corrêa, Maria Lucia C.; Corrêa, Rosana F. R.; Costa, Maria Cristina R.; Curcio, Cyntia; Hokama, Paula O. M.; Ferreira, Ari J. S.; Furuzawa, Gilberto K.; Gushiken, Tsieko; Ho, Paulo L.; Kimura, Elza; Krieger, José E.; Leite, Luciana C. C.; Majumder, Paromita; Marins, Mozart; Marques, Everaldo R.; Melo, Analy S. A.; Melo, Monica; Mestriner, Carlos Alberto; Miracca, Elisabete C.; Miranda, Daniela C.; Nascimento, Ana Lucia T. O.; Nóbrega, Francisco G.; Ojopi, Élida P. B.; Pandolfi, José Rodrigo C.; Pessoa, Luciana G.; Prevedel, Aline C.; Rahal, Paula; Rainho, Claudia A.; Reis, Eduardo M. R.; Ribeiro, Marcelo L.; da Rós, Nancy; de Sá, Renata G.; Sales, Magaly M.; Sant'anna, Simone Cristina; dos Santos, Mariana L.; da Silva, Aline M.; da Silva, Neusa P.; Silva, Wilson A.; da Silveira, Rosana A.; Sousa, Josane F.; Stecconi, Daniella; Tsukumo, Fernando; Valente, Valéria; Soares, Fernando; Moreira, Eloisa S.; Nunes, Diana N.; Correa, Ricardo G.; Zalcberg, Heloisa; Carvalho, Alex F.; Reis, Luis F. L.; Brentani, Ricardo R.; Simpson, Andrew J. G.; de Souza, Sandro J.

    2001-01-01

    Open reading frame expressed sequences tags (ORESTES) differ from conventional ESTs by providing sequence data from the central protein coding portion of transcripts. We generated a total of 696,745 ORESTES sequences from 24 human tissues and used a subset of the data that correspond to a set of 15,095 full-length mRNAs as a means of assessing the efficiency of the strategy and its potential contribution to the definition of the human transcriptome. We estimate that ORESTES sampled over 80% of all highly and moderately expressed, and between 40% and 50% of rarely expressed, human genes. In our most thoroughly sequenced tissue, the breast, the 130,000 ORESTES generated are derived from transcripts from an estimated 70% of all genes expressed in that tissue, with an equally efficient representation of both highly and poorly expressed genes. In this respect, we find that the capacity of the ORESTES strategy both for gene discovery and shotgun transcript sequence generation significantly exceeds that of conventional ESTs. The distribution of ORESTES is such that many human transcripts are now represented by a scaffold of partial sequences distributed along the length of each gene product. The experimental joining of the scaffold components, by reverse transcription–PCR, represents a direct route to transcript finishing that may represent a useful alternative to full-length cDNA cloning. PMID:11593022

  1. The contribution of 700,000 ORF sequence tags to the definition of the human transcriptome.

    PubMed

    Camargo, A A; Samaia, H P; Dias-Neto, E; Simão, D F; Migotto, I A; Briones, M R; Costa, F F; Nagai, M A; Verjovski-Almeida, S; Zago, M A; Andrade, L E; Carrer, H; El-Dorry, H F; Espreafico, E M; Habr-Gama, A; Giannella-Neto, D; Goldman, G H; Gruber, A; Hackel, C; Kimura, E T; Maciel, R M; Marie, S K; Martins, E A; Nobrega, M P; Paco-Larson, M L; Pardini, M I; Pereira, G G; Pesquero, J B; Rodrigues, V; Rogatto, S R; da Silva, I D; Sogayar, M C; Sonati, M F; Tajara, E H; Valentini, S R; Alberto, F L; Amaral, M E; Aneas, I; Arnaldi, L A; de Assis, A M; Bengtson, M H; Bergamo, N A; Bombonato, V; de Camargo, M E; Canevari, R A; Carraro, D M; Cerutti, J M; Correa, M L; Correa, R F; Costa, M C; Curcio, C; Hokama, P O; Ferreira, A J; Furuzawa, G K; Gushiken, T; Ho, P L; Kimura, E; Krieger, J E; Leite, L C; Majumder, P; Marins, M; Marques, E R; Melo, A S; Melo, M B; Mestriner, C A; Miracca, E C; Miranda, D C; Nascimento, A L; Nobrega, F G; Ojopi, E P; Pandolfi, J R; Pessoa, L G; Prevedel, A C; Rahal, P; Rainho, C A; Reis, E M; Ribeiro, M L; da Ros, N; de Sa, R G; Sales, M M; Sant'anna, S C; dos Santos, M L; da Silva, A M; da Silva, N P; Silva, W A; da Silveira, R A; Sousa, J F; Stecconi, D; Tsukumo, F; Valente, V; Soares, F; Moreira, E S; Nunes, D N; Correa, R G; Zalcberg, H; Carvalho, A F; Reis, L F; Brentani, R R; Simpson, A J; de Souza, S J; Melo, M

    2001-10-01

    Open reading frame expressed sequences tags (ORESTES) differ from conventional ESTs by providing sequence data from the central protein coding portion of transcripts. We generated a total of 696,745 ORESTES sequences from 24 human tissues and used a subset of the data that correspond to a set of 15,095 full-length mRNAs as a means of assessing the efficiency of the strategy and its potential contribution to the definition of the human transcriptome. We estimate that ORESTES sampled over 80% of all highly and moderately expressed, and between 40% and 50% of rarely expressed, human genes. In our most thoroughly sequenced tissue, the breast, the 130,000 ORESTES generated are derived from transcripts from an estimated 70% of all genes expressed in that tissue, with an equally efficient representation of both highly and poorly expressed genes. In this respect, we find that the capacity of the ORESTES strategy both for gene discovery and shotgun transcript sequence generation significantly exceeds that of conventional ESTs. The distribution of ORESTES is such that many human transcripts are now represented by a scaffold of partial sequences distributed along the length of each gene product. The experimental joining of the scaffold components, by reverse transcription-PCR, represents a direct route to transcript finishing that may represent a useful alternative to full-length cDNA cloning. PMID:11593022

  2. OSIRIS-REx Touch-And-Go (TAG) Mission Design and Analysis

    NASA Technical Reports Server (NTRS)

    Berry, Kevin; Sutter, Brian; May, Alex; Williams, Ken; Barbee, Brent W.; Beckman, Mark; Williams, Bobby

    2013-01-01

    The Origins Spectral Interpretation Resource Identification Security Regolith Explorer (OSIRIS-REx) mission is a NASA New Frontiers mission launching in 2016 to rendezvous with the near-Earth asteroid (101955) 1999 RQ36 in late 2018. After several months in formation with and orbit about the asteroid, OSIRIS-REx will fly a Touch-And-Go (TAG) trajectory to the asteroid's surface to obtain a regolith sample. This paper describes the mission design of the TAG sequence and the propulsive maneuvers required to achieve the trajectory. This paper also shows preliminary results of orbit covariance analysis and Monte-Carlo analysis that demonstrate the ability to arrive at a targeted location on the surface of RQ36 within a 25 meter radius with 98.3% confidence.

  3. Generation of expressed sequence tags of random root cDNA clones of Brassica napus by single-run partial sequencing.

    PubMed Central

    Park, Y S; Kwak, J M; Kwon, O Y; Kim, Y S; Lee, D S; Cho, M J; Lee, H H; Nam, H G

    1993-01-01

    Two hundred thirty-seven expressed sequence tags (ESTs) of Brassica napus were generated by single-run partial sequencing of 197 random root cDNA clones. A computer search of these root ESTs revealed that 21 ESTs show significant similarity to the protein-coding sequences in the existing databases, including five stress- or defense-related genes and four clones related to the genes from other kingdoms. Northern blot analysis of the 10 database-matched cDNA clones revealed that many of the clones are expressed most abundantly in root but less abundantly in other organs. However, two clones were highly root specific. The results show that generation of the root ESTs by partial sequencing of random cDNA clones, along with the expression analysis, is an efficient approach to isolate genes that are functional in plant root on a large scale. We also discuss the results of the examination of cDNA libraries and sequencing methods suitable for this approach. PMID:8029332

  4. Generation of expressed sequence tags under cadmium stress for gene discovery and development of molecular markers in chickpea.

    PubMed

    Gaur, Rashmi; Bhatia, Sabhyata; Gupta, Meetu

    2014-07-01

    Chickpea, a member of the Fabaceae family, is the world's third most important legume crop but suffers severe yield loss from various biotic and abiotic stresses. Development of modern genomic tools such as molecular markers and identification of resistance genes associated with these stresses facilitate improvement in chickpea breeding towards abiotic stress tolerance. In this study, 1597 high-quality expressed sequence tags (ESTs) were generated from a cDNA library of variety Pusa 1105 root tissue after cadmium (Cd) treatment. Assembly of ESTs resulted in a total of 914 unigenes, of which putative homology was obtained for 38.8% after BLASTX search. In terms of species distribution, the majority of sequences were most similar to Medicago truncatula sequences, followed by Glycine max, Vitis vinifera, Populus trichocarpa and Pisum sativum. Functional annotation was assigned using Blast2GO, and the Gene Ontology (GO) terms were categorized into biological process, molecular function and cellular component. Approximately 10.83% of unigenes were assigned at least one GO term. Moreover, in the distribution of transcripts into various biological pathways, 20 of the annotated transcripts were assigned to ten pathways in the KEGG database. A majority of the genes were found to be involved in sulphur and nitrogen metabolism. In the quantitative real-time PCR analysis, five of the transcription factors and three of the transporter genes were found to be highly expressed after Cd treatment. In addition, the utility of the ESTs was demonstrated by exploiting them for the development of 83 genic molecular markers, including EST-simple sequence repeats and intron targeted polymorphism, that would assist in tagging of genes related to metal stress for future prospects. PMID:24414095

  5. Development of polymorphic microsatellite markers based on expressed sequence tags in Populus cathayana (Salicaceae).

    PubMed

    Tian, Z Z; Zhang, F Q; Cai, Z Y; Chen, S L

    2016-01-01

    Populus cathayana occupies a large area within the northern, central, and southwestern regions of China, and is considered to be an important reforestation species in western China. In order to investigate the population genetic structure of this species, 10 polymorphic microsatellite loci were identified based on expressed sequence tags from de novo sequencing on the Illumina HiSeq 2000 platform. All microsatellite primers were tested on 48 P. cathayana individuals from four locations on the Qinghai-Tibet Plateau. The observed heterozygosity ranged from 0.000 to 1.000, and the null-allele frequency ranged from 0.000 to 0.904. These microsatellite markers may be a useful tool in genetic studies on P. cathayana and closely related species. PMID:27525845

  6. A known expressed sequence tag, BM742401, is a potent lincRNA inhibiting cancer metastasis.

    PubMed

    Park, Seong-Min; Park, Sung-Joon; Kim, Hee-Jin; Kwon, Oh-Hyung; Kang, Tae-Wook; Sohn, Hyun-Ahm; Kim, Seon-Kyu; Noh, Seung Moo; Song, Kyu-Sang; Jang, Se-Jin; Kim, Yong Sung; Kim, Seon-Young

    2013-01-01

    Long intergenic non-coding RNAs (lincRNAs) have historically been ignored in cancer biology. However, thousands of lincRNAs have been identified in mammals using recently developed genomic tools, including microarray and high-throughput RNA sequencing (RNA-seq). Several of the lincRNAs identified have been well characterized for their functions in carcinogenesis. Here we performed RNA-seq experiments comparing gastric cancer with normal tissues to find differentially expressed transcripts in intergenic regions. By analyzing our own RNA-seq and public microarray data, we identified 31 transcripts, including a known expressed sequence tag, BM742401. BM742401 was downregulated in cancer, and its downregulation was associated with poor survival in gastric cancer patients. Ectopic overexpression of BM742401 inhibited metastasis-related phenotypes and decreased the concentration of extracellular MMP9. These results suggest that BM742401 is a potential lincRNA marker and therapeutic target. PMID:23846333

  7. Species diagnostic single-nucleotide polymorphism and sequence-tagged site markers for the parasitic wasp genus Nasonia (Hymenoptera: Pteromalidae)

    Technology Transfer Automated Retrieval System (TEKTRAN)

    We developed, identified and evaluated eight single nucleotide polymorphism (SNP) and three sequence-tagged site (STS) markers in nuclear gene sequences of the wasp genus Nasonia (Hymenoptera). We studied variation of these markers in natural populations of the closely related and regionally sympatr...

  8. Behavior Analysis Based on Coordinates of Body Tags

    NASA Astrophysics Data System (ADS)

    Luštrek, Mitja; Kaluža, Boštjan; Dovgan, Erik; Pogorelc, Bogdan; Gams, Matjaž

    This paper describes fall detection, activity recognition and the detection of anomalous gait in the Confidence project. The project aims to prolong the independence of the elderly by detecting falls and other types of behavior indicating a health problem. The behavior will be analyzed based on the coordinates of tags worn on the body. The coordinates will be detected with radio sensors. We describe two Confidence modules. The first one classifies the user's activity into one of six classes, including falling. The second one detects walking anomalies, such as limping, dizziness and hemiplegia. The walking analysis can automatically adapt to each person by using only the examples of normal walking of that person. Both modules employ machine learning: the paper focuses on the features they use and the effect of tag placement and sensor noise on the classification accuracy. Four tags were enough for activity recognition accuracy of over 93% at moderate sensor noise, while six were needed to detect walking anomalies with the accuracy of over 90%.

  9. A Comprehensive Approach to Clustering of Expressed Human Gene Sequence: The Sequence Tag Alignment and Consensus Knowledge Base

    PubMed Central

    Miller, Robert T.; Christoffels, Alan G.; Gopalakrishnan, Chella; Burke, John; Ptitsyn, Andrey A.; Broveak, Tania R.; Hide, Winston A.

    1999-01-01

    The expressed human genome is being sequenced and analyzed by disparate groups producing disparate data. The majority of the identified coding portion is in the form of expressed sequence tags (ESTs). The need to discover exonic representation and expression forms of full-length cDNAs for each human gene is frustrated by the partial and variable quality nature of this data delivery. A highly redundant human EST data set has been processed into integrated and unified expressed transcript indices that consist of hierarchically organized human transcript consensi reflecting gene expression forms and genetic polymorphism within an index class. The expression index and its intermediate outputs include cleaned transcript sequence, expression, and alignment information and a higher fidelity subset, SANIGENE. The STACK_PACK clustering system has been applied to dbEST release 121598 (GenBank version 110). Sixty-four percent of 1,313,103 Homo sapiens ESTs are condensed into 143,885 tissue level multiple sequence clusters; linking through clone-ID annotations produces 68,701 total assemblies, such that 81% of the original input set is captured in a STACK multiple sequence or linked cluster. Indexing of alignments by substituent EST accession allows browsing of the data structure and its cross-links to UniGene. STACK metaclusters consolidate a greater number of ESTs by a factor of 1.86 with respect to the corresponding UniGene build. Fidelity comparison with genome reference sequence AC004106 demonstrates consensus expression clusters that reflect significantly lower spurious repeat sequence content and capture alternate splicing within a whole body index cluster and three STACK v.2.3 tissue-level clusters. Statistics of a staggered release whole body index build of STACK v.2.0 are presented. PMID:10568754

  10. Development of expressed sequence tag resources for Vanda Mimi Palmer and data mining for EST-SSR.

    PubMed

    Teh, Seow-Ling; Chan, Wai-Sun; Abdullah, Janna Ong; Namasivayam, Parameswari

    2011-08-01

    Vanda Mimi Palmer (VMP) is a highly sought-after fragrant orchid hybrid in Malaysia. It is economically important in the cosmetic and beauty industries and is also a popular potted ornamental plant. To date, no work on fragrance-related genes of vandaceous orchids has been reported from other research groups, although floral fragrance and volatiles have been extensively studied. An expressed sequence tag (EST) resource was developed for VMP principally to mine any potential fragrance-related expressed sequence tag-simple sequence repeat (EST-SSR) for future development as markers in the identification of fragrant vandaceous orchids endemic to Malaysia. Clustering, annotation and assembling of the ESTs identified 1,196 unigenes, comprising 966 singletons and 230 contigs. The VMP dbEST was functionally classified by gene ontology (GO) into three groups: molecular functions (51.2%), cellular components (16.4%) and biological processes (24.6%), while the remaining 7.8% showed no hits with a GO identifier. A total of 112 EST-SSRs (9.4%) were mined, in which at least five units of di-, tri-, tetra-, penta-, or hexa-nucleotide repeats were predicted. The di-nucleotide motif repeats were the most frequent among the detected SSRs, with the AT/TA types the most abundant among the dimerics, while the AAG/TTC and AGA/TCT types were the most frequent trimerics. The mined EST-SSRs are believed to be useful in the development of EST-SSR markers that are applicable in the screening and characterization of fragrance-related transcripts in closely related species. PMID:21116862
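
    The SSR criterion stated in this abstract (at least five tandem units of a di- to hexa-nucleotide motif) can be illustrated with a minimal Python sketch. The abstract does not name the mining tool used, so the function names `find_ssrs` and `is_primitive`, the restriction to perfect repeats, and the test sequence are all assumptions made for illustration.

```python
import re

def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit
    (e.g. 'ATAT' is 'AT' twice, so it is not primitive)."""
    n = len(motif)
    return not any(n % k == 0 and motif[:k] * (n // k) == motif
                   for k in range(1, n))

def find_ssrs(seq, min_units=5):
    """Return sorted (start, motif, units) tuples for perfect SSRs.

    Scans for di- to hexa-nucleotide motifs repeated at least
    `min_units` times in tandem. Assumption: only perfect repeats
    are reported; real SSR miners often allow interruptions.
    """
    seq = seq.upper()
    hits = []
    for k in range(2, 7):  # motif lengths 2..6
        # One motif of length k followed by at least (min_units - 1) copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_units - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):  # skip e.g. 'AA' or 'ATAT' motifs
                hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)

# Hypothetical sequence: six AT units and five AAG units.
example = "GGC" + "AT" * 6 + "CCT" + "AAG" * 5 + "T"
ssrs = find_ssrs(example)
```

    The primitive-motif check keeps a long dinucleotide run from also being reported as a tetra- or hexa-nucleotide SSR.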

  11. Development of Simple Sequence Repeat Markers from Expressed Sequence Tags of the Maize Gray Leaf Spot Pathogen, Cercospora zeae-maydis

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Ten simple sequence repeat markers were developed from expressed sequence tags of Cercospora zeae-maydis, the cause of gray leaf spot of maize (Zea mays). All loci were evaluated on 80 isolates from a local population of C. zeae-maydis and all were highly polymorphic, with 4 to 14 alleles per locus....

  12. Genome-wide characterization and selection of expressed sequence tag simple sequence repeat primers for optimized marker distribution and reliability in peach

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Expressed sequence tag (EST) simple sequence repeats (SSRs) in Prunus were mined, and flanking primers designed and used for genome-wide characterization and selection of primers to optimize marker distribution and reliability. A total of 12,618 contigs were assembled from 84,727 ESTs, along with 34...

  13. Development of polymorphic expressed sequence tag-single sequence repeat markers in the common Chinese cuttlefish, Sepiella maindroni.

    PubMed

    Li, R H; Lu, S K; Zhang, C L; Song, W W; Mu, C K; Wang, C L

    2014-01-01

    The common Chinese cuttlefish (Sepiella maindroni) is one of the popular edible cephalopods consumed across Asia. To facilitate population genetic investigation of this species, we developed fourteen polymorphic microsatellite markers from expressed sequence tags of S. maindroni. The number of alleles at each locus ranged from 6 to 10, with an average of 7.9 alleles per locus. The observed and expected heterozygosity ranged from 0.615 to 0.962 and from 0.685 to 0.888, respectively. Four loci were found to deviate significantly from Hardy-Weinberg equilibrium. The polymorphism information content ranged from 0.638 to 0.833. These polymorphic microsatellite loci will be helpful for population genetic, genetic linkage map, and other genetic studies of S. maindroni. PMID:25117305
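
    The per-locus statistics reported in this abstract (observed heterozygosity, expected heterozygosity, polymorphism information content) follow standard definitions, sketched below in Python. The abstract does not say which software or estimator was used, so this sketch assumes the uncorrected expected heterozygosity 1 - Σp_i² and the commonly used PIC definition 1 - Σp_i² - Σ_{i<j} 2p_i²p_j². The function names and genotype data are hypothetical.

```python
from collections import Counter

def allele_frequencies(genotypes):
    """Allele frequencies from a list of diploid genotypes (a, b)."""
    counts = Counter(allele for pair in genotypes for allele in pair)
    total = sum(counts.values())
    return [c / total for c in counts.values()]

def observed_heterozygosity(genotypes):
    """Fraction of individuals carrying two different alleles."""
    return sum(1 for a, b in genotypes if a != b) / len(genotypes)

def expected_heterozygosity(freqs):
    """Uncorrected expected heterozygosity: 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs)

def pic(freqs):
    """Polymorphism information content:
    1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    n = len(freqs)
    cross = sum(2 * freqs[i] ** 2 * freqs[j] ** 2
                for i in range(n) for j in range(i + 1, n))
    return expected_heterozygosity(freqs) - cross

# Hypothetical genotypes at one microsatellite locus (allele sizes in bp).
genotypes = [(150, 152), (150, 150), (152, 154), (154, 154)]
freqs = allele_frequencies(genotypes)
ho = observed_heterozygosity(genotypes)
he = expected_heterozygosity(freqs)
```

    A large gap between observed and expected heterozygosity at a locus is the kind of signal that a Hardy-Weinberg equilibrium test then evaluates formally.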

  14. Characterization of expressed sequence tags from a full-length enriched cDNA library of Cryptomeria japonica male strobili

    PubMed Central

    Futamura, Norihiro; Totoki, Yasushi; Toyoda, Atsushi; Igasaki, Tomohiro; Nanjo, Tokihiko; Seki, Motoaki; Sakaki, Yoshiyuki; Mari, Adriano; Shinozaki, Kazuo; Shinohara, Kenji

    2008-01-01

    Background: Cryptomeria japonica D. Don is one of the most commercially important conifers in Japan. However, the allergic disease caused by its pollen is a severe public health problem in Japan. Since large-scale analysis of expressed sequence tags (ESTs) in the male strobili of C. japonica should help us to clarify the overall expression of genes during the process of pollen development, we constructed a full-length enriched cDNA library that was derived from male strobili at various developmental stages. Results: We obtained 36,011 expressed sequence tags (ESTs) from either one or both ends of 19,437 clones derived from the cDNA library of C. japonica male strobili at various developmental stages. The 19,437 cDNA clones corresponded to 10,463 transcripts. Approximately 80% of the transcripts resembled ESTs from Pinus and Picea, while approximately 75% had homologs in Arabidopsis. An analysis of homologies between ESTs from C. japonica male strobili and known pollen allergens in the Allergome Database revealed that products of 180 transcripts exhibited significant homology. Approximately 2% of the transcripts appeared to encode transcription factors. We identified twelve genes for MADS-box proteins among these transcription factors. The twelve MADS-box genes were classified as DEF/GLO/GGM13-, AG-, AGL6-, TM3- and TM8-like MIKCC genes and type I MADS-box genes. Conclusion: Our full-length enriched cDNA library derived from C. japonica male strobili provides information on expression of genes during the development of male reproductive organs. We provided potential allergens in C. japonica. We also provided new information about transcription factors including MADS-box genes expressed in male strobili of C. japonica. Large-scale gene discovery using full-length cDNAs is a valuable tool for studies of gymnosperm species. PMID:18691438

  15. Expressed sequence tags from the red imported fire ant, Solenopsis invicta: annotation and utilization for discovery of viruses.

    PubMed

    Valles, Steven M; Strong, Charles A; Hunter, Wayne B; Dang, Phat M; Pereira, Roberto M; Oi, David H; Williams, David F

    2008-09-01

    An expression library was created and 2304 clones sequenced from a monogyne colony of Solenopsis invicta. The primary intention of the project was to utilize homologous gene identification to facilitate discovery of viruses infecting this ant pest that could potentially be used in pest management. Additional genes were identified from the ant host and associated pathogens that serve as an important resource for studying these organisms. After assembly and removal of mitochondrial and poor quality sequences, 1054 unique sequences were yielded and deposited into the GenBank database under Accession Nos. EH412746 through EH413799. At least nine expressed sequence tags (ESTs) were identified as possessing microsatellite motifs and 15 ESTs exhibited significant homology with microsporidian genes. These sequences most likely originated from Thelohania solenopsae, a well-characterized microsporidian that infects S. invicta. Six ESTs exhibited significant homology with single-stranded RNA viruses (3B4, 3F6, 11F1, 12G12, 14D5, and 24C10). Subsequent analysis of these putative viral ESTs revealed that 3B4 was most likely a ribosomal gene of S. invicta, 11F1 was a single-stranded RNA (ssRNA) virus contaminant introduced into the colony from the cricket food source, 12G12 appeared to be a plant-infecting tenuivirus also introduced into the colony as a field contaminant, and 3F6, 14D5, and 24C10 were all from a unique ssRNA virus found to infect S. invicta. The sequencing project illustrates the utility of this method for discovery of viruses and pathogens that may otherwise go undiscovered. PMID:18329665

  16. Mining for single nucleotide polymorphisms and insertions / deletions in expressed sequence tag libraries of oil palm.

    PubMed

    Riju, Aykkal; Chandrasekar, Arumugam; Arunachalam, Vadivel

    2007-01-01

    The oil palm is a tropical oil-bearing tree. EST-derived SNPs and SSRs are a free by-product of the currently expanding EST (expressed sequence tag) databases. The development of high-throughput methods for the detection of SNPs (single nucleotide polymorphisms) and small indels (insertions/deletions) has led to a revolution in their use as molecular markers. The available oil palm EST sequences (5452) were mined from dbEST of NCBI. The CAP3 program was used to assemble EST sequences into contigs. Candidate SNPs and indel polymorphisms were detected using the Perl script auto_snip version 1.0, which used 576 ESTs for detecting SNP and indel sites. We found 1180 SNP sites and 137 indel polymorphisms, with a frequency of 1.36 SNPs / 100 bp. Among the six tissues from which the EST libraries had been generated, mesocarp had a high frequency of 2.91 SNPs and indels per 100 bp, whereas zygotic embryos had the lowest frequency of 0.15 per 100 bp. We also used the Shannon index to analyze the proportion of the ten possible types of SNP/indels. ESTs from normal apex tissue showed the highest Shannon index (0.60), whereas abnormal apex had the lowest (0.02). The present report deals with the use of the Shannon index for comparing SNP/indel frequencies mined from EST libraries and also confirms that SNPs occur in oil palm at a frequency sufficient for their use as markers in genetic studies. PMID:21670789
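
    The two summary statistics used in this abstract, polymorphisms per 100 bp and the Shannon index over polymorphism types, can be sketched in Python as follows. The abstract does not state the logarithm base or any normalization behind the reported index values, so the natural logarithm used here is an assumption. The function names and the counts in the example are hypothetical and are not data from the study.

```python
import math

def per_100bp(n_sites, total_bp):
    """Polymorphic sites per 100 bp of aligned sequence."""
    return 100.0 * n_sites / total_bp

def shannon_index(counts, base=math.e):
    """Shannon index H = -sum(p_i * log(p_i)) over category counts.

    `counts` holds the number of polymorphisms observed in each
    category (e.g. each SNP/indel type). Empty categories contribute
    nothing. Assumption: natural log by default; the base used in
    the abstract is not stated.
    """
    total = sum(counts)
    if total == 0:
        return 0.0
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log(p, base)
    return h

# Hypothetical counts for the polymorphism types seen in one library.
type_counts = [40, 25, 10, 10, 5, 5, 3, 1, 1, 0]
h_library = shannon_index(type_counts)
```

    A library dominated by a single polymorphism type gives an index near zero, while an even spread across types gives the maximum, log(number of types).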

  17. Serial number tagging reveals a prominent sequence preference of retrotransposon integration.

    PubMed

    Chatterjee, Atreyi Ghatak; Esnault, Caroline; Guo, Yabin; Hung, Stevephen; McQueen, Philip G; Levin, Henry L

    2014-07-01

    Transposable elements (TE) have both negative and positive impact on the biology of their host. As a result, a balance is struck between the host and the TE that relies on directing integration to specific genome territories. The extraordinary capacity of DNA sequencing can create ultra dense maps of integration that are being used to study the mechanisms that position integration. Unfortunately, the great increase in the numbers of insertion sites detected comes with the cost of not knowing which positions are rare targets and which sustain high numbers of insertions. To address this problem we developed the serial number system, a TE tagging method that measures the frequency of integration at single nucleotide positions. We sequenced 1 million insertions of retrotransposon Tf1 in the genome of Schizosaccharomyces pombe and obtained the first profile of integration with frequencies for each individual position. Integration levels at individual nucleotides varied over two orders of magnitude and revealed that sequence recognition plays a key role in positioning integration. The serial number system is a general method that can be applied to determine precise integration maps for retroviruses and gene therapy vectors. PMID:24948612
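
    The counting idea behind a serial-number approach can be illustrated with a short Python sketch. It assumes that each sequencing read has already been reduced to a (genome position, serial tag) pair and that reads sharing both values are amplification duplicates of a single insertion event. That preprocessing, the function name, and the example reads are assumptions for illustration and are not details given in the abstract.

```python
from collections import defaultdict

def independent_insertions(reads):
    """Count distinct serial tags observed at each genome position.

    `reads` is an iterable of (position, serial_tag) pairs. Reads with
    the same position and the same tag are treated as duplicates of one
    insertion event, so each position's count is the number of distinct
    tags seen there.
    """
    tags_by_position = defaultdict(set)
    for position, serial_tag in reads:
        tags_by_position[position].add(serial_tag)
    return {pos: len(tags) for pos, tags in tags_by_position.items()}

# Hypothetical reads: position 100 has two independent events,
# one of which was sequenced twice.
reads = [(100, "ACGTACGT"), (100, "ACGTACGT"), (100, "TTGACCAA"), (250, "ACGTACGT")]
counts = independent_insertions(reads)
```

    Counting tags rather than reads is what separates a frequently targeted position from one whose single insertion was simply amplified many times.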

  18. GST-PRIME: a genome-wide primer design software for the generation of gene sequence tags

    PubMed Central

    Varotto, Claudio; Richly, Erik; Salamini, Francesco; Leister, Dario

    2001-01-01

    The availability of sequenced genomes has generated a need for experimental approaches that allow the simultaneous analysis of large, or even complete, sets of genes. To facilitate such analyses, we have developed GST-PRIME, a software package for retrieving and assembling gene sequences, even from complex genomes, using the NCBI public database, and then designing sets of primer pairs for use in gene amplification. Primers were designed by the program for the direct amplification of gene sequence tags (GSTs) from either genomic DNA or cDNA. Test runs of GST-PRIME on 2000 randomly selected Arabidopsis and Drosophila genes demonstrate that 93 and 88% of resulting GSTs, respectively, fulfilled imposed length criteria. GST-PRIME primer pairs were tested on a set of 1900 Arabidopsis genes coding for chloroplast-targeted proteins: 95% of the primer pairs used in PCRs with genomic DNA generated the correct amplicons. GST-PRIME can thus be reliably used for large-scale or specific amplification of intron-containing genes of multicellular eukaryotes. PMID:11691924

  19. GST-PRIME: a genome-wide primer design software for the generation of gene sequence tags.

    PubMed

    Varotto, C; Richly, E; Salamini, F; Leister, D

    2001-11-01

    The availability of sequenced genomes has generated a need for experimental approaches that allow the simultaneous analysis of large, or even complete, sets of genes. To facilitate such analyses, we have developed GST-PRIME, a software package for retrieving and assembling gene sequences, even from complex genomes, using the NCBI public database, and then designing sets of primer pairs for use in gene amplification. Primers were designed by the program for the direct amplification of gene sequence tags (GSTs) from either genomic DNA or cDNA. Test runs of GST-PRIME on 2000 randomly selected Arabidopsis and Drosophila genes demonstrate that 93 and 88% of resulting GSTs, respectively, fulfilled imposed length criteria. GST-PRIME primer pairs were tested on a set of 1900 Arabidopsis genes coding for chloroplast-targeted proteins: 95% of the primer pairs used in PCRs with genomic DNA generated the correct amplicons. GST-PRIME can thus be reliably used for large-scale or specific amplification of intron-containing genes of multicellular eukaryotes. PMID:11691924

  20. Proteomic analysis of Trypanosoma cruzi developmental stages using isotope-coded affinity tag reagents.

    PubMed

    Paba, Jaime; Ricart, Carlos A O; Fontes, Wagner; Santana, Jaime M; Teixeira, Antonio R L; Marchese, Jason; Williamson, Brian; Hunt, Tony; Karger, Barry L; Sousa, Marcelo V

    2004-01-01

    Comparative proteome analysis of developmental stages of the human pathogen Trypanosoma cruzi was carried out by isotope-coded affinity tag technology (ICAT) associated with liquid chromatography-mass spectrometry peptide sequencing (LC-MS/MS). Protein extracts of the protozoan trypomastigote and amastigote stages were labeled with heavy (D8) and light (D0) ICAT reagents and subjected to cation exchange and avidin affinity chromatographies followed by LC-MS/MS analysis. High confidence sequence information and expression levels for 41 T. cruzi polypeptides, including metabolic enzymes, paraflagellar rod components, tubulins, and heat-shock proteins were reported. Twenty-nine proteins displayed similar levels of expression in both forms of the parasite, nine proteins presented higher levels in trypomastigotes, whereas three were more expressed in amastigotes. PMID:15253433

  1. Mapping of Heterologous Expressed Sequence Tags as an Alternative to Microarrays for Study of Defense Responses in Plants

    Technology Transfer Automated Retrieval System (TEKTRAN)

    In this study, we used publicly available EST (expressed sequence tags) database derived from four different plant species infected with a variety of pathogens, to generate an expression profile of orthologous genes involved in defense response of a model organism, Arabidopsis thaliana. Computer-ass...

  2. The HaloTag: Improving Soluble Expression and Applications in Protein Functional Analysis.

    PubMed

    Peterson, Scott N; Kwon, Keehwan

    2012-01-01

    Technological and methodological advances have been critical for the rapidly evolving field of proteomics. The development of fusion tag systems is essential for purification and analysis of recombinant proteins. The HaloTag is a 34 kDa monomeric protein derived from a bacterial haloalkane dehalogenase. The majority of fusion tags in use today utilize a reversible binding interaction with a specific ligand. The HaloTag system is unique in that it forms a covalent linkage to its chloroalkane ligand. This linkage permits attachment of the HaloTag to a variety of functional reporters, which can be used to label and immobilize recombinant proteins. The success rate for HaloTag expression of soluble proteins is very high and comparable to that of the maltose binding protein (MBP) tag. Furthermore, cleavage of the HaloTag does not result in the protein insolubility that is often observed with the MBP tag. In the present report, we describe applications of the HaloTag system in our ongoing investigation of protein-protein interactions of the Y. pestis Type 3 secretion system on a custom protein microarray. We also describe the utilization of affinity purification/mass spectroscopy (AP/MS) to evaluate the utility of the HaloTag system to characterize DNA binding activity and protein specificity. PMID:23115610

  3. The HaloTag: Improving Soluble Expression and Applications in Protein Functional Analysis

    PubMed Central

    Peterson, Scott N; Kwon, Keehwan

    2012-01-01

    Technological and methodological advances have been critical for the rapidly evolving field of proteomics. The development of fusion tag systems is essential for purification and analysis of recombinant proteins. The HaloTag is a 34 kDa monomeric protein derived from a bacterial haloalkane dehalogenase. The majority of fusion tags in use today utilize a reversible binding interaction with a specific ligand. The HaloTag system is unique in that it forms a covalent linkage to its chloroalkane ligand. This linkage permits attachment of the HaloTag to a variety of functional reporters, which can be used to label and immobilize recombinant proteins. The success rate for HaloTag expression of soluble proteins is very high and comparable to that of the maltose binding protein (MBP) tag. Furthermore, cleavage of the HaloTag does not result in the protein insolubility that is often observed with the MBP tag. In the present report, we describe applications of the HaloTag system in our ongoing investigation of protein-protein interactions of the Y. pestis Type 3 secretion system on a custom protein microarray. We also describe the utilization of affinity purification/mass spectroscopy (AP/MS) to evaluate the utility of the HaloTag system to characterize DNA binding activity and protein specificity. PMID:23115610

  4. Construction of a Lotus japonicus late nodulin expressed sequence tag library and identification of novel nodule-specific genes.

    PubMed Central

    Szczyglowski, K; Hamburger, D; Kapranov, P; de Bruijn, F J

    1997-01-01

    A range of novel expressed sequence tags (ESTs) associated with late developmental events during nodule organogenesis in the legume Lotus japonicus were identified using mRNA differential display; 110 differentially displayed polymerase chain reaction products were cloned and analyzed. Of 88 unique cDNAs obtained, 22 shared significant homology to DNA/protein sequences in the respective databases. This group comprises, among others, a nodule-specific homolog of protein phosphatase 2C, a peptide transporter protein, and a nodule-specific form of cytochrome P450. RNA gel-blot analysis of 16 differentially displayed ESTs confirmed their nodule-specific expression pattern. The kinetics of mRNA accumulation of the majority of the ESTs analyzed were found to resemble the expression pattern observed for the L. japonicus leghemoglobin gene. These results indicate that the newly isolated molecular markers correspond to genes induced during late developmental stages of L. japonicus nodule organogenesis and provide important, novel tools for the study of nodulation. PMID:9276951

  5. Myocardial motion estimation in tagged MR sequences by using alphaMI-based non rigid registration.

    PubMed

    Oubel, E; Tobon-Gomez, C; Hero, A O; Frangi, A F

    2005-01-01

    Tagged Magnetic Resonance Imaging (MRI) is currently the reference MR modality for myocardial motion and strain analysis. NMI-based non rigid registration has proven to be an accurate method to retrieve cardiac deformation fields. The use of alphaMI permits higher dimensional features to be implemented in myocardial deformation estimation through image registration. This paper demonstrates that this is feasible with a set of Haar wavelet features of high dimension. While we do not demonstrate performance improvement for this set of features, there is no significant degradation as compared to implementing the registration method with the traditional NMI metric. We use Entropic Spanning Graphs (ESGs) to estimate the alphaMI of the wavelet feature vectors WFVs since this is not possible with histograms. To the best of our knowledge, this is the first time that ESGs are used for non rigid registration. PMID:16685969

  6. Parallel tagged amplicon sequencing of transcriptome-based genetic markers for Triturus newts with the Ion Torrent next-generation sequencing platform

    PubMed Central

    Wielstra, B; Duijm, E; Lagler, P; Lammers, Y; Meilink, W R M; Ziermann, J M; Arntzen, J W

    2014-01-01

    Next-generation sequencing is a fast and cost-effective way to obtain sequence data for nonmodel organisms for many markers and for many individuals. We describe a protocol through which we obtain orthologous markers for the crested newts (Amphibia: Salamandridae: Triturus), suitable for analysis of interspecific hybridization. We use transcriptome data of a single Triturus species and design 96 primer pairs that amplify c. 180 bp fragments positioned in 3-prime untranslated regions. Next, these markers are tested with uniplex PCR for a set of species spanning the taxonomical width of the genus Triturus. The 52 markers that consistently show a single band of expected length at gel electrophoresis for all tested crested newt species are then amplified in five multiplex PCRs (with a plexity of ten or eleven) for 132 individual newts: a set of 84 representing the seven (candidate) species and a set of 48 from a presumed hybrid population. After pooling multiplexes per individual, unique tags are ligated to link amplicons to individuals. Subsequently, individuals are pooled equimolar and sequenced on the Ion Torrent next-generation sequencing platform. A bioinformatics pipeline identifies the alleles and recodes these to a genotypic format. Next, we test the utility of our markers. BAPS allocates the 84 crested newt individuals representing (candidate) species to their expected (candidate) species, confirming the markers are suitable for species delineation. NewHybrids, a hybrid index and HIest confirm the 48 individuals from the presumed hybrid population to be genetically admixed, illustrating the potential of the markers to identify interspecific hybridization. We expect the set of markers we designed to provide a high resolving power for analysis of hybridization in Triturus. PMID:24571307

  7. Sequence analysis on microcomputers.

    PubMed

    Cannon, G C

    1987-10-01

    Overall, each of the program packages performed their tasks satisfactorily. For analyses where there was a well-defined answer, such as a search for a restriction site, there were few significant differences between the program sets. However, for tasks in which a degree of flexibility is desirable, such as homology or similarity determinations and database searches, DNASTAR consistently afforded the user more options in conducting the required analysis than did the other two packages. However, for laboratories where sequence analysis is not a major effort and the expense of a full sequence analysis workstation cannot be justified, MicroGenie and IBI-Pustell offer a satisfactory alternative. MicroGenie is a polished program system. Many may find that its user interface is more "user friendly" than the standard menu-driven interfaces. Its system of filing sequences under individual passwords facilitates use by more than one person. MicroGenie uses a hardware device for software protection that occupies a card slot in the computer on which it is used. Although I am sympathetic to the problem of software piracy, I feel that a less drastic solution is in order for a program likely to be sharing limited computer space with other software packages. The IBI-Pustell package performs the required analysis functions as accurately and quickly as MicroGenie but it lacks the clearness and ease of use. The menu system seems disjointed, and new or infrequent users often find themselves at apparent "dead-end menus" where the only clear alternative is to restart the entire program package. It is suggested from published accounts that the user interface is going to be upgraded and perhaps when that version is available, use of the system will be improved. The documentation accompanying each package was relatively clear as to how to run the programs, but all three packages assumed that the user was familiar with the computational techniques employed. MicroGenie and IBI-Pustell further

  8. A new method to identify flanking sequence tags in chlamydomonas using 3’-RACE

    PubMed Central

    2012-01-01

    Background: The green alga Chlamydomonas reinhardtii, although a premier model organism in biology, still lacks extensive insertion mutant libraries with well-identified Flanking Sequence Tags (FSTs). Rapid and efficient methods are needed for FST retrieval. Results: Here, we present a novel method to identify FSTs in insertional mutants of Chlamydomonas. Transformants can be obtained with a resistance cassette lacking a 3’ untranslated region (UTR), suggesting that the RNA that is produced from the resistance marker terminates in the flanking genome when it encounters a cleavage/polyadenylation signal. We have used a robust 3’-RACE method to specifically amplify such chimeric cDNAs. Out of 38 randomly chosen transformants, 27 (71%) yielded valid FSTs, of which 23 could be unambiguously mapped to the genome. Eighteen of the mutants lie within a predicted gene. All but two of the intragenic insertions occur in the sense orientation with respect to transcription, suggesting a bias against situations of convergent transcription. Among the 14 insertion sites tested by genomic PCR, 12 could be confirmed. Among these are insertions in genes coding for PSBS3 (possibly involved in non-photochemical quenching), the NimA-related protein kinase CNK2, the mono-dehydroascorbate reductase MDAR1, the phosphoglycerate mutase PGM5, etc. Conclusion: We propose that our 3’-RACE FST method can be used to build large scale FST libraries in Chlamydomonas and other transformable organisms. PMID:22735168

  9. The non-coding RNA composition of the mitotic chromosome by 5'-tag sequencing.

    PubMed

    Meng, Yicong; Yi, Xianfu; Li, Xinhui; Hu, Chuansheng; Wang, Ju; Bai, Ling; Czajkowsky, Daniel M; Shao, Zhifeng

    2016-06-01

    Mitotic chromosomes are one of the most commonly recognized sub-cellular structures in eukaryotic cells. Yet basic information necessary to understand their structure and assembly, such as their composition, is still lacking. Recent proteomic studies have begun to fill this void, identifying hundreds of RNA-binding proteins bound to mitotic chromosomes. However, by contrast, there are only two RNA species (U3 snRNA and rRNA) that are known to be associated with the mitotic chromosome, suggesting that there are many mitotic chromosome-associated RNAs (mCARs) not yet identified. Here, using a targeted protocol based on 5'-tag sequencing to profile the mammalian mCAR population, we report the identification of 1279 mCARs, the majority of which are ncRNAs, including lncRNAs that exhibit greater conservation across 60 vertebrate species than the entire population of lncRNAs. There is also a significant enrichment of snoRNAs and specific SINE RNAs. Finally, ∼40% of the mCARs are presently unannotated, many of which are as abundant as the annotated mCARs, suggesting that there are also many novel ncRNAs in the mCARs. Overall, the mCARs identified here, together with the previous proteomic and genomic data, constitute the first comprehensive catalogue of the molecular composition of the eukaryotic mitotic chromosomes. PMID:27016738

  10. Isolation of expressed sequence tags of Agaricus bisporus and their assignment to chromosomes.

    PubMed Central

    Sonnenberg, A S; de Groot, P W; Schaap, P J; Baars, J J; Visser, J; Van Griensven, L J

    1996-01-01

    The genome of the cultivated basidiomycete Agaricus bisporus Horst U1 and of its homokaryotic parents has been characterized by using an optimized method of pulsed-field gel electrophoresis. Expressed sequence tags obtained as expressed cDNAs from a primordial tissue-derived cDNA library and a number of previously isolated genes were used to identify the individual chromosomes of the parental lines of Horst U1. The genome consists of 13 chromosomes, and its total size is 31 Mb. For those chromosomes that could not be resolved by contour-clamped homogeneous electric field electrophoresis, the segregation of marker genes was studied in a set of 86 homokaryotic offspring of Horst U1. At least two markers were assigned to each individual chromosome. In this way all individual chromosomes were unequivocally identified. The large size difference observed between the homologous chromosomes IX, harboring the rDNA repeat, was shown to be largely due to a higher copy number of rDNA in parental strain H97 than in parental strain H39. PMID:8953726

  11. The non-coding RNA composition of the mitotic chromosome by 5′-tag sequencing

    PubMed Central

    Meng, Yicong; Yi, Xianfu; Li, Xinhui; Hu, Chuansheng; Wang, Ju; Bai, Ling; Czajkowsky, Daniel M.; Shao, Zhifeng

    2016-01-01

    Mitotic chromosomes are one of the most commonly recognized sub-cellular structures in eukaryotic cells. Yet basic information necessary to understand their structure and assembly, such as their composition, is still lacking. Recent proteomic studies have begun to fill this void, identifying hundreds of RNA-binding proteins bound to mitotic chromosomes. However, by contrast, there are only two RNA species (U3 snRNA and rRNA) that are known to be associated with the mitotic chromosome, suggesting that there are many mitotic chromosome-associated RNAs (mCARs) not yet identified. Here, using a targeted protocol based on 5′-tag sequencing to profile the mammalian mCAR population, we report the identification of 1279 mCARs, the majority of which are ncRNAs, including lncRNAs that exhibit greater conservation across 60 vertebrate species than the entire population of lncRNAs. There is also a significant enrichment of snoRNAs and specific SINE RNAs. Finally, ∼40% of the mCARs are presently unannotated, many of which are as abundant as the annotated mCARs, suggesting that there are also many novel ncRNAs in the mCARs. Overall, the mCARs identified here, together with the previous proteomic and genomic data, constitute the first comprehensive catalogue of the molecular composition of the eukaryotic mitotic chromosomes. PMID:27016738

  12. ISHAN: sequence homology analysis package.

    PubMed

    Shil, Pratip; Dudani, Niraj; Vidyasagar, Pandit B

    2006-01-01

    Sequence-based homology studies play an important role in evolutionary tracing and classification of proteins. Various methods are available to analyze biological sequence information. However, with the advent of the proteomics era, there is a growing demand for analysis of huge amounts of biological sequence information, and it has become necessary to have programs that provide speedy analysis. ISHAN has been developed as a homology analysis package, built on various sequence analysis tools, viz. FASTA, ALIGN, CLUSTALW, PHYLIP and CODONW (for DNA sequences). This JAVA application offers the user a choice of analysis tools. For testing, ISHAN was applied to perform phylogenetic analysis for sets of Caspase 3 DNA sequences and NF-kappaB p105 amino acid sequences. By integrating several tools it has made analysis much faster and reduced manual intervention. PMID:17274766

  13. Rediscovering Medicinal Plants' Potential with OMICS: Microsatellite Survey in Expressed Sequence Tags of Eleven Traditional Plants with Potent Antidiabetic Properties

    PubMed Central

    Sahu, Jagajjit; Sen, Priyabrata; Choudhury, Manabendra Dutta; Dehury, Budheswar; Barooah, Madhumita; Modi, Mahendra Kumar

    2014-01-01

    Herbal medicines and traditionally used medicinal plants present an untapped potential for novel molecular target discovery using systems science and OMICS biotechnology driven strategies. Since up to 40% of the world's poor people have no access to government health services, traditional and folk medicines are often the only therapeutics available to them. In this vein, North East (NE) India is recognized for its rich bioresources. As part of the Indo-Burma hotspot, it is regarded as an epicenter of biodiversity for several plants having myriad traditional uses, including medicinal use. However, the improvement of these valuable bioresources through molecular breeding strategies, for example, using genic microsatellites or Simple Sequence Repeats (SSRs) or Expressed Sequence Tags (ESTs)-derived SSRs has not been fully utilized on a large scale to date. In this study, we identified a total of 47,700 microsatellites from 109,609 ESTs of 11 medicinal plants (pineapple, papaya, noyontara, bitter orange, bermuda grass, ratalu, barbados nut, mango, mulberry, lotus, and guduchi) having proven antidiabetic properties. A total of 58,159 primer pairs were designed for the non-redundant 8060 SSR-positive ESTs and putative functions were assigned to 4483 unique contigs. Among the identified microsatellites, excluding mononucleotide repeats, di-/trinucleotides are predominant, among which repeat motifs of AG/CT and AAG/CTT were most abundant. Similarity search of SSR containing ESTs and antidiabetic gene sequences revealed 11 microsatellites linked to antidiabetic genes in five plants. GO term enrichment analysis revealed a total of 80 enriched GO terms widely distributed in 53 biological processes, 17 molecular functions, and 10 cellular components associated with the 11 markers. The present study therefore provides concrete insights into the frequency and distribution of SSRs in important medicinal resources. The microsatellite markers reported here markedly add to

  14. Image analysis methods for tagged MRI cardiac studies

    NASA Astrophysics Data System (ADS)

    Guttman, Michael A.; Prince, Jerry L.

    1990-07-01

    Tracking of magnetic resonance (MR) tags in myocardial tissue promises to be an effective tool in the assessment of myocardial motion. The amount of data acquired is very large, and the measurements are numerous and must be precise, requiring automated tracking methods. We describe a hierarchy of image processing steps that estimate both the endocardial and epicardial boundaries of the left ventricle and also estimate the spines of radial tags that emanate outward from the left ventricular cavity. The first stage determines the position of the myocardial boundaries for each of 128 rays emanating from the origin. To counter the deleterious effects of noise and the presence of the tags when determining the boundary positions, we use nonlinear filtering concepts from mathematical morphology together with a priori knowledge related to boundary smoothness to improve the estimates. The second stage estimates the tag spines by matching a template in a direction orthogonal to the expected tag direction. We show results on tagged images and discuss further research directions.

  15. Chromosome-specific physical localisation of expressed sequence tag loci in Corchorus olitorius L.

    PubMed

    Joshi, A; Das, S K; Samanta, P; Paria, P; Sen, S K; Basu, A

    2014-11-01

    Jute (Corchorus spp.), as a natural fibre-producing species, ranks next only to cotton. Inadequate understanding of its genetic architecture is a major lacuna for genetic improvement of this crop in terms of yield and quality. Establishment of a physical map provides a genomic tool that helps in positional cloning of valuable genes. In this report, an attempt was initiated to study association and localisation of single copy expressed sequence tag (EST) loci in the genome of Corchorus olitorius. The chromosome-specific association of EST was determined based on the appearance of an extra signal for a single copy cDNA probe in mitotic interphase nuclei of specific trisomic(s) for fluorescence in situ hybridisation, and validated using a cDNA fragment of the 26S rRNA gene (600 bp) as molecular probe. The probe exhibited three signals in meiotic interphase nuclei of trisomic 5, instead of two as observed in diploids and other trisomics, indicating its association with chromosome 5. Subsequent hybridisation of the same probe on the pachytene chromosomes of diploids confirmed that 26S rRNA occupies the terminal end of the short arm of chromosome 5 in C. olitorius. Subsequently, chromosome-specific association of 63 single copy EST and their physical localisation were determined on chromosomes 2, 4, 5 and 7. The study describes chromosome-specific physical localisation of genes in jute. The approach used here could be a step towards construction of genome-wide physical maps for any recalcitrant plant species like jute. PMID:24628982

  16. Transmural Myocardial Strain in Mouse: Quantification of High-Resolution MR Tagging using HARP Analysis

    PubMed Central

    Zhong, Jia; Liu, Wei; Yu, Xin

    2009-01-01

    MR tagging allows noninvasive examination of regional myocardial function with high accuracy and reproducibility. Current tagging methods are limited by low tagging resolution for accurate transmural strain quantification. Previously, a SPAMM-based method was proposed to increase the tagging resolution by combining two or more tagged images with different tagging grid positions. However, there has been limited application due to the challenge in image processing of multiple data sets. In the current study, we propose a HARP-based method for automated and fast analysis of high-resolution tagged images. First-order harmonic peaks from low tagging resolution images were combined to generate the composite second-order harmonic peak for strain computation. The combined images reached a tagging resolution of 0.3 mm. The proposed method was applied to the quantification of transmural myocardial wall strain in 7 normal C57BL/6 mice. Principal strains, as well as radial and circumferential strains, were quantified using the current method. PMID:19319888

  17. Twin Mitochondrial Sequence Analysis.

    PubMed

    Bouhlal, Yosr; Martinez, Selena; Gong, Henry; Dumas, Kevin; Shieh, Joseph T C

    2013-09-01

    When applying genome-wide sequencing technologies to disease investigation, it is increasingly important to resolve sequence variation in regions of the genome that may have homologous sequences. The human mitochondrial genome challenges interpretation given the potential for heteroplasmy, somatic variation, and homologous nuclear mitochondrial sequences (numts). Identical twins share the same mitochondrial DNA (mtDNA) from early life, but whether the mitochondrial sequence remains similar is unclear. We compared an adult monozygotic twin pair using high throughput-sequencing and evaluated variants with primer extension and mitochondrial pre-enrichment. Thirty-seven variants were shared between the twin individuals, and the variants were verified on the original genomic DNA. These studies support highly identical genetic sequence in this case. Certain low-level variant calls were of high quality and homology to the mitochondrial DNA, and they were further evaluated. When we assessed calls in pre-enriched mitochondrial DNA templates, we found that these may represent numts, which can be differentiated from mtDNA variation. We conclude that twin identity extends to mitochondrial DNA, and it is critical to differentiate between numts and mtDNA in genome sequencing, particularly since significant heteroplasmy could influence genome interpretation. Further studies on mtDNA and numts will aid in understanding how variation occurs and persists. PMID:24040623

  18. Regional localisation of 19 brain expressed sequence tags to human chromosome 11 using PCR amplification of somatic cell hybrid DNAs.

    PubMed

    Slorach, E M; Polymeropoulos, M H; Evans, K L; Seawright, A; Fletcher, J M; Porteous, D J; Brookes, A J

    1995-01-01

    Expressed sequence tags (ESTs) provide an efficient route to the identification of genes involved in normal development and in disease. PCR amplification of somatic cell hybrid DNAs was used to localise 22 brain-derived ESTs to subregions of human chromosome 11. Problems encountered with the standardised PCR conditions were overcome by optimising the annealing temperatures and the use of "touchdown" PCR. Amplification of the correct target sequence allowed the mapping of 19 ESTs, 8 to the short arm and 11 to the long arm of chromosome 11. No definitive localisation could be determined for the three remaining ESTs. PMID:7736794

  19. A direct method for regiospecific analysis of TAG using alpha-MAG.

    PubMed

    Turon, F; Bachain, P; Caro, Y; Pina, M; Graille, J

    2002-08-01

    An analytical procedure was developed for regiodistribution analysis of TAG using alpha-MAG prepared by an ethyl magnesium bromide deacylation. In the present communication, the deacylation procedure is shown to lead to representative alpha-MAG, allowing the composition of the native TAG in the alpha-position to be determined directly. The composition in the beta-position can then be estimated from the composition of the alpha-MAG and TAG according to the formula 3 x TAG - 2 x alpha-MAG. The estimates are superior to those obtained using the alpha,beta-DAG and Brockerhoff calculations as they come closer to the theoretical value and have smaller SD. The present procedure, first demonstrated on a synthetic TAG, was then successfully applied to the analysis of borage oil, milkfat, and tuna oil. PMID:12371754
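    The formula quoted in this abstract, 3 x TAG - 2 x alpha-MAG, follows from the TAG composition being the average over two alpha positions and one beta position. A minimal Python sketch of that arithmetic is given below; the function name and the fatty acid percentages are hypothetical values chosen for illustration, not data from the study.

```python
def beta_position_composition(tag, alpha_mag):
    """Estimate the beta-position composition (mol%) per fatty acid.

    Uses beta = 3 x TAG - 2 x alpha-MAG, since the whole-TAG value is the
    mean of two alpha positions and one beta position.
    """
    return {fa: 3 * tag[fa] - 2 * alpha_mag.get(fa, 0.0) for fa in tag}


# Hypothetical mol% values, for illustration only (not from the study).
tag = {"16:0": 30.0, "18:1": 50.0, "18:2": 20.0}
alpha_mag = {"16:0": 40.0, "18:1": 45.0, "18:2": 15.0}

beta = beta_position_composition(tag, alpha_mag)
for fatty_acid, percent in beta.items():
    print(f"{fatty_acid}: {percent:.1f} mol% in the beta position")
```

    With these made-up inputs the estimated beta-position values still sum to 100 mol%, which is a quick sanity check on the inputs.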

  20. Chasing migration genes: a brain expressed sequence tag resource for summer and migratory monarch butterflies (Danaus plexippus).

    PubMed

    Zhu, Haisun; Casselman, Amy; Reppert, Steven M

    2008-01-01

    North American monarch butterflies (Danaus plexippus) undergo a spectacular fall migration. In contrast to summer butterflies, migrants are juvenile hormone (JH) deficient, which leads to reproductive diapause and increased longevity. Migrants also utilize time-compensated sun compass orientation to help them navigate to their overwintering grounds. Here, we describe a brain expressed sequence tag (EST) resource to identify genes involved in migratory behaviors. A brain EST library was constructed from summer and migrating butterflies. Of 9,484 unique sequences, 6,068 had positive hits with the non-redundant protein database; the EST database likely represents approximately 52% of the gene-encoding potential of the monarch genome. The brain transcriptome was cataloged using Gene Ontology and compared to Drosophila. Monarch genes were well represented, including those implicated in behavior. Three genes involved in increased JH activity (allatotropin, juvenile hormone acid methyltransferase, and takeout) were upregulated in summer butterflies, compared to migrants. The locomotion-relevant turtle gene was marginally upregulated in migrants, while the foraging and single-minded genes were not differentially regulated. Many of the genes important for the monarch circadian clock mechanism (involved in sun compass orientation) were in the EST resource, including the newly identified cryptochrome 2. The EST database also revealed a novel Na+/K+ ATPase allele predicted to be more resistant to the toxic effects of milkweed than that reported previously. Potential genetic markers were identified from 3,486 EST contigs and included 1,599 double-hit single nucleotide polymorphisms (SNPs) and 98 microsatellite polymorphisms. These data provide a template of the brain transcriptome for the monarch butterfly. Our "snap-shot" analysis of the differential regulation of candidate genes between summer and migratory butterflies suggests that unbiased, comprehensive transcriptional

  1. Identification of Disulfide Bonds in Protein Proteolytic Degradation Products Using de Novo-Protein Unique Sequence Tags Approach

    SciTech Connect

    Shen, Yufeng; Tolic, Nikola; Purvine, Samuel O.; Smith, Richard D.

    2010-08-01

    Disulfide bonds are a form of posttranslational modification that often determines protein structure(s) and function(s). In this work, we report a mass spectrometry method for identification of disulfides in degradation products of proteins, and specifically endogenous peptides in the human blood plasma peptidome. LC-Fourier transform tandem mass spectrometry (FT MS/MS) was used for acquiring mass spectra that were de novo sequenced and then searched against the IPI human protein database. Through the use of unique sequence tags (UStags) we unambiguously correlated the spectra to specific database proteins. Examination of the UStags’ prefix and/or suffix sequences that contain cysteine(s) in conjunction with sequences of the UStags-specified database proteins is shown to enable the unambiguous determination of disulfide bonds. Using this method, we identified the intermolecular and intramolecular disulfides in human blood plasma peptidome peptides that have molecular weights of up to ~10 kDa.

  2. Identification of disulfide bonds in protein proteolytic degradation products using de novo-protein unique sequence tags approach.

    PubMed

    Shen, Yufeng; Tolić, Nikola; Purvine, Samuel O; Smith, Richard D

    2010-08-01

    Disulfide bonds are a form of post-translational modification that often determines protein structure(s) and function(s). In this work, we report a mass spectrometry method for identification of disulfides in degradation products of proteins, specifically endogenous peptides in the human blood plasma peptidome. LC-Fourier transform tandem mass spectrometry (FT MS/MS) was used for acquiring mass spectra that were de novo sequenced and then searched against the IPI human protein database. Through the use of unique sequence tags (UStags), we unambiguously correlated the spectra to specific database proteins. Examination of the UStags' prefix and/or suffix sequences that contain cysteine(s) in conjunction with sequences of the UStags-specified database proteins is shown to enable the unambiguous determination of disulfide bonds. Using this method, we identified the intermolecular and intramolecular disulfides in human blood plasma peptidome peptides that have molecular weights of up to approximately 10 kDa. PMID:20590115

  3. Expressed sequence tags (ESTs) from immune tissues of turbot (Scophthalmus maximus) challenged with pathogens

    PubMed Central

    Pardo, Belén G; Fernández, Carlos; Millán, Adrián; Bouza, Carmen; Vázquez-López, Araceli; Vera, Manuel; Alvarez-Dios, José A; Calaza, Manuel; Gómez-Tato, Antonio; Vázquez, María; Cabaleiro, Santiago; Magariños, Beatriz; Lemos, Manuel L; Leiro, José M; Martínez, Paulino

    2008-01-01

    Background: The turbot (Scophthalmus maximus; Scophthalmidae; Pleuronectiformes) is a flatfish species of great relevance for marine aquaculture in Europe. In contrast to other cultured flatfish, very few genomic resources are available in this species. Aeromonas salmonicida and Philasterides dicentrarchi are two pathogens that affect turbot culture causing serious economic losses to the turbot industry. Little is known about the molecular mechanisms for disease resistance and host-pathogen interactions in this species. In this work, thousands of ESTs for functional genomic studies and potential markers linked to ESTs for mapping (microsatellites and single nucleotide polymorphisms (SNPs)) are provided. This information enabled us to obtain a preliminary view of regulated genes in response to these pathogens and it constitutes the basis for subsequent and more accurate microarray analysis. Results: A total of 12584 cDNAs partially sequenced from three different cDNA libraries of turbot (Scophthalmus maximus) infected with Aeromonas salmonicida, Philasterides dicentrarchi and from healthy fish were analyzed. Three immune-relevant tissues (liver, spleen and head kidney) were sampled at several time points in the infection process for library construction. The sequences were processed into 9256 high-quality sequences, which constituted the source for the turbot EST database. Clustering and assembly of these sequences revealed 3482 different putative transcripts: 1073 contigs and 2409 singletons. BLAST searches with public databases detected significant similarity (e-value ≤ 1e-5) in 1766 (50.7%) sequences and 816 of them (23.4%) could be functionally annotated. Two hundred three of these genes (24.9%), encoding for defence/immune-related proteins, were mostly identified for the first time in turbot. Some ESTs showed significant differences in the number of transcripts when comparing the three libraries, suggesting regulation in response to these pathogens. A total of

  4. DSAP: deep-sequencing small RNA analysis pipeline.

    PubMed

    Huang, Po-Jung; Liu, Yi-Chung; Lee, Chi-Ching; Lin, Wei-Chen; Gan, Richie Ruei-Chi; Lyu, Ping-Chiang; Tang, Petrus

    2010-07-01

    DSAP is an automated multiple-task web service designed to provide a total solution to analyzing deep-sequencing small RNA datasets generated by next-generation sequencing technology. DSAP uses a tab-delimited file as an input format, which holds the unique sequence reads (tags) and their corresponding number of copies generated by the Solexa sequencing platform. The input data will go through four analysis steps in DSAP: (i) cleanup: removal of adaptors and poly-A/T/C/G/N nucleotides; (ii) clustering: grouping of cleaned sequence tags into unique sequence clusters; (iii) non-coding RNA (ncRNA) matching: sequence homology mapping against a transcribed sequence library from the ncRNA database Rfam (http://rfam.sanger.ac.uk/); and (iv) known miRNA matching: detection of known miRNAs in miRBase (http://www.mirbase.org/) based on sequence homology. The expression levels corresponding to matched ncRNAs and miRNAs are summarized in multi-color clickable bar charts linked to external databases. DSAP is also capable of displaying miRNA expression levels from different jobs using a log(2)-scaled color matrix. Furthermore, a cross-species comparative function is also provided to show the distribution of identified miRNAs in different species as deposited in miRBase. DSAP is available at http://dsap.cgu.edu.tw. PMID:20478825
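
    The first two DSAP steps (cleanup, then clustering of tags with their copy numbers) can be illustrated with a minimal Python sketch. This is not DSAP's code: the adapter string, the 8-base seed, the minimum length, and the function names are all assumptions chosen for illustration, and the homopolymer/N filter is a simplification of the cleanup the abstract describes.

```python
import re
from collections import Counter

# Hypothetical 3' adapter sequence; NOT taken from DSAP.
ADAPTER = "TCGTATGCCGTCTTCTGCTTG"


def parse_tab_delimited(text):
    """Yield (sequence, copies) pairs from 'sequence<TAB>count' lines."""
    for line in text.splitlines():
        if line.strip():
            seq, copies = line.split("\t")
            yield seq, int(copies)


def clean_tag(seq, adapter=ADAPTER, min_len=15):
    """Trim the adapter; return None for short, homopolymer, or N-containing tags."""
    seq = seq.upper()
    idx = seq.find(adapter[:8])  # assumed seed: first 8 adapter bases
    if idx != -1:
        seq = seq[:idx]
    if len(seq) < min_len:  # assumed minimum tag length
        return None
    if "N" in seq or re.fullmatch(r"A+|C+|G+|T+", seq):
        return None
    return seq


def cluster_tags(rows):
    """Group cleaned tags into unique sequences, summing their copy numbers."""
    counts = Counter()
    for seq, copies in rows:
        cleaned = clean_tag(seq)
        if cleaned is not None:
            counts[cleaned] += copies
    return counts
```

    Calling cluster_tags(parse_tab_delimited(text)) on the input file's contents would give one entry per unique cleaned tag, which is the form the later matching steps (Rfam, miRBase) would consume.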

  5. Development and Characterization of 1,827 Expressed Sequence Tag-Derived Simple Sequence Repeat Markers for Ramie (Boehmeria nivea L. Gaud)

    PubMed Central

    Liu, Touming; Zhu, Siyuan; Fu, Lili; Tang, Qingming; Yu, Yongting; Chen, Ping; Luan, Mingbao; Wang, Changbiao; Tang, Shouwei

    2013-01-01

    Ramie (Boehmeria nivea L. Gaud) is one of the most important natural fiber crops, and improvement of fiber yield and quality is the main goal in efforts to breed superior cultivars. However, efforts aimed at enhancing the understanding of ramie genetics and developing more effective breeding strategies have been hampered by the shortage of simple sequence repeat (SSR) markers. In our previous study, we had assembled de novo 43,990 expressed sequence tags (ESTs). In the present study, we searched these previously assembled ESTs for SSRs and identified 1,685 ESTs (3.83%) containing 1,878 SSRs. Next, we designed 1,827 primer pairs complementary to regions flanking these SSRs, and these regions were designated as SSR markers. Among these markers, dinucleotide and trinucleotide repeat motifs were the most abundant types (36.4% and 36.3%, respectively), whereas tetranucleotide, pentanucleotide, and hexanucleotide motifs represented <10% of the markers. The motif AG/CT was the most abundant, accounting for 28.74% of the markers. One hundred EST-SSR markers (97 SSRs located in genes encoding transcription factors and 3 SSRs in genes encoding cellulose synthases) were amplified using polymerase chain reaction for detecting 24 ramie varieties. Of these 100 markers, 98 markers were successfully amplified and 81 markers were polymorphic, with 2–6 alleles among the 24 varieties. Analysis of the genetic diversity of all 24 varieties revealed similarity coefficients that ranged from 0.51 to 0.80. The EST-SSRs developed in this study represent the first large-scale development of SSR markers for ramie. These SSR markers could be used for development of genetic and physical maps, quantitative trait loci mapping, genetic diversity studies, association mapping, and cultivar fingerprinting. PMID:23565230

  6. Sequence analysis of diacylglycerol acyltransferases

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Diacylglycerol acyltransferases (DGATs) catalyze the final step of triacylglycerol (TAG) biosynthesis in eukaryotes. DGATs esterify sn-1,2-diacylglycerol with a long-chain fatty acyl-CoA. Plants and animals deficient in DGATs accumulate less TAG and over-expression of DGATs increases TAG. DGAT knock...

  7. Existence of microsatellites in expressed sequence tags of common carp ( Cyprinus carpio L.) available in GenBank dbEST database

    NASA Astrophysics Data System (ADS)

    Jingjie, Hu; Xiaolong, Wang; Xiaoli, Hu; Zhenmin, Bao

    2006-01-01

    Common carp expressed sequence tags (ESTs) were analyzed for the existence of microsatellites, or simple sequence repeats (SSRs). In the NCBI dbEST database, a total of 10612 sequences were registered before December 31, 2004. A complete search of 2-6 nucleotide microsatellites resulted in the identification of 513 SSR-containing ESTs, accounting for 4.8% of the total. Cluster analysis indicated that 73 sequences of SSR-containing ESTs fell into 27 groups and the remaining 440 ESTs were independent. A total of 467 unique SSR-containing ESTs were identified. These EST-SSRs contained a variety of simple sequence types, and di- and tri-nucleotide repeats were the most abundant, accounting for 42.1% and 27.9% of the whole, respectively. Of the dinucleotide repeats, CA/TG was the most abundant, followed by GA/TC. BLASTx search showed that 38.1% of the SSR loci could be associated with genes or proteins of known or unknown function. BLASTx searches of SSR-containing ESTs also showed high frequencies (98/179) of hits on zebrafish sequences.
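
    A search for 2-6 nucleotide microsatellites of the kind described here can be sketched with regular-expression backreferences. The minimum repeat counts below are assumed thresholds (the abstract does not state the ones used), and find_ssrs is a hypothetical helper, not the authors' tool.

```python
import re

# Minimum number of tandem repeats per motif length.
# Assumed thresholds for illustration; the abstract gives none.
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4, 6: 4}


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for k, n in MIN_REPEATS.items():
        # A k-base motif followed by at least n-1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are copies of a shorter unit
            # (homopolymers, or "CACA" which is really a "CA" repeat).
            if any(motif == motif[:d] * (k // d)
                   for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)
```

    An EST would count as SSR-containing when find_ssrs returns a non-empty list; the reported 4.8% corresponds to 513 such ESTs out of 10612.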

  8. Existence of microsatellites in expressed sequence tags of common carp ( Cyprinus carpio L.) available in GenBank dbEST database

    NASA Astrophysics Data System (ADS)

    Hu, Jingjie; Wang, Xiaolong; Hu, Xiaoli; Bao, Zhenmin

    2006-01-01

    Common carp expressed sequence tags (ESTs) were analyzed for the existence of microsatellites, or simple sequence repeats (SSRs). In the NCBI dbEST database, a total of 10612 sequences were registered before December 31, 2004. A complete search of 2-6 nucleotide microsatellites resulted in the identification of 513 SSR-containing ESTs, accounting for 4.8% of the total. Cluster analysis indicated that 73 sequences of SSR-containing ESTs fell into 27 groups and the remaining 440 ESTs were independent. A total of 467 unique SSR-containing ESTs were identified. These EST-SSRs contained a variety of simple sequence types, and di- and tri-nucleotide repeats were the most abundant, accounting for 42.1% and 27.9% of the whole, respectively. Of the dinucleotide repeats, CA/TG was the most abundant, followed by GA/TC. BLASTx search showed that 38.1% of the SSR loci could be associated with genes or proteins of known or unknown function. BLASTx searches of SSR-containing ESTs also showed high frequencies (98/179) of hits on zebrafish sequences.

  9. Gene expression profiling of coelomic cells and discovery of immune-related genes in the earthworm, Eisenia andrei, using expressed sequence tags.

    PubMed

    Tak, Eun Sik; Cho, Sung-Jin; Park, Soon Cheol

    2015-01-01

    The coelomic cells of the earthworm consist of leukocytes, chlorogocytes, and coelomocytes, which play an important role in innate immunity reactions. To gain insight into the expression profiles of coelomic cells of the earthworm, Eisenia andrei, we analyzed 1151 expressed sequence tags (ESTs) derived from the cDNA library of the coelomic cells. Among the 1151 ESTs analyzed, 493 ESTs (42.8%) showed a significant similarity to known genes and represented 164 unique genes, of which 93 ESTs were singletons and 71 ESTs manifested as two or more ESTs. From the 164 unique genes sequenced, we found 24 immune-related and cell defense genes. Furthermore, real-time PCR analysis showed that levels of lysenin-related proteins mRNA in coelomic cells of E. andrei were upregulated after the injection of Bacillus subtilis bacteria. This EST data-set would provide a valuable resource for future research on the earthworm immune system. PMID:25496401

  10. Miniaturised wireless smart tag for optical chemical analysis applications.

    PubMed

    Steinberg, Matthew D; Kassal, Petar; Tkalčec, Biserka; Murković Steinberg, Ivana

    2014-01-01

    A novel miniaturised photometer has been developed as an ultra-portable and mobile analytical chemical instrument. The low-cost photometer presents a paradigm shift in mobile chemical sensor instrumentation because it is built around a contactless smart card format. The photometer tag is based on the radio-frequency identification (RFID) smart card system, which provides short-range wireless data and power transfer between the photometer and a proximal reader, and which allows the reader to also energise the photometer by near field electromagnetic induction. RFID is set to become a key enabling technology of the Internet-of-Things (IoT), hence devices such as the photometer described here will enable numerous mobile, wearable and vanguard chemical sensing applications in the emerging connected world. In the work presented here, we demonstrate the characterisation of a low-power RFID wireless sensor tag with an LED/photodiode-based photometric input. The performance of the wireless photometer has been tested through two different model analytical applications. The first is photometry in solution, where colour intensity as a function of dye concentration was measured. The second is an ion-selective optode system in which potassium ion concentrations were determined by using previously well characterised bulk optode membranes. The analytical performance of the wireless photometer smart tag is clearly demonstrated by these optical absorption-based analytical experiments, with excellent data agreement to a reference laboratory instrument. PMID:24274311

  11. Analytic signal phase-based myocardial motion estimation in tagged MRI sequences by a bilinear model and motion compensation.

    PubMed

    Wang, Liang; Basarab, Adrian; Girard, Patrick R; Croisille, Pierre; Clarysse, Patrick; Delachartre, Philippe

    2015-08-01

    Different mathematical tools, such as multidimensional analytic signals, allow for the calculation of 2D spatial phases of real-value images. The motion estimation method proposed in this paper is based on two spatial phases of the 2D analytic signal applied to cardiac sequences. By combining the information of these phases issued from analytic signals of two successive frames, we propose an analytical estimator for 2D local displacements. To improve the accuracy of the motion estimation, a local bilinear deformation model is used within an iterative estimation scheme. The main advantages of our method are: (1) The phase-based method allows the displacement to be estimated with subpixel accuracy and is robust to image intensity variation in time; (2) Preliminary filtering is not required due to the bilinear model. The proposed algorithm, integrating phase-based optical flow motion estimation and the combination of global motion compensation with local bilinear transform, allows spatio-temporal cardiac motion analysis, e.g. strain and dense trajectory estimation over the cardiac cycle. Results from 7 realistic simulated tagged magnetic resonance imaging (MRI) sequences show that our method is more accurate compared with state-of-the-art method for cardiac motion analysis and with another differential approach from the literature. The motion estimation errors (end point error) of the proposed method are reduced by about 33% compared with that of the two methods. In our work, the frame-to-frame displacements are further accumulated in time, to allow for the calculation of myocardial Lagrangian cardiac strains and point trajectories. Indeed, from the estimated trajectories in time on 11 in vivo data sets (9 patients and 2 healthy volunteers), the shape of myocardial point trajectories belonging to pathological regions are clearly reduced in magnitude compared with the ones from normal regions. Myocardial point trajectories, estimated from our phase-based analytic

  12. Exploring the Structure of Library and Information Science Web Space Based on Multivariate Analysis of Social Tags

    ERIC Educational Resources Information Center

    Joo, Soohyung; Kipp, Margaret E. I.

    2015-01-01

    Introduction: This study examines the structure of Web space in the field of library and information science using multivariate analysis of social tags from the Website, Delicious.com. A few studies have examined mathematical modelling of tags, mainly examining tagging in terms of tripartite graphs, pattern tracing and descriptive statistics. This…

  13. Image analysis for DNA sequencing

    NASA Astrophysics Data System (ADS)

    Palaniappan, Kannappan; Huang, Thomas S.

    1991-07-01

    There is a great deal of interest in automating the process of DNA (deoxyribonucleic acid) sequencing to support the analysis of genomic DNA such as the Human and Mouse Genome projects. In one class of gel-based sequencing protocols autoradiograph images are generated in the final step and usually require manual interpretation to reconstruct the DNA sequence represented by the image. The need to handle a large volume of sequence information necessitates automation of the manual autoradiograph reading step through image analysis in order to reduce the length of time required to obtain sequence data and reduce transcription errors. Various adaptive image enhancement, segmentation and alignment methods were applied to autoradiograph images. The methods are adaptive to the local characteristics of the image such as noise, background signal, or presence of edges. Once the two-dimensional data is converted to a set of aligned one-dimensional profiles waveform analysis is used to determine the location of each band which represents one nucleotide in the sequence. Different classification strategies including a rule-based approach are investigated to map the profile signals, augmented with the original two-dimensional image data as necessary, to textual DNA sequence information.

  14. Evidence from sequence-tagged-site markers of a recent progenitor-derivative species pair in conifers

    PubMed Central

    Perron, Martin; Perry, Daniel J.; Andalo, Christophe; Bousquet, Jean

    2000-01-01

    Black spruce (Picea mariana [B.S.P.] Mill.) and red spruce (Picea rubens Sarg.) are two conifer species known to hybridize naturally in northeastern North America. We hypothesized that there is a progenitor-derivative relationship between these two taxa and conducted a genetic investigation by using sequence-tagged-site markers of expressed genes. Based on the 26 sequence-tagged-site loci assayed in this study, the unbiased genetic identity between the two taxa was quite high with a value of 0.920. The mean number of polymorphic loci, the mean number of alleles per polymorphic locus, and the average observed heterozygosity were lower in red spruce (P = 35%, AP = 2.1, Ho = 0.069) than in black spruce (P = 54%, AP = 2.9, Ho = 0.103). No unique alleles were found in red spruce, and the observed patterns of allele distribution indicated that the genetic diversity of red spruce was essentially a subset of that found in black spruce. When considered in combination with ecological evidence and simulation results, these observations clearly support the existence of a progenitor-derivative relationship and suggest that the reduced level of genetic diversity in red spruce may result from allopatric speciation through glaciation-induced isolation of a preexisting black spruce population during the Pleistocene era. Our observations signal a need for a thorough reexamination of several conifer species complexes in which natural hybridization is known to occur. PMID:11016967
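
    The three diversity measures quoted above (P, AP, Ho) can be computed from diploid genotype calls as in the following sketch. The data layout and the function name are assumptions, and a locus is treated as polymorphic as soon as a second allele is observed, whereas the study may have applied a frequency criterion.

```python
def diversity_stats(genotypes):
    """Compute (P, AP, Ho) from diploid genotypes.

    genotypes: dict mapping locus name -> list of (allele, allele) tuples,
               one tuple per individual (hypothetical layout).
    P  : proportion of loci with more than one allele observed
    AP : mean number of alleles per polymorphic locus
    Ho : observed heterozygosity averaged over all loci
    """
    alleles_at_polymorphic = []
    het_per_locus = []
    for calls in genotypes.values():
        alleles = {a for pair in calls for a in pair}
        if len(alleles) > 1:
            alleles_at_polymorphic.append(len(alleles))
        het_per_locus.append(sum(a != b for a, b in calls) / len(calls))
    n_loci = len(genotypes)
    p = len(alleles_at_polymorphic) / n_loci
    ap = (sum(alleles_at_polymorphic) / len(alleles_at_polymorphic)
          if alleles_at_polymorphic else 0.0)
    ho = sum(het_per_locus) / n_loci
    return p, ap, ho
```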

  15. Micro- and minisatellite-expressed sequence tag (EST) markers discriminate between populations of Rhipicephalus appendiculatus.

    PubMed

    Kanduma, Esther G; Mwacharo, Joram M; Sunter, Jack D; Nzuki, Inosters; Mwaura, Stephen; Kinyanjui, Peter W; Kibe, Michael; Heyne, Heloise; Hanotte, Olivier; Skilton, Robert A; Bishop, Richard P

    2012-06-01

    Biological differences, including vector competence for the protozoan parasite Theileria parva have been reported among populations of Rhipicephalus appendiculatus (Acari: Ixodidae) from different geographic regions. However, the genetic diversity and population structure of this important tick vector remain unknown due to the absence of appropriate genetic markers. Here, we describe the development and evaluation of a panel of EST micro- and minisatellite markers to characterize the genetic diversity within and between populations of R. appendiculatus and other rhipicephaline species. Sixty-six micro- and minisatellite markers were identified through analysis of the R. appendiculatus Gene Index (RaGI) EST database and selected bacterial artificial chromosome (BAC) sequences. These were used to genotype 979 individual ticks from 10 field populations, 10 laboratory-bred stocks, and 5 additional Rhipicephalus species. Twenty-nine markers were polymorphic and therefore informative for genetic studies while 6 were monomorphic. Primers designed from the remaining 31 loci did not reliably generate amplicons. The 29 polymorphic markers discriminated populations of R. appendiculatus and also 4 other Rhipicephalus species, but not R. zambeziensis. The percentage Principal Component Analysis (PCA) implemented using Multiple Co-inertia Analysis (MCoA) clustered populations of R. appendiculatus into 2 groups. Individual markers however differed in their ability to generate the reference typology using the MCoA approach. This indicates that different panels of markers may be required for different applications. The 29 informative polymorphic micro- and minisatellite markers are the first available tools for the analysis of the phylogeography and population genetics of R. appendiculatus. PMID:22789728

  16. SNP discovery using Paired-End RAD-tag sequencing on pooled genomic DNA of Sisymbrium austriacum (Brassicaceae).

    PubMed

    Vandepitte, K; Honnay, O; Mergeay, J; Breyne, P; Roldán-Ruiz, I; De Meyer, T

    2013-03-01

    Single nucleotide polymorphisms (SNPs) are rapidly replacing anonymous markers in population genomic studies, but their use in non-model organisms is hampered by the scarcity of cost-effective approaches to uncover genome-wide variation in a comprehensive subset of individuals. The screening of one or only a few individuals induces ascertainment bias. To discover SNPs for a population genomic study of the Pyrenean rocket (Sisymbrium austriacum subsp. chrysanthum), we undertook a pooled RAD-PE (Restriction site Associated DNA Paired-End sequencing) approach. RAD tags were generated from the PstI-digested pooled genomic DNA of 12 individuals sampled across the species distribution range and paired-end sequenced using Illumina technology to produce ~24.5 Mb of sequences, covering ~7% of the species' genome. Sequences were assembled into ~76 000 contigs with a mean length of 323 bp (N(50) = 357 bp, sequencing depth = 24x). In all, >15 000 SNPs were called, of which 47% were annotated in putative genic regions based on homology with the Arabidopsis thaliana genome. Gene ontology (GO) slim categorization demonstrated that the identified SNPs covered extant genic variation well. The validation of 300 SNPs on a larger set of individuals using a KASPar assay underpinned the utility of pooled RAD-PE as an inexpensive genome-wide SNP discovery technique (success rate: 87%). In addition to SNPs, we discovered >600 putative SSR markers. PMID:23231662
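
    The N(50) statistic reported for the contig assembly is the length of the contig at which the cumulative length, counted from the longest contig downward, first reaches half of the total assembly length. A minimal sketch follows; the function name is an assumption and the code is not from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no contig lengths given")
```

    Because N50 weights long contigs more heavily than the mean does, an N50 (357 bp) above the mean length (323 bp) is the expected pattern.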

  17. Precipitation recycling in West Africa - regional modeling, evaporation tagging and atmospheric water budget analysis

    NASA Astrophysics Data System (ADS)

    Arnault, Joel; Kunstmann, Harald; Knoche, Hans-Richard

    2015-04-01

    Many numerical studies have shown that the West African monsoon is highly sensitive to the state of the land surface. It is however questionable to which extent a local change of land surface properties would affect the local climate, especially with respect to precipitation. This issue is traditionally addressed with the concept of precipitation recycling, defined as the contribution of local surface evaporation to local precipitation. For this study the West African monsoon has been simulated with the Weather Research and Forecasting (WRF) model using explicit convection, for the domain (1°S-21°N, 18°W-14°E) at a spatial resolution of 10 km, for the period January-October 2013, and using ERA-Interim reanalyses as driving data. This WRF configuration has been selected for its ability to simulate monthly precipitation amounts and daily histograms close to TRMM (Tropical Rainfall Measuring Mission) data. In order to investigate precipitation recycling in this WRF simulation, surface evaporation tagging has been implemented in the WRF source code as well as the budget of total and tagged atmospheric water. Surface evaporation tagging consists in duplicating all water species and the respective prognostic equations in the source code. Then, tagged water species are set to zero at the lateral boundaries of the simulated domain (no inflow of tagged water vapor), and tagged surface evaporation is considered only in a specified region. All the source terms of the prognostic equations of total and tagged water species are finally saved in the outputs for the budget analysis. This allows quantifying the respective contribution of total and tagged atmospheric water to atmospheric precipitation processes. The WRF simulation with surface evaporation tagging and budgets has been conducted two times, first with a 100 km2 tagged region (11-12°N, 1-2°W), and second with a 1000 km2 tagged region (7-16°N, 6°W -3°E). In this presentation we will investigate hydro

  18. FAST: FAST Analysis of Sequences Toolbox.

    PubMed

    Lawrence, Travis J; Kauffman, Kyle T; Amrine, Katherine C H; Carper, Dana L; Lee, Raymond S; Becich, Peter J; Canales, Claudia J; Ardell, David H

    2015-01-01

    FAST (FAST Analysis of Sequences Toolbox) provides simple, powerful open source command-line tools to filter, transform, annotate and analyze biological sequence data. Modeled after the GNU (GNU's Not Unix) Textutils such as grep, cut, and tr, FAST tools such as fasgrep, fascut, and fastr make it easy to rapidly prototype expressive bioinformatic workflows in a compact and generic command vocabulary. Compact combinatorial encoding of data workflows with FAST commands can simplify the documentation and reproducibility of bioinformatic protocols, supporting better transparency in biological data science. Interface self-consistency and conformity with conventions of GNU, Matlab, Perl, BioPerl, R, and GenBank help make FAST easy and rewarding to learn. FAST automates numerical, taxonomic, and text-based sorting, selection and transformation of sequence records and alignment sites based on content, index ranges, descriptive tags, annotated features, and in-line calculated analytics, including composition and codon usage. Automated content- and feature-based extraction of sites and support for molecular population genetic statistics make FAST useful for molecular evolutionary analysis. FAST is portable, easy to install and secure thanks to the relative maturity of its Perl and BioPerl foundations, with stable releases posted to CPAN. Development as well as a publicly accessible Cookbook and Wiki are available on the FAST GitHub repository at https://github.com/tlawrence3/FAST. The default data exchange format in FAST is Multi-FastA (specifically, a restriction of BioPerl FastA format). Sanger and Illumina 1.8+ FastQ formatted files are also supported. FAST makes it easier for non-programmer biologists to interactively investigate and control biological data at the speed of thought. PMID:26042145

  19. FAST: FAST Analysis of Sequences Toolbox

    PubMed Central

    Lawrence, Travis J.; Kauffman, Kyle T.; Amrine, Katherine C. H.; Carper, Dana L.; Lee, Raymond S.; Becich, Peter J.; Canales, Claudia J.; Ardell, David H.

    2015-01-01

    FAST (FAST Analysis of Sequences Toolbox) provides simple, powerful open source command-line tools to filter, transform, annotate and analyze biological sequence data. Modeled after the GNU (GNU's Not Unix) Textutils such as grep, cut, and tr, FAST tools such as fasgrep, fascut, and fastr make it easy to rapidly prototype expressive bioinformatic workflows in a compact and generic command vocabulary. Compact combinatorial encoding of data workflows with FAST commands can simplify the documentation and reproducibility of bioinformatic protocols, supporting better transparency in biological data science. Interface self-consistency and conformity with conventions of GNU, Matlab, Perl, BioPerl, R, and GenBank help make FAST easy and rewarding to learn. FAST automates numerical, taxonomic, and text-based sorting, selection and transformation of sequence records and alignment sites based on content, index ranges, descriptive tags, annotated features, and in-line calculated analytics, including composition and codon usage. Automated content- and feature-based extraction of sites and support for molecular population genetic statistics make FAST useful for molecular evolutionary analysis. FAST is portable, easy to install and secure thanks to the relative maturity of its Perl and BioPerl foundations, with stable releases posted to CPAN. Development as well as a publicly accessible Cookbook and Wiki are available on the FAST GitHub repository at https://github.com/tlawrence3/FAST. The default data exchange format in FAST is Multi-FastA (specifically, a restriction of BioPerl FastA format). Sanger and Illumina 1.8+ FastQ formatted files are also supported. FAST makes it easier for non-programmer biologists to interactively investigate and control biological data at the speed of thought. PMID:26042145

  20. Protein identities from 'Graphocephala atropunctata' expressed sequence tags: Expanding leafhopper vector biology

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Heat shock proteins and 44 protein sequences from the blue-green sharpshooter, BGSS, were produced and identified. The sequences were submitted and published under accession numbers: DQ445499-DQ445542, in the National Center for Biotechnology Information, NCBI, Public Database. The blue-green sharps...

  1. Ribosomal proteins and expressed sequence tags from Lysiphlebus testaceipes (Hymenoptera: Aphidiidae)

    Technology Transfer Automated Retrieval System (TEKTRAN)

    A dataset containing 101 putative ribosomal protein (RP) sequences is provided for the aphid parasitoid, Lysiphlebus testaceipes. These data were obtained as a subset from a cDNA library constructed from adult L. testaceipes, and represent one of the largest complete sets of cytoplasmic RP sequence...

  2. Transient Analysis Generator /TAG/ simulates behavior of large class of electrical networks

    NASA Technical Reports Server (NTRS)

    Thomas, W. J.

    1967-01-01

    Transient Analysis Generator program simulates both transient and dc steady-state behavior of a large class of electrical networks. It generates a special analysis program for each circuit described in an easily understood and manipulated programming language. A generator or preprocessor and a simulation system make up the TAG system.

  3. Statistical analysis of nucleotide sequences.

    PubMed Central

    Stückle, E E; Emmrich, C; Grob, U; Nielsen, P J

    1990-01-01

    In order to scan nucleic acid databases for potentially relevant but as yet unknown signals, we have developed an improved statistical model for pattern analysis of nucleic acid sequences by modifying previous methods based on Markov chains. We demonstrate the importance of selecting the appropriate parameters in order for the method to function at all. The model allows the simultaneous analysis of several short sequences with unequal base frequencies and Markov order k not equal to 0 as is usually the case in databases. As a test of these modifications, we show that in E. coli sequences there is a bias against palindromic hexamers which correspond to known restriction enzyme recognition sites. PMID:2251125
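
    The idea of comparing observed word counts with counts expected under a Markov model can be sketched as follows. This is a plain first-order (k = 1) model fitted to a single sequence, not the authors' modified method, and all function names are assumptions. is_palindrome tests the reverse-complement symmetry typical of restriction enzyme recognition sites.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def is_palindrome(word):
    """True if the word equals its own reverse complement (e.g. GAATTC)."""
    return word == word.translate(COMPLEMENT)[::-1]


def observed_count(seq, word):
    """Count overlapping occurrences of word in seq."""
    w = len(word)
    return sum(1 for i in range(len(seq) - w + 1) if seq[i:i + w] == word)


def markov1_expected(seq, word):
    """Expected count of word under a first-order Markov model fitted to seq.

    P(word) = P(w1) * product of P(w[i+1] | w[i]), with probabilities
    estimated from mono- and dinucleotide counts in seq itself.
    """
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    p = mono[word[0]] / len(seq)
    for a, b in zip(word, word[1:]):
        starts = sum(di[a + x] for x in "ACGT")
        if starts == 0:
            return 0.0
        p *= di[a + b] / starts
    return p * (len(seq) - len(word) + 1)
```

    A ratio observed_count / markov1_expected well below 1 for a palindromic hexamer would indicate the kind of bias the abstract reports, although a real analysis would need a significance test on top of this ratio.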

  4. [Multilocus sequence typing (MLST) analysis].

    PubMed

    Matsumura, Yasufumi

    2013-12-01

    Multilocus sequence typing (MLST) analysis has been emerging as a powerful tool for genotyping specific bacterial species. MLST utilizes internal fragments of multiple housekeeping genes and the combination of each allele defines the sequence type for each isolate. MLST databases contain reference data and are freely accessible via internet websites. The standard method for investigating short-term hospital outbreaks is still pulsed-field gel electrophoresis and MLST analysis is not a substitute. However, analysis of sequence types and clonal complexes (closely related sequence types) enables identification and understanding of a specific clone that is widely spreading among drug-resistant organisms, or a key clone that is important for evolution of the organism. In the case of Escherichia coli, CTX-M-15 or CTX-M-14 extended-spectrum beta-lactamase producing ST131 clone has emerged and spread globally in the last 10 years. MLST analysis is an unambiguous procedure and is becoming a common typing method to characterize isolates. PMID:24605545
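
    How a combination of alleles defines a sequence type can be shown with a small lookup sketch. The locus names, allele numbers, and ST numbers below are invented for illustration and do not come from any real MLST database; treating sequence types that differ at no more than one locus as "closely related" is likewise an assumption.

```python
# Hypothetical three-locus scheme; real schemes typically use more loci.
LOCI = ("locA", "locB", "locC")

# Hypothetical reference table: allelic profile -> sequence type (ST).
PROFILES = {
    (1, 1, 1): 1,
    (1, 2, 1): 2,
    (3, 2, 1): 3,
    (4, 4, 4): 4,
}


def sequence_type(alleles):
    """Return the ST for a dict of locus -> allele number, or None if unknown."""
    key = tuple(alleles[locus] for locus in LOCI)
    return PROFILES.get(key)


def related_sts(founder_st, max_diff=1):
    """List STs whose profiles differ from the founder at <= max_diff loci."""
    founder = next(p for p, st in PROFILES.items() if st == founder_st)
    return sorted(
        st for profile, st in PROFILES.items()
        if sum(a != b for a, b in zip(profile, founder)) <= max_diff
    )
```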

  5. Toward a physical map of Drosophila buzzatii. Use of randomly amplified polymorphic DNA polymorphisms and sequence-tagged site landmarks.

    PubMed Central

    Laayouni, H; Santos, M; Fontdevila, A

    2000-01-01

    We present a physical map based on RAPD polymorphic fragments and sequence-tagged sites (STSs) for the repleta group species Drosophila buzzatii. One hundred forty-four RAPD markers have been used as probes for in situ hybridization to the polytene chromosomes, and positive results allowing the precise localization of 108 RAPDs were obtained. Of these, 73 behave as effectively unique markers for physical map construction, and in 9 additional cases the probes gave two hybridization signals, each on a different chromosome. Most markers (68%) are located on chromosomes 2 and 4, which partially agree with previous estimates on the distribution of genetic variation over chromosomes. One RAPD maps close to the proximal breakpoint of inversion 2z(3) but is not included within the inverted fragment. However, it was possible to conclude from this RAPD that the distal breakpoint of 2z(3) had previously been wrongly assigned. A total of 39 cytologically mapped RAPDs were converted to STSs and yielded an aggregate sequence of 28,431 bp. Thirty-six RAPDs (25%) did not produce any detectable hybridization signal, and we obtained the DNA sequence from three of them. Further prospects toward obtaining a more developed genetic map than the one currently available for D. buzzatii are discussed. PMID:11102375

  6. RefNetBuilder: a platform for construction of integrated reference gene regulatory networks from expressed sequence tags

    PubMed Central

    2011-01-01

    Background Gene Regulatory Networks (GRNs) provide integrated views of gene interactions that control biological processes. Many public databases contain biological interactions extracted from experimentally validated literature reports, but most furnish only information for a few genetic model organisms. In order to provide a bioinformatic tool for researchers who work with non-model organisms, we developed RefNetBuilder, a new platform that allows construction of putative reference pathways or GRNs from expressed sequence tags (ESTs). Results RefNetBuilder was designed to have the flexibility to extract and archive pathway or GRN information from public databases such as the Kyoto Encyclopedia of Genes and Genomes (KEGG). It features sequence alignment tools such as BLAST to allow mapping ESTs to pathways and GRNs in model organisms. A scoring algorithm was incorporated to rank and select the best match for each query EST. We validated RefNetBuilder using DNA sequences of Caenorhabditis elegans, a model organism having manually curated KEGG pathways. Using the earthworm Eisenia fetida as an example, we demonstrated the functionalities and features of RefNetBuilder. Conclusions The RefNetBuilder provides a standalone application for building reference GRNs for non-model organisms on a number of operating system platforms with standard desktop computer hardware. As a new bioinformatic tool aimed for constructing putative GRNs for non-model organisms that have only ESTs available, RefNetBuilder is especially useful to explore pathway- or network-related information in these organisms. PMID:22166047

  7. In silico identification of conserved microRNAs and their target transcripts from expressed sequence tags of three earthworm species.

    PubMed

    Gong, Ping; Xie, Fuliang; Zhang, Baohong; Perkins, Edward J

    2010-12-01

    MicroRNAs (miRNAs) are a recently identified class of small regulatory RNAs that target more than 30% of protein-coding genes. Accumulating evidence shows that miRNAs play a critical role in many biological processes, including developmental timing, tissue differentiation, and response to chemical exposure. In this study, we applied a computational approach to analyze expressed sequence tags and identified 32 miRNAs belonging to 22 miRNA families in three earthworm species: Eisenia fetida, Eisenia andrei, and Lumbricus rubellus. These newly identified earthworm miRNAs differ by 2-4 nucleotides from their homologous counterparts in Caenorhabditis elegans. They also share similar features with other known animal miRNAs; for instance, the nucleotide U is dominant in both mature and pre-miRNA sequences, particularly in the first position at the 5' end of mature miRNA sequences. The newly identified earthworm miRNAs putatively regulate mRNAs of genes that are involved in many important biological processes and pathways related to development, growth, locomotion, and reproduction, as well as response to stresses, particularly oxidative stress. Future efforts will focus on experimental validation of their presence and of their target mRNAs to further elucidate their biological functions in earthworms. PMID:21030313

  8. RSAT: regulatory sequence analysis tools.

    PubMed

    Thomas-Chollier, Morgane; Sand, Olivier; Turatsinze, Jean-Valéry; Janky, Rekin's; Defrance, Matthieu; Vervisch, Eric; Brohée, Sylvain; van Helden, Jacques

    2008-07-01

    The regulatory sequence analysis tools (RSAT, http://rsat.ulb.ac.be/rsat/) is a software suite that integrates a wide collection of modular tools for the detection of cis-regulatory elements in genome sequences. The suite includes programs for sequence retrieval, pattern discovery, phylogenetic footprint detection, pattern matching, genome scanning and feature map drawing. Random controls can be performed with random gene selections or by generating random sequences according to a variety of background models (Bernoulli, Markov). Beyond the original word-based pattern-discovery tools (oligo-analysis and dyad-analysis), we recently added a battery of tools for matrix-based detection of cis-acting elements, with some original features (adaptive background models, Markov-chain estimation of P-values) that do not exist in other matrix-based scanning tools. The web server offers an intuitive interface, where each program can be accessed either separately or connected to the other tools. In addition, the tools are now available as web services, enabling their integration in programmatic workflows. Genomes are regularly updated from various genome repositories (NCBI and EnsEMBL) and 682 organisms are currently supported. Since 1998, the tools have been used by several hundred researchers from all over the world. Several predictions made with RSAT were validated experimentally and published. PMID:18495751
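
    To illustrate the idea behind word-based pattern discovery against a Bernoulli background model, the sketch below counts every word of length k in a set of sequences and compares each observed count with the count expected if bases were drawn independently at their overall frequencies. This is not RSAT code and does not reproduce the statistics used by oligo-analysis; the function names and the toy sequence are assumptions made for illustration.

```python
from collections import Counter
from math import prod


def base_frequencies(seqs):
    """Estimate A/C/G/T frequencies from the sequences (Bernoulli model)."""
    counts = Counter()
    for seq in seqs:
        counts.update(seq.upper())
    total = sum(counts[base] for base in "ACGT")
    return {base: counts[base] / total for base in "ACGT"}


def count_words(seqs, k):
    """Count overlapping occurrences of every word of length k."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def expected_count(word, freqs, seqs):
    """Expected occurrences of word under the Bernoulli model.

    Number of possible positions times the product of base frequencies.
    Assumes the word contains only A, C, G and T.
    """
    k = len(word)
    positions = sum(max(len(seq) - k + 1, 0) for seq in seqs)
    return positions * prod(freqs[base] for base in word)


if __name__ == "__main__":
    seqs = ["ACGTACGT"]  # toy sequence, for illustration only
    freqs = base_frequencies(seqs)
    observed = count_words(seqs, 2)
    for word, obs in sorted(observed.items()):
        exp = expected_count(word, freqs, seqs)
        print(word, obs, round(exp, 4), round(obs / exp, 2))
```

    A Markov background model would replace the product of single-base frequencies with transition probabilities estimated from shorter words, which accounts for local sequence composition that a Bernoulli model ignores.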

  9. The Short ITS2 Sequence Serves as an Efficient Taxonomic Sequence Tag in Comparison with the Full-Length ITS

    PubMed Central

    Han, Jianping; Zhu, Yingjie; Chen, Xiaochen; Liao, Baoshen; Yao, Hui; Song, Jingyuan; Chen, Shilin; Meng, Fanyun

    2013-01-01

    An ideal DNA barcoding region should be short enough to be amplified from degraded DNA. In this paper, we discuss the possibility of using a short nuclear DNA sequence as a barcode to identify a wide range of medicinal plant species. First, the PCR and sequencing success rates of ITS and ITS2 were evaluated based entirely on materials from dry medicinal products and herbarium voucher specimens, including some samples collected as long as 90 years ago. The results showed that, using a single primer pair, ITS2 achieved a PCR and sequencing success rate of 91%, whereas ITS achieved only 23%. Second, 12,861 ITS and ITS2 plant sequences were used to compare the identification efficiency of the two regions. Four identification criteria (BLAST, inter- and intradivergence Wilcoxon signed rank tests, and TaxonDNA) were evaluated. Our results supported the hypothesis that ITS2 can be used as a minibarcode to effectively identify species in a wide variety of specimens and medicinal materials. PMID:23484151

  10. Random Tagging Genotyping by Sequencing (rtGBS), an Unbiased Approach to Locate Restriction Enzyme Sites across the Target Genome

    PubMed Central

    Hilario, Elena; Barron, Lorna; Deng, Cecilia H.; Datson, Paul M.; Davy, Marcus W.; Storey, Roy D.

    2015-01-01

    Genotyping by sequencing (GBS) is a restriction-enzyme-based targeted approach developed to reduce genome complexity and discover genetic markers when a priori sequence information is unavailable. Sufficient coverage at each locus is essential to distinguish heterozygous from homozygous sites accurately. The number of GBS samples that can be pooled in one sequencing lane is limited by the number of restriction sites present in the genome and the read depth required at each site per sample for accurate calling of single-nucleotide polymorphisms. Locus bias was observed using a slight modification of the Elshire et al. method: some restriction enzyme sites were represented in higher proportions while others were poorly represented or absent. This bias could be due to the quality of genomic DNA, the endonuclease and ligase reaction efficiency, the distance between restriction sites, the preferential amplification of small library restriction fragments, or bias towards cluster formation of small amplicons during the sequencing process. To overcome these issues, we have developed a GBS method based on randomly tagging genomic DNA (rtGBS). By randomly landing on the genome, we can, with less bias, find restriction sites that are far apart and undetected by the standard GBS (stdGBS) method. The study comprises two types of biological replicates (six different kiwifruit plants and two independent DNA extractions per plant) and three types of technical replicates (four samples of each DNA extraction, stdGBS vs. rtGBS methods, and two independent library amplifications, each sequenced in separate lanes). A statistically significant unbiased distribution of restriction fragment size by rtGBS showed that this method targeted 49% (39,145) of BamH I sites shared with the reference genome, compared to only 14% (11,513) by stdGBS. PMID:26633193
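
    The dependence of GBS on the spacing of restriction sites can be illustrated with a small in silico digest. The sketch below locates BamH I recognition sites (GGATCC, cut after the first G) in a linear sequence and reports the resulting fragment lengths. It is not the authors' pipeline; the function names and the toy sequence are hypothetical, and only the forward strand needs to be scanned because the site is palindromic.

```python
def find_sites(seq, site="GGATCC"):
    """Return 0-based start positions of every occurrence of site in seq."""
    seq = seq.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


def fragment_lengths(seq, site="GGATCC", cut_offset=1):
    """Lengths of fragments from a complete digest of a linear sequence.

    cut_offset is the cut position within the site; BamH I cuts G^GATCC,
    that is, one base after the start of the site.
    """
    cuts = [pos + cut_offset for pos in find_sites(seq, site)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    # Toy sequence with two BamH I sites, for illustration only.
    toy = "AAAA" + "GGATCC" + "TTTTT" + "GGATCC" + "CC"
    print(find_sites(toy))
    print(fragment_lengths(toy))
```

    Applied to a reference genome, the distribution of these fragment lengths gives a rough picture of which sites lie far from their neighbours, the sites that the abstract reports as underrepresented by stdGBS.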