These are representative sample records from Science.gov related to your search topic.
For comprehensive and current results, perform a real-time search at Science.gov.
1

FUNCTIONAL ANNOTATION OF OIL PALM GENES USING AN AUTOMATED BIOINFORMATICS APPROACH FUNCTIONAL ANNOTATION OF OIL PALM  

E-print Network

FUNCTIONAL ANNOTATION OF OIL PALM GENES USING AN AUTOMATED BIOINFORMATICS APPROACH 35 FUNCTIONAL ANNOTATION OF OIL PALM GENES USING AN AUTOMATED BIOINFORMATICS APPROACH LAURA B WILLIS*; PHILIP A LESSARD sequence, we have developed bioinformatics tools that can be run on a desktop computer and save significant

Sinskey, Anthony J.

2

Gene Coexpression Network Analysis as a Source of Functional Annotation for Rice Genes  

PubMed Central

With the existence of large publicly available plant gene expression data sets, many groups have undertaken data analyses to construct gene coexpression networks and functionally annotate genes. Often, a large compendium of unrelated or condition-independent expression data is used to construct gene networks. Condition-dependent expression experiments consisting of well-defined conditions/treatments have also been used to create coexpression networks to help examine particular biological processes. Gene networks derived from either condition-dependent or condition-independent data can be difficult to interpret if a large number of genes and connections are present. However, algorithms exist to identify modules of highly connected and biologically relevant genes within coexpression networks. In this study, we have used publicly available rice (Oryza sativa) gene expression data to create gene coexpression networks using both condition-dependent and condition-independent data and have identified gene modules within these networks using the Weighted Gene Coexpression Network Analysis method. We compared the number of genes assigned to modules and the biological interpretability of gene coexpression modules to assess the utility of condition-dependent and condition-independent gene coexpression networks. For the purpose of providing functional annotation to rice genes, we found that gene modules identified by coexpression analysis of condition-dependent gene expression experiments to be more useful than gene modules identified by analysis of a condition-independent data set. We have incorporated our results into the MSU Rice Genome Annotation Project database as additional expression-based annotation for 13,537 genes, 2,980 of which lack a functional annotation description. These results provide two new types of functional annotation for our database. Genes in modules are now associated with groups of genes that constitute a collective functional annotation of those modules. Additionally, the expression patterns of genes across the treatments/conditions of an expression experiment comprise a second form of useful annotation. PMID:21799793

Childs, Kevin L.; Davidson, Rebecca M.; Buell, C. Robin

2011-01-01

3

Annotation Enrichment Analysis: An Alternative Method for Evaluating the Functional Properties of Gene Sets  

NASA Astrophysics Data System (ADS)

Gene annotation databases (compendiums maintained by the scientific community that describe the biological functions performed by individual genes) are commonly used to evaluate the functional properties of experimentally derived gene sets. Overlap statistics, such as Fishers Exact test (FET), are often employed to assess these associations, but don't account for non-uniformity in the number of genes annotated to individual functions or the number of functions associated with individual genes. We find FET is strongly biased toward over-estimating overlap significance if a gene set has an unusually high number of annotations. To correct for these biases, we develop Annotation Enrichment Analysis (AEA), which properly accounts for the non-uniformity of annotations. We show that AEA is able to identify biologically meaningful functional enrichments that are obscured by numerous false-positive enrichment scores in FET, and we therefore suggest it be used to more accurately assess the biological properties of gene sets.

Glass, Kimberly; Girvan, Michelle

2014-02-01

4

Gene Coexpression Network Analysis as a Source of Functional Annotation for Rice Genes  

Microsoft Academic Search

With the existence of large publicly available plant gene expression data sets, many groups have undertaken data analyses to construct gene coexpression networks and functionally annotate genes. Often, a large compendium of unrelated or condition-independent expression data is used to construct gene networks. Condition-dependent expression experiments consisting of well-defined conditions\\/treatments have also been used to create coexpression networks to help

Kevin L. Childs; Rebecca M. Davidson; C. Robin Buell

2011-01-01

5

Statistical analysis of genomic protein family and domain controlled annotations for functional investigation of classified gene lists  

Microsoft Academic Search

BACKGROUND: The increasing protein family and domain based annotations constitute important information to understand protein functions and gain insight into relations among their codifying genes. To allow analyzing of gene proteomic annotations, we implemented novel modules within GFINDer, a Web system we previously developed that dynamically aggregates functional and phenotypic annotations of user-uploaded gene lists and allows performing their statistical

Marco Masseroli; Elisa Bellistri; Andrea Franceschini; Francesco Pinciroli

2007-01-01

6

Expression profiling of hypothetical genes in Desulfovibrio vulgaris leads to improved functional annotation  

PubMed Central

Hypothetical (HyP) and conserved HyP genes account for >30% of sequenced bacterial genomes. For the sulfate-reducing bacterium Desulfovibrio vulgaris Hildenborough, 347 of the 3634 genes were annotated as conserved HyP (9.5%) along with 887 HyP genes (24.4%). Given the large fraction of the genome, it is plausible that some of these genes serve critical cellular roles. The study goals were to determine which genes were expressed and provide a more functionally based annotation. To accomplish this, expression profiles of 1234 HyP and conserved genes were used from transcriptomic datasets of 11 environmental stresses, complemented with shotgun LC–MS/MS and AMT tag proteomic data. Genes were divided into putatively polycistronic operons and those predicted to be monocistronic, then classified by basal expression levels and grouped according to changes in expression for one or multiple stresses. One thousand two hundred and twelve of these genes were transcribed with 786 producing detectable proteins. There was no evidence for expression of 17 predicted genes. Except for the latter, monocistronic gene annotation was expanded using the above criteria along with matching Clusters of Orthologous Groups. Polycistronic genes were annotated in the same manner with inferences from their proximity to more confidently annotated genes. Two targeted deletion mutants were used as test cases to determine the relevance of the inferred functional annotations. PMID:19293273

Elias, Dwayne A.; Mukhopadhyay, Aindrila; Joachimiak, Marcin P.; Drury, Elliott C.; Redding, Alyssa M.; Yen, Huei-Che B.; Fields, Matthew W.; Hazen, Terry C.; Arkin, Adam P.; Keasling, Jay D.; Wall, Judy D.

2009-01-01

7

Expression profiling of hypothetical genes in Desulfovibrio vulgaris leads to improved functional annotation  

SciTech Connect

Hypothetical and conserved hypothetical genes account for>30percent of sequenced bacterial genomes. For the sulfate-reducing bacterium Desulfovibrio vulgaris Hildenborough, 347 of the 3634 genes were annotated as conserved hypothetical (9.5percent) along with 887 hypothetical genes (24.4percent). Given the large fraction of the genome, it is plausible that some of these genes serve critical cellular roles. The study goals were to determine which genes were expressed and provide a more functionally based annotation. To accomplish this, expression profiles of 1234 hypothetical and conserved genes were used from transcriptomic datasets of 11 environmental stresses, complemented with shotgun LC-MS/MS and AMT tag proteomic data. Genes were divided into putatively polycistronic operons and those predicted to be monocistronic, then classified by basal expression levels and grouped according to changes in expression for one or multiple stresses. 1212 of these genes were transcribed with 786 producing detectable proteins. There was no evidence for expression of 17 predicted genes. Except for the latter, monocistronic gene annotation was expanded using the above criteria along with matching Clusters of Orthologous Groups. Polycistronic genes were annotated in the same manner with inferences from their proximity to more confidently annotated genes. Two targeted deletion mutants were used as test cases to determine the relevance of the inferred functional annotations.

Elias, Dwayne A.; Mukhopadhyay, Aindrila; Joachimiak, Marcin P.; Drury, Elliott C.; Redding, Alyssa M.; Yen, Huei-Che B.; Fields, Matthew W.; Hazen, Terry C.; Arkin, Adam P.; Keasling, Jay D.; Wall, Judy D.

2008-10-27

8

Algal functional annotation tool  

SciTech Connect

The Algal Functional Annotation Tool is a web-based comprehensive analysis suite integrating annotation data from several pathway, ontology, and protein family databases. The current version provides annotation for the model alga Chlamydomonas reinhardtii, and in the future will include additional genomes. The site allows users to interpret large gene lists by identifying associated functional terms, and their enrichment. Additionally, expression data for several experimental conditions were compiled and analyzed to provide an expression-based enrichment search. A tool to search for functionally-related genes based on gene expression across these conditions is also provided. Other features include dynamic visualization of genes on KEGG pathway maps and batch gene identifier conversion.

Lopez, D. [UCLA; Casero, D. [UCLA; Cokus, S. J. [UCLA; Merchant, S. S. [UCLA; Pellegrini, M. [UCLA

2012-07-01

9

The Gene Ontology's Reference Genome Project: a unified framework for functional annotation across species.  

PubMed

The Gene Ontology (GO) is a collaborative effort that provides structured vocabularies for annotating the molecular function, biological role, and cellular location of gene products in a highly systematic way and in a species-neutral manner with the aim of unifying the representation of gene function across different organisms. Each contributing member of the GO Consortium independently associates GO terms to gene products from the organism(s) they are annotating. Here we introduce the Reference Genome project, which brings together those independent efforts into a unified framework based on the evolutionary relationships between genes in these different organisms. The Reference Genome project has two primary goals: to increase the depth and breadth of annotations for genes in each of the organisms in the project, and to create data sets and tools that enable other genome annotation efforts to infer GO annotations for homologous genes in their organisms. In addition, the project has several important incidental benefits, such as increasing annotation consistency across genome databases, and providing important improvements to the GO's logical structure and biological content. PMID:19578431

2009-07-01

10

Evaluation of a Hybrid Approach Using UBLAST and BLASTX for Metagenomic Sequences Annotation of Specific Functional Genes  

PubMed Central

The fast development of next generation sequencing (NGS) has dramatically increased the application of metagenomics in various aspects. Functional annotation is a major step in the metagenomics studies. Fast annotation of functional genes has been a challenge because of the deluge of NGS data and expanding databases. A hybrid annotation pipeline proposed previously for taxonomic assignments was evaluated in this study for metagenomic sequences annotation of specific functional genes, such as antibiotic resistance genes, arsenic resistance genes and key genes in nitrogen metabolism. The hybrid approach using UBLAST and BLASTX is 44–177 times faster than direct BLASTX in the annotation using the small protein database for the specific functional genes, with the cost of missing a small portion (<1.8%) of target sequences compared with direct BLASTX hits. Different from direct BLASTX, the time required for specific functional genes annotation using the hybrid annotation pipeline depends on the abundance for the target genes. Thus this hybrid annotation pipeline is more suitable in specific functional genes annotation than in comprehensive functional genes annotation. PMID:25347677

Yang, Ying; Jiang, Xiao-Tao; Zhang, Tong

2014-01-01

11

Global profiling of Shewanella oneidensis MR-1: expression of hypothetical genes and improved functional annotations.  

PubMed

The gamma-proteobacterium Shewanella oneidensis strain MR-1 is a metabolically versatile organism that can reduce a wide range of organic compounds, metal ions, and radionuclides. Similar to most other sequenced organisms, approximately 40% of the predicted ORFs in the S. oneidensis genome were annotated as uncharacterized "hypothetical" genes. We implemented an integrative approach by using experimental and computational analyses to provide more detailed insight into gene function. Global expression profiles were determined for cells after UV irradiation and under aerobic and suboxic growth conditions. Transcriptomic and proteomic analyses confidently identified 538 hypothetical genes as expressed in S. oneidensis cells both as mRNAs and proteins (33% of all predicted hypothetical proteins). Publicly available analysis tools and databases and the expression data were applied to improve the annotation of these genes. The annotation results were scored by using a seven-category schema that ranked both confidence and precision of the functional assignment. We were able to identify homologs for nearly all of these hypothetical proteins (97%), but could confidently assign exact biochemical functions for only 16 proteins (category 1; 3%). Altogether, computational and experimental evidence provided functional assignments or insights for 240 more genes (categories 2-5; 45%). These functional annotations advance our understanding of genes involved in vital cellular processes, including energy conversion, ion transport, secondary metabolism, and signal transduction. We propose that this integrative approach offers a valuable means to undertake the enormous challenge of characterizing the rapidly growing number of hypothetical proteins with each newly sequenced genome. PMID:15684069

Kolker, Eugene; Picone, Alex F; Galperin, Michael Y; Romine, Margaret F; Higdon, Roger; Makarova, Kira S; Kolker, Natali; Anderson, Gordon A; Qiu, Xiaoyun; Auberry, Kenneth J; Babnigg, Gyorgy; Beliaev, Alex S; Edlefsen, Paul; Elias, Dwayne A; Gorby, Yuri A; Holzman, Ted; Klappenbach, Joel A; Konstantinidis, Konstantinos T; Land, Miriam L; Lipton, Mary S; McCue, Lee-Ann; Monroe, Matthew; Pasa-Tolic, Ljiljana; Pinchuk, Grigoriy; Purvine, Samuel; Serres, Margrethe H; Tsapin, Sasha; Zakrajsek, Brian A; Zhu, Wenhong; Zhou, Jizhong; Larimer, Frank W; Lawrence, Charles E; Riley, Monica; Collart, Frank R; Yates, John R; Smith, Richard D; Giometti, Carol S; Nealson, Kenneth H; Fredrickson, James K; Tiedje, James M

2005-02-01

12

Enhanced function annotations for Drosophila serine proteases: A case study for systematic annotation of multi-member gene families  

Microsoft Academic Search

Systematicallyannotatingfunctionofenzymesthatbelongtolargeproteinfamiliesencodedinasingleeukaryoticgenomeisaverychallengingtask. We carried out such an exercise to annotate function for serine-protease family of the trypsin fold in Drosophila melanogaster, with an emphasis on annotating serine-protease homologues (SPHs) that may have lost their catalytic function. Our approach involves data mining and data integration to providefunction annotations for 190Drosophilagene products containing serine-protease-like domains,ofwhich 35areSPHs. Thiswas accomplished by analysis of structure-function relationships,

Parantu K. Shah; Lokesh P. Tripathi; Lars Juhl Jensen; Murad Gahnim; Christopher Mason; Eileen E. Furlong; Veronica Rodrigues; Kevin P. White; Peer Bork; R. Sowdhamini

2007-01-01

13

Gene Expression and Functional Annotation of the Human and Mouse Choroid Plexus Epithelium  

PubMed Central

Background The choroid plexus epithelium (CPE) is a lobed neuro-epithelial structure that forms the outer blood-brain barrier. The CPE protrudes into the brain ventricles and produces the cerebrospinal fluid (CSF), which is crucial for brain homeostasis. Malfunction of the CPE is possibly implicated in disorders like Alzheimer disease, hydrocephalus or glaucoma. To study human genetic diseases and potential new therapies, mouse models are widely used. This requires a detailed knowledge of similarities and differences in gene expression and functional annotation between the species. The aim of this study is to analyze and compare gene expression and functional annotation of healthy human and mouse CPE. Methods We performed 44k Agilent microarray hybridizations with RNA derived from laser dissected healthy human and mouse CPE cells. We functionally annotated and compared the gene expression data of human and mouse CPE using the knowledge database Ingenuity. We searched for common and species specific gene expression patterns and function between human and mouse CPE. We also made a comparison with previously published CPE human and mouse gene expression data. Results Overall, the human and mouse CPE transcriptomes are very similar. Their major functionalities included epithelial junctions, transport, energy production, neuro-endocrine signaling, as well as immunological, neurological and hematological functions and disorders. The mouse CPE presented two additional functions not found in the human CPE: carbohydrate metabolism and a more extensive list of (neural) developmental functions. We found three genes specifically expressed in the mouse CPE compared to human CPE, being ACE, PON1 and TRIM3 and no human specifically expressed CPE genes compared to mouse CPE. Conclusion Human and mouse CPE transcriptomes are very similar, and display many common functionalities. Nonetheless, we also identified a few genes and pathways which suggest that the CPE between mouse and man differ with respect to transport and metabolic functions. PMID:24391755

Janssen, Sarah F.; van der Spek, Sophie J. F.; ten Brink, Jacoline B.; Essing, Anke H. W.; Gorgels, Theo G. M. F.; van der Spek, Peter J.; Jansonius, Nomdo M.; Bergen, Arthur A. B.

2013-01-01

14

Expression profiling of hypothetical genes in Desulfovibrio vulgaris leads to improved functional annotation  

SciTech Connect

Hypothetical (HyP) and conserved HyP genes account for >30% of sequenced bacterial genomes. For the sulfate-reducing bacterium Desulfovibrio vulgaris Hildenborough, 347 of the 3634 genes were annotated as conserved HyP (9.5%) along with 887 HyP genes (24.4%). Given the large fraction of the genome, it is plausible that some of these genes serve critical cellular roles. The study goals were to determine which genes were expressed and provide a more functionally based annotation. To accomplish this, expression profiles of 1234 HyP and conserved genes were used from transcriptomic datasets of 11 environmental stresses, complemented with shotgun LC–MS/MS and AMT tag proteomic data. Genes were divided into putatively polycistronic operons and those predicted to be monocistronic, then classified by basal expression levels and grouped according to changes in expression for one or multiple stresses. One thousand two hundred and twelve of these genes were transcribed with 786 producing detectable proteins. There was no evidence for expression of 17 predicted genes.

Elias, Dwayne A.; Mukhopadhyay, Aindrila; Joachimiak, Marcine P.; Drury, Elliott C.; Redding, Alyssa M.; Yen, Huei-Che B.; Fields, Matthew; Hazen, Terry C.; Arkin, Adam P.; Keasling, Jay D.; Wall, Judy D.

2009-03-17

15

Genome, Functional Gene Annotation, and Nuclear Transformation of the Heterokont Oleaginous Alga Nannochloropsis oceanica CCMP1779  

PubMed Central

Unicellular marine algae have promise for providing sustainable and scalable biofuel feedstocks, although no single species has emerged as a preferred organism. Moreover, adequate molecular and genetic resources prerequisite for the rational engineering of marine algal feedstocks are lacking for most candidate species. Heterokonts of the genus Nannochloropsis naturally have high cellular oil content and are already in use for industrial production of high-value lipid products. First success in applying reverse genetics by targeted gene replacement makes Nannochloropsis oceanica an attractive model to investigate the cell and molecular biology and biochemistry of this fascinating organism group. Here we present the assembly of the 28.7 Mb genome of N. oceanica CCMP1779. RNA sequencing data from nitrogen-replete and nitrogen-depleted growth conditions support a total of 11,973 genes, of which in addition to automatic annotation some were manually inspected to predict the biochemical repertoire for this organism. Among others, more than 100 genes putatively related to lipid metabolism, 114 predicted transcription factors, and 109 transcriptional regulators were annotated. Comparison of the N. oceanica CCMP1779 gene repertoire with the recently published N. gaditana genome identified 2,649 genes likely specific to N. oceanica CCMP1779. Many of these N. oceanica–specific genes have putative orthologs in other species or are supported by transcriptional evidence. However, because similarity-based annotations are limited, functions of most of these species-specific genes remain unknown. Aside from the genome sequence and its analysis, protocols for the transformation of N. oceanica CCMP1779 are provided. The availability of genomic and transcriptomic data for Nannochloropsis oceanica CCMP1779, along with efficient transformation protocols, provides a blueprint for future detailed gene functional analysis and genetic engineering of Nannochloropsis species by a growing academic community focused on this genus. PMID:23166516

Tsai, Chia-Hong; Bullard, Blair; Cornish, Adam J.; Harvey, Christopher; Reca, Ida-Barbara; Thornburg, Chelsea; Achawanantakun, Rujira; Buehl, Christopher J.; Campbell, Michael S.; Cavalier, David; Childs, Kevin L.; Clark, Teresa J.; Deshpande, Rahul; Erickson, Erika; Armenia Ferguson, Ann; Handee, Witawas; Kong, Que; Li, Xiaobo; Liu, Bensheng; Lundback, Steven; Peng, Cheng; Roston, Rebecca L.; Sanjaya; Simpson, Jeffrey P.; TerBush, Allan; Warakanont, Jaruswan; Zäuner, Simone; Farre, Eva M.; Hegg, Eric L.; Jiang, Ning; Kuo, Min-Hao; Lu, Yan; Niyogi, Krishna K.; Ohlrogge, John; Osteryoung, Katherine W.; Shachar-Hill, Yair; Sears, Barbara B.; Sun, Yanni; Takahashi, Hideki; Yandell, Mark; Shiu, Shin-Han; Benning, Christoph

2012-01-01

16

Gene Ontology Annotations and Resources  

PubMed Central

The Gene Ontology (GO) Consortium (GOC, http://www.geneontology.org) is a community-based bioinformatics resource that classifies gene product function through the use of structured, controlled vocabularies. Over the past year, the GOC has implemented several processes to increase the quantity, quality and specificity of GO annotations. First, the number of manual, literature-based annotations has grown at an increasing rate. Second, as a result of a new ‘phylogenetic annotation’ process, manually reviewed, homology-based annotations are becoming available for a broad range of species. Third, the quality of GO annotations has been improved through a streamlined process for, and automated quality checks of, GO annotations deposited by different annotation groups. Fourth, the consistency and correctness of the ontology itself has increased by using automated reasoning tools. Finally, the GO has been expanded not only to cover new areas of biology through focused interaction with experts, but also to capture greater specificity in all areas of the ontology using tools for adding new combinatorial terms. The GOC works closely with other ontology developers to support integrated use of terminologies. The GOC supports its user community through the use of e-mail lists, social media and web-based resources. PMID:23161678

2013-01-01

17

Discovering gene functional relationships using FAUN (Feature Annotation Using Nonnegative matrix factorization)  

PubMed Central

Background Searching the enormous amount of information available in biomedical literature to extract novel functional relationships among genes remains a challenge in the field of bioinformatics. While numerous (software) tools have been developed to extract and identify gene relationships from biological databases, few effectively deal with extracting new (or implied) gene relationships, a process which is useful in interpretation of discovery-oriented genome-wide experiments. Results In this study, we develop a Web-based bioinformatics software environment called FAUN or Feature Annotation Using Nonnegative matrix factorization (NMF) to facilitate both the discovery and classification of functional relationships among genes. Both the computational complexity and parameterization of NMF for processing gene sets are discussed. FAUN is tested on three manually constructed gene document collections. Its utility and performance as a knowledge discovery tool is demonstrated using a set of genes associated with Autism. Conclusions FAUN not only assists researchers to use biomedical literature efficiently, but also provides utilities for knowledge discovery. This Web-based software environment may be useful for the validation and analysis of functional associations in gene subsets identified by high-throughput experiments. PMID:20946597

2010-01-01

18

Statistical analysis of genomic protein family and domain controlled annotations for functional investigation of classified gene lists  

PubMed Central

Background The increasing protein family and domain based annotations constitute important information to understand protein functions and gain insight into relations among their codifying genes. To allow analyzing of gene proteomic annotations, we implemented novel modules within GFINDer, a Web system we previously developed that dynamically aggregates functional and phenotypic annotations of user-uploaded gene lists and allows performing their statistical analysis and mining. Results Exploiting protein information in Pfam and InterPro databanks, we developed and added in GFINDer original modules specifically devoted to the exploration and analysis of functional signatures of gene protein products. They allow annotating numerous user-classified nucleotide sequence identifiers with controlled information on related protein families, domains and functional sites, classifying them according to such protein annotation categories, and statistically analyzing the obtained classifications. In particular, when uploaded nucleotide sequence identifiers are subdivided in classes, the Statistics Protein Families&Domains module allows estimating relevance of Pfam or InterPro controlled annotations for the uploaded genes by highlighting protein signatures significantly more represented within user-defined classes of genes. In addition, the Logistic Regression module allows identifying protein functional signatures that better explain the considered gene classification. Conclusion Novel GFINDer modules provide genomic protein family and domain analyses supporting better functional interpretation of gene classes, for instance defined through statistical and clustering analyses of gene expression results from microarray experiments. They can hence help understanding fundamental biological processes and complex cellular mechanisms influenced by protein domain composition, and contribute to unveil new biomedical knowledge about the codifying genes. PMID:17430558

Masseroli, Marco; Bellistri, Elisa; Franceschini, Andrea; Pinciroli, Francesco

2007-01-01

19

On the Use of Gene Ontology Annotations to Assess Functional Similarity among Orthologs and Paralogs: A Short Report  

PubMed Central

A recent paper (Nehrt et al., PLoS Comput. Biol. 7:e1002073, 2011) has proposed a metric for the “functional similarity” between two genes that uses only the Gene Ontology (GO) annotations directly derived from published experimental results. Applying this metric, the authors concluded that paralogous genes within the mouse genome or the human genome are more functionally similar on average than orthologous genes between these genomes, an unexpected result with broad implications if true. We suggest, based on both theoretical and empirical considerations, that this proposed metric should not be interpreted as a functional similarity, and therefore cannot be used to support any conclusions about the “ortholog conjecture” (or, more properly, the “ortholog functional conservation hypothesis”). First, we reexamine the case studies presented by Nehrt et al. as examples of orthologs with divergent functions, and come to a very different conclusion: they actually exemplify how GO annotations for orthologous genes provide complementary information about conserved biological functions. We then show that there is a global ascertainment bias in the experiment-based GO annotations for human and mouse genes: particular types of experiments tend to be performed in different model organisms. We conclude that the reported statistical differences in annotations between pairs of orthologous genes do not reflect differences in biological function, but rather complementarity in experimental approaches. Our results underscore two general considerations for researchers proposing novel types of analysis based on the GO: 1) that GO annotations are often incomplete, potentially in a biased manner, and subject to an “open world assumption” (absence of an annotation does not imply absence of a function), and 2) that conclusions drawn from a novel, large-scale GO analysis should whenever possible be supported by careful, in-depth examination of examples, to help ensure the conclusions have a justifiable biological basis. PMID:22359495

Thomas, Paul D.; Wood, Valerie; Mungall, Christopher J.; Lewis, Suzanna E.; Blake, Judith A.

2012-01-01

20

Gene Expression and Functional Annotation of the Human Ciliary Body Epithelia  

PubMed Central

Purpose The ciliary body (CB) of the human eye consists of the non-pigmented (NPE) and pigmented (PE) neuro-epithelia. We investigated the gene expression of NPE and PE, to shed light on the molecular mechanisms underlying the most important functions of the CB. We also developed molecular signatures for the NPE and PE and studied possible new clues for glaucoma. Methods We isolated NPE and PE cells from seven healthy human donor eyes using laser dissection microscopy. Next, we performed RNA isolation, amplification, labeling and hybridization against 44×k Agilent microarrays. For microarray conformations, we used a literature study, RT-PCRs, and immunohistochemical stainings. We analyzed the gene expression data with R and with the knowledge database Ingenuity. Results The gene expression profiles and functional annotations of the NPE and PE were highly similar. We found that the most important functionalities of the NPE and PE were related to developmental processes, neural nature of the tissue, endocrine and metabolic signaling, and immunological functions. In total 1576 genes differed statistically significantly between NPE and PE. From these genes, at least 3 were cell-specific for the NPE and 143 for the PE. Finally, we observed high expression in the (N)PE of 35 genes previously implicated in molecular mechanisms related to glaucoma. Conclusion Our gene expression analysis suggested that the NPE and PE of the CB were quite similar. Nonetheless, cell-type specific differences were found. The molecular machineries of the human NPE and PE are involved in a range of neuro-endocrinological, developmental and immunological functions, and perhaps glaucoma. PMID:23028713

Janssen, Sarah F.; Gorgels, Theo G. M. F.; Bossers, Koen; ten Brink, Jacoline B.; Essing, Anke H. W.; Nagtegaal, Martijn; van der Spek, Peter J.; Jansonius, Nomdo M.; Bergen, Arthur A. B.

2012-01-01

21

Annotation of gene function in citrus using gene expression information and co-expression networks  

PubMed Central

Background The genus Citrus encompasses major cultivated plants such as sweet orange, mandarin, lemon and grapefruit, among the world’s most economically important fruit crops. With increasing volumes of transcriptomics data available for these species, Gene Co-expression Network (GCN) analysis is a viable option for predicting gene function at a genome-wide scale. GCN analysis is based on a “guilt-by-association” principle whereby genes encoding proteins involved in similar and/or related biological processes may exhibit similar expression patterns across diverse sets of experimental conditions. While bioinformatics resources such as GCN analysis are widely available for efficient gene function prediction in model plant species including Arabidopsis, soybean and rice, in citrus these tools are not yet developed. Results We have constructed a comprehensive GCN for citrus inferred from 297 publicly available Affymetrix Genechip Citrus Genome microarray datasets, providing gene co-expression relationships at a genome-wide scale (33,000 transcripts). The comprehensive citrus GCN consists of a global GCN (condition-independent) and four condition-dependent GCNs that survey the sweet orange species only, all citrus fruit tissues, all citrus leaf tissues, or stress-exposed plants. All of these GCNs are clustered using genome-wide, gene-centric (guide) and graph clustering algorithms for flexibility of gene function prediction. For each putative cluster, gene ontology (GO) enrichment and gene expression specificity analyses were performed to enhance gene function, expression and regulation pattern prediction. The guide-gene approach was used to infer novel roles of genes involved in disease susceptibility and vitamin C metabolism, and graph-clustering approaches were used to investigate isoprenoid/phenylpropanoid metabolism in citrus peel, and citric acid catabolism via the GABA shunt in citrus fruit. Conclusions Integration of citrus gene co-expression networks, functional enrichment analysis and gene expression information provide opportunities to infer gene function in citrus. We present a publicly accessible tool, Network Inference for Citrus Co-Expression (NICCE, http://citrus.adelaide.edu.au/nicce/home.aspx), for the gene co-expression analysis in citrus. PMID:25023870

2014-01-01

22

Commonality of functional annotation: a method for prioritization of candidate genes from genome-wide linkage studies†  

PubMed Central

Linkage studies of complex traits frequently yield multiple linkage regions covering hundreds of genes. Testing each candidate gene from every region is prohibitively expensive and computational methods that simplify this process would benefit genetic research. We present a new method based on commonality of functional annotation (CFA) that aids dissection of complex traits for which multiple causal genes act in a single pathway or process. CFA works by testing individual Gene Ontology (GO) terms for enrichment among candidate gene pools, performs multiple hypothesis testing adjustment using an estimate of independent tests based on correlation of GO terms, and then scores and ranks genes annotated with significantly-enriched terms based on the number of quantitative trait loci regions in which genes bearing those annotations appear. We evaluate CFA using simulated linkage data and show that CFA has good power despite being conservative. We apply CFA to published linkage studies investigating age-of-onset of Alzheimer's disease and body mass index and obtain previously known and new candidate genes. CFA provides a new tool for studies in which causal genes are expected to participate in a common pathway or process and can easily be extended to utilize annotation schemes in addition to the GO. PMID:18263617

Shriner, Daniel; Baye, Tesfaye M.; Padilla, Miguel A.; Zhang, Shiju; Vaughan, Laura K.; Loraine, Ann E.

2008-01-01

23

FunnyBase: a systems level functional annotation of Fundulus ESTs for the analysis of gene expression  

PubMed Central

Background While studies of non-model organisms are critical for many research areas, such as evolution, development, and environmental biology, they present particular challenges for both experimental and computational genomic level research. Resources such as mass-produced microarrays and the computational tools linking these data to functional annotation at the system and pathway level are rarely available for non-model species. This type of "systems-level" analysis is critical to the understanding of patterns of gene expression that underlie biological processes. Results We describe a bioinformatics pipeline known as FunnyBase that has been used to store, annotate, and analyze 40,363 expressed sequence tags (ESTs) from the heart and liver of the fish, Fundulus heteroclitus. Primary annotations based on sequence similarity are linked to networks of systematic annotation in Gene Ontology (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) and can be queried and computationally utilized in downstream analyses. Steps are taken to ensure that the annotation is self-consistent and that the structure of GO is used to identify higher level functions that may not be annotated directly. An integrated framework for cDNA library production, sequencing, quality control, expression data generation, and systems-level analysis is presented and utilized. In a case study, a set of genes, that had statistically significant regression between gene expression levels and environmental temperature along the Atlantic Coast, shows a statistically significant (P < 0.001) enrichment in genes associated with amine metabolism. Conclusion The methods described have application for functional genomics studies, particularly among non-model organisms. The web interface for FunnyBase can be accessed at . Data and source code are available by request at jpaschall@bioinfobase.umkc.edu. PMID:15610557

Paschall, Justin E; Oleksiak, Marjorie F; VanWye, Jeffrey D; Roach, Jennifer L; Whitehead, J Andrew; Wyckoff, Gerald J; Kolell, Kevin J; Crawford, Douglas L

2004-01-01

24

Community gene annotation in practice  

PubMed Central

Manual annotation of genomic data is extremely valuable to produce an accurate reference gene set but is expensive compared with automatic methods and so has been limited to model organisms. Annotation tools that have been developed at the Wellcome Trust Sanger Institute (WTSI, http://www.sanger.ac.uk/.) are being used to fill that gap, as they can be used remotely and so open up viable community annotation collaborations. We introduce the ‘Blessed’ annotator and ‘Gatekeeper’ approach to Community Annotation using the Otterlace/ZMap genome annotation tool. We also describe the strategies adopted for annotation consistency, quality control and viewing of the annotation. Database URL: http://vega.sanger.ac.uk/index.html PMID:22434843

Loveland, Jane E.; Gilbert, James G.R.; Griffiths, Ed; Harrow, Jennifer L.

2012-01-01

25

Transitive Functional Annotation by Shortest-path Analysis of Gene Expression Data  

Microsoft Academic Search

attribute to link genes of the same biological pathway. Based on large-scale yeast microarray expression data, we use the shortest-path analysis to identify transitive genes between two given genes from the same biological process. We find that not only functionally related genes with correlated expression profiles are identified but also those without. In the latter case, we compare our method

Xianghong Zhou; Ming-Chih J. Kao; Wing Hung Wong

2002-01-01

26

CELLO2GO: A Web Server for Protein subCELlular LOcalization Prediction with Functional Gene Ontology Annotation  

PubMed Central

CELLO2GO (http://cello.life.nctu.edu.tw/cello2go/) is a publicly available, web-based system for screening various properties of a targeted protein and its subcellular localization. Herein, we describe how this platform is used to obtain a brief or detailed gene ontology (GO)-type categories, including subcellular localization(s), for the queried proteins by combining the CELLO localization-predicting and BLAST homology-searching approaches. Given a query protein sequence, CELLO2GO uses BLAST to search for homologous sequences that are GO annotated in an in-house database derived from the UniProt KnowledgeBase database. At the same time, CELLO attempts predict at least one subcellular localization on the basis of the species in which the protein is found. When homologs for the query sequence have been identified, the number of terms found for each of their GO categories, i.e., cellular compartment, molecular function, and biological process, are summed and presented as pie charts representing possible functional annotations for the queried protein. Although the experimental subcellular localization of a protein may not be known, and thus not annotated, CELLO can confidentially suggest a subcellular localization. CELLO2GO should be a useful tool for research involving complex subcellular systems because it combines CELLO and BLAST into one platform and its output is easily manipulated such that the user-specific questions may be readily addressed. PMID:24911789

Yu, Chin-Sheng; Cheng, Chih-Wen; Su, Wen-Chi; Chang, Kuei-Chung; Huang, Shao-Wei; Hwang, Jenn-Kang; Lu, Chih-Hao

2014-01-01

27

The Genome Sequence of Leishmania (Leishmania) amazonensis: Functional Annotation and Extended Analysis of Gene Models  

PubMed Central

We present the sequencing and annotation of the Leishmania (Leishmania) amazonensis genome, an etiological agent of human cutaneous leishmaniasis in the Amazon region of Brazil. L. (L.) amazonensis shares features with Leishmania (L.) mexicana but also exhibits unique characteristics regarding geographical distribution and clinical manifestations of cutaneous lesions (e.g. borderline disseminated cutaneous leishmaniasis). Predicted genes were scored for orthologous gene families and conserved domains in comparison with other human pathogenic Leishmania spp. Carboxypeptidase, aminotransferase, and 3?-nucleotidase genes and ATPase, thioredoxin, and chaperone-related domains were represented more abundantly in L. (L.) amazonensis and L. (L.) mexicana species. Phylogenetic analysis revealed that these two species share groups of amastin surface proteins unique to the genus that could be related to specific features of disease outcomes and host cell interactions. Additionally, we describe a hypothetical hybrid interactome of potentially secreted L. (L.) amazonensis proteins and host proteins under the assumption that parasite factors mimic their mammalian counterparts. The model predicts an interaction between an L. (L.) amazonensis heat-shock protein and mammalian Toll-like receptor 9, which is implicated in important immune responses such as cytokine and nitric oxide production. The analysis presented here represents valuable information for future studies of leishmaniasis pathogenicity and treatment. PMID:23857904

Real, Fernando; Vidal, Ramon Oliveira; Carazzolle, Marcelo Falsarella; Mondego, Jorge Maurício Costa; Costa, Gustavo Gilson Lacerda; Herai, Roberto Hirochi; Würtele, Martin; de Carvalho, Lucas Miguel; e Ferreira, Renata Carmona; Mortara, Renato Arruda; Barbiéri, Clara Lucia; Mieczkowski, Piotr; da Silveira, José Franco; Briones, Marcelo Ribeiro da Silva; Pereira, Gonçalo Amarante Guimarães; Bahia, Diana

2013-01-01

28

An atlas of tissue-specific conserved coexpression for functional annotation and disease gene prediction  

PubMed Central

Gene coexpression relationships that are phylogenetically conserved between human and mouse have been shown to provide important clues about gene function that can be efficiently used to identify promising candidate genes for human hereditary disorders. In the past, such approaches have considered mostly generic gene expression profiles that cover multiple tissues and organs. The individual genes of multicellular organisms, however, can participate in different transcriptional programs, operating at scales as different as single-cell types, tissues, organs, body regions or the entire organism. Therefore, systematic analysis of tissue-specific coexpression could be, in principle, a very powerful strategy to dissect those functional relationships among genes that emerge only in particular tissues or organs. In this report, we show that, in fact, conserved coexpression as determined from tissue-specific and condition-specific data sets can predict many functional relationships that are not detected by analyzing heterogeneous microarray data sets. More importantly, we find that, when combined with disease networks, the simultaneous use of both generic (multi-tissue) and tissue-specific conserved coexpression allows a more efficient prediction of human disease genes than the use of generic conserved coexpression alone. Using this strategy, we were able to identify high-probability candidates for 238 orphan disease loci. We provide proof of concept that this combined use of generic and tissue-specific conserved coexpression can be very useful to prioritize the mutational candidates obtained from deep-sequencing projects, even in the case of genetic disorders as heterogeneous as XLMR. PMID:21654723

Piro, Rosario Michael; Ala, Ugo; Molineris, Ivan; Grassi, Elena; Bracco, Chiara; Perego, Gian Paolo; Provero, Paolo; Di Cunto, Ferdinando

2011-01-01

29

A Framework for Comparing Phenotype Annotations of Orthologous Genes  

PubMed Central

Objectives Animal models are a key resource for the investigation of human diseases. In contrast to functional annotation, phenotype annotation is less standard, and comparing phenotypes across species remains challenging. The objective of this paper is to propose a framework for comparing phenotype annotations of orthologous genes based on the Medical Subject Headings (MeSH) indexing of biomedical articles in which these genes are discussed. Methods 17,769 pairs of orthologous genes (mouse and human) are downloaded from the Mouse Genome Informatics (MGI) system and linked to biomedical articles through Entrez Gene. MeSH index terms corresponding to diseases are extracted from Medline. Results 11,111 pairs of genes exhibited at least one phenotype annotation for each gene in the pair. Among these, 81% have at least one phenotype annotation in common, 80% have at least one annotation specific to the human gene and 84% have at least one annotation specific to the mouse gene. Four disease categories represent 54% of all phenotype annotations. Conclusions This framework supports the curation of phenotype annotation and the generation of research hypotheses based on comparative studies. PMID:20841896

Bodenreider, Olivier; Burgun, Anita

2015-01-01

30

SOURCE: a unified genomic resource of functional annotations, ontologies, and gene expression data  

Microsoft Academic Search

The explosion in the number of functional genomic datasets generated with tools such as DNA micro- arrays has created a critical need for resources that facilitate the interpretation of large-scale biological data. SOURCE is a web-based database that brings together information from a broad range of resources, and provides it in manner particularly useful for genome-scale analyses. SOURCE's GeneReports include

Maximilian Diehn; Gavin Sherlock; Gail Binkley; Heng Jin; John C. Matese; Tina Hernandez-boussard; Christian A. Rees; J. Michael Cherry; David Botstein; Patrick O. Brown; Ash A. Alizadeh

2003-01-01

31

Improving gene annotation of complete viral genomes  

PubMed Central

Gene annotation in viruses often relies upon similarity search methods. These methods possess high specificity but some genes may be missed, either those unique to a particular genome or those highly divergent from known homologs. To identify potentially missing viral genes we have analyzed all complete viral genomes currently available in GenBank with a specialized and augmented version of the gene finding program GeneMarkS. In particular, by implementing genome-specific self-training protocols we have better adjusted the GeneMarkS statistical models to sequences of viral genomes. Hundreds of new genes were identified, some in well studied viral genomes. For example, a new gene predicted in the genome of the Epstein–Barr virus was shown to encode a protein similar to ?-herpesvirus minor tegument protein UL14 with heat shock functions. Convincing evidence of this similarity was obtained after only 12 PSI-BLAST iterations. In another example, several iterations of PSI-BLAST were required to demonstrate that a gene predicted in the genome of Alcelaphine herpesvirus 1 encodes a BALF1-like protein which is thought to be involved in apoptosis regulation and, potentially, carcinogenesis. New predictions were used to refine annotations of viral genomes in the RefSeq collection curated by the National Center for Biotechnology Information. Importantly, even in those cases where no sequence similarities were detected, GeneMarkS significantly reduced the number of primary targets for experimental characterization by identifying the most probable candidate genes. The new genome annotations were stored in VIOLIN, an interactive database which provides access to similarity search tools for up-to-date analysis of predicted viral proteins. PMID:14627837

Mills, Ryan; Rozanov, Michael; Lomsadze, Alexandre; Tatusova, Tatiana; Borodovsky, Mark

2003-01-01

32

Structural and functional annotation of the porcine immunome  

PubMed Central

Background The domestic pig is known as an excellent model for human immunology and the two species share many pathogens. Susceptibility to infectious disease is one of the major constraints on swine performance, yet the structure and function of genes comprising the pig immunome are not well-characterized. The completion of the pig genome provides the opportunity to annotate the pig immunome, and compare and contrast pig and human immune systems. Results The Immune Response Annotation Group (IRAG) used computational curation and manual annotation of the swine genome assembly 10.2 (Sscrofa10.2) to refine the currently available automated annotation of 1,369 immunity-related genes through sequence-based comparison to genes in other species. Within these genes, we annotated 3,472 transcripts. Annotation provided evidence for gene expansions in several immune response families, and identified artiodactyl-specific expansions in the cathelicidin and type 1 Interferon families. We found gene duplications for 18 genes, including 13 immune response genes and five non-immune response genes discovered in the annotation process. Manual annotation provided evidence for many new alternative splice variants and 8 gene duplications. Over 1,100 transcripts without porcine sequence evidence were detected using cross-species annotation. We used a functional approach to discover and accurately annotate porcine immune response genes. A co-expression clustering analysis of transcriptomic data from selected experimental infections or immune stimulations of blood, macrophages or lymph nodes identified a large cluster of genes that exhibited a correlated positive response upon infection across multiple pathogens or immune stimuli. Interestingly, this gene cluster (cluster 4) is enriched for known general human immune response genes, yet contains many un-annotated porcine genes. A phylogenetic analysis of the encoded proteins of cluster 4 genes showed that 15% exhibited an accelerated evolution as compared to 4.1% across the entire genome. Conclusions This extensive annotation dramatically extends the genome-based knowledge of the molecular genetics and structure of a major portion of the porcine immunome. Our complementary functional approach using co-expression during immune response has provided new putative immune response annotation for over 500 porcine genes. Our phylogenetic analysis of this core immunome cluster confirms rapid evolutionary change in this set of genes, and that, as in other species, such genes are important components of the pig’s adaptation to pathogen challenge over evolutionary time. These comprehensive and integrated analyses increase the value of the porcine genome sequence and provide important tools for global analyses and data-mining of the porcine immune response. PMID:23676093

2013-01-01

33

A prioritization analysis of disease association by data-mining of functional annotation of human genes.  

PubMed

Complex diseases result from contributions of multiple genes that act in concert through pathways. Here we present a method to prioritize novel candidates of disease-susceptibility genes depending on the biological similarities to the known disease-related genes. The extent of disease-susceptibility of a gene is prioritized by analyzing seven features of human genes captured in H-InvDB. Taking rheumatoid arthritis (RA) and prostate cancer (PC) as two examples, we evaluated the efficiency of our method. Highly scored genes obtained included TNFSF12 and OSM as candidate disease genes for RA and PC, respectively. Subsequent characterization of these genes based upon an extensive literature survey reinforced the validity of these highly scored genes as possible disease-susceptibility genes. Our approach, Prioritization ANalysis of Disease Association (PANDA), is an efficient and cost-effective method to narrow down a large set of genes into smaller subsets that are most likely to be involved in the disease pathogenesis. PMID:22019378

Taniya, Takayuki; Tanaka, Susumu; Yamaguchi-Kabata, Yumi; Hanaoka, Hideki; Yamasaki, Chisato; Maekawa, Harutoshi; Barrero, Roberto A; Lenhard, Boris; Datta, Milton W; Shimoyama, Mary; Bumgarner, Roger; Chakraborty, Ranajit; Hopkinson, Ian; Jia, Libin; Hide, Winston; Auffray, Charles; Minoshima, Shinsei; Imanishi, Tadashi; Gojobori, Takashi

2012-01-01

34

The GOA database: Gene Ontology annotation updates for 2015.  

PubMed

The Gene Ontology Annotation (GOA) resource (http://www.ebi.ac.uk/GOA) provides evidence-based Gene Ontology (GO) annotations to proteins in the UniProt Knowledgebase (UniProtKB). Manual annotations provided by UniProt curators are supplemented by manual and automatic annotations from model organism databases and specialist annotation groups. GOA currently supplies 368 million GO annotations to almost 54 million proteins in more than 480 000 taxonomic groups. The resource now provides annotations to five times the number of proteins it did 4 years ago. As a member of the GO Consortium, we adhere to the most up-to-date Consortium-agreed annotation guidelines via the use of quality control checks that ensures that the GOA resource supplies high-quality functional information to proteins from a wide range of species. Annotations from GOA are freely available and are accessible through a powerful web browser as well as a variety of annotation file formats. PMID:25378336

Huntley, Rachael P; Sawford, Tony; Mutowo-Meullenet, Prudence; Shypitsyna, Aleksandra; Bonilla, Carlos; Martin, Maria J; O'Donovan, Claire

2015-01-28

35

Metagenomic gene annotation by a homology-independent approach  

SciTech Connect

Fully understanding the genetic potential of a microbial community requires functional annotation of all the genes it encodes. The recently developed deep metagenome sequencing approach has enabled rapid identification of millions of genes from a complex microbial community without cultivation. Current homology-based gene annotation fails to detect distantly-related or structural homologs. Furthermore, homology searches with millions of genes are very computational intensive. To overcome these limitations, we developed rhModeller, a homology-independent software pipeline to efficiently annotate genes from metagenomic sequencing projects. Using cellulases and carbonic anhydrases as two independent test cases, we demonstrated that rhModeller is much faster than HMMER but with comparable accuracy, at 94.5percent and 99.9percent accuracy, respectively. More importantly, rhModeller has the ability to detect novel proteins that do not share significant homology to any known protein families. As {approx}50percent of the 2 million genes derived from the cow rumen metagenome failed to be annotated based on sequence homology, we tested whether rhModeller could be used to annotate these genes. Preliminary results suggest that rhModeller is robust in the presence of missense and frameshift mutations, two common errors in metagenomic genes. Applying the pipeline to the cow rumen genes identified 4,990 novel cellulases candidates and 8,196 novel carbonic anhydrase candidates.In summary, we expect rhModeller to dramatically increase the speed and quality of metagnomic gene annotation.

Froula, Jeff; Zhang, Tao; Salmeen, Annette; Hess, Matthias; Kerfeld, Cheryl A.; Wang, Zhong; Du, Changbin

2011-06-02

36

Genome-wide annotation of the soybean WRKY family and functional characterization of genes involved in response to Phakopsora pachyrhizi infection.  

PubMed

BackgroundMany previous studies have shown that soybean WRKY transcription factors are involved in the plant response to biotic and abiotic stresses. Phakopsora pachyrhizi is the causal agent of Asian Soybean Rust, one of the most important soybean diseases. There are evidences that WRKYs are involved in the resistance of some soybean genotypes against that fungus. The number of WRKY genes already annotated in soybean genome was underrepresented. In the present study, a genome-wide annotation of the soybean WRKY family was carried out and members involved in the response to P. pachyrhizi were identified.ResultsAs a result of a soybean genomic databases search, 182 WRKY-encoding genes were annotated and 33 putative pseudogenes identified. Genes involved in the response to P. pachyrhizi infection were identified using superSAGE, RNA-Seq of microdissected lesions and microarray experiments. Seventy-five genes were differentially expressed during fungal infection. The expression of eight WRKY genes was validated by RT-qPCR. The expression of these genes in a resistant genotype was earlier and/or stronger compared with a susceptible genotype in response to P. pachyrhizi infection. Soybean somatic embryos were transformed in order to overexpress or silence WRKY genes. Embryos overexpressing a WRKY gene were obtained, but they were unable to convert into plants. When infected with P. pachyrhizi, the leaves of the silenced transgenic line showed a higher number of lesions than the wild-type plants. ConclusionsThe present study reports a genome-wide annotation of soybean WRKY family. The participation of some members in response to P. pachyrhizi infection was demonstrated. The results contribute to the elucidation of gene function and suggest the manipulation of WRKYs as a strategy to increase fungal resistance in soybean plants. PMID:25201117

Bencke-Malato, Marta; Cabreira, Caroline; Wiebke-Strohm, Beatriz; Bücker-Neto, Lauro; Mancini, Estefania; Osorio, Marina B; Homrich, Milena S; Turchetto-Zolet, Andreia; De Carvalho, Mayra; Stolf, Renata; Weber, Ricardo; Westergaard, Gastón; Castagnaro, Atílio P; Abdelnoor, Ricardo V; Marcelino-Guimarães, Francismar C; Margis-Pinheiro, Márcia; Bodanese-Zanettini, Maria

2014-09-10

37

Gene Characterization Index: Assessing the Depth of Gene Annotation  

PubMed Central

Background We introduce the Gene Characterization Index, a bioinformatics method for scoring the extent to which a protein-encoding gene is functionally described. Inherently a reflection of human perception, the Gene Characterization Index is applied for assessing the characterization status of individual genes, thus serving the advancement of both genome annotation and applied genomics research by rapid and unbiased identification of groups of uncharacterized genes for diverse applications such as directed functional studies and delineation of novel drug targets. Methodology/Principal Findings The scoring procedure is based on a global survey of researchers, who assigned characterization scores from 1 (poor) to 10 (extensive) for a sample of genes based on major online resources. By evaluating the survey as training data, we developed a bioinformatics procedure to assign gene characterization scores to all genes in the human genome. We analyzed snapshots of functional genome annotation over a period of 6 years to assess temporal changes reflected by the increase of the average Gene Characterization Index. Applying the Gene Characterization Index to genes within pharmaceutically relevant classes, we confirmed known drug targets as high-scoring genes and revealed potentially interesting novel targets with low characterization indexes. Removing known drug targets and genes linked to sequence-related patent filings from the entirety of indexed genes, we identified sets of low-scoring genes particularly suited for further experimental investigation. Conclusions/Significance The Gene Characterization Index is intended to serve as a tool to the scientific community and granting agencies for focusing resources and efforts on unexplored areas of the genome. The Gene Characterization Index is available from http://cisreg.ca/gci/. PMID:18213364

Yusuf, Dimas; Brumm, Jochen; Cheung, Warren; Wahlestedt, Claes; Lenhard, Boris; Wasserman, Wyeth W.

2008-01-01

38

Functional Annotation Analytics of Rhodopseudomonas palustris Genomes  

PubMed Central

Rhodopseudomonas palustris, a nonsulphur purple photosynthetic bacteria, has been extensively investigated for its metabolic versatility including ability to produce hydrogen gas from sunlight and biomass. The availability of the finished genome sequences of six R. palustris strains (BisA53, BisB18, BisB5, CGA009, HaA2 and TIE-1) combined with online bioinformatics software for integrated analysis presents new opportunities to determine the genomic basis of metabolic versatility and ecological lifestyles of the bacteria species. The purpose of this investigation was to compare the functional annotations available for multiple R. palustris genomes to identify annotations that can be further investigated for strain-specific or uniquely shared phenotypic characteristics. A total of 2,355 protein family Pfam domain annotations were clustered based on presence or absence in the six genomes. The clustering process identified groups of functional annotations including those that could be verified as strain-specific or uniquely shared phenotypes. For example, genes encoding water/glycerol transport were present in the genome sequences of strains CGA009 and BisB5, but absent in strains BisA53, BisB18, HaA2 and TIE-1. Protein structural homology modeling predicted that the two orthologous 240 aa R. palustris aquaporins have water-specific transport function. Based on observations in other microbes, the presence of aquaporin in R. palustris strains may improve freeze tolerance in natural conditions of rapid freezing such as nitrogen fixation at low temperatures where access to liquid water is a limiting factor for nitrogenase activation. In the case of adaptive loss of aquaporin genes, strains may be better adapted to survive in conditions of high-sugar content such as fermentation of biomass for biohydrogen production. Finally, web-based resources were developed to allow for interactive, user-defined selection of the relationship between protein family annotations and the R. palustris genomes. PMID:22084572

Simmons, Shaneka S.; Isokpehi, Raphael D.; Brown, Shyretha D.; McAllister, Donee L.; Hall, Charnia C.; McDuffy, Wanaki M.; Medley, Tamara L.; Udensi, Udensi K.; Rajnarayanan, Rajendram V.; Ayensu, Wellington K.; Cohly, Hari H.P.

2011-01-01

39

Functional genomics tools applied to plant metabolism: a survey on plant respiration, its connections and the annotation of complex gene functions  

PubMed Central

The application of post-genomic techniques in plant respiration studies has greatly improved our ability to assign functions to gene products. In addition it has also revealed previously unappreciated interactions between distal elements of metabolism. Such results have reinforced the need to consider plant respiratory metabolism as part of a complex network and making sense of such interactions will ultimately require the construction of predictive and mechanistic models. Transcriptomics, proteomics, metabolomics, and the quantification of metabolic flux will be of great value in creating such models both by facilitating the annotation of complex gene function, determining their structure and by furnishing the quantitative data required to test them. In this review, we highlight how these experimental approaches have contributed to our current understanding of plant respiratory metabolism and its interplay with associated process (e.g., photosynthesis, photorespiration, and nitrogen metabolism). We also discuss how data from these techniques may be integrated, with the ultimate aim of identifying mechanisms that control and regulate plant respiration and discovering novel gene functions with potential biotechnological implications. PMID:22973288

Araújo, Wagner L.; Nunes-Nesi, Adriano; Williams, Thomas C. R.

2012-01-01

40

Characterizing the state of the art in the computational assignment of gene function: lessons from the first critical assessment of functional annotation (CAFA)  

PubMed Central

The assignment of gene function remains a difficult but important task in computational biology. The establishment of the first Critical Assessment of Functional Annotation (CAFA) was aimed at increasing progress in the field. We present an independent analysis of the results of CAFA, aimed at identifying challenges in assessment and at understanding trends in prediction performance. We found that well-accepted methods based on sequence similarity (i.e., BLAST) have a dominant effect. Many of the most informative predictions turned out to be either recovering existing knowledge about sequence similarity or were "post-dictions" already documented in the literature. These results indicate that deep challenges remain in even defining the task of function assignment, with a particular difficulty posed by the problem of defining function in a way that is not dependent on either flawed gold standards or the input data itself. In particular, we suggest that using the Gene Ontology (or other similar systematizations of function) as a gold standard is unlikely to be the way forward. PMID:23630983

2013-01-01

41

Functional Annotation, Genome Organization and Phylogeny of the Grapevine (Vitis vinifera) Terpene Synthase Gene Family Based on Genome Assembly, FLcDNA Cloning, and Enzyme Assays  

PubMed Central

Background Terpenoids are among the most important constituents of grape flavour and wine bouquet, and serve as useful metabolite markers in viticulture and enology. Based on the initial 8-fold sequencing of a nearly homozygous Pinot noir inbred line, 89 putative terpenoid synthase genes (VvTPS) were predicted by in silico analysis of the grapevine (Vitis vinifera) genome assembly [1]. The finding of this very large VvTPS family, combined with the importance of terpenoid metabolism for the organoleptic properties of grapevine berries and finished wines, prompted a detailed examination of this gene family at the genomic level as well as an investigation into VvTPS biochemical functions. Results We present findings from the analysis of the up-dated 12-fold sequencing and assembly of the grapevine genome that place the number of predicted VvTPS genes at 69 putatively functional VvTPS, 20 partial VvTPS, and 63 VvTPS probable pseudogenes. Gene discovery and annotation included information about gene architecture and chromosomal location. A dense cluster of 45 VvTPS is localized on chromosome 18. Extensive FLcDNA cloning, gene synthesis, and protein expression enabled functional characterization of 39 VvTPS; this is the largest number of functionally characterized TPS for any species reported to date. Of these enzymes, 23 have unique functions and/or phylogenetic locations within the plant TPS gene family. Phylogenetic analyses of the TPS gene family showed that while most VvTPS form species-specific gene clusters, there are several examples of gene orthology with TPS of other plant species, representing perhaps more ancient VvTPS, which have maintained functions independent of speciation. Conclusions The highly expanded VvTPS gene family underpins the prominence of terpenoid metabolism in grapevine. We provide a detailed experimental functional annotation of 39 members of this important gene family in grapevine and comprehensive information about gene structure and phylogeny for the entire currently known VvTPS gene family. PMID:20964856

2010-01-01

42

Logical Gene Ontology Annotations (GOAL): Exploring gene ontology annotations with OWL  

E-print Network

Abstract Motivation Ontologies such as the Gene Ontology (GO) and their use in annotations make cross species comparisons of genes possible, along with a wide range of other analytical activities. The bio-ontologies community, in particular the Open...

2012-04-24

43

Gene calling and bacterial genome annotation with BG7.  

PubMed

New massive sequencing technologies are providing many bacterial genome sequences from diverse taxa but a refined annotation of these genomes is crucial for obtaining scientific findings and new knowledge. Thus, bacterial genome annotation has emerged as a key point to investigate in bacteria. Any efficient tool designed specifically to annotate bacterial genomes sequenced with massively parallel technologies has to consider the specific features of bacterial genomes (absence of introns and scarcity of nonprotein-coding sequence) and of next-generation sequencing (NGS) technologies (presence of errors and not perfectly assembled genomes). These features make it convenient to focus on coding regions and, hence, on protein sequences that are the elements directly related with biological functions. In this chapter we describe how to annotate bacterial genomes with BG7, an open-source tool based on a protein-centered gene calling/annotation paradigm. BG7 is specifically designed for the annotation of bacterial genomes sequenced with NGS. This tool is sequence error tolerant maintaining their capabilities for the annotation of highly fragmented genomes or for annotating mixed sequences coming from several genomes (as those obtained through metagenomics samples). BG7 has been designed with scalability as a requirement, with a computing infrastructure completely based on cloud computing (Amazon Web Services). PMID:25343866

Tobes, Raquel; Pareja-Tobes, Pablo; Manrique, Marina; Pareja-Tobes, Eduardo; Kovach, Evdokim; Alekhin, Alexey; Pareja, Eduardo

2015-01-01

44

Metabolomics as a Hypothesis-Generating Functional Genomics Tool for the Annotation of Arabidopsis thaliana Genes of “Unknown Function”  

PubMed Central

Metabolomics is the methodology that identifies and measures global pools of small molecules (of less than about 1,000?Da) of a biological sample, which are collectively called the metabolome. Metabolomics can therefore reveal the metabolic outcome of a genetic or environmental perturbation of a metabolic regulatory network, and thus provide insights into the structure and regulation of that network. Because of the chemical complexity of the metabolome and limitations associated with individual analytical platforms for determining the metabolome, it is currently difficult to capture the complete metabolome of an organism or tissue, which is in contrast to genomics and transcriptomics. This paper describes the analysis of Arabidopsis metabolomics data sets acquired by a consortium that includes five analytical laboratories, bioinformaticists, and biostatisticians, which aims to develop and validate metabolomics as a hypothesis-generating functional genomics tool. The consortium is determining the metabolomes of Arabidopsis T-DNA mutant stocks, grown in standardized controlled environment optimized to minimize environmental impacts on the metabolomes. Metabolomics data were generated with seven analytical platforms, and the combined data is being provided to the research community to formulate initial hypotheses about genes of unknown function (GUFs). A public database (www.PlantMetabolomics.org) has been developed to provide the scientific community with access to the data along with tools to allow for its interactive analysis. Exemplary datasets are discussed to validate the approach, which illustrate how initial hypotheses can be generated from the consortium-produced metabolomics data, integrated with prior knowledge to provide a testable hypothesis concerning the functionality of GUFs. PMID:22645570

Quanbeck, Stephanie M.; Brachova, Libuse; Campbell, Alexis A.; Guan, Xin; Perera, Ann; He, Kun; Rhee, Seung Y.; Bais, Preeti; Dickerson, Julie A.; Dixon, Philip; Wohlgemuth, Gert; Fiehn, Oliver; Barkan, Lenore; Lange, Iris; Lange, B. Markus; Lee, Insuk; Cortes, Diego; Salazar, Carolina; Shuman, Joel; Shulaev, Vladimir; Huhman, David V.; Sumner, Lloyd W.; Roth, Mary R.; Welti, Ruth; Ilarslan, Hilal; Wurtele, Eve S.; Nikolau, Basil J.

2012-01-01

45

Functional Annotation of Rheumatoid Arthritis and Osteoarthritis Associated Genes by Integrative Genome-Wide Gene Expression Profiling Analysis  

PubMed Central

Background Rheumatoid arthritis (RA) and osteoarthritis (OA) are two major types of joint diseases that share multiple common symptoms. However, their pathological mechanism remains largely unknown. The aim of our study is to identify RA and OA related-genes and gain an insight into the underlying genetic basis of these diseases. Methods We collected 11 whole genome-wide expression profiling datasets from RA and OA cohorts and performed a meta-analysis to comprehensively investigate their expression signatures. This method can avoid some pitfalls of single dataset analyses. Results and Conclusion We found that several biological pathways (i.e., the immunity, inflammation and apoptosis related pathways) are commonly involved in the development of both RA and OA. Whereas several other pathways (i.e., vasopressin-related pathway, regulation of autophagy, endocytosis, calcium transport and endoplasmic reticulum stress related pathways) present significant difference between RA and OA. This study provides novel insights into the molecular mechanisms underlying this disease, thereby aiding the diagnosis and treatment of the disease. PMID:24551036

Li, Zhan-Chun; Xiao, Jie; Peng, Jin-Liang; Chen, Jian-Wei; Ma, Tao; Cheng, Guang-Qi; Dong, Yu-Qi; Wang, Wei-li; Liu, Zu-De

2014-01-01

46

Annotation and functional assignment of the genes for the C30 carotenoid pathways from the genomes of two bacteria: Bacillus indicus and Bacillus firmus.  

PubMed

Bacillus indicus and Bacillus firmus synthesize C30 carotenoids via farnesyl pyrophosphate, forming apophytoene as the first committed step in the pathway. The products of the pathways were methyl 4'-[6-O-acyl-glycosyl)oxy]-4,4'-diapolycopen-4-oic acid and 4,4'-diapolycopen-4,4'-dioic acid with putative glycosyl esters. The genomes of both bacteria were sequenced, and the genes for their early terpenoid and specific carotenoid pathways annotated. All genes for a functional 1-deoxy-d-xylulose 5-phosphate synthase pathway were identified in both species, whereas genes of the mevalonate pathway were absent. The genes for specific carotenoid synthesis and conversion were found on gene clusters which were organized differently in the two species. The genes involved in the formation of the carotenoid cores were assigned by functional complementation in Escherichia coli. This bacterium was co-transformed with a plasmid mediating the formation of the putative substrate and a second plasmid with the gene of interest. Carotenoid products in the transformants were determined by HPLC. Using this approach, we identified the genes for a 4,4'-diapophytoene synthase (crtM), 4,4'-diapophytoene desaturase (crtNa), 4,4'-diapolycopene ketolase (crtNb) and 4,4'-diapolycopene aldehyde oxidase (crtNc). The three crtN genes were closely related and belonged to the crtI gene family with a similar reaction mechanism of their enzyme products. Additional genes encoding glycosyltransferases and acyltransferases for the modification of the carotenoid skeleton of the diapolycopenoic acids were identified by comparison with the corresponding genes from other bacteria. PMID:25326460

Steiger, Sabine; Perez-Fons, Laura; Cutting, Simon M; Fraser, Paul D; Sandmann, Gerhard

2015-01-01

47

GFam: a platform for automatic annotation of gene families  

PubMed Central

We have developed GFam, a platform for automatic annotation of gene/protein families. GFam provides a framework for genome initiatives and model organism resources to build domain-based families, derive meaningful functional labels and offers a seamless approach to propagate functional annotation across periodic genome updates. GFam is a hybrid approach that uses a greedy algorithm to chain component domains from InterPro annotation provided by its 12 member resources followed by a sequence-based connected component analysis of un-annotated sequence regions to derive consensus domain architecture for each sequence and subsequently generate families based on common architectures. Our integrated approach increases sequence coverage by 7.2 percentage points and residue coverage by 14.6 percentage points higher than the coverage relative to the best single-constituent database within InterPro for the proteome of Arabidopsis. The true power of GFam lies in maximizing annotation provided by the different InterPro data sources that offer resource-specific coverage for different regions of a sequence. GFam’s capability to capture higher sequence and residue coverage can be useful for genome annotation, comparative genomics and functional studies. GFam is a general-purpose software and can be used for any collection of protein sequences. The software is open source and can be obtained from http://www.paccanarolab.org/software/gfam/. PMID:22790981

Sasidharan, Rajkumar; Nepusz, Tamás; Swarbreck, David; Huala, Eva; Paccanaro, Alberto

2012-01-01

48

Protein family classification and functional annotation  

Microsoft Academic Search

With the accelerated accumulation of genomic sequence data, there is a pressing need to develop computational methods and advanced bioinformatics infrastructure for reliable and large-scale protein annotation and biological knowledge discovery. The Protein Information Resource (PIR) provides an integrated public resource of protein informatics to support genomic and proteomic research. PIR produces the Protein Sequence Database of functionally annotated protein

Cathy H. Wu; Hongzhan Huang; Lai-su L. Yeh; Winona C. Barker

2003-01-01

49

A method for increasing expressivity of Gene Ontology annotations using a compositional approach  

PubMed Central

Background The Gene Ontology project integrates data about the function of gene products across a diverse range of organisms, allowing the transfer of knowledge from model organisms to humans, and enabling computational analyses for interpretation of high-throughput experimental and clinical data. The core data structure is the annotation, an association between a gene product and a term from one of the three ontologies comprising the GO. Historically, it has not been possible to provide additional information about the context of a GO term, such as the target gene or the location of a molecular function. This has limited the specificity of knowledge that can be expressed by GO annotations. Results The GO Consortium has introduced annotation extensions that enable manually curated GO annotations to capture additional contextual details. Extensions represent effector–target relationships such as localization dependencies, substrates of protein modifiers and regulation targets of signaling pathways and transcription factors as well as spatial and temporal aspects of processes such as cell or tissue type or developmental stage. We describe the content and structure of annotation extensions, provide examples, and summarize the current usage of annotation extensions. Conclusions The additional contextual information captured by annotation extensions improves the utility of functional annotation by representing dependencies between annotations to terms in the different ontologies of GO, external ontologies, or an organism’s gene products. These enhanced annotations can also support sophisticated queries and reasoning, and will provide curated, directional links between many gene products to support pathway and network reconstruction. PMID:24885854

2014-01-01

50

Taxonomic Precision of Different Hypervariable Regions of 16S rRNA Gene and Annotation Methods for Functional Bacterial Groups in Biological Wastewater Treatment  

PubMed Central

High throughput sequencing of 16S rRNA gene leads us into a deeper understanding on bacterial diversity for complex environmental samples, but introduces blurring due to the relatively low taxonomic capability of short read. For wastewater treatment plant, only those functional bacterial genera categorized as nutrient remediators, bulk/foaming species, and potential pathogens are significant to biological wastewater treatment and environmental impacts. Precise taxonomic assignment of these bacteria at least at genus level is important for microbial ecological research and routine wastewater treatment monitoring. Therefore, the focus of this study was to evaluate the taxonomic precisions of different ribosomal RNA (rRNA) gene hypervariable regions generated from a mix activated sludge sample. In addition, three commonly used classification methods including RDP Classifier, BLAST-based best-hit annotation, and the lowest common ancestor annotation by MEGAN were evaluated by comparing their consistency. Under an unsupervised way, analysis of consistency among different classification methods suggests there are no hypervariable regions with good taxonomic coverage for all genera. Taxonomic assignment based on certain regions of the 16S rRNA genes, e.g. the V1&V2 regions – provide fairly consistent taxonomic assignment for a relatively wide range of genera. Hence, it is recommended to use these regions for studying functional groups in activated sludge. Moreover, the inconsistency among methods also demonstrated that a specific method might not be suitable for identification of some bacterial genera using certain 16S rRNA gene regions. As a general rule, drawing conclusions based only on one sequencing region and one classification method should be avoided due to the potential false negative results. PMID:24146837

Guo, Feng; Ju, Feng; Cai, Lin; Zhang, Tong

2013-01-01

51

Saccharomyces Genome Database (SGD) provides secondary gene annotation using the Gene Ontology (GO)  

Microsoft Academic Search

The Saccharomyces Genome Database (SGD) resources, ranging from genetic and physical maps to genome-wide analysis tools, reflect the scientific progress in identifying genes and their functions over the last decade. As emphasis shifts from identi- fication of the genes to identification of the role of their gene products in the cell, SGD seeks to provide its users with annotations that

Selina S. Dwight; Midori A. Harris; Kara Dolinski; Catherine A. Ball; Gail Binkley; Karen R. Christie; Dianna G. Fisk; Laurie Issel-tarver; Mark Schroeder; Gavin Sherlock; Anand Sethuraman; Shuai Weng; David Botstein; J. Michael Cherry

2002-01-01

52

A guide to best practices for Gene Ontology (GO) manual annotation  

PubMed Central

The Gene Ontology Consortium (GOC) is a community-based bioinformatics project that classifies gene product function through the use of structured controlled vocabularies. A fundamental application of the Gene Ontology (GO) is in the creation of gene product annotations, evidence-based associations between GO definitions and experimental or sequence-based analysis. Currently, the GOC disseminates 126 million annotations covering >374 000 species including all the kingdoms of life. This number includes two classes of GO annotations: those created manually by experienced biocurators reviewing the literature or by examination of biological data (1.1 million annotations covering 2226 species) and those generated computationally via automated methods. As manual annotations are often used to propagate functional predictions between related proteins within and between genomes, it is critical to provide accurate consistent manual annotations. Toward this goal, we present here the conventions defined by the GOC for the creation of manual annotation. This guide represents the best practices for manual annotation as established by the GOC project over the past 12 years. We hope this guide will encourage research communities to annotate gene products of their interest to enhance the corpus of GO annotations available to all. Database URL: http://www.geneontology.org PMID:23842463

Balakrishnan, Rama; Harris, Midori A.; Huntley, Rachael; Van Auken, Kimberly; Cherry, J. Michael

2013-01-01

53

Systems developmental biology: the use of ontologies in annotating models and in identifying gene function within and across species  

Microsoft Academic Search

Systems developmental biology is an approach to the study of embryogenesis that attempts to analyze complex developmental\\u000a processes through integrating the roles of their molecular, cellular, and tissue participants within a computational framework.\\u000a This article discusses ways of annotating these participants using standard terms and IDs now available in public ontologies\\u000a (these are areas of hierarchical knowledge formalized to be

Jonathan Bard

2007-01-01

54

Genomic Resources for Gene Discovery, Functional Genome Annotation, and Evolutionary Studies of Maize and Its Close Relatives  

PubMed Central

Maize is one of the most important food crops and a key model for genetics and developmental biology. A genetically anchored and high-quality draft genome sequence of maize inbred B73 has been obtained to serve as a reference sequence. To facilitate evolutionary studies in maize and its close relatives, much like the Oryza Map Alignment Project (OMAP) (www.OMAP.org) bacterial artificial chromosome (BAC) resource did for the rice community, we constructed BAC libraries for maize inbred lines Zheng58, Chang7-2, and Mo17 and maize wild relatives Zea mays ssp. parviglumis and Tripsacum dactyloides. Furthermore, to extend functional genomic studies to maize and sorghum, we also constructed binary BAC (BIBAC) libraries for the maize inbred B73 and the sorghum landrace Nengsi-1. The BAC/BIBAC vectors facilitate transfer of large intact DNA inserts from BAC clones to the BIBAC vector and functional complementation of large DNA fragments. These seven Zea Map Alignment Project (ZMAP) BAC/BIBAC libraries have average insert sizes ranging from 92 to 148 kb, organellar DNA from 0.17 to 2.3%, empty vector rates between 0.35 and 5.56%, and genome equivalents of 4.7- to 8.4-fold. The usefulness of the Parviglumis and Tripsacum BAC libraries was demonstrated by mapping clones to the reference genome. Novel genes and alleles present in these ZMAP libraries can now be used for functional complementation studies and positional or homology-based cloning of genes for translational genomics. PMID:24037269

Wang, Chao; Shi, Xue; Liu, Lin; Li, Haiyan; Ammiraju, Jetty S.S.; Kudrna, David A.; Xiong, Wentao; Wang, Hao; Dai, Zhaozhao; Zheng, Yonglian; Lai, Jinsheng; Jin, Weiwei; Messing, Joachim; Bennetzen, Jeffrey L; Wing, Rod A.; Luo, Meizhong

2013-01-01

55

Genomic resources for gene discovery, functional genome annotation, and evolutionary studies of maize and its close relatives.  

PubMed

Maize is one of the most important food crops and a key model for genetics and developmental biology. A genetically anchored and high-quality draft genome sequence of maize inbred B73 has been obtained to serve as a reference sequence. To facilitate evolutionary studies in maize and its close relatives, much like the Oryza Map Alignment Project (OMAP) (www.OMAP.org) bacterial artificial chromosome (BAC) resource did for the rice community, we constructed BAC libraries for maize inbred lines Zheng58, Chang7-2, and Mo17 and maize wild relatives Zea mays ssp. parviglumis and Tripsacum dactyloides. Furthermore, to extend functional genomic studies to maize and sorghum, we also constructed binary BAC (BIBAC) libraries for the maize inbred B73 and the sorghum landrace Nengsi-1. The BAC/BIBAC vectors facilitate transfer of large intact DNA inserts from BAC clones to the BIBAC vector and functional complementation of large DNA fragments. These seven Zea Map Alignment Project (ZMAP) BAC/BIBAC libraries have average insert sizes ranging from 92 to 148 kb, organellar DNA from 0.17 to 2.3%, empty vector rates between 0.35 and 5.56%, and genome equivalents of 4.7- to 8.4-fold. The usefulness of the Parviglumis and Tripsacum BAC libraries was demonstrated by mapping clones to the reference genome. Novel genes and alleles present in these ZMAP libraries can now be used for functional complementation studies and positional or homology-based cloning of genes for translational genomics. PMID:24037269

Wang, Chao; Shi, Xue; Liu, Lin; Li, Haiyan; Ammiraju, Jetty S S; Kudrna, David A; Xiong, Wentao; Wang, Hao; Dai, Zhaozhao; Zheng, Yonglian; Lai, Jinsheng; Jin, Weiwei; Messing, Joachim; Bennetzen, Jeffrey L; Wing, Rod A; Luo, Meizhong

2013-11-01

56

Annotation of stress responsive candidate genes in peanut ESTs.  

PubMed

Peanut (Arachis hypogaea L.) is an internationally important crop for human consumption as a good source of protein and vegetable oil. Peanut is widely cultivated around the world in tropical, sub-tropical and warm temperate climate. Because of its huge genome size (2.8 Gb) and unsequenced genome, studies on genomics and genetic modification of peanut are less as compared to other model crops. As peanut can be cultivated in arid and semi-arid regions, and its growth is drastically affected by various stresses that reduces the yield. Therefore, study on stress responsive genes and its regulation are very much important. Here we report about the identification and annotation of some stress responsive candidate genes using peanut Expressed Sequences Tags (ESTs). The selection of genes was based on the publically available expression data. Due to good expression data and lack of available literature in peanut some of the stress responsive genes were screened. Individual EST of the said group were further searched in peanut ESTs (1, 78,490 whole EST sequences) using computational approach. Various tools like Vec-Screen, Repeat Masker, EST Trimmer, DNA Baser and WISE2 were being used for stress responsive gene identification and annotation. Research progress made towards contigs assembly, determination of biological function of genes, and prediction of domain as well as 3D structure for related protein are included. PMID:25183351

Ranjan, Amar; Kumari, Archana; Pandey, Dev Mani

2014-09-01

57

Functional annotation of the human retinal pigment epithelium transcriptome  

PubMed Central

Background To determine level, variability and functional annotation of gene expression of the human retinal pigment epithelium (RPE), the key tissue involved in retinal diseases like age-related macular degeneration and retinitis pigmentosa. Macular RPE cells from six selected healthy human donor eyes (aged 63–78 years) were laser dissected and used for 22k microarray studies (Agilent technologies). Data were analyzed with Rosetta Resolver, the web tool DAVID and Ingenuity software. Results In total, we identified 19,746 array entries with significant expression in the RPE. Gene expression was analyzed according to expression levels, interindividual variability and functionality. A group of highly (n = 2,194) expressed RPE genes showed an overrepresentation of genes of the oxidative phosphorylation, ATP synthesis and ribosome pathways. In the group of moderately expressed genes (n = 8,776) genes of the phosphatidylinositol signaling system and aminosugars metabolism were overrepresented. As expected, the top 10 percent (n = 2,194) of genes with the highest interindividual differences in expression showed functional overrepresentation of the complement cascade, essential in inflammation in age-related macular degeneration, and other signaling pathways. Surprisingly, this same category also includes the genes involved in Bruch's membrane (BM) composition. Among the top 10 percent of genes with low interindividual differences, there was an overrepresentation of genes involved in local glycosaminoglycan turnover. Conclusion Our study expands current knowledge of the RPE transcriptome by assigning new genes, and adding data about expression level and interindividual variation. Functional annotation suggests that the RPE has high levels of protein synthesis, strong energy demands, and is exposed to high levels of oxidative stress and a variable degree of inflammation. Our data sheds new light on the molecular composition of BM, adjacent to the RPE, and is useful for candidate retinal disease gene identification or gene dose-dependent therapeutic studies. PMID:19379482

Booij, Judith C; van Soest, Simone; Swagemakers, Sigrid MA; Essing, Anke HW; Verkerk, Annemieke JMH; van der Spek, Peter J; Gorgels, Theo GMF; Bergen, Arthur AB

2009-01-01

58

dbWFA: a web-based database for functional annotation of Triticum aestivum transcripts.  

PubMed

The functional annotation of genes based on sequence homology with genes from model species genomes is time-consuming because it is necessary to mine several unrelated databases. The aim of the present work was to develop a functional annotation database for common wheat Triticum aestivum (L.). The database, named dbWFA, is based on the reference NCBI UniGene set, an expressed gene catalogue built by expressed sequence tag clustering, and on full-length coding sequences retrieved from the TriFLDB database. Information from good-quality heterogeneous sources, including annotations for model plant species Arabidopsis thaliana (L.) Heynh. and Oryza sativa L., was gathered and linked to T. aestivum sequences through BLAST-based homology searches. Even though the complexity of the transcriptome cannot yet be fully appreciated, we developed a tool to easily and promptly obtain information from multiple functional annotation systems (Gene Ontology, MapMan bin codes, MIPS Functional Categories, PlantCyc pathway reactions and TAIR gene families). The use of dbWFA is illustrated here with several query examples. We were able to assign a putative function to 45% of the UniGenes and 81% of the full-length coding sequences from TriFLDB. Moreover, comparison of the annotation of the whole T. aestivum UniGene set along with curated annotations of the two model species assessed the accuracy of the annotation provided by dbWFA. To further illustrate the use of dbWFA, genes specifically expressed during the early cell division or late storage polymer accumulation phases of T. aestivum grain development were identified using a clustering analysis and then annotated using dbWFA. The annotation of these two sets of genes was consistent with previous analyses of T. aestivum grain transcriptomes and proteomes. Database URL: urgi.versailles.inra.fr/dbWFA/ PMID:23660284

Vincent, Jonathan; Dai, Zhanwu; Ravel, Catherine; Choulet, Frédéric; Mouzeyar, Said; Bouzidi, M Fouad; Agier, Marie; Martre, Pierre

2013-01-01

59

dbWFA: a web-based database for functional annotation of Triticum aestivum transcripts  

PubMed Central

The functional annotation of genes based on sequence homology with genes from model species genomes is time-consuming because it is necessary to mine several unrelated databases. The aim of the present work was to develop a functional annotation database for common wheat Triticum aestivum (L.). The database, named dbWFA, is based on the reference NCBI UniGene set, an expressed gene catalogue built by expressed sequence tag clustering, and on full-length coding sequences retrieved from the TriFLDB database. Information from good-quality heterogeneous sources, including annotations for model plant species Arabidopsis thaliana (L.) Heynh. and Oryza sativa L., was gathered and linked to T. aestivum sequences through BLAST-based homology searches. Even though the complexity of the transcriptome cannot yet be fully appreciated, we developed a tool to easily and promptly obtain information from multiple functional annotation systems (Gene Ontology, MapMan bin codes, MIPS Functional Categories, PlantCyc pathway reactions and TAIR gene families). The use of dbWFA is illustrated here with several query examples. We were able to assign a putative function to 45% of the UniGenes and 81% of the full-length coding sequences from TriFLDB. Moreover, comparison of the annotation of the whole T. aestivum UniGene set along with curated annotations of the two model species assessed the accuracy of the annotation provided by dbWFA. To further illustrate the use of dbWFA, genes specifically expressed during the early cell division or late storage polymer accumulation phases of T. aestivum grain development were identified using a clustering analysis and then annotated using dbWFA. The annotation of these two sets of genes was consistent with previous analyses of T. aestivum grain transcriptomes and proteomes. Database URL: urgi.versailles.inra.fr/dbWFA/ PMID:23660284

Vincent, Jonathan; Dai, Zhanwu; Ravel, Catherine; Choulet, Frédéric; Mouzeyar, Said; Bouzidi, M. Fouad; Agier, Marie; Martre, Pierre

2013-01-01

60

MeSH Key Terms for Validation and Annotation of Gene Expression Clusters  

E-print Network

MeSH Key Terms for Validation and Annotation of Gene Expression Clusters Andreas Rechtsteiner and Luis M Rocha 1 Keywords: gene expression analysis, validation, information retrieval, automated and obtain functional information for gene expression clusters2. The controlled and hierarchical Me

Rocha, Luis

61

Improving functional annotation for industrial microbes: a case study with Pichia pastoris  

PubMed Central

The research communities studying microbial model organisms, such as Escherichia coli or Saccharomyces cerevisiae, are well served by model organism databases that have extensive functional annotation. However, this is not true of many industrial microbes that are used widely in biotechnology. In this Opinion piece, we use Pichia (Komagataella) pastoris to illustrate the limitations of the available annotation. We consider the resources that can be implemented in the short term both to improve Gene Ontology (GO) annotation coverage based on annotation transfer, and to establish curation pipelines for the literature corpus of this organism. PMID:24929579

Dikicioglu, Duygu; Wood, Valerie; Rutherford, Kim M.; McDowall, Mark D.; Oliver, Stephen G.

2014-01-01

62

The use of semantic similarity measures for optimally integrating heterogeneous Gene Ontology data from large scale annotation pipelines  

PubMed Central

With the advancement of new high throughput sequencing technologies, there has been an increase in the number of genome sequencing projects worldwide, which has yielded complete genome sequences of human, animals and plants. Subsequently, several labs have focused on genome annotation, consisting of assigning functions to gene products, mostly using Gene Ontology (GO) terms. As a consequence, there is an increased heterogeneity in annotations across genomes due to different approaches used by different pipelines to infer these annotations and also due to the nature of the GO structure itself. This makes a curator's task difficult, even if they adhere to the established guidelines for assessing these protein annotations. Here we develop a genome-scale approach for integrating GO annotations from different pipelines using semantic similarity measures. We used this approach to identify inconsistencies and similarities in functional annotations between orthologs of human and Drosophila melanogaster, to assess the quality of GO annotations derived from InterPro2GO mappings compared to manually annotated GO annotations for the Drosophila melanogaster proteome from a FlyBase dataset and human, and to filter GO annotation data for these proteomes. Results obtained indicate that an efficient integration of GO annotations eliminates redundancy up to 27.08 and 22.32% in the Drosophila melanogaster and human GO annotation datasets, respectively. Furthermore, we identified lack of and missing annotations for some orthologs, and annotation mismatches between InterPro2GO and manual pipelines in these two proteomes, thus requiring further curation. This simplifies and facilitates tasks of curators in assessing protein annotations, reduces redundancy and eliminates inconsistencies in large annotation datasets for ease of comparative functional genomics. PMID:25147557

Mazandu, Gaston K.; Mulder, Nicola J.

2014-01-01

63

Functional annotation of a full-length mouse cDNA collection  

Microsoft Academic Search

The RIKEN Mouse Gene Encyclopaedia Project, a systematic approach to determining the full coding potential of the mouse genome, involves collection and sequencing of full-length complementary DNAs and physical mapping of the corresponding genes to the mouse genome. We organized an international functional annotation meeting (FANTOM) to annotate the first 21,076 cDNAs to be analysed in this project. Here we

J. Kawai; A. Shinagawa; K. Shibata; M. Yoshino; M. Itoh; Y. Ishii; T. Arakawa; A. Hara; Y. Fukunishi; H. Konno; J. Adachi; S. Fukuda; K. Aizawa; M. Izawa; K. Nishi; H. Kiyosawa; S. Kondo; I. Yamanaka; T. Saito; Y. Okazaki; T. Gojobori; H. Bono; T. Kasukawa; R. Saito; K. Kadota; H. Matsuda; M. Ashburner; S. Batalov; T. Casavant; W. Fleischmann; T. Gaasterland; C. Gissi; B. King; H. Kochiwa; P. Kuehl; S. Lewis; Y. Matsuo; I. Nikaido; G. Pesole; J. Quackenbush; L. M. Schriml; F. Staubli; R. Suzuki; M. Tomita; L. Wagner; T. Washio; K. Sakai; T. Okido; M. Furuno; H. Aono; R. Baldarelli; G. Barsh; J. Blake; D. Boffelli; N. Bojunga; P. Carninci; M. F. de Bonaldo; M. J. Brownstein; C. Bult; C. Fletcher; M. Fujita; M. Gariboldi; S. Gustincich; D. Hill; M. Hofmann; D. A. Hume; M. Kamiya; N. H. Lee; P. Lyons; L. Marchionni; J. Mashima; J. Mazzarelli; P. Mombaerts; P. Nordone; B. Ring; M. Ringwald; I. Rodriguez; N. Sakamoto; H. Sasaki; K. Sato; C. Schönbach; T. Seya; Y. Shibata; K.-F. Storch; H. Suzuki; K. Toyo-oka; K. H. Wang; C. Weitz; C. Whittaker; L. Wilming; A. Wynshaw-Boris; K. Yoshida; Y. Hasegawa; H. Kawaji; S. Kohtsuki; Y. Hayashizaki

2001-01-01

64

A robust data-driven approach for gene ontology annotation  

PubMed Central

Gene ontology (GO) and GO annotation are important resources for biological information management and knowledge discovery, but the speed of manual annotation became a major bottleneck of database curation. BioCreative IV GO annotation task aims to evaluate the performance of system that automatically assigns GO terms to genes based on the narrative sentences in biomedical literature. This article presents our work in this task as well as the experimental results after the competition. For the evidence sentence extraction subtask, we built a binary classifier to identify evidence sentences using reference distance estimator (RDE), a recently proposed semi-supervised learning method that learns new features from around 10 million unlabeled sentences, achieving an F1 of 19.3% in exact match and 32.5% in relaxed match. In the post-submission experiment, we obtained 22.1% and 35.7% F1 performance by incorporating bigram features in RDE learning. In both development and test sets, RDE-based method achieved over 20% relative improvement on F1 and AUC performance against classical supervised learning methods, e.g. support vector machine and logistic regression. For the GO term prediction subtask, we developed an information retrieval-based method to retrieve the GO term most relevant to each evidence sentence using a ranking function that combined cosine similarity and the frequency of GO terms in documents, and a filtering method based on high-level GO classes. The best performance of our submitted runs was 7.8% F1 and 22.2% hierarchy F1. We found that the incorporation of frequency information and hierarchy filtering substantially improved the performance. In the post-submission evaluation, we obtained a 10.6% F1 using a simpler setting. Overall, the experimental analysis showed our approaches were robust in both the two tasks. PMID:25425037

Li, Yanpeng; Yu, Hong

2014-01-01

65

Large-Scale Protein Annotation through Gene Ontology  

PubMed Central

Recent progress in genomic sequencing, computational biology, and ontology development has presented an opportunity to investigate biological systems from a unique perspective, that is, examining genomes and transcriptomes through the multiple and hierarchical structure of Gene Ontology (GO). We report here our development of GO Engine, a computational platform for GO annotation, and analysis of the resultant GO annotations of human proteins. Protein annotation was centered on sequence homology with GO-annotated proteins and protein domain analysis. Text information analysis and a multiparameter cellular localization predictive tool were also used to increase the annotation accuracy, and to predict novel annotations. The majority of proteins corresponding to full-length mRNA in GenBank, and the majority of proteins in the NR database (nonredundant database of proteins) were annotated with one or more GO nodes in each of the three GO categories. The annotations of GenBank and SWISS-PROT proteins are available to the public at the GO Consortium web site. PMID:11997345

Xie, Hanqing; Wasserman, Alon; Levine, Zurit; Novik, Amit; Grebinskiy, Vladimir; Shoshan, Avi; Mintz, Liat

2002-01-01

66

CATH: comprehensive structural and functional annotations for genome sequences.  

PubMed

The latest version of the CATH-Gene3D protein structure classification database (4.0, http://www.cathdb.info) provides annotations for over 235 000 protein domain structures and includes 25 million domain predictions. This article provides an update on the major developments in the 2 years since the last publication in this journal including: significant improvements to the predictive power of our functional families (FunFams); the release of our 'current' putative domain assignments (CATH-B); a new, strictly non-redundant data set of CATH domains suitable for homology benchmarking experiments (CATH-40) and a number of improvements to the web pages. PMID:25348408

Sillitoe, Ian; Lewis, Tony E; Cuff, Alison; Das, Sayoni; Ashford, Paul; Dawson, Natalie L; Furnham, Nicholas; Laskowski, Roman A; Lee, David; Lees, Jonathan G; Lehtinen, Sonja; Studer, Romain A; Thornton, Janet; Orengo, Christine A

2015-01-28

67

Novel definition files for human GeneChips based on GeneAnnot  

Microsoft Academic Search

BACKGROUND: Improvements in genome sequence annotation revealed discrepancies in the original probeset\\/gene assignment in Affymetrix microarray and the existence of differences between annotations and effective alignments of probes and transcription products. In the current generation of Affymetrix human GeneChips, most probesets include probes matching transcripts from more than one gene and probes which do not match any transcribed sequence. RESULTS:

Francesco Ferrari; Stefania Bortoluzzi; Alessandro Coppe; Alexandra Sirota; Marilyn Safran; Michael Shmoish; Sergio Ferrari; Doron Lancet; Gian Antonio Danieli; Silvio Bicciato

2007-01-01

68

Drosophila Gene Expression Pattern Annotation through Multi-Instance  

E-print Network

Drosophila Gene Expression Pattern Annotation through Multi-Instance Multi-Label Learning Ying-Xin Li, Shuiwang Ji, Sudhir Kumar, Jieping Ye, and Zhi-Hua Zhou Abstract--In the studies of DrosophilaExpress database (a digital library of standardized Drosophila gene expression pattern images) reveal

Ji, Shuiwang

69

De Novo Assembly, Functional Annotation and Comparative Analysis of Withania somnifera Leaf and Root Transcriptomes to Identify Putative Genes Involved in the Withanolides Biosynthesis  

PubMed Central

Withania somnifera is one of the most valuable medicinal plants used in Ayurvedic and other indigenous medicine systems due to bioactive molecules known as withanolides. As genomic information regarding this plant is very limited, little information is available about biosynthesis of withanolides. To facilitate the basic understanding about the withanolide biosynthesis pathways, we performed transcriptome sequencing for Withania leaf (101L) and root (101R) which specifically synthesize withaferin A and withanolide A, respectively. Pyrosequencing yielded 8,34,068 and 7,21,755 reads which got assembled into 89,548 and 1,14,814 unique sequences from 101L and 101R, respectively. A total of 47,885 (101L) and 54,123 (101R) could be annotated using TAIR10, NR, tomato and potato databases. Gene Ontology and KEGG analyses provided a detailed view of all the enzymes involved in withanolide backbone synthesis. Our analysis identified members of cytochrome P450, glycosyltransferase and methyltransferase gene families with unique presence or differential expression in leaf and root and might be involved in synthesis of tissue-specific withanolides. We also detected simple sequence repeats (SSRs) in transcriptome data for use in future genetic studies. Comprehensive sequence resource developed for Withania, in this study, will help to elucidate biosynthetic pathway for tissue-specific synthesis of secondary plant products in non-model plant organisms as well as will be helpful in developing strategies for enhanced biosynthesis of withanolides through biotechnological approaches. PMID:23667511

Gupta, Parul; Goel, Ridhi; Pathak, Sumya; Srivastava, Apeksha; Singh, Surya Pratap; Sangwan, Rajender Singh; Asif, Mehar Hasan; Trivedi, Prabodh Kumar

2013-01-01

70

Lynx web services for annotations and systems analysis of multi-gene disorders.  

PubMed

Lynx is a web-based integrated systems biology platform that supports annotation and analysis of experimental data and generation of weighted hypotheses on molecular mechanisms contributing to human phenotypes and disorders of interest. Lynx has integrated multiple classes of biomedical data (genomic, proteomic, pathways, phenotypic, toxicogenomic, contextual and others) from various public databases as well as manually curated data from our group and collaborators (LynxKB). Lynx provides tools for gene list enrichment analysis using multiple functional annotations and network-based gene prioritization. Lynx provides access to the integrated database and the analytical tools via REST based Web Services (http://lynx.ci.uchicago.edu/webservices.html). This comprises data retrieval services for specific functional annotations, services to search across the complete LynxKB (powered by Lucene), and services to access the analytical tools built within the Lynx platform. PMID:24948611

Sulakhe, Dinanath; Taylor, Andrew; Balasubramanian, Sandhya; Feng, Bo; Xie, Bingqing; Börnigen, Daniela; Dave, Utpal J; Foster, Ian T; Gilliam, T Conrad; Maltsev, Natalia

2014-07-01

71

Lynx web services for annotations and systems analysis of multi-gene disorders  

PubMed Central

Lynx is a web-based integrated systems biology platform that supports annotation and analysis of experimental data and generation of weighted hypotheses on molecular mechanisms contributing to human phenotypes and disorders of interest. Lynx has integrated multiple classes of biomedical data (genomic, proteomic, pathways, phenotypic, toxicogenomic, contextual and others) from various public databases as well as manually curated data from our group and collaborators (LynxKB). Lynx provides tools for gene list enrichment analysis using multiple functional annotations and network-based gene prioritization. Lynx provides access to the integrated database and the analytical tools via REST based Web Services (http://lynx.ci.uchicago.edu/webservices.html). This comprises data retrieval services for specific functional annotations, services to search across the complete LynxKB (powered by Lucene), and services to access the analytical tools built within the Lynx platform. PMID:24948611

Sulakhe, Dinanath; Taylor, Andrew; Balasubramanian, Sandhya; Feng, Bo; Xie, Bingqing; Börnigen, Daniela; Dave, Utpal J.; Foster, Ian T.; Gilliam, T. Conrad; Maltsev, Natalia

2014-01-01

72

Proteomic Detection of Non-Annotated Protein-Coding Genes in Pseudomonas fluorescens Pf0-1  

SciTech Connect

Genome sequences are annotated by computational prediction of coding sequences, followed by similarity searches such as BLAST, which provide a layer of (possible) functional information. While the existence of processes such as alternative splicing complicates matters for eukaryote genomes, the view of bacterial genomes as a linear series of closely spaced genes leads to the assumption that computational annotations which predict such arrangements completely describe the coding capacity of bacterial genomes. We undertook a proteomic study to identify proteins expressed by Pseudomonas fluorescens Pf0-1 from genes which were not predicted during the genome annotation. Mapping peptides to the Pf0-1 genome sequence identified sixteen non-annotated protein-coding regions, of which nine were antisense to predicted genes, six were intergenic, and one read in the same direction as an annotated gene but in a different frame. The expression of all but one of the newly discovered genes was verified by RT-PCR. Few clues as to the function of the new genes were gleaned from informatic analyses, but potential orthologues in other Pseudomonas genomes were identified for eight of the new genes. The 16 newly identified genes improve the quality of the Pf0-1 genome annotation, and the detection of antisense protein-coding genes indicates the under-appreciated complexity of bacterial genome organization.

Kim, Wook; Silby, Mark W.; Purvine, Samuel O.; Nicoll, Julie S.; Hixson, Kim K.; Monroe, Matthew E.; Nicora, Carrie D.; Lipton, Mary S.; Levy, Stuart B.

2009-12-24

73

Functional annotation of a full-length mouse cDNA collection  

SciTech Connect

The RIKEN Mouse Gene Encyclopedia Project, a systematic approach to determining the full coding potential of the mouse genome, involves collection and sequencing of full-length complementary DNAs and physical mapping of the corresponding genes to the mouse genome. We organized an international functional annotation meeting (FANTOM) to annotate the first 21,076 cDNAs to be analyzed in this project. Here we describe the first RIKEN clone collection, which is one of the largest described for any organism. Analysis of these cDNAs extends known gene families and identifies new ones.

Kawai, J.; Shinagawa, A.; Shibata, K.; Yoshino, M.; Itoh, M.; Ishii, Y.; Arakawa, T.; Hara, A.; Fukunishi, Y.; Konno, H.; Adachi, J.; Fukuda, S.; Aizawa, K.; Izawa, M.; Nishi, K.; Kiyosawa, H.; Kondo, S.; Yamanaka, I.; Saito, T.; Okazaki, Y.; Gojobori, T.; Bono, H.; Kasukawa, T.; Saito, R.; Kadota, K.; Matsuda, H.; Ashburner, M.; Batalov, S.; Casavant, T.; Fleischmann, W.; Gaasterland, T.; Gissi, C.; King, B.; Kochiwa, H.; Kuehl, P.; Lewis, S.; Matsuo, Y.; Nikaido, I.; Pesole, G.; Quackenbush, J.; Schriml, L.M.; Staubli, F.; Suzuki, R.; Tomita, M.; Wagner, L.; Washio, T.; Sakai, K.; Okido, T.; Furuno, M.; Aono, H.; Baldarelli, R.; Barsh, G.; Blake, J.; Boffelli, D.; Bojunga, N.; Carninci, P.; de Bonaldo, M.F.; Brownstein, M.J.; Bult, C.; Fletcher, C.; Fujita, M.; Gariboldi, M.; Gustincich, S.; Hill, D.; Hofmann, M.; Hume, D.A.; Kamiya, M.; Lee, N.H.; Lyons, P.; Marchionni, L.; Mashima, J.; Mazzarelli, J.; Mombaerts, P.; Nordone, P.; Ring, B.; Ringwald, M.; Rodriguez, I.; Sakamoto, N.; Sasaki, H.; Sato, K.; Schonbach, C.; Seya, T.; Shibata, Y.; Storch, K.-F.; Suzuki, H.; Toyo-oka, K.; Wang, K.H.; Weitz, C.; Whittaker, C.; Wilming, L.; Wynshaw-Boris, A.; Yoshida, K.; Hasegawa, Y.; Kawaji, H.; Kohtsuki, S.; Hayashizaki, Y.; RIKEN Genome Exploration Research Group Phase II T; FANTOM Consortium

2001-01-01

74

Comprehensive annotation of secondary metabolite biosynthetic genes and gene clusters of Aspergillus nidulans, A. fumigatus, A. niger and A. oryzae  

PubMed Central

Background Secondary metabolite production, a hallmark of filamentous fungi, is an expanding area of research for the Aspergilli. These compounds are potent chemicals, ranging from deadly toxins to therapeutic antibiotics to potential anti-cancer drugs. The genome sequences for multiple Aspergilli have been determined, and provide a wealth of predictive information about secondary metabolite production. Sequence analysis and gene overexpression strategies have enabled the discovery of novel secondary metabolites and the genes involved in their biosynthesis. The Aspergillus Genome Database (AspGD) provides a central repository for gene annotation and protein information for Aspergillus species. These annotations include Gene Ontology (GO) terms, phenotype data, gene names and descriptions and they are crucial for interpreting both small- and large-scale data and for aiding in the design of new experiments that further Aspergillus research. Results We have manually curated Biological Process GO annotations for all genes in AspGD with recorded functions in secondary metabolite production, adding new GO terms that specifically describe each secondary metabolite. We then leveraged these new annotations to predict roles in secondary metabolism for genes lacking experimental characterization. As a starting point for manually annotating Aspergillus secondary metabolite gene clusters, we used antiSMASH (antibiotics and Secondary Metabolite Analysis SHell) and SMURF (Secondary Metabolite Unknown Regions Finder) algorithms to identify potential clusters in A. nidulans, A. fumigatus, A. niger and A. oryzae, which we subsequently refined through manual curation. Conclusions This set of 266 manually curated secondary metabolite gene clusters will facilitate the investigation of novel Aspergillus secondary metabolites. PMID:23617571

2013-01-01

75

Functional annotation of colon cancer risk SNPs  

PubMed Central

Colorectal cancer (CRC) is a leading cause of cancer-related deaths in the United States. Genome-wide association studies (GWAS) have identified single nucleotide polymorphisms (SNPs) associated with increased risk for CRC. A molecular understanding of the functional consequences of this genetic variation has been complicated because each GWAS SNP is a surrogate for hundreds of other SNPs, most of which are located in non-coding regions. Here we use genomic and epigenomic information to test the hypothesis that the GWAS SNPs and/or correlated SNPs are in elements that regulate gene expression, and identify 23 promoters and 28 enhancers. Using gene expression data from normal and tumour cells, we identify 66 putative target genes of the risk-associated enhancers (10 of which were also identified by promoter SNPs). Employing CRISPR nucleases, we delete one risk-associated enhancer and identify genes showing altered expression. We suggest that similar studies be performed to characterize all CRC risk-associated enhancers. PMID:25268989

Yao, Lijing; Tak, Yu Gyoung; Berman, Benjamin P.; Farnham, Peggy J.

2014-01-01

76

Image-level and group-level models for Drosophila gene expression pattern annotation  

PubMed Central

Background Drosophila melanogaster has been established as a model organism for investigating the developmental gene interactions. The spatio-temporal gene expression patterns of Drosophila melanogaster can be visualized by in situ hybridization and documented as digital images. Automated and efficient tools for analyzing these expression images will provide biological insights into the gene functions, interactions, and networks. To facilitate pattern recognition and comparison, many web-based resources have been created to conduct comparative analysis based on the body part keywords and the associated images. With the fast accumulation of images from high-throughput techniques, manual inspection of images will impose a serious impediment on the pace of biological discovery. It is thus imperative to design an automated system for efficient image annotation and comparison. Results We present a computational framework to perform anatomical keywords annotation for Drosophila gene expression images. The spatial sparse coding approach is used to represent local patches of images in comparison with the well-known bag-of-words (BoW) method. Three pooling functions including max pooling, average pooling and Sqrt (square root of mean squared statistics) pooling are employed to transform the sparse codes to image features. Based on the constructed features, we develop both an image-level scheme and a group-level scheme to tackle the key challenges in annotating Drosophila gene expression pattern images automatically. To deal with the imbalanced data distribution inherent in image annotation tasks, the undersampling method is applied together with majority vote. Results on Drosophila embryonic expression pattern images verify the efficacy of our approach. Conclusion In our experiment, the three pooling functions perform comparably well in feature dimension reduction. The undersampling with majority vote is shown to be effective in tackling the problem of imbalanced data. Moreover, combining sparse coding and image-level scheme leads to consistent performance improvement in keywords annotation. PMID:24299119

2013-01-01

77

Functional Annotation and Comparative Analysis of a Zygopteran Transcriptome  

PubMed Central

In this paper we present a de novo assembly of the transcriptome of the damselfly (Enallagma hageni) through the use of 454 pyrosequencing. E. hageni is a member of the suborder Zygoptera, in the order Odonata, and Odonata organisms form the basal lineage of the winged insects (Pterygota). To date, sequence data used in phylogenetic analysis of Enallagma species have been derived from either mitochondrial DNA or ribosomal nuclear DNA. This Enallagma transcriptome contained 31,661 contigs that were assembled and translated into 14,813 individual open reading frames. Using these data, we constructed an extensive dataset of 634 orthologous nuclear protein-encoding genes across 11 species of Arthropoda and used Bayesian techniques to elucidate the position of Enallagma in the arthropod phylogenetic tree. Additionally, we demonstrated that the Enallagma transcriptome contains 169 genes that are evolving at rates that differ relative to those of the rest of the transcriptome (29 accelerated and 140 decreased), and, through multiple Gene Ontology searches and clustering methods, we present the first functional annotation of any palaeopteran’s transcriptome in the literature. PMID:23550132

Shanku, Alexander G.; McPeek, Mark A.; Kern, Andrew D.

2013-01-01

78

Automated Eukaryotic Gene Structure Annotation Using EVidenceModeler and the Program to Assemble Spliced Alignments  

SciTech Connect

EVidenceModeler (EVM) is presented as an automated eukaryotic gene structure annotation tool that reports eukaryotic gene structures as a weighted consensus of all available evidence. EVM, when combined with the Program to Assemble Spliced Alignments (PASA), yields a comprehensive, configurable annotation system that predicts protein-coding genes and alternatively spliced isoforms. Our experiments on both rice and human genome sequences demonstrate that EVM produces automated gene structure annotation approaching the quality of manual curation.

Haas, B J; Salzberg, S L; Zhu, W; Pertea, M; Allen, J E; Orvis, J; White, O; Buell, C R; Wortman, J R

2007-12-10

79

Annotator: Post-processing Software for generating function-based signatures from quantitative mass spectrometry  

PubMed Central

Mass spectrometry is used to investigate global changes in protein abundance in cell lysates. Increasingly powerful methods of data collection have emerged over the past decade, but this has left researchers with the task of sifting through mountains of data for biologically significant results. Often, the end result is a list of proteins with no obvious quantitative relationships to define the larger context of changes in cell behavior. Researchers are often forced to perform a manual analysis from this list or to fall back on a range of disparate tools, which can hinder the communication of results and their reproducibility. To address these methodological problems we developed Annotator, an application that filters validated mass spectrometry data and applies a battery of standardized heuristic and statistical tests to determine significance. To address systems-level interpretations we incorporated UniProt and Gene Ontology keywords as statistical units of analysis, yielding quantitative information about changes in abundance for an entire functional category. This provides a consistent and quantitative method for formulating conclusions about cellular behavior, independent of network models or standard enrichment analyses. Annotator allows for “bottom-up” annotations that are based on experimental data and not inferred by comparison to external or hypothetical models. Annotator was developed as an independent post-processing platform that runs on all common operating systems, thereby providing a useful tool for establishing the inherently dynamic nature of functional annotations, which depend on results from on-going proteomic experiments. Annotator is available for download at http://people.cs.uchicago.edu/~tyler/annotator/annotator_desktop_0.1.tar.gz. PMID:22224429

Sylvester, Juliesta E.; Bray, Tyler S.; Kron, Stephen J.

2012-01-01

80

Gene inactivation and its implications for annotation in the era of personal genomics  

PubMed Central

The first wave of personal genomes documents how no single individual genome contains the full complement of functional genes. Here, we describe the extent of variation in gene and pseudogene numbers between individuals arising from inactivation events such as premature termination or aberrant splicing due to single-nucleotide polymorphisms. This highlights the inadequacy of the current reference sequence and gene set. We present a proposal to define a reference gene set that will remain stable as more individuals are sequenced. In particular, we recommend that the ancestral allele be used to define the reference sequence from which a core human reference gene annotation set can be derived. In addition, we call for the development of an expanded gene set to include human-specific genes that have arisen recently and are absent from the ancestral set. PMID:21205862

Balasubramanian, Suganthi; Habegger, Lukas; Frankish, Adam; MacArthur, Daniel G.; Harte, Rachel; Tyler-Smith, Chris; Harrow, Jennifer; Gerstein, Mark

2011-01-01

81

Towards integrative gene functional similarity measurement  

PubMed Central

Background In Gene Ontology, the "Molecular Function" (MF) categorization is a widely used knowledge framework for gene function comparison and prediction. Its structure and annotation provide a convenient way to compare gene functional similarities at the molecular level. The existing gene similarity measures, however, solely rely on one or few aspects of MF without utilizing all the rich information available including structure, annotation, common terms, lowest common parents. Results We introduce a rank-based gene semantic similarity measure called InteGO by synergistically integrating the state-of-the-art gene-to-gene similarity measures. By integrating three GO based seed measures, InteGO significantly improves the performance by about two-fold in all the three species studied (yeast, Arabidopsis and human). Conclusions InteGO is a systematic and novel method to study gene functional associations. The software and description are available at http://www.msu.edu/~jinchen/InteGO. PMID:24564710

2014-01-01

82

High-throughput comparison, functional annotation, and metabolic modeling of plant genomes using the PlantSEED resource  

PubMed Central

The increasing number of sequenced plant genomes is placing new demands on the methods applied to analyze, annotate, and model these genomes. Today’s annotation pipelines result in inconsistent gene assignments that complicate comparative analyses and prevent efficient construction of metabolic models. To overcome these problems, we have developed the PlantSEED, an integrated, metabolism-centric database to support subsystems-based annotation and metabolic model reconstruction for plant genomes. PlantSEED combines SEED subsystems technology, first developed for microbial genomes, with refined protein families and biochemical data to assign fully consistent functional annotations to orthologous genes, particularly those encoding primary metabolic pathways. Seamless integration with its parent, the prokaryotic SEED database, makes PlantSEED a unique environment for cross-kingdom comparative analysis of plant and bacterial genomes. The consistent annotations imposed by PlantSEED permit rapid reconstruction and modeling of primary metabolism for all plant genomes in the database. This feature opens the unique possibility of model-based assessment of the completeness and accuracy of gene annotation and thus allows computational identification of genes and pathways that are restricted to certain genomes or need better curation. We demonstrate the PlantSEED system by producing consistent annotations for 10 reference genomes. We also produce a functioning metabolic model for each genome, gapfilling to identify missing annotations and proposing gene candidates for missing annotations. Models are built around an extended biomass composition representing the most comprehensive published to date. To our knowledge, our models are the first to be published for seven of the genomes analyzed. PMID:24927599

Seaver, Samuel M. D.; Gerdes, Svetlana; Frelin, Océane; Lerma-Ortiz, Claudia; Bradbury, Louis M. T.; Zallot, Rémi; Hasnain, Ghulam; Niehaus, Thomas D.; El Yacoubi, Basma; Pasternak, Shiran; Olson, Robert; Pusch, Gordon; Overbeek, Ross; Stevens, Rick; de Crécy-Lagard, Valérie; Ware, Doreen; Hanson, Andrew D.; Henry, Christopher S.

2014-01-01

83

High-throughput comparison, functional annotation, and metabolic modeling of plant genomes using the PlantSEED resource.  

PubMed

The increasing number of sequenced plant genomes is placing new demands on the methods applied to analyze, annotate, and model these genomes. Today's annotation pipelines result in inconsistent gene assignments that complicate comparative analyses and prevent efficient construction of metabolic models. To overcome these problems, we have developed the PlantSEED, an integrated, metabolism-centric database to support subsystems-based annotation and metabolic model reconstruction for plant genomes. PlantSEED combines SEED subsystems technology, first developed for microbial genomes, with refined protein families and biochemical data to assign fully consistent functional annotations to orthologous genes, particularly those encoding primary metabolic pathways. Seamless integration with its parent, the prokaryotic SEED database, makes PlantSEED a unique environment for cross-kingdom comparative analysis of plant and bacterial genomes. The consistent annotations imposed by PlantSEED permit rapid reconstruction and modeling of primary metabolism for all plant genomes in the database. This feature opens the unique possibility of model-based assessment of the completeness and accuracy of gene annotation and thus allows computational identification of genes and pathways that are restricted to certain genomes or need better curation. We demonstrate the PlantSEED system by producing consistent annotations for 10 reference genomes. We also produce a functioning metabolic model for each genome, gapfilling to identify missing annotations and proposing gene candidates for missing annotations. Models are built around an extended biomass composition representing the most comprehensive published to date. To our knowledge, our models are the first to be published for seven of the genomes analyzed. PMID:24927599

Seaver, Samuel M D; Gerdes, Svetlana; Frelin, Océane; Lerma-Ortiz, Claudia; Bradbury, Louis M T; Zallot, Rémi; Hasnain, Ghulam; Niehaus, Thomas D; El Yacoubi, Basma; Pasternak, Shiran; Olson, Robert; Pusch, Gordon; Overbeek, Ross; Stevens, Rick; de Crécy-Lagard, Valérie; Ware, Doreen; Hanson, Andrew D; Henry, Christopher S

2014-07-01

84

Comprehensive Functional Annotation of 77 Prostate Cancer Risk Loci  

PubMed Central

Genome-wide association studies (GWAS) have revolutionized the field of cancer genetics, but the causal links between increased genetic risk and onset/progression of disease processes remain to be identified. Here we report the first step in such an endeavor for prostate cancer. We provide a comprehensive annotation of the 77 known risk loci, based upon highly correlated variants in biologically relevant chromatin annotations— we identified 727 such potentially functional SNPs. We also provide a detailed account of possible protein disruption, microRNA target sequence disruption and regulatory response element disruption of all correlated SNPs at . 88% of the 727 SNPs fall within putative enhancers, and many alter critical residues in the response elements of transcription factors known to be involved in prostate biology. We define as risk enhancers those regions with enhancer chromatin biofeatures in prostate-derived cell lines with prostate-cancer correlated SNPs. To aid the identification of these enhancers, we performed genomewide ChIP-seq for H3K27-acetylation, a mark of actively engaged enhancers, as well as the transcription factor TCF7L2. We analyzed in depth three variants in risk enhancers, two of which show significantly altered androgen sensitivity in LNCaP cells. This includes rs4907792, that is in linkage disequilibrium () with an eQTL for NUDT11 (on the X chromosome) in prostate tissue, and rs10486567, the index SNP in intron 3 of the JAZF1 gene on chromosome 7. Rs4907792 is within a critical residue of a strong consensus androgen response element that is interrupted in the protective allele, resulting in a 56% decrease in its androgen sensitivity, whereas rs10486567 affects both NKX3-1 and FOXA-AR motifs where the risk allele results in a 39% increase in basal activity and a 28% fold-increase in androgen stimulated enhancer activity. Identification of such enhancer variants and their potential target genes represents a preliminary step in connecting risk to disease process. PMID:24497837

Hazelett, Dennis J.; Rhie, Suhn Kyong; Gaddis, Malaina; Yan, Chunli; Lakeland, Daniel L.; Coetzee, Simon G.; Henderson, Brian E.; Noushmehr, Houtan; Cozen, Wendy; Kote-Jarai, Zsofia; Eeles, Rosalind A.; Easton, Douglas F.; Haiman, Christopher A.; Lu, Wange; Farnham, Peggy J.; Coetzee, Gerhard A.

2014-01-01

85

RESEARCH ARTICLE Open Access Transferring functional annotations of membrane  

E-print Network

RESEARCH ARTICLE Open Access Transferring functional annotations of membrane transporters to the hierarchical transporter classification (TC) system [5] adopted by the International Union of Biochemistry. This is an open access article distributed under the terms of the Creative Commons Attribution License (http

Kaski, Samuel

86

DATABASE Open Access DFLAT: functional annotation for human  

E-print Network

ability to interpret genomic studies of human fetal and neonatal development. Keywords: Human developmentDATABASE Open Access DFLAT: functional annotation for human development Heather C Wick1* , Harold Bianchi3,4 and Donna K Slonim1,3* Abstract Background: Recent increases in genomic studies

Kaski, Samuel

87

Integrating information retrieval with distant supervision for Gene Ontology annotation  

PubMed Central

This article describes our participation of the Gene Ontology Curation task (GO task) in BioCreative IV where we participated in both subtasks: A) identification of GO evidence sentences (GOESs) for relevant genes in full-text articles and B) prediction of GO terms for relevant genes in full-text articles. For subtask A, we trained a logistic regression model to detect GOES based on annotations in the training data supplemented with more noisy negatives from an external resource. Then, a greedy approach was applied to associate genes with sentences. For subtask B, we designed two types of systems: (i) search-based systems, which predict GO terms based on existing annotations for GOESs that are of different textual granularities (i.e., full-text articles, abstracts, and sentences) using state-of-the-art information retrieval techniques (i.e., a novel application of the idea of distant supervision) and (ii) a similarity-based system, which assigns GO terms based on the distance between words in sentences and GO terms/synonyms. Our best performing system for subtask A achieves an F1 score of 0.27 based on exact match and 0.387 allowing relaxed overlap match. Our best performing system for subtask B, a search-based system, achieves an F1 score of 0.075 based on exact match and 0.301 considering hierarchical matches. Our search-based systems for subtask B significantly outperformed the similarity-based system. Database URL: https://github.com/noname2020/Bioc PMID:25183856

Zhu, Dongqing; Li, Dingcheng; Carterette, Ben; Liu, Hongfang

2014-01-01

88

miRDB: an online resource for microRNA target prediction and functional annotations.  

PubMed

MicroRNAs (miRNAs) are small non-coding RNAs that are extensively involved in many physiological and disease processes. One major challenge in miRNA studies is the identification of genes regulated by miRNAs. To this end, we have developed an online resource, miRDB (http://mirdb.org), for miRNA target prediction and functional annotations. Here, we describe recently updated features of miRDB, including 2.1 million predicted gene targets regulated by 6709 miRNAs. In addition to presenting precompiled prediction data, a new feature is the web server interface that allows submission of user-provided sequences for miRNA target prediction. In this way, users have the flexibility to study any custom miRNAs or target genes of interest. Another major update of miRDB is related to functional miRNA annotations. Although thousands of miRNAs have been identified, many of the reported miRNAs are not likely to play active functional roles or may even have been falsely identified as miRNAs from high-throughput studies. To address this issue, we have performed combined computational analyses and literature mining, and identified 568 and 452 functional miRNAs in humans and mice, respectively. These miRNAs, as well as associated functional annotations, are presented in the FuncMir Collection in miRDB. PMID:25378301

Wong, Nathan; Wang, Xiaowei

2015-01-28

89

Drosophila Gene Expression Pattern Annotation through Multi-Instance Multi-Label Learning  

E-print Network

Drosophila Gene Expression Pattern Annotation through Multi-Instance Multi-Label Learning Ying.kumar}@asu.edu Abstract The Berkeley Drosophila Genome Project (BDGP) has produced a large number of gene expression the annotation task. Empirical study shows that the proposed method outperforms the state-of-the-art Drosophila

Ji, Shuiwang

90

Functional annotation and kinetic characterization of PhnO from Salmonella enterica.  

PubMed Central

Phosphorus is an essential nutrient for all living organisms. Under conditions of inorganic phosphate starvation, genes from the Pho regulon are induced allowing microorganisms to use phosphonates as a source of phosphorus. The phnO gene was previously annotated as a transcriptional regulator of unknown function due to sequence homology with members of the GCN5-related N-acyltransferase family (GNAT). PhnO can now be functionally annotated as an aminoalkylphosphonic acid N-acetyltransferase which is able to acetylate a range of aminoalkylphosphonic acids. Studies revealed that PhnO proceeds via an ordered, sequential kinetic mechanism with acetyl-CoA binding first followed by aminoalkylphosphonate. Attack by the amine on the thioester of AcCoA generates the tetrahedral intermediate that collapses to generate the products. The enzyme also requires a divalent metal ion for activity, which is the first example of this requirement for a GNAT family member. PMID:16503658

Errey, James C.; Blanchard, John S.

2008-01-01

91

Structural and Functional Analysis of Rv3214 from Mycobacterium tuberculosis, a Protein with Conflicting Functional Annotations, Leads to Its Characterization as a Phosphatase  

Microsoft Academic Search

The availability of complete genome sequences has highlighted the problems of functional annotation of the many gene products that have only limited sequence similarity with proteins of known function. The predicted protein encoded by open reading frame Rv3214 from the Mycobacterium tuberculosis H37Rv genome was originally annotated as EntD through sequence similarity with the Escherichia coli EntD, a 4-phosphopante- theinyl

Harriet A. Watkins; Edward N. Baker

2006-01-01

92

Optimizing high performance computing workflow for protein functional annotation  

PubMed Central

Functional annotation of newly sequenced genomes is one of the major challenges in modern biology. With modern sequencing technologies, the protein sequence universe is rapidly expanding. Newly sequenced bacterial genomes alone contain over 7.5 million proteins. The rate of data generation has far surpassed that of protein annotation. The volume of protein data makes manual curation infeasible, whereas a high compute cost limits the utility of existing automated approaches. In this work, we present an improved and optmized automated workflow to enable large-scale protein annotation. The workflow uses high performance computing architectures and a low complexity classification algorithm to assign proteins into existing clusters of orthologous groups of proteins. On the basis of the Position-Specific Iterative Basic Local Alignment Search Tool the algorithm ensures at least 80% specificity and sensitivity of the resulting classifications. The workflow utilizes highly scalable parallel applications for classification and sequence alignment. Using Extreme Science and Engineering Discovery Environment supercomputers, the workflow processed 1,200,000 newly sequenced bacterial proteins. With the rapid expansion of the protein sequence universe, the proposed workflow will enable scientists to annotate big genome data. PMID:25313296

Stanberry, Larissa; Rekepalli, Bhanu; Liu, Yuan; Giblock, Paul; Higdon, Roger; Montague, Elizabeth; Broomall, William; Kolker, Natali; Kolker, Eugene

2014-01-01

93

Optimizing high performance computing workflow for protein functional annotation.  

PubMed

Functional annotation of newly sequenced genomes is one of the major challenges in modern biology. With modern sequencing technologies, the protein sequence universe is rapidly expanding. Newly sequenced bacterial genomes alone contain over 7.5 million proteins. The rate of data generation has far surpassed that of protein annotation. The volume of protein data makes manual curation infeasible, whereas a high compute cost limits the utility of existing automated approaches. In this work, we present an improved and optmized automated workflow to enable large-scale protein annotation. The workflow uses high performance computing architectures and a low complexity classification algorithm to assign proteins into existing clusters of orthologous groups of proteins. On the basis of the Position-Specific Iterative Basic Local Alignment Search Tool the algorithm ensures at least 80% specificity and sensitivity of the resulting classifications. The workflow utilizes highly scalable parallel applications for classification and sequence alignment. Using Extreme Science and Engineering Discovery Environment supercomputers, the workflow processed 1,200,000 newly sequenced bacterial proteins. With the rapid expansion of the protein sequence universe, the proposed workflow will enable scientists to annotate big genome data. PMID:25313296

Stanberry, Larissa; Rekepalli, Bhanu; Liu, Yuan; Giblock, Paul; Higdon, Roger; Montague, Elizabeth; Broomall, William; Kolker, Natali; Kolker, Eugene

2014-09-10

94

Functional annotation of the human chromosome 7 "missing" proteins: a bioinformatics approach.  

PubMed

The chromosome-centric human proteome project aims to systematically map all human proteins, chromosome by chromosome, in a gene-centric manner through dedicated efforts from national and international teams. This mapping will lead to a knowledge-based resource defining the full set of proteins encoded in each chromosome and laying the foundation for the development of a standardized approach to analyze the massive proteomic data sets currently being generated. The neXtProt database lists 946 proteins as the human proteome of chromosome 7. However, 170 (18%) proteins of human chromosome 7 have no evidence at the proteomic, antibody, or structural levels and are considered "missing" in this study as they lack experimental support. We have developed a protocol for the functional annotation of these "missing" proteins by integrating several bioinformatics analysis and annotation tools, sequential BLAST homology searches, protein domain/motif and gene ontology (GO) mapping, and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis. Using the BLAST search strategy, homologues for reviewed non-human mammalian proteins with protein evidence were identified for 90 "missing" proteins while another 38 had reviewed non-human mammalian homologues. Putative functional annotations were assigned to 27 of the remaining 43 novel proteins. Proteotypic peptides have been computationally generated to facilitate rapid identification of these proteins. Four of the "missing" chromosome 7 proteins have been substantiated by the ENCODE proteogenomic peptide data. PMID:23308364

Ranganathan, Shoba; Khan, Javed M; Garg, Gagan; Baker, Mark S

2013-06-01

95

A bag-of-words approach for Drosophila gene expression pattern annotation  

PubMed Central

Background Drosophila gene expression pattern images document the spatiotemporal dynamics of gene expression during embryogenesis. A comparative analysis of these images could provide a fundamentally important way for studying the regulatory networks governing development. To facilitate pattern comparison and searching, groups of images in the Berkeley Drosophila Genome Project (BDGP) high-throughput study were annotated with a variable number of anatomical terms manually using a controlled vocabulary. Considering that the number of available images is rapidly increasing, it is imperative to design computational methods to automate this task. Results We present a computational method to annotate gene expression pattern images automatically. The proposed method uses the bag-of-words scheme to utilize the existing information on pattern annotation and annotates images using a model that exploits correlations among terms. The proposed method can annotate images individually or in groups (e.g., according to the developmental stage). In addition, the proposed method can integrate information from different two-dimensional views of embryos. Results on embryonic patterns from BDGP data demonstrate that our method significantly outperforms other methods. Conclusion The proposed bag-of-words scheme is effective in representing a set of annotations assigned to a group of images, and the model employed to annotate images successfully captures the correlations among different controlled vocabulary terms. The integration of existing annotation information from multiple embryonic views improves annotation performance. PMID:19383139

Ji, Shuiwang; Li, Ying-Xin; Zhou, Zhi-Hua; Kumar, Sudhir; Ye, Jieping

2009-01-01

96

CoMAGC: a corpus with multi-faceted annotations of gene-cancer relations  

PubMed Central

Background In order to access the large amount of information in biomedical literature about genes implicated in various cancers both efficiently and accurately, the aid of text mining (TM) systems is invaluable. Current TM systems do target either gene-cancer relations or biological processes involving genes and cancers, but the former type produces information not comprehensive enough to explain how a gene affects a cancer, and the latter does not provide a concise summary of gene-cancer relations. Results In this paper, we present a corpus for the development of TM systems that are specifically targeting gene-cancer relations but are still able to capture complex information in biomedical sentences. We describe CoMAGC, a corpus with multi-faceted annotations of gene-cancer relations. In CoMAGC, a piece of annotation is composed of four semantically orthogonal concepts that together express 1) how a gene changes, 2) how a cancer changes and 3) the causality between the gene and the cancer. The multi-faceted annotations are shown to have high inter-annotator agreement. In addition, we show that the annotations in CoMAGC allow us to infer the prospective roles of genes in cancers and to classify the genes into three classes according to the inferred roles. We encode the mapping between multi-faceted annotations and gene classes into 10 inference rules. The inference rules produce results with high accuracy as measured against human annotations. CoMAGC consists of 821 sentences on prostate, breast and ovarian cancers. Currently, we deal with changes in gene expression levels among other types of gene changes. The corpus is available at http://biopathway.org/CoMAGCunder the terms of the Creative Commons Attribution License ( http://creativecommons.org/licenses/by/2.0). Conclusions The corpus will be an important resource for the development of advanced TM systems on gene-cancer relations. PMID:24225062

2013-01-01

97

Annotation of proteins of unknown function: initial enzyme results.  

PubMed

Working with a combination of ProMOL (a plugin for PyMOL that searches a library of enzymatic motifs for local structural homologs), BLAST and Pfam (servers that identify global sequence homologs), and Dali (a server that identifies global structural homologs), we have begun the process of assigning functional annotations to the approximately 3,500 structures in the Protein Data Bank that are currently classified as having "unknown function". Using a limited template library of 388 motifs, over 500 promising in silico matches have been identified by ProMOL, among which 65 exceptionally good matches have been identified. The characteristics of the exceptionally good matches are discussed. PMID:25630330

McKay, Talia; Hart, Kaitlin; Horn, Alison; Kessler, Haeja; Dodge, Greg; Bardhi, Keti; Bardhi, Kostandina; Mills, Jeffrey L; Bernstein, Herbert J; Craig, Paul A

2015-03-01

98

Insect Innate Immunity Database (IIID): An Annotation Tool for Identifying Immune Genes in Insect Genomes  

E-print Network

Insect Innate Immunity Database (IIID): An Annotation Tool for Identifying Immune Genes in Insect'' project, which aims to sequence 5,000 insect genomes by 2016, many novel insect genomes will soon become publicly available, yet few annotation resources are currently available for insects. Thus, we developed

Bordenstein, Seth

99

GOASVM: PROTEIN SUBCELLULAR LOCALIZATION PREDICTION BASED ON GENE ONTOLOGY ANNOTATION AND SVM  

E-print Network

GOASVM: PROTEIN SUBCELLULAR LOCALIZATION PREDICTION BASED ON GENE ONTOLOGY ANNOTATION AND SVM. of Electrical Engineering Princeton University New Jersey, USA email: kung@princeton.edu ABSTRACT Protein subcellular localization is an essential step to annotate proteins and to design drugs. This paper proposes

Mak, Man-Wai

100

Gene Ontology annotations: what they mean and where they come from.  

PubMed

To address the challenges of information integration and retrieval, the computational genomics community increasingly has come to rely on the methodology of creating annotations of scientific literature using terms from controlled structured vocabularies such as the Gene Ontology (GO). Here we address the question of what such annotations signify and of how they are created by working biologists. Our goal is to promote a better understanding of how the results of experiments are captured in annotations, in the hope that this will lead both to better representations of biological reality through annotation and ontology development and to more informed use of GO resources by experimental scientists. PMID:18460184

Hill, David P; Smith, Barry; McAndrews-Hill, Monica S; Blake, Judith A

2008-01-01

101

Gene predictions and annotations Roderic Guig (Insitut Municipal d'Investigaci Mdica,  

E-print Network

1 Chapter 17 Gene predictions and annotations Roderic Guigó (Insitut Municipal d.Q. (Cold Spring Harbor Laboratory, NY, USA) Table of contents 1. Introduction 2. Ab initio gene prediction a. Prediction of signals b. Prediction of exons c. Exon assembly into genes d. Gene Prediction

102

A New Strategy to Identify and Annotate Human RPE-Specific Gene Expression  

PubMed Central

Background To identify and functionally annotate cell type-specific gene expression in the human retinal pigment epithelium (RPE), a key tissue involved in age-related macular degeneration and retinitis pigmentosa. Methodology RPE, photoreceptor and choroidal cells were isolated from selected freshly frozen healthy human donor eyes using laser microdissection. RNA isolation, amplification and hybridization to 44 k microarrays was carried out according to Agilent specifications. Bioinformatics was carried out using Rosetta Resolver, David and Ingenuity software. Principal Findings Our previous 22 k analysis of the RPE transcriptome showed that the RPE has high levels of protein synthesis, strong energy demands, is exposed to high levels of oxidative stress and a variable degree of inflammation. We currently use a complementary new strategy aimed at the identification and functional annotation of RPE-specific expressed transcripts. This strategy takes advantage of the multilayered cellular structure of the retina and overcomes a number of limitations of previous studies. In triplicate, we compared the transcriptomes of RPE, photoreceptor and choroidal cells and we deduced RPE specific expression. We identified at least 114 entries with RPE-specific gene expression. Thirty-nine of these 114 genes also show high expression in the RPE, comparison with the literature showed that 85% of these 39 were previously identified to be expressed in the RPE. In the group of 114 RPE specific genes there was an overrepresentation of genes involved in (membrane) transport, vision and ophthalmic disease. More fundamentally, we found RPE-specific involvement in the RAR-activation, retinol metabolism and GABA receptor signaling pathways. Conclusions In this study we provide a further specification and understanding of the RPE transcriptome by identifying and analyzing genes that are specifically expressed in the RPE. PMID:20479888

Booij, Judith C.; ten Brink, Jacoline B.; Swagemakers, Sigrid M. A.; Verkerk, Annemieke J. M. H.; Essing, Anke H. W.; van der Spek, Peter J.; Bergen, Arthur A. B.

2010-01-01

103

Probabilistic annotation of protein sequences based on functional classifications  

E-print Network

ral ssBioMed CentBMC Bioinformatics Open AcceResearch article Probabilistic annotation of protein sequences based on functional classifications Emmanuel D Levy1,3, Christos A Ouzounis*1, Walter R Gilks2 and Benjamin Audit*1,4 Address: 1... @mrc-lmb.cam.ac.uk; Christos A Ouzounis* - ouzounis@ebi.ac.uk; Walter R Gilks - wally.gilks@mrc- bsu.cam.ac.uk; Benjamin Audit* - Benjamin.Audit@ens-lyon.fr * Corresponding authors Abstract Background: One of the most evident achievements of bioinformatics...

Levy, Emmanuel D; Ouzounis, Christos A; Gilks, Walter R; Audit, Benjamin

2005-12-14

104

Functional Annotation of Putative Regulatory Elements at Cancer Susceptibility Loci  

PubMed Central

Most cancer-associated genetic variants identified from genome-wide association studies (GWAS) do not obviously change protein structure, leading to the hypothesis that the associations are attributable to regulatory polymorphisms. Translating genetic associations into mechanistic insights can be facilitated by knowledge of the causal regulatory variant (or variants) responsible for the statistical signal. Experimental validation of candidate functional variants is onerous, making bioinformatic approaches necessary to prioritize candidates for laboratory analysis. Thus, a systematic approach for recognizing functional (and, therefore, likely causal) variants in noncoding regions is an important step toward interpreting cancer risk loci. This review provides a detailed introduction to current regulatory variant annotations, followed by an overview of how to leverage these resources to prioritize candidate functional polymorphisms in regulatory regions. PMID:25288875

Rosse, Stephanie A; Auer, Paul L; Carlson, Christopher S

2014-01-01

105

Initiating the mollusk genomics annotation community: toward creating the complete curated gene-set of the Japanese Pearl Oyster, Pinctada fucata.  

PubMed

The genome sequence of the Japanese pearl oyster, the first draft genome from a mollusk, was published in February 2012. In order to curate the draft genome assemblies and annotate the predicted gene models, two annotation Jamborees were held in Okinawa and Tokyo. To date, 761 genes have been surveyed and curated. A preparatory meeting and a debriefing were held at the Misaki Marine Biological Station before and after the Jamborees. These four events, in conjunction with the sequence-decoding project, have facilitated the first series of gene annotations. Genome annotators among the Jamboree participants added 22 functional categories to the annotation system to date. Of these, 17 are included in Generic Gene Ontology. The other five categories are specific to molluskan biology, such as "Byssus Formation" and "Shell Formation", including Biomineralization and Acidic Proteins. A total of 731 genes from our latest version of gene models are annotated and classified into these 22 categories. The resulting data will serve as a useful reference for future genomic analyses of this species as well as comparative analyses among mollusks. PMID:24125643

Kawashima, Takeshi; Takeuchi, Takeshi; Koyanagi, Ryo; Kinoshita, Shigeharu; Endo, Hirotoshi; Endo, Kazuyoshi

2013-10-01

106

Cloning, analysis and functional annotation of expressed sequence tags from the Earthworm Eisenia fetida  

PubMed Central

Background Eisenia fetida, commonly known as red wiggler or compost worm, belongs to the Lumbricidae family of the Annelida phylum. Little is known about its genome sequence although it has been extensively used as a test organism in terrestrial ecotoxicology. In order to understand its gene expression response to environmental contaminants, we cloned 4032 cDNAs or expressed sequence tags (ESTs) from two E. fetida libraries enriched with genes responsive to ten ordnance related compounds using suppressive subtractive hybridization-PCR. Results A total of 3144 good quality ESTs (GenBank dbEST accession number EH669363–EH672369 and EL515444–EL515580) were obtained from the raw clone sequences after cleaning. Clustering analysis yielded 2231 unique sequences including 448 contigs (from 1361 ESTs) and 1783 singletons. Comparative genomic analysis showed that 743 or 33% of the unique sequences shared high similarity with existing genes in the GenBank nr database. Provisional function annotation assigned 830 Gene Ontology terms to 517 unique sequences based on their homology with the annotated genomes of four model organisms Drosophila melanogaster, Mus musculus, Saccharomyces cerevisiae, and Caenorhabditis elegans. Seven percent of the unique sequences were further mapped to 99 Kyoto Encyclopedia of Genes and Genomes pathways based on their matching Enzyme Commission numbers. All the information is stored and retrievable at a highly performed, web-based and user-friendly relational database called EST model database or ESTMD version 2. Conclusion The ESTMD containing the sequence and annotation information of 4032 E. fetida ESTs is publicly accessible at . PMID:18047730

Pirooznia, Mehdi; Gong, Ping; Guan, Xin; Inouye, Laura S; Yang, Kuan; Perkins, Edward J; Deng, Youping

2007-01-01

107

Synergistic use of plant-prokaryote comparative genomics for functional annotations  

PubMed Central

Background Identifying functions for all gene products in all sequenced organisms is a central challenge of the post-genomic era. However, at least 30-50% of the proteins encoded by any given genome are of unknown or vaguely known function, and a large number are wrongly annotated. Many of these ‘unknown’ proteins are common to prokaryotes and plants. We set out to predict and experimentally test the functions of such proteins. Our approach to functional prediction integrates comparative genomics based mainly on microbial genomes with functional genomic data from model microorganisms and post-genomic data from plants. This approach bridges the gap between automated homology-based annotations and the classical gene discovery efforts of experimentalists, and is more powerful than purely computational approaches to identifying gene-function associations. Results Among Arabidopsis genes, we focused on those (2,325 in total) that (i) are unique or belong to families with no more than three members, (ii) occur in prokaryotes, and (iii) have unknown or poorly known functions. Computer-assisted selection of promising targets for deeper analysis was based on homology-independent characteristics associated in the SEED database with the prokaryotic members of each family. In-depth comparative genomic analysis was performed for 360 top candidate families. From this pool, 78 families were connected to general areas of metabolism and, of these families, specific functional predictions were made for 41. Twenty-one predicted functions have been experimentally tested or are currently under investigation by our group in at least one prokaryotic organism (nine of them have been validated, four invalidated, and eight are in progress). Ten additional predictions have been independently validated by other groups. Discovering the function of very widespread but hitherto enigmatic proteins such as the YrdC or YgfZ families illustrates the power of our approach. Conclusions Our approach correctly predicted functions for 19 uncharacterized protein families from plants and prokaryotes; none of these functions had previously been correctly predicted by computational methods. The resulting annotations could be propagated with confidence to over six thousand homologous proteins encoded in over 900 bacterial, archaeal, and eukaryotic genomes currently available in public databases. PMID:21810204

2011-01-01

108

Towards Experimental Annotation of Genes by High Throughput Sequencing  

SciTech Connect

Andrew Bradbury of Los Alamos National Laboratory discusses turning annotation into a sequencing pipeline on June 3, 2010 at the "Sequencing, Finishing, Analysis in the Future" meeting in Santa Fe, NM

Bradbury, Andrew [Los Alamos National Laboratory

2010-06-03

109

High-resolution functional annotation of human transcriptome: predicting isoform functions by a novel multiple instance-based label propagation method.  

PubMed

Alternative transcript processing is an important mechanism for generating functional diversity in genes. However, little is known about the precise functions of individual isoforms. In fact, proteins (translated from transcript isoforms), not genes, are the function carriers. By integrating multiple human RNA-seq data sets, we carried out the first systematic prediction of isoform functions, enabling high-resolution functional annotation of human transcriptome. Unlike gene function prediction, isoform function prediction faces a unique challenge: the lack of the training data--all known functional annotations are at the gene level. To address this challenge, we modelled the gene-isoform relationships as multiple instance data and developed a novel label propagation method to predict functions. Our method achieved an average area under the receiver operating characteristic curve of 0.67 and assigned functions to 15 572 isoforms. Interestingly, we observed that different functions have different sensitivities to alternative isoform processing, and that the function diversity of isoforms from the same gene is positively correlated with their tissue expression diversity. Finally, we surveyed the literature to validate our predictions for a number of apoptotic genes. Strikingly, for the famous 'TP53' gene, we not only accurately identified the apoptosis regulation function of its five isoforms, but also correctly predicted the precise direction of the regulation. PMID:24369432

Li, Wenyuan; Kang, Shuli; Liu, Chun-Chi; Zhang, Shihua; Shi, Yi; Liu, Yan; Zhou, Xianghong Jasmine

2014-04-01

110

Computational Method for Annotation of Protein Sequence According to Gene Ontology Terms  

Microsoft Academic Search

Annotation of a protein sequence is pivotal for the understanding of its function. Accuracy of manual annotation provided by curators is still questionable by having lesser evidence strength and yet a hard task and time consuming. A number of computational methods including tools have been developed to tackle this challenging task. However, they require high-cost hardware, are difficult to be

Razib M. Othman; Safaai Deris; Rosli M. Illias

111

Approaching the Functional Annotation of Fungal Virulence Factors Using Cross-Species Genetic Interaction Profiling  

PubMed Central

In many human fungal pathogens, genes required for disease remain largely unannotated, limiting the impact of virulence gene discovery efforts. We tested the utility of a cross-species genetic interaction profiling approach to obtain clues to the molecular function of unannotated pathogenicity factors in the human pathogen Cryptococcus neoformans. This approach involves expression of C. neoformans genes of interest in each member of the Saccharomyces cerevisiae gene deletion library, quantification of their impact on growth, and calculation of the cross-species genetic interaction profiles. To develop functional predictions, we computed and analyzed the correlations of these profiles with existing genetic interaction profiles of S. cerevisiae deletion mutants. For C. neoformans LIV7, which has no S. cerevisiae ortholog, this profiling approach predicted an unanticipated role in the Golgi apparatus. Validation studies in C. neoformans demonstrated that Liv7 is a functional Golgi factor where it promotes the suppression of the exposure of a specific immunostimulatory molecule, mannose, on the cell surface, thereby inhibiting phagocytosis. The genetic interaction profile of another pathogenicity gene that lacks an S. cerevisiae ortholog, LIV6, strongly predicted a role in endosome function. This prediction was also supported by studies of the corresponding C. neoformans null mutant. Our results demonstrate the utility of quantitative cross-species genetic interaction profiling for the functional annotation of fungal pathogenicity proteins of unknown function including, surprisingly, those that are not conserved in sequence across fungi. PMID:23300468

Brown, Jessica C. S.; Madhani, Hiten D.

2012-01-01

112

Annotation of a 95-kb Populus deltoides genomic sequence reveals a disease resistance gene cluster and novel class I and class II transposable elements  

Microsoft Academic Search

Poplar has become a model system for functional genomics in woody plants. Here, we report the sequencing and annotation of the first large contiguous stretch of genomic sequence (95 kb) of poplar, corresponding to a bacterial artificial chromosome clone mapped 0.6 centiMorgan from the Melampsora larici-populina resistance locus. The annotation revealed 15 putative genetic objects, of which five were classified as hypothetical genes

M. Lescot; S. Rombauts; J. Zhang; S. Aubourg; C. Mathé; S. Jansson; P. Rouzé; W. Boerjan

2004-01-01

113

Using phylogenetically-informed annotation (PIA) to search for light-interacting genes in transcriptomes from non-model organisms.  

PubMed

BackgroundTools for high throughput sequencing and de novo assembly make the analysis of transcriptomes (i.e. the suite of genes expressed in a tissue) feasible for almost any organism. Yet a challenge for biologists is that it can be difficult to assign identities to gene sequences, especially from non-model organisms. Phylogenetic analyses are one useful method for assigning identities to these sequences, but such methods tend to be time-consuming because of the need to re-calculate trees for every gene of interest and each time a new data set is analyzed. In response, we employed existing tools for phylogenetic analysis to produce a computationally efficient, tree-based approach for annotating transcriptomes or new genomes that we term Phylogenetically-Informed Annotation (PIA), which places uncharacterized genes into pre-calculated phylogenies of gene families.ResultsWe generated maximum likelihood trees for 109 genes from a Light Interaction Toolkit (LIT), a collection of genes that underlie the function or development of light-interacting structures in metazoans. To do so, we searched protein sequences predicted from 30 fully-sequenced genomes and built trees using tools for phylogenetic analysis in the Osiris package of Galaxy (an open-source workflow management system). Next, to rapidly annotate transcriptomes from organisms that lack sequenced genomes, we repurposed a maximum likelihood-based Evolutionary Placement Algorithm (implemented in RAxML) to place sequences of potential LIT genes on to our pre-calculated gene trees. Finally, we implemented PIA in Galaxy and used it to search for LIT genes in 28 newly-sequenced transcriptomes from the light-interacting tissues of a range of cephalopod mollusks, arthropods, and cubozoan cnidarians. Our new trees for LIT genes are available on the Bitbucket public repository (http://bitbucket.org/osiris_phylogenetics/pia/) and we demonstrate PIA on a publicly-accessible web server (http://galaxy-dev.cnsi.ucsb.edu/pia/).ConclusionsOur new trees for LIT genes will be a valuable resource for researchers studying the evolution of eyes or other light-interacting structures. We also introduce PIA, a high throughput method for using phylogenetic relationships to identify LIT genes in transcriptomes from non-model organisms. With simple modifications, our methods may be used to search for different sets of genes or to annotate data sets from taxa outside of Metazoa. PMID:25407802

Speiser, Daniel I; Pankey, M; Zaharoff, Alexander K; Battelle, Barbara A; Bracken-Grissom, Heather D; Breinholt, Jesse W; Bybee, Seth M; Cronin, Thomas W; Garm, Anders; Lindgren, Annie R; Patel, Nipam H; Porter, Megan L; Protas, Meredith E; Rivera, Ajna S; Serb, Jeanne M; Zigler, Kirk S; Crandall, Keith A; Oakley, Todd H

2014-11-19

114

In Silico Functional Pathway Annotation of 86 Established Prostate Cancer Risk Variants  

PubMed Central

Heritability is one of the strongest risk factors of prostate cancer, emphasizing the importance of the genetic contribution towards prostate cancer risk. To date, 86 established prostate cancer risk variants have been identified by genome-wide association studies (GWAS). To determine if these risk variants are located near genes that interact together in biological networks or pathways contributing to prostate cancer initiation or progression, we generated gene sets based on proximity to the 86 prostate cancer risk variants. We took two approaches to generate gene lists. The first strategy included all immediate flanking genes, up- and downstream of the risk variant, regardless of distance from the index variant, and the second strategy included genes closest to the index GWAS marker and to variants in high LD (r2 ?0.8 in Europeans) with the index variant, within a 100 kb window up- and downstream. Pathway mapping of the two gene sets supported the importance of the androgen receptor-mediated signaling in prostate cancer biology. In addition, the hedgehog and Wnt/?-catenin signaling pathways were identified in pathway mapping for the flanking gene set. We also used the HaploReg resource to examine the 86 risk loci and variants high LD (r2 ?0.8) for functional elements. We found that there was a 12.8 fold (p = 2.9 x 10-4) enrichment for enhancer motifs in a stem cell line and a 4.4 fold (p = 1.1 x 10-3) enrichment of DNase hypersensitivity in a prostate adenocarcinoma cell line, indicating that the risk and correlated variants are enriched for transcriptional regulatory motifs. Our pathway-based functional annotation of the prostate cancer risk variants highlights the potential regulatory function that GWAS risk markers, and their highly correlated variants, exert on genes. Our study also shows that these genes may function cooperatively in key signaling pathways in prostate cancer biology. PMID:25658610

Loo, Lenora W. M.; Fong, Aaron Y. W.; Cheng, Iona; Le Marchand, Loïc

2015-01-01

115

Rice DB: an Oryza Information Portal linking annotation, subcellular location, function, expression, regulation, and evolutionary information for rice and Arabidopsis  

PubMed Central

Omics research in Oryza sativa (rice) relies on the use of multiple databases to obtain different types of information to define gene function. We present Rice DB, an Oryza information portal that is a functional genomics database, linking gene loci to comprehensive annotations, expression data and the subcellular location of encoded proteins. Rice DB has been designed to integrate the direct comparison of rice with Arabidopsis (Arabidopsis thaliana), based on orthology or ‘expressology’, thus using and combining available information from two pre-eminent plant models. To establish Rice DB, gene identifiers (more than 40 types) and annotations from a variety of sources were compiled, functional information based on large-scale and individual studies was manually collated, hundreds of microarrays were analysed to generate expression annotations, and the occurrences of potential functional regulatory motifs in promoter regions were calculated. A range of computational subcellular localization predictions were also run for all putative proteins encoded in the rice genome, and experimentally confirmed protein localizations have been collated, curated and linked to functional studies in rice. A single search box allows anything from gene identifiers (for rice and/or Arabidopsis), motif sequences, subcellular location, to keyword searches to be entered, with the capability of Boolean searches (such as AND/OR). To demonstrate the utility of Rice DB, several examples are presented including a rice mitochondrial proteome, which draws on a variety of sources for subcellular location data within Rice DB. Comparisons of subcellular location, functional annotations, as well as transcript expression in parallel with Arabidopsis reveals examples of conservation between rice and Arabidopsis, using Rice DB (http://ricedb.plantenergy.uwa.edu.au). PMID:24147765

Narsai, Reena; Devenish, James; Castleden, Ian; Narsai, Kabir; Xu, Lin; Shou, Huixia; Whelan, James

2013-01-01

116

Mining locus tags in PubMed Central to improve microbial gene annotation  

PubMed Central

Background The scientific literature contains millions of microbial gene identifiers within the full text and tables, but these annotations rarely get incorporated into public sequence databases. We propose to utilize the Open Access (OA) subset of PubMed Central (PMC) as a gene annotation database and have developed an R package called pmcXML to automatically mine and extract locus tags from full text, tables and supplements. Results We mined locus tags from 1835 OA publications in ten microbial genomes and extracted tags mentioned in 30,891 sentences in main text and 20,489 rows in tables. We identified locus tag pairs marking the start and end of a region such as an operon or genomic island and expanded these ranges to add another 13,043 tags. We also searched for locus tags in supplementary tables and publications outside the OA subset in Burkholderia pseudomallei K96243 for comparison. There were 168 publications containing 48,470 locus tags and 83% of mentions were from supplementary materials and 9% from publications outside the OA subset. Conclusions B. pseudomallei locus tags within the full text and tables of OA publications represent only a small fraction of the total mentions in the literature. For microbial genomes with very few functionally characterized proteins, the locus tags mentioned in supplementary tables and within ranges like genomic islands contain the majority of locus tags. Significantly, the functions in the R package provide access to additional resources in the OA subset that are not currently indexed or returned by searching PMC. PMID:24499370

2014-01-01

117

Predicting function: from genes to genomes and back1  

Microsoft Academic Search

Predicting function from sequence using computational tools is a highly complicated procedure that is generally done for each gene individually. This review focuses on the added value that is provided by completely sequenced genomes in function prediction. Various levels of sequence annotation and function prediction are discussed, ranging from genomic sequence to that of complex cellular processes. Protein function is

Peer Bork; Thomas Dandekar; Yolande Diaz-Lazcoz; Frank Eisenhaber; Martijn Huynen; Yanping Yuan

1998-01-01

118

RNA-Seq Analysis of Quercus pubescens Leaves: De Novo Transcriptome Assembly, Annotation and Functional Markers Development  

PubMed Central

Quercus pubescens Willd., a species distributed from Spain to southwest Asia, ranks high for drought tolerance among European oaks. Q. pubescens performs a role of outstanding significance in most Mediterranean forest ecosystems, but few mechanistic studies have been conducted to explore its response to environmental constrains, due to the lack of genomic resources. In our study, we performed a deep transcriptomic sequencing in Q. pubescens leaves, including de novo assembly, functional annotation and the identification of new molecular markers. Our results are a pre-requisite for undertaking molecular functional studies, and may give support in population and association genetic studies. 254,265,700 clean reads were generated by the Illumina HiSeq 2000 platform, with an average length of 98 bp. De novo assembly, using CLC Genomics, produced 96,006 contigs, having a mean length of 618 bp. Sequence similarity analyses against seven public databases (Uniprot, NR, RefSeq and KOGs at NCBI, Pfam, InterPro and KEGG) resulted in 83,065 transcripts annotated with gene descriptions, conserved protein domains, or gene ontology terms. These annotations and local BLAST allowed identify genes specifically associated with mechanisms of drought avoidance. Finally, 14,202 microsatellite markers and 18,425 single nucleotide polymorphisms (SNPs) were, in silico, discovered in assembled and annotated sequences. We completed a successful global analysis of the Q. pubescens leaf transcriptome using RNA-seq. The assembled and annotated sequences together with newly discovered molecular markers provide genomic information for functional genomic studies in Q. pubescens, with special emphasis to response mechanisms to severe constrain of the Mediterranean climate. Our tools enable comparative genomics studies on other Quercus species taking advantage of large intra-specific ecophysiological differences. PMID:25393112

Torre, Sara; Tattini, Massimiliano; Brunetti, Cecilia; Fineschi, Silvia; Fini, Alessio; Ferrini, Francesco; Sebastiani, Federico

2014-01-01

119

COMBREX: a project to accelerate the functional annotation of prokaryotic genomes  

E-print Network

COMBREX (http://combrex.bu.edu) is a project to increase the speed of the functional annotation of new bacterial and archaeal genomes. It consists of a database of functional predictions produced by computational biologists ...

Roberts, Richard J.

120

Identification and computational annotation of genes differentially expressed in pulp development of Cocos nucifera L. by suppression subtractive hybridization  

PubMed Central

Background Coconut (Cocos nucifera L.) is one of the world’s most versatile, economically important tropical crops. Little is known about the physiological and molecular basis of coconut pulp (endosperm) development and only a few coconut genes and gene product sequences are available in public databases. This study identified genes that were differentially expressed during development of coconut pulp and functionally annotated these identified genes using bioinformatics analysis. Results Pulp from three different coconut developmental stages was collected. Four suppression subtractive hybridization (SSH) libraries were constructed (forward and reverse libraries A and B between stages 1 and 2, and C and D between stages 2 and 3), and identified sequences were computationally annotated using Blast2GO software. A total of 1272 clones were obtained for analysis from four SSH libraries with 63% showing similarity to known proteins. Pairwise comparing of stage-specific gene ontology ids from libraries B-D, A-C, B-C and A-D showed that 32 genes were continuously upregulated and seven downregulated; 28 were transiently upregulated and 23 downregulated. KEGG (Kyoto Encyclopedia of Genes and Genomes) analysis showed that 1-acyl-sn-glycerol-3-phosphate acyltransferase (LPAAT), phospholipase D, acetyl-CoA carboxylase carboxyltransferase beta subunit, 3-hydroxyisobutyryl-CoA hydrolase-like and pyruvate dehydrogenase E1 ? subunit were associated with fatty acid biosynthesis or metabolism. Triose phosphate isomerase, cellulose synthase and glucan 1,3-?-glucosidase were related to carbohydrate metabolism, and phosphoenolpyruvate carboxylase was related to both fatty acid and carbohydrate metabolism. Of 737 unigenes, 103 encoded enzymes were involved in fatty acid and carbohydrate biosynthesis and metabolism, and a number of transcription factors and other interesting genes with stage-specific expression were confirmed by real-time PCR, with validation of the SSH results as high as 66.6%. Based on determination of coconut endosperm fatty acids content by gas chromatography–mass spectrometry, a number of candidate genes in fatty acid anabolism were selected for further study. Conclusion Functional annotation of genes differentially expressed in coconut pulp development helped determine the molecular basis of coconut endosperm development. The SSH method identified genes related to fatty acids, carbohydrate and secondary metabolites. The results will be important for understanding gene functions and regulatory networks in coconut fruit. PMID:25084812

2014-01-01

121

Protein structure prediction and structure-based protein function annotation  

E-print Network

of predicted EC numbers........................................................................................... 1014.2.2.2 Analysis of predicted GO terms.................................................................................................104 4... ..............................................................................................................................119 5.1.1 Protein structure predictions using the I-TASSER server ............................................. 119 5.1.2 Detection of functional sites in protein using the COFACTOR algorithm .............. 120 5.1.3 Prediction of EC number and Gene...

Roy, Ambrish

2011-12-31

122

Structural and Functional Annotation of the Porcine Immunome  

Technology Transfer Automated Retrieval System (TEKTRAN)

The domestic pig is known as an excellent model for human immunology and the two species share many pathogens. Susceptibility to infectious disease is one of the major constraints on swine performance, yet the structure and function of genes comprising the pig immunome are not well-characterized. H...

123

Seeing the forest for the trees: annotating small RNA producing genes in plants.  

PubMed

A key goal in genomics is the complete annotation of the expressed regions of the genome. In plants, substantial portions of the genome make regulatory small RNAs produced by Dicer-Like (DCL) proteins and utilized by Argonaute (AGO) proteins. These include miRNAs and various types of endogenous siRNAs. Small RNA-seq, enabled by cheap and fast DNA sequencing, has produced an enormous volume of data on plant miRNA and siRNA expression in recent years. In this review, we discuss recent progress in using small RNA-seq data to produce stable and reliable annotations of miRNA and siRNA genes in plants. In addition, we highlight key goals for the future of small RNA gene annotation in plants. PMID:24632306

Coruh, Ceyda; Shahid, Saima; Axtell, Michael J

2014-04-01

124

Identification of novel biomass-degrading enzymes from genomic dark matter: Populating genomic sequence space with functional annotation.  

PubMed

Although recent nucleotide sequencing technologies have significantly enhanced our understanding of microbial genomes, the function of ?35% of genes identified in a genome currently remains unknown. To improve the understanding of microbial genomes and consequently of microbial processes it will be crucial to assign a function to this "genomic dark matter." Due to the urgent need for additional carbohydrate-active enzymes for improved production of transportation fuels from lignocellulosic biomass, we screened the genomes of more than 5,500 microorganisms for hypothetical proteins that are located in the proximity of already known cellulases. We identified, synthesized and expressed a total of 17 putative cellulase genes with insufficient sequence similarity to currently known cellulases to be identified as such using traditional sequence annotation techniques that rely on significant sequence similarity. The recombinant proteins of the newly identified putative cellulases were subjected to enzymatic activity assays to verify their hydrolytic activity towards cellulose and lignocellulosic biomass. Eleven (65%) of the tested enzymes had significant activity towards at least one of the substrates. This high success rate highlights that a gene context-based approach can be used to assign function to genes that are otherwise categorized as "genomic dark matter" and to identify biomass-degrading enzymes that have little sequence similarity to already known cellulases. The ability to assign function to genes that have no related sequence representatives with functional annotation will be important to enhance our understanding of microbial processes and to identify microbial proteins for a wide range of applications. PMID:24728961

Piao, Hailan; Froula, Jeff; Du, Changbin; Kim, Tae-Wan; Hawley, Erik R; Bauer, Stefan; Wang, Zhong; Ivanova, Nathalia; Clark, Douglas S; Klenk, Hans-Peter; Hess, Matthias

2014-08-01

125

Annotation-Modules: a tool for finding significant combinations of multisource annotations for gene lists  

Microsoft Academic Search

Motivation: The ontological analysis of the gene lists obtained from DNA microarray experiments constitutes an important step in under- standing the underlying biology of the analyzed system. Over the last years, many other high-throughput techniques emerged, cover- ing now basically all \\

Michael Hackenberg; Rune Matthiesen

2008-01-01

126

Microarray analysis of genes and gene functions in disc degeneration  

PubMed Central

The aim of the present study was to screen differentially expressed genes (DEGs) in human degenerative intervertebral discs (IVDs), and to perform functional analysis on these DEGs. The gene expression profile was downloaded from the Gene Expression Omnibus database (GSE34095)and included six human IVD samples: three degenerative and three non-degenerative. The DEGs between the normal and disease samples were identified using R packages. The online software WebGestalt was used to perform the functional analysis of the DEGs, followed by Osprey software to search for interactions between the DEGs. The Database for Annotation, Visualization and Integrated Discovery was utilized to annotate the DEGs in the interaction network and then the DEGs were uploaded to the Connectivity Map database to search for small molecules. In addition, the active binding sites for the hub genes in the network were obtained, based on the Universal Protein database. By comparing the gene expression profiles of the non-degenerative and degenerative IVDs, the DEGs between the samples were identified. The DEGs were significantly associated with transforming growth factor ? and the extracellular matrix. Matrix metalloproteinase 2 (MMP2) was identified as the hub gene of the interaction network of DEGs. In addition, MMP2 was found to be upregulated in degenerative IVDs. The screened small molecules and the active binding sites of MMP2 may facilitate the development of methods to inhibit overexpression of MMP2. PMID:24396401

TANG, YANCHUN; WANG, SHAOKUN; LIU, YING; WANG, XUYUN

2014-01-01

127

Comparison of assembly algorithms for improving rate of metatranscriptomic functional annotation  

PubMed Central

Background Microbiome-wide gene expression profiling through high-throughput RNA sequencing (‘metatranscriptomics’) offers a powerful means to functionally interrogate complex microbial communities. Key to successful exploitation of these datasets is the ability to confidently match relatively short sequence reads to known bacterial transcripts. In the absence of reference genomes, such annotation efforts may be enhanced by assembling reads into longer contiguous sequences (‘contigs’), prior to database search strategies. Since reads from homologous transcripts may derive from several species, represented at different abundance levels, it is not clear how well current assembly pipelines perform for metatranscriptomic datasets. Here we evaluate the performance of four currently employed assemblers including de novo transcriptome assemblers - Trinity and Oases; the metagenomic assembler - Metavelvet; and the recently developed metatranscriptomic assembler IDBA-MT. Results We evaluated the performance of the assemblers on a previously published dataset of single-end RNA sequence reads derived from the large intestine of an inbred non-obese diabetic mouse model of type 1 diabetes. We found that Trinity performed best as judged by contigs assembled, reads assigned to contigs, and number of reads that could be annotated to a known bacterial transcript. Only 15.5% of RNA sequence reads could be annotated to a known transcript in contrast to 50.3% with Trinity assembly. Paired-end reads generated from the same mouse samples resulted in modest performance gains. A database search estimated that the assemblies are unlikely to erroneously merge multiple unrelated genes sharing a region of similarity (<2% of contigs). A simulated dataset based on ten species confirmed these findings. A more complex simulated dataset based on 72 species found that greater assembly errors were introduced than is expected by sequencing quality. Through the detailed evaluation of assembly performance, the insights provided by this study will help drive the design of future metatranscriptomic analyses. Conclusion Assembly of metatranscriptome datasets greatly improved read annotation. Of the four assemblers evaluated, Trinity provided the best performance. For more complex datasets, reads generated from transcripts sharing considerable sequence similarity can be a source of significant assembly error, suggesting a need to collate reads on the basis of common taxonomic origin prior to assembly. PMID:25411636

2014-01-01

128

PIPA: A High-Throughput Pipeline for Protein Function Annotation Chenggang Yu, Valmik Desai, Nela Zavaljevski, and Jaques Reifman  

E-print Network

PIPA: A High-Throughput Pipeline for Protein Function Annotation Chenggang Yu, Valmik Desai, Nela of multisource predictions. We developed Pipeline for Protein Annotation (PIPA), a genome-wide protein function annotation pipeline that runs in a high-performance computing environment. PIPA integrates different tools

129

A computational approach to candidate gene prioritization for X-linked mental retardation using annotation-based binary filtering and motif-based linear discriminatory analysis  

PubMed Central

Background Several computational candidate gene selection and prioritization methods have recently been developed. These in silico selection and prioritization techniques are usually based on two central approaches - the examination of similarities to known disease genes and/or the evaluation of functional annotation of genes. Each of these approaches has its own caveats. Here we employ a previously described method of candidate gene prioritization based mainly on gene annotation, in accompaniment with a technique based on the evaluation of pertinent sequence motifs or signatures, in an attempt to refine the gene prioritization approach. We apply this approach to X-linked mental retardation (XLMR), a group of heterogeneous disorders for which some of the underlying genetics is known. Results The gene annotation-based binary filtering method yielded a ranked list of putative XLMR candidate genes with good plausibility of being associated with the development of mental retardation. In parallel, a motif finding approach based on linear discriminatory analysis (LDA) was employed to identify short sequence patterns that may discriminate XLMR from non-XLMR genes. High rates (>80%) of correct classification was achieved, suggesting that the identification of these motifs effectively captures genomic signals associated with XLMR vs. non-XLMR genes. The computational tools developed for the motif-based LDA is integrated into the freely available genomic analysis portal Galaxy (http://main.g2.bx.psu.edu/). Nine genes (APLN, ZC4H2, MAGED4, MAGED4B, RAP2C, FAM156A, FAM156B, TBL1X, and UXT) were highlighted as highly-ranked XLMR methods. Conclusions The combination of gene annotation information and sequence motif-orientated computational candidate gene prediction methods highlight an added benefit in generating a list of plausible candidate genes, as has been demonstrated for XLMR. Reviewers: This article was reviewed by Dr Barbara Bardoni (nominated by Prof Juergen Brosius); Prof Neil Smalheiser and Dr Dustin Holloway (nominated by Prof Charles DeLisi). PMID:21668950

2011-01-01

130

De novo transcriptome assembly, gene annotation, marker development, and miRNA potential target genes validation under abiotic stresses in Oenanthe javanica.  

PubMed

Oenanthe javanica is an aquatic perennial herb with known medicinal properties and an edible vegetable with high vitamin and mineral content. The understanding of the biology of O. javanica is limited by the absence of information on its genome, transcriptome, and small RNA. In this study, transcriptome sequencing and small RNA sequencing were performed to annotate function genes, develop SSR markers and analyze potential target genes of miRNAs in O. javanica. All reads with total nucleotides number of 1,440,321,408 bp were assembled into 58,072 transcripts and 40,208 unigenes. A total of 1,233 SSRs were identified from O. javanica. Generated unigenes were aligned against seven databases and annotated with functions. A total of 29 potential targets were predicted. Expression of 10 miRNAs and their corresponding target genes under abiotic stresses (heat, cold, salinity, and drought) was validated. All ten miRNAs were confirmed to response to abiotic stresses. A pair of miRNA and its target gene was found. This study can serve as a valuable resource for future studies on O. javanica, which may focus on novel gene discovery, SSR development, gene mapping, and miRNA-affected processes and pathways. This can promote the development of the useful medicinal properties of O. javanica in medical science. PMID:25416420

Jiang, Qian; Wang, Feng; Tan, Hua-Wei; Li, Meng-Yao; Xu, Zhi-Sheng; Tan, Guo-Fei; Xiong, Ai-Sheng

2014-11-22

131

Comparative Analysis of Chloroplast Genomes: Functional Annotation, Genome-Based Phylogeny, and Deduced Evolutionary Patterns  

PubMed Central

All protein sequences from 19 complete chloroplast genomes (cpDNA) have been studied using a new computational method able to analyze functional correlations among series of protein sequences contained in complete proteomes. First, all open reading frames (ORFs) from the cpDNAs, comprising a total of 2266 protein sequences, were compared against the 3168 proteins from Synechocystis PCC6803 complete genome to find functionally related orthologous proteins. Additionally, all cpDNA genomes were pairwise compared to find orthologous groups not present in cyanobacteria. Annotations in the cluster of othologous proteins database and CyanoBase were used as reference for the functional assignments. Following this protocol, new functional assignments were made for ORFs of unknown function and for ycfs (hypothetical chloroplast frames), which still lack a functional assignment. Using this information, a matrix of functional relationships was derived from profiles of the presence and/or absence of orthologous proteins; the matrix included 1837 proteins in 277 orthologous clusters. A factor analysis study of this matrix, followed by cluster analysis, allowed us to obtain accurate phylogenetic reconstructions and the detection of genes probably involved in speciation as phylogenetic correlates. Finally, by grouping common evolutionary patterns, we show that it is possible to determine functionally linked protein networks. This has allowed us to suggest putative associations for some unknown ORFs. PMID:11932241

Rivas, Javier De Las; Lozano, Juan Jose; Ortiz, Angel R.

2002-01-01

132

Variation analysis and gene annotation of eight MHC haplotypes: The MHC Haplotype Project  

PubMed Central

The human major histocompatibility complex (MHC) is contained within about 4 Mb on the short arm of chromosome 6 and is recognised as the most variable region in the human genome. The primary aim of the MHC Haplotype Project was to provide a comprehensively annotated reference sequence of a single, human leukocyte antigen-homozygous MHC haplotype and to use it as a basis against which variations could be assessed from seven other similarly homozygous cell lines, representative of the most common MHC haplotypes in the European population. Comparison of the haplotype sequences, including four haplotypes not previously analysed, resulted in the identification of >44,000 variations, both substitutions and indels (insertions and deletions), which have been submitted to the dbSNP database. The gene annotation uncovered haplotype-specific differences and confirmed the presence of more than 300 loci, including over 160 protein-coding genes. Combined analysis of the variation and annotation datasets revealed 122 gene loci with coding substitutions of which 97 were non-synonymous. The haplotype (A3-B7-DR15; PGF cell line) designated as the new MHC reference sequence, has been incorporated into the human genome assembly (NCBI35 and subsequent builds), and constitutes the largest single-haplotype sequence of the human genome to date. The extensive variation and annotation data derived from the analysis of seven further haplotypes have been made publicly available and provide a framework and resource for future association studies of all MHC-associated diseases and transplant medicine. PMID:18193213

Horton, Roger; Gibson, Richard; Coggill, Penny; Miretti, Marcos; Allcock, Richard J.; Almeida, Jeff; Forbes, Simon; Gilbert, James G. R.; Halls, Karen; Harrow, Jennifer L.; Hart, Elizabeth; Howe, Kevin; Jackson, David K.; Palmer, Sophie; Roberts, Anne N.; Sims, Sarah; Stewart, C. Andrew; Traherne, James A.; Trevanion, Steve; Wilming, Laurens; Rogers, Jane; de Jong, Pieter J.; Elliott, John F.; Sawcer, Stephen; Todd, John A.; Trowsdale, John

2008-01-01

133

ANNOTATION OF TRIBOLIUM CUTICLE PROTEIN AND PERITROPHIN GENES  

Technology Transfer Automated Retrieval System (TEKTRAN)

The recently completed genome sequence of the hard-bodied beetle, Tribolium, could reveal new insights into genetic mechanisms for chitin and cuticle production in pest insects. The genome sequence is being "mined" for cuticle genes using a combination of automated and manual gene-finding procedure...

134

An integrated gene annotation and transcriptional profiling approach towards the full gene content of the Drosophila genome  

PubMed Central

Background While the genome sequences for a variety of organisms are now available, the precise number of the genes encoded is still a matter of debate. For the human genome several stringent annotation approaches have resulted in the same number of potential genes, but a careful comparison revealed only limited overlap. This indicates that only the combination of different computational prediction methods and experimental evaluation of such in silico data will provide more complete genome annotations. In order to get a more complete gene content of the Drosophila melanogaster genome, we based our new D. melanogaster whole-transcriptome microarray, the Heidelberg FlyArray, on the combination of the Berkeley Drosophila Genome Project (BDGP) annotation and a novel ab initio gene prediction of lower stringency using the Fgenesh software. Results Here we provide evidence for the transcription of approximately 2,600 additional genes predicted by Fgenesh. Validation of the developmental profiling data by RT-PCR and in situ hybridization indicates a lower limit of 2,000 novel annotations, thus substantially raising the number of genes that make a fly. Conclusions The successful design and application of this novel Drosophila microarray on the basis of our integrated in silico/wet biology approach confirms our expectation that in silico approaches alone will always tend to be incomplete. The identification of at least 2,000 novel genes highlights the importance of gathering experimental evidence to discover all genes within a genome. Moreover, as such an approach is independent of homology criteria, it will allow the discovery of novel genes unrelated to known protein families or those that have not been strictly conserved between species. PMID:14709175

Hild, M; Beckmann, B; Haas, SA; Koch, B; Solovyev, V; Busold, C; Fellenberg, K; Boutros, M; Vingron, M; Sauer, F; Hoheisel, JD; Paro, R

2004-01-01

135

Likelihood-based gene annotations for gap filling and quality assessment in genome-scale metabolic models.  

PubMed

Genome-scale metabolic models provide a powerful means to harness information from genomes to deepen biological insights. With exponentially increasing sequencing capacity, there is an enormous need for automated reconstruction techniques that can provide more accurate models in a short time frame. Current methods for automated metabolic network reconstruction rely on gene and reaction annotations to build draft metabolic networks and algorithms to fill gaps in these networks. However, automated reconstruction is hampered by database inconsistencies, incorrect annotations, and gap filling largely without considering genomic information. Here we develop an approach for applying genomic information to predict alternative functions for genes and estimate their likelihoods from sequence homology. We show that computed likelihood values were significantly higher for annotations found in manually curated metabolic networks than those that were not. We then apply these alternative functional predictions to estimate reaction likelihoods, which are used in a new gap filling approach called likelihood-based gap filling to predict more genomically consistent solutions. To validate the likelihood-based gap filling approach, we applied it to models where essential pathways were removed, finding that likelihood-based gap filling identified more biologically relevant solutions than parsimony-based gap filling approaches. We also demonstrate that models gap filled using likelihood-based gap filling provide greater coverage and genomic consistency with metabolic gene functions compared to parsimony-based approaches. Interestingly, despite these findings, we found that likelihoods did not significantly affect consistency of gap filled models with Biolog and knockout lethality data. This indicates that the phenotype data alone cannot necessarily be used to discriminate between alternative solutions for gap filling and therefore, that the use of other information is necessary to obtain a more accurate network. All described workflows are implemented as part of the DOE Systems Biology Knowledgebase (KBase) and are publicly available via API or command-line web interface. PMID:25329157

Benedict, Matthew N; Mundy, Michael B; Henry, Christopher S; Chia, Nicholas; Price, Nathan D

2014-10-01

136

Annotation and comparative analysis of the glycoside hydrolase genes in Brachypodium distachyon  

SciTech Connect

Background Glycoside hydrolases cleave the bond between a carbohydrate and another carbohydrate, a protein, lipid or other moiety. Genes encoding glycoside hydrolases are found in a wide range of organisms, from archea to animals, and are relatively abundant in plant genomes. In plants, these enzymes are involved in diverse processes, including starch metabolism, defense, and cell-wall remodeling. Glycoside hydrolase genes have been previously cataloged for Oryza sativa (rice), the model dicotyledonous plant Arabidopsis thaliana, and the fast-growing tree Populus trichocarpa (poplar). To improve our understanding of glycoside hydrolases in plants generally and in grasses specifically, we annotated the glycoside hydrolase genes in the grasses Brachypodium distachyon (an emerging monocotyledonous model) and Sorghum bicolor (sorghum). We then compared the glycoside hydrolases across species, both at the whole-genome level and at the level of individual glycoside hydrolase families. Results We identified 356 glycoside hydrolase genes in Brachypodium and 404 in sorghum. The corresponding proteins fell into the same 34 families that are represented in rice, Arabidopsis, and poplar, helping to define a glycoside hydrolase family profile which may be common to flowering plants. Examination of individual glycoside hydrolase familes (GH5, GH13, GH18, GH19, GH28, and GH51) revealed both similarities and distinctions between monocots and dicots, as well as between species. Shared evolutionary histories appear to be modified by lineage-specific expansions or deletions. Within families, the Brachypodium and sorghum proteins generally cluster with those from other monocots. Conclusions This work provides the foundation for further comparative and functional analyses of plant glycoside hydrolases. Defining the Brachypodium glycoside hydrolases sets the stage for Brachypodium to be a monocot model for investigations of these enzymes and their diverse roles in planta. Insights gained from Brachypodium will inform translational research studies, with applications for the improvement of cereal crops and bioenergy grasses.

Tyler, Ludmila [United States Department of Agriculture (USDA), Western Regional Research Center (WRRC), Albany; Bragg, Jennifer [United States Department of Agriculture (USDA), Western Regional Research Center (WRRC), Albany; Wu, Jiajie [United States Department of Agriculture (USDA), Western Regional Research Center (WRRC), Albany; Yang, Xiaohan [ORNL; Tuskan, Gerald A [ORNL; Vogel, John [United States Department of Agriculture (USDA), Western Regional Research Center (WRRC), Albany

2010-01-01

137

Phylogeny, Functional Annotation, and Protein Interaction Network Analyses of the Xenopus tropicalis Basic Helix-Loop-Helix Transcription Factors  

PubMed Central

The previous survey identified 70 basic helix-loop-helix (bHLH) proteins, but it was proved to be incomplete, and the functional information and regulatory networks of frog bHLH transcription factors were not fully known. Therefore, we conducted an updated genome-wide survey in the Xenopus tropicalis genome project databases and identified 105 bHLH sequences. Among the retrieved 105 sequences, phylogenetic analyses revealed that 103 bHLH proteins belonged to 43 families or subfamilies with 46, 26, 11, 3, 15, and 4 members in the corresponding supergroups. Next, gene ontology (GO) enrichment analyses showed 65 significant GO annotations of biological processes and molecular functions and KEGG pathways counted in frequency. To explore the functional pathways, regulatory gene networks, and/or related gene groups coding for Xenopus tropicalis bHLH proteins, the identified bHLH genes were put into the databases KOBAS and STRING to get the signaling information of pathways and protein interaction networks according to available public databases and known protein interactions. From the genome annotation and pathway analysis using KOBAS, we identified 16 pathways in the Xenopus tropicalis genome. From the STRING interaction analysis, 68 hub proteins were identified, and many hub proteins created a tight network or a functional module within the protein families. PMID:24312906

Chen, Deyu

2013-01-01

138

ARG-ANNOT, a New Bioinformatic Tool To Discover Antibiotic Resistance Genes in Bacterial Genomes  

PubMed Central

ARG-ANNOT (Antibiotic Resistance Gene-ANNOTation) is a new bioinformatic tool that was created to detect existing and putative new antibiotic resistance (AR) genes in bacterial genomes. ARG-ANNOT uses a local BLAST program in Bio-Edit software that allows the user to analyze sequences without a Web interface. All AR genetic determinants were collected from published works and online resources; nucleotide and protein sequences were retrieved from the NCBI GenBank database. After building a database that includes 1,689 antibiotic resistance genes, the software was tested in a blind manner using 100 random sequences selected from the database to verify that the sensitivity and specificity were at 100% even when partial sequences were queried. Notably, BLAST analysis results obtained using the rmtF gene sequence (a new aminoglycoside-modifying enzyme gene sequence that is not included in the database) as a query revealed that the tool was able to link this sequence to short sequences (17 to 40 bp) found in other genes of the rmt family with significant E values. Finally, the analysis of 178 Acinetobacter baumannii and 20 Staphylococcus aureus genomes allowed the detection of a significantly higher number of AR genes than the Resfinder gene analyzer and 11 point mutations in target genes known to be associated with AR. The average time for the analysis of a genome was 3.35 ± 0.13 min. We have created a concise database for BLAST using a Bio-Edit interface that can detect AR genetic determinants in bacterial genomes and can rapidly and easily discover putative new AR genetic determinants. PMID:24145532

Gupta, Sushim Kumar; Padmanabhan, Babu Roshan; Diene, Seydina M.; Lopez-Rojas, Rafael; Kempf, Marie; Landraud, Luce

2014-01-01

139

ARG-ANNOT, a new bioinformatic tool to discover antibiotic resistance genes in bacterial genomes.  

PubMed

ARG-ANNOT (Antibiotic Resistance Gene-ANNOTation) is a new bioinformatic tool that was created to detect existing and putative new antibiotic resistance (AR) genes in bacterial genomes. ARG-ANNOT uses a local BLAST program in Bio-Edit software that allows the user to analyze sequences without a Web interface. All AR genetic determinants were collected from published works and online resources; nucleotide and protein sequences were retrieved from the NCBI GenBank database. After building a database that includes 1,689 antibiotic resistance genes, the software was tested in a blind manner using 100 random sequences selected from the database to verify that the sensitivity and specificity were at 100% even when partial sequences were queried. Notably, BLAST analysis results obtained using the rmtF gene sequence (a new aminoglycoside-modifying enzyme gene sequence that is not included in the database) as a query revealed that the tool was able to link this sequence to short sequences (17 to 40 bp) found in other genes of the rmt family with significant E values. Finally, the analysis of 178 Acinetobacter baumannii and 20 Staphylococcus aureus genomes allowed the detection of a significantly higher number of AR genes than the Resfinder gene analyzer and 11 point mutations in target genes known to be associated with AR. The average time for the analysis of a genome was 3.35 ± 0.13 min. We have created a concise database for BLAST using a Bio-Edit interface that can detect AR genetic determinants in bacterial genomes and can rapidly and easily discover putative new AR genetic determinants. PMID:24145532

Gupta, Sushim Kumar; Padmanabhan, Babu Roshan; Diene, Seydina M; Lopez-Rojas, Rafael; Kempf, Marie; Landraud, Luce; Rolain, Jean-Marc

2014-01-01

140

Functional Annotation of Proteomic Data from Chicken Heterophils and Macrophages Induced by Carbon Nanotube Exposure  

PubMed Central

With the expanding applications of carbon nanotubes (CNT) in biomedicine and agriculture, questions about the toxicity and biocompatibility of CNT in humans and domestic animals are becoming matters of serious concern. This study used proteomic methods to profile gene expression in chicken macrophages and heterophils in response to CNT exposure. Two-dimensional gel electrophoresis identified 12 proteins in macrophages and 15 in heterophils, with differential expression patterns in response to CNT co-incubation (0, 1, 10, and 100 ?g/mL of CNT for 6 h) (p < 0.05). Gene ontology analysis showed that most of the differentially expressed proteins are associated with protein interactions, cellular metabolic processes, and cell mobility, suggesting activation of innate immune functions. Western blot analysis with heat shock protein 70, high mobility group protein, and peptidylprolyl isomerase A confirmed the alterations of the profiled proteins. The functional annotations were further confirmed by effective cell migration, promoted interleukin-1? secretion, and more cell death in both macrophages and heterophils exposed to CNT (p < 0.05). In conclusion, results of this study suggest that CNT exposure affects protein expression, leading to activation of macrophages and heterophils, resulting in altered cytoskeleton remodeling, cell migration, and cytokine production, and thereby mediates tissue immune responses. PMID:24823882

Li, Yun-Ze; Cheng, Chung-Shi; Chen, Chao-Jung; Li, Zi-Lin; Lin, Yao-Tung; Chen, Shuen-Ei; Huang, San-Yuan

2014-01-01

141

Annotation and retrieval system of CAD models based on functional semantics  

NASA Astrophysics Data System (ADS)

CAD model retrieval based on functional semantics is more significant than content-based 3D model retrieval during the mechanical conceptual design phase. However, relevant research is still not fully discussed. Therefore, a functional semantic-based CAD model annotation and retrieval method is proposed to support mechanical conceptual design and design reuse, inspire designer creativity through existing CAD models, shorten design cycle, and reduce costs. Firstly, the CAD model functional semantic ontology is constructed to formally represent the functional semantics of CAD models and describe the mechanical conceptual design space comprehensively and consistently. Secondly, an approach to represent CAD models as attributed adjacency graphs(AAG) is proposed. In this method, the geometry and topology data are extracted from STEP models. On the basis of AAG, the functional semantics of CAD models are annotated semi-automatically by matching CAD models that contain the partial features of which functional semantics have been annotated manually, thereby constructing CAD Model Repository that supports model retrieval based on functional semantics. Thirdly, a CAD model retrieval algorithm that supports multi-function extended retrieval is proposed to explore more potential creative design knowledge in the semantic level. Finally, a prototype system, called Functional Semantic-based CAD Model Annotation and Retrieval System(FSMARS), is implemented. A case demonstrates that FSMARS can successfully botain multiple potential CAD models that conform to the desired function. The proposed research addresses actual needs and presents a new way to acquire CAD models in the mechanical conceptual design phase.

Wang, Zhansong; Tian, Ling; Duan, Wenrui

2014-11-01

142

Annotation and retrieval system of CAD models based on functional semantics  

NASA Astrophysics Data System (ADS)

CAD model retrieval based on functional semantics is more significant than content-based 3D model retrieval during the mechanical conceptual design phase. However, relevant research is still not fully discussed. Therefore, a functional semantic-based CAD model annotation and retrieval method is proposed to support mechanical conceptual design and design reuse, inspire designer creativity through existing CAD models, shorten design cycle, and reduce costs. Firstly, the CAD model functional semantic ontology is constructed to formally represent the functional semantics of CAD models and describe the mechanical conceptual design space comprehensively and consistently. Secondly, an approach to represent CAD models as attributed adjacency graphs(AAG) is proposed. In this method, the geometry and topology data are extracted from STEP models. On the basis of AAG, the functional semantics of CAD models are annotated semi-automatically by matching CAD models that contain the partial features of which functional semantics have been annotated manually, thereby constructing CAD Model Repository that supports model retrieval based on functional semantics. Thirdly, a CAD model retrieval algorithm that supports multi-function extended retrieval is proposed to explore more potential creative design knowledge in the semantic level. Finally, a prototype system, called Functional Semantic-based CAD Model Annotation and Retrieval System(FSMARS), is implemented. A case demonstrates that FSMARS can successfully botain multiple potential CAD models that conform to the desired function. The proposed research addresses actual needs and presents a new way to acquire CAD models in the mechanical conceptual design phase.

Wang, Zhansong; Tian, Ling; Duan, Wenrui

2014-10-01

143

MicroRNA expression profiling and functional annotation analysis of their targets in patients with type 1 diabetes mellitus.  

PubMed

Type 1 diabetes mellitus (T1DM) results from an autoimmune attack against the insulin-producing pancreatic ?-cells, leading to elimination of insulin production. The exact cause of this disorder is still unclear. Although the differential expression of microRNAs (miRNAs), small non-coding RNAs that control gene expression in a post-transcriptional manner, has been identified in many diseases, including T1DM, only scarce information exists concerning miRNA expression profile in T1DM. Thus, we employed the microarray technology to examine the miRNA expression profiles displayed by peripheral blood mononuclear cells (PBMCs) from T1DM patients compared with healthy subjects. Total RNA extracted from PBMCs from 11 T1DM patients and nine healthy subjects was hybridized onto Agilent human miRNA microarray slides (V3), 8x15K, and expression data were analyzed on R statistical environment. After applying the rank products statistical test, the receiver-operating characteristic (ROC) curves were generated and the areas under the ROC curves (AUC) were calculated. To examine the functions of the differentially expressed (p-value<0.01, percentage of false-positives <0.05) miRNAs that passed the AUC cutoff value ? 0.90, the database miRWalk was used to predict their potential targets, which were afterwards submitted to the functional annotation tool provided by the Database for Annotation, Visualization, and Integrated Discovery (DAVID), version 6.7, using annotations from the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis. We found 57 probes, corresponding to 44 different miRNAs (35 up-regulated and 9 down-regulated), that were differentially expressed in T1DM and passed the AUC threshold of 0.90. The hierarchical clustering analysis indicated the discriminatory power of those miRNAs, since they were able to clearly distinguish T1DM patients from healthy individuals. Target prediction indicated that 47 candidate genes for T1DM are potentially regulated by the differentially expressed miRNAs. After performing functional annotation analysis of the predicted targets, we observed 22 and 12 annotated KEGG pathways for the induced and repressed miRNAs, respectively. Interestingly, many pathways were enriched for the targets of both up- and down-regulated miRNAs and the majority of those pathways have been previously associated with T1DM, including many cancer-related pathways. In conclusion, our study indicated miRNAs that may be potential biomarkers of T1DM as well as provided new insights into the molecular mechanisms involved in this disorder. PMID:24530307

Takahashi, Paula; Xavier, Danilo J; Evangelista, Adriane F; Manoel-Caetano, Fernanda S; Macedo, Claudia; Collares, Cristhianna V A; Foss-Freitas, Maria C; Foss, Milton C; Rassi, Diane M; Donadi, Eduardo A; Passos, Geraldo A; Sakamoto-Hojo, Elza T

2014-04-15

144

BioBuilder as a database development and functional annotation platform for proteins  

PubMed Central

Background The explosion in biological information creates the need for databases that are easy to develop, easy to maintain and can be easily manipulated by annotators who are most likely to be biologists. However, deployment of scalable and extensible databases is not an easy task and generally requires substantial expertise in database development. Results BioBuilder is a Zope-based software tool that was developed to facilitate intuitive creation of protein databases. Protein data can be entered and annotated through web forms along with the flexibility to add customized annotation features to protein entries. A built-in review system permits a global team of scientists to coordinate their annotation efforts. We have already used BioBuilder to develop Human Protein Reference Database , a comprehensive annotated repository of the human proteome. The data can be exported in the extensible markup language (XML) format, which is rapidly becoming as the standard format for data exchange. Conclusions As the proteomic data for several organisms begins to accumulate, BioBuilder will prove to be an invaluable platform for functional annotation and development of customizable protein centric databases. BioBuilder is open source and is available under the terms of LGPL. PMID:15099404

Navarro, J Daniel; Talreja, Naveen; Peri, Suraj; Vrushabendra, BM; Rashmi, BP; Padma, N; Surendranath, Vineeth; Jonnalagadda, Chandra Kiran; Kousthub, PS; Deshpande, Nandan; Shanker, K; Pandey, Akhilesh

2004-01-01

145

VAT: a computational framework to functionally annotate variants in personal genomes within a cloud-computing environment  

PubMed Central

Summary: The functional annotation of variants obtained through sequencing projects is generally assumed to be a simple intersection of genomic coordinates with genomic features. However, complexities arise for several reasons, including the differential effects of a variant on alternatively spliced transcripts, as well as the difficulty in assessing the impact of small insertions/deletions and large structural variants. Taking these factors into consideration, we developed the Variant Annotation Tool (VAT) to functionally annotate variants from multiple personal genomes at the transcript level as well as obtain summary statistics across genes and individuals. VAT also allows visualization of the effects of different variants, integrates allele frequencies and genotype data from the underlying individuals and facilitates comparative analysis between different groups of individuals. VAT can either be run through a command-line interface or as a web application. Finally, in order to enable on-demand access and to minimize unnecessary transfers of large data files, VAT can be run as a virtual machine in a cloud-computing environment. Availability and Implementation: VAT is implemented in C and PHP. The VAT web service, Amazon Machine Image, source code and detailed documentation are available at vat.gersteinlab.org. Contact: lukas.habegger@yale.edu or mark.gerstein@yale.edu Supplementary Information: Supplementary data are available at Bioinformatics online. PMID:22743228

Habegger, Lukas; Balasubramanian, Suganthi; Chen, David Z.; Khurana, Ekta; Sboner, Andrea; Harmanci, Arif; Rozowsky, Joel; Clarke, Declan; Snyder, Michael; Gerstein, Mark

2012-01-01

146

Functional annotation and characterization of 3-hydroxybenzoate 6-hydroxylase from Rhodococcus jostii RHA1.  

PubMed

The genome of Rhodococcus jostii RHA1 contains an unusually large number of oxygenase encoding genes. Many of these genes have yet an unknown function, implying that a notable part of the biochemical and catabolic biodiversity of this Gram-positive soil actinomycete is still elusive. Here we present a multiple sequence alignment and phylogenetic analysis of putative R. jostii RHA1 flavoprotein hydroxylases. Out of 18 candidate sequences, three hydroxylases are absent in other available Rhodococcus genomes. In addition, we report the biochemical characterization of 3-hydroxybenzoate 6-hydroxylase (3HB6H), a gentisate-producing enzyme originally mis-annotated as salicylate hydroxylase. R. jostii RHA1 3HB6H expressed in Escherichia coli is a homodimer with each 47kDa subunit containing a non-covalently bound FAD cofactor. The enzyme has a pH optimum around pH 8.3 and prefers NADH as external electron donor. 3HB6H is active with a series of 3-hydroxybenzoate analogues, bearing substituents in ortho- or meta-position of the aromatic ring. Gentisate, the physiological product, is a non-substrate effector of 3HB6H. This compound is not hydroxylated but strongly stimulates the NADH oxidase activity of the enzyme. PMID:22207056

Montersino, Stefania; van Berkel, Willem J H

2012-03-01

147

Comparative annotation of functional regions in the human genome using epigenomic data  

E-print Network

Comparative annotation of functional regions in the human genome using epigenomic data Kyoung. The recently available epigenomic data in multiple cell types provide an unprecedented oppor- tunity of multiple epigenomes. INTRODUCTION Identifying cell-type­specific functional regions is an important step

Wang, Wei

148

AN ONTOLOGY-BASED ANNOTATION FRAMEWORK FOR REPRESENTING THE FUNCTIONALITY OF ENGINEERING DEVICES  

Microsoft Academic Search

This paper proposes a metadata schema relating to func- tionality based on Semantic Web technology for the manage- ment of the information content of engineering design docu- ments. The schema enables us to annotate web-documents with RDF metadata, which represents devices as having specific functions. The metadata provide a clear and operational seman- tics for the functional terms in documents.

Yoshinobu Kitamura; Naoya Washio; Yusuke Koji; Munehiko Sasajima; Sunao Takafuji; Riichiro Mizoguchi

149

Improved structural annotation of protein-coding genes in the Meloidogyne hapla genome using RNA-Seq  

PubMed Central

As high-throughput cDNA sequencing (RNA-Seq) is increasingly applied to hypothesis-driven biological studies, the prediction of protein coding genes based on these data are usurping strictly in silico approaches. Compared with computationally derived gene predictions, structural annotation is more accurate when based on biological evidence, particularly RNA-Seq data. Here, we refine the current genome annotation for the Meloidogyne hapla genome utilizing RNA-Seq data. Published structural annotation defines 14?420 protein-coding genes in the M. hapla genome. Of these, 25% (3751) were found to exhibit some incongruence with RNA-Seq data. Manual annotation enabled these discrepancies to be resolved. Our analysis revealed 544 new gene models that were missing from the prior annotation. Additionally, 1457 transcribed regions were newly identified on the ends of as-yet-unjoined contigs. We also searched for trans-spliced leaders, and based on RNA-Seq data, identified genes that appear to be trans-spliced. Four 22-bp trans-spliced leaders were identified using our pipeline, including the known trans-spliced leader, which is the M. hapla ortholog of SL1. In silico predictions of trans-splicing were validated by comparison with earlier results derived from an independent cDNA library constructed to capture trans-spliced transcripts. The new annotation, which we term HapPep5, is publically available at www.hapla.org. PMID:25254153

Guo, Yuelong; Bird, David McK; Nielsen, Dahlia M

2014-01-01

150

Structuring osteosarcoma knowledge: an osteosarcoma-gene association database based on literature mining and manual annotation  

PubMed Central

Osteosarcoma (OS) is the most common primary bone cancer exhibiting high genomic instability. This genomic instability affects multiple genes and microRNAs to a varying extent depending on patient and tumor subtype. Massive research is ongoing to identify genes including their gene products and microRNAs that correlate with disease progression and might be used as biomarkers for OS. However, the genomic complexity hampers the identification of reliable biomarkers. Up to now, clinico-pathological factors are the key determinants to guide prognosis and therapeutic treatments. Each day, new studies about OS are published and complicate the acquisition of information to support biomarker discovery and therapeutic improvements. Thus, it is necessary to provide a structured and annotated view on the current OS knowledge that is quick and easily accessible to researchers of the field. Therefore, we developed a publicly available database and Web interface that serves as resource for OS-associated genes and microRNAs. Genes and microRNAs were collected using an automated dictionary-based gene recognition procedure followed by manual review and annotation by experts of the field. In total, 911 genes and 81 microRNAs related to 1331 PubMed abstracts were collected (last update: 29 October 2013). Users can evaluate genes and microRNAs according to their potential prognostic and therapeutic impact, the experimental procedures, the sample types, the biological contexts and microRNA target gene interactions. Additionally, a pathway enrichment analysis of the collected genes highlights different aspects of OS progression. OS requires pathways commonly deregulated in cancer but also features OS-specific alterations like deregulated osteoclast differentiation. To our knowledge, this is the first effort of an OS database containing manual reviewed and annotated up-to-date OS knowledge. It might be a useful resource especially for the bone tumor research community, as specific information about genes or microRNAs is quick and easily accessible. Hence, this platform can support the ongoing OS research and biomarker discovery. Database URL: http://osteosarcoma-db.uni-muenster.de PMID:24865352

Poos, Kathrin; Smida, Jan; Nathrath, Michaela; Maugg, Doris; Baumhoer, Daniel; Neumann, Anna; Korsching, Eberhard

2014-01-01

151

De Novo Assembly, Gene Annotation, and Marker Discovery in Stored-Product Pest Liposcelis entomophila (Enderlein) Using Transcriptome Sequences  

PubMed Central

Background As a major stored-product pest insect, Liposcelis entomophila has developed high levels of resistance to various insecticides in grain storage systems. However, the molecular mechanisms underlying resistance and environmental stress have not been characterized. To date, there is a lack of genomic information for this species. Therefore, studies aimed at profiling the L. entomophila transcriptome would provide a better understanding of the biological functions at the molecular levels. Methodology/Principal Findings We applied Illumina sequencing technology to sequence the transcriptome of L. entomophila. A total of 54,406,328 clean reads were obtained and that de novo assembled into 54,220 unigenes, with an average length of 571 bp. Through a similarity search, 33,404 (61.61%) unigenes were matched to known proteins in the NCBI non-redundant (Nr) protein database. These unigenes were further functionally annotated with gene ontology (GO), cluster of orthologous groups of proteins (COG), and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases. A large number of genes potentially involved in insecticide resistance were manually curated, including 68 putative cytochrome P450 genes, 37 putative glutathione S-transferase (GST) genes, 19 putative carboxyl/cholinesterase (CCE) genes, and other 126 transcripts to contain target site sequences or encoding detoxification genes representing eight types of resistance enzymes. Furthermore, to gain insight into the molecular basis of the L. entomophila toward thermal stresses, 25 heat shock protein (Hsp) genes were identified. In addition, 1,100 SSRs and 57,757 SNPs were detected and 231 pairs of SSR primes were designed for investigating the genetic diversity in future. Conclusions/Significance We developed a comprehensive transcriptomic database for L. entomophila. These sequences and putative molecular markers would further promote our understanding of the molecular mechanisms underlying insecticide resistance or environmental stress, and will facilitate studies on population genetics for psocids, as well as providing useful information for functional genomic research in the future. PMID:24244605

Wei, Dan-Dan; Chen, Er-Hu; Ding, Tian-Bo; Chen, Shi-Chun; Dou, Wei; Wang, Jin-Jun

2013-01-01

152

Annokey: an annotation tool based on key term search of the NCBI Entrez Gene database  

PubMed Central

Background The NCBI Entrez Gene and PubMed databases contain a wealth of high-quality information about genes for many different organisms. The NCBI Entrez online web-search interface is convenient for simple manual search for a small number of genes but impractical for the kinds of outputs seen in typical genomics projects. Results We have developed an efficient open source tool implemented in Python called Annokey, which annotates gene lists with the results of a keyword search of the NCBI Entrez Gene database and linked Pubmed article information. The user steers the search by specifying a ranked list of keywords (including multi-word phrases and regular expressions) that are correlated with their topic of interest. Rank information of matched terms allows the user to guide further investigation. We applied Annokey to the entire human Entrez Gene database using the key-term “DNA repair” and assessed its performance in identifying the 176 members of a published “gold standard” list of genes established to be involved in this pathway. For this test case we observed a sensitivity and specificity of 97% and 96%, respectively. Conclusions Annokey facilitates the identification of genes related to an area of interest, a task which can be onerous if performed manually on a large number of genes. Annokey provides a way to capitalize on the high quality information provided by the Entrez Gene database allowing both scalability and compatibility with automated analysis pipelines, thus offering the potential to significantly enhance research productivity.

2014-01-01

153

Proteomics and transcriptomics of the BABA-induced resistance response in potato using a novel functional annotation approach  

PubMed Central

Background Induced resistance (IR) can be part of a sustainable plant protection strategy against important plant diseases. ?-aminobutyric acid (BABA) can induce resistance in a wide range of plants against several types of pathogens, including potato infected with Phytophthora infestans. However, the molecular mechanisms behind this are unclear and seem to be dependent on the system studied. To elucidate the defence responses activated by BABA in potato, a genome-wide transcript microarray analysis in combination with label-free quantitative proteomics analysis of the apoplast secretome were performed two days after treatment of the leaf canopy with BABA at two concentrations, 1 and 10 mM. Results Over 5000 transcripts were differentially expressed and over 90 secretome proteins changed in abundance indicating a massive activation of defence mechanisms with 10 mM BABA, the concentration effective against late blight disease. To aid analysis, we present a more comprehensive functional annotation of the microarray probes and gene models by retrieving information from orthologous gene families across 26 sequenced plant genomes. The new annotation provided GO terms to 8616 previously un-annotated probes. Conclusions BABA at 10 mM affected several processes related to plant hormones and amino acid metabolism. A major accumulation of PR proteins was also evident, and in the mevalonate pathway, genes involved in sterol biosynthesis were down-regulated, whereas several enzymes involved in the sesquiterpene phytoalexin biosynthesis were up-regulated. Interestingly, abscisic acid (ABA) responsive genes were not as clearly regulated by BABA in potato as previously reported in Arabidopsis. Together these findings provide candidates and markers for improved resistance in potato, one of the most important crops in the world. PMID:24773703

2014-01-01

154

DroID: the Drosophila Interactions Database, a comprehensive resource for annotated gene and protein interactions  

PubMed Central

Background Charting the interactions among genes and among their protein products is essential for understanding biological systems. A flood of interaction data is emerging from high throughput technologies, computational approaches, and literature mining methods. Quick and efficient access to this data has become a critical issue for biologists. Several excellent multi-organism databases for gene and protein interactions are available, yet most of these have understandable difficulty maintaining comprehensive information for any one organism. No single database, for example, includes all available interactions, integrated gene expression data, and comprehensive and searchable gene information for the important model organism, Drosophila melanogaster. Description DroID, the Drosophila Interactions Database, is a comprehensive interactions database designed specifically for Drosophila. DroID houses published physical protein interactions, genetic interactions, and computationally predicted interactions, including interologs based on data for other model organisms and humans. All interactions are annotated with original experimental data and source information. DroID can be searched and filtered based on interaction information or a comprehensive set of gene attributes from Flybase. DroID also contains gene expression and expression correlation data that can be searched and used to filter datasets, for example, to focus a study on sub-networks of co-expressed genes. To address the inherent noise in interaction data, DroID employs an updatable confidence scoring system that assigns a score to each physical interaction based on the likelihood that it represents a biologically significant link. Conclusion DroID is the most comprehensive interactions database available for Drosophila. To facilitate downstream analyses, interactions are annotated with original experimental information, gene expression data, and confidence scores. All data in DroID are freely available and can be searched, explored, and downloaded through three different interfaces, including a text based web site, a Java applet with dynamic graphing capabilities (IM Browser), and a Cytoscape plug-in. DroID is available at . PMID:18840285

Yu, Jingkai; Pacifico, Svetlana; Liu, Guozhen; Finley, Russell L

2008-01-01

155

Functional Annotation of 2D Protein Maps: The GelMap Portal  

PubMed Central

In classical proteome analyses, final experimental data are (a) images of 2D protein separations obtained by gel electrophoresis and (b) corresponding lists of proteins which were identified by mass spectrometry (MS). For data annotation, software tools were developed which allow the linking of protein identity data directly to 2D gels (“clickable gels”). GelMap is a new online software tool to annotate 2D protein maps. It allows (i) functional annotation of all identified proteins according to biological categories defined by the user, e.g., subcellular localization, metabolic pathway, or assignment to a protein complex and (ii) annotation of several proteins per analyzed protein “spot” according to MS primary data. Options to differentially display proteins of functional categories offer new opportunities for data evaluation. For instance, if used for the annotation of 2D Blue native/SDS gels, GelMap allows the identification of protein complexes of low abundance. A web portal has been established for presentation and evaluation of protein identity data related to 2D gels and is freely accessible at http://www.gelmap.de/. PMID:22639671

Senkler, Michael; Braun, Hans-Peter

2012-01-01

156

Comprehensive functional annotation of 18 missense mutations found in suspected hemochromatosis type 4 patients.  

PubMed

Hemochromatosis type 4 is a rare form of primary iron overload transmitted as an autosomal dominant trait caused by mutations in the gene encoding the iron transport protein ferroportin 1 (SLC40A1). SLC40A1 mutations fall into two functional categories (loss- versus gain-of-function) underlying two distinct clinical entities (hemochromatosis type 4A versus type 4B). However, the vast majority of SLC40A1 mutations are rare missense variations, with only a few showing strong evidence of causality. The present study reports the results of an integrated approach collecting genetic and phenotypic data from 44 suspected hemochromatosis type 4 patients, with comprehensive structural and functional annotations. Causality was demonstrated for 10 missense variants, showing a clear dichotomy between the two hemochromatosis type 4 subtypes. Two subgroups of loss-of-function mutations were distinguished: one impairing cell-surface expression and one altering only iron egress. Additionally, a new gain-of-function mutation was identified, and the degradation of ferroportin on hepcidin binding was shown to probably depend on the integrity of a large extracellular loop outside of the hepcidin-binding domain. Eight further missense variations, on the other hand, were shown to have no discernible effects at either protein or RNA level; these were found in apparently isolated patients and were associated with a less severe phenotype. The present findings illustrate the importance of combining in silico and biochemical approaches to fully distinguish pathogenic SLC40A1 mutations from benign variants. This has profound implications for patient management. PMID:24714983

Callebaut, Isabelle; Joubrel, Rozenn; Pissard, Serge; Kannengiesser, Caroline; Gérolami, Victoria; Ged, Cécile; Cadet, Estelle; Cartault, François; Ka, Chandran; Gourlaouen, Isabelle; Gourhant, Lénaick; Oudin, Claire; Goossens, Michel; Grandchamp, Bernard; De Verneuil, Hubert; Rochette, Jacques; Férec, Claude; Le Gac, Gérald

2014-09-01

157

Accurate Protein Structure Annotation through Competitive Diffusion of Enzymatic Functions over a Network of Local Evolutionary Similarities  

Microsoft Academic Search

High-throughput Structural Genomics yields many new protein structures without known molecular function. This study aims to uncover these missing annotations by globally comparing select functional residues across the structural proteome. First, Evolutionary Trace Annotation, or ETA, identifies which proteins have local evolutionary and structural features in common; next, these proteins are linked together into a proteomic network of ETA similarities;

Eric Venner; Andreas Martin Lisewski; Serkan Erdin; R. Matthew Ward; Shivas R. Amin; Olivier Lichtarge; Christos Ouzounis

2010-01-01

158

Linking enzyme sequence to function using conserved property difference locator to identify and annotate positions likely to control specific functionality  

Microsoft Academic Search

BACKGROUND: Families of homologous enzymes evolved from common progenitors. The availability of multiple sequences representing each activity presents an opportunity for extracting information specifying the functionality of individual homologs. We present a straightforward method for the identification of residues likely to determine class specific functionality in which multiple sequence alignments are converted to an annotated graphical form by the Conserved

Kimberly M. Mayer; Sean R. Mccorkle; John Shanklin

2005-01-01

159

Annotated genetic linkage maps of Pinus pinaster Ait. from a Central Spain population using microsatellite and gene based markers  

PubMed Central

Background Pinus pinaster Ait. is a major resin producing species in Spain. Genetic linkage mapping can facilitate marker-assisted selection (MAS) through the identification of Quantitative Trait Loci and selection of allelic variants of interest in breeding populations. In this study, we report annotated genetic linkage maps for two individuals (C14 and C15) belonging to a breeding program aiming to increase resin production. We use different types of DNA markers, including last-generation molecular markers. Results We obtained 13 and 14 linkage groups for C14 and C15 maps, respectively. A total of 211 and 215 markers were positioned on each map and estimated genome length was between 1,870 and 2,166 cM respectively, which represents near 65% of genome coverage. Comparative mapping with previously developed genetic linkage maps for P. pinaster based on about 60 common markers enabled aligning linkage groups to this reference map. The comparison of our annotated linkage maps and linkage maps reporting QTL information revealed 11 annotated SNPs in candidate genes that co-localized with previously reported QTLs for wood properties and water use efficiency. Conclusions This study provides genetic linkage maps from a Spanish population that shows high levels of genetic divergence with French populations from which segregating progenies have been previously mapped. These genetic maps will be of interest to construct a reliable consensus linkage map for the species. The importance of developing functional genetic linkage maps is highlighted, especially when working with breeding populations for its future application in MAS for traits of interest. PMID:23036012

2012-01-01

160

Towards protein function annotations for matching remote homologs  

E-print Network

Identifying functional similarities for proteins with low sequence identity and low structure similarity often suffers from high false positives and false negatives results. To improve the functional prediction ability based on the local protein...

Lei, Seak Fei

2008-07-03

161

Annotating Nucleic Acid-Binding Function Based on Protein Structure  

E-print Network

. Gregoret2 * and Yael Mandel-Gutfreund2 1 Department of Molecular, Cell and Developmental Biology University or no structural similarity to those currently in the database. Therefore, novel function prediction methods function by searching the databases for homologous proteins with known function.5 Detec- tion of conserved

Mandel-Gutfreund, Yael

162

The IGS Standard Operating Procedure for Automated Prokaryotic Annotation.  

PubMed

The Institute for Genome Sciences (IGS) has developed a prokaryotic annotation pipeline that is used for coding gene/RNA prediction and functional annotation of Bacteria and Archaea. The fully automated pipeline accepts one or many genomic sequences as input and produces output in a variety of standard formats. Functional annotation is primarily based on similarity searches and motif finding combined with a hierarchical rule based annotation system. The output annotations can also be loaded into a relational database and accessed through visualization tools. PMID:21677861

Galens, Kevin; Orvis, Joshua; Daugherty, Sean; Creasy, Heather H; Angiuoli, Sam; White, Owen; Wortman, Jennifer; Mahurkar, Anup; Giglio, Michelle Gwinn

2011-04-29

163

tagtog: interactive and text-mining-assisted annotation of gene mentions in PLOS full-text articles  

PubMed Central

The breadth and depth of biomedical literature are increasing year upon year. To keep abreast of these increases, FlyBase, a database for Drosophila genomic and genetic information, is constantly exploring new ways to mine the published literature to increase the efficiency and accuracy of manual curation and to automate some aspects, such as triaging and entity extraction. Toward this end, we present the ‘tagtog’ system, a web-based annotation framework that can be used to mark up biological entities (such as genes) and concepts (such as Gene Ontology terms) in full-text articles. tagtog leverages manual user annotation in combination with automatic machine-learned annotation to provide accurate identification of gene symbols and gene names. As part of the BioCreative IV Interactive Annotation Task, FlyBase has used tagtog to identify and extract mentions of Drosophila melanogaster gene symbols and names in full-text biomedical articles from the PLOS stable of journals. We show here the results of three experiments with different sized corpora and assess gene recognition performance and curation speed. We conclude that tagtog-named entity recognition improves with a larger corpus and that tagtog-assisted curation is quicker than manual curation. Database URL: www.tagtog.net, www.flybase.org PMID:24715220

Cejuela, Juan Miguel; McQuilton, Peter; Ponting, Laura; Marygold, Steven J.; Stefancsik, Raymund; Millburn, Gillian H.; Rost, Burkhard

2014-01-01

164

Next-Generation Annotation of Prokaryotic Genomes with EuGene-P: Application to Sinorhizobium meliloti 2011  

PubMed Central

The availability of next-generation sequences of transcripts from prokaryotic organisms offers the opportunity to design a new generation of automated genome annotation tools not yet available for prokaryotes. In this work, we designed EuGene-P, the first integrative prokaryotic gene finder tool which combines a variety of high-throughput data, including oriented RNA-Seq data, directly into the prediction process. This enables the automated prediction of coding sequences (CDSs), untranslated regions, transcription start sites (TSSs) and non-coding RNA (ncRNA, sense and antisense) genes. EuGene-P was used to comprehensively and accurately annotate the genome of the nitrogen-fixing bacterium Sinorhizobium meliloti strain 2011, leading to the prediction of 6308 CDSs as well as 1876 ncRNAs. Among them, 1280 appeared as antisense to a CDS, which supports recent findings that antisense transcription activity is widespread in bacteria. Moreover, 4077 TSSs upstream of protein-coding or non-coding genes were precisely mapped providing valuable data for the study of promoter regions. By looking for RpoE2-binding sites upstream of annotated TSSs, we were able to extend the S. meliloti RpoE2 regulon by ?3-fold. Altogether, these observations demonstrate the power of EuGene-P to produce a reliable and high-resolution automatic annotation of prokaryotic genomes. PMID:23599422

Sallet, Erika; Roux, Brice; Sauviac, Laurent; Jardinaud, Marie-Franc¸oise; Carrère, Sébastien; Faraut, Thomas; de Carvalho-Niebel, Fernanda; Gouzy, Jérôme; Gamas, Pascal; Capela, Delphine; Bruand, Claude; Schiex, Thomas

2013-01-01

165

SNPit: a federated data integration system for the purpose of functional SNP annotation.  

PubMed

Genome wide association studies can potentially identify the genetic causes behind the majority of human diseases. With the advent of more advanced genotyping techniques, there is now an explosion of data gathered on single nucleotide polymorphisms (SNPs). The need exists for an integrated system that can provide up-to-date functional annotation information on SNPs. We have developed the SNP Integration Tool (SNPit) system to address this need. Built upon a federated data integration system, SNPit provides current information on a comprehensive list of SNP data sources. Additional logical inference analysis was included through an inference engine plug in. The SNPit web servlet is available online for use. SNPit allows users to go to one source for up-to-date information on the functional annotation of SNPs. A tool that can help to integrate and analyze the potential functional significance of SNPs is important for understanding the results from genome wide association studies. PMID:19327864

Shen, Terry H; Carlson, Christopher S; Tarczy-Hornoch, Peter

2009-08-01

166

Genome Annotation by Shotgun Inactivation of a Native Gene in Hemizygous Cells: Application to BRCA2 with Implication of Hypomorphic Variants.  

PubMed

The greatest interpretive challenge of modern medicine may be to functionally annotate the vast variation of human genomes. Demonstrating a proposed approach, we created a library of BRCA2 exon 27 shotgun-mutant plasmids including solitary and multiplex mutations to generate human knockin clones using homologous recombination. This 55-mutation, 13-clone syngeneic variance library (SyVaL) comprised severely affected clones having early-stop nonsense mutations, functionally hypomorphic clones having multiple missense mutations emphasizing the potential to identify and assess hypomorphic mutations in novel proteomic and epidemiologic studies, and neutral clones having multiple missense mutations. Efficient coverage of non-essential amino acids was provided by mutation multiplexing. Severe mutations were distinguished from hypomorphic or neutral changes by chemosensitivity assays (hypersensitivity to mitomycin C and acetaldehyde), by analysis of RAD51 focus-formation, and by mitotic multipolarity. A multiplex unbiased approach of generating all-human SyVaLs in medically important genes, with random mutations in native genes, would provide databases of variants that could be functionally annotated without concerns arising from exogenous cDNA constructs or interspecies interactions, as a basis for subsequent proteomic domain-mapping or clinical calibration if desired. Such gene-irrelevant approaches could be scaled up for multiple genes of clinical interest, providing distributable cellular libraries linked to public shared functional databases. This article is protected by copyright. All rights reserved. PMID:25451944

Ghosh, Soma; Bhunia, Anil K; Paun, Bogdan C; Gilbert, Samuel F; Dhru, Urmil; Patel, Kalpesh; Kern, Scott E

2014-11-29

167

Semantic Particularity Measure for Functional Characterization of Gene Sets Using Gene Ontology  

PubMed Central

Background Genetic and genomic data analyses are outputting large sets of genes. Functional comparison of these gene sets is a key part of the analysis, as it identifies their shared functions, and the functions that distinguish each set. The Gene Ontology (GO) initiative provides a unified reference for analyzing the genes molecular functions, biological processes and cellular components. Numerous semantic similarity measures have been developed to systematically quantify the weight of the GO terms shared by two genes. We studied how gene set comparisons can be improved by considering gene set particularity in addition to gene set similarity. Results We propose a new approach to compute gene set particularities based on the information conveyed by GO terms. A GO term informativeness can be computed using either its information content based on the term frequency in a corpus, or a function of the term's distance to the root. We defined the semantic particularity of a set of GO terms Sg1 compared to another set of GO terms Sg2. We combined our particularity measure with a similarity measure to compare gene sets. We demonstrated that the combination of semantic similarity and semantic particularity measures was able to identify genes with particular functions from among similar genes. This differentiation was not recognized using only a semantic similarity measure. Conclusion Semantic particularity should be used in conjunction with semantic similarity to perform functional analysis of GO-annotated gene sets. The principle is generalizable to other ontologies. PMID:24489737

Bettembourg, Charles; Diot, Christian; Dameron, Olivier

2014-01-01

168

Gene Ontology annotation of sequence-specific DNA binding transcription factors: setting the stage for a large-scale curation effort  

PubMed Central

Transcription factors control which information in a genome becomes transcribed to produce RNAs that function in the biological systems of cells and organisms. Reliable and comprehensive information about transcription factors is invaluable for large-scale network-based studies. However, existing transcription factor knowledge bases are still lacking in well-documented functional information. Here, we provide guidelines for a curation strategy, which constitutes a robust framework for using the controlled vocabularies defined by the Gene Ontology Consortium to annotate specific DNA binding transcription factors (DbTFs) based on experimental evidence reported in literature. Our standardized protocol and workflow for annotating specific DNA binding RNA polymerase II transcription factors is designed to document high-quality and decisive evidence from valid experimental methods. Within a collaborative biocuration effort involving the user community, we are now in the process of exhaustively annotating the full repertoire of human, mouse and rat proteins that qualify as DbTFs in as much as they are experimentally documented in the biomedical literature today. The completion of this task will significantly enrich Gene Ontology-based information resources for the research community. Database URL: www.tfcheckpoint.org PMID:23981286

Tripathi, Sushil; Christie, Karen R.; Balakrishnan, Rama; Huntley, Rachael; Hill, David P.; Thommesen, Liv; Blake, Judith A.; Kuiper, Martin; Lægreid, Astrid

2013-01-01

169

Automated pipeline for atlas-based annotation of gene expression patterns: Application to postnatal day 7 mouse brain  

PubMed Central

Massive amounts of image data have been collected and continue to be generated for representing cellular gene expression throughout the mouse brain. Critical to exploiting this key effort of the post-genomic era is the ability to place these data into a common spatial reference that enables rapid interactive queries, analysis, data sharing, and visualization. In this paper, we present a set of automated protocols for generating and annotating gene expression patterns suitable for the establishment of a database. The steps include imaging tissue slices, detecting cellular gene expression levels, spatial registration with an atlas, and textual annotation. Using high-throughput in situ hybridization to generate serial sets of tissues displaying gene expression, this process was applied towards the establishment of a database representing over 200 genes in the postnatal day 7 mouse brain. These data using this protocol are now well-suited for interactive comparisons, analysis, queries, and visualization. PMID:19698790

Carson, James; Ju, Tao; Bello, Musodiq; Thaller, Christina; Warren, Joe; Kakadiaris, Ioannis A.; Chiu, Wah; Eichele, Gregor

2009-01-01

170

Revised Annotations, Sex-Biased Expression, and Lineage-Specific Genes in the Drosophila melanogaster Group  

PubMed Central

Here, we provide revised gene models for D. ananassae, D. yakuba, and D. simulans, which include untranslated regions and empirically verified intron-exon boundaries, as well as ortholog groups identified using a fuzzy reciprocal-best-hit blast comparison. Using these revised annotations, we perform differential expression testing using the cufflinks suite to provide a broad overview of differential expression between reproductive tissues and the carcass. We identify thousands of genes that are differentially expressed across tissues in D. yakuba and D. simulans, with roughly 60% agreement in expression patterns of orthologs in D. yakuba and D. simulans. We identify several cases of putative polycistronic transcripts, pointing to a combination of transcriptional read-through in the genome as well as putative gene fusion and fission events across taxa. We furthermore identify hundreds of lineage specific genes in each species with no blast hits among transcripts of any other Drosophila species, which are candidates for neofunctionalized proteins and a potential source of genetic novelty. PMID:25273863

Rogers, Rebekah L.; Shao, Ling; Sanjak, Jaleal S.; Andolfatto, Peter; Thornton, Kevin R.

2014-01-01

171

Culturable diversity and functional annotation of psychrotrophic bacteria from cold desert of Leh Ladakh (India).  

PubMed

To study culturable bacterial diversity under subzero temperature conditions and their possible functional annotation, soil and water samples from Leh Ladakh region were analysed. Ten different nutrient combinations were used to isolate the maximum possible culturable morphotypes. A total of 325 bacterial isolates were characterized employing 16S rDNA-Amplified Ribosomal DNA Restriction Analysis with three restriction endonucleases AluI, MspI and HaeIII, which led to formation of 23-40 groups for the different sites at 75 % similarity index, adding up to 175 groups. Phylogenetic analysis based on 16S rRNA gene sequencing led to the identification of 175 bacteria, grouped in four phyla, Firmicutes (54 %), Proteobacteria (28 %), Actinobacteria (16 %) and Bacteroidetes (3 %), and included 29 different genera with 57 distinct species. Overall 39 % of the total morphotypes belonged to the Bacillus and Bacillus derived genera (BBDG) followed by Pseudomonas (14 %), Arthrobacter (9 %), Exiguobacterium (8 %), Alishewanella (4 %), Brachybacterium, Providencia, Planococcus (3 %), Janthinobacterium, Sphingobacterium, Kocuria (2 %) and Aurantimonas, Citricoccus, Cellulosimicrobium, Brevundimonas, Desemzia, Flavobacterium, Klebsiella, Paracoccus, Psychrobacter, Sporosarcina, Staphylococcus, Sinobaca, Stenotrophomonas, Sanguibacter, Vibrio (1 %). The representative isolates from each cluster were screened for their plant growth promoting characteristics at low temperature (5-15 °C). Variations were observed among strains for production of ammonia, hydrogen cyanide, indole-3-acetic acid and siderophore, solubilisation of phosphate, 1-aminocyclopropane-1-carboxylate deaminase activity and biocontrol activity against Rhizoctonia solani and Macrophomina phaseolina. Cold adapted microbes may have application as inoculants and biocontrol agents in crops growing at high altitudes under cold climate condition. PMID:25371316

Yadav, Ajar Nath; Sachan, Shashwati Ghosh; Verma, Priyanka; Tyagi, Satya Prakash; Kaushik, Rajeev; Saxena, Anil K

2015-01-01

172

Mouse Genetics: Determining gene function  

E-print Network

Mouse Genetics: Determining gene function An International Centre for Mouse Genetics Mammalian Genetics Unit #12;Determining gene function · Mutagenesis approaches · Gene-driven, phenotype for Mouse Genetics Mammalian Genetics Unit #12;An International Centre for Mouse Genetics Mammalian Genetics

Goldschmidt, Christina

173

RASTtk: A modular and extensible implementation of the RAST algorithm for building custom annotation pipelines and annotating batches of genomes  

PubMed Central

The RAST (Rapid Annotation using Subsystem Technology) annotation engine was built in 2008 to annotate bacterial and archaeal genomes. It works by offering a standard software pipeline for identifying genomic features (i.e., protein-encoding genes and RNA) and annotating their functions. Recently, in order to make RAST a more useful research tool and to keep pace with advancements in bioinformatics, it has become desirable to build a version of RAST that is both customizable and extensible. In this paper, we describe the RAST tool kit (RASTtk), a modular version of RAST that enables researchers to build custom annotation pipelines. RASTtk offers a choice of software for identifying and annotating genomic features as well as the ability to add custom features to an annotation job. RASTtk also accommodates the batch submission of genomes and the ability to customize annotation protocols for batch submissions. This is the first major software restructuring of RAST since its inception. PMID:25666585

Brettin, Thomas; Davis, James J.; Disz, Terry; Edwards, Robert A.; Gerdes, Svetlana; Olsen, Gary J.; Olson, Robert; Overbeek, Ross; Parrello, Bruce; Pusch, Gordon D.; Shukla, Maulik; Thomason, James A.; Stevens, Rick; Vonstein, Veronika; Wattam, Alice R.; Xia, Fangfang

2015-01-01

174

RASTtk: A modular and extensible implementation of the RAST algorithm for building custom annotation pipelines and annotating batches of genomes.  

PubMed

The RAST (Rapid Annotation using Subsystem Technology) annotation engine was built in 2008 to annotate bacterial and archaeal genomes. It works by offering a standard software pipeline for identifying genomic features (i.e., protein-encoding genes and RNA) and annotating their functions. Recently, in order to make RAST a more useful research tool and to keep pace with advancements in bioinformatics, it has become desirable to build a version of RAST that is both customizable and extensible. In this paper, we describe the RAST tool kit (RASTtk), a modular version of RAST that enables researchers to build custom annotation pipelines. RASTtk offers a choice of software for identifying and annotating genomic features as well as the ability to add custom features to an annotation job. RASTtk also accommodates the batch submission of genomes and the ability to customize annotation protocols for batch submissions. This is the first major software restructuring of RAST since its inception. PMID:25666585

Brettin, Thomas; Davis, James J; Disz, Terry; Edwards, Robert A; Gerdes, Svetlana; Olsen, Gary J; Olson, Robert; Overbeek, Ross; Parrello, Bruce; Pusch, Gordon D; Shukla, Maulik; Thomason, James A; Stevens, Rick; Vonstein, Veronika; Wattam, Alice R; Xia, Fangfang

2015-01-01

175

Gene prediction and annotation in Penstemon (Plantaginaceae): A workflow for marker development from extremely low-coverage genome sequencing1  

PubMed Central

• Premise of the study: Penstemon (Plantaginaceae) is a large and diverse genus endemic to North America. However, determining the phylogenetic relationships among its 280 species has been difficult due to its recent evolutionary radiation. The development of a large, multilocus data set can help to resolve this challenge. • Methods: Using both previously sequenced genomic libraries and our own low-coverage whole-genome shotgun sequencing libraries, we used the MAKER2 Annotation Pipeline to identify gene regions for the development of sequencing loci from six extremely low-coverage Penstemon genomes (?0.005ז0.007×). We also compared this approach to BLAST searches, and conducted analyses to characterize sequence divergence across the species sequenced. • Results: Annotations and gene predictions were successfully added to more than 10,000 contigs for potential use in downstream primer design. Primers were then designed for chloroplast, mitochondrial, and nuclear loci from these annotated sequences. MAKER2 identified longer gene regions in all six Penstemon genomes when compared with BLASTN and BLASTX searches. The average level of sequence divergence among the six species was 7.14%. • Discussion: Combining bioinformatics tools into a workflow that produces annotations can be useful for creating potential phylogenetic markers from thousands of sequences even when genome coverage is extremely low and reference data are only available from distant relatives. Furthermore, the output from MAKER2 contains information about important gene features, such as exon boundaries, and can be easily integrated with visualization tools to facilitate the process of marker development. PMID:25506519

Blischak, Paul D.; Wenzel, Aaron J.; Wolfe, Andrea D.

2014-01-01

176

Analysis and functional annotation of expressed sequence tags of water buffalo.  

PubMed

An elucidated genome of domestic livestock river buffalo will contribute enormously to economy and better understanding of genome evolution as well. An attempt is made to obtain genomic information on buffalo, based on total Expressed Sequence Tags (ESTs) of Bubalus bubalis available in public domain. These ESTs were annotated and classified into 15 different functional categories based on their homology to the known proteins. Interestingly, 41.79% of the contigs were found to be buffalo specific novel ESTs with respect to other species used in analysis which needs further studies. Also, 224 pSNPs (putative Single Nucleotide Polymorphism) were detected. This study will provide a home base for further genomic studies of buffalo and comparative studies enabling a starting point for the genome annotation of the organism. Supplementary materials are available for this article online. PMID:23394367

Bajetha, Garima; Bhati, Jyotika; Sarika; Iquebal, M A; Rai, Anil; Arora, Vasu; Kumar, Dinesh

2013-01-01

177

The cytochrome P450 gene superfamily in Drosophila melanogaster: Annotation, intron-exon organization and phylogeny  

Microsoft Academic Search

The cytochrome P450 gene superfamily is represented by 90 sequences in the Drosophila melanogaster genome. Of these 90 P450 sequences, 83 code for apparently functional genes whereas seven are apparent pseudogenes. More than half of the genes belong to only two families, CYP4 and CYP6. The CYP6 family is insect specific whereas the CYP4 family includes sequences from vertebrates. There

Nathalie Tijet; Christian Helvig; René Feyereisen

2001-01-01

178

Automated Update, Revision, and Quality Control of the Maize Genome Annotations Using MAKER-P Improves the B73 RefGen_v3 Gene Models and Identifies New Genes1[OPEN  

PubMed Central

The large size and relative complexity of many plant genomes make creation, quality control, and dissemination of high-quality gene structure annotations challenging. In response, we have developed MAKER-P, a fast and easy-to-use genome annotation engine for plants. Here, we report the use of MAKER-P to update and revise the maize (Zea mays) B73 RefGen_v3 annotation build (5b+) in less than 3 h using the iPlant Cyberinfrastructure. MAKER-P identified and annotated 4,466 additional, well-supported protein-coding genes not present in the 5b+ annotation build, added additional untranslated regions to 1,393 5b+ gene models, identified 2,647 5b+ gene models that lack any supporting evidence (despite the use of large and diverse evidence data sets), identified 104,215 pseudogene fragments, and created an additional 2,522 noncoding gene annotations. We also describe a method for de novo training of MAKER-P for the annotation of newly sequenced grass genomes. Collectively, these results lead to the 6a maize genome annotation and demonstrate the utility of MAKER-P for rapid annotation, management, and quality control of grasses and other difficult-to-annotate plant genomes. PMID:25384563

Law, MeiYee; Childs, Kevin L.; Campbell, Michael S.; Stein, Joshua C.; Olson, Andrew J.; Holt, Carson; Panchy, Nicholas; Lei, Jikai; Jiao, Dian; Andorf, Carson M.; Lawrence, Carolyn J.; Ware, Doreen; Shiu, Shin-Han; Sun, Yanni; Jiang, Ning; Yandell, Mark

2015-01-01

179

Automated Update, Revision, and Quality Control of the Maize Genome Annotations Using MAKER-P Improves the B73 RefGen_v3 Gene Models and Identifies New Genes.  

PubMed

The large size and relative complexity of many plant genomes make creation, quality control, and dissemination of high-quality gene structure annotations challenging. In response, we have developed MAKER-P, a fast and easy-to-use genome annotation engine for plants. Here, we report the use of MAKER-P to update and revise the maize (Zea mays) B73 RefGen_v3 annotation build (5b+) in less than 3 h using the iPlant Cyberinfrastructure. MAKER-P identified and annotated 4,466 additional, well-supported protein-coding genes not present in the 5b+ annotation build, added additional untranslated regions to 1,393 5b+ gene models, identified 2,647 5b+ gene models that lack any supporting evidence (despite the use of large and diverse evidence data sets), identified 104,215 pseudogene fragments, and created an additional 2,522 noncoding gene annotations. We also describe a method for de novo training of MAKER-P for the annotation of newly sequenced grass genomes. Collectively, these results lead to the 6a maize genome annotation and demonstrate the utility of MAKER-P for rapid annotation, management, and quality control of grasses and other difficult-to-annotate plant genomes. PMID:25384563

Law, MeiYee; Childs, Kevin L; Campbell, Michael S; Stein, Joshua C; Olson, Andrew J; Holt, Carson; Panchy, Nicholas; Lei, Jikai; Jiao, Dian; Andorf, Carson M; Lawrence, Carolyn J; Ware, Doreen; Shiu, Shin-Han; Sun, Yanni; Jiang, Ning; Yandell, Mark

2015-01-01

180

Meta4: a web application for sharing and annotating metagenomic gene predictions using web services  

PubMed Central

Whole-genome shotgun metagenomics experiments produce DNA sequence data from entire ecosystems, and provide a huge amount of novel information. Gene discovery projects require up-to-date information about sequence homology and domain structure for millions of predicted proteins to be presented in a simple, easy-to-use system. There is a lack of simple, open, flexible tools that allow the rapid sharing of metagenomics datasets with collaborators in a format they can easily interrogate. We present Meta4, a flexible and extensible web application that can be used to share and annotate metagenomic gene predictions. Proteins and predicted domains are stored in a simple relational database, with a dynamic front-end which displays the results in an internet browser. Web services are used to provide up-to-date information about the proteins from homology searches against public databases. Information about Meta4 can be found on the project website1, code is available on Github2, a cloud image is available, and an example implementation can be seen at PMID:24046776

Richardson, Emily J.; Escalettes, Franck; Fotheringham, Ian; Wallace, Robert J.; Watson, Mick

2013-01-01

181

WS-SNPs&GO: a web server for predicting the deleterious effect of human protein variants using functional annotation  

PubMed Central

Background SNPs&GO is a method for the prediction of deleterious Single Amino acid Polymorphisms (SAPs) using protein functional annotation. In this work, we present the web server implementation of SNPs&GO (WS-SNPs&GO). The server is based on Support Vector Machines (SVM) and for a given protein, its input comprises: the sequence and/or its three-dimensional structure (when available), a set of target variations and its functional Gene Ontology (GO) terms. The output of the server provides, for each protein variation, the probabilities to be associated to human diseases. Results The server consists of two main components, including updated versions of the sequence-based SNPs&GO (recently scored as one of the best algorithms for predicting deleterious SAPs) and of the structure-based SNPs&GO3d programs. Sequence and structure based algorithms are extensively tested on a large set of annotated variations extracted from the SwissVar database. Selecting a balanced dataset with more than 38,000 SAPs, the sequence-based approach achieves 81% overall accuracy, 0.61 correlation coefficient and an Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) curve of 0.88. For the subset of ~6,600 variations mapped on protein structures available at the Protein Data Bank (PDB), the structure-based method scores with 84% overall accuracy, 0.68 correlation coefficient, and 0.91 AUC. When tested on a new blind set of variations, the results of the server are 79% and 83% overall accuracy for the sequence-based and structure-based inputs, respectively. Conclusions WS-SNPs&GO is a valuable tool that includes in a unique framework information derived from protein sequence, structure, evolutionary profile, and protein function. WS-SNPs&GO is freely available at http://snps.biofold.org/snps-and-go. PMID:23819482

2013-01-01

182

Genome-wide identification and functional annotation of Plasmodium falciparum long noncoding RNAs from RNA-seq data.  

PubMed

The life cycle of Plasmodium falciparum is very complex, with an erythrocytic stage that involves the invasion of red blood cells and the survival and growth of the parasite within the host. Over the past several decades, numbers of studies have shown that proteins exported by P. falciparum to the surface of infected red blood cells play a critical role in recognition and interaction with host receptors and are thus essential for the completion of the life cycle of P. falciparum. However, little is known about long noncoding RNAs (lncRNAs). In this study, we designed a computational pipeline to identify new lncRNAs of P. falciparum from published RNA-seq data and analyzed their sequences and expression features. As a result, 164 novel lncRNAs were found. The sequences and expression features of P. falciparum lncRNAs were similar to those of humans and mice: there was a lack of sequence conservation, low expression levels, and high expression coefficient of variance and co-expression with nearby coding sequences in the genome. Next, a coding/noncoding gene co-expression network for P. falciparum was constructed to further annotate the functions of novel and known lncRNAs. In total, the functions of 69 lncRNAs, including 44 novel lncRNAs, were annotated. The main functions of the lncRNAs included metabolic processes, biosynthetic processes, regulation of biological processes, establishment of localization, catabolic processes, cellular component organization, and interspecies interactions between organisms. Our results will provide clues to further the investigation of interactions between human hosts and parasites and the mechanisms of P. falciparum infection. PMID:24522451

Liao, Qi; Shen, Jia; Liu, Jianfa; Sun, Xi; Zhao, Guoguang; Chang, Yanzi; Xu, Leiting; Li, Xuerong; Zhao, Ya; Zheng, Huanqin; Zhao, Yi; Wu, Zhongdao

2014-04-01

183

Analysis of the leaf transcriptome of Musa acuminata during interaction with Mycosphaerella musicola: gene assembly, annotation and marker development  

PubMed Central

Background Although banana (Musa sp.) is an important edible crop, contributing towards poverty alleviation and food security, limited transcriptome datasets are available for use in accelerated molecular-based breeding in this genus. 454 GS-FLX Titanium technology was employed to determine the sequence of gene transcripts in genotypes of Musa acuminata ssp. burmannicoides Calcutta 4 and M. acuminata subgroup Cavendish cv. Grande Naine, contrasting in resistance to the fungal pathogen Mycosphaerella musicola, causal organism of Sigatoka leaf spot disease. To enrich for transcripts under biotic stress responses, full length-enriched cDNA libraries were prepared from whole plant leaf materials, both uninfected and artificially challenged with pathogen conidiospores. Results The study generated 846,762 high quality sequence reads, with an average length of 334 bp and totalling 283 Mbp. De novo assembly generated 36,384 and 35,269 unigene sequences for M. acuminata Calcutta 4 and Cavendish Grande Naine, respectively. A total of 64.4% of the unigenes were annotated through Basic Local Alignment Search Tool (BLAST) similarity analyses against public databases. Assembled sequences were functionally mapped to Gene Ontology (GO) terms, with unigene functions covering a diverse range of molecular functions, biological processes and cellular components. Genes from a number of defense-related pathways were observed in transcripts from each cDNA library. Over 99% of contig unigenes mapped to exon regions in the reference M. acuminata DH Pahang whole genome sequence. A total of 4068 genic-SSR loci were identified in Calcutta 4 and 4095 in Cavendish Grande Naine. A subset of 95 potential defense-related gene-derived simple sequence repeat (SSR) loci were validated for specific amplification and polymorphism across M. acuminata accessions. Fourteen loci were polymorphic, with alleles per polymorphic locus ranging from 3 to 8 and polymorphism information content ranging from 0.34 to 0.82. Conclusions A large set of unigenes were characterized in this study for both M. acuminata Calcutta 4 and Cavendish Grande Naine, increasing the number of public domain Musa ESTs. This transcriptome is an invaluable resource for furthering our understanding of biological processes elicited during biotic stresses in Musa. Gene-based markers will facilitate molecular breeding strategies, forming the basis of genetic linkage mapping and analysis of quantitative trait loci. PMID:23379821

2013-01-01

184

Mining functionally relevant gene sets for analyzing physiologically novel clinical expression data.  

PubMed

Gene set analyses have become a standard approach for increasing the sensitivity of transcriptomic studies. However, analytical methods incorporating gene sets require the availability of pre-defined gene sets relevant to the underlying physiology being studied. For novel physiological problems, relevant gene sets may be unavailable or existing gene set databases may bias the results towards only the best-studied of the relevant biological processes. We describe a successful attempt to mine novel functional gene sets for translational projects where the underlying physiology is not necessarily well characterized in existing annotation databases. We choose targeted training data from public expression data repositories and define new criteria for selecting biclusters to serve as candidate gene sets. Many of the discovered gene sets show little or no enrichment for informative Gene Ontology terms or other functional annotation. However, we observe that such gene sets show coherent differential expression in new clinical test data sets, even if derived from different species, tissues, and disease states. We demonstrate the efficacy of this method on a human metabolic data set, where we discover novel, uncharacterized gene sets that are diagnostic of diabetes, and on additional data sets related to neuronal processes and human development. Our results suggest that our approach may be an efficient way to generate a collection of gene sets relevant to the analysis of data for novel clinical applications where existing functional annotation is relatively incomplete. PMID:21121032

Turcan, Sevin; Vetter, Douglas E; Maron, Jill L; Wei, Xintao; Slonim, Donna K

2011-01-01

185

MINING FUNCTIONALLY RELEVANT GENE SETS FOR ANALYZING PHYSIOLOGICALLY NOVEL CLINICAL EXPRESSION DATA  

PubMed Central

Gene set analyses have become a standard approach for increasing the sensitivity of transcriptomic studies. However, analytical methods incorporating gene sets require the availability of pre-defined gene sets relevant to the underlying physiology being studied. For novel physiological problems, relevant gene sets may be unavailable or existing gene set databases may bias the results towards only the best-studied of the relevant biological processes. We describe a successful attempt to mine novel functional gene sets for translational projects where the underlying physiology is not necessarily well characterized in existing annotation databases. We choose targeted training data from public expression data repositories and define new criteria for selecting biclusters to serve as candidate gene sets. Many of the discovered gene sets show little or no enrichment for informative Gene Ontology terms or other functional annotation. However, we observe that such gene sets show coherent differential expression in new clinical test data sets, even if derived from different species, tissues, and disease states. We demonstrate the efficacy of this method on a human metabolic data set, where we discover novel, uncharacterized gene sets that are diagnostic of diabetes, and on additional data sets related to neuronal processes and human development. Our results suggest that our approach may be an efficient way to generate a collection of gene sets relevant to the analysis of data for novel clinical applications where existing functional annotation is relatively incomplete. PMID:21121032

Turcan, Sevin; Vetter, Douglas E.; Maron, Jill L.; Wei, Xintao; Slonim, Donna K.

2011-01-01

186

BambooGDB: a bamboo genome database with functional annotation and an analysis platform.  

PubMed

Bamboo, as one of the most important non-timber forest products and fastest-growing plants in the world, represents the only major lineage of grasses that is native to forests. Recent success on the first high-quality draft genome sequence of moso bamboo (Phyllostachys edulis) provides new insights on bamboo genetics and evolution. To further extend our understanding on bamboo genome and facilitate future studies on the basis of previous achievements, here we have developed BambooGDB, a bamboo genome database with functional annotation and analysis platform. The de novo sequencing data, together with the full-length complementary DNA and RNA-seq data of moso bamboo composed the main contents of this database. Based on these sequence data, a comprehensively functional annotation for bamboo genome was made. Besides, an analytical platform composed of comparative genomic analysis, protein-protein interactions network, pathway analysis and visualization of genomic data was also constructed. As discovery tools to understand and identify biological mechanisms of bamboo, the platform can be used as a systematic framework for helping and designing experiments for further validation. Moreover, diverse and powerful search tools and a convenient browser were incorporated to facilitate the navigation of these data. As far as we know, this is the first genome database for bamboo. Through integrating high-throughput sequencing data, a full functional annotation and several analysis modules, BambooGDB aims to provide worldwide researchers with a central genomic resource and an extensible analysis platform for bamboo genome. BambooGDB is freely available at http://www.bamboogdb.org/. Database URL: http://www.bamboogdb.org. PMID:24602877

Zhao, Hansheng; Peng, Zhenhua; Fei, Benhua; Li, Lubin; Hu, Tao; Gao, Zhimin; Jiang, Zehui

2014-01-01

187

BambooGDB: a bamboo genome database with functional annotation and an analysis platform  

PubMed Central

Bamboo, as one of the most important non-timber forest products and fastest-growing plants in the world, represents the only major lineage of grasses that is native to forests. Recent success on the first high-quality draft genome sequence of moso bamboo (Phyllostachys edulis) provides new insights on bamboo genetics and evolution. To further extend our understanding on bamboo genome and facilitate future studies on the basis of previous achievements, here we have developed BambooGDB, a bamboo genome database with functional annotation and analysis platform. The de novo sequencing data, together with the full-length complementary DNA and RNA-seq data of moso bamboo composed the main contents of this database. Based on these sequence data, a comprehensively functional annotation for bamboo genome was made. Besides, an analytical platform composed of comparative genomic analysis, protein–protein interactions network, pathway analysis and visualization of genomic data was also constructed. As discovery tools to understand and identify biological mechanisms of bamboo, the platform can be used as a systematic framework for helping and designing experiments for further validation. Moreover, diverse and powerful search tools and a convenient browser were incorporated to facilitate the navigation of these data. As far as we know, this is the first genome database for bamboo. Through integrating high-throughput sequencing data, a full functional annotation and several analysis modules, BambooGDB aims to provide worldwide researchers with a central genomic resource and an extensible analysis platform for bamboo genome. BambooGDB is freely available at http://www.bamboogdb.org/. Database URL: http://www.bamboogdb.org PMID:24602877

Zhao, Hansheng; Peng, Zhenhua; Fei, Benhua; Li, Lubin; Hu, Tao; Gao, Zhimin; Jiang, Zehui

2014-01-01

188

Parallel-META 2.0: Enhanced Metagenomic Data Analysis with Functional Annotation, High Performance Computing and Advanced Visualization  

PubMed Central

The metagenomic method directly sequences and analyses genome information from microbial communities. The main computational tasks for metagenomic analyses include taxonomical and functional structure analysis for all genomes in a microbial community (also referred to as a metagenomic sample). With the advancement of Next Generation Sequencing (NGS) techniques, the number of metagenomic samples and the data size for each sample are increasing rapidly. Current metagenomic analysis is both data- and computation- intensive, especially when there are many species in a metagenomic sample, and each has a large number of sequences. As such, metagenomic analyses require extensive computational power. The increasing analytical requirements further augment the challenges for computation analysis. In this work, we have proposed Parallel-META 2.0, a metagenomic analysis software package, to cope with such needs for efficient and fast analyses of taxonomical and functional structures for microbial communities. Parallel-META 2.0 is an extended and improved version of Parallel-META 1.0, which enhances the taxonomical analysis using multiple databases, improves computation efficiency by optimized parallel computing, and supports interactive visualization of results in multiple views. Furthermore, it enables functional analysis for metagenomic samples including short-reads assembly, gene prediction and functional annotation. Therefore, it could provide accurate taxonomical and functional analyses of the metagenomic samples in high-throughput manner and on large scale. PMID:24595159

Song, Baoxing; Xu, Jian; Ning, Kang

2014-01-01

189

The Zebrafish GenomeWiki: a crowdsourcing approach to connect the long tail for zebrafish gene annotation  

PubMed Central

A large repertoire of gene-centric data has been generated in the field of zebrafish biology. Although the bulk of these data are available in the public domain, most of them are not readily accessible or available in nonstandard formats. One major challenge is to unify and integrate these widely scattered data sources. We tested the hypothesis that active community participation could be a viable option to address this challenge. We present here our approach to create standards for assimilation and sharing of information and a system of open standards for database intercommunication. We have attempted to address this challenge by creating a community-centric solution for zebrafish gene annotation. The Zebrafish GenomeWiki is a ‘wiki’-based resource, which aims to provide an altruistic shared environment for collective annotation of the zebrafish genes. The Zebrafish GenomeWiki has features that enable users to comment, annotate, edit and rate this gene-centric information. The credits for contributions can be tracked through a transparent microattribution system. In contrast to other wikis, the Zebrafish GenomeWiki is a ‘structured wiki’ or rather a ‘semantic wiki’. The Zebrafish GenomeWiki implements a semantically linked data structure, which in the future would be amenable to semantic search. Database URL: http://genome.igib.res.in/twiki PMID:24578356

Singh, Meghna; Bhartiya, Deeksha; Maini, Jayant; Sharma, Meenakshi; Singh, Angom Ramcharan; Kadarkaraisamy, Subburaj; Rana, Rajiv; Sabharwal, Ankit; Nanda, Srishti; Ramachandran, Aravindhakshan; Mittal, Ashish; Kapoor, Shruti; Sehgal, Paras; Asad, Zainab; Kaushik, Kriti; Vellarikkal, Shamsudheen Karuthedath; Jagga, Divya; Muthuswami, Muthulakshmi; Chauhan, Rajendra K.; Leonard, Elvin; Priyadarshini, Ruby; Halimani, Mahantappa; Malhotra, Sunny; Patowary, Ashok; Vishwakarma, Harinder; Joshi, Prateek; Bhardwaj, Vivek; Bhaumik, Arijit; Bhatt, Bharat; Jha, Aamod; Kumar, Aalok; Budakoti, Prerna; Lalwani, Mukesh Kumar; Meli, Rajeshwari; Jalali, Saakshi; Joshi, Kandarp; Pal, Koustav; Dhiman, Heena; Laddha, Saurabh V.; Jadhav, Vaibhav; Singh, Naresh; Pandey, Vikas; Sachidanandan, Chetana; Ekker, Stephen C.; Klee, Eric W.; Scaria, Vinod; Sivasubbu, Sridhar

2014-01-01

190

The NFI-Regulome Database: A tool for annotation and analysis of control regions of genes regulated by Nuclear Factor I transcription factors  

PubMed Central

Background Genome annotation plays an essential role in the interpretation and use of genome sequence information. While great strides have been made in the annotation of coding regions of genes, less success has been achieved in the annotation of the regulatory regions of genes, including promoters, enhancers/silencers, and other regulatory elements. One reason for this disparity in annotated information is that coding regions can be assessed using high-throughput techniques such as EST sequencing, while annotation of regulatory regions often requires a gene-by-gene approach. Results The NFI-Regulome database http://nfiregulome.ccr.buffalo.edu was designed to promote easy annotation of the regulatory regions of genes that contain binding sites for the NFI (Nuclear Factor I) family of transcription factors, using data from the published literature. Binding sites are annotated together with the sequence of the gene, obtained from the UCSC Genome site, and the locations of all binding sites for multiple genes can be displayed in a number of formats designed to facilitate inter-gene comparisons. Classes of genes based on expression pattern, disease involvement, or types of binding sites present can be readily compared in order to assess common "architectural" structures in the regulatory regions. Conclusions The NFI-Regulome database allows rapid display of the relative locations and number of transcription factor binding sites of individual or defined sets of genes that contain binding sites for NFI transcription factors. This database may in the future be expanded into a distributed database structure including other families of transcription factors. Such databases may be useful for identifying common regulatory structures in genes essential for organ development, tissue-specific gene expression or those genes related to specific diseases. PMID:21884625

2011-01-01

191

De novo Cloning and Annotation of Genes Associated with Immunity, Detoxification and Energy Metabolism from the Fat Body of the Oriental Fruit Fly, Bactrocera dorsalis  

PubMed Central

The oriental fruit fly, Bactrocera dorsalis, is a destructive pest in tropical and subtropical areas. In this study, we performed transcriptome-wide analysis of the fat body of B. dorsalis and obtained more than 59 million sequencing reads, which were assembled into 27,787 unigenes with an average length of 591 bp. Among them, 17,442 (62.8%) unigenes matched known proteins in the NCBI database. The assembled sequences were further annotated with gene ontology, cluster of orthologous group terms, and Kyoto encyclopedia of genes and genomes. In depth analysis was performed to identify genes putatively involved in immunity, detoxification, and energy metabolism. Many new genes were identified including serpins, peptidoglycan recognition proteins and defensins, which were potentially linked to immune defense. Many detoxification genes were identified, including cytochrome P450s, glutathione S-transferases and ATP-binding cassette (ABC) transporters. Many new transcripts possibly involved in energy metabolism, including fatty acid desaturases, lipases, alpha amylases, and trehalose-6-phosphate synthases, were identified. Moreover, we randomly selected some genes to examine their expression patterns in different tissues by quantitative real-time PCR, which indicated that some genes exhibited fat body-specific expression in B. dorsalis. The identification of a numerous transcripts in the fat body of B. dorsalis laid the foundation for future studies on the functions of these genes. PMID:24710118

Yang, Wen-Jia; Yuan, Guo-Rui; Cong, Lin; Xie, Yi-Fei; Wang, Jin-Jun

2014-01-01

192

PHYLOGENOMICS - GUIDED VALIDATION OF FUNCTION FOR CONSERVED UNKNOWN GENES  

SciTech Connect

Identifying functions for all gene products in all sequenced organisms is a central challenge of the post-genomic era. However, at least 30-50% of the proteins encoded by any given genome are of unknown function, or wrongly or vaguely annotated. Many of these 'unknown' proteins are common to prokaryotes and plants. We accordingly set out to predict and experimentally test the functions of such proteins. Our approach to functional prediction is integrative, coupling the extensive post-genomic resources available for plants with comparative genomics based on hundreds of microbial genomes, and functional genomic datasets from model microorganisms. The early phase is computer-assisted; later phases incorporate intellectual input from expert plant and microbial biochemists. The approach thus bridges the gap between automated homology-based annotations and the classical gene discovery efforts of experimentalists, and is much more powerful than purely computational approaches to identifying gene-function associations. Among Arabidopsis genes, we focused on those (2,325 in total) that (i) are unique or belong to families with no more than three members, (ii) are conserved between plants and prokaryotes, and (iii) have unknown or poorly known functions. Computer-assisted selection of promising targets for deeper analysis was based on homology .. independent characteristics associated in the SEED database with the prokaryotic members of each family, specifically gene clustering and phyletic spread, as well as availability of functional genomics data, and publications that could link candidate families to general metabolic areas, or to specific functions. In-depth comparative genomic analysis was then performed for about 500 top candidate families, which connected ~55 of them to general areas of metabolism and led to specific functional predictions for a subset of ~25 more. Twenty predicted functions were experimentally tested in at least one prokaryotic organism via reverse genetics, metabolic profiling, functional complementation, and recombinant protein biochemistry. Our approach predicted and validated functions for 10 formerly uncharacterized protein families common to plants and prokaryotes; none of these functions had previously been correctly predicted by computational methods. The functions of five more are currently being validated. Experimental testing of diverse representatives of these families combined with in silica analysis allowed accurate projection of the annotations to hundreds more sequenced genomes.

V, DE CRECY-LAGARD; D, HANSON A

2012-01-03

193

Gene Set Enrichment in eQTL Data Identifies Novel Annotations and Pathway Regulators  

PubMed Central

Genome-wide gene expression profiling has been extensively used to generate biological hypotheses based on differential expression. Recently, many studies have used microarrays to measure gene expression levels across genetic mapping populations. These gene expression phenotypes have been used for genome-wide association analyses, an analysis referred to as expression QTL (eQTL) mapping. Here, eQTL analysis was performed in adipose tissue from 28 inbred strains of mice. We focused our analysis on “trans-eQTL bands”, defined as instances in which the expression patterns of many genes were all associated to a common genetic locus. Genes comprising trans-eQTL bands were screened for enrichments in functional gene sets representing known biological pathways, and genes located at associated trans-eQTL band loci were considered candidate transcriptional modulators. We demonstrate that these patterns were enriched for previously characterized relationships between known upstream transcriptional regulators and their downstream target genes. Moreover, we used this strategy to identify both novel regulators and novel members of known pathways. Finally, based on a putative regulatory relationship identified in our analysis, we identified and validated a previously uncharacterized role for cyclin H in the regulation of oxidative phosphorylation. We believe that the specific molecular hypotheses generated in this study will reveal many additional pathway members and regulators, and that the analysis approaches described herein will be broadly applicable to other eQTL data sets. PMID:18464898

Wu, Chunlei; Delano, David L.; Mitro, Nico; Su, Stephen V.; Janes, Jeff; McClurg, Phillip; Batalov, Serge; Welch, Genevieve L.; Zhang, Jie; Orth, Anthony P.; Walker, John R.; Glynne, Richard J.; Cooke, Michael P.; Takahashi, Joseph S.; Shimomura, Kazuhiro; Kohsaka, Akira; Bass, Joseph; Saez, Enrique; Wiltshire, Tim; Su, Andrew I.

2008-01-01

194

De Novo Assembly and Annotation of the Transcriptome of the Agricultural Weed Ipomoea purpurea Uncovers Gene Expression Changes Associated with Herbicide Resistance  

PubMed Central

Human-mediated selection can lead to rapid evolution in very short time scales, and the evolution of herbicide resistance in agricultural weeds is an excellent example of this phenomenon. The common morning glory, Ipomoea purpurea, is resistant to the herbicide glyphosate, but genetic investigations of this trait have been hampered by the lack of genomic resources for this species. Here, we present the annotated transcriptome of the common morning glory, Ipomoea purpurea, along with an examination of whole genome expression profiling to assess potential gene expression differences between three artificially selected herbicide resistant lines and three susceptible lines. The assembled Ipomoea transcriptome reported in this work contains 65,459 assembled transcripts, ~28,000 of which were functionally annotated by assignment to Gene Ontology categories. Our RNA-seq survey using this reference transcriptome identified 19 differentially expressed genes associated with resistance—one of which, a cytochrome P450, belongs to a large plant family of genes involved in xenobiotic detoxification. The differentially expressed genes also broadly implicated receptor-like kinases, which were down-regulated in the resistant lines, and other growth and defense genes, which were up-regulated in resistant lines. Interestingly, the target of glyphosate—EPSP synthase—was not overexpressed in the resistant Ipomoea lines as in other glyphosate resistant weeds. Overall, this work identifies potential candidate resistance loci for future investigations and dramatically increases genomic resources for this species. The assembled transcriptome presented herein will also provide a valuable resource to the Ipomoea community, as well as to those interested in utilizing the close relationship between the Convolvulaceae and the Solanaceae for phylogenetic and comparative genomics examinations. PMID:25155274

Leslie, Trent; Baucom, Regina S.

2014-01-01

195

Analysis and Functional Annotation of an Expressed Sequence Tag Collection for Tropical Crop Sugarcane  

PubMed Central

To contribute to our understanding of the genome complexity of sugarcane, we undertook a large-scale expressed sequence tag (EST) program. More than 260,000 cDNA clones were partially sequenced from 26 standard cDNA libraries generated from different sugarcane tissues. After the processing of the sequences, 237,954 high-quality ESTs were identified. These ESTs were assembled into 43,141 putative transcripts. Of the assembled sequences, 35.6% presented no matches with existing sequences in public databases. A global analysis of the whole SUCEST data set indicated that 14,409 assembled sequences (33% of the total) contained at least one cDNA clone with a full-length insert. Annotation of the 43,141 assembled sequences associated almost 50% of the putative identified sugarcane genes with protein metabolism, cellular communication/signal transduction, bioenergetics, and stress responses. Inspection of the translated assembled sequences for conserved protein domains revealed 40,821 amino acid sequences with 1415 Pfam domains. Reassembling the consensus sequences of the 43,141 transcripts revealed a 22% redundancy in the first assembling. This indicated that possibly 33,620 unique genes had been identified and indicated that >90% of the sugarcane expressed genes were tagged. PMID:14613979

Vettore, André L.; da Silva, Felipe R.; Kemper, Edson L.; Souza, Glaucia M.; da Silva, Aline M.; Ferro, Maria Inês T.; Henrique-Silva, Flavio; Giglioti, Éder A.; Lemos, Manoel V.F.; Coutinho, Luiz L.; Nobrega, Marina P.; Carrer, Helaine; França, Suzelei C.; Bacci, Maurício; Goldman, Maria Helena S.; Gomes, Suely L.; Nunes, Luiz R.; Camargo, Luis E.A.; Siqueira, Walter J.; Van Sluys, Marie-Anne; Thiemann, Otavio H.; Kuramae, Eiko E.; Santelli, Roberto V.; Marino, Celso L.; Targon, Maria L.P.N.; Ferro, Jesus A.; Silveira, Henrique C.S.; Marini, Danyelle C.; Lemos, Eliana G.M.; Monteiro-Vitorello, Claudia B.; Tambor, José H.M.; Carraro, Dirce M.; Roberto, Patrícia G.; Martins, Vanderlei G.; Goldman, Gustavo H.; de Oliveira, Regina C.; Truffi, Daniela; Colombo, Carlos A.; Rossi, Magdalena; de Araujo, Paula G.; Sculaccio, Susana A.; Angella, Aline; Lima, Marleide M.A.; de Rosa, Vicente E.; Siviero, Fábio; Coscrato, Virginia E.; Machado, Marcos A.; Grivet, Laurent; Di Mauro, Sonia M.Z.; Nobrega, Francisco G.; Menck, Carlos F.M.; Braga, Marilia D.V.; Telles, Guilherme P.; Cara, Frank A.A.; Pedrosa, Guilherme; Meidanis, João; Arruda, Paulo

2003-01-01

196

Functional Annotation of Conserved Hypothetical Proteins from Haemophilus influenzae Rd KW20  

PubMed Central

Haemophilus influenzae is a Gram negative bacterium that belongs to the family Pasteurellaceae, causes bacteremia, pneumonia and acute bacterial meningitis in infants. The emergence of multi-drug resistance H. influenzae strain in clinical isolates demands the development of better/new drugs against this pathogen. Our study combines a number of bioinformatics tools for function predictions of previously not assigned proteins in the genome of H. influenzae. This genome was extensively analyzed and found 1,657 functional proteins in which function of 429 proteins are unknown, termed as hypothetical proteins (HPs). Amino acid sequences of all 429 HPs were extensively annotated and we successfully assigned the function to 296 HPs with high confidence. We also characterized the function of 124 HPs precisely, but with less confidence. We believed that sequence of a protein can be used as a framework to explain known functional properties. Here we have combined the latest versions of protein family databases, protein motifs, intrinsic features from the amino acid sequence, pathway and genome context methods to assign a precise function to hypothetical proteins for which no experimental information is available. We found these HPs belong to various classes of proteins such as enzymes, transporters, carriers, receptors, signal transducers, binding proteins, virulence and other proteins. The outcome of this work will be helpful for a better understanding of the mechanism of pathogenesis and in finding novel therapeutic targets for H. influenzae. PMID:24391926

Shahbaaz, Mohd; Md. ImtaiyazHassan; Ahmad, Faizan

2013-01-01

197

Gene Annotation and Drug Target Discovery in Candida albicans with a Tagged Transposon Mutant Collection  

Microsoft Academic Search

Candida albicans is the most common human fungal pathogen, causing infections that can be lethal in immunocompromised patients. Although Saccharomyces cerevisiae has been used as a model for C. albicans, it lacks C. albicans' diverse morphogenic forms and is primarily non-pathogenic. Comprehensive genetic analyses that have been instrumental for determining gene function in S. cerevisiae are hampered in C. albicans,

Julia Oh; Eula Fung; Ulrich Schlecht; Ronald W. Davis; Guri Giaever; Robert P. St. Onge; Adam Deutschbauer; Corey Nislow

2010-01-01

198

Fine Scale Regulatory Annotation of Cancer Genes 29 June 7 August 2009  

E-print Network

are then inactivated in cancer cells, resulting in the loss of normal functions in those cells, such as accurate DNA specifically shown to be active in tumour (cancer) cells ­ we will refer to these as "cancer" genes cancers are caused by abnormalities in the genetic material of the transformed cells. These abnormalities

Goldschmidt, Christina

199

The mammalian gene function resource: the International Knockout Mouse Consortium.  

PubMed

In 2007, the International Knockout Mouse Consortium (IKMC) made the ambitious promise to generate mutations in virtually every protein-coding gene of the mouse genome in a concerted worldwide action. Now, 5 years later, the IKMC members have developed high-throughput gene trapping and, in particular, gene-targeting pipelines and generated more than 17,400 mutant murine embryonic stem (ES) cell clones and more than 1,700 mutant mouse strains, most of them conditional. A common IKMC web portal (www.knockoutmouse.org) has been established, allowing easy access to this unparalleled biological resource. The IKMC materials considerably enhance functional gene annotation of the mammalian genome and will have a major impact on future biomedical research. PMID:22968824

Bradley, Allan; Anastassiadis, Konstantinos; Ayadi, Abdelkader; Battey, James F; Bell, Cindy; Birling, Marie-Christine; Bottomley, Joanna; Brown, Steve D; Bürger, Antje; Bult, Carol J; Bushell, Wendy; Collins, Francis S; Desaintes, Christian; Doe, Brendan; Economides, Aris; Eppig, Janan T; Finnell, Richard H; Fletcher, Colin; Fray, Martin; Frendewey, David; Friedel, Roland H; Grosveld, Frank G; Hansen, Jens; Hérault, Yann; Hicks, Geoffrey; Hörlein, Andreas; Houghton, Richard; Hrabé de Angelis, Martin; Huylebroeck, Danny; Iyer, Vivek; de Jong, Pieter J; Kadin, James A; Kaloff, Cornelia; Kennedy, Karen; Koutsourakis, Manousos; Lloyd, K C Kent; Marschall, Susan; Mason, Jeremy; McKerlie, Colin; McLeod, Michael P; von Melchner, Harald; Moore, Mark; Mujica, Alejandro O; Nagy, Andras; Nefedov, Mikhail; Nutter, Lauryl M; Pavlovic, Guillaume; Peterson, Jane L; Pollock, Jonathan; Ramirez-Solis, Ramiro; Rancourt, Derrick E; Raspa, Marcello; Remacle, Jacques E; Ringwald, Martin; Rosen, Barry; Rosenthal, Nadia; Rossant, Janet; Ruiz Noppinger, Patricia; Ryder, Ed; Schick, Joel Zupicich; Schnütgen, Frank; Schofield, Paul; Seisenberger, Claudia; Selloum, Mohammed; Simpson, Elizabeth M; Skarnes, William C; Smedley, Damian; Stanford, William L; Stewart, A Francis; Stone, Kevin; Swan, Kate; Tadepally, Hamsa; Teboul, Lydia; Tocchini-Valentini, Glauco P; Valenzuela, David; West, Anthony P; Yamamura, Ken-ichi; Yoshinaga, Yuko; Wurst, Wolfgang

2012-10-01

200

MICheck: a web tool for fast checking of syntactic annotations of bacterial genomes  

PubMed Central

The annotation of newly sequenced bacterial genomes begins with running several automatic analysis methods, with major emphasis on the identification of protein-coding genes. DNA sequences are heterogeneous in local nucleotide composition and this leads sometimes to sequences being annotated as authentic genes when they are not protein-coding genes or are true but uncharacterized protein-coding genes. This first annotation step is generally followed by an expert manual annotation of the predicted genes. The genomic data (sequence and annotations) organized in an appropriate databank file format is subsequently submitted to an entry point of the International Nucleotide Sequence Database. These procedures are inevitably subject to mistakes, and this can lead to unintentional syntactic annotation errors being stored in public databanks. Here, we present a new web program, MICheck (MIcrobial genome Checker), that enables rapid verification of sets of annotated genes and frameshifts in previously published bacterial genomes. The web interface allows one easily to investigate the MICheck results, i.e. inaccurate or missed gene annotations: a graphical representation is drawn, in which the genomic context of a unique coding DNA sequence annotation or a predicted frameshift is given, using information on the coding potential (curves) and annotation of the neighbouring genes. We illustrate some capabilities of the MICheck site through the analysis of 20 bacterial genomes, 9 of which were selected for their ‘Reviewed’ status in the National Center for Biotechnology Information (NCBI) Reference Sequence Project (RefSeq). In the context of the numerous re-annotation projects for microbial genomes, this tool can be seen as a preliminary step before the functional re-annotation step to check quickly for missing or wrongly annotated genes. The MICheck website is accessible at the following address: . PMID:15980515

Cruveiller, Stéphane; Le Saux, Jérôme; Vallenet, David; Lajus, Aurélie; Bocs, Stéphanie; Médigue, Claudine

2005-01-01

201

Gene function classification using Bayesian models with hierarchy-based priors  

Microsoft Academic Search

We investigate the application of hierarchical classification schemes to the annotation of gene function based on several characteristics of protein sequences including phylogenic descriptors, sequence based attributes, and predicted secondary structure. We discuss three Bayesian models and compare their performance in terms of predictive accuracy. These models are the ordinary multinomial logit (MNL) model, a hierarchical model based on a

Babak Shahbaba; Radford M. Neal

2006-01-01

202

Functionally Enigmatic Genes: A Case Study of the Brain Ignorome  

PubMed Central

What proportion of genes with intense and selective expression in specific tissues, cells, or systems are still almost completely uncharacterized with respect to biological function? In what ways do these functionally enigmatic genes differ from well-studied genes? To address these two questions, we devised a computational approach that defines so-called ignoromes. As proof of principle, we extracted and analyzed a large subset of genes with intense and selective expression in brain. We find that publications associated with this set are highly skewed—the top 5% of genes absorb 70% of the relevant literature. In contrast, approximately 20% of genes have essentially no neuroscience literature. Analysis of the ignorome over the past decade demonstrates that it is stubbornly persistent, and the rapid expansion of the neuroscience literature has not had the expected effect on numbers of these genes. Surprisingly, ignorome genes do not differ from well-studied genes in terms of connectivity in coexpression networks. Nor do they differ with respect to numbers of orthologs, paralogs, or protein domains. The major distinguishing characteristic between these sets of genes is date of discovery, early discovery being associated with greater research momentum—a genomic bandwagon effect. Finally we ask to what extent massive genomic, imaging, and phenotype data sets can be used to provide high-throughput functional annotation for an entire ignorome. In a majority of cases we have been able to extract and add significant information for these neglected genes. In several cases—ELMOD1, TMEM88B, and DZANK1—we have exploited sequence polymorphisms, large phenome data sets, and reverse genetic methods to evaluate the function of ignorome genes. PMID:24523945

Pandey, Ashutosh K.; Lu, Lu; Wang, Xusheng; Homayouni, Ramin; Williams, Robert W.

2014-01-01

203

The DOE-JGI Standard Operating Procedure for the Annotations of the Microbial Genomes  

SciTech Connect

The DOE-JGI Microbial Annotation Pipeline (DOE-JGI MAP) supports gene prediction and/or functional annotation of microbial genomes towards comparative analysis with the Integrated Microbial Genome (IMG) system. DOE-JGI MAP annotation is applied on nucleotide sequence datasets included in the IMG-ER (Expert Review) version of IMG via the IMG ER submission site. Users can submit the sequence datasets consisting of one or more contigs in a multi-fasta file. DOE-JGI MAP annotation includes prediction of protein coding and RNA genes, as well as repeats and assignment of product names to these genes.

Mavromatis, Konstantinos; Ivanova, Natalia; Chen, I-Min A.; Szeto, Ernest; Markowitz, Victor; Kyrpides, Nikos C.

2009-05-20

204

Characterization of Liaoning Cashmere Goat Transcriptome: Sequencing, De Novo Assembly, Functional Annotation and Comparative Analysis  

PubMed Central

Background Liaoning cashmere goat is a famous goat breed for cashmere wool. In order to increase the transcriptome data and accelerate genetic improvement for this breed, we performed de novo transcriptome sequencing to generate the first expressed sequence tag dataset for the Liaoning cashmere goat, using next-generation sequencing technology. Results Transcriptome sequencing of Liaoning cashmere goat on a Roche 454 platform yielded 804,601 high-quality reads. Clustering and assembly of these reads produced a non-redundant set of 117,854 unigenes, comprising 13,194 isotigs and 104,660 singletons. Based on similarity searches with known proteins, 17,356 unigenes were assigned to 6,700 GO categories, and the terms were summarized into three main GO categories and 59 sub-categories. 3,548 and 46,778 unigenes had significant similarity to existing sequences in the KEGG and COG databases, respectively. Comparative analysis revealed that 42,254 unigenes were aligned to 17,532 different sequences in NCBI non-redundant nucleotide databases. 97,236 (82.51%) unigenes were mapped to the 30 goat chromosomes. 35,551 (30.17%) unigenes were matched to 11,438 reported goat protein-coding genes. The remaining non-matched unigenes were further compared with cattle and human reference genes, 67 putative new goat genes were discovered. Additionally, 2,781 potential simple sequence repeats were initially identified from all unigenes. Conclusion The transcriptome of Liaoning cashmere goat was deep sequenced, de novo assembled, and annotated, providing abundant data to better understand the Liaoning cashmere goat transcriptome. The potential simple sequence repeats provide a material basis for future genetic linkage and quantitative trait loci analyses. PMID:24130835

Liu, Hongliang; Wang, Tingting; Wang, Jinke; Quan, Fusheng; Zhang, Yong

2013-01-01

205

A weighted power framework for integrating multisource information: gene function prediction in yeast.  

PubMed

Predicting the functions of unannotated genes is one of the major challenges of biological investigation. In this study, we propose a weighted power scoring framework, called weighted power biological score (WPBS), for combining different biological data sources and predicting the function of some of the unclassified yeast Saccharomyces cerevisiae genes. The relative power and weight coefficients of different data sources, in the proposed score, are estimated systematically by utilizing functional annotations [yeast Gene Ontology (GO)-Slim: Process] of classified genes, available from Saccharomyces Genome Database. Genes are then clustered by applying k-medoids algorithm on WPBS, and functional categories of 334 unclassified genes are predicted using a P-value cutoff 1 ×10(-5). The WPBS is available online at http://www.isical.ac.in/~ shubhra/WPBS/WPBS.html, where one can download WPBS, related files, and a MATLAB code to predict functions of unclassified genes. PMID:22318478

Ray, Shubhra Sankar; Bandyopadhyay, Sanghamitra; Pal, Sankar K

2012-04-01

206

Correlation between Gene Expression and GO Semantic Similarity  

Microsoft Academic Search

This research analyzes some aspects of the relationship between gene expression, gene function, and gene annotation. Many recent studies are implicitly based on the assumption that gene products that are biologically and functionally related would maintain this similarity both in their expression profiles as well as in their Gene Ontology (GO) annotation. We analyze how accurate this assumption proves to

Jose L. Sevilla; Victor Segura; Adam Podhorski; Elizabeth Guruceaga; Jose M. Mato; Luis A. Martinez-Cruz; Fernando J. Corrales; Angel Rubio

2005-01-01

207

Functional analysis of differentially expressed genes associated with glaucoma from DNA microarray data.  

PubMed

Microarray data of astrocytes extracted from the optic nerves of donors with and without glaucoma were analyzed to screen for differentially expressed genes (DEGs). Functional exploration with bioinformatic tools was then used to understand the roles of the identified DEGs in glaucoma. Microarray data were downloaded from the Gene Expression Omnibus (GEO) database, which contains 13 astrocyte samples, 6 from healthy subjects and 7 from patients suffering from glaucoma. Data were pre-processed, and DEGs were screened out using R software packages. Interactions between DEGs were identified, and networks were built using Search Tool for the Retrieval of Interacting Genes/Proteins (STRING). GENECODIS was utilized for the functional analysis of the DEGs, and GOTM was used for module division, for which functional annotation was conducted with the Database for Annotation, Visualization, and Integrated Discovery (DAVID). A total of 371 DEGs were identified between glaucoma-associated samples and normal samples. Three modules included in the PPID database were generated with 11, 12, and 2 significant functional annotations, including immune system processes, inflammatory responses, and synaptic vesicle endocytosis, respectively. We found that the most significantly enriched functions for each module were associated with immune function. Several genes that play interesting roles in the development of glaucoma are described; these genes may be potential biomarkers for glaucoma diagnosis or treatment. PMID:25501152

Wu, Y; Zang, W D; Jiang, W

2014-01-01

208

Heterologous expression of plasmodial proteins for structural studies and functional annotation  

PubMed Central

Malaria remains the world's most devastating tropical infectious disease with as many as 40% of the world population living in risk areas. The widespread resistance of Plasmodium parasites to the cost-effective chloroquine and antifolates has forced the introduction of more costly drug combinations, such as Coartem®. In the absence of a vaccine in the foreseeable future, one strategy to address the growing malaria problem is to identify and characterize new and durable antimalarial drug targets, the majority of which are parasite proteins. Biochemical and structure-activity analysis of these proteins is ultimately essential in the characterization of such targets but requires large amounts of functional protein. Even though heterologous protein production has now become a relatively routine endeavour for most proteins of diverse origins, the functional expression of soluble plasmodial proteins is highly problematic and slows the progress of antimalarial drug target discovery. Here the status quo of heterologous production of plasmodial proteins is presented, constraints are highlighted and alternative strategies and hosts for functional expression and annotation of plasmodial proteins are reviewed. PMID:18828893

Birkholtz, Lyn-Marie; Blatch, Gregory; Coetzer, Theresa L; Hoppe, Heinrich C; Human, Esmaré; Morris, Elizabeth J; Ngcete, Zoleka; Oldfield, Lyndon; Roth, Robyn; Shonhai, Addmore; Stephens, Linda; Louw, Abraham I

2008-01-01

209

A transcriptomic analysis of striped catfish (Pangasianodon hypophthalmus) in response to salinity adaptation: De novo assembly, gene annotation and marker discovery.  

PubMed

The striped catfish (Pangasianodon hypophthalmus) culture industry in the Mekong Delta in Vietnam has developed rapidly over the past decade. The culture industry now however, faces some significant challenges, especially related to climate change impacts notably from predicted extensive saltwater intrusion into many low topographical coastal provinces across the Mekong Delta. This problem highlights a need for development of culture stocks that can tolerate more saline culture environments as a response to expansion of saline water-intruded land. While a traditional artificial selection program can potentially address this need, understanding the genomic basis of salinity tolerance can assist development of more productive culture lines. The current study applied a transcriptomic approach using Ion PGM technology to generate expressed sequence tag (EST) resources from the intestine and swim bladder from striped catfish reared at a salinity level of 9ppt which showed best growth performance. Total sequence data generated was 467.8Mbp, consisting of 4,116,424 reads with an average length of 112bp. De novo assembly was employed that generated 51,188 contigs, and allowed identification of 16,116 putative genes based on the GenBank non-redundant database. GO annotation, KEGG pathway mapping, and functional annotation of the EST sequences recovered with a wide diversity of biological functions and processes. In addition, more than 11,600 simple sequence repeats were also detected. This is the first comprehensive analysis of a striped catfish transcriptome, and provides a valuable genomic resource for future selective breeding programs and functional or evolutionary studies of genes that influence salinity tolerance in this important culture species. PMID:24841517

Thanh, Nguyen Minh; Jung, Hyungtaek; Lyons, Russell E; Chand, Vincent; Tuan, Nguyen Viet; Thu, Vo Thi Minh; Mather, Peter

2014-06-01

210

CASTp: computed atlas of surface topography of proteins with structural and topographical mapping of functionally annotated residues  

Microsoft Academic Search

Cavities on a proteins surface as well as specific amino acid positioning within it create the physico- chemical properties needed for a protein to perform its function. CASTp (http:\\/\\/cast.engr.uic.edu) is an online tool that locates and measures pockets and voids on 3D protein structures. This new version of CASTp includes annotated functional information of specific residues on the protein structure.

Joe Dundas; Zheng Ouyang; Jeffery Tseng; T. Andrew Binkowski; Yaron Turpaz; Jie Liang

2006-01-01

211

Integrative Annotation of 21,037 Human Genes Validated by Full-Length cDNA Clones  

Microsoft Academic Search

The human genome sequence defines our inherent biological potential; the realization of the biology encoded therein requires knowledge of the function of each gene. Currently, our knowledge in this area is still limited. Several lines of investigation have been used to elucidate the structure and function of the genes in the human genome. Even so, gene prediction remains a difficult

Tadashi Imanishi; Takeshi Itoh; Yutaka Suzuki; Claire ODonovan; Satoshi Fukuchi; Kanako O. Koyanagi; Roberto A. Barrero; Takuro Tamura; Yumi Yamaguchi-Kabata; Motohiko Tanino; Kei Yura; Satoru Miyazaki; Kazuho Ikeo; Keiichi Homma; Arek Kasprzyk; Tetsuo Nishikawa; Mika Hirakawa; Jean Thierry-Mieg; Danielle Thierry-Mieg; Jennifer Ashurst; Libin Jia; Mitsuteru Nakao; Michael A. Thomas; Nicola Mulder; Youla Karavidopoulou; Lihua Jin; Sangsoo Kim; Tomohiro Yasuda; Boris Lenhard; Eric Eveno; Yoshiyuki Suzuki; Chisato Yamasaki; Jun-ichi Takeda; Craig Gough; Phillip Hilton; Yasuyuki Fujii; Hiroaki Sakai; Susumu Tanaka; Clara Amid; Matthew Bellgard; Maria de Fatima Bonaldo; Hidemasa Bono; Susan K. Bromberg; Anthony J. Brookes; Elspeth Bruford; Piero Carninci; Claude Chelala; Christine Couillault; Sandro J. de Souza; Marie-Anne Debily; Marie-Dominique Devignes; Inna Dubchak; Toshinori Endo; Anne Estreicher; Eduardo Eyras; Kaoru Fukami-Kobayashi; Gopal R. Gopinath; Esther Graudens; Yoonsoo Hahn; Michael Han; Ze-Guang Han; Kousuke Hanada; Hideki Hanaoka; Erimi Harada; Katsuyuki Hashimoto; Ursula Hinz; Momoki Hirai; Teruyoshi Hishiki; Ian Hopkinson; Sandrine Imbeaud; Hidetoshi Inoko; Alexander Kanapin; Yayoi Kaneko; Takeya Kasukawa; Janet Kelso; Paul Kersey; Reiko Kikuno; Kouichi Kimura; Bernhard Korn; Vladimir Kuryshev; Izabela Makalowska; Takashi Makino; Shuhei Mano; Regine Mariage-Samson; Jun Mashima; Hideo Matsuda; Hans-Werner Mewes; Shinsei Minoshima; Keiichi Nagai; Hideki Nagasaki; Naoki Nagata; Rajni Nigam; Osamu Ogasawara; Osamu Ohara; Masafumi Ohtsubo; Norihiro Okada; Toshihisa Okido; Satoshi Oota; Motonori Ota; Toshio Ota; Tetsuji Otsuki; Dominique Piatier-Tonneau; Annemarie Poustka; Shuang-Xi Ren; Naruya Saitou; Katsunaga Sakai; Shigetaka Sakamoto; Ryuichi Sakate; Ingo Schupp; Florence Servant; Stephen Sherry; Rie Shiba; Nobuyoshi Shimizu; Mary Shimoyama; Andrew J Simpson; Bento Soares; Charles Steward; Makiko Suwa; Mami Suzuki; Aiko Takahashi; Gen Tamiya; Hiroshi Tanaka; Todd Taylor; Joseph D Terwilliger; Per Unneberg; Vamsi Veeramachaneni; Shinya Watanabe; Laurens Wilming; Norikazu Yasuda; Hyang-Sook Yoo; Marvin Stodolsky; Wojciech Makalowski; Mitiko Go; Kenta Nakai; Toshihisa Takagi; Minoru Kanehisa; Yoshiyuki Sakaki; John Quackenbush; Yasushi Okazaki; Yoshihide Hayashizaki; Winston Hide; Ranajit Chakraborty; Ken Nishikawa; Hideaki Sugawara; Yoshio Tateno; Zhu Chen; Michio Oishi; Peter Tonellato; Rolf Apweiler; Kousaku Okubo; Lukas Wagner; Stefan Wiemann; Robert L Strausberg; Takao Isogai; Charles Auffray; Nobuo Nomura; Takashi Gojobori; Sumio Sugano

2004-01-01

212

Connectionist Approaches for Predicting Mouse Gene Function from Gene Expression  

E-print Network

Therapy. Identifying gene function based on gene expression data is much easier in prokaryotes than eukaryotes due to the relatively simple structure of prokaryotes. That is why tissue-specific expression ways, especially in Gene Therapy [5]. Identifying gene function in prokaryotes is much easier than

Bonner, Anthony

213

Construction and accessibility of a cross-species phenotype ontology along with gene annotations for biomedical research  

PubMed Central

Phenotype analyses, e.g. investigating metabolic processes, tissue formation, or organism behavior, are an important element of most biological and medical research activities. Biomedical researchers are making increased use of ontological standards and methods to capture the results of such analyses, with one focus being the comparison and analysis of phenotype information between species. We have generated a cross-species phenotype ontology for human, mouse and zebrafish that contains classes from the Human Phenotype Ontology, Mammalian Phenotype Ontology, and generated classes for zebrafish phenotypes. We also provide up-to-date annotation data connecting human genes to phenotype classes from the generated ontology. We have included the data generation pipeline into our continuous integration system ensuring stable and up-to-date releases. This article describes the data generation process and is intended to help interested researchers access both the phenotype annotation data and the associated cross-species phenotype ontology. The resource described here can be used in sophisticated semantic similarity and gene set enrichment analyses for phenotype data across species. The stable releases of this resource can be obtained from http://purl.obolibrary.org/obo/hp/uberpheno/. PMID:24358873

Köhler, Sebastian; Mungall, Christopher J

2014-01-01

214

Closing the loop: from paper to protein annotation using supervised Gene Ontology classification  

PubMed Central

Gene function curation of the literature with Gene Ontology (GO) concepts is one particularly time-consuming task in genomics, and the help from bioinformatics is highly requested to keep up with the flow of publications. In 2004, the first BioCreative challenge already designed a task of automatic GO concepts assignment from a full text. At this time, results were judged far from reaching the performances required by real curation workflows. In particular, supervised approaches produced the most disappointing results because of lack of training data. Ten years later, the available curation data have massively grown. In 2013, the BioCreative IV GO task revisited the automatic GO assignment task. For this issue, we investigated the power of our supervised classifier, GOCat. GOCat computes similarities between an input text and already curated instances contained in a knowledge base to infer GO concepts. The subtask A consisted in selecting GO evidence sentences for a relevant gene in a full text. For this, we designed a state-of-the-art supervised statistical approach, using a naïve Bayes classifier and the official training set, and obtained fair results. The subtask B consisted in predicting GO concepts from the previous output. For this, we applied GOCat and reached leading results, up to 65% for hierarchical recall in the top 20 outputted concepts. Contrary to previous competitions, machine learning has this time outperformed standard dictionary-based approaches. Thanks to BioCreative IV, we were able to design a complete workflow for curation: given a gene name and a full text, this system is able to select evidence sentences for curation and to deliver highly relevant GO concepts. Contrary to previous competitions, machine learning this time outperformed dictionary-based systems. Observed performances are sufficient for being used in a real semiautomatic curation workflow. GOCat is available at http://eagl.unige.ch/GOCat/. Database URL: http://eagl.unige.ch/GOCat4FT/ PMID:25190367

Gobeill, Julien; Pasche, Emilie; Vishnyakova, Dina; Ruch, Patrick

2014-01-01

215

De Novo Assembly and Functional Annotation of the Olive (Olea europaea) Transcriptome  

PubMed Central

Olive breeding programmes are focused on selecting for traits as short juvenile period, plant architecture suited for mechanical harvest, or oil characteristics, including fatty acid composition, phenolic, and volatile compounds to suit new markets. Understanding the molecular basis of these characteristics and improving the efficiency of such breeding programmes require the development of genomic information and tools. However, despite its economic relevance, genomic information on olive or closely related species is still scarce. We have applied Sanger and 454 pyrosequencing technologies to generate close to 2 million reads from 12 cDNA libraries obtained from the Picual, Arbequina, and Lechin de Sevilla cultivars and seedlings from a segregating progeny of a Picual × Arbequina cross. The libraries include fruit mesocarp and seeds at three relevant developmental stages, young stems and leaves, active juvenile and adult buds as well as dormant buds, and juvenile and adult roots. The reads were assembled by library or tissue and then assembled together into 81 020 unigenes with an average size of 496 bases. Here, we report their assembly and their functional annotation. PMID:23297299

Muñoz-Mérida, Antonio; González-Plaza, Juan José; Cañada, Andrés; Blanco, Ana María; García-López, Maria del Carmen; Rodríguez, José Manuel; Pedrola, Laia; Sicardo, M. Dolores; Hernández, M. Luisa; De la Rosa, Raúl; Belaj, Angjelina; Gil-Borja, Mayte; Luque, Francisco; Martínez-Rivas, José Manuel; Pisano, David G.; Trelles, Oswaldo; Valpuesta, Victoriano; Beuzón, Carmen R.

2013-01-01

216

Comparative annotation of functional regions in the human genome using epigenomic data  

PubMed Central

Epigenetic regulation is dynamic and cell-type dependent. The recently available epigenomic data in multiple cell types provide an unprecedented opportunity for a comparative study of epigenetic landscape. We developed a machine-learning method called ChroModule to annotate the epigenetic states in eight ENCyclopedia Of DNA Elements cell types. The trained model successfully captured the characteristic histone-modification patterns associated with regulatory elements, such as promoters and enhancers, and showed superior performance on identifying enhancers compared with the state-of-art methods. In addition, given the fixed number of epigenetic states in the model, ChroModule allows straightforward illustration of epigenetic variability in multiple cell types. Using this feature, we found that invariable and variable epigenetic states across cell types correspond to housekeeping functions and stimulus response, respectively. Especially, we observed that enhancers, but not the other regulatory elements, dictate cell specificity, as similar cell types share common enhancers, and cell-type–specific enhancers are often bound by transcription factors playing critical roles in that cell type. More interestingly, we found some genomic regions are dormant in cell type but primed to become active in other cell types. These observations highlight the usefulness of ChroModule in comparative analysis and interpretation of multiple epigenomes. PMID:23482391

Won, Kyoung-Jae; Zhang, Xian; Wang, Tao; Ding, Bo; Raha, Debasish; Snyder, Michael; Ren, Bing; Wang, Wei

2013-01-01

217

Automated annotation of functional imaging experiments via multi-label classification  

PubMed Central

Identifying the experimental methods in human neuroimaging papers is important for grouping meaningfully similar experiments for meta-analyses. Currently, this can only be done by human readers. We present the performance of common machine learning (text mining) methods applied to the problem of automatically classifying or labeling this literature. Labeling terms are from the Cognitive Paradigm Ontology (CogPO), the text corpora are abstracts of published functional neuroimaging papers, and the methods use the performance of a human expert as training data. We aim to replicate the expert's annotation of multiple labels per abstract identifying the experimental stimuli, cognitive paradigms, response types, and other relevant dimensions of the experiments. We use several standard machine learning methods: naive Bayes (NB), k-nearest neighbor, and support vector machines (specifically SMO or sequential minimal optimization). Exact match performance ranged from only 15% in the worst cases to 78% in the best cases. NB methods combined with binary relevance transformations performed strongly and were robust to overfitting. This collection of results demonstrates what can be achieved with off-the-shelf software components and little to no pre-processing of raw text. PMID:24409112

Turner, Matthew D.; Chakrabarti, Chayan; Jones, Thomas B.; Xu, Jiawei F.; Fox, Peter T.; Luger, George F.; Laird, Angela R.; Turner, Jessica A.

2013-01-01

218

Annotation of Ehux ESTs  

SciTech Connect

22 percent ESTs do no align with scaffolds. EST Pipeleine assembles 17126 consensi from the noaligned ESTs. Annotation Pipeline predicts 8564 ORFS on the consensi. Domain analysis of ORFs reveals missing genes. Cluster analysis reveals missing genes. Expression analysis reveals potential strain specific genes.

Kuo, Alan; Grigoriev, Igor

2009-06-12

219

Gene Ontology annotation highlights shared and divergent pathogenic strategies of type III effector proteins deployed by the plant pathogen Pseudomonas syringae pv tomato DC3000 and animal pathogenic Escherichia coli strains  

PubMed Central

Genome-informed identification and characterization of Type III effector repertoires in various bacterial strains and species is revealing important insights into the critical roles that these proteins play in the pathogenic strategies of diverse bacteria. However, non-systematic discipline-specific approaches to their annotation impede analysis of the accumulating wealth of data and inhibit easy communication of findings among researchers working on different experimental systems. The development of Gene Ontology (GO) terms to capture biological processes occurring during the interaction between organisms creates a common language that facilitates cross-genome analyses. The application of these terms to annotate type III effector genes in different bacterial species – the plant pathogen Pseudomonas syringae pv tomato DC3000 and animal pathogenic strains of Escherichia coli – illustrates how GO can effectively describe fundamental similarities and differences among different gene products deployed as part of diverse pathogenic strategies. In depth descriptions of the GO annotations for P. syringae pv tomato DC3000 effector AvrPtoB and the E. coli effector Tir are described, with special emphasis given to GO capability for capturing information about interacting proteins and taxa. GO-highlighted similarities in biological process and molecular function for effectors from additional pathosystems are also discussed. PMID:19278552

Lindeberg, Magdalen; Biehl, Bryan S; Glasner, Jeremy D; Perna, Nicole T; Collmer, Alan; Collmer, Candace W

2009-01-01

220

Multidimensional annotation of the Escherichia coli K-12 genome  

Microsoft Academic Search

The annotation of the Escherichia coli K-12 genome in the EcoCyc database is one of the most accurate, complete and multidimensional genome annota- tions. Of the 4460 E. coli genes, EcoCyc assigns biochemical functions to 76%, and 66% of all genes had their functions determined experimentally. EcoCyc assigns E. coli genes to Gene Ontology and to MultiFun. Seventy-five percent of

Peter D. Karp; Ingrid M. Keseler; Alexander Shearer; Mario Latendresse; Markus Krummenacker; Suzanne M. Paley; Ian Paulsen; Julio Collado-Vides; Socorro Gama-Castro; Martin Peralta-Gil; Alberto Santos-Zavaleta; M. I. Penaloza-Spinola; C. Bonavides-Martinez; J. Ingraham

2007-01-01

221

Using PPI network autocorrelation in hierarchical multi-label classification trees for gene function prediction  

PubMed Central

Background Ontologies and catalogs of gene functions, such as the Gene Ontology (GO) and MIPS-FUN, assume that functional classes are organized hierarchically, that is, general functions include more specific ones. This has recently motivated the development of several machine learning algorithms for gene function prediction that leverages on this hierarchical organization where instances may belong to multiple classes. In addition, it is possible to exploit relationships among examples, since it is plausible that related genes tend to share functional annotations. Although these relationships have been identified and extensively studied in the area of protein-protein interaction (PPI) networks, they have not received much attention in hierarchical and multi-class gene function prediction. Relations between genes introduce autocorrelation in functional annotations and violate the assumption that instances are independently and identically distributed (i.i.d.), which underlines most machine learning algorithms. Although the explicit consideration of these relations brings additional complexity to the learning process, we expect substantial benefits in predictive accuracy of learned classifiers. Results This article demonstrates the benefits (in terms of predictive accuracy) of considering autocorrelation in multi-class gene function prediction. We develop a tree-based algorithm for considering network autocorrelation in the setting of Hierarchical Multi-label Classification (HMC). We empirically evaluate the proposed algorithm, called NHMC (Network Hierarchical Multi-label Classification), on 12 yeast datasets using each of the MIPS-FUN and GO annotation schemes and exploiting 2 different PPI networks. The results clearly show that taking autocorrelation into account improves the predictive performance of the learned models for predicting gene function. Conclusions Our newly developed method for HMC takes into account network information in the learning phase: When used for gene function prediction in the context of PPI networks, the explicit consideration of network autocorrelation increases the predictive performance of the learned models. Overall, we found that this holds for different gene features/ descriptions, functional annotation schemes, and PPI networks: Best results are achieved when the PPI network is dense and contains a large proportion of function-relevant interactions. PMID:24070402

2013-01-01

222

The UniProt-GO Annotation database in 2011  

PubMed Central

The GO annotation dataset provided by the UniProt Consortium (GOA: http://www.ebi.ac.uk/GOA) is a comprehensive set of evidenced-based associations between terms from the Gene Ontology resource and UniProtKB proteins. Currently supplying over 100 million annotations to 11 million proteins in more than 360?000 taxa, this resource has increased 2-fold over the last 2?years and has benefited from a wealth of checks to improve annotation correctness and consistency as well as now supplying a greater information content enabled by GO Consortium annotation format developments. Detailed, manual GO annotations obtained from the curation of peer-reviewed papers are directly contributed by all UniProt curators and supplemented with manual and electronic annotations from 36 model organism and domain-focused scientific resources. The inclusion of high-quality, automatic annotation predictions ensures the UniProt GO annotation dataset supplies functional information to a wide range of proteins, including those from poorly characterized, non-model organism species. UniProt GO annotations are freely available in a range of formats accessible by both file downloads and web-based views. In addition, the introduction of a new, normalized file format in 2010 has made for easier handling of the complete UniProt-GOA data set. PMID:22123736

Dimmer, Emily C.; Huntley, Rachael P.; Alam-Faruque, Yasmin; Sawford, Tony; O'Donovan, Claire; Martin, Maria J.; Bely, Benoit; Browne, Paul; Mun Chan, Wei; Eberhardt, Ruth; Gardner, Michael; Laiho, Kati; Legge, Duncan; Magrane, Michele; Pichler, Klemens; Poggioli, Diego; Sehra, Harminder; Auchincloss, Andrea; Axelsen, Kristian; Blatter, Marie-Claude; Boutet, Emmanuel; Braconi-Quintaje, Silvia; Breuza, Lionel; Bridge, Alan; Coudert, Elizabeth; Estreicher, Anne; Famiglietti, Livia; Ferro-Rojas, Serenella; Feuermann, Marc; Gos, Arnaud; Gruaz-Gumowski, Nadine; Hinz, Ursula; Hulo, Chantal; James, Janet; Jimenez, Silvia; Jungo, Florence; Keller, Guillaume; Lemercier, Phillippe; Lieberherr, Damien; Masson, Patrick; Moinat, Madelaine; Pedruzzi, Ivo; Poux, Sylvain; Rivoire, Catherine; Roechert, Bernd; Schneider, Michael; Stutz, Andre; Sundaram, Shyamala; Tognolli, Michael; Bougueleret, Lydie; Argoud-Puy, Ghislaine; Cusin, Isabelle; Duek- Roggli, Paula; Xenarios, Ioannis; Apweiler, Rolf

2012-01-01

223

Elucidating gene function and function evolution through comparison of co-expression networks of plants  

PubMed Central

The analysis of gene expression data has shown that transcriptionally coordinated (co-expressed) genes are often functionally related, enabling scientists to use expression data in gene function prediction. This Focused Review discusses our original paper (Large-scale co-expression approach to dissect secondary cell wall formation across plant species, Frontiers in Plant Science 2:23). In this paper we applied cross-species analysis to co-expression networks of genes involved in cellulose biosynthesis. We showed that the co-expression networks from different species are highly similar, indicating that whole biological pathways are conserved across species. This finding has two important implications. First, the analysis can transfer gene function annotation from well-studied plants, such as Arabidopsis, to other, uncharacterized plant species. As the analysis finds genes that have similar sequence and similar expression pattern across different organisms, functionally equivalent genes can be identified. Second, since co-expression analyses are often noisy, a comparative analysis should have higher performance, as parts of co-expression networks that are conserved are more likely to be functionally relevant. In this Focused Review, we outline the comparative analysis done in the original paper and comment on the recent advances and approaches that allow comparative analyses of co-function networks. We hypothesize that in comparison to simple co-expression analysis, comparative analysis would yield more accurate gene function predictions. Finally, by combining comparative analysis with genomic information of green plants, we propose a possible composition of cellulose biosynthesis machinery during earlier stages of plant evolution. PMID:25191328

Hansen, Bjoern O.; Vaid, Neha; Musialak-Lange, Magdalena; Janowski, Marcin; Mutwil, Marek

2014-01-01

224

Managing the data deluge: data-driven GO category assignment improves while complexity of functional annotation increases.  

PubMed

The available curated data lag behind current biological knowledge contained in the literature. Text mining can assist biologists and curators to locate and access this knowledge, for instance by characterizing the functional profile of publications. Gene Ontology (GO) category assignment in free text already supports various applications, such as powering ontology-based search engines, finding curation-relevant articles (triage) or helping the curator to identify and encode functions. Popular text mining tools for GO classification are based on so called thesaurus-based--or dictionary-based--approaches, which exploit similarities between the input text and GO terms themselves. But their effectiveness remains limited owing to the complex nature of GO terms, which rarely occur in text. In contrast, machine learning approaches exploit similarities between the input text and already curated instances contained in a knowledge base to infer a functional profile. GO Annotations (GOA) and MEDLINE make possible to exploit a growing amount of curated abstracts (97 000 in November 2012) for populating this knowledge base. Our study compares a state-of-the-art thesaurus-based system with a machine learning system (based on a k-Nearest Neighbours algorithm) for the task of proposing a functional profile for unseen MEDLINE abstracts, and shows how resources and performances have evolved. Systems are evaluated on their ability to propose for a given abstract the GO terms (2.8 on average) used for curation in GOA. We show that since 2006, although a massive effort was put into adding synonyms in GO (+300%), our thesaurus-based system effectiveness is rather constant, reaching from 0.28 to 0.31 for Recall at 20 (R20). In contrast, thanks to its knowledge base growth, our machine learning system has steadily improved, reaching from 0.38 in 2006 to 0.56 for R20 in 2012. Integrated in semi-automatic workflows or in fully automatic pipelines, such systems are more and more efficient to provide assistance to biologists. DATABASE URL: http://eagl.unige.ch/GOCat/ PMID:23842461

Gobeill, Julien; Pasche, Emilie; Vishnyakova, Dina; Ruch, Patrick

2013-01-01

225

RIDDLE: reflective diffusion and local extension reveal functional associations for unannotated gene sets via proximity in a gene network  

PubMed Central

The growing availability of large-scale functional networks has promoted the development of many successful techniques for predicting functions of genes. Here we extend these network-based principles and techniques to functionally characterize whole sets of genes. We present RIDDLE (Reflective Diffusion and Local Extension), which uses well developed guilt-by-association principles upon a human gene network to identify associations of gene sets. RIDDLE is particularly adept at characterizing sets with no annotations, a major challenge where most traditional set analyses fail. Notably, RIDDLE found microRNA-450a to be strongly implicated in ocular diseases and development. A web application is available at http://www.functionalnet.org/RIDDLE. PMID:23268829

2012-01-01

226

De Novo Whole-Genome Sequence and Genome Annotation of Lichtheimia ramosa  

PubMed Central

We report the annotated draft genome sequence of Lichtheimia ramosa (JMRC FSU:6197). It has been reported to be a causative organism of mucormycosis, a rare but rapidly progressive infection in immunocompromised humans. The functionally annotated genomic sequence consists of 74 scaffolds with a total number of 11,510 genes. PMID:25212617

Linde, Jörg; Schwartze, Volker; Binder, Ulrike; Lass-Flörl, Cornelia

2014-01-01

227

Function of the DISC1 Gene  

NSDL National Science Digital Library

As a result of the human genome project, we now know largely where our genes are, and what structure they have. The search to uncover each gene's function, on the other hand, is only in its infancy. Functional genomics is an area of research dedicated to studying what protein is produced by a gene, and what happens in the body when it is activated. Understanding gene function is the next major hurdle in genomic research, which holds the key to developing revolutionary therapeutics.

2009-04-14

228

Mapping and Functional Characterization of Candidate Genes  

E-print Network

Mapping and Functional Characterization of Candidate Genes and Mutations for Chicken Growth: GeneticAssociation) #12;Mapping and Functional Characterization of Candidate Genes and Mutations Karyotype 17! 2.4! Gene Mapping and Association Studies 19! 2.4.1! QTL Analysis 19! 2.4.2! Genome

229

Automation of Drosophila gene expression pattern image annotation : development of web-based image annotation tool and application of machine learning methods  

E-print Network

Large-scale in situ hybridization screens are providing an abundance of spatio-temporal patterns of gene expression data that is valuable for understanding the mechanisms of gene regulation. Drosophila gene expression ...

Ayuso, Anna Maria E

2011-01-01

230

Annotation and comparative analysis of the glycoside hydrolase genes in Brachypodium distachyon  

Microsoft Academic Search

BACKGROUND: Glycoside hydrolases cleave the bond between a carbohydrate and another carbohydrate, a protein, lipid or other moiety. Genes encoding glycoside hydrolases are found in a wide range of organisms, from archea to animals, and are relatively abundant in plant genomes. In plants, these enzymes are involved in diverse processes, including starch metabolism, defense, and cell-wall remodeling. Glycoside hydrolase genes

Ludmila Tyler; Jennifer N Bragg; Jiajie Wu; Xiaohan Yang; Gerald A Tuskan; John P Vogel

2010-01-01

231

GENOMIC SEQUENCE AND ANNOTATION OF A REGION THAT HARBORS MAJOR HISTOCOMPATIBILITY GENES IN RAINBOW TROUT  

Technology Transfer Automated Retrieval System (TEKTRAN)

We have previously shown that the rainbow trout genome contains at least 4 unlinked regions of major histocompatibility (MH) genes. One of the regions which we previously dubbed the extended MH class II region is located on chromosome arm 3p and harbors the TAP1 (aka ABCB2) gene. TAP1 is one of the ...

232

Text Detective: a rule-based system for gene annotation in biomedical texts  

PubMed Central

Background The identification of mentions of gene or gene products in biomedical texts is a critical step in the development of text mining applications in biosciences. The complexity and ambiguity of gene nomenclature makes this a very difficult task. Methods Here we present a novel approach based on a combination of carefully designed rules and several lexicons of biological concepts, implemented in the Text Detective system. Text Detective is able to normalize the results of gene mentions found by offering the appropriate database reference. Results In BioCreAtIvE evaluation, Text Detective achieved results of 84% precision, 71% recall for task 1A, and 79% precision, 71% recall for mouse genes in task 1B. PMID:15960822

Tamames, Javier

2005-01-01

233

Predictive Integration of Gene Ontology-Driven Similarity and Functional Interactions  

PubMed Central

There is a need to develop methods to automatically incorporate prior knowledge to support the prediction and validation of novel functional associations. One such important source is represented by the Gene Ontology (GO)™ and the many model organism databases of gene products annotated to the GO. We investigated quantitative relationships between the GO-driven similarity of genes and their functional interactions by analyzing different types of associations in Saccharomyces cerevisiae and Caenorhabditis elegans. Interacting genes exhibited significantly higher levels of GO-driven similarity (GOS) in comparison to random pairs of genes used as a surrogate for negative interactions. The Biological Process hierarchy provides more reliable results for co-regulatory and protein-protein interactions. GOS represent a relevant resource to support prediction of functional networks in combination with other resources.

Wang, Haiying; Zheng, Huiru; Bodenreider, Olivier; Chesneau, Alban

2015-01-01

234

Analysis and functional annotation of expressed sequence tags from the fall armyworm Spodoptera frugiperda  

PubMed Central

Background Little is known about the genome sequences of lepidopteran insects, although this group of insects has been studied extensively in the fields of endocrinology, development, immunity, and pathogen-host interactions. In addition, cell lines derived from Spodoptera frugiperda and other lepidopteran insects are routinely used for baculovirus foreign gene expression. This study reports the results of an expressed sequence tag (EST) sequencing project in cells from the lepidopteran insect S. frugiperda, the fall armyworm. Results We have constructed an EST database using two cDNA libraries from the S. frugiperda-derived cell line, SF-21. The database consists of 2,367 ESTs which were assembled into 244 contigs and 951 singlets for a total of 1,195 unique sequences. Conclusion S. frugiperda is an agriculturally important pest insect and genomic information will be instrumental for establishing initial transcriptional profiling and gene function studies, and for obtaining information about genes manipulated during infections by insect pathogens such as baculoviruses. PMID:17052344

Deng, Youping; Dong, Yinghua; Thodima, Venkata; Clem, Rollie J; Passarelli, A Lorena

2006-01-01

235

FunGene: the functional gene pipeline and repository  

PubMed Central

Ribosomal RNA genes have become the standard molecular markers for microbial community analysis for good reasons, including universal occurrence in cellular organisms, availability of large databases, and ease of rRNA gene region amplification and analysis. As markers, however, rRNA genes have some significant limitations. The rRNA genes are often present in multiple copies, unlike most protein-coding genes. The slow rate of change in rRNA genes means that multiple species sometimes share identical 16S rRNA gene sequences, while many more species share identical sequences in the short 16S rRNA regions commonly analyzed. In addition, the genes involved in many important processes are not distributed in a phylogenetically coherent manner, potentially due to gene loss or horizontal gene transfer. While rRNA genes remain the most commonly used markers, key genes in ecologically important pathways, e.g., those involved in carbon and nitrogen cycling, can provide important insights into community composition and function not obtainable through rRNA analysis. However, working with ecofunctional gene data requires some tools beyond those required for rRNA analysis. To address this, our Functional Gene Pipeline and Repository (FunGene; http://fungene.cme.msu.edu/) offers databases of many common ecofunctional genes and proteins, as well as integrated tools that allow researchers to browse these collections and choose subsets for further analysis, build phylogenetic trees, test primers and probes for coverage, and download aligned sequences. Additional FunGene tools are specialized to process coding gene amplicon data. For example, FrameBot produces frameshift-corrected protein and DNA sequences from raw reads while finding the most closely related protein reference sequence. These tools can help provide better insight into microbial communities by directly studying key genes involved in important ecological processes. PMID:24101916

Fish, Jordan A.; Chai, Benli; Wang, Qiong; Sun, Yanni; Brown, C. Titus; Tiedje, James M.; Cole, James R.

2013-01-01

236

The evolution of the plastid chromosome in land plants: gene content, gene order, gene function  

E-print Network

The evolution of the plastid chromosome in land plants: gene content, gene order, gene function and evolution- ary aspects of plastid chromosome architecture in land plants and their putative ancestors. We. Keywords Plastid genome Á Land plants Á Genome evolution Á Plastid gene function Á Gene retention

dePamphilis, Claude

237

Cloning, annotation and expression analysis of mycoparasitism-related genes in Trichoderma harzianum 88.  

PubMed

Trichoderma harzianum 88, a filamentous soil fungus, is an effective biocontrol agent against several plant pathogens. High-throughput sequencing was used here to study the mycoparasitism mechanisms of T. harzianum 88. Plate confrontation tests of T. harzianum 88 against plant pathogens were conducted, and a cDNA library was constructed from T. harzianum 88 mycelia in the presence of plant pathogen cell walls. Randomly selected transcripts from the cDNA library were compared with eukaryotic plant and fungal genomes. Of the 1,386 transcripts sequenced, the most abundant Gene Ontology (GO) classification group was "physiological process". Differential expression of 19 genes was confirmed by real-time RT-PCR at different mycoparasitism stages against plant pathogens. Gene expression analysis revealed the transcription of various genes involved in mycoparasitism of T. harzianum 88. Our study provides helpful insights into the mechanisms of T. harzianum 88-plant pathogen interactions. PMID:23625217

Yao, Lin; Yang, Qian; Song, Jinzhu; Tan, Chong; Guo, Changhong; Wang, Li; Qu, Lianhai; Wang, Yun

2013-04-01

238

Gene identification signature (GIS) analysis for transcriptome characterization and genome annotation  

Microsoft Academic Search

We have developed a DNA tag sequencing and mapping strategy called gene identification signature (GIS) analysis, in which 5? and 3? signatures of full-length cDNAs are accurately extracted into paired-end ditags (PETs) that are concatenated for efficient sequencing and mapped to genome sequences to demarcate the transcription boundaries of every gene. GIS analysis is potentially 30-fold more efficient than standard

Patrick Ng; Chia-Lin Wei; Wing-Kin Sung; Kuo Ping Chiu; Leonard Lipovich; Chin Chin Ang; Sanjay Gupta; Atif Shahab; Azmi Ridwan; Chee Hong Wong; Edison T Liu; Yijun Ruan

2005-01-01

239

The functional diversity of essential genes required for mammalian cardiac development  

PubMed Central

Genes required for an organism to develop to maturity (for which no other gene can compensate) are considered essential. The continuing functional annotation of the mouse genome has enabled the identification of many essential genes required for specific developmental processes including cardiac development. Patterns are now emerging regarding the functional nature of genes required at specific points throughout gestation. Essential genes required for development beyond cardiac progenitor cell migration and induction include a small and functionally homogenous group encoding transcription factors, ligands and receptors. Actions of core cardiogenic transcription factors from the Gata, Nkx, Mef, Hand, and Tbx families trigger a marked expansion in the functional diversity of essential genes from midgestation onwards. As the embryo grows in size and complexity, genes required to maintain a functional heartbeat and to provide muscular strength and regulate blood flow are well represented. These essential genes regulate further specialization and polarization of cell types along with proliferative, migratory, adhesive, contractile, and structural processes. The identification of patterns regarding the functional nature of essential genes across numerous developmental systems may aid prediction of further essential genes and those important to development and/or progression of disease. genesis 52:713–737, 2014. PMID:24866031

Clowes, Christopher; Boylan, Michael GS; Ridge, Liam A; Barnes, Emma; Wright, Jayne A; Hentges, Kathryn E

2014-01-01

240

INTERFEROME v2.0: an updated database of annotated interferon-regulated genes  

PubMed Central

Interferome v2.0 (http://interferome.its.monash.edu.au/interferome/) is an update of an earlier version of the Interferome DB published in the 2009 NAR database edition. Vastly improved computational infrastructure now enables more complex and faster queries, and supports more data sets from types I, II and III interferon (IFN)-treated cells, mice or humans. Quantitative, MIAME compliant data are collected, subjected to thorough, standardized, quantitative and statistical analyses and then significant changes in gene expression are uploaded. Comprehensive manual collection of metadata in v2.0 allows flexible, detailed search capacity including the parameters: range of -fold change, IFN type, concentration and time, and cell/tissue type. There is no limit to the number of genes that can be used to search the database in a single query. Secondary analysis such as gene ontology, regulatory factors, chromosomal location or tissue expression plots of IFN-regulated genes (IRGs) can be performed in Interferome v2.0, or data can be downloaded in convenient text formats compatible with common secondary analysis programs. Given the importance of IFN to innate immune responses in infectious, inflammatory diseases and cancer, this upgrade of the Interferome to version 2.0 will facilitate the identification of gene signatures of importance in the pathogenesis of these diseases. PMID:23203888

Rusinova, Irina; Forster, Sam; Yu, Simon; Kannan, Anitha; Masse, Marion; Cumming, Helen; Chapman, Ross; Hertzog, Paul J.

2013-01-01

241

Interferome v2.0: an updated database of annotated interferon-regulated genes.  

PubMed

Interferome v2.0 (http://interferome.its.monash.edu.au/interferome/) is an update of an earlier version of the Interferome DB published in the 2009 NAR database edition. Vastly improved computational infrastructure now enables more complex and faster queries, and supports more data sets from types I, II and III interferon (IFN)-treated cells, mice or humans. Quantitative, MIAME compliant data are collected, subjected to thorough, standardized, quantitative and statistical analyses and then significant changes in gene expression are uploaded. Comprehensive manual collection of metadata in v2.0 allows flexible, detailed search capacity including the parameters: range of -fold change, IFN type, concentration and time, and cell/tissue type. There is no limit to the number of genes that can be used to search the database in a single query. Secondary analysis such as gene ontology, regulatory factors, chromosomal location or tissue expression plots of IFN-regulated genes (IRGs) can be performed in Interferome v2.0, or data can be downloaded in convenient text formats compatible with common secondary analysis programs. Given the importance of IFN to innate immune responses in infectious, inflammatory diseases and cancer, this upgrade of the Interferome to version 2.0 will facilitate the identification of gene signatures of importance in the pathogenesis of these diseases. PMID:23203888

Rusinova, Irina; Forster, Sam; Yu, Simon; Kannan, Anitha; Masse, Marion; Cumming, Helen; Chapman, Ross; Hertzog, Paul J

2013-01-01

242

IMG ER: A System for Microbial Genome Annotation Expert Review and Curation  

SciTech Connect

A rapidly increasing number of microbial genomes are sequenced by organizations worldwide and are eventually included into various public genome data resources. The quality of the annotations depends largely on the original dataset providers, with erroneous or incomplete annotations often carried over into the public resources and difficult to correct. We have developed an Expert Review (ER) version of the Integrated Microbial Genomes (IMG) system, with the goal of supporting systematic and efficient revision of microbial genome annotations. IMG ER provides tools for the review and curation of annotations of both new and publicly available microbial genomes within IMG's rich integrated genome framework. New genome datasets are included into IMG ER prior to their public release either with their native annotations or with annotations generated by IMG ER's annotation pipeline. IMG ER tools allow addressing annotation problems detected with IMG's comparative analysis tools, such as genes missed by gene prediction pipelines or genes without an associated function. Over the past year, IMG ER was used for improving the annotations of about 150 microbial genomes.

Markowitz, Victor M.; Mavromatis, Konstantinos; Ivanova, Natalia N.; Chen, I-Min A.; Chu, Ken; Kyrpides, Nikos C.

2009-05-25

243

Organization and annotation of the Xcat critical region: elimination of seven positional candidate genes.  

PubMed

Xcat mice display X-linked congenital cataracts and are a mouse model for the human X-linked cataract disease Nance Horan syndrome (NHS). The genetic defect in Xcat mice and NHS patients is not known. We isolated and sequenced a BAC contig representing a portion of the Xcat critical region. We combined our sequencing data with the most recent mouse sequence assemblies from both Celera and public databases. The sequence of the 2.2-Mb Xcat critical region was then analyzed for potential Xcat candidate genes. The coding regions of the seven known genes within this area (Rai2, Rbbp7, Ctps2, Calb3, Grpr, Reps2, and Syap1) were sequenced in Xcat mice and no mutations were detected. The expression of Rai2 was quantitatively identical in wild-type and Xcat mutant eyes. These results indicate that the Xcat mutation is within a novel, undiscovered gene. PMID:15081118

Huang, Kristen M; Geunes-Boyer, Scarlett; Wu, Sufen; Dutra, Amalia; Favor, Jack; Stambolian, Dwight

2004-05-01

244

Redundant Gene Functions and Natural Selection  

Microsoft Academic Search

Redundant gene functions are ubiquitous, and they are a potentially important source of evolutionary innovations on the biochemical level. It is therefore highly desirable to understand the mechanisms governing their evolution. Gene duplication is clearly a prominent mechanism generating redundant genes. However, because redundancy provides a protective effect against deleterious mutations, natural selection might be involved in generating and maintaining

Andreas Wagner

1997-01-01

245

Sma3s: A Three-Step Modular Annotator for Large Sequence Datasets  

PubMed Central

Automatic sequence annotation is an essential component of modern ‘omics’ studies, which aim to extract information from large collections of sequence data. Most existing tools use sequence homology to establish evolutionary relationships and assign putative functions to sequences. However, it can be difficult to define a similarity threshold that achieves sufficient coverage without sacrificing annotation quality. Defining the correct configuration is critical and can be challenging for non-specialist users. Thus, the development of robust automatic annotation techniques that generate high-quality annotations without needing expert knowledge would be very valuable for the research community. We present Sma3s, a tool for automatically annotating very large collections of biological sequences from any kind of gene library or genome. Sma3s is composed of three modules that progressively annotate query sequences using either: (i) very similar homologues, (ii) orthologous sequences or (iii) terms enriched in groups of homologous sequences. We trained the system using several random sets of known sequences, demonstrating average sensitivity and specificity values of ?85%. In conclusion, Sma3s is a versatile tool for high-throughput annotation of a wide variety of sequence datasets that outperforms the accuracy of other well-established annotation algorithms, and it can enrich existing database annotations and uncover previously hidden features. Importantly, Sma3s has already been used in the functional annotation of two published transcriptomes. PMID:24501397

Muñoz-Mérida, Antonio; Viguera, Enrique; Claros, M. Gonzalo; Trelles, Oswaldo; Pérez-Pulido, Antonio J.

2014-01-01

246

Syntactic Annotation of a German Newspaper Corpus  

Microsoft Academic Search

\\u000a We report on the syntactic annotation of a German newspaper corpus. The annotations consist of context-free structures, additionally\\u000a allowing crossing branches, with labeled nodes (phrases) and edges (grammatical functions). Furthermore, we present a new,\\u000a interactive semi-automatic annotation process that allows efficient and reliable annotations. The annotation process is sped\\u000a up by incrementally presenting structures and by automatically highlighting unreliable assignments.

Thorsten Brants; Wojciech Skut; Hans Uszkoreit

247

Gene Transfer Strategies for Augmenting Cardiac Function  

Microsoft Academic Search

Recent transgenic as well as gene-targeted animal models have greatly increased our understanding of the molecular mechanisms of normal and compromised heart function. These studies have raised the possibility of using somatic gene transfer as a means for improving cardiac function. DNA transfer to a significant portion of the myocardium has thus far been difficult to accomplish. This review describes

Karsten Peppel; Walter J Koch; Robert J Lefkowitz

1997-01-01

248

The GeneMine system for genome\\/proteome annotation and collaborative data mining  

Microsoft Academic Search

As genome data and bioinformatics resources grow exponentially in size and complexity, there is an increasing need for software that can bridge the gap between biologists with questions and the worldwide set of highly specialized tools for answering them. The GeneMine system for small-to medium-scale genome analysis provides: (1) automated analysis of DNA (deoxyribonucleic acid) and protein sequence data using

Christopher Lee; Kris Irizarry

2001-01-01

249

MaGe: a microbial genome annotation system supported by synteny results  

PubMed Central

Magnifying Genomes (MaGe) is a microbial genome annotation system based on a relational database containing information on bacterial genomes, as well as a web interface to achieve genome annotation projects. Our system allows one to initiate the annotation of a genome at the early stage of the finishing phase. MaGe's main features are (i) integration of annotation data from bacterial genomes enhanced by a gene coding re-annotation process using accurate gene models, (ii) integration of results obtained with a wide range of bioinformatics methods, among which exploration of gene context by searching for conserved synteny and reconstruction of metabolic pathways, (iii) an advanced web interface allowing multiple users to refine the automatic assignment of gene product functions. MaGe is also linked to numerous well-known biological databases and systems. Our system has been thoroughly tested during the annotation of complete bacterial genomes (Acinetobacter baylyi ADP1, Pseudoalteromonas haloplanktis, Frankia alni) and is currently used in the context of several new microbial genome annotation projects. In addition, MaGe allows for annotation curation and exploration of already published genomes from various genera (e.g. Yersinia, Bacillus and Neisseria). MaGe can be accessed at . PMID:16407324

Vallenet, David; Labarre, Laurent; Rouy, Zoé; Barbe, Valérie; Bocs, Stéphanie; Cruveiller, Stéphane; Lajus, Aurélie; Pascal, Géraldine; Scarpelli, Claude; Médigue, Claudine

2006-01-01

250

Experimental strategies for functional annotation and metabolism discovery: targeted screening of solute binding proteins and unbiased panning of metabolomes.  

PubMed

The rate at which genome sequencing data is accruing demands enhanced methods for functional annotation and metabolism discovery. Solute binding proteins (SBPs) facilitate the transport of the first reactant in a metabolic pathway, thereby constraining the regions of chemical space and the chemistries that must be considered for pathway reconstruction. We describe high-throughput protein production and differential scanning fluorimetry platforms, which enabled the screening of 158 SBPs against a 189 component library specifically tailored for this class of proteins. Like all screening efforts, this approach is limited by the practical constraints imposed by construction of the library, i.e., we can study only those metabolites that are known to exist and which can be made in sufficient quantities for experimentation. To move beyond these inherent limitations, we illustrate the promise of crystallographic- and mass spectrometric-based approaches for the unbiased use of entire metabolomes as screening libraries. Together, our approaches identified 40 new SBP ligands, generated experiment-based annotations for 2084 SBPs in 71 isofunctional clusters, and defined numerous metabolic pathways, including novel catabolic pathways for the utilization of ethanolamine as sole nitrogen source and the use of d-Ala-d-Ala as sole carbon source. These efforts begin to define an integrated strategy for realizing the full value of amassing genome sequence data. PMID:25540822

Vetting, Matthew W; Al-Obaidi, Nawar; Zhao, Suwen; San Francisco, Brian; Kim, Jungwook; Wichelecki, Daniel J; Bouvier, Jason T; Solbiati, Jose O; Vu, Hoan; Zhang, Xinshuai; Rodionov, Dmitry A; Love, James D; Hillerich, Brandan S; Seidel, Ronald D; Quinn, Ronald J; Osterman, Andrei L; Cronan, John E; Jacobson, Matthew P; Gerlt, John A; Almo, Steven C

2015-01-27

251

Novel cardiovascular gene functions revealed via systematic phenotype prediction in zebrafish  

PubMed Central

Comprehensive functional annotation of vertebrate genomes is fundamental to biological discovery. Reverse genetic screening has been highly useful for determination of gene function, but is untenable as a systematic approach in vertebrate model organisms given the number of surveyable genes and observable phenotypes. Unbiased prediction of gene-phenotype relationships offers a strategy to direct finite experimental resources towards likely phenotypes, thus maximizing de novo discovery of gene functions. Here we prioritized genes for phenotypic assay in zebrafish through machine learning, predicting the effect of loss of function of each of 15,106 zebrafish genes on 338 distinct embryonic anatomical processes. Focusing on cardiovascular phenotypes, the learning procedure predicted known knockdown and mutant phenotypes with high precision. In proof-of-concept studies we validated 16 high-confidence cardiac predictions using targeted morpholino knockdown and initial blinded phenotyping in embryonic zebrafish, confirming a significant enrichment for cardiac phenotypes as compared with morpholino controls. Subsequent detailed analyses of cardiac function confirmed these results, identifying novel physiological defects for 11 tested genes. Among these we identified tmem88a, a recently described attenuator of Wnt signaling, as a discrete regulator of the patterning of intercellular coupling in the zebrafish cardiac epithelium. Thus, we show that systematic prioritization in zebrafish can accelerate the pace of developmental gene function discovery. PMID:24346703

Musso, Gabriel; Tasan, Murat; Mosimann, Christian; Beaver, John E.; Plovie, Eva; Carr, Logan A.; Chua, Hon Nian; Dunham, Julie; Zuberi, Khalid; Rodriguez, Harold; Morris, Quaid; Zon, Leonard; Roth, Frederick P.; MacRae, Calum A.

2014-01-01

252

IIS – Integrated Interactome System: A Web-Based Platform for the Annotation, Analysis and Visualization of Protein-Metabolite-Gene-Drug Interactions by Integrating a Variety of Data Sources and Tools  

PubMed Central

Background High-throughput screening of physical, genetic and chemical-genetic interactions brings important perspectives in the Systems Biology field, as the analysis of these interactions provides new insights into protein/gene function, cellular metabolic variations and the validation of therapeutic targets and drug design. However, such analysis depends on a pipeline connecting different tools that can automatically integrate data from diverse sources and result in a more comprehensive dataset that can be properly interpreted. Results We describe here the Integrated Interactome System (IIS), an integrative platform with a web-based interface for the annotation, analysis and visualization of the interaction profiles of proteins/genes, metabolites and drugs of interest. IIS works in four connected modules: (i) Submission module, which receives raw data derived from Sanger sequencing (e.g. two-hybrid system); (ii) Search module, which enables the user to search for the processed reads to be assembled into contigs/singlets, or for lists of proteins/genes, metabolites and drugs of interest, and add them to the project; (iii) Annotation module, which assigns annotations from several databases for the contigs/singlets or lists of proteins/genes, generating tables with automatic annotation that can be manually curated; and (iv) Interactome module, which maps the contigs/singlets or the uploaded lists to entries in our integrated database, building networks that gather novel identified interactions, protein and metabolite expression/concentration levels, subcellular localization and computed topological metrics, GO biological processes and KEGG pathways enrichment. This module generates a XGMML file that can be imported into Cytoscape or be visualized directly on the web. Conclusions We have developed IIS by the integration of diverse databases following the need of appropriate tools for a systematic analysis of physical, genetic and chemical-genetic interactions. IIS was validated with yeast two-hybrid, proteomics and metabolomics datasets, but it is also extendable to other datasets. IIS is freely available online at: http://www.lge.ibi.unicamp.br/lnbio/IIS/. PMID:24949626

Carazzolle, Marcelo Falsarella; de Carvalho, Lucas Miguel; Slepicka, Hugo Henrique; Vidal, Ramon Oliveira; Pereira, Gonçalo Amarante Guimarães; Kobarg, Jörg; Vaz Meirelles, Gabriela

2014-01-01

253

Antagonistic functional duality of cancer genes.  

PubMed

Cancer evolution is a stochastic process both at the genome and gene levels. Most of tumors contain multiple genetic subclones, evolving in either succession or in parallel, either in a linear or branching manner, with heterogeneous genome and gene alterations, extensively rewired signaling networks, and addicted to multiple oncogenes easily switching with each other during cancer progression and medical intervention. Hundreds of discovered cancer genes are classified according to whether they function in a dominant (oncogenes) or recessive (tumor suppressor genes) manner in a cancer cell. However, there are many cancer "gene-chameleons", which behave distinctly in opposite way in the different experimental settings showing antagonistic duality. In contrast to the widely accepted view that mutant NADP(+)-dependent isocitrate dehydrogenases 1/2 (IDH1/2) and associated metabolite 2-hydroxyglutarate (R)-enantiomer are intrinsically "the drivers" of tumourigenesis, mutant IDH1/2 inhibited, promoted or had no effect on cell proliferation, growth and tumorigenicity in diverse experiments. Similar behavior was evidenced for dozens of cancer genes. Gene function is dependent on genetic network, which is defined by the genome context. The overall changes in karyotype can result in alterations of the role and function of the same genes and pathways. The diverse cell lines and tumor samples have been used in experiments for proving gene tumor promoting/suppressive activity. They all display heterogeneous individual karyotypes and disturbed signaling networks. Consequently, the effect and function of gene under investigation can be opposite and versatile in cells with different genomes that may explain antagonistic duality of cancer genes and the cell type- or the cellular genetic/context-dependent response to the same protein. Antagonistic duality of cancer genes might contribute to failure of chemotherapy. Instructive examples of unexpected activity of cancer genes and "paradoxical" effects of different anticancer drugs depending on the cellular genetic context/signaling network are discussed. PMID:23933273

Stepanenko, A A; Vassetzky, Y S; Kavsan, V M

2013-10-25

254

Functional identity of the gamma tropomyosin gene  

PubMed Central

The actin filament system is fundamental to cellular functions including regulation of shape, motility, cytokinesis, intracellular trafficking and tissue organization. Tropomyosins (Tm) are highly conserved components of actin filaments which differentially regulate filament stability and function. The mammalian Tm family consists of four genes; ?Tm, ?Tm, ?Tm and ?Tm. Multiple Tm isoforms (>40) are generated by alternative splicing and expression of these isoforms is highly regulated during development. In order to further identify the role of Tm isoforms during development, we tested the specificity of function of products from the ?Tm gene family in mice using a series of gene knockouts. Ablation of all ?Tm gene cytoskeletal products results in embryonic lethality. Elimination of just two cytoskeletal products from the ?Tm gene (NM1,2) resulted in a 50% reduction in embryo viability. It was also not possible to generate homozygous knockout ES cells for the targets which eliminated or reduced embryo viability in mice. In contrast, homozygous knockout ES cells were generated for a different set of isoforms (NM3,5,6,8,9,11) which were not required for embryogenesis. We also observed that males hemizygous for the knockout of all cytoskeletal products from the ?Tm gene preferentially transmitted the minus allele with 80–100% transmission. Since all four Tm genes are expressed in early embryos, ES cells and sperm, we conclude that isoforms of the ?Tm gene are functionally unique in their role in embryogenesis, ES cell viability and sperm function. PMID:21866263

Hook, Jeff; Lemckert, Frances; Schevzov, Galina; Fath, Thomas

2011-01-01

255

The MitoDrome database annotates and compares the OXPHOS nuclear genes of Drosophila melanogaster, Drosophila pseudoobscura and Anopheles gambiae  

Microsoft Academic Search

The oxidative phosphorylation (OXPHOS) is the primary energy-producing process of all aerobic organisms and the only cellular function under the dual control of both the mitochondrial and the nuclear genomes. Functional characterization and evolutionary study of the OXPHOS system is of great importance for the understanding of many as yet unclear aspects of nucleus-mitochondrion genomic co-evolution and co-regulation gene networks.

Domenica D’Elia; Domenico Catalano; Flavio Licciulli; Antonio Turi; Gaetano Tripoli; Damiano Porcelli; Cecilia Saccone; Corrado Caggese

2006-01-01

256

Matrix factorization-based data fusion for gene function prediction in baker's yeast and slime mold.  

PubMed

The development of effective methods for the characterization of gene functions that are able to combine diverse data sources in a sound and easily-extendible way is an important goal in computational biology. We have previously developed a general matrix factorization-based data fusion approach for gene function prediction. In this manuscript, we show that this data fusion approach can be applied to gene function prediction and that it can fuse various heterogeneous data sources, such as gene expression profiles, known protein annotations, interaction and literature data. The fusion is achieved by simultaneous matrix tri-factorization that shares matrix factors between sources. We demonstrate the effectiveness of the approach by evaluating its performance on predicting ontological annotations in slime mold D. discoideum and on recognizing proteins of baker's yeast S. cerevisiae that participate in the ribosome or are located in the cell membrane. Our approach achieves predictive performance comparable to that of the state-of-the-art kernel-based data fusion, but requires fewer data preprocessing steps. PMID:24297565

Zitnik, Marinka; Zupan, Blaž

2014-01-01

257

Argot2: a large scale function prediction tool relying on semantic similarity of weighted Gene Ontology terms  

PubMed Central

Background Predicting protein function has become increasingly demanding in the era of next generation sequencing technology. The task to assign a curator-reviewed function to every single sequence is impracticable. Bioinformatics tools, easy to use and able to provide automatic and reliable annotations at a genomic scale, are necessary and urgent. In this scenario, the Gene Ontology has provided the means to standardize the annotation classification with a structured vocabulary which can be easily exploited by computational methods. Results Argot2 is a web-based function prediction tool able to annotate nucleic or protein sequences from small datasets up to entire genomes. It accepts as input a list of sequences in FASTA format, which are processed using BLAST and HMMER searches vs UniProKB and Pfam databases respectively; these sequences are then annotated with GO terms retrieved from the UniProtKB-GOA database and the terms are weighted using the e-values from BLAST and HMMER. The weighted GO terms are processed according to both their semantic similarity relations described by the Gene Ontology and their associated score. The algorithm is based on the original idea developed in a previous tool called Argot. The entire engine has been completely rewritten to improve both accuracy and computational efficiency, thus allowing for the annotation of complete genomes. Conclusions The revised algorithm has been already employed and successfully tested during in-house genome projects of grape and apple, and has proven to have a high precision and recall in all our benchmark conditions. It has also been successfully compared with Blast2GO, one of the methods most commonly employed for sequence annotation. The server is freely accessible at http://www.medcomp.medicina.unipd.it/Argot2. PMID:22536960

2012-01-01

258

Anxiety induced by prenatal stress is associated with suppression of hippocampal genes involved in synaptic function  

E-print Network

Anxiety induced by prenatal stress is associated with suppression of hippocampal genes involved and changes in behaviour resulting from prenatal stress. Keywords: anxiety, exocytosis, gene annotation stress to a greater incidence of anxiety, affective disorders, attention deficits and schizophrenia

Linial, Michal

259

Fungal avirulence genes: structure and possible functions.  

PubMed

Avirulence (Avr) genes exist in many fungi that share a gene-for-gene relationship with their host plant. They represent unique genetic determinants that prevent fungi from causing disease on plants that possess matching resistance (R) genes. Interaction between elicitors (primary or secondary products of Avr genes) and host receptors in resistant plants causes induction of various defense responses often involving a hypersensitive response. Avr genes have been successfully isolated by reverse genetics and positional cloning. Five cultivar-specific Avr genes (Avr4, Avr9, and Ecp2 from Cladosporium fulvum; nip1 from Rhynchosporium secalis; and Avr2-YAMO from Magnaporthe grisea) and three species-specific Avr genes (PWL1 and PWL2 from M. grisea and inf1 from Phytophthora infestans) have been cloned. Isolation of additional Avr genes from these fungi, but also from other fungi such as Uromyces vignae, Melampsora lini, Phytophthora sojae, and Leptosphaeria maculans, is in progress. Molecular analyses of nonfunctional Avr gene alleles show that these originate from deletions or mutations in the open reading frame or the promoter sequence of an Avr gene. Although intrinsic biological functions of most Avr gene products are still unknown, recent studies have shown that two Avr genes, nip1 and Ecp2, encode products that are important pathogenicity factors. All fungal Avr genes cloned so far have been demonstrated or predicted to encode extracellular proteins. Current studies focus on unraveling the mechanisms of perception of avirulence factors by plant receptors. The exploitation of Avr genes and the matching R genes in engineered resistance is also discussed. PMID:9756710

Laugé, R; De Wit, P J

1998-08-01

260

Gene Ontology consistent protein function prediction: the FALCON algorithm applied to six eukaryotic genomes  

PubMed Central

Gene Ontology (GO) is a hierarchical vocabulary for the description of biological functions and locations, often employed by computational methods for protein function prediction. Due to the structure of GO, function predictions can be self- contradictory. For example, a protein may be predicted to belong to a detailed functional class, but not in a broader class that, due to the vocabulary structure, includes the predicted one. We present a novel discrete optimization algorithm called Functional Annotation with Labeling CONsistency (FALCON) that resolves such contradictions. The GO is modeled as a discrete Bayesian Network. For any given input of GO term membership probabilities, the algorithm returns the most probable GO term assignments that are in accordance with the Gene Ontology structure. The optimization is done using the Differential Evolution algorithm. Performance is evaluated on simulated and also real data from Arabidopsis thaliana showing improvement compared to related approaches. We finally applied the FALCON algorithm to obtain genome-wide function predictions for six eukaryotic species based on data provided by the CAFA (Critical Assessment of Function Annotation) project. PMID:23531338

2013-01-01

261

Assessment of community-submitted ontology annotations from a novel database-journal partnership  

PubMed Central

As the scientific literature grows, leading to an increasing volume of published experimental data, so does the need to access and analyze this data using computational tools. The most commonly used method to convert published experimental data on gene function into controlled vocabulary annotations relies on a professional curator, employed by a model organism database or a more general resource such as UniProt, to read published articles and compose annotation statements based on the articles' contents. A more cost-effective and scalable approach capable of capturing gene function data across the whole range of biological research organisms in computable form is urgently needed. We have analyzed a set of ontology annotations generated through collaborations between the Arabidopsis Information Resource and several plant science journals. Analysis of the submissions entered using the online submission tool shows that most community annotations were well supported and the ontology terms chosen were at an appropriate level of specificity. Of the 503 individual annotations that were submitted, 97% were approved and community submissions captured 72% of all possible annotations. This new method for capturing experimental results in a computable form provides a cost-effective way to greatly increase the available body of annotations without sacrificing annotation quality. Database URL: www.arabidopsis.org PMID:22859749

Berardini, Tanya Z.; Li, Donghui; Muller, Robert; Chetty, Raymond; Ploetz, Larry; Singh, Shanker; Wensel, April; Huala, Eva

2012-01-01

262

Improving functional annotation for industrial microbes: a case study with Pichia pastoris  

E-print Network

models. Thus, in the near future, the ability of these models to predict correctly the metabolic impact of manipulating the endog- enous genes of this yeast, or expressing heterologous cod- ing sequences should be greatly enhanced. Evaluation...

Dikicioglu, Duygu; Wood, Valerie; Rutherford, Kim M.; McDowall, Mark D.; Oliver, Stephen G.

2014-06-11

263

B2G-FAR, a species-centered GO annotation repository  

PubMed Central

Motivation: Functional genomics research has expanded enormously in the last decade thanks to the cost reduction in high-throughput technologies and the development of computational tools that generate, standardize and share information on gene and protein function such as the Gene Ontology (GO). Nevertheless, many biologists, especially working with non-model organisms, still suffer from non-existing or low-coverage functional annotation, or simply struggle retrieving, summarizing and querying these data. Results: The Blast2GO Functional Annotation Repository (B2G-FAR) is a bioinformatics resource envisaged to provide functional information for otherwise uncharacterized sequence data and offers data mining tools to analyze a larger repertoire of species than currently available. This new annotation resource has been created by applying the Blast2GO functional annotation engine in a strongly high-throughput manner to the entire space of public available sequences. The resulting repository contains GO term predictions for over 13.2 million non-redundant protein sequences based on BLAST search alignments from the SIMAP database. We generated GO annotation for approximately 150 000 different taxa making available 2000 species with the highest coverage through B2G-FAR. A second section within B2G-FAR holds functional annotations for 17 non-model organism Affymetrix GeneChips. Conclusions: B2G-FAR provides easy access to exhaustive functional annotation for 2000 species offering a good balance between quality and quantity, thereby supporting functional genomics research especially in the case of non-model organisms. Availability: The annotation resource is available at http://www.b2gfar.org. Contact: aconesa@cipf.es; sgoetz@cipf.es Supplementary information: Supplementary data are available at Bioinformatics online. PMID:21335611

Götz, Stefan; Arnold, Roland; Sebastián-León, Patricia; Martín-Rodríguez, Samuel; Tischler, Patrick; Jehl, Marc-André; Dopazo, Joaquín; Rattei, Thomas; Conesa, Ana

2011-01-01

264

CoCiter: An Efficient Tool to Infer Gene Function by Assessing the Significance of Literature Co-Citation  

PubMed Central

A routine approach to inferring functions for a gene set is by using function enrichment analysis based on GO, KEGG or other curated terms and pathways. However, such analysis requires the existence of overlapping genes between the query gene set and those annotated by GO/KEGG. Furthermore, GO/KEGG databases only maintain a very restricted vocabulary. Here, we have developed a tool called “CoCiter” based on literature co-citations to address the limitations in conventional function enrichment analysis. Co-citation analysis is widely used in ranking articles and predicting protein-protein interactions (PPIs). Our algorithm can further assess the co-citation significance of a gene set with any other user-defined gene sets, or with free terms. We show that compared with the traditional approaches, CoCiter is a more accurate and flexible function enrichment analysis method. CoCiter is freely available at www.picb.ac.cn/hanlab/cociter/. PMID:24086311

Naveed, Hammad; Green, Christopher D.; Han, Jing-Dong J.

2013-01-01

265

PSSP-RFE: Accurate Prediction of Protein Structural Class by Recursive Feature Extraction from PSI-BLAST Profile, Physical-Chemical Property and Functional Annotations  

PubMed Central

Protein structure prediction is critical to functional annotation of the massively accumulated biological sequences, which prompts an imperative need for the development of high-throughput technologies. As a first and key step in protein structure prediction, protein structural class prediction becomes an increasingly challenging task. Amongst most homological-based approaches, the accuracies of protein structural class prediction are sufficiently high for high similarity datasets, but still far from being satisfactory for low similarity datasets, i.e., below 40% in pairwise sequence similarity. Therefore, we present a novel method for accurate and reliable protein structural class prediction for both high and low similarity datasets. This method is based on Support Vector Machine (SVM) in conjunction with integrated features from position-specific score matrix (PSSM), PROFEAT and Gene Ontology (GO). A feature selection approach, SVM-RFE, is also used to rank the integrated feature vectors through recursively removing the feature with the lowest ranking score. The definitive top features selected by SVM-RFE are input into the SVM engines to predict the structural class of a query protein. To validate our method, jackknife tests were applied to seven widely used benchmark datasets, reaching overall accuracies between 84.61% and 99.79%, which are significantly higher than those achieved by state-of-the-art tools. These results suggest that our method could serve as an accurate and cost-effective alternative to existing methods in protein structural classification, especially for low similarity datasets. PMID:24675610

Yu, Sanjiu; Zhang, Yuan; Luo, Zhong; Yang, Hua; Zhou, Yue; Zheng, Xiaoqi

2014-01-01

266

PSSP-RFE: accurate prediction of protein structural class by recursive feature extraction from PSI-BLAST profile, physical-chemical property and functional annotations.  

PubMed

Protein structure prediction is critical to functional annotation of the massively accumulated biological sequences, which prompts an imperative need for the development of high-throughput technologies. As a first and key step in protein structure prediction, protein structural class prediction becomes an increasingly challenging task. Amongst most homological-based approaches, the accuracies of protein structural class prediction are sufficiently high for high similarity datasets, but still far from being satisfactory for low similarity datasets, i.e., below 40% in pairwise sequence similarity. Therefore, we present a novel method for accurate and reliable protein structural class prediction for both high and low similarity datasets. This method is based on Support Vector Machine (SVM) in conjunction with integrated features from position-specific score matrix (PSSM), PROFEAT and Gene Ontology (GO). A feature selection approach, SVM-RFE, is also used to rank the integrated feature vectors through recursively removing the feature with the lowest ranking score. The definitive top features selected by SVM-RFE are input into the SVM engines to predict the structural class of a query protein. To validate our method, jackknife tests were applied to seven widely used benchmark datasets, reaching overall accuracies between 84.61% and 99.79%, which are significantly higher than those achieved by state-of-the-art tools. These results suggest that our method could serve as an accurate and cost-effective alternative to existing methods in protein structural classification, especially for low similarity datasets. PMID:24675610

Li, Liqi; Cui, Xiang; Yu, Sanjiu; Zhang, Yuan; Luo, Zhong; Yang, Hua; Zhou, Yue; Zheng, Xiaoqi

2014-01-01

267

GeneCards Version 3: the human gene integrator.  

PubMed

GeneCards (www.genecards.org) is a comprehensive, authoritative compendium of annotative information about human genes, widely used for nearly 15 years. Its gene-centric content is automatically mined and integrated from over 80 digital sources, resulting in a web-based deep-linked card for each of >73,000 human gene entries, encompassing the following categories: protein coding, pseudogene, RNA gene, genetic locus, cluster and uncategorized. We now introduce GeneCards Version 3, featuring a speedy and sophisticated search engine and a revamped, technologically enabling infrastructure, catering to the expanding needs of biomedical researchers. A key focus is on gene-set analyses, which leverage GeneCards' unique wealth of combinatorial annotations. These include the GeneALaCart batch query facility, which tabulates user-selected annotations for multiple genes and GeneDecks, which identifies similar genes with shared annotations, and finds set-shared annotations by descriptor enrichment analysis. Such set-centric features address a host of applications, including microarray data analysis, cross-database annotation mapping and gene-disorder associations for drug targeting. We highlight the new Version 3 database architecture, its multi-faceted search engine, and its semi-automated quality assurance system. Data enhancements include an expanded visualization of gene expression patterns in normal and cancer tissues, an integrated alternative splicing pattern display, and augmented multi-source SNPs and pathways sections. GeneCards now provides direct links to gene-related research reagents such as antibodies, recombinant proteins, DNA clones and inhibitory RNAs and features gene-related drugs and compounds lists. We also portray the GeneCards Inferred Functionality Score annotation landscape tool for scoring a gene's functional information status. Finally, we delineate examples of applications and collaborations that have benefited from the GeneCards suite. Database URL: www.genecards.org. PMID:20689021

Safran, Marilyn; Dalah, Irina; Alexander, Justin; Rosen, Naomi; Iny Stein, Tsippi; Shmoish, Michael; Nativ, Noam; Bahir, Iris; Doniger, Tirza; Krug, Hagit; Sirota-Madi, Alexandra; Olender, Tsviya; Golan, Yaron; Stelzer, Gil; Harel, Arye; Lancet, Doron

2010-01-01

268

On the detection of functionally coherent groups of protein domains with an extension to protein annotation  

Microsoft Academic Search

BACKGROUND: Protein domains coordinate to perform multifaceted cellular functions, and domain combinations serve as the functional building blocks of the cell. The available methods to identify functional domain combinations are limited in their scope, e.g. to the identification of combinations falling within individual proteins or within specific regions in a translated genome. Further effort is needed to identify groups of

William A. Mclaughlin; Ken Chen; Tingjun Hou; Wei Wang

2007-01-01

269

Annotated Bibliography  

NSDL National Science Digital Library

Annotations are short and cannot give detailed information, but they should cover these points: 1. The general contents of the work. What does it discuss and how detailed is it? This is the main portion of the annotation. 2. The author's qualifications. Is the writer a trained scholar? A journalist? Someone relating a personal experience? 3. An evaluation of the reliability. Is the information given reliable? Are facts or opinions stressed? 4. The intended audience. Is it for a general reader or a specialist? How much, if any, background knowledge is needed to understand it? Was is easy or difficult to read?

Davis, Leslie

270

Annotation, phylogeny and expression analysis of the nuclear factor Y gene families in common bean (Phaseolus vulgaris)  

PubMed Central

In the past decade, plant nuclear factor Y (NF-Y) genes have gained major interest due to their roles in many biological processes in plant development or adaptation to environmental conditions, particularly in the root nodule symbiosis established between legume plants and nitrogen fixing bacteria. NF-Ys are heterotrimeric transcriptional complexes composed of three subunits, NF-YA, NF-YB, and NF-YC, which bind with high affinity and specificity to the CCAAT box, a cis element present in many eukaryotic promoters. In plants, NF-Y subunits consist of gene families with about 10 members each. In this study, we have identified and characterized the NF-Y gene families of common bean (Phaseolus vulgaris), a grain legume of worldwide economical importance and the main source of dietary protein of developing countries. Expression analysis showed that some members of each family are up-regulated at early or late stages of the nitrogen fixing symbiotic interaction with its partner Rhizobium etli. We also showed that some genes are differentially accumulated in response to inoculation with high or less efficient R. etli strains, constituting excellent candidates to participate in the strain-specific response during symbiosis. Genes of the NF-YA family exhibit a highly structured intron-exon organization. Moreover, this family is characterized by the presence of upstream ORFs when introns in the 5? UTR are retained and miRNA target sites in their 3? UTR, suggesting that these genes might be subjected to a complex post-transcriptional regulation. Multiple protein alignments indicated the presence of highly conserved domains in each of the NF-Y families, presumably involved in subunit interactions and DNA binding. The analysis presented here constitutes a starting point to understand the regulation and biological function of individual members of the NF-Y families in different developmental processes in this grain legume. PMID:25642232

Rípodas, Carolina; Castaingts, Mélisse; Clúa, Joaquín; Blanco, Flavio; Zanetti, María Eugenia

2015-01-01

271

The DNA sequence and biological annotation of human chromosome1  

Microsoft Academic Search

The reference sequence for each human chromosome provides the framework for understanding genome function, variation and evolution. Here we report the finished sequence and biological annotation of human chromosome1. Chromosome1 is gene-dense, with 3,141 genes and 991 pseudogenes, and many coding sequences overlap. Rearrangements and mutations of chromosome1 are prevalent in cancer and many other diseases. Patterns of sequence variation

S. G. Gregory; K. F. Barlow; K. E. McLay; R. Kaul; D. Swarbreck; A. Dunham; C. E. Scott; K. L. Howe; K. Woodfine; C. C. A. Spencer; M. C. Jones; C. Gillson; S. Searle; Y. Zhou; F. Kokocinski; L. McDonald; R. Evans; K. Phillips; A. Atkinson; R. Cooper; C. Jones; R. E. Hall; T. D. Andrews; C. Lloyd; R. Ainscough; J. P. Almeida; K. D. Ambrose; F. Anderson; R. W. Andrew; R. I. S. Ashwell; K. Aubin; A. K. Babbage; C. L. Bagguley; J. Bailey; H. Beasley; G. Bethel; C. P. Bird; S. Bray-Allen; J. Y. Brown; A. J. Brown; D. Buckley; J. Burton; J. Bye; C. Carder; J. C. Chapman; S. Y. Clark; G. Clarke; C. Clee; V. Cobley; R. E. Collier; N. Corby; G. J. Coville; J. Davies; R. Deadman; M. Dunn; M. Earthrowl; A. G. Ellington; H. Errington; A. Frankish; J. Frankland; P. Garner; J. Garnett; L. Gay; M. R. J. Ghori; R. Gibson; L. M. Gilby; W. Gillett; R. J. Glithero; D. V. Grafham; C. Griffiths; S. Griffiths-Jones; R. Grocock; S. Hammond; E. S. I. Harrison; E. Haugen; P. D. Heath; S. Holmes; K. Holt; P. J. Howden; A. R. Hunt; S. E. Hunt; G. Hunter; J. Isherwood; R. James; C. Johnson; D. Johnson; A. Joy; M. Kay; J. K. Kershaw; M. Kibukawa; A. M. Kimberley; A. J. Knights; H. Lad; G. Laird; S. Lawlor; D. A. Leongamornlert; D. M. Lloyd; J. Loveland; J. Lovell; M. J. Lush; R. Lyne; S. Martin; M. Mashreghi-Mohammadi; L. Matthews; N. S. W. Matthews; S. McLaren; S. Milne; S. Mistry; M. J. F. M Oore; T. Nickerson; C. N. O'Dell; K. Oliver; A. Palmeiri; S. A. Palmer; A. Parker; D. Patel; A. V. Pearce; A. I. Peck; S. Pelan; K. Phelps; R. Plumb; J. Rajan; C. Raymond; G. Rouse; C. Saenphimmachak; H. K. Sehra; E. Sheridan; R. Shownkeen; S. Sims; C. D. Skuce; M. Smith; C. Steward; S. Subramanian; N. Sycamore; A. Tracey; A. Tromans; Z. van Helmond; M. Wall; J. M. Wallis; S. L. Whitehead; J. E. Wilkinson; D. L. Willey; H. Williams; L. Wilming; P. W. Wray; Z. Wu; A. Coulson; M. Vaudin; J. E. Sulston; R. Durbin; I. Dunham; N. P. Carter; G. McVean; M. T. Ross; J. Harrow; M. V. Olson; S. Beck; J. Rogers; D. R. Bentley

2006-01-01

272

Studying Functions of All Yeast Genes Simultaneously  

NASA Technical Reports Server (NTRS)

A method of studying the functions of all the genes of a given species of microorganism simultaneously has been developed in experiments on Saccharomyces cerevisiae (commonly known as baker's or brewer's yeast). It is already known that many yeast genes perform functions similar to those of corresponding human genes; therefore, by facilitating understanding of yeast genes, the method may ultimately also contribute to the knowledge needed to treat some diseases in humans. Because of the complexity of the method and the highly specialized nature of the underlying knowledge, it is possible to give only a brief and sketchy summary here. The method involves the use of unique synthetic deoxyribonucleic acid (DNA) sequences that are denoted as DNA bar codes because of their utility as molecular labels. The method also involves the disruption of gene functions through deletion of genes. Saccharomyces cerevisiae is a particularly powerful experimental system in that multiple deletion strains easily can be pooled for parallel growth assays. Individual deletion strains recently have been created for 5,918 open reading frames, representing nearly all of the estimated 6,000 genetic loci of Saccharomyces cerevisiae. Tagging of each deletion strain with one or two unique 20-nucleotide sequences enables identification of genes affected by specific growth conditions, without prior knowledge of gene functions. Hybridization of bar-code DNA to oligonucleotide arrays can be used to measure the growth rate of each strain over several cell-division generations. The growth rate thus measured serves as an index of the fitness of the strain.

Stolc, Viktor; Eason, Robert G.; Poumand, Nader; Herman, Zelek S.; Davis, Ronald W.; Anthony Kevin; Jejelowo, Olufisayo

2006-01-01

273

RNA interference for wheat functional gene analysis  

Microsoft Academic Search

RNA interference (RNAi) refers to a common mechanism of RNA-based post-transcriptional gene silencing in eukaryotic cells.\\u000a In model plant species such as Arabidopsis and rice, RNAi has been routinely used to characterize gene function and to engineer novel phenotypes. In polyploid species,\\u000a this approach is in its early stages, but has great potential since multiple homoeologous copies can be simultaneously

Daolin Fu; Cristobal Uauy; Ann Blechl; Jorge Dubcovsky

2007-01-01

274

Annotated Videography.  

ERIC Educational Resources Information Center

This annotated list of 43 videotapes recommended for classroom use addresses various themes for teaching about the Holocaust, including: (1) overviews of the Holocaust; (2) life before the Holocaust; (3) propaganda; (4) racism, anti-Semitism; (5) "enemies of the state"; (6) ghettos; (7) camps; (8) genocide; (9) rescue; (10) resistance; (11)…

United States Holocaust Memorial Museum, Washington, DC.

275

Next Generation Models for Storage and Representation of Microbial Biological Annotation  

SciTech Connect

Background Traditional genome annotation systems were developed in a very different computing era, one where the World Wide Web was just emerging. Consequently, these systems are built as centralized black boxes focused on generating high quality annotation submissions to GenBank/EMBL supported by expert manual curation. The exponential growth of sequence data drives a growing need for increasingly higher quality and automatically generated annotation. Typical annotation pipelines utilize traditional database technologies, clustered computing resources, Perl, C, and UNIX file systems to process raw sequence data, identify genes, and predict and categorize gene function. These technologies tightly couple the annotation software system to hardware and third party software (e.g. relational database systems and schemas). This makes annotation systems hard to reproduce, inflexible to modification over time, difficult to assess, difficult to partition across multiple geographic sites, and difficult to understand for those who are not domain experts. These systems are not readily open to scrutiny and therefore not scientifically tractable. The advent of Semantic Web standards such as Resource Description Framework (RDF) and OWL Web Ontology Language (OWL) enables us to construct systems that address these challenges in a new comprehensive way. Results Here, we develop a framework for linking traditional data to OWL-based ontologies in genome annotation. We show how data standards can decouple hardware and third party software tools from annotation pipelines, thereby making annotation pipelines easier to reproduce and assess. An illustrative example shows how TURTLE (Terse RDF Triple Language) can be used as a human readable, but also semantically-aware, equivalent to GenBank/EMBL files. Conclusions The power of this approach lies in its ability to assemble annotation data from multiple databases across multiple locations into a representation that is understandable to researchers. In this way, all researchers, experimental and computational, will more easily understand the informatics processes constructing genome annotation and ultimately be able to help improve the systems that produce them.

Quest, Daniel J [ORNL; Land, Miriam L [ORNL; Brettin, Thomas S [ORNL; Cottingham, Robert W [ORNL

2010-01-01

276

Integrative bioinformatics for functional genome annotation: trawling for G protein-coupled receptors.  

PubMed

G protein-coupled receptors (GPCR) are amongst the best studied and most functionally diverse types of cell-surface protein. The importance of GPCRs as mediates or cell function and organismal developmental underlies their involvement in key physiological roles and their prominence as targets for pharmacological therapeutics. In this review, we highlight the requirement for integrated protocols which underline the different perspectives offered by different sequence analysis methods. BLAST and FastA offer broad brush strokes. Motif-based search methods add the fine detail. Structural modelling offers another perspective which allows us to elucidate the physicochemical properties that underlie ligand binding. Together, these different views provide a more informative and a more detailed picture of GPCR structure and function. Many GPCRs remain orphan receptors with no identified ligand, yet as computer-driven functional genomics starts to elaborate their functions, a new understanding of their roles in cell and developmental biology will follow. PMID:15561589

Flower, Darren R; Attwood, Teresa K

2004-12-01

277

Strain-specific genes of Helicobacter pylori: distribution, function and dynamics  

PubMed Central

Whole-genome clustering of the two available genome sequences of Helicobacter pylori strains 26695 and J99 allows the detection of 110 and 52 strain-specific genes, respectively. This set of strain-specific genes was compared with the sets obtained with other computational approaches of direct genome comparison as well as experimental data from microarray analysis. A considerable number of novel function assignments is possible using database-driven sequence annotation, although the function of the majority of the identified genes remains unknown. Using whole-genome clustering, it is also possible to detect species-specific genes by comparing the two H.pylori strains against the genome sequence of Campylobacter jejuni. It is interesting that the majority of strain-specific genes appear to be species specific. Finally, we introduce a novel approach to gene position analysis by employing measures from directional statistics. We show that although the two strains exhibit differences with respect to strain-specific gene distributions, this is due to the extensive genome rearrangements. If these are taken into account, a common pattern for the genome dynamics of the two Helicobacter strains emerges, suggestive of certain spatial constraints that may act as control mechanisms of gene flux. PMID:11691927

Janssen, Paul J.; Audit, Benjamin; Ouzounis, Christos A.

2001-01-01

278

Assigning functions to genes: identification of S-phase expressed genes in Leishmania major based on post-transcriptional control elements  

PubMed Central

Assigning functions to genes is one of the major challenges of the post-genomic era. Usually, functions are assigned based on similarity of the coding sequences to sequences of known genes, or by identification of transcriptional cis-regulatory elements that are known to be associated with specific pathways or conditions. In trypanosomatids, where regulation of gene expression takes place mainly at the post-transcriptional level, new approaches for function assignment are needed. Here we demonstrate the identification of novel S-phase expressed genes in Leishmania major, based on a post-transcriptional control element that was recognized in Crithidia fasciculata as involved in the cell cycle-dependent expression of several nuclear and mitochondrial S-phase expressed genes. Hypothesizing that a similar regulatory mechanism is manifested in L.major, we have applied a computational search for similar control elements in the genome of L.major. Our computational scan yielded 132 genes, of which 33% are homologues of known DNA metabolism genes and 63% lack any annotation. Experimental testing of seven of these genes revealed that their mRNAs cycle throughout the cell cycle, reaching a maximum level during S-phase or just prior to it. It is suggested that screening for post-transcriptional control elements associated with a specific function provides an efficient method for assigning functions to trypanosomatid genes. PMID:16052032

Zick, Aviad; Onn, Itay; Bezalel, Rachel; Margalit, Hanah; Shlomai, Joseph

2005-01-01

279

BABELOMICS: a systems biology perspective in the functional annotation of genome-scale experiments  

Microsoft Academic Search

We present a new version of Babelomics, a com- plete suite of web tools for functional analysis of genome-scale experiments, with new and improved tools. New functionally relevant terms have been included such as CisRed motifs or bioentities obtained by text-mining procedures. An improved indexing has considerably speeded up several of the modules. An improved version of the FatiScan method

Fátima Al-shahrour; Pablo Minguez; Joaquín Tárraga; David Montaner; Eva Alloza; Juan M. Vaquerizas; Lucía Conde; Christian Blaschke; Javier Vera; Joaquín Dopazo

2006-01-01

280

Defining functional distances over Gene Ontology  

PubMed Central

Background A fundamental problem when trying to define the functional relationships between proteins is the difficulty in quantifying functional similarities, even when well-structured ontologies exist regarding the activity of proteins (i.e. 'gene ontology' -GO-). However, functional metrics can overcome the problems in the comparing and evaluating functional assignments and predictions. As a reference of proximity, previous approaches to compare GO terms considered linkage in terms of ontology weighted by a probability distribution that balances the non-uniform 'richness' of different parts of the Direct Acyclic Graph. Here, we have followed a different approach to quantify functional similarities between GO terms. Results We propose a new method to derive 'functional distances' between GO terms that is based on the simultaneous occurrence of terms in the same set of Interpro entries, instead of relying on the structure of the GO. The coincidence of GO terms reveals natural biological links between the GO functions and defines a distance model Df which fulfils the properties of a Metric Space. The distances obtained in this way can be represented as a hierarchical 'Functional Tree'. Conclusion The method proposed provides a new definition of distance that enables the similarity between GO terms to be quantified. Additionally, the 'Functional Tree' defines groups with biological meaning enhancing its utility for protein function comparison and prediction. Finally, this approach could be for function-based protein searches in databases, and for analysing the gene clusters produced by DNA array experiments. PMID:18221506

del Pozo, Angela; Pazos, Florencio; Valencia, Alfonso

2008-01-01

281

Comparative Omics-Driven Genome Annotation Refinement: Application across Yersiniae  

SciTech Connect

Genome sequencing continues to be a rapidly evolving technology, yet most downstream aspects of genome annotation pipelines remain relatively stable or are even being abandoned. To date, the perceived value of manual curation for genome annotations is not offset by the real cost and time associated with the process. In order to balance the large number of sequences generated, the annotation process is now performed almost exclusively in an automated fashion for most genome sequencing projects. One possible way to reduce errors inherent to automated computational annotations is to apply data from 'omics' measurements (i.e. transcriptional and proteomic) to the un-annotated genome with a proteogenomic-based approach. This approach does require additional experimental and bioinformatics methods to include omics technologies; however, the approach is readily automatable and can benefit from rapid developments occurring in those research domains as well. The annotation process can be improved by experimental validation of transcription and translation and aid in the discovery of annotation errors. Here the concept of annotation refinement has been extended to include a comparative assessment of genomes across closely related species, as is becoming common in sequencing efforts. Transcriptomic and proteomic data derived from three highly similar pathogenic Yersiniae (Y. pestis CO92, Y. pestis pestoides F, and Y. pseudotuberculosis PB1/+) was used to demonstrate a comprehensive comparative omic-based annotation methodology. Peptide and oligo measurements experimentally validated the expression of nearly 40% of each strain's predicted proteome and revealed the identification of 28 novel and 68 previously incorrect protein-coding sequences (e.g., observed frameshifts, extended start sites, and translated pseudogenes) within the three current Yersinia genome annotations. Gene loss is presumed to play a major role in Y. pestis acquiring its niche as a virulent pathogen, thus the discovery of many translated pseudogenes underscores a need for functional analyses to investigate hypotheses related to divergence. Refinements included the discovery of a seemingly essential ribosomal protein, several virulence-associated factors, and a transcriptional regulator, among other proteins, most of which are annotated as hypothetical, that were missed during annotation.

Rutledge, Alexandra C.; Jones, Marcus B.; Chauhan, Sadhana; Purvine, Samuel O.; Sanford, James; Monroe, Matthew E.; Brewer, Heather M.; Payne, Samuel H.; Ansong, Charles; Frank, Bryan C.; Smith, Richard D.; Peterson, Scott; Motin, Vladimir L.; Adkins, Joshua N.

2012-03-27

282

Protein surface analysis for function annotation in high-throughput structural genomics pipeline.  

PubMed

Structural genomics (SG) initiatives are expanding the universe of protein fold space by rapidly determining structures of proteins that were intentionally selected on the basis of low sequence similarity to proteins of known structure. Often these proteins have no associated biochemical or cellular functions. The SG success has resulted in an accelerated deposition of novel structures. In some cases the structural bioinformatics analysis applied to these novel structures has provided specific functional assignment. However, this approach has also uncovered limitations in the functional analysis of uncharacterized proteins using traditional sequence and backbone structure methodologies. A novel method, named pvSOAR (pocket and void Surface of Amino Acid Residues), of comparing the protein surfaces of geometrically defined pockets and voids was developed. pvSOAR was able to detect previously unrecognized and novel functional relationships between surface features of proteins. In this study, pvSOAR is applied to several structural genomics proteins. We examined the surfaces of YecM, BioH, and RpiB from Escherichia coli as well as the CBS domains from inosine-5'-monosphate dehydrogenase from Streptococcus pyogenes, conserved hypothetical protein Ta549 from Thermoplasm acidophilum, and CBS domain protein mt1622 from Methanobacterium thermoautotrophicum with the goal to infer information about their biochemical function. PMID:16322579

Binkowski, T Andrew; Joachimiak, Andrzej; Liang, Jie

2005-12-01

283

RNA sequencing reveals sexually dimorphic gene expression before gonadal differentiation in chicken and allows comprehensive annotation of the W-chromosome  

PubMed Central

Background Birds have a ZZ male: ZW female sex chromosome system and while the Z-linked DMRT1 gene is necessary for testis development, the exact mechanism of sex determination in birds remains unsolved. This is partly due to the poor annotation of the W chromosome, which is speculated to carry a female determinant. Few genes have been mapped to the W and little is known of their expression. Results We used RNA-seq to produce a comprehensive profile of gene expression in chicken blastoderms and embryonic gonads prior to sexual differentiation. We found robust sexually dimorphic gene expression in both tissues pre-dating gonadogenesis, including sex-linked and autosomal genes. This supports the hypothesis that sexual differentiation at the molecular level is at least partly cell autonomous in birds. Different sets of genes were sexually dimorphic in the two tissues, indicating that molecular sexual differentiation is tissue specific. Further analyses allowed the assembly of full-length transcripts for 26 W chromosome genes, providing a view of the W transcriptome in embryonic tissues. This is the first extensive analysis of W-linked genes and their expression profiles in early avian embryos. Conclusion Sexual differentiation at the molecular level is established in chicken early in embryogenesis, before gonadal sex differentiation. We find that the W chromosome is more transcriptionally active than previously thought, expand the number of known genes to 26 and present complete coding sequences for these W genes. This includes two novel W-linked sequences and three small RNAs reassigned to the W from the Un_Random chromosome. PMID:23531366

2013-01-01

284

Comparative validation of the D. melanogaster modENCODE transcriptome annotation.  

PubMed

Accurate gene model annotation of reference genomes is critical for making them useful. The modENCODE project has improved the D. melanogaster genome annotation by using deep and diverse high-throughput data. Since transcriptional activity that has been evolutionarily conserved is likely to have an advantageous function, we have performed large-scale interspecific comparisons to increase confidence in predicted annotations. To support comparative genomics, we filled in divergence gaps in the Drosophila phylogeny by generating draft genomes for eight new species. For comparative transcriptome analysis, we generated mRNA expression profiles on 81 samples from multiple tissues and developmental stages of 15 Drosophila species, and we performed cap analysis of gene expression in D. melanogaster and D. pseudoobscura. We also describe conservation of four distinct core promoter structures composed of combinations of elements at three positions. Overall, each type of genomic feature shows a characteristic divergence rate relative to neutral models, highlighting the value of multispecies alignment in annotating a target genome that should prove useful in the annotation of other high priority genomes, especially human and other mammalian genomes that are rich in noncoding sequences. We report that the vast majority of elements in the annotation are evolutionarily conserved, indicating that the annotation will be an important springboard for functional genetic testing by the Drosophila community. PMID:24985915

Chen, Zhen-Xia; Sturgill, David; Qu, Jiaxin; Jiang, Huaiyang; Park, Soo; Boley, Nathan; Suzuki, Ana Maria; Fletcher, Anthony R; Plachetzki, David C; FitzGerald, Peter C; Artieri, Carlo G; Atallah, Joel; Barmina, Olga; Brown, James B; Blankenburg, Kerstin P; Clough, Emily; Dasgupta, Abhijit; Gubbala, Sai; Han, Yi; Jayaseelan, Joy C; Kalra, Divya; Kim, Yoo-Ah; Kovar, Christie L; Lee, Sandra L; Li, Mingmei; Malley, James D; Malone, John H; Mathew, Tittu; Mattiuzzo, Nicolas R; Munidasa, Mala; Muzny, Donna M; Ongeri, Fiona; Perales, Lora; Przytycka, Teresa M; Pu, Ling-Ling; Robinson, Garrett; Thornton, Rebecca L; Saada, Nehad; Scherer, Steven E; Smith, Harold E; Vinson, Charles; Warner, Crystal B; Worley, Kim C; Wu, Yuan-Qing; Zou, Xiaoyan; Cherbas, Peter; Kellis, Manolis; Eisen, Michael B; Piano, Fabio; Kionte, Karin; Fitch, David H; Sternberg, Paul W; Cutter, Asher D; Duff, Michael O; Hoskins, Roger A; Graveley, Brenton R; Gibbs, Richard A; Bickel, Peter J; Kopp, Artyom; Carninci, Piero; Celniker, Susan E; Oliver, Brian; Richards, Stephen

2014-07-01

285

Annotation of Differentially Expressed Genes in the Somatic Embryogenesis of Musa and Their Location in the Banana Genome  

PubMed Central

Analysis of cDNA-AFLP was used to study the genes expressed in zygotic and somatic embryogenesis of Musa acuminata Colla ssp. malaccensis, and a comparison was made between their differential transcribed fragments (TDFs) and the sequenced genome of the double haploid- (DH-) Pahang of the malaccensis subspecies that is available in the network. A total of 253 transcript-derived fragments (TDFs) were detected with apparent size of 100–4000 bp using 5 pairs of AFLP primers, of which 21 were differentially expressed during the different stages of banana embryogenesis; 15 of the sequences have matched DH-Pahang chromosomes, with 7 of them being homologous to gene sequences encoding either known or putative protein domains of higher plants. Four TDF sequences were located in all Musa chromosomes, while the rest were located in one or two chromosomes. Their putative individual function is briefly reviewed based on published information, and the potential roles of these genes in embryo development are discussed. Thus the availability of the genome of Musa and the information of TDFs sequences presented here opens new possibilities for an in-depth study of the molecular and biochemical research of zygotic and somatic embryogenesis of Musa. PMID:24027442

Maldonado-Borges, Josefina Ines; Ku-Cauich, José Roberto; Escobedo-GraciaMedrano, Rosa Maria

2013-01-01

286

Current trend of annotating single nucleotide variation in humans - A case study on SNVrap.  

PubMed

As high throughput methods, such as whole genome genotyping arrays, whole exome sequencing (WES) and whole genome sequencing (WGS), have detected huge amounts of genetic variants associated with human diseases, function annotation of these variants is an indispensable step in understanding disease etiology. Large-scale functional genomics projects, such as The ENCODE Project and Roadmap Epigenomics Project, provide genome-wide profiling of functional elements across different human cell types and tissues. With the urgent demands for identification of disease-causal variants, comprehensive and easy-to-use annotation tool is highly in demand. Here we review and discuss current progress and trend of the variant annotation field. Furthermore, we introduce a comprehensive web portal for annotating human genetic variants. We use gene-based features and the latest functional genomics datasets to annotate single nucleotide variation (SNVs) in human, at whole genome scale. We further apply several function prediction algorithms to annotate SNVs that might affect different biological processes, including transcriptional gene regulation, alternative splicing, post-transcriptional regulation, translation and post-translational modifications. The SNVrap web portal is freely available at http://jjwanglab.org/snvrap. PMID:25308971

Li, Mulin Jun; Wang, Junwen

2014-10-13

287

Rice functionality, starch structure and the genes  

Technology Transfer Automated Retrieval System (TEKTRAN)

Through collaborative efforts among USDA scientists at Beaumont, Texas, we have gained in-depth knowledge of how rice functionality, i.e. the texture of the cooked rice, rice processing properties, and starch gelatinization temperature, are associated with starch-synthesis genes and starch structure...

288

Comparative Omics-Driven Genome Annotation Refinement: Application across Yersiniae  

PubMed Central

Genome sequencing continues to be a rapidly evolving technology, yet most downstream aspects of genome annotation pipelines remain relatively stable or are even being abandoned. The annotation process is now performed almost exclusively in an automated fashion to balance the large number of sequences generated. One possible way of reducing errors inherent to automated computational annotations is to apply data from omics measurements (i.e. transcriptional and proteomic) to the un-annotated genome with a proteogenomic-based approach. Here, the concept of annotation refinement has been extended to include a comparative assessment of genomes across closely related species. Transcriptomic and proteomic data derived from highly similar pathogenic Yersiniae (Y. pestis CO92, Y. pestis Pestoides F, and Y. pseudotuberculosis PB1/+) was used to demonstrate a comprehensive comparative omic-based annotation methodology. Peptide and oligo measurements experimentally validated the expression of nearly 40% of each strain's predicted proteome and revealed the identification of 28 novel and 68 incorrect (i.e., observed frameshifts, extended start sites, and translated pseudogenes) protein-coding sequences within the three current genome annotations. Gene loss is presumed to play a major role in Y. pestis acquiring its niche as a virulent pathogen, thus the discovery of many translated pseudogenes, including the insertion-ablated argD, underscores a need for functional analyses to investigate hypotheses related to divergence. Refinements included the discovery of a seemingly essential ribosomal protein, several virulence-associated factors, a transcriptional regulator, and many hypothetical proteins that were missed during annotation. PMID:22479471

Schrimpe-Rutledge, Alexandra C.; Jones, Marcus B.; Chauhan, Sadhana; Purvine, Samuel O.; Sanford, James A.; Monroe, Matthew E.; Brewer, Heather M.; Payne, Samuel H.; Ansong, Charles; Frank, Bryan C.; Smith, Richard D.; Peterson, Scott N.; Motin, Vladimir L.; Adkins, Joshua N.

2012-01-01

289

Experimental annotation of post-translational features and translated coding regions in the pathogen Salmonella Typhimurium  

SciTech Connect

Complete and accurate genome annotation is crucial for comprehensive and systematic studies of biological systems. For example systems biology-oriented genome scale modeling efforts greatly benefit from accurate annotation of protein-coding genes to develop proper functioning models. However, determining protein-coding genes for most new genomes is almost completely performed by inference, using computational predictions with significant documented error rates (> 15%). Furthermore, gene prediction programs provide no information on biologically important post-translational processing events critical for protein function. With the ability to directly measure peptides arising from expressed proteins, mass spectrometry-based proteomics approaches can be used to augment and verify coding regions of a genomic sequence and importantly detect post-translational processing events. In this study we utilized “shotgun” proteomics to guide accurate primary genome annotation of the bacterial pathogen Salmonella Typhimurium 14028 to facilitate a systems-level understanding of Salmonella biology. The data provides protein-level experimental confirmation for 44% of predicted protein-coding genes, suggests revisions to 48 genes assigned incorrect translational start sites, and uncovers 13 non-annotated genes missed by gene prediction programs. We also present a comprehensive analysis of post-translational processing events in Salmonella, revealing a wide range of complex chemical modifications (70 distinct modifications) and confirming more than 130 signal peptide and N-terminal methionine cleavage events in Salmonella. This study highlights several ways in which proteomics data applied during the primary stages of annotation can improve the quality of genome annotations, especially with regards to the annotation of mature protein products.

Ansong, Charles; Tolic, Nikola; Purvine, Samuel O.; Porwollik, Steffen; Jones, Marcus B.; Yoon, Hyunjin; Payne, Samuel H.; Martin, Jessica L.; Burnet, Meagan C.; Monroe, Matthew E.; Venepally, Pratap; Smith, Richard D.; Peterson, Scott; Heffron, Fred; Mcclelland, Michael; Adkins, Joshua N.

2011-08-25

290

TreeQ-VISTA: An Interactive Tree Visualization Tool withFunctional Annotation Query Capabilities  

SciTech Connect

Summary: We describe a general multiplatform exploratorytool called TreeQ-Vista, designed for presenting functional annotationsin a phylogenetic context. Traits, such as phenotypic and genomicproperties, are interactively queried from a relational database with auser-friendly interface which provides a set of tools for users with orwithout SQL knowledge. The query results are projected onto aphylogenetic tree and can be displayed in multiple color groups. A richset of browsing, grouping and query tools are provided to facilitatetrait exploration, comparison and analysis.Availability: The program,detailed tutorial and examples are available online athttp://genome-test.lbl.gov/vista/TreeQVista.

Gu, Shengyin; Anderson, Iain; Kunin, Victor; Cipriano, Michael; Minovitsky, Simon; Weber, Gunther; Amenta, Nina; Hamann, Bernd; Dubchak,Inna

2007-05-07

291

Gene co-expression network and function modules in three types of glioma.  

PubMed

The aim of the present study was to identify the disease?associated genes and their functions involved in the development of three types of glioma (astrocytoma, glioblastoma and oligodendroglioma) with DNA microarray technology, and to analyze their differences and correlations. First, the gene expression profile GSE4290 was downloaded from the Gene Expression Omnibus database, then the probe?level data were pre?processed and the differentially expressed genes (DEGs) were identified with limma package in R language. Gene functions of the selected DEGs were further analyzed with the Database for Annotation, Visualization and Integrated Discovery. After the co?expression network of DEGs was constructed by Cytoscape, the functional modules were mined and enrichment analysis was performed, and then the similarities and differences between any two types of glioma were compared. A total of 1151 genes between normal and astrocytoma tissues, 684 genes between normal and malignant glioma tissues, and 551 genes between normal and oligodendroglioma tissues were filtered as DEGs, respectively. By constructing co?expression networks of DEGs, a total of 77232, 455 and 987 interactions were involved in the differentially co?expressed networks of astrocytoma, oligodendroglioma and glioblastoma, respectively. The functions of DEGs were consistent with the modules in astrocytoma, glioblastoma and oligodendroglioma, which were mainly enriched in neuron signal transmission, immune responses and synthesis of organic acids, respectively. Model functions of astrocytoma and glioblastoma were similar (mainly related with immune response), while the model functions of oligodendroglioma differed markedly from that of the other two types. The identification of the associations among these three types of glioma has potential clinical utility for improving the diagnosis of different types of glioma in the future. In addition, these results have marked significance in studying the underlying mechanisms, distinguishing between normal and cancer tissues, and examining novel therapeutic strategies for patients with glioma. PMID:25435164

Li, Gang; Pan, Weiran; Yang, Xiaoxiao; Miao, Jinming

2015-04-01

292

Clock genes, pancreatic function, and diabetes.  

PubMed

Circadian physiology is responsible for the temporal regulation of metabolism to optimize energy homeostasis throughout the day. Disturbances in the light/dark cycle, sleep/wake schedule, or feeding/activity behavior can affect the circadian function of the clocks located in the brain and peripheral tissues. These alterations have been associated with impaired glucose tolerance and type 2 diabetes. Animal models with molecular manipulation of clock genes and genetic studies in humans also support these links. It has been demonstrated that the endocrine pancreas has an intrinsic self-sustained clock, and recent studies have revealed an important role of clock genes in pancreatic ? cells, glucose homeostasis, and diabetes. PMID:25457619

Vieira, Elaine; Burris, Thomas P; Quesada, Ivan

2014-11-01

293

Metagenomic Annotation Networks: Construction and Applications  

PubMed Central

The derivation and comparison of biological interaction networks are vital for understanding the functional capacity and hierarchical organization of integrated microbial communities. In the current work we present metagenomic annotation networks as a novel taxonomy-free approach for understanding the functional architecture of metagenomes. Specifically, metagenomic operon predictions are exploited to derive functional interactions that are translated and categorized according to their associated functional annotations. The result is a collection of discrete networks of weighted annotation linkages. These networks are subsequently examined for the occurrence of annotation modules that portray the functional and organizational characteristics of various microbial communities. A variety of network perspectives and annotation categories are applied to recover a diverse range of modules with different degrees of annotative cohesiveness. Applications to biocatalyst discovery and human health issues are discussed, as well as the limitations of the current implementation. PMID:22879885

Vey, Gregory; Moreno-Hagelsieb, Gabriel

2012-01-01

294

Systematic Identification of Cell-Wall Related Genes in Populus Based on Analysis of Functional Modules in Co-Expression Network  

PubMed Central

The identification of novel genes relevant to plant cell wall (PCW) biosynthesis in Populus is a highly important and challenging problem. We surveyed candidate Populus cell wall genes using a non-targeted approach. First, a genome-wide Populus gene co-expression network (PGCN) was constructed using microarray data available in the public domain. Module detection was then performed, followed by gene ontology (GO) enrichment analysis, to assign the functional category to these modules. Based on GO annotation, the modules involved in PCW biosynthesis were then selected and analyzed in detail to annotate the candidate PCW genes in these modules, including gene annotation, expression of genes in different tissues, and so on. We examined the overrepresented cis-regulatory elements (CREs) in the gene promoters to understand the possible transcriptionally co-regulated relationships among the genes within the functional modules of cell wall biosynthesis. PGCN contains 6,854 nodes (genes) with 324,238 edges. The topological properties of the network indicate scale-free and modular behavior. A total of 435 modules were identified; among which, 67 modules were identified by overrepresented GO terms. Six modules involved in cell wall biosynthesis were identified. Module 9 was mainly involved in cellular polysaccharide metabolic process in the primary cell wall, whereas Module 4 comprises genes involved in secondary cell wall biogenesis. In addition, we predicted and analyzed 10 putative CREs in the promoters of the genes in Module 4 and Module 9. The non-targeted approach of gene network analysis and the data presented here can help further identify and characterize cell wall related genes in Populus. PMID:24736620

Cai, Bin; Li, Cheng-Hui; Huang, Jian

2014-01-01

295

Ranking Biomedical Annotations with Annotator's Semantic Relevancy  

PubMed Central

Biomedical annotation is a common and affective artifact for researchers to discuss, show opinion, and share discoveries. It becomes increasing popular in many online research communities, and implies much useful information. Ranking biomedical annotations is a critical problem for data user to efficiently get information. As the annotator's knowledge about the annotated entity normally determines quality of the annotations, we evaluate the knowledge, that is, semantic relationship between them, in two ways. The first is extracting relational information from credible websites by mining association rules between an annotator and a biomedical entity. The second way is frequent pattern mining from historical annotations, which reveals common features of biomedical entities that an annotator can annotate with high quality. We propose a weighted and concept-extended RDF model to represent an annotator, a biomedical entity, and their background attributes and merge information from the two ways as the context of an annotator. Based on that, we present a method to rank the annotations by evaluating their correctness according to user's vote and the semantic relevancy between the annotator and the annotated entity. The experimental results show that the approach is applicable and efficient even when data set is large. PMID:24899918

2014-01-01

296

Ranking biomedical annotations with annotator's semantic relevancy.  

PubMed

Biomedical annotation is a common and affective artifact for researchers to discuss, show opinion, and share discoveries. It becomes increasing popular in many online research communities, and implies much useful information. Ranking biomedical annotations is a critical problem for data user to efficiently get information. As the annotator's knowledge about the annotated entity normally determines quality of the annotations, we evaluate the knowledge, that is, semantic relationship between them, in two ways. The first is extracting relational information from credible websites by mining association rules between an annotator and a biomedical entity. The second way is frequent pattern mining from historical annotations, which reveals common features of biomedical entities that an annotator can annotate with high quality. We propose a weighted and concept-extended RDF model to represent an annotator, a biomedical entity, and their background attributes and merge information from the two ways as the context of an annotator. Based on that, we present a method to rank the annotations by evaluating their correctness according to user's vote and the semantic relevancy between the annotator and the annotated entity. The experimental results show that the approach is applicable and efficient even when data set is large. PMID:24899918

Wu, Aihua

2014-01-01

297

Sequence-function-stability relationships in proteins from datasets of functionally annotated variants: the case of TEM ?-lactamases.  

PubMed

A dataset of TEM lactamase variants with different substrate and inhibition profiles was compiled and analyzed. Trends show that loops are the main evolvable regions in these enzymes, gradually accumulating mutations to generate increasingly complex functions. Notably, many mutations present in evolved enzymes are also found in simpler variants, probably originating functional promiscuity. Following a function-stability tradeoff, the increase in functional complexity driven by accumulation of mutations fosters the incorporation of other stability-restoring substitutions, although our analysis suggests they might not be as "global" as generally accepted and seem instead specific to different networks of protein sites. Finally, we show how this dataset can be used to model functional changes in TEMs based on the physicochemical properties of the amino acids. PMID:22850115

Abriata, Luciano A; Salverda, Merijn L M; Tomatis, Pablo E

2012-09-21

298

Functional Annotation of Two New Carboxypeptidases from the Amidohydrolase Superfamily of Enzymes  

SciTech Connect

Two proteins from the amidohydrolase superfamily of enzymes were cloned, expressed, and purified to homogeneity. The first protein, Cc0300, was from Caulobacter crescentus CB-15 (Cc0300), while the second one (Sgx9355e) was derived from an environmental DNA sequence originally isolated from the Sargasso Sea (gi|44371129). The catalytic functions and the substrate profiles for the two enzymes were determined with the aid of combinatorial dipeptide libraries. Both enzymes were shown to catalyze the hydrolysis of l-Xaa-l-Xaa dipeptides in which the amino acid at the N-terminus was relatively unimportant. These enzymes were specific for hydrophobic amino acids at the C-terminus. With Cc0300, substrates terminating in isoleucine, leucine, phenylalanine, tyrosine, valine, methionine, and tryptophan were hydrolyzed. The same specificity was observed with Sgx9355e, but this protein was also able to hydrolyze peptides terminating in threonine. Both enzymes were able to hydrolyze N-acetyl and N-formyl derivatives of the hydrophobic amino acids and tripeptides. The best substrates identified for Cc0300 were l-Ala-l-Leu with kcat and kcat/Km values of 37 s-1 and 1.1 x 105 M-1 s-1, respectively, and N-formyl-l-Tyr with kcat and kcat/Km values of 33 s-1 and 3.9 x 105 M-1 s-1, respectively. The best substrate identified for Sgx9355e was l-Ala-l-Phe with kcat and kcat/Km values of 0.41 s-1 and 5.8 x 103 M-1 s-1. The three-dimensional structure of Sgx9355e was determined to a resolution of 2.33 Angstroms with l-methionine bound in the active site. The a-carboxylate of the methionine is ion-paired to His-237 and also hydrogen bonded to the backbone amide groups of Val-201 and Leu-202. The a-amino group of the bound methionine interacts with Asp-328. The structural determinants for substrate recognition were identified and compared with other enzymes in this superfamily that hydrolyze dipeptides with different specificities.

Xiang, D.; Xu, C; Kumaran, D; Brown, A; Sauder, M; Burley, S; Swaminathan, S; Raushel, F

2009-01-01

299

An atlas of bovine gene expression reveals novel distinctive tissue characteristics and evidence for improving genome annotation  

Microsoft Academic Search

ABSTRACT: BACKGROUND: A comprehensive transcriptome survey, or gene atlas, provides information essential for a complete understanding of the genomic biology of an organism. We present an atlas of RNA abundance for 92 adult, juvenile and fetal cattle tissues and three cattle cell lines. RESULTS: The Bovine Gene Atlas was generated from 7.2 million unique digital gene expression tag sequences (300.2

Gregory P Harhay; Timothy PL Smith; Leeson J Alexander; Christian D Haudenschild; John W Keele; Lakshmi K Matukumalli; Steven G Schroeder; Curtis P Van Tassell; Cathy R Gresham; Susan M Bridges; Shane C Burgess; Tad S Sonstegard

2010-01-01

300

Annotation of a hybrid partial genome of the coffee rust (Hemileia vastatrix) contributes to the gene repertoire catalog of the Pucciniales  

PubMed Central

Coffee leaf rust caused by the fungus Hemileia vastatrix is the most damaging disease to coffee worldwide. The pathogen has recently appeared in multiple outbreaks in coffee producing countries resulting in significant yield losses and increases in costs related to its control. New races/isolates are constantly emerging as evidenced by the presence of the fungus in plants that were previously resistant. Genomic studies are opening new avenues for the study of the evolution of pathogens, the detailed description of plant-pathogen interactions and the development of molecular techniques for the identification of individual isolates. For this purpose we sequenced 8 different H. vastatrix isolates using NGS technologies and gathered partial genome assemblies due to the large repetitive content in the coffee rust hybrid genome; 74.4% of the assembled contigs harbor repetitive sequences. A hybrid assembly of 333 Mb was built based on the 8 isolates; this assembly was used for subsequent analyses. Analysis of the conserved gene space showed that the hybrid H. vastatrix genome, though highly fragmented, had a satisfactory level of completion with 91.94% of core protein-coding orthologous genes present. RNA-Seq from urediniospores was used to guide the de novo annotation of the H. vastatrix gene complement. In total, 14,445 genes organized in 3921 families were uncovered; a considerable proportion of the predicted proteins (73.8%) were homologous to other Pucciniales species genomes. Several gene families related to the fungal lifestyle were identified, particularly 483 predicted secreted proteins that represent candidate effector genes and will provide interesting hints to decipher virulence in the coffee rust fungus. The genome sequence of Hva will serve as a template to understand the molecular mechanisms used by this fungus to attack the coffee plant, to study the diversity of this species and for the development of molecular markers to distinguish races/isolates. PMID:25400655

Cristancho, Marco A.; Botero-Rozo, David Octavio; Giraldo, William; Tabima, Javier; Riaño-Pachón, Diego Mauricio; Escobar, Carolina; Rozo, Yomara; Rivera, Luis F.; Durán, Andrés; Restrepo, Silvia; Eilam, Tamar; Anikster, Yehoshua; Gaitán, Alvaro L.

2014-01-01

301

Genome-Wide SNP Genotyping to Infer the Effects on Gene Functions in Tomato  

PubMed Central

The genotype data of 7054 single nucleotide polymorphism (SNP) loci in 40 tomato lines, including inbred lines, F1 hybrids, and wild relatives, were collected using Illumina's Infinium and GoldenGate assay platforms, the latter of which was utilized in our previous study. The dendrogram based on the genotype data corresponded well to the breeding types of tomato and wild relatives. The SNPs were classified into six categories according to their positions in the genes predicted on the tomato genome sequence. The genes with SNPs were annotated by homology searches against the nucleotide and protein databases, as well as by domain searches, and they were classified into the functional categories defined by the NCBI's eukaryotic orthologous groups (KOG). To infer the SNPs' effects on the gene functions, the three-dimensional structures of the 843 proteins that were encoded by the genes with SNPs causing missense mutations were constructed by homology modelling, and 200 of these proteins were considered to carry non-synonymous amino acid substitutions in the predicted functional sites. The SNP information obtained in this study is available at the Kazusa Tomato Genomics Database (http://plant1.kazusa.or.jp/tomato/). PMID:23482505

Hirakawa, Hideki; Shirasawa, Kenta; Ohyama, Akio; Fukuoka, Hiroyuki; Aoki, Koh; Rothan, Christophe; Sato, Shusei; Isobe, Sachiko; Tabata, Satoshi

2013-01-01

302

Computer-Based Annotation of Putative AraC/XylS-Family Transcription Factors of Known Structure but Unknown Function  

PubMed Central

Currently, about 20 crystal structures per day are released and deposited in the Protein Data Bank. A significant fraction of these structures is produced by research groups associated with the structural genomics consortium. The biological function of many of these proteins is generally unknown or not validated by experiment. Therefore, a growing need for functional prediction of protein structures has emerged. Here we present an integrated bioinformatics method that combines sequence-based relationships and three-dimensional (3D) structural similarity of transcriptional regulators with computer prediction of their cognate DNA binding sequences. We applied this method to the AraC/XylS family of transcription factors, which is a large family of transcriptional regulators found in many bacteria controlling the expression of genes involved in diverse biological functions. Three putative new members of this family with known 3D structure but unknown function were identified for which a probable functional classification is provided. Our bioinformatics analyses suggest that they could be involved in plant cell wall degradation (Lin2118 protein from Listeria innocua, PDB code 3oou), symbiotic nitrogen fixation (protein from Chromobacterium violaceum, PDB code 3oio), and either metabolism of plant-derived biomass or nitrogen fixation (protein from Rhodopseudomonas palustris, PDB code 3mn2). PMID:22505803

Schüller, Andreas; Slater, Alex W.; Norambuena, Tomás; Cifuentes, Juan J.; Almonacid, Leonardo I.; Melo, Francisco

2012-01-01

303

Neural Networks Approaches for Discovering the Learnable Correlation between Gene Function and Gene  

E-print Network

applications. Identifying gene function based on gene expression data is much easier in prokaryotes than eukaryotes due to the relatively simple structure of prokaryotes. Recent studies have shown in many ways, especially in Gene Therapy [18]. Identifying gene function in prokaryotes is much easier

Bonner, Anthony

304

ACID: annotation of cassette and integron data  

E-print Network

Background: Although integrons and their associated gene cassettes are present in ~10% of bacteria and can represent up to 3% of the genome in which they are found, very few have been properly identified and annotated in ...

Boucher, Yan

305

Analysis of the mouse transcriptome based on functional annotation of 60,770 full-length cDNAs  

Microsoft Academic Search

Only a small proportion of the mouse genome is transcribed into mature messenger RNA transcripts. There is an international collaborative effort to identify all full-length mRNA transcripts from the mouse, and to ensure that each is represented in a physical collection of clones. Here we report the manual annotation of 60,770 full-length mouse complementary DNA sequences. These are clustered into

Y. Okazaki; M. Furuno; T. Kasukawa; J. Adachi; H. Bono; S. Kondo; I. Nikaido; N. Osato; R. Saito; H. Suzuki; I. Yamanaka; H. Kiyosawa; K. Yagi; Y. Tomaru; Y. Hasegawa; A. Nogami; C. Schönbach; T. Gojobori; R. Baldarelli; D. P. Hill; C. Bult; D. A. Hume; J. Quackenbush; L. M. Schriml; A. Kanapin; H. Matsuda; S. Batalov; K. W. Beisel; J. A. Blake; D. Bradt; V. Brusic; C. Chothia; L. E. Corbani; S. Cousins; E. Dalla; T. A. Dragani; C. F. Fletcher; A. Forrest; K. S. Frazer; T. Gaasterland; M. Gariboldi; C. Gissi; A. Godzik; J. Gough; S. Grimmond; S. Gustincich; N. Hirokawa; I. J. Jackson; E. D. Jarvis; A. Kanai; H. Kawaji; Y. Kawasawa; R. M. Kedzierski; B. L. King; A. Konagaya; I. V. Kurochkin; Y. Lee; B. Lenhard; P. A. Lyons; D. R. Maglott; L. Maltais; L. Marchionni; L. McKenzie; H. Miki; T. Nagashima; K. Numata; T. Okido; W. J. Pavan; G. Pertea; G. Pesole; N. Petrovsky; R. Pillai; J. U. Pontius; D. Qi; S. Ramachandran; T. Ravasi; J. C. Reed; D. J. Reed; J. Reid; B. Z. Ring; M. Ringwald; A. Sandelin; C. Schneider; C. A. M. Semple; M. Setou; K. Shimada; R. Sultana; Y. Takenaka; M. S. Taylor; R. D. Teasdale; M. Tomita; R. Verardo; L. Wagner; C. Wahlestedt; Y. Wang; Y. Watanabe; C. Wells; L. G. Wilming; A. Wynshaw-Boris; M. Yanagisawa; I. Yang; L. Yang; Z. Yuan; M. Zavolan; Y. Zhu; A. Zimmer; P. Carninci; N. Hayatsu; T. Hirozane-Kishikawa; H. Konno; M. Nakamura; N. Sakazume; K. Sato; T. Shiraki; K. Waki; J. Kawai; K. Aizawa; T. Arakawa; S. Fukuda; A. Hara; W. Hashizume; K. Imotani; Y. Ishii; M. Itoh; I. Kagawa; A. Miyazaki; K. Sakai; D. Sasaki; K. Shibata; A. Shinagawa; A. Yasunishi; M. Yoshino; R. Waterston; E. S. Lander; J. Rogers; E. Birney; Y. Hayashizaki

2002-01-01

306

An atlas of bovine gene expression reveals novel distinctive tissue characteristics and evidence for improving genome annotation  

Technology Transfer Automated Retrieval System (TEKTRAN)

Background A comprehensive transcriptome survey, or gene atlas, provides information essential for a complete understanding of the genomic biology of an organism. We present an atlas of RNA abundance for 92 adult, juvenile and fetal cattle tissues and three cattle cell lines. Results The Bovine Gene...

307

MAGPIE/EGRET Annotation of the 2.9-Mb Drosophila melanogaster Adh Region  

PubMed Central

Our challenge in annotating the 2.91-Mb Adh region of the Drosophila melanogaster genome was to identify genetic and genomic features automatically, completely, and precisely within a 6-week period. To do so, we augmented the MAGPIE microbial genome annotation system to handle eukaryotic genomic sequence data. The new configuration required the integration of eukaryotic gene-finding tools and DNA repeat tools into the automatic data collection module. It also required us to define in MAGPIE new strategies to combine data about eukaryotic exon predictions with functional data to refine the exon predictions. At the heart of the resulting new eukaryotic genome annotation system is a reverse comparison of public protein and complementary DNA sequences against the input genome to identify missing exons and to refine exon boundaries. The software modules that add eukaryotic genome annotation capability to MAGPIE are available as EGRET (Eukaryotic Genome Rapid Evaluation Tool). PMID:10779489

Gaasterland, Terry; Sczyrba, Alexander; Thomas, Elizabeth; Aytekin-Kurban, Gulriz; Gordon, Paul; Sensen, Christoph W.

2000-01-01

308

A Prototype System for Retrieval of Gene Functional Information  

PubMed Central

Microarrays allow researchers to gather data about the expression patterns of thousands of genes simultaneously. Statistical analysis can reveal which genes show statistically significant results. Making biological sense of those results requires the retrieval of functional information about the genes thus identified, typically a manual gene-by-gene retrieval of information from various on-line databases. For experiments generating thousands of genes of interest, retrieval of functional information can become a significant bottleneck. To address this issue, we are currently developing a prototype system to automate the process of retrieval of functional information from multiple on-line sources. PMID:14728346

Folk, Lillian C.; Patrick, Timothy B.; Pattison, James S.; Wolfinger, Russell D.; Mitchell, Joyce A.

2003-01-01

309

Functional genomics and molecular networks Gene expression regulations  

E-print Network

ontology categorization has allowed a more functional view on the list of differentially ex- pressed genes. Gene ontology now groups 28 154 terms with 19 913 for biological_process, 2 775 for cellular gene expression studies have always the same question in mind which is: what are the genes dif

310

Information Content-Based Gene Ontology Functional Similarity Measures: Which One to Use for a Given Biological Data Type?  

PubMed Central

The current increase in Gene Ontology (GO) annotations of proteins in the existing genome databases and their use in different analyses have fostered the improvement of several biomedical and biological applications. To integrate this functional data into different analyses, several protein functional similarity measures based on GO term information content (IC) have been proposed and evaluated, especially in the context of annotation-based measures. In the case of topology-based measures, each approach was set with a specific functional similarity measure depending on its conception and applications for which it was designed. However, it is not clear whether a specific functional similarity measure associated with a given approach is the most appropriate, given a biological data set or an application, i.e., achieving the best performance compared to other functional similarity measures for the biological application under consideration. We show that, in general, a specific functional similarity measure often used with a given term IC or term semantic similarity approach is not always the best for different biological data and applications. We have conducted a performance evaluation of a number of different functional similarity measures using different types of biological data in order to infer the best functional similarity measure for each different term IC and semantic similarity approach. The comparisons of different protein functional similarity measures should help researchers choose the most appropriate measure for the biological application under consideration. PMID:25474538

Mazandu, Gaston K.; Mulder, Nicola J.

2014-01-01

311

Classification using functional data analysis for temporal gene expression data  

Microsoft Academic Search

Motivation: Temporal gene expression profiles provide an important characterization of gene function, as biological systems are pre- dominantly developmental and dynamic. We propose a method of classifying collections of temporal gene expression curves in which individual expression profiles are modeled as independent realizati- ons of a stochastic process. The method uses a recently developed functional logistic regression tool based on

Xiaoyan Leng; Hans-georg Müller

2006-01-01

312

INVESTIGATION Gene Functional Trade-Offs and the Evolution  

E-print Network

INVESTIGATION Gene Functional Trade-Offs and the Evolution of Pleiotropy Frédéric Guillaume*,1, Vancouver, British Columbia, Canada V6T 1Z4 ABSTRACT Pleiotropy is the property of genes affecting multiple functions or characters of an organism. Genes vary widely in their degree of pleiotropy, but this variation

Otto, Sarah

313

Mining Association Rules among Gene Functions in Clusters of Similar Gene Expression Maps  

PubMed Central

Association rules mining methods have been recently applied to gene expression data analysis to reveal relationships between genes and different conditions and features. However, not much effort has focused on detecting the relation between gene expression maps and related gene functions. Here we describe such an approach to mine association rules among gene functions in clusters of similar gene expression maps on mouse brain. The experimental results show that the detected association rules make sense biologically. By inspecting the obtained clusters and the genes having the gene functions of frequent itemsets, interesting clues were discovered that provide valuable insight to biological scientists. Moreover, discovered association rules can be potentially used to predict gene functions based on similarity of gene expression maps.

An, Li; Obradovic, Zoran; Smith, Desmond; Bodenreider, Olivier; Megalooikonomou, Vasileios

2015-01-01

314

MicroRNA expression profiling and functional annotation analysis of their targets associated with the malignant transformation of oral leukoplakia.  

PubMed

In spite the tremendous achievements that have been acquired in the field of molecular biology, the underlying mechanism associated with malignant transformed oral leukoplakia (OLK) is still unclear and poorly understood. The aim of this study is to investigate the microRNA (miRNA) expression profiles in OLK and its aggressive transformed tissues from the white lesion of human oral mucosa. The original miRNA expression dataset was downloaded from Gene Expression Omnibus (GEO) database and differentially expressed miRNAs were identified using two-sample t test method. Unsupervised hierarchical clustering and principal component analysis of these differentially expressed miRNAs indicated that 38-miRNA candidates could significantly discriminate OLK from malignant transformed oral mucosa samples. Besides, potential transcription factors were predicted using CyTargetLinker plugin and the miRNA-mRNA regulatory network associated with the malignant pathogenesis was visualized in Cytoscape environment. Totally, 3-miRNA signatures (miR-129-5p, miR-339-5p and miR-31*) were found to be hubs that mediated the initiation and progression of OLK from the non-malignant to the aggressive one via targeting various transcription factors. Functional enrichment analysis based on Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) suggested that the dysregulation of immune response was responsible for oral carcinogenesis. In conclusion, we constructed a miRNA-mRNA regulatory network associated with the malignant transformation of OLK, and screened out some miRNAs and transcription factors that may have prominent roles during OLK malignant progression. PMID:25576219

Maimaiti, Aikebaier; Abudoukeremu, Kaisaier; Tie, Lu; Pan, Yan; Li, Xuejun

2015-03-10

315

Research article The functional landscape of mouse gene expression  

E-print Network

organisms such as yeast. Although the idea that tissue-specific expression is indicative of gene function microarray expression data for nearly 40,000 known and predicted mRNAs in 55 mouse tissues, using custom function. Hundreds of functional categories, as defined by Gene Ontology `Biological Processes

Frey, Brendan J.

316

Towards an Informative Mutant Phenotype for Every Bacterial Gene  

PubMed Central

Mutant phenotypes provide strong clues to the functions of the underlying genes and could allow annotation of the millions of sequenced yet uncharacterized bacterial genes. However, it is not known how many genes have a phenotype under laboratory conditions, how many phenotypes are biologically interpretable for predicting gene function, and what experimental conditions are optimal to maximize the number of genes with a phenotype. To address these issues, we measured the mutant fitness of 1,586 genes of the ethanol-producing bacterium Zymomonas mobilis ZM4 across 492 diverse experiments and found statistically significant phenotypes for 89% of all assayed genes. Thus, in Z. mobilis, most genes have a functional consequence under laboratory conditions. We demonstrate that 41% of Z. mobilis genes have both a strong phenotype and a similar fitness pattern (cofitness) to another gene, and are therefore good candidates for functional annotation using mutant fitness. Among 502 poorly characterized Z. mobilis genes, we identified a significant cofitness relationship for 174. For 57 of these genes without a specific functional annotation, we found additional evidence to support the biological significance of these gene-gene associations, and in 33 instances, we were able to predict specific physiological or biochemical roles for the poorly characterized genes. Last, we identified a set of 79 diverse mutant fitness experiments in Z. mobilis that are nearly as biologically informative as the entire set of 492 experiments. Therefore, our work provides a blueprint for the functional annotation of diverse bacteria using mutant fitness. PMID:25112473

Deutschbauer, Adam; Price, Morgan N.; Wetmore, Kelly M.; Tarjan, Daniel R.; Xu, Zhuchen; Shao, Wenjun; Leon, Dacia

2014-01-01

317

Neural networks approaches for discovering the learnable correlation between gene function and gene expression in mouse  

E-print Network

function based on gene expression data is much easier in prokaryotes than eukaryotes due to the relatively simple structure of prokaryotes. Recent studies have shown that there is a strong learnable correlation, especially in gene therapy [18]. Identifying gene function in prokaryotes is much easier than eukaryotes due

Morris, Quaid

318

Speeding disease gene discovery by sequence based candidate prioritization   

E-print Network

Background: Regions of interest identified through genetic linkage studies regularly exceed 30 centimorgans in size and can contain hundreds of genes. Traditionally this number is reduced by matching functional annotation ...

Adie, Euan A; Adams, Richard R; Evans, Kathryn L; Porteous, David; Pickard, Ben S

2005-03-14

319

Integration of biological networks and gene expression data using Cytoscape  

E-print Network

) identifying enriched Gene Ontology annotations in the network. These steps provide a broad sample of the types measurement of expression profiles and functional interactions from the cells and tissues of many different

320

Global functional proling of gene expression  

Microsoft Academic Search

The typical result of a microarray experiment is a list of tens or hundreds of genes found to be dieren tially regulated in the condition under study. Independently of the methods used to select these genes, the common task faced by any researcher is to translate these lists of genes into a better understanding of the biological phenomena involved. Currently,

Sorin Draghici; Purvesh Khatri; Rui P. Martins; G. Charles Ostermeier; Stephen A. Krawetz

321

Writing Annotation Instructions Janyce Wiebe  

E-print Network

Writing Annotation Instructions Janyce Wiebe Department of Computer Science and the Computing Strategies for Writing Annotation Instructions In two corpus annotation projects, we followed similar strategies for developing annotation instructions and obtained good inter-coder reliability results for both

Wiebe, Janyce M.

322

Identification and Evaluation of Functional Modules in Gene Coexpression Networks  

Microsoft Academic Search

Identifying gene functional modules is an im- portant step towards elucidating gene func- tions at a global scale. In this paper, we introduce a simple method to construct gene co-expression networks from microarray data, and then propose an efficient spectral clustering algorithm to identify natural com- munities, which are relatively densely con- nected sub-graphs, in the network. To as- sess

Jianhua Ruan; Weixiong Zhang

2006-01-01

323

Tissue-specific functions based on information content of gene ontology using cap analysis gene expression  

Microsoft Academic Search

Gene expressions differ depending on tissue types and developmental stages. Analyzing how each gene is expressed is thus important.\\u000a One way of analyzing gene expression patterns is to identify tissue-specific functions. This is useful for understanding how\\u000a vital activities are performed. DNA microarray has been widely used to observe gene expressions exhaustively. However, comparing\\u000a the expression value of a gene

Sami Maekawa; Atsuko Matsumoto; Yoichi Takenaka; Hideo Matsuda

2007-01-01

324

Annotation of the Drosophila melanogaster euchromatic genome: a systematic review  

Microsoft Academic Search

BACKGROUND: The recent completion of the Drosophila melanogaster genomic sequence to high quality and the availability of a greatly expanded set of Drosophila cDNA sequences, aligning to 78% of the predicted euchromatic genes, afforded FlyBase the opportunity to significantly improve genomic annotations. We made the annotation process more rigorous by inspecting each gene visually, utilizing a comprehensive set of curation

Sima Misra; Madeline A Crosby; Christopher J Mungall; Beverley B Matthews; Kathryn S Campbell; Pavel Hradecky; Yanmei Huang; Joshua S Kaminker; Gillian H Millburn; Simon E Prochnik; Christopher D Smith; Jonathan L Tupy; Eleanor J Whitfield; Leyla Bayraktaroglu; Benjamin P Berman; Brian R Bettencourt; Susan E Celniker; Aubrey DNJ de Grey; Rachel A Drysdale; Nomi L Harris; John Richter; Susan Russo; Andrew J Schroeder; ShengQiang Shu; Mark Stapleton; Chihiro Yamada; Michael Ashburner; William M Gelbart; Gerald M Rubin; Suzanna E Lewis

2002-01-01

325

Teachers Reference: Annotations  

NSDL National Science Digital Library

This collection of 171 annotations was written to enhance and explain the text of the book 'Stone Wall Secrets'. Each annotation consists of a number that refers specifically to the phrase preceding it. Each annotation number is followed by three indexing elements: subject category, one or more keywords, and one or more sample questions with answers.

326

Geochip-Based Functional Gene Analysis of Anodophilic  

E-print Network

that the functional and phylogenetic diversity of MEC microbial communities after 4 months was quite high despite microbial diversity. Multivariateanalysesshowedthatcommunitiesthatdeveloped in the MECs were well separatedGeochip-Based Functional Gene Analysis of Anodophilic Communities in Microbial Electrolysis Cells

327

Gametocidal genes in wheat and its relatives. IV. Functional relationships between six gametocidal genes.  

PubMed

Gametocidal (Gc) genes in Aegilops species are known to cause gamete abortion and chromosome breakage when they are introduced into the wheat genetic background. Interactions of five Gc genes so far identified were investigated by analysis of wheat hybrids among lines carrying different gametocidal genes. As a result, the genes were classified into three functional groups. The first group includes two Gc genes of Ae. speltoides (Gc1a and Gc1b) and one gene (Gc-Sl3) on chromosome 2S1 of Ae. sharonensis. These genes were hypostatic to the genes (Gc-Sl1, Gc-Sl2) on chromosome 4S1 of Ae. longissima and Ae. sharonensis, which constitute the second group. In addition, plants carrying Gc genes of both the first and the second group produced progeny with higher frequencies of chromosome breakage than those found in the progeny of single gene carriers. It was concluded that there were specific interactions between these genes to enhance chromosome breakage. On the other hand, there was no interaction between the Gc gene (Gc-C) of Ae. triuncialis, the third group, and Gc genes belonging to the former two groups. These functional groups might be a reflection of the mechanisms by which Gc genes induce gamete abortion and chromosome breakage. Based on functional and local relationships, the symbols of the Gc genes were systematically redesignated. PMID:18470167

Tsujimoto, H

1995-04-01

328

Range of potential functions of the Drosophila melanogaster hdc gene  

Microsoft Academic Search

The Drosophila melanogaster hdc gene controls trachea branching, which starts during embryo development. Expression in imaginal disks and reproductive organs\\u000a suggests additional functions for the hdc gene. The gene was demonstrated to have a maternal effect, which was denied previously. Analysis of cell proliferation in\\u000a imaginal disks with hdc mutations showed that the gene does not possess tumor suppressor properties

A. M. Gusachenko; E. M. Akhmamet’eva; L. V. Omel’yanchuk

2006-01-01

329

Recent achievement in gene cloning and functional genomics in soybean.  

PubMed

Soybean is a model plant for photoperiodism as well as for symbiotic nitrogen fixation. However, a rather low efficiency in soybean transformation hampers functional analysis of genes isolated from soybean. In comparison, rapid development and progress in flowering time and photoperiodic response have been achieved in Arabidopsis and rice. As the soybean genomic information has been released since 2008, gene cloning and functional genomic studies have been revived as indicated by successfully characterizing genes involved in maturity and nematode resistance. Here, we review some major achievements in the cloning of some important genes and some specific features at genetic or genomic levels revealed by the analysis of functional genomics of soybean. PMID:24311973

Xia, Zhengjun; Zhai, Hong; Lü, Shixiang; Wu, Hongyan; Zhang, Yupeng

2013-01-01

330

Gene expression modeling through positive boolean functions  

Microsoft Academic Search

Abstract In the framework of gene expression data analysis, the selection of biologically rel- evant sets of genes and the discovery of new subclasses of diseases at bio-molecular level represent two significant problems. Unfortunately, in both cases the correct solution is usually unknown,and the evaluation of the performance of gene selection and clustering methods is dicult,and in many cases unfeasible.

Francesca Ruffino; Marco Muselli; Giorgio Valentini

2008-01-01

331

Biosynthesis of Akaeolide and Lorneic Acids and Annotation of Type I Polyketide Synthase Gene Clusters in the Genome of Streptomyces sp. NPS554.  

PubMed

The incorporation pattern of biosynthetic precursors into two structurally unique polyketides, akaeolide and lorneic acid A, was elucidated by feeding experiments with 13C-labeled precursors. In addition, the draft genome sequence of the producer, Streptomyces sp. NPS554, was performed and the biosynthetic gene clusters for these polyketides were identified. The putative gene clusters contain all the polyketide synthase (PKS) domains necessary for assembly of the carbon skeletons. Combined with the 13C-labeling results, gene function prediction enabled us to propose biosynthetic pathways involving unusual carbon-carbon bond formation reactions. Genome analysis also indicated the presence of at least ten orphan type I PKS gene clusters that might be responsible for the production of new polyketides. PMID:25603349

Zhou, Tao; Komaki, Hisayuki; Ichikawa, Natsuko; Hosoyama, Akira; Sato, Seizo; Igarashi, Yasuhiro

2015-01-01

332

Biosynthesis of Akaeolide and Lorneic Acids and Annotation of Type I Polyketide Synthase Gene Clusters in the Genome of Streptomyces sp. NPS554  

PubMed Central

The incorporation pattern of biosynthetic precursors into two structurally unique polyketides, akaeolide and lorneic acid A, was elucidated by feeding experiments with 13C-labeled precursors. In addition, the draft genome sequence of the producer, Streptomyces sp. NPS554, was performed and the biosynthetic gene clusters for these polyketides were identified. The putative gene clusters contain all the polyketide synthase (PKS) domains necessary for assembly of the carbon skeletons. Combined with the 13C-labeling results, gene function prediction enabled us to propose biosynthetic pathways involving unusual carbon-carbon bond formation reactions. Genome analysis also indicated the presence of at least ten orphan type I PKS gene clusters that might be responsible for the production of new polyketides. PMID:25603349

Zhou, Tao; Komaki, Hisayuki; Ichikawa, Natsuko; Hosoyama, Akira; Sato, Seizo; Igarashi, Yasuhiro

2015-01-01

333

Annotating Enzymes of Uncertain Function: The Deacylation of d-Amino Acids by Members of the Amidohydrolase Superfamily  

SciTech Connect

The catalytic activities of three members of the amidohydrolase superfamily were discovered using amino acid substrate libraries. Bb3285 from Bordetella bronchiseptica, Gox1177 from Gluconobacter oxidans, and Sco4986 from Streptomyces coelicolor are currently annotated as d-aminoacylases or N-acetyl-d-glutamate deacetylases. These three enzymes are 22-34% identical to one another in amino acid sequence. Substrate libraries containing nearly all combinations of N-formyl-d-Xaa, N-acetyl-d-Xaa, N-succinyl-d-Xaa, and l-Xaa-d-Xaa were used to establish the substrate profiles for these enzymes. It was demonstrated that Bb3285 is restricted to the hydrolysis of N-acyl-substituted derivatives of d-glutamate. The best substrates for this enzyme are N-formyl-d-glutamate (k{sub cat}/K{sub m} = 5.8 x 10{sup 6} M{sup -1} s{sup -1}), N-acetyl-d-glutamate (k{sub cat}/K{sub m} = 5.2 x 10{sup 6} M{sup -1} s{sup -1}), and l-methionine-d-glutamate (k{sub cat}/K{sub m} = 3.4 x 10{sup 5} M{sup -1} s{sup -1}). Gox1177 and Sco4986 preferentially hydrolyze N-acyl-substituted derivatives of hydrophobic d-amino acids. The best substrates for Gox1177 are N-acetyl-d-leucine (k{sub cat}/K{sub m} = 3.2 x 104 M{sup -1} s-1), N-acetyl-d-tryptophan (kcat/Km = 4.1 x 104 M-1 s-1), and l-tyrosine-d-leucine (kcat/Km = 1.5 x 104 M-1 s-1). A fourth protein, Bb2785 from B. bronchiseptica, did not have d-aminoacylase activity. The best substrates for Sco4986 are N-acetyl-d-phenylalanine and N-acetyl-d-tryptophan. The three-dimensional structures of Bb3285 in the presence of the product acetate or a potent mimic of the tetrahedral intermediate were determined by X-ray diffraction methods. The side chain of the d-glutamate moiety of the inhibitor is ion-paired to Arg-295, while the {alpha}-carboxylate is ion-paired with Lys-250 and Arg-376. These results have revealed the chemical and structural determinants for substrate specificity in this protein. Bioinformatic analyses of an additional {approx}250 sequences identified as members of this group suggest that there are no simple motifs that allow prediction of substrate specificity for most of these unknowns, highlighting the challenges for computational annotation of some groups of homologous proteins.

Cummings, J.; Fedorov, A; Xu, C; Brown, S; Fedorov, E; Babbitt, P; Almo, S; Raushel, F

2009-01-01

334

Annotating Enzymes of Uncertain Function: The Deacylation of d-Amino Acids by Members of the Amidohydrolase Superfamily†  

PubMed Central

The catalytic activities of three members of the amidohydrolase superfamily were discovered using amino acid substrate libraries. Bb3285 from Bordetella bronchiseptica, Gox1177 from Gluconobacter oxydans, and Sco4986 from Streptomyces coelicolor are currently annotated as d-aminoacylases or N-acetyl-d-glutamate deacetylases. These three enzymes are 22?34% identical to one another in amino acid sequence. Substrate libraries containing nearly all combinations of N-formyl-d-Xaa, N-acetyl-d-Xaa, N-succinyl-d-Xaa, and l-Xaa-d-Xaa were used to establish the substrate profiles for these enzymes. It was demonstrated that Bb3285 is restricted to the hydrolysis of N-acyl substituted derivatives of d-glutamate. The best substrates for this enzyme are N-formyl-d-glutamate (kcat/Km = 5.8 × 106 M?1 s?1), N-acetyl-d-glutamate (kcat/Km = 5.2 × 106 M?1 s?1) and l-methionine-d-glutamate (kcat/Km = 3.4 × 105 M?1 s?1). Gox1177 and Sco4986 preferentially hydrolyze N-acyl substituted derivatives of hydrophobic d-amino acids. The best substrates for Gox1177 are N-acetyl-d-leucine (kcat/Km = 3.2 × 104 M?1 s?1), N-acetyl-d-tryptophan (kcat/Km = 4.1 × 104 M?1 s?1) and l-tyrosine-d-leucine (kcat/Km = 1.5 × 104 M?1 s?1). A fourth protein, Bb2785 from B. bronchiseptica, did not have d-aminoacylase activity. The best substrates for Sco4986 are N-acetyl-d-phenylalanine and N-acetyl-d-tryptophan. The three-dimensional structures of Bb3285 in the presence of the product acetate or a potent mimic of the tetrahedral intermediate were determined by X-ray diffraction methods. The side chain of the d-glutamate moiety of the inhibitor is ion-paired to Arg-295 while the ?-carboxylate is ion-paired with Lys-250 and Arg-376. These results have revealed the chemical and structural determinants for substrate specificity in this protein. Bioinformatic analyses of an additional ?250 sequences identified as members of this group suggest that there are no simple motifs that allow prediction of substrate specificity for most of these unknowns, highlighting the challenges for computational annotation of some groups of homologous proteins. PMID:19518059

Cummings, Jennifer; Fedorov, Alexander A.; Xu, Chengfu; Brown, Shoshana; Fedorov, Elena; Babbitt, Patricia C.; Almo, Steven C.; Raushel, Frank M.

2009-01-01

335

Short communication The MitoDrome database annotates and compares the OXPHOS nuclear genes of Drosophila melanogaster, Drosophila pseudoobscura and Anopheles gambiae  

Microsoft Academic Search

The oxidative phosphorylation (OXPHOS) is the primary energy-producing process of all aerobic organisms and the only cellular function under the dual control of both the mitochondrial and the nuclear genomes. Functional characterization and evolutionary study of the OXPHOS system is of great importance for the understanding of many as yet unclear aspects of nucleus-mitochondrion genomic co-evolution and co-regulation gene networks.

Domenica D'Elia; Domenico Catalano; Flavio Licciulli; Antonio Turi; Gaetano Tripoli; Damiano Porcelli; Cecilia Saccone; Corrado Caggese

336

The Distributed Annotation System  

PubMed Central

Background Currently, most genome annotation is curated by centralized groups with limited resources. Efforts to share annotations transparently among multiple groups have not yet been satisfactory. Results Here we introduce a concept called the Distributed Annotation System (DAS). DAS allows sequence annotations to be decentralized among multiple third-party annotators and integrated on an as-needed basis by client-side software. The communication between client and servers in DAS is defined by the DAS XML specification. Annotations are displayed in layers, one per server. Any client or server adhering to the DAS XML specification can participate in the system; we describe a simple prototype client and server example. Conclusions The DAS specification is being used experimentally by Ensembl, WormBase, and the Berkeley Drosophila Genome Project. Continued success will depend on the readiness of the research community to adopt DAS and provide annotations. All components are freely available from the project website . PMID:11667947

Dowell, Robin D; Jokerst, Rodney M; Day, Allen; Eddy, Sean R; Stein, Lincoln

2001-01-01

337

The Neural\\/Immune Gene Ontology: clipping the Gene Ontology for neurological and immunological systems  

Microsoft Academic Search

BACKGROUND: The Gene Ontology (GO) is used to describe genes and gene products from many organisms. When used for functional annotation of microarray data, GO is often slimmed by editing so that only higher level terms remain. This practice is designed to improve the summarizing of experimental results by grouping high level terms and the statistical power of GO term

Nophar Geifman; Alon Monsonego; Eitan Rubin

2010-01-01

338

Towards the identification of protein complexes and functional modules by integrating PPI network and gene expression data  

PubMed Central

Background Identification of protein complexes and functional modules from protein-protein interaction (PPI) networks is crucial to understanding the principles of cellular organization and predicting protein functions. In the past few years, many computational methods have been proposed. However, most of them considered the PPI networks as static graphs and overlooked the dynamics inherent within these networks. Moreover, few of them can distinguish between protein complexes and functional modules. Results In this paper, a new framework is proposed to distinguish between protein complexes and functional modules by integrating gene expression data into protein-protein interaction (PPI) data. A series of time-sequenced subnetworks (TSNs) is constructed according to the time that the interactions were activated. The algorithm TSN-PCD was then developed to identify protein complexes from these TSNs. As protein complexes are significantly related to functional modules, a new algorithm DFM-CIN is proposed to discover functional modules based on the identified complexes. The experimental results show that the combination of temporal gene expression data with PPI data contributes to identifying protein complexes more precisely. A quantitative comparison based on f-measure reveals that our algorithm TSN-PCD outperforms the other previous protein complex discovery algorithms. Furthermore, we evaluate the identified functional modules by using “Biological Process” annotated in GO (Gene Ontology). The validation shows that the identified functional modules are statistically significant in terms of “Biological Process”. More importantly, the relationship between protein complexes and functional modules are studied. Conclusions The proposed framework based on the integration of PPI data and gene expression data makes it possible to identify protein complexes and functional modules more effectively. Moveover, the proposed new framework and algorithms can distinguish between protein complexes and functional modules. Our findings suggest that functional modules are closely related to protein complexes and a functional module may consist of one or multiple protein complexes. The program is available at http://netlab.csu.edu.cn/bioinfomatics/limin/DFM-CIN/index.html. PMID:22621308

2012-01-01

339

SemFunSim: A New Method for Measuring Disease Similarity by Integrating Semantic and Gene Functional Association  

PubMed Central

Background Measuring similarity between diseases plays an important role in disease-related molecular function research. Functional associations between disease-related genes and semantic associations between diseases are often used to identify pairs of similar diseases from different perspectives. Currently, it is still a challenge to exploit both of them to calculate disease similarity. Therefore, a new method (SemFunSim) that integrates semantic and functional association is proposed to address the issue. Methods SemFunSim is designed as follows. First of all, FunSim (Functional similarity) is proposed to calculate disease similarity using disease-related gene sets in a weighted network of human gene function. Next, SemSim (Semantic Similarity) is devised to calculate disease similarity using the relationship between two diseases from Disease Ontology. Finally, FunSim and SemSim are integrated to measure disease similarity. Results The high average AUC (area under the receiver operating characteristic curve) (96.37%) shows that SemFunSim achieves a high true positive rate and a low false positive rate. 79 of the top 100 pairs of similar diseases identified by SemFunSim are annotated in the Comparative Toxicogenomics Database (CTD) as being targeted by the same therapeutic compounds, while other methods we compared could identify 35 or less such pairs among the top 100. Moreover, when using our method on diseases without annotated compounds in CTD, we could confirm many of our predicted candidate compounds from literature. This indicates that SemFunSim is an effective method for drug repositioning. PMID:24932637

Ju, Peng; Peng, Jiajie; Wang, Yadong

2014-01-01

340

Learning from Multiple Annotators with Gaussian Processes  

E-print Network

, annotators can be radiologists [1, 2] who provide a subjective (possible noisy) opinion about a suspicious and outputs occurring in D. We assume that D is generated by an unknown function f : RD R where

Groot, Perry

341

Using deep RNA sequencing for the structural annotation of the laccaria bicolor mycorrhizal transcriptome.  

SciTech Connect

Accurate structural annotation is important for prediction of function and required for in vitro approaches to characterize or validate the gene expression products. Despite significant efforts in the field, determination of the gene structure from genomic data alone is a challenging and inaccurate process. The ease of acquisition of transcriptomic sequence provides a direct route to identify expressed sequences and determine the correct gene structure. We developed methods to utilize RNA-seq data to correct errors in the structural annotation and extend the boundaries of current gene models using assembly approaches. The methods were validated with a transcriptomic data set derived from the fungus Laccaria bicolor, which develops a mycorrhizal symbiotic association with the roots of many tree species. Our analysis focused on the subset of 1501 gene models that are differentially expressed in the free living vs. mycorrhizal transcriptome and are expected to be important elements related to carbon metabolism, membrane permeability and transport, and intracellular signaling. Of the set of 1501 gene models, 1439 (96%) successfully generated modified gene models in which all error flags were successfully resolved and the sequences aligned to the genomic sequence. The remaining 4% (62 gene models) either had deviations from transcriptomic data that could not be spanned or generated sequence that did not align to genomic sequence. The outcome of this process is a set of high confidence gene models that can be reliably used for experimental characterization of protein function. 69% of expressed mycorrhizal JGI 'best' gene models deviated from the transcript sequence derived by this method. The transcriptomic sequence enabled correction of a majority of the structural inconsistencies and resulted in a set of validated models for 96% of the mycorrhizal genes. The method described here can be applied to improve gene structural annotation in other species, provided that there is a sequenced genome and a set of gene models.

Larsen, P. E.; Trivedi, G.; Sreedasyam, A.; Lu, V.; Podila, G. K.; Collart, F. R.; Biosciences Division; Univ. of Alabama

2010-07-06

342

FUNCTIONAL NANOPARTICLES FOR MOLECULAR IMAGING GUIDED GENE DELIVERY  

PubMed Central

Gene therapy has great potential to bring tremendous changes in treatment of various diseases and disorders. However, one of the impediments to successful gene therapy is the inefficient delivery of genes to target tissues and the inability to monitor delivery of genes and therapeutic responses at the targeted site. The emergence of molecular imaging strategies has been pivotal in optimizing gene therapy; since it can allow us to evaluate the effectiveness of gene delivery noninvasively and spatiotemporally. Due to the unique physiochemical properties of nanomaterials, numerous functional nanoparticles show promise in accomplishing gene delivery with the necessary feature of visualizing the delivery. In this review, recent developments of nanoparticles for molecular imaging guided gene delivery are summarized. PMID:22473061

Liu, Gang; Swierczewska, Magdalena; Lee, Seulki; Chen, Xiaoyuan

2010-01-01

343

Human Intellectual Disability Genes Form Conserved Functional Modules in Drosophila  

PubMed Central

Intellectual Disability (ID) disorders, defined by an IQ below 70, are genetically and phenotypically highly heterogeneous. Identification of common molecular pathways underlying these disorders is crucial for understanding the molecular basis of cognition and for the development of therapeutic intervention strategies. To systematically establish their functional connectivity, we used transgenic RNAi to target 270 ID gene orthologs in the Drosophila eye. Assessment of neuronal function in behavioral and electrophysiological assays and multiparametric morphological analysis identified phenotypes associated with knockdown of 180 ID gene orthologs. Most of these genotype-phenotype associations were novel. For example, we uncovered 16 genes that are required for basal neurotransmission and have not previously been implicated in this process in any system or organism. ID gene orthologs with morphological eye phenotypes, in contrast to genes without phenotypes, are relatively highly expressed in the human nervous system and are enriched for neuronal functions, suggesting that eye phenotyping can distinguish different classes of ID genes. Indeed, grouping genes by Drosophila phenotype uncovered 26 connected functional modules. Novel links between ID genes successfully predicted that MYCN, PIGV and UPF3B regulate synapse development. Drosophila phenotype groups show, in addition to ID, significant phenotypic similarity also in humans, indicating that functional modules are conserved. The combined data indicate that ID disorders, despite their extreme genetic diversity, are caused by disruption of a limited number of highly connected functional modules. PMID:24204314

Oortveld, Merel A. W.; Keerthikumar, Shivakumar; Oti, Martin; Nijhof, Bonnie; Fernandes, Ana Clara; Kochinke, Korinna; Castells-Nobau, Anna; van Engelen, Eva; Ellenkamp, Thijs; Eshuis, Lilian; Galy, Anne; van Bokhoven, Hans; Habermann, Bianca; Brunner, Han G.; Zweier, Christiane; Verstreken, Patrik; Huynen, Martijn A.; Schenck, Annette

2013-01-01

344

The 2008 update of the Aspergillus nidulans genome annotation: a community effort  

PubMed Central

The identification and annotation of protein-coding genes is one of the primary goals of whole-genome sequencing projects, and the accuracy of predicting the primary protein products of gene expression is vital to the interpretation of the available data and the design of downstream functional applications. Nevertheless, the comprehensive annotation of eukaryotic genomes remains a considerable challenge. Many genomes submitted to public databases, including those of major model organisms, contain significant numbers of wrong and incomplete gene predictions. We present a community-based reannotation of the Aspergillus nidulans genome with the primary goal of increasing the number and quality of protein functional assignments through the careful review of experts in the field of fungal biology. PMID:19146970

Wortman, Jennifer Russo; Gilsenan, Jane Mabey; Joardar, Vinita; Deegan, Jennifer; Clutterbuck, John; Andersen, Mikael R.; Archer, David; Bencina, Mojca; Braus, Gerhard; Coutinho, Pedro; von Döhren, Hans; Doonan, John; Driessen, Arnold J.M.; Durek, Pawel; Espeso, Eduardo; Fekete, Erzsébet; Flipphi, Michel; Estrada, Carlos Garcia; Geysens, Steven; Goldman, Gustavo; de Groot, Piet W.J.; Hansen, Kim; Harris, Steven D.; Heinekamp, Thorsten; Helmstaedt, Kerstin; Henrissat, Bernard; Hofmann, Gerald; Homan, Tim; Horio, Tetsuya; Horiuchi, Hiroyuki; James, Steve; Jones, Meriel; Karaffa, Levente; Karányi, Zsolt; Kato, Masashi; Keller, Nancy; Kelly, Diane E.; Kiel, Jan A.K.W.; Kim, Jung-Mi; van der Klei, Ida J.; Klis, Frans M.; Kovalchuk, Andriy; Kraševec, Nada; Kubicek, Christian P.; Liu, Bo; MacCabe, Andrew; Meyer, Vera; Mirabito, Pete; Miskei, Márton; Mos, Magdalena; Mullins, Jonathan; Nelson, David R.; Nielsen, Jens; Oakley, Berl R.; Osmani, Stephen A.; Pakula, Tiina; Paszewski, Andrzej; Paulsen, Ian; Pilsyk, Sebastian; Pócsi, István; Punt, Peter J.; Ram, Arthur F.J.; Ren, Qinghu; Robellet, Xavier; Robson, Geoff; Seiboth, Bernhard; Solingen, Piet van; Specht, Thomas; Sun, Jibin; Taheri-Talesh, Naimeh; Takeshita, Norio; Ussery, Dave; vanKuyk, Patricia A.; Visser, Hans; van de Vondervoort, Peter J.I.; de Vries, Ronald P.; Walton, Jonathan; Xiang, Xin; Xiong, Yi; Zeng, An Ping; Brandt, Bernd W.; Cornell, Michael J.; van den Hondel, Cees A.M.J.J.; Visser, Jacob; Oliver, Stephen G.; Turner, Geoffrey

2010-01-01

345

GPA: a statistical approach to prioritizing GWAS results by integrating pleiotropy and annotation.  

PubMed

Results from Genome-Wide Association Studies (GWAS) have shown that complex diseases are often affected by many genetic variants with small or moderate effects. Identifications of these risk variants remain a very challenging problem. There is a need to develop more powerful statistical methods to leverage available information to improve upon traditional approaches that focus on a single GWAS dataset without incorporating additional data. In this paper, we propose a novel statistical approach, GPA (Genetic analysis incorporating Pleiotropy and Annotation), to increase statistical power to identify risk variants through joint analysis of multiple GWAS data sets and annotation information because: (1) accumulating evidence suggests that different complex diseases share common risk bases, i.e., pleiotropy; and (2) functionally annotated variants have been consistently demonstrated to be enriched among GWAS hits. GPA can integrate multiple GWAS datasets and functional annotations to seek association signals, and it can also perform hypothesis testing to test the presence of pleiotropy and enrichment of functional annotation. Statistical inference of the model parameters and SNP ranking is achieved through an EM algorithm that can handle genome-wide markers efficiently. When we applied GPA to jointly analyze five psychiatric disorders with annotation information, not only did GPA identify many weak signals missed by the traditional single phenotype analysis, but it also revealed relationships in the genetic architecture of these disorders. Using our hypothesis testing framework, statistically significant pleiotropic effects were detected among these psychiatric disorders, and the markers annotated in the central nervous system genes and eQTLs from the Genotype-Tissue Expression (GTEx) database were significantly enriched. We also applied GPA to a bladder cancer GWAS data set with the ENCODE DNase-seq data from 125 cell lines. GPA was able to detect cell lines that are biologically more relevant to bladder cancer. The R implementation of GPA is currently available at http://dongjunchung.github.io/GPA/. PMID:25393678

Chung, Dongjun; Yang, Can; Li, Cong; Gelernter, Joel; Zhao, Hongyu

2014-11-01

346

GPA: A Statistical Approach to Prioritizing GWAS Results by Integrating Pleiotropy and Annotation  

PubMed Central

Results from Genome-Wide Association Studies (GWAS) have shown that complex diseases are often affected by many genetic variants with small or moderate effects. Identifications of these risk variants remain a very challenging problem. There is a need to develop more powerful statistical methods to leverage available information to improve upon traditional approaches that focus on a single GWAS dataset without incorporating additional data. In this paper, we propose a novel statistical approach, GPA (Genetic analysis incorporating Pleiotropy and Annotation), to increase statistical power to identify risk variants through joint analysis of multiple GWAS data sets and annotation information because: (1) accumulating evidence suggests that different complex diseases share common risk bases, i.e., pleiotropy; and (2) functionally annotated variants have been consistently demonstrated to be enriched among GWAS hits. GPA can integrate multiple GWAS datasets and functional annotations to seek association signals, and it can also perform hypothesis testing to test the presence of pleiotropy and enrichment of functional annotation. Statistical inference of the model parameters and SNP ranking is achieved through an EM algorithm that can handle genome-wide markers efficiently. When we applied GPA to jointly analyze five psychiatric disorders with annotation information, not only did GPA identify many weak signals missed by the traditional single phenotype analysis, but it also revealed relationships in the genetic architecture of these disorders. Using our hypothesis testing framework, statistically significant pleiotropic effects were detected among these psychiatric disorders, and the markers annotated in the central nervous system genes and eQTLs from the Genotype-Tissue Expression (GTEx) database were significantly enriched. We also applied GPA to a bladder cancer GWAS data set with the ENCODE DNase-seq data from 125 cell lines. GPA was able to detect cell lines that are biologically more relevant to bladder cancer. The R implementation of GPA is currently available at http://dongjunchung.github.io/GPA/. PMID:25393678

Li, Cong; Gelernter, Joel; Zhao, Hongyu

2014-01-01

347

Saliva Microbiota Carry Caries-Specific Functional Gene Signatures  

PubMed Central

Human saliva microbiota is phylogenetically divergent among host individuals yet their roles in health and disease are poorly appreciated. We employed a microbial functional gene microarray, HuMiChip 1.0, to reconstruct the global functional profiles of human saliva microbiota from ten healthy and ten caries-active adults. Saliva microbiota in the pilot population featured a vast diversity of functional genes. No significant distinction in gene number or diversity indices was observed between healthy and caries-active microbiota. However, co-presence network analysis of functional genes revealed that caries-active microbiota was more divergent in non-core genes than healthy microbiota, despite both groups exhibited a similar degree of conservation at their respective core genes. Furthermore, functional gene structure of saliva microbiota could potentially distinguish caries-active patients from healthy hosts. Microbial functions such as Diaminopimelate epimerase, Prephenate dehydrogenase, Pyruvate-formate lyase and N-acetylmuramoyl-L-alanine amidase were significantly linked to caries. Therefore, saliva microbiota carried disease-associated functional signatures, which could be potentially exploited for caries diagnosis. PMID:24533043

Chang, Xingzhi; Yuan, Xiao; Tu, Qichao; Yuan, Tong; Deng, Ye; Hemme, Christopher L.; Van Nostrand, Joy; Cui, Xinping; He, Zhili; Chen, Zhenggang; Guo, Dawei; Yu, Jiangbo; Zhang, Yue; Zhou, Jizhong; Xu, Jian

2014-01-01

348

Saliva microbiota carry caries-specific functional gene signatures.  

PubMed

Human saliva microbiota is phylogenetically divergent among host individuals yet their roles in health and disease are poorly appreciated. We employed a microbial functional gene microarray, HuMiChip 1.0, to reconstruct the global functional profiles of human saliva microbiota from ten healthy and ten caries-active adults. Saliva microbiota in the pilot population featured a vast diversity of functional genes. No significant distinction in gene number or diversity indices was observed between healthy and caries-active microbiota. However, co-presence network analysis of functional genes revealed that caries-active microbiota was more divergent in non-core genes than healthy microbiota, despite both groups exhibited a similar degree of conservation at their respective core genes. Furthermore, functional gene structure of saliva microbiota could potentially distinguish caries-active patients from healthy hosts. Microbial functions such as Diaminopimelate epimerase, Prephenate dehydrogenase, Pyruvate-formate lyase and N-acetylmuramoyl-L-alanine amidase were significantly linked to caries. Therefore, saliva microbiota carried disease-associated functional signatures, which could be potentially exploited for caries diagnosis. PMID:24533043

Yang, Fang; Ning, Kang; Chang, Xingzhi; Yuan, Xiao; Tu, Qichao; Yuan, Tong; Deng, Ye; Hemme, Christopher L; Van Nostrand, Joy; Cui, Xinping; He, Zhili; Chen, Zhenggang; Guo, Dawei; Yu, Jiangbo; Zhang, Yue; Zhou, Jizhong; Xu, Jian

2014-01-01

349

Assembly, Annotation, and Integration of UNIGENE Clusters into the Human Genome Draft  

PubMed Central

The recent release of the first draft of the human genome provides an unprecedented opportunity to integrate human genes and their functions in a complete positional context. However, at least three significant technical hurdles remain: first, to assemble a complete and nonredundant human transcript index; second, to accurately place the individual transcript indices on the human genome; and third, to functionally annotate all human genes. Here, we report the extension of the UNIGENE database through the assembly of its sequence clusters into nonredundant sequence contigs. Each resulting consensus was aligned to the human genome draft. A unique location for each transcript within the human genome was determined by the integration of the restriction fingerprint, assembled genomic contig, and radiation hybrid (RH) maps. A total of 59,500 UNIGENE clusters were mapped on the basis of at least three independent criteria as compared with the 30,000 human genes/ESTs currently mapped in Genemap'99. Finally, the extension of the human transcript consensus in this study enabled a greater number of putative functional assignments than the 11,000 annotated entries in UNIGENE. This study reports a draft physical map with annotations for a majority of the human transcripts, called the Human Index of Nonredundant Transcripts (HINT). Such information can be immediately applied to the discovery of new genes and the identification of candidate genes for positional cloning. PMID:11337484

Zhuo, Degen; Zhao, Wei D.; Wright, Fred A.; Yang, Hee-Yung; Wang, Jian-Ping; Sears, Russell; Baer, Troy; Kwon, Do-Hun; Gordon, David; Gibbs, Solomon; Dai, Dean; Yang, Qing; Spitzner, Joe; Krahe, Ralf; Stredney, Don; Stutz, Al; Yuan, Bo

2001-01-01

350

Analysis and functional annotation of expressed sequence tags (ESTs) from multiple tissues of oil palm (Elaeis guineensis Jacq.)  

PubMed Central

Background Oil palm is the second largest source of edible oil which contributes to approximately 20% of the world's production of oils and fats. In order to understand the molecular biology involved in in vitro propagation, flowering, efficient utilization of nitrogen sources and root diseases, we have initiated an expressed sequence tag (EST) analysis on oil palm. Results In this study, six cDNA libraries from oil palm zygotic embryos, suspension cells, shoot apical meristems, young flowers, mature flowers and roots, were constructed. We have generated a total of 14537 expressed sequence tags (ESTs) from these libraries, from which 6464 tentative unique contigs (TUCs) and 2129 singletons were obtained. Approximately 6008 of these tentative unique genes (TUGs) have significant matches to the non-redundant protein database, from which 2361 were assigned to one or more Gene Ontology categories. Predominant transcripts and differentially expressed genes were identified in multiple oil palm tissues. Homologues of genes involved in many aspects of flower development were also identified among the EST collection, such as CONSTANS-like, AGAMOUS-like (AGL)2, AGL20, LFY-like, SQUAMOSA, SQUAMOSA binding protein (SBP) etc. Majority of them are the first representatives in oil palm, providing opportunities to explore the cause of epigenetic homeotic flowering abnormality in oil palm, given the importance of flowering in fruit production. The transcript levels of two flowering-related genes, EgSBP and EgSEP were analysed in the flower tissues of various developmental stages. Gene homologues for enzymes involved in oil biosynthesis, utilization of nitrogen sources, and scavenging of oxygen radicals, were also uncovered among the oil palm ESTs. Conclusion The EST sequences generated will allow comparative genomic studies between oil palm and other monocotyledonous and dicotyledonous plants, development of gene-targeted markers for the reference genetic map, design and fabrication of DNA array for future studies of oil palm. The outcomes of such studies will contribute to oil palm improvements through the establishment of breeding program using marker-assisted selection, development of diagnostic assays using gene targeted markers, and discovery of candidate genes related to important agronomic traits of oil palm. PMID:17953740

Ho, Chai-Ling; Kwan, Yen-Yen; Choi, Mei-Chooi; Tee, Sue-Sean; Ng, Wai-Har; Lim, Kok-Ang; Lee, Yang-Ping; Ooi, Siew-Eng; Lee, Weng-Wah; Tee, Jin-Ming; Tan, Siang-Hee; Kulaveerasingam, Harikrishna; Alwee, Sharifah Shahrul Rabiah Syed; Abdullah, Meilina Ong

2007-01-01

351

Target Gene and Function Prediction of Differentially Expressed MicroRNAs in Lactating Mammary Glands of Dairy Goats  

PubMed Central

MicroRNAs are small noncoding RNAs that can regulate gene expression, and they can be involved in the regulation of mammary gland development. The differential expression of miRNAs during mammary gland development is expected to provide insight into their roles in regulating the homeostasis of mammary gland tissues. To screen out miRNAs that should have important regulatory function in the development of mammary gland from miRNA expression profiles and to predict their function, in this study, the target genes of differentially expressed miRNAs in the lactating mammary glands of Laoshan dairy goats are predicted, and then the functions of these miRNAs are analyzed via bioinformatics. First, we screen the expression patterns of 25 miRNAs that had shown significant differences during the different lactation stages in the mammary gland. Then, these miRNAs are clustered according to their expression patterns. Computational methods were used to obtain 215 target genes for 22 of these miRNAs. Combining gene ontology annotation, Fisher's exact test, and KEGG analysis with the target prediction for these miRNAs, the regulatory functions of miRNAs belonging to different clusters are predicted. PMID:24195063

Ji, Zhi-Bin; Chen, Cun-Xian; Wang, Gui-Zhi; Wang, Jian-Min

2013-01-01

352

Layered Functional Network Analysis of Gene Expression in Human Heart Failure  

PubMed Central

Background Although dilated cardiomyopathy (DCM) is a leading cause of heart failure (HF), the mechanism underlying DCM is not well understood. Previously, it has been demonstrated that an integrative analysis of gene expression and protein-protein interaction (PPI) networks can provide insights into the molecular mechanisms of various diseases. In this study we develop a systems approach by linking public available gene expression data on ischemic dilated cardiomyopathy (ICM), a main pathological form of DCM, with data from a layered PPI network. We propose that the use of a layered PPI network, as opposed to a traditional PPI network, provides unique insights into the mechanism of DCM. Methods Four Cytoscape plugins including BionetBuilder, NetworkAnalyzer, Cerebral and GenePro were used to establish the layered PPI network, which was based upon validated subcellular protein localization data retrieved from the HRPD and Entrez Gene databases. The DAVID function annotation clustering tool was used for gene ontology (GO) analysis. Results The assembled layered PPI network was divided into four layers: extracellular, plasma membrane, cytoplasm and nucleus. The characteristics of the gene expression pattern of the four layers were compared. In the extracellular and plasma membrane layers, there were more proteins encoded by down-regulated genes than by up-regulated genes, but in the other two layers, the opposite trend was found. GO analysis established that proteins encoded by up-regulated genes, reflecting significantly over-represented biological processes, were mainly located in the nucleus and cytoplasm layers, while proteins encoded by down-regulated genes were mainly located in the extracellular and plasma membrane layers. The PPI network analysis revealed that the Janus family tyrosine kinase-signal transducer and activator of transcription (Jak-STAT) signaling pathway might play an important role in the development of ICM and could be exploited as a therapeutic target of ICM. In addition, glycogen synthase kinase 3 beta (GSK3B) may also be a potential candidate target, but more evidence is required. Conclusion This study illustrated that by incorporating subcellular localization information into a PPI network based analysis, one can derive greater insights into the mechanisms underlying ICM. PMID:19629191

Zhu, Wenliang; Yang, Lei; Du, Zhimin

2009-01-01

353

Novel genes derived from noncoding DNA in Drosophila melanogaster are frequently X-linked and exhibit testis-biased expression  

Microsoft Academic Search

Descriptions of recently evolved genes suggest several mechanisms of origin including exon shuffling, gene fission\\/fusion, retrotransposition, duplication-divergence, and lateral gene transfer, all of which involve recruitment of preexisting genes or genetic elements into new function. The importance of noncoding DNA in the origin of novel genes remains an open question. We used the well annotated genome of the genetic model

Mia T. Levine; Corbin D. Jones; Andrew D. Kern; Heather A. Lindfors; David J. Begun

2006-01-01

354

Gene functional dynamics: environment as a trigger?  

PubMed

Recent decades have seen the deciphering of the human genome and, also, progress in studies related to the effects of space-weather on humans. The progress in genetics allows us to connect many human pathologies with specific gene abnormalities. Concomitantly it has been shown that many congenital and adherent diseases, and the timing of death are connected with space factors such as solar activity (SA), geomagnetic activity (GMA), cosmic ray activity (CRA), and space neutron and proton flux. Here arises the question to what extent gene expression is affected by the aforementioned space physical activity parameters. This is the motto of this hypothetical paper. In conclusion, the space-weather-related timing of many medical events invites presumption that gene activity is a changing phenomenon and space weather components may be playing a regulatory role in these changes. PMID:24259246

Stoupel, Eliyahu G

2014-05-01

355

The ubiquilin gene family: evolutionary patterns and functional insights  

PubMed Central

Background Ubiquilins are proteins that function as ubiquitin receptors in eukaryotes. Mutations in two ubiquilin-encoding genes have been linked to the genesis of neurodegenerative diseases. However, ubiquilin functions are still poorly understood. Results In this study, evolutionary and functional data are combined to determine the origin and diversification of the ubiquilin gene family and to characterize novel potential roles of ubiquilins in mammalian species, including humans. The analysis of more than six hundred sequences allowed characterizing ubiquilin diversity in all the main eukaryotic groups. Many organisms (e. g. fungi, many animals) have single ubiquilin genes, but duplications in animal, plant, alveolate and excavate species are described. Seven different ubiquilins have been detected in vertebrates. Two of them, here called UBQLN5 and UBQLN6, had not been hitherto described. Significantly, marsupial and eutherian mammals have the most complex ubiquilin gene families, composed of up to 6 genes. This exceptional mammalian-specific expansion is the result of the recent emergence of four new genes, three of them (UBQLN3, UBQLN5 and UBQLNL) with precise testis-specific expression patterns that indicate roles in the postmeiotic stages of spermatogenesis. A gene with related features has independently arisen in species of the Drosophila genus. Positive selection acting on some mammalian ubiquilins has been detected. Conclusions The ubiquilin gene family is highly conserved in eukaryotes. The infrequent lineage-specific amplifications observed may be linked to the emergence of novel functions in particular tissues. PMID:24674348

2014-01-01

356

Automated Discovery of Functional Generality of Human Gene Expression Programs  

E-print Network

expression datasets measuring the responses of human cells to infectious agents or immuneAutomated Discovery of Functional Generality of Human Gene Expression Programs Georg K. Gerber1 is the identification of expression programs, sets of co- expressed genes orchestrating normal or pathological processes

Williams, Brian C.

357

Functional requirements driving the gene duplication in 12 Drosophila species  

PubMed Central

Background Gene duplication supplies the raw materials for novel gene functions and many gene families arisen from duplication experience adaptive evolution. Most studies of young duplicates have focused on mammals, especially humans, whereas reports describing their genome-wide evolutionary patterns across the closely related Drosophila species are rare. The sequenced 12 Drosophila genomes provide the opportunity to address this issue. Results In our study, 3,647 young duplicate gene families were identified across the 12 Drosophila species and three types of expansions, species-specific, lineage-specific and complex expansions, were detected in these gene families. Our data showed that the species-specific young duplicate genes predominated (86.6%) over the other two types. Interestingly, many independent species-specific expansions in the same gene family have been observed in many species, even including 11 or 12 Drosophila species. Our data also showed that the functional bias observed in these young duplicate genes was mainly related to responses to environmental stimuli and biotic stresses. Conclusions This study reveals the evolutionary patterns of young duplicates across 12 Drosophila species on a genomic scale. Our results suggest that convergent evolution acts on young duplicate genes after the species differentiation and adaptive evolution may play an important role in duplicate genes for adaption to ecological factors and environmental changes in Drosophila. PMID:23945147

2013-01-01

358

Beyond genomic variation - comparison and functional annotation of three Brassica rapa genomes: a turnip, a rapid cycling and a Chinese cabbage  

PubMed Central

Background Brassica rapa is an economically important crop species. During its long breeding history, a large number of morphotypes have been generated, including leafy vegetables such as Chinese cabbage and pakchoi, turnip tuber crops and oil crops. Results To investigate the genetic variation underlying this morphological variation, we re-sequenced, assembled and annotated the genomes of two B. rapa subspecies, turnip crops (turnip) and a rapid cycling. We then analysed the two resulting genomes together with the Chinese cabbage Chiifu reference genome to obtain an impression of the B. rapa pan-genome. The number of genes with protein-coding changes between the three genotypes was lower than that among different accessions of Arabidopsis thaliana, which can be explained by the smaller effective population size of B. rapa due to its domestication. Based on orthology to a number of non-brassica species, we estimated the date of divergence among the three B. rapa morphotypes at approximately 250,000 YA, far predating Brassica domestication (5,000-10,000 YA). Conclusions By analysing genes unique to turnip we found evidence for copy number differences in peroxidases, pointing to a role for the phenylpropanoid biosynthesis pathway in the generation of morphological variation. The estimated date of divergence among three B. rapa morphotypes implies that prior to domestication there was already considerably divergence among B. rapa genotypes. Our study thus provides two new B. rapa reference genomes, delivers a set of computer tools to analyse the resulting pan-genome and uses these to shed light on genetic drivers behind the rich morphological variation found in B. rapa. PMID:24684742

2014-01-01

359

The genomic and functional characteristics of disease genes.  

PubMed

Increasing evidence indicates that genes containing disease causal variation have distinct functional and genomic properties. The importance of understanding these properties is highlighted by efforts to filter lists of variants from next-generation sequencing studies, where the number of potentially deleterious variants, which are in fact unrelated to disease, may be large. Available evidence indicates that the majority of disease genes are 'non-essential' and their products occupy functionally peripheral positions in protein networks. They tend to be intermediate between genes that have core biological functions, particularly low mutation rates and low haplotype diversity, and genes for which high haplotype diversity and high mutation rates are advantageous (such as those involved in sensory perception and some immune system functions). Evidence presented here supports these conclusions through analysis of integrated data sets incorporating the latest mutational profiles, linkage disequilibrium structure and other genomic properties of individual genes. The analysis highlights the contrasting functions of genes predicted as least and most likely to contain disease variation and provides a basis for filtering gene variant lists to exclude the least plausible disease candidates. PMID:24425794

Collins, Andrew

2015-01-01

360

Structure Based Annotation of Helicobacter pylori Strain 26695 Proteome  

PubMed Central

The availability of complete genome sequences of H. pylori 26695 has provided a wealth of information enabling us to carry out in silico studies to identify new molecular targets for pharmaceutical treatment. In order to construe the structural and functional information of complete proteome, use of computational methods are more relevant since these methods are reliable and provide a solution to the time consuming and expensive experimental methods. Out of 1590 predicted protein coding genes in H. pylori, experimentally determined structures are available for only 145 proteins in the PDB. In the absence of experimental structures, computational studies on the three dimensional (3D) structural organization would help in deciphering the protein fold, structure and active site. Functional annotation of each protein was carried out based on structural fold and binding site based ligand association. Most of these proteins are uncharacterized in this proteome and through our annotation pipeline we were able to annotate most of them. We could assign structural folds to 464 uncharacterized proteins from an initial list of 557 sequences. Of the 1195 known structural folds present in the SCOP database, 411 (34% of all known folds) are observed in the whole H. pylori 26695 proteome, with greater inclination for domains belonging to ?/? class (36.63%). Top folds include P-loop containing nucleoside triphosphate hydrolases (22.6%), TIM barrel (16.7%), transmembrane helix hairpin (16.05%), alpha-alpha superhelix (11.1%) and S-adenosyl-L-methionine-dependent methyltransferases (10.7%). PMID:25549250

Singh, Swati; Guttula, Praveen Kumar; Guruprasad, Lalitha

2014-01-01

361

Omics data management and annotation.  

PubMed

Technological Omics breakthroughs, including next generation sequencing, bring avalanches of data which need to undergo effective data management to ensure integrity, security, and maximal knowledge-gleaning. Data management system requirements include flexible input formats, diverse data entry mechanisms and views, user friendliness, attention to standards, hardware and software platform definition, as well as robustness. Relevant solutions elaborated by the scientific community include Laboratory Information Management Systems (LIMS) and standardization protocols facilitating data sharing and managing. In project planning, special consideration has to be made when choosing relevant Omics annotation sources, since many of them overlap and require sophisticated integration heuristics. The data modeling step defines and categorizes the data into objects (e.g., genes, articles, disorders) and creates an application flow. A data storage/warehouse mechanism must be selected, such as file-based systems and relational databases, the latter typically used for larger projects. Omics project life cycle considerations must include the definition and deployment of new versions, incorporating either full or partial updates. Finally, quality assurance (QA) procedures must validate data and feature integrity, as well as system performance expectations. We illustrate these data management principles with examples from the life cycle of the GeneCards Omics project (http://www.genecards.org), a comprehensive, widely used compendium of annotative information about human genes. For example, the GeneCards infrastructure has recently been changed from text files to a relational database, enabling better organization and views of the growing data. Omics data handling benefits from the wealth of Web-based information, the vast amount of public domain software, increasingly affordable hardware, and effective use of data management and annotation principles as outlined in this chapter. PMID:21370079

Harel, Arye; Dalah, Irina; Pietrokovski, Shmuel; Safran, Marilyn; Lancet, Doron

2011-01-01

362

A new set of ESTs and cDNA clones from full-length and normalized libraries for gene discovery and functional characterization in citrus  

PubMed Central

Background Interpretation of ever-increasing raw sequence information generated by modern genome sequencing technologies faces multiple challenges, such as gene function analysis and genome annotation. Indeed, nearly 40% of genes in plants encode proteins of unknown function. Functional characterization of these genes is one of the main challenges in modern biology. In this regard, the availability of full-length cDNA clones may fill in the gap created between sequence information and biological knowledge. Full-length cDNA clones facilitate functional analysis of the corresponding genes enabling manipulation of their expression in heterologous systems and the generation of a variety of tagged versions of the native protein. In addition, the development of full-length cDNA sequences has the power to improve the quality of genome annotation. Results We developed an integrated method to generate a new normalized EST collection enriched in full-length and rare transcripts of different citrus species from multiple tissues and developmental stages. We constructed a total of 15 cDNA libraries, from which we isolated 10,898 high-quality ESTs representing 6142 different genes. Percentages of redundancy and proportion of full-length clones range from 8 to 33, and 67 to 85, respectively, indicating good efficiency of the approach employed. The new EST collection adds 2113 new citrus ESTs, representing 1831 unigenes, to the collection of citrus genes available in the public databases. To facilitate functional analysis, cDNAs were introduced in a Gateway-based cloning vector for high-throughput functional analysis of genes in planta. Herein, we describe the technical methods used in the library construction, sequence analysis of clones and the overexpression of CitrSEP, a citrus homolog to the Arabidopsis SEP3 gene, in Arabidopsis as an example of a practical application of the engineered Gateway vector for functional analysis. Conclusion The new EST collection denotes an important step towards the identification of all genes in the citrus genome. Furthermore, public availability of the cDNA clones generated in this study, and not only their sequence, enables testing of the biological function of the genes represented in the collection. Expression of the citrus SEP3 homologue, CitrSEP, in Arabidopsis results in early flowering, along with other phenotypes resembling the over-expression of the Arabidopsis SEPALLATA genes. Our findings suggest that the members of the SEP gene family play similar roles in these quite distant plant species. PMID:19747386

Marques, M Carmen; Alonso-Cantabrana, Hugo; Forment, Javier; Arribas, Raquel; Alamar, Santiago; Conejero, Vicente; Perez-Amador, Miguel A

2009-01-01

363

Evolutionary interrogation of human biology in well-annotated genomic framework of rhesus macaque.  

PubMed

With genome sequence and composition highly analogous to human, rhesus macaque represents a unique reference for evolutionary studies of human biology. Here, we developed a comprehensive genomic framework of rhesus macaque, the RhesusBase2, for evolutionary interrogation of human genes and the associated regulations. A total of 1,667 next-generation sequencing (NGS) data sets were processed, integrated, and evaluated, generating 51.2 million new functional annotation records. With extensive NGS annotations, RhesusBase2 refined the fine-scale structures in 30% of the macaque Ensembl transcripts, reporting an accurate, up-to-date set of macaque gene models. On the basis of these annotations and accurate macaque gene models, we further developed an NGS-oriented Molecular Evolution Gateway to access and visualize macaque annotations in reference to human orthologous genes and associated regulations (www.rhesusbase.org/molEvo). We highlighted the application of this well-annotated genomic framework in generating hypothetical link of human-biased regulations to human-specific traits, by using mechanistic characterization of the DIEXF gene as an example that provides novel clues to the understanding of digestive system reduction in human evolution. On a global scale, we also identified a catalog of 9,295 human-biased regulatory events, which may represent novel elements that have a substantial impact on shaping human transcriptome and possibly underpin recent human phenotypic evolution. Taken together, we provide an NGS data-driven, information-rich framework that will broadly benefit genomics research in general and serves as an important resource for in-depth evolutionary studies of human biology. PMID:24577841

Zhang, Shi-Jian; Liu, Chu-Jun; Yu, Peng; Zhong, Xiaoming; Chen, Jia-Yu; Yang, Xinzhuang; Peng, Jiguang; Yan, Shouyu; Wang, Chenqu; Zhu, Xiaotong; Xiong, Jingwei; Zhang, Yong E; Tan, Bertrand Chin-Ming; Li, Chuan-Yun

2014-05-01

364

GRAIL and GenQuest Sequence Annotation Tools  

SciTech Connect

Our goal is to develop and implement an integrated intelligent system which can recognize biologically significant features in DNA sequence and provide insight into the organization and function of regions of genomic DNA. GRAIL is a modular expert system which facilitates the recognition of gene features and provides an environment for the construction of sequence annotation. The last several years have seen a rapid evolution of the technology for analyzing genomic DNA sequences. The current GRAIL systems (including the e-mail, XGRAIL, JAVA-GRAIL and genQuest systems) are perhaps the most widely used, comprehensive, and user friendly systems available for computational characterization of genomic DNA sequence.

Xu, Ying; Shah, Manesh B.; Einstein, J. Ralph; Parang, Morey; Snoddy, Jay; Petrov, Sergey; Olman, Victor; Zhang, Ge; Mural, Richard J.; Uberbacher, Edward C.

1997-12-31