Science.gov

Sample records for kegg pathways database

  1. Using the KEGG database resource.

    PubMed

    Tanabe, Mao; Kanehisa, Minoru

    2012-06-01

    KEGG (Kyoto Encyclopedia of Genes and Genomes) is a bioinformatics resource for understanding the functions and utilities of cells and organisms from both high-level and genomic perspectives. It is a self-sufficient, integrated resource consisting of genomic, chemical, and network information, with cross-references to numerous outside databases. The genomic and chemical information is a complete set of building blocks (genes and molecules) and the network information includes molecular wiring diagrams (interaction/reaction networks) and hierarchical classifications (relation networks) to represent high-level functions. This unit describes protocols for using KEGG, focusing on molecular network information in KEGG PATHWAY, KEGG BRITE, and KEGG MODULE, perturbed molecular networks in KEGG DISEASE and KEGG DRUG, molecular building block information in KEGG GENES and KEGG LIGAND, and a mechanism for linking genomes to molecular networks in KEGG ORTHOLOGY (KO). All of these many protocols enable the user to take advantage of the full breadth of the functionality provided by KEGG.

  2. BiKEGG: a COBRA toolbox extension for bridging the BiGG and KEGG databases.

    PubMed

    Jamialahmadi, Oveis; Motamedian, Ehsan; Hashemi-Najafabadi, Sameereh

    2016-10-18

    Development of an interface tool between the Biochemical, Genetic and Genomic (BiGG) and KEGG databases is necessary for simultaneous access to the features of both databases. For this purpose, we present the BiKEGG toolbox, an open source COBRA toolbox extension providing a set of functions to infer the reaction correspondences between the KEGG reaction identifiers and those in the BiGG knowledgebase using a combination of manual verification and computational methods. Inferred reaction correspondences using this approach are supported by evidence from the literature, which provides a higher number of reconciled reactions between these two databases compared to the MetaNetX and MetRxn databases. This set of equivalent reactions is then used to automatically superimpose the predicted fluxes using COBRA methods on classical KEGG pathway maps or to create a customized metabolic map based on the KEGG global metabolic pathway, and to find the corresponding reactions in BiGG based on the genome annotation of an organism in the KEGG database. Customized metabolic maps can be created for a set of pathways of interest, for the whole KEGG global map or exclusively for all pathways for which there exists at least one flux carrying reaction. This flexibility in visualization enables BiKEGG to indicate reaction directionality as well as to visualize the reaction fluxes for different static or dynamic conditions in an animated manner. BiKEGG allows the user to export (1) the output visualized metabolic maps to various standard image formats or save them as a video or animated GIF file, and (2) the equivalent reactions for an organism as an Excel spreadsheet.

  3. KEGG: new perspectives on genomes, pathways, diseases and drugs

    PubMed Central

    Kanehisa, Minoru; Furumichi, Miho; Tanabe, Mao; Sato, Yoko; Morishima, Kanae

    2017-01-01

    KEGG (http://www.kegg.jp/ or http://www.genome.jp/kegg/) is an encyclopedia of genes and genomes. Assigning functional meanings to genes and genomes both at the molecular and higher levels is the primary objective of the KEGG database project. Molecular-level functions are stored in the KO (KEGG Orthology) database, where each KO is defined as a functional ortholog of genes and proteins. Higher-level functions are represented by networks of molecular interactions, reactions and relations in the forms of KEGG pathway maps, BRITE hierarchies and KEGG modules. In the past the KO database was developed for the purpose of defining nodes of molecular networks, but now the content has been expanded and the quality improved irrespective of whether or not the KOs appear in the three molecular network databases. The newly introduced addendum category of the GENES database is a collection of individual proteins whose functions are experimentally characterized and from which an increasing number of KOs are defined. Furthermore, the DISEASE and DRUG databases have been improved by systematic analysis of drug labels for better integration of diseases and drugs with the KEGG molecular networks. KEGG is moving towards becoming a comprehensive knowledge base for both functional interpretation and practical application of genomic information. PMID:27899662

  4. The use of Gene Ontology terms and KEGG pathways for analysis and prediction of oncogenes.

    PubMed

    Xing, Zhihao; Chu, Chen; Chen, Lei; Kong, Xiangyin

    2016-11-01

    Oncogenes are a type of gene that has the potential to cause cancer. Most normal cells undergo programmed cell death, namely apoptosis, but activated oncogenes can help cells avoid apoptosis and survive. Thus, studying oncogenes is helpful for obtaining a good understanding of the formation and development of various types of cancers. In this study, we proposed a computational method, called OPM, for investigating oncogenes from the view of Gene Ontology (GO) and biological pathways. All investigated genes, including validated oncogenes retrieved from some public databases and other genes that have not been reported to be oncogenes thus far, were encoded into numeric vectors according to the enrichment theory of GO terms and KEGG pathways. Some popular feature selection methods, minimum redundancy maximum relevance and incremental feature selection, and an advanced machine learning algorithm, random forest, were adopted to analyze the numeric vectors to extract key GO terms and KEGG pathways. Along with the oncogenes, GO terms and KEGG pathways were discussed in terms of their relevance in this study. Some important GO terms and KEGG pathways were extracted using feature selection methods and were confirmed to be highly related to oncogenes. Additionally, the importance of these terms and pathways in predicting oncogenes was further demonstrated by finding new putative oncogenes based on them. This study investigated oncogenes based on GO terms and KEGG pathways. Some important GO terms and KEGG pathways were confirmed to be highly related to oncogenes. We hope that these GO terms and KEGG pathways can provide new insight for the study of oncogenes, particularly for building more effective prediction models to identify novel oncogenes. The program is available upon request. We hope that the new findings listed in this study may provide new insight for the investigation of oncogenes.
This article is part of a Special Issue entitled "System Genetics" Guest Editor

  5. Consistency, comprehensiveness, and compatibility of pathway databases

    PubMed Central

    2010-01-01

    Background: It is necessary to analyze microarray experiments together with biological information to make better biological inferences. We investigate the adequacy of current biological databases to address this need. Description: Our results show a low level of consistency, comprehensiveness and compatibility among three popular pathway databases (KEGG, Ingenuity and Wikipathways). The level of consistency for genes in similar pathways across databases ranges from 0% to 88%. The corresponding level of consistency for interacting gene pairs is 0%-61%. These three original sources can be assumed to be reliable in the sense that the interacting gene pairs reported in them are correct because they are curated. However, the lack of concordance between these databases suggests that each source has missed many genes and interacting gene pairs. Conclusions: Researchers will hence find it challenging to obtain consistent pathway information out of these diverse data sources. It is therefore critical to enable them to access these sources via a consistent, comprehensive and unified pathway API. We accumulated sufficient data to create such an aggregated resource with the convenience of an API to access its information. This unified resource can be accessed at http://www.pathwayapi.com. PMID:20819233
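
    The consistency percentages quoted above compare gene sets for similar pathways across databases. The record does not state the exact formula, so the following is only a minimal sketch that assumes a Jaccard-style overlap; the function name and the toy gene lists are hypothetical and are not taken from KEGG, Ingenuity or Wikipathways.

```python
def percent_overlap(genes_a, genes_b):
    """Shared genes as a percentage of all genes seen in either set.

    Assumption: consistency is measured as |A & B| / |A | B| * 100.
    The cited study may use a different definition.
    """
    a, b = set(genes_a), set(genes_b)
    union = a | b
    if not union:
        # Two empty gene sets carry no information; report 0% by convention.
        return 0.0
    return 100.0 * len(a & b) / len(union)


# Hypothetical gene lists for "the same" pathway in two databases.
database_one = {"TP53", "MDM2", "CDKN1A", "BAX"}
database_two = {"TP53", "MDM2", "ATM"}

consistency = percent_overlap(database_one, database_two)
print(f"gene-level consistency: {consistency:.1f}%")
```

    The same function can be applied to interacting gene pairs by passing sets of unordered pairs (for example `frozenset` objects) instead of gene symbols.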

  6. Putting The Plant Metabolic Network pathway databases to work: going offline to gain new capabilities.

    PubMed

    Dreher, Kate

    2014-01-01

    Metabolic databases such as The Plant Metabolic Network/MetaCyc and KEGG PATHWAY are publicly accessible resources providing organism-specific information on reactions and metabolites. KEGG PATHWAY depicts metabolic networks as wired, electronic circuit-like maps, whereas the MetaCyc family of databases uses a canonical textbook-like representation. The first MetaCyc-based database for a plant species was AraCyc, which describes metabolism in the model plant Arabidopsis. This database was created over 10 years ago and has since then undergone extensive manual curation to reflect updated information on enzymes and pathways in Arabidopsis. This chapter describes accessing and using AraCyc and its underlying Pathway Tools software. Specifically, methods for (1) navigating Pathway Tools, (2) visualizing omics data and superimposing the data on a metabolic pathway map, and (3) creating pathways and pathway components are discussed.

  7. Drug-Path: a database for drug-induced pathways

    PubMed Central

    Zeng, Hui; Cui, Qinghua

    2015-01-01

    Some databases for drug-associated pathways have been built and are publicly available. However, the pathways curated in most of these databases are drug-action or drug-metabolism pathways. In recent years, high-throughput technologies such as microarray and RNA-sequencing have produced many drug-induced gene expression profiles. Interestingly, drug-induced gene expression profiles frequently show distinct patterns, indicating that drugs normally induce the activation or repression of distinct pathways. Therefore, these pathways contribute to the study of drug mechanisms and drug repurposing. Here, we present Drug-Path, a database of drug-induced pathways, which was generated by KEGG pathway enrichment analysis for drug-induced upregulated genes and downregulated genes based on drug-induced gene expression datasets in Connectivity Map. Drug-Path provides user-friendly interfaces to retrieve, visualize and download the drug-induced pathway data in the database. In addition, the genes deregulated by a given drug are highlighted in the pathways. All data were organized using SQLite. The web site was implemented using Django, a Python web framework. Finally, we believe that this database will be useful for related research. Database URL: http://www.cuilab.cn/drugpath PMID:26130661
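
    The abstract does not say which statistical test underlies the KEGG pathway enrichment analysis. A common choice for this kind of analysis is a one-sided hypergeometric (over-representation) test, sketched below under that assumption. The function name, the gene identifiers and the set sizes are hypothetical and are not Drug-Path data.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_pathway, n_selected, n_overlap):
    """Upper-tail hypergeometric probability P(X >= n_overlap).

    n_background: genes in the background (universe)
    n_pathway:    background genes annotated to the pathway
    n_selected:   genes in the query list (e.g. upregulated genes)
    n_overlap:    query genes that fall in the pathway
    """
    upper = min(n_pathway, n_selected)
    total = comb(n_background, n_selected)
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Toy example with made-up identifiers.
background = {f"g{i}" for i in range(10)}
pathway = {"g0", "g1", "g2", "g3"}
upregulated = {"g0", "g1", "g7"}

p_value = hypergeom_enrichment_p(
    len(background), len(pathway), len(upregulated), len(pathway & upregulated)
)
print(f"enrichment p-value: {p_value:.4f}")
```

    In practice the test would be run once per pathway and per gene list (upregulated and downregulated separately), followed by a multiple-testing correction, which this sketch omits.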

  8. Drug-Path: a database for drug-induced pathways.

    PubMed

    Zeng, Hui; Qiu, Chengxiang; Cui, Qinghua

    2015-01-01

    Some databases for drug-associated pathways have been built and are publicly available. However, the pathways curated in most of these databases are drug-action or drug-metabolism pathways. In recent years, high-throughput technologies such as microarray and RNA-sequencing have produced many drug-induced gene expression profiles. Interestingly, drug-induced gene expression profiles frequently show distinct patterns, indicating that drugs normally induce the activation or repression of distinct pathways. Therefore, these pathways contribute to the study of drug mechanisms and drug repurposing. Here, we present Drug-Path, a database of drug-induced pathways, which was generated by KEGG pathway enrichment analysis for drug-induced upregulated genes and downregulated genes based on drug-induced gene expression datasets in Connectivity Map. Drug-Path provides user-friendly interfaces to retrieve, visualize and download the drug-induced pathway data in the database. In addition, the genes deregulated by a given drug are highlighted in the pathways. All data were organized using SQLite. The web site was implemented using Django, a Python web framework. Finally, we believe that this database will be useful for related research. © The Author(s) 2015. Published by Oxford University Press.

  9. Putative drug and vaccine target protein identification using comparative genomic analysis of KEGG annotated metabolic pathways of Mycoplasma hyopneumoniae.

    PubMed

    Damte, Dereje; Suh, Joo-Won; Lee, Seung-Jin; Yohannes, Sileshi Belew; Hossain, Md Akil; Park, Seung-Chun

    2013-07-01

    In the present study, a computational comparative and subtractive genomic/proteomic analysis aimed at the identification of putative therapeutic target and vaccine candidate proteins from Kyoto Encyclopedia of Genes and Genomes (KEGG) annotated metabolic pathways of Mycoplasma hyopneumoniae was performed for drug design and vaccine production pipelines against M. hyopneumoniae. The employed comparative genomic and metabolic pathway analysis with a predefined computational systemic workflow extracted a total of 41 annotated metabolic pathways from KEGG, among which five were unique to M. hyopneumoniae. A total of 234 proteins were identified to be involved in these metabolic pathways. Although 125 non-homologous and predicted essential proteins were found from the total that could serve as potential drug targets and vaccine candidates, additional prioritizing parameters characterized 21 proteins as vaccine candidates, while the druggability of each of the identified proteins, evaluated with the DrugBank database, prioritized 42 proteins as suitable drug targets.

  10. Microarray and synchronization of neuronal differentiation with pathway changes in the Kyoto Encyclopedia of Genes and Genomes (KEGG) databank in nerve growth factor-treated PC12 cells.

    PubMed

    Lin, Chih-Ming; Feng, Wayne

    2012-08-01

    The Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database creates networks from interrelations between molecular biology and underlying chemical elements. This allows for analysis of biologic networks, genomic information, and higher-order functional information at a systems level. We performed microarray experiments and used the KEGG database, systems biology analysis, and annotation of pathway function to study nerve growth factor (NGF)-induced differentiation of PC12 cells. Cells were cultured to 70%-80% confluence, treated with NGF for 1 or 3 hours (h), and RNA was extracted. Stage 1 data analysis involved analysis of variance (ANOVA), and stage 2 involved cluster analysis and heat map generation. We identified 2020 NGF-induced PC12 genes (1038 at 1 h and 1554 at 3 h). Results showed changes in gene expression over time. We compared these genes with 6035 genes from the KEGG database. Cross-matching resulted in 830 genes. Among these, we identified 395 altered genes (155 at 1 h and 301 at 3 h; 2-fold increase from 1 h to 3 h). We identified 191 biologic pathways in the KEGG database; the top 15 showed correlations with neuronal differentiation (mitogen-activated protein kinase [MAPK] pathway: 35 genes at 1 h, 54 genes at 3 h; genes associated with axonal guidance: 12 at 1 h, 26 at 3 h; Wnt pathway: 16 at 1 h, 25 at 3 h; neurotrophin pathway: 4 at 1 h, 14 at 3 h). Thus, we identified changes in neuronal differentiation pathways with the KEGG database, which were synchronized with NGF-induced differentiation.
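
    The gene counts above can be cross-checked with inclusion-exclusion. The sketch below assumes that each reported total is the union of the 1 h and 3 h gene lists, which the abstract suggests but does not state; the function name is hypothetical.

```python
def shared_count(n_first, n_second, n_union):
    """Genes present in both lists, given the two list sizes and the size of their union."""
    shared = n_first + n_second - n_union
    if shared < 0 or shared > min(n_first, n_second):
        raise ValueError("counts are inconsistent with a union of two lists")
    return shared


# NGF-induced genes: 1038 at 1 h, 1554 at 3 h, 2020 in total.
induced_at_both = shared_count(1038, 1554, 2020)

# Altered genes after cross-matching with KEGG: 155 at 1 h, 301 at 3 h, 395 in total.
altered_at_both = shared_count(155, 301, 395)

print(induced_at_both, altered_at_both)
```

    Under that reading, 572 of the NGF-induced genes and 61 of the altered KEGG-matched genes would appear at both time points.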

  11. Construction of gene/protein interaction networks for primary myelofibrosis and KEGG pathway-enrichment analysis of molecular compounds.

    PubMed

    Sun, C G; Cao, X J; Zhou, C; Liu, L J; Feng, F B; Liu, R J; Zhuang, J; Li, Y J

    2015-12-08

    The objective of this study was the development of a gene/protein interaction network for primary myelofibrosis based on gene expression, and the enrichment analysis of KEGG pathways underlying the molecular complexes in this network. To achieve this, genes involved in primary myelofibrosis were selected from the OMIM database. A gene/protein interaction network for primary myelofibrosis was obtained through Cytoscape with the literature mining performed using the Agilent Literature Search plugin. The molecular complexes in the network were detected by ClusterViz plugin and KEGG pathway enrichment of molecular complexes was performed using DAVID online. We found 75 genes associated with primary myelofibrosis in the OMIM database. The gene/protein interaction network of primary myelofibrosis contained 608 nodes, 2086 edges, and 4 molecular complexes with a correlation integral value greater than 4. Molecular complexes involved in KEGG pathways are related to cytokine regulation, immune function regulation, ECM-receptor interaction, focal adhesion, actin cytoskeleton regulation, cell adhesion molecules, and other biological behavior of tumors, which can provide a reliable direction for the treatment of primary myelofibrosis and the bioinformatic foundation for further understanding the molecular mechanisms of this disease.

  12. MPW : the metabolic pathways database.

    SciTech Connect

    Selkov, E., Jr.; Grechkin, Y.; Mikhailova, N.; Selkov, E.; Mathematics and Computer Science; Russian Academy of Sciences

    1998-01-01

    The Metabolic Pathways Database (MPW) (www.biobase.com/emphome.html/homepage.html.pags/pathways.html), a derivative of EMP (www.biobase.com/EMP), plays a fundamental role in the technology of metabolic reconstructions from sequenced genomes under the PUMA (www.mcs.anl.gov/home/compbio/PUMA/Production/ReconstructedMetabolism/reconstruction.html), WIT (www.mcs.anl.gov/home/compbio/WIT/wit.html) and WIT2 (beauty.isdn.msc.anl.gov/WIT2.pub/CGI/user.cgi) systems. In October 1997, it included some 2800 pathway diagrams covering primary and secondary metabolism, membrane transport, signal transduction pathways, intracellular traffic, translation and transcription. In the current public release of MPW (beauty.isdn.mcs.anl.gov/MPW), the encoding is based on the logical structure of the pathways and is represented by the objects commonly used in electronic circuit design. This facilitates drawing and editing the diagrams and makes possible automation of the basic simulation operations such as deriving stoichiometric matrices, rate laws, and, ultimately, dynamic models of metabolic pathways. Individual pathway diagrams, automatically derived from the original ASCII records, are stored as SGML instances supplemented by relational indices. An auxiliary database of compound names and structures, encoded in the SMILES format, is maintained to unambiguously connect the pathways to the chemical structures of their intermediates.
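
    Deriving a stoichiometric matrix from a structured pathway encoding, as mentioned above, can be illustrated with a small sketch. The data layout (a dictionary of reactions mapping metabolite names to signed coefficients) and the function name are assumptions made for illustration and do not reflect the MPW record format.

```python
def stoichiometric_matrix(reactions):
    """Build a metabolite-by-reaction matrix from signed coefficients.

    reactions: dict mapping reaction id -> {metabolite: coefficient},
               negative for substrates, positive for products.
    Returns (metabolites, reaction_ids, matrix), where matrix[i][j] is the
    coefficient of metabolites[i] in reaction_ids[j].
    """
    metabolites = sorted({m for coeffs in reactions.values() for m in coeffs})
    reaction_ids = list(reactions)
    matrix = [
        [reactions[r].get(m, 0) for r in reaction_ids]
        for m in metabolites
    ]
    return metabolites, reaction_ids, matrix


# First two steps of glycolysis, written by hand for this example.
toy_reactions = {
    "HK": {"glucose": -1, "ATP": -1, "G6P": 1, "ADP": 1},   # hexokinase
    "PGI": {"G6P": -1, "F6P": 1},                           # phosphoglucose isomerase
}

metabolites, reaction_ids, matrix = stoichiometric_matrix(toy_reactions)
for name, row in zip(metabolites, matrix):
    print(f"{name:8s} {row}")
```

    Rows are metabolites and columns are reactions, the usual convention for the steady-state condition S·v = 0 used in constraint-based modeling.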

  13. MicrobesFlux: a web platform for drafting metabolic models from the KEGG database.

    PubMed

    Feng, Xueyang; Xu, You; Chen, Yixin; Tang, Yinjie J

    2012-08-02

    Concurrent with the efforts currently underway in mapping microbial genomes using high-throughput sequencing methods, systems biologists are building metabolic models to characterize and predict cell metabolisms. One of the key steps in building a metabolic model is using multiple databases to collect and assemble essential information about genome-annotations and the architecture of the metabolic network for a specific organism. To speed up metabolic model development for a large number of microorganisms, we need a user-friendly platform to construct metabolic networks and to perform constraint-based flux balance analysis based on genome databases and experimental results. We have developed a semi-automatic, web-based platform (MicrobesFlux) for generating and reconstructing metabolic models for annotated microorganisms. MicrobesFlux is able to automatically download the metabolic network (including enzymatic reactions and metabolites) of ~1,200 species from the KEGG database (Kyoto Encyclopedia of Genes and Genomes) and then convert it to a metabolic model draft. The platform also provides diverse customized tools, such as gene knockouts and the introduction of heterologous pathways, for users to reconstruct the model network. The reconstructed metabolic network can be formulated to a constraint-based flux model to predict and analyze the carbon fluxes in microbial metabolisms. The simulation results can be exported in the SBML format (The Systems Biology Markup Language). Furthermore, we also demonstrated the platform functionalities by developing an FBA model (including 229 reactions) for a recent annotated bioethanol producer, Thermoanaerobacter sp. strain X514, to predict its biomass growth and ethanol production. MicrobesFlux is an installation-free and open-source platform that enables biologists without prior programming knowledge to develop metabolic models for annotated microorganisms in the KEGG database. Our system facilitates users to reconstruct

  14. KEGG: Kyoto Encyclopedia of Genes and Genomes.

    PubMed

    Ogata, H; Goto, S; Sato, K; Fujibuchi, W; Bono, H; Kanehisa, M

    1999-01-01

    Kyoto Encyclopedia of Genes and Genomes (KEGG) is a knowledge base for systematic analysis of gene functions in terms of the networks of genes and molecules. The major component of KEGG is the PATHWAY database that consists of graphical diagrams of biochemical pathways including most of the known metabolic pathways and some of the known regulatory pathways. The pathway information is also represented by the ortholog group tables summarizing orthologous and paralogous gene groups among different organisms. KEGG maintains the GENES database for the gene catalogs of all organisms with complete genomes and selected organisms with partial genomes, which are continuously re-annotated, as well as the LIGAND database for chemical compounds and enzymes. Each gene catalog is associated with the graphical genome map for chromosomal locations that is represented by a Java applet. In addition to the data collection efforts, KEGG develops and provides various computational tools, such as for reconstructing biochemical pathways from the complete genome sequence and for predicting gene regulatory networks from the gene expression profiles. The KEGG databases are updated daily and made freely available (http://www.genome.ad.jp/kegg/).

  15. IPAD: the Integrated Pathway Analysis Database for Systematic Enrichment Analysis.

    PubMed

    Zhang, Fan; Drabier, Renee

    2012-01-01

    multiple available data sources. IPAD is a comprehensive database covering about 22,498 genes, 25,469 proteins, 1956 pathways, 6704 diseases, 5615 drugs, and 52 organs integrated from databases including the BioCarta, KEGG, NCI-Nature curated, Reactome, CTD, PharmGKB, DrugBank, PharmGKB, and HOMER. The database has a web-based user interface that allows users to perform enrichment analysis from genes/proteins/molecules and inter-association analysis from a pathway, disease, drug, and organ. Moreover, the quality of the database was validated with the context of the existing biological knowledge and a "gold standard" constructed from reputable and reliable sources. Two case studies were also presented to demonstrate: 1) self-validation of enrichment analysis and inter-association analysis on brain-specific markers, and 2) identification of previously undiscovered components by the enrichment analysis from a prostate cancer study. IPAD is a new resource for analyzing, identifying, and validating pathway, disease, drug, organ specificity and their inter-associations. The statistical method we developed for enrichment and similarity measurement and the two criteria we described for setting the threshold parameters can be extended to other enrichment applications. Enriched pathways, diseases, drugs, organs and their inter-associations can be searched, displayed, and downloaded from our online user interface. The current IPAD database can help users address a wide range of biological pathway related, disease susceptibility related, drug target related and organ specificity related questions in human disease studies.

  16. IPAD: the Integrated Pathway Analysis Database for Systematic Enrichment Analysis

    PubMed Central

    2012-01-01

    ) cross-linking of multiple available data sources. IPAD is a comprehensive database covering about 22,498 genes, 25,469 proteins, 1956 pathways, 6704 diseases, 5615 drugs, and 52 organs integrated from databases including the BioCarta, KEGG, NCI-Nature curated, Reactome, CTD, PharmGKB, DrugBank, PharmGKB, and HOMER. The database has a web-based user interface that allows users to perform enrichment analysis from genes/proteins/molecules and inter-association analysis from a pathway, disease, drug, and organ. Moreover, the quality of the database was validated with the context of the existing biological knowledge and a "gold standard" constructed from reputable and reliable sources. Two case studies were also presented to demonstrate: 1) self-validation of enrichment analysis and inter-association analysis on brain-specific markers, and 2) identification of previously undiscovered components by the enrichment analysis from a prostate cancer study. Conclusions IPAD is a new resource for analyzing, identifying, and validating pathway, disease, drug, organ specificity and their inter-associations. The statistical method we developed for enrichment and similarity measurement and the two criteria we described for setting the threshold parameters can be extended to other enrichment applications. Enriched pathways, diseases, drugs, organs and their inter-associations can be searched, displayed, and downloaded from our online user interface. The current IPAD database can help users address a wide range of biological pathway related, disease susceptibility related, drug target related and organ specificity related questions in human disease studies. PMID:23046449

  17. Prediction and analysis of essential genes using the enrichments of gene ontology and KEGG pathways.

    PubMed

    Chen, Lei; Zhang, Yu-Hang; Wang, ShaoPeng; Zhang, YunHua; Huang, Tao; Cai, Yu-Dong

    2017-01-01

    Identifying essential genes in a given organism is important for research on their fundamental roles in organism survival. Furthermore, if possible, uncovering the links between core functions or pathways with these essential genes will further help us obtain deep insight into the key roles of these genes. In this study, we investigated the essential and non-essential genes reported in a previous study and extracted gene ontology (GO) terms and biological pathways that are important for the determination of essential genes. Through the enrichment theory of GO and KEGG pathways, we encoded each essential/non-essential gene into a vector in which each component represented the relationship between the gene and one GO term or KEGG pathway. To analyze these relationships, the maximum relevance minimum redundancy (mRMR) was adopted. Then, the incremental feature selection (IFS) and support vector machine (SVM) were employed to extract important GO terms and KEGG pathways. A prediction model was built simultaneously using the extracted GO terms and KEGG pathways, which yielded nearly perfect performance, with a Matthews correlation coefficient of 0.951, for distinguishing essential and non-essential genes. To fully investigate the key factors influencing the fundamental roles of essential genes, the 21 most important GO terms and three KEGG pathways were analyzed in detail. In addition, several genes were provided in this study, which were predicted to be essential genes by our prediction model. We suggest that this study provides more functional and pathway information on the essential genes and provides a new way to investigate related problems.
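
    The Matthews correlation coefficient (MCC) reported above is computed from the four cells of a binary confusion matrix. The sketch below shows the standard formula; the function name and the example counts are illustrative and are not the study's data.

```python
from math import sqrt


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).

    Returns 0.0 when any marginal total is zero, a common convention
    for the otherwise undefined case.
    """
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Made-up counts: essential genes are the positive class.
perfect = matthews_corrcoef(tp=50, tn=50, fp=0, fn=0)
chance = matthews_corrcoef(tp=25, tn=25, fp=25, fn=25)
inverted = matthews_corrcoef(tp=0, tn=0, fp=50, fn=50)
print(perfect, chance, inverted)
```

    MCC ranges from -1 (every prediction wrong) through 0 (no better than chance) to +1 (every prediction right), so a value of 0.951 indicates close agreement between predicted and reported labels.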

  18. Prediction and analysis of essential genes using the enrichments of gene ontology and KEGG pathways

    PubMed Central

    Wang, ShaoPeng; Zhang, YunHua; Huang, Tao

    2017-01-01

    Identifying essential genes in a given organism is important for research on their fundamental roles in organism survival. Furthermore, if possible, uncovering the links between core functions or pathways with these essential genes will further help us obtain deep insight into the key roles of these genes. In this study, we investigated the essential and non-essential genes reported in a previous study and extracted gene ontology (GO) terms and biological pathways that are important for the determination of essential genes. Through the enrichment theory of GO and KEGG pathways, we encoded each essential/non-essential gene into a vector in which each component represented the relationship between the gene and one GO term or KEGG pathway. To analyze these relationships, the maximum relevance minimum redundancy (mRMR) was adopted. Then, the incremental feature selection (IFS) and support vector machine (SVM) were employed to extract important GO terms and KEGG pathways. A prediction model was built simultaneously using the extracted GO terms and KEGG pathways, which yielded nearly perfect performance, with a Matthews correlation coefficient of 0.951, for distinguishing essential and non-essential genes. To fully investigate the key factors influencing the fundamental roles of essential genes, the 21 most important GO terms and three KEGG pathways were analyzed in detail. In addition, several genes were provided in this study, which were predicted to be essential genes by our prediction model. We suggest that this study provides more functional and pathway information on the essential genes and provides a new way to investigate related problems. PMID:28873455

  19. KEGG-PATH: Kyoto encyclopedia of genes and genomes-based pathway analysis using a path analysis model.

    PubMed

    Du, Junli; Yuan, Zhifa; Ma, Ziwei; Song, Jiuzhou; Xie, Xiaoli; Chen, Yulin

    2014-07-29

    The dynamic impact approach (DIA) represents an alternative to overrepresentation analysis (ORA) for functional analysis of time-course experiments or those involving multiple treatments. The DIA can be used to estimate the biological impact of the differentially expressed genes (DEGs) associated with particular biological functions, for example, as represented by the Kyoto encyclopedia of genes and genomes (KEGG) annotations. However, the DIA does not take into account the correlated dependence structure of the KEGG pathway hierarchy. We have developed herein a path analysis model (KEGG-PATH) to subdivide the total effect of each KEGG pathway into the direct effect and indirect effect by taking into account not only each KEGG pathway itself, but also the correlation with its related pathways. In addition, this work also attempts to preliminarily estimate the impact direction of each KEGG pathway by a gradient analysis method from principal component analysis (PCA). As a result, the advantage of the KEGG-PATH model is demonstrated through the functional analysis of the bovine mammary transcriptome during lactation.
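
    The split of a total effect into direct and indirect parts can be illustrated with classical path analysis on standardized variables. This is only a two-pathway sketch of the general technique, assuming the total effect of a pathway equals its correlation with the response; it is not the KEGG-PATH implementation, and the function name and correlation values are hypothetical.

```python
def two_predictor_path_effects(r1y, r2y, r12):
    """Direct and indirect effects for two correlated predictors.

    Solves the normal equations of path analysis:
        r1y = p1 + r12 * p2
        r2y = p2 + r12 * p1
    where p1 and p2 are the direct effects (path coefficients).
    """
    determinant = 1.0 - r12 ** 2
    if determinant == 0:
        raise ValueError("predictors are perfectly correlated")
    p1 = (r1y - r12 * r2y) / determinant
    p2 = (r2y - r12 * r1y) / determinant
    return {
        "direct_1": p1,
        "indirect_1_via_2": r12 * p2,
        "direct_2": p2,
        "indirect_2_via_1": r12 * p1,
    }


# Made-up correlations between two related pathways and a response.
effects = two_predictor_path_effects(r1y=0.6, r2y=0.5, r12=0.5)
total_1 = effects["direct_1"] + effects["indirect_1_via_2"]
print(effects, total_1)
```

    By construction, the direct and indirect effects of each predictor sum back to its correlation with the response, which is the decomposition property the abstract describes.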

  20. KEGG as a reference resource for gene and protein annotation

    PubMed Central

    Kanehisa, Minoru; Sato, Yoko; Kawashima, Masayuki; Furumichi, Miho; Tanabe, Mao

    2016-01-01

    KEGG (http://www.kegg.jp/ or http://www.genome.jp/kegg/) is an integrated database resource for biological interpretation of genome sequences and other high-throughput data. Molecular functions of genes and proteins are associated with ortholog groups and stored in the KEGG Orthology (KO) database. The KEGG pathway maps, BRITE hierarchies and KEGG modules are developed as networks of KO nodes, representing high-level functions of the cell and the organism. Currently, more than 4000 complete genomes are annotated with KOs in the KEGG GENES database, which can be used as a reference data set for KO assignment and subsequent reconstruction of KEGG pathways and other molecular networks. As an annotation resource, the following improvements have been made. First, each KO record is re-examined and associated with protein sequence data used in experiments of functional characterization. Second, the GENES database now includes viruses, plasmids, and the addendum category for functionally characterized proteins that are not represented in complete genomes. Third, new automatic annotation servers, BlastKOALA and GhostKOALA, are made available utilizing the non-redundant pangenome data set generated from the GENES database. As a resource for translational bioinformatics, various data sets are created for antimicrobial resistance and drug interaction networks. PMID:26476454

  1. KEGG as a reference resource for gene and protein annotation.

    PubMed

    Kanehisa, Minoru; Sato, Yoko; Kawashima, Masayuki; Furumichi, Miho; Tanabe, Mao

    2016-01-04

    KEGG (http://www.kegg.jp/ or http://www.genome.jp/kegg/) is an integrated database resource for biological interpretation of genome sequences and other high-throughput data. Molecular functions of genes and proteins are associated with ortholog groups and stored in the KEGG Orthology (KO) database. The KEGG pathway maps, BRITE hierarchies and KEGG modules are developed as networks of KO nodes, representing high-level functions of the cell and the organism. Currently, more than 4000 complete genomes are annotated with KOs in the KEGG GENES database, which can be used as a reference data set for KO assignment and subsequent reconstruction of KEGG pathways and other molecular networks. As an annotation resource, the following improvements have been made. First, each KO record is re-examined and associated with protein sequence data used in experiments of functional characterization. Second, the GENES database now includes viruses, plasmids, and the addendum category for functionally characterized proteins that are not represented in complete genomes. Third, new automatic annotation servers, BlastKOALA and GhostKOALA, are made available utilizing the non-redundant pangenome data set generated from the GENES database. As a resource for translational bioinformatics, various data sets are created for antimicrobial resistance and drug interaction networks. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  2. IntPath--an integrated pathway gene relationship database for model organisms and important pathogens

    PubMed Central

    2012-01-01

    Background: Pathway data are important for understanding the relationship between genes, proteins and many other molecules in living organisms. Pathway gene relationships are crucial information for guidance, prediction, reference and assessment in biochemistry, computational biology, and medicine. Many well-established databases--e.g., KEGG, WikiPathways, and BioCyc--are dedicated to collecting pathway data for public access. However, the effectiveness of these databases is hindered by issues such as incompatible data formats, inconsistent molecular representations, inconsistent molecular relationship representations, inconsistent referrals to pathway names, and incomplete data from different databases. Results: In this paper, we overcome these issues through extraction, normalization and integration of pathway data from several major public databases (KEGG, WikiPathways, BioCyc, etc). We build a database that not only hosts our integrated pathway gene relationship data for public access but also maintains the necessary updates in the long run. This public repository is named IntPath (Integrated Pathway gene relationship database for model organisms and important pathogens). Four organisms--S. cerevisiae, M. tuberculosis H37Rv, H. sapiens and M. musculus--are included in this version (V2.0) of IntPath. IntPath uses the "full unification" approach to ensure no deletion and no introduced noise in this process. Therefore, IntPath contains much richer pathway-gene and pathway-gene pair relationships and a much larger number of non-redundant genes and gene pairs than any of the single-source databases. The gene relationships of each gene (measured by average node degree) per pathway are significantly richer. The gene relationships in each pathway (measured by average number of gene pairs per pathway) are also considerably richer in the integrated pathways. Moderate manual curation is involved to get rid of errors and noise from source data (e.g., the gene ID errors
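    The two richness measures named above (average node degree and number of gene pairs per pathway) can be sketched for a union-style integration as follows. This is a minimal illustration, not IntPath's implementation: it assumes pathway names and gene identifiers have already been normalized across sources, and the function names and toy data are hypothetical.

```python
def unify(sources):
    """Union the gene pairs of each pathway across sources, dropping nothing.

    `sources` is a list of dicts mapping pathway name -> iterable of (gene_a, gene_b).
    Pairs are stored as frozensets so that (a, b) and (b, a) count once.
    """
    unified = {}
    for source in sources:
        for pathway, pairs in source.items():
            bucket = unified.setdefault(pathway, set())
            for gene_a, gene_b in pairs:
                if gene_a != gene_b:  # ignore self-pairs
                    bucket.add(frozenset((gene_a, gene_b)))
    return unified


def average_node_degree(pairs):
    """Mean number of distinct partners per gene within one pathway."""
    degree = {}
    for pair in pairs:
        for gene in pair:
            degree[gene] = degree.get(gene, 0) + 1
    return sum(degree.values()) / len(degree) if degree else 0.0


if __name__ == "__main__":
    source_a = {"glycolysis": [("HK1", "GPI"), ("GPI", "PFKM")]}
    source_b = {"glycolysis": [("GPI", "HK1"), ("PFKM", "ALDOA")]}
    merged = unify([source_a, source_b])
    print(len(merged["glycolysis"]), average_node_degree(merged["glycolysis"]))
```

    Because the union never deletes a pair, the merged pathway has at least as many gene pairs as any single source, which is consistent with the richer relationships reported in the abstract.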

  3. KEGG: kyoto encyclopedia of genes and genomes.

    PubMed

    Kanehisa, M; Goto, S

    2000-01-01

    KEGG (Kyoto Encyclopedia of Genes and Genomes) is a knowledge base for systematic analysis of gene functions, linking genomic information with higher order functional information. The genomic information is stored in the GENES database, which is a collection of gene catalogs for all the completely sequenced genomes and some partial genomes with up-to-date annotation of gene functions. The higher order functional information is stored in the PATHWAY database, which contains graphical representations of cellular processes, such as metabolism, membrane transport, signal transduction and cell cycle. The PATHWAY database is supplemented by a set of ortholog group tables for the information about conserved subpathways (pathway motifs), which are often encoded by positionally coupled genes on the chromosome and which are especially useful in predicting gene functions. A third database in KEGG is LIGAND for the information about chemical compounds, enzyme molecules and enzymatic reactions. KEGG provides Java graphics tools for browsing genome maps, comparing two genome maps and manipulating expression maps, as well as computational tools for sequence comparison, graph comparison and path computation. The KEGG databases are updated daily and made freely available (http://www.genome.ad.jp/kegg/).

  4. HPD: an online integrated human pathway database enabling systems biology studies.

    PubMed

    Chowbina, Sudhir R; Wu, Xiaogang; Zhang, Fan; Li, Peter M; Pandey, Ragini; Kasamsetty, Harini N; Chen, Jake Y

    2009-10-08

    Pathway-oriented experimental and computational studies have led to a significant accumulation of biological knowledge concerning three major types of biological pathway events: molecular signaling events, gene regulation events, and metabolic reaction events. A pathway consists of a series of molecular pathway events that link molecular entities such as proteins, genes, and metabolites. There are approximately 300 biological pathway resources as of April 2009 according to the Pathguide database; however, these pathway databases generally have poor coverage or poor quality, and are difficult to integrate, due to syntactic-level and semantic-level data incompatibilities. We developed the Human Pathway Database (HPD) by integrating heterogeneous human pathway data that are either curated at the NCI Pathway Interaction Database (PID), Reactome, BioCarta, KEGG or indexed from the Protein Lounge Web sites. Integration of pathway data at syntactic, semantic, and schematic levels was based on a unified pathway data model and data warehousing-based integration techniques. HPD provides a comprehensive online view that connects human proteins, genes, RNA transcripts, enzymes, signaling events, metabolic reaction events, and gene regulatory events. At the time of this writing HPD includes 999 human pathways and more than 59,341 human molecular entities. The HPD software provides both a user-friendly Web interface for online use and a robust relational database backend for advanced pathway querying. This pathway tool enables users to 1) search for human pathways from different resources by simply entering genes/proteins involved in pathways or words appearing in pathway names, 2) analyze pathway-protein association, 3) study pathway-pathway similarity, and 4) build integrated pathway networks. We demonstrated the usage and characteristics of the new HPD through three breast cancer case studies. 
    HPD (http://bio.informatics.iupui.edu/HPD) is a new resource for searching, managing

  5. Knowledge representation in metabolic pathway databases.

    PubMed

    Stobbe, Miranda D; Jansen, Gerbert A; Moerland, Perry D; van Kampen, Antoine H C

    2014-05-01

    The accurate representation of all aspects of a metabolic network in a structured format, such that it can be used for a wide variety of computational analyses, is a challenge faced by a growing number of researchers. Analysis of five major metabolic pathway databases reveals that each database has made widely different choices to address this challenge, including how to deal with knowledge that is uncertain or missing. In concise overviews, we show how concepts such as compartments, enzymatic complexes and the direction of reactions are represented in each database. Importantly, also concepts which a database does not represent are described. Which aspects of the metabolic network need to be available in a structured format and to what detail differs per application. For example, for in silico phenotype prediction, a detailed representation of gene-protein-reaction relations and the compartmentalization of the network is essential. Our analysis also shows that current databases are still limited in capturing all details of the biology of the metabolic network, further illustrated with a detailed analysis of three metabolic processes. Finally, we conclude that the conceptual differences between the databases, which make knowledge exchange and integration a challenge, have not been resolved, so far, by the exchange formats in which knowledge representation is standardized.

  6. Automated genome annotation and pathway identification using the KEGG Orthology (KO) as a controlled vocabulary.

    PubMed

    Mao, Xizeng; Cai, Tao; Olyarchuk, John G; Wei, Liping

    2005-10-01

    High-throughput technologies such as DNA sequencing and microarrays have created the need for automated annotation of large sets of genes, including whole genomes, and automated identification of pathways. Ontologies, such as the popular Gene Ontology (GO), provide a common controlled vocabulary for these types of automated analysis. Yet, while GO offers tremendous value, it also has certain limitations such as the lack of direct association with pathways. We demonstrated the use of the KEGG Orthology (KO), part of the KEGG suite of resources, as an alternative controlled vocabulary for automated annotation and pathway identification. We developed a KO-Based Annotation System (KOBAS) that can automatically annotate a set of sequences with KO terms and identify both the most frequent and the statistically significantly enriched pathways. Results from both whole genome and microarray gene cluster annotations with KOBAS are comparable and complementary to known annotations. KOBAS is a freely available stand-alone Python program that can contribute significantly to genome annotation and microarray analysis.
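    The notion of a "statistically significantly enriched" pathway can be illustrated with a one-sided hypergeometric test. This is a hedged sketch only: the abstract does not say which test KOBAS applies, and the function names, the `alpha` cutoff and the absence of multiple-testing correction are assumptions made for illustration.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K belong to the pathway."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def enriched_pathways(query_genes, pathway_to_genes, background_genes, alpha=0.05):
    """Return (pathway, overlap, p-value) tuples with p <= alpha, sorted by p-value."""
    background = set(background_genes)
    query = set(query_genes) & background
    results = []
    for name, genes in pathway_to_genes.items():
        members = set(genes) & background
        overlap = len(query & members)
        if overlap == 0:
            continue
        p = hypergeom_pvalue(overlap, len(query), len(members), len(background))
        if p <= alpha:
            results.append((name, overlap, p))
    return sorted(results, key=lambda item: item[2])


if __name__ == "__main__":
    background = [f"g{i}" for i in range(20)]
    pathways = {"pathway_A": background[:5], "pathway_B": background[5:15]}
    query = ["g0", "g1", "g2", "g3", "g10"]
    for row in enriched_pathways(query, pathways, background):
        print(row)
```

    In practice the p-values from many pathways would also need a multiple-testing correction before being reported as significant.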

  7. EXPath: a database of comparative expression analysis inferring metabolic pathways for plants

    PubMed Central

    2015-01-01

    Background: In general, gene expression changes conditionally to catalyze a specific metabolic pathway. Microarray-based datasets have been massively produced to monitor gene expression levels in parallel with numerous experimental treatments. Although several studies facilitated the linkage of gene expression data and metabolic pathways, none of them are amassed for plants. Moreover, advanced analysis such as pathway enrichment or how genes are expressed under different conditions is not rendered. Description: Therefore, EXPath was developed to not only comprehensively congregate the public microarray expression data from over 1000 samples in biotic stress, abiotic stress, and hormone secretion but also allow the usage of this abundant resource for coexpression analysis and differentially expressed genes (DEGs) identification, finally inferring the enriched KEGG pathways and gene ontology (GO) terms of three model plants: Arabidopsis thaliana, Oryza sativa, and Zea mays. Users can access the gene expression patterns of interest under various conditions via five main functions (Gene Search, Pathway Search, DEGs Search, Pathways/GO Enrichment, and Coexpression analysis) in EXPath, which are presented by a user-friendly interface and valuable for further research. Conclusions: In conclusion, EXPath, freely available at http://expath.itps.ncku.edu.tw, is a database resource that collects and utilizes gene expression profiles derived from microarray platforms under various conditions to infer metabolic pathways for plants. PMID:25708775

  8. EXPath: a database of comparative expression analysis inferring metabolic pathways for plants.

    PubMed

    Chien, Chia-Hung; Chow, Chi-Nga; Wu, Nai-Yun; Chiang-Hsieh, Yi-Fan; Hou, Ping-Fu; Chang, Wen-Chi

    2015-01-01

    In general, gene expression changes conditionally to catalyze a specific metabolic pathway. Microarray-based datasets have been massively produced to monitor gene expression levels in parallel with numerous experimental treatments. Although several studies facilitated the linkage of gene expression data and metabolic pathways, none of them are amassed for plants. Moreover, advanced analysis such as pathway enrichment or how genes are expressed under different conditions is not rendered. Therefore, EXPath was developed to not only comprehensively congregate the public microarray expression data from over 1000 samples in biotic stress, abiotic stress, and hormone secretion but also allow the usage of this abundant resource for coexpression analysis and differentially expressed genes (DEGs) identification, finally inferring the enriched KEGG pathways and gene ontology (GO) terms of three model plants: Arabidopsis thaliana, Oryza sativa, and Zea mays. Users can access the gene expression patterns of interest under various conditions via five main functions (Gene Search, Pathway Search, DEGs Search, Pathways/GO Enrichment, and Coexpression analysis) in EXPath, which are presented by a user-friendly interface and valuable for further research. In conclusion, EXPath, freely available at http://expath.itps.ncku.edu.tw, is a database resource that collects and utilizes gene expression profiles derived from microarray platforms under various conditions to infer metabolic pathways for plants.

  9. Analysis of tumor suppressor genes based on gene ontology and the KEGG pathway.

    PubMed

    Yang, Jing; Chen, Lei; Kong, Xiangyin; Huang, Tao; Cai, Yu-Dong

    2014-01-01

    Cancer is a serious disease that causes many deaths every year. We urgently need to design effective treatments to cure this disease. Tumor suppressor genes (TSGs) are a type of gene that can protect cells from becoming cancerous. In view of this, correct identification of TSGs is an alternative method for identifying effective cancer therapies. In this study, we performed gene ontology (GO) and pathway enrichment analysis of the TSGs and non-TSGs. Some popular feature selection methods, including minimum redundancy maximum relevance (mRMR) and incremental feature selection (IFS), were employed to analyze the enrichment features. Accordingly, some GO terms and KEGG pathways, such as biological adhesion, cell cycle control, genomic stability maintenance and cell death regulation, were extracted, which are important factors for identifying TSGs. We hope these findings can help in building effective prediction methods for identifying TSGs and thereby, promoting the discovery of effective cancer treatments.

  10. The Use of Gene Ontology Term and KEGG Pathway Enrichment for Analysis of Drug Half-Life

    PubMed Central

    Chen, Lei; Lu, Jing; Kong, XiangYin; Huang, Tao; Li, HaiPeng

    2016-01-01

    A drug’s biological half-life is defined as the time required for the human body to metabolize or eliminate 50% of the initial drug dosage. Correctly measuring the half-life of a given drug is helpful for the safe and accurate usage of the drug. In this study, we investigated which gene ontology (GO) terms and biological pathways were highly related to the determination of drug half-life. The investigated drugs, with known half-lives, were analyzed based on their enrichment scores for associated GO terms and KEGG pathways. These scores indicate which GO terms or KEGG pathways the drug targets. The feature selection method, minimum redundancy maximum relevance, was used to analyze these GO terms and KEGG pathways and to identify important GO terms and pathways, such as sodium-independent organic anion transmembrane transporter activity (GO:0015347), monoamine transmembrane transporter activity (GO:0008504), negative regulation of synaptic transmission (GO:0050805), neuroactive ligand-receptor interaction (hsa04080), serotonergic synapse (hsa04726), and linoleic acid metabolism (hsa00591), among others. This analysis confirmed our results and may show evidence for a new method in studying drug half-lives and building effective computational methods for the prediction of drug half-lives. PMID:27780226

  11. 1-CMDb: A Curated Database of Genomic Variations of the One-Carbon Metabolism Pathway.

    PubMed

    Bhat, Manoj K; Gadekar, Veerendra P; Jain, Aditya; Paul, Bobby; Rai, Padmalatha S; Satyamoorthy, Kapaettu

    2017-01-01

    The one-carbon metabolism pathway is vital in maintaining tissue homeostasis by driving the critical reactions of folate and methionine cycles. A myriad of genetic and epigenetic events mark the rate of reactions in a tissue-specific manner. Integration of these to predict and provide personalized health management requires robust computational tools that can process multiomics data. The DNA sequences that may determine the chain of biological events and the endpoint reactions within one-carbon metabolism genes remain to be comprehensively recorded. Hence, we designed the one-carbon metabolism database (1-CMDb) as a platform to interrogate its association with a host of human disorders. DNA sequence and network information of a total of 48 genes were extracted from a literature survey and KEGG pathway that are involved in the one-carbon folate-mediated pathway. The information generated, collected, and compiled for all these genes from the UCSC genome browser included the single nucleotide polymorphisms (SNPs), CpGs, copy number variations (CNVs), and miRNAs, and a comprehensive database was created. Furthermore, a significant correlation analysis was performed for SNPs in the pathway genes. Detailed data of SNPs, CNVs, CpG islands, and miRNAs for 48 folate pathway genes were compiled. The SNPs in CNVs (9670), CpGs (984), and miRNAs (14) were also compiled for all pathway genes. The SIFT score, the prediction and PolyPhen score, as well as the prediction for each of the SNPs were tabulated and represented for folate pathway genes. Also included in the database for folate pathway genes were the links to 124 various phenotypes and disease associations as reported in the literature and from publicly available information. 
    A comprehensive database was generated consisting of genomic elements within and among SNPs, CNVs, CpGs, and miRNAs of one-carbon metabolism pathways to facilitate (a) single source of information and (b) integration into large-genome scale network

  12. Co-LncRNA: investigating the lncRNA combinatorial effects in GO annotations and KEGG pathways based on human RNA-Seq data.

    PubMed

    Zhao, Zheng; Bai, Jing; Wu, Aiwei; Wang, Yuan; Zhang, Jinwen; Wang, Zishan; Li, Yongsheng; Xu, Juan; Li, Xia

    2015-01-01

    Long non-coding RNAs (lncRNAs) are emerging as key regulators of diverse biological processes and diseases. However, the combinatorial effects of these molecules in a specific biological function are poorly understood. Identifying co-expressed protein-coding genes of lncRNAs would provide ample insight into lncRNA functions. To facilitate such an effort, we have developed Co-LncRNA, which is a web-based computational tool that allows users to identify GO annotations and KEGG pathways that may be affected by co-expressed protein-coding genes of a single or multiple lncRNAs. LncRNA co-expressed protein-coding genes were first identified in publicly available human RNA-Seq datasets, including 241 datasets across 6560 total individuals representing 28 tissue types/cell lines. Then, the lncRNA combinatorial effects in a given GO annotations or KEGG pathways are taken into account by the simultaneous analysis of multiple lncRNAs in user-selected individual or multiple datasets, which is realized by enrichment analysis. In addition, this software provides a graphical overview of pathways that are modulated by lncRNAs, as well as a specific tool to display the relevant networks between lncRNAs and their co-expressed protein-coding genes. Co-LncRNA also supports users in uploading their own lncRNA and protein-coding gene expression profiles to investigate the lncRNA combinatorial effects. It will be continuously updated with more human RNA-Seq datasets on an annual basis. Taken together, Co-LncRNA provides a web-based application for investigating lncRNA combinatorial effects, which could shed light on their biological roles and could be a valuable resource for this community. Database URL: http://www.bio-bigdata.com/Co-LncRNA/.

  13. Co-LncRNA: investigating the lncRNA combinatorial effects in GO annotations and KEGG pathways based on human RNA-Seq data

    PubMed Central

    Zhao, Zheng; Bai, Jing; Wu, Aiwei; Wang, Yuan; Zhang, Jinwen; Wang, Zishan; Li, Yongsheng; Xu, Juan; Li, Xia

    2015-01-01

    Long non-coding RNAs (lncRNAs) are emerging as key regulators of diverse biological processes and diseases. However, the combinatorial effects of these molecules in a specific biological function are poorly understood. Identifying co-expressed protein-coding genes of lncRNAs would provide ample insight into lncRNA functions. To facilitate such an effort, we have developed Co-LncRNA, which is a web-based computational tool that allows users to identify GO annotations and KEGG pathways that may be affected by co-expressed protein-coding genes of a single or multiple lncRNAs. LncRNA co-expressed protein-coding genes were first identified in publicly available human RNA-Seq datasets, including 241 datasets across 6560 total individuals representing 28 tissue types/cell lines. Then, the lncRNA combinatorial effects in a given GO annotations or KEGG pathways are taken into account by the simultaneous analysis of multiple lncRNAs in user-selected individual or multiple datasets, which is realized by enrichment analysis. In addition, this software provides a graphical overview of pathways that are modulated by lncRNAs, as well as a specific tool to display the relevant networks between lncRNAs and their co-expressed protein-coding genes. Co-LncRNA also supports users in uploading their own lncRNA and protein-coding gene expression profiles to investigate the lncRNA combinatorial effects. It will be continuously updated with more human RNA-Seq datasets on an annual basis. Taken together, Co-LncRNA provides a web-based application for investigating lncRNA combinatorial effects, which could shed light on their biological roles and could be a valuable resource for this community. Database URL: http://www.bio-bigdata.com/Co-LncRNA/ PMID:26363020

  14. SPIKE: a database of highly curated human signaling pathways.

    PubMed

    Paz, Arnon; Brownstein, Zippora; Ber, Yaara; Bialik, Shani; David, Eyal; Sagir, Dorit; Ulitsky, Igor; Elkon, Ran; Kimchi, Adi; Avraham, Karen B; Shiloh, Yosef; Shamir, Ron

    2011-01-01

    The rapid accumulation of knowledge on biological signaling pathways and their regulatory mechanisms has highlighted the need for specific repositories that can store, organize and allow retrieval of pathway information in a way that will be useful for the research community. SPIKE (Signaling Pathways Integrated Knowledge Engine; http://www.cs.tau.ac.il/~spike/) is a database for achieving this goal, containing highly curated interactions for particular human pathways, along with literature-referenced information on the nature of each interaction. To make database population and pathway comprehension straightforward, a simple yet informative data model is used, and pathways are laid out as maps that reflect the curator’s understanding and make the utilization of the pathways easy. The database currently focuses primarily on pathways describing DNA damage response, cell cycle, programmed cell death and hearing related pathways. Pathways are regularly updated, and additional pathways are gradually added. The complete database and the individual maps are freely exportable in several formats. The database is accompanied by a stand-alone software tool for analysis and dynamic visualization of pathways.

  15. The KEGG resource for deciphering the genome.

    PubMed

    Kanehisa, Minoru; Goto, Susumu; Kawashima, Shuichi; Okuno, Yasushi; Hattori, Masahiro

    2004-01-01

    A grand challenge in the post-genomic era is a complete computer representation of the cell and the organism, which will enable computational prediction of higher-level complexity of cellular processes and organism behavior from genomic information. Toward this end we have been developing a knowledge-based approach for network prediction, which is to predict, given a complete set of genes in the genome, the protein interaction networks that are responsible for various cellular processes. KEGG at http://www.genome.ad.jp/kegg/ is the reference knowledge base that integrates current knowledge on molecular interaction networks such as pathways and complexes (PATHWAY database), information about genes and proteins generated by genome projects (GENES/SSDB/KO databases) and information about biochemical compounds and reactions (COMPOUND/GLYCAN/REACTION databases). These three types of database actually represent three graph objects, called the protein network, the gene universe and the chemical universe. New efforts are being made to abstract knowledge, both computationally and manually, about ortholog clusters in the KO (KEGG Orthology) database, and to collect and analyze carbohydrate structures in the GLYCAN database.

  16. The KEGG resource for deciphering the genome

    PubMed Central

    Kanehisa, Minoru; Goto, Susumu; Kawashima, Shuichi; Okuno, Yasushi; Hattori, Masahiro

    2004-01-01

    A grand challenge in the post-genomic era is a complete computer representation of the cell and the organism, which will enable computational prediction of higher-level complexity of cellular processes and organism behavior from genomic information. Toward this end we have been developing a knowledge-based approach for network prediction, which is to predict, given a complete set of genes in the genome, the protein interaction networks that are responsible for various cellular processes. KEGG at http://www.genome.ad.jp/kegg/ is the reference knowledge base that integrates current knowledge on molecular interaction networks such as pathways and complexes (PATHWAY database), information about genes and proteins generated by genome projects (GENES/SSDB/KO databases) and information about biochemical compounds and reactions (COMPOUND/GLYCAN/REACTION databases). These three types of database actually represent three graph objects, called the protein network, the gene universe and the chemical universe. New efforts are being made to abstract knowledge, both computationally and manually, about ortholog clusters in the KO (KEGG Orthology) database, and to collect and analyze carbohydrate structures in the GLYCAN database. PMID:14681412

  17. KaPPA-View4: a metabolic pathway database for representation and analysis of correlation networks of gene co-expression and metabolite co-accumulation and omics data

    PubMed Central

    Sakurai, Nozomu; Ara, Takeshi; Ogata, Yoshiyuki; Sano, Ryosuke; Ohno, Takashi; Sugiyama, Kenjiro; Hiruta, Atsushi; Yamazaki, Kiyoshi; Yano, Kentaro; Aoki, Koh; Aharoni, Asaph; Hamada, Kazuki; Yokoyama, Koji; Kawamura, Shingo; Otsuka, Hirofumi; Tokimatsu, Toshiaki; Kanehisa, Minoru; Suzuki, Hideyuki; Saito, Kazuki; Shibata, Daisuke

    2011-01-01

    Correlations of gene-to-gene co-expression and metabolite-to-metabolite co-accumulation calculated from large amounts of transcriptome and metabolome data are useful for uncovering unknown functions of genes, functional diversities of gene family members and regulatory mechanisms of metabolic pathway flows. Many databases and tools are available to interpret quantitative transcriptome and metabolome data, but there are only limited ones that connect correlation data to biological knowledge and can be utilized to find biological significance of it. We report here a new metabolic pathway database, KaPPA-View4 (http://kpv.kazusa.or.jp/kpv4/), which is able to overlay gene-to-gene and/or metabolite-to-metabolite relationships as curves on a metabolic pathway map, or on a combination of up to four maps. This representation would help to discover, for example, novel functions of a transcription factor that regulates genes on a metabolic pathway. Pathway maps of the Kyoto Encyclopedia of Genes and Genomes (KEGG) and maps generated from their gene classifications are available at KaPPA-View4 KEGG version (http://kpv.kazusa.or.jp/kpv4-kegg/). At present, gene co-expression data from the databases ATTED-II, COXPRESdb, CoP and MiBASE for human, mouse, rat, Arabidopsis, rice, tomato and other plants are available. PMID:21097783
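    The gene-to-gene correlations that such a tool overlays on a pathway map can be computed, in the simplest case, as Pearson coefficients over expression profiles. The sketch below is illustrative only: the abstract does not state which correlation measure or cutoff the underlying co-expression databases use, and the function names and the `cutoff` value are assumptions.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return cov / (sd_x * sd_y)


def coexpression_edges(profiles, cutoff=0.9):
    """List (gene_a, gene_b, r) for every gene pair with |r| >= cutoff."""
    genes = sorted(profiles)
    edges = []
    for i, gene_a in enumerate(genes):
        for gene_b in genes[i + 1:]:
            r = pearson(profiles[gene_a], profiles[gene_b])
            if abs(r) >= cutoff:
                edges.append((gene_a, gene_b, r))
    return edges


if __name__ == "__main__":
    profiles = {"geneA": [1, 2, 3, 4], "geneB": [2, 4, 6, 8], "geneC": [4, 1, 3, 2]}
    print(coexpression_edges(profiles))
```

    Each returned edge corresponds to one curve that could be drawn between two genes on a map; the same function applies unchanged to metabolite accumulation profiles.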

  18. Prediction of Effective Drug Combinations by Chemical Interaction, Protein Interaction and Target Enrichment of KEGG Pathways

    PubMed Central

    Chen, Lei; Zheng, Ming-Yue; Zhang, Jian; Feng, Kai-Yan; Cai, Yu-Dong

    2013-01-01

    Drug combinatorial therapy could be more effective in treating some complex diseases than single agents due to better efficacy and reduced side effects. Although some drug combinations are being used, their underlying molecular mechanisms are still poorly understood. Therefore, it is of great interest to deduce a novel drug combination by their molecular mechanisms in a robust and rigorous way. This paper attempts to predict effective drug combinations by a combined consideration of: (1) chemical interaction between drugs, (2) protein interactions between drugs' targets, and (3) target enrichment of KEGG pathways. A benchmark dataset was constructed, consisting of 121 confirmed effective combinations and 605 random combinations. Each drug combination was represented by 465 features derived from the aforementioned three properties. Some feature selection techniques, including Minimum Redundancy Maximum Relevance and Incremental Feature Selection, were adopted to extract the key features. Random forest model was built with its performance evaluated by 5-fold cross-validation. As a result, 55 key features providing the best prediction result were selected. These important features may help to gain insights into the mechanisms of drug combinations, and the proposed prediction model could become a useful tool for screening possible drug combinations. PMID:24083237

  19. Redundancy control in pathway databases (ReCiPa): an application for improving gene-set enrichment analysis in Omics studies and "Big data" biology.

    PubMed

    Vivar, Juan C; Pemu, Priscilla; McPherson, Ruth; Ghosh, Sujoy

    2013-08-01

    Unparalleled technological advances have fueled an explosive growth in the scope and scale of biological data and have propelled life sciences into the realm of "Big Data" that cannot be managed or analyzed by conventional approaches. Big Data in the life sciences are driven primarily via a diverse collection of 'omics'-based technologies, including genomics, proteomics, metabolomics, transcriptomics, metagenomics, and lipidomics. Gene-set enrichment analysis is a powerful approach for interrogating large 'omics' datasets, leading to the identification of biological mechanisms associated with observed outcomes. While several factors influence the results from such analysis, the impact from the contents of pathway databases is often under-appreciated. Pathway databases often contain variously named pathways that overlap with one another to varying degrees. Ignoring such redundancies during pathway analysis can lead to the designation of several pathways as being significant due to high content-similarity, rather than truly independent biological mechanisms. Statistically, such dependencies also result in correlated p values and overdispersion, leading to biased results. We investigated the level of redundancies in multiple pathway databases and observed large discrepancies in the nature and extent of pathway overlap. This prompted us to develop the application, ReCiPa (Redundancy Control in Pathway Databases), to control redundancies in pathway databases based on user-defined thresholds. Analysis of genomic and genetic datasets, using ReCiPa-generated overlap-controlled versions of KEGG and Reactome pathways, led to a reduction in redundancy among the top-scoring gene-sets and allowed for the inclusion of additional gene-sets representing possibly novel biological mechanisms. Using obesity as an example, bioinformatic analysis further demonstrated that gene-sets identified from overlap-controlled pathway databases show stronger evidence of prior association
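
The threshold-based overlap control described in this record can be illustrated with a minimal sketch. It assumes, which the abstract does not state, that overlap is measured as the fraction of the smaller gene set shared with the other set, and that pairs above the user-defined threshold are merged greedily. The function names and toy gene sets are hypothetical and are not ReCiPa's actual interface.

```python
from itertools import combinations

def overlap(a: set, b: set) -> float:
    """Fraction of the smaller gene set that is shared with the other set."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))

def merge_redundant(pathways: dict, threshold: float) -> dict:
    """Greedily merge the most-overlapping pair until no pair exceeds threshold."""
    merged = {name: set(genes) for name, genes in pathways.items()}
    while True:
        best = None
        for p, q in combinations(sorted(merged), 2):
            score = overlap(merged[p], merged[q])
            if score > threshold and (best is None or score > best[0]):
                best = (score, p, q)
        if best is None:
            return merged
        _, p, q = best
        # Replace the two redundant pathways with their union.
        merged[p + "+" + q] = merged.pop(p) | merged.pop(q)

# Hypothetical toy pathways: A and B are highly redundant, C is independent.
toy = {
    "A": {"g1", "g2", "g3", "g4"},
    "B": {"g1", "g2", "g3"},
    "C": {"g7", "g8"},
}
result = merge_redundant(toy, threshold=0.7)
```

With these toy sets, A and B are merged into a single gene set while C is left untouched; a lower threshold merges more aggressively.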

  20. Available pathways database (APD): an essential resource for combinatorial biology.

    PubMed

    Pirrung, M C; Silva, C M; Jaeger, J

    2000-10-01

    A relational database, the Available Pathways Database (APD), has been constructed of microbial natural products, their producing strains, and their biosynthetic pathways. The database allows the ready selection of donor strains for combinatorial biology experiments. It provides the same type of resource for combinatorial biology as the Available Chemicals Directory (ACD) does for combinatorial chemical library generation. Its cataloging ability can also provide insight into novel aspects of biosynthetic routes. In particular, no 10-unit Type I polyketides were found in the compilation of this edition of the APD (Version I).

  1. XTalkDB: a database of signaling pathway crosstalk

    PubMed Central

    Sam, Sarah A.; Teel, Joelle; Tegge, Allison N.; Bharadwaj, Aditya; Murali, T.M.

    2017-01-01

    Analysis of signaling pathways and their crosstalk is a cornerstone of systems biology. Thousands of papers have been published on these topics. Surprisingly, there is no database that carefully and explicitly documents crosstalk between specific pairs of signaling pathways. We have developed XTalkDB (http://www.xtalkdb.org) to fill this very important gap. XTalkDB contains curated information for 650 pairs of pathways from over 1600 publications. In addition, the database reports the molecular components (e.g. proteins, hormones, microRNAs) that mediate crosstalk between a pair of pathways and the species and tissue in which the crosstalk was observed. The XTalkDB website provides an easy-to-use interface for scientists to browse crosstalk information by querying one or more pathways or molecules of interest. PMID:27899583

  2. Carcinogenic effects of oil dispersants: A KEGG pathway-based RNA-seq study of human airway epithelial cells.

    PubMed

    Liu, Yao-Zhong; Zhang, Lei; Roy-Engel, Astrid M; Saito, Shigeki; Lasky, Joseph A; Wang, Guangdi; Wang, He

    2017-02-20

    The health impacts of the BP oil spill are yet to be further revealed as the toxicological effects of oil products and dispersants on human respiratory system may be latent and complex, and hence difficult to study and follow up. Here we performed RNA-seq analyses of a system of human airway epithelial cells treated with the BP crude oil and/or dispersants Corexit 9500 and Corexit 9527 that were used to help break up the oil spill. Based on the RNA-seq data, we then systemically analyzed the transcriptomic perturbations of the cells at the KEGG pathway level using two pathway-based analysis tools, GAGE (generally applicable gene set enrichment) and GSNCA (Gene Sets Net Correlations Analysis). Our results suggested a pattern of change towards carcinogenesis for the treated cells marked by upregulation of ribosomal biosynthesis (hsa03008) (p=1.97E-13), protein processing (hsa04141) (p=4.09E-7), Wnt signaling (hsa04310) (p=6.76E-3), neurotrophin signaling (hsa04722) (p=7.73E-3) and insulin signaling (hsa04910) (p=1.16E-2) pathways under the dispersant Corexit 9527 treatment, as identified by GAGE analysis. Furthermore, through GSNCA analysis, we identified gene co-expression changes for several KEGG cancer pathways, including small cell lung cancer pathway (hsa05222, p=9.99E-5), under various treatments of oil/dispersant, especially the mixture of oil and Corexit 9527. Overall, our results suggested carcinogenic effects of dispersants (in particular Corexit 9527) and their mixtures with the BP crude oil, and provided further support for more stringent safety precautions and regulations for operations involving long-term respiratory exposure to oil and dispersants.

  3. NAViGaTing the micronome--using multiple microRNA prediction databases to identify signalling pathway-associated microRNAs.

    PubMed

    Shirdel, Elize A; Xie, Wing; Mak, Tak W; Jurisica, Igor

    2011-02-25

    MicroRNAs are a class of small RNAs known to regulate gene expression at the transcript level, the protein level, or both. Since microRNA binding is sequence-based but possibly structure-specific, work in this area has resulted in multiple databases storing predicted microRNA:target relationships computed using diverse algorithms. We integrate prediction databases, compare predictions to in vitro data, and use cross-database predictions to model the microRNA:transcript interactome--referred to as the micronome--to study microRNA involvement in well-known signalling pathways as well as associations with disease. We make this data freely available with a flexible user interface as our microRNA Data Integration Portal--mirDIP (http://ophid.utoronto.ca/mirDIP). mirDIP integrates prediction databases to elucidate accurate microRNA:target relationships. Using NAViGaTOR to produce interaction networks implicating microRNAs in literature-based, KEGG-based and Reactome-based pathways, we find these signalling pathway networks have significantly more microRNA involvement compared to chance (p<0.05), suggesting microRNAs co-target many genes in a given pathway. Further examination of the micronome shows two distinct classes of microRNAs; universe microRNAs, which are involved in many signalling pathways; and intra-pathway microRNAs, which target multiple genes within one signalling pathway. We find universe microRNAs to have more targets (p<0.0001), to be more studied (p<0.0002), and to have higher degree in the KEGG cancer pathway (p<0.0001), compared to intra-pathway microRNAs. Our pathway-based analysis of mirDIP data suggests microRNAs are involved in intra-pathway signalling. We identify two distinct classes of microRNAs, suggesting a hierarchical organization of microRNAs co-targeting genes both within and between pathways, and implying differential involvement of universe and intra-pathway microRNAs at the disease level.

  4. NAViGaTing the Micronome – Using Multiple MicroRNA Prediction Databases to Identify Signalling Pathway-Associated MicroRNAs

    PubMed Central

    Shirdel, Elize A.; Xie, Wing; Mak, Tak W.; Jurisica, Igor

    2011-01-01

    Background MicroRNAs are a class of small RNAs known to regulate gene expression at the transcript level, the protein level, or both. Since microRNA binding is sequence-based but possibly structure-specific, work in this area has resulted in multiple databases storing predicted microRNA:target relationships computed using diverse algorithms. We integrate prediction databases, compare predictions to in vitro data, and use cross-database predictions to model the microRNA:transcript interactome – referred to as the micronome – to study microRNA involvement in well-known signalling pathways as well as associations with disease. We make this data freely available with a flexible user interface as our microRNA Data Integration Portal — mirDIP (http://ophid.utoronto.ca/mirDIP). Results mirDIP integrates prediction databases to elucidate accurate microRNA:target relationships. Using NAViGaTOR to produce interaction networks implicating microRNAs in literature-based, KEGG-based and Reactome-based pathways, we find these signalling pathway networks have significantly more microRNA involvement compared to chance (p<0.05), suggesting microRNAs co-target many genes in a given pathway. Further examination of the micronome shows two distinct classes of microRNAs; universe microRNAs, which are involved in many signalling pathways; and intra-pathway microRNAs, which target multiple genes within one signalling pathway. We find universe microRNAs to have more targets (p<0.0001), to be more studied (p<0.0002), and to have higher degree in the KEGG cancer pathway (p<0.0001), compared to intra-pathway microRNAs. Conclusions Our pathway-based analysis of mirDIP data suggests microRNAs are involved in intra-pathway signalling. We identify two distinct classes of microRNAs, suggesting a hierarchical organization of microRNAs co-targeting genes both within and between pathways, and implying differential involvement of universe and intra-pathway microRNAs at the disease level. PMID

  5. Co-expressed Pathways DataBase for Tomato: a database to predict pathways relevant to a query gene.

    PubMed

    Narise, Takafumi; Sakurai, Nozomu; Obayashi, Takeshi; Ohta, Hiroyuki; Shibata, Daisuke

    2017-06-05

    Gene co-expression, the similarity of gene expression profiles under various experimental conditions, has been used as an indicator of functional relationships between genes, and many co-expression databases have been developed for predicting gene functions. These databases usually provide users with a co-expression network and a list of strongly co-expressed genes for a query gene. Several of these databases also provide functional information on a set of strongly co-expressed genes (i.e., provide biological processes and pathways that are enriched in these strongly co-expressed genes), which is generally analyzed via over-representation analysis (ORA). A limitation of this approach may be that users can predict gene functions only based on the strongly co-expressed genes. In this study, we developed a new co-expression database that enables users to predict the function of tomato genes from the results of functional enrichment analyses of co-expressed genes while considering the genes that are not strongly co-expressed. To achieve this, we used the ORA approach with several thresholds to select co-expressed genes, and performed gene set enrichment analysis (GSEA) applied to a ranked list of genes ordered by the co-expression degree. We found that internal correlation in pathways affected the significance levels of the enrichment analyses. Therefore, we introduced a new measure for evaluating the relationship between the gene and pathway, termed the percentile (p)-score, which enables users to predict functionally relevant pathways without being affected by the internal correlation in pathways. In addition, we evaluated our approaches using receiver operating characteristic curves, which concluded that the p-score could improve the performance of the ORA. We developed a new database, named Co-expressed Pathways DataBase for Tomato, which is available at http://cox-path-db.kazusa.or.jp/tomato . The database allows users to predict pathways that are relevant to a

  6. Consensus and conflict cards for metabolic pathway databases

    PubMed Central

    2013-01-01

    Background The metabolic network of H. sapiens and many other organisms is described in multiple pathway databases. The level of agreement between these descriptions, however, has proven to be low. We can use these different descriptions to our advantage by identifying conflicting information and combining their knowledge into a single, more accurate, and more complete description. This task is, however, far from trivial. Results We introduce the concept of Consensus and Conflict Cards (C2Cards) to provide concise overviews of what the databases do or do not agree on. Each card is centered at a single gene, EC number or reaction. These three complementary perspectives make it possible to distinguish disagreements on the underlying biology of a metabolic process from differences that can be explained by different decisions on how and in what detail to represent knowledge. As a proof-of-concept, we implemented C2CardsHuman, as a web application http://www.molgenis.org/c2cards, covering five human pathway databases. Conclusions C2Cards can contribute to ongoing reconciliation efforts by simplifying the identification of consensus and conflicts between pathway databases and lowering the threshold for experts to contribute. Several case studies illustrate the potential of the C2Cards in identifying disagreements on the underlying biology of a metabolic process. The overviews may also point out controversial biological knowledge that should be subject of further research. Finally, the examples provided emphasize the importance of manual curation and the need for a broad community involvement. PMID:23803311

  7. miRnalyze: an interactive database linking tool to unlock intuitive microRNA regulation of cell signaling pathways.

    PubMed

    Subhra Das, Sankha; James, Mithun; Paul, Sandip; Chakravorty, Nishant

    2017-01-01

    The various pathophysiological processes occurring in living systems are known to be orchestrated by delicate interplays and cross-talks between different genes and their regulators. Among the various regulators of genes, there is a class of small non-coding RNA molecules known as microRNAs (miRNAs). Although the relative simplicity of miRNAs and their ability to modulate cellular processes make them attractive therapeutic candidates, their presence in large numbers makes it challenging for experimental researchers to interpret the intricacies of the molecular processes they regulate. Most of the existing bioinformatic tools fail to address these challenges. Here, we present a new web resource, 'miRnalyze', that has been specifically designed to directly identify the putative regulation of cell signaling pathways by miRNAs. The tool integrates miRNA-target predictions with signaling cascade members by utilizing the TargetScanHuman 7.1 miRNA-target prediction tool and the KEGG pathway database, and thus provides researchers with in-depth insights into modulation of signal transduction pathways by miRNAs. miRnalyze is capable of identifying common miRNAs targeting more than one gene in the same signaling pathway, a feature that further increases the probability of modulating the pathway and downstream reactions when using miRNA modulators. Additionally, miRnalyze can sort miRNAs according to the seed-match types and TargetScan context++ score, thus providing a hierarchical list of most valuable miRNAs. Furthermore, in order to provide users with comprehensive information regarding miRNAs, genes and pathways, miRnalyze also links to expression data of miRNAs (miRmine) and genes (TiGER) and proteome abundance (PaxDb) data. To validate the capability of the tool, we have documented the correlation of miRnalyze's prediction with experimental confirmation studies. The tool is available at http://www.mirnalyze.in.
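
The idea of finding miRNAs that target more than one gene in the same pathway can be sketched as a simple grouping step. This is an illustration only: the input format (miRNA, gene pairs), the function name, and the placeholder identifiers are assumptions and do not reflect how miRnalyze is implemented.

```python
from collections import defaultdict

def shared_mirnas(targets, pathway_genes, min_genes=2):
    """Return miRNAs whose predicted targets include at least min_genes pathway members."""
    hits = defaultdict(set)
    for mirna, gene in targets:
        if gene in pathway_genes:
            hits[mirna].add(gene)
    return {m: sorted(g) for m, g in hits.items() if len(g) >= min_genes}

# Hypothetical predicted miRNA-target pairs and a hypothetical pathway gene set.
targets = [
    ("miR-a", "GENE1"),
    ("miR-a", "GENE2"),
    ("miR-b", "GENE2"),
    ("miR-b", "GENE9"),
    ("miR-c", "GENE3"),
]
pathway = {"GENE1", "GENE2", "GENE3"}
common = shared_mirnas(targets, pathway)
```

Here only the placeholder miR-a is retained, because it is the only miRNA with two predicted targets inside the pathway gene set.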

  8. Interactive web service system for exploration of biological pathways.

    PubMed

    Yin, Zong-Xian; Li, Sin-Yan

    2014-09-01

    Existing bioinformatics databases such as KEGG (Kyoto Encyclopedia of Genes and Genomes) provide a wealth of information. However, they generally lack a user-friendly and interactive interface. The study proposes a web service system for exploring the contents of the KEGG database in an intuitive and interactive manner. In the proposed system, the requested pathways are uploaded from the KEGG database and are converted from a static format into an interactive format such that their contents can be more readily explored. The system supports two basic functions, namely an exhaustive search for all possible reaction paths between two specified genes in a biological pathway, and the identification of similar reaction sequences in different biological pathways. The feasibility of the proposed system is evaluated by means of an initial pilot study involving 10 students with varying degrees of experience of the KEGG website and its operations. The results indicate that the system provides a useful learning tool for investigating biological pathways. A system is proposed for converting the static pathway maps in KEGG into interactive maps such that they can be explored at will. The results of a preliminary trial confirm that the system is straightforward to use and provides a versatile and effective tool for examining and comparing biological pathways. Copyright © 2014 Elsevier B.V. All rights reserved.
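
The first function mentioned in this record, an exhaustive search for all reaction paths between two genes, can be sketched as a depth-first enumeration of cycle-free paths. The abstract does not describe the authors' algorithm; the adjacency-list representation, the function name, and the toy pathway below are assumptions made for illustration.

```python
def all_simple_paths(graph, source, target):
    """Yield every cycle-free path from source to target via depth-first search."""
    stack = [(source, [source])]
    while stack:
        node, path = stack.pop()
        if node == target:
            yield path
            continue
        for nxt in graph.get(node, ()):
            if nxt not in path:  # skip nodes already on this path to avoid cycles
                stack.append((nxt, path + [nxt]))

# Hypothetical pathway as a directed adjacency list (gene -> downstream genes).
toy_pathway = {
    "geneA": ["geneB", "geneC"],
    "geneB": ["geneD"],
    "geneC": ["geneB", "geneD"],
    "geneD": [],
}
paths = sorted(all_simple_paths(toy_pathway, "geneA", "geneD"))
```

The number of simple paths can grow exponentially with pathway size, so a practical tool would likely bound the path length or the number of results.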

  9. DEOP: a database on osmoprotectants and associated pathways

    PubMed Central

    Bougouffa, Salim; Radovanovic, Aleksandar; Essack, Magbubah; Bajic, Vladimir B.

    2014-01-01

    Microorganisms are known to counteract salt stress through salt influx or by the accumulation of osmoprotectants (also called compatible solutes). Understanding the pathways that synthesize and/or break down these osmoprotectants is of interest to studies of crop halotolerance and to biotechnology applications that use microbes as cell factories for production of biomass or commercial chemicals. To facilitate the exploration of osmoprotectants, we have developed the first online resource, ‘Dragon Explorer of Osmoprotection associated Pathways’ (DEOP), that gathers and presents curated information about osmoprotectants, complemented by information about reactions and pathways that use or affect them. A combined total of 141 compounds were confirmed as osmoprotectants, which were matched to 1883 reactions and 834 pathways. DEOP can also be used to map genes or microbial genomes to potential osmoprotection-associated pathways, and thus link genes and genomes to other associated osmoprotection information. Moreover, DEOP provides a text-mining utility to search deeper into the scientific literature for supporting evidence or for new associations of osmoprotectants to pathways, reactions, enzymes, genes or organisms. Two case studies are provided to demonstrate the usefulness of DEOP. The system can be accessed at http://www.cbrc.kaust.edu.sa/deop/. PMID:25326239

  10. Database for exchangeable gene trap clones: pathway and gene ontology analysis of exchangeable gene trap clone mouse lines.

    PubMed

    Araki, Masatake; Nakahara, Mai; Muta, Mayumi; Itou, Miharu; Yanai, Chika; Yamazoe, Fumika; Miyake, Mikiko; Morita, Ayaka; Araki, Miyuki; Okamoto, Yoshiyuki; Nakagata, Naomi; Yoshinobu, Kumiko; Yamamura, Ken-ichi; Araki, Kimi

    2014-02-01

    Gene trapping in embryonic stem (ES) cells is a proven method for large-scale random insertional mutagenesis in the mouse genome. We have established an exchangeable gene trap system, in which a reporter gene can be exchanged for any other DNA of interest through Cre/mutant lox-mediated recombination. We isolated trap clones, analyzed trapped genes, and constructed the database for Exchangeable Gene Trap Clones (EGTC) [http://egtc.jp]. The number of registered ES cell lines was 1162 on 31 August 2013. We also established 454 mouse lines from trap ES clones and deposited them in the mouse embryo bank at the Center for Animal Resources and Development, Kumamoto University, Japan. The EGTC database is the most extensive academic resource for gene-trap mouse lines. Because we used a promoter-trap strategy, all trapped genes were expressed in ES cells. To understand the general characteristics of the trapped genes in the EGTC library, we used Kyoto Encyclopedia of Genes and Genomes (KEGG) for pathway analysis and found that the EGTC ES clones covered a broad range of pathways. We also used Gene Ontology (GO) classification data provided by Mouse Genome Informatics (MGI) to compare the functional distribution of genes in each GO term between trapped genes in the EGTC mouse lines and total genes annotated in MGI. We found the functional distributions for the trapped genes in the EGTC mouse lines and for the RefSeq genes for the whole mouse genome were similar, indicating that the EGTC mouse lines had trapped a wide range of mouse genes. © 2014 The Authors Development, Growth & Differentiation © 2014 Japanese Society of Developmental Biologists.

  11. KEGG orthology-based annotation of the predicted proteome of Acropora digitifera: ZoophyteBase - an open access and searchable database of a coral genome

    PubMed Central

    2013-01-01

    Background Contemporary coral reef research has firmly established that a genomic approach is urgently needed to better understand the effects of anthropogenic environmental stress and global climate change on coral holobiont interactions. Here we present KEGG orthology-based annotation of the complete genome sequence of the scleractinian coral Acropora digitifera and provide the first comprehensive view of the genome of a reef-building coral by applying advanced bioinformatics. Description Sequences from the KEGG database of protein function were used to construct hidden Markov models. These models were used to search the predicted proteome of A. digitifera to establish complete genomic annotation. The annotated dataset is published in ZoophyteBase, an open access format with different options for searching the data. A particularly useful feature is the ability to use a Google-like search engine that links query words to protein attributes. We present features of the annotation that underpin the molecular structure of key processes of coral physiology that include (1) regulatory proteins of symbiosis, (2) planula and early developmental proteins, (3) neural messengers, receptors and sensory proteins, (4) calcification and Ca2+-signalling proteins, (5) plant-derived proteins, (6) proteins of nitrogen metabolism, (7) DNA repair proteins, (8) stress response proteins, (9) antioxidant and redox-protective proteins, (10) proteins of cellular apoptosis, (11) microbial symbioses and pathogenicity proteins, (12) proteins of viral pathogenicity, (13) toxins and venom, (14) proteins of the chemical defensome and (15) coral epigenetics. Conclusions We advocate that providing annotation in an open-access searchable database available to the public domain will give an unprecedented foundation to interrogate the fundamental molecular structure and interactions of coral symbiosis and allow critical questions to be addressed at the genomic level based on combined aspects of

  12. Chemogenomic analysis of neuronal differentiation with pathway changes in PC12 cells.

    PubMed

    Lin, Jack Yu-Shih; Wu, Chien Liang; Liao, Chia Nan; Higuchi, Akon; Ling, Qing-Dong

    2016-01-01

    The Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database creates networks from interrelations between molecular biology and underlying chemical elements. This allows for analysis of biologic networks, genomic information, and higher-order functional information at a system level. Through high-throughput experiments and systems biology analysis, we investigated the genes and pathways associated with nerve growth factor (NGF)-induced neuronal differentiation. We performed microarray experiments and used the KEGG database, systems biology analysis, and annotation of pathway functions to study NGF-induced differentiation in PC12 cells. We identified 2020 NGF-induced genes with altered expression over time. Cross-matching with the KEGG database revealed 830 genes, among which 395 altered genes were found to have a 2-fold increase in gene expression over a two-hour period. We then identified 191 associated biologic pathways in the KEGG database; the top 15 pathways showed correlation with neural differentiation. These included the neurotrophin pathways, mitogen-activated protein kinase (MAPK) pathways, genes associated with axonal guidance and the Wnt pathways. The activation of these pathways synchronized with NGF-induced differentiation in PC12 cells. In summary, we have established a model system that allows one to systematically characterize the functional pathway changes in a neuronal population after an external stimulus.

  13. Genic and Intergenic SSR Database Generation, SNPs Determination and Pathway Annotations, in Date Palm (Phoenix dactylifera L.)

    PubMed Central

    2016-01-01

    The present investigation used bioinformatics tools to identify and characterize simple sequence repeats (SSRs) within the third version of the date palm genome and to develop a new SSR primer database. In addition, single nucleotide polymorphisms (SNPs) located within the SSR flanking regions were recognized. Moreover, the pathways for the sequences assigned by SSR primers, the biological functions and gene interactions were determined. A total of 172,075 SSR motifs were identified in the date palm genome sequence, with a frequency of 450.97 SSRs per Mb. Of these, 130,014 SSRs (75.6%) were located within the intergenic regions with a frequency of 499 SSRs per Mb, while only 42,061 SSRs (24.4%) were located within the genic regions with a frequency of 347.5 SSRs per Mb. A total of 111,403 SSR primer pairs were designed, representing 291.9 SSR primers per Mb. Of the 111,403, only 31,380 SSR primers were in the genic regions, while 80,023 primers were in the intergenic regions. A total of 250,507 SNPs were recognized in 84,172 SSR flanking regions, which represents 75.55% of the total SSR flanking regions. Out of 12,274 genes, only 463 genes comprising 896 SSR primers were mapped onto 111 pathways using the KEGG database. The most abundant enzymes were identified in the pathway related to the biosynthesis of antibiotics. We tested 1031 SSR primers using both publicly available date palm genome sequences as templates in the in silico PCR reactions. For in vitro validation, 31 SSR primers among those used in the in silico PCR were synthesized and tested for their ability to detect polymorphism among six Egyptian date palm cultivars. All tested primers successfully amplified products, but only 18 primers detected polymorphic amplicons among the studied date palm cultivars. PMID:27434138
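
The counts and frequencies reported in this record can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers given in the abstract; the helper names are hypothetical, and the implied sequence lengths are derived values, not figures stated by the authors.

```python
# Counts taken from the abstract above.
total_ssrs = 172_075
intergenic_ssrs = 130_014
genic_ssrs = 42_061

genic_primers = 31_380
intergenic_primers = 80_023
total_primers = 111_403

def share(part, whole):
    """Percentage of whole represented by part, rounded to one decimal place."""
    return round(100 * part / whole, 1)

def implied_mb(count, per_mb):
    """Sequence length in Mb implied by a count and a reported per-Mb frequency."""
    return count / per_mb

# Length implied by the genome-wide SSR frequency (450.97 SSRs per Mb).
genome_mb = implied_mb(total_ssrs, 450.97)
# Length implied by the genic (347.5 per Mb) and intergenic (499 per Mb) frequencies.
regions_mb = implied_mb(genic_ssrs, 347.5) + implied_mb(intergenic_ssrs, 499)
```

Both sets of counts add up to their stated totals, and the genic plus intergenic lengths implied by the per-region frequencies land within 1 Mb of the roughly 382 Mb implied by the genome-wide frequency.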

  14. Library of Apicomplexan Metabolic Pathways: a manually curated database for metabolic pathways of apicomplexan parasites

    PubMed Central

    Shanmugasundram, Achchuthan; Gonzalez-Galarza, Faviel F.; Wastling, Jonathan M.; Vasieva, Olga; Jones, Andrew R.

    2013-01-01

    The Library of Apicomplexan Metabolic Pathways (LAMP, http://www.llamp.net) is a web database that provides near complete mapping from genes to the central metabolic functions for some of the prominent intracellular parasites of the phylum Apicomplexa. This phylum includes the causative agents of malaria, toxoplasmosis and theileriosis—diseases with a huge economic and social impact. A number of apicomplexan genomes have been sequenced, but the accurate annotation of gene function remains challenging. We have adopted an approach called metabolic reconstruction, in which genes are systematically assigned to functions within pathways/networks for Toxoplasma gondii, Neospora caninum, Cryptosporidium and Theileria species, and Babesia bovis. Several functions missing from pathways have been identified, where the corresponding gene for an essential process appears to be absent from the current genome annotation. For each species, LAMP contains interactive diagrams of each pathway, hyperlinked to external resources and annotated with detailed information, including the sources of evidence used. We have also developed a section to highlight the overall metabolic capabilities of each species, such as the ability to synthesize or the dependence on the host for a particular metabolite. We expect this new database will become a valuable resource for fundamental and applied research on the Apicomplexa. PMID:23193253

  15. The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of pathway/genome databases

    PubMed Central

    Caspi, Ron; Altman, Tomer; Dreher, Kate; Fulcher, Carol A.; Subhraveti, Pallavi; Keseler, Ingrid M.; Kothari, Anamika; Krummenacker, Markus; Latendresse, Mario; Mueller, Lukas A.; Ong, Quang; Paley, Suzanne; Pujar, Anuradha; Shearer, Alexander G.; Travers, Michael; Weerasinghe, Deepika; Zhang, Peifen; Karp, Peter D.

    2012-01-01

    The MetaCyc database (http://metacyc.org/) provides a comprehensive and freely accessible resource for metabolic pathways and enzymes from all domains of life. The pathways in MetaCyc are experimentally determined, small-molecule metabolic pathways and are curated from the primary scientific literature. MetaCyc contains more than 1800 pathways derived from more than 30 000 publications, and is the largest curated collection of metabolic pathways currently available. Most reactions in MetaCyc pathways are linked to one or more well-characterized enzymes, and both pathways and enzymes are annotated with reviews, evidence codes and literature citations. BioCyc (http://biocyc.org/) is a collection of more than 1700 organism-specific Pathway/Genome Databases (PGDBs). Each BioCyc PGDB contains the full genome and predicted metabolic network of one organism. The network, which is predicted by the Pathway Tools software using MetaCyc as a reference database, consists of metabolites, enzymes, reactions and metabolic pathways. BioCyc PGDBs contain additional features, including predicted operons, transport systems and pathway-hole fillers. The BioCyc website and Pathway Tools software offer many tools for querying and analysis of PGDBs, including Omics Viewers and comparative analysis. New developments include a zoomable web interface for diagrams; flux-balance analysis model generation from PGDBs; web services; and a new tool called Web Groups. PMID:22102576

  16. Automated workflow-based exploitation of pathway databases provides new insights into genetic associations of metabolite profiles

    PubMed Central

    2013-01-01

    Background Genome-wide association studies (GWAS) have identified many common single nucleotide polymorphisms (SNPs) that associate with clinical phenotypes, but these SNPs usually explain just a small part of the heritability and have relatively modest effect sizes. In contrast, SNPs that associate with metabolite levels generally explain a higher percentage of the genetic variation and demonstrate larger effect sizes. Still, the discovery of SNPs associated with metabolite levels is challenging since testing all metabolites measured in typical metabolomics studies with all SNPs comes with a severe multiple testing penalty. We have developed an automated workflow approach that utilizes prior knowledge of biochemical pathways present in databases like KEGG and BioCyc to generate a smaller SNP set relevant to the metabolite. This paper explores the opportunities and challenges in the analysis of GWAS of metabolomic phenotypes and provides novel insights into the genetic basis of metabolic variation through the re-analysis of published GWAS datasets. Results Re-analysis of the published GWAS dataset from Illig et al. (Nature Genetics, 2010) using a pathway-based workflow (http://www.myexperiment.org/packs/319.html), confirmed previously identified hits and identified a new locus of human metabolic individuality, associating Aldehyde dehydrogenase family1 L1 (ALDH1L1) with serine/glycine ratios in blood. Replication in an independent GWAS dataset of phospholipids (Demirkan et al., PLoS Genetics, 2012) identified two novel loci supported by additional literature evidence: GPAM (Glycerol-3 phosphate acyltransferase) and CBS (Cystathionine beta-synthase). In addition, the workflow approach provided novel insight into the affected pathways and relevance of some of these gene-metabolite pairs in disease development and progression. Conclusions We demonstrate the utility of automated exploitation of background knowledge present in pathway databases for the analysis of GWAS

  17. The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of pathway/genome databases.

    PubMed

    Caspi, Ron; Billington, Richard; Ferrer, Luciana; Foerster, Hartmut; Fulcher, Carol A; Keseler, Ingrid M; Kothari, Anamika; Krummenacker, Markus; Latendresse, Mario; Mueller, Lukas A; Ong, Quang; Paley, Suzanne; Subhraveti, Pallavi; Weaver, Daniel S; Karp, Peter D

    2016-01-04

    The MetaCyc database (MetaCyc.org) is a freely accessible comprehensive database describing metabolic pathways and enzymes from all domains of life. The majority of MetaCyc pathways are small-molecule metabolic pathways that have been experimentally determined. MetaCyc contains more than 2400 pathways derived from >46,000 publications, and is the largest curated collection of metabolic pathways. BioCyc (BioCyc.org) is a collection of 5700 organism-specific Pathway/Genome Databases (PGDBs), each containing the full genome and predicted metabolic network of one organism, including metabolites, enzymes, reactions, metabolic pathways, predicted operons, transport systems, and pathway-hole fillers. The BioCyc website offers a variety of tools for querying and analyzing PGDBs, including Omics Viewers and tools for comparative analysis. This article provides an update of new developments in MetaCyc and BioCyc during the last two years, including addition of Gibbs free energy values for compounds and reactions; redesign of the primary gene/protein page; addition of a tool for creating diagrams containing multiple linked pathways; several new search capabilities, including searching for genes based on sequence patterns, searching for databases based on an organism's phenotypes, and a cross-organism search; and a metabolite identifier translation service.

  18. The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of pathway/genome databases

    PubMed Central

    Caspi, Ron; Billington, Richard; Ferrer, Luciana; Foerster, Hartmut; Fulcher, Carol A.; Keseler, Ingrid M.; Kothari, Anamika; Krummenacker, Markus; Latendresse, Mario; Mueller, Lukas A.; Ong, Quang; Paley, Suzanne; Subhraveti, Pallavi; Weaver, Daniel S.; Karp, Peter D.

    2016-01-01

    The MetaCyc database (MetaCyc.org) is a freely accessible comprehensive database describing metabolic pathways and enzymes from all domains of life. The majority of MetaCyc pathways are small-molecule metabolic pathways that have been experimentally determined. MetaCyc contains more than 2400 pathways derived from >46 000 publications, and is the largest curated collection of metabolic pathways. BioCyc (BioCyc.org) is a collection of 5700 organism-specific Pathway/Genome Databases (PGDBs), each containing the full genome and predicted metabolic network of one organism, including metabolites, enzymes, reactions, metabolic pathways, predicted operons, transport systems, and pathway-hole fillers. The BioCyc website offers a variety of tools for querying and analyzing PGDBs, including Omics Viewers and tools for comparative analysis. This article provides an update of new developments in MetaCyc and BioCyc during the last two years, including addition of Gibbs free energy values for compounds and reactions; redesign of the primary gene/protein page; addition of a tool for creating diagrams containing multiple linked pathways; several new search capabilities, including searching for genes based on sequence patterns, searching for databases based on an organism's phenotypes, and a cross-organism search; and a metabolite identifier translation service. PMID:26527732

  19. Database Constraints Applied to Metabolic Pathway Reconstruction Tools

    PubMed Central

    Vilaplana, Jordi; Solsona, Francesc; Teixido, Ivan; Usié, Anabel; Karathia, Hiren; Alves, Rui; Mateo, Jordi

    2014-01-01

    Our group developed two biological applications, Biblio-MetReS and Homol-MetReS, accessing the same database of organisms with annotated genes. Biblio-MetReS is a data-mining application that facilitates the reconstruction of molecular networks based on automated text-mining analysis of published scientific literature. Homol-MetReS allows functional (re)annotation of proteomes, to properly identify both the individual proteins involved in the process(es) of interest and their function. It also enables the sets of proteins involved in the process(es) in different organisms to be compared directly. The efficiency of these biological applications is directly related to the design of the shared database. We classified and analyzed the different kinds of access to the database. Based on this study, we tried to adjust and tune the configurable parameters of the database server to reach the best performance of the communication data link to/from the database system. Different database technologies were analyzed. We started the study with a public relational SQL database, MySQL. Then, the same database was implemented by a MapReduce-based database named HBase. The results indicated that the standard configuration of MySQL gives an acceptable performance for low or medium size databases. Nevertheless, tuning database parameters can greatly improve the performance and lead to very competitive runtimes. PMID:25202745

  20. The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of Pathway/Genome Databases.

    PubMed

    Caspi, Ron; Altman, Tomer; Billington, Richard; Dreher, Kate; Foerster, Hartmut; Fulcher, Carol A; Holland, Timothy A; Keseler, Ingrid M; Kothari, Anamika; Kubo, Aya; Krummenacker, Markus; Latendresse, Mario; Mueller, Lukas A; Ong, Quang; Paley, Suzanne; Subhraveti, Pallavi; Weaver, Daniel S; Weerasinghe, Deepika; Zhang, Peifen; Karp, Peter D

    2014-01-01

    The MetaCyc database (MetaCyc.org) is a comprehensive and freely accessible database describing metabolic pathways and enzymes from all domains of life. MetaCyc pathways are experimentally determined, mostly small-molecule metabolic pathways and are curated from the primary scientific literature. MetaCyc contains >2100 pathways derived from >37,000 publications, and is the largest curated collection of metabolic pathways currently available. BioCyc (BioCyc.org) is a collection of >3000 organism-specific Pathway/Genome Databases (PGDBs), each containing the full genome and predicted metabolic network of one organism, including metabolites, enzymes, reactions, metabolic pathways, predicted operons, transport systems and pathway-hole fillers. Additions to BioCyc over the past 2 years include YeastCyc, a PGDB for Saccharomyces cerevisiae, and 891 new genomes from the Human Microbiome Project. The BioCyc Web site offers a variety of tools for querying and analysis of PGDBs, including Omics Viewers and tools for comparative analysis. New developments include atom mappings in reactions, a new representation of glycan degradation pathways, improved compound structure display, better coverage of enzyme kinetic data, enhancements of the Web Groups functionality, improvements to the Omics viewers, a new representation of the Enzyme Commission system and, for the desktop version of the software, the ability to save display states.

  1. The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of Pathway/Genome Databases

    PubMed Central

    Caspi, Ron; Altman, Tomer; Billington, Richard; Dreher, Kate; Foerster, Hartmut; Fulcher, Carol A.; Holland, Timothy A.; Keseler, Ingrid M.; Kothari, Anamika; Kubo, Aya; Krummenacker, Markus; Latendresse, Mario; Mueller, Lukas A.; Ong, Quang; Paley, Suzanne; Subhraveti, Pallavi; Weaver, Daniel S.; Weerasinghe, Deepika; Zhang, Peifen; Karp, Peter D.

    2014-01-01

    The MetaCyc database (MetaCyc.org) is a comprehensive and freely accessible database describing metabolic pathways and enzymes from all domains of life. MetaCyc pathways are experimentally determined, mostly small-molecule metabolic pathways and are curated from the primary scientific literature. MetaCyc contains >2100 pathways derived from >37 000 publications, and is the largest curated collection of metabolic pathways currently available. BioCyc (BioCyc.org) is a collection of >3000 organism-specific Pathway/Genome Databases (PGDBs), each containing the full genome and predicted metabolic network of one organism, including metabolites, enzymes, reactions, metabolic pathways, predicted operons, transport systems and pathway-hole fillers. Additions to BioCyc over the past 2 years include YeastCyc, a PGDB for Saccharomyces cerevisiae, and 891 new genomes from the Human Microbiome Project. The BioCyc Web site offers a variety of tools for querying and analysis of PGDBs, including Omics Viewers and tools for comparative analysis. New developments include atom mappings in reactions, a new representation of glycan degradation pathways, improved compound structure display, better coverage of enzyme kinetic data, enhancements of the Web Groups functionality, improvements to the Omics viewers, a new representation of the Enzyme Commission system and, for the desktop version of the software, the ability to save display states. PMID:24225315

  2. The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of pathway/genome databases

    PubMed Central

    Caspi, Ron; Altman, Tomer; Dale, Joseph M.; Dreher, Kate; Fulcher, Carol A.; Gilham, Fred; Kaipa, Pallavi; Karthikeyan, Athikkattuvalasu S.; Kothari, Anamika; Krummenacker, Markus; Latendresse, Mario; Mueller, Lukas A.; Paley, Suzanne; Popescu, Liviu; Pujar, Anuradha; Shearer, Alexander G.; Zhang, Peifen; Karp, Peter D.

    2010-01-01

    The MetaCyc database (MetaCyc.org) is a comprehensive and freely accessible resource for metabolic pathways and enzymes from all domains of life. The pathways in MetaCyc are experimentally determined, small-molecule metabolic pathways and are curated from the primary scientific literature. With more than 1400 pathways, MetaCyc is the largest collection of metabolic pathways currently available. Pathway reactions are linked to one or more well-characterized enzymes, and both pathways and enzymes are annotated with reviews, evidence codes, and literature citations. BioCyc (BioCyc.org) is a collection of more than 500 organism-specific Pathway/Genome Databases (PGDBs). Each BioCyc PGDB contains the full genome and predicted metabolic network of one organism. The network, which is predicted by the Pathway Tools software using MetaCyc as a reference, consists of metabolites, enzymes, reactions and metabolic pathways. BioCyc PGDBs also contain additional features, such as predicted operons, transport systems, and pathway hole-fillers. The BioCyc Web site offers several tools for the analysis of the PGDBs, including Omics Viewers that enable visualization of omics datasets on two different genome-scale diagrams and tools for comparative analysis. The BioCyc PGDBs generated by SRI are offered for adoption by any party interested in curation of metabolic, regulatory, and genome-related information about an organism. PMID:19850718

  3. SMPDB 2.0: big improvements to the Small Molecule Pathway Database.

    PubMed

    Jewison, Timothy; Su, Yilu; Disfany, Fatemeh Miri; Liang, Yongjie; Knox, Craig; Maciejewski, Adam; Poelzer, Jenna; Huynh, Jessica; Zhou, You; Arndt, David; Djoumbou, Yannick; Liu, Yifeng; Deng, Lu; Guo, An Chi; Han, Beomsoo; Pon, Allison; Wilson, Michael; Rafatnia, Shahrzad; Liu, Philip; Wishart, David S

    2014-01-01

    The Small Molecule Pathway Database (SMPDB, http://www.smpdb.ca) is a comprehensive, colorful, fully searchable and highly interactive database for visualizing human metabolic, drug action, drug metabolism, physiological activity and metabolic disease pathways. SMPDB contains >600 pathways with nearly 75% of its pathways not found in any other database. All SMPDB pathway diagrams are extensively hyperlinked and include detailed information on the relevant tissues, organs, organelles, subcellular compartments, protein cofactors, protein locations, metabolite locations, chemical structures and protein quaternary structures. Since its last release in 2010, SMPDB has undergone substantial upgrades and significant expansion. In particular, the total number of pathways in SMPDB has grown by >70%. Additionally, every previously entered pathway has been completely redrawn, standardized, corrected, updated and enhanced with additional molecular or cellular information. Many SMPDB pathways now include transporter proteins as well as much more physiological, tissue, target organ and reaction compartment data. Thanks to the development of a standardized pathway drawing tool (called PathWhiz) all SMPDB pathways are now much more easily drawn and far more rapidly updated. PathWhiz has also allowed all SMPDB pathways to be saved in a BioPAX format. Significant improvements to SMPDB's visualization interface now make the browsing, selection, recoloring and zooming of pathways far easier and far more intuitive. Because of its utility and breadth of coverage, SMPDB is now integrated into several other databases including HMDB and DrugBank.

  4. Data mining in the MetaCyc family of pathway databases.

    PubMed

    Karp, Peter D; Paley, Suzanne; Altman, Tomer

    2013-01-01

    Pathway databases collect the bioreactions and molecular interactions that define the processes of life. The MetaCyc family of pathway databases consists of thousands of databases that were derived through computational inference of metabolic pathways from the MetaCyc pathway/genome database (PGDB). In some cases, these DBs underwent subsequent manual curation. Curated pathway DBs are now available for most of the major model organisms. Databases in the MetaCyc family are managed using the Pathway Tools software. This chapter presents methods for performing data mining on the MetaCyc family of pathway DBs. We discuss the major data access mechanisms for the family, which include data files in multiple formats; application programming interfaces (APIs) for the Lisp, Java, and Perl languages; and web services. We present an overview of the Pathway Tools schema, an understanding of which is needed to query the DBs. The chapter also presents several interactive data mining tools within Pathway Tools for performing omics data analysis.

  5. Data Mining in the MetaCyc Family of Pathway Databases

    PubMed Central

    Karp, Peter D.; Paley, Suzanne; Altman, Tomer

    2013-01-01

    Pathway databases collect the bioreactions and molecular interactions that define the processes of life. The MetaCyc family of pathway databases consists of thousands of databases that were derived through computational inference of metabolic pathways from the MetaCyc Pathway/Genome Database (PGDB). In some cases these DBs underwent subsequent manual curation. Curated pathway DBs are now available for most of the major model organisms. Databases in the MetaCyc family are managed using the Pathway Tools software. This chapter presents methods for performing data mining on the MetaCyc family of pathway DBs. We discuss the major data access mechanisms for the family, which include data files in multiple formats; application programming interfaces (APIs) for the Lisp, Java, and Perl languages; and web services. We present an overview of the Pathway Tools schema, an understanding of which is needed to query the DBs. The chapter also presents several interactive data mining tools within Pathway Tools for performing omics data analysis. PMID:23192547

  6. Data Exchange Format for Biological Pathway Databases (BioPAX) Workshop - Final Technical Report

    SciTech Connect

    Chris Sander, PhD

    2004-07-28

    In June 2003, the Department of Energy (DOE) allocated funds in support of the development of A Data Exchange Format for Biological Pathway Databases (BioPAX). The primary objective of the BioPAX initiative (http://www.biopax.org) is the development of a single, consensus-based standard for a data exchange format for biological pathway databases that can be widely adopted in a timely manner as a strategy for the interchange of biological pathway data in the life science community. BioPAX Level 1, Version 1.0, released July 2004, supports metabolic pathway data and is initially supported by the BioCyc and WIT databases. This work was developed during community led workshops that were significantly funded by this grant. Subsequent releases of BioPAX will add support for protein-protein interactions, signal transduction pathways, genetic interactions, and other pathway data types.

  7. STON: exploring biological pathways using the SBGN standard and graph databases.

    PubMed

    Touré, Vasundra; Mazein, Alexander; Waltemath, Dagmar; Balaur, Irina; Saqi, Mansoor; Henkel, Ron; Pellet, Johann; Auffray, Charles

    2016-12-05

    When modeling in Systems Biology and Systems Medicine, the data is often extensive, complex and heterogeneous. Graphs are a natural way of representing biological networks. Graph databases enable efficient storage and processing of the encoded biological relationships. They furthermore support queries on the structure of biological networks. We present the Java-based framework STON (SBGN TO Neo4j). STON imports and translates metabolic, signalling and gene regulatory pathways represented in the Systems Biology Graphical Notation into a graph-oriented format compatible with the Neo4j graph database. STON exploits the power of graph databases to store and query complex biological pathways. This advances the possibility of: i) identifying subnetworks in a given pathway; ii) linking networks across different levels of granularity to address difficulties related to incomplete knowledge representation at single level; and iii) identifying common patterns between pathways in the database.

  8. VisANT 3.0: new modules for pathway visualization, editing, prediction and construction

    PubMed Central

    Hu, Zhenjun; Ng, David M.; Yamada, Takuji; Chen, Chunnuan; Kawashima, Shuichi; Mellor, Joe; Linghu, Bolan; Kanehisa, Minoru; Stuart, Joshua M.; DeLisi, Charles

    2007-01-01

    With the integration of the KEGG and Predictome databases as well as two search engines for coexpressed genes/proteins using data sets obtained from the Stanford Microarray Database (SMD) and Gene Expression Omnibus (GEO) database, VisANT 3.0 supports exploratory pathway analysis, which includes multi-scale visualization of multiple pathways, editing and annotating pathways using a KEGG compatible visual notation and visualization of expression data in the context of pathways. Expression levels are represented either by color intensity or by nodes with an embedded expression profile. Multiple experiments can be navigated or animated. Known KEGG pathways can be enriched by querying either coexpressed components of known pathway members or proteins with known physical interactions. Predicted pathways for genes/proteins with unknown functions can be inferred from coexpression or physical interaction data. Pathways produced in VisANT can be saved as computer-readable XML format (VisML), graphic images or high-resolution Scalable Vector Graphics (SVG). Pathways in the format of VisML can be securely shared within an interested group or published online using a simple Web link. VisANT is freely available at http://visant.bu.edu. PMID:17586824

  9. VisANT 3.0: new modules for pathway visualization, editing, prediction and construction.

    PubMed

    Hu, Zhenjun; Ng, David M; Yamada, Takuji; Chen, Chunnuan; Kawashima, Shuichi; Mellor, Joe; Linghu, Bolan; Kanehisa, Minoru; Stuart, Joshua M; DeLisi, Charles

    2007-07-01

    With the integration of the KEGG and Predictome databases as well as two search engines for coexpressed genes/proteins using data sets obtained from the Stanford Microarray Database (SMD) and Gene Expression Omnibus (GEO) database, VisANT 3.0 supports exploratory pathway analysis, which includes multi-scale visualization of multiple pathways, editing and annotating pathways using a KEGG compatible visual notation and visualization of expression data in the context of pathways. Expression levels are represented either by color intensity or by nodes with an embedded expression profile. Multiple experiments can be navigated or animated. Known KEGG pathways can be enriched by querying either coexpressed components of known pathway members or proteins with known physical interactions. Predicted pathways for genes/proteins with unknown functions can be inferred from coexpression or physical interaction data. Pathways produced in VisANT can be saved as computer-readable XML format (VisML), graphic images or high-resolution Scalable Vector Graphics (SVG). Pathways in the format of VisML can be securely shared within an interested group or published online using a simple Web link. VisANT is freely available at http://visant.bu.edu.

  10. KOBAS 2.0: a web server for annotation and identification of enriched pathways and diseases.

    PubMed

    Xie, Chen; Mao, Xizeng; Huang, Jiaju; Ding, Yang; Wu, Jianmin; Dong, Shan; Kong, Lei; Gao, Ge; Li, Chuan-Yun; Wei, Liping

    2011-07-01

    High-throughput experimental technologies often identify dozens to hundreds of genes related to, or changed in, a biological or pathological process. From these genes one wants to identify biological pathways that may be involved and diseases that may be implicated. Here, we report a web server, KOBAS 2.0, which annotates an input set of genes with putative pathways and disease relationships based on mapping to genes with known annotations. It allows for both ID mapping and cross-species sequence similarity mapping. It then performs statistical tests to identify statistically significantly enriched pathways and diseases. KOBAS 2.0 incorporates knowledge across 1327 species from 5 pathway databases (KEGG PATHWAY, PID, BioCyc, Reactome and Panther) and 5 human disease databases (OMIM, KEGG DISEASE, FunDO, GAD and NHGRI GWAS Catalog). KOBAS 2.0 can be accessed at http://kobas.cbi.pku.edu.cn.
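
    The enrichment step described above can be illustrated with a short calculation. The abstract says only that KOBAS 2.0 "performs statistical tests"; the sketch below assumes a one-sided hypergeometric test, a common choice for over-representation analysis, and is not presented as the server's actual implementation. The function name and the example counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """Probability of seeing at least k pathway genes in an input list.

    N: number of annotated background genes
    K: number of background genes that belong to the pathway
    n: number of genes in the input list
    k: number of input genes that belong to the pathway
    """
    upper = min(n, K)
    total = comb(N, n)
    # Sum the hypergeometric probabilities for k, k+1, ..., upper hits.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Invented example: 8 of 200 input genes fall in a 100-gene pathway,
# against a background of 20,000 annotated genes.
p_value = hypergeom_enrichment_p(k=8, n=200, K=100, N=20_000)
print(f"enrichment p-value: {p_value:.3g}")
```

    In practice the p-values for all tested pathways would also need a multiple-testing correction before any pathway is called enriched.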

  11. Data, information, knowledge and principle: back to metabolism in KEGG.

    PubMed

    Kanehisa, Minoru; Goto, Susumu; Sato, Yoko; Kawashima, Masayuki; Furumichi, Miho; Tanabe, Mao

    2014-01-01

    In the hierarchy of data, information and knowledge, computational methods play a major role in the initial processing of data to extract information, but they alone become less effective to compile knowledge from information. The Kyoto Encyclopedia of Genes and Genomes (KEGG) resource (http://www.kegg.jp/ or http://www.genome.jp/kegg/) has been developed as a reference knowledge base to assist this latter process. In particular, the KEGG pathway maps are widely used for biological interpretation of genome sequences and other high-throughput data. The link from genomes to pathways is made through the KEGG Orthology system, a collection of manually defined ortholog groups identified by K numbers. To better automate this interpretation process the KEGG modules defined by Boolean expressions of K numbers have been expanded and improved. Once genes in a genome are annotated with K numbers, the KEGG modules can be computationally evaluated revealing metabolic capacities and other phenotypic features. The reaction modules, which represent chemical units of reactions, have been used to analyze design principles of metabolic networks and also to improve the definition of K numbers and associated annotations. For translational bioinformatics, the KEGG MEDICUS resource has been developed by integrating drug labels (package inserts) used in society.
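
    The idea that a module defined as a Boolean expression of K numbers can be "computationally evaluated" against an annotated genome can be sketched as follows. The nested-tuple encoding used here is a hypothetical stand-in chosen for clarity, not KEGG's own module definition notation, and the K numbers are placeholders rather than a real module.

```python
def module_present(expr, annotated_k_numbers):
    """Evaluate a Boolean module expression against a genome's K numbers.

    expr is either a K-number string, or a tuple (op, [sub-expressions])
    where op is "and" (all parts required) or "or" (any alternative suffices).
    """
    if isinstance(expr, str):
        return expr in annotated_k_numbers
    op, parts = expr
    results = (module_present(part, annotated_k_numbers) for part in parts)
    if op == "and":
        return all(results)
    if op == "or":
        return any(results)
    raise ValueError(f"unknown operator: {op!r}")


# Placeholder module: step 1, then either of two alternative enzymes, then step 3.
toy_module = ("and", ["K00001", ("or", ["K00002", "K00003"]), "K00004"])

genome_a = {"K00001", "K00003", "K00004"}  # has an alternative for step 2
genome_b = {"K00001", "K00004"}            # step 2 is missing entirely

print(module_present(toy_module, genome_a))  # True
print(module_present(toy_module, genome_b))  # False
```

    A complete module suggests the genome encodes that metabolic capacity; a partial match would need a finer-grained score than this all-or-nothing check provides.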

  12. Kinase Pathway Database: An Integrated Protein-Kinase and NLP-Based Protein-Interaction Resource

    PubMed Central

    Koike, Asako; Kobayashi, Yoshiyuki; Takagi, Toshihisa

    2003-01-01

    Protein kinases play a crucial role in the regulation of cellular functions. Various kinds of information about these molecules are important for understanding signaling pathways and organism characteristics. We have developed the Kinase Pathway Database, an integrated database involving major completely sequenced eukaryotes. It contains the classification of protein kinases and their functional conservation, ortholog tables among species, protein–protein, protein–gene, and protein–compound interaction data, domain information, and structural information. It also provides an automatic pathway graphic image interface. The protein, gene, and compound interactions are automatically extracted from abstracts for all genes and proteins by natural-language processing (NLP). The method of automatic extraction uses phrase patterns and the GENA protein, gene, and compound name dictionary, which was developed by our group. With this database, pathways are easily compared among species using data with more than 47,000 protein interactions and protein kinase ortholog tables. The database is available for querying and browsing at http://kinasedb.ontology.ims.u-tokyo.ac.jp/. PMID:12799355

  13. Critical assessment of human metabolic pathway databases: a stepping stone for future integration

    PubMed Central

    2011-01-01

    Background Multiple pathway databases are available that describe the human metabolic network and have proven their usefulness in many applications, ranging from the analysis and interpretation of high-throughput data to their use as a reference repository. However, so far the various human metabolic networks described by these databases have not been systematically compared and contrasted, nor has the extent to which they differ been quantified. For a researcher using these databases for particular analyses of human metabolism, it is crucial to know the extent of the differences in content and their underlying causes. Moreover, the outcomes of such a comparison are important for ongoing integration efforts. Results We compared the genes, EC numbers and reactions of five frequently used human metabolic pathway databases. The overlap is surprisingly low, especially on reaction level, where the databases agree on 3% of the 6968 reactions they have combined. Even for the well-established tricarboxylic acid cycle the databases agree on only 5 out of the 30 reactions in total. We identified the main causes for the lack of overlap. Importantly, the databases are partly complementary. Other explanations include the number of steps a conversion is described in and the number of possible alternative substrates listed. Missing metabolite identifiers and ambiguous names for metabolites also affect the comparison. Conclusions Our results show that each of the five networks compared provides us with a valuable piece of the puzzle of the complete reconstruction of the human metabolic network. To enable integration of the networks, next to a need for standardizing the metabolite names and identifiers, the conceptual differences between the databases should be resolved. Considerable manual intervention is required to reach the ultimate goal of a unified and biologically accurate model for studying the systems biology of human metabolism. Our comparison provides a stepping stone
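
    The agreement figure quoted in this abstract (3% of the 6968 combined reactions, or roughly 200 reactions) is a ratio of an intersection to a union. The sketch below shows one way such a figure could be computed; it assumes reactions have already been mapped to shared identifiers, which the abstract identifies as a hard problem in itself, and the toy reaction sets are invented.

```python
def consensus_fraction(reaction_sets):
    """Fraction of all reactions (union) that every database contains (intersection)."""
    if not reaction_sets:
        return 0.0
    union = set().union(*reaction_sets)
    core = set.intersection(*reaction_sets)
    return len(core) / len(union) if union else 0.0


# Invented example with three small "databases" of reaction identifiers.
databases = [
    {"R1", "R2", "R3"},
    {"R2", "R3", "R4"},
    {"R3", "R5"},
]

print(consensus_fraction(databases))  # 0.2
```

    Only R3 appears in all three toy sets out of five distinct reactions, giving 0.2. The same calculation applied per pathway would give pathway-level figures such as the 5 of 30 reported for the tricarboxylic acid cycle.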

  14. A geographically-diverse collection of 418 human gut microbiome pathway genome databases

    PubMed Central

    Hahn, Aria S.; Altman, Tomer; Konwar, Kishori M.; Hanson, Niels W.; Kim, Dongjae; Relman, David A.; Dill, David L.; Hallam, Steven J.

    2017-01-01

    Advances in high-throughput sequencing are reshaping how we perceive microbial communities inhabiting the human body, with implications for therapeutic interventions. Several large-scale datasets derived from hundreds of human microbiome samples sourced from multiple studies are now publicly available. However, idiosyncratic data processing methods between studies introduce systematic differences that confound comparative analyses. To overcome these challenges, we developed GutCyc, a compendium of environmental pathway genome databases (ePGDBs) constructed from 418 assembled human microbiome datasets using MetaPathways, enabling reproducible functional metagenomic annotation. We also generated metabolic network reconstructions for each metagenome using the Pathway Tools software, empowering researchers and clinicians interested in visualizing and interpreting metabolic pathways encoded by the human gut microbiome. For the first time, GutCyc provides consistent annotations and metabolic pathway predictions, making possible comparative community analyses between health and disease states in inflammatory bowel disease, Crohn’s disease, and type 2 diabetes. GutCyc data products are searchable online, or may be downloaded and explored locally using MetaPathways and Pathway Tools. PMID:28398290

  15. Creation of a genome-wide metabolic pathway database for Populus trichocarpa using a new approach for reconstruction and curation of metabolic pathways for plants.

    PubMed

    Zhang, Peifen; Dreher, Kate; Karthikeyan, A; Chi, Anjo; Pujar, Anuradha; Caspi, Ron; Karp, Peter; Kirkup, Vanessa; Latendresse, Mario; Lee, Cynthia; Mueller, Lukas A; Muller, Robert; Rhee, Seung Yon

    2010-08-01

    Metabolic networks reconstructed from sequenced genomes or transcriptomes can help visualize and analyze large-scale experimental data, predict metabolic phenotypes, discover enzymes, engineer metabolic pathways, and study metabolic pathway evolution. We developed a general approach for reconstructing metabolic pathway complements of plant genomes. Two new reference databases were created and added to the core of the infrastructure: a comprehensive, all-plant reference pathway database, PlantCyc, and a reference enzyme sequence database, RESD, for annotating metabolic functions of protein sequences. PlantCyc (version 3.0) includes 714 metabolic pathways and 2,619 reactions from over 300 species. RESD (version 1.0) contains 14,187 literature-supported enzyme sequences from across all kingdoms. We used RESD, PlantCyc, and MetaCyc (an all-species reference metabolic pathway database), in conjunction with the pathway prediction software Pathway Tools, to reconstruct a metabolic pathway database, PoplarCyc, from the recently sequenced genome of Populus trichocarpa. PoplarCyc (version 1.0) contains 321 pathways with 1,807 assigned enzymes. Comparing PoplarCyc (version 1.0) with AraCyc (version 6.0, Arabidopsis [Arabidopsis thaliana]) showed comparable numbers of pathways distributed across all domains of metabolism in both databases, except for a higher number of AraCyc pathways in secondary metabolism and a 1.5-fold increase in carbohydrate metabolic enzymes in PoplarCyc. Here, we introduce these new resources and demonstrate the feasibility of using them to identify candidate enzymes for specific pathways and to analyze metabolite profiling data through concrete examples. These resources can be searched by text or BLAST, browsed, and downloaded from our project Web site (http://plantcyc.org).

  16. Modular Architecture of Metabolic Pathways Revealed by Conserved Sequences of Reactions

    PubMed Central

    2013-01-01

    The metabolic network is both a network of chemical reactions and a network of enzymes that catalyze reactions. Toward better understanding of this duality in the evolution of the metabolic network, we developed a method to extract conserved sequences of reactions called reaction modules from the analysis of chemical compound structure transformation patterns in all known metabolic pathways stored in the KEGG PATHWAY database. The extracted reaction modules are repeatedly used as if they are building blocks of the metabolic network and contain chemical logic of organic reactions. Furthermore, the reaction modules often correspond to traditional pathway modules defined as sets of enzymes in the KEGG MODULE database and sometimes to operon-like gene clusters in prokaryotic genomes. We identified well-conserved, possibly ancient, reaction modules involving 2-oxocarboxylic acids. The chain extension module that appears as the tricarboxylic acid (TCA) reaction sequence in the TCA cycle is now shown to be used in other pathways together with different types of modification modules. We also identified reaction modules and their connection patterns for aromatic ring cleavages in microbial biodegradation pathways, which are most characteristic in terms of both distinct reaction sequences and distinct gene clusters. The modular architecture of biodegradation modules will have a potential for predicting degradation pathways of xenobiotic compounds. The collection of these and many other reaction modules is made available as part of the KEGG database. PMID:23384306

  17. FragariaCyc: A Metabolic Pathway Database for Woodland Strawberry Fragaria vesca

    PubMed Central

    Naithani, Sushma; Partipilo, Christina M.; Raja, Rajani; Elser, Justin L.; Jaiswal, Pankaj

    2016-01-01

    FragariaCyc is a strawberry-specific cellular metabolic network based on the annotated genome sequence of Fragaria vesca L. ssp. vesca, accession Hawaii 4. It was built on the Pathway-Tools platform using MetaCyc as the reference. Experimental evidence from the published literature was used to support or edit existing entities and to add new pathways, enzymes, reactions, compounds, and small molecules to the database. To date, FragariaCyc comprises 66 super-pathways, 488 unique pathways, 2348 metabolic reactions, 3507 enzymes, and 2134 compounds. In addition to searching and browsing FragariaCyc, researchers can compare pathways across various plant metabolic networks and analyze their data using the Omics Viewer tool. We view FragariaCyc as a resource for the community of researchers working with strawberry and related fruit crops. It can help in understanding the regulation of the overall metabolism of the strawberry plant during development and in response to diseases and abiotic stresses. FragariaCyc is available online at http://pathways.cgrb.oregonstate.edu. PMID:26973684

  18. FragariaCyc: A Metabolic Pathway Database for Woodland Strawberry Fragaria vesca.

    PubMed

    Naithani, Sushma; Partipilo, Christina M; Raja, Rajani; Elser, Justin L; Jaiswal, Pankaj

    2016-01-01

    FragariaCyc is a strawberry-specific cellular metabolic network based on the annotated genome sequence of Fragaria vesca L. ssp. vesca, accession Hawaii 4. It was built on the Pathway-Tools platform using MetaCyc as the reference. Experimental evidence from the published literature was used to support or edit existing entities and to add new pathways, enzymes, reactions, compounds, and small molecules to the database. To date, FragariaCyc comprises 66 super-pathways, 488 unique pathways, 2348 metabolic reactions, 3507 enzymes, and 2134 compounds. In addition to searching and browsing FragariaCyc, researchers can compare pathways across various plant metabolic networks and analyze their data using the Omics Viewer tool. We view FragariaCyc as a resource for the community of researchers working with strawberry and related fruit crops. It can help in understanding the regulation of the overall metabolism of the strawberry plant during development and in response to diseases and abiotic stresses. FragariaCyc is available online at http://pathways.cgrb.oregonstate.edu.

  19. A New Methodology to Associate SNPs with Human Diseases According to Their Pathway Related Context

    PubMed Central

    Bakir-Gungor, Burcu; Sezerman, Osman Ugur

    2011-01-01

    Genome-wide association studies (GWAS) with hundreds of thousands of single nucleotide polymorphisms (SNPs) are popular strategies to reveal the genetic basis of human complex diseases. Despite many successes of GWAS, it is well recognized that new analytical approaches have to be integrated to achieve their full potential. Starting with a list of SNPs found to be associated with disease in GWAS, here we propose a novel methodology to devise functionally important KEGG pathways through the identification of genes within these pathways, where these genes are obtained from SNP analysis. Our methodology is based on functionalization of important SNPs to identify affected genes and disease-related pathways. We have tested our methodology on the WTCCC Rheumatoid Arthritis (RA) dataset and identified: i) previously known RA-related KEGG pathways (e.g., Toll-like receptor signaling, Jak-STAT signaling, Antigen processing, Leukocyte transendothelial migration and MAPK signaling pathways); ii) additional KEGG pathways (e.g., Pathways in cancer, Neurotrophin signaling, Chemokine signaling pathways) as associated with RA. Furthermore, these newly found pathways included genes which are targets of RA-specific drugs. Even though GWAS analysis identifies 14 out of 83 of those drug target genes, the newly found functionally important KEGG pathways led to the discovery of 25 out of 83 genes known to be used as drug targets for the treatment of RA. Among the previously known pathways, we identified additional genes associated with RA (e.g. Antigen processing and presentation, Tight junction). Importantly, within these pathways, the associations between some of these additionally found genes, such as HLA-C, HLA-G, PRKCQ, PRKCZ, TAP1, TAP2 and RA were verified by either OMIM database or by literature retrieved from the NCBI PubMed module. With the whole-genome sequencing on the horizon, we show that the full potential of GWAS can be achieved by integrating pathway and network

  20. Automated system for gene annotation and metabolic pathway reconstruction using general sequence databases.

    PubMed

    Alves, João M P; Buck, Gregory A

    2007-11-01

    Despite the growing number of genomes published or currently being sequenced, there is a relative paucity of software for functional classification of newly discovered genes and their assignment to metabolic pathways. Available software for such analyses has a very steep learning curve and requires the installation, configuration, and maintenance of large amounts of complex infrastructure, including complementary software and databases. Many such tools are restricted to one or a few data sources and classification schemes. In this work, we report an automated system for gene annotation and metabolic pathway reconstruction (ASGARD), which was designed to be powerful and generalizable, yet simple for the biologist to install and run on centralized, commonly available computers. It avoids the requirement for complex resources such as relational databases and web servers, as well as the need for administrator access to the operating system. Our methodology contributes to a more rapid investigation of the potential biochemical capabilities of genes and genomes by the biological researcher, and is useful in biochemical as well as comparative and evolutionary studies of pathways and networks.

  1. Exploring consumer exposure pathways and patterns of use for chemicals in the environment through the Chemical/Product Categories Database

    EPA Pesticide Factsheets

    Exploring consumer exposure pathways and patterns of use for chemicals in the environment through the Chemical/Product Categories Database (CPCat). Presented by: Kathie Dionisio, Sc.D., NERL, US EPA, Research Triangle Park, NC (1/23/2014).

  2. Cyclone: java-based querying and computing with Pathway/Genome databases.

    PubMed

    Le Fèvre, François; Smidtas, Serge; Schächter, Vincent

    2007-05-15

    Cyclone aims at facilitating the use of BioCyc, a collection of Pathway/Genome Databases (PGDBs). Cyclone provides a fully extensible Java Object API to analyze and visualize these data. Cyclone can read and write PGDBs, and can write its own data in the CycloneML format. This format is automatically generated from the BioCyc ontology by Cyclone itself, ensuring continued compatibility. Cyclone objects can also be stored in a relational database CycloneDB. Queries can be written in SQL, and in an intuitive and concise object-oriented query language, Hibernate Query Language (HQL). In addition, Cyclone interfaces easily with Java software including the Eclipse IDE for HQL edition, the Jung API for graph algorithms or Cytoscape for graph visualization. Cyclone is freely available under an open source license at: http://sourceforge.net/projects/nemo-cyclone. For download and installation instructions, tutorials, use cases and examples, see http://nemo-cyclone.sourceforge.net.

  3. Potential biomarkers and latent pathways for vasculitis based on latent pathway identification analysis.

    PubMed

    Zhou, Tao; Zhang, Yudong; Wu, Peng; Sun, Qiang; Guo, Yanan; Yang, Yanfei

    2014-07-01

    We aimed in this study to identify the significant latent pathways and precise molecular mechanisms underlying the syndrome of vasculitis. Agilent dual-channel data of peripheral blood mononuclear cells (PBMCs) from healthy controls and vasculitis patients were downloaded from EBI Array Express database. Differentially expressed genes (DEGs) between normal and vasculitis PBMCs samples were selected. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses were carried out to identify significant biological processes and pathways. DEGs were matched to NetBox software database to obtain LINKER genes with statistical significance. Protein-protein interaction (PPI) network was constructed with LINKER genes and DEGs according to STRING database. Latent pathway identification analysis (LPIA) was used to identify the most significant interactions among different pathways involved by DEGs. A total of 266 DEGs were selected. GO and KEGG pathway analysis showed that the up-regulated genes were significantly enriched in defense and wounding response; the down-regulated genes were enriched in immune response. The modules analysis of PPI network suggested that ISG15 and IFIT3 were the potential biomarkers for vasculitis. The results of LPIA showed that NOD-like receptor signaling pathway and shigellosis related pathway were the two most significant latent pathway interactions for vasculitis. ISG15 and IFIT3 were the potential biomarkers for vasculitis identification. NOD-like receptor signaling pathway and shigellosis related pathway were the most significant latent pathway interactions for vasculitis. Moreover, LPIA was a useful method for revealing systemic biological pathways and cellular mechanisms of diseases. © 2014 Asia Pacific League of Associations for Rheumatology and Wiley Publishing Asia Pty Ltd.
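
    The abstract above does not say which statistic was used for the GO and KEGG enrichment step. As a minimal sketch, assuming the commonly used one-sided hypergeometric (over-representation) test, the calculation for a single pathway could look like the following. Only the figure of 266 DEGs comes from the abstract; the background size, pathway size and overlap are hypothetical.

```python
from math import comb


def enrichment_p_value(background, pathway_size, selected, overlap):
    """One-sided hypergeometric p-value: P(X >= overlap).

    X is the number of pathway genes among `selected` genes drawn
    without replacement from `background` genes, of which
    `pathway_size` belong to the pathway.
    """
    upper = min(pathway_size, selected)
    favourable = sum(
        comb(pathway_size, k) * comb(background - pathway_size, selected - k)
        for k in range(overlap, upper + 1)
    )
    return favourable / comb(background, selected)


# 266 DEGs as reported in the abstract; the other three numbers are
# hypothetical placeholders chosen only to exercise the function.
p_value = enrichment_p_value(background=20000, pathway_size=60,
                             selected=266, overlap=8)
print(f"over-representation p-value: {p_value:.3g}")
```

    In practice the p-values from all tested pathways would also be corrected for multiple testing, which this sketch leaves out.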

  4. A Novel Method for Pathway Identification Based on Attractor and Crosstalk in Polyarticular Juvenile Idiopathic Arthritis

    PubMed Central

    Wang, Yuanji; Lin, Shunhua; Li, Changhui; Li, Yizhao; Chen, Lei; Wang, Yingzhen

    2016-01-01

    Background: Juvenile idiopathic arthritis (JIA) is one of the most common inflammatory disorders of unknown etiology. We introduced a novel method to identify dysregulated pathways associated with polyarticular JIA (pJIA). Material/Methods: Gene expression profiles of 61 children with pJIA and 59 healthy controls were collected from E-GEOD-13849; 300 pathways were obtained from the Kyoto Encyclopedia of Genes and Genomes (KEGG) database and 787,896 protein-protein interaction sets were gathered from the Retrieval of Interacting Genes. Attractor and crosstalk were designed to complement each other to increase the integrity of pathway assessment. Then, impact factor was used to assess the inter-pathway interactions, and RP-value was used to evaluate the comprehensive influential ability of attractors. Results: There were seven attractors with p<0.01 and 14 pathways with RP<0.01. Finally, two significantly dysfunctional pathways were found, which were related to pJIA progression: p53 signaling pathway (KEGG ID: 04115) and non-alcoholic fatty liver disease (NAFLD) (KEGG ID: 04932). Conclusions: A novel approach that identified the dysregulated pathways in pJIA was constructed based on attractor and crosstalk. The new process is expected to be efficient in the upcoming era of medicine. PMID:27804927

  5. Comprehensive analysis of the N-glycan biosynthetic pathway using bioinformatics to generate UniCorn: A theoretical N-glycan structure database.

    PubMed

    Akune, Yukie; Lin, Chi-Hung; Abrahams, Jodie L; Zhang, Jingyu; Packer, Nicolle H; Aoki-Kinoshita, Kiyoko F; Campbell, Matthew P

    2016-08-05

    Glycan structures attached to proteins are composed of diverse monosaccharide sequences and linkages that are produced from precursor nucleotide-sugars by a series of glycosyltransferases. Databases of these structures are an essential resource for the interpretation of analytical data and the development of bioinformatics tools. However, with no template to predict what structures are possible, the human glycan structure databases are incomplete and rely heavily on the curation of published, experimentally determined glycan structure data. In this work, a library of 45 human glycosyltransferases was used to generate a theoretical database of N-glycan structures composed of 15 or fewer monosaccharide residues. Enzyme specificities were sourced from major online databases including Kyoto Encyclopedia of Genes and Genomes (KEGG) Glycan, Consortium for Functional Glycomics (CFG), Carbohydrate-Active enZymes (CAZy), GlycoGene DataBase (GGDB) and BRENDA. Based on the known activities, more than 1.1 million theoretical structures and 4.7 million synthetic reactions were generated and stored in our database called UniCorn. Furthermore, we analyzed the differences between the predicted glycan structures in UniCorn and those contained in UniCarbKB (www.unicarbkb.org), a database which stores experimentally described glycan structures reported in the literature, and demonstrate that UniCorn can be used to aid in the assignment of ambiguous structures whilst also serving as a discovery database.

  6. miRPathDB: a new dictionary on microRNAs and target pathways

    PubMed Central

    Backes, Christina; Kehl, Tim; Stöckel, Daniel; Fehlmann, Tobias; Schneider, Lara; Meese, Eckart; Lenhof, Hans-Peter; Keller, Andreas

    2017-01-01

    In the last decade, miRNAs and their regulatory mechanisms have been intensively studied and many tools for the analysis of miRNAs and their targets have been developed. We previously presented a dictionary on single miRNAs and their putative target pathways. Since then, the number of miRNAs has tripled and the knowledge on miRNAs and targets has grown substantially. This, along with changes in pathway resources such as KEGG, leads to an improved understanding of miRNAs, their target genes and related pathways. Here, we introduce the miRNA Pathway Dictionary Database (miRPathDB), freely accessible at https://mpd.bioinf.uni-sb.de/. With the database we aim to complement available target pathway web-servers by providing researchers easy access to the information which pathways are regulated by a miRNA, which miRNAs target a pathway and how specific these regulations are. The database contains a large number of miRNAs (2595 human miRNAs), different miRNA target sets (14 773 experimentally validated target genes as well as 19 281 predicted targets genes) and a broad selection of functional biochemical categories (KEGG-, WikiPathways-, BioCarta-, SMPDB-, PID-, Reactome pathways, functional categories from gene ontology (GO), protein families from Pfam and chromosomal locations totaling 12 875 categories). In addition to Homo sapiens, also Mus musculus data are stored and can be compared to human target pathways. PMID:27742822

  7. A Pathway Analysis Tool for Analyzing Microarray Data of Species with Low Physiological Information

    PubMed Central

    te Pas, M. F. W.; van Hemert, S.; Hulsegge, B.; Hoekman, A. J. W.; Pool, M. H.; Rebel, J. M. J.; Smits, M. A.

    2008-01-01

    Pathway information provides insight into the biological processes underlying microarray data. Pathway information is widely available for humans and laboratory animals in databases through the internet, but less for other species, for example, livestock. Many software packages use species-specific gene IDs that cannot handle genomics data from other species. We developed a species-independent method to search pathways databases to analyse microarray data. Three PERL scripts were developed that use the names of the genes on the microarray. (1) Add synonyms of gene names by searching the Gene Ontology (GO) database. (2) Search the Kyoto Encyclopaedia of Genes and Genomes (KEGG) database for pathway information using this GO-enriched gene list. (3) Combine the pathway data with the microarray data and visualize the results using color codes indicating regulation. To demonstrate the power of the method, we used a previously reported chicken microarray experiment investigating line-specific reactions to Salmonella infection as an example. PMID:19920988
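
    The three steps listed in the abstract above can be sketched roughly as follows. The original tool consists of PERL scripts that query the GO and KEGG databases; this Python outline replaces both lookups with small in-memory dictionaries, and every gene name, pathway name, threshold and colour is a hypothetical placeholder rather than something taken from the paper.

```python
# Hypothetical stand-ins for the GO synonym lookup and the KEGG pathway search.
SYNONYMS = {"geneA": ["geneA_alias"]}
PATHWAY_INDEX = {
    "geneA_alias": ["pathway X"],
    "geneB": ["pathway X", "pathway Y"],
}


def add_synonyms(gene_names, synonyms):
    """Step 1: extend each microarray gene name with its known synonyms."""
    return {name: [name] + synonyms.get(name, []) for name in gene_names}


def find_pathways(expanded, pathway_index):
    """Step 2: collect pathways matched by a gene name or any of its synonyms."""
    hits = {}
    for name, aliases in expanded.items():
        found = set()
        for alias in aliases:
            found.update(pathway_index.get(alias, []))
        hits[name] = sorted(found)
    return hits


def colour_code(log_ratio, threshold=1.0):
    """Map an expression log-ratio to a colour (assumed colour scheme)."""
    if log_ratio >= threshold:
        return "red"      # up-regulated
    if log_ratio <= -threshold:
        return "green"    # down-regulated
    return "grey"         # unchanged


def combine(hits, expression):
    """Step 3: join pathway hits with expression values and colour codes."""
    rows = [
        (pathway, gene, colour_code(expression[gene]))
        for gene, pathways in hits.items()
        for pathway in pathways
    ]
    return sorted(rows)


# Toy microarray result: gene name -> log-ratio.
expression = {"geneA": 1.8, "geneB": -1.2, "geneC": 0.1}
expanded = add_synonyms(expression, SYNONYMS)
hits = find_pathways(expanded, PATHWAY_INDEX)
rows = combine(hits, expression)
for row in rows:
    print(row)
```

    Here geneA is matched to a pathway only through its synonym, which is the situation the name-based, species-independent approach is meant to handle.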

  8. DemaDb: an integrated dematiaceous fungal genomes database

    PubMed Central

    Kuan, Chee Sian; Yew, Su Mei; Chan, Chai Ling; Toh, Yue Fen; Lee, Kok Wei; Cheong, Wei-Hien; Yee, Wai-Yan; Hoh, Chee-Choong; Yap, Soon-Joo; Ng, Kee Peng

    2016-01-01

    Many species of dematiaceous fungi are associated with allergic reactions and potentially fatal diseases in humans, especially in tropical climates. Over the past 10 years, we have isolated more than 400 dematiaceous fungi from various clinical samples. In this study, DemaDb, an integrated database, was designed to support the integration and analysis of dematiaceous fungal genomes. A total of 92 072 putative genes and 6527 pathways that were identified in eight dematiaceous fungi (Bipolaris papendorfii UM 226, Daldinia eschscholtzii UM 1400, D. eschscholtzii UM 1020, Pyrenochaeta unguis-hominis UM 256, Ochroconis mirabilis UM 578, Cladosporium sphaerospermum UM 843, Herpotrichiellaceae sp. UM 238 and Pleosporales sp. UM 1110) were deposited in DemaDb. DemaDb includes functional annotations for all predicted gene models in all genomes, such as Gene Ontology, EuKaryotic Orthologous Groups, Kyoto Encyclopedia of Genes and Genomes (KEGG), Pfam and InterProScan. All predicted protein models were further functionally annotated to Carbohydrate-Active enzymes, peptidases, secondary metabolites and virulence factors. DemaDb Genome Browser enables users to browse and visualize entire genomes with annotation data including gene prediction, structure, orientation and custom feature tracks. The Pathway Browser based on the KEGG pathway database allows users to look into molecular interaction and reaction networks for all KEGG annotated genes. The availability of downloadable files containing assembly, nucleic acid, as well as protein data allows direct retrieval for further downstream work. DemaDb is a useful resource for the fungal research community, especially those involved in genome-scale analysis, functional genomics, genetics and disease studies of dematiaceous fungi. Database URL: http://fungaldb.um.edu.my PMID:26980516

  9. DemaDb: an integrated dematiaceous fungal genomes database.

    PubMed

    Kuan, Chee Sian; Yew, Su Mei; Chan, Chai Ling; Toh, Yue Fen; Lee, Kok Wei; Cheong, Wei-Hien; Yee, Wai-Yan; Hoh, Chee-Choong; Yap, Soon-Joo; Ng, Kee Peng

    2016-01-01

    Many species of dematiaceous fungi are associated with allergic reactions and potentially fatal diseases in humans, especially in tropical climates. Over the past 10 years, we have isolated more than 400 dematiaceous fungi from various clinical samples. In this study, DemaDb, an integrated database, was designed to support the integration and analysis of dematiaceous fungal genomes. A total of 92 072 putative genes and 6527 pathways that were identified in eight dematiaceous fungi (Bipolaris papendorfii UM 226, Daldinia eschscholtzii UM 1400, D. eschscholtzii UM 1020, Pyrenochaeta unguis-hominis UM 256, Ochroconis mirabilis UM 578, Cladosporium sphaerospermum UM 843, Herpotrichiellaceae sp. UM 238 and Pleosporales sp. UM 1110) were deposited in DemaDb. DemaDb includes functional annotations for all predicted gene models in all genomes, such as Gene Ontology, EuKaryotic Orthologous Groups, Kyoto Encyclopedia of Genes and Genomes (KEGG), Pfam and InterProScan. All predicted protein models were further functionally annotated to Carbohydrate-Active enzymes, peptidases, secondary metabolites and virulence factors. DemaDb Genome Browser enables users to browse and visualize entire genomes with annotation data including gene prediction, structure, orientation and custom feature tracks. The Pathway Browser based on the KEGG pathway database allows users to look into molecular interaction and reaction networks for all KEGG annotated genes. The availability of downloadable files containing assembly, nucleic acid, as well as protein data allows direct retrieval for further downstream work. DemaDb is a useful resource for the fungal research community, especially those involved in genome-scale analysis, functional genomics, genetics and disease studies of dematiaceous fungi. Database URL: http://fungaldb.um.edu.my.

  10. Enhancing a Pathway-Genome Database (PGDB) to Capture Subcellular Localization of Metabolites and Enzymes: The Nucleotide-Sugar Biosynthetic Pathways of Populus trichocarpa

    SciTech Connect

    Nag, A.; Karpinets, T. V.; Chang, C. H.; Bar-Peled, M.

    2012-01-01

    Understanding how cellular metabolism works and is regulated requires that the underlying biochemical pathways be adequately represented and integrated with large metabolomic data sets to establish a robust network model. Genetically engineering energy crops to be less recalcitrant to saccharification requires detailed knowledge of plant polysaccharide structures and a thorough understanding of the metabolic pathways involved in forming and regulating cell-wall synthesis. Nucleotide-sugars are building blocks for synthesis of cell wall polysaccharides. The biosynthesis of nucleotide-sugars is catalyzed by a multitude of enzymes that reside in different subcellular organelles, and precise representation of these pathways requires accurate capture of this biological compartmentalization. The lack of simple localization cues in genomic sequence data and annotations however leads to missing compartmentalization information for eukaryotes in automatically generated databases, such as the Pathway-Genome Databases (PGDBs) of the SRI Pathway Tools software that drives much biochemical knowledge representation on the internet. In this report, we provide an informal mechanism using the existing Pathway Tools framework to integrate protein and metabolite sub-cellular localization data with the existing representation of the nucleotide-sugar metabolic pathways in a prototype PGDB for Populus trichocarpa. The enhanced pathway representations have been successfully used to map SNP abundance data to individual nucleotide-sugar biosynthetic genes in the PGDB. The manually curated pathway representations are more conducive to the construction of a computational platform that will allow the simulation of natural and engineered nucleotide-sugar precursor fluxes into specific recalcitrant polysaccharide(s).

  11. Hedgehog Signaling Pathway Database: a repository of current annotation efforts and resources for the Hh research community.

    PubMed

    Hervold, Kieran; Martin, Andrew; Kirkpatrick, Roger A; Mc Kenna, Paul F; Ramirez-Weber, F A

    2007-01-01

    The Hedgehog Signaling Pathway Database is a curated repository of information pertaining to the Hedgehog developmental pathway. It was designed to provide centralized access to a wide range of relevant information in an organism-agnostic manner. Data are provided for all genes and gene targets known to be involved in the Hh pathway across various organisms. The data provided include DNA and protein sequences as well as domain structure motifs. All known human diseases associated with the Hh pathway are indexed including experimental data on therapeutic agents and their molecular targets. Hh researchers will find useful information on relevant protocols, tissue cell lines and reagents used in current Hh research projects. Curated content is also provided for publications, grants and patents relating to the Hh pathway. The database can be accessed at http://www.hedgehog.sfsu.edu.

  12. aglgenes, A curated and searchable database of archaeal N-glycosylation pathway components.

    PubMed

    Godin, Noa; Eichler, Jerry

    2014-01-01

    Whereas N-glycosylation is a posttranslational modification performed across evolution, the archaeal version of this protein-processing event presents a degree of diversity not seen in either bacteria or eukarya. Accordingly, archaeal N-glycosylation relies on a large number of enzymes that are often species-specific or restricted to a select group of species. As such, there is a need for an organized platform upon which amassing information about archaeal glycosylation (agl) genes can rest. Accordingly, the aglgenes database provides detailed descriptions of experimentally characterized archaeal N-glycosylation pathway components. For each agl gene, genomic information, supporting literature and relevant external links are provided at a functional intuitive web-interface designed for data browsing. Routine updates ensure that novel experimental information on genes and proteins contributing to archaeal N-glycosylation is incorporated into aglgenes in a timely manner. As such, aglgenes represents a specialized resource for sharing validated experimental information online, providing support for workers in the field of archaeal protein glycosylation. Database URL: www.bgu.ac.il/aglgenes.

  13. Multiomics in Grape Berry Skin Revealed Specific Induction of the Stilbene Synthetic Pathway by Ultraviolet-C Irradiation

    PubMed Central

    Suzuki, Mami; Nakabayashi, Ryo; Ogata, Yoshiyuki; Sakurai, Nozomu; Tokimatsu, Toshiaki; Goto, Susumu; Suzuki, Makoto; Jasinski, Michal; Martinoia, Enrico; Otagaki, Shungo; Matsumoto, Shogo; Saito, Kazuki; Shiratake, Katsuhiro

    2015-01-01

    Grape (Vitis vinifera) accumulates various polyphenolic compounds, which protect against environmental stresses, including ultraviolet-C (UV-C) light and pathogens. In this study, we looked at the transcriptome and metabolome in grape berry skin after UV-C irradiation, which demonstrated the effectiveness of omics approaches to clarify important traits of grape. We performed transcriptome analysis using a genome-wide microarray, which revealed 238 genes up-regulated more than 5-fold by UV-C light. Enrichment analysis of Gene Ontology terms showed that genes encoding stilbene synthase, a key enzyme for resveratrol synthesis, were enriched in the up-regulated genes. We performed metabolome analysis using liquid chromatography-quadrupole time-of-flight mass spectrometry, and 2,012 metabolite peaks, including unidentified peaks, were detected. Principal component analysis using the peaks showed that only one metabolite peak, identified as resveratrol, was highly induced by UV-C light. We updated the metabolic pathway map of grape in the Kyoto Encyclopedia of Genes and Genomes (KEGG) database and in the KaPPA-View 4 KEGG system, then projected the transcriptome and metabolome data on a metabolic pathway map. The map showed specific induction of the resveratrol synthetic pathway by UV-C light. Our results showed that multiomics is a powerful tool to elucidate the accumulation mechanisms of secondary metabolites, and updated systems, such as KEGG and KaPPA-View 4 KEGG for grape, can support such studies. PMID:25761715

  14. ProCarDB: a database of bacterial carotenoids.

    PubMed

    Nupur, L N U; Vats, Asheema; Dhanda, Sandeep Kumar; Raghava, Gajendra P S; Pinnaka, Anil Kumar; Kumar, Ashwani

    2016-05-26

    Carotenoids have important functions in bacteria, ranging from harvesting light energy to neutralizing oxidants and acting as virulence factors. However, information pertaining to the carotenoids is scattered throughout the literature. Furthermore, information about the genes/proteins involved in the biosynthesis of carotenoids has tremendously increased in the post-genomic era. A web server providing the information about microbial carotenoids in a structured manner is required and will be a valuable resource for the scientific community working with microbial carotenoids. Here, we have created a manually curated, open access, comprehensive compilation of bacterial carotenoids named ProCarDB (Prokaryotic Carotenoid Database). ProCarDB includes 304 unique carotenoids arising from 50 biosynthetic pathways distributed among 611 prokaryotes. ProCarDB provides important information on carotenoids, such as 2D and 3D structures, molecular weight, molecular formula, SMILES, InChI, InChIKey, IUPAC name, KEGG Id, PubChem Id, and ChEBI Id. The database also provides NMR data, UV-vis absorption data, IR data, MS data and HPLC data that play key roles in the identification of carotenoids. An important feature of this database is the extension of biosynthetic pathways from the literature and through the presence of the genes/enzymes in different organisms. The information contained in the database was mined from published literature and databases such as KEGG, PubChem, ChEBI, LipidBank, LPSN, and Uniprot. The database integrates user-friendly browsing and searching with carotenoid analysis tools to help the user. We believe that this database will serve as a major information centre for researchers working on bacterial carotenoids.

  15. Xtalk: a path-based approach for identifying crosstalk between signaling pathways

    PubMed Central

    Tegge, Allison N.; Sharp, Nicholas; Murali, T. M.

    2016-01-01

    Motivation: Cells communicate with their environment via signal transduction pathways. On occasion, the activation of one pathway can produce an effect downstream of another pathway, a phenomenon known as crosstalk. Existing computational methods to discover such pathway pairs rely on simple overlap statistics. Results: We present Xtalk, a path-based approach for identifying pairs of pathways that may crosstalk. Xtalk computes the statistical significance of the average length of multiple short paths that connect receptors in one pathway to the transcription factors in another. By design, Xtalk reports the precise interactions and mechanisms that support the identified crosstalk. We applied Xtalk to signaling pathways in the KEGG and NCI-PID databases. We manually curated a gold standard set of 132 crosstalking pathway pairs and a set of 140 pairs that did not crosstalk, for which Xtalk achieved an area under the receiver operator characteristic curve of 0.65, a 12% improvement over the closest competing approach. The area under the receiver operator characteristic curve varied with the pathway, suggesting that crosstalk should be evaluated on a pathway-by-pathway level. We also analyzed an extended set of 658 pathway pairs in KEGG and a set of more than 7000 pathway pairs in NCI-PID. For the top-ranking pairs, we found substantial support in the literature (81% for KEGG and 78% for NCI-PID). We provide examples of networks computed by Xtalk that accurately recovered known mechanisms of crosstalk. Availability and implementation: The XTALK software is available at http://bioinformatics.cs.vt.edu/~murali/software. Crosstalk networks are available at http://graphspace.org/graphs?tags=2015-bioinformatics-xtalk. Contact: ategge@vt.edu, murali@cs.vt.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26400040
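
    To make the path-based statistic in the abstract above more concrete, here is a deliberately simplified sketch. It is not the Xtalk implementation: it uses a single breadth-first shortest path per receptor/transcription-factor pair instead of multiple short paths, it omits the significance test entirely, and the toy network and all node names are hypothetical.

```python
from collections import deque


def shortest_path_length(graph, source, target):
    """Number of edges on a shortest directed path, or None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == target:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def average_path_length(graph, receptors, transcription_factors):
    """Mean shortest-path length over all connected receptor/TF pairs."""
    lengths = []
    for receptor in receptors:
        for tf in transcription_factors:
            length = shortest_path_length(graph, receptor, tf)
            if length is not None:
                lengths.append(length)
    return sum(lengths) / len(lengths) if lengths else None


# Hypothetical directed interaction network: node -> downstream nodes.
network = {
    "R1": ["K1"],
    "K1": ["TF1", "K2"],
    "K2": ["TF2"],
    "R2": ["K2"],
}

# Receptors taken from one (toy) pathway, transcription factors from another.
score = average_path_length(network, ["R1", "R2"], ["TF1", "TF2"])
print(f"average receptor-to-TF path length: {score:.3f}")
```

    A shorter average suggests the receptors of one pathway sit close to the transcription factors of the other. Judging whether that closeness is unusual would require comparing the score against a null distribution, which is the part this sketch omits.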

  16. BioWarehouse: a bioinformatics database warehouse toolkit

    PubMed Central

    Lee, Thomas J; Pouliot, Yannick; Wagner, Valerie; Gupta, Priyanka; Stringer-Calvert, David WJ; Tenenbaum, Jessica D; Karp, Peter D

    2006-01-01

    Background: This article addresses the problem of interoperation of heterogeneous bioinformatics databases. Results: We introduce BioWarehouse, an open source toolkit for constructing bioinformatics database warehouses using the MySQL and Oracle relational database managers. BioWarehouse integrates its component databases into a common representational framework within a single database management system, thus enabling multi-database queries using the Structured Query Language (SQL) but also facilitating a variety of database integration tasks such as comparative analysis and data mining. BioWarehouse currently supports the integration of a pathway-centric set of databases including ENZYME, KEGG, and BioCyc, and in addition the UniProt, GenBank, NCBI Taxonomy, and CMR databases, and the Gene Ontology. Loader tools, written in the C and JAVA languages, parse and load these databases into a relational database schema. The loaders also apply a degree of semantic normalization to their respective source data, decreasing semantic heterogeneity. The schema supports the following bioinformatics datatypes: chemical compounds, biochemical reactions, metabolic pathways, proteins, genes, nucleic acid sequences, features on protein and nucleic-acid sequences, organisms, organism taxonomies, and controlled vocabularies. As an application example, we applied BioWarehouse to determine the fraction of biochemically characterized enzyme activities for which no sequences exist in the public sequence databases. The answer is that no sequence exists for 36% of enzyme activities for which EC numbers have been assigned. These gaps in sequence data significantly limit the accuracy of genome annotation and metabolic pathway prediction, and are a barrier for metabolic engineering. Complex queries of this type provide examples of the value of the data warehousing approach to bioinformatics research. Conclusion: BioWarehouse embodies significant progress on the database integration problem for
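
    The application example in the abstract above (the fraction of enzyme activities with no known sequence) is the kind of question a single SQL query can answer once the sources share one schema. The sketch below illustrates the idea with Python's built-in sqlite3 module. The two-table schema, the table and column names, and the rows are hypothetical and far simpler than the actual BioWarehouse schema, which targets MySQL and Oracle.

```python
import sqlite3

# In-memory toy warehouse; schema and rows are hypothetical placeholders.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE enzyme_activity (ec_number TEXT PRIMARY KEY);
CREATE TABLE protein_sequence (accession TEXT PRIMARY KEY, ec_number TEXT);

INSERT INTO enzyme_activity VALUES ('EC_A'), ('EC_B'), ('EC_C'), ('EC_D');
INSERT INTO protein_sequence VALUES
    ('SEQ1', 'EC_A'),
    ('SEQ2', 'EC_A'),
    ('SEQ3', 'EC_B');
""")


def fraction_without_sequence(connection):
    """Fraction of enzyme activities that no stored sequence is annotated with."""
    query = """
        SELECT COUNT(*) * 1.0 / (SELECT COUNT(*) FROM enzyme_activity)
        FROM enzyme_activity AS e
        LEFT JOIN protein_sequence AS p ON p.ec_number = e.ec_number
        WHERE p.accession IS NULL
    """
    return connection.execute(query).fetchone()[0]


fraction = fraction_without_sequence(conn)
print(f"{fraction:.0%} of toy enzyme activities have no sequence")
```

    The LEFT JOIN keeps every enzyme activity, and the IS NULL filter then retains only those with no matching sequence row, so activities with several sequences are not counted.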

  17. The pathway ontology - updates and applications.

    PubMed

    Petri, Victoria; Jayaraman, Pushkala; Tutaj, Marek; Hayman, G Thomas; Smith, Jennifer R; De Pons, Jeff; Laulederkind, Stanley Jf; Lowry, Timothy F; Nigam, Rajni; Wang, Shur-Jen; Shimoyama, Mary; Dwinell, Melinda R; Munzenmaier, Diane H; Worthey, Elizabeth A; Jacob, Howard J

    2014-02-05

    The Pathway Ontology (PW) developed at the Rat Genome Database (RGD), covers all types of biological pathways, including altered and disease pathways and captures the relationships between them within the hierarchical structure of a directed acyclic graph. The ontology allows for the standardized annotation of rat, and of human and mouse genes to pathway terms. It also constitutes a vehicle for easy navigation between gene and ontology report pages, between reports and interactive pathway diagrams, between pathways directly connected within a diagram and between those that are globally related in pathway suites and suite networks. Surveys of the literature and the development of the Pathway and Disease Portals are important sources for the ongoing development of the ontology. User requests and mapping of pathways in other databases to terms in the ontology further contribute to increasing its content. Recently built automated pipelines use the mapped terms to make available the annotations generated by other groups. The two released pipelines - the Pathway Interaction Database (PID) Annotation Import Pipeline and the Kyoto Encyclopedia of Genes and Genomes (KEGG) Annotation Import Pipeline, make available over 7,400 and 31,000 pathway gene annotations, respectively. Building the PID pipeline led to the addition of new terms within the signaling node, also augmented by the release of the RGD "Immune and Inflammatory Disease Portal" at that time. Building the KEGG pipeline led to a substantial increase in the number of disease pathway terms, such as those within the 'infectious disease pathway' parent term category. The 'drug pathway' node has also seen increases in the number of terms as well as a restructuring of the node. Literature surveys, disease portal deployments and user requests have contributed and continue to contribute additional new terms across the ontology. Since first presented, the content of PW has increased by over 75%. Ongoing development of

  18. The pathway ontology – updates and applications

    PubMed Central

    2014-01-01

    Background The Pathway Ontology (PW) developed at the Rat Genome Database (RGD), covers all types of biological pathways, including altered and disease pathways and captures the relationships between them within the hierarchical structure of a directed acyclic graph. The ontology allows for the standardized annotation of rat, and of human and mouse genes to pathway terms. It also constitutes a vehicle for easy navigation between gene and ontology report pages, between reports and interactive pathway diagrams, between pathways directly connected within a diagram and between those that are globally related in pathway suites and suite networks. Surveys of the literature and the development of the Pathway and Disease Portals are important sources for the ongoing development of the ontology. User requests and mapping of pathways in other databases to terms in the ontology further contribute to increasing its content. Recently built automated pipelines use the mapped terms to make available the annotations generated by other groups. Results The two released pipelines – the Pathway Interaction Database (PID) Annotation Import Pipeline and the Kyoto Encyclopedia of Genes and Genomes (KEGG) Annotation Import Pipeline, make available over 7,400 and 31,000 pathway gene annotations, respectively. Building the PID pipeline led to the addition of new terms within the signaling node, also augmented by the release of the RGD “Immune and Inflammatory Disease Portal” at that time. Building the KEGG pipeline led to a substantial increase in the number of disease pathway terms, such as those within the ‘infectious disease pathway’ parent term category. The ‘drug pathway’ node has also seen increases in the number of terms as well as a restructuring of the node. Literature surveys, disease portal deployments and user requests have contributed and continue to contribute additional new terms across the ontology. Since first presented, the content of PW has increased by

  19. An Assessment of Database-Validated microRNA Target Genes in Normal Colonic Mucosa: Implications for Pathway Analysis.

    PubMed

    Slattery, Martha L; Herrick, Jennifer S; Stevens, John R; Wolff, Roger K; Mullany, Lila E

    2017-01-01

    Determination of functional pathways regulated by microRNAs (miRNAs), while an essential step in developing therapeutics, is challenging. Some miRNAs have been studied extensively; others have limited information. In this study, we focus on 254 miRNAs previously identified as being associated with colorectal cancer and their database-identified validated target genes. We use RNA-Seq data to evaluate messenger RNA (mRNA) expression for 157 subjects who also had miRNA expression data. In the replication phase of the study, we replicated associations between 254 miRNAs associated with colorectal cancer and mRNA expression of database-identified target genes in normal colonic mucosa. In the discovery phase of the study, we evaluated expression of 18 miRNAs (those with 20 or fewer database-identified target genes along with miR-21-5p, miR-215-5p, and miR-124-3p which have more than 500 database-identified target genes) with expression of 17 434 mRNAs to identify new targets in colon tissue. Seed region matches between miRNA and newly identified targeted mRNA were used to help determine direct miRNA-mRNA associations. From the replication of the 121 miRNAs that had at least 1 database-identified target gene using mRNA expression methods, 97.9% were expressed in normal colonic mucosa. Of the 8622 target miRNA-mRNA associations identified in the database, 2658 (30.2%) were associated with gene expression in normal colonic mucosa after adjusting for multiple comparisons. Of the 133 miRNAs with database-identified target genes by non-mRNA expression methods, 97.2% were expressed in normal colonic mucosa. After adjustment for multiple comparisons, 2416 miRNA-mRNA associations remained significant (19.8%). Results from the discovery phase based on detailed examination of 18 miRNAs identified more than 80 000 miRNA-mRNA associations that had not previously been linked to the miRNA. Of these miRNA-mRNA associations, 15.6% and 14.8% had seed matches for CRCh38 and CRCh37

  20. iPathCons and iPathDB: an improved insect pathway construction tool and the database

    PubMed Central

    Zhang, Zan; Yin, Chuanlin; Liu, Ying; Jie, Wencai; Lei, Wenjie; Li, Fei

    2014-01-01

    Insects are one of the most successful animal groups on earth. Some insects, such as the silkworm and honeybee, are beneficial to humans, whereas others are notorious pests of crops. At present, the genomes of 38 insects have been sequenced and made publicly available. In addition, the transcriptomes of dozens of insects have been sequenced. As gene data rapidly accumulate, constructing the pathway of molecular interactions becomes increasingly important for entomological research. Here, we developed an improved tool, iPathCons, for knowledge-based construction of pathways from the transcriptomes or the official gene sets of genomes. Considering the high evolution diversity in insects, iPathCons uses a voting system for Kyoto Encyclopedia of Genes and Genomes Orthology assignment. Both stand-alone software and a web server of iPathCons are provided. Using iPathCons, we constructed the pathways of molecular interactions of 52 insects, including 37 genome-sequenced and 15 transcriptome-sequenced ones. These pathways are available in the iPathDB, which provides searches, web server, data downloads, etc. This database will be highly useful for the insect research community. Database URL: http://ento.njau.edu.cn/ipath/ PMID:25388589

  1. The use of functional chemical-protein associations to identify multi-pathway renoprotectants.

    PubMed

    Xu, Jia; Meng, Kexin; Zhang, Rui; Yang, He; Liao, Chang; Zhu, Wenliang; Jiao, Jundong

    2014-01-01

    Typically, most nephropathies can be categorized as complex human diseases in which the cumulative effect of multiple minor genes, combined with environmental and lifestyle factors, determines the disease phenotype. Thus, multi-target drugs would be more likely to facilitate comprehensive renoprotection than single-target agents. In this study, functional chemical-protein association analysis was performed to retrieve multi-target drugs of high pathway wideness from the STITCH 3.1 database. Pathway wideness of a drug evaluated the efficiency of regulation of Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways in quantity. We identified nine experimentally validated renoprotectants that exerted remarkable impact on KEGG pathways by targeting a limited number of proteins. We selected curcumin as an illustrative compound to display the advantage of multi-pathway drugs on renoprotection. We compared curcumin with hemin, an agonist of heme oxygenase-1 (HO-1), which significantly affects only one KEGG pathway, porphyrin and chlorophyll metabolism (adjusted p = 1.5×10⁻⁵). At the same concentration (10 µM), both curcumin and hemin equivalently mitigated oxidative stress in H2O2-treated glomerular mesangial cells. The benefit of using hemin was derived from its agonistic effect on HO-1, providing relief from oxidative stress. Selective inhibition of HO-1 completely blocked the action of hemin but not that of curcumin, suggesting simultaneous multi-pathway intervention by curcumin. Curcumin also increased cellular autophagy levels, enhancing its protective effect; however, hemin had no effects. Based on the fact that the dysregulation of multiple pathways is implicated in the etiology of complex diseases, we proposed a feasible method for identifying multi-pathway drugs from compounds with validated targets. Our efforts will help identify multi-pathway agents capable of providing comprehensive protection against renal injuries.

  2. Preimplantation development regulatory pathway construction through a text-mining approach

    PubMed Central

    2011-01-01

    Background The integration of sequencing and gene interaction data and subsequent generation of pathways and networks contained in databases such as KEGG Pathway is essential for the comprehension of complex biological processes. We noticed the absence of a chart or pathway describing the well-studied preimplantation development stages; furthermore, not all genes involved in the process have entries in KEGG Orthology, important information for knowledge application with relation to other organisms. Results In this work we sought to develop the regulatory pathway for the preimplantation development stage using text-mining tools such as Medline Ranker and PESCADOR to reveal biointeractions among the genes involved in this process. The genes present in the resulting pathway were also used as seeds for software developed by our group called SeedServer to create clusters of homologous genes. These homologues allowed the determination of the last common ancestor for each gene and revealed that the preimplantation development pathway consists of a conserved ancient core of genes with the addition of modern elements. Conclusions The generation of regulatory pathways through text-mining tools allows the integration of data generated by several studies for a more complete visualization of complex biological processes. Using the genes in this pathway as “seeds” for the generation of clusters of homologues, the pathway can be visualized for other organisms. The clustering of homologous genes together with determination of the ancestry leads to a better understanding of the evolution of such process. PMID:22369103

  3. Preimplantation development regulatory pathway construction through a text-mining approach.

    PubMed

    Donnard, Elisa; Barbosa-Silva, Adriano; Guedes, Rafael L M; Fernandes, Gabriel R; Velloso, Henrique; Kohn, Matthew J; Andrade-Navarro, Miguel A; Ortega, J Miguel

    2011-12-22

    The integration of sequencing and gene interaction data and subsequent generation of pathways and networks contained in databases such as KEGG Pathway is essential for the comprehension of complex biological processes. We noticed the absence of a chart or pathway describing the well-studied preimplantation development stages; furthermore, not all genes involved in the process have entries in KEGG Orthology, important information for knowledge application with relation to other organisms. In this work we sought to develop the regulatory pathway for the preimplantation development stage using text-mining tools such as Medline Ranker and PESCADOR to reveal biointeractions among the genes involved in this process. The genes present in the resulting pathway were also used as seeds for software developed by our group called SeedServer to create clusters of homologous genes. These homologues allowed the determination of the last common ancestor for each gene and revealed that the preimplantation development pathway consists of a conserved ancient core of genes with the addition of modern elements. The generation of regulatory pathways through text-mining tools allows the integration of data generated by several studies for a more complete visualization of complex biological processes. Using the genes in this pathway as "seeds" for the generation of clusters of homologues, the pathway can be visualized for other organisms. The clustering of homologous genes together with determination of the ancestry leads to a better understanding of the evolution of such process.

  4. PathCase-SB architecture and database design

    PubMed Central

    2011-01-01

    Background Integration of metabolic pathways resources and regulatory metabolic network models, and deploying new tools on the integrated platform can help perform more effective and more efficient systems biology research on understanding the regulation in metabolic networks. Therefore, the tasks of (a) integrating under a single database environment regulatory metabolic networks and existing models, and (b) building tools to help with modeling and analysis are desirable and intellectually challenging computational tasks. Description PathCase Systems Biology (PathCase-SB) is built and released. The PathCase-SB database provides data and API for multiple user interfaces and software tools. The current PathCase-SB system provides a database-enabled framework and web-based computational tools towards facilitating the development of kinetic models for biological systems. PathCase-SB aims to integrate data of selected biological data sources on the web (currently, BioModels database and KEGG), and to provide more powerful and/or new capabilities via the new web-based integrative framework. This paper describes architecture and database design issues encountered in PathCase-SB's design and implementation, and presents the current design of PathCase-SB's architecture and database. Conclusions PathCase-SB architecture and database provide a highly extensible and scalable environment with easy and fast (real-time) access to the data in the database. PathCase-SB itself is already being used by researchers across the world. PMID:22070889

  5. Creation of a Genome-Wide Metabolic Pathway Database for Populus trichocarpa Using a New Approach for Reconstruction and Curation of Metabolic Pathways for Plants

    PubMed Central

    Zhang, Peifen; Dreher, Kate; Karthikeyan, A.; Chi, Anjo; Pujar, Anuradha; Caspi, Ron; Karp, Peter; Kirkup, Vanessa; Latendresse, Mario; Lee, Cynthia; Mueller, Lukas A.; Muller, Robert; Rhee, Seung Yon

    2010-01-01

    Metabolic networks reconstructed from sequenced genomes or transcriptomes can help visualize and analyze large-scale experimental data, predict metabolic phenotypes, discover enzymes, engineer metabolic pathways, and study metabolic pathway evolution. We developed a general approach for reconstructing metabolic pathway complements of plant genomes. Two new reference databases were created and added to the core of the infrastructure: a comprehensive, all-plant reference pathway database, PlantCyc, and a reference enzyme sequence database, RESD, for annotating metabolic functions of protein sequences. PlantCyc (version 3.0) includes 714 metabolic pathways and 2,619 reactions from over 300 species. RESD (version 1.0) contains 14,187 literature-supported enzyme sequences from across all kingdoms. We used RESD, PlantCyc, and MetaCyc (an all-species reference metabolic pathway database), in conjunction with the pathway prediction software Pathway Tools, to reconstruct a metabolic pathway database, PoplarCyc, from the recently sequenced genome of Populus trichocarpa. PoplarCyc (version 1.0) contains 321 pathways with 1,807 assigned enzymes. Comparing PoplarCyc (version 1.0) with AraCyc (version 6.0, Arabidopsis [Arabidopsis thaliana]) showed comparable numbers of pathways distributed across all domains of metabolism in both databases, except for a higher number of AraCyc pathways in secondary metabolism and a 1.5-fold increase in carbohydrate metabolic enzymes in PoplarCyc. Here, we introduce these new resources and demonstrate the feasibility of using them to identify candidate enzymes for specific pathways and to analyze metabolite profiling data through concrete examples. These resources can be searched by text or BLAST, browsed, and downloaded from our project Web site (http://plantcyc.org). PMID:20522724

  6. NLDB: a database for 3D protein-ligand interactions in enzymatic reactions.

    PubMed

    Murakami, Yoichi; Omori, Satoshi; Kinoshita, Kengo

    2016-12-01

    NLDB (Natural Ligand DataBase; URL: http://nldb.hgc.jp ) is a database of automatically collected and predicted 3D protein-ligand interactions for the enzymatic reactions of metabolic pathways registered in KEGG. Structural information about these reactions is important for studying the molecular functions of enzymes; however, a large number of the 3D interactions are still unknown. Therefore, in order to complement such missing information, we predicted protein-ligand complex structures, and constructed a database of the 3D interactions in reactions. NLDB provides three different types of data resources: the natural complexes are experimentally determined protein-ligand complex structures in PDB, the analog complexes are predicted based on known protein structures in a complex with a similar ligand, and the ab initio complexes are predicted by docking simulations. In addition, NLDB shows the known polymorphisms found in the human genome on protein structures. The database has a flexible search function based on various types of keywords, and an enrichment analysis function based on a set of KEGG compound IDs. NLDB will be a valuable resource for experimental biologists studying protein-ligand interactions in specific reactions, and for theoretical researchers wishing to undertake more precise simulations of interactions.

  7. Transcriptome and Metabolite analysis reveal candidate genes of the cardiac glycoside biosynthetic pathway from Calotropis procera

    PubMed Central

    Pandey, Akansha; Swarnkar, Vishakha; Pandey, Tushar; Srivastava, Piush; Kanojiya, Sanjeev; Mishra, Dipak Kumar; Tripathi, Vineeta

    2016-01-01

    Calotropis procera is a medicinal plant of immense importance due to its pharmaceutically active components, especially cardiac glycosides (CG). As genomic resources for this plant are limited, the genes involved in the CG biosynthetic pathway remain largely unknown to date. Our study on stage and tissue specific metabolite accumulation showed that CGs were maximally accumulated in stems of 3 month old seedlings. De novo transcriptome sequencing of the same was done using high throughput Illumina HiSeq platform generating 44074 unigenes with average mean length of 1785 base pair. Around 66.6% of unigenes were annotated by using various public databases and 5324 unigenes showed significant match in the KEGG database involved in 133 different pathways of plant metabolism. Further KEGG analysis resulted in identification of 336 unigenes involved in cardenolide biosynthesis. Tissue specific expression analysis of 30 putative transcripts involved in terpenoid, steroid and cardenolide pathways showed a positive correlation between metabolite and transcript accumulation. Wound stress elevated CG levels as well as the levels of the putative transcripts involved in its biosynthetic pathways. This result further validated the involvement of identified transcripts in CGs biosynthesis. The identified transcripts will lay a substantial foundation for further research on metabolic engineering and regulation of cardiac glycosides biosynthesis pathway genes. PMID:27703261

  8. RNApathwaysDB—a database of RNA maturation and decay pathways

    PubMed Central

    Milanowska, Kaja; Mikolajczak, Katarzyna; Lukasik, Anna; Skorupski, Marcin; Balcer, Zuzanna; Machnicka, Magdalena A.; Nowacka, Martyna; Rother, Kristian M.; Bujnicki, Janusz M.

    2013-01-01

    Many RNA molecules undergo complex maturation, involving e.g. excision from primary transcripts, removal of introns, post-transcriptional modification and polyadenylation. The level of mature, functional RNAs in the cell is controlled not only by the synthesis and maturation but also by degradation, which proceeds via many different routes. The systematization of data about RNA metabolic pathways and enzymes taking part in RNA maturation and degradation is essential for the full understanding of these processes. RNApathwaysDB, available online at http://iimcb.genesilico.pl/rnapathwaysdb, is an online resource about maturation and decay pathways involving RNA as the substrate. The current release presents information about reactions and enzymes that take part in the maturation and degradation of tRNA, rRNA and mRNA, and describes pathways in three model organisms: Escherichia coli, Saccharomyces cerevisiae and Homo sapiens. RNApathwaysDB can be queried with keywords, and sequences of protein enzymes involved in RNA processing can be searched with BLAST. Options for data presentation include pathway graphs and tables with enzymes and literature data. Structures of macromolecular complexes involving RNA and proteins that act on it are presented as ‘potato models’ using DrawBioPath—a new javascript tool. PMID:23155061

  9. TrypanoCyc: a community-led biochemical pathways database for Trypanosoma brucei

    PubMed Central

    Shameer, Sanu; Logan-Klumpler, Flora J.; Vinson, Florence; Cottret, Ludovic; Merlet, Benjamin; Achcar, Fiona; Boshart, Michael; Berriman, Matthew; Breitling, Rainer; Bringaud, Frédéric; Bütikofer, Peter; Cattanach, Amy M.; Bannerman-Chukualim, Bridget; Creek, Darren J.; Crouch, Kathryn; de Koning, Harry P.; Denise, Hubert; Ebikeme, Charles; Fairlamb, Alan H.; Ferguson, Michael A. J.; Ginger, Michael L.; Hertz-Fowler, Christiane; Kerkhoven, Eduard J.; Mäser, Pascal; Michels, Paul A. M.; Nayak, Archana; Nes, David W.; Nolan, Derek P.; Olsen, Christian; Silva-Franco, Fatima; Smith, Terry K.; Taylor, Martin C.; Tielens, Aloysius G. M.; Urbaniak, Michael D.; van Hellemond, Jaap J.; Vincent, Isabel M.; Wilkinson, Shane R.; Wyllie, Susan; Opperdoes, Fred R.; Barrett, Michael P.; Jourdan, Fabien

    2015-01-01

    The metabolic network of a cell represents the catabolic and anabolic reactions that interconvert small molecules (metabolites) through the activity of enzymes, transporters and non-catalyzed chemical reactions. Our understanding of individual metabolic networks is increasing as we learn more about the enzymes that are active in particular cells under particular conditions and as technologies advance to allow detailed measurements of the cellular metabolome. Metabolic network databases are of increasing importance in allowing us to contextualise data sets emerging from transcriptomic, proteomic and metabolomic experiments. Here we present a dynamic database, TrypanoCyc (http://www.metexplore.fr/trypanocyc/), which describes the generic and condition-specific metabolic network of Trypanosoma brucei, a parasitic protozoan responsible for human and animal African trypanosomiasis. In addition to enabling navigation through the BioCyc-based TrypanoCyc interface, we have also implemented a network-based representation of the information through MetExplore, yielding a novel environment in which to visualise the metabolism of this important parasite. PMID:25300491

  10. Exercise-Driven Metabolic Pathways in Healthy Cartilage

    PubMed Central

    Blazek, Alisa D.; Nam, Jin; Gupta, Rohan; Pradhan, Meera; Perera, Priyangi; Weisleder, Noah L.; Hewett, Timothy E.; Chaudhari, Ajit M.; Lee, Beth S.; Leblebicioglu, Binnaz; Butterfield, Timothy A.; Agarwal, Sudha

    2016-01-01

    SUMMARY Objective Exercise is vital for maintaining cartilage integrity in healthy joints. Here we examined the exercise-driven transcriptional regulation of genes in healthy rat articular cartilage to dissect the metabolic pathways responsible for its potential benefits. Methods Transcriptome-wide gene expression in the articular cartilage of healthy Sprague-Dawley female rats exercised daily (low intensity treadmill walking) for 2, 5, or 15 days was compared to that of non-exercised rats, using Affymetrix GeneChip arrays. Database for Annotation, Visualization and Integrated Discovery (DAVID) was used for Gene Ontology (GO)-term enrichment and Functional Annotation analysis of differentially expressed genes (DEGs). Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway mapper was used to identify the metabolic pathways regulated by exercise. Results Microarray analysis revealed that exercise induced 644 DEGs in healthy articular cartilage. The DAVID bioinformatics tool demonstrated high prevalence of Functional Annotation Clusters with greater enrichment scores and GO-terms associated with extracellular matrix (ECM) biosynthesis/remodeling and inflammation/immune response. The KEGG database revealed that exercise regulates 147 metabolic pathways representing molecular interaction networks for Metabolism, Genetic Information Processing, Environmental Information Processing, Cellular Processes, Organismal Systems, and Diseases. These pathways collectively supported the complex regulation of the beneficial effects of exercise on the cartilage. Conclusions Overall, the findings highlight that exercise is a robust transcriptional regulator of a wide array of metabolic pathways in healthy cartilage. The major actions of exercise involve ECM biosynthesis/cartilage strengthening and attenuation of inflammatory pathways to provide prophylaxis against onset of arthritic diseases in healthy cartilage. PMID:26924420

  11. Exercise-driven metabolic pathways in healthy cartilage.

    PubMed

    Blazek, A D; Nam, J; Gupta, R; Pradhan, M; Perera, P; Weisleder, N L; Hewett, T E; Chaudhari, A M; Lee, B S; Leblebicioglu, B; Butterfield, T A; Agarwal, S

    2016-07-01

    Exercise is vital for maintaining cartilage integrity in healthy joints. Here we examined the exercise-driven transcriptional regulation of genes in healthy rat articular cartilage to dissect the metabolic pathways responsible for the potential benefits of exercise. Transcriptome-wide gene expression in the articular cartilage of healthy Sprague-Dawley female rats exercised daily (low intensity treadmill walking) for 2, 5, or 15 days was compared to that of non-exercised rats, using Affymetrix GeneChip arrays. Database for Annotation, Visualization and Integrated Discovery (DAVID) was used for Gene Ontology (GO)-term enrichment and Functional Annotation analysis of differentially expressed genes (DEGs). Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway mapper was used to identify the metabolic pathways regulated by exercise. Microarray analysis revealed that exercise induced 644 DEGs in healthy articular cartilage. The DAVID bioinformatics tool demonstrated high prevalence of functional annotation clusters with greater enrichment scores and GO-terms associated with extracellular matrix (ECM) biosynthesis/remodeling and inflammation/immune response. The KEGG database revealed that exercise regulates 147 metabolic pathways representing molecular interaction networks for Metabolism, Genetic Information Processing, Environmental Information Processing, Cellular Processes, Organismal Systems, and Diseases. These pathways collectively supported the complex regulation of the beneficial effects of exercise on the cartilage. Overall, the findings highlight that exercise is a robust transcriptional regulator of a wide array of metabolic pathways in healthy cartilage. The major actions of exercise involve ECM biosynthesis/cartilage strengthening and attenuation of inflammatory pathways to provide prophylaxis against onset of arthritic diseases in healthy cartilage. Copyright © 2016 Osteoarthritis Research Society International. Published by Elsevier Ltd. All rights

  12. Annotation error in public databases: misannotation of molecular function in enzyme superfamilies.

    PubMed

    Schnoes, Alexandra M; Brown, Shoshana D; Dodevski, Igor; Babbitt, Patricia C

    2009-12-01

    Due to the rapid release of new data from genome sequencing projects, the majority of protein sequences in public databases have not been experimentally characterized; rather, sequences are annotated using computational analysis. The level of misannotation and the types of misannotation in large public databases are currently unknown and have not been analyzed in depth. We have investigated the misannotation levels for molecular function in four public protein sequence databases (UniProtKB/Swiss-Prot, GenBank NR, UniProtKB/TrEMBL, and KEGG) for a model set of 37 enzyme families for which extensive experimental information is available. The manually curated database Swiss-Prot shows the lowest annotation error levels (close to 0% for most families); the two other protein sequence databases (GenBank NR and TrEMBL) and the protein sequences in the KEGG pathways database exhibit similar and surprisingly high levels of misannotation that average 5%-63% across the six superfamilies studied. For 10 of the 37 families examined, the level of misannotation in one or more of these databases is >80%. Examination of the NR database over time shows that misannotation has increased from 1993 to 2005. The types of misannotation that were found fall into several categories, most associated with "overprediction" of molecular function. These results suggest that misannotation in enzyme superfamilies containing multiple families that catalyze different reactions is a larger problem than has been recognized. Strategies are suggested for addressing some of the systematic problems contributing to these high levels of misannotation.

  13. FMM: a web server for metabolic pathway reconstruction and comparative analysis.

    PubMed

    Chou, Chih-Hung; Chang, Wen-Chi; Chiu, Chih-Min; Huang, Chih-Chang; Huang, Hsien-Da

    2009-07-01

    Synthetic Biology, a multidisciplinary field, is growing rapidly. Improving the understanding of biological systems through mimicry and producing bio-orthogonal systems with new functions are two complementary pursuits in this field. A web server called FMM (From Metabolite to Metabolite) was developed for this purpose. FMM can reconstruct metabolic pathways from one metabolite to another metabolite among different species, based mainly on the Kyoto Encyclopedia of Genes and Genomes (KEGG) database and other integrated biological databases. Novel presentation for connecting different KEGG maps is newly provided. Both local and global graphical views of the metabolic pathways are designed. FMM has many applications in Synthetic Biology and Metabolic Engineering. For example, the reconstruction of metabolic pathways to produce valuable metabolites or secondary metabolites in bacteria or yeast is a promising strategy for drug production. FMM provides a highly effective way to elucidate the genes from which species should be cloned into those microorganisms based on FMM pathway comparative analysis. Consequently, FMM is an effective tool for applications in synthetic biology to produce both drugs and biofuels. This novel and innovative resource is now freely available at http://FMM.mbc.nctu.edu.tw/.

  14. GiSAO.db: a database for ageing research.

    PubMed

    Hofer, Edith; Laschober, Gerhard T; Hackl, Matthias; Thallinger, Gerhard G; Lepperdinger, Günter; Grillari, Johannes; Jansen-Dürr, Pidder; Trajanoski, Zlatko

    2011-05-24

    Age-related gene expression patterns of Homo sapiens as well as of model organisms such as Mus musculus, Saccharomyces cerevisiae, Caenorhabditis elegans and Drosophila melanogaster are a basis for understanding the genetic mechanisms of ageing. For an effective analysis and interpretation of expression profiles it is necessary to store and manage huge amounts of data in an organized way, so that these data can be accessed and processed easily. GiSAO.db (Genes involved in senescence, apoptosis and oxidative stress database) is a web-based database system for storing and retrieving ageing-related experimental data. Expression data of genes and miRNAs, annotation data like gene identifiers and GO terms, orthologs data and data of follow-up experiments are stored in the database. A user-friendly web application provides access to the stored data. KEGG pathways were incorporated and links to external databases augment the information in GiSAO.db. Search functions facilitate retrieval of data which can also be exported for further processing. We have developed a centralized database that is very well suited for the management of data for ageing research. The database can be accessed at https://gisao.genome.tugraz.at and all the stored data can be viewed with a guest account.

  15. Sentra: a database of signal transduction proteins for comparative genome analysis.

    SciTech Connect

    D'Souza, M.; Glass, E. M.; Syed, M. H.; Zhang, Y.; Rodriguez, A.; Maltsev, N.; Galperin, M. Y.; Mathematics and Computer Science; Univ. of Chicago; NIH

    2007-01-01

    Sentra (http://compbio.mcs.anl.gov/sentra), a database of signal transduction proteins encoded in completely sequenced prokaryotic genomes, has been updated to reflect recent advances in understanding signal transduction events on a whole-genome scale. Sentra consists of two principal components, a manually curated list of signal transduction proteins in 202 completely sequenced prokaryotic genomes and an automatically generated listing of predicted signaling proteins in 235 sequenced genomes that are awaiting manual curation. In addition to two-component histidine kinases and response regulators, the database now lists manually curated Ser/Thr/Tyr protein kinases and protein phosphatases, as well as adenylate and diguanylate cyclases and c-di-GMP phosphodiesterases, as defined in several recent reviews. All entries in Sentra are extensively annotated with relevant information from public databases (e.g. UniProt, KEGG, PDB and NCBI). Sentra's infrastructure was redesigned to support interactive cross-genome comparisons of signal transduction capabilities of prokaryotic organisms from a taxonomic and phenotypic perspective and in the framework of signal transduction pathways from KEGG. Sentra leverages the PUMA2 system to support interactive analysis and annotation of signal transduction proteins by the users.

  16. Pathway Network Analyses for Autism Reveal Multisystem Involvement, Major Overlaps with Other Diseases and Convergence upon MAPK and Calcium Signaling

    PubMed Central

    Wen, Ya; Alshikho, Mohamad J.; Herbert, Martha R.

    2016-01-01

    We used established databases in standard ways to systematically characterize gene ontologies, pathways and functional linkages in the large set of genes now associated with autism spectrum disorders (ASDs). These conditions are particularly challenging: they lack clear pathognomonic biological markers, they involve great heterogeneity across multiple levels (genes, systemic biological and brain characteristics, and nuances of behavioral manifestations), and yet everyone with this diagnosis meets the same defining behavioral criteria. Using the human gene list from Simons Foundation Autism Research Initiative (SFARI) we performed gene set enrichment analysis with the Kyoto Encyclopedia of Genes and Genomes (KEGG) Pathway Database, and then derived a pathway network from pathway-pathway functional interactions again in reference to KEGG. Through identifying the GO (Gene Ontology) groups in which SFARI genes were enriched, mapping the coherence between pathways and GO groups, and ranking the relative strengths of representation of pathway network components, we 1) identified 10 disease-associated and 30 function-associated pathways, 2) revealed calcium signaling pathway and neuroactive ligand-receptor interaction as the most enriched, statistically significant pathways from the enrichment analysis, 3) showed calcium signaling pathways and MAPK signaling pathway to be interactive hubs with other pathways and also to be involved with pervasively present biological processes, 4) found convergent indications that the process “calcium-PKC (protein kinase C)-Ras-Raf-MAPK/ERK” is likely a major contributor to ASD pathophysiology, and 5) noted that perturbations associated with KEGG’s category of environmental information processing were common. These findings support the idea that ASD-associated genes may contribute not only to core features of ASD themselves but also to vulnerability to other chronic and systemic problems potentially including cancer, metabolic

  17. WholePathwayScope: a comprehensive pathway-based analysis tool for high-throughput data

    PubMed Central

    Yi, Ming; Horton, Jay D; Cohen, Jonathan C; Hobbs, Helen H; Stephens, Robert M

    2006-01-01

    Background: Analysis of High Throughput (HTP) Data such as microarray and proteomics data has provided a powerful methodology to study patterns of gene regulation at genome scale. A major unresolved problem in the post-genomic era is to assemble the large amounts of data generated into a meaningful biological context. We have developed a comprehensive software tool, WholePathwayScope (WPS), for deriving biological insights from analysis of HTP data. Result: WPS extracts gene lists with shared biological themes through color cue templates. WPS statistically evaluates global functional category enrichment of gene lists and pathway-level pattern enrichment of data. WPS incorporates well-known biological pathways from KEGG (Kyoto Encyclopedia of Genes and Genomes) and Biocarta, GO (Gene Ontology) terms as well as user-defined pathways or relevant gene clusters or groups, and explores gene-term relationships within the derived gene-term association networks (GTANs). WPS simultaneously compares multiple datasets within biological contexts either as pathways or as association networks. WPS also integrates Genetic Association Database and Partial MedGene Database for disease-association information. We have used this program to analyze and compare microarray and proteomics datasets derived from a variety of biological systems. Application examples demonstrated the capacity of WPS to significantly facilitate the analysis of HTP data for integrative discovery. Conclusion: This tool represents a pathway-based platform for discovery integration to maximize analysis power. The tool is freely available at . PMID:16423281

  18. Endocrine Disruptors: Data-based survey of in vivo tests, predictive models and the Adverse Outcome Pathway.

    PubMed

    Benigni, Romualdo; Battistelli, Chiara Laura; Bossa, Cecilia; Giuliani, Alessandro; Tcheremenskaia, Olga

    2017-02-20

    The protection from endocrine disruptors is a high regulatory priority. Key issues are the characterization of in vivo assays, and the identification of reference chemicals to validate alternative methods. In this exploration, publicly available databases for in vivo assays for endocrine disruption were collected and compared: Rodent Uterotrophic, Rodent Repeated Dose 28-day Oral Toxicity, 21-Day Fish, and Daphnia magna reproduction assays. Only the Uterotrophic and 21-Day Fish assay results correlated with each other. The in vivo assay data were viewed in relation to the Adverse Outcome Pathway, using as a probe 18 ToxCast in vitro assays for the ER pathway. These are the same data at the basis of the EPA agonist ToxERscore model, whose good predictivity was confirmed. The multivariate comparison of the in vitro/in vivo assays suggests that the interaction with receptors is a major determinant of in vivo results, and is the critical basis for building predictive computational models. In agreement with the above, this work also shows that it is possible to build predictive models for the Uterotrophic and 21-Day Fish assays using a limited selection of ToxCast assays.

  19. Pathway — Using a State-of-the-Art Digital Video Database for Research and Development in Teacher Education

    NASA Astrophysics Data System (ADS)

    Adrian, Brian; Zollman, Dean; Stevens, Scott

    2006-02-01

    To demonstrate how state-of-the-art video databases can address issues related to the lack of preparation of many physics teachers, we have created the prototype Physics Teaching Web Advisory (Pathway). Pathway's Synthetic Interviews and related video materials are beginning to provide pre-service and out-of-field in-service teachers with much-needed professional development and well-prepared teachers with new perspectives on teaching physics. The prototype was limited to a demonstration of the systems. Now, with an additional grant we will extend the system and conduct research and evaluation on its effectiveness. This project will provide virtual expert help on issues of pedagogy and content. In particular, the system will convey, by example and explanation, contemporary ideas about the teaching of physics and applications of physics education research. The research effort will focus on the value of contemporary technology to address the continuing education of teachers who are teaching in a field in which they have not been trained.

  20. The Candidate Cancer Gene Database: a database of cancer driver genes from forward genetic screens in mice.

    PubMed

    Abbott, Kenneth L; Nyre, Erik T; Abrahante, Juan; Ho, Yen-Yi; Isaksson Vogel, Rachel; Starr, Timothy K

    2015-01-01

    Identification of cancer driver gene mutations is crucial for advancing cancer therapeutics. Due to the overwhelming number of passenger mutations in the human tumor genome, it is difficult to pinpoint causative driver genes. Using transposon mutagenesis in mice many laboratories have conducted forward genetic screens and identified thousands of candidate driver genes that are highly relevant to human cancer. Unfortunately, this information is difficult to access and utilize because it is scattered across multiple publications using different mouse genome builds and strength metrics. To improve access to these findings and facilitate meta-analyses, we developed the Candidate Cancer Gene Database (CCGD, http://ccgd-starrlab.oit.umn.edu/). The CCGD is a manually curated database containing a unified description of all identified candidate driver genes and the genomic location of transposon common insertion sites (CISs) from all currently published transposon-based screens. To demonstrate relevance to human cancer, we performed a modified gene set enrichment analysis using KEGG pathways and show that human cancer pathways are highly enriched in the database. We also used hierarchical clustering to identify pathways enriched in blood cancers compared to solid cancers. The CCGD is a novel resource available to scientists interested in the identification of genetic drivers of cancer. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  1. Using Bioinformatic Approaches to Identify Pathways Targeted by Human Leukemogens

    PubMed Central

    Thomas, Reuben; Phuong, Jimmy; McHale, Cliona M.; Zhang, Luoping

    2012-01-01

    We have applied bioinformatic approaches to identify pathways common to chemical leukemogens and to determine whether leukemogens could be distinguished from non-leukemogenic carcinogens. From all known and probable carcinogens classified by IARC and NTP, we identified 35 carcinogens that were associated with leukemia risk in human studies and 16 non-leukemogenic carcinogens. Using data on gene/protein targets available in the Comparative Toxicogenomics Database (CTD) for 29 of the leukemogens and 11 of the non-leukemogenic carcinogens, we analyzed for enrichment of all 250 human biochemical pathways in the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. The top pathways targeted by the leukemogens included metabolism of xenobiotics by cytochrome P450, glutathione metabolism, neurotrophin signaling pathway, apoptosis, MAPK signaling, Toll-like receptor signaling and various cancer pathways. The 29 leukemogens formed 18 distinct clusters comprising 1 to 3 chemicals that did not correlate with known mechanism of action or with structural similarity as determined by 2D Tanimoto coefficients in the PubChem database. Unsupervised clustering and one-class support vector machines, based on the pathway data, were unable to distinguish the 29 leukemogens from 11 non-leukemogenic known and probable IARC carcinogens. However, using two-class random forests to estimate leukemogen and non-leukemogen patterns, we estimated a 76% chance of distinguishing a random leukemogen/non-leukemogen pair from each other. PMID:22851955
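
    The "76% chance of distinguishing a random leukemogen/non-leukemogen pair" in the record above reads like the pairwise-ranking interpretation of a classifier's discrimination (equivalent to the area under the ROC curve). The sketch below illustrates that interpretation only; it is not the authors' random forest pipeline, and the function name and the example scores are hypothetical.

```python
def pairwise_discrimination(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs in which the positive item
    receives the higher score; ties count as half a correct pair.

    This is the pairwise-ranking form of the area under the ROC curve.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("both score lists must be non-empty")
    correct = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos_scores) * len(neg_scores))


# Hypothetical classifier scores (not data from the study).
leukemogen_scores = [0.9, 0.8, 0.4]
non_leukemogen_scores = [0.7, 0.3]
print(pairwise_discrimination(leukemogen_scores, non_leukemogen_scores))
```

    A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of the two classes.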

  2. Gene ontology and KEGG enrichment analyses of genes related to age-related macular degeneration.

    PubMed

    Zhang, Jian; Xing, ZhiHao; Ma, Mingming; Wang, Ning; Cai, Yu-Dong; Chen, Lei; Xu, Xun

    2014-01-01

    Identifying disease genes is one of the most important topics in biomedicine and may facilitate studies on the mechanisms underlying disease. Age-related macular degeneration (AMD) is a serious eye disease; it typically affects older adults and results in a loss of vision due to retina damage. In this study, we attempt to develop an effective method for distinguishing AMD-related genes. Gene ontology and KEGG enrichment analyses of known AMD-related genes were performed, and a classification system was established. In detail, each gene was encoded into a vector by extracting enrichment scores of the gene set, including it and its direct neighbors in STRING, and gene ontology terms or KEGG pathways. Then certain feature-selection methods, including minimum redundancy maximum relevance and incremental feature selection, were adopted to extract key features for the classification system. As a result, 720 GO terms and 11 KEGG pathways were deemed the most important factors for predicting AMD-related genes.

  3. LUCApedia: a database for the study of ancient life.

    PubMed

    Goldman, Aaron David; Bernhard, Tess M; Dolzhenko, Egor; Landweber, Laura F

    2013-01-01

    Organisms represented by the root of the universal evolutionary tree were most likely complex cells with a sophisticated protein translation system and a DNA genome encoding hundreds of genes. The growth of bioinformatics data from taxonomically diverse organisms has made it possible to infer the likely properties of early life in greater detail. Here we present LUCApedia (http://eeb.princeton.edu/lucapedia), a unified framework for simultaneously evaluating multiple data sets related to the Last Universal Common Ancestor (LUCA) and its predecessors. This unification is achieved by mapping eleven such data sets onto UniProt, KEGG and BioCyc IDs. LUCApedia may be used to rapidly acquire evidence that a certain gene or set of genes is ancient, to examine the early evolution of metabolic pathways, or to test specific hypotheses related to ancient life by corroborating them against the rest of the database.

  4. PathwAX: a web server for network crosstalk based pathway annotation

    PubMed Central

    Ogris, Christoph; Helleday, Thomas; Sonnhammer, Erik L.L.

    2016-01-01

    Pathway annotation of gene lists is often used to functionally analyse biomolecular data such as gene expression in order to establish which processes are activated in a given experiment. Databases such as KEGG or GO represent collections of how genes are known to be organized in pathways, and the challenge is to compare a given gene list with the known pathways such that all true relations are identified. Most tools apply statistical measures to the gene overlap between the gene list and pathway. It is however problematic to avoid false negatives and false positives when only using the gene overlap. The pathwAX web server (http://pathwAX.sbc.su.se/) applies a different approach which is based on network crosstalk. It uses the comprehensive network FunCoup to analyse network crosstalk between a query gene list and KEGG pathways. PathwAX runs the BinoX algorithm, which employs Monte-Carlo sampling of randomized networks and estimates a binomial distribution, for estimating the statistical significance of the crosstalk. This results in substantially higher accuracy than gene overlap methods. The system was optimized for speed and allows interactive web usage. We illustrate the usage and output of pathwAX. PMID:27151197
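
    The record above contrasts network crosstalk with the statistical measures that most tools apply to the gene overlap between a query list and a pathway. A common choice for such an overlap measure is a one-sided hypergeometric test; the sketch below shows that baseline only, not the BinoX algorithm. The function names and the toy gene identifiers are assumptions made for illustration.

```python
from math import comb


def hypergeom_enrichment_p(universe_size, pathway_size, list_size, overlap):
    """One-sided hypergeometric p-value P(X >= overlap).

    X is the number of pathway genes seen when list_size genes are drawn
    without replacement from a universe of universe_size genes, of which
    pathway_size belong to the pathway.
    """
    max_overlap = min(pathway_size, list_size)
    total = comb(universe_size, list_size)
    tail = sum(
        comb(pathway_size, k) * comb(universe_size - pathway_size, list_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


def overlap_p_value(gene_list, pathway_genes, universe):
    """Restrict both gene sets to the universe, then test their overlap."""
    universe = set(universe)
    genes = set(gene_list) & universe
    pathway = set(pathway_genes) & universe
    return hypergeom_enrichment_p(
        len(universe), len(pathway), len(genes), len(genes & pathway)
    )


# Toy example with hypothetical gene identifiers g0..g9.
universe = [f"g{i}" for i in range(10)]
print(overlap_p_value(["g0", "g1", "g2"], ["g0", "g1", "g2"], universe))
```

    In practice the p-values for all tested pathways would also need a multiple-testing correction, which this sketch omits.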

  5. SuperToxic: a comprehensive database of toxic compounds

    PubMed Central

    Schmidt, Ulrike; Struck, Swantje; Gruening, Bjoern; Hossbach, Julia; Jaeger, Ines S.; Parol, Roza; Lindequist, Ulrike; Teuscher, Eberhard; Preissner, Robert

    2009-01-01

    Within our everyday life, we are confronted with a variety of toxic substances of natural or artificial origin. Toxins are already used, e.g. in medicine, but there is still an increasing number of toxic compounds, representing a tremendous potential to extract new substances. Since predictive toxicology gains in importance, the careful and extensive investigation of known toxins is the basis to assess the properties of unknown substances. In order to achieve this aim, we have collected toxic compounds from literature and web sources in the database SuperToxic. The current version of this database compiles about 60 000 compounds and their structures. These molecules are classified according to their toxicity, based on more than 2 million measurements. The SuperToxic database provides a variety of search options like name, CASRN, molecular weight and measured values of toxicity. With the aid of implemented similarity searches, information about possible biological interactions can be gained. Furthermore, connections to the Protein Data Bank, UniProt and the KEGG database are available, to allow the identification of targets and of the pathways in which the searched compounds are involved. This database is available online at: http://bioinformatics.charite.de/supertoxic. PMID:19004875

  6. SoyFN: a knowledge database of soybean functional networks

    PubMed Central

    Xu, Yungang; Guo, Maozu; Liu, Xiaoyan; Wang, Chunyu; Liu, Yang

    2014-01-01

    Many databases for soybean genomic analysis have been built and made publicly available, but few of them contain knowledge specifically targeting the omics-level gene–gene, gene–microRNA (miRNA) and miRNA–miRNA interactions. Here, we present SoyFN, a knowledge database of soybean functional gene networks and miRNA functional networks. SoyFN provides user-friendly interfaces to retrieve, visualize, analyze and download the functional networks of soybean genes and miRNAs. In addition, it incorporates much information about KEGG pathways, gene ontology annotations and 3′-UTR sequences as well as many useful tools including SoySearch, ID mapping, Genome Browser, eFP Browser and promoter motif scan. SoyFN is a schema-free database that can be accessed as a Web service from any modern programming language using a simple Hypertext Transfer Protocol call. The Web site is implemented in Java, JavaScript, PHP, HTML and Apache, with all major browsers supported. We anticipate that this database will be useful for members of research communities both in soybean experimental science and bioinformatics. Database URL: http://nclab.hit.edu.cn/SoyFN PMID:24618044

  7. dEMBF: A Comprehensive Database of Enzymes of Microalgal Biofuel Feedstock.

    PubMed

    Misra, Namrata; Panda, Prasanna Kumar; Parida, Bikram Kumar; Mishra, Barada Kanta

    2016-01-01

    Microalgae have attracted wide attention as one of the most versatile renewable feedstocks for production of biofuel. To develop genetically engineered high lipid yielding algal strains, a thorough understanding of the lipid biosynthetic pathway and the underpinning enzymes is essential. In this work, we have systematically mined the genomes of fifteen diverse algal species belonging to Chlorophyta, Heterokontophyta, Rhodophyta, and Haptophyta, to identify and annotate the putative enzymes of lipid metabolic pathway. Consequently, we have also developed a database, dEMBF (Database of Enzymes of Microalgal Biofuel Feedstock), which catalogues the complete list of identified enzymes along with their computed annotation details including length, hydrophobicity, amino acid composition, subcellular location, gene ontology, KEGG pathway, orthologous group, Pfam domain, intron-exon organization, transmembrane topology, and secondary/tertiary structural data. Furthermore, to facilitate functional and evolutionary study of these enzymes, a collection of built-in applications for BLAST search, motif identification, sequence and phylogenetic analysis have been seamlessly integrated into the database. dEMBF is the first database that brings together all enzymes responsible for lipid synthesis from available algal genomes, and provides an integrative platform for enzyme inquiry and analysis. This database will be extremely useful for algal biofuel research. It can be accessed at http://bbprof.immt.res.in/embf.

  8. dEMBF: A Comprehensive Database of Enzymes of Microalgal Biofuel Feedstock

    PubMed Central

    Misra, Namrata; Panda, Prasanna Kumar; Parida, Bikram Kumar; Mishra, Barada Kanta

    2016-01-01

    Microalgae have attracted wide attention as one of the most versatile renewable feedstocks for production of biofuel. To develop genetically engineered high lipid yielding algal strains, a thorough understanding of the lipid biosynthetic pathway and the underpinning enzymes is essential. In this work, we have systematically mined the genomes of fifteen diverse algal species belonging to Chlorophyta, Heterokontophyta, Rhodophyta, and Haptophyta, to identify and annotate the putative enzymes of lipid metabolic pathway. Consequently, we have also developed a database, dEMBF (Database of Enzymes of Microalgal Biofuel Feedstock), which catalogues the complete list of identified enzymes along with their computed annotation details including length, hydrophobicity, amino acid composition, subcellular location, gene ontology, KEGG pathway, orthologous group, Pfam domain, intron-exon organization, transmembrane topology, and secondary/tertiary structural data. Furthermore, to facilitate functional and evolutionary study of these enzymes, a collection of built-in applications for BLAST search, motif identification, sequence and phylogenetic analysis have been seamlessly integrated into the database. dEMBF is the first database that brings together all enzymes responsible for lipid synthesis from available algal genomes, and provides an integrative platform for enzyme inquiry and analysis. This database will be extremely useful for algal biofuel research. It can be accessed at http://bbprof.immt.res.in/embf. PMID:26727469

  9. Bioinformatics analysis of key genes and pathways for hepatocellular carcinoma transformed from cirrhosis

    PubMed Central

    He, Bosheng; Yin, Jianbing; Gong, Shenchu; Gu, Jinhua; Xiao, Jing; Shi, Weixiang; Ding, Wenbin; He, Ying

    2017-01-01

    Abstract Objective: We aimed to identify some pivotal genes and pathways for hepatocellular carcinoma (HCC) transformation from cirrhosis and explore potential targets for treatment of the disease. Methods: The GSE17548 microarray data were downloaded from Gene Expression Omnibus database, and 37 samples (20 cirrhosis and 17 HCC samples) were used for analysis. The differentially expressed genes (DEGs) in HCC tissues were compared with those in cirrhosis tissues and analyzed using the limma package. Gene ontology-biological process and Kyoto encyclopedia of genes and genomes (KEGG) pathway enrichment analyses were performed using ClueGO and CluePedia tool kits, and the key KEGG pathway was analyzed using the R package pathview. The regulatory factor miRNA of DEGs was extracted from 3 verified miRNAs-target databases using the multiMiR R package. Moreover, a protein-protein interaction (PPI) network was constructed using the Cytoscape software. Results: DEGs including cyclin-dependent kinase 1 (CDK1), PDZ-binding kinase (PBK), ribonucleotide reductase M2 (RRM2), and abnormal spindle homolog, microcephaly-associated (Drosophila) (ASPM) were the hub proteins with higher degrees in the PPI network. The cell cycle pathway (CDK1 enriched) and p53 signaling pathway (CDK1 and RRM2 enriched) were significantly enriched by DEGs. Conclusion: CDK1, PBK, RRM2, and ASPM may be key genes for HCC transformation from cirrhosis. Furthermore, cell cycle and p53 signaling pathways may play vital mediatory roles; CDK1 may play crucial roles in HCC transformed from cirrhosis via cell cycle and p53 signaling pathways, and RRM2 might be involved in HCC transformed from cirrhosis via the p53 signaling pathway. PMID:28640074

  10. PathPPI: an integrated dataset of human pathways and protein-protein interactions.

    PubMed

    Tang, HaiLin; Zhong, Fan; Liu, Wei; He, FuChu; Xie, HongWei

    2015-06-01

    Integration of pathway and protein-protein interaction (PPI) data can provide more information that could lead to new biological insights. PPIs are usually represented by a simple binary model, whereas pathways are represented by more complicated models. We developed a series of rules for transforming protein interactions from pathway to binary model, and the protein interactions from seven pathway databases, including PID, BioCarta, Reactome, NetPath, INOH, SPIKE and KEGG, were transformed based on these rules. These pathway-derived binary protein interactions were integrated with PPIs from five other PPI databases, including HPRD, IntAct, BioGRID, MINT and DIP, to develop an integrated dataset (named PathPPI). More detailed interaction type and modification information on protein interactions can be preserved in PathPPI than in other existing datasets. Comparison analysis results indicate that most of the interaction overlap values (O_AB) among these pathway databases were less than 5%, and these databases must be used conjunctively. The PathPPI data was provided at http://proteomeview.hupo.org.cn/PathPPI/PathPPI.html.

  11. Pathway-based approach using hierarchical components of collapsed rare variants

    PubMed Central

    Lee, Sungyoung; Choi, Sungkyoung; Kim, Young Jin; Kim, Bong-Jo; Hwang, Heungsun; Park, Taesung

    2016-01-01

    Motivation: To address the ‘missing heritability’ issue, many statistical methods for pathway-based analyses using rare variants have been proposed to analyze pathways individually. However, neglecting correlations between multiple pathways can result in misleading solutions, and pathway-based analyses of large-scale genetic datasets impose a massive computational burden. We propose a Pathway-based approach using HierArchical components of collapsed RAre variants Of High-throughput sequencing data (PHARAOH) for the analysis of rare variants by constructing a single hierarchical model that consists of collapsed gene-level summaries and pathways and analyzes entire pathways simultaneously by imposing ridge-type penalties on both gene and pathway coefficient estimates; hence our method considers the correlation of pathways without constraint by a multiple testing problem. Results: Through simulation studies, the proposed method was shown to have higher statistical power than the existing pathway-based methods. In addition, our method was applied to the large-scale whole-exome sequencing data with levels of a liver enzyme using two well-known pathway databases, Biocarta and KEGG. This application demonstrated that our method not only identified associated pathways but also successfully detected biologically plausible pathways for a phenotype of interest. These findings were successfully replicated by an independent large-scale exome chip study. Availability and Implementation: An implementation of PHARAOH is available at http://statgen.snu.ac.kr/software/pharaoh/. Contact: tspark@stats.snu.ac.kr Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27587678

  12. ProOpDB: Prokaryotic Operon DataBase.

    PubMed

    Taboada, Blanca; Ciria, Ricardo; Martinez-Guerrero, Cristian E; Merino, Enrique

    2012-01-01

    The Prokaryotic Operon DataBase (ProOpDB, http://operons.ibt.unam.mx/OperonPredictor) constitutes one of the most precise and complete repositories of operon predictions now available. Using our novel and highly accurate operon identification algorithm, we have predicted the operon structures of more than 1200 prokaryotic genomes. ProOpDB offers diverse alternatives by which a set of operon predictions can be retrieved including: (i) organism name, (ii) metabolic pathways, as defined by the KEGG database, (iii) gene orthology, as defined by the COG database, (iv) conserved protein domains, as defined by the Pfam database, (v) reference gene and (vi) reference operon, among others. In order to limit the operon output to non-redundant organisms, ProOpDB offers an efficient method to select the most representative organisms based on a precompiled phylogenetic distances matrix. In addition, the ProOpDB operon predictions are used directly as the input data of our Gene Context Tool to visualize their genomic context and retrieve the sequence of their corresponding 5' regulatory regions, as well as the nucleotide or amino acid sequences of their genes.

  13. Pathway modeling of microarray data: A case study of pathway activity changes in the testis following in utero exposure to dibutyl phthalate (DBP)

    SciTech Connect

    Ovacik, Meric A.; Sen, Banalata; Euling, Susan Y.; Gaido, Kevin W.; Ierapetritou, Marianthi G.; Androulakis, Ioannis P.

    2013-09-15

    Pathway activity level analysis, the approach pursued in this study, focuses on all genes that are known to be members of metabolic and signaling pathways as defined by the KEGG database. The pathway activity level analysis entails singular value decomposition (SVD) of the expression data of the genes constituting a given pathway. We explore an extension of the pathway activity methodology for application to time-course microarray data. We show that pathway analysis enhances our ability to detect biologically relevant changes in pathway activity using synthetic data. As a case study, we apply the pathway activity level formulation coupled with significance analysis to microarray data from two different rat testes exposed in utero to Dibutyl Phthalate (DBP). In utero DBP exposure in the rat results in developmental toxicity of a number of male reproductive organs, including the testes. One well-characterized mode of action for DBP and the male reproductive developmental effects is the repression of expression of genes involved in cholesterol transport, steroid biosynthesis and testosterone synthesis that lead to a decreased fetal testicular testosterone. Previous analyses of DBP testes microarray data focused on either individual gene expression changes or changes in the expression of specific genes that are hypothesized, or known, to be important in testicular development and testosterone synthesis. However, a pathway analysis may inform whether there are additional affected pathways that could inform additional modes of action linked to DBP developmental toxicity. We show that pathway activity analysis may be considered for a more comprehensive analysis of microarray data.
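
    The record above describes pathway activity as the result of a singular value decomposition of the expression data for a pathway's genes. One plausible reading, sketched below, takes the leading singular component of the gene-centered expression matrix as the activity profile across samples. Whether the authors center the data this way, and how they handle the sign, is an assumption here; the function name and the toy matrix are hypothetical. The sketch uses power iteration so that it needs only the standard library; a linear algebra library's SVD routine would normally be used instead.

```python
import math


def pathway_activity(expr, n_iter=500):
    """Leading singular component of a genes x samples expression matrix.

    expr: list of rows, one per pathway gene, each a list of per-sample values.
    Returns (activity, explained): activity is the first right singular vector
    scaled by its singular value (one value per sample); explained is the
    fraction of the centered matrix's squared norm captured by that component.
    The overall sign of a singular vector is arbitrary.
    """
    # Center each gene so the component reflects changes, not baseline level.
    centered = []
    for row in expr:
        mean = sum(row) / len(row)
        centered.append([x - mean for x in row])
    n_samples = len(centered[0])
    total = sum(x * x for row in centered for x in row)
    if total == 0.0:
        return [0.0] * n_samples, 0.0

    # Start from the largest centered row, which lies in the row space.
    start = max(centered, key=lambda r: sum(x * x for x in r))
    norm = math.sqrt(sum(x * x for x in start))
    v = [x / norm for x in start]

    # Power iteration on X^T X converges to the leading right singular vector.
    for _ in range(n_iter):
        u = [sum(a * b for a, b in zip(row, v)) for row in centered]  # X v
        w = [
            sum(centered[i][j] * u[i] for i in range(len(centered)))  # X^T u
            for j in range(n_samples)
        ]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]

    u = [sum(a * b for a, b in zip(row, v)) for row in centered]
    sigma = math.sqrt(sum(x * x for x in u))
    return [sigma * x for x in v], (sigma * sigma) / total


# Toy pathway with two perfectly co-regulated genes over three samples.
activity, explained = pathway_activity([[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]])
print(activity, explained)
```

    For time-course data, the samples would be ordered time points, so the returned activity vector can be read as a trajectory for the pathway as a whole.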

  14. Pandora, a pathway and network discovery approach based on common biological evidence.

    PubMed

    Zhang, Kelvin Xi; Ouellette, B F Francis

    2010-02-15

    Many biological phenomena involve extensive interactions between many of the biological pathways present in cells. However, extraction of all the inherent biological pathways remains a major challenge in systems biology. With the advent of high-throughput functional genomic techniques, it is now possible to infer biological pathways and pathway organization in a systematic way by integrating disparate biological information. Here, we propose a novel integrated approach that uses network topology to predict biological pathways. We integrated four types of biological evidence (protein-protein interaction, genetic interaction, domain-domain interaction and semantic similarity of Gene Ontology terms) to generate a functionally associated network. This network was then used to develop a new pathway finding algorithm to predict biological pathways in yeast. Our approach discovered 195 biological pathways and 31 functionally redundant pathway pairs in yeast. By comparing our identified pathways to three public pathway databases (KEGG, BioCyc and Reactome), we observed that our approach achieves a maximum positive predictive value of 12.8% and improves on other predictive approaches. This study allows us to reconstruct biological pathways and delineates cellular machinery in a systematic view.
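
    The positive predictive value quoted above is the fraction of predicted pathways that are confirmed by a reference database. The sketch below shows that calculation on toy gene sets. The matching rule (a predicted pathway counts as confirmed when its Jaccard overlap with some reference pathway reaches a threshold) and the name `positive_predictive_value` are assumptions made for illustration; the abstract does not state how a match was defined.

```python
def positive_predictive_value(predicted, reference, min_jaccard=0.5):
    """PPV = confirmed predictions / all predictions.

    predicted, reference: lists of sets of gene identifiers.
    A prediction is treated as confirmed if its Jaccard index with any
    reference pathway is at least min_jaccard (an assumed matching rule).
    """
    def jaccard(a, b):
        union = a | b
        return len(a & b) / len(union) if union else 0.0

    confirmed = sum(
        1 for p in predicted
        if any(jaccard(p, r) >= min_jaccard for r in reference)
    )
    return confirmed / len(predicted) if predicted else 0.0

# Toy example with made-up gene labels.
predicted = [{"A", "B", "C"}, {"D", "E"}, {"X", "Y"}]
reference = [{"A", "B", "C", "D"}, {"E", "F"}]
ppv = positive_predictive_value(predicted, reference)
```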

  15. Pandora, a PAthway and Network DiscOveRy Approach based on common biological evidence

    PubMed Central

    Zhang, Kelvin Xi; Ouellette, B. F. Francis

    2010-01-01

    Motivation: Many biological phenomena involve extensive interactions between many of the biological pathways present in cells. However, extraction of all the inherent biological pathways remains a major challenge in systems biology. With the advent of high-throughput functional genomic techniques, it is now possible to infer biological pathways and pathway organization in a systematic way by integrating disparate biological information. Results: Here, we propose a novel integrated approach that uses network topology to predict biological pathways. We integrated four types of biological evidence (protein–protein interaction, genetic interaction, domain–domain interaction and semantic similarity of Gene Ontology terms) to generate a functionally associated network. This network was then used to develop a new pathway finding algorithm to predict biological pathways in yeast. Our approach discovered 195 biological pathways and 31 functionally redundant pathway pairs in yeast. By comparing our identified pathways to three public pathway databases (KEGG, BioCyc and Reactome), we observed that our approach achieves a maximum positive predictive value of 12.8% and improves on other predictive approaches. This study allows us to reconstruct biological pathways and delineates cellular machinery in a systematic view. Availability: The method has been implemented in Perl and is available for downloading from http://www.oicr.on.ca/research/ouellette/pandora. It is distributed under the terms of GPL (http://opensource.org/licenses/gpl-2.0.php) Contact: francis@oicr.on.ca Supplementary information: Supplementary data are available at Bioinformatics online. PMID:20031970

  16. A Web Tool for Generating High Quality Machine-readable Biological Pathways

    PubMed Central

    Ramirez-Gaona, Miguel; Marcu, Ana; Pon, Allison; Grant, Jason; Wu, Anthony; Wishart, David S.

    2017-01-01

    PathWhiz is a web server built to facilitate the creation of colorful, interactive, visually pleasing pathway diagrams that are rich in biological information. The pathways generated by this online application are machine-readable and fully compatible with essentially all web-browsers and computer operating systems. It uses a specially developed, web-enabled pathway drawing interface that permits the selection and placement of different combinations of pre-drawn biological or biochemical entities to depict reactions, interactions, transport processes and binding events. This palette of entities consists of chemical compounds, proteins, nucleic acids, cellular membranes, subcellular structures, tissues, and organs. All of the visual elements in it can be interactively adjusted and customized. Furthermore, because this tool is a web server, all pathways and pathway elements are publicly accessible. This kind of pathway "crowd sourcing" means that PathWhiz already contains a large and rapidly growing collection of previously drawn pathways and pathway elements. Here we describe a protocol for the quick and easy creation of new pathways and the alteration of existing pathways. To further facilitate pathway editing and creation, the tool contains replication and propagation functions. The replication function allows existing pathways to be used as templates to create or edit new pathways. The propagation function allows one to take an existing pathway and automatically propagate it across different species. Pathways created with this tool can be "re-styled" into different formats (KEGG-like or text-book like), colored with different backgrounds, exported to BioPAX, SBGN-ML, SBML, or PWML data exchange formats, and downloaded as PNG or SVG images. The pathways can easily be incorporated into online databases, integrated into presentations, posters or publications, or used exclusively for online visualization and exploration. 
This protocol has been successfully applied to

  17. A Web Tool for Generating High Quality Machine-readable Biological Pathways.

    PubMed

    Ramirez-Gaona, Miguel; Marcu, Ana; Pon, Allison; Grant, Jason; Wu, Anthony; Wishart, David S

    2017-02-08

    PathWhiz is a web server built to facilitate the creation of colorful, interactive, visually pleasing pathway diagrams that are rich in biological information. The pathways generated by this online application are machine-readable and fully compatible with essentially all web-browsers and computer operating systems. It uses a specially developed, web-enabled pathway drawing interface that permits the selection and placement of different combinations of pre-drawn biological or biochemical entities to depict reactions, interactions, transport processes and binding events. This palette of entities consists of chemical compounds, proteins, nucleic acids, cellular membranes, subcellular structures, tissues, and organs. All of the visual elements in it can be interactively adjusted and customized. Furthermore, because this tool is a web server, all pathways and pathway elements are publicly accessible. This kind of pathway "crowd sourcing" means that PathWhiz already contains a large and rapidly growing collection of previously drawn pathways and pathway elements. Here we describe a protocol for the quick and easy creation of new pathways and the alteration of existing pathways. To further facilitate pathway editing and creation, the tool contains replication and propagation functions. The replication function allows existing pathways to be used as templates to create or edit new pathways. The propagation function allows one to take an existing pathway and automatically propagate it across different species. Pathways created with this tool can be "re-styled" into different formats (KEGG-like or text-book like), colored with different backgrounds, exported to BioPAX, SBGN-ML, SBML, or PWML data exchange formats, and downloaded as PNG or SVG images. The pathways can easily be incorporated into online databases, integrated into presentations, posters or publications, or used exclusively for online visualization and exploration. 
This protocol has been successfully applied to

  18. Metabolic pathway reconstruction of eugenol to vanillin bioconversion in Aspergillus niger.

    PubMed

    Srivastava, Suchita; Luqman, Suaib; Khan, Feroz; Chanotiya, Chandan S; Darokar, Mahendra P

    2010-01-23

    Identification of missing genes or proteins participating in metabolic pathways as enzymes is of great interest. One such class of pathway is involved in the eugenol to vanillin bioconversion. Our goal is to develop an integral approach for identifying the topology of a reference or known pathway in another organism. We successfully identify the missing enzymes and then reconstruct the vanillin biosynthetic pathway in Aspergillus niger. The procedure combines enzyme sequence similarity searches through BLAST homology search with ortholog detection through the COG and KEGG databases. Conservation of protein domains and motifs was searched through the CDD, PFAM and PROSITE databases. Predictions regarding how proteins act in the pathway were validated experimentally and also compared with reported data. The bioconversion of vanillin was screened on UV-TLC plates and later confirmed through GC and GC-MS techniques. We applied a procedure for identifying missing enzymes on the basis of conserved functional motifs and then reconstructed the metabolic pathway in the target organism. Using the vanillin biosynthetic pathway of Pseudomonas fluorescens as a case study, we indicate how this approach can be used to reconstruct the reference pathway in A. niger; the results were later validated experimentally through chromatography and spectroscopy techniques.

  19. Identification of ecdysteroid receptor-mediated signaling pathways in the hepatopancreas of the red swamp crayfish, Procambarus clarkii.

    PubMed

    Zhu, Baojian; Tang, Lin; Yu, Yingying; Yu, Huimin; Wang, Lei; Qian, Cen; Wei, Guoqing; Liu, Chaoliang

    2017-01-06

    The hepatopancreas of crustaceans plays an important role in lipid and carbohydrate metabolism, digestion of food, and biogenesis. In this study, the hepatopancreas transcriptome from the red crayfish Procambarus clarkii was characterized for the first time using high-throughput sequencing, producing approximately 41.4 million reads. After de novo assembly, 57,363 unigenes with an average length of 725 bp were identified. Gene Ontology analysis categorized 22,580 as being involved in biological processes, among which metabolic process and cellular process groups were the most highly enriched. A total of 8034 unigenes were assigned to 223 metabolic pathways following mapping against the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. Ecdysteroid receptor (EcR)-mediated signaling pathways were investigated using digital gene expression (DGE) analysis following RNA interference targeting the EcR. A total of 529 differentially expressed genes (DEGs) were identified, including 322 downregulated and 207 upregulated unigenes. Of these, 445 (84.12%) were annotated successfully by alignment with known sequences, many of which were related to catalytic activity and binding functional categories. Using KEGG enrichment analysis, 183 DEGs were clustered into 78 pathways, and six significantly enriched pathways were predicted. The expression patterns of candidate genes identified by real-time PCR were consistent with the DGE results.
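
    KEGG enrichment analysis of a DEG list is often carried out with a one-sided hypergeometric test per pathway. The abstract does not say which test was used here, so the sketch below is an assumed, generic version; the function name and the toy counts are illustrative and are not taken from the study.

```python
from math import comb

def hypergeom_enrichment_p(n_background, n_pathway, n_selected, n_overlap):
    """Upper-tail hypergeometric p-value, P(X >= n_overlap).

    n_background: annotated genes in the background set
    n_pathway:    background genes that belong to the pathway
    n_selected:   genes in the list of interest (e.g. DEGs)
    n_overlap:    selected genes that belong to the pathway
    """
    upper = min(n_pathway, n_selected)
    total = comb(n_background, n_selected)
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

# Toy numbers: 20 background genes, 5 in the pathway, 4 DEGs, 2 of them in the pathway.
p = hypergeom_enrichment_p(20, 5, 4, 2)
```

    In practice the p-values for all tested pathways would also be corrected for multiple testing before any pathway is called significantly enriched.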

  20. PathwayBooster: a tool to support the curation of metabolic pathways.

    PubMed

    Liberal, Rodrigo; Lisowska, Beata K; Leak, David J; Pinney, John W

    2015-03-15

    Despite several recent advances in the automated generation of draft metabolic reconstructions, the manual curation of these networks to produce high quality genome-scale metabolic models remains a labour-intensive and challenging task. We present PathwayBooster, an open-source software tool to support the manual comparison and curation of metabolic models. It combines gene annotations from GenBank files and other sources with information retrieved from the metabolic databases BRENDA and KEGG to produce a set of pathway diagrams and reports summarising the evidence for the presence of a reaction in a given organism's metabolic network. By comparing multiple sources of evidence within a common framework, PathwayBooster assists the curator in the identification of likely false positive (misannotated enzyme) and false negative (pathway hole) reactions. Reaction evidence may be taken from alternative annotations of the same genome and/or a set of closely related organisms. By integrating and visualising evidence from multiple sources, PathwayBooster reduces the manual effort required in the curation of a metabolic model. The software is available online at http://www.theosysbio.bio.ic.ac.uk/resources/pathwaybooster/ .

  1. PhID: an open-access integrated pharmacology interactions database for drugs, targets, diseases, genes, side-effects and pathways.

    PubMed

    Deng, Zhe; Tu, Weizhong; Deng, Zixin; Hu, Qian-Nan

    2017-09-14

    Network pharmacology research currently faces a bottleneck: a large amount of public data is scattered across different databases, and there is no open-access, consolidated platform that integrates this information for systemic research. To address this issue, we have developed PhID, an integrated pharmacology database which integrates >400,000 pharmacology elements (drug, target, disease, gene, side-effect, and pathway) and >200,000 element interactions from a range of public databases. PhID has three major applications: (1) it assists scientists in searching through the overwhelming amount of pharmacology element interaction data by names, public IDs, molecule structures, or molecular sub-structures; (2) it helps visualize pharmacology elements and their interactions with a web-based network graph; (3) it provides prediction of drug-target interactions through two modules, PreDPI-ki and FIM, with which users can predict drug-target interactions of PhID entities or of drug-target pairs of interest to them. To provide a systems-level understanding of drug action and disease complexity, PhID was established as a network pharmacology tool comprising a data layer, a visualization layer and a prediction model layer, in order to present information untapped by current databases. Database URL: http://phid.ditad.org/.

  2. Pathway Analysis for Genome-Wide Association Study of Lung Cancer in Han Chinese Population

    PubMed Central

    Wu, Chen; Jin, Guangfu; Dai, Juncheng; Wang, Cheng; Hu, Lingmin; Gou, Jianwei; Qian, Chen; Bai, Jianling; Wu, Tangchun; Hu, Zhibin; Lin, Dongxin; Shen, Hongbing; Chen, Feng

    2013-01-01

    Genome-wide association studies (GWAS) have identified a number of genetic variants associated with lung cancer risk. However, these loci explain only a small fraction of lung cancer heritability, and other variants with weak effects may be lost in the GWAS approach due to the stringent significance level after multiple comparison correction. In this study, in order to identify important pathways involved in lung carcinogenesis, we performed a two-stage pathway analysis in a GWAS of lung cancer in Han Chinese using the gene set enrichment analysis (GSEA) method. Predefined pathways from the BioCarta and KEGG databases were systematically evaluated in the Nanjing study (Discovery stage: 1,473 cases and 1,962 controls) and the suggestive pathways were further validated in the Beijing study (Replication stage: 858 cases and 1,115 controls). We found that four pathways (achPathway, metPathway, At1rPathway and rac1Pathway) were consistently significant in both studies and the P values for the combined dataset were 0.012, 0.010, 0.022 and 0.005 respectively. These results were stable after sensitivity analysis based on gene definition and gene overlaps between pathways. These findings may provide new insights into the etiology of lung cancer. PMID:23469231

  3. Gene-set meta-analysis of lung cancer identifies pathway related to systemic lupus erythematosus.

    PubMed

    Rosenberger, Albert; Sohns, Melanie; Friedrichs, Stefanie; Hung, Rayjean J; Fehringer, Gord; McLaughlin, John; Amos, Christopher I; Brennan, Paul; Risch, Angela; Brüske, Irene; Caporaso, Neil E; Landi, Maria Teresa; Christiani, David C; Wei, Yongyue; Bickeböller, Heike

    2017-01-01

    Gene-set analysis (GSA) is an approach using the results of single-marker genome-wide association studies when investigating pathways as a whole with respect to the genetic basis of a disease. We performed a meta-analysis of seven GSAs for lung cancer, applying the method META-GSA. Overall, the information taken from 11,365 cases and 22,505 controls from within the TRICL/ILCCO consortia was used to investigate a total of 234 pathways from the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. META-GSA reveals the systemic lupus erythematosus KEGG pathway hsa05322, driven by the gene region 6p21-22, as also implicated in lung cancer (p = 0.0306). This gene region is known to be associated with squamous cell lung carcinoma. The most important genes driving the significance of this pathway belong to the genomic areas HIST1-H4L, -1BN, -2BN, -H2AK, -H4K and C2/C4A/C4B. Within these areas, the markers most significantly associated with LC are rs13194781 (located within HIST12BN) and rs1270942 (located between C2 and C4A). We have discovered a pathway currently marked as specific to systemic lupus erythematosus as being significantly implicated in lung cancer. The gene region 6p21-22 in this pathway appears to be more extensively associated with lung cancer than previously assumed. Given wide-stretched linkage disequilibrium to the area APOM/BAG6/MSH5, there is currently simply not enough information or evidence to conclude whether the potential pleiotropy of lung cancer and systemic lupus erythematosus is spurious, biological, or mediated. Further research into this pathway and gene region will be necessary.

  4. Gene-set meta-analysis of lung cancer identifies pathway related to systemic lupus erythematosus

    PubMed Central

    Sohns, Melanie; Friedrichs, Stefanie; Hung, Rayjean J.; Fehringer, Gord; McLaughlin, John; Amos, Christopher I.; Brennan, Paul; Risch, Angela; Brüske, Irene; Caporaso, Neil E.; Landi, Maria Teresa; Christiani, David C.; Wei, Yongyue; Bickeböller, Heike

    2017-01-01

    Introduction: Gene-set analysis (GSA) is an approach using the results of single-marker genome-wide association studies when investigating pathways as a whole with respect to the genetic basis of a disease. Methods: We performed a meta-analysis of seven GSAs for lung cancer, applying the method META-GSA. Overall, the information taken from 11,365 cases and 22,505 controls from within the TRICL/ILCCO consortia was used to investigate a total of 234 pathways from the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. Results: META-GSA reveals the systemic lupus erythematosus KEGG pathway hsa05322, driven by the gene region 6p21-22, as also implicated in lung cancer (p = 0.0306). This gene region is known to be associated with squamous cell lung carcinoma. The most important genes driving the significance of this pathway belong to the genomic areas HIST1-H4L, -1BN, -2BN, -H2AK, -H4K and C2/C4A/C4B. Within these areas, the markers most significantly associated with LC are rs13194781 (located within HIST12BN) and rs1270942 (located between C2 and C4A). Conclusions: We have discovered a pathway currently marked as specific to systemic lupus erythematosus as being significantly implicated in lung cancer. The gene region 6p21-22 in this pathway appears to be more extensively associated with lung cancer than previously assumed. Given wide-stretched linkage disequilibrium to the area APOM/BAG6/MSH5, there is currently simply not enough information or evidence to conclude whether the potential pleiotropy of lung cancer and systemic lupus erythematosus is spurious, biological, or mediated. Further research into this pathway and gene region will be necessary. PMID:28273134

  5. Analysis of Important Gene Ontology Terms and Biological Pathways Related to Pancreatic Cancer.

    PubMed

    Yin, Hang; Wang, ShaoPeng; Zhang, Yu-Hang; Cai, Yu-Dong; Liu, Hailin

    2016-01-01

    Pancreatic cancer is a serious disease that results in more than thirty thousand deaths around the world per year. To design effective treatments, many investigators have devoted themselves to the study of biological processes and mechanisms underlying this disease. However, this understanding is far from complete. In this study, we tried to extract important gene ontology (GO) terms and KEGG pathways for pancreatic cancer by adopting some existing computational methods. Genes that have been validated to be related to pancreatic cancer, and genes that have not been validated, were represented by features derived from GO terms and KEGG pathways using the enrichment theory. A popular feature selection method, minimum redundancy maximum relevance, was employed to analyze these features and extract important GO terms and KEGG pathways. An extensive analysis of the obtained GO terms and KEGG pathways was provided to confirm the correlations between them and pancreatic cancer.
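
    The minimum redundancy maximum relevance (mRMR) idea named in this abstract can be sketched as a greedy loop: at each step, pick the feature whose mutual information with the class label is high and whose average mutual information with the already selected features is low. The version below is a simplified illustration, not the authors' pipeline. It assumes discrete feature values (the study's enrichment-based features would need discretizing first), uses the difference form of the criterion, and all names in it are hypothetical.

```python
from collections import Counter
from math import log2

def mutual_information(x, y):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum((c / n) * log2(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())

def mrmr(features, target, k):
    """Greedy mRMR selection (relevance minus mean redundancy).

    features: dict mapping feature name -> list of discrete values
    target:   list of class labels, same length as each feature
    Returns the names of up to k selected features, in selection order.
    """
    relevance = {name: mutual_information(vals, target)
                 for name, vals in features.items()}
    selected, remaining = [], set(features)

    def score(name):
        if not selected:
            return relevance[name]
        redundancy = sum(mutual_information(features[name], features[s])
                         for s in selected) / len(selected)
        return relevance[name] - redundancy

    while remaining and len(selected) < k:
        # sorted() makes tie-breaking deterministic (alphabetical order).
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy data: the label is determined jointly by features "a" and "b";
# "a_copy" duplicates "a" and "noise" is unrelated to the label.
features = {
    "a":      [0, 0, 1, 1, 0, 0, 1, 1],
    "a_copy": [0, 0, 1, 1, 0, 0, 1, 1],
    "b":      [0, 1, 0, 1, 0, 1, 0, 1],
    "noise":  [0, 0, 0, 0, 1, 1, 1, 1],
}
target = [0, 1, 2, 3, 0, 1, 2, 3]
chosen = mrmr(features, target, 2)
```

    In the toy data the duplicated feature is as relevant as the original but fully redundant with it, which is the situation the redundancy term is meant to penalize.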

  6. Analysis of Important Gene Ontology Terms and Biological Pathways Related to Pancreatic Cancer

    PubMed Central

    Yin, Hang; Wang, ShaoPeng; Zhang, Yu-Hang

    2016-01-01

    Pancreatic cancer is a serious disease that results in more than thirty thousand deaths around the world per year. To design effective treatments, many investigators have devoted themselves to the study of biological processes and mechanisms underlying this disease. However, this understanding is far from complete. In this study, we tried to extract important gene ontology (GO) terms and KEGG pathways for pancreatic cancer by adopting some existing computational methods. Genes that have been validated to be related to pancreatic cancer, and genes that have not been validated, were represented by features derived from GO terms and KEGG pathways using the enrichment theory. A popular feature selection method, minimum redundancy maximum relevance, was employed to analyze these features and extract important GO terms and KEGG pathways. An extensive analysis of the obtained GO terms and KEGG pathways was provided to confirm the correlations between them and pancreatic cancer. PMID:27957501

  7. KENeV: A web-application for the automated reconstruction and visualization of the enriched metabolic and signaling super-pathways deriving from genomic experiments

    PubMed Central

    Pilalis, Eleftherios; Koutsandreas, Theodoros; Valavanis, Ioannis; Athanasiadis, Emmanouil; Spyrou, George; Chatziioannou, Aristotelis

    2015-01-01

    Gene expression analysis, using high throughput genomic technologies, has become an indispensable step for the meaningful interpretation of the underlying molecular complexity, which shapes the phenotypic manifestation of the investigated biological mechanism. The modularity of the cellular response to different experimental conditions can be comprehended through the exploitation of molecular pathway databases, which offer a controlled, curated background for statistical enrichment analysis. Existing tools enable pathway analysis, visualization, or pathway merging, but none integrates a fully automated workflow that combines all of the above-mentioned modules and is intended for non-programmer users. We introduce an online web application, named KEGG Enriched Network Visualizer (KENeV), which enables a fully automated workflow starting from a list of differentially expressed genes and deriving the enriched KEGG metabolic and signaling pathways, merged into two respective, non-redundant super-networks. The final networks can be downloaded as SBML files, for further analysis, or instantly visualized through an interactive visualization module. In conclusion, KENeV (available online at http://www.grissom.gr/kenev) provides an integrative tool, suitable for users with no programming experience, for the functional interpretation, at both the metabolic and signaling level, of differentially expressed gene subsets deriving from genomic experiments. PMID:26925206

  8. Differentially expressed genes and interacting pathways in bladder cancer revealed by bioinformatic analysis.

    PubMed

    Shen, Yinzhou; Wang, Xuelei; Jin, Yongchao; Lu, Jiasun; Qiu, Guangming; Wen, Xiaofei

    2014-10-01

    The goal of this study was to identify cancer-associated differentially expressed genes (DEGs), analyze their biological functions and investigate the mechanism(s) of cancer occurrence and development, which may provide a theoretical foundation for bladder cancer (BCa) therapy. We downloaded the mRNA expression profiling dataset GSE13507 from the Gene Expression Omnibus database; the dataset includes 165 BCa and 68 control samples. T‑tests were used to identify DEGs. To further study the biological functions of the identified DEGs, we performed a Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis. Next, we built a network of potentially interacting pathways to study the synergistic relationships among DEGs. A total of 12,105 genes were identified as DEGs, of which 5,239 were upregulated and 6,866 were downregulated in BCa. The DEGs encoding activator protein 1 (AP-1), nuclear factor of activated T-cells (NFAT) proteins, nuclear factor κ-light-chain-enhancer of activated B cells (NF-κB) and interleukin (IL)-10 were revealed to participate in the significantly enriched immune pathways that were downregulated in BCa. KEGG enrichment analysis revealed 7 significantly upregulated and 47 significantly downregulated pathways enriched among the DEGs. We found a crosstalk interaction among a total of 44 pathways in the network of BCa-affected pathways. In conclusion, our results show that BCa involves dysfunctions in multiple systems. Our study is expected to pave ways for immune and inflammatory research and provide molecular insights for cancer therapy.
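
    The per-gene t-test used to call DEGs can be sketched as follows. The abstract does not say which variant of the t-test was applied, so this example assumes Welch's unequal-variance form and computes only the statistic and its approximate degrees of freedom; turning these into a p-value requires the t distribution, which is not in the Python standard library. The function name and the toy expression values are illustrative.

```python
from statistics import mean, variance

def welch_t(group_a, group_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    group_a, group_b: expression values of one gene in two sample groups
    (for example tumor and control), each with at least two values.
    """
    va = variance(group_a) / len(group_a)  # squared standard error, group A
    vb = variance(group_b) / len(group_b)  # squared standard error, group B
    t = (mean(group_a) - mean(group_b)) / (va + vb) ** 0.5
    df = (va + vb) ** 2 / (
        va ** 2 / (len(group_a) - 1) + vb ** 2 / (len(group_b) - 1)
    )
    return t, df

# Toy expression values for a single gene in two groups.
t_stat, dof = welch_t([5, 6, 7, 8], [1, 2, 3, 4])
```

    With thousands of genes tested, the resulting p-values would normally be adjusted for multiple comparisons before a gene is labeled differentially expressed.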

  9. Identification of key pathways and genes in colorectal cancer using bioinformatics analysis.

    PubMed

    Liang, Bin; Li, Chunning; Zhao, Jianying

    2016-10-01

    Colorectal cancer (CRC) is the most common malignant tumor of the digestive system. The aim of this study was to identify gene signatures during CRC and uncover their potential mechanisms. The gene expression profiles of GSE21815 were downloaded from the GEO database. The GSE21815 dataset contained 141 samples, including 132 CRC and 9 normal colon epithelium samples. Gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were performed, and a protein-protein interaction (PPI) network of the differentially expressed genes (DEGs) was constructed with Cytoscape software. In total, 3500 DEGs were identified in CRC, including 1370 up-regulated genes and 2130 down-regulated genes. GO analysis results showed that up-regulated DEGs were significantly enriched in biological processes (BP), including cell cycle, cell division, and cell proliferation; the down-regulated DEGs were significantly enriched in biological processes, including immune response, intracellular signaling cascade and defense response. KEGG pathway analysis showed the up-regulated DEGs were enriched in cell cycle and DNA replication, while the down-regulated DEGs were enriched in drug metabolism, metabolism of xenobiotics by cytochrome P450, and retinol metabolism pathways. The top 10 hub genes, GNG2, AGT, SAA1, ADCY5, LPAR1, NMU, IL8, CXCL12, GNAI1, and CCR2, were identified from the PPI network, and sub-networks revealed these genes were involved in significant pathways, including G protein-coupled receptors signaling pathway, gastrin-CREB signaling pathway via PKC and MAPK, and extracellular matrix organization. In conclusion, the present study indicated that the identified DEGs and hub genes promote our understanding of the molecular mechanisms underlying the development of CRC, and might be used as molecular targets and diagnostic biomarkers for the treatment of CRC.

  10. PolymiRTS Database 3.0: linking polymorphisms in microRNAs and their target sites with human diseases and biological pathways.

    PubMed

    Bhattacharya, Anindya; Ziebarth, Jesse D; Cui, Yan

    2014-01-01

    Polymorphisms in microRNAs (miRNAs) and their target sites (PolymiRTS) are known to disrupt miRNA function, leading to the development of disease and variation in physiological and behavioral phenotypes. Here, we describe recent updates to the PolymiRTS database (http://compbio.uthsc.edu/miRSNP), an integrated platform for analyzing the functional impact of genetic polymorphisms in miRNA seed regions and miRNA target sites. Recent advances in genomic technologies have made it possible to identify miRNA-mRNA binding sites from direct mapping experiments such as CLASH (cross linking, ligation and sequencing of hybrids). We have integrated data from CLASH experiments in the PolymiRTS database to provide more complete and accurate miRNA-mRNA interactions. Other significant new features include (i) small insertions and deletions in miRNA seed regions and miRNA target sites, (ii) TargetScan context + score differences for assessing the impact of polymorphic miRNA-mRNA interactions and (iii) biological pathways. The browse and search pages of PolymiRTS allow users to explore the relations between the PolymiRTSs and gene expression traits, physiological and behavioral phenotypes, human diseases and biological pathways.

  11. HPIminer: A text mining system for building and visualizing human protein interaction networks and pathways.

    PubMed

    Subramani, Suresh; Kalpana, Raja; Monickaraj, Pankaj Moses; Natarajan, Jeyakumar

    2015-04-01

    Knowledge of protein-protein interactions (PPI) and of their related pathways is equally important for understanding the biological functions of the living cell. Such information on human proteins is highly desirable to understand the mechanism of several diseases such as cancer, diabetes, and Alzheimer's disease. Because much of that information is buried in biomedical literature, an automated text mining system for visualizing human PPI and pathways is highly desirable. In this paper, we present HPIminer, a text mining system for visualizing human protein interactions and pathways from biomedical literature. HPIminer extracts human PPI information and PPI pairs from biomedical literature, and visualizes their associated interactions, networks and pathways using two curated databases, HPRD and KEGG. To our knowledge, HPIminer is the first system to build interaction networks from literature as well as curated databases. Further, the new interactions mined only from literature and not reported earlier in databases are highlighted as new. A comparative study with other similar tools shows that the resultant network is more informative and provides additional information on interacting proteins and their associated networks.

  12. NRDTD: a database for clinically or experimentally supported non-coding RNAs and drug targets associations

    PubMed Central

    Sun, Ya-Zhou; Zhang, De-Hong; Yan, Gui-Ying; An, Ji-Yong; You, Zhu-Hong

    2017-01-01

    In recent years, more and more non-coding RNAs (ncRNAs) have been identified and increasing evidence has shown that ncRNAs may affect gene expression and disease progression, making them a new class of targets for drug discovery. It thus becomes important to understand the relationship between ncRNAs and drug targets. For this purpose, an ncRNAs and drug targets association database would be extremely beneficial. Here, we developed ncRNA Drug Targets Database (NRDTD) that collected 165 entries of clinically or experimentally supported ncRNAs as drug targets, including 97 ncRNAs and 96 drugs. Moreover, we annotated ncRNA-drug target associations with drug information from KEGG, PubChem, DrugBank, CTD or Wikipedia, GenBank sequence links, OMIM disease ID, pathway and function annotation for ncRNAs, detailed description of associations between ncRNAs and diseases from HMDD or LncRNADisease and the publication PubMed ID. Additionally, we provided users a link to submit novel disease-ncRNA-drug associations and corresponding supporting evidence into the database. We hope NRDTD will be a useful resource for investigating the roles of ncRNAs in drug target identification, drug discovery and disease treatment. Database URL: http://chengroup.cumt.edu.cn/NRDTD

  13. A curated database of genetic markers from the angiogenesis/VEGF pathway and their relation to clinical outcome in human cancers.

    PubMed

    Savas, Sevtap

    2012-02-01

    Angiogenesis causes local growth, aggressiveness and metastasis in solid tumors, and thus, is almost always associated with poor prognosis and survival in cancer patients. Because of this clinical importance, several chemotherapeutic agents targeting angiogenesis have also been developed. Genes and genetic variations in angiogenesis/VEGF pathway thus may be correlated with clinical outcome in cancer patients. Here, we describe a manually curated public database, dbANGIO, which posts the results of studies testing the possible correlation of genetic variations (polymorphisms and mutations) from the angiogenesis/VEGF pathway with demographic features, clinicopathological features, treatment response and toxicity, and prognosis and survival-related endpoints in human cancers. The scientific findings are retrieved from PUBMED and posted in the dbANGIO website in a summarized form. As of September 2011, dbANGIO includes 362 entries from 83 research articles encompassing 154 unique genetic variations from 39 genes investigated in several solid and hematological cancers. By curating the literature findings and making them freely available to researchers, dbANGIO will expedite the research on genetic factors from the angiogenesis pathway and will assist in their utility in clinical management of cancer patients. dbANGIO is freely available for non-profit institutions at http://www.med.mun.ca/angio.

  14. pwOmics: an R package for pathway-based integration of time-series omics data using public database knowledge.

    PubMed

    Wachter, Astrid; Beißbarth, Tim

    2015-09-15

    Characterization of biological processes is progressively enabled with the increased generation of omics data on different signaling levels. Here we present a straightforward approach for the integrative analysis of data from different high-throughput technologies based on pathway and interaction models from public databases. pwOmics performs pathway-based level-specific data comparison of coupled human proteomic and genomic/transcriptomic datasets based on their log fold changes. Separate downstream and upstream analyses results on the functional levels of pathways, transcription factors and genes/transcripts are performed in the cross-platform consensus analysis. These provide a basis for the combined interpretation of regulatory effects over time. Via network reconstruction and inference methods (Steiner tree, dynamic Bayesian network inference) consensus graphical networks can be generated for further analyses and visualization. The R package pwOmics is freely available on Bioconductor (http://www.bioconductor.org/). Contact: astrid.wachter@med.uni-goettingen.de. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  15. Profiling conserved biological pathways in Autosomal Dominant Polycystic Kidney Disorder (ADPKD) to elucidate key transcriptomic alterations regulating cystogenesis: A cross-species meta-analysis approach.

    PubMed

    Chatterjee, Shatakshee; Verma, Srikant Prasad; Pandey, Priyanka

    2017-09-05

    Initiation and progression of fluid-filled cysts mark Autosomal Dominant Polycystic Kidney Disease (ADPKD). Thus, improved therapeutics targeting cystogenesis remains a constant challenge. Microarray studies in a single ADPKD animal model species with limited sample sizes tend to provide scattered views on underlying ADPKD pathogenesis. Thus we aim to perform a cross-species meta-analysis to profile conserved biological pathways that might be key targets for therapy. Nine ADPKD microarray datasets on rat, mouse and human fulfilled our study criteria and were chosen. Intra-species combined analysis was performed after considering removal of batch effect. Significantly enriched GO biological processes and KEGG pathways were computed and their overlap was observed. For the conserved pathways, biological modules and gene regulatory networks were observed. Additionally, Gene Set Enrichment Analysis (GSEA) using the Molecular Signature Database (MSigDB) was performed for genes found in conserved pathways. We obtained 28 modules of significantly enriched GO processes and 5 major functional categories from significantly enriched KEGG pathways conserved in human, mouse and rat, which in turn suggest a global transcriptomic perturbation affecting cyst formation, growth and progression. Significantly enriched pathways obtained from up-regulated genes such as Genomic instability, Protein localization in ER and Insulin Resistance were found to regulate cyst formation and growth, whereas cyst progression due to increased cell adhesion and inflammation was suggested by perturbations in Angiogenesis, TGF-beta, CAMs, and Infection related pathways. Additionally, networks revealed shared genes among pathways, e.g. SMAD2 and SMAD7 in Endocytosis and TGF-beta. Our study suggests cyst formation and progression to be an outcome of interplay between a set of several key deregulated pathways. Thus, further translational research is warranted focusing on developing a combinatorial therapeutic

  16. The relationship between inadvertent ingestion and dermal exposure pathways: a new integrated conceptual model and a database of dermal and oral transfer efficiencies.

    PubMed

    Gorman Ng, Melanie; Semple, Sean; Cherrie, John W; Christopher, Yvette; Northage, Christine; Tielemans, Erik; Veroughstraete, Violaine; Van Tongeren, Martie

    2012-11-01

    Occupational inadvertent ingestion exposure is ingestion exposure due to contact between the mouth and contaminated hands or objects. Although individuals are typically oblivious to their exposure by this route, it is a potentially significant source of occupational exposure for some substances. Due to the continual flux of saliva through the oral cavity and the non-specificity of biological monitoring to routes of exposure, direct measurement of exposure by the inadvertent ingestion route is challenging; predictive models may be required to assess exposure. The work described in this manuscript has been carried out as part of a project to develop a predictive model for estimating inadvertent ingestion exposure in the workplace. As inadvertent ingestion exposure mainly arises from hand-to-mouth contact, it is closely linked to dermal exposure. We present a new integrated conceptual model for dermal and inadvertent ingestion exposure that should help to increase our understanding of ingestion exposure and our ability to simultaneously estimate exposure by the dermal and ingestion routes. The conceptual model consists of eight compartments (source, air, surface contaminant layer, outer clothing contaminant layer, inner clothing contaminant layer, hands and arms layer, perioral layer, and oral cavity) and nine mass transport processes (emission, deposition, resuspension or evaporation, transfer, removal, redistribution, decontamination, penetration and/or permeation, and swallowing) that describe event-based movement of substances between compartments (e.g. emission, deposition, etc.). This conceptual model is intended to guide the development of predictive exposure models that estimate exposure from both the dermal and the inadvertent ingestion pathways. For exposure by these pathways the efficiency of transfer of materials between compartments (for example from surfaces to hands, or from hands to the mouth) are important determinants of exposure. A database of

  17. ReprOlive: a database with linked data for the olive tree (Olea europaea L.) reproductive transcriptome.

    PubMed

    Carmona, Rosario; Zafra, Adoración; Seoane, Pedro; Castro, Antonio J; Guerrero-Fernández, Darío; Castillo-Castillo, Trinidad; Medina-García, Ana; Cánovas, Francisco M; Aldana-Montes, José F; Navas-Delgado, Ismael; Alché, Juan de Dios; Claros, M Gonzalo

    2015-01-01

    Plant reproductive transcriptomes have been analyzed in different species due to the agronomical and biotechnological importance of plant reproduction. Here we present an olive tree reproductive transcriptome database with samples from pollen and pistil at different developmental stages, and leaf and root as control vegetative tissues (http://reprolive.eez.csic.es). It was developed from 2,077,309 raw reads to 1,549 Sanger sequences. Using a pre-defined workflow based on open-source tools, sequences were pre-processed, assembled, mapped, and annotated with expression data, descriptions, GO terms, InterPro signatures, EC numbers, KEGG pathways, ORFs, and SSRs. Tentative transcripts (TTs) were also annotated with the corresponding orthologs in Arabidopsis thaliana from TAIR and RefSeq databases to enable Linked Data integration. It results in a reproductive transcriptome comprising 72,846 contigs with average length of 686 bp, of which 63,965 (87.8%) included at least one functional annotation, and 55,356 (75.9%) had an ortholog. A minimum of 23,568 different TTs was identified and 5,835 of them contain a complete ORF. The representative reproductive transcriptome can be reduced to 28,972 TTs for further gene expression studies. Partial transcriptomes from pollen, pistil, and vegetative tissues as control were also constructed. ReprOlive provides free access and download capability to these results. Retrieval mechanisms for sequences and transcript annotations are provided. Graphical localization of annotated enzymes into KEGG pathways is also possible. Finally, ReprOlive has included a semantic conceptualisation by means of a Resource Description Framework (RDF) allowing a Linked Data search for extracting the most updated information related to enzymes, interactions, allergens, structures, and reactive oxygen species.

  18. ReprOlive: a database with linked data for the olive tree (Olea europaea L.) reproductive transcriptome

    PubMed Central

    Carmona, Rosario; Zafra, Adoración; Seoane, Pedro; Castro, Antonio J.; Guerrero-Fernández, Darío; Castillo-Castillo, Trinidad; Medina-García, Ana; Cánovas, Francisco M.; Aldana-Montes, José F.; Navas-Delgado, Ismael; Alché, Juan de Dios; Claros, M. Gonzalo

    2015-01-01

    Plant reproductive transcriptomes have been analyzed in different species due to the agronomical and biotechnological importance of plant reproduction. Here we present an olive tree reproductive transcriptome database with samples from pollen and pistil at different developmental stages, and leaf and root as control vegetative tissues (http://reprolive.eez.csic.es). It was developed from 2,077,309 raw reads to 1,549 Sanger sequences. Using a pre-defined workflow based on open-source tools, sequences were pre-processed, assembled, mapped, and annotated with expression data, descriptions, GO terms, InterPro signatures, EC numbers, KEGG pathways, ORFs, and SSRs. Tentative transcripts (TTs) were also annotated with the corresponding orthologs in Arabidopsis thaliana from TAIR and RefSeq databases to enable Linked Data integration. It results in a reproductive transcriptome comprising 72,846 contigs with average length of 686 bp, of which 63,965 (87.8%) included at least one functional annotation, and 55,356 (75.9%) had an ortholog. A minimum of 23,568 different TTs was identified and 5,835 of them contain a complete ORF. The representative reproductive transcriptome can be reduced to 28,972 TTs for further gene expression studies. Partial transcriptomes from pollen, pistil, and vegetative tissues as control were also constructed. ReprOlive provides free access and download capability to these results. Retrieval mechanisms for sequences and transcript annotations are provided. Graphical localization of annotated enzymes into KEGG pathways is also possible. Finally, ReprOlive has included a semantic conceptualisation by means of a Resource Description Framework (RDF) allowing a Linked Data search for extracting the most updated information related to enzymes, interactions, allergens, structures, and reactive oxygen species. PMID:26322066

  19. eDGAR: a database of Disease-Gene Associations with annotated Relationships among genes.

    PubMed

    Babbi, Giulia; Martelli, Pier Luigi; Profiti, Giuseppe; Bovo, Samuele; Savojardo, Castrense; Casadio, Rita

    2017-08-11

    Genetic investigations, boosted by modern sequencing techniques, allow dissecting the genetic component of different phenotypic traits. These efforts result in the compilation of lists of genes related to diseases and show that an increasing number of diseases is associated with multiple genes. Investigating functional relations among genes associated with the same disease contributes to highlighting molecular mechanisms of the pathogenesis. We present eDGAR, a database collecting and organizing the data on gene/disease associations as derived from OMIM, Humsavar and ClinVar. For each disease-associated gene, eDGAR collects information on its annotation. Specifically, for lists of genes, eDGAR provides information on: i) interactions retrieved from PDB, BIOGRID and STRING; ii) co-occurrence in stable and functional structural complexes; iii) shared Gene Ontology annotations; iv) shared KEGG and REACTOME pathways; v) enriched functional annotations computed with NET-GE; vi) regulatory interactions derived from TRRUST; vii) localization on chromosomes and/or co-localisation in neighboring loci. The present release of eDGAR includes 2672 diseases, related to 3658 different genes, for a total number of 5729 gene-disease associations. 71% of the genes are linked to 621 multigenic diseases and eDGAR highlights their common GO terms, KEGG/REACTOME pathways, physical and regulatory interactions. eDGAR includes a network based enrichment method for detecting statistically significant functional terms associated to groups of genes. eDGAR offers a resource to analyze disease-gene associations. In multigenic diseases genes can share physical interactions and/or co-occurrence in the same functional processes. eDGAR is freely available at: edgar.biocomp.unibo.it.

  20. [A novel biological pathway expansion method based on the knowledge of protein-protein interactions].

    PubMed

    Zhao, Xiaolei; Zuo, Xiaoyu; Qin, Jiheng; Liang, Yan; Zhang, Naizun; Luan, Yizhao; Rao, Shaoqi

    2014-04-01

    Biological pathways have been widely used in gene function studies; however, current knowledge of biological pathways is per se incomplete and has to be further expanded. Bioinformatics prediction provides a cheap but effective way to expand pathways. Here, we propose a novel method for biological pathway prediction that integrates prior knowledge of protein-protein interactions and the Gene Ontology (GO) database. First, the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways to which the interacting neighbors of a target gene (at the level of protein-protein interaction) belong were chosen as the candidate pathways. Then, the pathways to which the target gene belongs were determined by testing whether the genes in the candidate pathways were enriched in the GO terms with which the target gene was annotated. The protein-protein interaction data obtained from the Human Protein Reference Database (HPRD) and the Biological General Repository for Interaction Datasets (BioGRID) were each used to predict the pathway attribution(s) of the target gene. The results demonstrated that both the average accuracy (the ratio of the correctly predicted pathways to the total pathways to which all the target genes were annotated) and the relative accuracy (of the genes with at least one annotated pathway successfully predicted, the percentage of genes with all annotated pathways correctly predicted) of pathway predictions increased with the number of interacting neighbors. When the number of interacting neighbors reached 22, the average accuracy was 96.2% (HPRD) and 96.3% (BioGRID), and the relative accuracy was 93.3% (HPRD) and 84.1% (BioGRID). Further validation analysis of 89 genes whose pathway knowledge was updated in a new database release indicated that 50 genes were correctly predicted for at least one updated pathway, and 43 genes were accurately predicted for all the updated pathways, giving an

  1. [Screening and identification of key signal transduction pathways in pulmonary silicotic fibrosis].

    PubMed

    Xue, Rong; Zhu, Lan; Li, Qian; Yang, Zhen; Wang, Xianhua; Gao, Hongsheng

    2014-03-01

    To investigate the differential gene expression profile of the lung tissues in experimental silicosis rats and to screen for and identify the key signal transduction pathways in pulmonary silicotic fibrosis. A total of 80 rats were randomly divided into a control group (n = 40) and a silica-instilled group (n = 40). Each group was equally divided into five subgroups, and each subgroup was treated at 1, 7, 14, 21, or 28 d. Intratracheal instillation was used to give 1 ml of silica suspension (50 mg/ml) in the silica-instilled group and normal saline in the control group. Silicotic nodules and type I and III collagen were observed through hematoxylin and eosin staining and Sirius red staining, respectively. Differentially expressed genes in pulmonary silicotic fibrosis were selected by the rat whole-genome gene expression RatRef-12 BeadChip (Illumina, USA), and a fold change cutoff was applied. Quantitative real-time polymerase chain reaction (qRT-PCR) was also used to verify differentially expressed genes. Through bioinformatics databases such as the Database for Annotation, Visualization and Integrated Discovery (DAVID) and the Kyoto Encyclopedia of Genes and Genomes (KEGG), preliminary research was performed on the biological pathways of differential genes, key biological signal transduction pathways were identified, and key differentially expressed genes in each pathway at different time points were searched for. A total of 2694 genes were differentially expressed and changed dynamically. The KEGG pathway analysis showed that 141 signal transduction pathways were involved in the development and progression of pulmonary silicotic fibrosis, among which 48 pathways were more significant than others (P < 0.01), with the mitogen-activated protein kinase (MAPK) pathway exceptionally significant. The differentially expressed genes interleukin-1 receptor (IL-1R), tumor necrosis factor receptor (TNFR), and transforming growth factor beta (TGF-β) in the MAPK pathway were up-regulated at different time points

  2. Hypothesis-independent pathway analysis implicates GABA and acetyl-CoA metabolism in primary open-angle glaucoma and normal-pressure glaucoma.

    PubMed

    Bailey, Jessica N Cooke; Yaspan, Brian L; Pasquale, Louis R; Hauser, Michael A; Kang, Jae H; Loomis, Stephanie J; Brilliant, Murray; Budenz, Donald L; Christen, William G; Fingert, John; Gaasterland, Douglas; Gaasterland, Terry; Kraft, Peter; Lee, Richard K; Lichter, Paul R; Liu, Yutao; McCarty, Catherine A; Moroi, Sayoko E; Richards, Julia E; Realini, Tony; Schuman, Joel S; Scott, William K; Singh, Kuldev; Sit, Arthur J; Vollrath, Douglas; Wollstein, Gadi; Zack, Donald J; Zhang, Kang; Pericak-Vance, Margaret A; Allingham, R Rand; Weinreb, Robert N; Haines, Jonathan L; Wiggs, Janey L

    2014-10-01

    Primary open-angle glaucoma (POAG) is a leading cause of blindness worldwide. Using genome-wide association single-nucleotide polymorphism data from the Glaucoma Genes and Environment study and National Eye Institute Glaucoma Human Genetics Collaboration comprising 3,108 cases and 3,430 controls, we assessed biologic pathways as annotated in the KEGG database for association with risk of POAG. After correction for genic overlap among pathways, we found 4 pathways, butanoate metabolism (hsa00650), hematopoietic cell lineage (hsa04640), lysine degradation (hsa00310) and basal transcription factors (hsa03022), related to POAG with permuted p < 0.001. In addition, the human leukocyte antigen (HLA) gene family was significantly associated with POAG (p < 0.001). In the POAG subset with normal-pressure glaucoma (NPG), the butanoate metabolism pathway was also significantly associated (p < 0.001), as were the MAPK and Hedgehog signaling pathways (hsa04010 and hsa04340), the glycosaminoglycan biosynthesis-heparan sulfate pathway (hsa00534) and the phenylalanine, tyrosine and tryptophan biosynthesis pathway (hsa00400). The butanoate metabolism pathway overall, and specifically the aspects of the pathway that contribute to GABA and acetyl-CoA metabolism, was the only pathway significantly associated with both POAG and NPG. Collectively these results implicate GABA and acetyl-CoA metabolism in glaucoma pathogenesis, and suggest new potential therapeutic targets.
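
    The pathway identifiers quoted in this record (for example hsa00650) follow the KEGG convention of an organism code followed by a five-digit map number. A minimal Python sketch for sanity-checking such identifiers before looking them up could look like this. The function names are illustrative and not part of any KEGG tool, and the three-to-four-letter organism-code pattern is an assumption that deliberately ignores reference prefixes such as map or ko.

```python
import re

# Assumption: an organism-specific KEGG pathway ID is a 3-4 letter lowercase
# organism code (e.g. "hsa" for Homo sapiens) followed by exactly five digits.
KEGG_PATHWAY_ID = re.compile(r"[a-z]{3,4}\d{5}")


def is_wellformed_pathway_id(identifier: str) -> bool:
    """Return True if the whole string matches the assumed ID pattern."""
    return KEGG_PATHWAY_ID.fullmatch(identifier) is not None


def malformed_ids(identifiers):
    """Return the identifiers that do not match the assumed pattern, in order."""
    return [i for i in identifiers if not is_wellformed_pathway_id(i)]


if __name__ == "__main__":
    # Identifiers as quoted in the abstract, plus one hypothetical bad value.
    ids = ["hsa00650", "hsa04640", "hsa00310", "hsa03022", "hsa1234"]
    print(malformed_ids(ids))
```

    A check of this kind only validates the shape of an identifier; whether the map number exists for a given organism still has to be confirmed against KEGG itself.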

  3. Pathway-based approach using hierarchical components of collapsed rare variants.

    PubMed

    Lee, Sungyoung; Choi, Sungkyoung; Kim, Young Jin; Kim, Bong-Jo; Hwang, Heungsun; Park, Taesung

    2016-09-01

    To address the 'missing heritability' issue, many statistical methods for pathway-based analyses using rare variants have been proposed to analyze pathways individually. However, neglecting correlations between multiple pathways can result in misleading solutions, and pathway-based analyses of large-scale genetic datasets impose a massive computational burden. We propose a Pathway-based approach using HierArchical components of collapsed RAre variants Of High-throughput sequencing data (PHARAOH) for the analysis of rare variants. It constructs a single hierarchical model that consists of collapsed gene-level summaries and pathways, and analyzes entire pathways simultaneously by imposing ridge-type penalties on both gene and pathway coefficient estimates; hence our method considers the correlation of pathways without being constrained by a multiple testing problem. Through simulation studies, the proposed method was shown to have higher statistical power than the existing pathway-based methods. In addition, our method was applied to large-scale whole-exome sequencing data with levels of a liver enzyme using two well-known pathway databases, Biocarta and KEGG. This application demonstrated that our method not only identified associated pathways but also successfully detected biologically plausible pathways for a phenotype of interest. These findings were successfully replicated by an independent large-scale exome chip study. An implementation of PHARAOH is available at http://statgen.snu.ac.kr/software/pharaoh/. Contact: tspark@stats.snu.ac.kr. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  4. A systematic analysis of a mi-RNA inter-pathway regulatory motif

    PubMed Central

    2013-01-01

    Background: The continuing discovery of new types and functions of small non-coding RNAs suggests the presence of regulatory mechanisms far more complex than the ones currently used to study and design Gene Regulatory Networks. MicroRNAs (miRNAs) in particular have been found to be part of several intra-pathway regulatory motifs. However, inter-pathway regulatory mechanisms have often been neglected and require further investigation. Results: In this paper we present the result of a systems biology study aimed at analyzing a high-level inter-pathway regulatory motif called Pathway Protection Loop, not previously described, in which miRNAs seem to play a crucial role in the successful behavior and activation of a pathway. Through the automatic analysis of a large set of publicly available databases, we found statistical evidence that this inter-pathway regulatory motif is very common in several classes of KEGG Homo sapiens pathways and concurs in creating a complex regulatory network involving several pathways connected by this specific motif. The role of this motif also seems to be confirmed by a deeper review of other research activities on selected representative pathways. Conclusions: Although previous studies suggested transcriptional regulation mechanisms at the pathway level such as the Pathway Protection Loop, a high-level analysis like the one proposed in this paper is still missing. The understanding of higher-level regulatory motifs could, for instance, lead to new approaches in the identification of therapeutic targets because it could unveil new and “indirect” paths to activate or silence a target pathway. However, a lot of work still needs to be done to better uncover this high-level inter-pathway regulation, including enlarging the analysis to other small non-coding RNA molecules. PMID:24152805

  5. Analysis of Polygala tenuifolia Transcriptome and Description of Secondary Metabolite Biosynthetic Pathways by Illumina Sequencing

    PubMed Central

    Tian, Hongling; Xu, Xiaoshuang; Zhang, Fusheng; Wang, Yaoqin; Guo, Shuhong; Qin, Xuemei; Du, Guanhua

    2015-01-01

    Radix polygalae, the dried roots of Polygala tenuifolia and P. sibirica, is one of the most well-known traditional Chinese medicinal plants. Radix polygalae contains various saponins, xanthones, and oligosaccharide esters, and these compounds are responsible for several pharmacological properties. To provide basic breeding information, enhance molecular biological analysis, and determine secondary metabolite biosynthetic pathways of P. tenuifolia, we applied Illumina sequencing technology and de novo assembly. We also applied this technique to gain an overview of the P. tenuifolia transcriptome from samples of different years. Using Illumina sequencing, approximately 67.2% of unique sequences were annotated by basic local alignment search tool similarity searches against public sequence databases. We classified the annotated unigenes by using Nr, Nt, GO, COG, and KEGG databases compared with NCBI. We also obtained many candidate CYP450s and UGTs by the analysis of genes in the secondary metabolite biosynthetic pathways, including the putative terpenoid backbone and phenylpropanoid biosynthesis pathways. With this transcriptome sequencing, future genetic and genomics studies related to the molecular mechanisms associated with the chemical composition of P. tenuifolia may be improved. Genes involved in the enrichment of secondary metabolite biosynthesis-related pathways could enhance the potential applications of P. tenuifolia in pharmaceutical industries. PMID:26543847

  6. Pathways enrichment analysis for differentially expressed genes in squamous lung cancer.

    PubMed

    Qian, Liqiang; Luo, Qingquan; Zhao, Xiaojing; Huang, Jia

    2014-01-01

    Squamous lung cancer (SQLC) is a common type of lung cancer, but the mechanism of its oncogenesis is not clear. The aim of this study was to screen for pathways potentially altered in SQLC and to elucidate its mechanism. Published microarray data of the GSE3268 series were downloaded from Gene Expression Omnibus (GEO). Significance analysis of microarrays was performed using the software R, and differentially expressed genes (DEGs) were harvested. The functions and pathways of DEGs were mapped in the Gene Ontology and KEGG pathway databases, respectively. A total of 2961 genes were filtered as DEGs between normal and SQLC cells. Cell cycle and metabolism were the main altered functions of SQLC cells. Meanwhile, genes such as MCM, RFC, FEN1, and POLD may induce SQLC through the DNA replication pathway, and genes such as PTTG1, CCNB1, CDC6, and PCNA may be involved in SQLC through the cell cycle pathway. It is demonstrated that pathway analysis is useful in the identification of target genes in SQLC.
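
    The abstract does not state which statistical test was used when mapping DEGs to KEGG pathways. A common choice for pathway over-representation is a one-sided hypergeometric test, sketched here in Python under that assumption. The function name and the toy counts are illustrative only and are not taken from the study.

```python
from math import comb


def hypergeom_enrichment_p(population: int, in_pathway: int,
                           selected: int, overlap: int) -> float:
    """One-sided hypergeometric p-value P(X >= overlap).

    population: number of background genes (N)
    in_pathway: background genes annotated to the pathway (K)
    selected:   number of differentially expressed genes (n)
    overlap:    DEGs that fall in the pathway (k)
    """
    upper = min(in_pathway, selected)
    total = comb(population, selected)
    # Sum the probability of every overlap at least as large as the observed one.
    tail = sum(
        comb(in_pathway, i) * comb(population - in_pathway, selected - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Toy numbers chosen for illustration, not data from the abstract.
    p = hypergeom_enrichment_p(population=20000, in_pathway=120,
                               selected=3000, overlap=40)
    print(f"p = {p:.3g}")
```

    In practice the resulting p-values would be computed for every pathway and then corrected for multiple testing, a step this sketch leaves out.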

  7. Pathway-Based Genome-Wide Association Studies for Two Meat Production Traits in Simmental Cattle

    PubMed Central

    Fan, Huizhong; Wu, Yang; Zhou, Xiaojing; Xia, Jiangwei; Zhang, Wengang; Song, Yuxin; Liu, Fei; Chen, Yan; Zhang, Lupei; Gao, Xue; Gao, Huijiang; Li, Junya

    2015-01-01

    Most single nucleotide polymorphisms (SNPs) detected by genome-wide association studies (GWAS), explain only a small fraction of phenotypic variation. Pathway-based GWAS were proposed to improve the proportion of genes for some human complex traits that could be explained by enriching a mass of SNPs within genetic groups. However, few attempts have been made to describe the quantitative traits in domestic animals. In this study, we used a dataset with approximately 7,700,000 SNPs from 807 Simmental cattle and analyzed live weight and longissimus muscle area using a modified pathway-based GWAS method to orthogonalise the highly linked SNPs within each gene using principal component analysis (PCA). As a result, of the 262 biological pathways of cattle collected from the KEGG database, the gamma aminobutyric acid (GABA)ergic synapse pathway and the non-alcoholic fatty liver disease (NAFLD) pathway were significantly associated with the two traits analyzed. The GABAergic synapse pathway was biologically applicable to the traits analyzed because of its roles in feed intake and weight gain. The proposed method had high statistical power and a low false discovery rate, compared to those of the smallest P-value and SNP set enrichment analysis methods. PMID:26672757

  8. Identification of compound-protein interactions through the analysis of gene ontology, KEGG enrichment for proteins and molecular fragments of compounds.

    PubMed

    Chen, Lei; Zhang, Yu-Hang; Zheng, Mingyue; Huang, Tao; Cai, Yu-Dong

    2016-12-01

    Compound-protein interactions play important roles in every cell via the recognition and regulation of specific functional proteins. The correct identification of compound-protein interactions can lead to a good comprehension of this complicated system and provide useful input for the investigation of various attributes of compounds and proteins. In this study, we attempted to understand this system by extracting properties from both proteins and compounds, in which proteins were represented by gene ontology and KEGG pathway enrichment scores and compounds were represented by molecular fragments. Advanced feature selection methods, including minimum redundancy maximum relevance, incremental feature selection, and the basic machine learning algorithm random forest, were used to analyze these properties and extract core factors for the determination of actual compound-protein interactions. Compound-protein interactions reported in The Binding Databases were used as positive samples. To improve the reliability of the results, the analytic procedure was executed five times using different negative samples. Simultaneously, five optimal prediction methods based on a random forest and yielding maximum MCCs of approximately 77.55 % were constructed and may be useful tools for the prediction of compound-protein interactions. This work provides new clues to understanding the system of compound-protein interactions by analyzing extracted core features. Our results indicate that compound-protein interactions are related to biological processes involving immune, developmental and hormone-associated pathways.
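
    The performance figure quoted in this record is a Matthews correlation coefficient (MCC), which summarizes a binary confusion matrix in a single value between -1 and 1 (the abstract reports it as a percentage). A minimal Python sketch of the standard formula is given here; the function name and the example counts are illustrative and are not taken from the study.

```python
from math import sqrt


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.

    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    By convention, 0.0 is returned when the denominator is zero.
    """
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


if __name__ == "__main__":
    # Hypothetical counts for one positive/negative sample split.
    print(round(matthews_corrcoef(tp=90, tn=85, fp=15, fn=10), 4))
```

    Unlike plain accuracy, the MCC stays informative when positive and negative samples are unbalanced, which matters when negative compound-protein pairs are sampled repeatedly as described above.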

  9. De novo transcriptomic analysis of peripheral blood lymphocytes from the Chinese goose: gene discovery and immune system pathway description.

    PubMed

    Tariq, Mansoor; Chen, Rong; Yuan, Hongyu; Liu, Yanjie; Wu, Yanan; Wang, Junya; Xia, Chun

    2015-01-01

    The Chinese goose is one of the most economically important poultry birds and is a natural reservoir for many avian viruses. However, the nature and regulation of the innate and adaptive immune systems of this waterfowl species are not completely understood due to limited information on the goose genome. Recently, transcriptome sequencing technology was applied in genomic studies focused on novel gene discovery. Thus, this study described the transcriptome of the goose peripheral blood lymphocytes to identify immunity-relevant genes. The transcriptome of the goose peripheral blood lymphocytes was sequenced by Illumina-Solexa technology and assembled de novo. In total, 211,198 unigenes were assembled from the 69.36 million cleaned reads. The average length, N50 size and the maximum length of the assembled unigenes were 687 bp, 1,298 bp and 18,992 bp, respectively. A total of 36,854 unigenes showed similarity by BLAST search against the NCBI non-redundant (Nr) protein database. For functional classification, 163,161 unigenes were assigned to three Gene Ontology (GO) categories and 67 subcategories. A total of 15,334 unigenes were annotated into 25 eukaryotic orthologous groups (KOGs) categories. The Kyoto Encyclopedia of Genes and Genomes (KEGG) database annotated 39,585 unigenes into six biological functional groups and 308 pathways. Among the 2,757 unigenes that participated in the 15 immune system KEGG pathways, 125 of the most important immune-relevant genes were summarized and analyzed by STRING analysis to identify gene interactions and relationships. Moreover, 10 genes were confirmed by PCR and analyzed. Of these 125 unigenes, 109 unigenes, approximately 87%, were not previously identified in the goose. This de novo transcriptome analysis could provide important Chinese goose sequence information and highlights the value of new gene discovery, pathways investigation and immune system gene identification, and comparison with other avian species as useful tools to

  10. De Novo Transcriptomic Analysis of Peripheral Blood Lymphocytes from the Chinese Goose: Gene Discovery and Immune System Pathway Description

    PubMed Central

    Tariq, Mansoor; Chen, Rong; Yuan, Hongyu; Liu, Yanjie; Wu, Yanan; Wang, Junya; Xia, Chun

    2015-01-01

    Background: The Chinese goose is one of the most economically important poultry birds and is a natural reservoir for many avian viruses. However, the nature and regulation of the innate and adaptive immune systems of this waterfowl species are not completely understood due to limited information on the goose genome. Recently, transcriptome sequencing technology has been applied in genomic studies focused on novel gene discovery. Thus, this study described the transcriptome of goose peripheral blood lymphocytes to identify immunity-relevant genes. Principal Findings: The transcriptome of goose peripheral blood lymphocytes was sequenced with Illumina-Solexa technology and assembled de novo. In total, 211,198 unigenes were assembled from the 69.36 million cleaned reads. The average length, N50 size and the maximum length of the assembled unigenes were 687 bp, 1,298 bp and 18,992 bp, respectively. A total of 36,854 unigenes showed similarity by BLAST search against the NCBI non-redundant (Nr) protein database. For functional classification, 163,161 unigenes were assigned to three Gene Ontology (GO) categories and 67 subcategories. A total of 15,334 unigenes were annotated into 25 eukaryotic orthologous group (KOG) categories. The Kyoto Encyclopedia of Genes and Genomes (KEGG) database annotated 39,585 unigenes into six biological functional groups and 308 pathways. Among the 2,757 unigenes that participated in the 15 immune system KEGG pathways, 125 of the most important immune-relevant genes were summarized and analyzed by STRING analysis to identify gene interactions and relationships. Moreover, 10 genes were confirmed by PCR and analyzed. Of these 125 unigenes, 109 unigenes, approximately 87%, were not previously identified in the goose. Conclusion: This de novo transcriptome analysis could provide important Chinese goose sequence information and highlights the value of new gene discovery, pathways investigation and immune system gene identification, and comparison with
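    The N50 size quoted in this record is a standard assembly statistic: the length L such that sequences of length L or longer account for at least half of all assembled bases. The following is a minimal sketch of that calculation; the function name `n50` and the toy lengths are illustrative assumptions, not the study's data or pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length of the shortest sequence in the smallest set of
    longest sequences that together cover at least half of all bases.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence downwards, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical unigene lengths in bp (not taken from the paper).
toy_lengths = [100, 200, 300, 400, 500]
print(n50(toy_lengths))
```

    Because N50 weights long sequences more heavily than the mean does, an assembly's N50 is typically larger than its average length, which is consistent with the 1,298 bp versus 687 bp reported here.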

  11. The 2012 Nucleic Acids Research Database Issue and the online Molecular Biology Database Collection.

    PubMed

    Galperin, Michael Y; Fernández-Suárez, Xosé M

    2012-01-01

    The 19th annual Database Issue of Nucleic Acids Research features descriptions of 92 new online databases covering various areas of molecular biology and 100 papers describing recent updates to the databases previously described in NAR and other journals. The highlights of this issue include, among others, a description of neXtProt, a knowledgebase on human proteins; a detailed explanation of the principles behind the NCBI Taxonomy Database; NCBI and EBI papers on the recently launched BioSample databases that store sample information for a variety of database resources; descriptions of the recent developments in the Gene Ontology and UniProt Gene Ontology Annotation projects; updates on Pfam, SMART and InterPro domain databases; update papers on KEGG and TAIR, two universally acclaimed databases that face an uncertain future; and a separate section with 10 wiki-based databases, introduced in an accompanying editorial. The NAR online Molecular Biology Database Collection, available at http://www.oxfordjournals.org/nar/database/a/, has been updated and now lists 1380 databases. Brief machine-readable descriptions of the databases featured in this issue, according to the BioDBcore standards, will be provided at the http://biosharing.org/biodbcore web site. The full content of the Database Issue is freely available online on the Nucleic Acids Research web site (http://nar.oxfordjournals.org/).

  12. The 2012 Nucleic Acids Research Database Issue and the online Molecular Biology Database Collection

    PubMed Central

    Galperin, Michael Y.; Fernández-Suárez, Xosé M.

    2012-01-01

    The 19th annual Database Issue of Nucleic Acids Research features descriptions of 92 new online databases covering various areas of molecular biology and 100 papers describing recent updates to the databases previously described in NAR and other journals. The highlights of this issue include, among others, a description of neXtProt, a knowledgebase on human proteins; a detailed explanation of the principles behind the NCBI Taxonomy Database; NCBI and EBI papers on the recently launched BioSample databases that store sample information for a variety of database resources; descriptions of the recent developments in the Gene Ontology and UniProt Gene Ontology Annotation projects; updates on Pfam, SMART and InterPro domain databases; update papers on KEGG and TAIR, two universally acclaimed databases that face an uncertain future; and a separate section with 10 wiki-based databases, introduced in an accompanying editorial. The NAR online Molecular Biology Database Collection, available at http://www.oxfordjournals.org/nar/database/a/, has been updated and now lists 1380 databases. Brief machine-readable descriptions of the databases featured in this issue, according to the BioDBcore standards, will be provided at the http://biosharing.org/biodbcore web site. The full content of the Database Issue is freely available online on the Nucleic Acids Research web site (http://nar.oxfordjournals.org/). PMID:22144685

  13. HypoxiaDB: a database of hypoxia-regulated proteins

    PubMed Central

    Khurana, Pankaj; Sugadev, Ragumani; Jain, Jaspreet; Singh, Shashi Bala

    2013-01-01

    There has been intense interest in the cellular response to hypoxia, and a large number of differentially expressed proteins have been identified through various high-throughput experiments. These valuable data are scattered, and there have been no systematic attempts to document the various proteins regulated by hypoxia. Compilation, curation and annotation of these data are important in deciphering their role in hypoxia and hypoxia-related disorders. Therefore, we have compiled HypoxiaDB, a database of hypoxia-regulated proteins. It is a comprehensive, manually-curated, non-redundant catalog of proteins whose expressions are shown experimentally to be altered at different levels and durations of hypoxia. The database currently contains 72 000 manually curated entries taken on 3500 proteins extracted from 73 peer-reviewed publications selected from PubMed. HypoxiaDB is distinctive from other generalized databases: (i) it compiles tissue-specific protein expression changes under different levels and duration of hypoxia. Also, it provides manually curated literature references to support the inclusion of the protein in the database and establish its association with hypoxia. (ii) For each protein, HypoxiaDB integrates data on gene ontology, KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway, protein–protein interactions, protein family (Pfam), OMIM (Online Mendelian Inheritance in Man), PDB (Protein Data Bank) structures and homology to other sequenced genomes. (iii) It also provides pre-compiled information on hypoxia-proteins, which otherwise requires tedious computational analysis. This includes information like chromosomal location, identifiers like Entrez, HGNC, Unigene, Uniprot, Ensembl, Vega, GI numbers and Genbank accession numbers associated with the protein. These are further cross-linked to respective public databases augmenting HypoxiaDB to the external repositories. (iv) In addition, HypoxiaDB provides an online sequence-similarity search tool for

  14. Transcriptome Analysis Reveals the Genetic Basis of the Resveratrol Biosynthesis Pathway in an Endophytic Fungus (Alternaria sp. MG1) Isolated from Vitis vinifera.

    PubMed

    Che, Jinxin; Shi, Junling; Gao, Zhenhong; Zhang, Yan

    2016-01-01

    Alternaria sp. MG1, an endophytic fungus previously isolated from Merlot grape, produces resveratrol from glucose, showing similar metabolic flux to the phenylpropanoid biosynthesis pathway, currently found solely in plants. In order to identify the resveratrol biosynthesis pathway in this strain at the gene level, de novo transcriptome sequencing was conducted using Illumina paired-end sequencing. A total of 22,954,434 high-quality reads were assembled into contigs and 18,570 unigenes were identified. Among these unigenes, 14,153 were annotated in the NCBI non-redundant protein database and 5341 were annotated in the Swiss-Prot database. After KEGG mapping, 2701 unigenes were mapped onto 115 pathways. Eighty-four unigenes were annotated in major pathways from glucose to resveratrol, coding 20 enzymes for glycolysis, 10 for phenylalanine biosynthesis, 4 for phenylpropanoid biosynthesis, and 4 for stilbenoid biosynthesis. Chalcone synthase was identified for resveratrol biosynthesis in this strain, due to the absence of stilbene synthase. All the identified enzymes indicated a reasonable biosynthesis pathway from glucose to resveratrol via glycolysis, phenylalanine biosynthesis, phenylpropanoid biosynthesis, and stilbenoid pathways. These results provide essential evidence for the occurrence of resveratrol biosynthesis in Alternaria sp. MG1 at the gene level, facilitating further elucidation of the molecular mechanisms involved in this strain's secondary metabolism.

  15. Transcriptome Analysis Reveals the Genetic Basis of the Resveratrol Biosynthesis Pathway in an Endophytic Fungus (Alternaria sp. MG1) Isolated from Vitis vinifera

    PubMed Central

    Che, Jinxin; Shi, Junling; Gao, Zhenhong; Zhang, Yan

    2016-01-01

    Alternaria sp. MG1, an endophytic fungus previously isolated from Merlot grape, produces resveratrol from glucose, showing similar metabolic flux to the phenylpropanoid biosynthesis pathway, currently found solely in plants. In order to identify the resveratrol biosynthesis pathway in this strain at the gene level, de novo transcriptome sequencing was conducted using Illumina paired-end sequencing. A total of 22,954,434 high-quality reads were assembled into contigs and 18,570 unigenes were identified. Among these unigenes, 14,153 were annotated in the NCBI non-redundant protein database and 5341 were annotated in the Swiss-Prot database. After KEGG mapping, 2701 unigenes were mapped onto 115 pathways. Eighty-four unigenes were annotated in major pathways from glucose to resveratrol, coding 20 enzymes for glycolysis, 10 for phenylalanine biosynthesis, 4 for phenylpropanoid biosynthesis, and 4 for stilbenoid biosynthesis. Chalcone synthase was identified for resveratrol biosynthesis in this strain, due to the absence of stilbene synthase. All the identified enzymes indicated a reasonable biosynthesis pathway from glucose to resveratrol via glycolysis, phenylalanine biosynthesis, phenylpropanoid biosynthesis, and stilbenoid pathways. These results provide essential evidence for the occurrence of resveratrol biosynthesis in Alternaria sp. MG1 at the gene level, facilitating further elucidation of the molecular mechanisms involved in this strain's secondary metabolism. PMID:27588016

  16. Aligning Metabolic Pathways Exploiting Binary Relation of Reactions

    PubMed Central

    Zhong, Cheng; Lin, Hai Xiang; Huang, Jing

    2016-01-01

    Metabolic pathway alignment has been widely used to find one-to-one and/or one-to-many reaction mappings to identify the alternative pathways that have similar functions through different sets of reactions, which has important applications in reconstructing phylogeny and understanding metabolic functions. The existing alignment methods exhaustively search reaction sets, which may become infeasible for large pathways. To address this problem, we present an effective alignment method for accurately extracting reaction mappings between two metabolic pathways. We show that connected relation between reactions can be formalized as binary relation of reactions in metabolic pathways, and the multiplications of zero-one matrices for binary relations of reactions can be accomplished in finite steps. By utilizing the multiplications of zero-one matrices for binary relation of reactions, we efficiently obtain reaction sets in a small number of steps without exhaustive search, and accurately uncover biologically relevant reaction mappings. Furthermore, we introduce a measure of topological similarity of nodes (reactions) by comparing the structural similarity of the k-neighborhood subgraphs of the nodes in aligning metabolic pathways. We employ this similarity metric to improve the accuracy of the alignments. The experimental results on the KEGG database show that when compared with other state-of-the-art methods, in most cases, our method obtains better performance in the node correctness and edge correctness, and the number of the edges of the largest common connected subgraph for one-to-one reaction mappings, and the number of correct one-to-many reaction mappings. Our method is scalable in finding more reaction mappings with better biological relevance in large metabolic pathways. PMID:27936108
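    The idea of treating connectivity between reactions as a binary relation, and composing it by multiplying zero-one matrices, can be illustrated with a generic Boolean matrix product. The sketch below is not the authors' alignment algorithm; the function names and the four-reaction chain are assumptions made for illustration only.

```python
def bool_product(a, b):
    """Boolean product of two square zero-one matrices (lists of lists).

    Entry [i][j] is 1 if some k exists with a[i][k] == 1 and b[k][j] == 1,
    i.e. the composition of the two binary relations.
    """
    n = len(a)
    return [
        [int(any(a[i][k] and b[k][j] for k in range(n))) for j in range(n)]
        for i in range(n)
    ]


def reachable_within(adj, steps):
    """Entry [i][j] is 1 if reaction j is reachable from i in 1..steps hops."""
    power = adj
    result = [row[:] for row in adj]
    for _ in range(steps - 1):
        power = bool_product(power, adj)
        # Union of the relation so far with the next power.
        result = [
            [int(x or y) for x, y in zip(r_row, p_row)]
            for r_row, p_row in zip(result, power)
        ]
    return result


# Hypothetical pathway: a linear chain R0 -> R1 -> R2 -> R3.
adj = [
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
two_hop = bool_product(adj, adj)   # pairs connected by exactly two steps
closure = reachable_within(adj, 3)  # pairs connected by at most three steps
```

    For a pathway with n reactions, at most n - 1 products are needed before no new pairs appear, which is one way to read the abstract's statement that the multiplications finish in finite steps.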

  17. enRoute: dynamic path extraction from biological pathway maps for exploring heterogeneous experimental datasets

    PubMed Central

    2013-01-01

    Jointly analyzing biological pathway maps and experimental data is critical for understanding how biological processes work in different conditions and why different samples exhibit certain characteristics. This joint analysis, however, poses a significant challenge for visualization. Current techniques are either well suited to visualize large amounts of pathway node attributes, or to represent the topology of the pathway well, but do not accomplish both at the same time. To address this we introduce enRoute, a technique that enables analysts to specify a path of interest in a pathway, extract this path into a separate, linked view, and show detailed experimental data associated with the nodes of this extracted path right next to it. This juxtaposition of the extracted path and the experimental data allows analysts to simultaneously investigate large amounts of potentially heterogeneous data, thereby solving the problem of joint analysis of topology and node attributes. As this approach does not modify the layout of pathway maps, it is compatible with arbitrary graph layouts, including those of hand-crafted, image-based pathway maps. We demonstrate the technique in context of pathways from the KEGG and the Wikipathways databases. We apply experimental data from two public databases, the Cancer Cell Line Encyclopedia (CCLE) and The Cancer Genome Atlas (TCGA) that both contain a wide variety of genomic datasets for a large number of samples. In addition, we make use of a smaller dataset of hepatocellular carcinoma and common xenograft models. To verify the utility of enRoute, domain experts conducted two case studies where they explore data from the CCLE and the hepatocellular carcinoma datasets in the context of relevant pathways. PMID:24564375

  18. MicroRNA expression, target genes, and signaling pathways in infants with a ventricular septal defect.

    PubMed

    Chai, Hui; Yan, Zhaoyuan; Huang, Ke; Jiang, Yuanqing; Zhang, Lin

    2017-08-18

    This study aimed to systematically investigate the relationship between miRNA expression and the occurrence of ventricular septal defect (VSD), and characterize the miRNA target genes and pathways that can lead to VSD. The miRNAs that were differentially expressed in blood samples from VSD and normal infants were screened and validated by implementing miRNA microarrays and qRT-PCR. The target genes regulated by differentially expressed miRNAs were predicted using three target gene databases. The functions and signaling pathways of the target genes were enriched using the GO database and KEGG database, respectively. The transcription and protein expression of specific target genes in critical pathways were compared in the VSD and normal control groups using qRT-PCR and western blotting, respectively. Compared with the normal control group, the VSD group had 22 differentially expressed miRNAs; 19 were downregulated and three were upregulated. The 10,677 predicted target genes participated in many biological functions related to cardiac development and morphogenesis. Four target genes (mGLUR, Gq, PLC, and PKC) were involved in the PKC pathway and four (ECM, FAK, PI3K, and PDK1) were involved in the PI3K-Akt pathway. The transcription and protein expression of these eight target genes were significantly upregulated in the VSD group. The 22 miRNAs that were dysregulated in the VSD group were mainly downregulated, which may result in the dysregulation of several key genes and biological functions related to cardiac development. These effects could also be exerted via the upregulation of eight specific target genes, the subsequent over-activation of the PKC and PI3K-Akt pathways, and the eventual abnormal cardiac development and VSD.

  19. Characterization of Differentially Expressed Genes Involved in Pathways Associated with Gastric Cancer

    PubMed Central

    Li, Hao; Yu, Beiqin; Li, Jianfang; Su, Liping; Yan, Min; Zhang, Jun; Li, Chen; Zhu, Zhenggang; Liu, Bingya

    2015-01-01

    To explore the patterns of gene expression in gastric cancer, a total of 26 paired gastric cancer and noncancerous tissues from patients were enrolled for gene expression microarray analyses. Limma methods were applied to analyze the data, and genes were considered to be significantly differentially expressed if the False Discovery Rate (FDR) value was <0.01, the P-value was <0.01 and the fold change (FC) was >2. Subsequently, Gene Ontology (GO) categories were used to analyze the main functions of the differentially expressed genes. According to the Kyoto Encyclopedia of Genes and Genomes (KEGG) database, we found pathways significantly associated with the differential genes. A Gene-Act network and a co-expression network were built based on the relationships among the genes, proteins and compounds in the database. 2371 mRNAs and 350 lncRNAs considered as significantly differentially expressed genes were selected for the further analysis. The GO categories, pathway analyses and the Gene-Act network showed a consistent result that up-regulated genes were responsible for tumorigenesis, migration, angiogenesis and microenvironment formation, while down-regulated genes were involved in metabolism. The results of this study provide some novel findings on coding RNAs, lncRNAs, pathways and the co-expression network in gastric cancer which will be useful to guide further investigation and targeted therapy for this disease. PMID:25928635
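    The three cutoffs stated in this abstract (FDR <0.01, P-value <0.01, fold change >2) can be applied as a simple conjunctive filter. The sketch below is only an illustration: the record type, gene names and values are invented, and treating the fold-change cutoff symmetrically (more than 2-fold up or down) is an assumption, since the abstract does not say how down-regulation was scored.

```python
from dataclasses import dataclass


@dataclass
class GeneStat:
    gene: str
    fold_change: float  # tumor / normal expression ratio, must be > 0
    p_value: float
    fdr: float


def is_differential(stat, fc_cut=2.0, p_cut=0.01, fdr_cut=0.01):
    """Apply the three thresholds from the abstract to one gene."""
    # Assumption: a ratio below 1/fc_cut counts as down-regulated.
    magnitude = max(stat.fold_change, 1.0 / stat.fold_change)
    return stat.fdr < fdr_cut and stat.p_value < p_cut and magnitude > fc_cut


# Hypothetical statistics; gene names and numbers are invented.
stats = [
    GeneStat("geneA", 3.5, 0.0004, 0.002),   # up-regulated, passes
    GeneStat("geneB", 0.25, 0.0010, 0.006),  # down-regulated, passes
    GeneStat("geneC", 2.8, 0.0300, 0.080),   # fails P-value and FDR
    GeneStat("geneD", 1.4, 0.0001, 0.001),   # fails fold change
]
selected = [s.gene for s in stats if is_differential(s)]
print(selected)
```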

  20. Path2enet: generation of human pathway-derived networks in an expression specific context.

    PubMed

    Droste, Conrad; De Las Rivas, Javier

    2016-10-25

    Biological pathways are subsets of the complex biomolecular wiring that occur in living cells. They are usually rationalized and depicted in cartoon maps or charts to show them in a friendly visible way. Despite these efforts to present biological pathways, the current progress of bioinformatics indicates that translation of pathways into networks can be a very useful approach to achieve a computer-based view of the complex processes and interactions that occur in a living system. We have developed a bioinformatic tool called Path2enet that provides a translation of biological pathways into protein networks integrating several layers of information about the biomolecular nodes in a multiplex view. Path2enet is an R package that reads the relations and links between proteins stored in a comprehensive database of biological pathways, KEGG (Kyoto Encyclopedia of Genes and Genomes, http://www.genome.jp/kegg/ ), and integrates them with expression data from various resources and with data on protein-protein physical interactions. Path2enet tool uses the expression data to determine if a given protein in a network (i.e., a node) is active (ON) or inactive (OFF) in a specific cellular context or sample type. In this way, Path2enet reduces the complexity of the networks and reveals the proteins that are active (expressed) under specific conditions. As a proof of concept, this work presents a practical "case of use" generating the pathway-expression-networks corresponding to the NOTCH Signaling Pathway in human B- and T-lymphocytes. This case is produced by the analysis and integration in Path2enet of an experimental dataset of genome-wide expression microarrays produced with these cell types (i.e., B cells and T cells). Path2enet is an open source and open access tool that allows the construction of pathway-expression-networks, reading and integrating the information from biological pathways, protein interactions and gene expression cell specific data. The development of this
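    The ON/OFF idea described in this record, where expression data decide which nodes of a pathway network are kept, can be sketched as a filter on an edge list. This is not Path2enet's code (Path2enet is an R package); the function name, the fixed expression threshold, the placeholder protein names and the values are all assumptions for illustration, and the abstract does not say how the tool makes the ON/OFF call.

```python
def active_subnetwork(edges, expression, threshold):
    """Return the ON nodes and the edges whose two endpoints are both ON.

    A node is called ON when its expression value reaches the threshold.
    """
    on_nodes = {node for node, level in expression.items() if level >= threshold}
    kept = [(a, b) for a, b in edges if a in on_nodes and b in on_nodes]
    return on_nodes, kept


# Hypothetical pathway relations and expression values for one sample type.
edges = [("P1", "P2"), ("P2", "P3"), ("P1", "P4")]
expression = {"P1": 9.1, "P2": 7.4, "P3": 2.0, "P4": 8.3}

on_nodes, on_edges = active_subnetwork(edges, expression, threshold=5.0)
print(sorted(on_nodes), on_edges)
```

    Running the same filter with expression values from a second sample type would yield a different subnetwork, which is the kind of context-specific comparison the abstract describes for B and T cells.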

  1. EXPath tool-a system for comprehensively analyzing regulatory pathways and coexpression networks from high-throughput transcriptome data.

    PubMed

    Zheng, Han-Qin; Wu, Nai-Yun; Chow, Chi-Nga; Tseng, Kuan-Chieh; Chien, Chia-Hung; Hung, Yu-Cheng; Li, Guan-Zhen; Chang, Wen-Chi

    2017-03-13

    Next generation sequencing (NGS) has become the mainstream approach for monitoring gene expression levels in parallel with various experimental treatments. Unfortunately, there is no systematic web server for comprehensive further analysis of the huge amount of preliminary data obtained after gene annotation. Therefore, a user-friendly and effective system is required to mine important genes and regulatory pathways under specific conditions from high-throughput transcriptome data. EXPath Tool (available at: http://expathtool.itps.ncku.edu.tw/) was developed for the pathway annotation and comparative analysis of user-customized gene expression profiles derived from microarray or NGS platforms under various conditions to infer metabolic pathways for all organisms in the KEGG database. EXPath Tool contains several functions: access the gene expression patterns and the candidates of co-expression genes; dissect differentially expressed genes (DEGs) between two conditions (DEGs search), functional grouping with pathway and GO (Pathway/GO enrichment analysis), and correlation networks (co-expression analysis), and view the expression patterns of genes involved in specific pathways to infer the effects of the treatment. Additionally, the effectiveness of EXPath Tool has been demonstrated by a case study on IAA-responsive genes. The results demonstrated that critical hub genes under IAA treatment could be efficiently identified.
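    Pathway/GO enrichment analysis of a DEG list is commonly scored with a one-sided hypergeometric test. The abstract does not state which test EXPath Tool uses, so the sketch below is an assumption about a typical approach, with a hypothetical function name and toy counts.

```python
from math import comb


def hypergeom_enrichment_p(population, pathway_size, sample_size, hits):
    """P(X >= hits) for a hypergeometric draw.

    population   -- number of annotated genes in the background
    pathway_size -- background genes that belong to the pathway
    sample_size  -- number of DEGs drawn from the background
    hits         -- DEGs that fall in the pathway
    """
    upper = min(pathway_size, sample_size)
    total = comb(population, sample_size)
    tail = sum(
        comb(pathway_size, k) * comb(population - pathway_size, sample_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Toy example: 10 background genes, 5 in the pathway, 5 DEGs, all 5 in the pathway.
p = hypergeom_enrichment_p(population=10, pathway_size=5, sample_size=5, hits=5)
print(p)
```

    In practice the resulting P-values are usually corrected for multiple testing across all pathways before any pathway is called enriched.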

  2. Weighted gene co-expression network analysis in identification of metastasis-related genes of lung squamous cell carcinoma based on the Cancer Genome Atlas database.

    PubMed

    Tian, Feng; Zhao, Jinlong; Fan, Xinlei; Kang, Zhenxing

    2017-01-01

    Lung squamous cell carcinoma (lung SCC) is a common type of malignancy. The mechanism of its pathogenesis is unclear. The aim of this study was to identify key genes as diagnostic biomarkers in lung SCC metastasis. We searched and downloaded mRNA expression data and clinical data from The Cancer Genome Atlas (TCGA) database to identify differences in mRNA expression of primary tumor tissues from lung SCC with and without metastasis. Gene co-expression network analysis, protein-protein interaction (PPI) network, Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis and quantitative real-time polymerase chain reactions (qRT-PCR) were used to explore the biological functions of the identified dysregulated genes. Four hundred and eighty-two differentially expressed genes (DEGs) were identified between lung SCC with and without metastasis. Nineteen modules were identified in lung SCC through weighted gene co-expression network analysis (WGCNA). Twenty-three DEGs and 26 DEGs were significantly enriched in the pink and black modules, respectively. KEGG pathway analysis showed that the 26 DEGs in the black module were significantly enriched in the bile secretion pathway. The forty-nine DEGs in the two gene co-expression modules were used to construct a PPI network. CFTR in the black module was the hub protein, with connectivity to 182 genes. The results of qRT-PCR showed that FIGF, SFTPD and DYNLRB2 were significantly down-regulated in the tumor samples of lung SCC with metastasis, and CFTR, SCGB3A2, SSTR1, SCTR and ROPN1L showed a tendency toward down-regulation in lung SCC with metastasis compared to lung SCC without metastasis. The dysregulated genes including CFTR, SCTR and FIGF might be involved in the pathology of lung SCC metastasis and could be used as potential diagnostic biomarkers or therapeutic targets for lung SCC.

  3. Weighted gene co-expression network analysis in identification of metastasis-related genes of lung squamous cell carcinoma based on the Cancer Genome Atlas database

    PubMed Central

    Tian, Feng; Zhao, Jinlong; Kang, Zhenxing

    2017-01-01

    Background: Lung squamous cell carcinoma (lung SCC) is a common type of malignancy. The mechanism of its pathogenesis is unclear. The aim of this study was to identify key genes as diagnostic biomarkers in lung SCC metastasis. Methods: We searched and downloaded mRNA expression data and clinical data from The Cancer Genome Atlas (TCGA) database to identify differences in mRNA expression of primary tumor tissues from lung SCC with and without metastasis. Gene co-expression network analysis, protein-protein interaction (PPI) network, Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis and quantitative real-time polymerase chain reactions (qRT-PCR) were used to explore the biological functions of the identified dysregulated genes. Results: Four hundred and eighty-two differentially expressed genes (DEGs) were identified between lung SCC with and without metastasis. Nineteen modules were identified in lung SCC through weighted gene co-expression network analysis (WGCNA). Twenty-three DEGs and 26 DEGs were significantly enriched in the pink and black modules, respectively. KEGG pathway analysis showed that the 26 DEGs in the black module were significantly enriched in the bile secretion pathway. The forty-nine DEGs in the two gene co-expression modules were used to construct a PPI network. CFTR in the black module was the hub protein, with connectivity to 182 genes. The results of qRT-PCR showed that FIGF, SFTPD and DYNLRB2 were significantly down-regulated in the tumor samples of lung SCC with metastasis, and CFTR, SCGB3A2, SSTR1, SCTR and ROPN1L showed a tendency toward down-regulation in lung SCC with metastasis compared to lung SCC without metastasis. Conclusions: The dysregulated genes including CFTR, SCTR and FIGF might be involved in the pathology of lung SCC metastasis and could be used as potential diagnostic biomarkers or therapeutic targets for lung SCC. PMID:28203405

  4. Integrated miRNA–risk gene–pathway pair network analysis provides prognostic biomarkers for gastric cancer

    PubMed Central

    Cai, Hui; Xu, Jiping; Han, Yifang; Lu, Zhengmao; Han, Ting; Ding, Yibo; Ma, Liye

    2016-01-01

    Purpose: This study aimed to identify molecular prognostic biomarkers for gastric cancer. Methods: mRNA and miRNA expression profiles of eligible gastric cancer and control samples were downloaded from Gene Expression Omnibus to screen the differentially expressed genes (DEGs) and differentially expressed miRNAs (DEmiRs), using MetaDE and limma packages, respectively. Target genes of the DEmiRs were also collected from both predictive and experimentally validated target databases of miRNAs. The overlapping genes between selected targets and DEGs were identified as risk genes, followed by functional enrichment analysis. Human pathways and their corresponding genes were downloaded from the Kyoto Encyclopedia of Genes and Genomes (KEGG) database for the expression analysis of each pathway in gastric cancer samples. Next, co-pathway pairs were selected according to the Pearson correlation coefficients. Finally, the co-pathway pairs, miRNA–target pairs, and risk gene–pathway pairs were merged into a complex interaction network, the most important nodes (miRNAs/target genes/co-pathway pairs) of which were selected by calculating their degrees. Results: In total, 1,260 DEGs and 144 DEmiRs were identified. There were 336 risk genes found in the 9,572 miRNA–target pairs. Judging from the pathway expression files, 45 co-pathway pairs were screened out. There were 1,389 interactive pairs and 480 nodes in the integrated network. Among all nodes in the network, focal adhesion/extracellular matrix–receptor interaction pathways, CALM2, miR-19b, and miR-181b were the hub nodes with higher degrees. Conclusion: CALM2, hsa-miR-19b, and hsa-miR-181b might be used as potential prognostic targets for gastric cancer. PMID:27284247
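    Two steps in the Methods, selecting co-pathway pairs by Pearson correlation and ranking network nodes by degree, can be sketched in a few lines. Everything specific below is an assumption for illustration: the abstract gives no correlation cutoff, so 0.8 on the absolute coefficient is a placeholder, and the pathway profiles, miRNA and gene names are invented.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def co_pathway_pairs(profiles, cutoff=0.8):
    """Pairs of pathways whose expression profiles have |r| >= cutoff."""
    pairs = []
    for (p, x), (q, y) in combinations(sorted(profiles.items()), 2):
        if abs(pearson(x, y)) >= cutoff:
            pairs.append((p, q))
    return pairs


def degrees(edges):
    """Number of edges touching each node of an undirected edge list."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg


# Hypothetical per-sample pathway expression scores.
profiles = {
    "pathway_A": [1.0, 2.0, 3.0, 4.0],
    "pathway_B": [2.0, 4.0, 6.0, 8.0],
    "pathway_D": [4.0, 1.0, 3.0, 2.0],
}
pairs = co_pathway_pairs(profiles)

# Merge with hypothetical miRNA-target and gene-pathway pairs, then rank by degree.
network = pairs + [("miR-x", "gene1"), ("gene1", "pathway_A")]
deg = degrees(network)
hubs = sorted(deg, key=lambda node: (-deg[node], node))
```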

  5. SpiroESTdb: a transcriptome database and online tool for sparganum expressed sequences tags.

    PubMed

    Kim, Dae-Won; Kim, Dong-Wook; Yoo, Won Gi; Nam, Seong-Hyeuk; Lee, Myoung-Ro; Yang, Hye-Won; Park, Junhyung; Lee, Kyooyeol; Lee, Sanghyun; Cho, Shin-Hyeong; Lee, Won-Ja; Park, Hong-Seog; Ju, Jung-Won

    2012-03-08

    Sparganum (plerocercoid of Spirometra erinacei) is a parasite that possesses the remarkable ability to survive by successfully modifying its physiology and morphology to suit various hosts and can be found in various tissues, even the nervous system. However, surprisingly little is known about the molecular function of genes that are expressed during the course of the parasite life cycle. To begin to decipher the molecular processes underlying gene function, we constructed a database of expressed sequence tags (ESTs) generated from sparganum. SpiroESTdb is a web-based information resource that is built upon the annotation and curation of 5,655 ESTs. SpiroESTdb provides an integrated platform for expressed sequence data, expression dynamics, functional genes, genetic markers including single nucleotide polymorphisms and tandem repeats, gene ontology and KEGG pathway information. Moreover, SpiroESTdb supports easy access to gene pages, such as (i) curation and query forms, (ii) in silico expression profiling and (iii) BLAST search tools. Comprehensive descriptions of the sparganum content of all sequenced data are available, including summary reports. The contents of SpiroESTdb can be viewed and downloaded from the web (http://pathod.cdc.go.kr/spiroestdb). This integrative web-based database of sequence data, functional annotations and expression profiling data will serve as a useful tool to help understand and expand the characterization of parasitic infections. It can also be used to identify potential industrial drug targets and vaccine candidate genes.

  6. The FunGenES database: a genomics resource for mouse embryonic stem cell differentiation.

    PubMed

    Schulz, Herbert; Kolde, Raivo; Adler, Priit; Aksoy, Irène; Anastassiadis, Konstantinos; Bader, Michael; Billon, Nathalie; Boeuf, Hélène; Bourillot, Pierre-Yves; Buchholz, Frank; Dani, Christian; Doss, Michael Xavier; Forrester, Lesley; Gitton, Murielle; Henrique, Domingos; Hescheler, Jürgen; Himmelbauer, Heinz; Hübner, Norbert; Karantzali, Efthimia; Kretsovali, Androniki; Lubitz, Sandra; Pradier, Laurent; Rai, Meena; Reimand, Jüri; Rolletschek, Alexandra; Sachinidis, Agapios; Savatier, Pierre; Stewart, Francis; Storm, Mike P; Trouillas, Marina; Vilo, Jaak; Welham, Melanie J; Winkler, Johannes; Wobus, Anna M; Hatzopoulos, Antonis K

    2009-09-03

    Embryonic stem (ES) cells have high self-renewal capacity and the potential to differentiate into a large variety of cell types. To investigate gene networks operating in pluripotent ES cells and their derivatives, the "Functional Genomics in Embryonic Stem Cells" consortium (FunGenES) has analyzed the transcriptome of mouse ES cells in eleven diverse settings representing sixty-seven experimental conditions. To better illustrate gene expression profiles in mouse ES cells, we have organized the results in an interactive database with a number of features and tools. Specifically, we have generated clusters of transcripts that behave the same way under the entire spectrum of the sixty-seven experimental conditions; we have assembled genes in groups according to their time of expression during successive days of ES cell differentiation; we have included expression profiles of specific gene classes such as transcription regulatory factors and Expressed Sequence Tags; transcripts have been arranged in "Expression Waves" and juxtaposed to genes with opposite or complementary expression patterns; we have designed search engines to display the expression profile of any transcript during ES cell differentiation; gene expression data have been organized in animated graphs of KEGG signaling and metabolic pathways; and finally, we have incorporated advanced functional annotations for individual genes or gene clusters of interest and links to microarray and genomic resources. The FunGenES database provides a comprehensive resource for studies into the biology of ES cells.

  7. The FunGenES Database: A Genomics Resource for Mouse Embryonic Stem Cell Differentiation

    PubMed Central

    Adler, Priit; Aksoy, Irène; Anastassiadis, Konstantinos; Bader, Michael; Billon, Nathalie; Boeuf, Hélène; Bourillot, Pierre-Yves; Buchholz, Frank; Dani, Christian; Doss, Michael Xavier; Forrester, Lesley; Gitton, Murielle; Henrique, Domingos; Hescheler, Jürgen; Himmelbauer, Heinz; Hübner, Norbert; Karantzali, Efthimia; Kretsovali, Androniki; Lubitz, Sandra; Pradier, Laurent; Rai, Meena; Reimand, Jüri; Rolletschek, Alexandra; Sachinidis, Agapios; Savatier, Pierre; Stewart, Francis; Storm, Mike P.; Trouillas, Marina; Vilo, Jaak; Welham, Melanie J.; Winkler, Johannes; Wobus, Anna M.; Hatzopoulos, Antonis K.

    2009-01-01

    Embryonic stem (ES) cells have high self-renewal capacity and the potential to differentiate into a large variety of cell types. To investigate gene networks operating in pluripotent ES cells and their derivatives, the “Functional Genomics in Embryonic Stem Cells” consortium (FunGenES) has analyzed the transcriptome of mouse ES cells in eleven diverse settings representing sixty-seven experimental conditions. To better illustrate gene expression profiles in mouse ES cells, we have organized the results in an interactive database with a number of features and tools. Specifically, we have generated clusters of transcripts that behave the same way under the entire spectrum of the sixty-seven experimental conditions; we have assembled genes in groups according to their time of expression during successive days of ES cell differentiation; we have included expression profiles of specific gene classes such as transcription regulatory factors and Expressed Sequence Tags; transcripts have been arranged in “Expression Waves” and juxtaposed to genes with opposite or complementary expression patterns; we have designed search engines to display the expression profile of any transcript during ES cell differentiation; gene expression data have been organized in animated graphs of KEGG signaling and metabolic pathways; and finally, we have incorporated advanced functional annotations for individual genes or gene clusters of interest and links to microarray and genomic resources. The FunGenES database provides a comprehensive resource for studies into the biology of ES cells. PMID:19727443

  8. MTD: a mammalian transcriptomic database to explore gene expression and regulation

    PubMed Central

    Sun, Qianqian; Li, Xue; Xian, Feng; Sun, Manman; Fang, Wan; Chen, Meili; Yu, Jun; Xiao, Jingfa

    2017-01-01

    A systematic transcriptome survey is essential for the characterization and comprehension of the molecular basis underlying phenotypic variations. Recently developed RNA-seq methodology has facilitated efficient data acquisition and information mining of transcriptomes in multiple tissues/cell lines. Current mammalian transcriptomic databases are either tissue-specific or species-specific, and they lack in-depth comparative features across tissues and species. Here, we present a mammalian transcriptomic database (MTD) that is focused on mammalian transcriptomes, and the current version contains data from humans, mice, rats and pigs. Regarding the core features, the MTD browses genes based on their neighboring genomic coordinates or joint KEGG pathway and provides expression information on exons, transcripts and genes by integrating them into a genome browser. We developed a novel nomenclature for each transcript that considers its genomic position and transcriptional features. The MTD allows a flexible search of genes or isoforms with user-defined transcriptional characteristics and provides both table-based descriptions and associated visualizations. To elucidate the dynamics of gene expression regulation, the MTD also enables both intraspecies and interspecies comparative transcriptomic analysis. The MTD thus constitutes a valuable resource for transcriptomic and evolutionary studies. The MTD is freely accessible at http://mtd.cbi.ac.cn. PMID:26822098

  9. Comparative Toxicogenomics Database: a knowledgebase and discovery tool for chemical-gene-disease networks.

    PubMed

    Davis, Allan Peter; Murphy, Cynthia G; Saraceni-Richards, Cynthia A; Rosenstein, Michael C; Wiegers, Thomas C; Mattingly, Carolyn J

    2009-01-01

    The Comparative Toxicogenomics Database (CTD) is a curated database that promotes understanding about the effects of environmental chemicals on human health. Biocurators at CTD manually curate chemical-gene interactions, chemical-disease relationships and gene-disease relationships from the literature. This strategy allows data to be integrated to construct chemical-gene-disease networks. CTD is unique in numerous respects: curation focuses on environmental chemicals; interactions are manually curated; interactions are constructed using controlled vocabularies and hierarchies; additional gene attributes (such as Gene Ontology, taxonomy and KEGG pathways) are integrated; data can be viewed from the perspective of a chemical, gene or disease; results and batch queries can be downloaded and saved; and most importantly, CTD acts as both a knowledgebase (by reporting data) and a discovery tool (by generating novel inferences). Over 116,000 interactions between 3900 chemicals and 13,300 genes have been curated from 270 species, and 5900 gene-disease and 2500 chemical-disease direct relationships have been captured. By integrating these data, 350,000 gene-disease relationships and 77,000 chemical-disease relationships can be inferred. This wealth of chemical-gene-disease information yields testable hypotheses for understanding the effects of environmental chemicals on human health. CTD is freely available at http://ctd.mdibl.org.
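
    The integration step described above can be illustrated with a small sketch. It assumes, for illustration only, that a chemical-disease relationship is inferred whenever a chemical and a disease share at least one curated gene; the function name and the toy identifiers are hypothetical and are not taken from CTD.

```python
from collections import defaultdict

def infer_chemical_disease(chemical_gene, gene_disease):
    """Return {(chemical, disease): set of shared genes} for inferred links.

    Assumption: a link is inferred when a chemical interacts with a gene
    that also has a curated relationship to the disease.
    """
    diseases_by_gene = defaultdict(set)
    for gene, disease in gene_disease:
        diseases_by_gene[gene].add(disease)

    inferred = defaultdict(set)
    for chemical, gene in chemical_gene:
        # .get avoids creating empty entries for genes with no disease link
        for disease in diseases_by_gene.get(gene, ()):
            inferred[(chemical, disease)].add(gene)
    return dict(inferred)
```

    Keeping the set of shared genes alongside each inferred pair makes every inference traceable to the curated interactions that support it.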

  10. 'RetinoGenetics': a comprehensive mutation database for genes related to inherited retinal degeneration.

    PubMed

    Ran, Xia; Cai, Wei-Jun; Huang, Xiu-Feng; Liu, Qi; Lu, Fan; Qu, Jia; Wu, Jinyu; Jin, Zi-Bing

    2014-01-01

    Inherited retinal degeneration (IRD), a leading cause of human blindness worldwide, is exceptionally heterogeneous, both clinically and genetically. Over the past decades, tremendous efforts have been made to explore this heterogeneity, and, with the significant advancement of sequencing technology, a large number of mutations have been identified in the different genes underlying IRD. In this study, we developed a comprehensive database, 'RetinoGenetics', which contains information about all known IRD-related genes and mutations. 'RetinoGenetics' currently contains 4270 mutations in 186 genes, with detailed information associated with 164 phenotypes from 934 publications and various types of functional annotations. Extensive annotations were performed for each gene using various resources, including Gene Ontology, KEGG pathways, protein-protein interaction, mutational annotations and gene-disease network. Furthermore, with its search functions, convenient browsing options and intuitive graphical displays, 'RetinoGenetics' could serve as a valuable resource for unveiling the genetic basis of IRD. Taken together, 'RetinoGenetics' is an integrative, informative and updatable resource for IRD-related genetic predispositions. Database URL: http://www.retinogenetics.org/.

  11. ‘RetinoGenetics’: a comprehensive mutation database for genes related to inherited retinal degeneration

    PubMed Central

    Ran, Xia; Cai, Wei-Jun; Huang, Xiu-Feng; Liu, Qi; Lu, Fan; Qu, Jia; Wu, Jinyu; Jin, Zi-Bing

    2014-01-01

    Inherited retinal degeneration (IRD), a leading cause of human blindness worldwide, is exceptionally heterogeneous, both clinically and genetically. Over the past decades, tremendous efforts have been made to explore this heterogeneity, and, with the significant advancement of sequencing technology, a large number of mutations have been identified in the different genes underlying IRD. In this study, we developed a comprehensive database, ‘RetinoGenetics’, which contains information about all known IRD-related genes and mutations. ‘RetinoGenetics’ currently contains 4270 mutations in 186 genes, with detailed information associated with 164 phenotypes from 934 publications and various types of functional annotations. Extensive annotations were performed for each gene using various resources, including Gene Ontology, KEGG pathways, protein–protein interaction, mutational annotations and gene–disease network. Furthermore, with its search functions, convenient browsing options and intuitive graphical displays, ‘RetinoGenetics’ could serve as a valuable resource for unveiling the genetic basis of IRD. Taken together, ‘RetinoGenetics’ is an integrative, informative and updatable resource for IRD-related genetic predispositions. Database URL: http://www.retinogenetics.org/. PMID:24939193

  12. Incorporating pathway information into boosting estimation of high-dimensional risk prediction models.

    PubMed

    Binder, Harald; Schumacher, Martin

    2009-01-13

    There are several techniques for fitting risk prediction models to high-dimensional data arising from microarrays. However, biological knowledge about relations between genes is only rarely taken into account. One recent approach incorporates pathway information, available, e.g., from the KEGG database, by augmenting the penalty term in Lasso estimation for continuous response models. As an alternative, we extend componentwise likelihood-based boosting techniques to incorporate pathway information into a larger number of model classes, such as generalized linear models and the Cox proportional hazards model for time-to-event data. In contrast to Lasso-like approaches, no further assumptions for explicitly specifying the penalty structure are needed, as pathway information is incorporated by adapting the penalties for single microarray features in the course of the boosting steps. This is shown to result in improved prediction performance when the coefficients of connected genes have opposite signs. The properties of the fitted models resulting from this approach are then investigated in two application examples with microarray survival data. The proposed approach results not only in improved prediction performance but also in structurally different model fits. Incorporating pathway information in the suggested way is therefore seen to be beneficial in several ways.

  13. Databases for Microbiologists

    PubMed Central

    2015-01-01

    Databases play an increasingly important role in biology. They archive, store, maintain, and share information on genes, genomes, expression data, protein sequences and structures, metabolites and reactions, interactions, and pathways. All these data are critically important to microbiologists. Furthermore, microbiology has its own databases that deal with model microorganisms, microbial diversity, physiology, and pathogenesis. Thousands of biological databases are currently available, and it becomes increasingly difficult to keep up with their development. The purpose of this minireview is to provide a brief survey of current databases that are of interest to microbiologists. PMID:26013493

  14. Databases for Microbiologists

    SciTech Connect

    Zhulin, Igor B.

    2015-05-26

    Databases play an increasingly important role in biology. They archive, store, maintain, and share information on genes, genomes, expression data, protein sequences and structures, metabolites and reactions, interactions, and pathways. All these data are critically important to microbiologists. Furthermore, microbiology has its own databases that deal with model microorganisms, microbial diversity, physiology, and pathogenesis. Thousands of biological databases are currently available, and it becomes increasingly difficult to keep up with their development. The purpose of this minireview is to provide a brief survey of current databases that are of interest to microbiologists.

  15. MRMPath and MRMutation, Facilitating Discovery of Mass Transitions for Proteotypic Peptides in Biological Pathways Using a Bioinformatics Approach

    PubMed Central

    Crasto, Chiquito; Narne, Chandrahas; Kawai, Mikako; Wilson, Landon; Barnes, Stephen

    2013-01-01

    Quantitative proteomics applications in mass spectrometry depend on the knowledge of the mass-to-charge ratio (m/z) values of proteotypic peptides for the proteins under study and their product ions. MRMPath and MRMutation, platform-independent web-based bioinformatics tools, facilitate the recovery of this information by biologists. MRMPath utilizes publicly available information related to biological pathways in the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. All the proteins involved in pathways of interest are recovered and processed in silico to extract information relevant to quantitative mass spectrometry analysis. Peptides may also be subjected to automated BLAST analysis to determine whether they are proteotypic. MRMutation catalogs and makes available, following processing, known (mutant) variants of proteins from the current UniProtKB database. All these results, available via the web from well-maintained, public databases, are written to an Excel spreadsheet, which the user can download and save. MRMPath and MRMutation can be freely accessed. As a system that seeks to allow two or more resources to interoperate, MRMPath represents an advance in bioinformatics tool development. As a practical matter, the MRMPath automated approach represents significant time savings to researchers. PMID:23424586

  16. Pathway deviation-based biomarker and multi-effect target identification in asbestos-related squamous cell carcinoma of the lung

    PubMed Central

    Du, Jiang; Zhang, Lin

    2017-01-01

    Asbestos-related lung carcinoma is one of the most devastating occupational cancers, and effective techniques for early diagnosis are still lacking. In the present study, a systematic approach was applied to detect a potential biomarker for asbestos-related lung cancer (ARLC); in particular asbestos-related squamous cell carcinoma (ARLC-SCC). Microarray data (GSE23822) were retrieved from the Gene Expression Omnibus database, including 26 ARLC-SCCs and 30 non-asbestos-related squamous cell lung carcinomas (NARLC-SCCs). Differentially expressed genes (DEGs) were identified by the limma package, and then a protein-protein interaction (PPI) network was constructed according to the BioGRID and HPRD databases. A novel scoring approach integrating an expression deviation score and network degree of the gene was then proposed to weight the DEGs. Subsequently, the important genes were uploaded to DAVID for pathway enrichment analysis. Pathway correlation analysis was carried out using Spearman's rank correlation coefficient of the pathscore. In total, 1,333 DEGs, 391 upregulated and 942 downregulated, were obtained between the ARLC-SCCs and NARLC-SCCs. A total of 524 important genes for ARLC-SCC were significantly enriched in 22 KEGG pathways. Correlation analysis of these pathways showed that the pathway of SNARE interactions in vesicular transport was significantly correlated with 12 other pathways. Additionally, obvious correlations were found between multiple pathways by sharing cross-talk genes (EGFR, PRKX, PDGFB, PIK3R3, SLK, IGF1, CDC42 and PRKCA). On the whole, our data demonstrate that 8 cross-talk genes were found to bridge multiple ARLC-SCC-specific pathways, which may be used as candidate biomarkers and potential multi-effect targets. As these genes are involved in multiple pathways, it is possible that drugs targeting these genes may thus be able to influence multiple pathways simultaneously. PMID:28204826
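
    The pathway correlation step above relies on Spearman's rank correlation coefficient. The following is a minimal sketch of that statistic, computed as the Pearson correlation of average ranks; the function names are illustrative, and how the per-sample pathscores are actually derived is not specified here and is left out.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed inputs."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5
```

    Because only ranks enter the calculation, any monotonic relationship between two pathways' scores yields a coefficient of 1 or -1, regardless of its shape.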

  17. Construction of signaling pathways and identification of drug effects on the liver cancer cell HepG2.

    PubMed

    Alexopoulos, Leonidas G; Melas, Ioannis N; Chairakaki, Aikaterini D; Saez-Rodriguez, Julio; Mitsos, Alexander

    2010-01-01

    Construction of signaling pathway maps and identification of drug effects are major challenges for the pharmaceutical industry. Signaling maps are usually obtained from manual literature searches, automated text-mining algorithms, or canonical pathway databases (e.g., Reactome, KEGG, STKE, Pathway Studio, Ingenuity), and in some cases they are used in combination with gene expression or mass spectrometry data in an effort to create pathways specific to cell types or diseases. Our approach combines computational models with novel multicombinatorial high-throughput phosphoproteomic data for the functional analysis of signaling networks in mammalian cells. On the experimental front, we subject the cells to hundreds of co-treatments with a diverse set of ligands and inhibitors, and we measure phosphorylation events on key signaling proteins using the xMAP technology. On the computational front, we create cell-type-specific pathway maps by fitting our phosphoprotein dataset to generic signaling maps via an integer linear programming formulation. To identify drug effects, we monitor the differences between topologies created with and without the drug. In the present work, we use this approach to identify the effects of Nilotinib, a well-known anti-cancer drug.

  18. Gene expression analysis reveals the dysregulation of immune and metabolic pathways in Alzheimer's disease

    PubMed Central

    Li, Zhiyan; Xu, Panpan; Yao, Lifen

    2016-01-01

    In recent years, several pathway analyses of genome-wide association studies (GWAS) have reported the involvement of metabolic and immune pathways in Alzheimer's disease (AD). The exact mechanisms of these pathways in AD are still unclear. Here, we conducted a pathway analysis of a whole-genome AD case-control expression dataset (n=41, 25 AD cases and 16 controls) from human temporal cortex tissue. Using the differentially expressed AD genes, we identified significant KEGG pathways related to metabolism and immune processes. Using the lists of up- and down-regulated AD genes, we further found that up-regulated AD genes were significantly enriched in immune and metabolic pathways. We then compared the immune and metabolic KEGG pathways from the expression dataset with those from previous GWAS datasets, and found that most of these pathways are shared between the GWAS and expression datasets. PMID:27732949

  19. MetRxn: a knowledgebase of metabolites and reactions spanning metabolic models and databases

    PubMed Central

    2012-01-01

    Background: Increasingly, metabolite and reaction information is organized in the form of genome-scale metabolic reconstructions that describe the reaction stoichiometry, directionality, and gene to protein to reaction associations. A key bottleneck in the pace of reconstruction of new, high-quality metabolic models is the inability to directly make use of metabolite/reaction information from biological databases or other models due to incompatibilities in content representation (i.e., metabolites with multiple names across databases and models), stoichiometric errors such as elemental or charge imbalances, and incomplete atomistic detail (e.g., use of generic R-group or non-explicit specification of stereo-specificity). Description: MetRxn is a knowledgebase that includes standardized metabolite and reaction descriptions by integrating information from BRENDA, KEGG, MetaCyc, Reactome.org and 44 metabolic models into a single unified data set. All metabolite entries have matched synonyms, resolved protonation states, and are linked to unique structures. All reaction entries are elementally and charge balanced. This is accomplished through the use of a workflow of lexicographic, phonetic, and structural comparison algorithms. MetRxn allows for the download of standardized versions of existing genome-scale metabolic models and the use of metabolic information for the rapid reconstruction of new ones. Conclusions: The standardization in description allows for the direct comparison of the metabolite and reaction content between metabolic models and databases and the exhaustive prospecting of pathways for biotechnological production. This ever-growing dataset currently consists of over 76,000 metabolites participating in more than 72,000 reactions (including unresolved entries). MetRxn is hosted on a web-based platform that uses relational database models (MySQL). PMID:22233419
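
    What it means for a reaction entry to be elementally and charge balanced can be shown with a short check. This is a sketch under simplifying assumptions, not MetRxn's workflow: formulas are flat strings without parentheses or R-groups, charges are supplied explicitly, and all names are hypothetical.

```python
import re
from collections import Counter

def parse_formula(formula):
    """Count atoms in a flat formula such as 'C6H12O6' (no parentheses)."""
    counts = Counter()
    for element, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(n) if n else 1
    return counts

def side_totals(side):
    """Sum atoms and charge over (coefficient, formula, charge) triples."""
    atoms, charge = Counter(), 0
    for coefficient, formula, species_charge in side:
        for element, n in parse_formula(formula).items():
            atoms[element] += coefficient * n
        charge += coefficient * species_charge
    return atoms, charge

def is_balanced(reactants, products):
    """True when both sides carry the same atoms and the same net charge."""
    return side_totals(reactants) == side_totals(products)
```

    For example, `is_balanced([(2, "H2", 0), (1, "O2", 0)], [(2, "H2O", 0)])` holds, while dropping the coefficient on hydrogen leaves the oxygen count unequal and the check fails.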

  20. The alignment of enzymatic steps reveals similar metabolic pathways and probable recruitment events in Gammaproteobacteria.

    PubMed

    Poot-Hernandez, Augusto Cesar; Rodriguez-Vazquez, Katya; Perez-Rueda, Ernesto

    2015-11-17

    It is generally accepted that gene duplication followed by functional divergence is one of the main sources of metabolic diversity. In this regard, there is an increasing interest in the development of methods that allow the systematic identification of these evolutionary events in metabolism. Here, we used a method not based on biomolecular sequence analysis to compare and identify common and variable routes in the metabolism of 40 Gammaproteobacteria species. The metabolic maps deposited in the KEGG database were transformed into linear Enzymatic Step Sequences (ESS) by using the breadth-first search algorithm. These ESS represent subsequent enzymes linked to each other, where their catalytic activities are encoded in the Enzyme Commission numbers. The ESS were compared in an all-against-all (pairwise comparisons) approach by using a dynamic programming algorithm, leaving only a set of significant pairs. From these comparisons, we identified a set of functionally conserved enzymatic steps in different metabolic maps, in which cell wall components and fatty acid and lysine biosynthesis were included. In addition, we found that pathways associated with biosynthesis share a higher proportion of similar ESS than degradation pathways and secondary metabolism pathways. Also, maps associated with the metabolism of similar compounds contain a high proportion of similar ESS, such as those maps from nucleotide metabolism pathways, in particular the inosine monophosphate pathway. Furthermore, diverse ESS associated with the lower part of the glycolysis pathway were identified as functionally similar to multiple metabolic pathways. In summary, our comparisons may help to identify similar reactions in different metabolic pathways and could reinforce the patchwork model in the evolution of metabolism in Gammaproteobacteria.
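
    The two algorithmic steps named above, breadth-first linearization and dynamic-programming comparison, can be sketched as follows. The abstract does not give the details, so everything specific here is an assumption made for illustration: the toy adjacency dictionary, the similarity score based on shared leading EC levels, the gap penalty, and the Needleman-Wunsch-style global alignment.

```python
from collections import deque

def enzymatic_step_sequences(graph, start):
    """Enumerate linear EC-number paths from `start`, in breadth-first order."""
    sequences = []
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        # Skip nodes already on the path so cycles cannot loop forever
        successors = [n for n in graph.get(path[-1], []) if n not in path]
        if not successors:
            sequences.append(path)
        for nxt in successors:
            queue.append(path + [nxt])
    return sequences

def ec_similarity(a, b):
    """Fraction of leading EC levels two EC numbers share (assumed score)."""
    shared = 0
    for x, y in zip(a.split("."), b.split(".")):
        if x != y:
            break
        shared += 1
    return shared / 4

def align_score(seq_a, seq_b, gap=-0.5):
    """Global alignment score between two step sequences (assumed gap cost)."""
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    score = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            match = score[i - 1][j - 1] + ec_similarity(seq_a[i - 1], seq_b[j - 1])
            score[i][j] = max(match, score[i - 1][j] + gap, score[i][j - 1] + gap)
    return score[-1][-1]

# Hypothetical toy map: one branch point after the second enzymatic step
toy_map = {"2.7.1.1": ["5.3.1.9"], "5.3.1.9": ["2.7.1.11", "1.1.1.49"]}
sequences = enzymatic_step_sequences(toy_map, "2.7.1.1")
```

    An all-against-all comparison would call `align_score` on every pair of sequences; selecting the significant pairs would additionally need a null model, which this sketch does not attempt.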

  1. The 2014 Nucleic Acids Research Database Issue and an updated NAR online Molecular Biology Database Collection.

    PubMed

    Fernández-Suárez, Xosé M; Rigden, Daniel J; Galperin, Michael Y

    2014-01-01

    The 2014 Nucleic Acids Research Database Issue includes descriptions of 58 new molecular biology databases and recent updates to 123 databases previously featured in NAR or other journals. For convenience, the issue is now divided into eight sections that reflect major subject categories. Among the highlights of this issue are six databases of the transcription factor binding sites in various organisms and updates on such popular databases as CAZy, Database of Genomic Variants (DGV), dbGaP, DrugBank, KEGG, miRBase, Pfam, Reactome, SEED, TCDB and UniProt. There is a strong block of structural databases, which includes, among others, the new RNA Bricks database, updates on PDBe, PDBsum, ArchDB, Gene3D, ModBase, Nucleic Acid Database and the recently revived iPfam database. An update on the NCBI's MMDB describes VAST+, an improved tool for protein structure comparison. Two articles highlight the development of the Structural Classification of Proteins (SCOP) database: one describes SCOPe, which automates assignment of new structures to the existing SCOP hierarchy; the other one describes the first version of SCOP2, with its more flexible approach to classifying protein structures. This issue also includes a collection of articles on bacterial taxonomy and metagenomics, which includes updates on the List of Prokaryotic Names with Standing in Nomenclature (LPSN), Ribosomal Database Project (RDP), the Silva/LTP project and several new metagenomics resources. The NAR online Molecular Biology Database Collection, http://www.oxfordjournals.org/nar/database/c/, has been expanded to 1552 databases. The entire Database Issue is freely available online on the Nucleic Acids Research website (http://nar.oxfordjournals.org/).

  2. Session on computation in biological pathways

    SciTech Connect

    Karp, P.D.; Riley, M.

    1996-12-31

    The papers in this session focus on the development of pathway databases and computational tools for pathway analysis. The discussion involves existing databases of sequenced genomes, as well as techniques for studying regulatory pathways.

  3. Gene expression profiling of epithelial ovarian cancer reveals key genes and pathways associated with chemotherapy resistance.

    PubMed

    Zhang, M; Luo, S C

    2016-01-22

    The aim of this study is to analyze gene expression data to identify key genes and pathways associated with resistance to platinum-based chemotherapy in epithelial ovarian cancer (EOC) and to improve clinical treatment strategies. The gene expression data set was downloaded from Gene Expression Omnibus and included 12 chemotherapy-resistant EOC samples and 16 chemotherapy-sensitive EOC samples. A differential analysis was performed to screen out differentially expressed genes (DEGs). A functional enrichment analysis was conducted for the DEGs using the Database for Annotation, Visualization and Integrated Discovery (DAVID). A protein-protein interaction (PPI) network was constructed with information from the Human Protein Reference Database (HPRD). Pathway-pathway interactions were determined with a test based on the hypergeometric distribution. A total of 1564 DEGs were identified in chemotherapy-sensitive EOC, including 654 upregulated genes and 910 downregulated genes. The top three upregulated genes were HIST1H3G, AKT3, and RTN3, while the top three downregulated genes were NBLA00301, TRIM62, and EPHA5. A Gene Ontology enrichment analysis showed that cell adhesion, biological adhesion, and intracellular signaling cascades were significantly enriched in the DEGs. A KEGG pathway enrichment analysis revealed that the calcium, mitogen-activated protein kinase, and B cell receptor signaling pathways were significantly over-represented in the DEGs. A PPI network containing 101 interactions was acquired. The top three hub genes were RAC1, CAV1, and BCL2. Five modules were identified from the PPI network. Taken together, these findings could advance the understanding of the molecular mechanisms underlying intrinsic chemotherapy resistance in EOC.
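
    A test based on the hypergeometric distribution, as mentioned above, asks how likely it is to see at least the observed overlap between two gene sets by chance. The sketch below computes that upper-tail probability; the function and parameter names are illustrative, and how the study defined its gene universe and pathway pairs is not stated here.

```python
from math import comb

def hypergeom_pvalue(population, successes, draws, observed):
    """P(X >= observed) for X ~ Hypergeometric(population, successes, draws).

    population: size of the gene universe
    successes:  genes in the universe that belong to the first set
    draws:      size of the second set
    observed:   number of genes shared by both sets
    """
    upper = min(successes, draws)
    total = comb(population, draws)
    tail = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(observed, upper + 1)
    )
    return tail / total
```

    Exact integer arithmetic with `math.comb` avoids rounding in the binomial coefficients; a small p-value indicates more overlap than random sampling would be expected to produce.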

  4. Systematization of the protein sequence diversity in enzymes related to secondary metabolic pathways in plants, in the context of big data biology inspired by the KNApSAcK motorcycle database.

    PubMed

    Ikeda, Shun; Abe, Takashi; Nakamura, Yukiko; Kibinge, Nelson; Hirai Morita, Aki; Nakatani, Atsushi; Ono, Naoaki; Ikemura, Toshimichi; Nakamura, Kensuke; Altaf-Ul-Amin, Md; Kanaya, Shigehiko

    2013-05-01

    Biology is increasingly becoming a data-intensive science with the recent progress of the omics fields, e.g. genomics, transcriptomics, proteomics and metabolomics. The species-metabolite relationship database, KNApSAcK Core, has been widely utilized and cited in metabolomics research, and chronological analysis of that research work has helped to reveal recent trends in metabolomics research. To meet the needs of these trends, the KNApSAcK database has been extended by incorporating a secondary metabolic pathway database called Motorcycle DB. We examined the enzyme sequence diversity related to secondary metabolism by means of batch-learning self-organizing maps (BL-SOMs). Initially, we constructed a map by using a big data matrix consisting of the frequencies of all possible dipeptides in the protein sequence segments of plants and bacteria. The enzyme sequence diversity of the secondary metabolic pathways was examined by identifying clusters of segments associated with certain enzyme groups in the resulting map. The extent of diversity of 15 secondary metabolic enzyme groups is discussed. Data-intensive approaches such as BL-SOM applied to big data matrices are needed for systematizing protein sequences. Handling big data has become an inevitable part of biology.
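
    The input to the map described above is a matrix of dipeptide frequencies, one row per protein sequence segment. A minimal sketch of building one such row is given below; it assumes the 20 standard amino acids (hence 400 possible dipeptides), overlapping dipeptide windows, and normalization to relative frequencies, none of which is spelled out in the abstract. The BL-SOM training itself is not shown.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 20 x 20 = 400 ordered pairs, in a fixed order shared by every segment
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]

def dipeptide_frequencies(segment):
    """Return a 400-element vector of relative dipeptide frequencies."""
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(segment) - 1):
        pair = segment[i:i + 2]
        if pair in counts:  # ignore pairs with non-standard residues
            counts[pair] += 1
            total += 1
    return [counts[d] / total if total else 0.0 for d in DIPEPTIDES]
```

    Stacking the vectors for all segments gives the data matrix; because every row uses the same dipeptide order, the columns are directly comparable across segments.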

  5. Path2Models: large-scale generation of computational models from biochemical pathway maps

    PubMed Central

    2013-01-01

    Background: Systems biology projects and omics technologies have led to a growing number of biochemical pathway models and reconstructions. However, the majority of these models are still created de novo, based on literature mining and the manual processing of pathway data. Results: To increase the efficiency of model creation, the Path2Models project has automatically generated mathematical models from pathway representations using a suite of freely available software. Data sources include KEGG, BioCarta, MetaCyc and SABIO-RK. Depending on the source data, three types of models are provided: kinetic, logical and constraint-based. Models from over 2,600 organisms are encoded consistently in SBML, and are made freely available through BioModels Database at http://www.ebi.ac.uk/biomodels-main/path2models. Each model contains the list of participants, their interactions, the relevant mathematical constructs, and initial parameter values. Most models are also available as easy-to-understand graphical SBGN maps. Conclusions: To date, the project has resulted in more than 140,000 freely available models. Such a resource can tremendously accelerate the development of mathematical models by providing initial starting models for simulation and analysis, which can be subsequently curated and further parameterized. PMID:24180668

  6. LEGER: knowledge database and visualization tool for comparative genomics of pathogenic and non-pathogenic Listeria species

    PubMed Central

    Dieterich, Guido; Kärst, Uwe; Fischer, Elmar; Wehland, Jürgen; Jänsch, Lothar

    2006-01-01

    Listeria species are ubiquitous in the environment and often contaminate foods because they grow under conditions used for food preservation. Listeria monocytogenes, the human and animal pathogen, causes listeriosis, an infection with a high mortality rate in risk groups such as immune-compromised individuals. Furthermore, L. monocytogenes is a model organism for the study of intracellular bacterial pathogens. The publication of its genome sequence and that of the non-pathogenic species Listeria innocua initiated numerous comparative studies and efforts to sequence all species comprising the genus. The proteome database LEGER was developed to support functional genome analyses by combining information obtained by applying bioinformatics methods and from public databases to improve the original annotations. LEGER offers three unique key features: (i) it is the first comprehensive information system focusing on the functional assignment of genes and proteins; (ii) integrated visualization tools, KEGG pathway and Genome Viewer, ease the functional exploration of complex data; and (iii) LEGER presents results of systematic post-genome studies, thus facilitating analyses combining computational and experimental results. Moreover, LEGER provides an unpublished membrane proteome analysis of L. innocua and in total visualizes experimentally validated information about the subcellular localizations of 789 different listerial proteins. PMID:16381897

  7. EuDBase: An online resource for automated EST analysis pipeline (ESTFrontier) and database for red seaweed Eucheuma denticulatum.

    PubMed

    Hussein, Zeti Azura Mohamed; Loke, Kok Keong; Abidin, Rabiatul Adawiah Zainal; Othman, Roohaida

    2011-01-01

    Functional genomics has proven to be an efficient tool in identifying genes involved in various biological functions. However, functional resources for the commercially important seaweed Eucheuma denticulatum are still limited. EuDBase is the first seaweed online repository that provides integrated access to ESTs of Eucheuma denticulatum generated from samples collected from Kudat and Semporna in Sabah, Malaysia. The database stores 10,031 ESTs that are clustered and assembled into 2,275 unique transcripts (UT) and 955 singletons. Raw data were automatically processed using ESTFrontier, an in-house automated EST analysis pipeline. Data are stored in a MySQL database. The web interface is implemented in PHP and allows browsing and querying EuDBase through a search engine. Data are searchable via BLAST hit, domain search, Gene Ontology or KEGG Pathway. A user-friendly interface allows the identification of sequences using either a simple text query or a similarity search. EuDBase was developed to store, manage and analyze the E. denticulatum ESTs and to provide cumulative digital resources for use by the global scientific community. EuDBase is freely available from http://www.inbiosis.ukm.my/eudbase/.

  8. EuDBase: An online resource for automated EST analysis pipeline (ESTFrontier) and database for red seaweed Eucheuma denticulatum

    PubMed Central

    Hussein, Zeti Azura Mohamed; Loke, Kok Keong; Abidin, Rabiatul Adawiah Zainal; Othman, Roohaida

    2011-01-01

    Functional genomics has proven to be an efficient tool in identifying genes involved in various biological functions. However, functional resources for the commercially important seaweed Eucheuma denticulatum are still limited. EuDBase is the first seaweed online repository that provides integrated access to ESTs of Eucheuma denticulatum generated from samples collected from Kudat and Semporna in Sabah, Malaysia. The database stores 10,031 ESTs that are clustered and assembled into 2,275 unique transcripts (UT) and 955 singletons. Raw data were automatically processed using ESTFrontier, an in-house automated EST analysis pipeline. Data are stored in a MySQL database. The web interface is implemented in PHP and allows browsing and querying EuDBase through a search engine. Data are searchable via BLAST hit, domain search, Gene Ontology or KEGG Pathway. A user-friendly interface allows the identification of sequences using either a simple text query or a similarity search. EuDBase was developed to store, manage and analyze the E. denticulatum ESTs and to provide cumulative digital resources for use by the global scientific community. EuDBase is freely available from http://www.inbiosis.ukm.my/eudbase/. PMID:22102771

  9. A toolbox model of evolution of metabolic pathways on networks of arbitrary topology.

    PubMed

    Pang, Tin Yau; Maslov, Sergei

    2011-05-01

    In prokaryotic genomes the number of transcriptional regulators is known to be proportional to the square of the total number of protein-coding genes. A toolbox model of evolution was recently proposed to explain this empirical scaling for metabolic enzymes and their regulators. According to its rules, the metabolic network of an organism evolves by horizontal transfer of pathways from other species. These pathways are part of a larger "universal" network formed by the union of all species-specific networks. It remained to be understood, however, how the topological properties of this universal network influence the scaling law of functional content of genomes in the toolbox model. Here we answer this question by first analyzing the scaling properties of the toolbox model on arbitrary tree-like universal networks. We prove that critical branching topology, in which the average number of upstream neighbors of a node is equal to one, is both necessary and sufficient for quadratic scaling. We further generalize the rules of the model to incorporate reactions with multiple substrates/products as well as branched and cyclic metabolic pathways. To achieve its metabolic tasks, the new model employs evolutionary optimized pathways with minimal number of reactions. Numerical simulations of this realistic model on the universal network of all reactions in the KEGG database produced approximately quadratic scaling between the number of regulated pathways and the size of the metabolic network. To quantify the geometrical structure of individual pathways, we investigated the relationship between their number of reactions, byproducts, intermediate, and feedback metabolites. Our results validate and explain the ubiquitous appearance of the quadratic scaling for a broad spectrum of topologies of underlying universal metabolic networks. They also demonstrate why, in spite of "small-world" topology, real-life metabolic networks are characterized by a broad distribution of pathway

  10. Tissue Non-Specific Genes and Pathways Associated with Diabetes: An Expression Meta-Analysis.

    PubMed

    Mei, Hao; Li, Lianna; Liu, Shijian; Jiang, Fan; Griswold, Michael; Mosley, Thomas

    2017-01-21

    We performed expression studies to identify tissue non-specific genes and pathways of diabetes by meta-analysis. We searched curated datasets of the Gene Expression Omnibus (GEO) database and identified 13 and five expression studies of diabetes and insulin responses in various tissues, respectively. We tested differential gene expression using an empirical Bayes-based linear method and investigated gene set expression association by knowledge-based enrichment analysis. Meta-analysis by different methods was applied to identify tissue non-specific genes and gene sets. We also proposed pathway mapping analysis to infer functions of the identified gene sets, and correlation and independent analysis to evaluate the expression association profile of genes and gene sets between studies and tissues. Our analysis showed that the PGRMC1 and HADH genes were significant over diabetes studies, while the IRS1 and MPST genes were significant over insulin response studies, and joint analysis showed that the HADH and MPST genes were significant over all combined data sets. The pathway analysis identified six significant gene sets over all studies. The KEGG pathway mapping indicated that the significant gene sets are related to diabetes pathogenesis. The results also showed that 12.8% and 59.0% of pairwise studies had significantly correlated expression association for genes and gene sets, respectively; moreover, 12.8% of pairwise studies had independent expression association for genes, but no studies were observed to differ significantly in expression association of gene sets. Our analysis indicated that there are both tissue-specific and non-specific genes and pathways associated with diabetes pathogenesis. Compared to gene expression, pathway association tends to be tissue non-specific, and a common pathway influencing diabetes development is activated through different genes in different tissues.

  11. A Toolbox Model of Evolution of Metabolic Pathways on Networks of Arbitrary Topology

    SciTech Connect

    Maslov, S.; Pang, T.Y.

    2011-05-01

    In prokaryotic genomes the number of transcriptional regulators is known to be proportional to the square of the total number of protein-coding genes. A toolbox model of evolution was recently proposed to explain this empirical scaling for metabolic enzymes and their regulators. According to its rules, the metabolic network of an organism evolves by horizontal transfer of pathways from other species. These pathways are part of a larger 'universal' network formed by the union of all species-specific networks. It remained to be understood, however, how the topological properties of this universal network influence the scaling law of functional content of genomes in the toolbox model. Here we answer this question by first analyzing the scaling properties of the toolbox model on arbitrary tree-like universal networks. We prove that critical branching topology, in which the average number of upstream neighbors of a node is equal to one, is both necessary and sufficient for quadratic scaling. We further generalize the rules of the model to incorporate reactions with multiple substrates/products as well as branched and cyclic metabolic pathways. To achieve its metabolic tasks, the new model employs evolutionary optimized pathways with minimal number of reactions. Numerical simulations of this realistic model on the universal network of all reactions in the KEGG database produced approximately quadratic scaling between the number of regulated pathways and the size of the metabolic network. To quantify the geometrical structure of individual pathways, we investigated the relationship between their number of reactions, byproducts, intermediate, and feedback metabolites. Our results validate and explain the ubiquitous appearance of the quadratic scaling for a broad spectrum of topologies of underlying universal metabolic networks. They also demonstrate why, in spite of 'small-world' topology, real-life metabolic networks are characterized by a broad distribution of pathway

  12. A Toolbox Model of Evolution of Metabolic Pathways on Networks of Arbitrary Topology

    PubMed Central

    Pang, Tin Yau; Maslov, Sergei

    2011-01-01

    In prokaryotic genomes the number of transcriptional regulators is known to be proportional to the square of the total number of protein-coding genes. A toolbox model of evolution was recently proposed to explain this empirical scaling for metabolic enzymes and their regulators. According to its rules, the metabolic network of an organism evolves by horizontal transfer of pathways from other species. These pathways are part of a larger “universal” network formed by the union of all species-specific networks. It remained to be understood, however, how the topological properties of this universal network influence the scaling law of functional content of genomes in the toolbox model. Here we answer this question by first analyzing the scaling properties of the toolbox model on arbitrary tree-like universal networks. We prove that critical branching topology, in which the average number of upstream neighbors of a node is equal to one, is both necessary and sufficient for quadratic scaling. We further generalize the rules of the model to incorporate reactions with multiple substrates/products as well as branched and cyclic metabolic pathways. To achieve its metabolic tasks, the new model employs evolutionary optimized pathways with minimal number of reactions. Numerical simulations of this realistic model on the universal network of all reactions in the KEGG database produced approximately quadratic scaling between the number of regulated pathways and the size of the metabolic network. To quantify the geometrical structure of individual pathways, we investigated the relationship between their number of reactions, byproducts, intermediate, and feedback metabolites. Our results validate and explain the ubiquitous appearance of the quadratic scaling for a broad spectrum of topologies of underlying universal metabolic networks. They also demonstrate why, in spite of “small-world” topology, real-life metabolic networks are characterized by a broad distribution of

  13. Identification of hub genes and pathways associated with retinoblastoma based on co-expression network analysis.

    PubMed

    Wang, Q L; Chen, X; Zhang, M H; Shen, Q H; Qin, Z M

    2015-12-08

    The objective of this paper was to identify hub genes and pathways associated with retinoblastoma using centrality analysis of the co-expression network and pathway-enrichment analysis. The co-expression network of retinoblastoma was constructed by weighted gene co-expression network analysis (WGCNA) based on differentially expressed (DE) genes, and clusters were obtained through the molecular complex detection (MCODE) algorithm. Degree centrality analysis of the co-expression network was performed to explore hub genes present in retinoblastoma. Pathway-enrichment analysis was performed using the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. Validation of hub gene expression in retinoblastoma was performed by reverse transcription-polymerase chain reaction (RT-PCR) analysis. The co-expression network based on 221 DE genes between retinoblastoma and normal controls consisted of 210 nodes and 3965 edges, and 5 clusters of the network were evaluated. Centrality analysis of the co-expression network identified 21 hub genes, such as SNORD115-41, RASSF2, and SNORD115-44. According to RT-PCR analysis, 16 of the 21 hub genes were differentially expressed, including RASSF2 and CDCA7, and 5 were not differentially expressed in retinoblastoma compared to normal controls. Pathway analysis showed that genes in 2 clusters were enriched in 3 pathways: purine metabolism, p53 signaling pathway, and melanogenesis. In this study, we successfully identified 16 hub genes and 3 pathways associated with retinoblastoma, which may be potential biomarkers for early detection and therapy for retinoblastoma.
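
    The degree centrality step described in the abstract above can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names degree_centrality and top_hubs, the normalization by (n - 1), and the toy gene labels are all assumptions made for illustration, and the edge list stands in for a network that would normally come from WGCNA.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Return normalized degree centrality for every node of an undirected graph.

    edges: iterable of (node_u, node_v) pairs; duplicate edges and self-loops
    are ignored. Normalization by (n - 1) is an assumed convention.
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        if u == v:
            continue  # a self-loop does not count towards degree here
        neighbors[u].add(v)
        neighbors[v].add(u)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(nbrs) / (n - 1) for node, nbrs in neighbors.items()}


def top_hubs(edges, k):
    """Return the k highest-centrality nodes; ties are broken alphabetically."""
    scores = degree_centrality(edges)
    return sorted(scores, key=lambda node: (-scores[node], node))[:k]


# Hypothetical toy co-expression network; the labels are placeholders,
# not genes reported in the study.
toy_edges = [
    ("geneA", "geneB"),
    ("geneA", "geneC"),
    ("geneA", "geneD"),
    ("geneB", "geneC"),
    ("geneD", "geneE"),
]

if __name__ == "__main__":
    print(degree_centrality(toy_edges))
    print(top_hubs(toy_edges, 2))
```

    In this toy network geneA touches three of the four other nodes, so it ranks first; a real analysis would apply its own cutoff for calling a node a hub.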

  14. Tissue Non-Specific Genes and Pathways Associated with Diabetes: An Expression Meta-Analysis

    PubMed Central

    Mei, Hao; Li, Lianna; Liu, Shijian; Jiang, Fan; Griswold, Michael; Mosley, Thomas

    2017-01-01

    We performed expression studies to identify tissue non-specific genes and pathways of diabetes by meta-analysis. We searched curated datasets of the Gene Expression Omnibus (GEO) database and identified 13 and five expression studies of diabetes and insulin responses in various tissues, respectively. We tested differential gene expression using an empirical Bayes-based linear method and investigated gene set expression association by knowledge-based enrichment analysis. Meta-analysis by different methods was applied to identify tissue non-specific genes and gene sets. We also proposed pathway mapping analysis to infer functions of the identified gene sets, and correlation and independent analysis to evaluate the expression association profile of genes and gene sets between studies and tissues. Our analysis showed that the PGRMC1 and HADH genes were significant over diabetes studies, while the IRS1 and MPST genes were significant over insulin response studies, and joint analysis showed that the HADH and MPST genes were significant over all combined data sets. The pathway analysis identified six significant gene sets over all studies. The KEGG pathway mapping indicated that the significant gene sets are related to diabetes pathogenesis. The results also showed that 12.8% and 59.0% of pairwise studies had significantly correlated expression association for genes and gene sets, respectively; moreover, 12.8% of pairwise studies had independent expression association for genes, but no studies were observed to differ significantly in expression association of gene sets. Our analysis indicated that there are both tissue-specific and non-specific genes and pathways associated with diabetes pathogenesis. Compared to gene expression, pathway association tends to be tissue non-specific, and a common pathway influencing diabetes development is activated through different genes in different tissues. PMID:28117714

  15. Conceptualizing adverse outcome pathways for ...

    EPA Pesticide Factsheets

    Cyclooxygenase (COX) inhibition is of concern in fish because COX inhibitors (e.g., ibuprofen) are ubiquitous in aquatic systems/fish tissues, and can disrupt synthesis of prostaglandins that modulate a variety of essential biological functions (e.g., reproduction). This study utilized newly generated high content (transcriptomic and metabolomic) empirical data in combination with existing high throughput (ACTOR, epa.gov) toxicity data to facilitate development of adverse outcome pathways (AOPs) for the molecular initiating event (MIE) of COX inhibition. We examined effects of a waterborne, 96 h exposure to three COX inhibitors (indomethacin (IN; 100 µg/L), ibuprofen (IB; 200 µg/L) and celecoxib (CX; 20 µg/L)) on the liver metabolome and ovarian gene expression (using an oligonucleotide microarray 4 x 15K platform) in sexually mature fathead minnows (n=8). Differentially expressed genes were identified (t-test, p < 0.01), and functional analyses were performed to determine enriched Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways (p < 0.05). Principal component analysis indicated that liver metabolomics profiles of IN, IB and CX were not significantly different from control or one another. When compared to control, exposure to IB and CX resulted in differential expression of comparable numbers of genes (IB = 433, CX = 545). In contrast, 2558 genes were differentially expressed in IN-treated fish. KEGG pathway analyses show that IN had extensive effects on oocyte meios

  16. Altered molecular expression of the TLR4/NF-κB signaling pathway in mammary tissue of Chinese Holstein cattle with mastitis.

    PubMed

    Wu, Jie; Li, Lian; Sun, Yu; Huang, Shuai; Tang, Juan; Yu, Pan; Wang, Genlin

    2015-01-01

    Toll-like receptor 4 (TLR4) mediated activation of the nuclear transcription factor κB (NF-κB) signaling pathway by mastitis initiates expression of genes associated with inflammation and the innate immune response. In this study, the profile of mastitis-induced differential gene expression in the mammary tissue of Chinese Holstein cattle was investigated by Gene-Chip microarray and bioinformatics. The microarray results revealed that 79 genes associated with the TLR4/NF-κB signaling pathway were differentially expressed. Of these genes, 19 were up-regulated and 29 were down-regulated in mastitis tissue compared to normal, healthy tissue. Statistical analysis of transcript and protein level expression changes indicated that 10 genes, namely TLR4, MyD88, IL-6, and IL-10, were up-regulated, while CD14, TNF-α, MD-2, IL-β, NF-κB, and IL-12 were significantly down-regulated in mastitis tissue in comparison with normal tissue. Analyses using bioinformatics database resources, such as the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis and the Gene Ontology Consortium (GO) for term enrichment analysis, suggested that these differentially expressed genes implicate different regulatory pathways for immune function in the mammary gland. In conclusion, our study provides new evidence for better understanding the differential expression and mechanisms of the TLR4/NF-κB signaling pathway in Chinese Holstein cattle with mastitis.

  17. Altered Molecular Expression of the TLR4/NF-κB Signaling Pathway in Mammary Tissue of Chinese Holstein Cattle with Mastitis

    PubMed Central

    Wu, Jie; Li, Lian; Sun, Yu; Huang, Shuai; Tang, Juan; Yu, Pan; Wang, Genlin

    2015-01-01

    Toll-like receptor 4 (TLR4) mediated activation of the nuclear transcription factor κB (NF-κB) signaling pathway by mastitis initiates expression of genes associated with inflammation and the innate immune response. In this study, the profile of mastitis-induced differential gene expression in the mammary tissue of Chinese Holstein cattle was investigated by Gene-Chip microarray and bioinformatics. The microarray results revealed that 79 genes associated with the TLR4/NF-κB signaling pathway were differentially expressed. Of these genes, 19 were up-regulated and 29 were down-regulated in mastitis tissue compared to normal, healthy tissue. Statistical analysis of transcript and protein level expression changes indicated that 10 genes, namely TLR4, MyD88, IL-6, and IL-10, were up-regulated, while CD14, TNF-α, MD-2, IL-β, NF-κB, and IL-12 were significantly down-regulated in mastitis tissue in comparison with normal tissue. Analyses using bioinformatics database resources, such as the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis and the Gene Ontology Consortium (GO) for term enrichment analysis, suggested that these differentially expressed genes implicate different regulatory pathways for immune function in the mammary gland. In conclusion, our study provides new evidence for better understanding the differential expression and mechanisms of the TLR4/NF-κB signaling pathway in Chinese Holstein cattle with mastitis. PMID:25706977
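
    Several records in this listing report KEGG pathway or GO term enrichment without stating the test used. A common choice for this kind of over-representation analysis is a one-sided hypergeometric test, sketched below under that assumption. The function name hypergeom_enrichment_p and the numbers in the example are hypothetical and are not taken from any of the studies.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_pathway, n_selected, n_overlap):
    """One-sided hypergeometric p-value P(X >= n_overlap).

    n_background: number of genes in the background set
    n_pathway:    background genes annotated to the pathway
    n_selected:   genes in the list of interest (e.g. differentially expressed)
    n_overlap:    genes of interest that are annotated to the pathway
    """
    upper = min(n_pathway, n_selected)
    total = comb(n_background, n_selected)
    # Sum the probability mass of every overlap at least as large as observed.
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical numbers: 10 background genes, 3 in the pathway,
    # 3 genes of interest, all 3 of them in the pathway.
    print(hypergeom_enrichment_p(10, 3, 3, 3))
```

    In practice the resulting p-values would also be corrected for multiple testing across all pathways examined, which this sketch leaves out.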

  18. Enchytraeus albidus Microarray: Enrichment, Design, Annotation and Database (EnchyBASE)

    PubMed Central

    Novais, Sara C.; Arrais, Joel; Lopes, Pedro; Vandenbrouck, Tine; De Coen, Wim; Roelofs, Dick; Soares, Amadeu M. V. M.; Amorim, Mónica J. B.

    2012-01-01

    Enchytraeus albidus (Oligochaeta) is an ecologically relevant species used as a standard test organism for risk assessment. Effects of stressors in this species are commonly determined at the population level using reproduction and survival as endpoints. The assessment of transcriptomic responses can be very useful, e.g. to understand underlying mechanisms of toxicity with gene expression fingerprinting. The present paper addresses the following: 1) development of suppressive subtractive hybridization (SSH) libraries enriched for differentially expressed genes after metal and pesticide exposures; 2) sequencing and characterization of all generated cDNA inserts; 3) development of a publicly available genomic database on E. albidus. A total of 2100 Expressed Sequence Tags (ESTs) were isolated, sequenced and assembled into 1124 clusters (947 singletons and 177 contigs). From these sequences, 41% matched known proteins in GenBank (BLASTX, e-value ≤ 10^-5) and 37% had at least one Gene Ontology (GO) term assigned. In total, 5.5% of the sequences were assigned to a metabolic pathway, based on KEGG. With this new sequencing information, an Agilent custom oligonucleotide microarray was designed, representing a potential tool for transcriptomic studies. EnchyBASE (http://bioinformatics.ua.pt/enchybase/) was developed as a freely available web database containing genomic information on E. albidus and will be further extended in the near future for other enchytraeid species. The database so far includes all ESTs generated for E. albidus from three cDNA libraries. This information can be downloaded and applied in functional genomics and transcription studies. PMID:22558086

  19. Biofuel Database

    National Institute of Standards and Technology Data Gateway

    Biofuel Database (Web, free access)   This database brings together structural, biological, and thermodynamic data for enzymes that are either in current use or are being considered for use in the production of biofuels.

  20. Database Administrator

    ERIC Educational Resources Information Center

    Moore, Pam

    2010-01-01

    The Internet and electronic commerce (e-commerce) generate lots of data. Data must be stored, organized, and managed. Database administrators, or DBAs, work with database software to find ways to do this. They identify user needs, set up computer databases, and test systems. They ensure that systems perform as they should and add people to the…

  1. FIREMON Database

    Treesearch

    John F. Caratti

    2006-01-01

    The FIREMON database software allows users to enter data, store, analyze, and summarize plot data, photos, and related documents. The FIREMON database software consists of a Java application and a Microsoft® Access database. The Java application provides the user interface with FIREMON data through data entry forms, data summary reports, and other data management tools...

  2. Database Administrator

    ERIC Educational Resources Information Center

    Moore, Pam

    2010-01-01

    The Internet and electronic commerce (e-commerce) generate lots of data. Data must be stored, organized, and managed. Database administrators, or DBAs, work with database software to find ways to do this. They identify user needs, set up computer databases, and test systems. They ensure that systems perform as they should and add people to the…

  3. The Importance of Biological Databases in Biological Discovery.

    PubMed

    Baxevanis, Andreas D; Bateman, Alex

    2015-06-19

    Biological databases play a central role in bioinformatics. They offer scientists the opportunity to access a wide variety of biologically relevant data, including the genomic sequences of an increasingly broad range of organisms. This unit provides a brief overview of major sequence databases and portals, such as GenBank, the UCSC Genome Browser, and Ensembl. Model organism databases, including WormBase, The Arabidopsis Information Resource (TAIR), and those made available through the Mouse Genome Informatics (MGI) resource, are also covered. Non-sequence-centric databases, such as Online Mendelian Inheritance in Man (OMIM), the Protein Data Bank (PDB), MetaCyc, and the Kyoto Encyclopedia of Genes and Genomes (KEGG), are also discussed. Copyright © 2015 John Wiley & Sons, Inc.

  4. Prediction and analysis of retinoblastoma related genes through gene ontology and KEGG.

    PubMed

    Li, Zhen; Li, Bi-Qing; Jiang, Min; Chen, Lei; Zhang, Jian; Liu, Lin; Huang, Tao

    2013-01-01

    One of the most important and challenging problems in biomedicine is how to predict cancer-related genes. Retinoblastoma (RB) is the most common primary intraocular malignancy, usually occurring in childhood. Early detection of RB could reduce morbidity and increase the probability of disease-free survival. Therefore, it is of great importance to identify RB genes. In this study, we developed a computational method to predict RB-related genes based on Dagging, with the maximum relevance minimum redundancy (mRMR) method followed by incremental feature selection (IFS). 119 RB genes were compiled from two previous RB-related studies, while 5,500 non-RB genes were randomly selected from Ensembl genes. Ten datasets were constructed based on all these RB and non-RB genes. Each gene was encoded with a 13,126-dimensional vector including 12,887 Gene Ontology enrichment scores and 239 KEGG enrichment scores. Finally, an optimal feature set including 1061 GO terms and 8 KEGG pathways was obtained. Analysis showed that these features were closely related to RB. It is anticipated that the method can be applied to predict other cancer-related genes as well.

  5. Transcriptome Analysis and Discovery of Genes Involved in Immune Pathways from Hepatopancreas of Microbial Challenged Mitten Crab Eriocheir sinensis

    PubMed Central

    Li, Xihong; Cui, Zhaoxia; Liu, Yuan; Song, Chengwen; Shi, Guohui

    2013-01-01

    Background: The Chinese mitten crab Eriocheir sinensis is an economically important crustacean that has been seriously affected by various diseases, which calls for more information on immune-relevant genes at the genome level. Recently, high-throughput RNA sequencing (RNA-seq) technology has provided a powerful and efficient method for transcript analysis and immune gene discovery. Methods/Principal Findings: A cDNA library from hepatopancreas of E. sinensis challenged by a mixture of three pathogen strains (Gram-positive bacteria Micrococcus luteus, Gram-negative bacteria Vibrio alginolyticus and fungi Pichia pastoris; 10^8 cfu·mL^-1) was constructed and randomly sequenced using the Illumina technique. In total, 39.76 million clean reads were assembled into 70,300 unigenes. After ruling out short-length and low-quality sequences, 52,074 non-redundant unigenes were compared to public databases for homology searching and 17,617 of them showed high similarity to sequences in the NCBI non-redundant protein (Nr) database. For function classification and pathway assignment, 18,734 (36.00%) unigenes were categorized to three Gene Ontology (GO) categories, 12,243 (23.51%) were classified to 25 Clusters of Orthologous Groups (COG), and 8,983 (17.25%) were assigned to six Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. Potentially, 24, 14, 47 and 132 unigenes were characterized to be involved in Toll, IMD, JAK-STAT and MAPK pathways, respectively. Conclusions/Significance: This is the first systematic transcriptome analysis of components relating to innate immune pathways in E. sinensis. Functional genes and putative pathways identified here will contribute to a better understanding of the immune system and to preventing various diseases in crab. PMID:23874555

  6. Transcriptome analysis and discovery of genes involved in immune pathways from hepatopancreas of microbial challenged mitten crab Eriocheir sinensis.

    PubMed

    Li, Xihong; Cui, Zhaoxia; Liu, Yuan; Song, Chengwen; Shi, Guohui

    2013-01-01

    The Chinese mitten crab Eriocheir sinensis is an economically important crustacean that has been seriously affected by various diseases, which calls for more information on immune-relevant genes at the genome level. Recently, high-throughput RNA sequencing (RNA-seq) technology has provided a powerful and efficient method for transcript analysis and immune gene discovery. A cDNA library from hepatopancreas of E. sinensis challenged by a mixture of three pathogen strains (Gram-positive bacteria Micrococcus luteus, Gram-negative bacteria Vibrio alginolyticus and fungi Pichia pastoris; 10^8 cfu·mL^-1) was constructed and randomly sequenced using the Illumina technique. In total, 39.76 million clean reads were assembled into 70,300 unigenes. After ruling out short-length and low-quality sequences, 52,074 non-redundant unigenes were compared to public databases for homology searching and 17,617 of them showed high similarity to sequences in the NCBI non-redundant protein (Nr) database. For function classification and pathway assignment, 18,734 (36.00%) unigenes were categorized to three Gene Ontology (GO) categories, 12,243 (23.51%) were classified to 25 Clusters of Orthologous Groups (COG), and 8,983 (17.25%) were assigned to six Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. Potentially, 24, 14, 47 and 132 unigenes were characterized to be involved in Toll, IMD, JAK-STAT and MAPK pathways, respectively. This is the first systematic transcriptome analysis of components relating to innate immune pathways in E. sinensis. Functional genes and putative pathways identified here will contribute to a better understanding of the immune system and to preventing various diseases in crab.

  7. Role and mechanism of the AMPK pathway in waterborne Zn exposure influencing the hepatic energy metabolism of Synechogobius hasta

    NASA Astrophysics Data System (ADS)

    Wu, Kun; Huang, Chao; Shi, Xi; Chen, Feng; Xu, Yi-Huan; Pan, Ya-Xiong; Luo, Zhi; Liu, Xu

    2016-12-01

    Previous studies have investigated the physiological responses in the liver of Synechogobius hasta exposed to waterborne zinc (Zn). However, at present, very little is known about the underlying molecular mechanisms of these responses. In this study, RNA sequencing (RNA-seq) was performed to analyse the differences in the hepatic transcriptomes between control and Zn-exposed S. hasta. A total of 36,339 unigenes were detected, with a unigene N50 of 1,615 bp. These genes were further annotated to the Nonredundant protein (NR), Nonredundant nucleotide (Nt), Swiss-Prot, Kyoto Encyclopedia of Genes and Genomes (KEGG), Clusters of Orthologous Groups (COG) and Gene Ontology (GO) databases. After 60 days of Zn exposure, 708 and 237 genes were significantly up- and down-regulated, respectively. Many differentially expressed genes (DEGs) involved in energy metabolic pathways were identified, and their expression profiles suggested increased catabolic processes and reduced biosynthetic processes. These changes indicated that waterborne Zn exposure increased the energy production and requirement, which was related to the activation of the AMPK signalling pathway. Furthermore, using the primary hepatocytes of S. hasta, we identified the role of the AMPK signalling pathway in Zn-influenced energy metabolism.
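
    The unigene N50 quoted in the abstract above is a standard assembly statistic: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembled bases. A minimal sketch of that calculation follows; the function name n50 and the toy lengths are illustrative assumptions and are unrelated to the study's data.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are taken from longest to shortest until their cumulative
    length reaches at least half of the total; the length of the last
    sequence taken is the N50.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


if __name__ == "__main__":
    # Hypothetical unigene lengths in bp.
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

    For the toy lengths the total is 54, and the three longest sequences (10, 9 and 8) already sum to 27, so the N50 is 8.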

  8. De novo assembly of Eugenia uniflora L. transcriptome and identification of genes from the terpenoid biosynthesis pathway.

    PubMed

    Guzman, Frank; Kulcheski, Franceli Rodrigues; Turchetto-Zolet, Andreia Carina; Margis, Rogerio

    2014-12-01

    Pitanga (Eugenia uniflora L.) is a member of the Myrtaceae family and is of particular interest due to its medicinal properties that are attributed to specialized metabolites with known biological activities. Among these molecules, terpenoids are the most abundant in essential oils that are found in the leaves and represent compounds with potential pharmacological benefits. The terpene diversity observed in Myrtaceae is determined by the activity of different members of the terpene synthase and oxidosqualene cyclase families. Therefore, the aim of this study was to perform a de novo assembly of transcripts from E. uniflora leaves and to annotate them in order to identify the genes potentially involved in the terpenoid biosynthesis pathway and terpene diversity. In total, 72,742 unigenes with a mean length of 1,048 bp were identified. Of these, 43,631 and 36,289 were annotated with the NCBI non-redundant protein and Swiss-Prot databases, respectively. The gene ontology categorized the sequences into 53 functional groups. A metabolic pathway analysis with KEGG revealed 8,625 unigenes assigned to 141 metabolic pathways and 40 unigenes predicted to be associated with the biosynthesis of terpenoids. Furthermore, we identified four putative full-length terpene synthase genes involved in sesquiterpene and monoterpene biosynthesis, and three putative full-length oxidosqualene cyclase genes involved in triterpene biosynthesis. The expression of these genes was validated in different E. uniflora tissues. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.

  9. Analysis of schizophrenia and hepatocellular carcinoma genetic network with corresponding modularity and pathways: novel insights to the immune system

    PubMed Central

    2013-01-01

    Background Schizophrenic patients show lower incidences of cancer, suggesting that schizophrenia may be a protective factor against cancer. To study the genetic correlation between the two diseases, a specific PPI network was constructed with candidate genes of both schizophrenia and hepatocellular carcinoma. The network, designated schizophrenia-hepatocellular carcinoma network (SHCN), was analysed and cliques were identified as potential functional modules or complexes. The findings were compared with information from pathway databases such as KEGG, Reactome, PID and ConsensusPathDB. Results The functions of mediator genes from SHCN show that immune system and cell cycle regulation have important roles in the etiology mechanism of schizophrenia. For example, the over-expressing schizophrenia candidate genes, SIRPB1, SYK and LCK, are responsible for signal transduction in cytokine production; immune responses involving IL-2 and TREM-1/DAP12 pathways are relevant for the etiology mechanism of schizophrenia. Novel treatments were proposed by searching the target genes of FDA approved drugs with genes in potential protein complexes and pathways. It was found that Vitamin A, retinoid acid and a few other immune response agents modulated by RARA and LCK genes may be potential treatments for both schizophrenia and hepatocellular carcinoma. Conclusions This is the first study showing specific mediator genes in the SHCN which may suppress tumors. We also show that the schizophrenic protein interactions and modulation with cancer implicate the importance of the immune system for the etiology of schizophrenia. PMID:24564241

  10. Role and mechanism of the AMPK pathway in waterborne Zn exposure influencing the hepatic energy metabolism of Synechogobius hasta

    PubMed Central

    Wu, Kun; Huang, Chao; Shi, Xi; Chen, Feng; Xu, Yi-Huan; Pan, Ya-Xiong; Luo, Zhi; Liu, Xu

    2016-01-01

    Previous studies have investigated the physiological responses in the liver of Synechogobius hasta exposed to waterborne zinc (Zn). However, at present, very little is known about the underlying molecular mechanisms of these responses. In this study, RNA sequencing (RNA-seq) was performed to analyse the differences in the hepatic transcriptomes between control and Zn-exposed S. hasta. A total of 36,339 unigenes were detected, with a unigene N50 of 1,615 bp. These genes were further annotated to the Nonredundant protein (NR), Nonredundant nucleotide (Nt), Swiss-Prot, Kyoto Encyclopedia of Genes and Genomes (KEGG), Clusters of Orthologous Groups (COG) and Gene Ontology (GO) databases. After 60 days of Zn exposure, 708 and 237 genes were significantly up- and down-regulated, respectively. Many differentially expressed genes (DEGs) involved in energy metabolic pathways were identified, and their expression profiles suggested increased catabolic processes and reduced biosynthetic processes. These changes indicated that waterborne Zn exposure increased the energy production and requirement, which was related to the activation of the AMPK signalling pathway. Furthermore, using the primary hepatocytes of S. hasta, we identified the role of the AMPK signalling pathway in Zn-influenced energy metabolism. PMID:27934965

  11. Tracing the Repertoire of Promiscuous Enzymes along the Metabolic Pathways in Archaeal Organisms

    PubMed Central

    Rodríguez-Vázquez, Katya

    2017-01-01

    The metabolic pathways that carry out the biochemical transformations sustaining life depend on the efficiency of their associated enzymes. In recent years, it has become clear that promiscuous enzymes have played an important role in the function and evolution of metabolism. In this work we analyze the repertoire of promiscuous enzymes in 89 non-redundant genomes of the Archaea cellular domain. Promiscuous enzymes are defined as those proteins with two or more different Enzyme Commission (E.C.) numbers, according to the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. From this analysis, it was found that the fraction of promiscuous enzymes is lower in Archaea than in Bacteria. A greater diversity of superfamily domains is associated with promiscuous enzymes compared to specialized enzymes, both in Archaea and Bacteria, and there is an enrichment of substrate promiscuity rather than catalytic promiscuity in the archaeal enzymes. Finally, the presence of promiscuous enzymes in the metabolic pathways was found to be heterogeneously distributed at the domain level and in the phyla that make up the Archaea. These analyses increase our understanding of promiscuous enzymes and provide additional clues to the evolution of metabolism in Archaea. PMID:28703743
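
    The definition used in this abstract (a protein with two or more different E.C. numbers) can be illustrated with a minimal sketch. The input format, the function names and the gene identifiers below are assumptions made for illustration; they are not taken from the study or from any KEGG tool, and the sketch does not separate substrate promiscuity from catalytic promiscuity.

```python
from collections import defaultdict


def find_promiscuous(annotations):
    """Return {protein_id: sorted EC numbers} for proteins with >= 2 distinct EC numbers.

    annotations: iterable of (protein_id, ec_number) pairs, for example parsed
    from a gene-to-enzyme mapping table. Repeated pairs are counted once.
    """
    ec_by_protein = defaultdict(set)
    for protein_id, ec_number in annotations:
        ec_by_protein[protein_id].add(ec_number)
    return {
        protein_id: sorted(ecs)
        for protein_id, ecs in ec_by_protein.items()
        if len(ecs) >= 2
    }


def promiscuous_fraction(annotations):
    """Fraction of annotated proteins that carry two or more distinct EC numbers."""
    annotations = list(annotations)
    proteins = {protein_id for protein_id, _ in annotations}
    if not proteins:
        return 0.0
    return len(find_promiscuous(annotations)) / len(proteins)


# Hypothetical toy annotations (identifiers invented for illustration only).
toy = [
    ("geneA", "2.7.1.1"),
    ("geneA", "2.7.1.2"),
    ("geneB", "1.1.1.1"),
    ("geneB", "1.1.1.1"),  # duplicate row: still a single EC number
    ("geneC", "4.2.1.11"),
]
print(find_promiscuous(toy))
print(promiscuous_fraction(toy))
```

    Computing the fraction per genome in this way is one plausible route to a comparison such as the Archaea versus Bacteria contrast reported above; the study's actual procedure may differ.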

  12. Tracing the Repertoire of Promiscuous Enzymes along the Metabolic Pathways in Archaeal Organisms.

    PubMed

    Martínez-Núñez, Mario Alberto; Rodríguez-Escamilla, Zuemy; Rodríguez-Vázquez, Katya; Pérez-Rueda, Ernesto

    2017-07-13

    The metabolic pathways that carry out the biochemical transformations sustaining life depend on the efficiency of their associated enzymes. In recent years, it has become clear that promiscuous enzymes have played an important role in the function and evolution of metabolism. In this work we analyze the repertoire of promiscuous enzymes in 89 non-redundant genomes of the Archaea cellular domain. Promiscuous enzymes are defined as those proteins with two or more different Enzyme Commission (E.C.) numbers, according to the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. From this analysis, it was found that the fraction of promiscuous enzymes is lower in Archaea than in Bacteria. A greater diversity of superfamily domains is associated with promiscuous enzymes compared to specialized enzymes, both in Archaea and Bacteria, and there is an enrichment of substrate promiscuity rather than catalytic promiscuity in the archaeal enzymes. Finally, the presence of promiscuous enzymes in the metabolic pathways was found to be heterogeneously distributed at the domain level and in the phyla that make up the Archaea. These analyses increase our understanding of promiscuous enzymes and provide additional clues to the evolution of metabolism in Archaea.

  13. Aptamer Database

    PubMed Central

    Lee, Jennifer F.; Hesselberth, Jay R.; Meyers, Lauren Ancel; Ellington, Andrew D.

    2004-01-01

    The aptamer database is designed to contain comprehensive sequence information on aptamers and unnatural ribozymes that have been generated by in vitro selection methods. Such data are not normally collected in ‘natural’ sequence databases, such as GenBank. Besides serving as a storehouse of sequences that may have diagnostic or therapeutic utility, the database serves as a valuable resource for theoretical biologists who describe and explore fitness landscapes. The database is updated monthly and is publicly available at http://aptamer.icmb.utexas.edu/. PMID:14681367

  14. miRNAs target databases: developmental methods and target identification techniques with functional annotations.

    PubMed

    Singh, Nagendra Kumar

    2017-06-01

    microRNA (miRNA) regulates diverse biological mechanisms and metabolisms in plants and animals. Thus, the discovery of miRNA has revolutionized the life sciences and medical research. The miRNA represses and cleaves the targeted mRNA by binding through perfect, near-perfect or imperfect complementary base pairing via RNA-induced silencing complex (RISC) formation during the biogenesis process. One miRNA interacts with one or more mRNA genes and vice versa, and hence takes part in causing various diseases. In this paper, the different microRNA target databases and their functional annotations developed by various researchers are reviewed. The present review aims at comprehending the significance of miRNA and presenting the existing status of annotated miRNA target resources built by researchers, thereby uncovering knowledge for diagnosis and prognosis. This review discusses the applications and developmental methodologies for constructing target databases as well as the utility of user interface design. An integrated architecture is drawn and a graphical comparative study of the present status of miRNA targets in diverse diseases and various biological processes is performed. These databases comprise information such as miRNA target-associated disease, transcription factor binding sites (TFBSs) in miRNA genomic locations, polymorphism in miRNA target, A-to-I edited target, Gene Ontology (GO), genome annotations, KEGG (Kyoto Encyclopedia of Genes and Genomes) pathways, target expression analysis, TF-miRNA and miRNA-mRNA interaction networks, drugs-targets interactions, etc. miRNA target databases contain diverse experimentally and computationally predicted targets obtained through various algorithms. The various miRNA target databases have been compared on several parameters. The computationally predicted target databases suffer from false positive information as there is no common theory for prediction of miRNA targets. The review conclusion emphasizes

  15. Pathway collages: personalized multi-pathway diagrams.

    PubMed

    Paley, Suzanne; O'Maille, Paul E; Weaver, Daniel; Karp, Peter D

    2016-12-13

    Metabolic pathway diagrams are a classical way of visualizing a linked cascade of biochemical reactions. However, to understand some biochemical situations, viewing a single pathway is insufficient, whereas viewing the entire metabolic network results in information overload. How do we enable scientists to rapidly construct personalized multi-pathway diagrams that depict a desired collection of interacting pathways that emphasize particular pathway interactions? We define software for constructing personalized multi-pathway diagrams called pathway-collages using a combination of manual and automatic layouts. The user specifies a set of pathways of interest for the collage from a Pathway/Genome Database. Layouts for the individual pathways are generated by the Pathway Tools software, and are sent to a Javascript Pathway Collage application implemented using Cytoscape.js. That application allows the user to re-position pathways; define connections between pathways; change visual style parameters; and paint metabolomics, gene expression, and reaction flux data onto the collage to obtain a desired multi-pathway diagram. We demonstrate the use of pathway collages in two application areas: a metabolomics study of pathogen drug response, and an Escherichia coli metabolic model. Pathway collages enable facile construction of personalized multi-pathway diagrams.

  16. CSGene: a literature-based database for cell senescence genes and its application to identify critical cell aging pathways and associated diseases

    PubMed Central

    Zhao, M; Chen, L; Qu, H

    2016-01-01

    Cell senescence is a cellular process in which normal diploid cells cease to replicate and is a major driving force for human cancers and aging-associated diseases. Recent studies on cell senescence have identified many new genetic components and pathways that control cell aging. However, there is no comprehensive resource for cell senescence that integrates various genetic studies and relationships with cell senescence, and the risk associated with complex diseases such as cancer is still unexplored. We have developed the first literature-based gene resource for exploring cell senescence genes, CSGene. We compiled 504 experimentally verified genes from public data resources and published literature. Pathway analyses highlighted the prominent roles of cell senescence genes in the control of rRNA gene transcription and unusual rDNA repeats that constitute a center for the stability of the whole genome. We also found a strong association of cell senescence with HIV-1 infection and viral carcinogenesis that are mainly related to promoter/enhancer binding and chromatin modification processes. Moreover, pan-cancer mutation and network analysis also identified common cell aging mechanisms in cancers and uncovered a highly modular network structure. These results highlight the utility of CSGene for elucidating the complex cellular events of cell senescence. PMID:26775705

  17. CSGene: a literature-based database for cell senescence genes and its application to identify critical cell aging pathways and associated diseases.

    PubMed

    Zhao, M; Chen, L; Qu, H

    2016-01-14

    Cell senescence is a cellular process in which normal diploid cells cease to replicate and is a major driving force for human cancers and aging-associated diseases. Recent studies on cell senescence have identified many new genetic components and pathways that control cell aging. However, there is no comprehensive resource for cell senescence that integrates various genetic studies and relationships with cell senescence, and the risk associated with complex diseases such as cancer is still unexplored. We have developed the first literature-based gene resource for exploring cell senescence genes, CSGene. We compiled 504 experimentally verified genes from public data resources and published literature. Pathway analyses highlighted the prominent roles of cell senescence genes in the control of rRNA gene transcription and unusual rDNA repeats that constitute a center for the stability of the whole genome. We also found a strong association of cell senescence with HIV-1 infection and viral carcinogenesis that are mainly related to promoter/enhancer binding and chromatin modification processes. Moreover, pan-cancer mutation and network analysis also identified common cell aging mechanisms in cancers and uncovered a highly modular network structure. These results highlight the utility of CSGene for elucidating the complex cellular events of cell senescence.

  18. An Approach for Identification of Novel Drug Targets in Streptococcus pyogenes SF370 Through Pathway Analysis.

    PubMed

    Singh, Satendra; Singh, Dev Bukhsh; Singh, Anamika; Gautam, Budhayash; Ram, Gurudayal; Dwivedi, Seema; Ramteke, Pramod W

    2016-12-01

    Streptococcus pyogenes is one of the most important pathogens as it is involved in various infections affecting the upper respiratory tract and skin. Due to the emergence of multidrug resistance and cross-resistance, S. pyogenes is becoming more pathogenic and dangerous. In the present study, an in silico comparative analysis of a total of 65 metabolic pathways of the host (Homo sapiens) and the pathogen was performed. Initially, 486 paralogous enzymes were identified so that they could be removed from the possible drug target list. The 105 enzymes of the biochemical pathways of S. pyogenes from the KEGG metabolic pathway database were compared with the proteins from Homo sapiens by performing a BLASTP search against the non-redundant database restricted to the Homo sapiens subset. Out of these, 83 enzymes were identified as non-homologous to human proteins, while 30 enzymes of inadequate amino acid length were removed before further processing. Essential enzymes were then mined from the remaining 53 enzymes. Finally, 28 essential enzymes were identified in S. pyogenes SF370 (serotype M1). In the subcellular localization study, 18 enzymes were predicted to have cytoplasmic localization and ten enzymes membrane localization. These ten enzymes with putative membrane localization should be of particular interest. Acyl-carrier-protein S-malonyltransferase, DNA polymerase III subunit beta and dihydropteroate synthase are novel drug targets and thus can be used to design potential inhibitors against S. pyogenes infection. The 3D structure of dihydropteroate synthase was modeled and validated, and can be used for virtual screening and interaction studies of potential inhibitors with the target enzyme.

  19. Maize databases

    USDA-ARS's Scientific Manuscript database

    This chapter is a succinct overview of maize data held in the species-specific database MaizeGDB (the Maize Genomics and Genetics Database), and selected multi-species data repositories, such as Gramene/Ensembl Plants, Phytozome, UniProt and the National Center for Biotechnology Information (NCBI), ...

  20. Database Manager

    ERIC Educational Resources Information Center

    Martin, Andrew

    2010-01-01

    It is normal practice today for organizations to store large quantities of records of related information as computer-based files or databases. Purposeful information is retrieved by performing queries on the data sets. The purpose of DATABASE MANAGER is to communicate to students the method by which the computer performs these queries. This…

  1. Image Databases.

    ERIC Educational Resources Information Center

    Pettersson, Rune

    Different kinds of pictorial databases are described with respect to aims, user groups, search possibilities, storage, and distribution. Some specific examples are given for databases used for the following purposes: (1) labor markets for artists; (2) document management; (3) telling a story; (4) preservation (archives and museums); (5) research;…

  2. Database Manager

    ERIC Educational Resources Information Center

    Martin, Andrew

    2010-01-01

    It is normal practice today for organizations to store large quantities of records of related information as computer-based files or databases. Purposeful information is retrieved by performing queries on the data sets. The purpose of DATABASE MANAGER is to communicate to students the method by which the computer performs these queries. This…

  3. BIAdb: A curated database of benzylisoquinoline alkaloids

    PubMed Central

    2010-01-01

    Background Benzylisoquinoline is the structural backbone of many alkaloids with a wide variety of structures including papaverine, noscapine, codeine, morphine, apomorphine, berberine, protopine and tubocurarine. Many benzylisoquinoline alkaloids have been reported to show therapeutic properties and to act as novel medicines. Thus it is important to collect and compile benzylisoquinoline alkaloids in order to explore their usage in medicine. Description We extracted information about benzylisoquinoline alkaloids from various sources such as PubChem, KEGG and KNApSAcK, and by manual curation from the literature. This information was processed and compiled in order to create a comprehensive database of benzylisoquinoline alkaloids, called BIAdb. The current version of BIAdb contains information about 846 unique benzylisoquinoline alkaloids; multiple entries per compound in terms of source and function bring the total number of records to 2504. One of the major features of this database is that it provides data about 627 different plant species as sources of benzylisoquinoline and 114 different types of function performed by these compounds. A large number of online tools have been integrated, which help users explore the full potential of BIAdb. In order to provide additional information, we give external links to other resources/databases. One of the important features of this database is that it is tightly integrated with Drugpedia, which allows managing data in fixed/flexible format. Conclusions A database of benzylisoquinoline compounds has been created, which provides comprehensive information about benzylisoquinoline alkaloids. This database will be very useful for those who are working in the field of drug discovery based on natural products. This database will also serve researchers working in the field of synthetic biology, as developing medicinally important alkaloids using synthetic processes is one of the important challenges. This database is available from http

  4. Variations in target gene expression and pathway profiles in the mouse hippocampus following treatment with different effective compounds for ischemia-reperfusion injury.

    PubMed

    Chen, Yinying; Zhou, Caixiu; Yu, Yanan; Liu, Jun; Jing, Zhiwei; Lv, Aiping; Meng, Fanyun; Wang, Zhong; Wang, Yongyan

    2012-08-01

    In order to elucidate the overlapping and diverse pharmacological protective mechanisms of different Chinese medicinal compounds, we investigated the alteration of gene expression and activation of signaling pathways in the mouse hippocampus after treatment of cerebral ischemia-reperfusion injury with various compounds. A microarray including 16,463 genes was used to identify differentially expressed genes among six treatment groups: baicalin (BA), jasminoidin (JA), cholic acid (CA), concha margaritiferausta (CM), sham, and vehicle. The US Food and Drug Administration (FDA) ArrayTrack system and Kyoto Encyclopedia of Genes and Genomes (KEGG) database were used to screen significantly altered genes and pathways (P < 0.05, fold change >1.5). Vehicle treatment alone resulted in alteration of 726 genes (283 upregulated, 443 downregulated) compared to the sham treatment group. BA, JA, and CA treatments, but not CM treatment, were effective in reducing infarct volume compared with vehicle treatment (P < 0.05). Compared with the CM group, a total of 167 (73 upregulated, 94 downregulated), 379 (211 upregulated, 168 downregulated), and 181 (76 upregulated, 105 downregulated) altered genes were found in the BA, JA, and CA groups, respectively. The numbers of overlapping genes between the BA and JA, BA and CA, and JA and CA groups were 28 (16 upregulated, 12 downregulated), 14 (4 upregulated, 10 downregulated), and 31 (8 upregulated, 23 downregulated), respectively. Three overlapping genes were identified among the BA, JA, and CA treatment groups: Il1rap, Gnb5, and Wdr38. Based on KEGG pathway analysis, two, seven, and four pathways were significantly activated in the BA, JA, and CA groups, respectively, when compared to the CM group. The ATP-binding cassette (ABC) transporters general pathway was activated by BA and JA treatment, and the mitogen-activated protein kinase (MAPK) signaling pathway was activated by JA and CA treatment. Alteration of IL-1 and Hspa1a expression

  5. Plasma-based proteomics reveals immune response, complement and coagulation cascades pathway shifts in heat-stressed lactating dairy cows.

    PubMed

    Min, Li; Cheng, Jianbo; Zhao, Shengguo; Tian, He; Zhang, Yangdong; Li, Songli; Yang, Hongjian; Zheng, Nan; Wang, Jiaqi

    2016-09-02

    Heat stress (HS) has an enormous economic impact on the dairy industry. In recent years, many researchers have investigated changes in the gene expression and metabolomics profiles in dairy cows caused by HS. However, the proteomics profiles of heat-stressed dairy cows have not yet been completely elucidated. We compared plasma proteomics from HS-free and heat-stressed dairy cows using an iTRAQ labeling approach. After the depletion of high abundant proteins in the plasma, 1472 proteins were identified. Of these, 85 proteins were differentially abundant in cows exposed to HS relative to HS-free. Database searches combined with GO and KEGG pathway enrichment analyses revealed that many components of the complement and coagulation cascades were altered in heat-stressed cows compared with HS-free cows. Of these, many factors in the complement system (including complement components C1, C3, C5, C6, C7, C8, and C9, complement factor B, and factor H) were down-regulated by HS, while components of the coagulation system (including coagulation factors, vitamin K-dependent proteins, and fibrinogens) were up-regulated by HS. In conclusion, our results indicate that HS decreases plasma levels of complement system proteins, suggesting that immune function is impaired in dairy cows exposed to HS. Though many aspects of heat stress (HS) have been extensively researched, relatively little is known about the proteomics profile changes that occur during heat exposure. In this work, we employed a proteomics approach to investigate differential abundance of plasma proteins in HS-free and heat-stressed dairy cows. Database searches combined with GO and KEGG pathway enrichment analyses revealed that HS resulted in a decrease in complement components, suggesting that heat-stressed dairy cows have impaired immune function. In addition, through integrative analyses of proteomics and previous metabolomics, we showed enhanced glycolysis, lipid metabolic pathway shifts, and nitrogen

  6. Pathway Enrichment Analysis with Networks.

    PubMed

    Liu, Lu; Wei, Jinmao; Ruan, Jianhua

    2017-09-28

    Detecting associations between an input gene set and annotated gene sets (e.g., pathways) is an important problem in modern molecular biology. In this paper, we propose two algorithms, termed NetPEA and NetPEA', for conducting network-based pathway enrichment analysis. Our algorithms consider not only shared genes but also gene-gene interactions. Both algorithms utilize a protein-protein interaction network and a random walk with a restart procedure to identify hidden relationships between an input gene set and pathways, but both use different randomization strategies to evaluate statistical significance and as a result emphasize different pathway properties. Compared to an over representation-based method, our algorithms can identify more statistically significant pathways. Compared to an existing network-based algorithm, EnrichNet, our algorithms have a higher sensitivity in revealing the true causal pathways while at the same time achieving a higher specificity. A literature review of selected results indicates that some of the novel pathways reported by our algorithms are biologically relevant and important. While the evaluations are performed only with KEGG pathways, we believe the algorithms can be valuable for general functional discovery from high-throughput experiments.
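
    The random walk with restart mentioned in this abstract can be sketched in a few lines. This is a generic illustration, not the NetPEA or NetPEA' implementation: the toy network, the function names, the restart probability of 0.5 and the use of a summed visiting probability as a pathway score are all assumptions, and the randomization step the authors use to assess statistical significance is omitted.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Return the steady-state visiting probability of every node.

    adjacency: dict mapping each node to the set of its neighbours
               (undirected; every neighbour must also be a key).
    seeds:     genes of the input set; the walker restarts uniformly on them.
    """
    nodes = list(adjacency)
    seeds = set(seeds) & set(nodes)
    if not seeds:
        raise ValueError("no seed gene is present in the network")
    start = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    prob = dict(start)
    for _ in range(max_iter):
        # With probability `restart` the walker jumps back to a seed gene.
        nxt = {n: restart * start[n] for n in nodes}
        for n in nodes:
            walk_mass = (1.0 - restart) * prob[n]
            neighbours = adjacency[n]
            if neighbours:
                # Otherwise it moves to a uniformly chosen neighbour.
                share = walk_mass / len(neighbours)
                for m in neighbours:
                    nxt[m] += share
            else:
                # Isolated node: keep the mass in place so the total stays 1.
                nxt[n] += walk_mass
        change = sum(abs(nxt[n] - prob[n]) for n in nodes)
        prob = nxt
        if change < tol:
            break
    return prob


def pathway_score(prob, pathway_genes):
    """Sum the visiting probabilities of the genes annotated to one pathway."""
    return sum(prob.get(gene, 0.0) for gene in pathway_genes)


# Hypothetical four-gene chain A - B - C - D with A as the only input gene.
network = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
scores = random_walk_with_restart(network, seeds={"A"})
print(pathway_score(scores, {"B", "C"}))  # pathway close to the input gene
print(pathway_score(scores, {"D"}))       # pathway far from the input gene
```

    In this toy case neither pathway shares a gene with the input set, yet the nearer one receives the higher score, which is the kind of hidden relationship a shared-gene over-representation test cannot detect.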

  7. LigandBox: A database for 3D structures of chemical compounds.

    PubMed

    Kawabata, Takeshi; Sugihara, Yusuke; Fukunishi, Yoshifumi; Nakamura, Haruki

    2013-01-01

    A database of the 3D structures of available compounds is essential for virtual screening by molecular docking. We have developed the LigandBox database (http://ligandbox.protein.osaka-u.ac.jp/ligandbox/) containing four million available compounds, collected from the catalogues of 37 commercial suppliers, and approved drugs and biochemical compounds taken from KEGG_DRUG, KEGG_COMPOUND and PDB databases. Each chemical compound in the database has several 3D conformers with hydrogen atoms and atomic charges, which are ready to be docked into receptors using docking programs. The 3D conformations were generated using our molecular simulation program package, myPresto. Various physical properties, such as aqueous solubility (LogS) and carcinogenicity, have also been calculated to characterize the ADME-Tox properties of the compounds. The Web database provides two services for compound searches: a property/chemical ID search and a chemical structure search. The chemical structure search is performed by a combination of a descriptor search and a maximum common substructure (MCS) search, using our program kcombu. By specifying a query chemical structure, users can find similar compounds among the millions of compounds in the database within a few minutes. Our database is expected to assist a wide range of researchers, in the fields of medical science, chemical biology, and biochemistry, who are seeking to discover active chemical compounds by virtual screening.

  8. The EcoCyc Database

    PubMed Central

    Karp, Peter D.; Riley, Monica; Saier, Milton; Paulsen, Ian T.; Collado-Vides, Julio; Paley, Suzanne M.; Pellegrini-Toole, Alida; Bonavides, César; Gama-Castro, Socorro

    2002-01-01

    EcoCyc is an organism-specific pathway/genome database that describes the metabolic and signal-transduction pathways of Escherichia coli, its enzymes, its transport proteins and its mechanisms of transcriptional control of gene expression. EcoCyc is queried using the Pathway Tools graphical user interface, which provides a wide variety of query operations and visualization tools. EcoCyc is available at http://ecocyc.org/. PMID:11752253

  9. Integrative Pathway Analysis of Metabolic Signature in Bladder Cancer: A Linkage to The Cancer Genome Atlas Project and Prediction of Survival

    PubMed Central

    von Rundstedt, Friedrich-Carl; Rajapakshe, Kimal; Ma, Jing; Arnold, James M.; Gohlke, Jie; Putluri, Vasanta; Krishnapuram, Rashmi; Piyarathna, D. Badrajee; Lotan, Yair; Gödde, Daniel; Roth, Stephan; Störkel, Stephan; Levitt, Jonathan M.; Michailidis, George; Sreekumar, Arun; Lerner, Seth P.; Coarfa, Cristian; Putluri, Nagireddy

    2016-01-01

    Purpose We used targeted mass spectrometry to study the metabolic fingerprint of urothelial cancer and determine whether the biochemical pathway analysis gene signature would have a predictive value in independent cohorts of patients with bladder cancer. Materials and Methods Pathologically evaluated, bladder derived tissues, including benign adjacent tissue from 14 patients and bladder cancer from 46, were analyzed by liquid chromatography based targeted mass spectrometry. Differential metabolites associated with tumor samples in comparison to benign tissue were identified by adjusting the p values for multiple testing at a false discovery rate threshold of 15%. Enrichment of pathways and processes associated with the metabolic signature were determined using the GO (Gene Ontology) Database and MSigDB (Molecular Signature Database). Integration of metabolite alterations with transcriptome data from TCGA (The Cancer Genome Atlas) was done to identify the molecular signature of 30 metabolic genes. Available outcome data from TCGA portal were used to determine the association with survival. Results We identified 145 metabolites, of which analysis revealed 31 differential metabolites when comparing benign and tumor tissue samples. Using the KEGG (Kyoto Encyclopedia of Genes and Genomes) Database we identified a total of 174 genes that correlated with the altered metabolic pathways involved. By integrating these genes with the transcriptomic data from the corresponding TCGA data set we identified a metabolic signature consisting of 30 genes. The signature was significant in its prediction of survival in 95 patients with a low signature score vs 282 with a high signature score (p = 0.0458). Conclusions Targeted mass spectrometry of bladder cancer is highly sensitive for detecting metabolic alterations. Applying transcriptome data allows for integration into larger data sets and identification of relevant metabolic pathways in bladder cancer progression. PMID:26802582
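
    The abstract reports that p values were adjusted for multiple testing at a false discovery rate threshold of 15% but does not name the procedure. The sketch below assumes the widely used Benjamini-Hochberg step-up method; that choice, the function name and the example p values are illustrative assumptions, not details from the study.

```python
def benjamini_hochberg(p_values, fdr=0.15):
    """Return a list of booleans marking which hypotheses are rejected.

    Benjamini-Hochberg step-up rule: sort the m p values, find the largest
    rank k with p_(k) <= (k / m) * fdr, and reject the k smallest p values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= rank / m * fdr:
            cutoff_rank = rank
    rejected = [False] * m
    for rank, index in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[index] = True
    return rejected


# Hypothetical p values for five metabolites (invented for illustration).
example_p = [0.001, 0.04, 0.03, 0.5, 0.2]
print(benjamini_hochberg(example_p, fdr=0.15))
```

    With these five values the per-rank thresholds are 0.03, 0.06, 0.09, 0.12 and 0.15, so the three smallest p values pass and the remaining two do not.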

  10. Genome databases

    SciTech Connect

    Courteau, J.

    1991-10-11

    Since the Genome Project began several years ago, a plethora of databases have been developed or are in the works. They range from the massive Genome Data Base at Johns Hopkins University, the central repository of all gene mapping information, to small databases focusing on single chromosomes or organisms. Some are publicly available, others are essentially private electronic lab notebooks. Still others limit access to a consortium of researchers working on, say, a single human chromosome. An increasing number incorporate sophisticated search and analytical software, while others operate as little more than data lists. In consultation with numerous experts in the field, a list has been compiled of some key genome-related databases. The list was not limited to map and sequence databases but also included the tools investigators use to interpret and elucidate genetic data, such as protein sequence and protein structure databases. Because a major goal of the Genome Project is to map and sequence the genomes of several experimental animals, including E. coli, yeast, fruit fly, nematode, and mouse, the available databases for those organisms are listed as well. The author also includes several databases that are still under development - including some ambitious efforts that go beyond data compilation to create what are being called electronic research communities, enabling many users, rather than just one or a few curators, to add or edit the data and tag it as raw or confirmed.

  11. Neuroplasticity and second messenger pathways in antidepressant efficacy: pharmacogenetic results from a prospective trial investigating treatment resistance.

    PubMed

    Fabbri, Chiara; Crisafulli, Concetta; Calati, Raffaella; Albani, Diego; Forloni, Gianluigi; Calabrò, Marco; Martines, Rosalba; Kasper, Siegfried; Zohar, Joseph; Juven-Wetzler, Alzbeta; Souery, Daniel; Montgomery, Stuart; Mendlewicz, Julien; Serretti, Alessandro

    2017-03-04

    Genes belonging to neuroplasticity, monoamine, circadian rhythm, and transcription factor pathways were investigated as modulators of antidepressant efficacy. The present study aimed (1) to replicate previous findings in an independent sample with treatment-resistant depression (TRD), and (2) to perform a pathway analysis to investigate the possible molecular mechanisms involved. 220 patients with major depressive disorder who were non-responders to a previous antidepressant were treated with venlafaxine for 4-6 weeks and in case of non-response with escitalopram for 4-6 weeks. Symptoms were assessed using the Montgomery Asberg Depression Rating Scale. The phenotypes were response and remission to venlafaxine, non-response (TRDA) and non-remission (TRDB) to neither venlafaxine nor escitalopram. 50 tag SNPs in 14 genes belonging to the pathways of interest were tested for association with phenotypes. Molecular pathways (KEGG database) that included one or more of the genes associated with the phenotypes were investigated also in the STAR*D sample. The associations between ZNF804A rs7603001 and response, CREB1 rs2254137 and remission were replicated, as well as CHL1 rs2133402 and lower risk of TRD. Other CHL1 SNPs were potential predictors of TRD (rs1516340, rs2272522, rs1516338, rs2133402). The MAPK1 rs6928 SNP was consistently associated with all the phenotypes. The protein processing in endoplasmic reticulum pathway (hsa04141) was the best pathway that may explain the mechanisms of MAPK1 involvement in antidepressant response. Signals in genes previously associated with antidepressant efficacy were confirmed for CREB1, ZNF804A and CHL1. These genes play pivotal roles in synaptic plasticity, neural activity and connectivity.

  12. Chemotography for multi-target SAR analysis in the context of biological pathways.

    PubMed

    Lounkine, Eugen; Kutchukian, Peter; Petrone, Paula; Davies, John W; Glick, Meir

    2012-09-15

    The increasing amount of chemogenomics data, that is, activity measurements of many compounds across a variety of biological targets, allows for better understanding of pharmacology in a broad biological context. Rather than assessing activity at individual biological targets, today understanding of compound interaction with complex biological systems and molecular pathways is often sought in phenotypic screens. This perspective poses novel challenges to structure-activity relationship (SAR) assessment. Today, the bottleneck of drug discovery lies in the understanding of SAR of rich datasets that go beyond single targets in the context of biological pathways, potential off-targets, and complex selectivity profiles. To aid in the understanding and interpretation of such complex SAR, we introduce Chemotography (chemotype chromatography), which encodes chemical space using a color spectrum by combining clustering and multidimensional scaling. Rich biological data in our approach were visualized using spatial dimensions traditionally reserved for chemical space. This allowed us to analyze SAR in the context of target hierarchies and phylogenetic trees, two-target activity scatter plots, and biological pathways. Chemotography, in combination with the Kyoto Encyclopedia of Genes and Genomes (KEGG), also allowed us to extract pathway-relevant SAR from the ChEMBL database. We identified chemotypes showing polypharmacology and selectivity-conferring scaffolds, even in cases where individual compounds have not been tested against all relevant targets. In addition, we analyzed SAR in ChEMBL across the entire Kinome, going beyond individual compounds. Our method combines the strengths of chemical space visualization for SAR analysis and graphical representation of complex biological data. Chemotography is a new paradigm for chemogenomic data visualization and its versatile applications presented here may allow for improved assessment of SAR in biological context, such as

  13. MINEs: open access databases of computationally predicted enzyme promiscuity products for untargeted metabolomics.

    PubMed

    Jeffryes, James G; Colastani, Ricardo L; Elbadawi-Sidhu, Mona; Kind, Tobias; Niehaus, Thomas D; Broadbelt, Linda J; Hanson, Andrew D; Fiehn, Oliver; Tyo, Keith E J; Henry, Christopher S

    2015-01-01

    In spite of its great promise, metabolomics has proven difficult to execute in an untargeted and generalizable manner. Liquid chromatography-mass spectrometry (LC-MS) has made it possible to gather data on thousands of cellular metabolites. However, matching metabolites to their spectral features continues to be a bottleneck, meaning that much of the collected information remains uninterpreted and that new metabolites are seldom discovered in untargeted studies. These challenges require new approaches that consider compounds beyond those available in curated biochemistry databases. Here we present Metabolic In silico Network Expansions (MINEs), an extension of known metabolite databases to include molecules that have not been observed, but are likely to occur based on known metabolites and common biochemical reactions. We utilize an algorithm called the Biochemical Network Integrated Computational Explorer (BNICE) and expert-curated reaction rules based on the Enzyme Commission classification system to propose the novel chemical structures and reactions that comprise MINE databases. Starting from the Kyoto Encyclopedia of Genes and Genomes (KEGG) COMPOUND database, the MINE contains over 571,000 compounds, of which 93% are not present in the PubChem database. However, these MINE compounds have on average higher structural similarity to natural products than compounds from KEGG or PubChem. MINE databases were able to propose annotations for 98.6% of a set of 667 MassBank spectra, 14% more than KEGG alone and equivalent to PubChem while returning far fewer candidates per spectra than PubChem (46 vs. 1715 median candidates). Application of MINEs to LC-MS accurate mass data enabled the identity of an unknown peak to be confidently predicted. MINE databases are freely accessible for non-commercial use via user-friendly web-tools at http://minedatabase.mcs.anl.gov and developer-friendly APIs. 
    MINEs improve metabolomics peak identification as compared to general chemical
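
    The MINEs record describes proposing candidate annotations from LC-MS accurate mass data. The sketch below shows only the generic idea of accurate-mass matching within a parts-per-million (ppm) tolerance; it is not the MINE web tools' actual search or scoring, which the abstract does not describe. The compound table, the 5 ppm default, and the function names are assumptions for illustration, and the masses are approximate neutral monoisotopic values.

```python
def ppm_error(observed_mass, theoretical_mass):
    """Signed mass error in parts per million."""
    return (observed_mass - theoretical_mass) / theoretical_mass * 1e6


def candidates_within_tolerance(observed_mass, compounds, tolerance_ppm=5.0):
    """Return (name, ppm_error) pairs within the tolerance, best match first.

    `compounds` maps a compound name to its neutral monoisotopic mass.
    """
    hits = []
    for name, mass in compounds.items():
        err = ppm_error(observed_mass, mass)
        if abs(err) <= tolerance_ppm:
            hits.append((name, err))
    return sorted(hits, key=lambda hit: abs(hit[1]))


if __name__ == "__main__":
    # Hypothetical lookup table; a real search would query a compound database.
    table = {
        "D-glucose": 180.06339,   # C6H12O6
        "D-fructose": 180.06339,  # C6H12O6, an isomer with the same mass
        "citric acid": 192.02700, # C6H8O7
    }
    for name, err in candidates_within_tolerance(180.0636, table):
        print(f"{name}: {err:+.2f} ppm")
```

    Isomers such as glucose and fructose share a mass, so accurate mass alone returns both; this is one reason the number of candidates per spectrum matters when comparing databases.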

  14. The NCBI BioSystems database.

    PubMed

    Geer, Lewis Y; Marchler-Bauer, Aron; Geer, Renata C; Han, Lianyi; He, Jane; He, Siqian; Liu, Chunlei; Shi, Wenyao; Bryant, Stephen H

    2010-01-01

    The NCBI BioSystems database, found at http://www.ncbi.nlm.nih.gov/biosystems/, centralizes and cross-links existing biological systems databases, increasing their utility and target audience by integrating their pathways and systems into NCBI resources. This integration allows users of NCBI's Entrez databases to quickly categorize proteins, genes and small molecules by metabolic pathway, disease state or other BioSystem type, without requiring time-consuming inference of biological relationships from the literature or multiple experimental datasets.

  15. ocsESTdb: a database of oil crop seed EST sequences for comparative analysis and investigation of a global metabolic network and oil accumulation metabolism.

    PubMed

    Ke, Tao; Yu, Jingyin; Dong, Caihua; Mao, Han; Hua, Wei; Liu, Shengyi

    2015-01-21

    Oil crop seeds are important sources of fatty acids (FAs) for human and animal nutrition. Despite their importance, there is a lack of an essential bioinformatics resource on gene transcription of oil crops from a comparative perspective. In this study, we developed ocsESTdb, the first database of expressed sequence tag (EST) information on seeds of four large-scale oil crops with an emphasis on global metabolic networks and oil accumulation metabolism that target the involved unigenes. A total of 248,522 ESTs and 106,835 unigenes were collected from the cDNA libraries of rapeseed (Brassica napus), soybean (Glycine max), sesame (Sesamum indicum) and peanut (Arachis hypogaea). These unigenes were annotated by a sequence similarity search against databases including TAIR, NR protein database, Gene Ontology, COG, Swiss-Prot, TrEMBL and Kyoto Encyclopedia of Genes and Genomes (KEGG). Five genome-scale metabolic networks that contain different numbers of metabolites and gene-enzyme reaction-association entries were analysed and constructed using Cytoscape and yEd programs. Details of unigene entries, deduced amino acid sequences and putative annotation are available from our database to browse, search and download. Intuitive and graphical representations of EST/unigene sequences, functional annotations, metabolic pathways and metabolic networks are also available. ocsESTdb will be updated regularly and can be freely accessed at http://ocri-genomics.org/ocsESTdb/ . ocsESTdb may serve as a valuable and unique resource for comparative analysis of acyl lipid synthesis and metabolism in oilseed plants. It also may provide vital insights into improving oil content in seeds of oil crop species by transcriptional reconstruction of the metabolic network.

  16. hp-DPI: Helicobacter pylori database of protein interactomes--embracing experimental and inferred interactions.

    PubMed

    Lin, Chung-Yen; Chen, Chia-Ling; Cho, Chi-Shiang; Wang, Li-Ming; Chang, Chia-Ming; Chen, Pao-Yang; Lo, Chen-Zen; Hsiung, Chao A

    2005-04-01

    We implemented a statistical model into our protein interaction database for validation of two-hybrid assays of Helicobacter pylori, and prediction of putative protein interactions not yet discovered experimentally. To present the enormous amount of experimental and inferred protein interaction networking maps, the H. pylori Database of Protein Interactomes (hp-DPI) was developed with a succinct yet comprehensive visualization tool integrated with annotation from GenBank, GO, and KEGG. hp-DPI was first built with, but is not limited to, H. pylori protein interactions and is expected to include other organisms' protein interaction relationships in the future.

  17. Experiment Databases

    NASA Astrophysics Data System (ADS)

    Vanschoren, Joaquin; Blockeel, Hendrik

    Next to running machine learning algorithms based on inductive queries, much can be learned by immediately querying the combined results of many prior studies. Indeed, all around the globe, thousands of machine learning experiments are being executed on a daily basis, generating a constant stream of empirical information on machine learning techniques. While the information contained in these experiments might have many uses beyond their original intent, results are typically described very concisely in papers and discarded afterwards. If we properly store and organize these results in central databases, they can be immediately reused for further analysis, thus boosting future research. In this chapter, we propose the use of experiment databases: databases designed to collect all the necessary details of these experiments, and to intelligently organize them in online repositories to enable fast and thorough analysis of a myriad of collected results. They constitute an additional, queriable source of empirical meta-data based on principled descriptions of algorithm executions, without reimplementing the algorithms in an inductive database. As such, they engender a very dynamic, collaborative approach to experimentation, in which experiments can be freely shared, linked together, and immediately reused by researchers all over the world. They can be set up for personal use, to share results within a lab or to create open, community-wide repositories. Here, we provide a high-level overview of their design, and use an existing experiment database to answer various interesting research questions about machine learning algorithms and to verify a number of recent studies.

  18. Pathway Distiller - multisource biological pathway consolidation

    PubMed Central

    2012-01-01

    Background One method to understand and evaluate an experiment that produces a large set of genes, such as a gene expression microarray analysis, is to identify overrepresentation or enrichment for biological pathways. Because pathways are able to functionally describe the set of genes, much effort has been made to collect curated biological pathways into publicly accessible databases. When combining disparate databases, highly related or redundant pathways exist, making their consolidation into pathway concepts essential. This will facilitate unbiased, comprehensive yet streamlined analysis of experiments that result in large gene sets. Methods After gene set enrichment finds representative pathways for large gene sets, pathways are consolidated into representative pathway concepts. Three complementary, but different methods of pathway consolidation are explored. Enrichment Consolidation combines the set of the pathways enriched for the signature gene list through iterative combining of enriched pathways with other pathways with similar signature gene sets; Weighted Consolidation utilizes a Protein-Protein Interaction network based gene-weighting approach that finds clusters of both enriched and non-enriched pathways limited to the experiments' resultant gene list; and finally the de novo Consolidation method uses several measurements of pathway similarity, that finds static pathway clusters independent of any given experiment. Results We demonstrate that the three consolidation methods provide unified yet different functional insights of a resultant gene set derived from a genome-wide profiling experiment. Results from the methods are presented, demonstrating their applications in biological studies and comparing with a pathway web-based framework that also combines several pathway databases. Additionally a web-based consolidation framework that encompasses all three methods discussed in this paper, Pathway Distiller (http://cbbiweb.uthscsa.edu/Pathway

  19. Creating and analyzing pathway and protein interaction compendia for modelling signal transduction networks.

    PubMed

    Kirouac, Daniel C; Saez-Rodriguez, Julio; Swantek, Jennifer; Burke, John M; Lauffenburger, Douglas A; Sorger, Peter K

    2012-05-01

    Understanding the information-processing capabilities of signal transduction networks, how those networks are disrupted in disease, and rationally designing therapies to manipulate diseased states require systematic and accurate reconstruction of network topology. Data on networks central to human physiology, such as the inflammatory signalling networks analyzed here, are found in a multiplicity of on-line resources of pathway and interactome databases (Cancer CellMap, GeneGo, KEGG, NCI-Pathway Interactome Database (NCI-PID), PANTHER, Reactome, I2D, and STRING). We sought to determine whether these databases contain overlapping information and whether they can be used to construct high reliability prior knowledge networks for subsequent modeling of experimental data. We have assembled an ensemble network from multiple on-line sources representing a significant portion of all machine-readable and reconcilable human knowledge on proteins and protein interactions involved in inflammation. This ensemble network has many features expected of complex signalling networks assembled from high-throughput data: a power law distribution of both node degree and edge annotations, and topological features of a "bow tie" architecture in which diverse pathways converge on a highly conserved set of enzymatic cascades focused around PI3K/AKT, MAPK/ERK, JAK/STAT, NFκB, and apoptotic signaling. Individual pathways exhibit "fuzzy" modularity that is statistically significant but still involving a majority of "cross-talk" interactions. However, we find that the most widely used pathway databases are highly inconsistent with respect to the actual constituents and interactions in this network. Using a set of growth factor signalling networks as examples (epidermal growth factor, transforming growth factor-beta, tumor necrosis factor, and wingless), we find a multiplicity of network topologies in which receptors couple to downstream components through myriad alternate paths. 
    Many of these

  20. Solubility Database

    National Institute of Standards and Technology Data Gateway

    SRD 106 IUPAC-NIST Solubility Database (Web, free access)   These solubilities are compiled from 18 volumes of the International Union of Pure and Applied Chemistry (IUPAC)-NIST Solubility Data Series. The database includes liquid-liquid, solid-liquid, and gas-liquid systems. Typical solvents and solutes include water, seawater, heavy water, inorganic compounds, and a variety of organic compounds such as hydrocarbons, halogenated hydrocarbons, alcohols, acids, esters and nitrogen compounds. There are over 67,500 solubility measurements and over 1800 references.

  1. Quantitative Proteogenomics and the Reconstruction of the Metabolic Pathway in Lactobacillus mucosae LM1

    PubMed Central

    Lee, Ji-Yoon

    2015-01-01

    Lactobacillus mucosae is a natural resident of the gastrointestinal tract of humans and animals and a potential probiotic bacterium. To understand the global protein expression profile and metabolic features of L. mucosae LM1 in the early stationary phase, the QExactive™ Hybrid Quadrupole-Orbitrap Mass Spectrometer was used. Characterization of the intracellular proteome identified 842 proteins, accounting for approximately 35% of the 2,404 protein-coding sequences in the complete genome of L. mucosae LM1. Proteome quantification using QExactive™ Orbitrap MS detected 19 highly abundant proteins (> 1.0% of the intracellular proteome), including CysK (cysteine synthase, 5.41%) and EF-Tu (elongation factor Tu, 4.91%), which are involved in cell survival against environmental stresses. Metabolic pathway annotation of LM1 proteome using the Kyoto Encyclopedia of Genes and Genomes (KEGG) database showed that half of the proteins expressed are important for basic metabolic and biosynthetic processes, and the other half might be structurally important or involved in basic cellular processes. In addition, glycogen biosynthesis was activated in the early stationary phase, which is important for energy storage and maintenance. The proteogenomic data presented in this study provide a suitable reference to understand the protein expression pattern of lactobacilli in standard conditions. PMID:26761899

  2. Next Generation Sequencing and Transcriptome Analysis Predicts Biosynthetic Pathway of Sennosides from Senna (Cassia angustifolia Vahl.), a Non-Model Plant with Potent Laxative Properties.

    PubMed

    Rama Reddy, Nagaraja Reddy; Mehta, Rucha Harishbhai; Soni, Palak Harendrabhai; Makasana, Jayanti; Gajbhiye, Narendra Athamaram; Ponnuchamy, Manivel; Kumar, Jitendra

    2015-01-01

    Senna (Cassia angustifolia Vahl.) is a medicinal plant used worldwide as a natural laxative. Its laxative properties are due to sennosides (anthraquinone glycosides). However, little genetic information is available for this species, especially concerning the biosynthetic pathways of sennosides. We present here the transcriptome sequencing of young and mature leaf tissue of Cassia angustifolia using the Illumina MiSeq platform, which resulted in a total of 6.34 Gb of raw nucleotide sequence. The sequence assembly resulted in 42230 and 37174 transcripts with an average length of 1119 bp and 1467 bp for young and mature leaf, respectively. The transcripts were annotated using NCBI BLAST with 'green plant database (txid 33090)', Swiss-Prot, Kyoto Encyclopedia of Genes & Genomes (KEGG), Clusters of Orthologous Groups (COG) and Gene Ontology (GO). Out of the total transcripts, 40138 (95.0%) and 36349 (97.7%) from young and mature leaf, respectively, were annotated by BLASTX against the green plant database of NCBI. We used InterProScan to assess protein similarity at the domain level; a total of 34031 (young leaf) and 32077 (mature leaf) transcripts were annotated against the Pfam domains. All transcripts from young and mature leaf were assigned to 191 KEGG pathways. There were 166 and 159 CDS, respectively, from young and mature leaf involved in metabolism of terpenoids and polyketides. Many CDS encoding enzymes leading to biosynthesis of sennosides were identified. A total of 10,763 CDS were differentially expressed in the young and mature leaf libraries, of which 2,343 (21.7%) were up-regulated in young compared to mature leaf. Several differentially expressed genes were found to be functionally associated with sennoside biosynthesis. CDS encoding many CYPs and TF families were identified as having probable roles in metabolism of primary as well as secondary metabolites. We developed SSR markers for molecular breeding of senna. We have identified a set of putative genes involved in various

  3. Next Generation Sequencing and Transcriptome Analysis Predicts Biosynthetic Pathway of Sennosides from Senna (Cassia angustifolia Vahl.), a Non-Model Plant with Potent Laxative Properties

    PubMed Central

    Rama Reddy, Nagaraja Reddy; Mehta, Rucha Harishbhai; Soni, Palak Harendrabhai; Makasana, Jayanti; Gajbhiye, Narendra Athamaram; Ponnuchamy, Manivel; Kumar, Jitendra

    2015-01-01

    Senna (Cassia angustifolia Vahl.) is a medicinal plant used worldwide as a natural laxative. Its laxative properties are due to sennosides (anthraquinone glycosides). However, little genetic information is available for this species, especially concerning the biosynthetic pathways of sennosides. We present here the transcriptome sequencing of young and mature leaf tissue of Cassia angustifolia using the Illumina MiSeq platform, which resulted in a total of 6.34 Gb of raw nucleotide sequence. The sequence assembly resulted in 42230 and 37174 transcripts with an average length of 1119 bp and 1467 bp for young and mature leaf, respectively. The transcripts were annotated using NCBI BLAST with ‘green plant database (txid 33090)’, Swiss-Prot, Kyoto Encyclopedia of Genes & Genomes (KEGG), Clusters of Orthologous Groups (COG) and Gene Ontology (GO). Out of the total transcripts, 40138 (95.0%) and 36349 (97.7%) from young and mature leaf, respectively, were annotated by BLASTX against the green plant database of NCBI. We used InterProScan to assess protein similarity at the domain level; a total of 34031 (young leaf) and 32077 (mature leaf) transcripts were annotated against the Pfam domains. All transcripts from young and mature leaf were assigned to 191 KEGG pathways. There were 166 and 159 CDS, respectively, from young and mature leaf involved in metabolism of terpenoids and polyketides. Many CDS encoding enzymes leading to biosynthesis of sennosides were identified. A total of 10,763 CDS were differentially expressed in the young and mature leaf libraries, of which 2,343 (21.7%) were up-regulated in young compared to mature leaf. Several differentially expressed genes were found to be functionally associated with sennoside biosynthesis. CDS encoding many CYPs and TF families were identified as having probable roles in metabolism of primary as well as secondary metabolites. We developed SSR markers for molecular breeding of senna. We have identified a set of putative genes involved in various

  4. Glaucoma database.

    PubMed

    K, Rangachari; M, Dhivya; Pj, Eswari Pandaranayaka; N, Prasanthi; P, Sundaresan; Sr, Krishnadas; S, Krishnaswamy

    2011-02-07

    Glaucoma, a complex heterogeneous disease, is the leading cause of optic nerve-related blindness worldwide. Primary open angle glaucoma (POAG) is the most common subset and by the year 2020 it is estimated that approximately 60 million people will be affected. MYOC, OPTN, CYP1B1 and WDR36 are the important candidate genes. Nearly 4% of glaucoma patients have a mutation in one of these genes. A mutation in any of these genes causes disease either directly or indirectly, and the severity of the disease varies according to the position of the genes. We have compiled all the related mutations and SNPs in the above genes and developed a database to help access statistical and clinical information on a particular mutation. The database, constructed using SQL, contains data pertaining to the SNPs and mutation information involved in the above genes and relevant study data. The database is available online for free at http://bicmku.in:8081/glaucoma.

  5. Changes in the Proteome of Langat-Infected Ixodes scapularis ISE6 Cells: Metabolic Pathways Associated with Flavivirus Infection

    PubMed Central

    Grabowski, Jeffrey M.; Perera, Rushika; Roumani, Ali M.; Hedrick, Victoria E.; Inerowicz, Halina D.; Hill, Catherine A.; Kuhn, Richard J.

    2016-01-01

    Background Ticks (Family Ixodidae) transmit a variety of disease causing agents to humans and animals. The tick-borne flaviviruses (TBFs; family Flaviviridae) are a complex of viruses, many of which cause encephalitis and hemorrhagic fever, and represent global threats to human health and biosecurity. Pathogenesis has been well studied in human and animal disease models. Equivalent analyses of tick-flavivirus interactions are limited and represent an area of study that could reveal novel approaches for TBF control. Methodology/Principal Findings High resolution LC-MS/MS was used to analyze the proteome of Ixodes scapularis (Lyme disease tick) embryonic ISE6 cells following infection with Langat virus (LGTV) and identify proteins associated with viral infection and replication. Maximal LGTV infection of cells and determination of peak release of infectious virus, was observed at 36 hours post infection (hpi). Proteins were extracted from ISE6 cells treated with LGTV and non-infectious (UV inactivated) LGTV at 36 hpi and analyzed by mass spectrometry. The Omics Discovery Pipeline (ODP) identified thousands of MS peaks. Protein homology searches against the I. scapularis IscaW1 genome assembly identified a total of 486 proteins that were subsequently assigned to putative functional pathways using searches against the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. 266 proteins were differentially expressed following LGTV infection relative to non-infected (mock) cells. Of these, 68 proteins exhibited increased expression and 198 proteins had decreased expression. The majority of the former were classified in the KEGG pathways: “translation”, “amino acid metabolism”, and “protein folding/sorting/degradation”. Finally, Trichostatin A and Oligomycin A increased and decreased LGTV replication in vitro in ISE6 cells, respectively. Conclusions/Significance Proteomic analyses revealed ISE6 proteins that were differentially expressed at the peak of LGTV

  6. Changes in the Proteome of Langat-Infected Ixodes scapularis ISE6 Cells: Metabolic Pathways Associated with Flavivirus Infection.

    PubMed

    Grabowski, Jeffrey M; Perera, Rushika; Roumani, Ali M; Hedrick, Victoria E; Inerowicz, Halina D; Hill, Catherine A; Kuhn, Richard J

    2016-02-01

    Ticks (Family Ixodidae) transmit a variety of disease causing agents to humans and animals. The tick-borne flaviviruses (TBFs; family Flaviviridae) are a complex of viruses, many of which cause encephalitis and hemorrhagic fever, and represent global threats to human health and biosecurity. Pathogenesis has been well studied in human and animal disease models. Equivalent analyses of tick-flavivirus interactions are limited and represent an area of study that could reveal novel approaches for TBF control. High resolution LC-MS/MS was used to analyze the proteome of Ixodes scapularis (Lyme disease tick) embryonic ISE6 cells following infection with Langat virus (LGTV) and identify proteins associated with viral infection and replication. Maximal LGTV infection of cells and determination of peak release of infectious virus, was observed at 36 hours post infection (hpi). Proteins were extracted from ISE6 cells treated with LGTV and non-infectious (UV inactivated) LGTV at 36 hpi and analyzed by mass spectrometry. The Omics Discovery Pipeline (ODP) identified thousands of MS peaks. Protein homology searches against the I. scapularis IscaW1 genome assembly identified a total of 486 proteins that were subsequently assigned to putative functional pathways using searches against the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. 266 proteins were differentially expressed following LGTV infection relative to non-infected (mock) cells. Of these, 68 proteins exhibited increased expression and 198 proteins had decreased expression. The majority of the former were classified in the KEGG pathways: "translation", "amino acid metabolism", and "protein folding/sorting/degradation". Finally, Trichostatin A and Oligomycin A increased and decreased LGTV replication in vitro in ISE6 cells, respectively. Proteomic analyses revealed ISE6 proteins that were differentially expressed at the peak of LGTV replication. Proteins with increased expression following infection were

  7. De Novo Transcriptome Analysis of Wing Development-Related Signaling Pathways in Locusta migratoria Manilensis and Ostrinia furnacalis (Guenée)

    PubMed Central

    Chu, Yuan; Zhang, Long; Shen, Jie; An, Chunju

    2014-01-01

    Background Orthopteran migratory locust, Locusta migratoria, and lepidopteran Asian corn borer, Ostrinia furnacalis, are two types of insects undergoing incomplete and complete metamorphosis, respectively. Identification of candidate genes regulating wing development in these two insects would provide insights for further study of the molecular mechanisms controlling metamorphic development. We have sequenced the transcriptome of O. furnacalis larvae previously. Here we sequenced and characterized the transcriptome of L. migratoria wing discs with special emphasis on wing development-related signaling pathways. Methodology/Principal Findings Illumina HiSeq 2000 was used to sequence 8.38 Gb of the transcriptome from dissected nymphal wing discs. De novo assembly generated 91,907 unigenes with a mean length of 610 nt. All unigenes were searched against five databases (Nt, Nr, Swiss-Prot, COG, and KEGG) for annotations using the blastn or blastx algorithm with a cut-off E-value of 10⁻⁵. A total of 23,359 (25.4%) unigenes have homologs within at least one database. Based on sequence similarity to homologs known to regulate Drosophila melanogaster wing development, we identified 50 and 46 potential wing development-related unigenes from the L. migratoria and O. furnacalis transcriptomes, respectively. The identified unigenes encode putative orthologs for nearly all components of the Hedgehog (Hh), Decapentaplegic (Dpp), Notch (N), and Wingless (Wg) signaling pathways, which are essential for growth and pattern formation during wing development. We investigated the expression profiles of the component genes involved in these signaling pathways in forewings and hind wings of L. migratoria and O. furnacalis. The results revealed that the tested genes had different expression patterns in the two insects. Conclusions/Significance This study provides the comprehensive sequence resource of the wing development-related signaling pathways of L. migratoria. The obtained data

  8. Drinking Water Treatability Database (Database)

    EPA Science Inventory

    The Drinking Water Treatability Database (TDB) will provide data taken from the literature on the control of contaminants in drinking water, and will be housed on an interactive, publicly available USEPA web site. It can be used for identifying effective treatment processes, rec...

  9. Reconstruction of biological pathways and metabolic networks from in silico labeled metabolites.

    PubMed

    Hadadi, Noushin; Hafner, Jasmin; Soh, Keng Cher; Hatzimanikatis, Vassily

    2017-01-01

    Reaction atom mappings track the positional changes of all of the atoms between the substrates and the products as they undergo the biochemical transformation. However, information on atom transitions in the context of metabolic pathways is not widely available in the literature. The understanding of metabolic pathways at the atomic level is of great importance as it can deconvolute the overlapping catabolic/anabolic pathways resulting in the observed metabolic phenotype. The automated identification of atom transitions within a metabolic network is a very challenging task since the degree of complexity of metabolic networks dramatically increases when we transit from metabolite-level studies to atom-level studies. Despite being studied extensively in various approaches, the field of atom mapping of metabolic networks is lacking an automated approach, which (i) accounts for the information of reaction mechanism for atom mapping and (ii) is extendable from individual atom-mapped reactions to atom-mapped reaction networks. Hereby, we introduce a computational framework, iAM.NICE (in silico Atom Mapped Network Integrated Computational Explorer), for the systematic atom-level reconstruction of metabolic networks from in silico labelled substrates. iAM.NICE is to our knowledge the first automated atom-mapping algorithm that is based on the underlying enzymatic biotransformation mechanisms, and its application goes beyond individual reactions and it can be used for the reconstruction of atom-mapped metabolic networks. We illustrate the applicability of our method through the reconstruction of atom-mapped reactions of the KEGG database and we provide an example of an atom-level representation of the core metabolic network of E. coli.

  10. Pathway deviation-based biomarker and multi-effect target identification in asbestos-related squamous cell carcinoma of the lung.

    PubMed

    Du, Jiang; Zhang, Lin

    2017-03-01

    Asbestos-related lung carcinoma is one of the most devastating occupational cancers, and effective techniques for early diagnosis are still lacking. In the present study, a systematic approach was applied to detect a potential biomarker for asbestos-related lung cancer (ARLC); in particular asbestos-related squamous cell carcinoma (ARLC-SCC). Microarray data (GSE23822) were retrieved from the Gene Expression Omnibus database, including 26 ARLC-SCCs and 30 non-asbestos-related squamous cell lung carcinomas (NARLC-SCCs). Differentially expressed genes (DEGs) were identified by the limma package, and then a protein-protein interaction (PPI) network was constructed according to the BioGRID and HPRD databases. A novel scoring approach integrating an expression deviation score and network degree of the gene was then proposed to weight the DEGs. Subsequently, the important genes were uploaded to DAVID for pathway enrichment analysis. Pathway correlation analysis was carried out using Spearman's rank correlation coefficient of the pathscore. In total, 1,333 DEGs, 391 upregulated and 942 downregulated, were obtained between the ARLC-SCCs and NARLC-SCCs. A total of 524 important genes for ARLC-SCC were significantly enriched in 22 KEGG pathways. Correlation analysis of these pathways showed that the pathway of SNARE interactions in vesicular transport was significantly correlated with 12 other pathways. Additionally, obvious correlations were found between multiple pathways by sharing cross-talk genes (EGFR, PRKX, PDGFB, PIK3R3, SLK, IGF1, CDC42 and PRKCA). On the whole, our data demonstrate that 8 cross-talk genes were found to bridge multiple ARLC-SCC-specific pathways, which may be used as candidate biomarkers and potential multi-effect targets. As these genes are involved in multiple pathways, it is possible that drugs targeting these genes may thus be able to influence multiple pathways simultaneously.
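    The pathway correlation step in the record above relies on Spearman's rank correlation coefficient. The following is a minimal sketch of that statistic only, not the authors' pipeline: the per-sample pathway scores are made-up illustrative values, and the function names are hypothetical. Ties receive their average rank, and the coefficient is computed as the Pearson correlation of the two rank vectors.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical scores for two pathways across four samples (illustrative only).
pathway_a = [0.10, 0.40, 0.35, 0.80]
pathway_b = [1.0, 3.0, 2.5, 9.0]
rho_same = spearman(pathway_a, pathway_b)
rho_reversed = spearman(pathway_a, pathway_b[::-1])
```

    Because the statistic depends only on ranks, the two hypothetical pathways above are perfectly correlated even though their scores are on different scales.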

  11. The Comparative Toxicogenomics Database facilitates identification and understanding of chemical-gene-disease associations: arsenic as a case study

    PubMed Central

    Davis, Allan P; Murphy, Cynthia G; Rosenstein, Michael C; Wiegers, Thomas C; Mattingly, Carolyn J

    2008-01-01

    Background The etiology of many chronic diseases involves interactions between environmental factors and genes that modulate physiological processes. Understanding interactions between environmental chemicals and genes/proteins may provide insights into the mechanisms of chemical actions, disease susceptibility, toxicity, and therapeutic drug interactions. The Comparative Toxicogenomics Database (CTD) provides these insights by curating and integrating data describing relationships between chemicals, genes/proteins, and human diseases. To illustrate the scope and application of CTD, we present an analysis of curated data for the chemical arsenic. Arsenic represents a major global environmental health threat and is associated with many diseases. The mechanisms by which arsenic modulates these diseases are not well understood. Methods Curated interactions between arsenic compounds and genes were downloaded using export and batch query tools at CTD. The list of genes was analyzed for molecular interactions, Gene Ontology (GO) terms, KEGG pathway annotations, and inferred disease relationships. Results CTD contains curated data from the published literature describing 2,738 molecular interactions between 21 different arsenic compounds and 1,456 genes and proteins. Analysis of these genes and proteins provides insight into the biological functions and molecular networks that are affected by exposure to arsenic, including stress response, apoptosis, cell cycle, and specific protein signaling pathways. Integrating arsenic-gene data with gene-disease data yields a list of diseases that may be associated with arsenic exposure and genes that may explain this association. Conclusion CTD data integration and curation strategies yield insight into the actions of environmental chemicals and provide a basis for developing hypotheses about the molecular mechanisms underlying the etiology of environmental diseases. While many reports describe the molecular response to arsenic, CTD

  12. Transcriptome Analysis of Secondary Metabolism Pathway, Transcription Factors, and Transporters in Response to Methyl Jasmonate in Lycoris aurea

    PubMed Central

    Wang, Rong; Xu, Sheng; Wang, Ning; Xia, Bing; Jiang, Yumei; Wang, Ren

    2017-01-01

    Lycoris aurea, a medicinal species of the Amaryllidaceae family, is used in the practice of traditional Chinese medicine (TCM) because of the broad pharmacological activities of its Amaryllidaceae alkaloids. Despite the officinal and economic importance of Lycoris species, knowledge of secondary metabolism in this species is relatively limited. In this study, we attempted to characterize the transcriptome profile of L. aurea seedlings under methyl jasmonate (MeJA) treatment to uncover the molecular mechanisms regulating plant secondary metabolite pathways. By using short-read sequencing technology (Illumina), two cDNA libraries prepared from control (Con) and 100 μM MeJA-treated (MJ100) samples were sequenced. A total of 26,809,842 and 25,874,478 clean reads in the Con and MJ100 libraries, respectively, were obtained and assembled into 59,643 unigenes. Among them, 41,585 (69.72%) unigenes were annotated by basic local alignment search tool similarity searches against public sequence databases. These included 55 Gene Ontology (GO) terms, 128 Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways, and 25 Clusters of Orthologous Groups (COG) families. Additionally, 4,175 differentially expressed genes (DEGs; false discovery rate ≤ 0.001 and |log2 Ratio| ≥ 1), 2,291 up-regulated and 1,884 down-regulated, were found to be affected significantly under MeJA treatment. Subsequently, the DEGs encoding key enzymes involved in the secondary metabolite biosynthetic pathways, transcription factors, and transporter proteins were also analyzed and summarized. Meanwhile, we confirmed the altered expression levels of the unigenes that encode transporters and transcription factors using quantitative real-time PCR (qRT-PCR). With this transcriptome sequencing, future genetic and genomics studies related to the molecular mechanisms associated with the chemical composition of L. aurea may be improved. Additionally, the genes involved in the enrichment of secondary
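    The DEG criterion quoted in the record above (false discovery rate ≤ 0.001 and |log2 Ratio| ≥ 1) can be written as a small filter. This is a hedged sketch under stated assumptions: the unigene names, FDR values, and log2 ratios are invented for illustration, `call_deg` is a hypothetical helper, and the sign convention (positive log2 ratio means up-regulated under treatment) is an assumption not stated in the abstract.

```python
def call_deg(fdr, log2_ratio, fdr_max=0.001, min_abs_log2=1.0):
    """Label a unigene 'up' or 'down' if it passes both thresholds, else None.

    Assumes log2_ratio = log2(treated / control), so positive means up-regulated.
    """
    if fdr > fdr_max or abs(log2_ratio) < min_abs_log2:
        return None
    return "up" if log2_ratio > 0 else "down"


# Illustrative records only: (name, FDR, log2 ratio).
records = [
    ("unigene_a", 0.0002, 2.3),   # passes both thresholds, up
    ("unigene_b", 0.0005, -1.0),  # exactly at the fold-change boundary, down
    ("unigene_c", 0.02, 3.1),     # large change but FDR too high
    ("unigene_d", 0.0001, 0.4),   # significant but change too small
]
calls = {name: call_deg(fdr, ratio) for name, fdr, ratio in records}
up_count = sum(1 for label in calls.values() if label == "up")
down_count = sum(1 for label in calls.values() if label == "down")
```

    Both thresholds are inclusive here, matching the "≤" and "≥" signs in the abstract.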

  13. Transcriptomic analysis of the head kidney of Topmouth culter (Culter alburnus) infected with Flavobacterium columnare with an emphasis on phagosome pathway.

    PubMed

    Zhao, Lijuan; Tu, Jiagang; Zhang, Yulei; Wang, Jinfu; Yang, Ling; Wang, Weimin; Wu, Zaohe; Meng, Qinglei; Lin, Li

    2016-10-01

    Flavobacterium columnare (FC) has caused worldwide fish columnaris disease with high mortality and great economic losses in cultured fish, including Topmouth culter (Culter alburnus). However, little is known about the host factors involved in FC infection. In this study, the transcriptomic profiles of the head kidney from Topmouth culter with or without FC infection were obtained using HiSeq™ 2500 (Illumina). In total, 79,641 high-quality unigenes were obtained. Among them, 4037 differentially expressed genes, including 1217 up-regulated and 2820 down-regulated genes, were identified and enriched using the Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases. The differentially expressed genes were mainly associated with pathways such as immune response, carbohydrate metabolism, amino acid metabolism, and lipid metabolism. Since phagocytosis is a central mechanism of the innate immune response by which host cells defend against infectious agents, genes related to the phagosome pathway were scrutinized and 9 differentially expressed phagosome-related genes were identified, including 3 up-regulated and 6 down-regulated genes. Five of them were further validated by quantitative real-time polymerase chain reaction (qRT-PCR). This transcriptomic analysis of host genes in response to FC infection provides data towards understanding the infection mechanisms and will shed new light on the prevention of columnaris.

  14. MINEs: Open access databases of computationally predicted enzyme promiscuity products for untargeted metabolomics

    SciTech Connect

    Jeffryes, James G.; Colastani, Ricardo L.; Elbadawi-Sidhu, Mona; Kind, Tobias; Niehaus, Thomas D.; Broadbelt, Linda J.; Hanson, Andrew D.; Fiehn, Oliver; Tyo, Keith E. J.; Henry, Christopher S.

    2015-08-28

    Metabolomics have proven difficult to execute in an untargeted and generalizable manner. Liquid chromatography–mass spectrometry (LC–MS) has made it possible to gather data on thousands of cellular metabolites. However, matching metabolites to their spectral features continues to be a bottleneck, meaning that much of the collected information remains uninterpreted and that new metabolites are seldom discovered in untargeted studies. These challenges require new approaches that consider compounds beyond those available in curated biochemistry databases. Here we present Metabolic In silico Network Expansions (MINEs), an extension of known metabolite databases to include molecules that have not been observed, but are likely to occur based on known metabolites and common biochemical reactions. We utilize an algorithm called the Biochemical Network Integrated Computational Explorer (BNICE) and expert-curated reaction rules based on the Enzyme Commission classification system to propose the novel chemical structures and reactions that comprise MINE databases. Starting from the Kyoto Encyclopedia of Genes and Genomes (KEGG) COMPOUND database, the MINE contains over 571,000 compounds, of which 93% are not present in the PubChem database. However, these MINE compounds have on average higher structural similarity to natural products than compounds from KEGG or PubChem. MINE databases were able to propose annotations for 98.6% of a set of 667 MassBank spectra, 14% more than KEGG alone and equivalent to PubChem while returning far fewer candidates per spectra than PubChem (46 vs. 1715 median candidates). Application of MINEs to LC–MS accurate mass data enabled the identity of an unknown peak to be confidently predicted. MINE databases are freely accessible for non-commercial use via user-friendly web-tools at http://minedatabase.mcs.anl.gov and developer-friendly APIs. MINEs improve metabolomics peak identification as compared to general chemical databases whose results

  15. MINEs: Open access databases of computationally predicted enzyme promiscuity products for untargeted metabolomics

    DOE PAGES

    Jeffryes, James G.; Colastani, Ricardo L.; Elbadawi-Sidhu, Mona; ...

    2015-08-28

    Metabolomics have proven difficult to execute in an untargeted and generalizable manner. Liquid chromatography–mass spectrometry (LC–MS) has made it possible to gather data on thousands of cellular metabolites. However, matching metabolites to their spectral features continues to be a bottleneck, meaning that much of the collected information remains uninterpreted and that new metabolites are seldom discovered in untargeted studies. These challenges require new approaches that consider compounds beyond those available in curated biochemistry databases. Here we present Metabolic In silico Network Expansions (MINEs), an extension of known metabolite databases to include molecules that have not been observed, but are likely to occur based on known metabolites and common biochemical reactions. We utilize an algorithm called the Biochemical Network Integrated Computational Explorer (BNICE) and expert-curated reaction rules based on the Enzyme Commission classification system to propose the novel chemical structures and reactions that comprise MINE databases. Starting from the Kyoto Encyclopedia of Genes and Genomes (KEGG) COMPOUND database, the MINE contains over 571,000 compounds, of which 93% are not present in the PubChem database. However, these MINE compounds have on average higher structural similarity to natural products than compounds from KEGG or PubChem. MINE databases were able to propose annotations for 98.6% of a set of 667 MassBank spectra, 14% more than KEGG alone and equivalent to PubChem while returning far fewer candidates per spectra than PubChem (46 vs. 1715 median candidates). Application of MINEs to LC–MS accurate mass data enabled the identity of an unknown peak to be confidently predicted. MINE databases are freely accessible for non-commercial use via user-friendly web-tools at http://minedatabase.mcs.anl.gov and developer-friendly APIs. MINEs improve metabolomics peak identification as compared to general chemical databases whose

  16. Atomic Databases

    NASA Astrophysics Data System (ADS)

    Mendoza, Claudio

    2000-10-01

    Atomic and molecular data are required in a variety of fields ranging from the traditional astronomy, atmospherics and fusion research to fast growing technologies such as lasers, lighting, low-temperature plasmas, plasma assisted etching and radiotherapy. In this context, there are some research groups, both theoretical and experimental, scattered round the world that attend to most of this data demand, but the implementation of atomic databases has grown independently out of sheer necessity. In some cases the latter has been associated with the data production process or with data centers involved in data collection and evaluation; but sometimes it has been the result of individual initiatives that have been quite successful. In any case, the development and maintenance of atomic databases call for a number of skills and an entrepreneurial spirit that are not usually associated with most physics researchers. In the present report we present some of the highlights in this area in the past five years and discuss what we think are some of the main issues that have to be addressed.

  17. Orchidstra: an integrated orchid functional genomics database.

    PubMed

    Su, Chun-lin; Chao, Ya-Ting; Yen, Shao-Hua; Chen, Chun-Yi; Chen, Wan-Chieh; Chang, Yao-Chien Alex; Shih, Ming-Che

    2013-02-01

    A specialized orchid database, named Orchidstra (URL: http://orchidstra.abrc.sinica.edu.tw), has been constructed to collect, annotate and share genomic information for orchid functional genomics studies. The Orchidaceae is a large family of Angiosperms that exhibits extraordinary biodiversity in terms of both the number of species and their distribution worldwide. Orchids exhibit many unique biological features; however, investigation of these traits is currently constrained due to the limited availability of genomic information. Transcriptome information for five orchid species and one commercial hybrid has been included in the Orchidstra database. Altogether, these comprise >380,000 non-redundant orchid transcript sequences, of which >110,000 are protein-coding genes. Sequences from the transcriptome shotgun assembly (TSA) were obtained either from output reads from next-generation sequencing technologies assembled into contigs, or from conventional cDNA library approaches. An annotation pipeline using Gene Ontology, KEGG and Pfam was built to assign gene descriptions and functional annotation to protein-coding genes. Deep sequencing of small RNA was also performed for Phalaenopsis aphrodite to search for microRNAs (miRNAs), extending the information archived for this species to miRNA annotation, precursors and putative target genes. The P. aphrodite transcriptome information was further used to design probes for an oligonucleotide microarray, and expression profiling analysis was carried out. The intensities of hybridized probes derived from microarray assays of various tissues were incorporated into the database as part of the functional evidence. In the future, the content of the Orchidstra database will be expanded with transcriptome data and genomic information from more orchid species.

  18. De novo Transcriptome Analysis of Sinapis alba in Revealing the Glucosinolate and Phytochelatin Pathways

    PubMed Central

    Zhang, Xiaohui; Liu, Tongjin; Duan, Mengmeng; Song, Jiangping; Li, Xixiang

    2016-01-01

    Sinapis alba is an important condiment crop and can also be used as a phytoremediation plant. Though it has important economic and agronomic values, sequence data, and the genetic tools are still rare in this plant. In the present study, a de novo transcriptome based on the transcriptions of leaves, stems, and roots was assembled for S. alba for the first time. The transcriptome contains 47,972 unigenes with a mean length of 1185 nt and an N50 of 1672 nt. Among these unigenes, 46,535 (97%) unigenes were annotated by at least one of the following databases: NCBI non-redundant (Nr), Swiss-Prot, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway, Gene Ontology (GO), and Clusters of Orthologous Groups of proteins (COGs). The tissue expression pattern profiles revealed that 3489, 1361, and 8482 unigenes were predominantly expressed in the leaves, stems, and roots of S. alba, respectively. Genes predominantly expressed in the leaf were enriched in photosynthesis- and carbon fixation-related pathways. Genes predominantly expressed in the stem were enriched in not only pathways related to sugar, ether lipid, and amino acid metabolisms but also plant hormone signal transduction and circadian rhythm pathways, while the root-dominant genes were enriched in pathways related to lignin and cellulose syntheses, involved in plant-pathogen interactions, and potentially responsible for heavy metal chelating, and detoxification. Based on this transcriptome, 14,727 simple sequence repeats (SSRs) were identified, and 12,830 pairs of primers were developed for 2522 SSR-containing unigenes. Additionally, the glucosinolate (GSL) and phytochelatin metabolic pathways, which give the characteristic flavor and the heavy metal tolerance of this plant, were intensively analyzed. The genes of aliphatic GSLs pathway were predominantly expressed in roots. The absence of aliphatic GSLs in leaf tissues was due to the shutdown of BCAT4, MAM1, and CYP79F1 expressions. 
Glutathione was extensively
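    The record above reports both a mean unigene length (1185 nt) and an N50 (1672 nt). As a minimal sketch of how the two statistics differ, the function below computes N50 using the usual definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembled bases. The input lengths are invented for illustration and are not data from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length that brings the running sum to half the total is N50.
        if running >= half_total:
            return length
    raise ValueError("n50() requires at least one sequence length")


# Illustrative lengths in nucleotides (not from the study).
example_lengths = [100, 200, 300, 400, 500]
example_mean = sum(example_lengths) / len(example_lengths)
example_n50 = n50(example_lengths)
```

    N50 weights long sequences more heavily than the arithmetic mean does, which is why it typically exceeds the mean for an assembly, as it does in the figures quoted above.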

  19. Copy number variations and genome-wide associations reveal putative genes and metabolic pathways involved with the feed conversion ratio in beef cattle.

    PubMed

    de Almeida Santana, Miguel Henrique; Junior, Gerson Antônio Oliveira; Cesar, Aline Silva Mello; Freua, Mateus Castelani; da Costa Gomes, Rodrigo; da Luz E Silva, Saulo; Leme, Paulo Roberto; Fukumasu, Heidge; Carvalho, Minos Esperândio; Ventura, Ricardo Vieira; Coutinho, Luiz Lehmann; Kadarmideen, Haja N; Ferraz, José Bento Sterman

    2016-11-01

    The use of genome-wide association results combined with other genomic approaches may uncover genes and metabolic pathways related to complex traits. In this study, the phenotypic and genotypic data of 1475 Nellore (Bos indicus) cattle and 941,033 single nucleotide polymorphisms (SNPs) were used for genome-wide association study (GWAS) and copy number variations (CNVs) analysis in order to identify candidate genes and putative pathways involved with the feed conversion ratio (FCR). The GWAS was based on the Bayes B approach analyzing genomic windows with multiple regression models to estimate the proportion of genetic variance explained by each window. The CNVs were detected with PennCNV software using the log R ratio and B allele frequency data. CNV regions (CNVRs) were identified with CNVRuler and a linear regression was used to associate CNVRs and the FCR. Functional annotation of associated genomic regions was performed with the Database for Annotation, Visualization and Integrated Discovery (DAVID) and the metabolic pathways were obtained from the Kyoto Encyclopedia of Genes and Genomes (KEGG). We showed five genomic windows distributed over chromosomes 4, 6, 7, 8, and 24 that explain 12 % of the total genetic variance for FCR, and detected 12 CNVRs (chromosomes 1, 5, 7, 10, and 12) significantly associated [false discovery rate (FDR) < 0.05] with the FCR. Significant genomic regions (GWAS and CNV) harbor candidate genes involved in pathways related to energetic, lipid, and protein metabolism. The metabolic pathways found in this study are related to processes directly connected to feed efficiency in beef cattle. It was observed that, even though different genomic regions and genes were found between the two approaches (GWAS and CNV), the metabolic processes covered were related to each other. Therefore, a combination of the approaches complement each other and lead to a better understanding of the FCR.

  20. Stackfile Database

    NASA Technical Reports Server (NTRS)

    deVarvalho, Robert; Desai, Shailen D.; Haines, Bruce J.; Kruizinga, Gerhard L.; Gilmer, Christopher

    2013-01-01

    This software provides storage, retrieval, and analysis functionality for managing satellite altimetry data. It improves the efficiency and analysis capabilities of existing database software with improved flexibility and documentation. It offers flexibility in the type of data that can be stored. There is efficient retrieval either across the spatial domain or the time domain. Built-in analysis tools are provided for frequently performed altimetry tasks. This software package is used for storing and manipulating satellite measurement data. It was developed with a focus on handling the requirements of repeat-track altimetry missions such as Topex and Jason. It was, however, designed to work with a wide variety of satellite measurement data (e.g., Gravity Recovery And Climate Experiment, GRACE). The software consists of several command-line tools for importing, retrieving, and analyzing satellite measurement data.

  1. De Novo transcriptome sequencing reveals important molecular networks and metabolic pathways of the plant, Chlorophytum borivilianum.

    PubMed

    Kalra, Shikha; Puniya, Bhanwar Lal; Kulshreshtha, Deepika; Kumar, Sunil; Kaur, Jagdeep; Ramachandran, Srinivasan; Singh, Kashmir

    2013-01-01

    Chlorophytum borivilianum, an endangered medicinal plant species, is highly recognized for its aphrodisiac properties provided by saponins present in the plant. The transcriptome information of this species is limited and only a few hundred expressed sequence tags (ESTs) are available in the public databases. To gain molecular insight into this plant, high throughput transcriptome sequencing of leaf RNA was carried out using Illumina's HiSeq 2000 sequencing platform. A total of 22,161,444 single end reads were retrieved after quality filtering. Available (e.g., De-Bruijn/Eulerian graph) and in-house developed bioinformatics tools were used for assembly and annotation of the transcriptome. A total of 101,141 assembled transcripts were obtained, with coverage size of 22.42 Mb and average length of 221 bp. Guanine-cytosine (GC) content was found to be 44%. Bioinformatics analysis, using non-redundant proteins, Gene Ontology (GO), Enzyme Commission (EC) and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases, extracted all the known enzymes involved in saponin and flavonoid biosynthesis. A few genes of alkaloid biosynthesis, along with anticancer and plant defense genes, were also discovered. Additionally, several cytochrome P450 (CYP450) and glycosyltransferase unique sequences were also found. We identified simple sequence repeat motifs in transcripts with an abundance of di-nucleotide simple sequence repeat (SSR; 43.1%) markers. Large scale expression profiling through Reads per Kilobase per Million mapped reads (RPKM) showed major genes involved in different metabolic pathways of the plant. Genes, expressed sequence tags (ESTs) and unique sequences from this study provide an important resource for the scientific community interested in the molecular genetics and functional genomics of C. borivilianum.
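    The expression measure named in the record above, Reads per Kilobase per Million mapped reads (RPKM), normalizes a raw read count by both transcript length and sequencing depth. The sketch below uses the standard formula, RPKM = reads × 10^9 / (transcript length in bp × total mapped reads); the function name and the example numbers are hypothetical and are not taken from the study.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    # 10^9 combines the per-kilobase (10^3) and per-million (10^6) scalings.
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


# Illustrative values only: 500 reads on a 2,000 bp transcript,
# in a library with 10 million mapped reads.
example_rpkm = rpkm(500, 2000, 10_000_000)
```

    With these illustrative inputs the value works out to 25: 500 reads over 2 kb is 250 reads per kilobase, divided by 10 million mapped reads.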

  2. De Novo Transcriptome Sequencing Reveals Important Molecular Networks and Metabolic Pathways of the Plant, Chlorophytum borivilianum

    PubMed Central

    Kalra, Shikha; Puniya, Bhanwar Lal; Kulshreshtha, Deepika; Kumar, Sunil; Kaur, Jagdeep; Ramachandran, Srinivasan; Singh, Kashmir

    2013-01-01

    Chlorophytum borivilianum, an endangered medicinal plant species, is highly recognized for its aphrodisiac properties provided by saponins present in the plant. The transcriptome information of this species is limited and only a few hundred expressed sequence tags (ESTs) are available in the public databases. To gain molecular insight into this plant, high throughput transcriptome sequencing of leaf RNA was carried out using Illumina's HiSeq 2000 sequencing platform. A total of 22,161,444 single end reads were retrieved after quality filtering. Available (e.g., De-Bruijn/Eulerian graph) and in-house developed bioinformatics tools were used for assembly and annotation of the transcriptome. A total of 101,141 assembled transcripts were obtained, with coverage size of 22.42 Mb and average length of 221 bp. Guanine-cytosine (GC) content was found to be 44%. Bioinformatics analysis, using non-redundant proteins, Gene Ontology (GO), Enzyme Commission (EC) and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases, extracted all the known enzymes involved in saponin and flavonoid biosynthesis. A few genes of alkaloid biosynthesis, along with anticancer and plant defense genes, were also discovered. Additionally, several cytochrome P450 (CYP450) and glycosyltransferase unique sequences were also found. We identified simple sequence repeat motifs in transcripts with an abundance of di-nucleotide simple sequence repeat (SSR; 43.1%) markers. Large scale expression profiling through Reads per Kilobase per Million mapped reads (RPKM) showed major genes involved in different metabolic pathways of the plant. Genes, expressed sequence tags (ESTs) and unique sequences from this study provide an important resource for the scientific community interested in the molecular genetics and functional genomics of C. borivilianum. PMID:24376689

  3. Genomic Contributors to Rhythm Outcome of Atrial Fibrillation Catheter Ablation – Pathway Enrichment Analysis of GWAS Data

    PubMed Central

    Ueberham, Laura; Dinov, Borislav; Sommer, Philipp; Arya, Arash; Hindricks, Gerhard; Bollmann, Andreas

    2016-01-01

    Background Left atrial enlargement and persistent atrial fibrillation (AF) are well-known predictors for arrhythmia recurrence after AF catheter ablation (LRAF). In this study, by using pathway enrichment analysis of GWAS data, we tested the hypothesis that genetic pathways associated with these phenotypes are also associated with LRAF. Methods Samples from 660 patients with paroxysmal (n = 370) or persistent AF (n = 290) undergoing de-novo AF catheter ablation were genotyped for ~1,000,000 SNPs. SNPs found to be significantly associated with left atrial diameter (LAD) or AF type were used for gene-based association tests in a systematic biological Knowledge-based mining system for Genome-wide Genetic studies (KGG). Associated genes were tested for pathway enrichment using WEB-based Gene SeT AnaLysis Toolkit (WebGestalt), the Gene Annotation Tool to Help Explain Relationships (GATHER) and the databases provided by Kyoto Encyclopedia of Genes and Genomes (KEGG). In a second step, the association of consistently enriched pathways and LRAF was tested. Results By using sequential 7-day Holter ECGs, LRAF between 3 and 12 months was observed in 48% and was associated with LAD (B = 1.801, 95% CI 0.760–2.841, p = 1.0E-3) and persistent AF (OR = 2.1; 95% CI 1.567–2.931, p = 2.0E-6). WebGestalt (adj. p = 2.7E-22) and GATHER (adj. p = 5.2E-3) identified the calcium signaling pathway (hsa04020) as the only consistently enriched pathway for LAD, while the extracellular matrix (ECM) -receptor interaction pathway (hsa04512) was the only consistently enriched pathway for AF type (adj. p = 2.1E-15 in WebGestalt; adj. p = 9.3E-4 in GATHER). Both calcium signaling (adj. p = 2.2E-17 in WebGestalt; adj. p = 2.9E-2 in GATHER) and ECM-receptor interaction (adj. p = 1.2E-10 in WebGestalt; adj. p = 2.9E-2 in GATHER) were significantly associated with LRAF. Conclusions Calcium signaling and ECM-receptor interaction pathways are associated with LAD and AF type and, in turn, with LRAF

  4. OxDBase: a database of oxygenases involved in biodegradation

    PubMed Central

    Arora, Pankaj K; Kumar, Manish; Chauhan, Archana; Raghava, Gajendra PS; Jain, Rakesh K

    2009-01-01

    Background Oxygenases belong to the oxidoreductive group of enzymes (E.C. Class 1), which oxidize the substrates by transferring oxygen from molecular oxygen (O2) and utilize FAD/NADH/NADPH as the co-substrate. Oxygenases can further be grouped into two categories, i.e. monooxygenases and dioxygenases, on the basis of the number of oxygen atoms used for oxidation. They play a key role in the metabolism of organic compounds by increasing their reactivity or water solubility or bringing about cleavage of the aromatic ring. Findings We compiled a database of biodegradative oxygenases (OxDBase), which provides a compilation of oxygenase data sourced from the primary literature in the form of a web-accessible database. There are two separate search engines for searching the database, one for monooxygenases and one for dioxygenases. Each enzyme entry contains its common name and synonym, the reaction in which the enzyme is involved, family and subfamily, structure and gene link, and literature citation. The entries are also linked to several external databases, including BRENDA, KEGG, ENZYME and UM-BBD, providing wide background information. At present the database contains information on over 235 oxygenases, including both dioxygenases and monooxygenases. This database is freely available online. Conclusion OxDBase is the first database that is dedicated only to oxygenases and provides comprehensive information about them. Due to the importance of oxygenases in the chemical synthesis of drug intermediates and the oxidation of xenobiotic compounds, OxDBase would be a very useful tool in the field of synthetic chemistry as well as bioremediation. PMID:19405962

  5. Predicting the diagnosis of autism spectrum disorder using gene pathway analysis.

    PubMed

    Skafidas, E; Testa, R; Zantomio, D; Chana, G; Everall, I P; Pantelis, C

    2014-04-01

    Autism spectrum disorder (ASD) depends on a clinical interview with no biomarkers to aid diagnosis. The current investigation interrogated single-nucleotide polymorphisms (SNPs) of individuals with ASD from the Autism Genetic Resource Exchange (AGRE) database. SNPs were mapped to Kyoto Encyclopedia of Genes and Genomes (KEGG)-derived pathways to identify affected cellular processes and develop a diagnostic test. This test was then applied to two independent samples from the Simons Foundation Autism Research Initiative (SFARI) and Wellcome Trust 1958 normal birth cohort (WTBC) for validation. Using AGRE SNP data from a Central European (CEU) cohort, we created a genetic diagnostic classifier consisting of 237 SNPs in 146 genes that correctly predicted ASD diagnosis in 85.6% of CEU cases. This classifier also predicted 84.3% of cases in an ethnically related Tuscan cohort; however, prediction was less accurate (56.4%) in a genetically dissimilar Han Chinese cohort (HAN). Eight SNPs in three genes (KCNMB4, GNAO1, GRM5) had the largest effect in the classifier with some acting as vulnerability SNPs, whereas others were protective. Prediction accuracy diminished as the number of SNPs analyzed in the model was decreased. Our diagnostic classifier correctly predicted ASD diagnosis with an accuracy of 71.7% in CEU individuals from the SFARI (ASD) and WTBC (controls) validation data sets. In conclusion, we have developed an accurate diagnostic test for a genetically homogeneous group to aid in early detection of ASD. While SNPs differ across ethnic groups, our pathway approach identified cellular processes common to ASD across ethnicities. Our results have wide implications for detection, intervention and prevention of ASD.

  6. Comparison of Metabolic Pathways in Escherichia coli by Using Genetic Algorithms

    PubMed Central

    Ortegon, Patricia; Poot-Hernández, Augusto C.; Perez-Rueda, Ernesto; Rodriguez-Vazquez, Katya

    2015-01-01

    In order to understand how cellular metabolism has taken its modern form, the conservation and variations between metabolic pathways were evaluated by using a genetic algorithm (GA). The GA approach considered information on the complete metabolism of the bacterium Escherichia coli K-12, as deposited in the KEGG database, and the enzymes belonging to a particular pathway were transformed into enzymatic step sequences by using the breadth-first search algorithm. These sequences represent contiguous enzymes linked to each other, based on their catalytic activities as they are encoded in the Enzyme Commission numbers. In a posterior step, these sequences were compared using a GA in an all-against-all (pairwise comparisons) approach. Individual reactions were chosen based on their measure of fitness to act as parents of offspring, which constitute the new generation. The sequences compared were used to construct a similarity matrix (of fitness values) that was then considered to be clustered by using a k-medoids algorithm. A total of 34 clusters of conserved reactions were obtained, and their sequences were finally aligned with a multiple-sequence alignment GA optimized to align all the reaction sequences included in each group or cluster. From these comparisons, maps associated with the metabolism of similar compounds also contained similar enzymatic step sequences, reinforcing the Patchwork Model for the evolution of metabolism in E. coli K-12, an observation that can be expanded to other organisms, for which there is metabolism information. Finally, our mapping of these reactions is discussed, with illustrations from a particular case. PMID:25973143
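    The breadth-first linearization mentioned in this abstract can be illustrated with a minimal sketch. It assumes a pathway is given as a directed graph whose nodes are EC numbers; the toy graph, the function name `bfs_step_sequence` and the choice of start node are hypothetical and are not taken from the paper.

```python
from collections import deque


def bfs_step_sequence(graph, start):
    """Return the nodes of `graph` in breadth-first order from `start`.

    `graph` maps an EC number to the EC numbers of the enzymes that
    catalyse the following step (hypothetical representation).
    """
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in graph.get(node, []):
            if nxt not in seen:  # visit each enzyme only once
                seen.add(nxt)
                queue.append(nxt)
    return order


# Toy pathway for illustration only (not real KEGG data).
toy_pathway = {
    "2.7.1.1": ["5.3.1.9"],
    "5.3.1.9": ["2.7.1.11"],
    "2.7.1.11": ["4.1.2.13"],
    "4.1.2.13": ["5.3.1.1", "1.2.1.12"],
    "5.3.1.1": ["1.2.1.12"],
}

sequence = bfs_step_sequence(toy_pathway, "2.7.1.1")
print(" -> ".join(sequence))
```

    Sequences produced this way could then be compared pairwise; the comparison and clustering steps described in the abstract are not sketched here.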

  7. Comparison of Metabolic Pathways in Escherichia coli by Using Genetic Algorithms.

    PubMed

    Ortegon, Patricia; Poot-Hernández, Augusto C; Perez-Rueda, Ernesto; Rodriguez-Vazquez, Katya

    2015-01-01

    In order to understand how cellular metabolism has taken its modern form, the conservation and variations between metabolic pathways were evaluated by using a genetic algorithm (GA). The GA approach considered information on the complete metabolism of the bacterium Escherichia coli K-12, as deposited in the KEGG database, and the enzymes belonging to a particular pathway were transformed into enzymatic step sequences by using the breadth-first search algorithm. These sequences represent contiguous enzymes linked to each other, based on their catalytic activities as they are encoded in the Enzyme Commission numbers. In a posterior step, these sequences were compared using a GA in an all-against-all (pairwise comparisons) approach. Individual reactions were chosen based on their measure of fitness to act as parents of offspring, which constitute the new generation. The sequences compared were used to construct a similarity matrix (of fitness values) that was then considered to be clustered by using a k-medoids algorithm. A total of 34 clusters of conserved reactions were obtained, and their sequences were finally aligned with a multiple-sequence alignment GA optimized to align all the reaction sequences included in each group or cluster. From these comparisons, maps associated with the metabolism of similar compounds also contained similar enzymatic step sequences, reinforcing the Patchwork Model for the evolution of metabolism in E. coli K-12, an observation that can be expanded to other organisms, for which there is metabolism information. Finally, our mapping of these reactions is discussed, with illustrations from a particular case.

  8. De novo assembly of the Carcinus maenas transcriptome and characterization of innate immune system pathways.

    PubMed

    Verbruggen, Bas; Bickley, Lisa K; Santos, Eduarda M; Tyler, Charles R; Stentiford, Grant D; Bateman, Kelly S; van Aerle, Ronny

    2015-06-16

    The European shore crab, Carcinus maenas, is used widely in biomonitoring, ecotoxicology and for studies into host-pathogen interactions. It is also an important invasive species in numerous global locations. However, the genomic resources for this organism are still sparse, limiting research progress in these fields. To address this resource shortfall we produced a C. maenas transcriptome, enabled by the progress in next-generation sequencing technologies, and applied this to assemble information on the innate immune system in this species. We isolated and pooled RNA for twelve different tissues and organs from C. maenas individuals and sequenced the RNA using next generation sequencing on an Illumina HiSeq 2500 platform. After de novo assembly a transcriptome was generated encompassing 212,427 transcripts (153,699 loci). The transcripts were filtered, annotated and characterised using a variety of tools (including BLAST, MEGAN and RSEM) and databases (including NCBI, Gene Ontology and KEGG). There were differential patterns of expression for between 1,223 and 2,741 transcripts across tissues and organs with over-represented Gene Ontology terms relating to their specific function. Based on sequence homology to immune system components in other organisms, we show both the presence of transcripts for a series of known pathogen recognition receptors and response proteins that form part of the innate immune system, and transcripts representing the RNAi, Toll-like receptor signalling, IMD and JAK/STAT pathways. We have produced an assembled transcriptome for C. maenas that provides a significant molecular resource for wide ranging studies in this species. Analysis of the transcriptome has revealed the presence of a series of known targets and functional pathways that form part of their innate immune system and illustrate tissue specific differences in their expression patterns.

  9. Predicting the diagnosis of autism spectrum disorder using gene pathway analysis

    PubMed Central

    Skafidas, E; Testa, R; Zantomio, D; Chana, G; Everall, I P; Pantelis, C

    2014-01-01

    Autism spectrum disorder (ASD) depends on a clinical interview with no biomarkers to aid diagnosis. The current investigation interrogated single-nucleotide polymorphisms (SNPs) of individuals with ASD from the Autism Genetic Resource Exchange (AGRE) database. SNPs were mapped to Kyoto Encyclopedia of Genes and Genomes (KEGG)-derived pathways to identify affected cellular processes and develop a diagnostic test. This test was then applied to two independent samples from the Simons Foundation Autism Research Initiative (SFARI) and Wellcome Trust 1958 normal birth cohort (WTBC) for validation. Using AGRE SNP data from a Central European (CEU) cohort, we created a genetic diagnostic classifier consisting of 237 SNPs in 146 genes that correctly predicted ASD diagnosis in 85.6% of CEU cases. This classifier also predicted 84.3% of cases in an ethnically related Tuscan cohort; however, prediction was less accurate (56.4%) in a genetically dissimilar Han Chinese cohort (HAN). Eight SNPs in three genes (KCNMB4, GNAO1, GRM5) had the largest effect in the classifier with some acting as vulnerability SNPs, whereas others were protective. Prediction accuracy diminished as the number of SNPs analyzed in the model was decreased. Our diagnostic classifier correctly predicted ASD diagnosis with an accuracy of 71.7% in CEU individuals from the SFARI (ASD) and WTBC (controls) validation data sets. In conclusion, we have developed an accurate diagnostic test for a genetically homogeneous group to aid in early detection of ASD. While SNPs differ across ethnic groups, our pathway approach identified cellular processes common to ASD across ethnicities. Our results have wide implications for detection, intervention and prevention of ASD. PMID:22965006

  10. Robust gene signatures from microarray data using genetic algorithms enriched with biological pathway keywords.

    PubMed

    Luque-Baena, R M; Urda, D; Gonzalo Claros, M; Franco, L; Jerez, J M

    2014-06-01

    Genetic algorithms are widely used in the estimation of expression profiles from microarrays data. However, these techniques are unable to produce stable and robust solutions suitable to use in clinical and biomedical studies. This paper presents a novel two-stage evolutionary strategy for gene feature selection combining the genetic algorithm with biological information extracted from the KEGG database. A comparative study is carried out over public data from three different types of cancer (leukemia, lung cancer and prostate cancer). Even though the analyses only use features having KEGG information, the results demonstrate that this two-stage evolutionary strategy increased the consistency, robustness and accuracy of a blind discrimination among relapsed and healthy individuals. Therefore, this approach could facilitate the definition of gene signatures for the clinical prognosis and diagnostic of cancer diseases in a near future. Additionally, it could also be used for biological knowledge discovery about the studied disease. Copyright © 2014 Elsevier Inc. All rights reserved.

  11. Gene microarray assessment of multiple genes and signal pathways involved in androgen-dependent prostate cancer becoming androgen independent.

    PubMed

    Liu, Jun-Bao; Dai, Chun-Mei; Su, Xiao-Yun; Cao, Lu; Qin, Rui; Kong, Qing-Bo

    2014-01-01

    To study the gene expression change and possible signal pathway during androgen-dependent prostate cancer (ADPC) becoming androgen-independent prostate cancer (AIPC), an LNCaP cell model of AIPC was established using flutamide in combination with androgen-free environment inducement, and differential expression genes were screened by microarray. Then the biological process, molecular function and KEGG pathway of differential expression genes were analyzed by Molecule Annotation System (MAS). By comparison of 12,207 expression genes, 347 expression genes were acquired, of which 156 were up-regulated and 191 down-regulated. After analyzing the biological process and molecule function of differential expression genes, these genes were found to play crucial roles in cell proliferation, differentiation, cell cycle control, protein metabolism and modification and other biological processes, and to serve as signal molecules, enzymes, peptide hormones, cytokines, cytoskeletal proteins and adhesion molecules. The analysis of KEGG shows that the relevant genes of AIPC transformation participate in glutathione metabolism, cell cycle, P53 signal pathway, cytochrome P450 metabolism, Hedgehog signal pathway, MAPK signal pathway, adipocytokines signal pathway, PPAR signal pathway, TGF-β signal pathway and JAK-STAT signal pathway. In conclusion, during the process of ADPC becoming AIPC, it is not only one specific gene or pathway, but multiple genes and pathways that change. The findings above lay the foundation for study of AIPC mechanism and development of AIPC targeting drugs.

  12. Comparative study on gene set and pathway topology-based enrichment methods.

    PubMed

    Bayerlová, Michaela; Jung, Klaus; Kramer, Frank; Klemm, Florian; Bleckmann, Annalen; Beißbarth, Tim

    2015-10-22

    Enrichment analysis is a popular approach to identify pathways or sets of genes which are significantly enriched in the context of differentially expressed genes. The traditional gene set enrichment approach considers a pathway as a simple gene list disregarding any knowledge of gene or protein interactions. In contrast, the new group of so called pathway topology-based methods integrates the topological structure of a pathway into the analysis. We comparatively investigated gene set and pathway topology-based enrichment approaches, considering three gene set and four topological methods. These methods were compared in two extensive simulation studies and on a benchmark of 36 real datasets, providing the same pathway input data for all methods. In the benchmark data analysis both types of methods showed a comparable ability to detect enriched pathways. The first simulation study was conducted with KEGG pathways, which showed considerable gene overlaps between each other. In this study with original KEGG pathways, none of the topology-based methods outperformed the gene set approach. Therefore, a second simulation study was performed on non-overlapping pathways created by unique gene IDs. Here, methods accounting for pathway topology reached higher accuracy than the gene set methods, however their sensitivity was lower. We conducted one of the first comprehensive comparative works on evaluating gene set against pathway topology-based enrichment methods. The topological methods showed better performance in the simulation scenarios with non-overlapping pathways, however, they were not conclusively better in the other scenarios. This suggests that simple gene set approach might be sufficient to detect an enriched pathway under realistic circumstances. Nevertheless, more extensive studies and further benchmark data are needed to systematically evaluate these methods and to assess what gain and cost pathway topology information introduces into enrichment analysis. 
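    The "simple gene list" view of a pathway described in this abstract is commonly tested with a hypergeometric (over-representation) test. The sketch below is a generic illustration of that idea using only the Python standard library; it is not one of the methods benchmarked in the paper, and the function name and the example counts are assumptions.

```python
from math import comb


def hypergeom_enrichment_p(universe_size, pathway_size, n_selected, n_overlap):
    """Upper-tail hypergeometric p-value P(X >= n_overlap).

    universe_size: number of genes that could have been selected
    pathway_size:  number of those genes annotated to the pathway
    n_selected:    number of differentially expressed genes
    n_overlap:     selected genes that fall inside the pathway
    """
    upper = min(pathway_size, n_selected)
    total = comb(universe_size, n_selected)
    tail = sum(
        comb(pathway_size, k) * comb(universe_size - pathway_size, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts, for illustration only.
p_value = hypergeom_enrichment_p(
    universe_size=20000, pathway_size=100, n_selected=200, n_overlap=5
)
print(f"p = {p_value:.3g}")
```

    Because the test looks only at membership counts, two pathways that share many genes will tend to be flagged together, which is the overlap issue the abstract raises for the original KEGG pathways.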

  13. Transcriptome profiling shows gene regulation patterns in a flavonoid pathway in response to exogenous phenylalanine in Boesenbergia rotunda cell culture.

    PubMed

    Md-Mustafa, Noor Diyana; Khalid, Norzulaani; Gao, Huan; Peng, Zhiyu; Alimin, Mohd Firdaus; Bujang, Noraini; Ming, Wong Sher; Mohd-Yusuf, Yusmin; Harikrishna, Jennifer A; Othman, Rofina Yasmin

    2014-11-18

    Panduratin A extracted from Boesenbergia rotunda is a flavonoid reported to possess a range of medicinal indications which include anti-dengue, anti-HIV, anti-cancer, antioxidant and anti-inflammatory properties. Boesenbergia rotunda is a plant from the Zingiberaceae family commonly used as a food ingredient and traditional medicine in Southeast Asia and China. Reports on the health benefits of secondary metabolites extracted from Boesenbergia rotunda over the last few years have resulted in rising demand for panduratin A. However, large-scale extraction has been hindered by the naturally low abundance of the compound and limited knowledge of its biosynthetic pathway. Transcriptome sequencing and digital gene expression (DGE) analysis of native and phenylalanine-treated Boesenbergia rotunda cell suspension cultures were carried out to elucidate the key genes differentially expressed in the panduratin A biosynthetic pathway. Based on experiments that show an increase in panduratin A production 14 days after treatment with exogenous phenylalanine, an aromatic amino acid derived from the shikimic acid pathway, total RNA of untreated and 14 days post-phenylalanine treated cell suspension cultures was extracted and sequenced using next generation sequencing technology employing an Illumina-Solexa platform. The transcriptome data generated 101,043 unigenes with 50,932 (50.41%) successfully annotated in the public protein databases, including 49.93% (50,447) in the non-redundant (NR) database, 34.63% (34,989) in Swiss-Prot, 24.07% (24,316) in Kyoto Encyclopedia of Genes and Genomes (KEGG) and 16.26% (16,426) in Clusters of Orthologous Groups (COG). Through DGE analysis, we found that 14,644 unigenes were up-regulated and 14,379 unigenes down-regulated in response to exogenous phenylalanine treatment. In the phenylpropanoid pathway leading to the proposed panduratin A production, 2 up-regulated phenylalanine ammonia-lyase (PAL), 3 up-regulated 4-coumaroyl

  14. The MetaCyc Database.

    PubMed

    Karp, Peter D; Riley, Monica; Paley, Suzanne M; Pellegrini-Toole, Alida

    2002-01-01

    MetaCyc is a metabolic-pathway database that describes 445 pathways and 1115 enzymes occurring in 158 organisms. MetaCyc is a review-level database in that a given entry in MetaCyc often integrates information from multiple literature sources. The pathways in MetaCyc were determined experimentally, and are labeled with the species in which they are known to occur based on literature references examined to date. MetaCyc contains extensive commentary and literature citations. Applications of MetaCyc include pathway analysis of genomes, metabolic engineering and biochemistry education. MetaCyc is queried using the Pathway Tools graphical user interface, which provides a wide variety of query operations and visualization tools. MetaCyc is available via the World Wide Web at http://ecocyc.org/ecocyc/metacyc.html, and is available for local installation as a binary program for the PC and the Sun workstation, and as a set of flatfiles. Contact metacyc-info@ai.sri.com for information on obtaining a local copy of MetaCyc.

  15. Core Proteomic Analysis of Unique Metabolic Pathways of Salmonella enterica for the Identification of Potential Drug Targets

    PubMed Central

    2016-01-01

    Background Infections caused by Salmonella enterica, a Gram-negative facultative anaerobic bacterium belonging to the family of Enterobacteriaceae, are major threats to the health of humans and animals. The recent availability of complete genome data of pathogenic strains of S. enterica gives new avenues for the identification of drug targets and drug candidates. We have used the genomic and metabolic pathway data to identify pathways and proteins essential to the pathogen and absent from the host. Methods We took the whole proteome sequence data of 42 strains of S. enterica and Homo sapiens along with KEGG-annotated metabolic pathway data, clustered protein sequences using CD-HIT, identified essential genes using the DEG database, discarded S. enterica homologs of human proteins in unique metabolic pathways (UMPs) and characterized hypothetical proteins with SVM-prot and InterProScan. Through this core proteomic analysis we have identified enzymes essential to the pathogen. Results The identification of 73 enzymes common in 42 strains of S. enterica is the real strength of the current study. We proposed all 73 unexplored enzymes as potential drug targets against the infections caused by S. enterica. The study is comprehensive around S. enterica and simultaneously considered every possible pathogenic strain of S. enterica. This comprehensiveness makes the current study significant since, to the best of our knowledge, it is the first subtractive core proteomic analysis of the unique metabolic pathways applied to any pathogen for the identification of drug targets. We applied extensive computational methods to shortlist a few potential drug targets considering the druggability criteria, e.g. non-homologous to the human host, essential to the pathogen and playing a significant role in essential metabolic pathways of the pathogen (i.e. S. enterica). In the current study, the subtractive proteomics through a novel approach was applied i.e. by considering only proteins

  16. Core Proteomic Analysis of Unique Metabolic Pathways of Salmonella enterica for the Identification of Potential Drug Targets.

    PubMed

    Uddin, Reaz; Sufian, Muhammad

    2016-01-01

    Infections caused by Salmonella enterica, a Gram-negative facultative anaerobic bacterium belonging to the family of Enterobacteriaceae, are major threats to the health of humans and animals. The recent availability of complete genome data of pathogenic strains of S. enterica gives new avenues for the identification of drug targets and drug candidates. We have used the genomic and metabolic pathway data to identify pathways and proteins essential to the pathogen and absent from the host. We took the whole proteome sequence data of 42 strains of S. enterica and Homo sapiens along with KEGG-annotated metabolic pathway data, clustered protein sequences using CD-HIT, identified essential genes using the DEG database, discarded S. enterica homologs of human proteins in unique metabolic pathways (UMPs) and characterized hypothetical proteins with SVM-prot and InterProScan. Through this core proteomic analysis we have identified enzymes essential to the pathogen. The identification of 73 enzymes common in 42 strains of S. enterica is the real strength of the current study. We proposed all 73 unexplored enzymes as potential drug targets against the infections caused by S. enterica. The study is comprehensive around S. enterica and simultaneously considered every possible pathogenic strain of S. enterica. This comprehensiveness makes the current study significant since, to the best of our knowledge, it is the first subtractive core proteomic analysis of the unique metabolic pathways applied to any pathogen for the identification of drug targets. We applied extensive computational methods to shortlist a few potential drug targets considering the druggability criteria, e.g. non-homologous to the human host, essential to the pathogen and playing a significant role in essential metabolic pathways of the pathogen (i.e. S. enterica). In the current study, the subtractive proteomics through a novel approach was applied i.e. by considering only proteins of the unique metabolic

  17. Overlap in Bibliographic Databases.

    ERIC Educational Resources Information Center

    Hood, William W.; Wilson, Concepcion S.

    2003-01-01

    Examines the topic of Fuzzy Set Theory to determine the overlap of coverage in bibliographic databases. Highlights include examples of comparisons of database coverage; frequency distribution of the degree of overlap; records with maximum overlap; records unique to one database; intra-database duplicates; and overlap in the top ten databases.…

  18. JICST Factual Database JICST DNA Database

    NASA Astrophysics Data System (ADS)

    Shirokizawa, Yoshiko; Abe, Atsushi

    The Japan Information Center of Science and Technology (JICST) started its online DNA database service in October 1988. The database is composed of the EMBL Nucleotide Sequence Library and the Genetic Sequence Data Bank. The authors outline the database system, data items and search commands. Examples of retrieval sessions are presented.

  19. Detection of driver pathways using mutated gene network in cancer.

    PubMed

    Li, Feng; Gao, Lin; Ma, Xiaoke; Yang, Xiaofei

    2016-06-21

    Distinguishing driver pathways has been extensively studied because they are critical for understanding the development and molecular mechanisms of cancers. Most existing methods for driver pathways are based on high coverage as well as high mutual exclusivity, with the underlying assumption that mutations are exclusive. However, in many cases, mutated driver genes in the same pathways are not strictly mutually exclusive. Based on this observation, we propose an index for quantifying mutual exclusivity between gene pairs. Then, we construct a mutated gene network for detecting driver pathways by integrating the proposed index and coverage. The detection of driver pathways on the mutated gene network consists of two steps: raw pathways are obtained using a CPM method, and the final driver pathways are selected using a strict testing strategy. We apply this method to glioblastoma and breast cancers and find that our method is more accurate than state-of-the-art methods in terms of enrichment of KEGG pathways. Furthermore, the detected driver pathways intersect with well-known pathways with moderate exclusivity, which cannot be discovered using the existing algorithms. In conclusion, the proposed method provides an effective way to investigate driver pathways in cancers.
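    The abstract does not define its exclusivity index, so the sketch below uses a stand-in: the fraction of samples mutated in exactly one of two genes among samples mutated in either. This definition, the function names and the sample identifiers are all assumptions made for illustration and should not be read as the authors' index.

```python
def exclusivity_index(samples_a, samples_b):
    """Stand-in pairwise exclusivity score in [0, 1] (assumed definition).

    1.0 means the two genes are never mutated in the same sample;
    0.0 means they are always mutated together (or neither is mutated).
    """
    a, b = set(samples_a), set(samples_b)
    union = a | b
    if not union:
        return 0.0
    return len(a ^ b) / len(union)  # symmetric difference over union


def coverage(samples_a, samples_b, n_samples):
    """Fraction of all samples mutated in at least one of the two genes."""
    return len(set(samples_a) | set(samples_b)) / n_samples


# Hypothetical mutation calls: gene -> samples carrying a mutation.
gene_x = {"s1", "s2", "s3"}
gene_y = {"s3", "s4"}

print(exclusivity_index(gene_x, gene_y), coverage(gene_x, gene_y, n_samples=10))
```

    A score of this kind could serve as an edge weight between gene pairs that are "not strictly mutually exclusive", which is the situation the abstract describes.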

  20. De novo assembly and transcriptome analysis of the rubber tree (Hevea brasiliensis) and SNP markers development for rubber biosynthesis pathways.

    PubMed

    Mantello, Camila Campos; Cardoso-Silva, Claudio Benicio; da Silva, Carla Cristina; de Souza, Livia Moura; Scaloppi Junior, Erivaldo José; de Souza Gonçalves, Paulo; Vicentini, Renato; de Souza, Anete Pereira

    2014-01-01

    Hevea brasiliensis (Willd. Ex Adr. Juss.) Muell.-Arg. is the primary source of natural rubber that is native to the Amazon rainforest. The singular properties of natural rubber make it superior to and competitive with synthetic rubber for use in several applications. Here, we performed RNA sequencing (RNA-seq) of H. brasiliensis bark on the Illumina GAIIx platform, which generated 179,326,804 raw reads. A total of 50,384 contigs that were over 400 bp in size were obtained and subjected to further analyses. A similarity search against the non-redundant (nr) protein database returned 32,018 (63%) positive BLASTx hits. The transcriptome analysis was annotated using the clusters of orthologous groups (COG), gene ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), and Pfam databases. A search for putative molecular markers was performed to identify simple sequence repeats (SSRs) and single nucleotide polymorphisms (SNPs). In total, 17,927 SSRs and 404,114 SNPs were detected. Finally, we selected sequences that were identified as belonging to the mevalonate (MVA) and 2-C-methyl-D-erythritol 4-phosphate (MEP) pathways, which are involved in rubber biosynthesis, to validate the SNP markers. A total of 78 SNPs were validated in 36 genotypes of H. brasiliensis. This new dataset represents a powerful information source for rubber tree bark genes and will be an important tool for the development of microsatellites and SNP markers for use in future genetic analyses such as genetic linkage mapping, quantitative trait loci identification, investigations of linkage disequilibrium and marker-assisted selection.

  1. De Novo Assembly and Transcriptome Analysis of the Rubber Tree (Hevea brasiliensis) and SNP Markers Development for Rubber Biosynthesis Pathways

    PubMed Central

    Mantello, Camila Campos; Cardoso-Silva, Claudio Benicio; da Silva, Carla Cristina; de Souza, Livia Moura; Scaloppi Junior, Erivaldo José; de Souza Gonçalves, Paulo; Vicentini, Renato; de Souza, Anete Pereira

    2014-01-01

    Hevea brasiliensis (Willd. Ex Adr. Juss.) Muell.-Arg. is the primary source of natural rubber that is native to the Amazon rainforest. The singular properties of natural rubber make it superior to and competitive with synthetic rubber for use in several applications. Here, we performed RNA sequencing (RNA-seq) of H. brasiliensis bark on the Illumina GAIIx platform, which generated 179,326,804 raw reads. A total of 50,384 contigs that were over 400 bp in size were obtained and subjected to further analyses. A similarity search against the non-redundant (nr) protein database returned 32,018 (63%) positive BLASTx hits. The transcriptome analysis was annotated using the clusters of orthologous groups (COG), gene ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), and Pfam databases. A search for putative molecular markers was performed to identify simple sequence repeats (SSRs) and single nucleotide polymorphisms (SNPs). In total, 17,927 SSRs and 404,114 SNPs were detected. Finally, we selected sequences that were identified as belonging to the mevalonate (MVA) and 2-C-methyl-D-erythritol 4-phosphate (MEP) pathways, which are involved in rubber biosynthesis, to validate the SNP markers. A total of 78 SNPs were validated in 36 genotypes of H. brasiliensis. This new dataset represents a powerful information source for rubber tree bark genes and will be an important tool for the development of microsatellites and SNP markers for use in future genetic analyses such as genetic linkage mapping, quantitative trait loci identification, investigations of linkage disequilibrium and marker-assisted selection. PMID:25048025

  2. SpirPro: A Spirulina proteome database and web-based tools for the analysis of protein-protein interactions at the metabolic level in Spirulina (Arthrospira) platensis C1.

    PubMed

    Senachak, Jittisak; Cheevadhanarak, Supapon; Hongsthong, Apiradee

    2015-07-29

    Spirulina (Arthrospira) platensis is the only cyanobacterium that in addition to being studied at the molecular level and subjected to gene manipulation, can also be mass cultivated in outdoor ponds for commercial use as a food supplement. Thus, encountering environmental changes, including temperature stresses, is common during the mass production of Spirulina. The use of cyanobacteria as an experimental platform, especially for photosynthetic gene manipulation in plants and bacteria, is becoming increasingly important. Understanding the mechanisms and protein-protein interaction networks that underlie low- and high-temperature responses is relevant to Spirulina mass production. To accomplish this goal, high-throughput techniques such as OMICs analyses are used. Thus, large datasets must be collected, managed and subjected to information extraction. Therefore, databases including (i) proteomic analysis and protein-protein interaction (PPI) data and (ii) domain/motif visualization tools are required for potential use in temperature response models for plant chloroplasts and photosynthetic bacteria. A web-based repository was developed including an embedded database, SpirPro, and tools for network visualization. Proteome data were analyzed integrated with protein-protein interactions and/or metabolic pathways from KEGG. The repository provides various information, ranging from raw data (2D-gel images) to associated results, such as data from interaction and/or pathway analyses. This integration allows in silico analyses of protein-protein interactions affected at the metabolic level and, particularly, analyses of interactions between and within the affected metabolic pathways under temperature stresses for comparative proteomic analysis. The developed tool, which is coded in HTML with CSS/JavaScript and depicted in Scalable Vector Graphics (SVG), is designed for interactive analysis and exploration of the constructed network. SpirPro is publicly available on the web

  3. Dietary Supplement Ingredient Database

    MedlinePlus

    ... and US Department of Agriculture Dietary Supplement Ingredient Database Toggle navigation Menu Home About DSID Mission Current ... values can be saved to build a small database or add to an existing database for national, ...

  4. Integrative data mining of high-throughput in vitro screens, in vivo data, and disease information to identify Adverse Outcome Pathway (AOP) signatures:ToxCast high-throughput screening data and Comparative Toxicogenomics Database (CTD) as a case study.

    EPA Science Inventory

    The Adverse Outcome Pathway (AOP) framework provides a systematic way to describe linkages between molecular and cellular processes and organism or population level effects. The current AOP assembly methods however, are inefficient. Our goal is to generate computationally-pr...

  6. Deep sequencing of the Camellia sinensis transcriptome revealed candidate genes for major metabolic pathways of tea-specific compounds

    SciTech Connect

    Shi, CY; Yang, H; Wei, CL; Yu, O; Zhang, ZZ; Sun, J; Wan, XC

    2011-01-01

    Tea is one of the most popular non-alcoholic beverages worldwide. However, the tea plant, Camellia sinensis, is difficult to culture in vitro, to transform, and has a large genome, rendering little genomic information available. Recent advances in large-scale RNA sequencing (RNA-seq) provide a fast, cost-effective, and reliable approach to generate large expression datasets for functional genomic analysis, which is especially suitable for non-model species with un-sequenced genomes. Using high-throughput Illumina RNA-seq, the transcriptome from poly(A)+ RNA of C. sinensis was analyzed at an unprecedented depth (2.59 gigabase pairs). Approximately 34.5 million reads were obtained, trimmed, and assembled into 127,094 unigenes, with an average length of 355 bp and an N50 of 506 bp, which consisted of 788 contig clusters and 126,306 singletons. This number of unigenes was 10-fold higher than existing C. sinensis sequences deposited in GenBank (as of August 2010). Sequence similarity analyses against six public databases (Uniprot, NR and COGs at NCBI, Pfam, InterPro and KEGG) found 55,088 unigenes that could be annotated with gene descriptions, conserved protein domains, or gene ontology terms. Some of the unigenes were assigned to putative metabolic pathways. Targeted searches using these annotations identified the majority of genes associated with several primary metabolic pathways and natural product pathways that are important to tea quality, such as flavonoid, theanine and caffeine biosynthesis pathways. Novel candidate genes of these secondary pathways were discovered. Comparisons with four previously prepared cDNA libraries revealed that this transcriptome dataset has both a high degree of consistency with previous EST data and an approximately 20-fold increase in coverage. Thirteen unigenes related to theanine and flavonoid synthesis were validated. Their expression patterns in different organs of the tea plant were analyzed by RT-PCR and quantitative real
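
The abstract above reports an N50 of 506 bp for the assembled unigenes. As a minimal sketch of how that statistic is usually defined (the length of the sequence at which the running total, taken from longest to shortest, first reaches half of all assembled bases), the following Python function may help. The function name `n50` and the toy lengths are illustrative assumptions and are not taken from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are visited from longest to shortest; the N50 is the length
    of the sequence at which the running total first reaches at least
    half of the summed length of all sequences.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy data (hypothetical lengths in bp, not from the tea transcriptome).
toy_lengths = [100, 200, 300, 400, 500]
print(n50(toy_lengths))  # 500 + 400 = 900 >= 750, so this prints 400
```

Because a few long sequences can dominate the total, N50 is normally read alongside the mean length (355 bp in the abstract) rather than in place of it.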

  7. Inference of miRNA targets using evolutionary conservation and pathway analysis

    PubMed Central

    Gaidatzis, Dimos; van Nimwegen, Erik; Hausser, Jean; Zavolan, Mihaela

    2007-01-01

    Background: MicroRNAs have emerged as important regulatory genes in a variety of cellular processes and, in recent years, hundreds of such genes have been discovered in animals. In contrast, functional annotations are available only for a very small fraction of these miRNAs, and even in these cases only partially. Results: We developed a general Bayesian method for the inference of miRNA target sites, in which, for each miRNA, we explicitly model the evolution of orthologous target sites in a set of related species. Using this method we predict target sites for all known miRNAs in flies, worms, fish, and mammals. By comparing our predictions in fly with a reference set of experimentally tested miRNA-mRNA interactions we show that our general method performs at least as well as the most accurate methods available to date, including ones specifically tailored for target prediction in fly. An important novel feature of our model is that it explicitly infers the phylogenetic distribution of functional target sites, independently for each miRNA. This allows us to infer species-specific and clade-specific miRNA targeting. We also show that, in long human 3' UTRs, miRNA target sites occur preferentially near the start and near the end of the 3' UTR. To characterize miRNA function beyond the predicted lists of targets we further present a method to infer significant associations between the sets of targets predicted for individual miRNAs and specific biochemical pathways, in particular those of the KEGG pathway database. We show that this approach retrieves several known functional miRNA-mRNA associations, and predicts novel functions for known miRNAs in cell growth and in development. Conclusion: We have presented a Bayesian target prediction algorithm without any tunable parameters, that can be applied to sequences from any clade of species. The algorithm automatically infers the phylogenetic distribution of functional sites for each miRNA, and assigns a posterior

  8. Databases: Beyond the Basics.

    ERIC Educational Resources Information Center

    Whittaker, Robert

    This presented paper offers an elementary description of database characteristics and then provides a survey of databases that may be useful to the teacher and researcher in Slavic and East European languages and literatures. The survey focuses on commercial databases that are available, usable, and needed. Individual databases discussed include:…

  10. Reflective Database Access Control

    ERIC Educational Resources Information Center

    Olson, Lars E.

    2009-01-01

    "Reflective Database Access Control" (RDBAC) is a model in which a database privilege is expressed as a database query itself, rather than as a static privilege contained in an access control list. RDBAC aids the management of database access controls by improving the expressiveness of policies. However, such policies introduce new interactions…

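The RDBAC abstract above describes a privilege that is expressed as a database query rather than as a static access control list entry. A minimal sketch of that idea, using Python's standard `sqlite3` module, is shown below. The `employees` table, the `POLICY` query and the `visible_rows` helper are hypothetical illustrations and are not taken from the cited work.

```python
import sqlite3

# In-memory toy database (hypothetical schema and rows).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employees (name TEXT PRIMARY KEY, manager TEXT, salary INTEGER);
    INSERT INTO employees VALUES
        ('alice', NULL, 120),
        ('bob', 'alice', 90),
        ('carol', 'alice', 95);
    """
)

# Hypothetical policy: the privilege is itself a query over the data.
# A user may read their own row and the rows of people they manage.
POLICY = """
    SELECT name, salary
    FROM employees
    WHERE name = :user OR manager = :user
"""


def visible_rows(user):
    """Return the rows the policy query grants to `user`, sorted by name."""
    return sorted(conn.execute(POLICY, {"user": user}).fetchall())


print(visible_rows("alice"))  # all three rows: alice manages bob and carol
print(visible_rows("bob"))    # only bob's own row
```

Because the policy reads the same table it protects, updating a `manager` value changes who can see which rows without any separate access list being edited.
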
  12. More Publications about Databases.

    ERIC Educational Resources Information Center

    Tenopir, Carol

    1983-01-01

    Reviews recent publications in online database literature including three newsletters ("Database Update," "Database Alert," and "Information Hotline"), a directory ("Guide to Online Databases"), and a textbook ("Online Reference and Information Retrieval" by Roger C. Palmer). The new "Guide to Searching ONTAP ABI/INFORM" is noted. (EJS)

  13. BioDB extractor: customized data extraction system for commonly used bioinformatics databases.

    PubMed

    Karbhal, Rajiv; Sawant, Sangeeta; Kulkarni-Kale, Urmila

    2015-01-01

    Diverse types of biological data, primary as well as derived, are available in various formats and are stored in heterogeneous resources. Database-specific as well as integrated search engines are available for carrying out efficient searches of databases. These search engines however, do not support extraction of subsets of data with the same level of granularity that exists in typical database entries. In order to extract fine grained subsets of data, users are required to download complete or partial database entries and write scripts for parsing and extraction. BioDBExtractor (BDE) has been developed to provide 26 customized data extraction utilities for some of the commonly used databases such as ENA (EMBL-Bank), UniprotKB, PDB, and KEGG. BDE eliminates the need for downloading entries and writing scripts. BDE has a simple web interface that enables input of query in the form of accession numbers/ID codes, choice of utilities and selection of fields/subfields of data by the users. BDE thus provides a common data extraction platform for multiple databases and is useful to both, novice and expert users. BDE, however, is not a substitute to basic keyword-based database searches. Desired subsets of data, compiled using BDE can be subsequently used for downstream processing, analyses and knowledge discovery. BDE can be accessed from http://bioinfo.net.in/BioDB/Home.html.
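
The abstract above notes that, without a tool such as BDE, users download database entries and write their own parsing scripts to extract selected fields. The sketch below illustrates what such a script might look like for a KEGG-style flat-file record. It assumes the common layout in which the field keyword occupies the first 12 columns, continuation lines leave those columns blank, and `///` ends the record. The sample entry is abbreviated and for illustration only, and `parse_flat_entry` is a hypothetical helper, not part of BDE or of any KEGG library.

```python
KEY_WIDTH = 12  # assumed width of the keyword column in the flat file


def parse_flat_entry(text, key_width=KEY_WIDTH):
    """Parse one flat-file record into {field name: [value lines]}."""
    fields = {}
    current = None
    for line in text.splitlines():
        if line.startswith("///"):  # record terminator
            break
        if not line.strip():
            continue
        key = line[:key_width].strip()
        value = line[key_width:].strip()
        if key:  # a new field starts on this line
            current = key
            fields.setdefault(current, [])
        if current is not None and value:
            fields[current].append(value)
    return fields


# Abbreviated, illustrative record built with explicit column padding.
SAMPLE = "\n".join(
    [
        "ENTRY".ljust(KEY_WIDTH) + "C00031    Compound",
        "NAME".ljust(KEY_WIDTH) + "D-Glucose;",
        " " * KEY_WIDTH + "Grape sugar;",
        " " * KEY_WIDTH + "Dextrose",
        "FORMULA".ljust(KEY_WIDTH) + "C6H12O6",
        "///",
    ]
)

entry = parse_flat_entry(SAMPLE)
# Keep only the fields of interest, as a field-selection step would.
wanted = {name: entry[name] for name in ("NAME", "FORMULA")}
print(wanted)
```

Selecting fields after parsing keeps the parser generic; the same function can serve any record that follows the assumed column layout.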

  14. Human Mitochondrial Protein Database

    National Institute of Standards and Technology Data Gateway

    SRD 131 Human Mitochondrial Protein Database (Web, free access)   The Human Mitochondrial Protein Database (HMPDb) provides comprehensive data on mitochondrial and human nuclear encoded proteins involved in mitochondrial biogenesis and function. This database consolidates information from SwissProt, LocusLink, Protein Data Bank (PDB), GenBank, Genome Database (GDB), Online Mendelian Inheritance in Man (OMIM), Human Mitochondrial Genome Database (mtDB), MITOMAP, Neuromuscular Disease Center and Human 2-D PAGE Databases. This database is intended as a tool not only to aid in studying the mitochondrion but in studying the associated diseases.

  15. GlycomeDB – integration of open-access carbohydrate structure databases

    PubMed Central

    Ranzinger, René; Herget, Stephan; Wetter, Thomas; von der Lieth, Claus-Wilhelm

    2008-01-01

    Background: Although carbohydrates are the third major class of biological macromolecules, after proteins and DNA, there is neither a comprehensive database for carbohydrate structures nor an established universal structure encoding scheme for computational purposes. Funding for further development of the Complex Carbohydrate Structure Database (CCSD or CarbBank) ceased in 1997, and since then several initiatives have developed independent databases with partially overlapping foci. For each database, different encoding schemes for residues and sequence topology were designed. Therefore, it is virtually impossible to obtain an overview of all deposited structures or to compare the contents of the various databases. Results: We have implemented procedures which download the structures contained in the seven major databases, e.g. GLYCOSCIENCES.de, the Consortium for Functional Glycomics (CFG), the Kyoto Encyclopedia of Genes and Genomes (KEGG) and the Bacterial Carbohydrate Structure Database (BCSDB). We have created a new database called GlycomeDB, containing all structures, their taxonomic annotations and references (IDs) for the original databases. More than 100,000 datasets were imported, resulting in more than 33,000 unique sequences now encoded in GlycomeDB using the universal format GlycoCT. Inconsistencies were found in all public databases, which were discussed and corrected in multiple feedback rounds with the responsible curators. Conclusion: GlycomeDB is a new, publicly available database for carbohydrate sequences with a unified, all-encompassing structure encoding format and NCBI taxonomic referencing. The database is updated weekly and can be downloaded free of charge. The JAVA application GlycoUpdateDB is also available for establishing and updating a local installation of GlycomeDB. With the advent of GlycomeDB, the distributed islands of knowledge in glycomics are now bridged to form a single resource. PMID:18803830

  16. Pathway-based factor analysis of gene expression data produces highly heritable phenotypes that associate with age.

    PubMed

    Anand Brown, Andrew; Ding, Zhihao; Viñuela, Ana; Glass, Dan; Parts, Leopold; Spector, Tim; Winn, John; Durbin, Richard

    2015-03-09

    Statistical factor analysis methods have previously been used to remove noise components from high-dimensional data prior to genetic association mapping and, in a guided fashion, to summarize biologically relevant sources of variation. Here, we show how the derived factors summarizing pathway expression can be used to analyze the relationships between expression, heritability, and aging. We used skin gene expression data from 647 twins from the MuTHER Consortium and applied factor analysis to concisely summarize patterns of gene expression to remove broad confounding influences and to produce concise pathway-level phenotypes. We derived 930 "pathway phenotypes" that summarized patterns of variation across 186 KEGG pathways (five phenotypes per pathway). We identified 69 significant associations of age with phenotype from 57 distinct KEGG pathways at a stringent Bonferroni threshold (P < 5.38×10⁻⁵). These phenotypes are more heritable (h² = 0.32) than gene expression levels. On average, expression levels of 16% of genes within these pathways are associated with age. Several significant pathways relate to metabolizing sugars and fatty acids; others relate to insulin signaling. We have demonstrated that factor analysis methods combined with biological knowledge can produce more reliable phenotypes with less stochastic noise than the individual gene expression levels, which increases our power to discover biologically relevant associations. These phenotypes could also be applied to discover associations with other environmental factors.

  17. Pathway-Based Factor Analysis of Gene Expression Data Produces Highly Heritable Phenotypes That Associate with Age

    PubMed Central

    Anand Brown, Andrew; Ding, Zhihao; Viñuela, Ana; Glass, Dan; Parts, Leopold; Spector, Tim; Winn, John; Durbin, Richard

    2015-01-01

    Statistical factor analysis methods have previously been used to remove noise components from high-dimensional data prior to genetic association mapping and, in a guided fashion, to summarize biologically relevant sources of variation. Here, we show how the derived factors summarizing pathway expression can be used to analyze the relationships between expression, heritability, and aging. We used skin gene expression data from 647 twins from the MuTHER Consortium and applied factor analysis to concisely summarize patterns of gene expression to remove broad confounding influences and to produce concise pathway-level phenotypes. We derived 930 “pathway phenotypes” that summarized patterns of variation across 186 KEGG pathways (five phenotypes per pathway). We identified 69 significant associations of age with phenotype from 57 distinct KEGG pathways at a stringent Bonferroni threshold (P < 5.38×10⁻⁵). These phenotypes are more heritable (h² = 0.32) than gene expression levels. On average, expression levels of 16% of genes within these pathways are associated with age. Several significant pathways relate to metabolizing sugars and fatty acids; others relate to insulin signaling. We have demonstrated that factor analysis methods combined with biological knowledge can produce more reliable phenotypes with less stochastic noise than the individual gene expression levels, which increases our power to discover biologically relevant associations. These phenotypes could also be applied to discover associations with other environmental factors. PMID:25758824
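
The Bonferroni threshold quoted in the abstract above is consistent with dividing a family-wise error rate by the number of pathway phenotypes tested. The short Python check below reproduces that arithmetic. The value `alpha = 0.05` is an assumption (the abstract does not state it), and `bonferroni_threshold` is an illustrative helper name.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# From the abstract: 186 KEGG pathways, five phenotypes per pathway.
n_phenotypes = 186 * 5

# Assumed family-wise error rate; not stated in the abstract.
alpha = 0.05

threshold = bonferroni_threshold(alpha, n_phenotypes)
print(f"{n_phenotypes} tests, per-test threshold = {threshold:.2e}")
# prints: 930 tests, per-test threshold = 5.38e-05
```

Under this assumption the per-test threshold matches the reported value to three significant figures.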

  18. WikiPathways: building research communities on biological pathways.

    PubMed

    Kelder, Thomas; van Iersel, Martijn P; Hanspers, Kristina; Kutmon, Martina; Conklin, Bruce R; Evelo, Chris T; Pico, Alexander R

    2012-01-01

    Here, we describe the development of WikiPathways (http://www.wikipathways.org), a public wiki for pathway curation, since it was first published in 2008. New features are discussed, as well as developments in the community of contributors. New features include a zoomable pathway viewer, support for pathway ontology annotations, the ability to mark pathways as private for a limited time and the availability of stable hyperlinks to pathways and the elements therein. WikiPathways content is freely available in a variety of formats such as the BioPAX standard, and the content is increasingly adopted by external databases and tools, including Wikipedia. A recent development is the use of WikiPathways as a staging ground for centrally curated databases such as Reactome. WikiPathways is seeing steady growth in the number of users, page views and edits for each pathway. To assess whether the community curation experiment can be considered successful, here we analyze the relation between use and contribution, which gives results in line with other wiki projects. The novel use of pathway pages as supplementary material to publications, as well as the addition of tailored content for research domains, is expected to stimulate growth further.

  19. YMDB: the Yeast Metabolome Database

    PubMed Central

    Jewison, Timothy; Knox, Craig; Neveu, Vanessa; Djoumbou, Yannick; Guo, An Chi; Lee, Jacqueline; Liu, Philip; Mandal, Rupasri; Krishnamurthy, Ram; Sinelnikov, Igor; Wilson, Michael; Wishart, David S.

    2012-01-01

    The Yeast Metabolome Database (YMDB, http://www.ymdb.ca) is a richly annotated ‘metabolomic’ database containing detailed information about the metabolome of Saccharomyces cerevisiae. Modeled closely after the Human Metabolome Database, the YMDB contains >2000 metabolites with links to 995 different genes/proteins, including enzymes and transporters. The information in YMDB has been gathered from hundreds of books, journal articles and electronic databases. In addition to its comprehensive literature-derived data, the YMDB also contains an extensive collection of experimental intracellular and extracellular metabolite concentration data compiled from detailed Mass Spectrometry (MS) and Nuclear Magnetic Resonance (NMR) metabolomic analyses performed in our lab. This is further supplemented with thousands of NMR and MS spectra collected on pure, reference yeast metabolites. Each metabolite entry in the YMDB contains an average of 80 separate data fields including comprehensive compound description, names and synonyms, structural information, physico-chemical data, reference NMR and MS spectra, intracellular/extracellular concentrations, growth conditions and substrates, pathway information, enzyme data, gene/protein sequence data, as well as numerous hyperlinks to images, references and other public databases. Extensive searching, relational querying and data browsing tools are also provided that support text, chemical structure, spectral, molecular weight and gene/protein sequence queries. Because of S. cerevisiae's importance as a model organism for biologists and as a biofactory for industry, we believe this kind of database could have considerable appeal not only to metabolomics researchers, but also to yeast biologists, systems biologists, the industrial fermentation industry, as well as the beer, wine and spirit industry. PMID:22064855

  20. Meteor Databases in Astronomy

    NASA Astrophysics Data System (ADS)

    Kolomiyets, Svitlana V.

    2017-06-01

    Meteor science faces specific database problems, such as turning meteor databases into modern research tools. Special institutes and virtual observatories exist for meteor data storage, where the data are online and openly accessible. However, there are also numerous databases without open access, for example three radar databases: the Kharkiv database in Ukraine with 250,000 meteor orbits, the New Zealand database with 500,000 meteor orbits, and the Canadian database with more than 3 million meteor orbits. One reason open access is absent for these databases could be the complexity of copyright compliance. As part of creating a modern, effective research tool for meteor science, we discuss here the case of the Kharkiv meteor database.

  1. Functional diversity and structural disorder in the human ubiquitination pathway.

    PubMed

    Bhowmick, Pallab; Pancsa, Rita; Guharoy, Mainak; Tompa, Peter

    2013-01-01

    The ubiquitin-proteasome system plays a central role in cellular regulation and protein quality control (PQC). The system is built as a pyramid of increasing complexity, with two E1 (ubiquitin activating), few dozen E2 (ubiquitin conjugating) and several hundred E3 (ubiquitin ligase) enzymes. By collecting and analyzing E3 sequences from the KEGG BRITE database and literature, we assembled a coherent dataset of 563 human E3s and analyzed their various physical features. We found an increase in structural disorder of the system with multiple disorder predictors (IUPred - E1: 5.97%, E2: 17.74%, E3: 20.03%). E3s that can bind E2 and substrate simultaneously (single subunit E3, ssE3) have significantly higher disorder (22.98%) than E3s in which E2 binding (multi RING-finger, mRF, 0.62%), scaffolding (6.01%) and substrate binding (adaptor/substrate recognition subunits, 17.33%) functions are separated. In ssE3s, the disorder was localized in the substrate/adaptor binding domains, whereas the E2-binding RING/HECT-domains were structured. To demonstrate the involvement of disorder in E3 function, we applied normal modes and molecular dynamics analyses to show how a disordered and highly flexible linker in human CBL (an E3 that acts as a regulator of several tyrosine kinase-mediated signalling pathways) facilitates long-range conformational changes bringing substrate and E2-binding domains towards each other and thus assisting in ubiquitin transfer. E3s with multiple interaction partners (as evidenced by data in STRING) also possess elevated levels of disorder (hubs, 22.90% vs. non-hubs, 18.36%). Furthermore, a search in PDB uncovered 21 distinct human E3 interactions, in 7 of which the disordered region of E3s undergoes induced folding (or mutual induced folding) in the presence of the partner. 
In conclusion, our data highlights the primary role of structural disorder in the functions of E3 ligases that manifests itself in the substrate/adaptor binding functions as well

  2. Functional Diversity and Structural Disorder in the Human Ubiquitination Pathway

    PubMed Central

    Bhowmick, Pallab; Pancsa, Rita; Guharoy, Mainak; Tompa, Peter

    2013-01-01

    The ubiquitin-proteasome system plays a central role in cellular regulation and protein quality control (PQC). The system is built as a pyramid of increasing complexity, with two E1 (ubiquitin activating), few dozen E2 (ubiquitin conjugating) and several hundred E3 (ubiquitin ligase) enzymes. By collecting and analyzing E3 sequences from the KEGG BRITE database and literature, we assembled a coherent dataset of 563 human E3s and analyzed their various physical features. We found an increase in structural disorder of the system with multiple disorder predictors (IUPred – E1: 5.97%, E2: 17.74%, E3: 20.03%). E3s that can bind E2 and substrate simultaneously (single subunit E3, ssE3) have significantly higher disorder (22.98%) than E3s in which E2 binding (multi RING-finger, mRF, 0.62%), scaffolding (6.01%) and substrate binding (adaptor/substrate recognition subunits, 17.33%) functions are separated. In ssE3s, the disorder was localized in the substrate/adaptor binding domains, whereas the E2-binding RING/HECT-domains were structured. To demonstrate the involvement of disorder in E3 function, we applied normal modes and molecular dynamics analyses to show how a disordered and highly flexible linker in human CBL (an E3 that acts as a regulator of several tyrosine kinase-mediated signalling pathways) facilitates long-range conformational changes bringing substrate and E2-binding domains towards each other and thus assisting in ubiquitin transfer. E3s with multiple interaction partners (as evidenced by data in STRING) also possess elevated levels of disorder (hubs, 22.90% vs. non-hubs, 18.36%). Furthermore, a search in PDB uncovered 21 distinct human E3 interactions, in 7 of which the disordered region of E3s undergoes induced folding (or mutual induced folding) in the presence of the partner. 
In conclusion, our data highlights the primary role of structural disorder in the functions of E3 ligases that manifests itself in the substrate/adaptor binding functions as well

  3. Comprehensive transcriptome analysis identifies pathways with therapeutic potential in locally advanced cervical cancer.

    PubMed

    Campos-Parra, Alma Delia; Padua-Bracho, Alejandra; Pedroza-Torres, Abraham; Figueroa-González, Gabriela; Fernández-Retana, Jorge; Millan-Catalan, Oliver; Peralta-Zaragoza, Oscar; Cantú de León, David; Herrera, Luis A; Pérez-Plasencia, Carlos

    2016-11-01

    The objective of the present study was to provide genomic and transcriptomic information that may improve clinical outcomes for locally advanced cervical cancer (LACC) patients by searching for therapeutic targets or potential biomarkers through the analysis of significantly altered signaling pathways in LACC. Microarray-based transcriptome profiling of 89 tumor samples from women with LACC was performed. Through Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis, significantly over-expressed genes in LACC were identified; these genes were validated by quantitative reverse transcription-polymerase chain reaction in an independent cohort, and the protein expression data were obtained from the Human Protein Atlas. A transcriptome analysis revealed 7530 significantly over-expressed genes in LACC samples. By KEGG analysis, we found 93 dysregulated signaling pathways, including the JAK-STAT, NOTCH and mTOR-autophagy pathways, which were significantly upregulated. We confirmed the overexpression of the relevant genes of each pathway, such as NOTCH1, JAK2, STAM1, SOS1, ADAM17, PSEN1, NCSTN, RPS6, STK11/LKB1 and MLTS8/GBL in LACC compared with normal cervical tissue epithelia. Through comprehensive genomic and transcriptomic analyses, this work provides information regarding signaling pathways with promising therapeutic targets, suggesting novel target therapies to be considered in future clinical trials for LACC patients.

  4. Transcriptome Analysis of Pig In Vivo, In Vitro–Fertilized, and Nuclear Transfer Blastocyst-Stage Embryos Treated with Histone Deacetylase Inhibitors Postfusion and Activation Reveals Changes in the Lysosomal Pathway

    PubMed Central

    Whitworth, Kristin M.; Mao, Jiude; Lee, Kiho; Spollen, William G.; Samuel, Melissa S.; Walters, Eric M.; Spate, Lee D.

    2015-01-01

    Genetically modified pigs are commonly created via somatic cell nuclear transfer (SCNT). Treatment of reconstructed embryos with histone deacetylase inhibitors (HDACi) immediately after activation improves cloning efficiency. The objective of this experiment was to evaluate the transcriptome of SCNT embryos treated with suberoylanilide hydroxamic acid (SAHA), 4-iodo-SAHA (ISAHA), or Scriptaid as compared to untreated SCNT, in vitro–fertilized (IVF), and in vivo (IVV) blastocyst-stage embryos. SAHA (10 μM) had the highest level of blastocyst development at 43.9%, and all treatments except 10 μM ISAHA had the same percentage of blastocyst development as Scriptaid (p<0.05). Two treatments, 1.0 μM ISAHA and 1.0 μM SAHA, had higher mean cell number than No HDACi treatment (p<0.021). Embryo transfers performed with 10 μM SAHA- and 1 μM ISAHA-treated embryos resulted in the birth of healthy piglets. GenBank accession numbers from up- and downregulated transcripts were loaded into the Database for Annotation, Visualization and Integrated Discovery to identify enriched biological themes. HDACi treatment yielded the highest enrichment for transcripts within the Kyoto Encyclopedia of Genes and Genomes (KEGG) Pathway, lysosome. The mean intensity of LysoTracker was lower in IVV embryos compared to IVF and SCNT embryos (p<0.0001). SAHA and ISAHA can successfully be used to create healthy piglets from SCNT. PMID:26731590

  5. Transcriptome Analysis of Pig In Vivo, In Vitro-Fertilized, and Nuclear Transfer Blastocyst-Stage Embryos Treated with Histone Deacetylase Inhibitors Postfusion and Activation Reveals Changes in the Lysosomal Pathway.

    PubMed

    Whitworth, Kristin M; Mao, Jiude; Lee, Kiho; Spollen, William G; Samuel, Melissa S; Walters, Eric M; Spate, Lee D; Prather, Randall S

    2015-08-01

    Genetically modified pigs are commonly created via somatic cell nuclear transfer (SCNT). Treatment of reconstructed embryos with histone deacetylase inhibitors (HDACi) immediately after activation improves cloning efficiency. The objective of this experiment was to evaluate the transcriptome of SCNT embryos treated with suberoylanilide hydroxamic acid (SAHA), 4-iodo-SAHA (ISAHA), or Scriptaid as compared to untreated SCNT, in vitro-fertilized (IVF), and in vivo (IVV) blastocyst-stage embryos. SAHA (10 μM) had the highest level of blastocyst development at 43.9%, and all treatments except 10 μM ISAHA had the same percentage of blastocyst development as Scriptaid (p<0.05). Two treatments, 1.0 μM ISAHA and 1.0 μM SAHA, had higher mean cell number than No HDACi treatment (p<0.021). Embryo transfers performed with 10 μM SAHA- and 1 μM ISAHA-treated embryos resulted in the birth of healthy piglets. GenBank accession numbers from up- and downregulated transcripts were loaded into the Database for Annotation, Visualization and Integrated Discovery to identify enriched biological themes. HDACi treatment yielded the highest enrichment for transcripts within the Kyoto Encyclopedia of Genes and Genomes (KEGG) Pathway, lysosome. The mean intensity of LysoTracker was lower in IVV embryos compared to IVF and SCNT embryos (p<0.0001). SAHA and ISAHA can successfully be used to create healthy piglets from SCNT.

  6. Screening key genes and pathways in glioma based on gene set enrichment analysis and meta-analysis.

    PubMed

    Tang, Yanyan; He, Wenwu; Wei, Yunfei; Qu, Zhanli; Zeng, Jinming; Qin, Chao

    2013-06-01

    Glioma is a highly invasive, rapidly spreading form of brain cancer, and its etiology is largely unknown. A few recent studies have used gene expression microarrays of glioma to identify from several to hundreds of differentially expressed genes. This study was designed to analyze vast amounts of glioma-related microarray data and screen the key genes and pathways related to the development and progression of glioma. We used gene set enrichment analysis (GSEA) and meta-analysis of seven included studies after standardized microarray preprocessing, which increased concordance between these gene datasets. After GSEA, there were 14 mixing pathways including 13 up- and 1 down-regulated pathways. Based on the meta-analysis, 268 significant genes were screened out (P < 0.05); there were 249 genes identified by Kyoto Encyclopedia of Genes and Genomes (KEGG), and 27 KEGG pathways closely related to the set of the imported genes were identified. Finally, six consistent pathways and key genes in these pathways related to glioma were obtained with combined GSEA and meta-analysis. The gene pathways that we identified could provide insight concerning the development of glioma. Further studies are needed to determine the biological function for the positive genes.
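
The abstract above mentions KEGG pathways that are closely related to an imported gene set but does not say which statistic was used. One common way to score such a relationship is a one-sided hypergeometric (over-representation) test, sketched below with Python's standard library. This is an illustration under that assumption, not the authors' method, and the counts are toy values rather than figures from the study.

```python
from math import comb


def enrichment_p_value(population, pathway_size, selected, overlap):
    """One-sided hypergeometric p-value, P(X >= overlap).

    population   -- number of genes in the background set
    pathway_size -- number of background genes annotated to the pathway
    selected     -- number of genes in the list being tested
    overlap      -- number of selected genes that fall in the pathway
    """
    upper = min(pathway_size, selected)
    total = comb(population, selected)
    tail = sum(
        comb(pathway_size, k) * comb(population - pathway_size, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Toy numbers (hypothetical): 20 background genes, 5 in the pathway,
# 4 genes selected, 2 of which are pathway members.
p = enrichment_p_value(population=20, pathway_size=5, selected=4, overlap=2)
print(round(p, 4))
```

In practice one such p-value is computed per pathway, so a multiple-testing correction is normally applied before any pathway is called significant.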

  7. THE ECOTOX DATABASE

    EPA Science Inventory

    The database provides chemical-specific toxicity information for aquatic life, terrestrial plants, and terrestrial wildlife. ECOTOX is a comprehensive ecotoxicology database and is therefore essential for providing and supporting high quality models needed to estimate population...

  9. Physiological Information Database (PID)

    EPA Science Inventory

    EPA has developed a physiological information database (created using Microsoft ACCESS) intended to be used in PBPK modeling. The database contains physiological parameter values for humans from early childhood through senescence as well as similar data for laboratory animal spec...

  11. Household Products Database: Pesticides

    MedlinePlus

    Information is extracted from Consumer Product Information Database ©2001-2016 by DeLima Associates. All rights reserved.

  12. Improved orthologous databases to ease protozoan targets inference.

    PubMed

    Kotowski, Nelson; Jardim, Rodrigo; Dávila, Alberto M R

    2015-09-29

    Homology inference helps in identifying similarities as well as differences among organisms, which provides better insight into how closely related one might be to another. In addition, comparative genomics pipelines are widely adopted tools designed using different bioinformatics applications and algorithms. In this article, we propose a methodology to build improved orthologous databases with the potential to aid in protozoan target identification, one of the many tasks which benefit from comparative genomics tools. Our analyses are based on OrthoSearch, a comparative genomics pipeline originally designed to infer orthologs through protein-profile comparison, supported by an HMM-based, reciprocal-best-hits approach. Our methodology allows OrthoSearch to confront two orthologous databases and to generate an improved new one. The resulting database can later be used to infer potential protozoan targets through a similarity analysis against the human genome. The protein sequences of Cryptosporidium hominis, Entamoeba histolytica and Leishmania infantum genomes were comparatively analyzed against three orthologous databases: (i) EggNOG KOG, (ii) ProtozoaDB and (iii) KEGG Orthology (KO). That allowed us to create two new orthologous databases, "KO + EggNOG KOG" and "KO + EggNOG KOG + ProtozoaDB", with 16,938 and 27,701 orthologous groups, respectively. These new orthologous databases were used for a regular OrthoSearch run. By confronting "KO + EggNOG KOG" and "KO + EggNOG KOG + ProtozoaDB" databases and protozoan species we were able to detect the following total of orthologous groups and coverage (relation between the inferred orthologous groups and the species total number of proteins): Cryptosporidium hominis: 1,821 (11 %) and 3,254 (12 %); Entamoeba histolytica: 2,245 (13 %) and 5,305 (19 %); Leishmania infantum: 2,702 (16 %) and 4,760 (17 %). Using our HMM-based methodology and the largest created orthologous database, it was possible to infer 13
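
    The reciprocal best hits idea named in this abstract can be sketched as follows. This is only an illustration under stated assumptions, not OrthoSearch's implementation: the score tables, the sequence identifiers and the helper names `best_hits` and `reciprocal_best_hits` are hypothetical, and a real pipeline would derive scores from HMM profile comparisons rather than a hand-written dictionary.

```python
def best_hits(scores):
    """Map each query to its highest-scoring target.

    `scores` maps (query, target) pairs to a similarity score
    (hypothetical input; higher means more similar).
    """
    best = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)
    backward = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if backward.get(b) == a)


# Toy scores: a3 prefers b1, but b1 prefers a1, so (a3, b1) is not reciprocal.
a_vs_b = {("a1", "b1"): 90, ("a1", "b2"): 50, ("a2", "b2"): 70, ("a3", "b1"): 60}
b_vs_a = {("b1", "a1"): 88, ("b2", "a2"): 65, ("b1", "a3"): 40}
pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
```

    Requiring the hit in both directions is what filters out one-sided matches such as `a3` in the toy data.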

  13. Aviation Safety Issues Database

    NASA Technical Reports Server (NTRS)

    Morello, Samuel A.; Ricks, Wendell R.

    2009-01-01

    The aviation safety issues database was instrumental in the refinement and substantiation of the National Aviation Safety Strategic Plan (NASSP). The issues database is a comprehensive set of issues from an extremely broad base of aviation functions, personnel, and vehicle categories, both nationally and internationally. Several aviation safety stakeholders such as the Commercial Aviation Safety Team (CAST) have already used the database. This broader interest was the genesis of making the database publicly accessible and writing this report.

  14. Scopus database: a review.

    PubMed

    Burnham, Judy F

    2006-03-08

    The Scopus database provides access to STM journal articles and the references included in those articles, allowing the searcher to search both forward and backward in time. The database can be used for collection development as well as for research. This review provides information on the key points of the database and compares it to Web of Science. Neither database is inclusive, but they complement each other. If a library can afford only one, the choice must be based on institutional needs.

  15. Scopus database: a review

    PubMed Central

    Burnham, Judy F

    2006-01-01

    The Scopus database provides access to STM journal articles and the references included in those articles, allowing the searcher to search both forward and backward in time. The database can be used for collection development as well as for research. This review provides information on the key points of the database and compares it to Web of Science. Neither database is inclusive, but they complement each other. If a library can afford only one, the choice must be based on institutional needs. PMID:16522216

  16. GeConT 2: gene context analysis for orthologous proteins, conserved domains and metabolic pathways.

    PubMed

    Martinez-Guerrero, C E; Ciria, R; Abreu-Goodger, C; Moreno-Hagelsieb, G; Merino, E

    2008-07-01

    The Gene Context Tool (GeConT) allows users to visualize the genomic context of a gene or a group of genes and their orthologous relationships within fully sequenced bacterial genomes. The new version of the server incorporates information from the COG, Pfam and KEGG databases, allowing users to have an integrated graphical representation of the function of genes at multiple levels, their phylogenetic distribution and their genomic context. The sequence of any of the genes can be easily retrieved, as well as the 5' or 3' regulatory regions, greatly facilitating further types of analysis. GeConT 2 is available at: http://bioinfo.ibt.unam.mx/gecont.

  17. Mission and Assets Database

    NASA Technical Reports Server (NTRS)

    Baldwin, John; Zendejas, Silvino; Gutheinz, Sandy; Borden, Chester; Wang, Yeou-Fang

    2009-01-01

    Mission and Assets Database (MADB) Version 1.0 is an SQL database system with a Web user interface to centralize information. The database stores flight project support resource requirements, view periods, antenna information, schedule, and forecast results for use in mid-range and long-term planning of Deep Space Network (DSN) assets.

  18. JICST Factual Database

    NASA Astrophysics Data System (ADS)

    Hayase, Shuichi; Okano, Keiko

    The Japan Information Center of Science and Technology (JICST) started the online service of the JICST Crystal Structure Database (JICST CR) in January 1990. This database provides information on atomic positions in a crystal and related information about the crystal. The database system and the crystal data in JICST CR are outlined in this manuscript.

  19. Development of SRS.php, a Simple Object Access Protocol-based library for data acquisition from integrated biological databases.

    PubMed

    Barbosa-Silva, A; Pafilis, E; Ortega, J M; Schneider, R

    2007-12-11

    Data integration has become an important task for biological database providers. The current model for data exchange among different sources simplifies the manner that distinct information is accessed by users. The evolution of data representation from HTML to XML enabled programs, instead of humans, to interact with biological databases. We present here SRS.php, a PHP library that can interact with the data integration Sequence Retrieval System (SRS). The library has been written using SOAP definitions, and permits the programmatic communication through webservices with the SRS. The interactions are possible by invoking the methods described in WSDL by exchanging XML messages. The current functions available in the library have been built to access specific data stored in any of the 90 different databases (such as UNIPROT, KEGG and GO) using the same query syntax format. The inclusion of the described functions in the source of scripts written in PHP enables them as webservice clients to the SRS server. The functions permit one to query the whole content of any SRS database, to list specific records in these databases, to get specific fields from the records, and to link any record among any pair of linked databases. The case study presented exemplifies the library usage to retrieve information regarding registries of a Plant Defense Mechanisms database. The Plant Defense Mechanisms database is currently being developed, and the proposal of SRS.php library usage is to enable the data acquisition for the further warehousing tasks related to its setup and maintenance.

  20. Exploring metabolic pathway disruption in the subchronic phencyclidine model of schizophrenia with the Generalized Singular Value Decomposition

    PubMed Central

    2011-01-01

    (KEGG) metabolite pathway database) were altered in the PFC of PCP-treated rats. Several significant changes were discovered, notably: 1) neuroactive ligands active at glutamate and GABA receptors are disrupted in the PFC of PCP-treated animals; 2) glutamate dysfunction in these animals is not limited to compromised glutamatergic neurotransmission but also involves the disruption of metabolic pathways linked to glutamate; and 3) a specific series of purine reactions Xanthine ← Hypoxanthine ↔ Inosine ← IMP → adenylosuccinate is also disrupted in the PFC of PCP-treated animals. Conclusions: Network reordering via the GSVD provides a means to discover statistically validated differences in clustering between a pair of networks. In practice this analytical approach, when applied to metabolomic data, allows us to quantify the alterations in metabolic pathways between two experimental groups. With this new computational technique we identified metabolic pathway alterations that are consistent with known results. Furthermore, we discovered disruption in a novel series of purine reactions that may contribute to the PFC dysfunction and cognitive deficits seen in schizophrenia. PMID:21575198

  1. The NCBI Taxonomy database.

    PubMed

    Federhen, Scott

    2012-01-01

    The NCBI Taxonomy database (http://www.ncbi.nlm.nih.gov/taxonomy) is the standard nomenclature and classification repository for the International Nucleotide Sequence Database Collaboration (INSDC), comprising the GenBank, ENA (EMBL) and DDBJ databases. It includes organism names and taxonomic lineages for each of the sequences represented in the INSDC's nucleotide and protein sequence databases. The taxonomy database is manually curated by a small group of scientists at the NCBI who use the current taxonomic literature to maintain a phylogenetic taxonomy for the source organisms represented in the sequence databases. The taxonomy database is a central organizing hub for many of the resources at the NCBI, and provides a means for clustering elements within other domains of NCBI web site, for internal linking between domains of the Entrez system and for linking out to taxon-specific external resources on the web. Our primary purpose is to index the domain of sequences as conveniently as possible for our user community.

  2. IDPredictor: predict database links in biomedical database.

    PubMed

    Mehlhorn, Hendrik; Lange, Matthias; Scholz, Uwe; Schreiber, Falk

    2012-06-26

    Knowledge found in biomedical databases, in particular in Web information systems, is a major bioinformatics resource. In general, this biological knowledge is represented worldwide in a network of databases. These data are spread among thousands of databases, which overlap in content but differ substantially with respect to content detail, interface, formats and data structure. To support a functional annotation of lab data, such as protein sequences, metabolites or DNA sequences, as well as a semi-automated data exploration in information retrieval environments, an integrated view of databases is essential. Search engines have the potential of assisting in data retrieval from these structured sources, but fall short of providing a comprehensive knowledge except out of the interlinked databases. A prerequisite of supporting the concept of an integrated data view is to acquire insights into cross-references among database entities. This issue is hampered by the fact that only a fraction of all possible cross-references are explicitly tagged in the particular biomedical information systems. In this work, we investigate to what extent an automated construction of an integrated data network is possible. We propose a method that predicts and extracts cross-references from multiple life science databases and possible referenced data targets. We study the retrieval quality of our method and report on first, promising results. The method is implemented as the tool IDPredictor, which is published under the DOI 10.5447/IPK/2012/4 and is freely available using the URL: http://dx.doi.org/10.5447/IPK/2012/4.

  3. Pathways with PathWhiz.

    PubMed

    Pon, Allison; Jewison, Timothy; Su, Yilu; Liang, Yongjie; Knox, Craig; Maciejewski, Adam; Wilson, Michael; Wishart, David S

    2015-07-01

    PathWhiz (http://smpdb.ca/pathwhiz) is a web server designed to create colourful, visually pleasing and biologically accurate pathway diagrams that are both machine-readable and interactive. As a web server, PathWhiz is accessible from almost any place and compatible with essentially any operating system. It also houses a public library of pathways and pathway components that can be easily viewed and expanded upon by its users. PathWhiz allows users to readily generate biologically complex pathways by using a specially designed drawing palette to quickly render metabolites (including automated structure generation), proteins (including quaternary structures, covalent modifications and cofactors), nucleic acids, membranes, subcellular structures, cells, tissues and organs. Both small-molecule and protein/gene pathways can be constructed by combining multiple pathway processes such as reactions, interactions, binding events and transport activities. PathWhiz's pathway replication and propagation functions allow for existing pathways to be used to create new pathways or for existing pathways to be automatically propagated across species. PathWhiz pathways can be saved in BioPAX, SBGN-ML and SBML data exchange formats, as well as PNG, PWML, HTML image map or SVG images that can be viewed offline or explored using PathWhiz's interactive viewer. PathWhiz has been used to generate over 700 pathway diagrams for a number of popular databases including HMDB, DrugBank and SMPDB. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  4. A-WINGS: an integrated genome database for Pleurocybella porrigens (Angel's wing oyster mushroom, Sugihiratake).

    PubMed

    Yamamoto, Naoki; Suzuki, Tomohiro; Kobayashi, Masaaki; Dohra, Hideo; Sasaki, Yohei; Hirai, Hirofumi; Yokoyama, Koji; Kawagishi, Hirokazu; Yano, Kentaro

    2014-12-03

    The angel's wing oyster mushroom (Pleurocybella porrigens, Sugihiratake) is a well-known delicacy. However, its potential risk in acute encephalopathy was recently revealed by a food poisoning incident. To disclose the genes underlying the accident and provide mechanistic insight, we seek to develop an information infrastructure containing omics data. In our previous work, we sequenced the genome and transcriptome using next-generation sequencing techniques. The next step in achieving our goal is to develop a web database to facilitate the efficient mining of large-scale omics data and identification of genes specifically expressed in the mushroom. This paper introduces a web database A-WINGS (http://bioinf.mind.meiji.ac.jp/a-wings/) that provides integrated genomic and transcriptomic information for the angel's wing oyster mushroom. The database contains structure and functional annotations of transcripts and gene expressions. Functional annotations contain information on homologous sequences from NCBI nr and UniProt, Gene Ontology, and KEGG Orthology. Digital gene expression profiles were derived from RNA sequencing (RNA-seq) analysis in the fruiting bodies and mycelia. The omics information stored in the database is freely accessible through interactive and graphical interfaces by search functions that include 'GO TREE VIEW' browsing, keyword searches, and BLAST searches. The A-WINGS database will accelerate omics studies on specific aspects of the angel's wing oyster mushroom and the family Tricholomataceae.

  5. An Introduction to Database Structure and Database Machines.

    ERIC Educational Resources Information Center

    Detweiler, Karen

    1984-01-01

    Enumerates principal management objectives of database management systems (data independence, quality, security, multiuser access, central control) and criteria for comparison (response time, size, flexibility, other features). Conventional database management systems, relational databases, and database machines used for backend processing are…

  6. SENTRA, a database of signal transduction proteins.

    SciTech Connect

    D'Souza, M.; Romine, M. F.; Maltsev, N.; Mathematics and Computer Science; PNNL

    2000-01-01

    SENTRA, available via URL http://wit.mcs.anl.gov/WIT2/Sentra/, is a database of proteins associated with microbial signal transduction. The database currently includes the classical two-component signal transduction pathway proteins and methyl-accepting chemotaxis proteins, but will be expanded to also include other classes of signal transduction systems that are modulated by phosphorylation or methylation reactions. Although the majority of database entries are from prokaryotic systems, eukaryotic proteins with bacterial-like signal transduction domains are also included. Currently SENTRA contains signal transduction proteins in 34 complete and almost completely sequenced prokaryotic genomes, as well as sequences from 243 organisms available in public databases (SWISS-PROT and EMBL). The analysis was carried out within the framework of the WIT2 system, which is designed and implemented to support genetic sequence analysis and comparative analysis of sequenced genomes.

  7. The 2016 database issue of Nucleic Acids Research and an updated molecular biology database collection.

    PubMed

    Rigden, Daniel J; Fernández-Suárez, Xosé M; Galperin, Michael Y

    2016-01-04

    The 2016 Database Issue of Nucleic Acids Research starts with overviews of the resources provided by three major bioinformatics centers, the U.S. National Center for Biotechnology Information (NCBI), the European Bioinformatics Institute (EMBL-EBI) and Swiss Institute for Bioinformatics (SIB). Also included are descriptions of 62 new databases and updates on 95 databases that have been previously featured in NAR plus 17 previously described elsewhere. A number of papers in this issue deal with resources on nucleic acids, including various kinds of non-coding RNAs and their interactions, molecular dynamics simulations of nucleic acid structure, and two databases of super-enhancers. The protein database section features important updates on the EBI's Pfam, PDBe and PRIDE databases, as well as a variety of resources on pathways, metabolomics and metabolic modeling. This issue also includes updates on popular metagenomics resources, such as MG-RAST, EBI Metagenomics, and probeBASE, as well as a newly compiled Human Pan-Microbe Communities database. A significant fraction of the new and updated databases are dedicated to the genetic basis of disease, primarily cancer, and various aspects of drug research, including resources for patented drugs, their side effects, withdrawn drugs, and potential drug targets. A further six papers present updated databases of various antimicrobial and anticancer peptides. The entire Database Issue is freely available online on the Nucleic Acids Research website (http://nar.oxfordjournals.org/). The NAR online Molecular Biology Database Collection, http://www.oxfordjournals.org/nar/database/c/, has been updated with the addition of 88 new resources and removal of 23 obsolete websites, which brought the current listing to 1685 databases.

  8. The 2016 database issue of Nucleic Acids Research and an updated molecular biology database collection

    PubMed Central

    Rigden, Daniel J.; Fernández-Suárez, Xosé M.; Galperin, Michael Y.

    2016-01-01

    The 2016 Database Issue of Nucleic Acids Research starts with overviews of the resources provided by three major bioinformatics centers, the U.S. National Center for Biotechnology Information (NCBI), the European Bioinformatics Institute (EMBL-EBI) and Swiss Institute for Bioinformatics (SIB). Also included are descriptions of 62 new databases and updates on 95 databases that have been previously featured in NAR plus 17 previously described elsewhere. A number of papers in this issue deal with resources on nucleic acids, including various kinds of non-coding RNAs and their interactions, molecular dynamics simulations of nucleic acid structure, and two databases of super-enhancers. The protein database section features important updates on the EBI's Pfam, PDBe and PRIDE databases, as well as a variety of resources on pathways, metabolomics and metabolic modeling. This issue also includes updates on popular metagenomics resources, such as MG-RAST, EBI Metagenomics, and probeBASE, as well as a newly compiled Human Pan-Microbe Communities database. A significant fraction of the new and updated databases are dedicated to the genetic basis of disease, primarily cancer, and various aspects of drug research, including resources for patented drugs, their side effects, withdrawn drugs, and potential drug targets. A further six papers present updated databases of various antimicrobial and anticancer peptides. The entire Database Issue is freely available online on the Nucleic Acids Research website (http://nar.oxfordjournals.org/). The NAR online Molecular Biology Database Collection, http://www.oxfordjournals.org/nar/database/c/, has been updated with the addition of 88 new resources and removal of 23 obsolete websites, which brought the current listing to 1685 databases. PMID:26740669

  9. MetaMapp: mapping and visualizing metabolomic data by integrating information from biochemical pathways and chemical and mass spectral similarity

    PubMed Central

    2012-01-01

    Background: Exposure to environmental tobacco smoke (ETS) leads to higher rates of pulmonary diseases and infections in children. To study the biochemical changes that may precede lung diseases, metabolomic effects on fetal and maternal lungs and plasma from rats exposed to ETS were compared to filtered air control animals. Genome-reconstructed metabolic pathways may be used to map and interpret dysregulation in metabolic networks. However, mass spectrometry-based non-targeted metabolomics datasets often comprise many metabolites for which links to enzymatic reactions have not yet been reported. Hence, network visualizations that rely on current biochemical databases are incomplete and also fail to visualize novel, structurally unidentified metabolites. Results: We present a novel approach to integrate biochemical pathway and chemical relationships to map all detected metabolites in network graphs (MetaMapp) using the KEGG reactant pair database, Tanimoto chemical and NIST mass spectral similarity scores. In fetal and maternal lungs, and in maternal blood plasma from pregnant rats exposed to environmental tobacco smoke (ETS), 459 unique metabolites comprising 179 structurally identified compounds were detected by gas chromatography time of flight mass spectrometry (GC-TOF MS) and BinBase data processing. MetaMapp graphs in Cytoscape showed much clearer metabolic modularity and complete content visualization compared to conventional biochemical mapping approaches. Cytoscape visualization of differential statistics results using these graphs showed that overall, fetal lung metabolism was more impaired than lungs and blood metabolism in dams. Fetuses from ETS-exposed dams expressed lower lipid and nucleotide levels and higher amounts of energy metabolism intermediates than control animals, indicating lower biosynthetic rates of metabolites for cell division, structural proteins and lipids that are critical for lung development. Conclusions: MetaMapp graphs efficiently
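
    The Tanimoto chemical similarity score mentioned in this abstract is, for binary fingerprints, the size of the intersection of the "on" bits divided by the size of their union. The sketch below shows that formula and one way it could be used to link metabolites in a graph. It is an assumption-laden illustration, not MetaMapp's code: the fingerprints, the metabolite names, the threshold value and the function names are all hypothetical.

```python
from itertools import combinations


def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)


def similarity_edges(fingerprints, threshold):
    """List (name_a, name_b, score) for every pair scoring at or above threshold."""
    edges = []
    for name_a, name_b in combinations(sorted(fingerprints), 2):
        score = tanimoto(fingerprints[name_a], fingerprints[name_b])
        if score >= threshold:
            edges.append((name_a, name_b, score))
    return edges


# Hypothetical fingerprints: m1 and m2 share two of four distinct bits.
fingerprints = {"m1": {1, 2, 3}, "m2": {2, 3, 4}, "m3": {9}}
edges = similarity_edges(fingerprints, threshold=0.5)
```

    Edges built this way can connect metabolites that have no known enzymatic reaction between them, which is the gap the abstract says pathway-only maps leave open.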

  10. Fragment recruitment on metabolic pathways: comparative metabolic profiling of metagenomes and metatranscriptomes.

    PubMed

    Desai, Dhwani K; Schunck, Harald; Löser, Johannes W; Laroche, Julie

    2013-03-15

    The sheer scale of the metagenomic and metatranscriptomic datasets that are now available warrants the development of automated protocols for organizing, annotating and comparing the samples in terms of their metabolic profiles. We describe a user-friendly Java program FROMP (Fragment Recruitment on Metabolic Pathways) for mapping and visualizing enzyme annotations onto the Kyoto Encyclopedia of Genes and Genomes (KEGG) metabolic pathways or custom-made pathways and comparing the samples in terms of their Pathway Completeness Scores, their relative Activity Scores or enzyme enrichment odds ratios. This program along with our fully configurable Perl-based annotation organization pipeline Meta2Pro (METAbolic PROfiling of META-omic data) offers a quick and accurate standalone solution for metabolic profiling of environmental samples or cultures from different treatments. Apart from pictorial comparisons, FROMP can also generate score matrices for multiple meta-omics samples, which can be used directly by other statistical programs.
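
    An enzyme enrichment odds ratio of the kind named above can be illustrated with the textbook 2x2 formula. The abstract does not say how FROMP computes its ratios, so the function name, its arguments and the counts below are assumptions made only for this sketch.

```python
def odds_ratio(hits_a, total_a, hits_b, total_b):
    """Odds of an enzyme annotation in sample A divided by its odds in sample B.

    hits_*  : annotations assigned to the enzyme in each sample (hypothetical counts)
    total_* : all enzyme annotations in each sample
    """
    misses_a = total_a - hits_a
    misses_b = total_b - hits_b
    denominator = misses_a * hits_b
    if denominator == 0:
        raise ValueError("odds ratio is undefined when a denominator cell is zero")
    return (hits_a * misses_b) / denominator


# 30 of 100 annotations in sample A versus 10 of 100 in sample B.
ratio = odds_ratio(30, 100, 10, 100)
```

    A ratio above 1 indicates the enzyme is relatively enriched in sample A, and a ratio below 1 indicates enrichment in sample B.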

  11. ITS-90 Thermocouple Database

    National Institute of Standards and Technology Data Gateway

    SRD 60 NIST ITS-90 Thermocouple Database (Web, free access)   Web version of Standard Reference Database 60 and NIST Monograph 175. The database gives temperature -- electromotive force (emf) reference functions and tables for the letter-designated thermocouple types B, E, J, K, N, R, S and T. These reference functions have been adopted as standards by the American Society for Testing and Materials (ASTM) and the International Electrotechnical Commission (IEC).

  12. IPSec Database Query Acceleration

    NASA Astrophysics Data System (ADS)

    Ferrante, Alberto; Chandra, Satish; Piuri, Vincenzo

    IPSec is a suite of protocols that adds security to communications at the IP level. Protocols within IPSec make extensive use of two databases, namely the Security Policy Database (SPD) and the Security Association Database (SAD). The ability to query the SPD quickly is fundamental as this operation needs to be done for each incoming or outgoing IP packet, even if no IPSec processing needs to be applied on it. This may easily result in millions of queries per second in gigabit networks.
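
    The "millions of queries per second" figure can be checked with a short calculation, since one SPD lookup is needed per packet. The sketch assumes Ethernet framing, which the abstract does not specify: a minimum frame of 64 bytes plus 20 bytes of per-frame overhead on the wire (8-byte preamble and 12-byte inter-frame gap). The function name is hypothetical.

```python
def packets_per_second(link_bits_per_second, frame_bytes, overhead_bytes=20):
    """Upper bound on frames per second in one direction of a link.

    overhead_bytes is the assumed Ethernet preamble (8) plus inter-frame gap (12).
    """
    bits_on_wire = (frame_bytes + overhead_bytes) * 8
    return link_bits_per_second / bits_on_wire


# One SPD query per packet, so the packet rate bounds the query rate.
min_frame_rate_1g = packets_per_second(1e9, 64)     # about 1.49 million per second
full_frame_rate_1g = packets_per_second(1e9, 1518)  # about 81 thousand per second
min_frame_rate_10g = packets_per_second(10e9, 64)   # about 14.9 million per second
```

    Under these assumptions a single gigabit link carrying minimum-size frames already requires roughly 1.5 million lookups per second in each direction.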

  13. 2010 Worldwide Gasification Database

    DOE Data Explorer

    The 2010 Worldwide Gasification Database describes the current world gasification industry and identifies near-term planned capacity additions. The database lists gasification projects and includes information (e.g., plant location, number and type of gasifiers, syngas capacity, feedstock, and products). The database reveals that the worldwide gasification capacity has continued to grow for the past several decades and is now at 70,817 megawatts thermal (MWth) of syngas output at 144 operating plants with a total of 412 gasifiers.

  14. Veterans Administration Databases

    Cancer.gov

    The Veterans Administration Information Resource Center provides database and informatics experts, customer service, expert advice, information products, and web technology to VA researchers and others.

  15. Databases for LDEF results

    NASA Technical Reports Server (NTRS)

    Bohnhoff-Hlavacek, Gail

    1992-01-01

    One of the objectives of the team supporting the LDEF Systems and Materials Special Investigative Groups is to develop databases of experimental findings. These databases identify the hardware flown, summarize results and conclusions, and provide a system for acknowledging investigators, tracing sources of data, and future design suggestions. To date, databases covering the optical experiments, and thermal control materials (chromic acid anodized aluminum, silverized Teflon blankets, and paints) have been developed at Boeing. We used the FileMaker Pro software, the database manager for the Macintosh computer produced by the Claris Corporation. It is a flat, text-retrievable database that provides access to the data via an intuitive user interface, without tedious programming. Though this software is available only for the Macintosh computer at this time, copies of the databases can be saved to a format that is readable on a personal computer as well. Further, the data can be exported to more powerful relational databases. This paper describes the capabilities and use of the LDEF databases and how to get copies of the database for your own research.

  16. Mugshot Identification Database (MID)

    National Institute of Standards and Technology Data Gateway

    NIST Mugshot Identification Database (MID) (PC database for purchase)   NIST Special Database 18 is being distributed for use in development and testing of automated mugshot identification systems. The database consists of three CD-ROMs, containing a total of 3248 images of variable size using lossless compression. A newer version of the compression/decompression software on the CD-ROM can be found at the website http://www.nist.gov/itl/iad/ig/nigos.cfm as part of the NBIS package.

  17. HIV Sequence Databases

    PubMed Central

    Kuiken, Carla; Korber, Bette; Shafer, Robert W.

    2008-01-01

    Two important databases are often used in HIV genetic research, the HIV Sequence Database in Los Alamos, which collects all sequences and focuses on annotation and data analysis, and the HIV RT/Protease Sequence Database in Stanford, which collects sequences associated with the development of viral resistance against anti-retroviral drugs and focuses on analysis of those sequences. The types of data and services these two databases offer, the tools they provide, and the way they are set up and operated are described in detail. PMID:12875108

  18. De novo assembly and functional annotation of Myrciaria dubia fruit transcriptome reveals multiple metabolic pathways for L-ascorbic acid biosynthesis.

    PubMed

    Castro, Juan C; Maddox, J Dylan; Cobos, Marianela; Requena, David; Zimic, Mirko; Bombarely, Aureliano; Imán, Sixto A; Cerdeira, Luis A; Medina, Andersson E

    2015-11-24

    Myrciaria dubia is an Amazonian fruit shrub that produces numerous bioactive phytochemicals, but is best known for its high L-ascorbic acid (AsA) content in fruits. Pronounced variation in AsA content has been observed both within and among individuals, but the genetic factors responsible for this variation are largely unknown. The goals of this research, therefore, were to assemble, characterize, and annotate the fruit transcriptome of M. dubia in order to reconstruct metabolic pathways and determine if multiple pathways contribute to AsA biosynthesis. In total 24,551,882 high-quality sequence reads were de novo assembled into 70,048 unigenes (mean length = 1150 bp, N50 = 1775 bp). Assembled sequences were annotated using BLASTX against public databases such as TAIR, GR-protein, FB, MGI, RGD, ZFIN, SGN, WB, TIGR_CMR, and JCVI-CMR with 75.2 % of unigenes having annotations. Of the three core GO annotation categories, biological processes comprised 53.6 % of the total assigned annotations, whereas cellular components and molecular functions comprised 23.3 and 23.1 %, respectively. Based on the KEGG pathway assignment of the functionally annotated transcripts, five metabolic pathways for AsA biosynthesis were identified: animal-like pathway, myo-inositol pathway, L-gulose pathway, D-mannose/L-galactose pathway, and uronic acid pathway. All transcripts coding enzymes involved in the ascorbate-glutathione cycle were also identified. Finally, we used the assembly to identify 6314 genic microsatellites and 23,481 high quality SNPs. This study describes the first next-generation sequencing effort and transcriptome annotation of a non-model Amazonian plant that is relevant for AsA production and other bioactive phytochemicals. Genes encoding key enzymes were successfully identified and metabolic pathways involved in biosynthesis of AsA, anthocyanins, and other metabolic pathways have been reconstructed. The identification of these genes and pathways is in agreement with

  19. Potential hippocampal genes and pathways involved in Alzheimer's disease: a bioinformatic analysis.

    PubMed

    Zhang, L; Guo, X Q; Chu, J F; Zhang, X; Yan, Z R; Li, Y Z

    2015-06-29

    Alzheimer's disease (AD) is a neurodegenerative disorder and the most common cause of dementia in elderly people. Numerous studies have focused on the dysregulated genes in AD, but the pathogenesis is still unknown. In this study, we explored critical hippocampal genes and pathways that might potentially be involved in the pathogenesis of AD. Four transcriptome datasets for the hippocampus of patients with AD were downloaded from ArrayExpress, and the gene signature was identified by integrated analysis of multiple transcriptomes using novel genome-wide relative significance and genome-wide global significance models. A protein-protein interaction network was constructed, and five clusters were selected. The biological functions and pathways were identified by Gene Ontology and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis. A total of 6994 genes were screened, and the top 300 genes were subjected to further analysis. Four significant KEGG pathways were identified, including oxidative phosphorylation and Parkinson's disease, Huntington's disease, and Alzheimer's disease pathways. The hub network of cluster 1 with the highest average rank value was defined. The genes (NDUFB3, NDUFA9, NDUFV1, NDUFV2, NDUFS3, NDUFA10, COX7B, and UQCR1) were considered critical with high degree in cluster 1 as well as being shared by the four significant pathways. The oxidative phosphorylation process was also involved in the other three pathways and is considered to be relevant to energy-related AD pathology in the hippocampus. This research provides a perspective from which to explore critical genes and pathways for potential AD therapies.
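
    The abstract above does not say which statistical test was used for the KEGG pathway enrichment step. A common choice for this kind of analysis is a one-sided hypergeometric test, and the following is only a minimal sketch under that assumption. The background of 6994 screened genes and the 300 selected genes come from the abstract; the pathway size, the overlap, and the function name `enrichment_p_value` are hypothetical.

```python
from math import comb


def enrichment_p_value(overlap, background, pathway_size, selected):
    """Upper-tail hypergeometric probability P(X >= overlap).

    X counts how many of the `selected` genes fall in a pathway of
    `pathway_size` genes when drawing without replacement from
    `background` genes.
    """
    upper = min(pathway_size, selected)
    if not 0 <= overlap <= upper:
        raise ValueError("overlap must lie between 0 and min(pathway_size, selected)")
    total = comb(background, selected)
    # math.comb returns 0 when the second argument exceeds the first,
    # so impossible draws contribute nothing to the sum.
    favourable = sum(
        comb(pathway_size, i) * comb(background - pathway_size, selected - i)
        for i in range(overlap, upper + 1)
    )
    return favourable / total


# 6994 and 300 are taken from the abstract; 120 and 15 are hypothetical.
p = enrichment_p_value(overlap=15, background=6994, pathway_size=120, selected=300)
print(f"p = {p:.3g}")
```

    In practice a p-value like this would be computed per pathway and then corrected for multiple testing, which the sketch leaves out.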

  20. FOAM (Functional Ontology Assignments for Metagenomes): A Hidden Markov Model (HMM) database with environmental focus

    SciTech Connect

    Prestat, Emmanuel; David, Maude M.; Hultman, Jenni; Taş, Neslihan; Lamendella, Regina; Dvornik, Jill; Mackelprang, Rachel; Myrold, David D.; Jumpponen, Ari; Tringe, Susannah G.; Holman, Elizabeth; Mavromatis, Konstantinos; Jansson, Janet K.

    2014-09-26

    A new functional gene database, FOAM (Functional Ontology Assignments for Metagenomes), was developed to screen environmental metagenomic sequence datasets. FOAM provides a new functional ontology dedicated to classifying gene functions relevant to environmental microorganisms based on Hidden Markov Models (HMMs). Sets of aligned protein sequences (i.e. ‘profiles’) were tailored to a large group of target KEGG Orthologs (KOs) from which HMMs were trained. The alignments were checked and curated to make them specific to the targeted KO. Within this process, sequence profiles were enriched with the most abundant sequences available to maximize the yield of accurate classifier models. An associated functional ontology was built to describe the functional groups and hierarchy. FOAM allows the user to select the target search space before HMM-based comparison steps and to easily organize the results into different functional categories and subcategories. FOAM is publicly available at http://portal.nersc.gov/project/m1317/FOAM/.

  1. Deriving pathway maps from automated text analysis using a grammar-based approach.

    PubMed

    Olsson, Björn; Gawronska, Barbara; Erlendsson, Björn

    2006-04-01

    We demonstrate how automated text analysis can be used to support the large-scale analysis of metabolic and regulatory pathways by deriving pathway maps from textual descriptions found in the scientific literature. The main assumption is that correct syntactic analysis combined with domain-specific heuristics provides a good basis for relation extraction. Our method uses an algorithm that searches through the syntactic trees produced by a parser based on a Referent Grammar formalism, identifies relations mentioned in the sentence, and classifies them with respect to their semantic class and epistemic status (facts, counterfactuals, hypotheses). The semantic categories used in the classification are based on the relation set used in KEGG (Kyoto Encyclopedia of Genes and Genomes), so that pathway maps using KEGG notation can be automatically generated. We present the current version of the relation extraction algorithm and an evaluation based on a corpus of abstracts obtained from PubMed. The results indicate that the method is able to combine a reasonable coverage with high accuracy. We found that 61% of all sentences were parsed, and 97% of the parse trees were judged to be correct. The extraction algorithm was tested on a sample of 300 parse trees and was found to produce correct extractions in 90.5% of the cases.

  2. Dictionary as Database.

    ERIC Educational Resources Information Center

    Painter, Derrick

    1996-01-01

    Discussion of dictionaries as databases focuses on the digitizing of The Oxford English dictionary (OED) and the use of Standard Generalized Mark-Up Language (SGML). Topics include the creation of a consortium to digitize the OED, document structure, relational databases, text forms, sequence, and discourse. (LRW)

  3. BioImaging Database

    SciTech Connect

    David Nix, Lisa Simirenko

    2006-10-25

    The BioImaging Database (BID) is a relational database developed to store the data and meta-data for the 3D gene expression in early Drosophila embryo development on a cellular level. The schema was written to be used with the MySQL DBMS but with minor modifications can be used on any SQL compliant relational DBMS.

  4. Ionic Liquids Database- (ILThermo)

    National Institute of Standards and Technology Data Gateway

    SRD 147 Ionic Liquids Database- (ILThermo) (Web, free access)   IUPAC Ionic Liquids Database, ILThermo, is a free web research tool that allows users worldwide to access an up-to-date data collection from the publications on experimental investigations of thermodynamic and transport properties of ionic liquids as well as binary and ternary mixtures containing ionic liquids.

  5. Structural Ceramics Database

    National Institute of Standards and Technology Data Gateway

    SRD 30 NIST Structural Ceramics Database (Web, free access)   The NIST Structural Ceramics Database (WebSCD) provides evaluated materials property data for a wide range of advanced ceramics known variously as structural ceramics, engineering ceramics, and fine ceramics.

  6. Atomic Spectra Database (ASD)

    National Institute of Standards and Technology Data Gateway

    SRD 78 NIST Atomic Spectra Database (ASD) (Web, free access)   This database provides access and search capability for NIST critically evaluated data on atomic energy levels, wavelengths, and transition probabilities that are reasonably up-to-date. The NIST Atomic Spectroscopy Data Center has carried out these critical compilations.

  7. A Quality System Database

    NASA Technical Reports Server (NTRS)

    Snell, William H.; Turner, Anne M.; Gifford, Luther; Stites, William

    2010-01-01

    A quality system database (QSD), and software to administer the database, were developed to support recording of administrative nonconformance activities that involve requirements for documentation of corrective and/or preventive actions, which can include ISO 9000 internal quality audits and customer complaints.

  8. Consumer Product Category Database

    EPA Pesticide Factsheets

    The Chemical and Product Categories database (CPCat) catalogs the use of over 40,000 chemicals and their presence in different consumer products. The chemical use information is compiled from multiple sources while product information is gathered from publicly available Material Safety Data Sheets (MSDS). EPA researchers are evaluating the possibility of expanding the database with additional product and use information.

  9. Reach Address Database (RAD)

    EPA Pesticide Factsheets

    The Reach Address Database (RAD) stores the reach address of each Water Program feature that has been linked to the underlying surface water features (streams, lakes, etc) in the National Hydrology Database (NHD). (A reach is the portion of a stream between two points of confluence. A confluence is the location where two or more streams flow together.)

  10. Online Database Searching Workbook.

    ERIC Educational Resources Information Center

    Littlejohn, Alice C.; Parker, Joan M.

    Designed primarily for use by first-time searchers, this workbook provides an overview of online searching. Following a brief introduction which defines online searching, databases, and database producers, five steps in carrying out a successful search are described: (1) identifying the main concepts of the search statement; (2) selecting a…

  11. Database Searching by Managers.

    ERIC Educational Resources Information Center

    Arnold, Stephen E.

    Managers and executives need the easy and quick access to business and management information that online databases can provide, but many have difficulty articulating their search needs to an intermediary. One possible solution would be to encourage managers and their immediate support staff members to search textual databases directly as they now…

  12. Knowledge Discovery in Databases.

    ERIC Educational Resources Information Center

    Norton, M. Jay

    1999-01-01

    Knowledge discovery in databases (KDD) revolves around the investigation and creation of knowledge, processes, algorithms, and mechanisms for retrieving knowledge from data collections. The article is an introductory overview of KDD. The rationale and environment of its development and applications are discussed. Issues related to database design…

  13. Morchella MLST database

    USDA-ARS's Scientific Manuscript database

    Welcome to the Morchella MLST database. This dedicated database was set up at the CBS-KNAW Biodiversity Center by Vincent Robert in February 2012, using BioloMICS software (Robert et al., 2011), to facilitate DNA sequence-based identifications of Morchella species via the Internet. The current datab...

  14. HIV Structural Database

    National Institute of Standards and Technology Data Gateway

    SRD 102 HIV Structural Database (Web, free access)   The HIV Protease Structural Database is an archive of experimentally determined 3-D structures of Human Immunodeficiency Virus 1 (HIV-1), Human Immunodeficiency Virus 2 (HIV-2) and Simian Immunodeficiency Virus (SIV) Proteases and their complexes with inhibitors or products of substrate cleavage.

  15. Biological Macromolecule Crystallization Database

    National Institute of Standards and Technology Data Gateway

    SRD 21 Biological Macromolecule Crystallization Database (Web, free access)   The Biological Macromolecule Crystallization Database and NASA Archive for Protein Crystal Growth Data (BMCD) contains the conditions reported for the crystallization of proteins and nucleic acids used in X-ray structure determinations and archives the results of microgravity macromolecule crystallization studies.

  16. Online Database Searching Workbook.

    ERIC Educational Resources Information Center

    Littlejohn, Alice C.; Parker, Joan M.

    Designed primarily for use by first-time searchers, this workbook provides an overview of online searching. Following a brief introduction which defines online searching, databases, and database producers, five steps in carrying out a successful search are described: (1) identifying the main concepts of the search statement; (2) selecting a…

  17. First Look: TRADEMARKSCAN Database.

    ERIC Educational Resources Information Center

    Fernald, Anne Conway; Davidson, Alan B.

    1984-01-01

    Describes database produced by Thomson and Thomson and available on Dialog which contains over 700,000 records representing all active federal trademark registrations and applications for registrations filed in United States Patent and Trademark Office. A typical record, special features, database applications, learning to use TRADEMARKSCAN, and…

  18. Knowledge Discovery in Databases.

    ERIC Educational Resources Information Center

    Norton, M. Jay

    1999-01-01

    Knowledge discovery in databases (KDD) revolves around the investigation and creation of knowledge, processes, algorithms, and mechanisms for retrieving knowledge from data collections. The article is an introductory overview of KDD. The rationale and environment of its development and applications are discussed. Issues related to database design…

  19. Database Reviews: Legal Information.

    ERIC Educational Resources Information Center

    Seiser, Virginia

    Detailed reviews of two legal information databases--"Laborlaw I" and "Legal Resource Index"--are presented in this paper. Each database review begins with a bibliographic entry listing the title; producer; vendor; cost per hour contact time; offline print cost per citation; time period covered; frequency of updates; and size…

  20. Model organism databases in behavioral neuroscience.

    PubMed

    Shimoyama, Mary; Smith, Jennifer R; Hayman, G Thomas; Petri, Victoria; Nigam, Rajni

    2012-01-01

    Model Organism Databases (MODs) are an important informatics tool for researchers. They provide comprehensive organism specific genetic, genomic, and phenotype datasets. MODs ensure accurate data identification and integrity and provide official nomenclature for genes, Quantitative Trait Loci, and strains. Most importantly, the MODs provide professionally curated data drawn from the literature for function, phenotype and disease associations, and pathway involvement. These data, along with nomenclature and data identity, are incorporated into larger scale genomic databases and research publications. MODs also offer a number of software tools that allow researchers to access, display, and analyze data from reports to genome browsers. Copyright © 2012 Elsevier Inc. All rights reserved.

  1. The BioPAX community standard for pathway

    SciTech Connect

    Syed, Mustafa H

    2010-01-01

    Biological Pathway Exchange (BioPAX) is a standard language to represent biological pathways at the molecular and cellular level and to facilitate the exchange of pathway data. The rapid growth of the volume of pathway data has spurred the development of databases and computational tools to aid interpretation; however, use of these data is hampered by the current fragmentation of pathway information across many databases with incompatible formats. BioPAX, which was created through a community process, solves this problem by making pathway data substantially easier to collect, index, interpret and share. BioPAX can represent metabolic and signaling pathways, molecular and genetic interactions and gene regulation networks. Using BioPAX, millions of interactions, organized into thousands of pathways, from many organisms are available from a growing number of databases. This large amount of pathway data in a computable form will support visualization, analysis and biological discovery.

  2. The BioPAX community standard for pathway data sharing.

    PubMed

    Demir, Emek; Cary, Michael P; Paley, Suzanne; Fukuda, Ken; Lemer, Christian; Vastrik, Imre; Wu, Guanming; D'Eustachio, Peter; Schaefer, Carl; Luciano, Joanne; Schacherer, Frank; Martinez-Flores, Irma; Hu, Zhenjun; Jimenez-Jacinto, Veronica; Joshi-Tope, Geeta; Kandasamy, Kumaran; Lopez-Fuentes, Alejandra C; Mi, Huaiyu; Pichler, Elgar; Rodchenkov, Igor; Splendiani, Andrea; Tkachev, Sasha; Zucker, Jeremy; Gopinath, Gopal; Rajasimha, Harsha; Ramakrishnan, Ranjani; Shah, Imran; Syed, Mustafa; Anwar, Nadia; Babur, Ozgün; Blinov, Michael; Brauner, Erik; Corwin, Dan; Donaldson, Sylva; Gibbons, Frank; Goldberg, Robert; Hornbeck, Peter; Luna, Augustin; Murray-Rust, Peter; Neumann, Eric; Ruebenacker, Oliver; Reubenacker, Oliver; Samwald, Matthias; van Iersel, Martijn; Wimalaratne, Sarala; Allen, Keith; Braun, Burk; Whirl-Carrillo, Michelle; Cheung, Kei-Hoi; Dahlquist, Kam; Finney, Andrew; Gillespie, Marc; Glass, Elizabeth; Gong, Li; Haw, Robin; Honig, Michael; Hubaut, Olivier; Kane, David; Krupa, Shiva; Kutmon, Martina; Leonard, Julie; Marks, Debbie; Merberg, David; Petri, Victoria; Pico, Alex; Ravenscroft, Dean; Ren, Liya; Shah, Nigam; Sunshine, Margot; Tang, Rebecca; Whaley, Ryan; Letovksy, Stan; Buetow, Kenneth H; Rzhetsky, Andrey; Schachter, Vincent; Sobral, Bruno S; Dogrusoz, Ugur; McWeeney, Shannon; Aladjem, Mirit; Birney, Ewan; Collado-Vides, Julio; Goto, Susumu; Hucka, Michael; Le Novère, Nicolas; Maltsev, Natalia; Pandey, Akhilesh; Thomas, Paul; Wingender, Edgar; Karp, Peter D; Sander, Chris; Bader, Gary D

    2010-09-01

    Biological Pathway Exchange (BioPAX) is a standard language to represent biological pathways at the molecular and cellular level and to facilitate the exchange of pathway data. The rapid growth of the volume of pathway data has spurred the development of databases and computational tools to aid interpretation; however, use of these data is hampered by the current fragmentation of pathway information across many databases with incompatible formats. BioPAX, which was created through a community process, solves this problem by making pathway data substantially easier to collect, index, interpret and share. BioPAX can represent metabolic and signaling pathways, molecular and genetic interactions and gene regulation networks. Using BioPAX, millions of interactions, organized into thousands of pathways, from many organisms are available from a growing number of databases. This large amount of pathway data in a computable form will support visualization, analysis and biological discovery.

  3. Cascadia Tsunami Deposit Database

    USGS Publications Warehouse

    Peters, Robert; Jaffe, Bruce; Gelfenbaum, Guy; Peterson, Curt

    2003-01-01

    The Cascadia Tsunami Deposit Database contains data on the location and sedimentological properties of tsunami deposits found along the Cascadia margin. Data have been compiled from 52 studies, documenting 59 sites from northern California to Vancouver Island, British Columbia that contain known or potential tsunami deposits. Bibliographical references are provided for all sites included in the database. Cascadia tsunami deposits are usually seen as anomalous sand layers in coastal marsh or lake sediments. The studies cited in the database use numerous criteria based on sedimentary characteristics to distinguish tsunami deposits from sand layers deposited by other processes, such as river flooding and storm surges. Several studies cited in the database contain evidence for more than one tsunami at a site. Data categories include age, thickness, layering, grainsize, and other sedimentological characteristics of Cascadia tsunami deposits. The database documents the variability observed in tsunami deposits found along the Cascadia margin.

  4. An extended bioreaction database that significantly improves reconstruction and analysis of genome-scale metabolic networks.

    PubMed

    Stelzer, Michael; Sun, Jibin; Kamphans, Tom; Fekete, Sándor P; Zeng, An-Ping

    2011-11-01

    The bioreaction database established by Ma and Zeng (Bioinformatics, 2003, 19, 270-277) for in silico reconstruction of genome-scale metabolic networks has been widely used. Based on more recent information in the reference databases KEGG LIGAND and Brenda, we upgrade the bioreaction database in this work by almost doubling the number of reactions from 3565 to 6851. Over 70% of the reactions have been manually updated/revised in terms of reversibility, reactant pairs, currency metabolites and error correction. For the first time, 41 spontaneous sugar mutarotation reactions are introduced into the biochemical database. The upgrade significantly improves the reconstruction of genome scale metabolic networks. Many gaps or missing biochemical links can be recovered, as exemplified with three model organisms Homo sapiens, Aspergillus niger, and Escherichia coli. The topological parameters of the constructed networks were also largely affected; however, the overall network structure remains scale-free. Furthermore, we consider the problem of computing biologically feasible shortest paths in reconstructed metabolic networks. We show that these paths are hard to compute and present solutions to find such paths in networks of small and medium size.
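
    To illustrate why currency metabolites matter for path finding in a metabolite graph, here is a minimal sketch of a breadth-first search that can skip a set of currency metabolites. It is not the method of the paper above, which reports that biologically feasible shortest paths are hard to compute; it only shows the simpler filtering idea. The function name `shortest_path`, the toy network, and its edges are hypothetical and do not represent real reaction data.

```python
from collections import deque


def shortest_path(graph, source, target, currency=frozenset()):
    """Breadth-first shortest path that never visits a currency metabolite.

    `graph` maps each metabolite to the metabolites reachable from it in
    one reaction step. Returns the path as a list, or None if no path exists.
    """
    if source in currency or target in currency:
        return None
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk the predecessor links back to the source.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in graph.get(node, ()):
            if neighbour not in previous and neighbour not in currency:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


# Hypothetical toy network: ATP links many otherwise unrelated metabolites,
# so it creates a short but biologically meaningless connection.
toy = {
    "glucose": ["G6P", "ATP"],
    "G6P": ["F6P"],
    "F6P": ["pyruvate"],
    "ATP": ["ADP", "pyruvate"],
}

print(shortest_path(toy, "glucose", "pyruvate"))
print(shortest_path(toy, "glucose", "pyruvate", currency={"ATP", "ADP"}))
```

    Without the filter, the search takes the shortcut through ATP; with ATP and ADP excluded, it follows the longer chain of carbon-carrying intermediates.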

  5. Hazard Analysis Database Report

    SciTech Connect

    GRAMS, W.H.

    2000-12-28

    The Hazard Analysis Database was developed in conjunction with the hazard analysis activities conducted in accordance with DOE-STD-3009-94, Preparation Guide for U.S. Department of Energy Nonreactor Nuclear Facility Safety Analysis Reports, for HNF-SD-WM-SAR-067, Tank Farms Final Safety Analysis Report (FSAR). The FSAR is part of the approved Authorization Basis (AB) for the River Protection Project (RPP). This document describes, identifies, and defines the contents and structure of the Tank Farms FSAR Hazard Analysis Database and documents the configuration control changes made to the database. The Hazard Analysis Database contains the collection of information generated during the initial hazard evaluations and the subsequent hazard and accident analysis activities. The Hazard Analysis Database supports the preparation of Chapters 3, 4, and 5 of the Tank Farms FSAR and the Unreviewed Safety Question (USQ) process and consists of two major, interrelated data sets: (1) Hazard Analysis Database: Data from the results of the hazard evaluations, and (2) Hazard Topography Database: Data from the system familiarization and hazard identification.

  6. National Database of Geriatrics

    PubMed Central

    Kannegaard, Pia Nimann; Vinding, Kirsten L; Hare-Bruun, Helle

    2016-01-01

    Aim of database: The aim of the National Database of Geriatrics is to monitor the quality of interdisciplinary diagnostics and treatment of patients admitted to a geriatric hospital unit. Study population: The database population consists of patients who were admitted to a geriatric hospital unit. Geriatric patients cannot be defined by specific diagnoses. A geriatric patient is typically a frail multimorbid elderly patient with decreasing functional ability and social challenges. The database includes 14–15,000 admissions per year, and the database completeness has been stable at 90% during the past 5 years. Main variables: An important part of the geriatric approach is the interdisciplinary collaboration. Indicators, therefore, reflect the combined efforts directed toward the geriatric patient. The indicators include Barthel index, body mass index, de Morton Mobility Index, Chair Stand, percentage of discharges with a rehabilitation plan, and the part of cases where an interdisciplinary conference has taken place. Data are recorded by doctors, nurses, and therapists in a database and linked to the Danish National Patient Register. Descriptive data: Descriptive patient-related data include information about home, mobility aid, need of fall and/or cognitive diagnosing, and categorization of cause (general geriatric, orthogeriatric, or neurogeriatric). Conclusion: The National Database of Geriatrics covers ∼90% of geriatric admissions in Danish hospitals and provides valuable information about a large and increasing patient population in the health care system. PMID:27822120

  7. ResPlan Database

    NASA Technical Reports Server (NTRS)

    Zellers, Michael L.

    2003-01-01

    The main project I was involved in was new application development for the existing CISO Database (ResPlan). This database application was developed in Microsoft Access. Initial meetings with Greg Follen, Linda McMillen, Griselle LaFontaine and others identified a few key weaknesses with the existing database. The central weakness was that, while the database correctly modeled the structure of Programs, Projects and Tasks, it did not capture any dynamic status information once the data was entered, and as such was of limited usefulness. After the initial meetings my goals were identified as follows: (1) enhance the ResPlan Database to include qualitative and quantitative status information about the Programs, Projects and Tasks; (2) train staff members about the ResPlan database from both the user perspective and the developer perspective; (3) give consideration to a Web Interface for reporting. Initially, the thought was that there would not be adequate time to actually develop the Web Interface, but Greg wanted it understood that this was an eventual goal and as such should be a consideration throughout the development process.

  8. The NCBI Taxonomy database

    PubMed Central

    Federhen, Scott

    2012-01-01

    The NCBI Taxonomy database (http://www.ncbi.nlm.nih.gov/taxonomy) is the standard nomenclature and classification repository for the International Nucleotide Sequence Database Collaboration (INSDC), comprising the GenBank, ENA (EMBL) and DDBJ databases. It includes organism names and taxonomic lineages for each of the sequences represented in the INSDC’s nucleotide and protein sequence databases. The taxonomy database is manually curated by a small group of scientists at the NCBI who use the current taxonomic literature to maintain a phylogenetic taxonomy for the source organisms represented in the sequence databases. The taxonomy database is a central organizing hub for many of the resources at the NCBI, and provides a means for clustering elements within other domains of NCBI web site, for internal linking between domains of the Entrez system and for linking out to taxon-specific external resources on the web. Our primary purpose is to index the domain of sequences as conveniently as possible for our user community. PMID:22139910
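
    The taxonomic lineages mentioned above can be pictured as a parent-pointer tree: each node stores a name and a link to its parent, and a lineage is recovered by walking up to the root. The following is a minimal sketch of that idea only. The node ids are made up, most intermediate ranks are omitted, and the convention that the root points to itself is an assumption of this sketch; none of it is the database's actual schema or API.

```python
# Hypothetical toy table: node id -> (name, parent id).
# Ids are invented and most intermediate ranks are left out.
NODES = {
    1: ("root", 1),  # assumption of this sketch: the root is its own parent
    2: ("Eukaryota", 1),
    3: ("Mammalia", 2),
    4: ("Homo", 3),
    5: ("Homo sapiens", 4),
}


def lineage(node_id, nodes):
    """Return the names from the root down to `node_id`."""
    names = []
    seen = set()
    while True:
        if node_id in seen:
            raise ValueError(f"cycle detected at node {node_id}")
        seen.add(node_id)
        name, parent = nodes[node_id]
        names.append(name)
        if parent == node_id:  # reached the root
            break
        node_id = parent
    return names[::-1]


print(" > ".join(lineage(5, NODES)))
```

    Storing only the parent link keeps each record small, and clustering records under a shared ancestor reduces to comparing lineages.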

  9. DIP: The Database of Interacting Proteins

    DOE Data Explorer

    The DIP Database catalogs experimentally determined interactions between proteins. It combines information from a variety of sources to create a single, consistent set of protein-protein interactions. By interaction, the DIP Database creators mean that two amino acid chains were experimentally identified to bind to each other. The database lists such pairs to aid those studying a particular protein-protein interaction but also those investigating entire regulatory and signaling pathways as well as those studying the organisation and complexity of the protein interaction network at the cellular level. The data stored within the DIP database were curated both manually by expert curators and automatically, using computational approaches that utilize the knowledge about the protein-protein interaction networks extracted from the most reliable, core subset of the DIP data. It is a relational database that can be searched by protein, sequence, motif, article information, and pathBLAST. The website also serves as an access point to a number of projects related to DIP, such as LiveDIP, The Database of Ligand-Receptor Partners (DLRP) and JDIP. Users have free and open access to DIP after login. [Taken from the DIP Guide and the DIP website] (Specialized Interface) (Registration Required)

  10. Functional pathway mapping analysis for hypoxia-inducible factors

    PubMed Central

    2011-01-01

    Background: Hypoxia-inducible factors (HIFs) are transcription factors that play a crucial role in response to hypoxic stress in living organisms. The HIF pathway is activated by changes in cellular oxygen levels and has significant impacts on the regulation of gene expression patterns in cancer cells. Identifying functional conservation across species and discovering conserved regulatory motifs can facilitate the selection of reference species for empirical tests. This paper describes a cross-species functional pathway mapping strategy based on evidence of homologous relationships that employs matrix-based searching techniques for identifying transcription factor-binding sites on all retrieved HIF target genes. Results: HIF-related orthologous and paralogous genes were mapped onto the conserved pathways to indicate functional conservation across species. Quantitatively measured HIF pathways are depicted in order to illustrate the extent of functional conservation. The results show that in spite of the evolutionary process of speciation, distantly related species may exhibit functional conservation owing to conservative pathways. The novel terms OrthRate and ParaRate are proposed to quantitatively indicate the flexibility of a homologous pathway and reveal the alternative regulation of functional genes. Conclusion: The developed functional pathway mapping strategy provides a bioinformatics approach for constructing biological pathways by highlighting the homologous relationships between various model species. The mapped HIF pathways were quantitatively illustrated and evaluated by statistically analyzing their conserved transcription factor-binding elements. Keywords: hypoxia-inducible factor (HIF), hypoxia-response element (HRE), transcription factor (TF), transcription factor binding site (TFBS), KEGG (Kyoto Encyclopedia of Genes and Genomes), cross-species comparison, orthology, paralogy, functional pathway. PMID:21689478

  11. Functional pathway mapping analysis for hypoxia-inducible factors.

    PubMed

    Chuang, Chia-Sheng; Pai, Tun-Wen; Hu, Chin-Hua; Tzou, Wen-Shyong; Dah-Tsyr Chang, Margaret; Chang, Hao-Teng; Chen, Chih-Chia

    2011-06-20

    Hypoxia-inducible factors (HIFs) are transcription factors that play a crucial role in response to hypoxic stress in living organisms. The HIF pathway is activated by changes in cellular oxygen levels and has significant impacts on the regulation of gene expression patterns in cancer cells. Identifying functional conservation across species and discovering conserved regulatory motifs can facilitate the selection of reference species for empirical tests. This paper describes a cross-species functional pathway mapping strategy based on evidence of homologous relationships that employs matrix-based searching techniques for identifying transcription factor-binding sites on all retrieved HIF target genes. HIF-related orthologous and paralogous genes were mapped onto the conserved pathways to indicate functional conservation across species. Quantitatively measured HIF pathways are depicted in order to illustrate the extent of functional conservation. The results show that in spite of the evolutionary process of speciation, distantly related species may exhibit functional conservation owing to conservative pathways. The novel terms OrthRate and ParaRate are proposed to quantitatively indicate the flexibility of a homologous pathway and reveal the alternative regulation of functional genes. The developed functional pathway mapping strategy provides a bioinformatics approach for constructing biological pathways by highlighting the homologous relationships between various model species. The mapped HIF pathways were quantitatively illustrated and evaluated by statistically analyzing their conserved transcription factor-binding elements. Keywords: hypoxia-inducible factor (HIF), hypoxia-response element (HRE), transcription factor (TF), transcription factor binding site (TFBS), KEGG (Kyoto Encyclopedia of Genes and Genomes), cross-species comparison, orthology, paralogy, functional pathway.

  12. Database for propagation models

    NASA Technical Reports Server (NTRS)

    Kantak, Anil V.

    1991-01-01

    A propagation researcher or a systems engineer who intends to use the results of a propagation experiment is generally faced with various database tasks such as the selection of the computer software, the hardware, and the writing of the programs to pass the data through the models of interest. This task is repeated every time a new experiment is conducted or the same experiment is carried out at a different location generating different data. Thus the users of this data have to spend a considerable portion of their time learning how to implement the computer hardware and the software towards the desired end. This situation may be facilitated considerably if an easily accessible propagation database is created that has all the accepted (standardized) propagation phenomena models approved by the propagation research community. Also, the handling of data will become easier for the user. Such a database construction can only stimulate the growth of the propagation research if it is available to all the researchers, so that the results of the experiment conducted by one researcher can be examined independently by another, without different hardware and software being used. The database may be made flexible so that the researchers need not be confined only to the contents of the database. Another way in which the database may help the researchers is by the fact that they will not have to document the software and hardware tools used in their research since the propagation research community will know the database already. The following sections show a possible database construction, as well as properties of the database for the propagation research.

  13. Glycoproteomic and glycomic databases.

    PubMed

    Baycin Hizal, Deniz; Wolozny, Daniel; Colao, Joseph; Jacobson, Elena; Tian, Yuan; Krag, Sharon S; Betenbaugh, Michael J; Zhang, Hui

    2014-01-01

    Protein glycosylation serves critical roles in the cellular and biological processes of many organisms. Aberrant glycosylation has been associated with many illnesses such as hereditary and chronic diseases like cancer, cardiovascular diseases, neurological disorders, and immunological disorders. Emerging mass spectrometry (MS) technologies that enable the high-throughput identification of glycoproteins and glycans have accelerated the analysis and made possible the creation of dynamic and expanding databases. Although glycosylation-related databases have been established by many laboratories and institutions, they are not yet widely known in the community. Our study reviews 15 different publicly available databases and identifies their key elements so that users can identify the most applicable platform for their analytical needs. These databases include biological information on the experimentally identified glycans and glycopeptides from various cells and organisms such as human, rat, mouse, fly and zebrafish. The features of these databases (7 for glycoproteomic data, 6 for glycomic data, and 2 for glycan-binding proteins) are summarized, including the enrichment techniques that are used for glycoproteome and glycan identification. Furthermore, databases such as Unipep, GlycoFly, and GlycoFish, recently established by our group, are introduced. The unique features of each database, such as the analytical methods used and the bioinformatics tools available, are summarized. This information will be a valuable resource for the glycobiology community as it presents the analytical methods and glycosylation-related databases together in one compendium. It will also represent a step towards the desired long-term goal of integrating the different databases of glycosylation in order to characterize and categorize glycoproteins and glycans better for biomedical research.

  14. Hybrid Terrain Database

    NASA Technical Reports Server (NTRS)

    Arthur, Trey

    2006-01-01

    A prototype hybrid terrain database is being developed in conjunction with other databases and with hardware and software that constitute subsystems of aerospace cockpit display systems (known in the art as synthetic vision systems) that generate images to increase pilots' situation awareness and eliminate poor visibility as a cause of aviation accidents. The basic idea is to provide a clear view of the world around an aircraft by displaying computer-generated imagery derived from an onboard database of terrain, obstacle, and airport information.

  15. Databases for materials selection

    SciTech Connect

    1996-06-01

    The Cambridge Materials Selector (CMS2.0) materials database was developed by the Engineering Dept. at Cambridge University in the United Kingdom. This database makes it possible to select a material for a specific application from essentially all classes of materials. Genera, Predict, and Socrates software programs from CLI International, Houston, Texas, automate materials selection and corrosion problem-solving tasks. They are said to significantly reduce the time necessary to select a suitable material and/or to assess a corrosion problem and reach cost-effective solutions. This article describes both databases and tells how to use them.

  16. JICST Factual Database

    NASA Astrophysics Data System (ADS)

    Suzuki, Kazuaki; Shimura, Kazuki; Monma, Yoshio; Sakamoto, Masao; Morishita, Hiroshi; Kanazawa, Kenji

    The Japan Information Center of Science and Technology (JICST) started the on-line service of the JICST/NRIM Materials Strength Database for Engineering Steels and Alloys (JICST ME) in March 1990. This database was developed through joint research between JICST and the National Research Institute for Metals (NRIM). It provides material strength data (creep, fatigue, etc.) for engineering steels and alloys. Users can search and display the data on-line, analyze the retrieved data statistically, and plot the results on a graphic display. The database system and the data in JICST ME are described.

  17. Phase Equilibria Diagrams Database

    National Institute of Standards and Technology Data Gateway

    SRD 31 NIST/ACerS Phase Equilibria Diagrams Database (PC database for purchase)   The Phase Equilibria Diagrams Database contains commentaries and more than 21,000 diagrams for non-organic systems, including those published in all 21 hard-copy volumes produced as part of the ACerS-NIST Phase Equilibria Diagrams Program (formerly titled Phase Diagrams for Ceramists): Volumes I through XIV (blue books); Annuals 91, 92, 93; High Tc Superconductors I & II; Zirconium & Zirconia Systems; and Electronic Ceramics I. Materials covered include oxides as well as non-oxide systems such as chalcogenides and pnictides, phosphates, salt systems, and mixed systems of these classes.

  18. Working with Existing Databases

    PubMed Central

    Murphy, Melissa; Alavi, Karim; Maykel, Justin

    2013-01-01

    Outcomes research has established itself as an integral part of surgical research as physicians and hospitals are increasingly required to demonstrate attainment of performance markers and surgical safety indicators. Large-volume clinical and administrative databases are used to study regional practice pattern variations, health care disparities, and resource utilization. Understanding the unique strengths and limitations of these large databases is critical to performing quality surgical outcomes research. In the current work, we review the currently available large-volume databases including selection processes, modes of analyses, data application, and limitations. PMID:24436641

  19. Plant Genome Duplication Database.

    PubMed

    Lee, Tae-Ho; Kim, Junah; Robertson, Jon S; Paterson, Andrew H

    2017-01-01

    Genome duplication, widespread in flowering plants, is a driving force in evolution. Genome alignments between/within genomes facilitate identification of homologous regions and individual genes to investigate evolutionary consequences of genome duplication. PGDD (the Plant Genome Duplication Database), a public web service database, provides intra- or interplant genome alignment information. At present, PGDD contains information for 47 plants whose genome sequences have been released. Here, we describe methods for identification and estimation of dates of genome duplication and speciation by functions of PGDD. The database is freely available at http://chibba.agtec.uga.edu/duplication/.

  20. DDTRP: Database of Drug Targets for Resistant Pathogens

    PubMed Central

    Sundaramurthi, Jagadish Chandrabose; Ramanandan, Prabhakaran; Brindha, Sridharan; Subhasree, Chelladurai Ramarathnam; Prasad, Abhimanyu; Kumaraswami, Vasanthapuram; Hanna, Luke Elizabeth

    2011-01-01

    Emergence of drug resistance is a major threat to public health. Many pathogens have developed resistance to most of the existing antibiotics, and multidrug-resistant and extensively drug resistant strains are extremely difficult to treat. This has resulted in an urgent need for novel drugs. We describe a database called ‘Database of Drug Targets for Resistant Pathogens’ (DDTRP). The database contains information on drugs with reported resistance, their respective targets, metabolic pathways involving these targets, and a list of potential alternate targets for seven pathogens. The database can be accessed freely at http://bmi.icmr.org.in/DDTRP. PMID:21938213

  1. Large-scale annotation of small-molecule libraries using public databases.

    PubMed

    Zhou, Yingyao; Zhou, Bin; Chen, Kaisheng; Yan, S Frank; King, Frederick J; Jiang, Shumei; Winzeler, Elizabeth A

    2007-01-01

    While many large publicly accessible databases provide excellent annotation for biological macromolecules, the same is not true for small chemical compounds. Commercial data sources also fail to provide an annotation interface for large numbers of compounds and tend to be too costly to be widely available to biomedical researchers. Therefore, using annotation information for the selection of lead compounds from a modern-day high-throughput screening (HTS) campaign presently occurs only on a very limited scale. The recent rapid expansion of the NIH PubChem database provides an opportunity to link existing biological databases with compound catalogs and provides relevant information that potentially could improve the information garnered from large-scale screening efforts. Using the 2.5 million compound collection at the Genomics Institute of the Novartis Research Foundation (GNF) as a model, we determined that approximately 4% of the library contained compounds with potential annotation in such databases as PubChem and the World Drug Index (WDI) as well as related databases such as the Kyoto Encyclopedia of Genes and Genomes (KEGG) and ChemIDplus. Furthermore, the exact structure match analysis showed 32% of GNF compounds can be linked to third-party databases via PubChem. We also showed that annotations such as MeSH (medical subject headings) terms can be applied to in-house HTS databases in identifying signature biological inhibition profiles of interest as well as expediting the assay validation process. The automated annotation of thousands of screening hits in batch is becoming feasible and has the potential to play an essential role in the hit-to-lead decision-making process.

  2. TREATABILITY DATABASE DESCRIPTION

    EPA Science Inventory

    The Drinking Water Treatability Database (TDB) presents referenced information on the control of contaminants in drinking water. It allows drinking water utilities, first responders to spills or emergencies, treatment process designers, research organizations, academics, regulato...

  3. Enhancing medical database security.

    PubMed

    Pangalos, G; Khair, M; Bozios, L

    1994-08-01

    A methodology for the enhancement of database security in a hospital environment is presented in this paper which is based on both the discretionary and the mandatory database security policies. In this way the advantages of both approaches are combined to enhance medical database security. An appropriate classification of the different types of users according to their different needs and roles and a User Role Definition Hierarchy has been used. The experience obtained from the experimental implementation of the proposed methodology in a major general hospital is briefly discussed. The implementation has shown that the combined discretionary and mandatory security enforcement effectively limits the unauthorized access to the medical database, without severely restricting the capabilities of the system.
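
    The combination of discretionary and mandatory policies described above can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration (the role names, record categories, sensitivity levels, and the `can_read` helper); the abstract does not give the hospital's actual roles or rules. Access is granted only when both checks pass.

```python
from dataclasses import dataclass

# Hypothetical sensitivity levels, lowest to highest (not taken from the paper).
LEVELS = {"public": 0, "confidential": 1, "secret": 2}

# Hypothetical discretionary grants: role -> record categories it may read.
ROLE_GRANTS = {
    "physician": {"diagnosis", "lab_result", "billing"},
    "nurse": {"lab_result"},
    "clerk": {"billing"},
}


@dataclass(frozen=True)
class User:
    name: str
    role: str
    clearance: str  # key of LEVELS


@dataclass(frozen=True)
class Record:
    category: str
    sensitivity: str  # key of LEVELS


def can_read(user: User, record: Record) -> bool:
    """Allow a read only if the discretionary AND the mandatory check both pass."""
    # Discretionary check: the user's role was granted this record category.
    dac_ok = record.category in ROLE_GRANTS.get(user.role, set())
    # Mandatory check: the user's clearance dominates the record's sensitivity.
    mac_ok = LEVELS[user.clearance] >= LEVELS[record.sensitivity]
    return dac_ok and mac_ok


if __name__ == "__main__":
    nurse = User("n1", "nurse", "confidential")
    print(can_read(nurse, Record("lab_result", "confidential")))
    print(can_read(nurse, Record("lab_result", "secret")))
```

    In this sketch a role grant alone is not enough: a nurse who may read lab results is still refused a lab result labelled above the nurse's clearance, which is how the mandatory layer narrows what the discretionary layer permits.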

  4. Chemical Kinetics Database

    National Institute of Standards and Technology Data Gateway

    SRD 17 NIST Chemical Kinetics Database (Web, free access)   The NIST Chemical Kinetics Database includes essentially all reported kinetics results for thermal gas-phase chemical reactions. The database is designed to be searched for kinetics data based on the specific reactants involved, for reactions resulting in specified products, for all the reactions of a particular species, or for various combinations of these. In addition, the bibliography can be searched by author name or combination of names. The database contains in excess of 38,000 separate reaction records for over 11,700 distinct reactant pairs. These data have been abstracted from over 12,000 papers with literature coverage through early 2000.

  5. THE CTEPP DATABASE

    EPA Science Inventory

    The CTEPP (Children's Total Exposure to Persistent Pesticides and Other Persistent Organic Pollutants) database contains a wealth of data on children's aggregate exposures to pollutants in their everyday surroundings. Chemical analysis data for the environmental media and ques...

  6. ARTI Refrigerant Database

    SciTech Connect

    Calm, J.M.

    1994-05-27

    The Refrigerant Database consolidates and facilitates access to information to assist industry in developing equipment using alternative refrigerants. The underlying purpose is to accelerate phase out of chemical compounds of environmental concern.

  7. Uranium Location Database Compilation

    EPA Pesticide Factsheets

    EPA has compiled mine location information from federal, state, and Tribal agencies into a single database as part of its investigation into the potential environmental hazards of wastes from abandoned uranium mines in the western United States.

  8. Livestock Anaerobic Digester Database

    EPA Pesticide Factsheets

    The Anaerobic Digester Database provides basic information about anaerobic digesters on livestock farms in the United States, organized in Excel spreadsheets. It includes projects that are under construction, operating, or shut down.

  9. ARTI Refrigerant Database

    SciTech Connect

    Calm, J.M.

    1995-06-01

    The Refrigerant Database consolidates and facilitates access to information to assist industry in developing equipment using alternative refrigerants. The underlying purpose is to accelerate phase out of chemical compounds of environmental concern.

  10. ARTI Refrigerant Database

    SciTech Connect

    Calm, J.M.

    1995-02-01

    The Refrigerant Database consolidates and facilitates access to information to assist industry in developing equipment using alternative refrigerants. The underlying purpose is to accelerate phase-out of chemical compounds of environmental concern.

  11. THE CTEPP DATABASE

    EPA Science Inventory

    The CTEPP (Children's Total Exposure to Persistent Pesticides and Other Persistent Organic Pollutants) database contains a wealth of data on children's aggregate exposures to pollutants in their everyday surroundings. Chemical analysis data for the environmental media and ques...

  12. Household Products Database

    MedlinePlus

  13. Hawaii bibliographic database

    USGS Publications Warehouse

    Wright, T.L.; Takahashi, T.J.

    1998-01-01

    The Hawaii bibliographic database has been created to contain all of the literature, from 1779 to the present, pertinent to the volcanological history of the Hawaiian-Emperor volcanic chain. References are entered in a PC- and Macintosh-compatible EndNote Plus bibliographic database with keywords and abstracts or (if no abstract) with annotations as to content. Keywords emphasize location, discipline, process, identification of new chemical data or age determinations, and type of publication. The database is updated approximately three times a year and is available to upload from an ftp site. The bibliography contained 8460 references at the time this paper was submitted for publication. Use of the database greatly enhances the power and completeness of library searches for anyone interested in Hawaiian volcanism.

  14. Hawaii bibliographic database

    NASA Astrophysics Data System (ADS)

    Wright, Thomas L.; Takahashi, Taeko Jane

    The Hawaii bibliographic database has been created to contain all of the literature, from 1779 to the present, pertinent to the volcanological history of the Hawaiian-Emperor volcanic chain. References are entered in a PC- and Macintosh-compatible EndNote Plus bibliographic database with keywords and abstracts or (if no abstract) with annotations as to content. Keywords emphasize location, discipline, process, identification of new chemical data or age determinations, and type of publication. The database is updated approximately three times a year and is available to upload from an ftp site. The bibliography contained 8460 references at the time this paper was submitted for publication. Use of the database greatly enhances the power and completeness of library searches for anyone interested in Hawaiian volcanism.

  15. Nuclear Science References Database

    SciTech Connect

    Pritychenko, B.; Běták, E.; Singh, B.; Totans, J.

    2014-06-15

    The Nuclear Science References (NSR) database together with its associated Web interface, is the world's only comprehensive source of easily accessible low- and intermediate-energy nuclear physics bibliographic information for more than 210,000 articles since the beginning of nuclear science. The weekly-updated NSR database provides essential support for nuclear data evaluation, compilation and research activities. The principles of the database and Web application development and maintenance are described. Examples of nuclear structure, reaction and decay applications are specifically included. The complete NSR database is freely available at the websites of the National Nuclear Data Center (http://www.nndc.bnl.gov/nsr) and the International Atomic Energy Agency (http://www-nds.iaea.org/nsr)

  16. Bioinformatics Analysis Reveals MicroRNAs Regulating Biological Pathways in Exercise-Induced Cardiac Physiological Hypertrophy.

    PubMed

    Xu, Jiahong; Liu, Yang; Xie, Yuan; Zhao, Cuimei; Wang, Hongbao

    2017-01-01

    Exercise-induced physiological cardiac hypertrophy is generally considered to be a type of adaptive change after exercise training and is beneficial for cardiovascular diseases. This study aims at investigating exercise-regulated microRNAs (miRNAs) and their potential biological pathways. Here, we collected 23 miRNAs from 8 published studies. MirPath v.3 from the DIANA tools website was used to execute the analysis, and TargetScan was used to predict the target genes. Kyoto Encyclopedia of Genes and Genomes (KEGG) and Gene Ontology (GO) analyses were performed to identify potential pathways and functional annotations associated with exercise-induced physiological cardiac hypertrophy. Various miRNA targets and molecular pathways, such as Fatty acid elongation, Arrhythmogenic right ventricular cardiomyopathy (ARVC), and ECM-receptor interaction, were identified. This study could prompt the understanding of the regulatory mechanisms underlying exercise-induced physiological cardiac hypertrophy.

  17. Genotype Correlation Analysis Reveals Pathway-Based Functional Disequilibrium and Potential Epistasis in the Human Interactome

    PubMed Central

    Bush, William S.; Haines, Jonathan L.

    2016-01-01

    Epistasis is thought to be a pervasive part of complex phenotypes due to the dynamics and complexity of biological systems, and a further understanding of epistasis in the context of biological pathways may provide insight into the etiology of complex disease. In this study, we use genotype data from the International HapMap Project to characterize the functional dependencies between alleles in the human interactome as defined by KEGG pathways. We performed chi-square tests to identify non-independence between functionally-related SNP pairs within parental Caucasian and Yoruba samples. We further refine this list by testing for skewed transmission of pseudo-haplotypes to offspring using a haplotype-based TDT test. From these analyses, we identify pathways enriched for functional disequilibrium, and a set of 863 SNP pairs (representing 453 gene pairs) showing consistent non-independence and transmission distortion. These results represent gene pairs with strong evidence of epistasis within the context of a biological function.
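
    The chi-square test of non-independence mentioned above can be sketched for a single SNP pair as follows. This is a standard-library illustration, not the authors' pipeline: the 0/1/2 genotype coding, the function names, and the toy data are assumptions, and the closed-form p-value used here is valid only for an even number of degrees of freedom (a full 3 x 3 genotype table has 4).

```python
import math
from collections import Counter


def chi_square_independence(pairs, levels=(0, 1, 2)):
    """Pearson chi-square statistic for independence of two genotype codes.

    pairs: iterable of (genotype_a, genotype_b), each coded 0, 1 or 2
    (an assumed coding). Returns (statistic, degrees_of_freedom).
    """
    counts = Counter(pairs)
    row = {a: sum(counts[(a, b)] for b in levels) for a in levels}
    col = {b: sum(counts[(a, b)] for a in levels) for b in levels}
    n = sum(row.values())
    # Drop genotype classes that never occur so expected counts stay non-zero.
    rows = [a for a in levels if row[a] > 0]
    cols = [b for b in levels if col[b] > 0]
    stat = 0.0
    for a in rows:
        for b in cols:
            expected = row[a] * col[b] / n
            stat += (counts[(a, b)] - expected) ** 2 / expected
    return stat, (len(rows) - 1) * (len(cols) - 1)


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("closed form needs a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


if __name__ == "__main__":
    # Toy data: genotypes at the two SNPs always agree (strong dependence).
    toy = [(0, 0)] * 30 + [(1, 1)] * 30 + [(2, 2)] * 30
    stat, df = chi_square_independence(toy)
    print(stat, df, chi2_sf_even_df(stat, df))
```

    A small p-value for a pair flags it as a candidate for non-independence; in a screen over many SNP pairs, some correction for multiple testing would also be needed, which this sketch leaves out.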

  18. Bioinformatics Analysis Reveals MicroRNAs Regulating Biological Pathways in Exercise-Induced Cardiac Physiological Hypertrophy

    PubMed Central

    Xu, Jiahong; Liu, Yang; Xie, Yuan

    2017-01-01

    Exercise-induced physiological cardiac hypertrophy is generally considered to be a type of adaptive change after exercise training and is beneficial for cardiovascular diseases. This study aims at investigating exercise-regulated microRNAs (miRNAs) and their potential biological pathways. Here, we collected 23 miRNAs from 8 published studies. MirPath v.3 from the DIANA tools website was used to execute the analysis, and TargetScan was used to predict the target genes. Kyoto Encyclopedia of Genes and Genomes (KEGG) and Gene Ontology (GO) analyses were performed to identify potential pathways and functional annotations associated with exercise-induced physiological cardiac hypertrophy. Various miRNA targets and molecular pathways, such as Fatty acid elongation, Arrhythmogenic right ventricular cardiomyopathy (ARVC), and ECM-receptor interaction, were identified. This study could prompt the understanding of the regulatory mechanisms underlying exercise-induced physiological cardiac hypertrophy. PMID:28286759

  19. Numeric Databases in the Sciences.

    ERIC Educational Resources Information Center

    Meschel, S. V.

    1984-01-01

    Provides exploration into types of numeric databases available (also known as source databases, nonbibliographic databases, data-files, data-banks, fact banks); examines differences and similarities between bibliographic and numeric databases; identifies disciplines that utilize numeric databases; and surveys representative examples in the…

  20. The ChEMBL database in 2017.

    PubMed

    Gaulton, Anna; Hersey, Anne; Nowotka, Michał; Bento, A Patrícia; Chambers, Jon; Mendez, David; Mutowo, Prudence; Atkinson, Francis; Bellis, Louisa J; Cibrián-Uhalte, Elena; Davies, Mark; Dedman, Nathan; Karlsson, Anneli; Magariños, María Paula; Overington, John P; Papadatos, George; Smit, Ines; Leach, Andrew R

    2017-01-04

    ChEMBL is an open large-scale bioactivity database (https://www.ebi.ac.uk/chembl), previously described in the 2012 and 2014 Nucleic Acids Research Database Issues. Since then, alongside the continued extraction of data from the medicinal chemistry literature, new sources of bioactivity data have also been added to the database. These include: deposited data sets from neglected disease screening; crop protection data; drug metabolism and disposition data and bioactivity data from patents. A number of improvements and new features have also been incorporated. These include the annotation of assays and targets using ontologies, the inclusion of targets and indications for clinical candidates, addition of metabolic pathways for drugs and calculation of structural alerts. The ChEMBL data can be accessed via a web-interface, RDF distribution, data downloads and RESTful web-services.

  1. The ChEMBL database in 2017

    PubMed Central

    Gaulton, Anna; Hersey, Anne; Nowotka, Michał; Bento, A. Patrícia; Chambers, Jon; Mendez, David; Mutowo, Prudence; Atkinson, Francis; Bellis, Louisa J.; Cibrián-Uhalte, Elena; Davies, Mark; Dedman, Nathan; Karlsson, Anneli; Magariños, María Paula; Overington, John P.; Papadatos, George; Smit, Ines; Leach, Andrew R.

    2017-01-01

    ChEMBL is an open large-scale bioactivity database (https://www.ebi.ac.uk/chembl), previously described in the 2012 and 2014 Nucleic Acids Research Database Issues. Since then, alongside the continued extraction of data from the medicinal chemistry literature, new sources of bioactivity data have also been added to the database. These include: deposited data sets from neglected disease screening; crop protection data; drug metabolism and disposition data and bioactivity data from patents. A number of improvements and new features have also been incorporated. These include the annotation of assays and targets using ontologies, the inclusion of targets and indications for clinical candidates, addition of metabolic pathways for drugs and calculation of structural alerts. The ChEMBL data can be accessed via a web-interface, RDF distribution, data downloads and RESTful web-services. PMID:27899562

  2. Dynameomics: A comprehensive database of protein dynamics

    PubMed Central

    van der Kamp, Marc W.; Schaeffer, Richard D.; Jonsson, Amanda L.; Scouras, Alexander D.; Simms, Andrew; Toofanny, Rudesh D.; Benson, Noah C.; Anderson, Peter C.; Merkley, Eric D.; Rysavy, Steve; Bromley, Denny; Beck, David A. C.; Daggett, Valerie

    2010-01-01

    Summary The dynamic behavior of proteins is important for an understanding of their function and folding. We have performed molecular dynamics simulations of the native state and unfolding pathways of over 1000 proteins, representing the majority of folds in globular proteins. These data are stored and organized using an innovative database approach, which can be mined to obtain both general and specific information about the dynamics and folding/unfolding of proteins, relevant subsets thereof, and individual proteins. Here we describe the project in general terms and the type of information contained in the database. Then we provide examples of mining the database for information relevant to protein folding, structure building, the effect of single-nucleotide polymorphisms, and drug design. The native state simulation data and corresponding analyses for the 100 most populated metafolds, together with related resources, are publicly accessible through www.dynameomics.org. PMID:20399180

  3. Querying genomic databases

    SciTech Connect

    Baehr, A.; Hagstrom, R.; Joerg, D.; Overbeek, R.

    1991-09-01

    A natural-language interface has been developed that retrieves genomic information by using a simple subset of English. The interface spares the biologist from the task of learning database-specific query languages and computer programming. Currently, the interface deals with the E. coli genome. It can, however, be readily extended and shows promise as a means of easy access to other sequenced genomic databases as well.

  4. Steam Properties Database

    National Institute of Standards and Technology Data Gateway

    SRD 10 NIST/ASME Steam Properties Database (PC database for purchase)   Based upon the International Association for the Properties of Water and Steam (IAPWS) 1995 formulation for the thermodynamic properties of water and the most recent IAPWS formulations for transport and other properties, this updated version provides water properties over a wide range of conditions according to the accepted international standards.

  5. Database computing in HEP

    SciTech Connect

    Day, C.T.; Loken, S.; MacFarlane, J.F.; May, E.; Lifka, D.; Lusk, E.; Price, L.E.; Baden, A.; Grossman, R.; Qin, X.; Cormell, L.; Leibold, P.; Liu, D

    1992-01-01

    The major SSC experiments are expected to produce up to 1 Petabyte of data per year each. Once the primary reconstruction is completed by farms of inexpensive processors, I/O becomes a major factor in further analysis of the data. We believe that the application of database techniques can significantly reduce the I/O performed in these analyses. We present examples of such I/O reductions in prototypes based on relational and object-oriented databases of CDF data samples.

  6. Human mapping databases.

    PubMed

    Talbot, C; Cuticchia, A J

    2001-05-01

    This unit concentrates on the data contained within two human genome databases, GDB (Genome Database) and OMIM (Online Mendelian Inheritance in Man), and includes discussion of different methods for submitting and accessing data. An understanding of electronic mail, FTP, and the use of a World Wide Web (WWW) navigational tool such as Netscape or Internet Explorer is a prerequisite for utilizing the information in this unit.

  7. The ribosomal database project.

    PubMed

    Larsen, N; Olsen, G J; Maidak, B L; McCaughey, M J; Overbeek, R; Macke, T J; Marsh, T L; Woese, C R

    1993-07-01

    The Ribosomal Database Project (RDP) is a curated database that offers ribosome data along with related programs and services. The offerings include phylogenetically ordered alignments of ribosomal RNA (rRNA) sequences, derived phylogenetic trees, rRNA secondary structure diagrams and various software packages for handling, analyzing and displaying alignments and trees. The data are available via ftp and electronic mail. Certain analytic services are also provided by the electronic mail server.

  8. The ribosomal database project.

    PubMed Central

    Larsen, N; Olsen, G J; Maidak, B L; McCaughey, M J; Overbeek, R; Macke, T J; Marsh, T L; Woese, C R

    1993-01-01

    The Ribosomal Database Project (RDP) is a curated database that offers ribosome data along with related programs and services. The offerings include phylogenetically ordered alignments of ribosomal RNA (rRNA) sequences, derived phylogenetic trees, rRNA secondary structure diagrams and various software packages for handling, analyzing and displaying alignments and trees. The data are available via ftp and electronic mail. Certain analytic services are also provided by the electronic mail server. PMID:8332524

  9. Database computing in HEP

    NASA Technical Reports Server (NTRS)

    Day, C. T.; Loken, S.; Macfarlane, J. F.; May, E.; Lifka, D.; Lusk, E.; Price, L. E.; Baden, A.; Grossman, R.; Qin, X.

    1992-01-01

    The major SSC experiments are expected to produce up to 1 Petabyte of data per year each. Once the primary reconstruction is completed by farms of inexpensive processors, I/O becomes a major factor in further analysis of the data. We believe that the application of database techniques can significantly reduce the I/O performed in these analyses. We present examples of such I/O reductions in prototypes based on relational and object-oriented databases of CDF data samples.

  10. Pathway analysis of body mass index genome-wide association study highlights risk pathways in cardiovascular disease

    PubMed Central

    Zhao, Xin; Gu, Jinxia; Li, Ming; Xi, Jie; Sun, Wenyu; Song, Guangmin; Liu, Guiyou

    2015-01-01

    Cardiovascular disease (CVD) is a class of diseases that involve the heart or blood vessels. It is reported that body mass index (BMI) is a risk factor for CVD. Genome-wide association studies (GWAS) have recently provided rapid insights into the genetics of CVD and its risk factors. However, the specific mechanisms by which BMI influences CVD risk are largely unknown. We think that BMI may influence CVD risk through shared genetic pathways. In order to confirm this view, we conducted a pathway analysis of BMI GWAS, which examined approximately 329,091 single nucleotide polymorphisms from 4763 samples. We identified 31 significant KEGG pathways. There is literature evidence supporting the involvement of GnRH signaling, vascular smooth muscle contraction, dilated cardiomyopathy, Gap junction, Wnt signaling, Calcium signaling and Chemokine signaling in CVD. Collectively, our study supports the potential role of the CVD risk pathways in BMI. BMI may influence CVD risk through the shared genetic pathways. We believe that our results may advance our understanding of BMI mechanisms in CVD. PMID:26264282

  11. The Transporter Classification Database

    PubMed Central

    Saier, Milton H.; Reddy, Vamsee S.; Tamang, Dorjee G.; Västermark, Åke

    2014-01-01

    The Transporter Classification Database (TCDB; http://www.tcdb.org) serves as a common reference point for transport protein research. The database contains more than 10 000 non-redundant proteins that represent all currently recognized families of transmembrane molecular transport systems. Proteins in TCDB are organized in a five level hierarchical system, where the first two levels are the class and subclass, the second two are the family and subfamily, and the last one is the transport system. Superfamilies that contain multiple families are included as hyperlinks to the five tier TC hierarchy. TCDB includes proteins from all types of living organisms and is the only transporter classification system that is both universal and recognized by the International Union of Biochemistry and Molecular Biology. It has been expanded by manual curation, contains extensive text descriptions providing structural, functional, mechanistic and evolutionary information, is supported by unique software and is interconnected to many other relevant databases. TCDB is of increasing usefulness to the international scientific community and can serve as a model for the expansion of database technologies. This manuscript describes an update of the database descriptions previously featured in NAR database issues. PMID:24225317
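
    The five-level hierarchy described above can be modelled with a small data structure. This is a sketch under one stated assumption: that a TC identifier is written as five dot-separated components such as 2.A.1.1.1, a notation the abstract itself does not spell out. The `TCNumber` class and its method names are hypothetical and are not part of any TCDB software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TCNumber:
    """One identifier in the five-level hierarchy: class, subclass,
    family, subfamily, transport system (field names are illustrative)."""
    tc_class: str
    subclass: str
    family: str
    subfamily: str
    system: str

    @classmethod
    def parse(cls, text: str) -> "TCNumber":
        """Split an identifier like '2.A.1.1.1' into its five levels."""
        parts = text.strip().split(".")
        if len(parts) != 5 or not all(parts):
            raise ValueError(f"expected five dot-separated levels, got {text!r}")
        return cls(*parts)

    def prefix(self, depth: int) -> str:
        """Return the identifier truncated to its first `depth` levels (1 to 5)."""
        if not 1 <= depth <= 5:
            raise ValueError("depth must be between 1 and 5")
        levels = (self.tc_class, self.subclass, self.family,
                  self.subfamily, self.system)
        return ".".join(levels[:depth])


if __name__ == "__main__":
    tc = TCNumber.parse("2.A.1.1.1")
    print(tc.prefix(2))  # class and subclass
    print(tc.prefix(3))  # down to the family level
```

    Truncating to a given depth makes it straightforward to group proteins by class, family, or subfamily, which mirrors how the hierarchy is described.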

  12. Specialist Bibliographic Databases

    PubMed Central

    2016-01-01

    Specialist bibliographic databases offer essential online tools for researchers and authors who work on specific subjects and perform comprehensive and systematic syntheses of evidence. This article presents examples of the established specialist databases, which may be of interest to those engaged in multidisciplinary science communication. Access to most specialist databases is through subscription schemes and membership in professional associations. Several aggregators of information and database vendors, such as EBSCOhost and ProQuest, facilitate advanced searches supported by specialist keyword thesauri. Searches of items through specialist databases are complementary to those through multidisciplinary research platforms, such as PubMed, Web of Science, and Google Scholar. Familiarizing with the functional characteristics of biomedical and nonbiomedical bibliographic search tools is mandatory for researchers, authors, editors, and publishers. The database users are offered updates of the indexed journal lists, abstracts, author profiles, and links to other metadata. Editors and publishers may find particularly useful source selection criteria and apply for coverage of their peer-reviewed journals and grey literature sources. These criteria are aimed at accepting relevant sources with established editorial policies and quality controls. PMID:27134485

  13. Specialist Bibliographic Databases.

    PubMed

    Gasparyan, Armen Yuri; Yessirkepov, Marlen; Voronov, Alexander A; Trukhachev, Vladimir I; Kostyukova, Elena I; Gerasimov, Alexey N; Kitas, George D

    2016-05-01

    Specialist bibliographic databases offer essential online tools for researchers and authors who work on specific subjects and perform comprehensive and systematic syntheses of evidence. This article presents examples of the established specialist databases, which may be of interest to those engaged in multidisciplinary science communication. Access to most specialist databases is through subscription schemes and membership in professional associations. Several aggregators of information and database vendors, such as EBSCOhost and ProQuest, facilitate advanced searches supported by specialist keyword thesauri. Searches of items through specialist databases are complementary to those through multidisciplinary research platforms, such as PubMed, Web of Science, and Google Scholar. Familiarizing with the functional characteristics of biomedical and nonbiomedical bibliographic search tools is mandatory for researchers, authors, editors, and publishers. The database users are offered updates of the indexed journal lists, abstracts, author profiles, and links to other metadata. Editors and publishers may find particularly useful source selection criteria and apply for coverage of their peer-reviewed journals and grey literature sources. These criteria are aimed at accepting relevant sources with established editorial policies and quality controls.

  14. National Database of Geriatrics.

    PubMed

    Kannegaard, Pia Nimann; Vinding, Kirsten L; Hare-Bruun, Helle

    2016-01-01

    The aim of the National Database of Geriatrics is to monitor the quality of interdisciplinary diagnostics and treatment of patients admitted to a geriatric hospital unit. The database population consists of patients who were admitted to a geriatric hospital unit. Geriatric patients cannot be defined by specific diagnoses. A geriatric patient is typically a frail multimorbid elderly patient with decreasing functional ability and social challenges. The database includes 14-15,000 admissions per year, and the database completeness has been stable at 90% during the past 5 years. An important part of the geriatric approach is the interdisciplinary collaboration. Indicators, therefore, reflect the combined efforts directed toward the geriatric patient. The indicators include Barthel index, body mass index, de Morton Mobility Index, Chair Stand, percentage of discharges with a rehabilitation plan, and the part of cases where an interdisciplinary conference has taken place. Data are recorded by doctors, nurses, and therapists in a database and linked to the Danish National Patient Register. Descriptive patient-related data include information about home, mobility aid, need of fall and/or cognitive diagnosing, and categorization of cause (general geriatric, orthogeriatric, or neurogeriatric). The National Database of Geriatrics covers ∼90% of geriatric admissions in Danish hospitals and provides valuable information about a large and increasing patient population in the health care system.

  15. Drinking Water Database

    NASA Technical Reports Server (NTRS)

    Murray, ShaTerea R.

    2004-01-01

    This summer I had the opportunity to work in the Environmental Management Office (EMO) under the Chemical Sampling and Analysis Team, or CS&AT. This team's mission is to support Glenn Research Center (GRC) and EMO by providing chemical sampling and analysis services and expert consulting. Services include sampling and chemical analysis of water, soil, fuels, oils, paint, insulation materials, etc. One of this team's major projects is the Drinking Water Project, which is carried out on Glenn's water coolers and ten percent of its sinks every two years. For the past two summers an intern had been putting together a database for this team to record the tests they had performed. She had successfully created a database but hadn't worked out all the quirks. So this summer William Wilder (an intern from Cleveland State University) and I worked together to perfect her database. We began by finding out exactly what every member of the team thought about the database and what, if anything, they would change. After collecting this data we both had to take some courses in Microsoft Access in order to fix the problems. Next we began looking at exactly how the database worked from the outside inward. Then we began trying to change the database but we quickly found out that this would be virtually impossible.

  16. The comprehensive peptaibiotics database.

    PubMed

    Stoppacher, Norbert; Neumann, Nora K N; Burgstaller, Lukas; Zeilinger, Susanne; Degenkolb, Thomas; Brückner, Hans; Schuhmacher, Rainer

    2013-05-01

    Peptaibiotics are nonribosomally biosynthesized peptides, which - according to definition - contain the marker amino acid α-aminoisobutyric acid (Aib) and possess antibiotic properties. Being known since 1958, a constantly increasing number of peptaibiotics have been described and investigated with a particular emphasis on hypocrealean fungi. Starting from the existing online 'Peptaibol Database', first published in 1997, an exhaustive literature survey of all known peptaibiotics was carried out and resulted in a list of 1043 peptaibiotics. The gathered information was compiled and used to create the new 'The Comprehensive Peptaibiotics Database', which is presented here. The database was devised as a software tool based on Microsoft (MS) Access. It is freely available from the internet at http://peptaibiotics-database.boku.ac.at and can easily be installed and operated on any computer offering a Windows XP/7 environment. It provides useful information on characteristic properties of the peptaibiotics included such as peptide category, group name of the microheterogeneous mixture to which the peptide belongs, amino acid sequence, sequence length, producing fungus, peptide subfamily, molecular formula, and monoisotopic mass. All these characteristics can be used and combined for automated search within the database, which makes The Comprehensive Peptaibiotics Database a versatile tool for the retrieval of valuable information about peptaibiotics. Sequence data have been considered as to December 14, 2012.

  17. Crude Oil Analysis Database

    DOE Data Explorer

    Shay, Johanna Y.

    The composition and physical properties of crude oil vary widely from one reservoir to another within an oil field, as well as from one field or region to another. Although all oils consist of hydrocarbons and their derivatives, the proportions of various types of compounds differ greatly. This makes some oils more suitable than others for specific refining processes and uses. To take advantage of this diversity, one needs access to information in a large database of crude oil analyses. The Crude Oil Analysis Database (COADB) currently satisfies this need by offering 9,056 crude oil analyses. Of these, 8,500 are United States domestic oils. The database contains results of analysis of the general properties and chemical composition, as well as the field, formation, and geographic location of the crude oil sample. [Taken from the Introduction to COAMDATA_DESC.pdf, part of the zipped software and database file at http://www.netl.doe.gov/technologies/oil-gas/Software/database.html] Save the zipped file to your PC. When opened, it will contain PDF documents and a large Excel spreadsheet. It will also contain the database in Microsoft Access 2002.

  18. Cancer Metabolomics and the Human Metabolome Database

    PubMed Central

    Wishart, David S.; Mandal, Rupasri; Stanislaus, Avalyn; Ramirez-Gaona, Miguel

    2016-01-01

    The application of metabolomics towards cancer research has led to a renewed appreciation of metabolism in cancer development and progression. It has also led to the discovery of metabolite cancer biomarkers and the identification of a number of novel cancer-causing metabolites. The rapid growth of metabolomics in cancer research is also leading to challenges. In particular, with so many cancer-associated metabolites being identified, it is often difficult to keep track of which compounds are associated with which cancers. It is also challenging to track down information on the specific pathways that particular metabolites, drugs or drug metabolites may be affecting. Even more frustrating are the difficulties associated with identifying metabolites from NMR or MS spectra. Fortunately, a number of metabolomics databases are emerging that are designed to address these challenges. One such database is the Human Metabolome Database (HMDB). The HMDB is currently the world’s largest and most comprehensive, organism-specific metabolomics database. It contains more than 40,000 metabolite entries, thousands of metabolite concentrations, >700 metabolic and disease-associated pathways, as well as information on dozens of cancer biomarkers. This review is intended to provide a brief summary of the HMDB and to offer some guidance on how it can be used in metabolomic studies of cancer. PMID:26950159

  19. Cancer Metabolomics and the Human Metabolome Database.

    PubMed

    Wishart, David S; Mandal, Rupasri; Stanislaus, Avalyn; Ramirez-Gaona, Miguel

    2016-03-02

    The application of metabolomics towards cancer research has led to a renewed appreciation of metabolism in cancer development and progression. It has also led to the discovery of metabolite cancer biomarkers and the identification of a number of novel cancer-causing metabolites. The rapid growth of metabolomics in cancer research is also leading to challenges. In particular, with so many cancer-associated metabolites being identified, it is often difficult to keep track of which compounds are associated with which cancers. It is also challenging to track down information on the specific pathways that particular metabolites, drugs or drug metabolites may be affecting. Even more frustrating are the difficulties associated with identifying metabolites from NMR or MS spectra. Fortunately, a number of metabolomics databases are emerging that are designed to address these challenges. One such database is the Human Metabolome Database (HMDB). The HMDB is currently the world's largest and most comprehensive, organism-specific metabolomics database. It contains more than 40,000 metabolite entries, thousands of metabolite concentrations, >700 metabolic and disease-associated pathways, as well as information on dozens of cancer biomarkers. This review is intended to provide a brief summary of the HMDB and to offer some guidance on how it can be used in metabolomic studies of cancer.

  20. A computational platform to maintain and migrate manual functional annotations for BioCyc databases

    USDA-ARS's Scientific Manuscript database

    Model organism databases are an important resource for information on biological pathways and genomic data. Such databases represent the accumulation of biological data, some of which has been manually curated from literature. An essential feature of these databases is the continuing data integratio...

  1. WikiPathways App for Cytoscape: Making biological pathways amenable to network analysis and visualization.

    PubMed

    Kutmon, Martina; Lotia, Samad; Evelo, Chris T; Pico, Alexander R

    2014-01-01

    In this paper we present the open-source WikiPathways app for Cytoscape (http://apps.cytoscape.org/apps/wikipathways) that can be used to import biological pathways for data visualization and network analysis. WikiPathways is an open, collaborative biological pathway database that provides fully annotated pathway diagrams for manual download or through web services. The WikiPathways app allows users to load pathways in two different views: as an annotated pathway ideal for data visualization and as a simple network to perform computational analysis. An example pathway and dataset are used to demonstrate the functionality of the WikiPathways app and how they can be combined and used together with other apps. More than 3000 downloads in the first 12 months following its release in August 2013 highlight the importance and adoption of the app in the network biology field.

  2. Differential pathway network analysis used to identify key pathways associated with pediatric pneumonia.

    PubMed

    Yang, Jun-Bo; Luo, Rong; Yan, Yan; Chen, Yan

    2016-12-01

    We aimed to identify key pathways to further explore the molecular mechanism underlying pediatric pneumonia using a differential pathway network that integrated protein-protein interaction (PPI) data and pathway information. PPI data and pathway information were obtained from the STRING and Reactome databases, respectively. Next, pathway interactions were identified on the basis of constructing gene-gene interactions randomly, and a weight value computed using the Spearman correlation coefficient was assigned to each pathway-pathway interaction in order to detect differential pathway interactions. Subsequently, the differential pathway network was constructed using Cytoscape, followed by network clustering analysis using ClusterONE. Finally, topological analysis of the differential pathway network was performed to identify hub pathways, defined as those in the top 5% of the degree distribution. Significantly, 901 pathways were identified to construct pathway interactions. After discarding the pathway interactions with weight value < 1.2, a differential pathway network was constructed, which contained 499 interactions and 347 pathways. Topological analysis identified 17 hub pathways (FGFR1 fusion mutants, molecules associated with elastic fibres, FGFR1 mutant receptor activation, and so on). Significantly, signaling by FGFR1 fusion mutants and FGFR1 mutant receptor activation simultaneously appeared in two clusters. Molecules associated with elastic fibres existed in one cluster. Accordingly, the differential pathway network method might serve as a predictive tool to help us further understand the development of pediatric pneumonia. FGFR1 fusion mutants, FGFR1 mutant receptor activation, and molecules associated with elastic fibres might play important roles in the progression of pediatric pneumonia.
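
    The filtering and hub-selection steps in this record can be sketched as follows. This is a minimal illustration, not the authors' code: the function name hub_pathways, the edge-list input format, the alphabetical tie-breaking and the rounding down of the 5% cutoff are all assumptions. Only the weight threshold of 1.2 and the top-5% degree rule come from the abstract; the way the weights themselves are computed is not reproduced here.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# (pathway_a, pathway_b, weight) for one pathway-pathway interaction
Edge = Tuple[str, str, float]


def hub_pathways(edges: Iterable[Edge],
                 min_weight: float = 1.2,
                 top_fraction: float = 0.05) -> List[str]:
    """Drop low-weight interactions, then return the highest-degree pathways."""
    degree = Counter()
    for a, b, weight in edges:
        if weight < min_weight:
            continue  # interactions below the threshold are discarded
        degree[a] += 1
        degree[b] += 1
    if not degree:
        return []
    # Assumption: round the cutoff down, but always report at least one hub.
    n_hubs = max(1, int(len(degree) * top_fraction))
    # Sort by descending degree; ties are broken alphabetically (an assumption).
    ranked = sorted(degree, key=lambda name: (-degree[name], name))
    return ranked[:n_hubs]
```

    Under the rounding assumption above, a network of 347 pathways gives int(347 * 0.05) = 17 hubs, which agrees with the 17 hub pathways reported in the abstract.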

  3. Combinations of gene ontology and pathway characterize and predict prognosis genes for recurrence of gastric cancer after surgery.

    PubMed

    Fan, Haiyan; Guo, Zhanjun; Wang, Cuijv

    2015-09-01

    Gastric cancer (GC) is the second leading cause of death from cancer globally. The most common cause of GC is infection with Helicobacter pylori, but ∼11% of cases are caused by genetic factors. However, recurrences occur in approximately one-third of stage II GC patients, even if they are treated with adjuvant chemotherapy or chemoradiotherapy. This is potentially due to expression variation of genes; some candidate prognostic genes were identified in patients with high-risk recurrences. The objective of this study was to develop an effective computational method for meaningfully interpreting these GC-related genes and accurately predicting novel prognostic genes for high-risk recurrence patients. We employed properties of genes (gene ontology [GO] and KEGG pathway information) as features to characterize GC-related genes. We obtained an optimal set of features for interpreting these genes. By applying the minimum redundancy maximum relevance algorithm, we predicted the GC-related genes. With the same approach, we further predicted the genes for the prognosis of high-risk recurrence. We obtained 1104 GO terms and KEGG pathways and 530 GO terms and KEGG pathways, respectively, that characterized GC-related genes and recurrence-related genes well. Finally, three novel prognostic genes were predicted to help supplement genetic markers of high-risk GC patients for recurrence after surgery. In-depth text mining indicated that the results are quite consistent with previous knowledge. Survival analysis of patients confirmed the novel prognostic genes as markers. By analyzing the related genes, we developed a systematic method to interpret the possible underlying mechanism of GC. The novel prognostic genes facilitate the understanding and therapy of GC recurrences after surgery.

  4. First insight into the human liver proteome from PROTEOME(SKY)-LIVER(Hu) 1.0, a publicly available database.

    PubMed

    2010-01-01

    Herein, we report proteome and transcriptome profiles of the human adult liver and present an initial analysis. Overall, the human liver proteome (HLP) data set comprises 6788 identified proteins with at least two peptide matches at 95% confidence, including 3721 proteins newly identified in liver. The human liver transcriptome (HLT) data set consists of 11 205 expressed genes. The HLP is the largest proteome data set for a human organ and is the first direct association between a proteome and its transcriptome derived from the same sample. Although it is hard to approach complete coverage of the HLP currently, several conclusions based on this data set are clearly reached: (1) The 5816 protein-encoding genes (PEGs) represented by the HLP and the 11 104 PEGs represented in the HLT have been identified from 20 070 PEGs in IPI Human v3.07 and 19 478 PEGs in the integrated human transcriptome database, respectively. (2) The patterns of chromosomal distribution of the genes corresponding to the HLP are highly consistent with those of the HLT. Some chromosomal regions, such as 16p13.3, 19q13.31, 19q13.42, and Xq28, exhibit particularly high densities of liver-specific genes, which perform important functions related to normal physiology and/or pathology in this organ. (3) The HLP spans 6 orders of magnitude in relative protein abundance and 78% of the proteins fall in the middle of this range. Of newly identified liver proteins, 82.5% are of low abundance. (4) Proteins involved in metabolism, transport, and coagulation and those containing active domains for metabolism, transport, and biosynthesis are significantly enriched in liver. (5) All 94 metabolic pathways in KEGG are covered to varying extents. For 48 of these pathways, particularly those involved in metabolism of carbohydrates and amino acids, more than 80% of the component proteins have been detected. The liver-specific pathways, such as those participating in metabolism of bile acid and bilirubin and

  5. Databases: Peter's Picks and Pans.

    ERIC Educational Resources Information Center

    Jacso, Peter

    1995-01-01

    Reviews the best and worst in databases on disk, CD-ROM, and online, and offers judgments and observations on database characteristics. Two databases are praised and three are criticized. (Author/JMV)

  6. Databases: Peter's Picks and Pans.

    ERIC Educational Resources Information Center

    Jacso, Peter

    1995-01-01

    Reviews the best and worst in databases on disk, CD-ROM, and online, and offers judgments and observations on database characteristics. Two databases are praised and three are criticized. (Author/JMV)

  7. Great Basin paleontological database

    USGS Publications Warehouse

    Zhang, N.; Blodgett, R.B.; Hofstra, A.H.

    2008-01-01

    The U.S. Geological Survey has constructed a paleontological database for the Great Basin physiographic province that can be served over the World Wide Web for data entry, queries, displays, and retrievals. It is similar to the web-database solution that we constructed for Alaskan paleontological data (www.alaskafossil.org). The first phase of this effort, recently completed, was to compile a paleontological bibliography for Nevada and portions of adjacent states in the Great Basin. In addition, we are compiling paleontological reports (known as E&R reports) of the U.S. Geological Survey, which are another extensive source of legacy data for this region. Initial population of the database benefited from a recently published conodont data set and is otherwise focused on Devonian and Mississippian localities because strata of this age host important sedimentary exhalative (sedex) Au, Zn, and barite resources and enormous Carlin-type Au deposits. In addition, these strata are the most important petroleum source rocks in the region, and record the transition from extension to contraction associated with the Antler orogeny, the Alamo meteorite impact, and biotic crises associated with global oceanic anoxic events. The finished product will provide an invaluable tool for future geologic mapping, paleontological research, and mineral resource investigations in the Great Basin, making paleontological data acquired over nearly the past 150 yr readily available over the World Wide Web. A description of the structure of the database and the web interface developed for this effort are provided herein. This database is being used as a model for a National Paleontological Database (which we are currently developing for the U.S. Geological Survey) as well as for other paleontological databases now being developed in other parts of the globe. © 2008 Geological Society of America.

  8. Extracellular Matrix-dependent Pathways in Colorectal Cancer Cell Lines Reveal Potential Targets for Anticancer Therapies.

    PubMed

    Stankevicius, Vaidotas; Vasauskas, Gintautas; Noreikiene, Rimante; Kuodyte, Karolina; Valius, Mindaugas; Suziedelis, Kestutis

    2016-09-01

    Cancer cells grown in a 3D culture are more resistant to anticancer therapy treatment compared to those in a monolayer 2D culture. Emerging evidence has suggested that a key reason for increased cell survival could be gene expression changes that occur in a cell-extracellular matrix (ECM) interaction-dependent manner. Global gene-expression changes were obtained in human colorectal carcinoma HT29 and DLD1 cell lines between 2D and laminin-rich (lr) ECM 3D growth conditions by gene-expression microarray analysis. The most significantly altered functional categories were revealed by Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis. The microarray data revealed that 841 and 1190 genes were differentially expressed in colorectal carcinoma DLD1 and HT29 cells, respectively. KEGG analysis indicated that the most significantly altered categories were cell adhesion, mitogen-activated protein kinase and immune response. Our results indicate altered pathways related to cancer development and progression and suggest potential ECM-regulated targets for the development of anticancer therapies. Copyright © 2016 International Institute of Anticancer Research (Dr. John G. Delinassios), All rights reserved.

  9. NASA Records Database

    NASA Technical Reports Server (NTRS)

    Callac, Christopher; Lunsford, Michelle

    2005-01-01

    The NASA Records Database, comprising a Web-based application program and a database, is used to administer an archive of paper records at Stennis Space Center. The system begins with an electronic form, into which a user enters information about records that the user is sending to the archive. The form is "smart": it provides instructions for entering information correctly and prompts the user to enter all required information. Once complete, the form is digitally signed and submitted to the database. The system determines which storage locations are not in use, assigns the user's boxes of records to some of them, and enters these assignments in the database. Thereafter, the software tracks the boxes and can be used to locate them. Using the search capabilities of the software, specific records can be sought by box storage locations, accession numbers, record dates, submitting organizations, or details of the records themselves. Boxes can be marked with such statuses as checked out, lost, transferred, and destroyed. The system can generate reports showing boxes awaiting destruction or transfer. When boxes are transferred to the National Archives and Records Administration (NARA), the system can automatically fill out NARA records-transfer forms. Currently, several other NASA Centers are considering deploying the NASA Records Database to help automate their records archives.
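
    The storage-assignment step described in this record (find unused locations, assign incoming boxes to some of them, record the assignments) might look roughly like the sketch below. It is purely illustrative: the function name assign_boxes, the in-memory dictionary standing in for the database, and the first-free-location policy are assumptions, since the abstract does not describe the actual implementation.

```python
from typing import Dict, List


def assign_boxes(all_locations: List[str],
                 assignments: Dict[str, str],
                 new_boxes: List[str]) -> Dict[str, str]:
    """Assign each new box to an unused location and record the assignment.

    `assignments` maps a storage location to the box stored there and is
    updated in place; the newly made assignments are also returned.
    """
    # Locations with no recorded box are considered free.
    free = [loc for loc in all_locations if loc not in assignments]
    if len(new_boxes) > len(free):
        raise ValueError("not enough free storage locations")
    # Hypothetical policy: fill free locations in their listed order.
    new_assignments = dict(zip(free, new_boxes))
    assignments.update(new_assignments)
    return new_assignments
```

    With locations A1, A2 and A3, and A1 already holding a box, submitting two new boxes would place them in A2 and A3 and leave no free locations.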

  10. FishTraits Database

    USGS Publications Warehouse

    Angermeier, Paul L.; Frimpong, Emmanuel A.

    2009-01-01

    The need for integrated and widely accessible sources of species traits data to facilitate studies of ecology, conservation, and management has motivated development of traits databases for various taxa. In spite of the increasing number of traits-based analyses of freshwater fishes in the United States, no consolidated database of traits of this group exists publicly, and much useful information on these species is documented only in obscure sources. The largely inaccessible and unconsolidated traits information makes large-scale analysis involving many fishes and/or traits particularly challenging. FishTraits is a database of >100 traits for 809 (731 native and 78 exotic) fish species found in freshwaters of the conterminous United States, including 37 native families and 145 native genera. The database contains information on four major categories of traits: (1) trophic ecology, (2) body size and reproductive ecology (life history), (3) habitat associations, and (4) salinity and temperature tolerances. Information on geographic distribution and conservation status is also included. Together, we refer to the traits, distribution, and conservation status information as attributes. Descriptions of attributes are available here. Many sources were consulted to compile attributes, including state and regional species accounts and other databases.

  11. Shuttle Hypervelocity Impact Database

    NASA Technical Reports Server (NTRS)

    Hyde, James L.; Christiansen, Eric L.; Lear, Dana M.

    2011-01-01

    With three missions outstanding, the Shuttle Hypervelocity Impact Database has nearly 3000 entries. The data is divided into tables for crew module windows, payload bay door radiators and thermal protection system regions, with window impacts comprising just over half the records. In general, the database provides dimensions of hypervelocity impact damage, a component level location (i.e., window number or radiator panel number) and the orbiter mission when the impact occurred. Additional detail on the type of particle that produced the damage site is provided when sampling data and definitive analysis results are available. Details and insights on the contents of the database including examples of descriptive statistics will be provided. Post flight impact damage inspection and sampling techniques that were employed during the different observation campaigns will also be discussed. Potential enhancements to the database structure and availability of the data for other researche