Sample records for biological pathway databases

  1. PathwayAccess: CellDesigner plugins for pathway databases.

    PubMed

    Van Hemert, John L; Dickerson, Julie A

    2010-09-15

    CellDesigner provides a user-friendly interface for graphical biochemical pathway description. Many pathway databases are not directly exportable to CellDesigner models. PathwayAccess is an extensible suite of CellDesigner plugins, which connect CellDesigner directly to pathway databases using respective Java application programming interfaces. The process is streamlined for creating new PathwayAccess plugins for specific pathway databases. Three PathwayAccess plugins, MetNetAccess, BioCycAccess and ReactomeAccess, directly connect CellDesigner to the pathway databases MetNetDB, BioCyc and Reactome. PathwayAccess plugins enable CellDesigner users to expose pathway data to analytical CellDesigner functions, curate their pathway databases and visually integrate pathway data from different databases using standard Systems Biology Markup Language and Systems Biology Graphical Notation. Implemented in Java, PathwayAccess plugins run with CellDesigner version 4.0.1 and were tested on Ubuntu Linux, Windows XP and 7, and MacOSX. Source code, binaries, documentation and video walkthroughs are freely available at http://vrac.iastate.edu/~jlv.

  2. PATIKAweb: a Web interface for analyzing biological pathways through advanced querying and visualization.

    PubMed

    Dogrusoz, U; Erson, E Z; Giral, E; Demir, E; Babur, O; Cetintas, A; Colak, R

    2006-02-01

    Patikaweb provides a Web interface for retrieving and analyzing biological pathways in the Patika database, which contains data integrated from various prominent public pathway databases. It features a user-friendly interface, dynamic visualization and automated layout, advanced graph-theoretic queries for extracting biologically important phenomena, local persistence capability and exporting facilities to various pathway exchange formats.

  3. Redundancy control in pathway databases (ReCiPa): an application for improving gene-set enrichment analysis in Omics studies and "Big data" biology.

    PubMed

    Vivar, Juan C; Pemu, Priscilla; McPherson, Ruth; Ghosh, Sujoy

    2013-08-01

    Unparalleled technological advances have fueled an explosive growth in the scope and scale of biological data and have propelled life sciences into the realm of "Big Data" that cannot be managed or analyzed by conventional approaches. Big Data in the life sciences are driven primarily via a diverse collection of 'omics'-based technologies, including genomics, proteomics, metabolomics, transcriptomics, metagenomics, and lipidomics. Gene-set enrichment analysis is a powerful approach for interrogating large 'omics' datasets, leading to the identification of biological mechanisms associated with observed outcomes. While several factors influence the results from such analysis, the impact from the contents of pathway databases is often under-appreciated. Pathway databases often contain variously named pathways that overlap with one another to varying degrees. Ignoring such redundancies during pathway analysis can lead to the designation of several pathways as being significant due to high content-similarity, rather than truly independent biological mechanisms. Statistically, such dependencies also result in correlated p values and overdispersion, leading to biased results. We investigated the level of redundancies in multiple pathway databases and observed large discrepancies in the nature and extent of pathway overlap. This prompted us to develop the application, ReCiPa (Redundancy Control in Pathway Databases), to control redundancies in pathway databases based on user-defined thresholds. Analysis of genomic and genetic datasets, using ReCiPa-generated overlap-controlled versions of KEGG and Reactome pathways, led to a reduction in redundancy among the top-scoring gene-sets and allowed for the inclusion of additional gene-sets representing possibly novel biological mechanisms.
    Using obesity as an example, bioinformatic analysis further demonstrated that gene-sets identified from overlap-controlled pathway databases show stronger evidence of prior association to obesity compared to pathways identified from the original databases.
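
    The threshold-based redundancy control described in this abstract can be illustrated with a small sketch. It is not ReCiPa's actual algorithm, which the abstract does not spell out. It assumes overlap is measured as the shared fraction of the smaller gene set and that the most-overlapping pair is merged greedily. The function names and the toy gene sets are hypothetical.

```python
from itertools import combinations


def overlap(a: set, b: set) -> float:
    """Fraction of the smaller gene set that is shared with the other set."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def merge_redundant(pathways: dict, threshold: float) -> dict:
    """Greedily merge the most-overlapping pair until no pair exceeds threshold."""
    merged = {name: set(genes) for name, genes in pathways.items()}
    while True:
        best = None
        # Scan every pair and remember the one with the highest overlap.
        for p, q in combinations(sorted(merged), 2):
            score = overlap(merged[p], merged[q])
            if score > threshold and (best is None or score > best[0]):
                best = (score, p, q)
        if best is None:
            return merged
        _, p, q = best
        # Replace the two pathways with their union under a combined name.
        merged[p + "+" + q] = merged.pop(p) | merged.pop(q)


# Hypothetical toy data: two pathways share three of four genes.
toy = {
    "glycolysis": {"HK1", "PFKM", "PKM", "GAPDH"},
    "gluconeogenesis": {"PCK1", "PFKM", "PKM", "GAPDH"},
    "insulin_signaling": {"INSR", "IRS1", "AKT1"},
}
result = merge_redundant(toy, threshold=0.7)
```

    With a threshold of 0.7, the two glucose-related sets are merged into one entry, and the unrelated signaling set is left alone. Raising the threshold above 0.75 would keep all three separate.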

  4. cPath: open source software for collecting, storing, and querying biological pathways.

    PubMed

    Cerami, Ethan G; Bader, Gary D; Gross, Benjamin E; Sander, Chris

    2006-11-13

    Biological pathways, including metabolic pathways, protein interaction networks, signal transduction pathways, and gene regulatory networks, are currently represented in over 220 diverse databases. These data are crucial for the study of specific biological processes, including human diseases. Standard exchange formats for pathway information, such as BioPAX, CellML, SBML and PSI-MI, enable convenient collection of this data for biological research, but mechanisms for common storage and communication are required. We have developed cPath, an open source database and web application for collecting, storing, and querying biological pathway data. cPath makes it easy to aggregate custom pathway data sets available in standard exchange formats from multiple databases, present pathway data to biologists via a customizable web interface, and export pathway data via a web service to third-party software, such as Cytoscape, for visualization and analysis. cPath is software only, and does not include new pathway information. Key features include: a built-in identifier mapping service for linking identical interactors and linking to external resources; built-in support for PSI-MI and BioPAX standard pathway exchange formats; a web service interface for searching and retrieving pathway data sets; and thorough documentation. The cPath software is freely available under the LGPL open source license for academic and commercial use. cPath is a robust, scalable, modular, professional-grade software platform for collecting, storing, and querying biological pathways. It can serve as the core data handling component in information systems for pathway visualization, analysis and modeling.

  5. The NCBI BioSystems database.

    PubMed

    Geer, Lewis Y; Marchler-Bauer, Aron; Geer, Renata C; Han, Lianyi; He, Jane; He, Siqian; Liu, Chunlei; Shi, Wenyao; Bryant, Stephen H

    2010-01-01

    The NCBI BioSystems database, found at http://www.ncbi.nlm.nih.gov/biosystems/, centralizes and cross-links existing biological systems databases, increasing their utility and target audience by integrating their pathways and systems into NCBI resources. This integration allows users of NCBI's Entrez databases to quickly categorize proteins, genes and small molecules by metabolic pathway, disease state or other BioSystem type, without requiring time-consuming inference of biological relationships from the literature or multiple experimental datasets.

  6. cPath: open source software for collecting, storing, and querying biological pathways

    PubMed Central

    Cerami, Ethan G; Bader, Gary D; Gross, Benjamin E; Sander, Chris

    2006-01-01

    Background Biological pathways, including metabolic pathways, protein interaction networks, signal transduction pathways, and gene regulatory networks, are currently represented in over 220 diverse databases. These data are crucial for the study of specific biological processes, including human diseases. Standard exchange formats for pathway information, such as BioPAX, CellML, SBML and PSI-MI, enable convenient collection of this data for biological research, but mechanisms for common storage and communication are required. Results We have developed cPath, an open source database and web application for collecting, storing, and querying biological pathway data. cPath makes it easy to aggregate custom pathway data sets available in standard exchange formats from multiple databases, present pathway data to biologists via a customizable web interface, and export pathway data via a web service to third-party software, such as Cytoscape, for visualization and analysis. cPath is software only, and does not include new pathway information. Key features include: a built-in identifier mapping service for linking identical interactors and linking to external resources; built-in support for PSI-MI and BioPAX standard pathway exchange formats; a web service interface for searching and retrieving pathway data sets; and thorough documentation. The cPath software is freely available under the LGPL open source license for academic and commercial use. Conclusion cPath is a robust, scalable, modular, professional-grade software platform for collecting, storing, and querying biological pathways. It can serve as the core data handling component in information systems for pathway visualization, analysis and modeling. PMID:17101041

  7. The NCBI BioSystems database

    PubMed Central

    Geer, Lewis Y.; Marchler-Bauer, Aron; Geer, Renata C.; Han, Lianyi; He, Jane; He, Siqian; Liu, Chunlei; Shi, Wenyao; Bryant, Stephen H.

    2010-01-01

    The NCBI BioSystems database, found at http://www.ncbi.nlm.nih.gov/biosystems/, centralizes and cross-links existing biological systems databases, increasing their utility and target audience by integrating their pathways and systems into NCBI resources. This integration allows users of NCBI’s Entrez databases to quickly categorize proteins, genes and small molecules by metabolic pathway, disease state or other BioSystem type, without requiring time-consuming inference of biological relationships from the literature or multiple experimental datasets. PMID:19854944

  8. BioPAX – A community standard for pathway data sharing

    PubMed Central

    Demir, Emek; Cary, Michael P.; Paley, Suzanne; Fukuda, Ken; Lemer, Christian; Vastrik, Imre; Wu, Guanming; D’Eustachio, Peter; Schaefer, Carl; Luciano, Joanne; Schacherer, Frank; Martinez-Flores, Irma; Hu, Zhenjun; Jimenez-Jacinto, Veronica; Joshi-Tope, Geeta; Kandasamy, Kumaran; Lopez-Fuentes, Alejandra C.; Mi, Huaiyu; Pichler, Elgar; Rodchenkov, Igor; Splendiani, Andrea; Tkachev, Sasha; Zucker, Jeremy; Gopinath, Gopal; Rajasimha, Harsha; Ramakrishnan, Ranjani; Shah, Imran; Syed, Mustafa; Anwar, Nadia; Babur, Ozgun; Blinov, Michael; Brauner, Erik; Corwin, Dan; Donaldson, Sylva; Gibbons, Frank; Goldberg, Robert; Hornbeck, Peter; Luna, Augustin; Murray-Rust, Peter; Neumann, Eric; Reubenacker, Oliver; Samwald, Matthias; van Iersel, Martijn; Wimalaratne, Sarala; Allen, Keith; Braun, Burk; Whirl-Carrillo, Michelle; Dahlquist, Kam; Finney, Andrew; Gillespie, Marc; Glass, Elizabeth; Gong, Li; Haw, Robin; Honig, Michael; Hubaut, Olivier; Kane, David; Krupa, Shiva; Kutmon, Martina; Leonard, Julie; Marks, Debbie; Merberg, David; Petri, Victoria; Pico, Alex; Ravenscroft, Dean; Ren, Liya; Shah, Nigam; Sunshine, Margot; Tang, Rebecca; Whaley, Ryan; Letovksy, Stan; Buetow, Kenneth H.; Rzhetsky, Andrey; Schachter, Vincent; Sobral, Bruno S.; Dogrusoz, Ugur; McWeeney, Shannon; Aladjem, Mirit; Birney, Ewan; Collado-Vides, Julio; Goto, Susumu; Hucka, Michael; Le Novère, Nicolas; Maltsev, Natalia; Pandey, Akhilesh; Thomas, Paul; Wingender, Edgar; Karp, Peter D.; Sander, Chris; Bader, Gary D.

    2010-01-01

    BioPAX (Biological Pathway Exchange) is a standard language to represent biological pathways at the molecular and cellular level. Its major use is to facilitate the exchange of pathway data (http://www.biopax.org). Pathway data captures our understanding of biological processes, but its rapid growth necessitates development of databases and computational tools to aid interpretation. However, the current fragmentation of pathway information across many databases with incompatible formats presents barriers to its effective use. BioPAX solves this problem by making pathway data substantially easier to collect, index, interpret and share. BioPAX can represent metabolic and signaling pathways, molecular and genetic interactions and gene regulation networks. BioPAX was created through a community process. Through BioPAX, millions of interactions organized into thousands of pathways across many organisms, from a growing number of sources, are available. Thus, large amounts of pathway data are available in a computable form to support visualization, analysis and biological discovery. PMID:20829833

  9. An editor for pathway drawing and data visualization in the Biopathways Workbench.

    PubMed

    Byrnes, Robert W; Cotter, Dawn; Maer, Andreia; Li, Joshua; Nadeau, David; Subramaniam, Shankar

    2009-10-02

    Pathway models serve as the basis for much of systems biology. They are often built using programs designed for the purpose. Constructing new models generally requires simultaneous access to experimental data of diverse types, to databases of well-characterized biological compounds and molecular intermediates, and to reference model pathways. However, few if any software applications provide all such capabilities within a single user interface. The Pathway Editor is a program written in the Java programming language that allows de-novo pathway creation and downloading of LIPID MAPS (Lipid Metabolites and Pathways Strategy) and KEGG lipid metabolic pathways, and of measured time-dependent changes to lipid components of metabolism. Accessed through Java Web Start, the program downloads pathways from the LIPID MAPS Pathway database (Pathway) as well as from the LIPID MAPS web server http://www.lipidmaps.org. Data arises from metabolomic (lipidomic), microarray, and protein array experiments performed by the LIPID MAPS consortium of laboratories and is arranged by experiment. Facility is provided to create, connect, and annotate nodes and processes on a drawing panel with reference to database objects and time course data. Node and interaction layout as well as data display may be configured in pathway diagrams as desired. Users may extend diagrams, and may also read and write data and non-lipidomic KEGG pathways to and from files. Pathway diagrams in XML format, containing database identifiers referencing specific compounds and experiments, can be saved to a local file for subsequent use. The program is built upon a library of classes, referred to as the Biopathways Workbench, that convert between different file formats and database objects. An example of this feature is provided in the form of read/construct/write access to models in SBML (Systems Biology Markup Language) contained in the local file system. 
    Inclusion of access to multiple experimental data types and of pathway diagrams within a single interface, automatic updating through connectivity to an online database, and a focus on annotation, including reference to standardized lipid nomenclature as well as common lipid names, supports the view that the Pathway Editor represents a significant, practicable contribution to current pathway modeling tools.

  10. MIMO: an efficient tool for molecular interaction maps overlap

    PubMed Central

    2013-01-01

    Background Molecular pathways represent an ensemble of interactions occurring among molecules within the cell and between cells. The identification of similarities between molecular pathways across organisms and functions has a critical role in understanding complex biological processes. For the inference of such novel information, the comparison of molecular pathways requires to account for imperfect matches (flexibility) and to efficiently handle complex network topologies. To date, these characteristics are only partially available in tools designed to compare molecular interaction maps. Results Our approach MIMO (Molecular Interaction Maps Overlap) addresses the first problem by allowing the introduction of gaps and mismatches between query and template pathways and permits -when necessary- supervised queries incorporating a priori biological information. It then addresses the second issue by relying directly on the rich graph topology described in the Systems Biology Markup Language (SBML) standard, and uses multidigraphs to efficiently handle multiple queries on biological graph databases. The algorithm has been here successfully used to highlight the contact point between various human pathways in the Reactome database. Conclusions MIMO offers a flexible and efficient graph-matching tool for comparing complex biological pathways. PMID:23672344

  11. Reactome graph database: Efficient access to complex pathway data

    PubMed Central

    Korninger, Florian; Viteri, Guilherme; Marin-Garcia, Pablo; Ping, Peipei; Wu, Guanming; Stein, Lincoln; D’Eustachio, Peter

    2018-01-01

    Reactome is a free, open-source, open-data, curated and peer-reviewed knowledgebase of biomolecular pathways. One of its main priorities is to provide easy and efficient access to its high quality curated data. At present, biological pathway databases typically store their contents in relational databases. This limits access efficiency because there are performance issues associated with queries traversing highly interconnected data. The same data in a graph database can be queried more efficiently. Here we present the rationale behind the adoption of a graph database (Neo4j) as well as the new ContentService (REST API) that provides access to these data. The Neo4j graph database and its query language, Cypher, provide efficient access to the complex Reactome data model, facilitating easy traversal and knowledge discovery. The adoption of this technology greatly improved query efficiency, reducing the average query time by 93%. The web service built on top of the graph database provides programmatic access to Reactome data by object oriented queries, but also supports more complex queries that take advantage of the new underlying graph-based data storage. By adopting graph database technology we are providing a high performance pathway data resource to the community. The Reactome graph database use case shows the power of NoSQL database engines for complex biological data types. PMID:29377902
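
    The point about traversing highly interconnected data can be made concrete with a toy example. The sketch below is plain Python and does not use Neo4j, Cypher, or the Reactome ContentService. The edge table, node names, and the "pathway"/"reaction" labels are hypothetical stand-ins for a pathway hierarchy. Each hop follows stored edges directly, whereas in a relational layout each hop would typically require another join.

```python
def descendant_reactions(has_event, kinds, root):
    """Collect every reaction reachable from root by following has_event edges."""
    found, stack, seen = [], [root], {root}
    while stack:
        node = stack.pop()
        for child in has_event.get(node, ()):
            if child in seen:
                continue
            seen.add(child)
            if kinds[child] == "reaction":
                found.append(child)
            else:
                # Sub-pathways are expanded further; reactions are leaves.
                stack.append(child)
    return sorted(found)


# Hypothetical hierarchy: P1 contains sub-pathway P2 and reaction R1.
has_event = {"P1": ["P2", "R1"], "P2": ["R2", "R3"], "P3": ["R3"]}
kinds = {
    "P1": "pathway", "P2": "pathway", "P3": "pathway",
    "R1": "reaction", "R2": "reaction", "R3": "reaction",
}
reactions_under_p1 = descendant_reactions(has_event, kinds, "P1")
```

    The traversal depth does not need to be known in advance, which is the kind of variable-length query the abstract describes as costly in a relational store.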

  12. Reactome graph database: Efficient access to complex pathway data.

    PubMed

    Fabregat, Antonio; Korninger, Florian; Viteri, Guilherme; Sidiropoulos, Konstantinos; Marin-Garcia, Pablo; Ping, Peipei; Wu, Guanming; Stein, Lincoln; D'Eustachio, Peter; Hermjakob, Henning

    2018-01-01

    Reactome is a free, open-source, open-data, curated and peer-reviewed knowledgebase of biomolecular pathways. One of its main priorities is to provide easy and efficient access to its high quality curated data. At present, biological pathway databases typically store their contents in relational databases. This limits access efficiency because there are performance issues associated with queries traversing highly interconnected data. The same data in a graph database can be queried more efficiently. Here we present the rationale behind the adoption of a graph database (Neo4j) as well as the new ContentService (REST API) that provides access to these data. The Neo4j graph database and its query language, Cypher, provide efficient access to the complex Reactome data model, facilitating easy traversal and knowledge discovery. The adoption of this technology greatly improved query efficiency, reducing the average query time by 93%. The web service built on top of the graph database provides programmatic access to Reactome data by object oriented queries, but also supports more complex queries that take advantage of the new underlying graph-based data storage. By adopting graph database technology we are providing a high performance pathway data resource to the community. The Reactome graph database use case shows the power of NoSQL database engines for complex biological data types.

  13. An overview of bioinformatics methods for modeling biological pathways in yeast

    PubMed Central

    Hou, Jie; Acharya, Lipi; Zhu, Dongxiao

    2016-01-01

    The advent of high-throughput genomics techniques, along with the completion of genome sequencing projects, identification of protein–protein interactions and reconstruction of genome-scale pathways, has accelerated the development of systems biology research in the yeast organism Saccharomyces cerevisiae. In particular, discovery of biological pathways in yeast has become an important forefront in systems biology, which aims to understand the interactions among molecules within a cell leading to certain cellular processes in response to a specific environment. While the existing theoretical and experimental approaches enable the investigation of well-known pathways involved in metabolism, gene regulation and signal transduction, bioinformatics methods offer new insights into computational modeling of biological pathways. A wide range of computational approaches has been proposed in the past for reconstructing biological pathways from high-throughput datasets. Here we review selected bioinformatics approaches for modeling biological pathways in S. cerevisiae, including metabolic pathways, gene-regulatory pathways and signaling pathways. We start with reviewing the research on biological pathways followed by discussing key biological databases. In addition, several representative computational approaches for modeling biological pathways in yeast are discussed. PMID:26476430

  14. A computational platform to maintain and migrate manual functional annotations for BioCyc databases.

    PubMed

    Walsh, Jesse R; Sen, Taner Z; Dickerson, Julie A

    2014-10-12

    BioCyc databases are an important resource for information on biological pathways and genomic data. Such databases represent the accumulation of biological data, some of which has been manually curated from literature. An essential feature of these databases is the continuing data integration as new knowledge is discovered. As functional annotations are improved, scalable methods are needed for curators to manage annotations without detailed knowledge of the specific design of the BioCyc database. We have developed CycTools, a software tool which allows curators to maintain functional annotations in a model organism database. This tool builds on existing software to improve and simplify annotation data imports of user provided data into BioCyc databases. Additionally, CycTools automatically resolves synonyms and alternate identifiers contained within the database into the appropriate internal identifiers. Automating steps in the manual data entry process can improve curation efforts for major biological databases. The functionality of CycTools is demonstrated by transferring GO term annotations from MaizeCyc to matching proteins in CornCyc, both maize metabolic pathway databases available at MaizeGDB, and by creating strain specific databases for metabolic engineering.

  15. An overview of bioinformatics methods for modeling biological pathways in yeast.

    PubMed

    Hou, Jie; Acharya, Lipi; Zhu, Dongxiao; Cheng, Jianlin

    2016-03-01

    The advent of high-throughput genomics techniques, along with the completion of genome sequencing projects, identification of protein-protein interactions and reconstruction of genome-scale pathways, has accelerated the development of systems biology research in the yeast organism Saccharomyces cerevisiae. In particular, discovery of biological pathways in yeast has become an important forefront in systems biology, which aims to understand the interactions among molecules within a cell leading to certain cellular processes in response to a specific environment. While the existing theoretical and experimental approaches enable the investigation of well-known pathways involved in metabolism, gene regulation and signal transduction, bioinformatics methods offer new insights into computational modeling of biological pathways. A wide range of computational approaches has been proposed in the past for reconstructing biological pathways from high-throughput datasets. Here we review selected bioinformatics approaches for modeling biological pathways in S. cerevisiae, including metabolic pathways, gene-regulatory pathways and signaling pathways. We start with reviewing the research on biological pathways followed by discussing key biological databases. In addition, several representative computational approaches for modeling biological pathways in yeast are discussed. © The Author 2015. Published by Oxford University Press. All rights reserved. For permissions, please email: journals.permissions@oup.com.

  16. WholePathwayScope: a comprehensive pathway-based analysis tool for high-throughput data

    PubMed Central

    Yi, Ming; Horton, Jay D; Cohen, Jonathan C; Hobbs, Helen H; Stephens, Robert M

    2006-01-01

    Background Analysis of High Throughput (HTP) Data such as microarray and proteomics data has provided a powerful methodology to study patterns of gene regulation at genome scale. A major unresolved problem in the post-genomic era is to assemble the large amounts of data generated into a meaningful biological context. We have developed a comprehensive software tool, WholePathwayScope (WPS), for deriving biological insights from analysis of HTP data. Result WPS extracts gene lists with shared biological themes through color cue templates. WPS statistically evaluates global functional category enrichment of gene lists and pathway-level pattern enrichment of data. WPS incorporates well-known biological pathways from KEGG (Kyoto Encyclopedia of Genes and Genomes) and Biocarta, GO (Gene Ontology) terms as well as user-defined pathways or relevant gene clusters or groups, and explores gene-term relationships within the derived gene-term association networks (GTANs). WPS simultaneously compares multiple datasets within biological contexts either as pathways or as association networks. WPS also integrates Genetic Association Database and Partial MedGene Database for disease-association information. We have used this program to analyze and compare microarray and proteomics datasets derived from a variety of biological systems. Application examples demonstrated the capacity of WPS to significantly facilitate the analysis of HTP data for integrative discovery. Conclusion This tool represents a pathway-based platform for discovery integration to maximize analysis power. The tool is freely available at . PMID:16423281

  17. Pathway Distiller - multisource biological pathway consolidation

    PubMed Central

    2012-01-01

    Background One method to understand and evaluate an experiment that produces a large set of genes, such as a gene expression microarray analysis, is to identify overrepresentation or enrichment for biological pathways. Because pathways are able to functionally describe the set of genes, much effort has been made to collect curated biological pathways into publicly accessible databases. When combining disparate databases, highly related or redundant pathways exist, making their consolidation into pathway concepts essential. This will facilitate unbiased, comprehensive yet streamlined analysis of experiments that result in large gene sets. Methods After gene set enrichment finds representative pathways for large gene sets, pathways are consolidated into representative pathway concepts. Three complementary, but different methods of pathway consolidation are explored. Enrichment Consolidation combines the set of the pathways enriched for the signature gene list through iterative combining of enriched pathways with other pathways with similar signature gene sets; Weighted Consolidation utilizes a Protein-Protein Interaction network based gene-weighting approach that finds clusters of both enriched and non-enriched pathways limited to the experiments' resultant gene list; and finally the de novo Consolidation method uses several measurements of pathway similarity, that finds static pathway clusters independent of any given experiment. Results We demonstrate that the three consolidation methods provide unified yet different functional insights of a resultant gene set derived from a genome-wide profiling experiment. Results from the methods are presented, demonstrating their applications in biological studies and comparing with a pathway web-based framework that also combines several pathway databases. 
    Additionally a web-based consolidation framework that encompasses all three methods discussed in this paper, Pathway Distiller (http://cbbiweb.uthscsa.edu/PathwayDistiller), is established to allow researchers access to the methods and example microarray data described in this manuscript, and the ability to analyze their own gene list by using our unique consolidation methods. Conclusions By combining several pathway systems, implementing different, but complementary pathway consolidation methods, and providing a user-friendly web-accessible tool, we have enabled users the ability to extract functional explanations of their genome wide experiments. PMID:23134636

  18. Pathway Distiller - multisource biological pathway consolidation.

    PubMed

    Doderer, Mark S; Anguiano, Zachry; Suresh, Uthra; Dashnamoorthy, Ravi; Bishop, Alexander J R; Chen, Yidong

    2012-01-01

    One method to understand and evaluate an experiment that produces a large set of genes, such as a gene expression microarray analysis, is to identify overrepresentation or enrichment for biological pathways. Because pathways are able to functionally describe the set of genes, much effort has been made to collect curated biological pathways into publicly accessible databases. When combining disparate databases, highly related or redundant pathways exist, making their consolidation into pathway concepts essential. This will facilitate unbiased, comprehensive yet streamlined analysis of experiments that result in large gene sets. After gene set enrichment finds representative pathways for large gene sets, pathways are consolidated into representative pathway concepts. Three complementary, but different methods of pathway consolidation are explored. Enrichment Consolidation combines the set of the pathways enriched for the signature gene list through iterative combining of enriched pathways with other pathways with similar signature gene sets; Weighted Consolidation utilizes a Protein-Protein Interaction network based gene-weighting approach that finds clusters of both enriched and non-enriched pathways limited to the experiments' resultant gene list; and finally the de novo Consolidation method uses several measurements of pathway similarity, that finds static pathway clusters independent of any given experiment. We demonstrate that the three consolidation methods provide unified yet different functional insights of a resultant gene set derived from a genome-wide profiling experiment. Results from the methods are presented, demonstrating their applications in biological studies and comparing with a pathway web-based framework that also combines several pathway databases. 
    Additionally a web-based consolidation framework that encompasses all three methods discussed in this paper, Pathway Distiller (http://cbbiweb.uthscsa.edu/PathwayDistiller), is established to allow researchers access to the methods and example microarray data described in this manuscript, and the ability to analyze their own gene list by using our unique consolidation methods. By combining several pathway systems, implementing different, but complementary pathway consolidation methods, and providing a user-friendly web-accessible tool, we have enabled users the ability to extract functional explanations of their genome wide experiments.

  19. FMM: a web server for metabolic pathway reconstruction and comparative analysis.

    PubMed

    Chou, Chih-Hung; Chang, Wen-Chi; Chiu, Chih-Min; Huang, Chih-Chang; Huang, Hsien-Da

    2009-07-01

    Synthetic Biology, a multidisciplinary field, is growing rapidly. Improving the understanding of biological systems through mimicry and producing bio-orthogonal systems with new functions are two complementary pursuits in this field. A web server called FMM (From Metabolite to Metabolite) was developed for this purpose. FMM can reconstruct metabolic pathways from one metabolite to another metabolite among different species, based mainly on the Kyoto Encyclopedia of Genes and Genomes (KEGG) database and other integrated biological databases. Novel presentation for connecting different KEGG maps is newly provided. Both local and global graphical views of the metabolic pathways are designed. FMM has many applications in Synthetic Biology and Metabolic Engineering. For example, the reconstruction of metabolic pathways to produce valuable metabolites or secondary metabolites in bacteria or yeast is a promising strategy for drug production. FMM provides a highly effective way to elucidate the genes from which species should be cloned into those microorganisms based on FMM pathway comparative analysis. Consequently, FMM is an effective tool for applications in synthetic biology to produce both drugs and biofuels. This novel and innovative resource is now freely available at http://FMM.mbc.nctu.edu.tw/.
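
    Reconstructing a route from one metabolite to another can be pictured as a path search over a reaction graph. The abstract does not say which search FMM uses, so the sketch below substitutes a plain breadth-first search over a hypothetical substrate-to-product map. The function name and the toy reactions are assumptions made for illustration.

```python
from collections import deque


def find_route(reactions, source, target):
    """Return the shortest list of metabolites from source to target, or None."""
    previous = {source: None}  # Maps each visited metabolite to its predecessor.
    queue = deque([source])
    while queue:
        current = queue.popleft()
        if current == target:
            # Walk the predecessor links back to the source.
            path = []
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for product in reactions.get(current, ()):
            if product not in previous:
                previous[product] = current
                queue.append(product)
    return None


# Hypothetical substrate -> products map covering a few early glycolysis steps.
toy_reactions = {
    "glucose": ["glucose-6-phosphate"],
    "glucose-6-phosphate": ["fructose-6-phosphate", "6-phosphogluconolactone"],
    "fructose-6-phosphate": ["fructose-1,6-bisphosphate"],
}
route = find_route(toy_reactions, "glucose", "fructose-1,6-bisphosphate")
```

    Breadth-first search returns a route with the fewest reaction steps. A real reconstruction would also need to track which enzymes and species supply each step, which this sketch leaves out.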

  20. RaMP: A Comprehensive Relational Database of Metabolomics Pathways for Pathway Enrichment Analysis of Genes and Metabolites

    PubMed Central

    Zhang, Bofei; Hu, Senyang; Baskin, Elizabeth; Patt, Andrew; Siddiqui, Jalal K.

    2018-01-01

    The value of metabolomics in translational research is undeniable, and metabolomics data are increasingly generated in large cohorts. The functional interpretation of disease-associated metabolites though is difficult, and the biological mechanisms that underlie cell type or disease-specific metabolomics profiles are oftentimes unknown. To help fully exploit metabolomics data and to aid in its interpretation, analysis of metabolomics data with other complementary omics data, including transcriptomics, is helpful. To facilitate such analyses at a pathway level, we have developed RaMP (Relational database of Metabolomics Pathways), which combines biological pathways from the Kyoto Encyclopedia of Genes and Genomes (KEGG), Reactome, WikiPathways, and the Human Metabolome DataBase (HMDB). To the best of our knowledge, an off-the-shelf, public database that maps genes and metabolites to biochemical/disease pathways and can readily be integrated into other existing software is currently lacking. For consistent and comprehensive analysis, RaMP enables batch and complex queries (e.g., list all metabolites involved in glycolysis and lung cancer), can readily be integrated into pathway analysis tools, and supports pathway overrepresentation analysis given a list of genes and/or metabolites of interest. For usability, we have developed a RaMP R package (https://github.com/Mathelab/RaMP-DB), including a user-friendly RShiny web application, that supports basic simple and batch queries, pathway overrepresentation analysis given a list of genes or metabolites of interest, and network visualization of gene-metabolite relationships. The package also includes the raw database file (mysql dump), thereby providing a stand-alone downloadable framework for public use and integration with other tools. In addition, the Python code needed to recreate the database on another system is also publicly available (https://github.com/Mathelab/RaMP-BackEnd). 
Updates for databases in RaMP will be checked multiple times a year and RaMP will be updated accordingly. PMID:29470400
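
    Pathway overrepresentation analysis of the kind mentioned above is commonly scored with a one-sided hypergeometric test. The sketch below shows that calculation under stated assumptions: it is one common choice and not necessarily the exact statistic RaMP uses, and the function name `overrepresentation_p` and its parameters are hypothetical.

```python
from math import comb


def overrepresentation_p(universe, pathway_size, query_size, overlap):
    """Upper-tail hypergeometric p-value, P(X >= overlap).

    universe      total number of annotated genes/metabolites
    pathway_size  how many of them belong to the pathway
    query_size    size of the user's list
    overlap       how many list members fall in the pathway
    """
    upper = min(pathway_size, query_size)
    total = comb(universe, query_size)
    hits = sum(
        comb(pathway_size, k) * comb(universe - pathway_size, query_size - k)
        for k in range(overlap, upper + 1)
    )
    return hits / total


# Illustrative numbers only (not taken from RaMP).
print(overrepresentation_p(universe=10, pathway_size=5, query_size=2, overlap=2))
```

    In practice the p-values from many pathways would also be corrected for multiple testing.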

  1. RaMP: A Comprehensive Relational Database of Metabolomics Pathways for Pathway Enrichment Analysis of Genes and Metabolites.

    PubMed

    Zhang, Bofei; Hu, Senyang; Baskin, Elizabeth; Patt, Andrew; Siddiqui, Jalal K; Mathé, Ewy A

    2018-02-22

    The value of metabolomics in translational research is undeniable, and metabolomics data are increasingly generated in large cohorts. The functional interpretation of disease-associated metabolites though is difficult, and the biological mechanisms that underlie cell type or disease-specific metabolomics profiles are oftentimes unknown. To help fully exploit metabolomics data and to aid in its interpretation, analysis of metabolomics data with other complementary omics data, including transcriptomics, is helpful. To facilitate such analyses at a pathway level, we have developed RaMP (Relational database of Metabolomics Pathways), which combines biological pathways from the Kyoto Encyclopedia of Genes and Genomes (KEGG), Reactome, WikiPathways, and the Human Metabolome DataBase (HMDB). To the best of our knowledge, an off-the-shelf, public database that maps genes and metabolites to biochemical/disease pathways and can readily be integrated into other existing software is currently lacking. For consistent and comprehensive analysis, RaMP enables batch and complex queries (e.g., list all metabolites involved in glycolysis and lung cancer), can readily be integrated into pathway analysis tools, and supports pathway overrepresentation analysis given a list of genes and/or metabolites of interest. For usability, we have developed a RaMP R package (https://github.com/Mathelab/RaMP-DB), including a user-friendly RShiny web application, that supports basic simple and batch queries, pathway overrepresentation analysis given a list of genes or metabolites of interest, and network visualization of gene-metabolite relationships. The package also includes the raw database file (mysql dump), thereby providing a stand-alone downloadable framework for public use and integration with other tools. In addition, the Python code needed to recreate the database on another system is also publicly available (https://github.com/Mathelab/RaMP-BackEnd). 
Updates for databases in RaMP will be checked multiple times a year and RaMP will be updated accordingly.

  2. MetNetAPI: A flexible method to access and manipulate biological network data from MetNet

    PubMed Central

    2010-01-01

    Background Convenient programmatic access to different biological databases allows automated integration of scientific knowledge. Many databases support a function to download files or data snapshots, or a webservice that offers "live" data. However, the functionality that a database offers cannot be represented in a static data download file, and webservices may consume considerable computational resources from the host server. Results MetNetAPI is a versatile Application Programming Interface (API) to the MetNetDB database. It abstracts, captures and retains operations away from a biological network repository and website. A range of database functions, previously only available online, can be immediately (and independently from the website) applied to a dataset of interest. Data is available in four layers: molecular entities, localized entities (linked to a specific organelle), interactions, and pathways. Navigation between these layers is intuitive (e.g. one can request the molecular entities in a pathway, as well as request in what pathways a specific entity participates). Data retrieval can be customized: Network objects allow the construction of new and integration of existing pathways and interactions, which can be uploaded back to our server. In contrast to webservices, the computational demand on the host server is limited to processing data-related queries only. Conclusions An API provides several advantages to a systems biology software platform. MetNetAPI illustrates an interface with a central repository of data that represents the complex interrelationships of a metabolic and regulatory network. As an alternative to data-dumps and webservices, it allows access to a current and "live" database and exposes analytical functions to application developers. Yet it only requires limited resources on the server-side (thin server/fat client setup). 
The API is available for Java, Microsoft.NET and R programming environments and offers flexible query and broad data-retrieval methods. Data retrieval can be customized to client needs and the API offers a framework to construct and manipulate user-defined networks. The design principles can be used as a template to build programmable interfaces for other biological databases. The API software and tutorials are available at http://www.metnetonline.org/api. PMID:21083943

  3. Consensus and conflict cards for metabolic pathway databases

    PubMed Central

    2013-01-01

    Background The metabolic network of H. sapiens and many other organisms is described in multiple pathway databases. The level of agreement between these descriptions, however, has proven to be low. We can use these different descriptions to our advantage by identifying conflicting information and combining their knowledge into a single, more accurate, and more complete description. This task is, however, far from trivial. Results We introduce the concept of Consensus and Conflict Cards (C2Cards) to provide concise overviews of what the databases do or do not agree on. Each card is centered at a single gene, EC number or reaction. These three complementary perspectives make it possible to distinguish disagreements on the underlying biology of a metabolic process from differences that can be explained by different decisions on how and in what detail to represent knowledge. As a proof-of-concept, we implemented C2CardsHuman, as a web application http://www.molgenis.org/c2cards, covering five human pathway databases. Conclusions C2Cards can contribute to ongoing reconciliation efforts by simplifying the identification of consensus and conflicts between pathway databases and lowering the threshold for experts to contribute. Several case studies illustrate the potential of the C2Cards in identifying disagreements on the underlying biology of a metabolic process. The overviews may also point out controversial biological knowledge that should be subject of further research. Finally, the examples provided emphasize the importance of manual curation and the need for a broad community involvement. PMID:23803311
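
    The consensus/conflict split behind a card centered on a single gene can be pictured with plain set operations. This is a minimal sketch under assumptions: the database names and EC numbers in `annotations` are made up for illustration, and `consensus_and_conflict` is a hypothetical helper, not part of the C2Cards software.

```python
# Hypothetical annotations for one gene (assumption):
# database name -> EC numbers that database assigns to the gene.
annotations = {
    "DB_A": {"2.7.1.1", "2.7.1.2"},
    "DB_B": {"2.7.1.1"},
    "DB_C": {"2.7.1.1", "2.7.1.4"},
}


def consensus_and_conflict(by_database):
    """Split values into those every database reports and those only some report."""
    sets = list(by_database.values())
    consensus = set.intersection(*sets)
    conflict = set.union(*sets) - consensus
    return consensus, conflict


consensus, conflict = consensus_and_conflict(annotations)
print("agreed:", sorted(consensus))
print("disputed:", sorted(conflict))
```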

  4. Consensus and conflict cards for metabolic pathway databases.

    PubMed

    Stobbe, Miranda D; Swertz, Morris A; Thiele, Ines; Rengaw, Trebor; van Kampen, Antoine H C; Moerland, Perry D

    2013-06-26

    The metabolic network of H. sapiens and many other organisms is described in multiple pathway databases. The level of agreement between these descriptions, however, has proven to be low. We can use these different descriptions to our advantage by identifying conflicting information and combining their knowledge into a single, more accurate, and more complete description. This task is, however, far from trivial. We introduce the concept of Consensus and Conflict Cards (C₂Cards) to provide concise overviews of what the databases do or do not agree on. Each card is centered at a single gene, EC number or reaction. These three complementary perspectives make it possible to distinguish disagreements on the underlying biology of a metabolic process from differences that can be explained by different decisions on how and in what detail to represent knowledge. As a proof-of-concept, we implemented C₂Cards(Human), as a web application http://www.molgenis.org/c2cards, covering five human pathway databases. C₂Cards can contribute to ongoing reconciliation efforts by simplifying the identification of consensus and conflicts between pathway databases and lowering the threshold for experts to contribute. Several case studies illustrate the potential of the C₂Cards in identifying disagreements on the underlying biology of a metabolic process. The overviews may also point out controversial biological knowledge that should be subject of further research. Finally, the examples provided emphasize the importance of manual curation and the need for a broad community involvement.

  5. Systematic reconstruction of TRANSPATH data into Cell System Markup Language

    PubMed Central

    Nagasaki, Masao; Saito, Ayumu; Li, Chen; Jeong, Euna; Miyano, Satoru

    2008-01-01

    Background Many biological repositories store information based on experimental study of the biological processes within a cell, such as protein-protein interactions, metabolic pathways, signal transduction pathways, or regulations of transcription factors and miRNA. Unfortunately, it is difficult to directly use such information when generating simulation-based models. Thus, modeling rules for encoding biological knowledge into system-dynamics-oriented standardized formats would be very useful for fully understanding cellular dynamics at the system level. Results We selected the TRANSPATH database, a manually curated high-quality pathway database, which provides a plentiful source of cellular events in humans, mice, and rats, collected from over 31,500 publications. In this work, we have developed 16 modeling rules based on hybrid functional Petri net with extension (HFPNe), which is suitable for graphically representing and simulating biological processes. In the modeling rules, each Petri net element is incorporated with Cell System Ontology (CSO) to enable semantic interoperability of models. As a formal ontology for biological pathway modeling with dynamics, CSO also defines biological terminology and corresponding icons. By combining HFPNe with the CSO features, it is possible to convert TRANSPATH data into simulation-based and semantically valid models. The results are encoded into a biological pathway format, Cell System Markup Language (CSML), which eases the exchange and integration of biological data and models. Conclusion By using the 16 modeling rules, 97% of the reactions in TRANSPATH are converted into simulation-based models represented in CSML. This reconstruction demonstrates that it is possible to use our rules to generate quantitative models from static pathway descriptions. PMID:18570683
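
    To give a feel for how a reaction becomes a Petri net element, the sketch below fires one transition in a plain discrete Petri net. It is a deliberate simplification: HFPNe also supports continuous and functional elements, which are not modeled here, and the `binding` reaction, place names and helper functions are hypothetical.

```python
def enabled(marking, transition):
    """A transition is enabled when every input place holds enough tokens."""
    return all(marking.get(p, 0) >= n for p, n in transition["inputs"].items())


def fire(marking, transition):
    """Return the marking after firing the transition; raise if it is not enabled."""
    if not enabled(marking, transition):
        raise ValueError("transition is not enabled")
    new = dict(marking)
    for place, n in transition["inputs"].items():
        new[place] = new.get(place, 0) - n   # consume input tokens
    for place, n in transition["outputs"].items():
        new[place] = new.get(place, 0) + n   # produce output tokens
    return new


# Hypothetical binding reaction: ligand + receptor -> complex
binding = {"inputs": {"ligand": 1, "receptor": 1}, "outputs": {"complex": 1}}
start = {"ligand": 2, "receptor": 1, "complex": 0}

after = fire(start, binding)
print(after)
```

    After one firing the receptor place is empty, so the same transition is no longer enabled; this token bookkeeping is what makes a static pathway description executable.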

  6. Systematic reconstruction of TRANSPATH data into cell system markup language.

    PubMed

    Nagasaki, Masao; Saito, Ayumu; Li, Chen; Jeong, Euna; Miyano, Satoru

    2008-06-23

    Many biological repositories store information based on experimental study of the biological processes within a cell, such as protein-protein interactions, metabolic pathways, signal transduction pathways, or regulations of transcription factors and miRNA. Unfortunately, it is difficult to directly use such information when generating simulation-based models. Thus, modeling rules for encoding biological knowledge into system-dynamics-oriented standardized formats would be very useful for fully understanding cellular dynamics at the system level. We selected the TRANSPATH database, a manually curated high-quality pathway database, which provides a plentiful source of cellular events in humans, mice, and rats, collected from over 31,500 publications. In this work, we have developed 16 modeling rules based on hybrid functional Petri net with extension (HFPNe), which is suitable for graphically representing and simulating biological processes. In the modeling rules, each Petri net element is incorporated with Cell System Ontology (CSO) to enable semantic interoperability of models. As a formal ontology for biological pathway modeling with dynamics, CSO also defines biological terminology and corresponding icons. By combining HFPNe with the CSO features, it is possible to convert TRANSPATH data into simulation-based and semantically valid models. The results are encoded into a biological pathway format, Cell System Markup Language (CSML), which eases the exchange and integration of biological data and models. By using the 16 modeling rules, 97% of the reactions in TRANSPATH are converted into simulation-based models represented in CSML. This reconstruction demonstrates that it is possible to use our rules to generate quantitative models from static pathway descriptions.

  7. Databases for Microbiologists

    DOE PAGES

    Zhulin, Igor B.

    2015-05-26

    Databases play an increasingly important role in biology. They archive, store, maintain, and share information on genes, genomes, expression data, protein sequences and structures, metabolites and reactions, interactions, and pathways. All these data are critically important to microbiologists. Furthermore, microbiology has its own databases that deal with model microorganisms, microbial diversity, physiology, and pathogenesis. Thousands of biological databases are currently available, and it becomes increasingly difficult to keep up with their development. Finally, the purpose of this minireview is to provide a brief survey of current databases that are of interest to microbiologists.

  8. Databases for Microbiologists

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhulin, Igor B.

    Databases play an increasingly important role in biology. They archive, store, maintain, and share information on genes, genomes, expression data, protein sequences and structures, metabolites and reactions, interactions, and pathways. All these data are critically important to microbiologists. Furthermore, microbiology has its own databases that deal with model microorganisms, microbial diversity, physiology, and pathogenesis. Thousands of biological databases are currently available, and it becomes increasingly difficult to keep up with their development. Finally, the purpose of this minireview is to provide a brief survey of current databases that are of interest to microbiologists.

  9. Databases for Microbiologists

    PubMed Central

    2015-01-01

    Databases play an increasingly important role in biology. They archive, store, maintain, and share information on genes, genomes, expression data, protein sequences and structures, metabolites and reactions, interactions, and pathways. All these data are critically important to microbiologists. Furthermore, microbiology has its own databases that deal with model microorganisms, microbial diversity, physiology, and pathogenesis. Thousands of biological databases are currently available, and it becomes increasingly difficult to keep up with their development. The purpose of this minireview is to provide a brief survey of current databases that are of interest to microbiologists. PMID:26013493

  10. Mapping the patent landscape of synthetic biology for fine chemical production pathways.

    PubMed

    Carbonell, Pablo; Gök, Abdullah; Shapira, Philip; Faulon, Jean-Loup

    2016-09-01

    A goal of synthetic biology bio-foundries is to innovate through an iterative design/build/test/learn pipeline. In assessing the value of new chemical production routes, the intellectual property (IP) novelty of the pathway is important. Exploratory studies can be carried using knowledge of the patent/IP landscape for synthetic biology and metabolic engineering. In this paper, we perform an assessment of pathways as potential targets for chemical production across the full catalogue of reachable chemicals in the extended metabolic space of chassis organisms, as computed by the retrosynthesis-based algorithm RetroPath. Our database for reactions processed by sequences in heterologous pathways was screened against the PatSeq database, a comprehensive collection of more than 150M sequences present in patent grants and applications. We also examine related patent families using Derwent Innovations. This large-scale computational study provides useful insights into the IP landscape of synthetic biology for fine and specialty chemicals production. © 2016 The Authors. Microbial Biotechnology published by John Wiley & Sons Ltd and Society for Applied Microbiology.

  11. WikiPathways: a multifaceted pathway database bridging metabolomics to other omics research.

    PubMed

    Slenter, Denise N; Kutmon, Martina; Hanspers, Kristina; Riutta, Anders; Windsor, Jacob; Nunes, Nuno; Mélius, Jonathan; Cirillo, Elisa; Coort, Susan L; Digles, Daniela; Ehrhart, Friederike; Giesbertz, Pieter; Kalafati, Marianthi; Martens, Marvin; Miller, Ryan; Nishida, Kozo; Rieswijk, Linda; Waagmeester, Andra; Eijssen, Lars M T; Evelo, Chris T; Pico, Alexander R; Willighagen, Egon L

    2018-01-04

    WikiPathways (wikipathways.org) captures the collective knowledge represented in biological pathways. By providing a database in a curated, machine readable way, omics data analysis and visualization is enabled. WikiPathways and other pathway databases are used to analyze experimental data by research groups in many fields. Due to the open and collaborative nature of the WikiPathways platform, our content keeps growing and is getting more accurate, making WikiPathways a reliable and rich pathway database. Previously, however, the focus was primarily on genes and proteins, leaving many metabolites with only limited annotation. Recent curation efforts focused on improving the annotation of metabolism and metabolic pathways by associating unmapped metabolites with database identifiers and providing more detailed interaction knowledge. Here, we report the outcomes of the continued growth and curation efforts, such as a doubling of the number of annotated metabolite nodes in WikiPathways. Furthermore, we introduce an OpenAPI documentation of our web services and the FAIR (Findable, Accessible, Interoperable and Reusable) annotation of resources to increase the interoperability of the knowledge encoded in these pathways and experimental omics data. New search options, monthly downloads, more links to metabolite databases, and new portals make pathway knowledge more effortlessly accessible to individual researchers and research communities. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  12. PathNER: a tool for systematic identification of biological pathway mentions in the literature

    PubMed Central

    2013-01-01

    Background Biological pathways are central to many biomedical studies and are frequently discussed in the literature. Several curated databases have been established to collate the knowledge of molecular processes constituting pathways. Yet, there has been little focus on enabling systematic detection of pathway mentions in the literature. Results We developed a tool, named PathNER (Pathway Named Entity Recognition), for the systematic identification of pathway mentions in the literature. PathNER is based on soft dictionary matching and rules, with the dictionary generated from public pathway databases. The rules utilise general pathway-specific keywords, syntactic information and gene/protein mentions. Detection results from both components are merged. On a gold-standard corpus, PathNER achieved an F1-score of 84%. To illustrate its potential, we applied PathNER on a collection of articles related to Alzheimer's disease to identify associated pathways, highlighting cases that can complement an existing manually curated knowledgebase. Conclusions In contrast to existing text-mining efforts that target the automatic reconstruction of pathway details from molecular interactions mentioned in the literature, PathNER focuses on identifying specific named pathway mentions. These mentions can be used to support large-scale curation and pathway-related systems biology applications, as demonstrated in the example of Alzheimer's disease. PathNER is implemented in Java and made freely available online at http://sourceforge.net/projects/pathner/. PMID:24555844
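
    For readers unfamiliar with the F1-score reported above, it is the harmonic mean of precision and recall. The sketch below computes it from raw counts; the counts are illustrative only and are not PathNER's evaluation data, and `f1_score` is a hypothetical helper name.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """Harmonic mean of precision and recall, computed from raw counts."""
    if true_positives == 0:
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)


# Illustrative counts (assumption): 8 correct mentions, 2 spurious, 1 missed.
print(round(f1_score(8, 2, 1), 3))
```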

  13. Plant Reactome: a resource for plant pathways and comparative analysis

    PubMed Central

    Naithani, Sushma; Preece, Justin; D'Eustachio, Peter; Gupta, Parul; Amarasinghe, Vindhya; Dharmawardhana, Palitha D.; Wu, Guanming; Fabregat, Antonio; Elser, Justin L.; Weiser, Joel; Keays, Maria; Fuentes, Alfonso Munoz-Pomer; Petryszak, Robert; Stein, Lincoln D.; Ware, Doreen; Jaiswal, Pankaj

    2017-01-01

    Plant Reactome (http://plantreactome.gramene.org/) is a free, open-source, curated plant pathway database portal, provided as part of the Gramene project. The database provides intuitive bioinformatics tools for the visualization, analysis and interpretation of pathway knowledge to support genome annotation, genome analysis, modeling, systems biology, basic research and education. Plant Reactome employs the structural framework of a plant cell to show metabolic, transport, genetic, developmental and signaling pathways. We manually curate molecular details of pathways in these domains for reference species Oryza sativa (rice) supported by published literature and annotation of well-characterized genes. Two hundred twenty-two rice pathways, 1025 reactions associated with 1173 proteins, 907 small molecules and 256 literature references have been curated to date. These reference annotations were used to project pathways for 62 model, crop and evolutionarily significant plant species based on gene homology. Database users can search and browse various components of the database, visualize curated baseline expression of pathway-associated genes provided by the Expression Atlas and upload and analyze their Omics datasets. The database also offers data access via Application Programming Interfaces (APIs) and in various standardized pathway formats, such as SBML and BioPAX. PMID:27799469

  14. From 20th century metabolic wall charts to 21st century systems biology: database of mammalian metabolic enzymes

    PubMed Central

    Corcoran, Callan C.; Grady, Cameron R.; Pisitkun, Trairak; Parulekar, Jaya

    2017-01-01

    The organization of the mammalian genome into gene subsets corresponding to specific functional classes has provided key tools for systems biology research. Here, we have created a web-accessible resource called the Mammalian Metabolic Enzyme Database (https://hpcwebapps.cit.nih.gov/ESBL/Database/MetabolicEnzymes/MetabolicEnzymeDatabase.html) keyed to the biochemical reactions represented on iconic metabolic pathway wall charts created in the previous century. Overall, we have mapped 1,647 genes to these pathways, representing ~7 percent of the protein-coding genome. To illustrate the use of the database, we apply it to the area of kidney physiology. In so doing, we have created an additional database (Database of Metabolic Enzymes in Kidney Tubule Segments: https://hpcwebapps.cit.nih.gov/ESBL/Database/MetabolicEnzymes/), mapping mRNA abundance measurements (mined from RNA-Seq studies) for all metabolic enzymes to each of 14 renal tubule segments. We carry out bioinformatics analysis of the enzyme expression pattern among renal tubule segments and mine various data sources to identify vasopressin-regulated metabolic enzymes in the renal collecting duct. PMID:27974320

  15. Systematization of the protein sequence diversity in enzymes related to secondary metabolic pathways in plants, in the context of big data biology inspired by the KNApSAcK motorcycle database.

    PubMed

    Ikeda, Shun; Abe, Takashi; Nakamura, Yukiko; Kibinge, Nelson; Hirai Morita, Aki; Nakatani, Atsushi; Ono, Naoaki; Ikemura, Toshimichi; Nakamura, Kensuke; Altaf-Ul-Amin, Md; Kanaya, Shigehiko

    2013-05-01

    Biology is increasingly becoming a data-intensive science with the recent progress of the omics fields, e.g. genomics, transcriptomics, proteomics and metabolomics. The species-metabolite relationship database, KNApSAcK Core, has been widely utilized and cited in metabolomics research, and chronological analysis of that research work has helped to reveal recent trends in metabolomics research. To meet the needs of these trends, the KNApSAcK database has been extended by incorporating a secondary metabolic pathway database called Motorcycle DB. We examined the enzyme sequence diversity related to secondary metabolism by means of batch-learning self-organizing maps (BL-SOMs). Initially, we constructed a map by using a big data matrix consisting of the frequencies of all possible dipeptides in the protein sequence segments of plants and bacteria. The enzyme sequence diversity of the secondary metabolic pathways was examined by identifying clusters of segments associated with certain enzyme groups in the resulting map. The extent of diversity of 15 secondary metabolic enzyme groups is discussed. Data-intensive approaches such as BL-SOM applied to big data matrices are needed for systematizing protein sequences. Handling big data has become an inevitable part of biology.
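
    The input to such a map is one row per sequence segment, holding the frequency of each of the 400 possible dipeptides. A minimal sketch of building one row follows; the helper `dipeptide_frequencies` and the example segment are hypothetical, and the self-organizing map training itself is not shown.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]  # 400 pairs


def dipeptide_frequencies(segment):
    """Return a 400-element vector of overlapping dipeptide frequencies."""
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(segment) - 1):
        pair = segment[i:i + 2]
        if pair in counts:          # skip pairs containing non-standard letters
            counts[pair] += 1
    total = sum(counts.values())
    return [counts[d] / total if total else 0.0 for d in DIPEPTIDES]


row = dipeptide_frequencies("MKMK")  # hypothetical segment
print(len(row), row[DIPEPTIDES.index("MK")])
```

    Stacking such rows for many segments gives the data matrix that a clustering method can then organize.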

  16. An "EAR" on environmental surveillance and monitoring: A ...

    EPA Pesticide Factsheets

    Current environmental monitoring approaches focus primarily on chemical occurrence. However, based on chemical concentration alone, it can be difficult to identify which compounds may be of toxicological concern for prioritization for further monitoring or management. This can be problematic because toxicological characterization is lacking for many emerging contaminants. New sources of high throughput screening data like the ToxCast™ database, which contains data for over 9,000 compounds screened through up to 1,100 assays, are now available. Integrated analysis of chemical occurrence data with HTS data offers new opportunities to prioritize chemicals, sites, or biological effects for further investigation based on concentrations detected in the environment linked to relative potencies in pathway-based bioassays. As a case study, chemical occurrence data from a 2012 study in the Great Lakes Basin along with the ToxCast™ effects database were used to calculate exposure-activity ratios (EARs) as a prioritization tool. Technical considerations of data processing and use of the ToxCast™ database are presented and discussed. EAR prioritization identified multiple sites, biological pathways, and chemicals that warrant further investigation. Biological pathways were then linked to adverse outcome pathways to identify potential adverse outcomes and biomarkers for use in subsequent monitoring efforts. Anthropogenic contaminants are frequently reported in environm

  17. AN OVERVIEW OF COMPUTATIONAL LIFE SCIENCE DATABASES & EXCHANGE FORMATS OF RELEVANCE TO CHEMICAL BIOLOGY RESEARCH

    PubMed Central

    Hall, Aaron Smalter; Shan, Yunfeng; Lushington, Gerald; Visvanathan, Mahesh

    2016-01-01

    Databases and exchange formats describing biological entities such as chemicals and proteins, along with their relationships, are a critical component of research in life sciences disciplines, including chemical biology wherein information about small molecule properties converges with cellular and molecular biology. Databases for storing biological entities are growing not only in size, but also in type, with many similarities between them and often subtle differences. The data formats available to describe and exchange these entities are numerous as well. In general, each format is optimized for a particular purpose or database, and hence some understanding of these formats is required when choosing one for research purposes. This paper reviews a selection of different databases and data formats with the goal of summarizing their purposes, features, and limitations. Databases are reviewed under the categories of 1) protein interactions, 2) metabolic pathways, 3) chemical interactions, and 4) drug discovery. Representation formats will be discussed according to those describing chemical structures, and those describing genomic/proteomic entities. PMID:22934944

  18. An overview of computational life science databases & exchange formats of relevance to chemical biology research.

    PubMed

    Smalter Hall, Aaron; Shan, Yunfeng; Lushington, Gerald; Visvanathan, Mahesh

    2013-03-01

    Databases and exchange formats describing biological entities such as chemicals and proteins, along with their relationships, are a critical component of research in life sciences disciplines, including chemical biology wherein information about small molecule properties converges with cellular and molecular biology. Databases for storing biological entities are growing not only in size, but also in type, with many similarities between them and often subtle differences. The data formats available to describe and exchange these entities are numerous as well. In general, each format is optimized for a particular purpose or database, and hence some understanding of these formats is required when choosing one for research purposes. This paper reviews a selection of different databases and data formats with the goal of summarizing their purposes, features, and limitations. Databases are reviewed under the categories of 1) protein interactions, 2) metabolic pathways, 3) chemical interactions, and 4) drug discovery. Representation formats will be discussed according to those describing chemical structures, and those describing genomic/proteomic entities.

  19. Pathway Tools version 13.0: integrated software for pathway/genome informatics and systems biology

    PubMed Central

    Paley, Suzanne M.; Krummenacker, Markus; Latendresse, Mario; Dale, Joseph M.; Lee, Thomas J.; Kaipa, Pallavi; Gilham, Fred; Spaulding, Aaron; Popescu, Liviu; Altman, Tomer; Paulsen, Ian; Keseler, Ingrid M.; Caspi, Ron

    2010-01-01

    Pathway Tools is a production-quality software environment for creating a type of model-organism database called a Pathway/Genome Database (PGDB). A PGDB such as EcoCyc integrates the evolving understanding of the genes, proteins, metabolic network and regulatory network of an organism. This article provides an overview of Pathway Tools capabilities. The software performs multiple computational inferences including prediction of metabolic pathways, prediction of metabolic pathway hole fillers and prediction of operons. It enables interactive editing of PGDBs by DB curators. It supports web publishing of PGDBs, and provides a large number of query and visualization tools. The software also supports comparative analyses of PGDBs, and provides several systems biology analyses of PGDBs including reachability analysis of metabolic networks, and interactive tracing of metabolites through a metabolic network. More than 800 PGDBs have been created using Pathway Tools by scientists around the world, many of which are curated DBs for important model organisms. Those PGDBs can be exchanged using a peer-to-peer DB sharing system called the PGDB Registry. PMID:19955237

  20. Conversion of KEGG metabolic pathways to SBGN maps including automatic layout

    PubMed Central

    2013-01-01

    Background Biologists make frequent use of databases containing large and complex biological networks. One popular database is the Kyoto Encyclopedia of Genes and Genomes (KEGG) which uses its own graphical representation and manual layout for pathways. While some general drawing conventions exist for biological networks, arbitrary graphical representations are very common. Recently, a new standard has been established for displaying biological processes, the Systems Biology Graphical Notation (SBGN), which aims to unify the look of such maps. Ideally, online repositories such as KEGG would automatically provide networks in a variety of notations including SBGN. Unfortunately, this is non‐trivial, since converting between notations may add, remove or otherwise alter map elements so that the existing layout cannot be simply reused. Results Here we describe a methodology for automatic translation of KEGG metabolic pathways into the SBGN format. We infer important properties of the KEGG layout and treat these as layout constraints that are maintained during the conversion to SBGN maps. Conclusions This allows for the drawing and layout conventions of SBGN to be followed while creating maps that are still recognizably the original KEGG pathways. This article details the steps in this process and provides examples of the final result. PMID:23953132

  1. ESEA: Discovering the Dysregulated Pathways based on Edge Set Enrichment Analysis

    PubMed Central

    Han, Junwei; Shi, Xinrui; Zhang, Yunpeng; Xu, Yanjun; Jiang, Ying; Zhang, Chunlong; Feng, Li; Yang, Haixiu; Shang, Desi; Sun, Zeguo; Su, Fei; Li, Chunquan; Li, Xia

    2015-01-01

    Pathway analyses are playing an increasingly important role in understanding biological mechanism, cellular function and disease states. Current pathway-identification methods generally focus on only the changes of gene expression levels; however, the biological relationships among genes are also the fundamental components of pathways, and the dysregulated relationships may also alter the pathway activities. We propose a powerful computational method, Edge Set Enrichment Analysis (ESEA), for the identification of dysregulated pathways. This provides a novel way of pathway analysis by investigating the changes of biological relationships of pathways in the context of gene expression data. Simulation studies illustrate the power and performance of ESEA under various simulated conditions. Using real datasets from p53 mutation, Type 2 diabetes and lung cancer, we validate effectiveness of ESEA in identifying dysregulated pathways. We further compare our results with five other pathway enrichment analysis methods. With these analyses, we show that ESEA is able to help uncover dysregulated biological pathways underlying complex traits and human diseases via specific use of the dysregulated biological relationships. We develop a freely available R-based tool of ESEA. Currently, ESEA can support pathway analysis of the seven public databases (KEGG; Reactome; Biocarta; NCI; SPIKE; HumanCyc; Panther). PMID:26267116

  2. ChlamyCyc: an integrative systems biology database and web-portal for Chlamydomonas reinhardtii.

    PubMed

    May, Patrick; Christian, Jan-Ole; Kempa, Stefan; Walther, Dirk

    2009-05-04

    The unicellular green alga Chlamydomonas reinhardtii is an important eukaryotic model organism for the study of photosynthesis and plant growth. In the era of modern high-throughput technologies there is an imperative need to integrate large-scale data sets from high-throughput experimental techniques using computational methods and database resources to provide comprehensive information about the molecular and cellular organization of a single organism. In the framework of the German Systems Biology initiative GoFORSYS, a pathway database and web-portal for Chlamydomonas (ChlamyCyc) was established, which currently features about 250 metabolic pathways with associated genes, enzymes, and compound information. ChlamyCyc was assembled using an integrative approach combining the recently published genome sequence, bioinformatics methods, and experimental data from metabolomics and proteomics experiments. We analyzed and integrated a combination of primary and secondary database resources, such as existing genome annotations from JGI, EST collections, orthology information, and MapMan classification. ChlamyCyc provides a curated and integrated systems biology repository that will enable and assist in systematic studies of fundamental cellular processes in Chlamydomonas. The ChlamyCyc database and web-portal is freely available under http://chlamycyc.mpimp-golm.mpg.de.

  3. Plant Reactome: a resource for plant pathways and comparative analysis.

    PubMed

    Naithani, Sushma; Preece, Justin; D'Eustachio, Peter; Gupta, Parul; Amarasinghe, Vindhya; Dharmawardhana, Palitha D; Wu, Guanming; Fabregat, Antonio; Elser, Justin L; Weiser, Joel; Keays, Maria; Fuentes, Alfonso Munoz-Pomer; Petryszak, Robert; Stein, Lincoln D; Ware, Doreen; Jaiswal, Pankaj

    2017-01-04

    Plant Reactome (http://plantreactome.gramene.org/) is a free, open-source, curated plant pathway database portal, provided as part of the Gramene project. The database provides intuitive bioinformatics tools for the visualization, analysis and interpretation of pathway knowledge to support genome annotation, genome analysis, modeling, systems biology, basic research and education. Plant Reactome employs the structural framework of a plant cell to show metabolic, transport, genetic, developmental and signaling pathways. We manually curate molecular details of pathways in these domains for reference species Oryza sativa (rice) supported by published literature and annotation of well-characterized genes. Two hundred twenty-two rice pathways, 1025 reactions associated with 1173 proteins, 907 small molecules and 256 literature references have been curated to date. These reference annotations were used to project pathways for 62 model, crop and evolutionarily significant plant species based on gene homology. Database users can search and browse various components of the database, visualize curated baseline expression of pathway-associated genes provided by the Expression Atlas and upload and analyze their Omics datasets. The database also offers data access via Application Programming Interfaces (APIs) and in various standardized pathway formats, such as SBML and BioPAX. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  4. From Databases to Modelling of Functional Pathways

    PubMed Central

    2004-01-01

    This short review comments on current informatics resources and methodologies in the study of functional pathways in cell biology. It highlights recent achievements in unveiling the structural design of protein and gene networks and discusses current approaches to model and simulate the dynamics of regulatory pathways in the cell. PMID:18629070

  5. From databases to modelling of functional pathways.

    PubMed

    Nasi, Sergio

    2004-01-01

    This short review comments on current informatics resources and methodologies in the study of functional pathways in cell biology. It highlights recent achievements in unveiling the structural design of protein and gene networks and discusses current approaches to model and simulate the dynamics of regulatory pathways in the cell.

  6. The Pathway Coexpression Network: Revealing pathway relationships

    PubMed Central

    Tanzi, Rudolph E.

    2018-01-01

    A goal of genomics is to understand the relationships between biological processes. Pathways contribute to functional interplay within biological processes through complex but poorly understood interactions. However, limited functional references for global pathway relationships exist. Pathways from databases such as KEGG and Reactome provide discrete annotations of biological processes. Their relationships are currently either inferred from gene set enrichment within specific experiments, or by simple overlap, linking pathway annotations that have genes in common. Here, we provide a unifying interpretation of functional interaction between pathways by systematically quantifying coexpression between 1,330 canonical pathways from the Molecular Signatures Database (MSigDB) to establish the Pathway Coexpression Network (PCxN). We estimated the correlation between canonical pathways valid in a broad context using a curated collection of 3,207 microarrays from 72 normal human tissues. PCxN accounts for shared genes between annotations to estimate significant correlations between pathways with related functions rather than with similar annotations. We demonstrate that PCxN provides novel insight into mechanisms of complex diseases using an Alzheimer’s Disease (AD) case study. PCxN retrieved pathways significantly correlated with an expert curated AD gene list. These pathways have known associations with AD and were significantly enriched for genes independently associated with AD. As a further step, we show how PCxN complements the results of gene set enrichment methods by revealing relationships between enriched pathways, and by identifying additional highly correlated pathways. PCxN revealed that correlated pathways from an AD expression profiling study include functional clusters involved in cell adhesion and oxidative stress. PCxN provides expanded connections to pathways from the extracellular matrix. PCxN provides a powerful new framework for interrogation of global pathway relationships. Comprehensive exploration of PCxN can be performed at http://pcxn.org/. PMID:29554099
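
    The "simple overlap" baseline mentioned in this abstract, where two pathway annotations are linked when they have genes in common, can be illustrated with a minimal sketch. The pathway names, gene labels and the use of the Jaccard index below are assumptions chosen for illustration; this is not the PCxN method, which estimates coexpression and corrects for shared genes.

```python
from itertools import combinations


def jaccard(genes_a, genes_b):
    """Jaccard index of two gene sets: |A & B| / |A | B|."""
    union = genes_a | genes_b
    if not union:
        return 0.0
    return len(genes_a & genes_b) / len(union)


def overlap_links(pathways):
    """Return (pathway, pathway, score) for every pair sharing at least one gene."""
    links = []
    for name_a, name_b in combinations(sorted(pathways), 2):
        score = jaccard(pathways[name_a], pathways[name_b])
        if score > 0:
            links.append((name_a, name_b, score))
    return links


# Hypothetical annotations: placeholder pathway names and gene labels.
toy_pathways = {
    "pathway_A": {"g1", "g2", "g3"},
    "pathway_B": {"g2", "g3", "g4"},
    "pathway_C": {"g5", "g6"},
}

links = overlap_links(toy_pathways)
print(links)
```

    In this toy input only pathway_A and pathway_B are linked, because pathway_C shares no genes with either. A link of this kind reflects similar annotations rather than related function, which is the limitation the abstract says PCxN is designed to address.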

  7. From 20th century metabolic wall charts to 21st century systems biology: database of mammalian metabolic enzymes.

    PubMed

    Corcoran, Callan C; Grady, Cameron R; Pisitkun, Trairak; Parulekar, Jaya; Knepper, Mark A

    2017-03-01

    The organization of the mammalian genome into gene subsets corresponding to specific functional classes has provided key tools for systems biology research. Here, we have created a web-accessible resource called the Mammalian Metabolic Enzyme Database (https://hpcwebapps.cit.nih.gov/ESBL/Database/MetabolicEnzymes/MetabolicEnzymeDatabase.html) keyed to the biochemical reactions represented on iconic metabolic pathway wall charts created in the previous century. Overall, we have mapped 1,647 genes to these pathways, representing ~7 percent of the protein-coding genome. To illustrate the use of the database, we apply it to the area of kidney physiology. In so doing, we have created an additional database (Database of Metabolic Enzymes in Kidney Tubule Segments: https://hpcwebapps.cit.nih.gov/ESBL/Database/MetabolicEnzymes/), mapping mRNA abundance measurements (mined from RNA-Seq studies) for all metabolic enzymes to each of 14 renal tubule segments. We carry out bioinformatics analysis of the enzyme expression pattern among renal tubule segments and mine various data sources to identify vasopressin-regulated metabolic enzymes in the renal collecting duct. Copyright © 2017 the American Physiological Society.

  8. GEM System: automatic prototyping of cell-wide metabolic pathway models from genomes.

    PubMed

    Arakawa, Kazuharu; Yamada, Yohei; Shinoda, Kosaku; Nakayama, Yoichi; Tomita, Masaru

    2006-03-23

    Successful realization of a "systems biology" approach to analyzing cells is a grand challenge for our understanding of life. However, current modeling approaches to cell simulation are labor-intensive, manual affairs, and therefore constitute a major bottleneck in the evolution of computational cell biology. We developed the Genome-based Modeling (GEM) System for the purpose of automatically prototyping simulation models of cell-wide metabolic pathways from genome sequences and other public biological information. Models generated by the GEM System include an entire Escherichia coli metabolism model comprising 968 reactions of 1195 metabolites, achieving 100% coverage when compared with the KEGG database, 92.38% with the EcoCyc database, and 95.06% with iJR904 genome-scale model. The GEM System prototypes qualitative models to reduce the labor-intensive tasks required for systems biology research. Models of over 90 bacterial genomes are available at our web site.

  9. Recent Progress in the Development of Metabolome Databases for Plant Systems Biology

    PubMed Central

    Fukushima, Atsushi; Kusano, Miyako

    2013-01-01

    Metabolomics has grown greatly as a functional genomics tool, and has become an invaluable diagnostic tool for biochemical phenotyping of biological systems. Over the past decades, a number of databases involving information related to mass spectra, compound names and structures, statistical/mathematical models and metabolic pathways, and metabolite profile data have been developed. Such databases complement each other and support efficient growth in this area, although the data resources remain scattered across the World Wide Web. Here, we review available metabolome databases and summarize the present status of development of related tools, particularly focusing on the plant metabolome. Data sharing discussed here will pave way for the robust interpretation of metabolomic data and advances in plant systems biology. PMID:23577015

  10. PAGER 2.0: an update to the pathway, annotated-list and gene-signature electronic repository for Human Network Biology

    PubMed Central

    Yue, Zongliang; Zheng, Qi; Neylon, Michael T; Yoo, Minjae; Shin, Jimin; Zhao, Zhiying; Tan, Aik Choon

    2018-01-01

    Abstract Integrative Gene-set, Network and Pathway Analysis (GNPA) is a powerful data analysis approach developed to help interpret high-throughput omics data. In PAGER 1.0, we demonstrated that researchers can gain unbiased and reproducible biological insights with the introduction of PAGs (Pathways, Annotated-lists and Gene-signatures) as the basic data representation elements. In PAGER 2.0, we improve the utility of integrative GNPA by significantly expanding the coverage of PAGs and PAG-to-PAG relationships in the database, defining a new metric to quantify PAG data qualities, and developing new software features to simplify online integrative GNPA. Specifically, we included 84 282 PAGs spanning 24 different data sources that cover human diseases, published gene-expression signatures, drug–gene, miRNA–gene interactions, pathways and tissue-specific gene expressions. We introduced a new normalized Cohesion Coefficient (nCoCo) score to assess the biological relevance of genes inside a PAG, and RP-score to rank genes and assign gene-specific weights inside a PAG. The companion web interface contains numerous features to help users query and navigate the database content. The database content can be freely downloaded and is compatible with third-party Gene Set Enrichment Analysis tools. We expect PAGER 2.0 to become a major resource in integrative GNPA. PAGER 2.0 is available at http://discovery.informatics.uab.edu/PAGER/. PMID:29126216

  11. Comparison of human cell signaling pathway databases—evolution, drawbacks and challenges

    PubMed Central

    Chowdhury, Saikat; Sarkar, Ram Rup

    2015-01-01

    Elucidating the complexities of cell signaling pathways is of immense importance to gain understanding about various biological phenomena, such as dynamics of gene/protein expression regulation, cell fate determination, embryogenesis and disease progression. The successful completion of the Human Genome Project has also helped experimental and theoretical biologists to analyze various important pathways. To advance this study, during the past two decades, systematic collections of pathway data from experimental studies have been compiled and distributed freely by several databases, which also integrate various computational tools for further analysis. Despite significant advancements, there exist several drawbacks and challenges, such as pathway data heterogeneity, annotation, regular update and automated image reconstructions, which motivated us to perform a thorough review of 24 popular and actively functioning cell signaling databases. Based on two major characteristics, pathway information and technical details, freely accessible data from commercial and academic databases are examined to understand their evolution and enrichment. This review not only helps to identify some novel and useful features that are not yet included in any of the databases, but also highlights their current limitations and proposes reasonable solutions for future database development, which could be useful to the whole scientific community. PMID:25632107

  12. Metabolic pathways for the whole community.

    PubMed

    Hanson, Niels W; Konwar, Kishori M; Hawley, Alyse K; Altman, Tomer; Karp, Peter D; Hallam, Steven J

    2014-07-22

    A convergence of high-throughput sequencing and computational power is transforming biology into information science. Despite these technological advances, converting bits and bytes of sequence information into meaningful insights remains a challenging enterprise. Biological systems operate on multiple hierarchical levels from genomes to biomes. Holistic understanding of biological systems requires agile software tools that permit comparative analyses across multiple information levels (DNA, RNA, protein, and metabolites) to identify emergent properties, diagnose system states, or predict responses to environmental change. Here we adopt the MetaPathways annotation and analysis pipeline and Pathway Tools to construct environmental pathway/genome databases (ePGDBs) that describe microbial community metabolism using MetaCyc, a highly curated database of metabolic pathways and components covering all domains of life. We evaluate Pathway Tools' performance on three datasets with different complexity and coding potential, including simulated metagenomes, a symbiotic system, and the Hawaii Ocean Time-series. We define accuracy and sensitivity relationships between read length, coverage and pathway recovery and evaluate the impact of taxonomic pruning on ePGDB construction and interpretation. Resulting ePGDBs provide interactive metabolic maps, predict emergent metabolic pathways associated with biosynthesis and energy production and differentiate between genomic potential and phenotypic expression across defined environmental gradients. This multi-tiered analysis provides the user community with specific operating guidelines, performance metrics and prediction hazards for more reliable ePGDB construction and interpretation. Moreover, it demonstrates the power of Pathway Tools in predicting metabolic interactions in natural and engineered ecosystems.

  13. Predicting Protein Relationships to Human Pathways through a Relational Learning Approach Based on Simple Sequence Features.

    PubMed

    García-Jiménez, Beatriz; Pons, Tirso; Sanchis, Araceli; Valencia, Alfonso

    2014-01-01

    Biological pathways are important elements of systems biology and in the past decade, an increasing number of pathway databases have been set up to document the growing understanding of complex cellular processes. Although more genome-sequence data are becoming available, a large fraction of it remains functionally uncharacterized. Thus, it is important to be able to predict the mapping of poorly annotated proteins to original pathway models. We have developed a Relational Learning-based Extension (RLE) system to investigate pathway membership through a function prediction approach that mainly relies on combinations of simple properties attributed to each protein. RLE searches for proteins with molecular similarities to specific pathway components. Using RLE, we associated 383 uncharacterized proteins to 28 pre-defined human Reactome pathways, demonstrating relative confidence after proper evaluation. Indeed, in specific cases manual inspection of the database annotations and the related literature supported the proposed classifications. Examples of possible additional components of the Electron transport system, Telomere maintenance and Integrin cell surface interactions pathways are discussed in detail. All the predicted human proteins in the 2009 and 2012 releases (30 and 40) of Reactome are available at http://rle.bioinfo.cnio.es.

  14. Functional Analysis of OMICs Data and Small Molecule Compounds in an Integrated "Knowledge-Based" Platform.

    PubMed

    Dubovenko, Alexey; Nikolsky, Yuri; Rakhmatulin, Eugene; Nikolskaya, Tatiana

    2017-01-01

    Analysis of NGS and other sequencing data, gene variants, gene expression, proteomics, and other high-throughput (OMICs) data is challenging because of its biological complexity and high level of technical and biological noise. One way to deal with both problems is to perform analysis with a high fidelity annotated knowledgebase of protein interactions, pathways, and functional ontologies. This knowledgebase has to be structured in a computer-readable format and must include software tools for managing experimental data, analysis, and reporting. Here, we present MetaCore™ and Key Pathway Advisor (KPA), an integrated platform for functional data analysis. On the content side, MetaCore and KPA encompass a comprehensive database of molecular interactions of different types, pathways, network models, and ten functional ontologies covering human, mouse, and rat genes. The analytical toolkit includes tools for gene/protein list enrichment analysis, statistical "interactome" tool for the identification of over- and under-connected proteins in the dataset, and a biological network analysis module made up of network generation algorithms and filters. The suite also features Advanced Search, an application for combinatorial search of the database content, as well as a Java-based tool called Pathway Map Creator for drawing and editing custom pathway maps. Applications of MetaCore and KPA include molecular mode of action of disease research, identification of potential biomarkers and drug targets, pathway hypothesis generation, analysis of biological effects for novel small molecule compounds and clinical applications (analysis of large cohorts of patients, and translational and personalized medicine).

  15. Gramene database: navigating plant comparative genomics resources

    USDA-ARS's Scientific Manuscript database

    Gramene (http://www.gramene.org) is an online, open source, curated resource for plant comparative genomics and pathway analysis designed to support researchers working in plant genomics, breeding, evolutionary biology, system biology, and metabolic engineering. It exploits phylogenetic relationship...

  16. Proteome reference map and regulation network of neonatal rat cardiomyocyte

    PubMed Central

    Li, Zi-jian; Liu, Ning; Han, Qi-de; Zhang, You-yi

    2011-01-01

    Aim: To study and establish a proteome reference map and regulation network of neonatal rat cardiomyocyte. Methods: Cultured cardiomyocytes of neonatal rats were used. All proteins expressed in the cardiomyocytes were separated and identified by two-dimensional polyacrylamide gel electrophoresis (2-DE) and matrix-assisted laser desorption/ionization-time of flight mass spectrometry (MALDI-TOF MS). Biological networks and pathways of the neonatal rat cardiomyocytes were analyzed using the Ingenuity Pathway Analysis (IPA) program (www.ingenuity.com). A 2-DE database was made accessible online with the Make2ddb package on a web server. Results: More than 1000 proteins were separated on 2D gels, and 148 proteins were identified. The identified proteins were used for the construction of an extensible markup language-based database. Biological networks and pathways were constructed to analyze the functions associated with cardiomyocyte proteins in the database. The 2-DE database of rat cardiomyocyte proteins can be accessed at http://2d.bjmu.edu.cn. Conclusion: A proteome reference map and regulation network of the neonatal rat cardiomyocytes have been established, which may serve as an international platform for storage, analysis and visualization of cardiomyocyte proteomic data. PMID:21841810

  17. A dedicated database system for handling multi-level data in systems biology.

    PubMed

    Pornputtapong, Natapol; Wanichthanarak, Kwanjeera; Nilsson, Avlant; Nookaew, Intawat; Nielsen, Jens

    2014-01-01

    Advances in high-throughput technologies have enabled extensive generation of multi-level omics data. These data are crucial for systems biology research, though they are complex, heterogeneous, highly dynamic, incomplete and distributed among public databases. This leads to difficulties in data accessibility and often results in errors when data are merged and integrated from varied resources. Therefore, integration and management of systems biological data remain very challenging. To overcome this, we designed and developed a dedicated database system that can serve and solve the vital issues in data management and thereby facilitate data integration, modeling and analysis in systems biology within a sole database. In addition, a yeast data repository was implemented as an integrated database environment which is operated by the database system. Two applications were implemented to demonstrate extensibility and utilization of the system. Both illustrate how the user can access the database via the web query function and implemented scripts. These scripts are specific for two sample cases: 1) Detecting the pheromone pathway in protein interaction networks; and 2) Finding metabolic reactions regulated by Snf1 kinase. In this study we present the design of a database system which offers an extensible environment to efficiently capture the majority of biological entities and relations encountered in systems biology. Critical functions and control processes were designed and implemented to ensure consistent, efficient, secure and reliable transactions. The two sample cases on the yeast integrated data clearly demonstrate the value of a sole database environment for systems biology research.

  18. Beyond mitochondria, what would be the energy source of the cell?

    PubMed

    Herrera, Arturo S; Del C A Esparza, Maria; Md Ashraf, Ghulam; Zamyatnin, Andrey A; Aliev, Gjumrakch

    2015-01-01

    Currently, cell biology is based on glucose as the main source of energy. Cellular bioenergetic pathways have become unnecessarily complex in their eagerness to explain how the cell is able to generate and use energy from the oxidation of glucose, where mitochondria play an important role through oxidative phosphorylation. During a descriptive study about the three leading causes of blindness in the world, the ability of melanin to transform light energy into chemical energy through the dissociation of the water molecule was unraveled. Initially, for 2 or 3 years, we tried to link together our findings with the widely accepted metabolic pathways already described in metabolic pathway databases, which have been developed to collect and organize the current knowledge on metabolism scattered across a multitude of scientific articles. However, firstly, the literature on metabolism is extensive but conclusive evidence is rarely available, and secondly, one would expect these databases to contain largely the same information, but the contrary is true. For the apparently well studied metabolic process Krebs cycle, which was described as early as 1937 and is found in nearly every biology and chemistry curriculum, there is a considerable disagreement between at least five databases. Of the nearly 7000 reactions contained jointly by these five databases, only 199 are described in the same way in all the five databases. Thus to try to integrate chemical energy from melanin with the supposedly well-known bioenergetic pathways is easier said than done; and the lack of consensus about metabolic network constitutes an insurmountable barrier. After years of unsuccessful results, we finally realized that the chemical energy released through the dissociation of the water molecule by melanin represents over 90% of cell energy requirements. These findings reveal a new aspect of cell biology, as glucose and ATP have biological functions related mainly to biomass and not so much to energy. Our finding about the unexpected intrinsic property of melanin to transform photon energy into chemical energy through the dissociation of the water molecule, a role performed supposedly only by chlorophyll in plants, seriously questions the sacrosanct role of glucose and thereby mitochondria as the primary source of energy and power for the cells.

  19. Protein-protein interaction analysis of Alzheimer's disease and NAFLD based on systems biology methods unhide common ancestor pathways.

    PubMed

    Karbalaei, Reza; Allahyari, Marzieh; Rezaei-Tavirani, Mostafa; Asadzadeh-Aghdaei, Hamid; Zali, Mohammad Reza

    2018-01-01

    To analyze and reconstruct networks for two diseases, NAFLD and Alzheimer's disease, and their relationship using systems biology methods. NAFLD and Alzheimer's disease are two complex diseases with increasing prevalence and high costs for countries. There are some reports of a relationship and common pathways between these two diseases. In addition, they share some risk factors, particularly lifestyle factors such as diet and exercise. Therefore, a systems biology approach can help to uncover their relationship. The DisGeNET and STRING databases were the sources of disease genes and of the data for constructing networks. Three Cytoscape plugins, ClusterONE, ClueGO and CluePedia, were used to cluster and analyze the networks and to perform pathway enrichment. An R package was used to determine the best centrality method. Finally, hub and bottleneck nodes were defined based on degree and betweenness. The 190 genes common to NAFLD and Alzheimer's disease were used to construct a network with the STRING database. The resulting network contained 182 nodes and 2591 edges and comprised four clusters. Enrichment of these clusters separately led to carbohydrate metabolism, long chain fatty acid, and regulation of the JAK-STAT and IL-17 signaling pathways, respectively. Seven genes were selected as hub-bottlenecks: IL6, AKT1, TP53, TNF, JUN, VEGFA and PPARG. Enrichment of these proteins and their first neighbors in the network using the OMIM database pointed to diabetes and obesity as ancestors of NAFLD and AD. Systems biology methods, specifically PPI networks, can be useful for analyzing complicated related diseases. Finding hub and bottleneck proteins should be the goal of drug design and of introducing disease markers.
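
    The hub-bottleneck selection described in this abstract, where nodes rank highly by both degree and betweenness, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline (which used Cytoscape plugins and an R package): the toy graph, its node labels, the `top_n` cutoff and all function names are hypothetical, and betweenness is computed with Brandes' algorithm for unweighted, undirected graphs.

```python
from collections import deque


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)} from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def betweenness(adj):
    """Unnormalized betweenness centrality (Brandes' algorithm, unweighted graph)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        order = []                      # nodes in order of non-decreasing distance from s
        preds = {v: [] for v in adj}    # predecessors on shortest paths from s
        sigma = {v: 0 for v in adj}     # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while order:                    # accumulate dependencies, farthest nodes first
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both endpoints in an undirected graph.
    return {v: b / 2 for v, b in bc.items()}


def hub_bottlenecks(adj, top_n):
    """Nodes that are in the top_n by degree (hubs) and by betweenness (bottlenecks)."""
    degree = {v: len(nb) for v, nb in adj.items()}
    btw = betweenness(adj)
    hubs = set(sorted(adj, key=lambda v: degree[v], reverse=True)[:top_n])
    bottlenecks = set(sorted(adj, key=lambda v: btw[v], reverse=True)[:top_n])
    return hubs & bottlenecks


# Hypothetical toy network: two triangles joined by the edge C-D.
toy_edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"),
             ("D", "E"), ("D", "F"), ("E", "F")]
toy_adj = build_adjacency(toy_edges)
hubs = hub_bottlenecks(toy_adj, top_n=2)
print(sorted(hubs))  # ['C', 'D']
```

    In the toy network, C and D have the highest degree and also lie on every shortest path between the two triangles, so they are both hubs and bottlenecks. On a real network the cutoff would need to be justified, since a fixed `top_n` is only one of several ways to threshold centrality scores.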

  20. LASSO-ing Potential Nuclear Receptor Agonists and Antagonists: A New Computational Method for Database Screening

    EPA Science Inventory

    Nuclear receptors (NRs) are important biological macromolecular transcription factors that are implicated in multiple biological pathways and may interact with other xenobiotics that are endocrine disruptors present in the environment. Examples of important NRs include the androg...

  1. A Web Tool for Generating High Quality Machine-readable Biological Pathways.

    PubMed

    Ramirez-Gaona, Miguel; Marcu, Ana; Pon, Allison; Grant, Jason; Wu, Anthony; Wishart, David S

    2017-02-08

    PathWhiz is a web server built to facilitate the creation of colorful, interactive, visually pleasing pathway diagrams that are rich in biological information. The pathways generated by this online application are machine-readable and fully compatible with essentially all web-browsers and computer operating systems. It uses a specially developed, web-enabled pathway drawing interface that permits the selection and placement of different combinations of pre-drawn biological or biochemical entities to depict reactions, interactions, transport processes and binding events. This palette of entities consists of chemical compounds, proteins, nucleic acids, cellular membranes, subcellular structures, tissues, and organs. All of the visual elements in it can be interactively adjusted and customized. Furthermore, because this tool is a web server, all pathways and pathway elements are publicly accessible. This kind of pathway "crowd sourcing" means that PathWhiz already contains a large and rapidly growing collection of previously drawn pathways and pathway elements. Here we describe a protocol for the quick and easy creation of new pathways and the alteration of existing pathways. To further facilitate pathway editing and creation, the tool contains replication and propagation functions. The replication function allows existing pathways to be used as templates to create or edit new pathways. The propagation function allows one to take an existing pathway and automatically propagate it across different species. Pathways created with this tool can be "re-styled" into different formats (KEGG-like or text-book like), colored with different backgrounds, exported to BioPAX, SBGN-ML, SBML, or PWML data exchange formats, and downloaded as PNG or SVG images. The pathways can easily be incorporated into online databases, integrated into presentations, posters or publications, or used exclusively for online visualization and exploration. 
This protocol has been successfully applied to generate over 2,000 pathway diagrams, which are now found in many online databases including HMDB, DrugBank, SMPDB, and ECMDB.

  2. miRwayDB: a database for experimentally validated microRNA-pathway associations in pathophysiological conditions

    PubMed Central

    Das, Sankha Subhra; Saha, Pritam

    2018-01-01

    MicroRNAs (miRNAs) are well known as key regulators of diverse biological pathways. A body of experimental evidence has shown that abnormal miRNA expression profiles are responsible for various pathophysiological conditions by modulating genes in disease-associated pathways. In spite of the rapid increase in research data confirming such associations, scientists still do not have access to a consolidated database offering these miRNA-pathway association details for critical diseases. We have developed miRwayDB, a database providing comprehensive information on experimentally validated miRNA-pathway associations in various pathophysiological conditions, using data collected from published literature. To the best of our knowledge, it is the first database that provides information about experimentally validated miRNA-mediated pathway dysregulation as seen specifically in critical human diseases and hence indicative of a cause-and-effect relationship in most cases. The current version of miRwayDB collects an exhaustive list of miRNA-pathway association entries for 76 critical disease conditions by reviewing 663 published articles. Each database entry contains complete information on the name of the pathophysiological condition, associated miRNA(s), experimental sample type(s), regulation pattern (up/down) of miRNA, pathway association(s), targeted member of dysregulated pathway(s) and a brief description. In addition, miRwayDB provides miRNA, gene and pathway scores to evaluate the role of miRNA-regulated pathways in various pathophysiological conditions. The database can also be used for other biomedical approaches such as validation of computational analysis, integrated analysis and prediction of computational models. It also offers a submission page to submit novel data from recently published studies. We believe that miRwayDB will be a useful tool for the miRNA research community. Database URL: http://www.mirway.iitkgp.ac.in PMID:29688364

  3. Reconstruction of metabolic pathways for the cattle genome

    PubMed Central

    Seo, Seongwon; Lewin, Harris A

    2009-01-01

    Background Metabolic reconstruction of microbial, plant and animal genomes is a necessary step toward understanding the evolutionary origins of metabolism and species-specific adaptive traits. The aims of this study were to reconstruct conserved metabolic pathways in the cattle genome and to identify metabolic pathways with missing genes and proteins. The MetaCyc database and PathwayTools software suite were chosen for this work because they are widely used and easy to implement. Results An amalgamated cattle genome database was created using the NCBI and Ensembl cattle genome databases (based on build 3.1) as data sources. PathwayTools was used to create a cattle-specific pathway genome database, which was followed by comprehensive manual curation for the reconstruction of metabolic pathways. The curated database, CattleCyc 1.0, consists of 217 metabolic pathways. A total of 64 mammalian-specific metabolic pathways were modified from the reference pathways in MetaCyc, and two pathways previously identified but missing from MetaCyc were added. Comparative analysis of metabolic pathways revealed the absence of mammalian genes for 22 metabolic enzymes whose activity was reported in the literature. We also identified six human metabolic protein-coding genes for which the cattle ortholog is missing from the sequence assembly. Conclusion CattleCyc is a powerful tool for understanding the biology of ruminants and other cetartiodactyl species. In addition, the approach used to develop CattleCyc provides a framework for the metabolic reconstruction of other newly sequenced mammalian genomes. It is clear that metabolic pathway analysis strongly reflects the quality of the underlying genome annotations. Thus, having well-annotated genomes from many mammalian species hosted in BioCyc will facilitate the comparative analysis of metabolic pathways among different species and a systems approach to comparative physiology. PMID:19284618

  4. Kinetic Modeling using BioPAX ontology

    PubMed Central

    Ruebenacker, Oliver; Moraru, Ion I.; Schaff, James C.; Blinov, Michael L.

    2010-01-01

    Thousands of biochemical interactions are available for download from curated databases such as Reactome, Pathway Interaction Database and other sources in the Biological Pathways Exchange (BioPAX) format. However, the BioPAX ontology does not encode the necessary information for kinetic modeling and simulation. The current standard for kinetic modeling is the Systems Biology Markup Language (SBML), but only a small number of models are available in SBML format in public repositories. Additionally, reusing and merging SBML models presents a significant challenge, because often each element has a value only in the context of the given model, and information encoding biological meaning is absent. We describe a software system that enables a variety of operations facilitating the use of BioPAX data to create kinetic models that can be visualized, edited, and simulated using the Virtual Cell (VCell), including improved conversion to SBML (for use with other simulation tools that support this format). PMID:20862270

  5. Next-generation sequencing analysis of gene regulation in the rat model of retinopathy of prematurity.

    PubMed

    Griffith, Rachel M; Li, Hu; Zhang, Nan; Favazza, Tara L; Fulton, Anne B; Hansen, Ronald M; Akula, James D

    2013-08-01

    The purpose of this study was to identify the genes, biochemical signaling pathways, and biological themes involved in the pathogenesis of retinopathy of prematurity (ROP). Next-generation sequencing (NGS) was performed on the RNA transcriptome of rats with the Penn et al. (Pediatr Res 36:724-731, 1994) oxygen-induced retinopathy model of ROP at the height of vascular abnormality, postnatal day (P) 19, and normalized to age-matched, room-air-reared littermate controls. Eight custom-developed pathways with potential relevance to known ROP sequelae were evaluated for significant regulation in ROP: The three major Wnt signaling pathways, canonical, planar cell polarity (PCP), and Wnt/Ca(2+); two signaling pathways mediated by the Rho GTPases RhoA and Cdc42, which are, respectively, thought to intersect with canonical and non-canonical Wnt signaling; nitric oxide signaling pathways mediated by two nitric oxide synthase (NOS) enzymes, neuronal (nNOS) and endothelial (eNOS); and the retinoic acid (RA) signaling pathway. Regulation of other biological pathways and themes was detected by gene ontology using the Kyoto Encyclopedia of Genes and Genomes and the NIH's Database for Annotation, Visualization, and Integrated Discovery's GO terms databases. Canonical Wnt signaling was found to be regulated, but the non-canonical PCP and Wnt/Ca(2+) pathways were not. Nitric oxide signaling, as measured by the activation of nNOS and eNOS, was also regulated, as was RA signaling. Biological themes related to protein translation (ribosomes), neural signaling, inflammation and immunity, cell cycle, and cell death were (among others) highly regulated in ROP rats. These several genes and pathways identified by NGS might provide novel targets for intervention in ROP.

  6. Next Generation Sequencing Analysis of Gene Regulation in the Rat Model of Retinopathy of Prematurity

    PubMed Central

    Griffith, Rachel M.; Li, Hu; Zhang, Nan; Favazza, Tara L.; Fulton, Anne B.; Hansen, Ronald M.; Akula, James D.

    2013-01-01

    Purpose To identify the genes, biochemical signaling pathways and biological themes involved in the pathogenesis of retinopathy of prematurity (ROP). Methods Next-generation sequencing (NGS) was performed on the RNA transcriptome of rats with the Penn et al. (1994) oxygen-induced retinopathy (OIR) model of ROP at the height of vascular abnormality, postnatal day (P) 19, and normalized to age-matched, room-air-reared littermate controls. Eight custom-developed pathways with potential relevance to known ROP sequelae were evaluated for significant regulation in ROP: the three major Wnt signaling pathways, canonical, planar cell polarity (PCP), and Wnt/Ca2+; two signaling pathways mediated by the Rho GTPases RhoA and Cdc42, which are respectively thought to intersect with canonical and noncanonical Wnt signaling; nitric oxide signaling pathways mediated by two nitric oxide synthase (NOS) enzymes, neuronal (nNOS) and endothelial (eNOS); and the retinoic acid (RA) signaling pathway. Regulation of other biological pathways and themes was detected by gene ontology using the Kyoto Encyclopedia of Genes and Genomes (KEGG) and the NIH's Database for Annotation, Visualization and Integrated Discovery (DAVID)'s GO terms databases. Results Canonical Wnt signaling was found to be regulated, but the non-canonical PCP and Wnt/Ca2+ pathways were not. Nitric oxide (NO) signaling, as measured by the activation of nNOS and eNOS, was also regulated, as was RA signaling. Biological themes related to protein translation (ribosomes), neural signaling, inflammation and immunity, cell cycle and cell death were (among others) highly regulated in ROP rats. Conclusions These several genes and pathways identified by NGS might provide novel targets for intervention in ROP. PMID:23775346

  7. Informatics approaches in the Biological Characterization of ...

    EPA Pesticide Factsheets

    Adverse Outcome Pathways (AOPs) are a conceptual framework to characterize toxicity pathways by a series of mechanistic steps from a molecular initiating event to population outcomes. This framework helps to direct risk assessment research, for example by aiding in computational prioritization of chemicals, genes, and tissues relevant to an adverse health outcome. We have designed and implemented a computational workflow to access a wealth of public data relating genes, chemicals, diseases, pathways, and species, to provide a biological context for putative AOPs. We selected three AOP case studies: ER/Aromatase Antagonism Leading to Reproductive Dysfunction, AHR1 Activation Leading to Cardiotoxicity, and AChE Inhibition Leading to Acute Mortality, and deduced a taxonomic range of applicability for each AOP. We developed computational tools to automatically access and analyze the pathway activity of AOP-relevant protein orthologs, finding broad similarity among vertebrate species for the ER/Aromatase and AHR1 AOPs, and similarity extending to invertebrate animal species for AChE inhibition. Additionally, we used public gene expression data to find groups of highly co-expressed genes, and compared those groups across organisms. To interpret these findings at a higher level of biological organization, we created the AOPdb, a relational database that mines results from sources including NCBI, KEGG, Reactome, CTD, and OMIM. This multi-source database connects genes,

  8. PathCase-SB architecture and database design

    PubMed Central

    2011-01-01

    Background Integration of metabolic pathways resources and regulatory metabolic network models, and deploying new tools on the integrated platform can help perform more effective and more efficient systems biology research on understanding the regulation in metabolic networks. Therefore, the tasks of (a) integrating under a single database environment regulatory metabolic networks and existing models, and (b) building tools to help with modeling and analysis are desirable and intellectually challenging computational tasks. Description PathCase Systems Biology (PathCase-SB) is built and released. The PathCase-SB database provides data and API for multiple user interfaces and software tools. The current PathCase-SB system provides a database-enabled framework and web-based computational tools towards facilitating the development of kinetic models for biological systems. PathCase-SB aims to integrate data of selected biological data sources on the web (currently, BioModels database and KEGG), and to provide more powerful and/or new capabilities via the new web-based integrative framework. This paper describes architecture and database design issues encountered in PathCase-SB's design and implementation, and presents the current design of PathCase-SB's architecture and database. Conclusions PathCase-SB architecture and database provide a highly extensible and scalable environment with easy and fast (real-time) access to the data in the database. PathCase-SB itself is already being used by researchers across the world. PMID:22070889

  9. MicroRNA expression, target genes, and signaling pathways in infants with a ventricular septal defect.

    PubMed

    Chai, Hui; Yan, Zhaoyuan; Huang, Ke; Jiang, Yuanqing; Zhang, Lin

    2018-02-01

    This study aimed to systematically investigate the relationship between miRNA expression and the occurrence of ventricular septal defect (VSD), and characterize the miRNA target genes and pathways that can lead to VSD. The miRNAs that were differentially expressed in blood samples from VSD and normal infants were screened and validated by implementing miRNA microarrays and qRT-PCR. The target genes regulated by differentially expressed miRNAs were predicted using three target gene databases. The functions and signaling pathways of the target genes were enriched using the GO database and KEGG database, respectively. The transcription and protein expression of specific target genes in critical pathways were compared in the VSD and normal control groups using qRT-PCR and western blotting, respectively. Compared with the normal control group, the VSD group had 22 differentially expressed miRNAs; 19 were downregulated and three were upregulated. The 10,677 predicted target genes participated in many biological functions related to cardiac development and morphogenesis. Four target genes (mGLUR, Gq, PLC, and PKC) were involved in the PKC pathway and four (ECM, FAK, PI3K, and PDK1) were involved in the PI3K-Akt pathway. The transcription and protein expression of these eight target genes were significantly upregulated in the VSD group. The 22 miRNAs that were dysregulated in the VSD group were mainly downregulated, which may result in the dysregulation of several key genes and biological functions related to cardiac development. These effects could also be exerted via the upregulation of eight specific target genes, the subsequent over-activation of the PKC and PI3K-Akt pathways, and the eventual abnormal cardiac development and VSD.

  10. EcoCyc: a comprehensive database resource for Escherichia coli

    PubMed Central

    Keseler, Ingrid M.; Collado-Vides, Julio; Gama-Castro, Socorro; Ingraham, John; Paley, Suzanne; Paulsen, Ian T.; Peralta-Gil, Martín; Karp, Peter D.

    2005-01-01

    The EcoCyc database (http://EcoCyc.org/) is a comprehensive source of information on the biology of the prototypical model organism Escherichia coli K12. The mission for EcoCyc is to contain both computable descriptions of, and detailed comments describing, all genes, proteins, pathways and molecular interactions in E. coli. Through ongoing manual curation, extensive information such as summary comments, regulatory information, literature citations and evidence types has been extracted from 8862 publications and added to Version 8.5 of the EcoCyc database. The EcoCyc database can be accessed through a World Wide Web interface, while the downloadable Pathway Tools software and data files enable computational exploration of the data and provide enhanced querying capabilities that web interfaces cannot support. For example, EcoCyc contains carefully curated information that can be used as training sets for bioinformatics prediction of entities such as promoters, operons, genetic networks, transcription factor binding sites, metabolic pathways, functionally related genes, protein complexes and protein–ligand interactions. PMID:15608210

  11. Critical assessment of human metabolic pathway databases: a stepping stone for future integration

    PubMed Central

    2011-01-01

    Background Multiple pathway databases are available that describe the human metabolic network and have proven their usefulness in many applications, ranging from the analysis and interpretation of high-throughput data to their use as a reference repository. However, so far the various human metabolic networks described by these databases have not been systematically compared and contrasted, nor has the extent to which they differ been quantified. For a researcher using these databases for particular analyses of human metabolism, it is crucial to know the extent of the differences in content and their underlying causes. Moreover, the outcomes of such a comparison are important for ongoing integration efforts. Results We compared the genes, EC numbers and reactions of five frequently used human metabolic pathway databases. The overlap is surprisingly low, especially on reaction level, where the databases agree on 3% of the 6968 reactions they have combined. Even for the well-established tricarboxylic acid cycle the databases agree on only 5 out of the 30 reactions in total. We identified the main causes for the lack of overlap. Importantly, the databases are partly complementary. Other explanations include the number of steps a conversion is described in and the number of possible alternative substrates listed. Missing metabolite identifiers and ambiguous names for metabolites also affect the comparison. Conclusions Our results show that each of the five networks compared provides us with a valuable piece of the puzzle of the complete reconstruction of the human metabolic network. To enable integration of the networks, next to a need for standardizing the metabolite names and identifiers, the conceptual differences between the databases should be resolved. Considerable manual intervention is required to reach the ultimate goal of a unified and biologically accurate model for studying the systems biology of human metabolism. 
Our comparison provides a stepping stone for such an endeavor. PMID:21999653
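
    The kind of agreement figure reported above (reactions on which all databases agree, as a share of the reactions they contain combined) can be illustrated with plain set operations. The sketch below is an assumption-laden toy: the database names and reaction identifiers are placeholders, and it matches reactions by exact identifier, whereas the abstract notes that real comparisons are complicated by missing metabolite identifiers and ambiguous names.

```python
def consensus(databases):
    """Return (shared count, combined count, shared fraction) over all databases.

    databases maps a database name to the set of reaction identifiers it contains.
    """
    sets = [set(reactions) for reactions in databases.values()]
    combined = set().union(*sets)          # every reaction in at least one database
    shared = set.intersection(*sets)       # reactions present in all databases
    fraction = len(shared) / len(combined) if combined else 0.0
    return len(shared), len(combined), fraction


# Hypothetical placeholder data, not taken from the study.
example = {
    "db_a": {"R1", "R2", "R3"},
    "db_b": {"R1", "R3", "R4"},
    "db_c": {"R1", "R5"},
}
shared, combined, fraction = consensus(example)
print(f"{shared} of {combined} reactions shared ({fraction:.0%})")
```

    The same function applies unchanged to gene or EC number sets; only the identifiers passed in differ.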

  12. HBVPathDB: a database of HBV infection-related molecular interaction network.

    PubMed

    Zhang, Yi; Bo, Xiao-Chen; Yang, Jing; Wang, Sheng-Qi

    2005-03-21

    To describe the interactions of molecules and genes between hepatitis B virus (HBV) and its host, in order to understand how the genes and molecules of virus and host are networked to form a biological system and to elucidate the mechanism of HBV infection. The knowledge of HBV infection-related reactions was organized into various kinds of pathways with carefully drawn graphs in HBVPathDB. Pathway information is stored in a relational database management system (DBMS), which is currently the most efficient way to manage large amounts of data, and queries are implemented with the powerful Structured Query Language (SQL). The search engine is written in Personal Home Page (PHP) with embedded SQL, and a web retrieval interface for searching is developed with Hypertext Markup Language (HTML). We present the first version of HBVPathDB, an HBV infection-related molecular interaction network database composed of 306 pathways involving 1050 molecules. With carefully drawn graphs, pathway information stored in HBVPathDB can be browsed in an intuitive way. We developed an easy-to-use interface for flexible access to the details of the database. Convenient software is implemented to query and browse the pathway information of HBVPathDB. Four search page layout options (category search, gene search, description search and unitized search) are supported by the search engine of the database. The database is freely available at http://www.bio-inf.net/HBVPathDB/HBV/. HBVPathDB already contains a considerable amount of HBV infection-related pathway information, which is suitable for in-depth analysis of the molecular interaction network of virus and host. HBVPathDB integrates pathway data sets with convenient software for query, browsing and visualization, which gives users more opportunity to identify key regulatory molecules as potential drug targets and to explore the possible mechanism of HBV infection based on gene expression datasets.
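
    Storing pathway membership in relational tables and answering a gene search with SQL might look like the following minimal sketch. It uses Python's built-in sqlite3 module as a stand-in (the record describes PHP with embedded SQL and does not name the DBMS), and the table layout, column names and sample rows are all hypothetical.

```python
import sqlite3

# Hypothetical schema: pathways, molecules, and a many-to-many link table.
SCHEMA = """
CREATE TABLE pathway (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    category TEXT
);
CREATE TABLE molecule (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE pathway_molecule (
    pathway_id INTEGER NOT NULL REFERENCES pathway(id),
    molecule_id INTEGER NOT NULL REFERENCES molecule(id),
    PRIMARY KEY (pathway_id, molecule_id)
);
"""


def build_demo_db():
    """Create an in-memory database filled with placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO pathway VALUES (?, ?, ?)",
        [(1, "pathway_alpha", "category_1"), (2, "pathway_beta", "category_2")],
    )
    conn.executemany(
        "INSERT INTO molecule VALUES (?, ?)",
        [(1, "gene_x"), (2, "gene_y"), (3, "gene_z")],
    )
    conn.executemany(
        "INSERT INTO pathway_molecule VALUES (?, ?)",
        [(1, 1), (1, 2), (2, 2), (2, 3)],
    )
    return conn


def pathways_for_gene(conn, gene_name):
    """Gene search: names of all pathways that involve the given molecule."""
    rows = conn.execute(
        """
        SELECT p.name
        FROM pathway AS p
        JOIN pathway_molecule AS pm ON pm.pathway_id = p.id
        JOIN molecule AS m ON m.id = pm.molecule_id
        WHERE m.name = ?
        ORDER BY p.name
        """,
        (gene_name,),  # bound parameter, so user input is never spliced into SQL
    )
    return [row[0] for row in rows]


conn = build_demo_db()
print(pathways_for_gene(conn, "gene_y"))
```

    A category search would follow the same pattern with a WHERE clause on `pathway.category`; a description search would need an additional text column that this sketch omits.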

  13. IntPath--an integrated pathway gene relationship database for model organisms and important pathogens.

    PubMed

    Zhou, Hufeng; Jin, Jingjing; Zhang, Haojun; Yi, Bo; Wozniak, Michal; Wong, Limsoon

    2012-01-01

    Pathway data are important for understanding the relationship between genes, proteins and many other molecules in living organisms. Pathway gene relationships are crucial information for guidance, prediction, reference and assessment in biochemistry, computational biology, and medicine. Many well-established databases--e.g., KEGG, WikiPathways, and BioCyc--are dedicated to collecting pathway data for public access. However, the effectiveness of these databases is hindered by issues such as incompatible data formats, inconsistent molecular representations, inconsistent molecular relationship representations, inconsistent referrals to pathway names, and incomprehensive data from different databases. In this paper, we overcome these issues through extraction, normalization and integration of pathway data from several major public databases (KEGG, WikiPathways, BioCyc, etc). We build a database that not only hosts our integrated pathway gene relationship data for public access but also maintains the necessary updates in the long run. This public repository is named IntPath (Integrated Pathway gene relationship database for model organisms and important pathogens). Four organisms--S. cerevisiae, M. tuberculosis H37Rv, H. Sapiens and M. musculus--are included in this version (V2.0) of IntPath. IntPath uses the "full unification" approach to ensure no deletion and no introduced noise in this process. Therefore, IntPath contains much richer pathway-gene and pathway-gene pair relationships and much larger number of non-redundant genes and gene pairs than any of the single-source databases. The gene relationships of each gene (measured by average node degree) per pathway are significantly richer. The gene relationships in each pathway (measured by average number of gene pairs per pathway) are also considerably richer in the integrated pathways. 
Moderate manual curation is involved to remove errors and noise from the source data (e.g., the gene ID errors in WikiPathways and relationship errors in KEGG). We turn complicated and incompatible XML data formats and inconsistent gene and gene relationship representations from different source databases into normalized and unified pathway-gene and pathway-gene pair relationships neatly recorded in simple tab-delimited text format and MySQL tables, which facilitates convenient automatic computation and large-scale referencing in many related studies. IntPath data can be downloaded in text format or as a MySQL dump. IntPath data can also be retrieved and analyzed conveniently through a web service by local programs or through a web interface by mouse clicks. Several useful analysis tools are also provided in IntPath. We have overcome in IntPath the issues of compatibility, consistency, and comprehensiveness that often hamper effective use of pathway databases. We have included four organisms in the current release of IntPath. Our methodology and programs described in this work can be easily applied to other organisms, and we will include more model organisms and important pathogens in future releases of IntPath. IntPath maintains regular updates and is freely available at http://compbio.ddns.comp.nus.edu.sg:8080/IntPath.
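
    The idea of merging pathway-gene relationships from several sources without deleting anything, then writing them as tab-delimited text, can be sketched as follows. This is a simplified illustration and not IntPath's actual procedure: the source names and gene identifiers are placeholders, and matching pathways by lower-cased, whitespace-trimmed name is an assumed normalization rule.

```python
def unify(sources):
    """Union the gene sets of same-named pathways across all sources.

    sources maps a source name to {pathway name: iterable of gene identifiers}.
    Pathway names are compared after trimming and lower-casing (an assumption).
    """
    merged = {}
    for pathways in sources.values():
        for name, genes in pathways.items():
            key = name.strip().lower()
            merged.setdefault(key, set()).update(genes)   # nothing is dropped
    return merged


def to_tsv(merged):
    """Serialize pathway-gene relationships as 'pathway<TAB>gene' lines."""
    lines = []
    for pathway in sorted(merged):
        for gene in sorted(merged[pathway]):
            lines.append(f"{pathway}\t{gene}")
    return "\n".join(lines)


# Hypothetical input; gene identifiers are placeholders.
sources = {
    "source_a": {"Glycolysis": ["g1", "g2"]},
    "source_b": {"glycolysis ": ["g2", "g3"], "TCA cycle": ["g4"]},
}
print(to_tsv(unify(sources)))
```

    Real source databases also differ in gene identifier systems, so a production merge would map identifiers to a common namespace before taking the union.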

  14. Elucidation of metabolic pathways from enzyme classification data.

    PubMed

    McDonald, Andrew G; Tipton, Keith F

    2014-01-01

    The IUBMB Enzyme List is widely used by other databases as a source for avoiding ambiguity in the recognition of enzymes as catalytic entities. However, it was not designed for metabolic pathway tracing, which has become increasingly important in systems biology. A Reactions Database has been created from the material in the Enzyme List to allow reactions to be searched by substrate/product, and pathways to be traced from any selected starting/seed substrate. An extensive synonym glossary allows searches by many of the alternative names, including accepted abbreviations, by which a chemical compound may be known. This database was necessary for the development of the application Reaction Explorer ( http://www.reaction-explorer.org ), which was written in Real Studio ( http://www.realsoftware.com/realstudio/ ) to search the Reactions Database and draw metabolic pathways from reactions selected by the user. Having input the name of the starting compound (the "seed"), the user is presented with a list of all reactions containing that compound and then selects the product of interest as the next point on the ensuing graph. The pathway diagram is then generated as the process iterates. A contextual menu is provided, which allows the user: (1) to remove a compound from the graph, along with all associated links; (2) to search the reactions database again for additional reactions involving the compound; (3) to search for the compound within the Enzyme List.
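
    The seed-and-select tracing loop described above can be sketched in Python as below. Reaction Explorer itself was written in Real Studio; this sketch only mirrors the described interaction, and the `Reaction` record, the reaction names and the compounds are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reaction:
    name: str
    substrates: frozenset
    products: frozenset


def reactions_with(reactions, compound):
    """All reactions that mention the compound as substrate or product."""
    return [r for r in reactions if compound in r.substrates or compound in r.products]


def next_steps(reactions, compound):
    """(reaction name, product) pairs reachable from the compound as a substrate."""
    return [
        (r.name, product)
        for r in reactions
        if compound in r.substrates
        for product in sorted(r.products)
    ]


def trace(reactions, seed, selections):
    """Follow the user's product selections from the seed, one step per selection.

    Returns a list of (substrate, reaction name, product) edges for the pathway graph.
    """
    path, current = [], seed
    for wanted in selections:
        options = [name for name, product in next_steps(reactions, current) if product == wanted]
        if not options:
            raise ValueError(f"no reaction converts {current!r} to {wanted!r}")
        path.append((current, options[0], wanted))   # first matching reaction
        current = wanted
    return path


# Hypothetical reactions over placeholder compounds A..E.
reactions = [
    Reaction("rxn1", frozenset({"A"}), frozenset({"B"})),
    Reaction("rxn2", frozenset({"B"}), frozenset({"C", "D"})),
    Reaction("rxn3", frozenset({"A", "C"}), frozenset({"E"})),
]
print(trace(reactions, "A", ["B", "D"]))
```

    Removing a compound from the graph, as the contextual menu allows, would correspond to filtering out every edge in the returned path that names it.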

  15. Integrating In Silico Resources to Map a Signaling Network

    PubMed Central

    Liu, Hanqing; Beck, Tim N.; Golemis, Erica A.; Serebriiskii, Ilya G.

    2013-01-01

    The abundance of publicly available life science databases offer a wealth of information that can support interpretation of experimentally derived data and greatly enhance hypothesis generation. Protein interaction and functional networks are not simply new renditions of existing data: they provide the opportunity to gain insights into the specific physical and functional role a protein plays as part of the biological system. In this chapter, we describe different in silico tools that can quickly and conveniently retrieve data from existing data repositories and discuss how the available tools are best utilized for different purposes. While emphasizing protein-protein interaction databases (e.g., BioGrid and IntAct), we also introduce metasearch platforms such as STRING and GeneMANIA, pathway databases (e.g., BioCarta and Pathway Commons), text mining approaches (e.g., PubMed and Chilibot), and resources for drug-protein interactions, genetic information for model organisms and gene expression information based on microarray data mining. Furthermore, we provide a simple step-by-step protocol to building customized protein-protein interaction networks in Cytoscape, a powerful network assembly and visualization program, integrating data retrieved from these various databases. As we illustrate, generation of composite interaction networks enables investigators to extract significantly more information about a given biological system than utilization of a single database or sole reliance on primary literature. PMID:24233784

  16. Meta-All: a system for managing metabolic pathway information.

    PubMed

    Weise, Stephan; Grosse, Ivo; Klukas, Christian; Koschützki, Dirk; Scholz, Uwe; Schreiber, Falk; Junker, Björn H

    2006-10-23

    Many attempts are being made to understand biological subjects at a systems level. A major resource for these approaches are biological databases, storing manifold information about DNA, RNA and protein sequences including their functional and structural motifs, molecular markers, mRNA expression levels, metabolite concentrations, protein-protein interactions, phenotypic traits or taxonomic relationships. The use of these databases is often hampered by the fact that they are designed for special application areas and thus lack universality. Databases on metabolic pathways, which provide an increasingly important foundation for many analyses of biochemical processes at a systems level, are no exception from the rule. Data stored in central databases such as KEGG, BRENDA or SABIO-RK is often limited to read-only access. If experimentalists want to store their own data, possibly still under investigation, there are two possibilities. They can either develop their own information system for managing that own data, which is very time-consuming and costly, or they can try to store their data in existing systems, which is often restricted. Hence, an out-of-the-box information system for managing metabolic pathway data is needed. We have designed META-ALL, an information system that allows the management of metabolic pathways, including reaction kinetics, detailed locations, environmental factors and taxonomic information. Data can be stored together with quality tags and in different parallel versions. META-ALL uses Oracle DBMS and Oracle Application Express. We provide the META-ALL information system for download and use. In this paper, we describe the database structure and give information about the tools for submitting and accessing the data. As a first application of META-ALL, we show how the information contained in a detailed kinetic model can be stored and accessed. META-ALL is a system for managing information about metabolic pathways. 
It facilitates the handling of pathway-related data and is designed to help biochemists and molecular biologists in their daily research. It is available on the Web at http://bic-gh.de/meta-all and can be downloaded free of charge and installed locally.

  17. Meta-All: a system for managing metabolic pathway information

    PubMed Central

    Weise, Stephan; Grosse, Ivo; Klukas, Christian; Koschützki, Dirk; Scholz, Uwe; Schreiber, Falk; Junker, Björn H

    2006-01-01

    Background Many attempts are being made to understand biological subjects at a systems level. A major resource for these approaches are biological databases, storing manifold information about DNA, RNA and protein sequences including their functional and structural motifs, molecular markers, mRNA expression levels, metabolite concentrations, protein-protein interactions, phenotypic traits or taxonomic relationships. The use of these databases is often hampered by the fact that they are designed for special application areas and thus lack universality. Databases on metabolic pathways, which provide an increasingly important foundation for many analyses of biochemical processes at a systems level, are no exception from the rule. Data stored in central databases such as KEGG, BRENDA or SABIO-RK is often limited to read-only access. If experimentalists want to store their own data, possibly still under investigation, there are two possibilities. They can either develop their own information system for managing that own data, which is very time-consuming and costly, or they can try to store their data in existing systems, which is often restricted. Hence, an out-of-the-box information system for managing metabolic pathway data is needed. Results We have designed META-ALL, an information system that allows the management of metabolic pathways, including reaction kinetics, detailed locations, environmental factors and taxonomic information. Data can be stored together with quality tags and in different parallel versions. META-ALL uses Oracle DBMS and Oracle Application Express. We provide the META-ALL information system for download and use. In this paper, we describe the database structure and give information about the tools for submitting and accessing the data. As a first application of META-ALL, we show how the information contained in a detailed kinetic model can be stored and accessed. 
Conclusion META-ALL is a system for managing information about metabolic pathways. It facilitates the handling of pathway-related data and is designed to help biochemists and molecular biologists in their daily research. It is available on the Web and can be downloaded free of charge and installed locally. PMID:17059592

  18. A guide for building biological pathways along with two case studies: hair and breast development.

    PubMed

    Trindade, Daniel; Orsine, Lissur A; Barbosa-Silva, Adriano; Donnard, Elisa R; Ortega, J Miguel

    2015-03-01

    Genomic information is being underlined in the format of biological pathways. Building these biological pathways is an ongoing demand and benefits from methods for extracting information from biomedical literature with the aid of text-mining tools. Here we guide the reader through building a customized pathway or chart representation of a system. Our manual is based on a group of software tools designed to look at biointeractions in a set of abstracts retrieved from PubMed. These tools aim to support the work of someone with a biological background, who does not need to be an expert on the subject and will play the role of manual curator while designing the representation of the system, the pathway. We illustrate the approach with two challenging case studies: hair and breast development. They were chosen because they focus on recent acquisitions of human evolution. We produced sub-pathways for each study, representing different phases of development. Unlike most charts present in current databases, ours come with detailed descriptions, which will additionally guide PESCADOR users along the process. The implementation as a web interface makes PESCADOR a unique tool for guiding the user along the biointeractions, which will constitute a novel pathway. Copyright © 2014 Elsevier Inc. All rights reserved.

  19. Predictive Models and Computational Embryology

    EPA Science Inventory

    EPA’s ‘virtual embryo’ project is building an integrative systems biology framework for predictive models of developmental toxicity. One schema involves a knowledge-driven adverse outcome pathway (AOP) framework utilizing information from public databases, standardized ontologies...

  20. Pathway Tools version 19.0 update: software for pathway/genome informatics and systems biology

    PubMed Central

    Latendresse, Mario; Paley, Suzanne M.; Krummenacker, Markus; Ong, Quang D.; Billington, Richard; Kothari, Anamika; Weaver, Daniel; Lee, Thomas; Subhraveti, Pallavi; Spaulding, Aaron; Fulcher, Carol; Keseler, Ingrid M.; Caspi, Ron

    2016-01-01

    Pathway Tools is a bioinformatics software environment with a broad set of capabilities. The software provides genome-informatics tools such as a genome browser, sequence alignments, a genome-variant analyzer and comparative-genomics operations. It offers metabolic-informatics tools, such as metabolic reconstruction, quantitative metabolic modeling, prediction of reaction atom mappings and metabolic route search. Pathway Tools also provides regulatory-informatics tools, such as the ability to represent and visualize a wide range of regulatory interactions. This article outlines the advances in Pathway Tools in the past 5 years. Major additions include components for metabolic modeling, metabolic route search, computation of atom mappings and estimation of compound Gibbs free energies of formation; addition of editors for signaling pathways, for genome sequences and for cellular architecture; storage of gene essentiality data and phenotype data; display of multiple alignments, and of signaling and electron-transport pathways; and development of Python and web-services application programming interfaces. Scientists around the world have created more than 9800 Pathway/Genome Databases by using Pathway Tools, many of which are curated databases for important model organisms. PMID:26454094

  1. SorghumFDB: sorghum functional genomics database with multidimensional network analysis.

    PubMed

    Tian, Tian; You, Qi; Zhang, Liwei; Yi, Xin; Yan, Hengyu; Xu, Wenying; Su, Zhen

    2016-01-01

    Sorghum (Sorghum bicolor [L.] Moench) has excellent agronomic traits and biological properties, such as heat and drought-tolerance. It is a C4 grass and potential bioenergy-producing plant, which makes it an important crop worldwide. With the sorghum genome sequence released, it is essential to establish a sorghum functional genomics data mining platform. We collected genomic data and some functional annotations to construct a sorghum functional genomics database (SorghumFDB). SorghumFDB integrated knowledge of sorghum gene family classifications (transcription regulators/factors, carbohydrate-active enzymes, protein kinases, ubiquitins, cytochrome P450, monolignol biosynthesis related enzymes, R-genes and organelle-genes), detailed gene annotations, miRNA and target gene information, orthologous pairs in the model plants Arabidopsis, rice and maize, gene loci conversions and a genome browser. We further constructed a dynamic network of multidimensional biological relationships, comprised of the co-expression data, protein-protein interactions and miRNA-target pairs. We took effective measures to combine the network, gene set enrichment and motif analyses to determine the key regulators that participate in related metabolic pathways, such as the lignin pathway, which is a major biological process in bioenergy-producing plants. Database URL: http://structuralbiology.cau.edu.cn/sorghum/index.html. © The Author(s) 2016. Published by Oxford University Press.

  2. Predictive Models and Computational Toxicology (II IBAMTOX)

    EPA Science Inventory

    EPA’s ‘virtual embryo’ project is building an integrative systems biology framework for predictive models of developmental toxicity. One schema involves a knowledge-driven adverse outcome pathway (AOP) framework utilizing information from public databases, standardized ontologies...

  3. Functional Interaction Network Construction and Analysis for Disease Discovery.

    PubMed

    Wu, Guanming; Haw, Robin

    2017-01-01

    Network-based approaches project seemingly unrelated genes or proteins onto a large-scale network context, thereby providing a holistic visualization and analysis platform for genomic data generated from high-throughput experiments, reducing the dimensionality of data by using network modules and increasing the statistical power of the analysis. Based on the Reactome database, the most popular and comprehensive open-source biological pathway knowledgebase, we have developed a highly reliable protein functional interaction network covering around 60% of total human genes and an app called ReactomeFIViz for Cytoscape, the most popular biological network visualization and analysis platform. In this chapter, we describe in detail how this functional interaction network is constructed by integrating multiple external data sources, extracting functional interactions from human curated pathway databases, building a machine learning classifier called a Naïve Bayesian Classifier, predicting interactions based on the trained Naïve Bayesian Classifier, and finally constructing the functional interaction database. We also provide an example of how to use ReactomeFIViz to perform network-based data analysis for a list of genes.
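
    The classifier step described above can be illustrated with a minimal sketch of a Bernoulli naive Bayes classifier over binary evidence features. This is not the ReactomeFIViz implementation: the three features (for example, co-expression, a shared annotation, an interaction reported in another species), the training rows and the Laplace smoothing are all assumptions made for illustration.

```python
from math import exp, log

# Hypothetical training data: each row holds three binary evidence features
# for a protein pair; the label says whether the pair is a known functional
# interaction. Values are invented for illustration only.
EXAMPLES = [(1, 1, 1), (1, 1, 0), (1, 0, 1), (0, 0, 0), (0, 1, 0), (0, 0, 1)]
LABELS = [True, True, True, False, False, False]

def train(examples, labels):
    """Estimate class priors and per-feature likelihoods (Laplace-smoothed)."""
    n_features = len(examples[0])
    model = {}
    for cls in (True, False):
        rows = [x for x, y in zip(examples, labels) if y == cls]
        prior = len(rows) / len(examples)
        likelihoods = [
            (sum(row[i] for row in rows) + 1) / (len(rows) + 2)
            for i in range(n_features)
        ]
        model[cls] = (prior, likelihoods)
    return model

def posterior(model, features):
    """Return P(interaction | features) under the naive independence assumption."""
    log_score = {}
    for cls, (prior, likelihoods) in model.items():
        score = log(prior)
        for value, p in zip(features, likelihoods):
            score += log(p if value else 1 - p)
        log_score[cls] = score
    top = max(log_score.values())  # subtract the max for numerical stability
    weights = {cls: exp(s - top) for cls, s in log_score.items()}
    return weights[True] / (weights[True] + weights[False])

model = train(EXAMPLES, LABELS)
print(round(posterior(model, (1, 1, 1)), 3))
print(round(posterior(model, (0, 0, 0)), 3))
```

    The sketch assumes both classes occur in the training data; a pair would be predicted as interacting when its posterior exceeds a chosen threshold.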

  4. IntPath--an integrated pathway gene relationship database for model organisms and important pathogens

    PubMed Central

    2012-01-01

    Background Pathway data are important for understanding the relationship between genes, proteins and many other molecules in living organisms. Pathway gene relationships are crucial information for guidance, prediction, reference and assessment in biochemistry, computational biology, and medicine. Many well-established databases--e.g., KEGG, WikiPathways, and BioCyc--are dedicated to collecting pathway data for public access. However, the effectiveness of these databases is hindered by issues such as incompatible data formats, inconsistent molecular representations, inconsistent molecular relationship representations, inconsistent referrals to pathway names, and incomprehensive data from different databases. Results In this paper, we overcome these issues through extraction, normalization and integration of pathway data from several major public databases (KEGG, WikiPathways, BioCyc, etc). We build a database that not only hosts our integrated pathway gene relationship data for public access but also maintains the necessary updates in the long run. This public repository is named IntPath (Integrated Pathway gene relationship database for model organisms and important pathogens). Four organisms--S. cerevisiae, M. tuberculosis H37Rv, H. Sapiens and M. musculus--are included in this version (V2.0) of IntPath. IntPath uses the "full unification" approach to ensure no deletion and no introduced noise in this process. Therefore, IntPath contains much richer pathway-gene and pathway-gene pair relationships and much larger number of non-redundant genes and gene pairs than any of the single-source databases. The gene relationships of each gene (measured by average node degree) per pathway are significantly richer. The gene relationships in each pathway (measured by average number of gene pairs per pathway) are also considerably richer in the integrated pathways. 
Moderate manual curation is involved to remove errors and noise from source data (e.g., the gene ID errors in WikiPathways and relationship errors in KEGG). We turn complicated and incompatible XML data formats and inconsistent gene and gene relationship representations from different source databases into normalized and unified pathway-gene and pathway-gene pair relationships neatly recorded in simple tab-delimited text format and MySQL tables, which facilitates convenient automatic computation and large-scale referencing in many related studies. IntPath data can be downloaded in text format or as a MySQL dump. IntPath data can also be retrieved and analyzed conveniently through a web service by local programs or through a web interface by mouse clicks. Several useful analysis tools are also provided in IntPath. Conclusions We have overcome in IntPath the issues of compatibility, consistency, and comprehensiveness that often hamper effective use of pathway databases. We have included four organisms in the current release of IntPath. Our methodology and programs described in this work can be easily applied to other organisms; and we will include more model organisms and important pathogens in future releases of IntPath. IntPath maintains regular updates and is freely available at http://compbio.ddns.comp.nus.edu.sg:8080/IntPath. PMID:23282057
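
    The two richness measures mentioned in this record (average node degree per pathway and average number of gene pairs per pathway) can be computed from tab-delimited pathway-gene pair records roughly as sketched below. The three-column layout and the sample rows are assumptions for illustration and are not IntPath's actual file format.

```python
import csv
import io
from collections import defaultdict

# Hypothetical records: pathway, gene A, gene B (tab-separated).
SAMPLE = (
    "glycolysis\tHK1\tGPI\n"
    "glycolysis\tGPI\tPFKM\n"
    "glycolysis\tGPI\tHK1\n"   # same pair in reverse order, collapsed below
    "tca_cycle\tCS\tACO2\n"
)

def load_pairs(text):
    """Group non-redundant, unordered gene pairs by pathway."""
    pairs = defaultdict(set)
    for pathway, gene_a, gene_b in csv.reader(io.StringIO(text), delimiter="\t"):
        pairs[pathway].add(tuple(sorted((gene_a, gene_b))))
    return pairs

def average_node_degree(gene_pairs):
    """Mean number of distinct partners per gene within one pathway."""
    partners = defaultdict(set)
    for gene_a, gene_b in gene_pairs:
        partners[gene_a].add(gene_b)
        partners[gene_b].add(gene_a)
    return sum(len(p) for p in partners.values()) / len(partners)

def average_pairs_per_pathway(pairs):
    """Mean number of non-redundant gene pairs across all pathways."""
    return sum(len(p) for p in pairs.values()) / len(pairs)

pairs = load_pairs(SAMPLE)
for pathway in sorted(pairs):
    degree = average_node_degree(pairs[pathway])
    print(pathway, len(pairs[pathway]), round(degree, 2))
print("mean pairs per pathway:", average_pairs_per_pathway(pairs))
```

    Storing each pair as a sorted tuple is one simple way to treat A-B and B-A as the same relationship; a directed representation would need a different key.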

  5. Text mining for metabolic pathways, signaling cascades, and protein networks.

    PubMed

    Hoffmann, Robert; Krallinger, Martin; Andres, Eduardo; Tamames, Javier; Blaschke, Christian; Valencia, Alfonso

    2005-05-10

    The complexity of the information stored in databases and publications on metabolic and signaling pathways, the high throughput of experimental data, and the growing number of publications make it imperative to provide systems to help the researcher navigate through these interrelated information resources. Text-mining methods have started to play a key role in the creation and maintenance of links between the information stored in biological databases and its original sources in the literature. These links will be extremely useful for database updating and curation, especially if a number of technical problems can be solved satisfactorily, including the identification of protein and gene names (entities in general) and the characterization of their types of interactions. The first generation of openly accessible text-mining systems, such as iHOP (Information Hyperlinked over Proteins), provides additional functions to facilitate the reconstruction of protein interaction networks, combine database and text information, and support the scientist in the formulation of novel hypotheses. The next challenge is the generation of comprehensive information regarding the general function of signaling pathways and protein interaction networks.

  6. Causal biological network database: a comprehensive platform of causal biological network models focused on the pulmonary and vascular systems

    PubMed Central

    Boué, Stéphanie; Talikka, Marja; Westra, Jurjen Willem; Hayes, William; Di Fabio, Anselmo; Park, Jennifer; Schlage, Walter K.; Sewer, Alain; Fields, Brett; Ansari, Sam; Martin, Florian; Veljkovic, Emilija; Kenney, Renee; Peitsch, Manuel C.; Hoeng, Julia

    2015-01-01

    With the wealth of publications and data available, powerful and transparent computational approaches are required to represent measured data and scientific knowledge in a computable and searchable format. We developed a set of biological network models, scripted in the Biological Expression Language, that reflect causal signaling pathways across a wide range of biological processes, including cell fate, cell stress, cell proliferation, inflammation, tissue repair and angiogenesis in the pulmonary and cardiovascular context. This comprehensive collection of networks is now freely available to the scientific community in a centralized web-based repository, the Causal Biological Network database, which is composed of over 120 manually curated and well annotated biological network models and can be accessed at http://causalbionet.com. The website accesses a MongoDB, which stores all versions of the networks as JSON objects and allows users to search for genes, proteins, biological processes, small molecules and keywords in the network descriptions to retrieve biological networks of interest. The content of the networks can be visualized and browsed. Nodes and edges can be filtered and all supporting evidence for the edges can be browsed and is linked to the original articles in PubMed. Moreover, networks may be downloaded for further visualization and evaluation. Database URL: http://causalbionet.com PMID:25887162
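
    As a rough illustration of searching networks stored as JSON objects, the sketch below filters a small in-memory collection by gene name or keyword and lists the edges touching a node. The field names (name, description, nodes, edges) and the toy networks are assumptions; they are not the schema or content of the Causal Biological Network database, and no MongoDB connection is involved.

```python
import json

# Toy documents with an invented schema, for illustration only.
NETWORKS_JSON = """
[
  {"name": "Hypoxic Stress",
   "description": "Causal signaling in response to low oxygen",
   "nodes": ["HIF1A", "VEGFA", "EPO"],
   "edges": [
     {"source": "HIF1A", "target": "VEGFA", "relation": "increases"},
     {"source": "HIF1A", "target": "EPO", "relation": "increases"}
   ]},
  {"name": "Tissue Repair",
   "description": "Wound healing and angiogenesis",
   "nodes": ["VEGFA", "KDR"],
   "edges": [
     {"source": "VEGFA", "target": "KDR", "relation": "increases"}
   ]}
]
"""

def find_networks(networks, term):
    """Names of networks whose nodes or description match the search term."""
    needle = term.lower()
    hits = []
    for net in networks:
        in_nodes = any(needle == node.lower() for node in net["nodes"])
        in_text = needle in net["description"].lower()
        if in_nodes or in_text:
            hits.append(net["name"])
    return hits

def edges_touching(network, node):
    """Edges of one network that have the given node as source or target."""
    return [e for e in network["edges"] if node in (e["source"], e["target"])]

networks = json.loads(NETWORKS_JSON)
print(find_networks(networks, "vegfa"))
print(find_networks(networks, "angiogenesis"))
print(edges_touching(networks[0], "EPO"))
```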

  7. HPIminer: A text mining system for building and visualizing human protein interaction networks and pathways.

    PubMed

    Subramani, Suresh; Kalpana, Raja; Monickaraj, Pankaj Moses; Natarajan, Jeyakumar

    2015-04-01

    Knowledge of protein-protein interactions (PPI) and of their related pathways is equally important for understanding the biological functions of the living cell. Such information on human proteins is highly desirable to understand the mechanism of several diseases such as cancer, diabetes, and Alzheimer's disease. Because much of that information is buried in biomedical literature, an automated text mining system for visualizing human PPI and pathways is highly desirable. In this paper, we present HPIminer, a text mining system for visualizing human protein interactions and pathways from biomedical literature. HPIminer extracts human PPI information and PPI pairs from biomedical literature, and visualizes their associated interactions, networks and pathways using two curated databases, HPRD and KEGG. To our knowledge, HPIminer is the first system to build interaction networks from literature as well as curated databases. Further, the new interactions mined only from literature and not reported earlier in databases are highlighted as new. A comparative study with other similar tools shows that the resultant network is more informative and provides additional information on interacting proteins and their associated networks. Copyright © 2015 Elsevier Inc. All rights reserved.

  8. PyPathway: Python Package for Biological Network Analysis and Visualization.

    PubMed

    Xu, Yang; Luo, Xiao-Chun

    2018-05-01

    Life science studies represent one of the biggest generators of large data sets, mainly because of rapid sequencing technological advances. Biological networks including interactive networks and human curated pathways are essential to understand these high-throughput data sets. Biological network analysis offers a method to explore systematically not only the molecular complexity of a particular disease but also the molecular relationships among apparently distinct phenotypes. Currently, several packages for the Python community have been developed, such as BioPython and Goatools. However, tools to perform comprehensive network analysis and visualization are still needed. Here, we have developed PyPathway, an extensible free and open source Python package for functional enrichment analysis, network modeling, and network visualization. The network process module supports various interaction network and pathway databases such as Reactome, WikiPathway, STRING, and BioGRID. The network analysis module implements overrepresentation analysis, gene set enrichment analysis, network-based enrichment, and de novo network modeling. Finally, the visualization and data publishing modules enable users to share their analysis by using an easy web application. For package availability, see the first Reference.
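
    Overrepresentation analysis, one of the methods named in this record, is commonly based on a one-sided hypergeometric test. The sketch below uses only the Python standard library and does not call PyPathway; the function names, gene identifiers and pathway sets are hypothetical.

```python
from math import comb

def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) when drawing `draws` items without replacement."""
    upper = min(successes, draws)
    total = comb(population, draws)
    hits = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return hits / total

def overrepresentation(gene_list, pathways, background):
    """Return (pathway, overlap, p-value) tuples sorted by p-value."""
    background = set(background)
    genes = set(gene_list) & background
    results = []
    for name, members in pathways.items():
        members = set(members) & background
        overlap = len(genes & members)
        p = hypergeom_sf(overlap, len(background), len(members), len(genes))
        results.append((name, overlap, p))
    return sorted(results, key=lambda r: r[2])

# Hypothetical background of 20 genes and two made-up pathways.
BACKGROUND = [f"g{i}" for i in range(20)]
PATHWAYS = {
    "pathway_A": [f"g{i}" for i in range(5)],
    "pathway_B": [f"g{i}" for i in range(10, 20)],
}
GENE_LIST = ["g0", "g1", "g2", "g10"]

results = overrepresentation(GENE_LIST, PATHWAYS, BACKGROUND)
for name, overlap, p in results:
    print(name, overlap, round(p, 4))
```

    In practice the p-values would also be corrected for multiple testing across pathways, which this sketch leaves out.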

  9. Interleukins and their signaling pathways in the Reactome biological pathway database.

    PubMed

    Jupe, Steve; Ray, Keith; Roca, Corina Duenas; Varusai, Thawfeek; Shamovsky, Veronica; Stein, Lincoln; D'Eustachio, Peter; Hermjakob, Henning

    2018-04-01

    There is a wealth of biological pathway information available in the scientific literature, but it is spread across many thousands of publications. Alongside publications that contain definitive experimental discoveries are many others that have been dismissed as spurious, found to be irreproducible, or are contradicted by later results and consequently now considered controversial. Many descriptions and images of pathways are incomplete stylized representations that assume the reader is an expert and familiar with the established details of the process, which are consequently not fully explained. Pathway representations in publications frequently do not represent a complete, detailed, and unambiguous description of the molecules involved; their precise posttranslational state; or a full account of the molecular events they undergo while participating in a process. Although this might be sufficient to be interpreted by an expert reader, the lack of detail makes such pathways less useful and difficult to understand for anyone unfamiliar with the area and of limited use as the basis for computational models. Reactome was established as a freely accessible knowledge base of human biological pathways. It is manually populated with interconnected molecular events that fully detail the molecular participants linked to published experimental data and background material by using a formal and open data structure that facilitates computational reuse. These data are accessible on a Web site in the form of pathway diagrams that have descriptive summaries and annotations and as downloadable data sets in several formats that can be reused with other computational tools. The entire database and all supporting software can be downloaded and reused under a Creative Commons license. Pathways are authored by expert biologists who work with Reactome curators and editorial staff to represent the consensus in the field. 
Pathways are represented as interactive diagrams that include as much molecular detail as possible and are linked to literature citations that contain supporting experimental details. All newly created events undergo a peer-review process before they are added to the database and made available on the associated Web site. New content is added quarterly. The 63rd release of Reactome in December 2017 contains 10,996 human proteins participating in 11,426 events in 2,179 pathways. In addition, analytic tools allow data set submission for the identification and visualization of pathway enrichment and representation of expression profiles as an overlay on Reactome pathways. Protein-protein and compound-protein interactions from several sources, including custom user data sets, can be added to extend pathways. Pathway diagrams and analytic result displays can be downloaded as editable images, human-readable reports, and files in several standard formats that are suitable for computational reuse. Reactome content is available programmatically through a REpresentational State Transfer (REST)-based content service and as a Neo4J graph database. Signaling pathways for IL-1 to IL-38 are hierarchically classified within the pathway "signaling by interleukins." The classification used is largely derived from Akdis et al. The addition to Reactome of a complete set of the known human interleukins, their receptors, and established signaling pathways linked to annotations of relevant aspects of immune function provides a significant computationally accessible resource of information about this important family. This information can be extended easily as new discoveries become accepted as the consensus in the field. A key aim for the future is to increase coverage of gene expression changes induced by interleukin signaling. Copyright © 2018 The Authors. Published by Elsevier Inc. All rights reserved.

  10. Update of KDBI: Kinetic Data of Bio-molecular Interaction database

    PubMed Central

    Kumar, Pankaj; Han, B. C.; Shi, Z.; Jia, J.; Wang, Y. P.; Zhang, Y. T.; Liang, L.; Liu, Q. F.; Ji, Z. L.; Chen, Y. Z.

    2009-01-01

    Knowledge of the kinetics of biomolecular interactions is important for facilitating the study of cellular processes and underlying molecular events, and is essential for quantitative study and simulation of biological systems. The Kinetic Data of Bio-molecular Interaction database (KDBI) has been developed to provide information about experimentally determined kinetic data of protein–protein, protein–nucleic acid, protein–ligand, nucleic acid–ligand binding or reaction events described in the literature. To accommodate increasing demand for studying and simulating biological systems, numerous improvements and updates have been made to KDBI, including new ways to access data by pathway and molecule names, data files in Systems Biology Markup Language format, a more efficient search engine, access to published parameter sets of simulation models of 63 pathways, and a 2.3-fold increase of data (19 263 entries of 10 532 distinctive biomolecular binding and 11 954 interaction events, involving 2635 proteins/protein complexes, 847 nucleic acids, 1603 small molecules and 45 multi-step processes). KDBI is publicly available at http://bidd.nus.edu.sg/group/kdbi/kdbi.asp. PMID:18971255

  11. SPIKE – a database, visualization and analysis tool of cellular signaling pathways

    PubMed Central

    Elkon, Ran; Vesterman, Rita; Amit, Nira; Ulitsky, Igor; Zohar, Idan; Weisz, Mali; Mass, Gilad; Orlev, Nir; Sternberg, Giora; Blekhman, Ran; Assa, Jackie; Shiloh, Yosef; Shamir, Ron

    2008-01-01

    Background Biological signaling pathways that govern cellular physiology form an intricate web of tightly regulated interlocking processes. Data on these regulatory networks are accumulating at an unprecedented pace. The assimilation, visualization and interpretation of these data have become a major challenge in biological research, and once met, will greatly boost our ability to understand cell functioning on a systems level. Results To cope with this challenge, we are developing the SPIKE knowledge-base of signaling pathways. SPIKE contains three main software components: 1) A database (DB) of biological signaling pathways. Carefully curated information from the literature and data from large public sources constitute distinct tiers of the DB. 2) A visualization package that allows interactive graphic representations of regulatory interactions stored in the DB and superposition of functional genomic and proteomic data on the maps. 3) An algorithmic inference engine that analyzes the networks for novel functional interplays between network components. SPIKE is designed and implemented as a community tool and therefore provides a user-friendly interface that allows registered users to upload data to SPIKE DB. Our vision is that the DB will be populated by a distributed and highly collaborative effort undertaken by multiple groups in the research community, where each group contributes data in its field of expertise. Conclusion The integrated capabilities of SPIKE make it a powerful platform for the analysis of signaling networks and the integration of knowledge on such networks with omics data. PMID:18289391

  12. 1-CMDb: A Curated Database of Genomic Variations of the One-Carbon Metabolism Pathway.

    PubMed

    Bhat, Manoj K; Gadekar, Veerendra P; Jain, Aditya; Paul, Bobby; Rai, Padmalatha S; Satyamoorthy, Kapaettu

    2017-01-01

    The one-carbon metabolism pathway is vital in maintaining tissue homeostasis by driving the critical reactions of folate and methionine cycles. A myriad of genetic and epigenetic events mark the rate of reactions in a tissue-specific manner. Integration of these to predict and provide personalized health management requires robust computational tools that can process multiomics data. The DNA sequences that may determine the chain of biological events and the endpoint reactions within one-carbon metabolism genes remain to be comprehensively recorded. Hence, we designed the one-carbon metabolism database (1-CMDb) as a platform to interrogate its association with a host of human disorders. DNA sequence and network information of a total of 48 genes were extracted from a literature survey and KEGG pathway that are involved in the one-carbon folate-mediated pathway. The information generated, collected, and compiled for all these genes from the UCSC genome browser included the single nucleotide polymorphisms (SNPs), CpGs, copy number variations (CNVs), and miRNAs, and a comprehensive database was created. Furthermore, a significant correlation analysis was performed for SNPs in the pathway genes. Detailed data of SNPs, CNVs, CpG islands, and miRNAs for 48 folate pathway genes were compiled. The SNPs in CNVs (9670), CpGs (984), and miRNAs (14) were also compiled for all pathway genes. The SIFT score, the prediction and PolyPhen score, as well as the prediction for each of the SNPs were tabulated and represented for folate pathway genes. Also included in the database for folate pathway genes were the links to 124 various phenotypes and disease associations as reported in the literature and from publicly available information. 
A comprehensive database was generated consisting of genomic elements within and among SNPs, CNVs, CpGs, and miRNAs of one-carbon metabolism pathways to facilitate (a) single source of information and (b) integration into large-genome scale network analysis to be developed in the future by the scientific community. The database can be accessed at http://slsdb.manipal.edu/ocm/. © 2017 S. Karger AG, Basel.

  13. ATLAS of Biochemistry: A Repository of All Possible Biochemical Reactions for Synthetic Biology and Metabolic Engineering Studies.

    PubMed

    Hadadi, Noushin; Hafner, Jasmin; Shajkofci, Adrian; Zisaki, Aikaterini; Hatzimanikatis, Vassily

    2016-10-21

    Because the complexity of metabolism cannot be intuitively understood or analyzed, computational methods are indispensable for studying biochemistry and deepening our understanding of cellular metabolism to promote new discoveries. We used the computational framework BNICE.ch along with cheminformatic tools to assemble the whole theoretical reactome from the known metabolome through expansion of the known biochemistry presented in the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. We constructed the ATLAS of Biochemistry, a database of all theoretical biochemical reactions based on known biochemical principles and compounds. ATLAS includes more than 130 000 hypothetical enzymatic reactions that connect two or more KEGG metabolites through novel enzymatic reactions that have never been reported to occur in living organisms. Moreover, ATLAS reactions integrate 42% of KEGG metabolites that are not currently present in any KEGG reaction into one or more novel enzymatic reactions. The generated repository of information is organized in a Web-based database ( http://lcsb-databases.epfl.ch/atlas/ ) that allows the user to search for all possible routes from any substrate compound to any product. The resulting pathways involve known and novel enzymatic steps that may indicate unidentified enzymatic activities and provide potential targets for protein engineering. Our approach of introducing novel biochemistry into pathway design and associated databases will be important for synthetic biology and metabolic engineering.
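
    Searching for all routes from a substrate to a product can be pictured as enumerating simple paths in a directed reaction graph. The sketch below is a generic depth-first search with a step limit, not the ATLAS search implementation; the reaction identifiers, compound names and the known/novel labels are invented for illustration.

```python
from collections import defaultdict

# Hypothetical reactions: (reaction id, substrate, product, status).
REACTIONS = [
    ("R1", "A", "B", "known"),
    ("R2", "B", "D", "known"),
    ("R3", "A", "C", "novel"),
    ("R4", "C", "D", "novel"),
    ("R5", "D", "A", "known"),  # closes a cycle; the search must not loop
]

def find_routes(reactions, substrate, product, max_steps=4):
    """Enumerate reaction-id routes from substrate to product (simple paths)."""
    outgoing = defaultdict(list)
    for rid, src, dst, _status in reactions:
        outgoing[src].append((rid, dst))
    routes = []

    def extend(compound, visited, path):
        if compound == product:
            routes.append(list(path))
            return
        if len(path) == max_steps:
            return
        for rid, nxt in outgoing[compound]:
            if nxt not in visited:
                visited.add(nxt)
                path.append(rid)
                extend(nxt, visited, path)
                path.pop()
                visited.remove(nxt)

    extend(substrate, {substrate}, [])
    return routes

def uses_novel_step(reactions, route):
    """True if any reaction in the route is labelled as novel."""
    status = {rid: s for rid, _src, _dst, s in reactions}
    return any(status[rid] == "novel" for rid in route)

routes = find_routes(REACTIONS, "A", "D")
for route in routes:
    print(route, "novel" if uses_novel_step(REACTIONS, route) else "known")
```

    The step limit keeps the enumeration bounded, since the number of simple paths can grow very quickly in a dense reaction graph.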

  14. Pathway analysis from lists of microRNAs: common pitfalls and alternative strategy

    PubMed Central

    Godard, Patrice; van Eyll, Jonathan

    2015-01-01

    MicroRNAs (miRNAs) are involved in the regulation of gene expression at a post-transcriptional level. As such, monitoring miRNA expression has been increasingly used to assess their role in regulatory mechanisms of biological processes. In large scale studies, once miRNAs of interest have been identified, the target genes they regulate are often inferred using algorithms or databases. A pathway analysis is then often performed in order to generate hypotheses about the relevant biological functions controlled by the miRNA signature. Here we show that the method widely used in scientific literature to identify these pathways is biased and leads to inaccurate results. In addition to describing the bias and its origin we present an alternative strategy to identify potential biological functions specifically impacted by a miRNA signature. More generally, our study exemplifies the crucial need of relevant negative controls when developing, and using, bioinformatics methods. PMID:25800743

  15. MelanomaDB: A Web Tool for Integrative Analysis of Melanoma Genomic Information to Identify Disease-Associated Molecular Pathways

    PubMed Central

    Trevarton, Alexander J.; Mann, Michael B.; Knapp, Christoph; Araki, Hiromitsu; Wren, Jonathan D.; Stones-Havas, Steven; Black, Michael A.; Print, Cristin G.

    2013-01-01

    Despite on-going research, metastatic melanoma survival rates remain low and treatment options are limited. Researchers can now access a rapidly growing amount of molecular and clinical information about melanoma. This information is becoming difficult to assemble and interpret due to its dispersed nature, yet as it grows it becomes increasingly valuable for understanding melanoma. Integration of this information into a comprehensive resource to aid rational experimental design and patient stratification is needed. As an initial step in this direction, we have assembled a web-accessible melanoma database, MelanomaDB, which incorporates clinical and molecular data from publicly available sources and will be regularly updated as new information becomes available. This database allows complex links to be drawn between many different aspects of melanoma biology: genetic changes (e.g., mutations) in individual melanomas revealed by DNA sequencing, associations between gene expression and patient survival, data concerning drug targets, biomarkers, druggability, and clinical trials, as well as our own statistical analysis of relationships between molecular pathways and clinical parameters that have been produced using these data sets. The database is freely available at http://genesetdb.auckland.ac.nz/melanomadb/about.html. A subset of the information in the database can also be accessed through a freely available web application in the Illumina genomic cloud computing platform BaseSpace at http://www.biomatters.com/apps/melanoma-profiler-for-research. The MelanomaDB database illustrates dysregulation of specific signaling pathways across 310 exome-sequenced melanomas and in individual tumors and identifies the distribution of somatic variants in melanoma. We suggest that MelanomaDB can provide a context in which to interpret the tumor molecular profiles of individual melanoma patients relative to biological information and available drug therapies. PMID:23875173

  16. Incorporating Information of microRNAs into Pathway Analysis in a Genome-Wide Association Study of Bipolar Disorder

    PubMed Central

    Shih, Wei-Liang; Kao, Chung-Feng; Chuang, Li-Chung; Kuo, Po-Hsiu

    2012-01-01

    MicroRNAs (miRNAs) are known to be important post-transcriptional regulators that are involved in the etiology of complex psychiatric traits. The present study aimed to incorporate miRNA information into pathway analysis using a genome-wide association dataset to identify relevant biological pathways for bipolar disorder (BPD). We selected psychiatric- and neurological-associated miRNAs (N = 157) from the PhenomiR database. The miRNA target gene (miTG) predictions were obtained from microRNA.org. Canonical pathways (N = 4,051) were downloaded from the Molecular Signatures Database. We employed a novel weighting scheme for miTGs in pathway analysis using methods of gene set enrichment analysis and sum-statistic. Under four statistical scenarios, 38 significantly enriched pathways (P-value < 0.01 after multiple testing correction) were identified for the risk of developing BPD, including ion channel-associated pathways (e.g., gated channel activity, ion transmembrane transporter activity, and ion channel activity) and nervous system-related biological processes (e.g., nervous system development, cytoskeleton, and neuroactive ligand receptor interaction). Among them, 19 were identified only when the weighting scheme was applied. Many miRNA-targeted genes were functionally related to ion channels, collagen, and axonal growth and guidance, which have previously been suggested to be associated with BPD. Some of these genes are linked to the regulation of miRNA machinery in the literature. Our findings provide support for the potential involvement of miRNAs in the psychopathology of BPD. Further investigations to elucidate the functions and mechanisms of identified candidate pathways are needed. PMID:23264780

  17. An advanced web query interface for biological databases

    PubMed Central

    Latendresse, Mario; Karp, Peter D.

    2010-01-01

    Although most web-based biological databases (DBs) offer some type of web-based form to allow users to author DB queries, these query forms are quite restricted in the complexity of DB queries that they can formulate. They can typically query only one DB, and can query only a single type of object at a time (e.g. genes) with no possible interaction between the objects—that is, in SQL parlance, no joins are allowed between DB objects. Writing precise queries against biological DBs is usually left to a programmer skillful enough in complex DB query languages like SQL. We present a web interface for building precise queries for biological DBs that can construct much more precise queries than most web-based query forms, yet that is user friendly enough to be used by biologists. It supports queries containing multiple conditions, and connecting multiple object types without using the join concept, which is unintuitive to biologists. This interactive web interface is called the Structured Advanced Query Page (SAQP). Users interactively build up a wide range of query constructs. Interactive documentation within the SAQP describes the schema of the queried DBs. The SAQP is based on BioVelo, a query language based on list comprehension. The SAQP is part of the Pathway Tools software and is available as part of several bioinformatics web sites powered by Pathway Tools, including the BioCyc.org site that contains more than 500 Pathway/Genome DBs. PMID:20624715
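
    To illustrate how a list comprehension can connect two object types without an explicit join, here is a small sketch in Python. It is not BioVelo syntax; the records, field names and thresholds are hypothetical and only show the general idea of a multi-condition query spanning genes and their protein products.

```python
# Hypothetical in-memory records standing in for two object types in a DB.
GENES = [
    {"id": "gene-1", "name": "abcA", "product": "prot-1"},
    {"id": "gene-2", "name": "abcB", "product": "prot-2"},
    {"id": "gene-3", "name": "xyzC", "product": "prot-3"},
]
PROTEINS = [
    {"id": "prot-1", "pathway": "pathway-1", "mass_kd": 45.0},
    {"id": "prot-2", "pathway": "pathway-1", "mass_kd": 22.0},
    {"id": "prot-3", "pathway": "pathway-2", "mass_kd": 60.0},
]

# "Names of genes whose product takes part in pathway-1 and weighs more
# than 30 kD": two generators plus conditions replace the SQL join.
heavy_pathway1_genes = [
    gene["name"]
    for gene in GENES
    for protein in PROTEINS
    if gene["product"] == protein["id"]
    and protein["pathway"] == "pathway-1"
    and protein["mass_kd"] > 30
]

print(heavy_pathway1_genes)  # prints ['abcA']
```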

  18. Genome sequence analysis of a flocculant-producing bacterium, Paenibacillus shenyangensis.

    PubMed

    Fu, Lili; Jiang, Binhui; Liu, Jinliang; Zhao, Xin; Liu, Qian; Hu, Xiaomin

    2016-03-01

    This study explored the metabolic processes of Paenibacillus shenyangensis, an efficient bioflocculant-producing bacterium. The biosynthesis mechanism of bioflocculation was used to enrich the genome of Paenibacillus shenyangensis and provide a basis for molecular genetics and functional genomics analyses. According to the analysis of de novo assembly, a total of 5,501,467 bp of clean reads were generated and assembled into 92 contigs. A total of 4800 unigenes were predicted, of which 4393 were annotated with a specific gene function in the NCBI-Nr database, and 3423 genes were found in the Clusters of Orthologous Groups database. Among the 168 Kyoto Encyclopedia of Genes and Genomes database, cell growth and metabolism were the main biological processes, and a potential metabolic pathway was predicted from glucose to exopolysaccharide within the starch and sucrose metabolism pathway. Using high-throughput sequencing technology, we provide a genome analysis of Paenibacillus shenyangensis that predicts the main metabolic processes and a potential pathway of exopolysaccharide biosynthesis.

  19. An “EAR” on environmental surveillance and monitoring: A case study on the use of Exposure–Activity Ratios (EARs) to prioritize sites, chemicals, and bioactivities of concern in Great Lakes waters

    USGS Publications Warehouse

    Blackwell, Brett R.; Ankley, Gerald T.; Corsi, Steven; DeCicco, Laura; Houck, Kieth A.; Judson, Richard S.; Li, Shibin; Martin, Matthew T.; Murphy, Elizabeth; Schroeder, Anthony L.; Smith, Edwin R.; Swintek, Joe; Villeneuve, Daniel L.

    2017-01-01

    Current environmental monitoring approaches focus primarily on chemical occurrence. However, based on concentration alone, it can be difficult to identify which compounds may be of toxicological concern and should be prioritized for further monitoring, in-depth testing, or management. This can be problematic because toxicological characterization is lacking for many emerging contaminants. New sources of high-throughput screening (HTS) data, such as the ToxCast database, which contains information for over 9000 compounds screened through up to 1100 bioassays, are now available. Integrated analysis of chemical occurrence data with HTS data offers new opportunities to prioritize chemicals, sites, or biological effects for further investigation based on concentrations detected in the environment linked to relative potencies in pathway-based bioassays. As a case study, chemical occurrence data from a 2012 study in the Great Lakes Basin along with the ToxCast effects database were used to calculate exposure–activity ratios (EARs) as a prioritization tool. Technical considerations of data processing and use of the ToxCast database are presented and discussed. EAR prioritization identified multiple sites, biological pathways, and chemicals that warrant further investigation. Prioritized bioactivities from the EAR analysis were linked to discrete adverse outcome pathways to identify potential adverse outcomes and biomarkers for use in subsequent monitoring efforts.
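
    A minimal sketch of the prioritization idea follows. It assumes that an EAR is the measured environmental concentration divided by the concentration at which a bioassay shows activity, both in the same units; the abstract does not spell out the formula, and the sites, chemicals, assays and values below are hypothetical, not ToxCast or Great Lakes data.

```python
def exposure_activity_ratio(measured_um, activity_conc_um):
    """Measured concentration divided by a bioassay activity concentration.

    Both values must use the same units (micromolar here, by assumption).
    """
    if activity_conc_um <= 0:
        raise ValueError("activity concentration must be positive")
    return measured_um / activity_conc_um


# Hypothetical detections: (site, chemical, concentration in micromolar).
DETECTIONS = [
    ("site-1", "chem-A", 0.5),
    ("site-1", "chem-B", 0.02),
    ("site-2", "chem-A", 0.05),
]

# Hypothetical potencies: chemical -> {assay: activity concentration in uM}.
POTENCIES = {
    "chem-A": {"assay-X": 10.0, "assay-Y": 2.0},
    "chem-B": {"assay-X": 0.1},
}


def rank_by_ear(detections, potencies):
    """Return (EAR, site, chemical, assay) tuples, highest EAR first."""
    rows = []
    for site, chemical, conc in detections:
        for assay, activity_conc in potencies.get(chemical, {}).items():
            ear = exposure_activity_ratio(conc, activity_conc)
            rows.append((ear, site, chemical, assay))
    return sorted(rows, reverse=True)


ranked = rank_by_ear(DETECTIONS, POTENCIES)
for ear, site, chemical, assay in ranked:
    print(f"{site} {chemical} {assay} EAR={ear:.3f}")
```

    Sorting by EAR puts site, chemical and bioassay combinations where detected concentrations approach bioactive concentrations at the top of the list, which is the kind of ranking the case study uses to decide what warrants further investigation.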

  20. Pathway Tools version 19.0 update: software for pathway/genome informatics and systems biology.

    PubMed

    Karp, Peter D; Latendresse, Mario; Paley, Suzanne M; Krummenacker, Markus; Ong, Quang D; Billington, Richard; Kothari, Anamika; Weaver, Daniel; Lee, Thomas; Subhraveti, Pallavi; Spaulding, Aaron; Fulcher, Carol; Keseler, Ingrid M; Caspi, Ron

    2016-09-01

    Pathway Tools is a bioinformatics software environment with a broad set of capabilities. The software provides genome-informatics tools such as a genome browser, sequence alignments, a genome-variant analyzer and comparative-genomics operations. It offers metabolic-informatics tools, such as metabolic reconstruction, quantitative metabolic modeling, prediction of reaction atom mappings and metabolic route search. Pathway Tools also provides regulatory-informatics tools, such as the ability to represent and visualize a wide range of regulatory interactions. This article outlines the advances in Pathway Tools in the past 5 years. Major additions include components for metabolic modeling, metabolic route search, computation of atom mappings and estimation of compound Gibbs free energies of formation; addition of editors for signaling pathways, for genome sequences and for cellular architecture; storage of gene essentiality data and phenotype data; display of multiple alignments, and of signaling and electron-transport pathways; and development of Python and web-services application programming interfaces. Scientists around the world have created more than 9800 Pathway/Genome Databases by using Pathway Tools, many of which are curated databases for important model organisms. © The Author 2015. Published by Oxford University Press.

  1. Enhancing a Pathway-Genome Database (PGDB) to capture subcellular localization of metabolites and enzymes: the nucleotide-sugar biosynthetic pathways of Populus trichocarpa.

    PubMed

    Nag, Ambarish; Karpinets, Tatiana V; Chang, Christopher H; Bar-Peled, Maor

    2012-01-01

    Understanding how cellular metabolism works and is regulated requires that the underlying biochemical pathways be adequately represented and integrated with large metabolomic data sets to establish a robust network model. Genetically engineering energy crops to be less recalcitrant to saccharification requires detailed knowledge of plant polysaccharide structures and a thorough understanding of the metabolic pathways involved in forming and regulating cell-wall synthesis. Nucleotide-sugars are building blocks for synthesis of cell wall polysaccharides. The biosynthesis of nucleotide-sugars is catalyzed by a multitude of enzymes that reside in different subcellular organelles, and precise representation of these pathways requires accurate capture of this biological compartmentalization. The lack of simple localization cues in genomic sequence data and annotations however leads to missing compartmentalization information for eukaryotes in automatically generated databases, such as the Pathway-Genome Databases (PGDBs) of the SRI Pathway Tools software that drives much biochemical knowledge representation on the internet. In this report, we provide an informal mechanism using the existing Pathway Tools framework to integrate protein and metabolite sub-cellular localization data with the existing representation of the nucleotide-sugar metabolic pathways in a prototype PGDB for Populus trichocarpa. The enhanced pathway representations have been successfully used to map SNP abundance data to individual nucleotide-sugar biosynthetic genes in the PGDB. The manually curated pathway representations are more conducive to the construction of a computational platform that will allow the simulation of natural and engineered nucleotide-sugar precursor fluxes into specific recalcitrant polysaccharide(s). 
Database URL: The curated Populus PGDB is available in the BESC public portal at http://cricket.ornl.gov/cgi-bin/beocyc_home.cgi and the nucleotide-sugar biosynthetic pathways can be directly accessed at http://cricket.ornl.gov:1555/PTR/new-image?object=SUGAR-NUCLEOTIDES.

  2. Enhancing a Pathway-Genome Database (PGDB) to capture subcellular localization of metabolites and enzymes: the nucleotide-sugar biosynthetic pathways of Populus trichocarpa

    PubMed Central

    Nag, Ambarish; Karpinets, Tatiana V.; Chang, Christopher H.; Bar-Peled, Maor

    2012-01-01

    Understanding how cellular metabolism works and is regulated requires that the underlying biochemical pathways be adequately represented and integrated with large metabolomic data sets to establish a robust network model. Genetically engineering energy crops to be less recalcitrant to saccharification requires detailed knowledge of plant polysaccharide structures and a thorough understanding of the metabolic pathways involved in forming and regulating cell-wall synthesis. Nucleotide-sugars are building blocks for synthesis of cell wall polysaccharides. The biosynthesis of nucleotide-sugars is catalyzed by a multitude of enzymes that reside in different subcellular organelles, and precise representation of these pathways requires accurate capture of this biological compartmentalization. The lack of simple localization cues in genomic sequence data and annotations however leads to missing compartmentalization information for eukaryotes in automatically generated databases, such as the Pathway-Genome Databases (PGDBs) of the SRI Pathway Tools software that drives much biochemical knowledge representation on the internet. In this report, we provide an informal mechanism using the existing Pathway Tools framework to integrate protein and metabolite sub-cellular localization data with the existing representation of the nucleotide-sugar metabolic pathways in a prototype PGDB for Populus trichocarpa. The enhanced pathway representations have been successfully used to map SNP abundance data to individual nucleotide-sugar biosynthetic genes in the PGDB. The manually curated pathway representations are more conducive to the construction of a computational platform that will allow the simulation of natural and engineered nucleotide-sugar precursor fluxes into specific recalcitrant polysaccharide(s). 
Database URL: The curated Populus PGDB is available in the BESC public portal at http://cricket.ornl.gov/cgi-bin/beocyc_home.cgi and the nucleotide-sugar biosynthetic pathways can be directly accessed at http://cricket.ornl.gov:1555/PTR/new-image?object=SUGAR-NUCLEOTIDES. PMID:22465851

  3. Online Analytical Processing (OLAP): A Fast and Effective Data Mining Tool for Gene Expression Databases

    PubMed Central

    2005-01-01

    Gene expression databases contain a wealth of information, but current data mining tools are limited in their speed and effectiveness in extracting meaningful biological knowledge from them. Online analytical processing (OLAP) can be used as a supplement to cluster analysis for fast and effective data mining of gene expression databases. We used Analysis Services 2000, a product that ships with SQL Server 2000, to construct an OLAP cube that was used to mine a time series experiment designed to identify genes associated with resistance of soybean to the soybean cyst nematode, a devastating pest of soybean. The data for these experiments are stored in the soybean genomics and microarray database (SGMD). A number of candidate resistance genes and pathways were found. Compared to traditional cluster analysis of gene expression data, OLAP was more effective and faster in finding biologically meaningful information. OLAP is available from a number of vendors and can work with any relational database management system through OLE DB. PMID:16046824
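
    The core OLAP operation, rolling a measure up along chosen dimensions of a cube, can be sketched in a few lines. The fact table, the dimension names and the `roll_up` function below are hypothetical stand-ins; they do not reproduce the SGMD schema or the Analysis Services cube used in the study.

```python
from collections import defaultdict

# Hypothetical fact table: (gene, genotype, hours, expression).
DIMENSIONS = ("gene", "genotype", "hours")
FACTS = [
    ("gene-1", "resistant", 6, 4.0),
    ("gene-1", "resistant", 12, 6.0),
    ("gene-1", "susceptible", 6, 1.0),
    ("gene-1", "susceptible", 12, 1.0),
    ("gene-2", "resistant", 6, 2.0),
    ("gene-2", "susceptible", 6, 2.0),
]


def roll_up(facts, keep):
    """Average the expression measure over every dimension not in `keep`."""
    positions = [DIMENSIONS.index(name) for name in keep]
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in facts:
        key = tuple(row[i] for i in positions)
        sums[key] += row[3]  # the measure is stored in the last column
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


# Collapse the time dimension: mean expression per gene and genotype.
by_gene_genotype = roll_up(FACTS, ("gene", "genotype"))
for key in sorted(by_gene_genotype):
    print(key, by_gene_genotype[key])
```

    In this toy data, gene-1 averages 5.0 in the resistant genotype and 1.0 in the susceptible one, so a roll-up over time would flag it as a candidate, while gene-2 shows no difference.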

  4. Automated detection of discourse segment and experimental types from the text of cancer pathway results sections.

    PubMed

    Burns, Gully A P C; Dasigi, Pradeep; de Waard, Anita; Hovy, Eduard H

    2016-01-01

    Automated machine-reading biocuration systems typically use sentence-by-sentence information extraction to construct meaning representations for use by curators. This does not directly reflect the typical discourse structure used by scientists to construct an argument from the experimental data available within an article, and is therefore less likely to correspond to representations typically used in biomedical informatics systems (let alone to the mental models that scientists have). In this study, we develop Natural Language Processing methods to locate, extract, and classify the individual passages of text from articles' Results sections that refer to experimental data. In our domain of interest (molecular biology studies of cancer signal transduction pathways), individual articles may contain as many as 30 small-scale individual experiments describing a variety of findings, upon which authors base their overall research conclusions. Our system automatically classifies discourse segments in these texts into seven categories (fact, hypothesis, problem, goal, method, result, implication) with an F-score of 0.68. These segments describe the essential building blocks of scientific discourse to (i) provide context for each experiment, (ii) report experimental details and (iii) explain the data's meaning in context. We evaluate our system on text passages from articles that were curated in molecular biology databases (the Pathway Logic Datum repository, the Molecular Interaction MINT and INTACT databases) linking individual experiments in articles to the type of assay used (coprecipitation, phosphorylation, translocation etc.). We use supervised machine learning techniques on text passages containing unambiguous references to experiments to obtain baseline F1 scores of 0.59 for MINT, 0.71 for INTACT and 0.63 for Pathway Logic. 
Although preliminary, these results support the notion that targeting information extraction methods to experimental results could provide accurate, automated methods for biocuration. We also suggest the need for finer-grained curation of experimental methods used when constructing molecular biology databases. © The Author(s) 2016. Published by Oxford University Press.

  5. Linking microarray reporters with protein functions.

    PubMed

    Gaj, Stan; van Erk, Arie; van Haaften, Rachel I M; Evelo, Chris T A

    2007-09-26

    The analysis of microarray experiments requires accurate and up-to-date functional annotation of the microarray reporters to optimize the interpretation of the biological processes involved. Pathway visualization tools are used to connect gene expression data with existing biological pathways by using specific database identifiers that link reporters with elements in the pathways. This paper proposes a novel method that aims to improve microarray reporter annotation by BLASTing the original reporter sequences against a species-specific EMBL subset that was derived from and crosslinked back to the highly curated UniProt database. The resulting alignments were filtered using high quality alignment criteria and further compared with the outcome of a more traditional approach, where reporter sequences were BLASTed against EnsEMBL followed by locating the corresponding protein (UniProt) entry for the high quality hits. Combining the results of both methods resulted in successful annotation of > 58% of all reporter sequences with UniProt IDs on two commercial array platforms, increasing the number of Incyte reporters that could be coupled to Gene Ontology terms from 32.7% to 58.3% and to a local GenMAPP pathway from 9.6% to 16.7%. For Agilent, 35.3% of the total reporters are now linked to GO nodes and 7.1% to local pathways. Our methods increased the annotation quality of microarray reporter sequences and allowed us to visualize more reporters using pathway visualization tools. Even in cases where the original reporter annotation showed the correct description, the new identifiers often allowed improved pathway and Gene Ontology linking. These methods are freely available at http://www.bigcat.unimaas.nl/public/publications/Gaj_Annotation/.

  6. Database constraints applied to metabolic pathway reconstruction tools.

    PubMed

    Vilaplana, Jordi; Solsona, Francesc; Teixido, Ivan; Usié, Anabel; Karathia, Hiren; Alves, Rui; Mateo, Jordi

    2014-01-01

    Our group developed two biological applications, Biblio-MetReS and Homol-MetReS, accessing the same database of organisms with annotated genes. Biblio-MetReS is a data-mining application that facilitates the reconstruction of molecular networks based on automated text-mining analysis of published scientific literature. Homol-MetReS allows functional (re)annotation of proteomes, to properly identify both the individual proteins involved in the process(es) of interest and their function. It also enables the sets of proteins involved in the process(es) in different organisms to be compared directly. The efficiency of these biological applications is directly related to the design of the shared database. We classified and analyzed the different kinds of access to the database. Based on this study, we tried to adjust and tune the configurable parameters of the database server to reach the best performance of the communication data link to/from the database system. Different database technologies were analyzed. We started the study with a public relational SQL database, MySQL. Then, the same database was implemented by a MapReduce-based database named HBase. The results indicated that the standard configuration of MySQL gives an acceptable performance for low or medium size databases. Nevertheless, tuning database parameters can greatly improve the performance and lead to very competitive runtimes.

  7. Global Metabolic Reconstruction and Metabolic Gene Evolution in the Cattle Genome

    PubMed Central

    Kim, Woonsu; Park, Hyesun; Seo, Seongwon

    2016-01-01

    The sequence of the cattle genome provided a valuable opportunity to systematically link genetic and metabolic traits of cattle. The objectives of this study were 1) to reconstruct genome-scale cattle-specific metabolic pathways based on the most recent and updated cattle genome build and 2) to identify duplicated metabolic genes in the cattle genome for better understanding of metabolic adaptations in cattle. A bioinformatic pipeline for amalgamating genomic annotations of an organism from multiple sources was updated. Using this pipeline, an amalgamated cattle genome database based on UMD_3.1 was created. The amalgamated cattle genome database is composed of a total of 33,292 genes: 19,123 consensus genes between NCBI and Ensembl databases, 8,410 and 5,493 genes only found in NCBI or Ensembl, respectively, and 266 genes from NCBI scaffolds. A metabolic reconstruction of the cattle genome and cattle pathway genome database (PGDB) was also developed using Pathway Tools, followed by an intensive manual curation. The manual curation filled or revised 68 pathway holes, deleted 36 metabolic pathways, and added 23 metabolic pathways. Consequently, the curated cattle PGDB contains 304 metabolic pathways, 2,460 reactions including 2,371 enzymatic reactions, and 4,012 enzymes. Furthermore, this study identified eight duplicated genes in 12 metabolic pathways in the cattle genome compared to human and mouse. Some of these duplicated genes are related to specific hormone biosynthesis and detoxification. The updated genome-scale metabolic reconstruction is a useful tool for understanding biology and metabolic characteristics in cattle. There have been significant improvements in the quality of cattle genome annotations and the MetaCyc database. The duplicated metabolic genes in the cattle genome compared to human and mouse imply evolutionary changes in the cattle genome and provide useful information for further research on understanding metabolic adaptations of cattle. PMID:26992093

  8. Causal biological network database: a comprehensive platform of causal biological network models focused on the pulmonary and vascular systems.

    PubMed

    Boué, Stéphanie; Talikka, Marja; Westra, Jurjen Willem; Hayes, William; Di Fabio, Anselmo; Park, Jennifer; Schlage, Walter K; Sewer, Alain; Fields, Brett; Ansari, Sam; Martin, Florian; Veljkovic, Emilija; Kenney, Renee; Peitsch, Manuel C; Hoeng, Julia

    2015-01-01

    With the wealth of publications and data available, powerful and transparent computational approaches are required to represent measured data and scientific knowledge in a computable and searchable format. We developed a set of biological network models, scripted in the Biological Expression Language, that reflect causal signaling pathways across a wide range of biological processes, including cell fate, cell stress, cell proliferation, inflammation, tissue repair and angiogenesis in the pulmonary and cardiovascular context. This comprehensive collection of networks is now freely available to the scientific community in a centralized web-based repository, the Causal Biological Network database, which is composed of over 120 manually curated and well annotated biological network models and can be accessed at http://causalbionet.com. The website accesses a MongoDB, which stores all versions of the networks as JSON objects and allows users to search for genes, proteins, biological processes, small molecules and keywords in the network descriptions to retrieve biological networks of interest. The content of the networks can be visualized and browsed. Nodes and edges can be filtered and all supporting evidence for the edges can be browsed and is linked to the original articles in PubMed. Moreover, networks may be downloaded for further visualization and evaluation. Database URL: http://causalbionet.com © The Author(s) 2015. Published by Oxford University Press.

  9. Pathway results from the chicken data set using GOTM, Pathway Studio and Ingenuity softwares

    PubMed Central

    Bonnet, Agnès; Lagarrigue, Sandrine; Liaubet, Laurence; Robert-Granié, Christèle; SanCristobal, Magali; Tosser-Klopp, Gwenola

    2009-01-01

    Background: As presented in the introduction paper, three sets of differentially regulated genes were found after the analysis of the chicken infection data set from EADGENE. Different methods were used to interpret these results. Results: GOTM, Pathway Studio and Ingenuity software were used to investigate the three lists of genes. All three tools allowed the analysis of the data and highlighted different networks. However, only one set of genes, showing differential expression between primary and secondary response, gave a significant biological interpretation. Conclusion: Combining these databases, which were developed independently on different annotation sources, supplies a useful tool for a global biological interpretation of microarray data, even if they may contain some imperfections (e.g. genes that are not annotated or not well annotated). PMID:19615111

  10. A Systems Biology Strategy to Identify Molecular Mechanisms of Action and Protein Indicators of Traumatic Brain Injury

    DTIC Science & Technology

    2014-11-14

    2 Xueping Yu,1 Bhaskar Dutta,1 Jacob D. Feala,1 Kara Schmid,2 Jitendra Dave,2 Gregory J. Tawa,1 Anders Wallqvist,1 and Jaques Reifman1* 1Department of...pathway.html), downloaded in December, 2011. KEGG, one of the largest and most widely used publicly available pathway databases, annotates pathways...Ansari MA, Roberts KN, Scheff SW. 2008b. A time course of contusion-induced oxidative stress and synaptic proteins in cortex in a rat model of TBI. J

  11. Metabolome searcher: a high throughput tool for metabolite identification and metabolic pathway mapping directly from mass spectrometry and using genome restriction.

    PubMed

    Dhanasekaran, A Ranjitha; Pearson, Jon L; Ganesan, Balasubramanian; Weimer, Bart C

    2015-02-25

    Mass spectrometric analysis of microbial metabolism provides a long list of possible compounds. Restricting the identification of the possible compounds to those produced by the specific organism would benefit the identification process. Currently, identification of mass spectrometry (MS) data is commonly done using empirically derived compound databases. Unfortunately, most databases contain relatively few compounds, leaving long lists of unidentified molecules. Incorporating genome-encoded metabolism enables MS output identification that may not be included in databases. Using an organism's genome as a database restricts metabolite identification to only those compounds that the organism can produce. To address the challenge of metabolomic analysis from MS data, a web-based application to directly search genome-constructed metabolic databases was developed. The user query returns a genome-restricted list of possible compound identifications along with the putative metabolic pathways based on the name, formula, SMILES structure, and the compound mass as defined by the user. Multiple queries can be done simultaneously by submitting a text file created by the user or obtained from the MS analysis software. The user can also provide parameters specific to the experiment's MS analysis conditions, such as mass deviation, adducts, and detection mode during the query so as to provide additional levels of evidence to produce the tentative identification. The query results are provided as an HTML page and downloadable text file of possible compounds that are restricted to a specific genome. Hyperlinks provided in the HTML file connect the user to the curated metabolic databases housed in ProCyc, a Pathway Tools platform, as well as the KEGG Pathway database for visualization and metabolic pathway analysis. Metabolome Searcher, a web-based tool, facilitates putative compound identification of MS output based on genome-restricted metabolic capability. 
This enables researchers to rapidly extend the possible identifications of large data sets for metabolites that are not in compound databases. Putative compound names with their associated metabolic pathways from metabolomics data sets are returned to the user for additional biological interpretation and visualization. This novel approach enables compound identification by restricting the possible masses to those encoded in the genome.
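
    The genome-restricted mass lookup described in this record can be sketched as follows. The sketch assumes singly charged ions, considers only the [M+H]+ and [M-H]- adducts, and uses a parts-per-million tolerance for the mass deviation; the compound list, the `match_mass` function and its defaults are hypothetical and are not taken from Metabolome Searcher.

```python
PROTON_MASS = 1.007276  # approximate proton mass in Da

# Mass shift of the observed ion relative to the neutral molecule, assuming
# singly charged [M+H]+ (positive mode) and [M-H]- (negative mode) adducts.
ADDUCT_SHIFT = {"positive": PROTON_MASS, "negative": -PROTON_MASS}

# Hypothetical genome-restricted compound list: name -> monoisotopic mass (Da).
GENOME_COMPOUNDS = {
    "cpd-1": 180.0634,
    "cpd-2": 180.0700,
    "cpd-3": 146.0579,
}


def match_mass(observed_mz, mode, compounds, tolerance_ppm=5.0):
    """Return (name, error in ppm) for compounds within the mass tolerance."""
    neutral_mass = observed_mz - ADDUCT_SHIFT[mode]
    hits = []
    for name, mass in compounds.items():
        error_ppm = abs(neutral_mass - mass) / mass * 1e6
        if error_ppm <= tolerance_ppm:
            hits.append((name, error_ppm))
    # Best match (smallest mass error) first.
    return sorted(hits, key=lambda hit: hit[1])


hits = match_mass(181.0707, "positive", GENOME_COMPOUNDS)
for name, error_ppm in hits:
    print(f"{name}: {error_ppm:.2f} ppm")
```

    Restricting the candidate dictionary to compounds that the organism's genome can produce is what narrows the list; widening `tolerance_ppm` trades specificity for recall.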

  12. Pathway Analysis Revealed Potential Diverse Health Impacts of Flavonoids that Bind Estrogen Receptors

    PubMed Central

    Ye, Hao; Ng, Hui Wen; Sakkiah, Sugunadevi; Ge, Weigong; Perkins, Roger; Tong, Weida; Hong, Huixiao

    2016-01-01

    Flavonoids are frequently used as dietary supplements in the absence of research evidence regarding health benefits or toxicity. Furthermore, ingested doses could far exceed those received from diet in the course of normal living. Some flavonoids exhibit binding to estrogen receptors (ERs) with consequential vigilance by regulatory authorities at the U.S. EPA and FDA. Regulatory authorities must consider both beneficial claims and potential adverse effects, warranting the increase in research that has spanned almost two decades. Here, we report pathway enrichment of 14 targets from the Comparative Toxicogenomics Database (CTD) and the Herbal Ingredients’ Targets (HIT) database for 22 flavonoids that bind ERs. The selected flavonoids are confirmed ER binders from our earlier studies, and were found here to be mainly involved in three types of biological processes: ER regulation, estrogen metabolism and synthesis, and apoptosis. Besides cancers, we conjecture that the flavonoids may affect several diseases via apoptosis pathways. Diseases such as amyotrophic lateral sclerosis, viral myocarditis and non-alcoholic fatty liver disease could be implicated. More generally, apoptosis processes may be importantly evolved biological functions of flavonoids that bind ERs and high dose ingestion of those flavonoids could adversely disrupt the cellular apoptosis process. PMID:27023590

  13. A novel dysregulated pathway-identification analysis based on global influence of within-pathway effects and crosstalk between pathways

    PubMed Central

    Han, Junwei; Li, Chunquan; Yang, Haixiu; Xu, Yanjun; Zhang, Chunlong; Ma, Jiquan; Shi, Xinrui; Liu, Wei; Shang, Desi; Yao, Qianlan; Zhang, Yunpeng; Su, Fei; Feng, Li; Li, Xia

    2015-01-01

    Identifying dysregulated pathways from high-throughput experimental data in order to infer underlying biological insights is an important task. Current pathway-identification methods focus on single pathways in isolation; however, consideration of crosstalk between pathways could improve our understanding of alterations in biological states. We propose a novel method of pathway analysis based on global influence (PAGI) to identify dysregulated pathways, by considering both within-pathway effects and crosstalk between pathways. We constructed a global gene–gene network based on the relationships among genes extracted from a pathway database. We then evaluated the extent of differential expression for each gene, and mapped them to the global network. The random walk with restart algorithm was used to calculate the extent of genes affected by global influence. Finally, we used cumulative distribution functions to determine the significance values of the dysregulated pathways. We applied the PAGI method to five cancer microarray datasets, and compared our results with gene set enrichment analysis and five other methods. Based on these analyses, we demonstrated that PAGI can effectively identify dysregulated pathways associated with cancer, with strong reproducibility and robustness. We implemented PAGI using the freely available R-based and Web-based tools (http://bioinfo.hrbmu.edu.cn/PAGI). PMID:25551156
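
    The random walk with restart step mentioned above can be illustrated with a short sketch. This is not the PAGI implementation: the restart probability of 0.7, the function name and the dictionary-based graph representation are assumptions chosen for illustration, and the graph is assumed to be undirected, with every neighbour also present as a key.

```python
def random_walk_with_restart(adjacency, seed_scores, restart=0.7,
                             tol=1e-10, max_iter=1000):
    """Propagate seed scores over an undirected gene-gene network.

    adjacency   -- dict mapping each node to a set of neighbouring nodes
    seed_scores -- dict mapping nodes to non-negative starting weights
                   (for example, a differential-expression score)
    restart     -- probability of jumping back to the seeds (assumed value)
    Returns a dict of steady-state visiting probabilities.
    """
    nodes = sorted(adjacency)
    total = sum(seed_scores.get(n, 0.0) for n in nodes)
    if total <= 0:
        raise ValueError("at least one seed must have a positive score")
    p0 = {n: seed_scores.get(n, 0.0) / total for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for n in nodes:
            neighbours = adjacency[n]
            if not neighbours:
                # Isolated node: send its walking mass back to the seeds.
                for m in nodes:
                    nxt[m] += (1.0 - restart) * p[n] * p0[m]
                continue
            share = (1.0 - restart) * p[n] / len(neighbours)
            for m in neighbours:
                nxt[m] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p
```

    On a chain A-B-C-D seeded only at A, the returned scores decay with distance from A, which is the "global influence" intuition: genes close to strongly altered genes receive higher scores than distant ones.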

  14. Exploring Genetic, Genomic, and Phenotypic Data at the Rat Genome Database

    PubMed Central

    Laulederkind, Stanley J. F.; Hayman, G. Thomas; Wang, Shur-Jen; Lowry, Timothy F.; Nigam, Rajni; Petri, Victoria; Smith, Jennifer R.; Dwinell, Melinda R.; Jacob, Howard J.; Shimoyama, Mary

    2013-01-01

    The laboratory rat, Rattus norvegicus, is an important model of human health and disease, and experimental findings in the rat have relevance to human physiology and disease. The Rat Genome Database (RGD, http://rgd.mcw.edu) is a model organism database that provides access to a wide variety of curated rat data including disease associations, phenotypes, pathways, molecular functions, biological processes and cellular components for genes, quantitative trait loci, and strains. We present an overview of the database followed by specific examples that can be used to gain experience in employing RGD to explore the wealth of functional data available for the rat. PMID:23255149

  15. BIOZON: a system for unification, management and analysis of heterogeneous biological data.

    PubMed

    Birkland, Aaron; Yona, Golan

    2006-02-15

    Integration of heterogeneous data types is a challenging problem, especially in biology, where the number of databases and data types increase rapidly. Amongst the problems that one has to face are integrity, consistency, redundancy, connectivity, expressiveness and updatability. Here we present a system (Biozon) that addresses these problems, and offers biologists a new knowledge resource to navigate through and explore. Biozon unifies multiple biological databases consisting of a variety of data types (such as DNA sequences, proteins, interactions and cellular pathways). It is fundamentally different from previous efforts as it uses a single extensive and tightly connected graph schema wrapped with hierarchical ontology of documents and relations. Beyond warehousing existing data, Biozon computes and stores novel derived data, such as similarity relationships and functional predictions. The integration of similarity data allows propagation of knowledge through inference and fuzzy searches. Sophisticated methods of query that span multiple data types were implemented and first-of-a-kind biological ranking systems were explored and integrated. The Biozon system is an extensive knowledge resource of heterogeneous biological data. Currently, it holds more than 100 million biological documents and 6.5 billion relations between them. The database is accessible through an advanced web interface that supports complex queries, "fuzzy" searches, data materialization and more, online at http://biozon.org.

  16. Correcting ligands, metabolites, and pathways

    PubMed Central

    Ott, Martin A; Vriend, Gert

    2006-01-01

    Background A wide range of research areas in bioinformatics, molecular biology and medicinal chemistry require precise chemical structure information about molecules and reactions, e.g. drug design, ligand docking, metabolic network reconstruction, and systems biology. Most available databases, however, treat chemical structures more as illustrations than as a datafield in its own right. Lack of chemical accuracy impedes progress in the areas mentioned above. We present a database of metabolites called BioMeta that augments the existing pathway databases by explicitly assessing the validity, correctness, and completeness of chemical structure and reaction information. Description The main bulk of the data in BioMeta were obtained from the KEGG Ligand database. We developed a tool for chemical structure validation which assesses the chemical validity and stereochemical completeness of a molecule description. The validation tool was used to examine the compounds in BioMeta, showing that a relatively small number of compounds had an incorrect constitution (connectivity only, not considering stereochemistry) and that a considerable number (about one third) had incomplete or even incorrect stereochemistry. We made a large effort to correct the errors and to complete the structural descriptions. A total of 1468 structures were corrected and/or completed. We also established the reaction balance of the reactions in BioMeta and corrected 55% of the unbalanced (stoichiometrically incorrect) reactions in an automatic procedure. The BioMeta database was implemented in PostgreSQL and provided with a web-based interface. Conclusion We demonstrate that the validation of metabolite structures and reactions is a feasible and worthwhile undertaking, and that the validation results can be used to trigger corrections and improvements to BioMeta, our metabolite database. BioMeta provides some tools for rational drug design, reaction searches, and visualization. 
It is freely available at provided that the copyright notice of all original data is cited. The database will be useful for querying and browsing biochemical pathways, and to obtain reference information for identifying compounds. However, these applications require that the underlying data be correct, and that is the focus of BioMeta. PMID:17132165
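
    The reaction-balance check described above can be sketched as a comparison of atom counts on both sides of a reaction. This is only an illustration, not the BioMeta procedure: it assumes plain molecular formulas without parentheses, charges or isotopes, and all function names are hypothetical.

```python
import re
from collections import Counter

# Matches an element symbol followed by an optional count, e.g. "C6" or "O".
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def atom_counts(formula):
    """Count atoms in a simple formula such as 'C6H12O6'."""
    counts = Counter()
    for element, number in _TOKEN.findall(formula):
        counts[element] += int(number) if number else 1
    return counts


def side_counts(side):
    """Sum atom counts over a list of (coefficient, formula) pairs."""
    total = Counter()
    for coefficient, formula in side:
        for element, n in atom_counts(formula).items():
            total[element] += coefficient * n
    return total


def imbalance(substrates, products):
    """Return {element: products minus substrates} for unbalanced elements.

    An empty dict means the reaction is stoichiometrically balanced.
    """
    left, right = side_counts(substrates), side_counts(products)
    return {
        element: right[element] - left[element]
        for element in set(left) | set(right)
        if right[element] != left[element]
    }
```

    A non-empty result shows which elements are missing or in surplus, which is the kind of information an automatic correction step would need.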

  17. Molecular signatures database (MSigDB) 3.0.

    PubMed

    Liberzon, Arthur; Subramanian, Aravind; Pinchback, Reid; Thorvaldsdóttir, Helga; Tamayo, Pablo; Mesirov, Jill P

    2011-06-15

    Well-annotated gene sets representing the universe of the biological processes are critical for meaningful and insightful interpretation of large-scale genomic data. The Molecular Signatures Database (MSigDB) is one of the most widely used repositories of such sets. We report the availability of a new version of the database, MSigDB 3.0, with over 6700 gene sets, a complete revision of the collection of canonical pathways and experimental signatures from publications, enhanced annotations and upgrades to the web site. MSigDB is freely available for non-commercial use at http://www.broadinstitute.org/msigdb.

  18. Linking microarray reporters with protein functions

    PubMed Central

    Gaj, Stan; van Erk, Arie; van Haaften, Rachel IM; Evelo, Chris TA

    2007-01-01

    Background The analysis of microarray experiments requires accurate and up-to-date functional annotation of the microarray reporters to optimize the interpretation of the biological processes involved. Pathway visualization tools are used to connect gene expression data with existing biological pathways by using specific database identifiers that link reporters with elements in the pathways. Results This paper proposes a novel method that aims to improve microarray reporter annotation by BLASTing the original reporter sequences against a species-specific EMBL subset, that was derived from and crosslinked back to the highly curated UniProt database. The resulting alignments were filtered using high quality alignment criteria and further compared with the outcome of a more traditional approach, where reporter sequences were BLASTed against EnsEMBL followed by locating the corresponding protein (UniProt) entry for the high quality hits. Combining the results of both methods resulted in successful annotation of > 58% of all reporter sequences with UniProt IDs on two commercial array platforms, increasing the amount of Incyte reporters that could be coupled to Gene Ontology terms from 32.7% to 58.3% and to a local GenMAPP pathway from 9.6% to 16.7%. For Agilent, 35.3% of the total reporters are now linked towards GO nodes and 7.1% on local pathways. Conclusion Our methods increased the annotation quality of microarray reporter sequences and allowed us to visualize more reporters using pathway visualization tools. Even in cases where the original reporter annotation showed the correct description the new identifiers often allowed improved pathway and Gene Ontology linking. These methods are freely available at http://www.bigcat.unimaas.nl/public/publications/Gaj_Annotation/. PMID:17897448

  19. Database Constraints Applied to Metabolic Pathway Reconstruction Tools

    PubMed Central

    Vilaplana, Jordi; Solsona, Francesc; Teixido, Ivan; Usié, Anabel; Karathia, Hiren; Alves, Rui; Mateo, Jordi

    2014-01-01

    Our group developed two biological applications, Biblio-MetReS and Homol-MetReS, accessing the same database of organisms with annotated genes. Biblio-MetReS is a data-mining application that facilitates the reconstruction of molecular networks based on automated text-mining analysis of published scientific literature. Homol-MetReS allows functional (re)annotation of proteomes, to properly identify both the individual proteins involved in the process(es) of interest and their function. It also enables the sets of proteins involved in the process(es) in different organisms to be compared directly. The efficiency of these biological applications is directly related to the design of the shared database. We classified and analyzed the different kinds of access to the database. Based on this study, we tried to adjust and tune the configurable parameters of the database server to reach the best performance of the communication data link to/from the database system. Different database technologies were analyzed. We started the study with a public relational SQL database, MySQL. Then, the same database was implemented by a MapReduce-based database named HBase. The results indicated that the standard configuration of MySQL gives an acceptable performance for low or medium size databases. Nevertheless, tuning database parameters can greatly improve the performance and lead to very competitive runtimes. PMID:25202745

  20. Benchmarking pathway interaction network for colorectal cancer to identify dysregulated pathways.

    PubMed

    Wang, Q; Shi, C-J; Lv, S-H

    2017-03-30

    Different pathways act synergistically to participate in many biological processes. Thus, the purpose of our study was to extract dysregulated pathways to investigate the pathogenesis of colorectal cancer (CRC) based on the functional dependency among pathways. Protein-protein interaction (PPI) information and pathway data were retrieved from STRING and Reactome databases, respectively. After genes were aligned to the pathways, each pathway activity was calculated using the principal component analysis (PCA) method, and the seed pathway was discovered. Subsequently, we constructed the pathway interaction network (PIN), where each node represented a biological pathway based on gene expression profile, PPI data, as well as pathways. Dysregulated pathways were then selected from the PIN according to classification performance and seed pathway. A PIN including 11,960 interactions was constructed to identify dysregulated pathways. Interestingly, the interaction of mRNA splicing and mRNA splicing-major pathway had the highest score of 719.8167. Maximum change of the activity score between CRC and normal samples appeared in the pathway of DNA replication, which was selected as the seed pathway. Starting with this seed pathway, a pathway set containing 30 dysregulated pathways was obtained with an area under the curve score of 0.8598. The pathways of mRNA splicing, mRNA splicing-major pathway, and RNA polymerase I had the largest number of genes, 107. Moreover, we found that these 30 pathways showed crosstalk with one another. The results suggest that these dysregulated pathways might be used as biomarkers to diagnose CRC.
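
    The abstract gives no details of how PCA was turned into a pathway activity score. One common reading, assumed here, is that the activity of a pathway in each sample is that sample's score on the first principal component of the pathway's gene expression matrix. The sketch below uses plain power iteration so that it needs no third-party library; the function name and the sign convention are illustrative choices.

```python
def first_pc_scores(samples, iterations=500):
    """Score each sample on the first principal component.

    samples -- list of samples, each a list of expression values for the
               genes of one pathway (same gene order in every sample)
    Returns one activity score per sample. The sign of a principal
    component is arbitrary, so it is fixed here by making the sum of
    the loadings non-negative.
    """
    n, g = len(samples), len(samples[0])
    means = [sum(row[j] for row in samples) / n for j in range(g)]
    centered = [[row[j] - means[j] for j in range(g)] for row in samples]
    # Sample covariance matrix of the genes.
    cov = [
        [sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(g)]
        for a in range(g)
    ]
    # Power iteration from a non-uniform start vector.
    v = [float(j + 1) for j in range(g)]
    norm = sum(x * x for x in v) ** 0.5
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(g)) for a in range(g)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    if sum(v) < 0:
        v = [-x for x in v]
    return [sum(r[j] * v[j] for j in range(g)) for r in centered]
```

    Under this reading, the seed pathway would be the one whose mean score differs most between tumour and normal samples.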

  1. e-Science and biological pathway semantics

    PubMed Central

    Luciano, Joanne S; Stevens, Robert D

    2007-01-01

    Background The development of e-Science presents a major set of opportunities and challenges for the future progress of biological and life scientific research. Major new tools are required and corresponding demands are placed on the high-throughput data generated and used in these processes. Nowhere is the demand greater than in the semantic integration of these data. Semantic Web tools and technologies afford the chance to achieve this semantic integration. Since pathway knowledge is central to much of the scientific research today it is a good test-bed for semantic integration. Within the context of biological pathways, the BioPAX initiative, part of a broader movement towards the standardization and integration of life science databases, forms a necessary prerequisite for its successful application of e-Science in health care and life science research. This paper examines whether BioPAX, an effort to overcome the barrier of disparate and heterogeneous pathway data sources, addresses the needs of e-Science. Results We demonstrate how BioPAX pathway data can be used to ask and answer some useful biological questions. We find that BioPAX comes close to meeting a broad range of e-Science needs, but certain semantic weaknesses mean that these goals are missed. We make a series of recommendations for re-modeling some aspects of BioPAX to better meet these needs. Conclusion Once these semantic weaknesses are addressed, it will be possible to integrate pathway information in a manner that would be useful in e-Science. PMID:17493286

  2. The EBI SRS server-new features.

    PubMed

    Zdobnov, Evgeny M; Lopez, Rodrigo; Apweiler, Rolf; Etzold, Thure

    2002-08-01

    Here we report on recent developments at the EBI SRS server (http://srs.ebi.ac.uk). SRS has become an integration system for both data retrieval and sequence analysis applications. The EBI SRS server is a primary gateway to major databases in the field of molecular biology produced and supported at EBI, as well as a European public access point to the MEDLINE database provided by the US National Library of Medicine (NLM). It is a reference server for the latest developments in data and application integration. The new additions include: the concept of virtual databases; integration of XML databases such as the Integrated Resource of Protein Domains and Functional Sites (InterPro), Gene Ontology (GO), MEDLINE, Metabolic pathways, etc.; user-friendly data representation in 'Nice views'; and SRSQuickSearch bookmarklets. SRS6 is a licensed product of LION Bioscience AG freely available for academics. The EBI SRS server (http://srs.ebi.ac.uk) is a free central resource for molecular biology data as well as a reference server for the latest developments in data integration.

  3. PathMAPA: a tool for displaying gene expression and performing statistical tests on metabolic pathways at multiple levels for Arabidopsis.

    PubMed

    Pan, Deyun; Sun, Ning; Cheung, Kei-Hoi; Guan, Zhong; Ma, Ligeng; Holford, Matthew; Deng, Xingwang; Zhao, Hongyu

    2003-11-07

    To date, many genomic and pathway-related tools and databases have been developed to analyze microarray data. In published web-based applications to date, however, complex pathways have been displayed with static image files that may not be up-to-date or are time-consuming to rebuild. In addition, gene expression analyses focus on individual probes and genes with little or no consideration of pathways. These approaches reveal little information about pathways that are key to a full understanding of the building blocks of biological systems. Therefore, there is a need to provide useful tools that can generate pathways without manually building images and allow gene expression data to be integrated and analyzed at pathway levels for such experimental organisms as Arabidopsis. We have developed PathMAPA, a web-based application written in Java that can be easily accessed over the Internet. An Oracle database is used to store, query, and manipulate the large amounts of data that are involved. PathMAPA allows its users to (i) upload and populate microarray data into a database; (ii) integrate gene expression with enzymes of the pathways; (iii) generate pathway diagrams without building image files manually; (iv) visualize gene expressions for each pathway at enzyme, locus, and probe levels; and (v) perform statistical tests at pathway, enzyme and gene levels. PathMAPA can be used to examine Arabidopsis thaliana gene expression patterns associated with metabolic pathways. PathMAPA provides two unique features for the gene expression analysis of Arabidopsis thaliana: (i) automatic generation of pathways associated with gene expression and (ii) statistical tests at pathway level. The first feature allows for the periodical updating of genomic data for pathways, while the second feature can provide insight into how treatments affect relevant pathways for the selected experiment(s).

  4. PathMAPA: a tool for displaying gene expression and performing statistical tests on metabolic pathways at multiple levels for Arabidopsis

    PubMed Central

    Pan, Deyun; Sun, Ning; Cheung, Kei-Hoi; Guan, Zhong; Ma, Ligeng; Holford, Matthew; Deng, Xingwang; Zhao, Hongyu

    2003-01-01

    Background To date, many genomic and pathway-related tools and databases have been developed to analyze microarray data. In published web-based applications to date, however, complex pathways have been displayed with static image files that may not be up-to-date or are time-consuming to rebuild. In addition, gene expression analyses focus on individual probes and genes with little or no consideration of pathways. These approaches reveal little information about pathways that are key to a full understanding of the building blocks of biological systems. Therefore, there is a need to provide useful tools that can generate pathways without manually building images and allow gene expression data to be integrated and analyzed at pathway levels for such experimental organisms as Arabidopsis. Results We have developed PathMAPA, a web-based application written in Java that can be easily accessed over the Internet. An Oracle database is used to store, query, and manipulate the large amounts of data that are involved. PathMAPA allows its users to (i) upload and populate microarray data into a database; (ii) integrate gene expression with enzymes of the pathways; (iii) generate pathway diagrams without building image files manually; (iv) visualize gene expressions for each pathway at enzyme, locus, and probe levels; and (v) perform statistical tests at pathway, enzyme and gene levels. PathMAPA can be used to examine Arabidopsis thaliana gene expression patterns associated with metabolic pathways. Conclusion PathMAPA provides two unique features for the gene expression analysis of Arabidopsis thaliana: (i) automatic generation of pathways associated with gene expression and (ii) statistical tests at pathway level. The first feature allows for the periodical updating of genomic data for pathways, while the second feature can provide insight into how treatments affect relevant pathways for the selected experiment(s). PMID:14604444

  5. Bioinformatics approach reveals systematic mechanism underlying lung adenocarcinoma.

    PubMed

    Wu, Xiya; Zhang, Wei; Hu, Yunhua; Yi, Xianghua

    2015-01-01

    The purpose of this work was to explore the systematic molecular mechanism of lung adenocarcinoma and gain a deeper insight into it. Comprehensive bioinformatics methods were applied. Initially, significant differentially expressed genes (DEGs) were analyzed from the Affymetrix microarray data (GSE27262) deposited in the Gene Expression Omnibus (GEO). Subsequently, gene ontology (GO) analysis was performed using the online Database for Annotation, Visualization and Integrated Discovery (DAVID) software. Finally, significant pathway crosstalk was investigated based on the information derived from the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. According to our results, the N-terminal globular domain of the type X collagen (COL10A1) gene and transmembrane protein 100 (TMEM100) gene were identified to be the most significant DEGs in tumor tissue compared with the adjacent normal tissues. The main GO categories were biological process, cellular component and molecular function. In addition, the crosstalk was significantly different between non-small cell lung cancer pathways and inositol phosphate metabolism pathway, focal adhesion signal pathway, vascular smooth muscle contraction signal pathway, peroxisome proliferator-activated receptor (PPAR) signaling pathway and calcium signaling pathway in tumor. Dysfunctional genes and pathways may play key roles in the progression and development of lung adenocarcinoma. Our data provide a systematic perspective for understanding this mechanism and may be helpful in discovering an effective treatment for lung adenocarcinoma.

  6. Knowledge representation in metabolic pathway databases.

    PubMed

    Stobbe, Miranda D; Jansen, Gerbert A; Moerland, Perry D; van Kampen, Antoine H C

    2014-05-01

    The accurate representation of all aspects of a metabolic network in a structured format, such that it can be used for a wide variety of computational analyses, is a challenge faced by a growing number of researchers. Analysis of five major metabolic pathway databases reveals that each database has made widely different choices to address this challenge, including how to deal with knowledge that is uncertain or missing. In concise overviews, we show how concepts such as compartments, enzymatic complexes and the direction of reactions are represented in each database. Importantly, also concepts which a database does not represent are described. Which aspects of the metabolic network need to be available in a structured format and to what detail differs per application. For example, for in silico phenotype prediction, a detailed representation of gene-protein-reaction relations and the compartmentalization of the network is essential. Our analysis also shows that current databases are still limited in capturing all details of the biology of the metabolic network, further illustrated with a detailed analysis of three metabolic processes. Finally, we conclude that the conceptual differences between the databases, which make knowledge exchange and integration a challenge, have not been resolved, so far, by the exchange formats in which knowledge representation is standardized.

  7. Pathway-Based Genome-Wide Association Studies for Two Meat Production Traits in Simmental Cattle.

    PubMed

    Fan, Huizhong; Wu, Yang; Zhou, Xiaojing; Xia, Jiangwei; Zhang, Wengang; Song, Yuxin; Liu, Fei; Chen, Yan; Zhang, Lupei; Gao, Xue; Gao, Huijiang; Li, Junya

    2015-12-17

    Most single nucleotide polymorphisms (SNPs) detected by genome-wide association studies (GWAS) explain only a small fraction of phenotypic variation. Pathway-based GWAS were proposed to improve the proportion of genes for some human complex traits that could be explained by enriching a mass of SNPs within genetic groups. However, few attempts have been made to describe the quantitative traits in domestic animals. In this study, we used a dataset with approximately 7,700,000 SNPs from 807 Simmental cattle and analyzed live weight and longissimus muscle area using a modified pathway-based GWAS method to orthogonalise the highly linked SNPs within each gene using principal component analysis (PCA). As a result, of the 262 biological pathways of cattle collected from the KEGG database, the gamma aminobutyric acid (GABA)ergic synapse pathway and the non-alcoholic fatty liver disease (NAFLD) pathway were significantly associated with the two traits analyzed. The GABAergic synapse pathway was biologically applicable to the traits analyzed because of its roles in feed intake and weight gain. The proposed method had high statistical power and a low false discovery rate, compared to those of the smallest P-value and SNP set enrichment analysis methods.

  8. Comprehensive coverage of cardiovascular disease data in the disease portals at the Rat Genome Database.

    PubMed

    Wang, Shur-Jen; Laulederkind, Stanley J F; Hayman, G Thomas; Petri, Victoria; Smith, Jennifer R; Tutaj, Marek; Nigam, Rajni; Dwinell, Melinda R; Shimoyama, Mary

    2016-08-01

    Cardiovascular diseases are complex diseases caused by a combination of genetic and environmental factors. To facilitate progress in complex disease research, the Rat Genome Database (RGD) provides the community with a disease portal where genome objects and biological data related to cardiovascular diseases are systematically organized. The purpose of this study is to present biocuration at RGD, including disease, genetic, and pathway data. The RGD curation team uses controlled vocabularies/ontologies to organize data curated from the published literature or imported from disease and pathway databases. These organized annotations are associated with genes, strains, and quantitative trait loci (QTLs), thus linking functional annotations to genome objects. Screen shots from the web pages are used to demonstrate the organization of annotations at RGD. The human cardiovascular disease genes identified by annotations were grouped according to data sources and their annotation profiles were compared by in-house tools and other enrichment tools available to the public. The analysis results show that the imported cardiovascular disease genes from ClinVar and OMIM are functionally different from the RGD manually curated genes in terms of pathway and Gene Ontology annotations. The inclusion of disease genes from other databases enriches the collection of disease genes not only in quantity but also in quality. Copyright © 2016 the American Physiological Society.

  9. The systematic annotation of the three main GPCR families in Reactome.

    PubMed

    Jassal, Bijay; Jupe, Steven; Caudy, Michael; Birney, Ewan; Stein, Lincoln; Hermjakob, Henning; D'Eustachio, Peter

    2010-07-29

    Reactome is an open-source, freely available database of human biological pathways and processes. A major goal of our work is to provide an integrated view of cellular signalling processes that spans from ligand-receptor interactions to molecular readouts at the level of metabolic and transcriptional events. To this end, we have built the first catalogue of all human G protein-coupled receptors (GPCRs) known to bind endogenous or natural ligands. The UniProt database has records for 797 proteins classified as GPCRs and sorted into families A/1, B/2 and C/3 on the basis of amino acid sequence. To these records we have added details from the IUPHAR database and our own manual curation of relevant literature to create reactions in which 563 GPCRs bind ligands and also interact with specific G-proteins to initiate signalling cascades. We believe the remaining 234 GPCRs are true orphans. The Reactome GPCR pathway can be viewed as a detailed interactive diagram and can be exported in many forms. It provides a template for the orthology-based inference of GPCR reactions for diverse model organism species, and can be overlaid with protein-protein interaction and gene expression datasets to facilitate overrepresentation studies and other forms of pathway analysis. Database URL: http://www.reactome.org.

  10. The European Bioinformatics Institute's data resources: towards systems biology.

    PubMed

    Brooksbank, Catherine; Cameron, Graham; Thornton, Janet

    2005-01-01

    Genomic and post-genomic biological research has provided fine-grain insights into the molecular processes of life, but also threatens to drown biomedical researchers in data. Moreover, as new high-throughput technologies are developed, the types of data that are gathered en masse are diversifying. The need to collect, store and curate all this information in ways that allow its efficient retrieval and exploitation is greater than ever. The European Bioinformatics Institute's (EBI's) databases and tools have evolved to meet the changing needs of molecular biologists: since we last wrote about our services in the 2003 issue of Nucleic Acids Research, we have launched new databases covering protein-protein interactions (IntAct), pathways (Reactome) and small molecules (ChEBI). Our existing core databases have continued to evolve to meet the changing needs of biomedical researchers, and we have developed new data-access tools that help biologists to move intuitively through the different data types, thereby helping them to put the parts together to understand biology at the systems level. The EBI's data resources are all available on our website at http://www.ebi.ac.uk.

  11. The European Bioinformatics Institute's data resources: towards systems biology

    PubMed Central

    Brooksbank, Catherine; Cameron, Graham; Thornton, Janet

    2005-01-01

    Genomic and post-genomic biological research has provided fine-grain insights into the molecular processes of life, but also threatens to drown biomedical researchers in data. Moreover, as new high-throughput technologies are developed, the types of data that are gathered en masse are diversifying. The need to collect, store and curate all this information in ways that allow its efficient retrieval and exploitation is greater than ever. The European Bioinformatics Institute's (EBI's) databases and tools have evolved to meet the changing needs of molecular biologists: since we last wrote about our services in the 2003 issue of Nucleic Acids Research, we have launched new databases covering protein–protein interactions (IntAct), pathways (Reactome) and small molecules (ChEBI). Our existing core databases have continued to evolve to meet the changing needs of biomedical researchers, and we have developed new data-access tools that help biologists to move intuitively through the different data types, thereby helping them to put the parts together to understand biology at the systems level. The EBI's data resources are all available on our website at http://www.ebi.ac.uk. PMID:15608238

  12. In silico database screening of potential targets and pathways of compounds contained in plants used for psoriasis vulgaris.

    PubMed

    May, Brian H; Deng, Shiqiang; Zhang, Anthony L; Lu, Chuanjian; Xue, Charlie C L

    2015-09-01

    Reviews and meta-analyses of clinical trials identified plants used as traditional medicines (TMs) that show promise for psoriasis. These include Rehmannia glutinosa, Camptotheca acuminata, Indigo naturalis and Salvia miltiorrhiza. Compounds contained in these TMs have shown activities of relevance to psoriasis in experimental models. To further investigate the likely mechanisms of action of the multiple compounds in these TMs, we undertook a computer-based in silico investigation of the proteins known to be regulated by these compounds and their associated biological pathways. The proteins reportedly regulated by compounds in these four TMs were identified using the HIT (Herbal Ingredients' Targets) database. The resultant data were entered into the PANTHER (Protein ANnotation THrough Evolutionary Relationship) database to identify the pathways in which the proteins could be involved. The study identified 237 compounds in the TMs and these retrieved 287 proteins from HIT. These proteins identified 59 pathways in PANTHER with most proteins being located in the Apoptosis, Angiogenesis, Inflammation mediated by chemokine and cytokine, Gonadotropin releasing hormone receptor, and/or Interleukin signaling pathways. All four TMs contained compounds that had regulating effects on Apoptosis regulator BAX, Apoptosis regulator Bcl-2, Caspase-3, Tumor necrosis factor (TNF) or Prostaglandin G/H synthase 2 (COX2). The main proteins and pathways are primarily related to inflammation, proliferation and angiogenesis which are all processes involved in psoriasis. Experimental studies have reported that certain compounds from these TMs can regulate the expression of proteins involved in each of these pathways.

  13. Detecting uber-operons in prokaryotic genomes.

    PubMed

    Che, Dongsheng; Li, Guojun; Mao, Fenglou; Wu, Hongwei; Xu, Ying

    2006-01-01

    We present a study on computational identification of uber-operons in a prokaryotic genome, each of which represents a group of operons that are evolutionarily or functionally associated through operons in other (reference) genomes. Uber-operons represent a rich set of footprints of operon evolution, whose full utilization could lead to new and more powerful tools for elucidation of biological pathways and networks than what operons have provided, and a better understanding of prokaryotic genome structures and evolution. Our prediction algorithm predicts uber-operons through identifying groups of functionally or transcriptionally related operons, whose gene sets are conserved across the target and multiple reference genomes. Using this algorithm, we have predicted uber-operons for each of a group of 91 genomes, using the other 90 genomes as references. In particular, we predicted 158 uber-operons in Escherichia coli K12 covering 1830 genes, and found that many of the uber-operons correspond to parts of known regulons or biological pathways or are involved in highly related biological processes based on their Gene Ontology (GO) assignments. For some of the predicted uber-operons that are not parts of known regulons or pathways, our analyses indicate that their genes are highly likely to work together in the same biological processes, suggesting the possibility of new regulons and pathways. We believe that our uber-operon prediction provides a highly useful capability and a rich information source for elucidation of complex biological processes, such as pathways in microbes. All the prediction results are available at our Uber-Operon Database: http://csbl.bmb.uga.edu/uber, the first of its kind.
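
    The idea of linking operons "through operons in other (reference) genomes" can be illustrated with a much simplified sketch: two target operons are joined whenever a single reference operon contains orthologs of genes from both, and the connected groups are reported. This is not the authors' prediction algorithm, which also weighs conservation across multiple genomes; the function name and the toy ortholog identifiers are assumptions.

```python
from itertools import combinations


def uber_operons(target_operons, reference_genomes):
    """Group target operons that are linked through shared reference operons.

    target_operons: dict of operon name -> set of ortholog-group ids
    reference_genomes: list of genomes, each a list of operons (sets of ids)
    """
    gene_to_operon = {
        gene: name for name, genes in target_operons.items() for gene in genes
    }
    parent = {name: name for name in target_operons}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for genome in reference_genomes:
        for ref_operon in genome:
            touched = {gene_to_operon[g] for g in ref_operon if g in gene_to_operon}
            for a, b in combinations(sorted(touched), 2):
                parent[find(a)] = find(b)

    groups = {}
    for name in target_operons:
        groups.setdefault(find(name), set()).add(name)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


# Hypothetical toy input: four target operons and two reference genomes.
target = {"op1": {"g1", "g2"}, "op2": {"g3"}, "op3": {"g4", "g5"}, "op4": {"g6"}}
references = [[{"g2", "g3"}, {"g6"}], [{"g3", "g4"}]]
groups = uber_operons(target, references)
```

    In this toy input op1 and op2 are linked by the first reference genome and op2 and op3 by the second, so the three form one group while op4 stays alone.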

  14. Detecting uber-operons in prokaryotic genomes

    PubMed Central

    Che, Dongsheng; Li, Guojun; Mao, Fenglou; Wu, Hongwei; Xu, Ying

    2006-01-01

    We present a study on computational identification of uber-operons in a prokaryotic genome, each of which represents a group of operons that are evolutionarily or functionally associated through operons in other (reference) genomes. Uber-operons represent a rich set of footprints of operon evolution, whose full utilization could lead to new and more powerful tools for elucidation of biological pathways and networks than what operons have provided, and a better understanding of prokaryotic genome structures and evolution. Our prediction algorithm predicts uber-operons through identifying groups of functionally or transcriptionally related operons, whose gene sets are conserved across the target and multiple reference genomes. Using this algorithm, we have predicted uber-operons for each of a group of 91 genomes, using the other 90 genomes as references. In particular, we predicted 158 uber-operons in Escherichia coli K12 covering 1830 genes, and found that many of the uber-operons correspond to parts of known regulons or biological pathways or are involved in highly related biological processes based on their Gene Ontology (GO) assignments. For some of the predicted uber-operons that are not parts of known regulons or pathways, our analyses indicate that their genes are highly likely to work together in the same biological processes, suggesting the possibility of new regulons and pathways. We believe that our uber-operon prediction provides a highly useful capability and a rich information source for elucidation of complex biological processes, such as pathways in microbes. All the prediction results are available at our Uber-Operon Database: http://csbl.bmb.uga.edu/uber, the first of its kind. PMID:16682449

  15. Genomic Enzymology: Web Tools for Leveraging Protein Family Sequence-Function Space and Genome Context to Discover Novel Functions.

    PubMed

    Gerlt, John A

    2017-08-22

    The exponentially increasing number of protein and nucleic acid sequences provides opportunities to discover novel enzymes, metabolic pathways, and metabolites/natural products, thereby adding to our knowledge of biochemistry and biology. The challenge has evolved from generating sequence information to mining the databases to integrating and leveraging the available information, i.e., the availability of "genomic enzymology" web tools. Web tools that allow identification of biosynthetic gene clusters are widely used by the natural products/synthetic biology community, thereby facilitating the discovery of novel natural products and the enzymes responsible for their biosynthesis. However, many novel enzymes with interesting mechanisms participate in uncharacterized small-molecule metabolic pathways; their discovery and functional characterization also can be accomplished by leveraging information in protein and nucleic acid databases. This Perspective focuses on two genomic enzymology web tools that assist the discovery of novel metabolic pathways: (1) Enzyme Function Initiative-Enzyme Similarity Tool (EFI-EST) for generating sequence similarity networks to visualize and analyze sequence-function space in protein families and (2) Enzyme Function Initiative-Genome Neighborhood Tool (EFI-GNT) for generating genome neighborhood networks to visualize and analyze the genome context in microbial and fungal genomes. Both tools have been adapted to other applications to facilitate target selection for enzyme discovery and functional characterization. As the natural products community has demonstrated, the enzymology community needs to embrace the essential role of web tools that allow the protein and genome sequence databases to be leveraged for novel insights into enzymological problems.

  16. Genomic Enzymology: Web Tools for Leveraging Protein Family Sequence–Function Space and Genome Context to Discover Novel Functions

    PubMed Central

    2017-01-01

    The exponentially increasing number of protein and nucleic acid sequences provides opportunities to discover novel enzymes, metabolic pathways, and metabolites/natural products, thereby adding to our knowledge of biochemistry and biology. The challenge has evolved from generating sequence information to mining the databases to integrating and leveraging the available information, i.e., the availability of “genomic enzymology” web tools. Web tools that allow identification of biosynthetic gene clusters are widely used by the natural products/synthetic biology community, thereby facilitating the discovery of novel natural products and the enzymes responsible for their biosynthesis. However, many novel enzymes with interesting mechanisms participate in uncharacterized small-molecule metabolic pathways; their discovery and functional characterization also can be accomplished by leveraging information in protein and nucleic acid databases. This Perspective focuses on two genomic enzymology web tools that assist the discovery of novel metabolic pathways: (1) Enzyme Function Initiative-Enzyme Similarity Tool (EFI-EST) for generating sequence similarity networks to visualize and analyze sequence–function space in protein families and (2) Enzyme Function Initiative-Genome Neighborhood Tool (EFI-GNT) for generating genome neighborhood networks to visualize and analyze the genome context in microbial and fungal genomes. Both tools have been adapted to other applications to facilitate target selection for enzyme discovery and functional characterization. As the natural products community has demonstrated, the enzymology community needs to embrace the essential role of web tools that allow the protein and genome sequence databases to be leveraged for novel insights into enzymological problems. PMID:28826221

  17. Identifying novel glioma associated pathways based on systems biology level meta-analysis.

    PubMed

    Hu, Yangfan; Li, Jinquan; Yan, Wenying; Chen, Jiajia; Li, Yin; Hu, Guang; Shen, Bairong

    2013-01-01

    Recent advances in microarray technology, including genomics, proteomics, and metabolomics, bring a great challenge for integrating these "-omics" data to analyze complex disease. Glioma is an extremely aggressive and lethal form of brain tumor, and thus the study of the molecular mechanism underlying glioma remains very important. To date, most studies focus on detecting the differentially expressed genes in glioma. However, meta-analysis for pathway analysis based on multiple microarray datasets has not been systematically pursued. In this study, we therefore developed a systems biology based approach that integrates three types of omics data to identify common pathways in glioma. Firstly, a meta-analysis was performed to study the overlap of signatures at different levels based on the microarray gene expression data of glioma. Among these gene expression datasets, 12 pathways shared by four stages were found in the GeneGO database. Then, microRNA expression profiles and ChIP-seq data were integrated for further pathway enrichment analysis. As a result, we suggest that 5 of these pathways could serve as putative pathways in glioma. Among them, the pathway of TGF-beta-dependent induction of EMT via SMAD is of particular importance. Our results demonstrate that meta-analysis at the systems biology level provides a more useful approach to studying the molecular mechanism of complex disease. The integration of different types of omics data, including gene expression microarrays, microRNA and ChIP-seq data, suggests some common pathways correlated with glioma. These findings will offer useful potential candidates for targeted therapeutic intervention of glioma.

  18. Signaling gateway molecule pages—a data model perspective

    PubMed Central

    Dinasarapu, Ashok Reddy; Saunders, Brian; Ozerlat, Iley; Azam, Kenan; Subramaniam, Shankar

    2011-01-01

    Summary: The Signaling Gateway Molecule Pages (SGMP) database provides highly structured data on proteins which exist in different functional states participating in signal transduction pathways. A molecule page starts with a state of a native protein, without any modification and/or interactions. New states are formed with every post-translational modification or interaction with one or more proteins, small molecules or class molecules and with each change in cellular location. State transitions are caused by a combination of one or more modifications, interactions and translocations which then might be associated with one or more biological processes. In a characterized biological state, a molecule can function as one of several entities or their combinations, including channel, receptor, enzyme, transcription factor and transporter. We have also exported SGMP data to the Biological Pathway Exchange (BioPAX) and Systems Biology Markup Language (SBML) as well as in our custom XML. Availability: SGMP is available at www.signaling-gateway.org/molecule. Contact: shankar@ucsd.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:21505029
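
    The state-based data model described above (a native state, with new states created by modification, interaction or translocation) can be sketched with an immutable record type. This is a rough illustration under stated assumptions: the class, field and protein names are hypothetical and do not reflect the actual SGMP schema or its XML export.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    """One functional state of a protein (field names are illustrative)."""
    protein: str
    modifications: frozenset = frozenset()
    partners: frozenset = frozenset()
    location: str = "unspecified"


def transition(state, add_mods=(), add_partners=(), new_location=None):
    """Return the state produced by modifications, interactions or translocation."""
    return State(
        protein=state.protein,
        modifications=state.modifications | frozenset(add_mods),
        partners=state.partners | frozenset(add_partners),
        location=new_location or state.location,
    )


# Native state: no modifications and no interactions.
native = State("ProteinX", location="cytosol")
phosphorylated = transition(native, add_mods=["phospho-S10"])
bound_nuclear = transition(
    phosphorylated, add_partners=["ProteinY"], new_location="nucleus"
)
```

    Because each state is frozen and hashable, states can serve directly as nodes in a transition graph, and two routes to the same combination of modifications, partners and location compare as equal.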

  19. Rhizoma Dioscoreae extract protects against alveolar bone loss by regulating the cell cycle: A predictive study based on the protein‑protein interaction network.

    PubMed

    Zhang, Zhi-Guo; Song, Chang-Heng; Zhang, Fang-Zhen; Chen, Yan-Jing; Xiang, Li-Hua; Xiao, Gary Guishan; Ju, Da-Hong

    2016-06-01

    Rhizoma Dioscoreae extract (RDE) exhibits a protective effect on alveolar bone loss in ovariectomized (OVX) rats. The aim of this study was to predict the pathways or targets that are regulated by RDE, by re‑assessing our previously reported data and conducting a protein‑protein interaction (PPI) network analysis. In total, 383 differentially expressed genes (≥3‑fold) between alveolar bone samples from the RDE and OVX group rats were identified, and a PPI network was constructed based on these genes. Furthermore, four molecular clusters (A‑D) in the PPI network with the smallest P‑values were detected by the molecular complex detection (MCODE) algorithm. Using Database for Annotation, Visualization and Integrated Discovery (DAVID) and Ingenuity Pathway Analysis (IPA) tools, two molecular clusters (A and B) were enriched for biological process in Gene Ontology (GO). Only cluster A was associated with biological pathways in the IPA database. GO and pathway analysis results showed that cluster A, associated with cell cycle regulation, was the most important molecular cluster in the PPI network. In addition, cyclin‑dependent kinase 1 (CDK1) may be a key molecule achieving the cell‑cycle‑regulatory function of cluster A. From the PPI network analysis, it was predicted that delayed cell cycle progression in excessive alveolar bone remodeling via downregulation of CDK1 may be another mechanism underlying the anti‑osteopenic effect of RDE on alveolar bone.

  20. BiologicalNetworks 2.0 - an integrative view of genome biology data

    PubMed Central

    2010-01-01

    Background A significant problem in the study of mechanisms of an organism's development is the elucidation of interrelated factors which are making an impact on the different levels of the organism, such as genes, biological molecules, cells, and cell systems. Numerous sources of heterogeneous data which exist for these subsystems are still not integrated sufficiently enough to give researchers a straightforward opportunity to analyze them together in the same frame of study. Systematic application of data integration methods is also hampered by a multitude of such factors as the orthogonal nature of the integrated data and naming problems. Results Here we report on a new version of BiologicalNetworks, a research environment for the integral visualization and analysis of heterogeneous biological data. BiologicalNetworks can be queried for properties of thousands of different types of biological entities (genes/proteins, promoters, COGs, pathways, binding sites, and other) and their relations (interactions, co-expression, co-citations, and other). The system includes the build-pathways infrastructure for molecular interactions/relations and module discovery in high-throughput experiments. Also implemented in BiologicalNetworks are the Integrated Genome Viewer and Comparative Genomics Browser applications, which allow for the search and analysis of gene regulatory regions and their conservation in multiple species in conjunction with molecular pathways/networks, experimental data and functional annotations. Conclusions The new release of BiologicalNetworks together with its back-end database introduces extensive functionality for a more efficient integrated multi-level analysis of microarray, sequence, regulatory, and other data. BiologicalNetworks is freely available at http://www.biologicalnetworks.org. PMID:21190573

  1. Profiling conserved biological pathways in Autosomal Dominant Polycystic Kidney Disorder (ADPKD) to elucidate key transcriptomic alterations regulating cystogenesis: A cross-species meta-analysis approach.

    PubMed

    Chatterjee, Shatakshee; Verma, Srikant Prasad; Pandey, Priyanka

    2017-09-05

    Initiation and progression of fluid filled cysts mark Autosomal Dominant Polycystic Kidney Disease (ADPKD). Thus, improved therapeutics targeting cystogenesis remains a constant challenge. Microarray studies in single ADPKD animal model species with limited sample sizes tend to provide scattered views on underlying ADPKD pathogenesis. Thus we aim to perform a cross species meta-analysis to profile conserved biological pathways that might be key targets for therapy. Nine ADPKD microarray datasets on rat, mice and human fulfilled our study criteria and were chosen. Intra-species combined analysis was performed after considering removal of batch effect. Significantly enriched GO biological processes and KEGG pathways were computed and their overlap was observed. For the conserved pathways, biological modules and gene regulatory networks were observed. Additionally, Gene Set Enrichment Analysis (GSEA) using Molecular Signature Database (MSigDB) was performed for genes found in conserved pathways. We obtained 28 modules of significantly enriched GO processes and 5 major functional categories from significantly enriched KEGG pathways conserved in human, mice and rats that in turn suggest a global transcriptomic perturbation affecting cyst formation, growth and progression. Significantly enriched pathways obtained from up-regulated genes such as Genomic instability, Protein localization in ER and Insulin Resistance were found to regulate cyst formation and growth whereas cyst progression due to increased cell adhesion and inflammation was suggested by perturbations in Angiogenesis, TGF-beta, CAMs, and Infection related pathways. Additionally, networks revealed shared genes among pathways e.g. SMAD2 and SMAD7 in Endocytosis and TGF-beta. Our study suggests cyst formation and progression to be an outcome of interplay between a set of several key deregulated pathways. Thus, further translational research is warranted focusing on developing a combinatorial therapeutic approach for ADPKD redressal. Copyright © 2017 Elsevier B.V. All rights reserved.
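
    The cross-species overlap step described above (pathways enriched in human, mouse and rat alike) amounts to a set intersection once per-species enrichment results are in hand. A minimal sketch, assuming the enrichment lists are already computed; the pathway names in the toy input are placeholders, not results from the study.

```python
def conserved_pathways(enriched_by_species):
    """Return the pathways enriched in every species, sorted by name."""
    sets = [set(pathways) for pathways in enriched_by_species.values()]
    return sorted(set.intersection(*sets)) if sets else []


# Hypothetical per-species enrichment results.
enriched = {
    "human": {"TGF-beta signaling", "Endocytosis", "Cell adhesion molecules"},
    "mouse": {"TGF-beta signaling", "Endocytosis", "Insulin resistance"},
    "rat": {"TGF-beta signaling", "Endocytosis"},
}
conserved = conserved_pathways(enriched)
```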

  2. Systems Genetics Analysis of GWAS reveals Novel Associations between Key Biological Processes and Coronary Artery Disease

    PubMed Central

    Ghosh, Sujoy; Vivar, Juan; Nelson, Christopher P; Willenborg, Christina; Segrè, Ayellet V; Mäkinen, Ville-Petteri; Nikpay, Majid; Erdmann, Jeannette; Blankenberg, Stefan; O'Donnell, Christopher; März, Winfried; Laaksonen, Reijo; Stewart, Alexandre FR; Epstein, Stephen E; Shah, Svati H; Granger, Christopher B; Hazen, Stanley L; Kathiresan, Sekar; Reilly, Muredach P; Yang, Xia; Quertermous, Thomas; Samani, Nilesh J; Schunkert, Heribert; Assimes, Themistocles L; McPherson, Ruth

    2016-01-01

    Objective Genome-wide association (GWA) studies have identified multiple genetic variants affecting the risk of coronary artery disease (CAD). However, individually these explain only a small fraction of the heritability of CAD and for most, the causal biological mechanisms remain unclear. We sought to obtain further insights into potential causal processes of CAD by integrating large-scale GWA data with expertly curated databases of core human pathways and functional networks. Approaches and Results Employing pathways (gene sets) from Reactome, we carried out a two-stage gene set enrichment analysis strategy. From a meta-analyzed discovery cohort of 7 CAD GWAS data sets (9,889 cases/11,089 controls), nominally significant gene-sets were tested for replication in a meta-analysis of 9 additional studies (15,502 cases/55,730 controls) from the CARDIoGRAM Consortium. A total of 32 of 639 Reactome pathways tested showed convincing association with CAD (replication p<0.05). These pathways resided in 9 of 21 core biological processes represented in Reactome, and included pathways relevant to extracellular matrix integrity, innate immunity, axon guidance, and signaling by PDGF, NOTCH, and the TGF-β/SMAD receptor complex. Many of these pathways had strengths of association comparable to those observed in lipid transport pathways. Network analysis of unique genes within the replicated pathways further revealed several interconnected functional and topologically interacting modules representing novel associations (e.g. semaphorin regulated axonal guidance pathway) besides confirming known processes (lipid metabolism). The connectivity in the observed networks was statistically significant compared to random networks (p<0.001). Network centrality analysis (‘degree’ and ‘betweenness’) further identified genes (e.g. NCAM1, FYN, FURIN etc.) likely to play critical roles in the maintenance and functioning of several of the replicated pathways. Conclusions These findings provide novel insights into how genetic variation, interpreted in the context of biological processes and functional interactions among genes, may help define the genetic architecture of CAD. PMID:25977570
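
    The two-stage design described above (screen gene sets in a discovery cohort, then test only the survivors for replication) can be sketched as a simple filter over p-values. This is an illustration under assumptions: the abstract gives the replication cutoff (p<0.05) but not the exact discovery cutoff, so the same value is assumed for both stages, and the pathway names and p-values are invented.

```python
def two_stage_replication(discovery_p, replication_p, alpha=0.05):
    """Return gene sets nominally significant in discovery AND in replication.

    Gene sets that fail the discovery screen are never tested in stage two.
    """
    candidates = [name for name, p in discovery_p.items() if p < alpha]
    return sorted(name for name in candidates if replication_p.get(name, 1.0) < alpha)


# Hypothetical p-values for four gene sets.
discovery = {"pathway_A": 0.01, "pathway_B": 0.20, "pathway_C": 0.03, "pathway_D": 0.04}
replication = {"pathway_A": 0.002, "pathway_B": 0.001, "pathway_C": 0.30}
replicated = two_stage_replication(discovery, replication)
```

    Here pathway_B is excluded despite its small replication p-value because it did not pass the discovery screen, and pathway_D is excluded because it has no replication result.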

  3. A Systems Biology Approach Reveals Converging Molecular Mechanisms that Link Different POPs to Common Metabolic Diseases.

    PubMed

    Ruiz, Patricia; Perlina, Ally; Mumtaz, Moiz; Fowler, Bruce A

    2016-07-01

    A number of epidemiological studies have identified statistical associations between persistent organic pollutants (POPs) and metabolic diseases, but testable hypotheses regarding underlying molecular mechanisms to explain these linkages have not been published. We assessed the underlying mechanisms of POPs that have been associated with metabolic diseases; three well-known POPs [2,3,7,8-tetrachlorodibenzodioxin (TCDD), 2,2´,4,4´,5,5´-hexachlorobiphenyl (PCB 153), and 4,4´-dichlorodiphenyldichloroethylene (p,p´-DDE)] were studied. We used advanced database search tools to delineate testable hypotheses and to guide laboratory-based research studies into underlying mechanisms by which this POP mixture could produce or exacerbate metabolic diseases. For our searches, we used proprietary systems biology software (MetaCore™/MetaDrug™) to conduct advanced search queries for the underlying interactions database, followed by directional network construction to identify common mechanisms for these POPs within two or fewer interaction steps downstream of their primary targets. These common downstream pathways belong to various cytokine and chemokine families with experimentally well-documented causal associations with type 2 diabetes. Our systems biology approach allowed identification of converging pathways leading to activation of common downstream targets. To our knowledge, this is the first study to propose an integrated global set of step-by-step molecular mechanisms for a combination of three common POPs using a systems biology approach, which may link POP exposure to diseases. Experimental evaluation of the proposed pathways may lead to development of predictive biomarkers of the effects of POPs, which could translate into disease prevention and effective clinical treatment strategies. Ruiz P, Perlina A, Mumtaz M, Fowler BA. 2016. A systems biology approach reveals converging molecular mechanisms that link different POPs to common metabolic diseases. Environ Health Perspect 124:1034-1041; http://dx.doi.org/10.1289/ehp.1510308.
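
    The search for common mechanisms "within two or fewer interaction steps downstream" of each primary target can be illustrated with a depth-limited breadth-first search over a directed interaction graph, followed by an intersection. This sketch does not use or reproduce the proprietary MetaCore/MetaDrug software; the graph, node names and function names are hypothetical.

```python
from collections import deque


def downstream_within(graph, source, max_steps=2):
    """Nodes reachable from source in at most max_steps directed edges."""
    depth = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if depth[node] == max_steps:
            continue  # do not expand beyond the step limit
        for nxt in graph.get(node, ()):
            if nxt not in depth:
                depth[nxt] = depth[node] + 1
                queue.append(nxt)
    return {n for n in depth if n != source}


def common_downstream(graph, primary_targets, max_steps=2):
    """Nodes that lie within max_steps downstream of every primary target."""
    reach = [downstream_within(graph, t, max_steps) for t in primary_targets]
    return set.intersection(*reach) if reach else set()


# Hypothetical directed interactions: T1..T3 are primary targets of three compounds.
interactions = {
    "T1": ["A", "B"],
    "T2": ["C"],
    "T3": ["B"],
    "A": ["X"],
    "B": ["X"],
    "C": ["X", "D"],
    "X": ["Z"],
}
shared = common_downstream(interactions, ["T1", "T2", "T3"])
```

    In the toy graph only X is within two steps of all three targets; Z sits three steps away and is excluded by the step limit.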

  4. The Co-regulation Data Harvester: Automating gene annotation starting from a transcriptome database

    NASA Astrophysics Data System (ADS)

    Tsypin, Lev M.; Turkewitz, Aaron P.

    Identifying co-regulated genes provides a useful approach for defining pathway-specific machinery in an organism. To be efficient, this approach relies on thorough genome annotation, a process much slower than genome sequencing per se. Tetrahymena thermophila, a unicellular eukaryote, has been a useful model organism and has a fully sequenced but sparsely annotated genome. One important resource for studying this organism has been an online transcriptomic database. We have developed an automated approach to gene annotation in the context of transcriptome data in T. thermophila, called the Co-regulation Data Harvester (CDH). Beginning with a gene of interest, the CDH identifies co-regulated genes by accessing the Tetrahymena transcriptome database. It then identifies their closely related genes (orthologs) in other organisms by using reciprocal BLAST searches. Finally, it collates the annotations of those orthologs' functions, which provides the user with information to help predict the cellular role of the initial query. The CDH, which is freely available, represents a powerful new tool for analyzing cell biological pathways in Tetrahymena. Moreover, to the extent that genes and pathways are conserved between organisms, the inferences obtained via the CDH should be relevant, and can be explored, in many other systems.
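
    The reciprocal BLAST step mentioned above is commonly implemented as a reciprocal-best-hit test: a pair counts as putative orthologs only if each gene is the other's top hit. The sketch below assumes BLAST has already been run and its results reduced to (query, subject, score) tuples; it is not the CDH code, and the gene identifiers and scores are invented.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject."""
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(forward_hits, reverse_hits):
    """Keep pairs where each gene is the other's best hit."""
    forward = best_hits(forward_hits)
    reverse = best_hits(reverse_hits)
    return {q: s for q, s in forward.items() if reverse.get(s) == q}


# Hypothetical hit tables: Tetrahymena genes (tt*) versus another organism (hs*).
forward_hits = [("tt1", "hs1", 200), ("tt1", "hs2", 150), ("tt2", "hs2", 90)]
reverse_hits = [("hs1", "tt1", 210), ("hs2", "tt1", 160), ("hs2", "tt2", 80)]
orthologs = reciprocal_best_hits(forward_hits, reverse_hits)
```

    The pair tt2/hs2 is rejected because the best hit of hs2 in the reverse search is tt1, not tt2.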

  5. DESHARKY: automatic design of metabolic pathways for optimal cell growth.

    PubMed

    Rodrigo, Guillermo; Carrera, Javier; Prather, Kristala Jones; Jaramillo, Alfonso

    2008-11-01

    The biological solution for synthesis or remediation of organic compounds using living organisms, particularly bacteria and yeast, has been promoted because of the cost reduction with respect to the non-living chemical approach. In that way, computational frameworks can profit from the previous knowledge stored in large databases of compounds, enzymes and reactions. In addition, the cell behavior can be studied by modeling the cellular context. We have implemented a Monte Carlo algorithm (DESHARKY) that finds a metabolic pathway from a target compound by exploring a database of enzymatic reactions. DESHARKY outputs a biochemical route to the host metabolism together with its impact in the cellular context by using mathematical models of the cell resources and metabolism. Furthermore, we provide the sequence of amino acids for the enzymes involved in the route closest phylogenetically to the considered organism. We provide examples of designed metabolic pathways with their genetic load characterizations. Here, we have used Escherichia coli as host organism. In addition, our bioinformatic tool can be applied for biodegradation or biosynthesis and its performance scales with the database size. Software, a tutorial and examples are freely available and open source at http://soft.synth-bio.org/desharky.html
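
    The core idea of a Monte Carlo search from a target compound back to host metabolism can be sketched as repeated random walks over a reaction table. This is a toy illustration, not DESHARKY itself: route length is used here as a stand-in score, whereas the abstract describes scoring by the route's impact on the cell, and all reaction and compound names are hypothetical.

```python
import random


def sample_route(reactions, target, host_metabolites, rng, max_steps=10):
    """One random walk from target toward host metabolism.

    reactions: list of (name, substrate, product) tuples.
    Returns reaction names ordered from host metabolite to target, or None.
    """
    route, compound, visited = [], target, {target}
    for _ in range(max_steps):
        if compound in host_metabolites:
            return route[::-1]
        options = [
            (name, substrate)
            for name, substrate, product in reactions
            if product == compound and substrate not in visited
        ]
        if not options:
            return None  # dead end: no reaction produces this compound
        name, compound = rng.choice(options)
        visited.add(compound)
        route.append(name)
    return route[::-1] if compound in host_metabolites else None


def shortest_sampled_route(reactions, target, host_metabolites, trials=200, seed=0):
    """Keep the shortest successful route over many random walks."""
    rng = random.Random(seed)
    best = None
    for _ in range(trials):
        route = sample_route(reactions, target, host_metabolites, rng)
        if route is not None and (best is None or len(route) < len(best)):
            best = route
    return best


# Hypothetical reaction database: (reaction, substrate, product).
toy_reactions = [
    ("R1", "glucose", "A"),
    ("R2", "A", "target"),
    ("R3", "B", "target"),
    ("R4", "C", "B"),
    ("R5", "pyruvate", "C"),
    ("R6", "D", "target"),  # D has no producing reaction, so this is a dead end
]
best_route = shortest_sampled_route(toy_reactions, "target", {"glucose", "pyruvate"})
```

    Swapping the length comparison for a model-based cost would bring the sketch closer to the growth-aware scoring the abstract describes.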

  6. A novel approach to select differential pathways associated with hypertrophic cardiomyopathy based on gene co‑expression analysis.

    PubMed

    Chen, Xiao-Min; Feng, Ming-Jun; Shen, Cai-Jie; He, Bin; Du, Xian-Feng; Yu, Yi-Bo; Liu, Jing; Chu, Hui-Min

    2017-07-01

    The present study was designed to develop a novel method for identifying significant pathways associated with human hypertrophic cardiomyopathy (HCM), based on gene co‑expression analysis. The microarray dataset associated with HCM (E‑GEOD‑36961) was obtained from the European Molecular Biology Laboratory‑European Bioinformatics Institute database. Informative pathways were selected based on the Reactome pathway database and screening treatments. An empirical Bayes method was utilized to construct co‑expression networks for informative pathways, and a weight value was assigned to each pathway. Differential pathways were extracted based on weight threshold, which was calculated using a random model. In order to assess whether the co‑expression method was feasible, it was compared with traditional pathway enrichment analysis of differentially expressed genes, which were identified using the significance analysis of microarrays package. A total of 1,074 informative pathways were screened out for subsequent investigations and their weight values were also obtained. According to the threshold of weight value of 0.01057, 447 differential pathways, including folding of actin by chaperonin containing T‑complex protein 1 (CCT)/T‑complex protein 1 ring complex (TRiC), purine ribonucleoside monophosphate biosynthesis and ubiquinol biosynthesis, were obtained. Compared with traditional pathway enrichment analysis, the number of pathways obtained from the co‑expression approach was increased. The results of the present study demonstrated that this method may be useful to predict marker pathways for HCM. The pathways of folding of actin by CCT/TRiC and purine ribonucleoside monophosphate biosynthesis may provide evidence of the underlying molecular mechanisms of HCM, and offer novel therapeutic directions for HCM.

  7. IntegromeDB: an integrated system and biological search engine.

    PubMed

    Baitaluk, Michael; Kozhenkov, Sergey; Dubinina, Yulia; Ponomarenko, Julia

    2012-01-19

    With the growth of biological data in volume and heterogeneity, web search engines become key tools for researchers. However, general-purpose search engines are not specialized for the search of biological data. Here, we present an approach to developing a biological web search engine based on Semantic Web technologies and demonstrate its implementation for retrieving gene- and protein-centered knowledge. The engine is available at http://www.integromedb.org. The IntegromeDB search engine allows scanning data on gene regulation, gene expression, protein-protein interactions, pathways, metagenomics, mutations, diseases, and other gene- and protein-related data that are automatically retrieved from publicly available databases and web pages using biological ontologies. To perfect the resource design and usability, we welcome and encourage community feedback.

  8. Combining chemoinformatics with bioinformatics: in silico prediction of bacterial flavor-forming pathways by a chemical systems biology approach "reverse pathway engineering".

    PubMed

    Liu, Mengjin; Bienfait, Bruno; Sacher, Oliver; Gasteiger, Johann; Siezen, Roland J; Nauta, Arjen; Geurts, Jan M W

    2014-01-01

    The incompleteness of genome-scale metabolic models is a major bottleneck for systems biology approaches, which are based on large numbers of metabolites as identified and quantified by metabolomics. Many of the revealed secondary metabolites and/or their derivatives, such as flavor compounds, are non-essential in metabolism, and many of their synthesis pathways are unknown. In this study, we describe a novel approach, Reverse Pathway Engineering (RPE), which combines chemoinformatics and bioinformatics analyses, to predict the "missing links" between compounds of interest and their possible metabolic precursors by providing plausible chemical and/or enzymatic reactions. We demonstrate the added-value of the approach by using flavor-forming pathways in lactic acid bacteria (LAB) as an example. Established metabolic routes leading to the formation of flavor compounds from leucine were successfully replicated. Novel reactions involved in flavor formation, i.e. the conversion of alpha-hydroxy-isocaproate to 3-methylbutanoic acid and the synthesis of dimethyl sulfide, as well as the involved enzymes were successfully predicted. These new insights into the flavor-formation mechanisms in LAB can have a significant impact on improving the control of aroma formation in fermented food products. Since the input reaction databases and compounds are highly flexible, the RPE approach can be easily extended to a broad spectrum of applications, amongst others health/disease biomarker discovery as well as synthetic biology.

  9. Pivotal role of the muscle-contraction pathway in cryptorchidism and evidence for genomic connections with cardiomyopathy pathways in RASopathies.

    PubMed

    Cannistraci, Carlo V; Ogorevc, Jernej; Zorc, Minja; Ravasi, Timothy; Dovc, Peter; Kunej, Tanja

    2013-02-14

    Cryptorchidism is the most frequent congenital disorder in male children; however the genetic causes of cryptorchidism remain poorly investigated. Comparative integratomics combined with systems biology approach was employed to elucidate genetic factors and molecular pathways underlying testis descent. Literature mining was performed to collect genomic loci associated with cryptorchidism in seven mammalian species. Information regarding the collected candidate genes was stored in MySQL relational database. Genomic view of the loci was presented using Flash GViewer web tool (http://gmod.org/wiki/Flashgviewer/). DAVID Bioinformatics Resources 6.7 was used for pathway enrichment analysis. Cytoscape plug-in PiNGO 1.11 was employed for protein-network-based prediction of novel candidate genes. Relevant protein-protein interactions were confirmed and visualized using the STRING database (version 9.0). The developed cryptorchidism gene atlas includes 217 candidate loci (genes, regions involved in chromosomal mutations, and copy number variations) identified at the genomic, transcriptomic, and proteomic level. Human orthologs of the collected candidate loci were presented using a genomic map viewer. The cryptorchidism gene atlas is freely available online: http://www.integratomics-time.com/cryptorchidism/. Pathway analysis suggested the presence of twelve enriched pathways associated with the list of 179 literature-derived candidate genes. Additionally, a list of 43 network-predicted novel candidate genes was significantly associated with four enriched pathways. Joint pathway analysis of the collected and predicted candidate genes revealed the pivotal importance of the muscle-contraction pathway in cryptorchidism and evidence for genomic associations with cardiomyopathy pathways in RASopathies. The developed gene atlas represents an important resource for the scientific community researching genetics of cryptorchidism. 
The collected data will further facilitate the development of novel genetic markers and could be of interest for functional studies in animals and humans. The proposed network-based systems biology approach elucidates molecular mechanisms underlying the co-presence of cryptorchidism and cardiomyopathy in RASopathies. Such an approach could also aid in the molecular explanation of the co-presence of diverse and apparently unrelated clinical manifestations in other syndromes.
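
    The pathway enrichment step described in this record can be illustrated with a plain hypergeometric over-representation test. This is only a minimal sketch: DAVID itself reports a modified Fisher exact score, which is not reproduced here, and the background size, pathway size and overlap in the usage lines are hypothetical numbers chosen for illustration (only the list size of 179 candidate genes comes from the abstract).

```python
from math import comb


def hypergeom_enrichment_p(population, pathway_size, sample_size, overlap):
    """Return P(X >= overlap) for a hypergeometric draw.

    population   -- number of background genes (assumed value)
    pathway_size -- background genes annotated to the pathway
    sample_size  -- number of genes in the candidate list
    overlap      -- candidate genes that fall in the pathway
    """
    upper = min(pathway_size, sample_size)
    total = comb(population, sample_size)
    # Sum the upper tail of the distribution, term by term.
    tail = sum(
        comb(pathway_size, k) * comb(population - pathway_size, sample_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical example: 179 candidate genes, a 100-gene pathway,
# a background of 20,000 genes, and 8 candidates in the pathway.
p_value = hypergeom_enrichment_p(20000, 100, 179, 8)
print(f"enrichment p-value: {p_value:.3g}")
```

    In practice the resulting p-values would also be corrected for the number of pathways tested, a step omitted from this sketch.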

  10. Expression profiling indicating low selenium-sensitive microRNA levels linked to cell cycle and cell stress response pathways in the CaCo-2 cell line.

    PubMed

    McCann, Mark J; Rotjanapun, Kunjana; Hesketh, John E; Roy, Nicole C

    2017-05-01

    Se is an essential micronutrient for human health, and fluctuations in Se levels and the potential cellular dysfunction associated with it may increase the risk for disease. Although Se has been shown to influence several biological pathways important in health, little is known about the effect of Se on the expression of microRNA (miRNA) molecules regulating these pathways. To explore the potential role of Se-sensitive miRNA in regulating pathways linked with colon cancer, we profiled the expression of 800 miRNA in the CaCo-2 human adenocarcinoma cell line in response to a low-Se (72 h at <40 nM) environment using nCounter direct quantification. These data were then examined using a range of in silico databases to identify experimentally validated miRNA-mRNA interactions and the biological pathways involved. We identified ten Se-sensitive miRNA (hsa-miR-93-5p, hsa-miR-106a-5p, hsa-miR-205-5p, hsa-miR-200c-3p, hsa-miR-99b-5p, hsa-miR-302d-3p, hsa-miR-373-3p, hsa-miR-483-3p, hsa-miR-512-5p and hsa-miR-4454), which regulate 3588 mRNA in key pathways such as the cell cycle, the cellular response to stress, and the canonical Wnt/β-catenin, p53 and ERK/MAPK signalling pathways. Our data show that the effects of low Se on biological pathways may, in part, be due to these ten Se-sensitive miRNA. Dysregulation of the cell cycle and of the stress response pathways due to low Se may influence key genes involved in carcinogenesis.

  11. Estrogen alters the profile of the transcriptome in river snail Bellamya aeruginosa.

    PubMed

    Lei, Kun; Liu, Ruizhi; An, Li-Hui; Luo, Ying-Feng; LeBlanc, Gerald A

    2015-03-01

    We evaluated the transcriptome dynamics of the freshwater river snail Bellamya aeruginosa exposed to 17β-estradiol (E2) using the Roche/454 GS-FLX platform. In total, 41,869 unigenes, with an average length of 586 bp, representing 36,181 contigs and 5,688 singlets were obtained. Among them, 18.08, 36.85, and 25.47 % matched sequences in the GenBank non-redundant nucleic acid database, non-redundant protein database, and Swiss protein database, respectively. Annotation of the unigenes with gene ontology, and then mapping them to biological pathways, revealed large groups of genes related to growth, development, reproduction, signal transduction, and defense mechanisms. Significant differences were found in gene expression in both liver and testicular tissues between control and E2-exposed organisms. These changes in gene expression will help in understanding the molecular mechanisms of the response to physiological stress in the river snail exposed to estrogen, and will facilitate research into biological processes and underlying physiological adaptations to xenoestrogen exposure in gastropods.

  12. Androgen-responsive gene database: integrated knowledge on androgen-responsive genes.

    PubMed

    Jiang, Mei; Ma, Yunsheng; Chen, Congcong; Fu, Xuping; Yang, Shu; Li, Xia; Yu, Guohua; Mao, Yumin; Xie, Yi; Li, Yao

    2009-11-01

    Androgen signaling plays an important role in many biological processes. Androgen Responsive Gene Database (ARGDB) is devoted to providing integrated knowledge on androgen-controlled genes. Gene records were collected on the basis of PubMed literature collections. More than 6000 abstracts and 950 original publications were manually screened, leading to 1785 human genes, 993 mouse genes, and 583 rat genes finally included in the database. All the collected genes were experimentally proved to be regulated by androgen at the expression level or to contain androgen-responsive regions. For each gene important details of the androgen regulation experiments were collected from references, such as expression change, androgen-responsive sequence, response time, tissue/cell type, experimental method, ligand identity, and androgen amount, which will facilitate further evaluation by researchers. Furthermore, the database was integrated with multiple annotation resources, including National Center for Biotechnology Information, Gene Ontology, and Kyoto Encyclopedia of Genes and Genomes pathway, to reveal the biological characteristics and significance of androgen-regulated genes. The ARGDB web site is mainly composed of the Browse, Search, Element Scan, and Submission modules. It is user friendly and freely accessible at http://argdb.fudan.edu.cn. Preliminary analysis of the collected data was performed. Many disease pathways, such as prostate carcinogenesis, were found to be enriched in androgen-regulated genes. The discovered androgen-response motifs were similar to those in previous reports. The analysis results are displayed in the web site. In conclusion, ARGDB provides a unified gateway to storage, retrieval, and update of information on androgen-regulated genes.

  13. PathVisio 3: an extendable pathway analysis toolbox.

    PubMed

    Kutmon, Martina; van Iersel, Martijn P; Bohler, Anwesha; Kelder, Thomas; Nunes, Nuno; Pico, Alexander R; Evelo, Chris T

    2015-02-01

    PathVisio is a commonly used pathway editor, visualization and analysis software. Biological pathways have been used by biologists for many years to describe the detailed steps in biological processes. Those powerful, visual representations help researchers to better understand, share and discuss knowledge. Since the first publication of PathVisio in 2008, the original paper has been cited more than 170 times and PathVisio has been used in many different biological studies. As an online editor, PathVisio is also integrated in the community-curated pathway database WikiPathways. Here we present the third version of PathVisio with the newest additions and improvements of the application. The core features of PathVisio are pathway drawing, advanced data visualization and pathway statistics. Additionally, PathVisio 3 introduces a powerful new extension system that allows other developers to contribute additional functionality in the form of plugins without changing the core application. PathVisio can be downloaded from http://www.pathvisio.org, and in 2014 PathVisio 3 was downloaded over 5,500 times. There are already more than 15 plugins available in the central plugin repository. PathVisio is a freely available, open-source tool published under the Apache 2.0 license (http://www.apache.org/licenses/LICENSE-2.0). It is implemented in Java and thus runs on all major operating systems. The code repository is available at http://svn.bigcat.unimaas.nl/pathvisio. The support mailing list for users is available on https://groups.google.com/forum/#!forum/wikipathways-discuss and for developers on https://groups.google.com/forum/#!forum/wikipathways-devel.

  14. Gramene 2013: comparative plant genomics resources.

    PubMed

    Monaco, Marcela K; Stein, Joshua; Naithani, Sushma; Wei, Sharon; Dharmawardhana, Palitha; Kumari, Sunita; Amarasinghe, Vindhya; Youens-Clark, Ken; Thomason, James; Preece, Justin; Pasternak, Shiran; Olson, Andrew; Jiao, Yinping; Lu, Zhenyuan; Bolser, Dan; Kerhornou, Arnaud; Staines, Dan; Walts, Brandon; Wu, Guanming; D'Eustachio, Peter; Haw, Robin; Croft, David; Kersey, Paul J; Stein, Lincoln; Jaiswal, Pankaj; Ware, Doreen

    2014-01-01

    Gramene (http://www.gramene.org) is a curated online resource for comparative functional genomics in crops and model plant species, currently hosting 27 fully and 10 partially sequenced reference genomes in its build number 38. Its strength derives from the application of a phylogenetic framework for genome comparison and the use of ontologies to integrate structural and functional annotation data. Whole-genome alignments complemented by phylogenetic gene family trees help infer syntenic and orthologous relationships. Genetic variation data, sequences and genome mappings available for 10 species, including Arabidopsis, rice and maize, help infer putative variant effects on genes and transcripts. The pathways section also hosts 10 species-specific metabolic pathways databases developed in-house or by our collaborators using Pathway Tools software, which facilitates searches for pathway, reaction and metabolite annotations, and allows analyses of user-defined expression datasets. Recently, we released a Plant Reactome portal featuring 133 curated rice pathways. This portal will be expanded for Arabidopsis, maize and other plant species. We continue to provide genetic and QTL maps and marker datasets developed by crop researchers. The project provides a unique community platform to support scientific research in plant genomics including studies in evolution, genetics, plant breeding, molecular biology, biochemistry and systems biology.

  15. A Database of Reaction Monitoring Mass Spectrometry Assays for Elucidating Therapeutic Response in Cancer

    PubMed Central

    Remily-Wood, Elizabeth R.; Liu, Richard Z.; Xiang, Yun; Chen, Yi; Thomas, C. Eric; Rajyaguru, Neal; Kaufman, Laura M.; Ochoa, Joana E.; Hazlehurst, Lori; Pinilla-Ibarz, Javier; Lancet, Jeffrey; Zhang, Guolin; Haura, Eric; Shibata, David; Yeatman, Timothy; Smalley, Keiran S.M.; Dalton, William S.; Huang, Emina; Scott, Ed; Bloom, Gregory C.; Eschrich, Steven A.; Koomen, John M.

    2012-01-01

    Purpose The Quantitative Assay Database (QuAD), http://proteome.moffitt.org/QUAD/, facilitates widespread implementation of quantitative mass spectrometry in cancer biology and clinical research through sharing of methods and reagents for monitoring protein expression and modification. Experimental Design Liquid chromatography coupled to multiple reaction monitoring mass spectrometry (LC-MRM) assays are developed using SDS-PAGE fractionated lysates from cancer cell lines. Pathway maps created using GeneGO Metacore provide the biological relationships between proteins and illustrate concepts for multiplexed analysis; each protein can be selected to examine assay development at the protein and peptide level. Results The coupling of SDS-PAGE and LC-MRM screening has been used to detect 876 peptides from 218 cancer-related proteins in model systems including colon, lung, melanoma, leukemias, and myeloma, which has led to the development of 95 quantitative assays including stable-isotope labeled peptide standards. Methods are published online and peptide standards are made available to the research community. Protein expression measurements for heat shock proteins, including a comparison with ELISA and monitoring response to the HSP90 inhibitor, 17-DMAG, are used to illustrate the components of the QuAD and its potential utility. Conclusions and Clinical Relevance This resource enables quantitative assessment of protein components of signaling pathways and biological processes and holds promise for systematic investigation of treatment responses in cancer. PMID:21656910

  16. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Karp, Peter D.

    Pathway Tools is a systems-biology software package written by SRI International (SRI) that produces Pathway/Genome Databases (PGDBs) for organisms with a sequenced genome. Pathway Tools also provides a wide range of capabilities for analyzing predicted metabolic networks and user-generated omics data. More than 5,000 academic, industrial, and government groups have licensed Pathway Tools. This user community includes researchers at all three DOE bioenergy centers, as well as academic and industrial metabolic engineering (ME) groups. An integral part of the Pathway Tools software is MetaCyc, a large, multiorganism database of metabolic pathways and enzymes that SRI and its academic collaborators manually curate. This project included two main goals: I. Enhance the MetaCyc content of bioenergy-related enzymes and pathways. II. Develop computational tools for engineering metabolic pathways that satisfy specified design goals, in particular for bioenergy-related pathways. In part I, SRI proposed to significantly expand the coverage of bioenergy-related metabolic information in MetaCyc, followed by the generation of organism-specific PGDBs for all energy-relevant organisms sequenced at the DOE Joint Genome Institute (JGI). Part I objectives included: 1: Expand the content of MetaCyc to include bioenergy-related enzymes and pathways. 2: Enhance the Pathway Tools software to enable display of complex polymer degradation processes. 3: Create new PGDBs for the energy-related organisms sequenced by JGI, update existing PGDBs with new MetaCyc content, and make these data available to JBEI via the BioCyc website. In part II, SRI proposed to develop an efficient computational tool for the engineering of metabolic pathways. 
Part II objectives included: 4: Develop computational tools for generating metabolic pathways that satisfy specified design goals, enabling users to specify parameters such as starting and ending compounds, and preferred or disallowed intermediate compounds. The pathways were to be generated using metabolic reactions from a reference database (DB). 5: Develop computational tools for ranking the pathways generated in objective (4) according to their optimality. The ranking criteria include stoichiometric yield, the number and cost of additional inputs and the cofactor compounds required by the pathway, pathway length, and pathway energetics. 6: Develop tools for visualizing generated pathways to facilitate the evaluation of a large space of generated pathways.
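
    Objectives 4 and 5 describe generating pathways between a starting and an ending compound while excluding disallowed intermediates, then ranking the results. The sketch below is not the Pathway Tools implementation; it is a hypothetical illustration in which `find_pathways`, the toy `reactions` map and the compound names A to E are all assumptions, and ranking uses pathway length only (yield, input cost and energetics are not modeled).

```python
def find_pathways(reactions, start, goal, disallowed=frozenset(), max_steps=6):
    """Enumerate simple compound paths from start to goal.

    reactions  -- dict mapping a substrate to the products reachable in one step
    disallowed -- intermediate compounds that must not appear in a pathway
    max_steps  -- upper bound on the number of reaction steps
    """
    results = []
    stack = [(start, [start])]
    while stack:
        compound, path = stack.pop()
        if compound == goal:
            results.append(path)
            continue
        if len(path) - 1 >= max_steps:
            continue
        for product in reactions.get(compound, ()):
            # Skip cycles and disallowed intermediates (the goal itself is allowed).
            if product in path or (product in disallowed and product != goal):
                continue
            stack.append((product, path + [product]))
    # Rank shorter pathways first; ties are broken alphabetically.
    return sorted(results, key=lambda p: (len(p), p))


# Hypothetical toy reaction network.
reactions = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "E": ["D"]}
all_routes = find_pathways(reactions, "A", "D")
routes_without_c = find_pathways(reactions, "A", "D", disallowed={"C"})
print(all_routes)
print(routes_without_c)
```

    A fuller ranking would replace the sort key with a score that combines the criteria listed in objective 5.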

  17. microRNAs Databases: Developmental Methodologies, Structural and Functional Annotations.

    PubMed

    Singh, Nagendra Kumar

    2017-09-01

    microRNA (miRNA) is an endogenous, evolutionarily conserved non-coding RNA involved in post-transcriptional gene repression and mRNA cleavage through formation of the RNA-induced silencing complex (RISC). In RISC, miRNA binds by complementary base pairing to the targeted mRNA together with the Argonaute protein complex, causing gene repression or endonucleolytic cleavage of mRNAs, which results in many diseases and syndromes. After the discovery of the miRNAs lin-4 and let-7, large numbers of miRNAs were discovered by low-throughput and high-throughput experimental techniques, along with computational methods, in various biological and metabolic processes. miRNAs are important non-coding RNAs for understanding the complex biological phenomena of organisms because they control gene regulation. This paper reviews miRNA databases with structural and functional annotations developed by various researchers. These databases contain structural and functional information on animal, plant and virus miRNAs, including miRNA-associated diseases, stress resistance in plants, the roles of miRNAs in various biological processes, the effect of miRNA interactions on drugs and environment, the effect of variance on miRNAs, miRNA gene expression analysis, and miRNA sequences and structures. This review focuses on the developmental methodology of miRNA databases, such as the computational tools and methods used to extract miRNA annotations from different resources or through experiment. This study also discusses the efficiency of the user interface design of every database, along with the current entries and annotations of miRNAs (pathways, gene ontology, disease ontology, etc.). An integrated schematic diagram of the construction process for databases is also drawn, along with tabular and graphical comparisons of the various types of entries in different databases. The aim of this paper is to present the importance of miRNA-related resources in a single place.

  18. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Saccone, Scott F; Chesler, Elissa J; Bierut, Laura J

    Commercial SNP microarrays now provide comprehensive and affordable coverage of the human genome. However, some diseases have biologically relevant genomic regions that may require additional coverage. Addiction, for example, is thought to be influenced by complex interactions among many relevant genes and pathways. We have assembled a list of 486 biologically relevant genes nominated by a panel of experts on addiction. We then added 424 genes that showed evidence of association with addiction phenotypes through mouse QTL mappings and gene co-expression analysis. We demonstrate that there are a substantial number of SNPs in these genes that are not well represented by commercial SNP platforms. We address this problem by introducing a publicly available SNP database for addiction. The database is annotated using numeric prioritization scores indicating the extent of biological relevance. The scores incorporate a number of factors such as SNP/gene functional properties (including synonymy and promoter regions), data from mouse systems genetics and measures of human/mouse evolutionary conservation. We then used HapMap genotyping data to determine if a SNP is tagged by a commercial microarray through linkage disequilibrium. This combination of biological prioritization scores and LD tagging annotation will enable addiction researchers to supplement commercial SNP microarrays to ensure comprehensive coverage of biologically relevant regions.

  19. Constraints on signaling network logic reveal functional subgraphs on Multiple Myeloma OMIC data.

    PubMed

    Miannay, Bertrand; Minvielle, Stéphane; Magrangeas, Florence; Guziolowski, Carito

    2018-03-21

    The integration of gene expression profiles (GEPs) and large-scale biological networks derived from pathways databases is a subject which is being widely explored. Existing methods are based on network distance measures among significantly measured species. Only a small number of them include the directionality and underlying logic existing in biological networks. In this study we approach the GEP-networks integration problem by considering the network logic; however, our approach does not require a prior species selection according to their gene expression level. We start by modeling the biological network, representing its underlying logic using Logic Programming. This model points to reachable discrete network states that maximize a notion of harmony between the possible active or inactive states of the molecular species and the directionality of the pathway reactions according to their activator or inhibitor control role. Only then do we confront these network states with the GEP. From this confrontation, independent graph components are derived, each of them related to a fixed and optimal assignment of active or inactive states. These components allow us to decompose a large-scale network into subgraphs, and their molecular species state assignments have different degrees of similarity when compared to the same GEP. We apply our method to study the set of possible states derived from a subgraph from the NCI-PID Pathway Interaction Database. This graph links Multiple Myeloma (MM) genes to known receptors for this blood cancer. We discover that the NCI-PID MM graph had 15 independent components, and when confronted with 611 MM GEPs, we find 1 component to be more specific in representing the difference between cancer and healthy profiles.

  20. IntegromeDB: an integrated system and biological search engine

    PubMed Central

    2012-01-01

    Background With the growth of biological data in volume and heterogeneity, web search engines become key tools for researchers. However, general-purpose search engines are not specialized for the search of biological data. Description Here, we present an approach to developing a biological web search engine based on Semantic Web technologies and demonstrate its implementation for retrieving gene- and protein-centered knowledge. The engine is available at http://www.integromedb.org. Conclusions The IntegromeDB search engine allows scanning data on gene regulation, gene expression, protein-protein interactions, pathways, metagenomics, mutations, diseases, and other gene- and protein-related data that are automatically retrieved from publicly available databases and web pages using biological ontologies. To perfect the resource design and usability, we welcome and encourage community feedback. PMID:22260095

  1. An online model composition tool for system biology models

    PubMed Central

    2013-01-01

    Background There are multiple representation formats for Systems Biology computational models, and the Systems Biology Markup Language (SBML) is one of the most widely used. SBML is used to capture, store, and distribute computational models by Systems Biology data sources (e.g., the BioModels Database) and researchers. Therefore, there is a need for all-in-one web-based solutions that support advanced SBML functionalities such as uploading, editing, composing, visualizing, simulating, querying, and browsing computational models. Results We present the design and implementation of the Model Composition Tool (Interface) within the PathCase-SB (PathCase Systems Biology) web portal. The tool helps users compose systems biology models to facilitate the complex process of merging systems biology models. We also present three tools that support the model composition tool, namely, (1) Model Simulation Interface that generates a visual plot of the simulation according to user's input, (2) iModel Tool as a platform for users to upload their own models to compose, and (3) SimCom Tool that provides a side-by-side comparison of models being composed in the same pathway. Finally, we provide a web site that hosts BioModels Database models and a separate web site that hosts SBML Test Suite models. Conclusions The model composition tool (and the other three tools) can be used with little or no knowledge of the SBML document structure. For this reason, students or anyone who wants to learn about systems biology will benefit from the described functionalities. SBML Test Suite models will be a good starting point for beginners. For more advanced purposes, users will also be able to access and employ models of the BioModels Database. PMID:24006914

  2. Computational analysis of microRNA function in heart development.

    PubMed

    Liu, Ganqiang; Ding, Min; Chen, Jiajia; Huang, Jinyan; Wang, Haiyun; Jing, Qing; Shen, Bairong

    2010-09-01

    Emerging evidence suggests that specific spatio-temporal microRNA (miRNA) expression is required for heart development. In recent years, hundreds of miRNAs have been discovered. In contrast, functional annotations are available only for a very small fraction of these regulatory molecules. In order to provide a global perspective for the biologists who study the relationship between differentially expressed miRNAs and heart development, we employed computational analysis to uncover the specific cellular processes and biological pathways targeted by miRNAs in mouse heart development. Here, we utilized Gene Ontology (GO) categories, KEGG Pathway, and GeneGo Pathway Maps as a gene functional annotation system for miRNA target enrichment analysis. The target genes of miRNAs were found to be enriched in functional categories and pathway maps in which miRNAs could play important roles during heart development. Meanwhile, we developed miRHrt (http://sysbio.suda.edu.cn/mirhrt/), a database aiming to provide a comprehensive resource of miRNA function in regulating heart development. These computational analysis results effectively illustrated the correlation of differentially expressed miRNAs with cellular functions and heart development. We hope that the identified novel heart development-associated pathways and the database presented here would facilitate further understanding of the roles and mechanisms of miRNAs in heart development.

  3. Tcof1-Related Molecular Networks in Treacher Collins Syndrome.

    PubMed

    Dai, Jiewen; Si, Jiawen; Wang, Minjiao; Huang, Li; Fang, Bing; Shi, Jun; Wang, Xudong; Shen, Guofang

    2016-09-01

    Treacher Collins syndrome (TCS) is a rare, autosomal-dominant disorder characterized by craniofacial deformities, and is primarily caused by mutations in the Tcof1 gene. This article aimed to perform a comprehensive literature review and systematic bioinformatic analysis of Tcof1-related molecular networks in TCS. First, the up- and down-regulated genes in Tcof1 heterozygous haploinsufficient mutant mice embryos and Tcof1 knockdown and Tcof1 over-expressed neuroblastoma N1E-115 cells were obtained from the Gene Expression Omnibus database. The GeneDecks database was used to calculate the 500 genes most closely related to Tcof1. Then, the relationships between 4 gene sets (a predicted set and sets comparing the wildtype with the 3 Gene Expression Omnibus datasets) were analyzed using the DAVID, GeneMANIA and STRING databases. The analysis results showed that the Tcof1-related genes were enriched in various biological processes, including cell proliferation, apoptosis, cell cycle, differentiation, and migration. They were also enriched in several signaling pathways, such as the ribosome, p53, cell cycle, and WNT signaling pathways. Additionally, these genes clearly had direct or indirect interactions with Tcof1 and with each other. The literature review and bioinformatic analysis findings imply that special attention should be given to these pathways, as they may offer target points for TCS therapies.

  4. Systems Biology Approaches for Discovering Biomarkers for Traumatic Brain Injury

    PubMed Central

    Feala, Jacob D.; AbdulHameed, Mohamed Diwan M.; Yu, Chenggang; Dutta, Bhaskar; Yu, Xueping; Schmid, Kara; Dave, Jitendra; Tortella, Frank

    2013-01-01

    The rate of traumatic brain injury (TBI) in service members with wartime injuries has risen rapidly in recent years, and complex, variable links have emerged between TBI and long-term neurological disorders. The multifactorial nature of TBI secondary cellular response has confounded attempts to find cellular biomarkers for its diagnosis and prognosis or for guiding therapy for brain injury. One possibility is to apply emerging systems biology strategies to holistically probe and analyze the complex interweaving molecular pathways and networks that mediate the secondary cellular response through computational models that integrate these diverse data sets. Here, we review available systems biology strategies, databases, and tools. In addition, we describe opportunities for applying this methodology to existing TBI data sets to identify new biomarker candidates and gain insights about the underlying molecular mechanisms of TBI response. As an exemplar, we apply network and pathway analysis to a manually compiled list of 32 protein biomarker candidates from the literature, recover known TBI-related mechanisms, and generate hypothetical new biomarker candidates. PMID:23510232

  5. Systems Genetics Analysis of Genome-Wide Association Study Reveals Novel Associations Between Key Biological Processes and Coronary Artery Disease.

    PubMed

    Ghosh, Sujoy; Vivar, Juan; Nelson, Christopher P; Willenborg, Christina; Segrè, Ayellet V; Mäkinen, Ville-Petteri; Nikpay, Majid; Erdmann, Jeannette; Blankenberg, Stefan; O'Donnell, Christopher; März, Winfried; Laaksonen, Reijo; Stewart, Alexandre F R; Epstein, Stephen E; Shah, Svati H; Granger, Christopher B; Hazen, Stanley L; Kathiresan, Sekar; Reilly, Muredach P; Yang, Xia; Quertermous, Thomas; Samani, Nilesh J; Schunkert, Heribert; Assimes, Themistocles L; McPherson, Ruth

    2015-07-01

    Genome-wide association studies have identified multiple genetic variants affecting the risk of coronary artery disease (CAD). However, individually these explain only a small fraction of the heritability of CAD and for most, the causal biological mechanisms remain unclear. We sought to obtain further insights into potential causal processes of CAD by integrating large-scale GWA data with expertly curated databases of core human pathways and functional networks. Using pathways (gene sets) from Reactome, we carried out a 2-stage gene set enrichment analysis strategy. From a meta-analyzed discovery cohort of 7 CAD genome-wide association study data sets (9889 cases/11 089 controls), nominally significant gene sets were tested for replication in a meta-analysis of 9 additional studies (15 502 cases/55 730 controls) from the Coronary ARtery DIsease Genome wide Replication and Meta-analysis (CARDIoGRAM) Consortium. A total of 32 of 639 Reactome pathways tested showed convincing association with CAD (replication P<0.05). These pathways resided in 9 of 21 core biological processes represented in Reactome, and included pathways relevant to extracellular matrix (ECM) integrity, innate immunity, axon guidance, and signaling by PDGF (platelet-derived growth factor), NOTCH, and the transforming growth factor-β/SMAD receptor complex. Many of these pathways had strengths of association comparable to those observed in lipid transport pathways. Network analysis of unique genes within the replicated pathways further revealed several interconnected functional and topologically interacting modules representing novel associations (eg, semaphorin-regulated axonal guidance pathway) besides confirming known processes (lipid metabolism). The connectivity in the observed networks was statistically significant compared with random networks (P<0.001). 
Network centrality analysis (degree and betweenness) further identified genes (eg, NCAM1, FYN, FURIN, etc) likely to play critical roles in the maintenance and functioning of several of the replicated pathways. These findings provide novel insights into how genetic variation, interpreted in the context of biological processes and functional interactions among genes, may help define the genetic architecture of CAD. © 2015 American Heart Association, Inc.
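
    The two centrality measures named in this record, degree and betweenness, can be sketched for a small undirected network as follows. This is an illustrative assumption rather than the study's pipeline: betweenness is computed with Brandes' algorithm for unweighted graphs, and the node labels G1 to G4 and their edges are hypothetical, not real gene interactions.

```python
from collections import deque


def degree_centrality(adj):
    """Number of neighbours of each node in an adjacency dict of sets."""
    return {node: len(neighbours) for node, neighbours in adj.items()}


def betweenness_centrality(adj):
    """Unnormalized betweenness for an undirected, unweighted graph (Brandes)."""
    scores = {node: 0.0 for node in adj}
    for source in adj:
        order = []                          # nodes in order of discovery
        preds = {node: [] for node in adj}  # shortest-path predecessors
        sigma = {node: 0 for node in adj}   # number of shortest paths
        dist = {node: -1 for node in adj}
        sigma[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:                        # breadth-first search
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {node: 0.0 for node in adj}
        while order:                        # accumulate dependencies backwards
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                scores[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {node: score / 2 for node, score in scores.items()}


# Hypothetical star-shaped network: G1 is linked to G2, G3 and G4.
network = {"G1": {"G2", "G3", "G4"}, "G2": {"G1"}, "G3": {"G1"}, "G4": {"G1"}}
degrees = degree_centrality(network)
betweenness = betweenness_centrality(network)
print(degrees)
print(betweenness)
```

    Nodes that score highly on both measures act as hubs and bridges, which is the sense in which such genes are described as likely to be critical to the replicated pathways.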

  6. DNAtraffic--a new database for systems biology of DNA dynamics during the cell life.

    PubMed

    Kuchta, Krzysztof; Barszcz, Daniela; Grzesiuk, Elzbieta; Pomorski, Pawel; Krwawicz, Joanna

    2012-01-01

    DNAtraffic (http://dnatraffic.ibb.waw.pl/) is intended to be a unique, comprehensive and richly annotated database of genome dynamics during the cell life. It contains extensive data on the nomenclature, ontology, structure and function of proteins related to DNA integrity mechanisms such as chromatin remodeling, histone modifications, DNA repair and damage response from eight organisms: Homo sapiens, Mus musculus, Drosophila melanogaster, Caenorhabditis elegans, Saccharomyces cerevisiae, Schizosaccharomyces pombe, Escherichia coli and Arabidopsis thaliana. DNAtraffic contains comprehensive information on the diseases related to the assembled human proteins. DNAtraffic is richly annotated with systemic information on the nomenclature, chemistry and structure of DNA damage and its sources, including environmental agents or commonly used drugs targeting nucleic acids and/or proteins involved in the maintenance of genome stability. One aim of the DNAtraffic database is to create the first platform for the combinatorial complexity of DNA network analysis. The database includes illustrations of pathways, damage, proteins and drugs. Since DNAtraffic is designed to cover a broad spectrum of scientific disciplines, it has to be extensively linked to numerous external data sources. Our database represents the result of manual annotation work aimed at making DNAtraffic much more useful for a wide range of systems biology applications.

  7. DNAtraffic—a new database for systems biology of DNA dynamics during the cell life

    PubMed Central

    Kuchta, Krzysztof; Barszcz, Daniela; Grzesiuk, Elzbieta; Pomorski, Pawel; Krwawicz, Joanna

    2012-01-01

    DNAtraffic (http://dnatraffic.ibb.waw.pl/) is intended to be a unique, comprehensive and richly annotated database of genome dynamics during the cell life. It contains extensive data on the nomenclature, ontology, structure and function of proteins related to DNA integrity mechanisms such as chromatin remodeling, histone modifications, DNA repair and damage response from eight organisms: Homo sapiens, Mus musculus, Drosophila melanogaster, Caenorhabditis elegans, Saccharomyces cerevisiae, Schizosaccharomyces pombe, Escherichia coli and Arabidopsis thaliana. DNAtraffic contains comprehensive information on the diseases related to the assembled human proteins. DNAtraffic is richly annotated with systemic information on the nomenclature, chemistry and structure of DNA damage and its sources, including environmental agents or commonly used drugs targeting nucleic acids and/or proteins involved in the maintenance of genome stability. One of the aims of the DNAtraffic database is to create the first platform for analysis of the combinatorial complexity of the DNA network. The database includes illustrations of pathways, damage, proteins and drugs. Since DNAtraffic is designed to cover a broad spectrum of scientific disciplines, it has to be extensively linked to numerous external data sources. Our database is the result of manual annotation work aimed at making DNAtraffic much more useful for a wide range of systems biology applications. PMID:22110027

  8. Applicability of computational systems biology in toxicology.

    PubMed

    Kongsbak, Kristine; Hadrup, Niels; Audouze, Karine; Vinggaard, Anne Marie

    2014-07-01

    Systems biology as a research field has emerged within the last few decades. Systems biology, often defined as the antithesis of the reductionist approach, integrates information about individual components of a biological system. In integrative systems biology, large data sets from various sources and databases are used to model and predict effects of chemicals on, for instance, human health. In toxicology, computational systems biology enables identification of important pathways and molecules from large data sets; tasks that can be extremely laborious when performed by a classical literature search. However, computational systems biology offers more advantages than providing a high-throughput literature search; it may form the basis for establishment of hypotheses on potential links between environmental chemicals and human diseases, which would be very difficult to establish experimentally. This is possible due to the existence of comprehensive databases containing information on networks of human protein-protein interactions and protein-disease associations. Experimentally determined targets of the specific chemical of interest can be fed into these networks to obtain additional information that can be used to establish hypotheses on links between the chemical and human diseases. Such information can also be applied for designing more intelligent animal/cell experiments that can test the established hypotheses. Here, we describe how and why to apply an integrative systems biology method in the hypothesis-generating phase of toxicological research. © 2014 Nordic Association for the Publication of BCPT (former Nordic Pharmacological Society).

  9. sbv IMPROVER: Modern Approach to Systems Biology.

    PubMed

    Guryanova, Svetlana; Guryanova, Anna

    2017-01-01

    The increasing amount and variety of data in biosciences call for innovative methods of visualization, scientific verification, and pathway analysis. Novel approaches to biological networks and research quality control are important because of their role in development of new products, improvement, and acceleration of existing health policies and research for novel ways of solving scientific challenges. One such approach is sbv IMPROVER. It is a platform that uses crowdsourcing and verification to create biological networks with easy public access. It contains 120 networks built in Biological Expression Language (BEL) to interpret data from PubMed articles with high-quality verification available for free on the CBN database. Computable, human-readable biological networks with a structured syntax are a powerful way of representing biological information generated from high-density data. This article presents sbv IMPROVER, a crowd-verification approach for the visualization and expansion of biological networks.

  10. Systems biology of cancer biomarker detection.

    PubMed

    Mitra, Sanga; Das, Smarajit; Chakrabarti, Jayprokas

    2013-01-01

    Cancer systems biology is an ever-growing area of research owing to the explosion of data; the problem is how to mine these data and extract useful information. To gain insight into carcinogenesis, one needs to systematically mine several resources, such as databases, microarray data and next-generation sequences. This review encompasses management and analysis of cancer data, database construction and data deposition, whole transcriptome and genome comparison, analysis of results from high-throughput experiments to uncover cellular pathways and molecular interactions, and the design of effective algorithms to identify potential biomarkers. Recent technical advances such as ChIP-on-chip, ChIP-seq and RNA-seq can be applied to get epigenetic information transformed into a high-throughput endeavour to which systems biology and bioinformatics are making significant inroads. The data from the ENCODE and GENCODE projects, available through the UCSC Genome Browser, can be considered a benchmark for comparison and meta-analysis. A pipeline for integrating next-generation sequencing data and microarray data, and combining them with existing databases, is discussed. The understanding of cancer genomics is changing the way we approach cancer diagnosis and treatment. To give a better understanding of how to utilize available resources, we have chosen oral cancer to show how and what kind of analysis can be done. This review is a computational genomic primer that provides a bird's-eye view of the computational and bioinformatics tools currently available to perform integrated genomic and systems biology analyses of several carcinomas.

  11. Training Signaling Pathway Maps to Biochemical Data with Constrained Fuzzy Logic: Quantitative Analysis of Liver Cell Responses to Inflammatory Stimuli

    PubMed Central

    Morris, Melody K.; Saez-Rodriguez, Julio; Clarke, David C.; Sorger, Peter K.; Lauffenburger, Douglas A.

    2011-01-01

    Predictive understanding of cell signaling network operation based on general prior knowledge but consistent with empirical data in a specific environmental context is a current challenge in computational biology. Recent work has demonstrated that Boolean logic can be used to create context-specific network models by training proteomic pathway maps to dedicated biochemical data; however, the Boolean formalism is restricted to characterizing protein species as either fully active or inactive. To advance beyond this limitation, we propose a novel form of fuzzy logic sufficiently flexible to model quantitative data but also sufficiently simple to efficiently construct models by training pathway maps on dedicated experimental measurements. Our new approach, termed constrained fuzzy logic (cFL), converts a prior knowledge network (obtained from literature or interactome databases) into a computable model that describes graded values of protein activation across multiple pathways. We train a cFL-converted network to experimental data describing hepatocytic protein activation by inflammatory cytokines and demonstrate the application of the resultant trained models for three important purposes: (a) generating experimentally testable biological hypotheses concerning pathway crosstalk, (b) establishing capability for quantitative prediction of protein activity, and (c) prediction and understanding of the cytokine release phenotypic response. Our methodology systematically and quantitatively trains a protein pathway map summarizing curated literature to context-specific biochemical data. This process generates a computable model yielding successful prediction of new test data and offering biological insight into complex datasets that are difficult to fully analyze by intuition alone. PMID:21408212
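
    The contrast between a Boolean gate and a graded logic gate described in this abstract can be illustrated with a minimal sketch. It assumes a normalized Hill transfer function and min/max operators for AND/OR; the abstract itself does not specify these, and the function names, the Hill parameters (n, k) and the 0.5 threshold are hypothetical choices for illustration only.

```python
# Minimal sketch: Boolean gate vs. graded (fuzzy) gate.
# ASSUMPTIONS: normalized Hill transfer function, min/max for AND/OR,
# and all parameter values below are illustrative, not from the source.

def normalized_hill(x, n=3.0, k=0.5):
    """Map an input activity in [0, 1] to a graded output in [0, 1]."""
    return (1.0 + k**n) * x**n / (k**n + x**n)

def boolean_and(a, b, threshold=0.5):
    """Boolean formalism: each species is either fully active or inactive."""
    return 1.0 if (a >= threshold and b >= threshold) else 0.0

def fuzzy_and(a, b):
    """Graded AND: the weaker transformed input limits the output."""
    return min(normalized_hill(a), normalized_hill(b))

def fuzzy_or(a, b):
    """Graded OR: the stronger transformed input drives the output."""
    return max(normalized_hill(a), normalized_hill(b))

if __name__ == "__main__":
    for a, b in [(0.2, 0.9), (0.6, 0.9), (1.0, 1.0)]:
        print(a, b, boolean_and(a, b), round(fuzzy_and(a, b), 3))
```

    With inputs of 0.6 and 0.9, the Boolean gate reports full activation, whereas the graded gate returns an intermediate value, which is the kind of quantitative behavior the Boolean formalism cannot express.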

  12. MESSI: metabolic engineering target selection and best strain identification tool.

    PubMed

    Kang, Kang; Li, Jun; Lim, Boon Leong; Panagiotou, Gianni

    2015-01-01

    Metabolic engineering and synthetic biology are synergistically related fields for manipulating target pathways and designing microorganisms that can act as chemical factories. Saccharomyces cerevisiae's ideal bioprocessing traits make yeast a very attractive chemical factory for production of fuels, pharmaceuticals, nutraceuticals as well as a wide range of chemicals. However, future attempts of engineering S. cerevisiae's metabolism using synthetic biology need to move towards more integrative models that incorporate the high connectivity of metabolic pathways and regulatory processes and the interactions in genetic elements across those pathways and processes. To contribute in this direction, we have developed Metabolic Engineering target Selection and best Strain Identification tool (MESSI), a web server for predicting efficient chassis and regulatory components for yeast bio-based production. The server provides an integrative platform for users to analyse ready-to-use public high-throughput metabolomic data, which are transformed to metabolic pathway activities for identifying the most efficient S. cerevisiae strain for the production of a compound of interest. As input MESSI accepts metabolite KEGG IDs or pathway names. MESSI outputs a ranked list of S. cerevisiae strains based on aggregation algorithms. Furthermore, through a genome-wide association study of the metabolic pathway activities with the strains' natural variation, MESSI prioritizes genes and small variants as potential regulatory points and promising metabolic engineering targets. Users can choose various parameters in the whole process such as (i) weight and expectation of each metabolic pathway activity in the final ranking of the strains, (ii) Weighted AddScore Fuse or Weighted Borda Fuse aggregation algorithm, (iii) type of variants to be included, (iv) variant sets in different biological levels. Database URL: http://sbb.hku.hk/MESSI/. © The Author(s) 2015. Published by Oxford University Press.
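
    The idea of fusing several per-pathway strain rankings into one weighted ranking can be sketched with a generic weighted Borda count. This is an illustration of the general technique only; it is not claimed to match MESSI's own Weighted Borda Fuse implementation, and the pathway names, strain names and weights are hypothetical.

```python
# Minimal sketch of a generic weighted Borda count for rank aggregation.
# ASSUMPTION: this is the textbook technique, not MESSI's actual code;
# all pathway names, strain names and weights are hypothetical.

def weighted_borda(rankings, weights):
    """Fuse ranked lists (best first) into one list, best first.

    rankings: dict mapping a criterion name to a list of items, best first.
    weights:  dict mapping the same criterion names to a numeric weight.
    """
    scores = {}
    for criterion, ranking in rankings.items():
        n = len(ranking)
        w = weights[criterion]
        for position, item in enumerate(ranking):
            # The top item earns n points, the last item earns 1 point.
            scores[item] = scores.get(item, 0.0) + w * (n - position)
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))

if __name__ == "__main__":
    rankings = {
        "pathway_A": ["strain1", "strain2", "strain3"],
        "pathway_B": ["strain2", "strain1", "strain3"],
    }
    weights = {"pathway_A": 2.0, "pathway_B": 1.0}
    print(weighted_borda(rankings, weights))
```

    Raising the weight of one pathway pulls the fused ranking toward that pathway's ordering, which mirrors the user-adjustable weighting the abstract describes.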

  13. VISIBIOweb: visualization and layout services for BioPAX pathway models

    PubMed Central

    Dilek, Alptug; Belviranli, Mehmet E.; Dogrusoz, Ugur

    2010-01-01

    With recent advancements in techniques for cellular data acquisition, information on cellular processes has been increasing at a dramatic rate. Visualization is critical to analyzing and interpreting complex information; representing cellular processes or pathways is no exception. VISIBIOweb is a free, open-source, web-based pathway visualization and layout service for pathway models in BioPAX format. With VISIBIOweb, one can obtain well-laid-out views of pathway models using the standard notation of the Systems Biology Graphical Notation (SBGN), and can embed such views within one's web pages as desired. Pathway views may be navigated using zoom and scroll tools; pathway object properties, including any external database references available in the data, may be inspected interactively. The automatic layout component of VISIBIOweb may also be accessed programmatically from other tools using Hypertext Transfer Protocol (HTTP). The web site is free and open to all users and there is no login requirement. It is available at: http://visibioweb.patika.org. PMID:20460470

  14. T-cell lymphomas associated gene expression signature: Bioinformatics analysis based on gene expression Omnibus.

    PubMed

    Zhou, Lei-Lei; Xu, Xiao-Yue; Ni, Jie; Zhao, Xia; Zhou, Jian-Wei; Feng, Ji-Feng

    2018-06-01

    Due to the low incidence and the heterogeneity of subtypes, the biological process of T-cell lymphomas is largely unknown. Although many genes have been detected in T-cell lymphomas, the role of these genes in the biological process of T-cell lymphomas has not been further analyzed. Two qualified datasets were downloaded from the Gene Expression Omnibus database. The biological functions of differentially expressed genes were evaluated by gene ontology enrichment and KEGG pathway analysis. The network for intersection genes was constructed with Cytoscape v3.0. Kaplan-Meier survival curves and the log-rank test were employed to assess the association between differentially expressed genes and clinical characteristics. The intersection mRNAs were shown to be associated with fundamental processes of T-cell lymphoma cells. These intersection mRNAs were involved in the activation of some cancer-related pathways, including the PI3K/AKT, Ras, JAK-STAT, and NF-kappa B signaling pathways. PDGFRA, CXCL12, and CCL19 were the most significant central genes in the signal-net analysis. The results of survival analysis are not entirely credible. Our findings uncovered aberrantly expressed genes and a complex RNA signal network in T-cell lymphomas and indicated cancer-related pathways involved in disease initiation and progression, providing a new insight for biotargeted therapy in T-cell lymphomas. © 2018 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
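
    The Kaplan-Meier survival curves mentioned above follow the product-limit formula S(t) = product over event times t_i <= t of (1 - d_i / n_i), where d_i is the number of events at t_i and n_i the number still at risk. A minimal sketch of that estimator follows; the follow-up times and event flags are toy values, not data from the study.

```python
# Minimal sketch of the Kaplan-Meier product-limit estimator.
# ASSUMPTION: the sample times and event flags are toy values for
# illustration and are not taken from the study.

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one event.

    times:  follow-up time for each subject.
    events: 1 if the event was observed at that time, 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all subjects that share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Both events and censored subjects leave the risk set.
        at_risk -= leaving
    return curve

if __name__ == "__main__":
    for t, s in kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]):
        print(t, round(s, 3))
```

    Censored subjects reduce the number at risk without lowering the curve, which is why the estimator remains usable when follow-up is incomplete.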

  15. Environmental surveillance and monitoring. The next frontiers ...

    EPA Pesticide Factsheets

    High throughput toxicity testing (HTT) technologies along with the world-wide web are revolutionizing both generation and access to data regarding the bioactivities that chemicals can elicit when they interact with specific proteins, genes, or other targets in the body of an organism. However, to date, most of the focus has been on the application of such data to assessment of individual chemicals. We suggest that environmental surveillance and monitoring represent the next frontiers for HTT. Resources already exist in curated databases of chemical-biological interactions, including highly standardized quantitative dose-response data generated from nascent HTT programs like ToxCast and Tox21, to link chemicals detected through environmental analytical chemistry to known biological activities. The emergence of the adverse outcome pathway framework and associated knowledgebase for linking molecular or pathway-level perturbations of biological systems to adverse outcomes traditionally considered in risk assessment and regulatory decision-making through a series of measurable biological changes provides a critical link between activity and hazard. Furthermore, environmental samples can be directly analyzed via HTT platforms to provide an unprecedented breadth of biological activity characterization that integrates the effects of all compounds present in a mixture, whether known or not. Novel application of these chemical-biological interaction data provide an oppor

  16. Modeling of cell signaling pathways in macrophages by semantic networks

    PubMed Central

    Hsing, Michael; Bellenson, Joel L; Shankey, Conor; Cherkasov, Artem

    2004-01-01

    Background: Substantial amounts of data on cell signaling, metabolic, gene regulatory and other biological pathways have been accumulated in literature and electronic databases. Conventionally, this information is stored in the form of pathway diagrams and can be characterized as highly "compartmental" (i.e. individual pathways are not connected into more general networks). Current approaches for representing pathways are limited in their capacity to model molecular interactions in their spatial and temporal context. Moreover, the critical knowledge of cause-effect relationships among signaling events is not reflected by most conventional approaches for manipulating pathways. Results: We have applied a semantic network (SN) approach to develop and implement a model for cell signaling pathways. The semantic model has mapped biological concepts to a set of semantic agents and relationships, and characterized cell signaling events and their participants in the hierarchical and spatial context. In particular, the available information on the behaviors and interactions of the PI3K enzyme family has been integrated into the SN environment and a cell signaling network in human macrophages has been constructed. A SN-application has been developed to manipulate the locations and the states of molecules and to observe their actions under different biological scenarios. The approach allowed qualitative simulation of cell signaling events involving PI3Ks and identified pathways of molecular interactions that led to known cellular responses as well as other potential responses during bacterial invasions in macrophages. Conclusions: We concluded from our results that the semantic network is an effective method to model cell signaling pathways. The semantic model allows proper representation and integration of information on biological structures and their interactions at different levels. The reconstruction of the cell signaling network in the macrophage allowed detailed investigation of connections among various essential molecules and reflected the cause-effect relationships among signaling events. The simulation demonstrated the dynamics of the semantic network, where a change of states on a molecule can alter its function and potentially cause a chain-reaction effect in the system. PMID:15494071

  17. SolCyc: a database hub at the Sol Genomics Network (SGN) for the manual curation of metabolic networks in Solanum and Nicotiana specific databases

    PubMed Central

    Foerster, Hartmut; Bombarely, Aureliano; Battey, James N D; Sierro, Nicolas; Ivanov, Nikolai V; Mueller, Lukas A

    2018-01-01

    SolCyc is the entry portal to pathway/genome databases (PGDBs) for major species of the Solanaceae family hosted at the Sol Genomics Network. Currently, SolCyc comprises six organism-specific PGDBs for tomato, potato, pepper, petunia, tobacco and one Rubiaceae, coffee. The metabolic networks of those PGDBs have been computationally predicted by the PathoLogic component of the Pathway Tools software using the manually curated multi-domain database MetaCyc (http://www.metacyc.org/) as reference. SolCyc has been recently extended by taxon-specific databases, i.e. the family-specific SolanaCyc database, containing only curated data pertinent to species of the nightshade family, and NicotianaCyc, a genus-specific database that stores all relevant metabolic data of the Nicotiana genus. Through manual curation of the published literature, new metabolic pathways have been created in those databases, which are complemented by the continuously updated, relevant species-specific pathways from MetaCyc. At present, SolanaCyc comprises 199 pathways and 29 superpathways and NicotianaCyc accounts for 72 pathways and 13 superpathways. Curator-maintained, taxon-specific databases such as SolanaCyc and NicotianaCyc are characterized by an enrichment of data specific to these taxa and are free of falsely predicted pathways. Both databases have been used to update recently created Nicotiana-specific databases for Nicotiana tabacum, Nicotiana benthamiana, Nicotiana sylvestris and Nicotiana tomentosiformis by propagating verifiable data into those PGDBs. In addition, in-depth curation of the pathways in N. tabacum has been carried out, which resulted in the elimination of 156 pathways from the 569 pathways predicted by Pathway Tools. Together, in-depth curation of the predicted pathway network and the supplementation with curated data from taxon-specific databases have substantially improved the curation status of the species-specific N. tabacum PGDB. The implementation of this strategy will significantly advance the curation status of all organism-specific databases in SolCyc, resulting in improved database accuracy, data analysis and visualization of biochemical networks in those species. Database URL: https://solgenomics.net/tools/solcyc/ PMID:29762652

  18. Identification of hub subnetwork based on topological features of genes in breast cancer

    PubMed Central

    ZHUANG, DA-YONG; JIANG, LI; HE, QING-QING; ZHOU, PENG; YUE, TAO

    2015-01-01

    The aim of this study was to provide functional insight into the identification of hub subnetworks by aggregating the behavior of genes connected in a protein-protein interaction (PPI) network. We applied a protein network-based approach to identify subnetworks which may provide new insight into the functions of pathways involved in breast cancer rather than individual genes. Five groups of breast cancer data were downloaded from the Gene Expression Omnibus (GEO) database of high-throughput gene expression data and analyzed to identify gene signatures using the genome-wide global significance (GWGS) method. A PPI network was constructed using Cytoscape, and clusters that focused on highly connected nodes were obtained using the molecular complex detection (MCODE) clustering algorithm. Pathway analysis was performed to assess the functional relevance of selected gene signatures based on the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. Topological centrality was used to characterize the biological importance of gene signatures, pathways and clusters. The results revealed that cluster 1, as well as the cell cycle and oocyte meiosis pathways, were significant subnetworks in the analysis of degree and other centralities, in which hub nodes were mostly distributed. The most important hub nodes, with top-ranked centrality, were also similar to the common genes from the intersection of the above three subnetworks, which was viewed as a hub subnetwork that is more reproducible than individual critical genes selected without network information. This hub subnetwork was attributed to the same biological process, which is essential in the function of cell growth and death. This increased the accuracy of identifying gene interactions that took place within the same functional process and was potentially useful for the development of biomarkers and networks for breast cancer. PMID:25573623
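
    Ranking hub nodes by degree centrality, one of the topological measures named above, can be sketched as follows. This is a generic illustration on a toy undirected network; the node names and edges are hypothetical and do not come from the study, which used Cytoscape and MCODE rather than this code.

```python
# Minimal sketch: degree centrality on an undirected interaction network.
# ASSUMPTION: node names and edges are hypothetical toy data; the study
# itself used Cytoscape and MCODE, not this code.
from collections import defaultdict

def degree_centrality(edges):
    """Return {node: degree / (n - 1)} for an undirected edge list."""
    neighbors = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-interactions
            neighbors[u].add(v)
            neighbors[v].add(u)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(nbrs) / (n - 1) for node, nbrs in neighbors.items()}

def top_hubs(edges, k=1):
    """Return the k nodes with the highest degree centrality."""
    centrality = degree_centrality(edges)
    return sorted(centrality, key=lambda node: (-centrality[node], node))[:k]

if __name__ == "__main__":
    toy_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]
    print(degree_centrality(toy_edges))
    print(top_hubs(toy_edges, k=2))
```

    Degree is only one centrality; measures such as betweenness weigh a node by the shortest paths passing through it and can rank nodes differently.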

  19. A systems biology pipeline identifies new immune and disease related molecular signatures and networks in human cells during microgravity exposure

    NASA Astrophysics Data System (ADS)

    Mukhopadhyay, Sayak; Saha, Rohini; Palanisamy, Anbarasi; Ghosh, Madhurima; Biswas, Anupriya; Roy, Saheli; Pal, Arijit; Sarkar, Kathakali; Bagh, Sangram

    2016-05-01

    Microgravity is a prominent health hazard for astronauts, yet we understand little about its effect at the molecular systems level. In this study, we have integrated a set of systems-biology tools and databases and have analysed more than 8000 molecular pathways on published global gene expression datasets of human cells in microgravity. Hundreds of new pathways have been identified with statistical confidence for each dataset and, despite the differences in cell types and experiments, around 100 of the new pathways appear to be common across the datasets. They are related to reduced inflammation, autoimmunity, diabetes and asthma. We have identified downregulation of the NF-κB pathway via Notch1 signalling as a new pathway for reduced immunity in microgravity. Induction of a few cancer types, including liver cancer and leukaemia, and increased drug response to cancer in microgravity were also found. An increase in olfactory signal transduction was also identified. Genes are clustered based on their expression pattern, and mathematically stable clusters are identified. The network mapping of genes within a cluster indicates the plausible functional connections in microgravity. This pipeline gives a new systems-level picture of human cells under microgravity, generates testable hypotheses and may help in estimating risk and developing medicine for space missions.

  20. A systems biology pipeline identifies new immune and disease related molecular signatures and networks in human cells during microgravity exposure.

    PubMed

    Mukhopadhyay, Sayak; Saha, Rohini; Palanisamy, Anbarasi; Ghosh, Madhurima; Biswas, Anupriya; Roy, Saheli; Pal, Arijit; Sarkar, Kathakali; Bagh, Sangram

    2016-05-17

    Microgravity is a prominent health hazard for astronauts, yet we understand little about its effect at the molecular systems level. In this study, we have integrated a set of systems-biology tools and databases and have analysed more than 8000 molecular pathways on published global gene expression datasets of human cells in microgravity. Hundreds of new pathways have been identified with statistical confidence for each dataset and, despite the differences in cell types and experiments, around 100 of the new pathways appear to be common across the datasets. They are related to reduced inflammation, autoimmunity, diabetes and asthma. We have identified downregulation of the NF-κB pathway via Notch1 signalling as a new pathway for reduced immunity in microgravity. Induction of a few cancer types, including liver cancer and leukaemia, and increased drug response to cancer in microgravity were also found. An increase in olfactory signal transduction was also identified. Genes are clustered based on their expression pattern, and mathematically stable clusters are identified. The network mapping of genes within a cluster indicates the plausible functional connections in microgravity. This pipeline gives a new systems-level picture of human cells under microgravity, generates testable hypotheses and may help in estimating risk and developing medicine for space missions.

  1. Gene expression profiles in liver of mouse after chronic exposure to drinking water.

    PubMed

    Wu, Bing; Zhang, Yan; Zhao, Dayong; Zhang, Xuxiang; Kong, Zhiming; Cheng, Shupei

    2009-10-01

    A cDNA microarray approach was applied to hepatic transcriptional profile analysis in male mice (Mus musculus, ICR) to assess the potential health effects of drinking water in Nanjing, China. Mice were treated with continuous exposure to drinking water for 90 days. Hepatic gene expression was analyzed with Affymetrix Mouse Genome 430A 2.0 arrays, and pathway analysis was carried out using Molecule Annotation System 2.0 and the KEGG pathway database. A total of 836 genes were found to be significantly altered (1.5-fold, P ≤ 0.05), including 294 up-regulated genes and 542 down-regulated genes. According to biological pathway analysis, drinking water exposure resulted in aberration of gene expression and biological pathways linked to xenobiotic metabolism, signal transduction, cell cycle and oxidative stress response. Further, deregulation of several genes associated with carcinogenesis or tumor progression, including Ccnd1, Egfr, Map2k3, Mcm2, Orc2l and Smad2, was observed. Although transcription changes in identified genes are unlikely to be used as a sole indicator of adverse health effects, the results of this study could enhance our understanding of early toxic effects of drinking water exposure and support future studies on drinking water safety.
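
    The significance filter quoted in this abstract (at least 1.5-fold change with P at or below 0.05) can be sketched as a simple classification step. The sketch assumes that fold change is expressed as a treated/control ratio, so that down-regulation corresponds to a ratio of 1/1.5 or lower; the gene labels and values are hypothetical toy data, not results from the study.

```python
# Minimal sketch: split genes into up- and down-regulated sets using a
# fold-change cutoff and a P-value cutoff.
# ASSUMPTIONS: fold change is a treated/control ratio; gene labels and
# numbers are hypothetical toy values, not results from the study.

def classify_genes(records, fold=1.5, alpha=0.05):
    """records: iterable of (gene, ratio, p_value). Returns (up, down)."""
    up, down = [], []
    for gene, ratio, p_value in records:
        if p_value > alpha:
            continue  # not statistically significant
        if ratio >= fold:
            up.append(gene)
        elif ratio <= 1.0 / fold:
            down.append(gene)
    return up, down

if __name__ == "__main__":
    toy = [
        ("geneA", 2.0, 0.01),   # significant, up
        ("geneB", 0.5, 0.03),   # significant, down
        ("geneC", 1.2, 0.001),  # significant but change too small
        ("geneD", 3.0, 0.20),   # large change but not significant
    ]
    print(classify_genes(toy))
```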

  2. ChemProt-2.0: visual navigation in a disease chemical biology database

    PubMed Central

    Kim Kjærulff, Sonny; Wich, Louis; Kringelum, Jens; Jacobsen, Ulrik P.; Kouskoumvekaki, Irene; Audouze, Karine; Lund, Ole; Brunak, Søren; Oprea, Tudor I.; Taboureau, Olivier

    2013-01-01

    ChemProt-2.0 (http://www.cbs.dtu.dk/services/ChemProt-2.0) is a publicly available compilation of multiple chemical–protein annotation resources integrated with disease and clinical outcome information. The database has been updated to >1.15 million compounds with 5.32 million bioactivity measurements for 15 290 proteins. Each protein is linked to quality-scored human protein–protein interaction data based on more than half a million interactions, for studying diseases and biological outcomes (diseases, pathways and GO terms) through protein complexes. In ChemProt-2.0, therapeutic effects as well as adverse drug reactions have been integrated, making it possible to suggest proteins associated with clinical outcomes. New chemical structure fingerprints were computed based on the similarity ensemble approach. Protein sequence similarity search was also integrated to evaluate the promiscuity of proteins, which can help in the prediction of off-target effects. Finally, the database was integrated into a visual interface that enables navigation of the pharmacological space for small molecules. Filtering options were included in order to facilitate and guide dynamic search of specific queries. PMID:23185041

  3. The functional cancer map: a systems-level synopsis of genetic deregulation in cancer.

    PubMed

    Krupp, Markus; Maass, Thorsten; Marquardt, Jens U; Staib, Frank; Bauer, Tobias; König, Rainer; Biesterfeld, Stefan; Galle, Peter R; Tresch, Achim; Teufel, Andreas

    2011-06-30

    Cancer cells are characterized by massive dysregulation of physiological cell functions with considerable disruption of transcriptional regulation. Genome-wide transcriptome profiling can be utilized for early detection and molecular classification of cancers. Accurate discrimination of functionally different tumor types may help to guide selection of targeted therapy in translational research. Concise grouping of tumor types in cancer maps according to their molecular profile may further be helpful for the development of new therapeutic modalities or open new avenues for already established therapies. The complete available human tumor data of the Stanford Microarray Database were downloaded and filtered for relevance, adequacy and reliability. A total of 649 tumor samples from more than 1400 experiments and 58 different tissues were analyzed. Next, a method to score deregulation of KEGG pathway maps in different tumor entities was established, which was then used to convert hundreds of gene expression profiles into corresponding tumor-specific pathway activity profiles. Based on the latter, we defined a measure for functional similarity between tumor entities, which yielded a phylogeny of tumors. We provide a comprehensive, easy-to-interpret functional cancer map that characterizes tumor types with respect to their biological and functional behavior. Consistently, multiple pathways commonly associated with tumor progression were revealed as common features in the majority of the tumors. However, several pathways previously not linked to carcinogenesis were identified in multiple cancers, suggesting an essential role of these pathways in cancer biology. Among these pathways were 'ECM-receptor interaction', 'Complement and Coagulation cascades', and 'PPAR signaling pathway'. The functional cancer map provides a systematic view on molecular similarities across different cancers by comparing tumors on the level of pathway activity. This work resulted in identification of novel superimposed functional pathways potentially linked to cancer biology. Therefore, our work may serve as a starting point for rationalizing combination of tumor therapeutics as well as for expanding the application of well-established targeted tumor therapies.
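
    Comparing tumor entities by their pathway activity profiles can be illustrated with a small sketch. The abstract does not state which similarity measure the authors defined, so Pearson correlation is used here purely as an assumed stand-in; the tumor labels and activity scores are hypothetical toy values.

```python
# Minimal sketch: similarity between pathway activity profiles.
# ASSUMPTIONS: Pearson correlation stands in for the authors' similarity
# measure, which the abstract does not specify; tumor labels and scores
# are hypothetical toy values.
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("constant profile has undefined correlation")
    return cov / (sd_x * sd_y)

def most_similar(profiles, query):
    """Return the name of the profile most correlated with `query`."""
    others = (name for name in profiles if name != query)
    return max(others, key=lambda name: pearson(profiles[query], profiles[name]))

if __name__ == "__main__":
    # One activity score per pathway, in the same pathway order per tumor.
    profiles = {
        "tumor_A": [0.9, 0.1, 0.5],
        "tumor_B": [0.8, 0.2, 0.6],
        "tumor_C": [0.1, 0.9, 0.4],
    }
    print(most_similar(profiles, "tumor_A"))
```

    A matrix of such pairwise similarities is the usual input to hierarchical clustering, which is one way a tree-like grouping of tumor entities could be derived.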

  4. Improving Microbial Genome Annotations in an Integrated Database Context

    PubMed Central

    Chen, I-Min A.; Markowitz, Victor M.; Chu, Ken; Anderson, Iain; Mavromatis, Konstantinos; Kyrpides, Nikos C.; Ivanova, Natalia N.

    2013-01-01

    Effective comparative analysis of microbial genomes requires a consistent and complete view of biological data. Consistency regards the biological coherence of annotations, while completeness regards the extent and coverage of functional characterization for genomes. We have developed tools that allow scientists to assess and improve the consistency and completeness of microbial genome annotations in the context of the Integrated Microbial Genomes (IMG) family of systems. All publicly available microbial genomes are characterized in IMG using different functional annotation and pathway resources, thus providing a comprehensive framework for identifying and resolving annotation discrepancies. A rule-based system for predicting phenotypes in IMG provides a powerful mechanism for validating functional annotations, whereby the phenotypic traits of an organism are inferred based on the presence of certain metabolic reactions and pathways and compared to experimentally observed phenotypes. The IMG family of systems is available at http://img.jgi.doe.gov/. PMID:23424620
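
    The rule-based validation idea described above (infer phenotypes from the pathways present, then compare with observed phenotypes) can be sketched generically. The rule format, phenotype names and pathway names below are hypothetical and are not IMG's actual rules or data model.

```python
# Minimal sketch: rule-based phenotype prediction and comparison with
# observed phenotypes.
# ASSUMPTION: rule format, phenotype names and pathway names are
# hypothetical; this is not IMG's actual rule set or data model.

def predict_phenotypes(annotated_pathways, rules):
    """A phenotype is predicted when all pathways its rule requires are present."""
    present = set(annotated_pathways)
    return {phenotype for phenotype, required in rules.items() if required <= present}

def find_discrepancies(predicted, observed):
    """Compare predicted and experimentally observed phenotype sets."""
    return {
        # Observed but not predicted: annotation may be incomplete.
        "observed_not_predicted": observed - predicted,
        # Predicted but not observed: annotation may be inconsistent.
        "predicted_not_observed": predicted - observed,
    }

if __name__ == "__main__":
    rules = {
        "phenotype_1": {"pathway_X", "pathway_Y"},
        "phenotype_2": {"pathway_Z"},
    }
    predicted = predict_phenotypes(["pathway_X", "pathway_Y"], rules)
    print(predicted)
    print(find_discrepancies(predicted, {"phenotype_2"}))
```

    A phenotype that is observed but not predicted points to a possibly missing annotation, while the reverse case flags an annotation worth re-examining.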

  5. EuPathDB: the eukaryotic pathogen genomics database resource

    PubMed Central

    Aurrecoechea, Cristina; Barreto, Ana; Basenko, Evelina Y.; Brestelli, John; Brunk, Brian P.; Cade, Shon; Crouch, Kathryn; Doherty, Ryan; Falke, Dave; Fischer, Steve; Gajria, Bindu; Harb, Omar S.; Heiges, Mark; Hertz-Fowler, Christiane; Hu, Sufen; Iodice, John; Kissinger, Jessica C.; Lawrence, Cris; Li, Wei; Pinney, Deborah F.; Pulman, Jane A.; Roos, David S.; Shanmugasundram, Achchuthan; Silva-Franco, Fatima; Steinbiss, Sascha; Stoeckert, Christian J.; Spruill, Drew; Wang, Haiming; Warrenfeltz, Susanne; Zheng, Jie

    2017-01-01

    The Eukaryotic Pathogen Genomics Database Resource (EuPathDB, http://eupathdb.org) is a collection of databases covering 170+ eukaryotic pathogens (protists & fungi), along with relevant free-living and non-pathogenic species, and select pathogen hosts. To facilitate the discovery of meaningful biological relationships, the databases couple preconfigured searches with visualization and analysis tools for comprehensive data mining via intuitive graphical interfaces and APIs. All data are analyzed with the same workflows, including creation of gene orthology profiles, so data are easily compared across data sets, data types and organisms. EuPathDB is updated with numerous new analysis tools, features, data sets and data types. New tools include GO, metabolic pathway and word enrichment analyses plus an online workspace for analysis of personal, non-public, large-scale data. Expanded data content is mostly genomic and functional genomic data while new data types include protein microarray, metabolic pathways, compounds, quantitative proteomics, copy number variation, and polysomal transcriptomics. New features include consistent categorization of searches, data sets and genome browser tracks; redesigned gene pages; effective integration of alternative transcripts; and a EuPathDB Galaxy instance for private analyses of a user's data. Forthcoming upgrades include user workspaces for private integration of data with existing EuPathDB data and improved integration and presentation of host–pathogen interactions. PMID:27903906

  6. Identification of Biological Targets of Therapeutic Intervention for Hepatocellular Carcinoma by Integrated Bioinformatical Analysis.

    PubMed

    Hu, Wei Qi; Wang, Wei; Fang, Di Long; Yin, Xue Feng

    2018-05-24

    BACKGROUND We screened the potential molecular targets and investigated the molecular mechanisms of hepatocellular carcinoma (HCC). MATERIAL AND METHODS Microarray data of GSE47786, including the 40 μM berberine-treated HepG2 human hepatoma cell line and 0.08% DMSO-treated as control cells samples, was downloaded from the GEO database. Gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes pathway (KEGG) enrichment analyses were performed; the protein-protein interaction (PPI) networks were constructed using STRING database and Cytoscape; the genetic alteration, neighboring genes networks, and survival analysis of hub genes were explored by cBio portal; and the expression of mRNA level of hub genes was obtained from the Oncomine databases. RESULTS A total of 56 upregulated and 8 downregulated DEGs were identified. The GO analysis results were significantly enriched in cell-cycle arrest, regulation of transcription, DNA-dependent, protein amino acid phosphorylation, cell cycle, and apoptosis. The KEGG pathway analysis showed that DEGs were enriched in MAPK signaling pathway, ErbB signaling pathway, and p53 signaling pathway. JUN, EGR1, MYC, and CDKN1A were identified as hub genes in PPI networks. The genetic alteration of hub genes was mainly concentrated in amplification. TP53, NDRG1, and MAPK15 were found in neighboring genes networks. Altered genes had worse overall survival and disease-free survival than unaltered genes. The expressions of EGR1, MYC, and CDKN1A were significantly increased, but expression of JUN was not, in the Roessler Liver datasets. CONCLUSIONS We found that JUN, EGR1, MYC, and CDKN1A might be used as diagnostic and therapeutic molecular biomarkers and broaden our understanding of the molecular mechanisms of HCC.

  7. Drug-Path: a database for drug-induced pathways

    PubMed Central

    Zeng, Hui; Cui, Qinghua

    2015-01-01

    Some databases for drug-associated pathways have been built and are publicly available. However, the pathways curated in most of these databases are drug-action or drug-metabolism pathways. In recent years, high-throughput technologies such as microarray and RNA-sequencing have produced many drug-induced gene expression profiles. Interestingly, drug-induced gene expression profiles frequently show distinct patterns, indicating that drugs normally induce the activation or repression of distinct pathways. Therefore, these pathways contribute to the study of drug mechanisms and drug repurposing. Here, we present Drug-Path, a database of drug-induced pathways, which was generated by KEGG pathway enrichment analysis for drug-induced upregulated genes and downregulated genes based on drug-induced gene expression datasets in Connectivity Map. Drug-Path provides user-friendly interfaces to retrieve, visualize and download the drug-induced pathway data in the database. In addition, the genes deregulated by a given drug are highlighted in the pathways. All data were organized using SQLite. The web site was implemented using Django, a Python web framework. Finally, we believe that this database will be useful for related research. Database URL: http://www.cuilab.cn/drugpath PMID:26130661
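
    The abstract above does not say which statistical test was used for the KEGG pathway enrichment analysis. A one-sided hypergeometric test is a common choice for this kind of analysis, and the following is a minimal sketch under that assumption. The function name and all parameter names are illustrative and are not taken from Drug-Path.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_pathway, n_selected, n_overlap):
    """One-sided hypergeometric p-value, P(X >= n_overlap).

    n_background: number of genes in the background set (assumed input)
    n_pathway:    background genes annotated to the pathway
    n_selected:   genes in the query list (e.g. upregulated genes)
    n_overlap:    query genes that are also in the pathway
    """
    upper = min(n_pathway, n_selected)
    total = comb(n_background, n_selected)
    # Sum the probability mass for every overlap at least as large as observed.
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Toy numbers: 10 background genes, 4 in the pathway, 3 selected, 2 overlapping.
    print(hypergeom_enrichment_p(10, 4, 3, 2))
```

    In practice the p-values for all pathways would normally be corrected for multiple testing before any pathway is reported as enriched.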

  8. The exploration of contrasting pathways in Triple Negative Breast Cancer (TNBC).

    PubMed

    Narrandes, Shavira; Huang, Shujun; Murphy, Leigh; Xu, Wayne

    2018-01-04

    Triple Negative Breast Cancers (TNBCs) lack the appropriate targets for currently used breast cancer therapies, conferring an aggressive phenotype, more frequent relapse and poorer survival rates. The biological heterogeneity of TNBC complicates the clinical treatment further. We have explored and compared the biological pathways in TNBC and other subtypes of breast cancers, using an in silico approach and the hypothesis that pathways with two opposing effects (Yin and Yang) determine the fate of cancer cells. Identifying breast subgroup specific components of these opposing pathways may aid in selecting potential therapeutic targets as well as further classifying the heterogeneous TNBC subtype. Gene expression and patient clinical data from The Cancer Genome Atlas (TCGA) and the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) were used for this study. Gene Set Enrichment Analysis (GSEA) was used to identify the pathways more active in cancer than in normal tissue (Yin) and the pathways more active in normal tissue than in cancer (Yang). Clustering analysis was performed to compare pathways of TNBC with other types of breast cancers. The association of pathway-classified TNBC sub-groups with clinical outcomes was tested using a Cox regression model. Among 4729 curated canonical pathways in the GSEA database, 133 Yin pathways (FDR < 0.05) and 71 Yang pathways (p-value < 0.05) were discovered in TNBC. FOXM1 is the top Yin pathway while PPARα is the top Yang pathway in TNBC. TNBC and other types of breast cancers showed different pathway enrichment significance profiles. Using the top Yin and Yang pathways as a classifier, TNBC can be further subtyped into six sub-groups, each having different clinical outcomes. We are the first to report that the FOXM1 pathway is the most upregulated and the PPARα pathway is the most downregulated pathway in TNBC. These two pathways could be simultaneously targeted in further studies. The pathway classifier developed in this study also provides insight into TNBC heterogeneity.

  9. Defining a Computational Framework for the Assessment of ...

    EPA Pesticide Factsheets

    The Adverse Outcome Pathway (AOP) framework describes the effects of environmental stressors across multiple scales of biological organization and function. This includes an evaluation of the potential for each key event to occur across a broad range of species in order to determine the taxonomic applicability of each AOP. Computational tools are needed to facilitate this process. Recently, we developed a tool that uses sequence homology to evaluate the applicability of molecular initiating events across species (Lalone et al., Toxicol. Sci., 2016). To extend our ability to make computational predictions at higher levels of biological organization, we have created the AOPdb. This database links molecular targets associated with key events in the AOPwiki to publicly available data (e.g. gene-protein, pathway, species orthology, ontology, chemical, disease) including ToxCast assay information. The AOPdb combines different data types in order to characterize the impacts of chemicals to human health and the environment and serves as a decision support tool for case study development in the area of taxonomic applicability. As a proof of concept, the AOPdb allows identification of relevant molecular targets, biological pathways, and chemical and disease associations across species for four AOPs from the AOP-Wiki (https://aopwiki.org): Estrogen receptor antagonism leading to reproductive dysfunction (Aop:30); Aromatase inhibition leading to reproductive d

  10. Synthetic Peptide Arrays for Pathway-Level Protein Monitoring by Liquid Chromatography-Tandem Mass Spectrometry

    PubMed Central

    Hewel, Johannes A.; Liu, Jian; Onishi, Kento; Fong, Vincent; Chandran, Shamanta; Olsen, Jonathan B.; Pogoutse, Oxana; Schutkowski, Mike; Wenschuh, Holger; Winkler, Dirk F. H.; Eckler, Larry; Zandstra, Peter W.; Emili, Andrew

    2010-01-01

    Effective methods to detect and quantify functionally linked regulatory proteins in complex biological samples are essential for investigating mammalian signaling pathways. Traditional immunoassays depend on proprietary reagents that are difficult to generate and multiplex, whereas global proteomic profiling can be tedious and can miss low abundance proteins. Here, we report a target-driven liquid chromatography-tandem mass spectrometry (LC-MS/MS) strategy for selectively examining the levels of multiple low abundance components of signaling pathways which are refractory to standard shotgun screening procedures and hence appear limited in current MS/MS repositories. Our stepwise approach consists of: (i) synthesizing microscale peptide arrays, including heavy isotope-labeled internal standards, for use as high quality references to (ii) build empirically validated high density LC-MS/MS detection assays with a retention time scheduling system that can be used to (iii) identify and quantify endogenous low abundance protein targets in complex biological mixtures with high accuracy by correlation to a spectral database using new software tools. The method offers a flexible, rapid, and cost-effective means for routine proteomic exploration of biological systems including “label-free” quantification, while minimizing spurious interferences. As proof-of-concept, we have examined the abundance of transcription factors and protein kinases mediating pluripotency and self-renewal in embryonic stem cell populations. PMID:20467045

  11. Comparison of transcripts in Phalaenopsis bellina and Phalaenopsis equestris (Orchidaceae) flowers to deduce monoterpene biosynthesis pathway.

    PubMed

    Hsiao, Yu-Yun; Tsai, Wen-Chieh; Kuoh, Chang-Sheng; Huang, Tian-Hsiang; Wang, Hei-Chia; Wu, Tian-Shung; Leu, Yann-Lii; Chen, Wen-Huei; Chen, Hong-Hwa

    2006-07-13

    Floral scent is one of the important strategies for ensuring fertilization and for determining seed or fruit set. Research on plant scents has been hampered mainly by the invisibility of this character, its dynamic nature, and complex mixtures of components that are present in very small quantities. Most progress in scent research, as in other areas of plant biology, has come from the use of molecular and biochemical techniques. Although volatile components have been identified in several orchid species, the biosynthetic pathways of orchid flower fragrance are far from understood. We investigated how flower fragrance was generated in certain Phalaenopsis orchids by determining the chemical components of the floral scent, identifying floral expressed-sequence-tags (ESTs), and deducing the pathways of floral scent biosynthesis in Phalaenopsis bellina by bioinformatics analysis. The main chemical components in the P. bellina flower were shown by gas chromatography-mass spectrometry to be monoterpenoids, benzenoids and phenylpropanoids. The set of floral scent producing enzymes in the biosynthetic pathway from glyceraldehyde-3-phosphate (G3P) to geraniol and linalool were recognized through data mining of the P. bellina floral EST database (dbEST). Transcripts preferentially expressed in P. bellina were distinguished by comparing the scent floral dbEST to that of a scentless species, P. equestris, and included those encoding lipoxygenase, epimerase, diacylglycerol kinase and geranyl diphosphate synthase. In addition, EST filtering results showed that transcripts encoding signal transduction and Myb transcription factors and methyltransferase, in addition to those for scent biosynthesis, were detected by in silico hybridization of the P. bellina unigene database against those of the scentless species, rice and Arabidopsis. Altogether, we pinpointed 66% of the biosynthetic steps from G3P to geraniol, linalool and their derivatives. This systems biology program combined chemical analysis, genomics and bioinformatics to elucidate the scent biosynthesis pathway and identify the relevant genes. It integrates the forward and reverse genetic approaches to knowledge discovery by which researchers can study non-model plants.

  12. A Systems Biology-Based Investigation into the Pharmacological Mechanisms of Sheng-ma-bie-jia-tang Acting on Systemic Lupus Erythematosus by Multi-Level Data Integration.

    PubMed

    Huang, Lin; Lv, Qi; Liu, Fenfen; Shi, Tieliu; Wen, Chengping

    2015-11-12

    Sheng-ma-bie-jia-tang (SMBJT) is a Traditional Chinese Medicine (TCM) formula that is widely used for the treatment of Systemic Lupus Erythematosus (SLE) in China. However, the molecular mechanism behind this formula remains unknown. Here, we systematically analyzed targets of the ingredients in SMBJT to evaluate its potential molecular mechanism. First, we collected 1,267 targets from our previously published database, the Traditional Chinese Medicine Integrated Database (TCMID). Next, we conducted gene ontology and pathway enrichment analyses for these targets and determined that they were enriched in metabolism (amino acids, fatty acids, etc.) and signaling pathways (chemokines, Toll-like receptors, adipocytokines, etc.). A total of 96 targets, which are known SLE disease proteins, were identified as essential targets and the remaining 1,171 targets were defined as common targets of this formula. The essential targets directly interacted with SLE disease proteins. In addition, some common targets also had essential connections to both key targets and SLE disease proteins in enriched signaling pathways, e.g. the toll-like receptor signaling pathway. We also found distinct functions of essential and common targets in immune system processes. This multi-level approach to deciphering the underlying mechanism of SMBJT treatment of SLE offers a new perspective that will further our understanding of TCM formulas.

  13. Bioinformatics for spermatogenesis: annotation of male reproduction based on proteomics

    PubMed Central

    Zhou, Tao; Zhou, Zuo-Min; Guo, Xue-Jiang

    2013-01-01

    Proteomics strategies have been widely used in the field of male reproduction, both in basic and clinical research. Bioinformatics methods are indispensable in proteomics-based studies and are used for data presentation, database construction and functional annotation. In the present review, we focus on the functional annotation of gene lists obtained through qualitative or quantitative methods, summarizing the common and male reproduction specialized proteomics databases. We introduce several integrated tools used to find the hidden biological significance from the data obtained. We further describe in detail the information on male reproduction derived from Gene Ontology analyses, pathway analyses and biomedical analyses. We provide an overview of bioinformatics annotations in spermatogenesis, from gene function to biological function and from biological function to clinical application. On the basis of recently published proteomics studies and associated data, we show that bioinformatics methods help us to discover drug targets for sperm motility and to scan for cancer-testis genes. In addition, we summarize the online resources relevant to male reproduction research for the exploration of the regulation of spermatogenesis. PMID:23852026

  14. Prioritizing biological pathways by recognizing context in time-series gene expression data.

    PubMed

    Lee, Jusang; Jo, Kyuri; Lee, Sunwon; Kang, Jaewoo; Kim, Sun

    2016-12-23

    The primary goal of pathway analysis using transcriptome data is to find significantly perturbed pathways. However, pathway analysis is not always successful in identifying pathways that are truly relevant to the context under study. A major reason for this difficulty is that a single gene is involved in multiple pathways. In the KEGG pathway database, there are 146 genes, each of which is involved in more than 20 pathways. Thus activation of even a single gene will result in activation of many pathways. This complex relationship often makes the pathway analysis very difficult. While we need much more powerful pathway analysis methods, a readily available alternative way is to incorporate the literature information. In this study, we propose a novel approach for prioritizing pathways by combining results from both pathway analysis tools and literature information. The basic idea is as follows. Whenever there are enough articles that provide evidence on which pathways are relevant to the context, we can be assured that the pathways are indeed related to the context, which is termed as relevance in this paper. However, if there are few or no articles reported, then we should rely on the results from the pathway analysis tools, which is termed as significance in this paper. We realized this concept as an algorithm by introducing Context Score and Impact Score and then combining the two into a single score. Our method ranked truly relevant pathways significantly higher than existing pathway analysis tools in experiments with two data sets. Our novel framework was implemented as ContextTRAP by utilizing two existing tools, TRAP and BEST. ContextTRAP will be a useful tool for the pathway based analysis of gene expression data since the user can specify the context of the biological experiment in a set of keywords. The web version of ContextTRAP is available at http://biohealth.snu.ac.kr/software/contextTRAP .
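
    The abstract above does not give the formula that combines the Context Score and the Impact Score. The sketch below is therefore purely hypothetical and is not the ContextTRAP algorithm. It only illustrates the stated idea: trust literature relevance when many supporting articles exist, and fall back on pathway-analysis significance when few or none do. The weighting function, the `half_saturation` parameter and the pathway names are all assumptions.

```python
def combined_score(context_score, impact_score, n_articles, half_saturation=10):
    """Blend a literature-based score with a pathway-analysis score.

    The weight on the context score grows with the number of supporting
    articles; with zero articles only the impact score counts.
    (Hypothetical weighting, not the published ContextTRAP formula.)
    """
    if n_articles < 0 or half_saturation <= 0:
        raise ValueError("n_articles must be >= 0 and half_saturation > 0")
    weight = n_articles / (n_articles + half_saturation)
    return weight * context_score + (1 - weight) * impact_score


def rank_pathways(records):
    """Return pathway names sorted by combined score, highest first.

    records: iterable of (name, context_score, impact_score, n_articles).
    """
    scored = [
        (combined_score(context, impact, articles), name)
        for name, context, impact, articles in records
    ]
    return [name for _, name in sorted(scored, reverse=True)]


if __name__ == "__main__":
    example = [
        ("Pathway A", 0.9, 0.2, 40),  # well supported in the literature
        ("Pathway B", 0.1, 0.8, 0),   # no articles, statistically significant
        ("Pathway C", 0.5, 0.5, 10),
    ]
    print(rank_pathways(example))
```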

  15. Patterns of population differentiation of candidate genes for cardiovascular disease.

    PubMed

    Kullo, Iftikhar J; Ding, Keyue

    2007-07-12

    The basis for ethnic differences in cardiovascular disease (CVD) susceptibility is not fully understood. We investigated patterns of population differentiation (FST) of a set of genes in etiologic pathways of CVD among 3 ethnic groups: Yoruba in Nigeria (YRI), Utah residents with European ancestry (CEU), and Han Chinese (CHB) + Japanese (JPT). We identified 37 pathways implicated in CVD based on the PANTHER classification and 416 genes in these pathways were further studied; these genes belonged to 6 biological processes (apoptosis, blood circulation and gas exchange, blood clotting, homeostasis, immune response, and lipoprotein metabolism). Genotype data were obtained from the HapMap database. We calculated FST for 15,559 common SNPs (minor allele frequency ≥ 0.10 in at least one population) in genes that co-segregated among the populations, as well as an average-weighted FST for each gene. SNPs were classified as putatively functional (non-synonymous and untranslated regions) or non-functional (intronic and synonymous sites). Mean FST values for common putatively functional variants were significantly higher than FST values for nonfunctional variants. A significant variation in FST was also seen based on biological processes; the processes of 'apoptosis' and 'lipoprotein metabolism' showed an excess of genes with high FST. Thus, putative functional SNPs in genes in etiologic pathways for CVD show greater population differentiation than non-functional SNPs and a significant variance of FST values was noted among pairwise population comparisons for different biological processes. These results suggest a possible basis for varying susceptibility to CVD among ethnic groups.
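
    The abstract above does not state which FST estimator was used. As a rough illustration only, the sketch below computes the textbook heterozygosity-based FST = (H_T - H_S) / H_T for a single biallelic SNP with equally weighted populations, together with the "common SNP" filter described in the abstract. The function names are hypothetical, and sample-size corrections used by more refined estimators are deliberately omitted.

```python
def is_common_snp(allele_freqs, maf_threshold=0.10):
    """True if the minor allele frequency reaches the threshold in at least one population."""
    return any(min(p, 1 - p) >= maf_threshold for p in allele_freqs)


def fst_biallelic(allele_freqs):
    """Heterozygosity-based FST for one biallelic SNP.

    allele_freqs: frequency of one allele in each population.
    Populations are weighted equally (a simplifying assumption).
    """
    n = len(allele_freqs)
    if n < 2:
        raise ValueError("need allele frequencies for at least two populations")
    # H_S: mean expected heterozygosity within populations.
    h_s = sum(2 * p * (1 - p) for p in allele_freqs) / n
    # H_T: expected heterozygosity of the pooled population.
    p_bar = sum(allele_freqs) / n
    h_t = 2 * p_bar * (1 - p_bar)
    if h_t == 0:
        return 0.0  # monomorphic everywhere: no differentiation to measure
    return (h_t - h_s) / h_t


if __name__ == "__main__":
    freqs = [0.2, 0.8]  # toy frequencies in two populations
    print(is_common_snp(freqs), fst_biallelic(freqs))
```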

  16. Aligning Metabolic Pathways Exploiting Binary Relation of Reactions.

    PubMed

    Huang, Yiran; Zhong, Cheng; Lin, Hai Xiang; Huang, Jing

    2016-01-01

    Metabolic pathway alignment has been widely used to find one-to-one and/or one-to-many reaction mappings to identify the alternative pathways that have similar functions through different sets of reactions, which has important applications in reconstructing phylogeny and understanding metabolic functions. The existing alignment methods exhaustively search reaction sets, which may become infeasible for large pathways. To address this problem, we present an effective alignment method for accurately extracting reaction mappings between two metabolic pathways. We show that connected relation between reactions can be formalized as binary relation of reactions in metabolic pathways, and the multiplications of zero-one matrices for binary relations of reactions can be accomplished in finite steps. By utilizing the multiplications of zero-one matrices for binary relation of reactions, we efficiently obtain reaction sets in a small number of steps without exhaustive search, and accurately uncover biologically relevant reaction mappings. Furthermore, we introduce a measure of topological similarity of nodes (reactions) by comparing the structural similarity of the k-neighborhood subgraphs of the nodes in aligning metabolic pathways. We employ this similarity metric to improve the accuracy of the alignments. The experimental results on the KEGG database show that when compared with other state-of-the-art methods, in most cases, our method obtains better performance in the node correctness and edge correctness, and the number of the edges of the largest common connected subgraph for one-to-one reaction mappings, and the number of correct one-to-many reaction mappings. Our method is scalable in finding more reaction mappings with better biological relevance in large metabolic pathways.
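
    The abstract above mentions multiplying zero-one matrices that represent a binary relation between reactions. The sketch below shows only that generic building block: the Boolean product of two zero-one matrices, and its repeated use to find which reactions are connected within a given number of steps. It is not the authors' alignment method, and the three-reaction example is invented for illustration.

```python
def bool_matmul(a, b):
    """Boolean product of two zero-one matrices given as lists of lists."""
    inner = len(b)
    cols = len(b[0])
    return [
        [int(any(row[k] and b[k][j] for k in range(inner))) for j in range(cols)]
        for row in a
    ]


def reachable_within(adj, steps):
    """Zero-one matrix of pairs (i, j) where j is reachable from i in 1..steps steps."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    n = len(adj)
    power = [row[:] for row in adj]
    result = [row[:] for row in adj]
    for _ in range(steps - 1):
        power = bool_matmul(power, adj)
        # Union of the relation so far with the next power of the relation.
        result = [[result[i][j] | power[i][j] for j in range(n)] for i in range(n)]
    return result


if __name__ == "__main__":
    # Hypothetical chain of reactions: R1 -> R2 -> R3.
    adjacency = [
        [0, 1, 0],
        [0, 0, 1],
        [0, 0, 0],
    ]
    print(reachable_within(adjacency, 2))
```

    Because each power of the matrix adds at most one more step, a relation on n reactions stabilizes after at most n multiplications, which is consistent with the abstract's remark that the procedure finishes in finite steps.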

  17. Computational biology for ageing

    PubMed Central

    Wieser, Daniela; Papatheodorou, Irene; Ziehm, Matthias; Thornton, Janet M.

    2011-01-01

    High-throughput genomic and proteomic technologies have generated a wealth of publicly available data on ageing. Easy access to these data, and their computational analysis, is of great importance in order to pinpoint the causes and effects of ageing. Here, we provide a description of the existing databases and computational tools on ageing that are available for researchers. We also describe the computational approaches to data interpretation in the field of ageing including gene expression, comparative and pathway analyses, and highlight the challenges for future developments. We review recent biological insights gained from applying bioinformatics methods to analyse and interpret ageing data in different organisms, tissues and conditions. PMID:21115530

  18. Transcriptomic analysis of flower development in wintersweet (Chimonanthus praecox).

    PubMed

    Liu, Daofeng; Sui, Shunzhao; Ma, Jing; Li, Zhineng; Guo, Yulong; Luo, Dengpan; Yang, Jianfeng; Li, Mingyang

    2014-01-01

    Wintersweet (Chimonanthus praecox) is familiar as a garden plant and woody ornamental flower. On account of its unique flowering time and strong fragrance, it has a high ornamental and economic value. Despite a long history of human cultivation, our understanding of wintersweet genetics and molecular biology remains scant, reflecting a lack of basic genomic and transcriptomic data. In this study, we assembled three cDNA libraries, from three successive stages in flower development, designated as the flower bud with displayed petal, open flower and senescing flower stages. Using the Illumina RNA-Seq method, we obtained 21,412,928, 26,950,404, 24,912,954 qualified Illumina reads, respectively, for the three successive stages. The pooled reads from all three libraries were then assembled into 106,995 transcripts, 51,793 of which were annotated in the NCBI non-redundant protein database. Of these annotated sequences, 32,649 and 21,893 transcripts were assigned to gene ontology categories and clusters of orthologous groups, respectively. We could map 15,587 transcripts onto 312 pathways using the Kyoto Encyclopedia of Genes and Genomes pathway database. Based on these transcriptomic data, we obtained a large number of candidate genes that were differentially expressed at the open flower and senescing flower stages. An analysis of differentially expressed genes involved in plant hormone signal transduction pathways indicated that although flower opening and senescence may be independent of the ethylene signaling pathway in wintersweet, salicylic acid may be involved in the regulation of flower senescence. We also succeeded in isolating key genes of floral scent biosynthesis and proposed a biosynthetic pathway for monoterpenes and sesquiterpenes in wintersweet flowers, based on the annotated sequences. This comprehensive transcriptomic analysis presents fundamental information on the genes and pathways which are involved in flower development in wintersweet. 
Our data also provide a useful database for further research on wintersweet and other plants of the Calycanthaceae family.

  19. Transcriptomic Analysis of Flower Development in Wintersweet (Chimonanthus praecox)

    PubMed Central

    Liu, Daofeng; Sui, Shunzhao; Ma, Jing; Li, Zhineng; Guo, Yulong; Luo, Dengpan; Yang, Jianfeng; Li, Mingyang

    2014-01-01

    Wintersweet (Chimonanthus praecox) is familiar as a garden plant and woody ornamental flower. On account of its unique flowering time and strong fragrance, it has a high ornamental and economic value. Despite a long history of human cultivation, our understanding of wintersweet genetics and molecular biology remains scant, reflecting a lack of basic genomic and transcriptomic data. In this study, we assembled three cDNA libraries, from three successive stages in flower development, designated as the flower bud with displayed petal, open flower and senescing flower stages. Using the Illumina RNA-Seq method, we obtained 21,412,928, 26,950,404, 24,912,954 qualified Illumina reads, respectively, for the three successive stages. The pooled reads from all three libraries were then assembled into 106,995 transcripts, 51,793 of which were annotated in the NCBI non-redundant protein database. Of these annotated sequences, 32,649 and 21,893 transcripts were assigned to gene ontology categories and clusters of orthologous groups, respectively. We could map 15,587 transcripts onto 312 pathways using the Kyoto Encyclopedia of Genes and Genomes pathway database. Based on these transcriptomic data, we obtained a large number of candidate genes that were differentially expressed at the open flower and senescing flower stages. An analysis of differentially expressed genes involved in plant hormone signal transduction pathways indicated that although flower opening and senescence may be independent of the ethylene signaling pathway in wintersweet, salicylic acid may be involved in the regulation of flower senescence. We also succeeded in isolating key genes of floral scent biosynthesis and proposed a biosynthetic pathway for monoterpenes and sesquiterpenes in wintersweet flowers, based on the annotated sequences. This comprehensive transcriptomic analysis presents fundamental information on the genes and pathways which are involved in flower development in wintersweet. 
Our data also provide a useful database for further research on wintersweet and other plants of the Calycanthaceae family. PMID:24489818

  20. Drug-Path: a database for drug-induced pathways.

    PubMed

    Zeng, Hui; Qiu, Chengxiang; Cui, Qinghua

    2015-01-01

    Some databases for drug-associated pathways have been built and are publicly available. However, the pathways curated in most of these databases are drug-action or drug-metabolism pathways. In recent years, high-throughput technologies such as microarray and RNA-sequencing have produced many drug-induced gene expression profiles. Interestingly, drug-induced gene expression profiles frequently show distinct patterns, indicating that drugs normally induce the activation or repression of distinct pathways. Therefore, these pathways contribute to the study of drug mechanisms and drug repurposing. Here, we present Drug-Path, a database of drug-induced pathways, which was generated by KEGG pathway enrichment analysis for drug-induced upregulated genes and downregulated genes based on drug-induced gene expression datasets in Connectivity Map. Drug-Path provides user-friendly interfaces to retrieve, visualize and download the drug-induced pathway data in the database. In addition, the genes deregulated by a given drug are highlighted in the pathways. All data were organized using SQLite. The web site was implemented using Django, a Python web framework. Finally, we believe that this database will be useful for related research. © The Author(s) 2015. Published by Oxford University Press.

  1. Classification of Chemical Compounds to Support Complex Queries in a Pathway Database

    PubMed Central

    Weidemann, Andreas; Kania, Renate; Peiss, Christian; Rojas, Isabel

    2004-01-01

    Data quality in biological databases has become a topic of great discussion. To provide high quality data and to deal with the vast amount of biochemical data, annotators and curators need to be supported by software that carries out part of their work in an (semi-) automatic manner. The detection of errors and inconsistencies is a part that requires the knowledge of domain experts, thus in most cases it is done manually, making it very expensive and time-consuming. This paper presents two tools to partially support the curation of data on biochemical pathways. The tool enables the automatic classification of chemical compounds based on their respective SMILES strings. Such classification allows the querying and visualization of biochemical reactions at different levels of abstraction, according to the level of detail at which the reaction participants are described. Chemical compounds can be classified in a flexible manner based on different criteria. The support of the process of data curation is provided by facilitating the detection of compounds that are identified as different but that are actually the same. This is also used to identify similar reactions and, in turn, pathways. PMID:18629066

  2. In silico study of protein to protein interaction analysis of AMP-activated protein kinase and mitochondrial activity in three different farm animal species

    NASA Astrophysics Data System (ADS)

    Prastowo, S.; Widyas, N.

    2018-03-01

    AMP-activated protein kinase (AMPK) is a cellular energy sensor that responds to ATP and AMP concentrations. This protein interacts with mitochondria in determining its activity to generate energy for cell metabolism. This paper therefore aims to compare the protein-to-protein interactions of AMPK and mitochondrial activity genes in the metabolism of three farm (domesticated) animals: cattle (Bos taurus), pig (Sus scrofa) and chicken (Gallus gallus). An in silico study was done using STRING V.10 as a prominent protein interaction database, followed by a biological function comparison in the KEGG PATHWAY database. A set of 12 genes was used as input for the analysis: PRKAA1, PRKAA2, PRKAB1, PRKAB2, PRKAG1, PRKAG2, PRKAG3, PPARGC1, ACC, CPT1B, NRF2 and SOD. The first 7 genes belong to the AMPK family, while the last 5 are mitochondrial activity genes. The protein interaction result shows 11, 8 and 5 metabolism pathways in Bos taurus, Sus scrofa and Gallus gallus, respectively. The top pathway in Bos taurus is the AMPK signaling pathway (10 genes), in Sus scrofa the Adipocytokine signaling pathway (8 genes) and in Gallus gallus the FoxO signaling pathway (5 genes). Moreover, the pathways common to all 3 species are the Adipocytokine signaling pathway, Insulin signaling pathway and FoxO signaling pathway. Genes clustered in the Adipocytokine and Insulin signaling pathways are PRKAA2, PPARGC1A, PRKAB1 and PRKAG2, while those in the FoxO signaling pathway are PRKAA2, PRKAB1 and PRKAG2. Accordingly, PRKAA2, PRKAB1 and PRKAG2 are the common genes. Based on the bioinformatics analysis, we can demonstrate that protein-to-protein interaction shows distinct differences in metabolism between species. However, further validation is needed to give a clear explanation.

  3. Creation of a Genome-Wide Metabolic Pathway Database for Populus trichocarpa Using a New Approach for Reconstruction and Curation of Metabolic Pathways for Plants

    PubMed Central

    Zhang, Peifen; Dreher, Kate; Karthikeyan, A.; Chi, Anjo; Pujar, Anuradha; Caspi, Ron; Karp, Peter; Kirkup, Vanessa; Latendresse, Mario; Lee, Cynthia; Mueller, Lukas A.; Muller, Robert; Rhee, Seung Yon

    2010-01-01

    Metabolic networks reconstructed from sequenced genomes or transcriptomes can help visualize and analyze large-scale experimental data, predict metabolic phenotypes, discover enzymes, engineer metabolic pathways, and study metabolic pathway evolution. We developed a general approach for reconstructing metabolic pathway complements of plant genomes. Two new reference databases were created and added to the core of the infrastructure: a comprehensive, all-plant reference pathway database, PlantCyc, and a reference enzyme sequence database, RESD, for annotating metabolic functions of protein sequences. PlantCyc (version 3.0) includes 714 metabolic pathways and 2,619 reactions from over 300 species. RESD (version 1.0) contains 14,187 literature-supported enzyme sequences from across all kingdoms. We used RESD, PlantCyc, and MetaCyc (an all-species reference metabolic pathway database), in conjunction with the pathway prediction software Pathway Tools, to reconstruct a metabolic pathway database, PoplarCyc, from the recently sequenced genome of Populus trichocarpa. PoplarCyc (version 1.0) contains 321 pathways with 1,807 assigned enzymes. Comparing PoplarCyc (version 1.0) with AraCyc (version 6.0, Arabidopsis [Arabidopsis thaliana]) showed comparable numbers of pathways distributed across all domains of metabolism in both databases, except for a higher number of AraCyc pathways in secondary metabolism and a 1.5-fold increase in carbohydrate metabolic enzymes in PoplarCyc. Here, we introduce these new resources and demonstrate the feasibility of using them to identify candidate enzymes for specific pathways and to analyze metabolite profiling data through concrete examples. These resources can be searched by text or BLAST, browsed, and downloaded from our project Web site (http://plantcyc.org). PMID:20522724

  4. Perigone Lobe Transcriptome Analysis Provides Insights into Rafflesia cantleyi Flower Development.

    PubMed

    Lee, Xin-Wei; Mat-Isa, Mohd-Noor; Mohd-Elias, Nur-Atiqah; Aizat-Juhari, Mohd Afiq; Goh, Hoe-Han; Dear, Paul H; Chow, Keng-See; Haji Adam, Jumaat; Mohamed, Rahmah; Firdaus-Raih, Mohd; Wan, Kiew-Lian

    2016-01-01

    Rafflesia is a biologically enigmatic species that is very rare in occurrence and possesses an extraordinary morphology. This parasitic plant produces a gigantic flower up to one metre in diameter with no leaves, stem or roots. However, little is known about the floral biology of this species especially at the molecular level. In an effort to address this issue, we have generated and characterised the transcriptome of the Rafflesia cantleyi flower, and performed a comparison with the transcriptome of its floral bud to predict genes that are expressed and regulated during flower development. Approximately 40 million sequencing reads were generated and assembled de novo into 18,053 transcripts with an average length of 641 bp. Of these, more than 79% of the transcripts had significant matches to annotated sequences in the public protein database. A total of 11,756 and 7,891 transcripts were assigned to Gene Ontology categories and clusters of orthologous groups respectively. In addition, 6,019 transcripts could be mapped to 129 pathways in Kyoto Encyclopaedia of Genes and Genomes Pathway database. Digital abundance analysis identified 52 transcripts with very high expression in the flower transcriptome of R. cantleyi. Subsequently, analysis of differential expression between developing flower and the floral bud revealed a set of 105 transcripts with potential role in flower development. Our work presents a deep transcriptome resource analysis for the developing flower of R. cantleyi. Genes potentially involved in the growth and development of the R. cantleyi flower were identified and provide insights into biological processes that occur during flower development.

  5. Cancer-related marketing centrality motifs acting as pivot units in the human signaling network and mediating cross-talk between biological pathways.

    PubMed

    Li, Wan; Chen, Lina; Li, Xia; Jia, Xu; Feng, Chenchen; Zhang, Liangcai; He, Weiming; Lv, Junjie; He, Yuehan; Li, Weiguo; Qu, Xiaoli; Zhou, Yanyan; Shi, Yuchen

    2013-12-01

    Network motifs in central positions are considered to not only have more in-coming and out-going connections but are also localized in an area where more paths reach the networks. These central motifs have been extensively investigated to determine their consistent functions or associations with specific function categories. However, their functional potentials in the maintenance of cross-talk between different functional communities are unclear. In this paper, we constructed an integrated human signaling network from the Pathway Interaction Database. We identified 39 essential cancer-related motifs in central roles, which we called cancer-related marketing centrality motifs, using combined centrality indices on the system level. Our results demonstrated that these cancer-related marketing centrality motifs were pivotal units in the signaling network, and could mediate cross-talk between 61 biological pathways (25 could be mediated by one motif on average), most of which were cancer-related pathways. Further analysis showed that molecules of most marketing centrality motifs were in the same or adjacent subcellular localizations, such as the motif containing PI3K, PDK1 and AKT1 in the plasma membrane, to mediate signal transduction between 32 cancer-related pathways. Finally, we analyzed the pivotal roles of cancer genes in these marketing centrality motifs in the pathogenesis of cancers, and found that non-cancer genes were potential cancer-related genes.

  6. PolySearch2: a significantly improved text-mining system for discovering associations between human diseases, genes, drugs, metabolites, toxins and more.

    PubMed

    Liu, Yifeng; Liang, Yongjie; Wishart, David

    2015-07-01

    PolySearch2 (http://polysearch.ca) is an online text-mining system for identifying relationships between biomedical entities such as human diseases, genes, SNPs, proteins, drugs, metabolites, toxins, metabolic pathways, organs, tissues, subcellular organelles, positive health effects, negative health effects, drug actions, Gene Ontology terms, MeSH terms, ICD-10 medical codes, biological taxonomies and chemical taxonomies. PolySearch2 supports a generalized 'Given X, find all associated Ys' query, where X and Y can be selected from the aforementioned biomedical entities. An example query might be: 'Find all diseases associated with Bisphenol A'. To find its answers, PolySearch2 searches for associations against comprehensive free-text collections, including local versions of MEDLINE abstracts, PubMed Central full-text articles, Wikipedia full-text articles and US Patent application abstracts. PolySearch2 also searches 14 widely used, text-rich biological databases such as UniProt, DrugBank and Human Metabolome Database to improve its accuracy and coverage. PolySearch2 maintains an extensive thesaurus of biological terms and exploits the latest search engine technology to rapidly retrieve relevant articles and database records. PolySearch2 also generates, ranks and annotates associative candidates and presents results with relevancy statistics and highlighted key sentences to facilitate user interpretation. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  7. PolySearch2: a significantly improved text-mining system for discovering associations between human diseases, genes, drugs, metabolites, toxins and more

    PubMed Central

    Liu, Yifeng; Liang, Yongjie; Wishart, David

    2015-01-01

    PolySearch2 (http://polysearch.ca) is an online text-mining system for identifying relationships between biomedical entities such as human diseases, genes, SNPs, proteins, drugs, metabolites, toxins, metabolic pathways, organs, tissues, subcellular organelles, positive health effects, negative health effects, drug actions, Gene Ontology terms, MeSH terms, ICD-10 medical codes, biological taxonomies and chemical taxonomies. PolySearch2 supports a generalized ‘Given X, find all associated Ys’ query, where X and Y can be selected from the aforementioned biomedical entities. An example query might be: ‘Find all diseases associated with Bisphenol A’. To find its answers, PolySearch2 searches for associations against comprehensive free-text collections, including local versions of MEDLINE abstracts, PubMed Central full-text articles, Wikipedia full-text articles and US Patent application abstracts. PolySearch2 also searches 14 widely used, text-rich biological databases such as UniProt, DrugBank and Human Metabolome Database to improve its accuracy and coverage. PolySearch2 maintains an extensive thesaurus of biological terms and exploits the latest search engine technology to rapidly retrieve relevant articles and database records. PolySearch2 also generates, ranks and annotates associative candidates and presents results with relevancy statistics and highlighted key sentences to facilitate user interpretation. PMID:25925572

  8. DR-GAS: a database of functional genetic variants and their phosphorylation states in human DNA repair systems.

    PubMed

    Sehgal, Manika; Singh, Tiratha Raj

    2014-04-01

    We present DR-GAS, a unique, consolidated and comprehensive DNA repair genetic association studies database of the human DNA repair system. It presents information on repair genes, assorted mechanisms of DNA repair, linkage disequilibrium, haplotype blocks, nsSNPs, phosphorylation sites, associated diseases, and pathways involved in repair systems. DNA repair is an intricate process which plays an essential role in maintaining the integrity of the genome by eradicating the damaging effect of internal and external changes in the genome. Hence, it is crucial to extensively understand the intact process of DNA repair, genes involved, non-synonymous SNPs which perhaps affect the function, phosphorylated residues and other related genetic parameters. All the corresponding entries for DNA repair genes, such as proteins, OMIM IDs, literature references and pathways are cross-referenced to their respective primary databases. DNA repair genes and their associated parameters are represented either in tabular or in graphical form through images elucidated by computational and statistical analyses. It is believed that the database will assist molecular biologists, biotechnologists, therapeutic developers and others in the scientific community to encounter biologically meaningful information, and meticulous contribution of genetic level information towards treacherous diseases in human DNA repair systems. DR-GAS is freely available for academic and research purposes at: http://www.bioinfoindia.org/drgas. Copyright © 2014 Elsevier B.V. All rights reserved.

  9. Bridging Plant and Human Radiation Response and DNA Repair through an In Silico Approach

    PubMed Central

    Nikitaki, Zacharenia; Pavlopoulou, Athanasia; Holá, Marcela; Donà, Mattia; Michalopoulos, Ioannis; Balestrazzi, Alma; Angelis, Karel J.; Georgakilas, Alexandros G.

    2017-01-01

    The mechanisms of response to radiation exposure are conserved in plants and animals. The DNA damage response (DDR) pathways are the predominant molecular pathways activated upon exposure to radiation, both in plants and animals. The conserved features of DDR in plants and animals might facilitate interdisciplinary studies that cross traditional boundaries between animal and plant biology in order to expand the collection of biomarkers currently used for radiation exposure monitoring (REM) in environmental and biomedical settings. Genes implicated in trans-kingdom conserved DDR networks often triggered by ionizing radiation (IR) and UV light are deposited into biological databases. In this study, we have applied an innovative approach utilizing data pertinent to plant and human genes from publicly available databases towards the design of a ‘plant radiation biodosimeter’, that is, a plant and DDR gene-based platform that could serve as a REM reliable biomarker for assessing environmental radiation exposure and associated risk. From our analysis, in addition to REM biomarkers, a significant number of genes, both in human and Arabidopsis thaliana, not yet characterized as DDR, are suggested as possible DNA repair players. Last but not least, we provide an example on the applicability of an Arabidopsis thaliana-based plant system monitoring the role of cancer-related DNA repair genes BRCA1, BARD1 and PARP1 in processing DNA lesions. PMID:28587301

  10. Bridging Plant and Human Radiation Response and DNA Repair through an In Silico Approach.

    PubMed

    Nikitaki, Zacharenia; Pavlopoulou, Athanasia; Holá, Marcela; Donà, Mattia; Michalopoulos, Ioannis; Balestrazzi, Alma; Angelis, Karel J; Georgakilas, Alexandros G

    2017-06-06

    The mechanisms of response to radiation exposure are conserved in plants and animals. The DNA damage response (DDR) pathways are the predominant molecular pathways activated upon exposure to radiation, both in plants and animals. The conserved features of DDR in plants and animals might facilitate interdisciplinary studies that cross traditional boundaries between animal and plant biology in order to expand the collection of biomarkers currently used for radiation exposure monitoring (REM) in environmental and biomedical settings. Genes implicated in trans-kingdom conserved DDR networks often triggered by ionizing radiation (IR) and UV light are deposited into biological databases. In this study, we have applied an innovative approach utilizing data pertinent to plant and human genes from publicly available databases towards the design of a 'plant radiation biodosimeter', that is, a plant and DDR gene-based platform that could serve as a REM reliable biomarker for assessing environmental radiation exposure and associated risk. From our analysis, in addition to REM biomarkers, a significant number of genes, both in human and Arabidopsis thaliana, not yet characterized as DDR, are suggested as possible DNA repair players. Last but not least, we provide an example on the applicability of an Arabidopsis thaliana-based plant system monitoring the role of cancer-related DNA repair genes BRCA1, BARD1 and PARP1 in processing DNA lesions.

  11. Freshwater Biological Traits Database (Final Report)

    EPA Science Inventory

    EPA announced the release of the final report, Freshwater Biological Traits Database. This report discusses the development of a database of freshwater biological traits. The database combines several existing traits databases into an online format. The database is also...

  12. SCRIPDB: a portal for easy access to syntheses, chemicals and reactions in patents

    PubMed Central

    Heifets, Abraham; Jurisica, Igor

    2012-01-01

    The patent literature is a rich catalog of biologically relevant chemicals; many public and commercial molecular databases contain the structures disclosed in patent claims. However, patents are an equally rich source of metadata about bioactive molecules, including mechanism of action, disease class, homologous experimental series, structural alternatives, or the synthetic pathways used to produce molecules of interest. Unfortunately, this metadata is discarded when chemical structures are deposited separately in databases. SCRIPDB is a chemical structure database designed to make this metadata accessible. SCRIPDB provides the full original patent text, reactions and relationships described within any individual patent, in addition to the molecular files common to structural databases. We discuss how such information is valuable in medical text mining, chemical image analysis, reaction extraction and in silico pharmaceutical lead optimization. SCRIPDB may be searched by exact chemical structure, substructure or molecular similarity and the results may be restricted to patents describing synthetic routes. SCRIPDB is available at http://dcv.uhnres.utoronto.ca/SCRIPDB. PMID:22067445

  13. AOP-DB Frontend: A user interface for the Adverse Outcome Pathways Database.

    EPA Science Inventory

    The EPA Adverse Outcome Pathway Database (AOP-DB) is a database resource that aggregates association relationships between AOPs, genes, chemicals, diseases, pathways, species orthology information, ontologies. The AOP-DB frontend is a simple yet powerful AOP-DB user interface in...

  14. AOP-DB Frontend: A user interface for the Adverse Outcome Pathways Database

    EPA Science Inventory

    The EPA Adverse Outcome Pathway Database (AOP-DB) is a database resource that aggregates association relationships between AOPs, genes, chemicals, diseases, pathways, species orthology information, ontologies. The AOP-DB frontend is a simple yet powerful user interface in the for...

  15. TEGS-CN: A Statistical Method for Pathway Analysis of Genome-wide Copy Number Profile.

    PubMed

    Huang, Yen-Tsung; Hsu, Thomas; Christiani, David C

    2014-01-01

    The effects of copy number alterations make up a significant part of the tumor genome profile, but pathway analyses of these alterations are still not well established. We proposed a novel method to analyze multiple copy numbers of genes within a pathway, termed Test for the Effect of a Gene Set with Copy Number data (TEGS-CN). TEGS-CN was adapted from TEGS, a method that we previously developed for gene expression data using a variance component score test. With additional development, we extend the method to analyze DNA copy number data, accounting for different sizes and thus various numbers of copy number probes in genes. The test statistic follows a mixture of χ² distributions that can be obtained using permutation with a scaled χ² approximation. We conducted simulation studies to evaluate the size and the power of TEGS-CN and to compare its performance with TEGS. We analyzed genome-wide copy number data from 264 patients with non-small-cell lung cancer. With the Molecular Signatures Database (MSigDB) pathway database, the genome-wide copy number data can be classified into 1814 biological pathways or gene sets. We investigated associations of the copy number profile of the 1814 gene sets with pack-years of cigarette smoking. Our analysis revealed five pathways with significant P values after Bonferroni adjustment (<2.8 × 10⁻⁵), including the PTEN pathway (7.8 × 10⁻⁷), the gene set up-regulated under heat shock (3.6 × 10⁻⁶), the gene sets involved in the immune profile for rejection of kidney transplantation (9.2 × 10⁻⁶) and for transcriptional control of leukocytes (2.2 × 10⁻⁵), and the ganglioside biosynthesis pathway (2.7 × 10⁻⁵). In conclusion, we present a new method for pathway analyses of copy number data, and causal mechanisms of the five pathways require further study.
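
    The Bonferroni cutoff quoted in this abstract can be checked with simple arithmetic. The sketch below assumes a family-wise significance level of 0.05, which the abstract does not state; under that assumption, dividing by the 1814 gene sets gives roughly 2.76 × 10⁻⁵, consistent with the reported threshold of <2.8 × 10⁻⁵. The function name and the short pathway labels are illustrative only; the P values are those reported above.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance cutoff under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


ALPHA = 0.05        # assumed family-wise level; not stated in the abstract
N_GENE_SETS = 1814  # number of MSigDB pathways / gene sets tested

threshold = bonferroni_threshold(ALPHA, N_GENE_SETS)

# P values as reported in the abstract; labels are shortened for illustration.
reported_p = {
    "PTEN pathway": 7.8e-7,
    "heat shock up-regulated": 3.6e-6,
    "kidney transplant rejection": 9.2e-6,
    "leukocyte transcriptional control": 2.2e-5,
    "ganglioside biosynthesis": 2.7e-5,
}

significant = {name: p for name, p in reported_p.items() if p < threshold}

print(f"threshold = {threshold:.1e}")              # threshold = 2.8e-05
print(f"{len(significant)} of {len(reported_p)} pass")  # 5 of 5 pass
```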

  16. TrypsNetDB: An integrated framework for the functional characterization of trypanosomatid proteins

    PubMed Central

    Gazestani, Vahid H.; Yip, Chun Wai; Nikpour, Najmeh; Berghuis, Natasha

    2017-01-01

    Trypanosomatid parasites cause serious infections in humans and production losses in livestock. Due to the high divergence from other eukaryotes, such as humans and model organisms, the functional roles of many trypanosomatid proteins cannot be predicted by homology-based methods, rendering a significant portion of their proteins as uncharacterized. Recent technological advances have led to the availability of multiple systematic and genome-wide datasets on trypanosomatid parasites that are informative regarding the biological role(s) of their proteins. Here, we report TrypsNetDB (http://trypsNetDB.org), a web-based resource for the functional annotation of 16 different species/strains of trypanosomatid parasites. The database not only visualizes the network context of the queried protein(s) in an intuitive way but also examines the response of the represented network in more than 50 different biological contexts and its enrichment for various biological terms and pathways, protein sequence signatures, and potential RNA regulatory elements. The interactome core of the database, as of Jan 23, 2017, contains 101,187 interactions among 13,395 trypanosomatid proteins inferred from 97 genome-wide and focused studies on the interactome of these organisms. PMID:28158179

  17. Patterns of population differentiation of candidate genes for cardiovascular disease

    PubMed Central

    Kullo, Iftikhar J; Ding, Keyue

    2007-01-01

    Background: The basis for ethnic differences in cardiovascular disease (CVD) susceptibility is not fully understood. We investigated patterns of population differentiation (FST) of a set of genes in etiologic pathways of CVD among 3 ethnic groups: Yoruba in Nigeria (YRI), Utah residents with European ancestry (CEU), and Han Chinese (CHB) + Japanese (JPT). We identified 37 pathways implicated in CVD based on the PANTHER classification and 416 genes in these pathways were further studied; these genes belonged to 6 biological processes (apoptosis, blood circulation and gas exchange, blood clotting, homeostasis, immune response, and lipoprotein metabolism). Genotype data were obtained from the HapMap database. Results: We calculated FST for 15,559 common SNPs (minor allele frequency ≥ 0.10 in at least one population) in genes that co-segregated among the populations, as well as an average-weighted FST for each gene. SNPs were classified as putatively functional (non-synonymous and untranslated regions) or non-functional (intronic and synonymous sites). Mean FST values for common putatively functional variants were significantly higher than FST values for nonfunctional variants. A significant variation in FST was also seen based on biological processes; the processes of 'apoptosis' and 'lipoprotein metabolism' showed an excess of genes with high FST. Thus, putative functional SNPs in genes in etiologic pathways for CVD show greater population differentiation than non-functional SNPs and a significant variance of FST values was noted among pairwise population comparisons for different biological processes. Conclusion: These results suggest a possible basis for varying susceptibility to CVD among ethnic groups. PMID:17626638
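
    For readers unfamiliar with FST, the quantity can be illustrated with the textbook heterozygosity-based definition for one biallelic SNP, FST = (HT - HS) / HT. The abstract does not say which estimator the authors used, so the sketch below is an assumption for illustration only: it weights populations equally, ignores sample-size corrections, and uses made-up allele frequencies. The helper is_common mirrors the stated filter (minor allele frequency ≥ 0.10 in at least one population).

```python
def fst_biallelic(freqs):
    """Textbook FST = (H_T - H_S) / H_T for one biallelic SNP.

    `freqs` holds the frequency of one allele in each population.
    Populations are weighted equally; this is an illustrative
    simplification, not necessarily the estimator used in the study.
    """
    if len(freqs) < 2:
        raise ValueError("need at least two populations")
    p_bar = sum(freqs) / len(freqs)
    h_total = 2 * p_bar * (1 - p_bar)  # expected heterozygosity, pooled
    if h_total == 0:
        return 0.0                     # monomorphic everywhere
    h_sub = sum(2 * p * (1 - p) for p in freqs) / len(freqs)
    return (h_total - h_sub) / h_total


def is_common(freqs, min_maf=0.10):
    """True if the minor allele frequency reaches min_maf in any population."""
    return any(min(p, 1 - p) >= min_maf for p in freqs)


# Hypothetical allele frequencies for three populations (not HapMap data).
example = [0.10, 0.50, 0.80]
if is_common(example):
    print(round(fst_biallelic(example), 3))
```

    Identical frequencies in every population give FST = 0, and alleles fixed for different variants in two populations give FST = 1, which brackets the range the abstract's comparisons fall in.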

  18. atBioNet--an integrated network analysis tool for genomics and biomarker discovery.

    PubMed

    Ding, Yijun; Chen, Minjun; Liu, Zhichao; Ding, Don; Ye, Yanbin; Zhang, Min; Kelly, Reagan; Guo, Li; Su, Zhenqiang; Harris, Stephen C; Qian, Feng; Ge, Weigong; Fang, Hong; Xu, Xiaowei; Tong, Weida

    2012-07-20

    Large amounts of mammalian protein-protein interaction (PPI) data have been generated and are available for public use. From a systems biology perspective, Proteins/genes interactions encode the key mechanisms distinguishing disease and health, and such mechanisms can be uncovered through network analysis. An effective network analysis tool should integrate different content-specific PPI databases into a comprehensive network format with a user-friendly platform to identify key functional modules/pathways and the underlying mechanisms of disease and toxicity. atBioNet integrates seven publicly available PPI databases into a network-specific knowledge base. Knowledge expansion is achieved by expanding a user supplied proteins/genes list with interactions from its integrated PPI network. The statistically significant functional modules are determined by applying a fast network-clustering algorithm (SCAN: a Structural Clustering Algorithm for Networks). The functional modules can be visualized either separately or together in the context of the whole network. Integration of pathway information enables enrichment analysis and assessment of the biological function of modules. Three case studies are presented using publicly available disease gene signatures as a basis to discover new biomarkers for acute leukemia, systemic lupus erythematosus, and breast cancer. The results demonstrated that atBioNet can not only identify functional modules and pathways related to the studied diseases, but this information can also be used to hypothesize novel biomarkers for future analysis. atBioNet is a free web-based network analysis tool that provides a systematic insight into proteins/genes interactions through examining significant functional modules. The identified functional modules are useful for determining underlying mechanisms of disease and biomarker discovery. It can be accessed at: http://www.fda.gov/ScienceResearch/BioinformaticsTools/ucm285284.htm.

  19. The BioGRID Interaction Database: 2011 update

    PubMed Central

    Stark, Chris; Breitkreutz, Bobby-Joe; Chatr-aryamontri, Andrew; Boucher, Lorrie; Oughtred, Rose; Livstone, Michael S.; Nixon, Julie; Van Auken, Kimberly; Wang, Xiaodong; Shi, Xiaoqi; Reguly, Teresa; Rust, Jennifer M.; Winter, Andrew; Dolinski, Kara; Tyers, Mike

    2011-01-01

    The Biological General Repository for Interaction Datasets (BioGRID) is a public database that archives and disseminates genetic and protein interaction data from model organisms and humans (http://www.thebiogrid.org). BioGRID currently holds 347 966 interactions (170 162 genetic, 177 804 protein) curated from both high-throughput data sets and individual focused studies, as derived from over 23 000 publications in the primary literature. Complete coverage of the entire literature is maintained for budding yeast (Saccharomyces cerevisiae), fission yeast (Schizosaccharomyces pombe) and thale cress (Arabidopsis thaliana), and efforts to expand curation across multiple metazoan species are underway. The BioGRID houses 48 831 human protein interactions that have been curated from 10 247 publications. Current curation drives are focused on particular areas of biology to enable insights into conserved networks and pathways that are relevant to human health. The BioGRID 3.0 web interface contains new search and display features that enable rapid queries across multiple data types and sources. An automated Interaction Management System (IMS) is used to prioritize, coordinate and track curation across international sites and projects. BioGRID provides interaction data to several model organism databases, resources such as Entrez-Gene and other interaction meta-databases. The entire BioGRID 3.0 data collection may be downloaded in multiple file formats, including PSI MI XML. Source code for BioGRID 3.0 is freely available without any restrictions. PMID:21071413

  20. Gene regulation knowledge commons: community action takes care of DNA binding transcription factors

    PubMed Central

    Tripathi, Sushil; Vercruysse, Steven; Chawla, Konika; Christie, Karen R.; Blake, Judith A.; Huntley, Rachael P.; Orchard, Sandra; Hermjakob, Henning; Thommesen, Liv; Lægreid, Astrid; Kuiper, Martin

    2016-01-01

    A large gap remains between the amount of knowledge in scientific literature and the fraction that gets curated into standardized databases, despite many curation initiatives. Yet the availability of comprehensive knowledge in databases is crucial for exploiting existing background knowledge, both for designing follow-up experiments and for interpreting new experimental data. Structured resources also underpin the computational integration and modeling of regulatory pathways, which further aids our understanding of regulatory dynamics. We argue how cooperation between the scientific community and professional curators can increase the capacity of capturing precise knowledge from literature. We demonstrate this with a project in which we mobilize biological domain experts who curate large amounts of DNA binding transcription factors, and show that they, although new to the field of curation, can make valuable contributions by harvesting reported knowledge from scientific papers. Such community curation can enhance the scientific epistemic process. Database URL: http://www.tfcheckpoint.org PMID:27270715

  1. The Chinchilla Research Resource Database: resource for an otolaryngology disease model

    PubMed Central

    Shimoyama, Mary; Smith, Jennifer R.; De Pons, Jeff; Tutaj, Marek; Khampang, Pawjai; Hong, Wenzhou; Erbe, Christy B.; Ehrlich, Garth D.; Bakaletz, Lauren O.; Kerschner, Joseph E.

    2016-01-01

    The long-tailed chinchilla (Chinchilla lanigera) is an established animal model for diseases of the inner and middle ear, among others. In particular, chinchilla is commonly used to study diseases involving viral and bacterial pathogens and polymicrobial infections of the upper respiratory tract and the ear, such as otitis media. The value of the chinchilla as a model for human diseases prompted the sequencing of its genome in 2012 and the more recent development of the Chinchilla Research Resource Database (http://crrd.mcw.edu) to provide investigators with easy access to relevant datasets and software tools to enhance their research. The Chinchilla Research Resource Database contains a complete catalog of genes for chinchilla and, for comparative purposes, human. Chinchilla genes can be viewed in the context of their genomic scaffold positions using the JBrowse genome browser. In contrast to the corresponding records at NCBI, individual gene reports at CRRD include functional annotations for Disease, Gene Ontology (GO) Biological Process, GO Molecular Function, GO Cellular Component and Pathway assigned to chinchilla genes based on annotations from the corresponding human orthologs. Data can be retrieved via keyword and gene-specific searches. Lists of genes with similar functional attributes can be assembled by leveraging the hierarchical structure of the Disease, GO and Pathway vocabularies through the Ontology Search and Browser tool. Such lists can then be further analyzed for commonalities using the Gene Annotator (GA) Tool. All data in the Chinchilla Research Resource Database is freely accessible and downloadable via the CRRD FTP site or using the download functions available in the search and analysis tools. The Chinchilla Research Resource Database is a rich resource for researchers using, or considering the use of, chinchilla as a model for human disease. Database URL: http://crrd.mcw.edu PMID:27173523

  2. Biological pathways and genetic mechanisms involved in social functioning.

    PubMed

    Ordoñana, Juan R; Bartels, Meike; Boomsma, Dorret I; Cella, David; Mosing, Miriam; Oliveira, Joao R; Patrick, Donald L; Veenhoven, Ruut; Wagner, Gert G; Sprangers, Mirjam A G

    2013-08-01

    To describe the major findings in the literature regarding associations between biological and genetic factors and social functioning, paying special attention to: (1) heritability studies on social functioning and related concepts; (2) hypothesized biological pathways and genetic variants that could be involved in social functioning, and (3) the implications of these results for quality-of-life research. A search of Web of Science and PubMed databases was conducted using combinations of the following keywords: genetics, twins, heritability, social functioning, social adjustment, social interaction, and social dysfunction. Variability in the definitions and measures of social functioning was extensive. Moderate to high heritability was reported for social functioning and related concepts, including prosocial behavior, loneliness, and extraversion. Disorders characterized by impairments in social functioning also show substantial heritability. Genetic variants hypothesized to be involved in social functioning are related to the network of brain structures and processes that are known to affect social cognition and behavior. Better knowledge and understanding about the impact of genetic factors on social functioning is needed to help us to attain a more comprehensive view of health-related quality-of-life (HRQOL) and will ultimately enhance our ability to identify those patients who are vulnerable to poor social functioning.

  3. Clique-based data mining for related genes in a biomedical database.

    PubMed

    Matsunaga, Tsutomu; Yonemori, Chikara; Tomita, Etsuji; Muramatsu, Masaaki

    2009-07-01

    Progress in the life sciences cannot be made without integrating biomedical knowledge on numerous genes in order to help formulate hypotheses on the genetic mechanisms behind various biological phenomena, including diseases. There is thus a strong need for a way to automatically and comprehensively search from biomedical databases for related genes, such as genes in the same families and genes encoding components of the same pathways. Here we address the extraction of related genes by searching for densely-connected subgraphs, which are modeled as cliques, in a biomedical relational graph. We constructed a graph whose nodes were gene or disease pages, and edges were the hyperlink connections between those pages in the Online Mendelian Inheritance in Man (OMIM) database. We obtained over 20,000 sets of related genes (called 'gene modules') by enumerating cliques computationally. The modules included genes in the same family, genes for proteins that form a complex, and genes for components of the same signaling pathway. The results of experiments using 'metabolic syndrome'-related gene modules show that the gene modules can be used to get a coherent holistic picture helpful for interpreting relations among genes. We presented a data mining approach extracting related genes by enumerating cliques. The extracted gene sets provide a holistic picture useful for comprehending complex disease mechanisms.
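
    The clique enumeration described above can be sketched as follows. This is a minimal illustration using the Bron-Kerbosch algorithm with pivoting, which is one standard way to enumerate maximal cliques; the abstract does not say which algorithm the authors used. The function name `maximal_cliques` and the page labels A to E are hypothetical and are not OMIM entries.

```python
from typing import Dict, Iterator, Set


def maximal_cliques(adj: Dict[str, Set[str]]) -> Iterator[Set[str]]:
    """Yield every maximal clique of an undirected graph given as adjacency sets."""

    def expand(r: Set[str], p: Set[str], x: Set[str]) -> Iterator[Set[str]]:
        # r: current clique, p: candidates that can extend it, x: already-processed nodes
        if not p and not x:
            yield set(r)
            return
        # Pivot on the node with the most neighbours among the candidates
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    yield from expand(set(), set(adj), set())


# Hypothetical hyperlink graph: nodes stand for gene or disease pages,
# edges for links between those pages.
graph = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C", "E"},
    "E": {"D"},
}
cliques = sorted(sorted(c) for c in maximal_cliques(graph))
print(cliques)
```

    In this toy graph the triangle A, B, C is reported as a single module, while the chain C, D, E yields only two-node cliques. That is the sense in which cliques model densely connected sets of related pages.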

  4. Bioinformatics analysis for evaluation of the diagnostic potentialities of miR-19b, -125b and -205 as liquid biopsy markers of prostate cancer

    NASA Astrophysics Data System (ADS)

    Bryzgunova, O. E.; Lekchnov, E. A.; Zaripov, M. M.; Yurchenko, Yu. B.; Yarmoschuk, S. V.; Pashkovskaya, O. A.; Rykova, E. Yu.; Zheravin, A. A.; Laktionov, P. P.

    2017-09-01

    The presence of tumor-derived cell-free miRNAs in biological fluids, together with the simplicity and robustness of their quantification, makes them suitable markers for cancer diagnostics. Based on previously published data demonstrating the diagnostic potential of miR-205 in blood and of miR-19b and miR-125b in urine of prostate cancer patients, bioinformatics analysis was carried out to trace their involvement in prostate cancer development and to select additional miRNA markers for prostate cancer diagnostics. The studied miRNAs are involved in different signaling pathways and regulate a number of genes involved in cancer development. Five of their targets (CCND1, BRAF, CCNE1, CCNE2, RAF1), according to the STRING database, act as part of the same signaling pathway. RAF1 is regulated by miR-19b and miR-125b, and the DIANA and STRING databases indicate that it is involved in prostate cancer development. Thus, other microRNAs regulating RAF1 expression, such as miR-16, -195, -497, and -7 (suggested by the DIANA, TargetScan, MiRTarBase and miRDB databases), can potentially be regarded as prostate cancer markers.

  5. Comparison of transcripts in Phalaenopsis bellina and Phalaenopsis equestris (Orchidaceae) flowers to deduce monoterpene biosynthesis pathway

    PubMed Central

    Hsiao, Yu-Yun; Tsai, Wen-Chieh; Kuoh, Chang-Sheng; Huang, Tian-Hsiang; Wang, Hei-Chia; Wu, Tian-Shung; Leu, Yann-Lii; Chen, Wen-Huei; Chen, Hong-Hwa

    2006-01-01

    Background Floral scent is one of the important strategies for ensuring fertilization and for determining seed or fruit set. Research on plant scents has been hampered mainly by the invisibility of this character, its dynamic nature, and complex mixtures of components that are present in very small quantities. Most progress in scent research, as in other areas of plant biology, has come from the use of molecular and biochemical techniques. Although volatile components have been identified in several orchid species, the biosynthetic pathways of orchid flower fragrance are far from understood. We investigated how flower fragrance was generated in certain Phalaenopsis orchids by determining the chemical components of the floral scent, identifying floral expressed-sequence-tags (ESTs), and deducing the pathways of floral scent biosynthesis in Phalaenopsis bellina by bioinformatics analysis. Results The main chemical components in the P. bellina flower were shown by gas chromatography-mass spectrometry to be monoterpenoids, benzenoids and phenylpropanoids. The set of floral scent producing enzymes in the biosynthetic pathway from glyceraldehyde-3-phosphate (G3P) to geraniol and linalool were recognized through data mining of the P. bellina floral EST database (dbEST). Transcripts preferentially expressed in P. bellina were distinguished by comparing the scent floral dbEST to that of a scentless species, P. equestris, and included those encoding lipoxygenase, epimerase, diacylglycerol kinase and geranyl diphosphate synthase. In addition, EST filtering results showed that transcripts encoding signal transduction and Myb transcription factors and methyltransferase, in addition to those for scent biosynthesis, were detected by in silico hybridization of the P. bellina unigene database against those of the scentless species, rice and Arabidopsis. Altogether, we pinpointed 66% of the biosynthetic steps from G3P to geraniol, linalool and their derivatives. 
Conclusion This systems biology program combined chemical analysis, genomics and bioinformatics to elucidate the scent biosynthesis pathway and identify the relevant genes. It integrates the forward and reverse genetic approaches to knowledge discovery by which researchers can study non-model plants. PMID:16836766

  6. CARFMAP: A Curated Pathway Map of Cardiac Fibroblasts.

    PubMed

    Nim, Hieu T; Furtado, Milena B; Costa, Mauro W; Kitano, Hiroaki; Rosenthal, Nadia A; Boyd, Sarah E

    2015-01-01

    The adult mammalian heart contains multiple cell types that work in unison under tightly regulated conditions to maintain homeostasis. Cardiac fibroblasts are a significant and unique population of non-muscle cells in the heart that have recently gained substantial interest in the cardiac biology community. To better understand this renaissance cell, it is essential to systematically survey what has been known in the literature about the cellular and molecular processes involved. We have built CARFMAP (http://visionet.erc.monash.edu.au/CARFMAP), an interactive cardiac fibroblast pathway map derived from the biomedical literature using a software-assisted manual data collection approach. CARFMAP is an information-rich interactive tool that enables cardiac biologists to explore the large body of literature in various creative ways. There is surprisingly little overlap between the cardiac fibroblast pathway map, a foreskin fibroblast pathway map, and a whole mouse organism signalling pathway map from the REACTOME database. Among the use cases of CARFMAP is a common task in our cardiac biology laboratory of identifying new genes that are (1) relevant to cardiac literature, and (2) differentially regulated in high-throughput assays. From the expression profiles of mouse cardiac and tail fibroblasts, we employed CARFMAP to characterise cardiac fibroblast pathways. Using CARFMAP in conjunction with transcriptomic data, we generated a stringent list of six genes that would not have been singled out using bioinformatics analyses alone. Experimental validation showed that five genes (Mmp3, Il6, Edn1, Pdgfc and Fgf10) are differentially regulated in the cardiac fibroblast. CARFMAP is a powerful tool for systems analyses of cardiac fibroblasts, facilitating systems-level cardiovascular research.

  7. Multiplatform serum metabolic phenotyping combined with pathway mapping to identify biochemical differences in smokers.

    PubMed

    Kaluarachchi, Manuja R; Boulangé, Claire L; Garcia-Perez, Isabel; Lindon, John C; Minet, Emmanuel F

    2016-10-01

    Determining perturbed biochemical functions associated with tobacco smoking should be helpful for establishing causal relationships between exposure and adverse events. A multiplatform comparison of serum of smokers (n = 55) and never-smokers (n = 57) using nuclear magnetic resonance spectroscopy, UPLC-MS and statistical modeling revealed clustering of the classes, distinguished by metabolic biomarkers. The identified metabolites were subjected to metabolic pathway enrichment, modeling adverse biological events using available databases. Perturbations of metabolites involved in chronic obstructive pulmonary disease, cardiovascular diseases and cancer were identified and discussed. Combining multiplatform metabolic phenotyping with knowledge-based mapping gives mechanistic insights into disease development, which can be applied to next-generation tobacco and nicotine products for comparative risk assessment.

  8. Getting the most out of parasitic helminth transcriptomes using HelmDB: implications for biology and biotechnology.

    PubMed

    Mangiola, Stefano; Young, Neil D; Korhonen, Pasi; Mondal, Alinda; Scheerlinck, Jean-Pierre; Sternberg, Paul W; Cantacessi, Cinzia; Hall, Ross S; Jex, Aaron R; Gasser, Robin B

    2013-12-01

    Compounded by a massive global food shortage, many parasitic diseases have a devastating, long-term impact on animal and human health and welfare worldwide. Parasitic helminths (worms) affect the health of billions of animals. Unlocking the systems biology of these neglected pathogens will underpin the design of new and improved interventions against them. Currently, the functional annotation of genomic and transcriptomic sequence data for socio-economically important parasitic worms relies almost exclusively on comparative bioinformatic analyses using model organism- and other databases. However, many genes and gene products of parasitic helminths (often >50%) cannot be annotated using this approach, because they are specific to parasites and/or do not have identifiable homologs in other organisms for which sequence data are available. This inability to fully annotate transcriptomes and predicted proteomes is a major challenge and constrains our understanding of the biology of parasites, interactions with their hosts and of parasitism and the pathogenesis of disease on a molecular level. In the present article, we compiled transcriptomic data sets of key, socioeconomically important parasitic helminths, and constructed and validated a curated database, called HelmDB (www.helmdb.org). We demonstrate how this database can be used effectively for the improvement of functional annotation by employing data integration and clustering. Importantly, HelmDB provides a practical and user-friendly toolkit for sequence browsing and comparative analyses among divergent helminth groups (including nematodes and trematodes), and should be readily adaptable and applicable to a wide range of other organisms. This web-based, integrative database should assist 'systems biology' studies of parasitic helminths, and the discovery and prioritization of novel drug and vaccine targets. 
This focus provides a pathway toward developing new and improved approaches for the treatment and control of parasitic diseases, with the potential for important biotechnological outcomes.

  9. Simulation of a Petri net-based model of the terpenoid biosynthesis pathway.

    PubMed

    Hawari, Aliah Hazmah; Mohamed-Hussein, Zeti-Azura

    2010-02-09

    The development and simulation of dynamic models of terpenoid biosynthesis has yielded a systems perspective that provides new insights into how the structure of this biochemical pathway affects compound synthesis. These insights may eventually help identify reactions that could be experimentally manipulated to amplify terpenoid production. In this study, a dynamic model of the terpenoid biosynthesis pathway was constructed based on the Hybrid Functional Petri Net (HFPN) technique. This technique is a fusion of three other extended Petri net techniques, namely Hybrid Petri Net (HPN), Dynamic Petri Net (HDN) and Functional Petri Net (FPN). The biological data needed to construct the terpenoid metabolic model were gathered from the literature and from biological databases. These data were used as building blocks to create an HFPNe model and to generate parameters that govern the global behaviour of the model. The dynamic model was simulated and validated against known experimental data obtained from extensive literature searches. The model successfully simulated metabolite concentration changes over time (pt) and the observations correlated with known data. Interactions between the intermediates that affect the production of terpenes could be observed through the introduction of inhibitors that established feedback loops within and crosstalk between the pathways. Although this metabolic model is only preliminary, it will provide a platform for analysing various high-throughput data, and it should lead to a more holistic understanding of terpenoid biosynthesis.
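
    The token-based simulation idea behind Petri net models can be sketched with a plain discrete Petri net. This is a deliberately simpler formalism than the Hybrid Functional Petri Net used in the study, which also supports continuous places and rate functions. The place names, transition names and helper functions below are hypothetical and do not come from the published terpenoid model.

```python
from typing import Dict, List, Tuple

Marking = Dict[str, int]
# A transition is (tokens consumed per place, tokens produced per place)
Transition = Tuple[Dict[str, int], Dict[str, int]]


def enabled(marking: Marking, transition: Transition) -> bool:
    """A transition is enabled when every input place holds enough tokens."""
    consumed, _ = transition
    return all(marking.get(place, 0) >= n for place, n in consumed.items())


def fire(marking: Marking, transition: Transition) -> Marking:
    """Return the marking obtained by firing an enabled transition once."""
    consumed, produced = transition
    new = dict(marking)
    for place, n in consumed.items():
        new[place] = new.get(place, 0) - n
    for place, n in produced.items():
        new[place] = new.get(place, 0) + n
    return new


def simulate(marking: Marking, transitions: Dict[str, Transition], steps: int) -> List[Marking]:
    """Fire the first enabled transition (in name order) at each step and record the markings."""
    history = [dict(marking)]
    for _ in range(steps):
        for name in sorted(transitions):
            if enabled(marking, transitions[name]):
                marking = fire(marking, transitions[name])
                break
        else:
            break  # no transition is enabled: the net is dead
        history.append(dict(marking))
    return history


# Hypothetical two-step chain: substrate -> intermediate -> product
initial = {"substrate": 2, "intermediate": 0, "product": 0}
transitions = {
    "t1_convert": ({"substrate": 1}, {"intermediate": 1}),
    "t2_finish": ({"intermediate": 1}, {"product": 1}),
}
history = simulate(initial, transitions, steps=10)
for step, m in enumerate(history):
    print(step, m)
```

    Recording the marking after every firing gives the discrete analogue of a concentration-over-time trace. An inhibitor could be modelled in the same style by adding a place whose tokens block a transition, which this sketch leaves out.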

  10. Transcriptome profiling identified differentially expressed genes and pathways associated with tamoxifen resistance in human breast cancer

    PubMed Central

    Men, Xin; Ma, Jun; Wu, Tong; Pu, Junyi; Wen, Shaojia; Shen, Jianfeng; Wang, Xun; Wang, Yamin; Chen, Chao; Dai, Penggao

    2018-01-01

    Tamoxifen (TAM) resistance is an important clinical problem in the treatment of breast cancer. To identify the mechanism of TAM resistance in estrogen receptor (ER)-positive breast cancer, we screened the transcriptome using RNA-seq and compared the gene expression profiles of the MCF-7 breast carcinoma cell line and the TAM-resistant cell line TAMR/MCF-7. Fifty-two significant differentially expressed genes (DEGs) were identified, including SLIT2, ROBO, LHX, KLF, VEGFC, BAMBI, LAMA1, FLT4, PNMT, DHRS2, MAOA and ALDH. The DEGs were annotated in the GO, COG and KEGG databases. Functional annotation of the DEGs in the KEGG database revealed the three pathways enriched with the most DEGs: pathways in cancer, the PI3K-AKT pathway, and focal adhesion. We then compared the gene expression profiles of clinical progressive disease (PD) and complete response (CR) samples from The Cancer Genome Atlas (TCGA). Ten common DEGs were identified by combining the clinical and cellular analysis results. A protein-protein interaction network was used to analyze the association of the ER signaling pathway with the 10 DEGs. Three significant genes (GFRA3, NPY1R and PTPRN2) were closely related to the ER-related pathway. These significant DEGs regulate many biological activities such as cell proliferation and survival, motility and migration, and tumor cell invasion. The interactions between these DEGs and the drug resistance phenomenon need to be elucidated at a functional level in further studies. Based on our findings, we believe that these DEGs could be therapeutic targets, which can be explored to develop new treatment options. PMID:29423105
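
    Pathway enrichment of a DEG list is often scored with a hypergeometric over-representation test. The abstract does not state which statistic was used, so the sketch below is only an illustration of that common choice. The function name `enrichment_p` and every number except the 52 DEGs are hypothetical.

```python
from math import comb


def enrichment_p(total_genes: int, pathway_genes: int, deg_count: int, overlap: int) -> float:
    """Probability of seeing at least `overlap` pathway genes among `deg_count`
    genes drawn without replacement from `total_genes` (hypergeometric upper tail)."""
    upper = min(pathway_genes, deg_count)
    favourable = sum(
        comb(pathway_genes, k) * comb(total_genes - pathway_genes, deg_count - k)
        for k in range(overlap, upper + 1)
    )
    return favourable / comb(total_genes, deg_count)


# Hypothetical background of 20000 annotated genes and a 300-gene pathway;
# 52 is the DEG count reported in the abstract, the overlap of 5 is invented.
p_value = enrichment_p(total_genes=20000, pathway_genes=300, deg_count=52, overlap=5)
print(f"p = {p_value:.3g}")
```

    A small p-value means the pathway contains more DEGs than expected by chance. Ranking pathways by raw DEG count, as in the abstract, favours large pathways, which is why a size-aware statistic of this kind is usually reported alongside the counts.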

  11. Plasma Glycoproteomics Reveals Sepsis Outcomes Linked to Distinct Proteins in Common Pathways

    PubMed Central

    DeLeon-Pennell, Kristine Y.; Nguyen, Nguyen T.; de Castro Brás, Lisandra E.; Flynn, Elizabeth R.; Cannon, Presley L.; Griswold, Michael E.; Jin, Yu-Fang; Puskarich, Michael A.; Jones, Alan E.; Lindsey, Merry L.

    2015-01-01

    Objective Sepsis remains a predominant cause of mortality in the ICU, yet strategies to increase survival have proved largely unsuccessful. This study aimed to identify proteins linked to sepsis outcomes using a glycoproteomic approach to target extracellular proteins that trigger downstream pathways and direct patient outcomes. Design Plasma was obtained from the LacTATEs cohort. N-linked plasma glycopeptides were quantified by solid-phase extraction coupled with mass spectrometry. Glycopeptides were assigned to proteins using RefSeq and visualized in a heat map. Protein differences were validated by immunoblotting, and proteins were mapped for biological processes using Database for Annotation, Visualization and Integrated Discovery and for functional pathways using Kyoto Encyclopedia of Genes and Genomes databases. Setting Hospitalized care. Measurements and Main Results A total of 501 glycopeptides corresponding to 234 proteins were identified. Of these, 66 glycopeptides were unique to the survivor group and corresponded to 54 proteins, 60 were unique to the nonsurvivor group and corresponded to 43 proteins, and 375 were common responses between groups and corresponded to 137 proteins. Immunoblotting showed that nonsurvivors had increased total kininogen; decreased total cathepsin-L1, vascular cell adhesion molecule, periostin, and neutrophil gelatinase–associated lipocalin; and a two-fold decrease in glycosylated clusterin (all p < 0.05). Kyoto Encyclopedia of Genes and Genomes analysis identified six enriched pathways. Interestingly, survivors relied on the extrinsic pathway of the complement and coagulation cascade, whereas nonsurvivors relied on the intrinsic pathway. Conclusion This study identifies proteins linked to patient outcomes and provides insight into unexplored mechanisms that can be investigated for the identification of novel therapeutic targets. (Crit Care Med 2015; XX:00–00) PMID:26086942

  12. ChemiRs: a web application for microRNAs and chemicals.

    PubMed

    Su, Emily Chia-Yu; Chen, Yu-Sing; Tien, Yun-Cheng; Liu, Jeff; Ho, Bing-Ching; Yu, Sung-Liang; Singh, Sher

    2016-04-18

    MicroRNAs (miRNAs) are non-coding RNAs of about 22 nucleotides that affect various cellular functions and play a regulatory role in different organisms, including humans. To date, more than 2500 mature human miRNAs have been discovered and registered, but information and algorithms to reveal the relations among miRNAs, environmental chemicals and human health are still lacking. Chemicals in the environment affect our health and daily life, and some of them can lead to diseases by interfering with biological pathways. We developed a creditable online web server, ChemiRs, for predicting interactions and relations among miRNAs, chemicals and pathways. The database not only compares gene lists affected by chemicals and miRNAs, but also incorporates curated pathways to identify possible interactions. Here, we manually retrieved associations of miRNAs and chemicals from the biomedical literature. We developed an online system, ChemiRs, which contains miRNAs, diseases, Medical Subject Heading (MeSH) terms, chemicals, genes, pathways and PubMed IDs. We connected each miRNA to miRBase, and every current gene symbol to the HUGO Gene Nomenclature Committee (HGNC) for genome annotation. Human pathway information is also provided from the KEGG and REACTOME databases. Information about Gene Ontology (GO) is queried from the GO Online SQL Environment (GOOSE). With a user-friendly interface, the web application is easy to use. Multiple query results can be easily integrated and exported as report documents in PDF format. Association analysis of miRNAs and chemicals can help us understand the pathogenesis of chemical components. ChemiRs is freely available for public use at http://omics.biol.ntnu.edu.tw/ChemiRs .

  13. BioNetSim: a Petri net-based modeling tool for simulations of biochemical processes.

    PubMed

    Gao, Junhui; Li, Li; Wu, Xiaolin; Wei, Dong-Qing

    2012-03-01

    BioNetSim is a Petri net-based software tool for modeling and simulating biochemical processes. Its design and implementation are presented in this paper, including logic construction and real-time access to the KEGG (Kyoto Encyclopedia of Genes and Genomes) and BioModel databases. Furthermore, glycolysis is simulated as an example of its application. BioNetSim is a helpful tool for researchers to download data, model biological networks, and simulate complicated biochemical processes. Gene regulatory networks, metabolic pathways, signaling pathways, and kinetics of cell interaction are all available in BioNetSim, which makes modeling more efficient and effective. Like other Petri net-based software, BioNetSim performs well in graphical application and mathematical construction. Moreover, it offers several notable advantages. (1) It creates models in a database. (2) It provides real-time access to KEGG and BioModel and transfers data to Petri nets. (3) It provides qualitative analysis, such as computation of constants. (4) It generates graphs for tracing the concentration of every molecule during the simulation process.

  14. HMDB 3.0--The Human Metabolome Database in 2013.

    PubMed

    Wishart, David S; Jewison, Timothy; Guo, An Chi; Wilson, Michael; Knox, Craig; Liu, Yifeng; Djoumbou, Yannick; Mandal, Rupasri; Aziat, Farid; Dong, Edison; Bouatra, Souhaila; Sinelnikov, Igor; Arndt, David; Xia, Jianguo; Liu, Philip; Yallou, Faizath; Bjorndahl, Trent; Perez-Pineiro, Rolando; Eisner, Roman; Allen, Felicity; Neveu, Vanessa; Greiner, Russ; Scalbert, Augustin

    2013-01-01

    The Human Metabolome Database (HMDB) (www.hmdb.ca) is a resource dedicated to providing scientists with the most current and comprehensive coverage of the human metabolome. Since its first release in 2007, the HMDB has been used to facilitate research for nearly 1000 published studies in metabolomics, clinical biochemistry and systems biology. The most recent release of HMDB (version 3.0) has been significantly expanded and enhanced over the 2009 release (version 2.0). In particular, the number of annotated metabolite entries has grown from 6500 to more than 40,000 (a 600% increase). This enormous expansion is a result of the inclusion of both 'detected' metabolites (those with measured concentrations or experimental confirmation of their existence) and 'expected' metabolites (those for which biochemical pathways are known or human intake/exposure is frequent but the compound has yet to be detected in the body). The latest release also has greatly increased the number of metabolites with biofluid or tissue concentration data, the number of compounds with reference spectra and the number of data fields per entry. In addition to this expansion in data quantity, new database visualization tools and new data content have been added or enhanced. These include better spectral viewing tools, more powerful chemical substructure searches, an improved chemical taxonomy and better, more interactive pathway maps. This article describes these enhancements to the HMDB, which was previously featured in the 2009 NAR Database Issue. (Note to referees, HMDB 3.0 will go live on 18 September 2012.).

  15. A Novel Biclustering Approach to Association Rule Mining for Predicting HIV-1–Human Protein Interactions

    PubMed Central

    Mukhopadhyay, Anirban; Maulik, Ujjwal; Bandyopadhyay, Sanghamitra

    2012-01-01

    Identification of potential viral-host protein interactions is a vital and useful approach towards development of new drugs targeting those interactions. In recent days, computational tools are being utilized for predicting viral-host interactions. Recently a database containing records of experimentally validated interactions between a set of HIV-1 proteins and a set of human proteins has been published. The problem of predicting new interactions based on this database is usually posed as a classification problem. However, posing the problem as a classification one suffers from the lack of biologically validated negative interactions. Therefore it will be beneficial to use the existing database for predicting new viral-host interactions without the need of negative samples. Motivated by this, in this article, the HIV-1–human protein interaction database has been analyzed using association rule mining. The main objective is to identify a set of association rules both among the HIV-1 proteins and among the human proteins, and use these rules for predicting new interactions. In this regard, a novel association rule mining technique based on biclustering has been proposed for discovering frequent closed itemsets followed by the association rules from the adjacency matrix of the HIV-1–human interaction network. Novel HIV-1–human interactions have been predicted based on the discovered association rules and tested for biological significance. For validation of the predicted new interactions, gene ontology-based and pathway-based studies have been performed. These studies show that the human proteins which are predicted to interact with a particular viral protein share many common biological activities. Moreover, literature survey has been used for validation purpose to identify some predicted interactions that are already validated experimentally but not present in the database. Comparison with other prediction methods is also discussed. PMID:22539940
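
    The support and confidence measures that underlie association rules can be sketched on a toy interaction table. The published method finds frequent closed itemsets through biclustering; the brute-force pairwise search below is a much simpler stand-in that is shown only to make the rule definitions concrete. The labels v1 to v3 and the function names are hypothetical and do not refer to actual HIV-1 proteins.

```python
from itertools import combinations
from typing import List, Set, Tuple


def support(rows: List[Set[str]], items: Set[str]) -> float:
    """Fraction of rows (e.g. human proteins) whose interaction set contains all items."""
    return sum(1 for row in rows if items <= row) / len(rows)


def pairwise_rules(
    rows: List[Set[str]], min_support: float, min_confidence: float
) -> List[Tuple[str, str, float, float]]:
    """Return rules lhs -> rhs as (lhs, rhs, support, confidence) for single-item sides."""
    all_items = sorted(set().union(*rows))
    found = []
    for a, b in combinations(all_items, 2):
        joint = support(rows, {a, b})
        if joint < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = joint / support(rows, {lhs})
            if confidence >= min_confidence:
                found.append((lhs, rhs, joint, confidence))
    return found


# Each row lists the (hypothetical) viral proteins one human protein interacts with,
# i.e. one row of the adjacency matrix of the interaction network.
rows = [{"v1", "v2"}, {"v1", "v2"}, {"v1"}, {"v2", "v3"}]
rules = pairwise_rules(rows, min_support=0.5, min_confidence=0.6)
for lhs, rhs, s, c in rules:
    print(f"{lhs} -> {rhs}: support={s:.2f}, confidence={c:.2f}")
```

    A rule v1 -> v2 with high confidence suggests that a human protein known to interact with v1 but not yet recorded with v2 is a candidate for a new predicted interaction. No negative examples are required, which is the motivation given in the abstract.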

  16. Incorporating ToxCast and Tox21 datasets to rank biological activity of chemicals at Superfund sites in North Carolina.

    PubMed

    Tilley, Sloane K; Reif, David M; Fry, Rebecca C

    2017-04-01

    The Superfund program of the Environmental Protection Agency (EPA) was established in 1980 to address public health concerns posed by toxic substances released into the environment in the United States. Forty-two of the 1328 hazardous waste sites that remain on the Superfund National Priority List are located in the state of North Carolina. We set out to develop a database that contained information on both the prevalence and biological activity of chemicals present at Superfund sites in North Carolina. A chemical characterization tool, the Toxicological Priority Index (ToxPi), was used to rank the biological activity of these chemicals based on their predicted bioavailability, documented associations with biological pathways, and activity in in vitro assays of the ToxCast and Tox21 programs. The ten most prevalent chemicals found at North Carolina Superfund sites were chromium, trichloroethene, lead, tetrachloroethene, arsenic, benzene, manganese, 1,2-dichloroethane, nickel, and barium. For all chemicals found at North Carolina Superfund sites, ToxPi analysis was used to rank their biological activity. Through this data integration, residual pesticides and organic solvents were identified to be some of the most highly-ranking predicted bioactive chemicals. This study provides a novel methodology for creating state or regional databases of biological activity of contaminants at Superfund sites. These data represent a novel integrated profile of the most prevalent chemicals at North Carolina Superfund sites. This information, and the associated methodology, is useful to toxicologists, risk assessors, and the communities living in close proximity to these sites.

  17. Molecular and comparative genetics of mental retardation.

    PubMed Central

    Inlow, Jennifer K; Restifo, Linda L

    2004-01-01

    Affecting 1-3% of the population, mental retardation (MR) poses significant challenges for clinicians and scientists. Understanding the biology of MR is complicated by the extraordinary heterogeneity of genetic MR disorders. Detailed analyses of >1000 Online Mendelian Inheritance in Man (OMIM) database entries and literature searches through September 2003 revealed 282 molecularly identified MR genes. We estimate that hundreds more MR genes remain to be identified. A novel test, in which we distributed unmapped MR disorders proportionately across the autosomes, failed to eliminate the well-known X-chromosome overrepresentation of MR genes and candidate genes. This evidence argues against ascertainment bias as the main cause of the skewed distribution. On the basis of a synthesis of clinical and laboratory data, we developed a biological functions classification scheme for MR genes. Metabolic pathways, signaling pathways, and transcription are the most common functions, but numerous other aspects of neuronal and glial biology are controlled by MR genes as well. Using protein sequence and domain-organization comparisons, we found a striking conservation of MR genes and genetic pathways across the approximately 700 million years that separate Homo sapiens and Drosophila melanogaster. Eighty-seven percent have one or more fruit fly homologs and 76% have at least one candidate functional ortholog. We propose that D. melanogaster can be used in a systematic manner to study MR and possibly to develop bioassays for therapeutic drug discovery. We selected 42 Drosophila orthologs as most likely to reveal molecular and cellular mechanisms of nervous system development or plasticity relevant to MR. PMID:15020472

  18. Biological profiling and dose-response modeling tools ...

    EPA Pesticide Factsheets

    Through its ToxCast project, the U.S. EPA has developed a battery of in vitro high throughput screening (HTS) assays designed to assess the potential toxicity of environmental chemicals. At present, over 1800 chemicals have been tested in up to 600 assays, yielding a large number of concentration-response data sets. Standard processing of these data sets involves finding a best fitting mathematical model and set of model parameters that specify this model. The model parameters include quantities such as the half-maximal activity concentration (or “AC50”) that have biological significance and can be used to inform the efficacy or potency of a given chemical with respect to a given assay. All of this data is processed and stored in an online-accessible database and website: http://actor.epa.gov/dashboard2. Results from these in vitro assays are used in a multitude of ways. New pathways and targets can be identified and incorporated into new or existing adverse outcome pathways (AOPs). Pharmacokinetic models such as those implemented in EPA’s HTTK R package can be used to translate an in vitro concentration into an in vivo dose; i.e., one can predict the oral equivalent dose that might be expected to activate a specific biological pathway. Such predicted values can then be compared with estimated actual human exposures to prioritize chemicals for further testing. Any quantitative examination should be accompanied by estimation of uncertainty. We are developing met
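
    The role of the AC50 as a fitted model parameter can be illustrated with a Hill function, a common choice for concentration-response curves. The record above does not name the model family, so treating it as a Hill curve is an assumption. The function names `hill` and `fit_ac50`, the candidate grid and the data are hypothetical, and the crude grid search only stands in for a proper curve-fitting routine.

```python
from typing import Iterable, Sequence


def hill(conc: float, top: float, ac50: float, slope: float) -> float:
    """Hill response at a positive concentration; equals top / 2 when conc == ac50."""
    return top / (1.0 + (ac50 / conc) ** slope)


def fit_ac50(
    concs: Sequence[float],
    responses: Sequence[float],
    top: float,
    slope: float,
    candidates: Iterable[float],
) -> float:
    """Pick the candidate AC50 with the smallest sum of squared errors (top and slope fixed)."""

    def sse(ac50: float) -> float:
        return sum((hill(c, top, ac50, slope) - r) ** 2 for c, r in zip(concs, responses))

    return min(candidates, key=sse)


# Synthetic, noise-free data generated from a known AC50 of 3.0 (arbitrary units)
concs = [0.1, 0.3, 1.0, 3.0, 10.0, 30.0, 100.0]
responses = [hill(c, top=80.0, ac50=3.0, slope=1.5) for c in concs]

best = fit_ac50(concs, responses, top=80.0, slope=1.5, candidates=[0.3, 1.0, 3.0, 10.0, 30.0])
print("fitted AC50:", best)
```

    A lower AC50 means the chemical reaches half of its maximal activity at a lower concentration, which is why the parameter is read as a potency measure.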

  19. ReNE: A Cytoscape Plugin for Regulatory Network Enhancement

    PubMed Central

    Politano, Gianfranco; Benso, Alfredo; Savino, Alessandro; Di Carlo, Stefano

    2014-01-01

    One of the biggest challenges in the study of biological regulatory mechanisms is the integration, modeling, and analysis of the complex interactions which take place in biological networks. Although post-transcriptional regulatory elements (i.e., miRNAs) are widely investigated in current research, their usage and visualization in biological networks is very limited. Regulatory networks are commonly limited to gene entities. To integrate networks with post-transcriptional regulatory data, researchers are therefore forced to manually resort to specific third party databases. In this context, we introduce ReNE, a Cytoscape 3.x plugin designed to automatically enrich a standard gene-based regulatory network with more detailed transcriptional, post-transcriptional, and translational data, resulting in an enhanced network that more precisely models the actual biological regulatory mechanisms. ReNE can automatically import a network layout from the Reactome or KEGG repositories, or work with custom pathways described using a standard OWL/XML data format that the Cytoscape import procedure accepts. Moreover, ReNE allows researchers to merge multiple pathways coming from different sources. The merged network structure is normalized to guarantee a consistent and uniform description of the network nodes and edges and to enrich all integrated data with additional annotations retrieved from genome-wide databases like NCBI, thus producing a pathway fully manageable through the Cytoscape environment. The normalized network is then analyzed to include missing transcription factors, miRNAs, and proteins. The resulting enhanced network is still a fully functional Cytoscape network where each regulatory element (transcription factor, miRNA, gene, protein) and regulatory mechanism (up-regulation/down-regulation) is clearly visually identifiable, thus enabling a better visual understanding of its role and the effect in the network behavior. 
The enhanced network produced by ReNE is exportable in multiple formats for further analysis via third-party applications. ReNE can be freely installed from the Cytoscape App Store (http://apps.cytoscape.org/apps/rene) and the full source code is freely available for download through an SVN repository accessible at http://www.sysbio.polito.it/tools_svn/BioInformatics/Rene/releases/. ReNE enhances a network by only integrating data from public repositories, without any inference or prediction. The reliability of the introduced interactions depends only on the reliability of the source data, which is beyond the control of the ReNE developers. PMID:25541727

  20. Graphite Web: web tool for gene set analysis exploiting pathway topology

    PubMed Central

    Sales, Gabriele; Calura, Enrica; Martini, Paolo; Romualdi, Chiara

    2013-01-01

    Graphite web is a novel web tool for pathway analyses and network visualization for gene expression data of both microarray and RNA-seq experiments. Several pathway analyses have been proposed either in the univariate or in the global and multivariate context to tackle the complexity and the interpretation of expression results. These methods can be further divided into ‘topological’ and ‘non-topological’ methods according to their ability to gain power from pathway topology. Biological pathways are, in fact, not only gene lists but can be represented through a network where genes and connections are, respectively, nodes and edges. To this day, the most used approaches are non-topological and univariate although they miss the relationship among genes. On the contrary, topological and multivariate approaches are more powerful, but difficult to be used by researchers without bioinformatic skills. Here we present Graphite web, the first public web server for pathway analysis on gene expression data that combines topological and multivariate pathway analyses with an efficient system of interactive network visualizations for easy results interpretation. Specifically, Graphite web implements five different gene set analyses on three model organisms and two pathway databases. Graphite Web is freely available at http://graphiteweb.bio.unipd.it/. PMID:23666626

  1. Co-LncRNA: investigating the lncRNA combinatorial effects in GO annotations and KEGG pathways based on human RNA-Seq data

    PubMed Central

    Zhao, Zheng; Bai, Jing; Wu, Aiwei; Wang, Yuan; Zhang, Jinwen; Wang, Zishan; Li, Yongsheng; Xu, Juan; Li, Xia

    2015-01-01

    Long non-coding RNAs (lncRNAs) are emerging as key regulators of diverse biological processes and diseases. However, the combinatorial effects of these molecules in a specific biological function are poorly understood. Identifying co-expressed protein-coding genes of lncRNAs would provide ample insight into lncRNA functions. To facilitate such an effort, we have developed Co-LncRNA, which is a web-based computational tool that allows users to identify GO annotations and KEGG pathways that may be affected by co-expressed protein-coding genes of a single or multiple lncRNAs. LncRNA co-expressed protein-coding genes were first identified in publicly available human RNA-Seq datasets, including 241 datasets across 6560 total individuals representing 28 tissue types/cell lines. Then, the lncRNA combinatorial effects in a given GO annotations or KEGG pathways are taken into account by the simultaneous analysis of multiple lncRNAs in user-selected individual or multiple datasets, which is realized by enrichment analysis. In addition, this software provides a graphical overview of pathways that are modulated by lncRNAs, as well as a specific tool to display the relevant networks between lncRNAs and their co-expressed protein-coding genes. Co-LncRNA also supports users in uploading their own lncRNA and protein-coding gene expression profiles to investigate the lncRNA combinatorial effects. It will be continuously updated with more human RNA-Seq datasets on an annual basis. Taken together, Co-LncRNA provides a web-based application for investigating lncRNA combinatorial effects, which could shed light on their biological roles and could be a valuable resource for this community. Database URL: http://www.bio-bigdata.com/Co-LncRNA/ PMID:26363020

  2. Network Analysis of Human Genes Influencing Susceptibility to Mycobacterial Infections

    PubMed Central

    Lipner, Ettie M.; Garcia, Benjamin J.; Strong, Michael

    2016-01-01

    Tuberculosis and nontuberculous mycobacterial infections constitute a high burden of pulmonary disease in humans, resulting in over 1.5 million deaths per year. Building on the premise that genetic factors influence the instance, progression, and defense of infectious disease, we undertook a systems biology approach to investigate relationships among genetic factors that may play a role in increased susceptibility or control of mycobacterial infections. We combined literature and database mining with network analysis and pathway enrichment analysis to examine genes, pathways, and networks, involved in the human response to Mycobacterium tuberculosis and nontuberculous mycobacterial infections. This approach allowed us to examine functional relationships among reported genes, and to identify novel genes and enriched pathways that may play a role in mycobacterial susceptibility or control. Our findings suggest that the primary pathways and genes influencing mycobacterial infection control involve an interplay between innate and adaptive immune proteins and pathways. Signaling pathways involved in autoimmune disease were significantly enriched as revealed in our networks. Mycobacterial disease susceptibility networks were also examined within the context of gene-chemical relationships, in order to identify putative drugs and nutrients with potential beneficial immunomodulatory or anti-mycobacterial effects. PMID:26751573

  3. Gene expression analysis of colorectal cancer by bioinformatics strategy.

    PubMed

    Cui, Meng; Yuan, Junhua; Li, Jun; Sun, Bing; Li, Tao; Li, Yuantao; Wu, Guoliang

    2014-10-01

    We used bioinformatics technology to analyze gene expression profiles of colorectal cancer (CRC) tissue samples and healthy controls. In this paper, we downloaded the gene expression profile GSE4107 from the Gene Expression Omnibus (GEO) database, in which a total of 22 chips were available, including normal colonic mucosa tissue from normal healthy donors (n=10), colorectal cancer tissue samples from colorectal patients (n=33). To further understand the biological functions of the screened differentially expressed genes (DEGs), KEGG pathway enrichment analysis was conducted. Then we built a transcriptome network to study differentially co-expressed links. A total of 3151 DEGs of CRC were selected. In addition, a total of 164 differentially co-expressed genes (DCGs) and 29279 differentially co-expressed links (DCLs) were obtained. Furthermore, the significantly enriched KEGG pathways were Endocytosis, Calcium signaling pathway, Vascular smooth muscle contraction, Linoleic acid metabolism, Arginine and proline metabolism, Inositol phosphate metabolism and MAPK signaling pathway. Our results show that the generation of CRC involves multiple genes, TFs and pathways. Several signaling and immune pathways are linked to CRC and give us more clues about the process of CRC. Hence, our work may pave the way for novel diagnosis of CRC and provide theoretical guidance for cancer therapy.
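
    The abstract above does not say which test was used for the KEGG enrichment step. A common choice for this kind of over-representation analysis is the upper-tail hypergeometric test, sketched below as an illustration only. The function name and all counts in the usage line are hypothetical and are not taken from the study.

```python
from math import comb


def enrichment_p_value(n_universe, n_pathway, n_selected, n_overlap):
    """Upper-tail hypergeometric probability P(X >= n_overlap).

    X is the number of pathway genes found among n_selected genes drawn
    without replacement from a universe of n_universe genes, of which
    n_pathway belong to the pathway.
    """
    upper = min(n_pathway, n_selected)
    total = comb(n_universe, n_selected)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts, chosen only to exercise the function.
p = enrichment_p_value(n_universe=2000, n_pathway=50, n_selected=300, n_overlap=15)
print(f"enrichment p-value: {p:.3g}")
```

    In practice the p-values for all tested pathways would also be corrected for multiple testing before any pathway is called significantly enriched.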

  4. Freshwater Biological Traits Database (Data Sources)

    EPA Science Inventory

    When EPA released the final report, Freshwater Biological Traits Database, it referenced numerous data sources that are included below. The Traits Database report covers the development of a database of freshwater biological traits with additional traits that are relevan...

  5. Assessing co-regulation of directly linked genes in biological networks using microarray time series analysis.

    PubMed

    Del Sorbo, Maria Rosaria; Balzano, Walter; Donato, Michele; Draghici, Sorin

    2013-11-01

    Differential expression of genes detected with the analysis of high-throughput genomic experiments is a commonly used intermediate step for the identification of signaling pathways involved in the response to different biological conditions. The impact analysis was the first approach for the analysis of signaling pathways involved in a certain biological process that was able to take into account not only the magnitude of the expression change of the genes but also the topology of signaling pathways, including the type of each interaction between the genes. In the impact analysis, signaling pathways are represented as weighted directed graphs with genes as nodes and the interactions between genes as edges. Edge weights are represented by a β factor, the regulatory efficiency, which is assumed to be equal to 1 in inductive interactions between genes and equal to -1 in repressive interactions. This study presents a similarity analysis between gene expression time series aimed at finding correspondences with the regulatory efficiency, i.e. the β factor as found in a widely used pathway database. Here, we focused on correlations among genes directly connected in signaling pathways, assuming that the expression variations of upstream genes impact immediately downstream genes in a short time interval and without significant influences by the interactions with other genes. Time series were processed using three different similarity metrics. The first metric is based on bit string matching; the second one is a specific application of Dynamic Time Warping to detect similarities even in the presence of stretching and delays; the third one is a quantitative comparative analysis resulting from an evaluation of the frequency domain representation of time series: the similarity metric is the correlation between dominant spectral components. 
These three approaches are tested on real data and pathways, and a comparison is performed using Information Retrieval benchmark tools, indicating the frequency approach as the best similarity metric among the three, for its ability to detect the correlation based on the correspondence of the most significant frequency components. Copyright © 2013. Published by Elsevier Ireland Ltd.
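
    The second metric named above, Dynamic Time Warping, can be illustrated with the textbook dynamic-programming recurrence. This is a minimal sketch assuming an absolute-difference local cost and no warping window. The abstract does not describe the authors' specific variant, and the two time series in the usage lines are made up.

```python
def dtw_distance(a, b):
    """Classic Dynamic Time Warping distance between two numeric sequences.

    cost[i][j] holds the cheapest cumulative cost of aligning the first i
    points of a with the first j points of b.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Made-up expression profiles: the downstream gene follows the upstream
# gene with a delay, which DTW is meant to tolerate.
upstream = [0.0, 1.0, 2.0, 1.0, 0.0]
downstream = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0]
print(dtw_distance(upstream, downstream))
```

    For a repressive edge (β = -1) one could compare the upstream profile against the negated downstream profile instead. That is an assumption about how the sign would be handled, not something stated in the abstract.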

  6. MIPS Arabidopsis thaliana Database (MAtDB): an integrated biological knowledge resource for plant genomics

    PubMed Central

    Schoof, Heiko; Ernst, Rebecca; Nazarov, Vladimir; Pfeifer, Lukas; Mewes, Hans-Werner; Mayer, Klaus F. X.

    2004-01-01

    Arabidopsis thaliana is the most widely studied model plant. Functional genomics is intensively underway in many laboratories worldwide. Beyond the basic annotation of the primary sequence data, the annotated genetic elements of Arabidopsis must be linked to diverse biological data and higher order information such as metabolic or regulatory pathways. The MIPS Arabidopsis thaliana database MAtDB aims to provide a comprehensive resource for Arabidopsis as a genome model that serves as a primary reference for research in plants and is suitable for transfer of knowledge to other plants, especially crops. The genome sequence as a common backbone serves as a scaffold for the integration of data, while, in a complementary effort, these data are enhanced through the application of state-of-the-art bioinformatics tools. This information is visualized on a genome-wide and a gene-by-gene basis with access both for web users and applications. This report updates the information given in a previous report and provides an outlook on further developments. The MAtDB web interface can be accessed at http://mips.gsf.de/proj/thal/db. PMID:14681437

  7. Candidate genetic pathways for attention-deficit/hyperactivity disorder (ADHD) show association to hyperactive/impulsive symptoms in children with ADHD.

    PubMed

    Bralten, Janita; Franke, Barbara; Waldman, Irwin; Rommelse, Nanda; Hartman, Catharina; Asherson, Philip; Banaschewski, Tobias; Ebstein, Richard P; Gill, Michael; Miranda, Ana; Oades, Robert D; Roeyers, Herbert; Rothenberger, Aribert; Sergeant, Joseph A; Oosterlaan, Jaap; Sonuga-Barke, Edmund; Steinhausen, Hans-Christoph; Faraone, Stephen V; Buitelaar, Jan K; Arias-Vásquez, Alejandro

    2013-11-01

    Because multiple genes with small effect sizes are assumed to play a role in attention-deficit/hyperactivity disorder (ADHD) etiology, considering multiple variants within the same analysis likely increases the total explained phenotypic variance, thereby boosting the power of genetic studies. This study investigated whether pathway-based analysis could bring scientists closer to unraveling the biology of ADHD. The pathway was described as a predefined gene selection based on a well-established database or literature data. Common genetic variants in pathways involved in dopamine/norepinephrine and serotonin neurotransmission and genes involved in neuritic outgrowth were investigated in cases from the International Multicentre ADHD Genetics (IMAGE) study. Multivariable analysis was performed to combine the effects of single genetic variants within the pathway genes. Phenotypes were DSM-IV symptom counts for inattention and hyperactivity/impulsivity (n = 871) and symptom severity measured with the Conners Parent (n = 930) and Teacher (n = 916) Rating Scales. Summing genetic effects of common genetic variants within the pathways showed a significant association with hyperactive/impulsive symptoms ((p)empirical = .007) but not with inattentive symptoms ((p)empirical = .73). Analysis of parent-rated Conners hyperactive/impulsive symptom scores validated this result ((p)empirical = .0018). Teacher-rated Conners scores were not associated. Post hoc analyses showed a significant contribution of all pathways to the hyperactive/impulsive symptom domain (dopamine/norepinephrine, (p)empirical = .0004; serotonin, (p)empirical = .0149; neuritic outgrowth, (p)empirical = .0452). The present analysis shows an association between common variants in 3 genetic pathways and the hyperactive/impulsive component of ADHD. 
This study demonstrates that pathway-based association analyses, using quantitative measurements of ADHD symptom domains, can increase the power of genetic analyses to identify biological risk factors involved in this disorder. Copyright © 2013 American Academy of Child and Adolescent Psychiatry. Published by Elsevier Inc. All rights reserved.
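
    The empirical p-values reported above are typically obtained by permutation. The sketch below shows one generic way to do this: a per-subject pathway score (for example, a sum of risk-allele counts) is correlated with a symptom score, and the phenotype is shuffled to build a null distribution. The statistic, the number of permutations and all data are assumptions for illustration, since the abstract does not specify the procedure.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input has zero variance."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def empirical_p(scores, phenotype, n_perm=1000, seed=0):
    """Permutation p-value for |correlation| between scores and phenotype."""
    rng = random.Random(seed)
    observed = abs(pearson(scores, phenotype))
    shuffled = list(phenotype)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(scores, shuffled)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical data: pathway scores and symptom counts for ten subjects.
pathway_scores = [3, 5, 2, 8, 7, 4, 6, 9, 1, 5]
symptom_counts = [2, 4, 1, 7, 6, 3, 6, 8, 1, 4]
print(empirical_p(pathway_scores, symptom_counts))
```

    The add-one correction also means the smallest reportable value is 1/(n_perm + 1), so the number of permutations bounds the resolution of the p-value.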

  8. Informatics for Metabolomics.

    PubMed

    Kusonmano, Kanthida; Vongsangnak, Wanwipa; Chumnanpuen, Pramote

    2016-01-01

    Metabolome profiling of biological systems has the powerful ability to provide the biological understanding of their metabolic functional states responding to the environmental factors or other perturbations. Tons of accumulative metabolomics data have thus been established since pre-metabolomics era. This is directly influenced by the high-throughput analytical techniques, especially mass spectrometry (MS)- and nuclear magnetic resonance (NMR)-based techniques. Continuously, the significant numbers of informatics techniques for data processing, statistical analysis, and data mining have been developed. The following tools and databases are advanced for the metabolomics society which provide the useful metabolomics information, e.g., the chemical structures, mass spectrum patterns for peak identification, metabolite profiles, biological functions, dynamic metabolite changes, and biochemical transformations of thousands of small molecules. In this chapter, we aim to introduce overall metabolomics studies from pre- to post-metabolomics era and their impact on society. Directing on post-metabolomics era, we provide a conceptual framework of informatics techniques for metabolomics and show useful examples of techniques, tools, and databases for metabolomics data analysis starting from preprocessing toward functional interpretation. Throughout the framework of informatics techniques for metabolomics provided, it can be further used as a scaffold for translational biomedical research which can thus lead to reveal new metabolite biomarkers, potential metabolic targets, or key metabolic pathways for future disease therapy.

  9. A systems biology-led insight into the role of the proteome in neurodegenerative diseases.

    PubMed

    Fasano, Mauro; Monti, Chiara; Alberio, Tiziana

    2016-09-01

    Multifactorial disorders are the result of nonlinear interactions of several factors; therefore, a reductionist approach does not appear to be appropriate. Proteomics is a global approach that can be efficiently used to investigate pathogenetic mechanisms of neurodegenerative diseases. Here, we report a general introduction about the systems biology approach and mechanistic insights recently obtained by over-representation analysis of proteomics data of cellular and animal models of Alzheimer's disease, Parkinson's disease and other neurodegenerative disorders, as well as of affected human tissues. Expert commentary: As an inductive method, proteomics is based on unbiased observations that further require validation of generated hypotheses. Pathway databases and over-representation analysis tools allow researchers to assign an expectation value to pathogenetic mechanisms linked to neurodegenerative diseases. The systems biology approach based on omics data may be the key to unravel the complex mechanisms underlying neurodegeneration.

  10. ARMOUR - A Rice miRNA: mRNA Interaction Resource.

    PubMed

    Sanan-Mishra, Neeti; Tripathi, Anita; Goswami, Kavita; Shukla, Rohit N; Vasudevan, Madavan; Goswami, Hitesh

    2018-01-01

    ARMOUR was developed as A Rice miRNA:mRNA interaction resource. This informative and interactive database includes the experimentally validated expression profiles of miRNAs under different developmental and abiotic stress conditions across seven Indian rice cultivars. This comprehensive database covers 689 known and 1664 predicted novel miRNAs and their expression profiles in more than 38 different tissues or conditions along with their predicted/known target transcripts. The understanding of miRNA:mRNA interactome in regulation of functional cellular machinery is supported by the sequence information of the mature and hairpin structures. ARMOUR provides flexibility to users in querying the database using multiple ways like known gene identifiers, gene ontology identifiers, KEGG identifiers and also allows on the fly fold change analysis and sequence search query with inbuilt BLAST algorithm. ARMOUR database provides a cohesive platform for novel and mature miRNAs and their expression in different experimental conditions and allows searching for their interacting mRNA targets, GO annotation and their involvement in various biological pathways. The ARMOUR database includes a provision for adding more experimental data from users, with an aim to develop it as a platform for sharing and comparing experimental data contributed by research groups working on rice.

  11. Multi-membership gene regulation in pathway based microarray analysis

    PubMed Central

    2011-01-01

    Background: Gene expression analysis has been intensively researched for more than a decade. Recently, there has been elevated interest in the integration of microarray data analysis with other types of biological knowledge in a holistic analytical approach. We propose a methodology that can be facilitated for pathway based microarray data analysis, based on the observation that a substantial proportion of genes present in biochemical pathway databases are members of a number of distinct pathways. Our methodology aims towards establishing the state of individual pathways, by identifying those truly affected by the experimental conditions based on the behaviour of such genes. For that purpose it considers all the pathways in which a gene participates and the general census of gene expression per pathway. Results: We utilise hill climbing, simulated annealing and a genetic algorithm to analyse the consistency of the produced results, through the application of fuzzy adjusted rand indexes and hamming distance. All algorithms produce highly consistent genes to pathways allocations, revealing the contribution of genes to pathway functionality, in agreement with current pathway state visualisation techniques, with the simulated annealing search proving slightly superior in terms of efficiency. Conclusions: We show that the expression values of genes, which are members of a number of biochemical pathways or modules, are the net effect of the contribution of each gene to these biochemical processes. We show that by manipulating the pathway and module contribution of such genes to follow underlying trends we can interpret microarray results centred on the behaviour of these genes. PMID:21939531

  12. Multi-membership gene regulation in pathway based microarray analysis.

    PubMed

    Pavlidis, Stelios P; Payne, Annette M; Swift, Stephen M

    2011-09-22

    Gene expression analysis has been intensively researched for more than a decade. Recently, there has been elevated interest in the integration of microarray data analysis with other types of biological knowledge in a holistic analytical approach. We propose a methodology that can be facilitated for pathway based microarray data analysis, based on the observation that a substantial proportion of genes present in biochemical pathway databases are members of a number of distinct pathways. Our methodology aims towards establishing the state of individual pathways, by identifying those truly affected by the experimental conditions based on the behaviour of such genes. For that purpose it considers all the pathways in which a gene participates and the general census of gene expression per pathway. We utilise hill climbing, simulated annealing and a genetic algorithm to analyse the consistency of the produced results, through the application of fuzzy adjusted rand indexes and hamming distance. All algorithms produce highly consistent genes to pathways allocations, revealing the contribution of genes to pathway functionality, in agreement with current pathway state visualisation techniques, with the simulated annealing search proving slightly superior in terms of efficiency. We show that the expression values of genes, which are members of a number of biochemical pathways or modules, are the net effect of the contribution of each gene to these biochemical processes. We show that by manipulating the pathway and module contribution of such genes to follow underlying trends we can interpret microarray results centred on the behaviour of these genes.

  13. The Pathway Tools software.

    PubMed

    Karp, Peter D; Paley, Suzanne; Romero, Pedro

    2002-01-01

    Bioinformatics requires reusable software tools for creating model-organism databases (MODs). The Pathway Tools is a reusable, production-quality software environment for creating a type of MOD called a Pathway/Genome Database (PGDB). A PGDB such as EcoCyc (see http://ecocyc.org) integrates our evolving understanding of the genes, proteins, metabolic network, and genetic network of an organism. This paper provides an overview of the four main components of the Pathway Tools: The PathoLogic component supports creation of new PGDBs from the annotated genome of an organism. The Pathway/Genome Navigator provides query, visualization, and Web-publishing services for PGDBs. The Pathway/Genome Editors support interactive updating of PGDBs. The Pathway Tools ontology defines the schema of PGDBs. The Pathway Tools makes use of the Ocelot object database system for data management services for PGDBs. The Pathway Tools has been used to build PGDBs for 13 organisms within SRI and by external users.

  14. Kinase Pathway Database: An Integrated Protein-Kinase and NLP-Based Protein-Interaction Resource

    PubMed Central

    Koike, Asako; Kobayashi, Yoshiyuki; Takagi, Toshihisa

    2003-01-01

    Protein kinases play a crucial role in the regulation of cellular functions. Various kinds of information about these molecules are important for understanding signaling pathways and organism characteristics. We have developed the Kinase Pathway Database, an integrated database involving major completely sequenced eukaryotes. It contains the classification of protein kinases and their functional conservation, ortholog tables among species, protein–protein, protein–gene, and protein–compound interaction data, domain information, and structural information. It also provides an automatic pathway graphic image interface. The protein, gene, and compound interactions are automatically extracted from abstracts for all genes and proteins by natural-language processing (NLP). The method of automatic extraction uses phrase patterns and the GENA protein, gene, and compound name dictionary, which was developed by our group. With this database, pathways are easily compared among species using data with more than 47,000 protein interactions and protein kinase ortholog tables. The database is available for querying and browsing at http://kinasedb.ontology.ims.u-tokyo.ac.jp/. PMID:12799355

  15. The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of pathway/genome databases

    PubMed Central

    Caspi, Ron; Altman, Tomer; Dale, Joseph M.; Dreher, Kate; Fulcher, Carol A.; Gilham, Fred; Kaipa, Pallavi; Karthikeyan, Athikkattuvalasu S.; Kothari, Anamika; Krummenacker, Markus; Latendresse, Mario; Mueller, Lukas A.; Paley, Suzanne; Popescu, Liviu; Pujar, Anuradha; Shearer, Alexander G.; Zhang, Peifen; Karp, Peter D.

    2010-01-01

    The MetaCyc database (MetaCyc.org) is a comprehensive and freely accessible resource for metabolic pathways and enzymes from all domains of life. The pathways in MetaCyc are experimentally determined, small-molecule metabolic pathways and are curated from the primary scientific literature. With more than 1400 pathways, MetaCyc is the largest collection of metabolic pathways currently available. Pathways reactions are linked to one or more well-characterized enzymes, and both pathways and enzymes are annotated with reviews, evidence codes, and literature citations. BioCyc (BioCyc.org) is a collection of more than 500 organism-specific Pathway/Genome Databases (PGDBs). Each BioCyc PGDB contains the full genome and predicted metabolic network of one organism. The network, which is predicted by the Pathway Tools software using MetaCyc as a reference, consists of metabolites, enzymes, reactions and metabolic pathways. BioCyc PGDBs also contain additional features, such as predicted operons, transport systems, and pathway hole-fillers. The BioCyc Web site offers several tools for the analysis of the PGDBs, including Omics Viewers that enable visualization of omics datasets on two different genome-scale diagrams and tools for comparative analysis. The BioCyc PGDBs generated by SRI are offered for adoption by any party interested in curation of metabolic, regulatory, and genome-related information about an organism. PMID:19850718

  16. Microarray analysis reveals key genes and pathways in Tetralogy of Fallot

    PubMed Central

    He, Yue-E; Qiu, Hui-Xian; Jiang, Jian-Bing; Wu, Rong-Zhou; Xiang, Ru-Lian; Zhang, Yuan-Hai

    2017-01-01

    The aim of the present study was to identify key genes that may be involved in the pathogenesis of Tetralogy of Fallot (TOF) using bioinformatics methods. The GSE26125 microarray dataset, which includes cardiovascular tissue samples derived from 16 children with TOF and five healthy age-matched control infants, was downloaded from the Gene Expression Omnibus database. Differential expression analysis was performed between TOF and control samples to identify differentially expressed genes (DEGs) using Student's t-test, and the R/limma package, with a log2 fold-change of >2 and a false discovery rate of <0.01 set as thresholds. The biological functions of DEGs were analyzed using the ToppGene database. The ReactomeFIViz application was used to construct functional interaction (FI) networks, and the genes in each module were subjected to pathway enrichment analysis. The iRegulon plugin was used to identify transcription factors predicted to regulate the DEGs in the FI network, and the gene-transcription factor pairs were then visualized using Cytoscape software. A total of 878 DEGs were identified, including 848 upregulated genes and 30 downregulated genes. The gene FI network contained seven function modules, which were all comprised of upregulated genes. Genes enriched in Module 1 were enriched in the following three neurological disorder-associated signaling pathways: Parkinson's disease, Alzheimer's disease and Huntington's disease. Genes in Modules 0, 3 and 5 were dominantly enriched in pathways associated with ribosomes and protein translation. The Xbox binding protein 1 transcription factor was demonstrated to be involved in the regulation of genes encoding the subunits of cytoplasmic and mitochondrial ribosomes, as well as genes involved in neurodegenerative disorders. 
Therefore, dysfunction of genes involved in signaling pathways associated with neurodegenerative disorders, ribosome function and protein translation may contribute to the pathogenesis of TOF. PMID:28713939
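
    The thresholds quoted above (log2 fold-change > 2, false discovery rate < 0.01) can be illustrated with a small filter built on the Benjamini-Hochberg adjustment. This is a sketch, not the limma workflow the authors used. It assumes the fold-change cutoff applies to the absolute value, and the gene names and numbers are placeholders.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_degs(records, lfc_cutoff=2.0, fdr_cutoff=0.01):
    """Keep genes passing both the |log2 fold-change| and FDR thresholds.

    records is a list of (gene, log2_fold_change, raw_p_value) tuples.
    """
    fdr = benjamini_hochberg([p for _, _, p in records])
    return [
        gene
        for (gene, lfc, _), q in zip(records, fdr)
        if abs(lfc) > lfc_cutoff and q < fdr_cutoff
    ]


# Placeholder results for four hypothetical genes.
results = [
    ("geneA", 3.0, 0.0001),
    ("geneB", 0.5, 0.0001),
    ("geneC", -2.5, 0.5),
    ("geneD", -3.0, 0.0002),
]
print(select_degs(results))
```

    Splitting the selected genes by the sign of the fold-change would then give the upregulated and downregulated sets.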

  17. Identification of key genes related to high-risk gastrointestinal stromal tumors using bioinformatics analysis.

    PubMed

    Jin, Shuan; Zhu, Wenhua; Li, Jun

    2018-01-01

    The purpose of this study was to identify predictive biomarkers used for clinical therapy and prognostic evaluation of high-risk gastrointestinal stromal tumors (GISTs). In this study, microarray data GSE31802 were used to identify differentially expressed genes (DEGs) between high-risk GISTs and low-risk GISTs. Then, enrichment analysis of DEGs was conducted based on the Gene Ontology and Kyoto Encyclopedia of Genes and Genomes pathway databases. In addition, the transcription factors and cancer-related genes in DEGs were screened according to the TRANSFAC, TSGene, and TAG databases. Finally, a protein-protein interaction (PPI) network was constructed and analyzed to look for critical genes involved in high-risk GISTs. A total of forty DEGs were obtained and these genes were mainly involved in four pathways, including melanogenesis, neuroactive ligand-receptor interaction, malaria, and hematopoietic cell lineage. The enriched biological processes were related to the regulation of insulin secretion, integrin activation, and neuropeptide signaling pathway. Transcription factor analysis of DEGs indicated that POU domain, class 2, associating factor 1 (POU2AF1) was significantly downregulated in high-risk GISTs. By constructing the PPI network of DEGs, ten genes with high degrees, such as PNOC, P2RY14, and SELP, formed local networks. Four genes, POU2AF1, PNOC, P2RY14, and SELP, might be used as biomarkers for prognosis of high-risk GISTs.
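
    Ranking genes by degree in a PPI network, as described above, amounts to counting distinct interaction partners per node. The sketch below shows that step on a toy edge list. The node names are placeholders and the edges are not taken from the study or from any interaction database.

```python
def node_degrees(edges):
    """Count distinct neighbours per node in an undirected edge list."""
    neighbours = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-interactions
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    return {node: len(nb) for node, nb in neighbours.items()}


def top_hubs(edges, k):
    """Return the k highest-degree nodes, ties broken alphabetically."""
    degrees = node_degrees(edges)
    return sorted(degrees, key=lambda n: (-degrees[n], n))[:k]


# Toy network with placeholder node names; the duplicate edge g3-g2
# is counted once because neighbours are stored in sets.
toy_edges = [
    ("g1", "g2"),
    ("g1", "g3"),
    ("g1", "g4"),
    ("g2", "g3"),
    ("g3", "g2"),
]
print(top_hubs(toy_edges, 2))
```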

  18. In silico analysis of expressed sequence tags from Trichostrongylus vitrinus (Nematoda): comparison of the automated ESTExplorer workflow platform with conventional database searches.

    PubMed

    Nagaraj, Shivashankar H; Gasser, Robin B; Nisbet, Alasdair J; Ranganathan, Shoba

    2008-01-01

    The analysis of expressed sequence tags (EST) offers a rapid and cost effective approach to elucidate the transcriptome of an organism, but requires several computational methods for assembly and annotation. Researchers frequently analyse each step manually, which is laborious and time consuming. We have recently developed ESTExplorer, a semi-automated computational workflow system, in order to achieve the rapid analysis of EST datasets. In this study, we evaluated EST data analysis for the parasitic nematode Trichostrongylus vitrinus (order Strongylida) using ESTExplorer, compared with database matching alone. We functionally annotated 1776 ESTs obtained via suppressive-subtractive hybridisation from T. vitrinus, an important parasitic trichostrongylid of small ruminants. Cluster and comparative genomic analyses of the transcripts using ESTExplorer indicated that 290 (41%) sequences had homologues in Caenorhabditis elegans, 329 (42%) in parasitic nematodes, 202 (28%) in organisms other than nematodes, and 218 (31%) had no significant match to any sequence in the current databases. Of the C. elegans homologues, 90 were associated with 'non-wildtype' double-stranded RNA interference (RNAi) phenotypes, including embryonic lethality, maternal sterility, sterile progeny, larval arrest and slow growth. We could functionally classify 267 (38%) sequences using the Gene Ontologies (GO) and establish pathway associations for 230 (33%) sequences using the Kyoto Encyclopedia of Genes and Genomes (KEGG). Further examination of this EST dataset revealed a number of signalling molecules, proteases, protease inhibitors, enzymes, ion channels and immune-related genes. In addition, we identified 40 putative secreted proteins that could represent potential candidates for developing novel anthelmintics or vaccines. We further compared the automated EST sequence annotations, using ESTExplorer, with database search results for individual T. vitrinus ESTs. 
ESTExplorer reliably and rapidly annotated 301 ESTs, with pathway and GO information, eliminating 60 low quality hits from database searches. We evaluated the efficacy of ESTExplorer in analysing EST data, and demonstrate that computational tools can be used to accelerate the process of gene discovery in EST sequencing projects. The present study has elucidated sets of relatively conserved and potentially novel genes for biological investigation, and the annotated EST set provides further insight into the molecular biology of T. vitrinus, towards the identification of novel drug targets.

  19. HMDB 4.0: the human metabolome database for 2018

    PubMed Central

    Feunang, Yannick Djoumbou; Marcu, Ana; Guo, An Chi; Liang, Kevin; Vázquez-Fresno, Rosa; Sajed, Tanvir; Johnson, Daniel; Li, Carin; Karu, Naama; Sayeeda, Zinat; Lo, Elvis; Assempour, Nazanin; Berjanskii, Mark; Singhal, Sandeep; Arndt, David; Liang, Yonjie; Badran, Hasan; Grant, Jason; Serra-Cayuela, Arnau; Liu, Yifeng; Mandal, Rupa; Neveu, Vanessa; Pon, Allison; Knox, Craig; Wilson, Michael; Manach, Claudine; Scalbert, Augustin

    2018-01-01

    Abstract The Human Metabolome Database or HMDB (www.hmdb.ca) is a web-enabled metabolomic database containing comprehensive information about human metabolites along with their biological roles, physiological concentrations, disease associations, chemical reactions, metabolic pathways, and reference spectra. First described in 2007, the HMDB is now considered the standard metabolomic resource for human metabolic studies. Over the past decade the HMDB has continued to grow and evolve in response to emerging needs for metabolomics researchers and continuing changes in web standards. This year's update, HMDB 4.0, represents the most significant upgrade to the database in its history. For instance, the number of fully annotated metabolites has increased by nearly threefold, the number of experimental spectra has grown by almost fourfold and the number of illustrated metabolic pathways has grown by a factor of almost 60. Significant improvements have also been made to the HMDB’s chemical taxonomy, chemical ontology, spectral viewing, and spectral/text searching tools. A great deal of brand new data has also been added to HMDB 4.0. This includes large quantities of predicted MS/MS and GC–MS reference spectral data as well as predicted (physiologically feasible) metabolite structures to facilitate novel metabolite identification. Additional information on metabolite-SNP interactions and the influence of drugs on metabolite levels (pharmacometabolomics) has also been added. Many other important improvements in the content, the interface, and the performance of the HMDB website have been made and these should greatly enhance its ease of use and its potential applications in nutrition, biochemistry, clinical chemistry, clinical genetics, medicine, and metabolomics science. PMID:29140435

  20. Incorporating ToxCast and Tox21 Datasets to Rank Biological Activity of Chemicals at Superfund Sites in North Carolina

    PubMed Central

    Tilley, Sloane K.; Reif, David M.; Fry, Rebecca C.

    2017-01-01

    Background The Superfund program of the Environmental Protection Agency (EPA) was established in 1980 to address public health concerns posed by toxic substances released into the environment in the United States. Forty-two of the 1328 hazardous waste sites that remain on the Superfund National Priority List are located in the state of North Carolina. Methods We set out to develop a database that contained information on both the prevalence and biological activity of chemicals present at Superfund sites in North Carolina. A chemical characterization tool, the Toxicological Priority Index (ToxPi), was used to rank the biological activity of these chemicals based on their predicted bioavailability, documented associations with biological pathways, and activity in in vitro assays of the ToxCast and Tox21 programs. Results The ten most prevalent chemicals found at North Carolina Superfund sites were chromium, trichloroethene, lead, tetrachloroethene, arsenic, benzene, manganese, 1,2-dichloroethane, nickel, and barium. For all chemicals found at North Carolina Superfund sites, ToxPi analysis was used to rank their biological activity. Through this data integration, residual pesticides and organic solvents were identified to be some of the most highly-ranking predicted bioactive chemicals. This study provides a novel methodology for creating state or regional databases of Superfund sites. Conclusions These data represent a novel integrated profile of the most prevalent chemicals at North Carolina Superfund sites. This information, and the associated methodology, is useful to toxicologists, risk assessors, and the communities living in close proximity to these sites. PMID:28153528

  1. MSD-MAP: A Network-Based Systems Biology Platform for Predicting Disease-Metabolite Links.

    PubMed

    Wathieu, Henri; Issa, Naiem T; Mohandoss, Manisha; Byers, Stephen W; Dakshanamurthy, Sivanesan

    2017-01-01

    Cancer-associated metabolites result from cell-wide mechanisms of dysregulation. The field of metabolomics has sought to identify these aberrant metabolites as disease biomarkers, clues to understanding disease mechanisms, or even as therapeutic agents. This study was undertaken to reliably predict metabolites associated with colorectal, esophageal, and prostate cancers. Metabolite and disease biological action networks were compared in a computational platform called MSD-MAP (Multi Scale Disease-Metabolite Association Platform). Using differential gene expression analysis with patient-based RNAseq data from The Cancer Genome Atlas, genes up- or down-regulated in cancer compared to normal tissue were identified. Relational databases were used to map biological entities including pathways, functions, and interacting proteins, to those differential disease genes. Similar relational maps were built for metabolites, stemming from known and in silico predicted metabolite-protein associations. The hypergeometric test was used to find statistically significant relationships between disease and metabolite biological signatures at each tier, and metabolites were assessed for multi-scale association with each cancer. Metabolite networks were also directly associated with various other diseases using a disease functional perturbation database. Our platform recapitulated metabolite-disease links that have been empirically verified in the scientific literature, with network-based mapping of jointly-associated biological activity also matching known disease mechanisms. This was true for colorectal, esophageal, and prostate cancers, using metabolite action networks stemming from both predicted and known functional protein associations. By employing systems biology concepts, MSD-MAP reliably predicted known cancer-metabolite links, and may serve as a predictive tool to streamline conventional metabolomic profiling methodologies.
Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.
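
    The abstract above names the hypergeometric test as the way disease and metabolite signatures were compared. A minimal sketch of such an over-representation test follows. The function name and all counts are hypothetical and are not taken from MSD-MAP, whose exact implementation is not described here.

```python
from math import comb


def hypergeom_sf(overlap: int, population: int, successes: int, draws: int) -> float:
    """Return P(X >= overlap) for a hypergeometric variable X.

    population: total number of entities (e.g. all annotated pathways)
    successes:  entities linked to the disease signature
    draws:      entities linked to the metabolite signature
    overlap:    entities shared by both signatures
    """
    upper = min(successes, draws)
    total = comb(population, draws)
    tail = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts, for illustration only.
p_value = hypergeom_sf(overlap=8, population=1000, successes=40, draws=50)
```

    A small p-value here would indicate that the two signatures share more entities than expected by chance; any multiple-testing correction is a separate step.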

  2. Wnt pathway curation using automated natural language processing: combining statistical methods with partial and full parse for knowledge extraction.

    PubMed

    Santos, Carlos; Eggle, Daniela; States, David J

    2005-04-15

    Wnt signaling is a very active area of research with highly relevant publications appearing at a rate of more than one per day. Building and maintaining databases describing signal transduction networks is a time-consuming and demanding task that requires careful literature analysis and extensive domain-specific knowledge. For instance, more than 50 factors involved in Wnt signal transduction have been identified as of late 2003. In this work we describe a natural language processing (NLP) system that is able to identify references to biological interaction networks in free text and automatically assembles a protein association and interaction map. A 'gold standard' set of names and assertions was derived by manual scanning of the Wnt genes website (http://www.stanford.edu/~rnusse/wntwindow.html) including 53 interactions involved in Wnt signaling. This system was used to analyze a corpus of peer-reviewed articles related to Wnt signaling including 3369 Pubmed and 1230 full text papers. Names for key Wnt-pathway associated proteins and biological entities are identified using a chi-squared analysis of noun phrases over-represented in the Wnt literature as compared to the general signal transduction literature. Interestingly, we identified several instances where generic terms were used on the website when more specific terms occur in the literature, and one typographic error on the Wnt canonical pathway. Using the named entity list and performing an exhaustive assertion extraction of the corpus, 34 of the 53 interactions in the 'gold standard' Wnt signaling set were successfully identified (64% recall). In addition, the automated extraction found several interactions involving key Wnt-related molecules which were missing or different from those in the canonical diagram, and these were confirmed by manual review of the text. 
These results suggest that a combination of NLP techniques for information extraction can form a useful first-pass tool for assisting human annotation and maintenance of signal pathway databases. The pipeline software components are freely available on request to the authors. dstates@umich.edu http://stateslab.bioinformatics.med.umich.edu/software.html.
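
    The abstract above describes a chi-squared analysis of over-represented noun phrases and reports 34 of 53 gold-standard interactions recovered. The sketch below shows a plain Pearson chi-squared statistic for a 2x2 table and the recall arithmetic. The function name and the table layout are assumptions; the authors' exact statistic (for example, any continuity correction) is not given in the abstract.

```python
def chi_squared_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-squared for the table [[a, b], [c, d]], no continuity correction.

    Assumed layout (hypothetical):
      a = phrase count in the Wnt corpus,      b = other phrases in the Wnt corpus
      c = phrase count in the general corpus,  d = other phrases in the general corpus
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table: no evidence either way
    return n * (a * d - b * c) ** 2 / denom


# Recall as reported in the abstract: 34 of 53 gold-standard interactions found.
recall = 34 / 53
recall_percent = round(recall * 100)
```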

  3. Understanding sequence similarity and framework analysis between centromere proteins using computational biology.

    PubMed

    Doss, C George Priya; Chakrabarty, Chiranjib; Debajyoti, C; Debottam, S

    2014-11-01

    Open questions about centromere proteins, including their recruitment pathways, cell cycle regulation mechanisms, spindle checkpoint assembly, and chromosome segregation, are a focus of cancer research. With established databases now available, a range of computational platforms makes it possible to examine almost all the physiological and biochemical evidence in disease-associated phenotypes. Using existing computational methods, we analysed amino acid residues to understand the similarity within the evolutionary variance of different associated centromere proteins. This study of sequence similarity, protein-protein networking, co-expression analysis, and the evolutionary trajectory of centromere proteins should speed up the understanding of centromere biology and provide a road map for researchers beginning clinical sequencing work involving centromere proteins.

  4. Phylogenetically informed logic relationships improve detection of biological network organization

    PubMed Central

    2011-01-01

    Background A "phylogenetic profile" refers to the presence or absence of a gene across a set of organisms, and it has proven valuable for understanding gene functional relationships and network organization. Despite this success, few studies have attempted to search beyond just pairwise relationships among genes. Here we search for logic relationships involving three genes, and explore their potential application in gene network analyses. Results Taking advantage of a phylogenetic matrix constructed from the large orthologs database Roundup, we invented a method to create balanced profiles for individual triplets of genes that guarantee equal weight on the different phylogenetic scenarios of coevolution between genes. When we applied this idea to LAPP, the method to search for logic triplets of genes, the balanced profiles resulted in significant performance improvement and the discovery of hundreds of thousands more putative triplets than unadjusted profiles. We found that logic triplets detected biological network organization and identified key proteins and their functions, ranging from neighbouring proteins in local pathways, to well separated proteins in the whole pathway, and to the interactions among different pathways at the system level. Finally, our case study suggested that the directionality in a logic relationship and the profile of a triplet could disclose the connectivity between the triplet and surrounding networks. Conclusion Balanced profiles are superior to the raw profiles employed by traditional methods of phylogenetic profiling in searching for high order gene sets. Gene triplets can provide valuable information in detection of biological network organization and identification of key genes at different levels of cellular interaction. PMID:22172058
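
    To make the idea of a logic relationship among three phylogenetic profiles concrete, the sketch below encodes each profile as a list of 0/1 presence flags and scores how well simple logic functions of genes A and B reproduce gene C. This is an illustration only: it is not the LAPP scoring method or the balanced-profile construction from the abstract, and the profiles are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Candidate logic functions combining two presence/absence bits.
LOGIC: Dict[str, Callable[[int, int], int]] = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "XOR": lambda a, b: a ^ b,
}


def agreement(a: List[int], b: List[int], c: List[int],
              fn: Callable[[int, int], int]) -> float:
    """Fraction of organisms in which fn(a, b) matches the profile of c."""
    hits = sum(1 for x, y, z in zip(a, b, c) if fn(x, y) == z)
    return hits / len(c)


def best_logic(a: List[int], b: List[int], c: List[int]) -> Tuple[str, Dict[str, float]]:
    """Return the best-matching logic function name and all scores."""
    scores = {name: agreement(a, b, c, fn) for name, fn in LOGIC.items()}
    return max(scores, key=scores.get), scores


# Hypothetical profiles over six organisms (1 = gene present, 0 = absent).
gene_a = [1, 1, 0, 0, 1, 0]
gene_b = [1, 0, 1, 0, 1, 1]
gene_c = [1, 0, 0, 0, 1, 0]  # present only where both A and B are present

best_name, all_scores = best_logic(gene_a, gene_b, gene_c)
```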

  5. Genome-wide identification of genetic determinants for the cytotoxicity of perifosine

    PubMed Central

    2008-01-01

    Perifosine belongs to the class of alkylphospholipid analogues, which act primarily at the cell membrane, thereby targeting signal transduction pathways. In phase I/II clinical trials, perifosine has induced tumour regression and caused disease stabilisation in a variety of tumour types. The genetic determinants responsible for its cytotoxicity have not been comprehensively studied, however. We performed a genome-wide analysis to identify genes whose expression levels or genotypic variation were correlated with the cytotoxicity of perifosine, using public databases on the US National Cancer Institute (NCI)-60 human cancer cell lines. For demonstrating drug specificity, the NCI Standard Agent Database (including 171 drugs acting through a variety of mechanisms) was used as a control. We identified agents with similar cytotoxicity profiles to that of perifosine in compounds used in the NCI drug screen. Furthermore, Gene Ontology and pathway analyses were carried out on genes more likely to be perifosine specific. The results suggested that genes correlated with perifosine cytotoxicity are connected by certain known pathways that lead to the mitogen-activated protein kinase signalling pathway and apoptosis. Biological processes such as 'response to stress', 'inflammatory response' and 'ubiquitin cycle' were enriched among these genes. Three single nucleotide polymorphisms (SNPs) located in CACNA2D1 and EXOC4 were found to be correlated with perifosine cytotoxicity. Our results provided a manageable list of genes whose expression levels or genotypic variation were strongly correlated with the cytotoxicity of perifosine. These genes could be targets for further studies using candidate-gene approaches. The results also provided insights into the pharmacodynamics of perifosine. PMID:19129090

  6. The chemokine receptor CCR1 is identified in mast cell-derived exosomes.

    PubMed

    Liang, Yuting; Qiao, Longwei; Peng, Xia; Cui, Zelin; Yin, Yue; Liao, Huanjin; Jiang, Min; Li, Li

    2018-01-01

    Mast cells are important effector cells of the immune system, and mast cell-derived exosomes carrying RNAs play a role in immune regulation. However, the molecular function of mast cell-derived exosomes is currently unknown, and here, we identify differentially expressed genes (DEGs) in mast cells and exosomes. We isolated mast cell-derived exosomes through differential centrifugation and screened the DEGs from mast cell-derived exosomes, using the GSE25330 array dataset downloaded from the Gene Expression Omnibus database. Biochemical pathways were analyzed by Gene Ontology (GO) annotation and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway on the online tool DAVID. DEGs-associated protein-protein interaction networks (PPIs) were constructed using the STRING database and Cytoscape software. The genes identified from these bioinformatics analyses were verified by qRT-PCR and Western blot in mast cells and exosomes. We identified 2121 DEGs (843 up and 1278 down-regulated genes) in HMC-1 cell-derived exosomes and HMC-1 cells. The up-regulated DEGs were classified into two significant modules. The chemokine receptor CCR1 was screened as a hub gene and enriched in cytokine-mediated signaling pathway in module one. Seven genes, including CCR1, CD9, KIT, TGFBR1, TLR9, TPSAB1 and TPSB2 were screened and validated through qRT-PCR analysis. We have achieved a comprehensive view of the pivotal genes and pathways in mast cells and exosomes and identified CCR1 as a hub gene in mast cell-derived exosomes. Our results provide novel clues with respect to the biological processes through which mast cell-derived exosomes modulate immune responses.

  7. Genome-wide genetic analyses highlight mitogen-activated protein kinase (MAPK) signaling in the pathogenesis of endometriosis.

    PubMed

    Uimari, Outi; Rahmioglu, Nilufer; Nyholt, Dale R; Vincent, Katy; Missmer, Stacey A; Becker, Christian; Morris, Andrew P; Montgomery, Grant W; Zondervan, Krina T

    2017-04-01

    Do genome-wide association study (GWAS) data for endometriosis provide insight into novel biological pathways associated with its pathogenesis? GWAS analysis uncovered multiple pathways that are statistically enriched for genetic association signals, analysis of Stage A disease highlighted a novel variant in MAP3K4, while top pathways significantly associated with all endometriosis and Stage A disease included several mitogen-activated protein kinase (MAPK)-related pathways. Endometriosis is a complex disease with an estimated heritability of 50%. To date, GWAS revealed 10 genomic regions associated with endometriosis, explaining <4% of heritability, while half of the heritability is estimated to be due to common risk variants. Pathway analyses combine the evidence of single variants into gene-based measures, leveraging the aggregate effect of variants in genes and uncovering biological pathways involved in disease pathogenesis. Pathway analysis was conducted utilizing the International Endogene Consortium GWAS data, comprising 3194 surgically confirmed endometriosis cases and 7060 controls of European ancestry with genotype data imputed up to 1000 Genomes Phase three reference panel. GWAS was performed for all endometriosis cases and for Stage A (revised American Fertility Society (rAFS) I/II, n = 1686) and B (rAFS III/IV, n = 1364) cases separately. The identified significant pathways were compared with pathways previously investigated in the literature through candidate association studies. The most comprehensive biological pathway databases, MSigDB (including BioCarta, KEGG, PID, SA, SIG, ST and GO) and PANTHER were utilized to test for enrichment of genetic variants associated with endometriosis. Statistical enrichment analysis was performed using the MAGENTA (Meta-Analysis Gene-set Enrichment of variaNT Associations) software. 
The first genome-wide association analysis for Stage A endometriosis revealed a novel locus, rs144240142 (P = 6.45 × 10⁻⁸, OR = 1.71, 95% CI = 1.23-2.37), an intronic single-nucleotide polymorphism (SNP) within MAP3K4. This SNP was not associated with Stage B disease (P = 0.086). MAP3K4 was also shown to be differentially expressed in eutopic endometrium between Stage A endometriosis cases and controls (P = 3.8 × 10⁻⁴), but not with Stage B disease (P = 0.26). A total of 14 pathways enriched with genetic endometriosis associations were identified (false discovery rate (FDR)-P < 0.05). The pathways associated with any endometriosis were Grb2-Sos provides linkage to MAPK signaling for integrins pathway (P = 2.8 × 10⁻⁵, FDR-P = 3.0 × 10⁻³), Wnt signaling (P = 0.026, FDR-P = 0.026) and p130Cas linkage to MAPK signaling for integrins pathway (P = 6.0 × 10⁻⁴, FDR-P = 0.029); with Stage A endometriosis: extracellular signal-regulated kinase (ERK)1 ERK2 MAPK (P = 5.0 × 10⁻⁴, FDR-P = 5.0 × 10⁻⁴) and with Stage B endometriosis: two overlapping pathways that related to extracellular matrix biology-Core matrisome (P = 1.4 × 10⁻³, FDR-P = 0.013) and ECM glycoproteins (P = 1.8 × 10⁻³, FDR-P = 7.1 × 10⁻³). Genes arising from endometriosis candidate gene studies performed to date were enriched for Interleukin signaling pathway (P = 2.3 × 10⁻¹²), Apoptosis signaling pathway (P = 9.7 × 10⁻⁹) and Gonadotropin releasing hormone receptor pathway (P = 1.2 × 10⁻⁶); however, these pathways did not feature in the results based on GWAS data. Not applicable. The analysis is restricted to (i) variants in/near genes that can be assigned to pathways, excluding intergenic variants; (ii) the gene-based pathway definition as registered in the databases; (iii) women of European ancestry. 
The top ranked pathways associated with overall and Stage A endometriosis in particular involve integrin-mediated MAPK activation and intracellular ERK/MAPK acting downstream in the MAPK cascade, both acting in the control of cell division, gene expression, cell movement and survival. Other top enriched pathways in Stage B disease include ECM glycoprotein pathways important for extracellular structure and biochemical support. The results highlight the need for increased efforts to understand the functional role of these pathways in endometriosis pathogenesis, including the investigation of the biological effects of the genetic variants on downstream molecular processes in tissue relevant to endometriosis. Additionally, our results offer further support for the hypothesis of at least partially distinct causal pathophysiology for minimal/mild (rAFS I/II) vs. moderate/severe (rAFS III/IV) endometriosis. The genome-wide association data and Wellcome Trust Case Control Consortium (WTCCC) were generated through funding from the Wellcome Trust (WT084766/Z/08/Z, 076113 and 085475) and the National Health and Medical Research Council (NHMRC) of Australia (241944, 339462, 389927, 389875, 389891, 389892, 389938, 443036, 442915, 442981, 496610, 496739, 552485 and 552498). N.R. was funded by a grant from the Medical Research Council UK (MR/K011480/1). A.P.M. is a Wellcome Trust Senior Fellow in Basic Biomedical Science (grant WT098017). All authors declare there are no conflicts of interest. © The Author 2017. Published by Oxford University Press on behalf of the European Society of Human Reproduction and Embryology.
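
    The abstract above filters pathways at FDR-P < 0.05. As a generic illustration of how false discovery rate adjustment works, the sketch below implements the standard Benjamini-Hochberg step-up procedure. This is an assumption for illustration: the abstract does not say how MAGENTA derives its FDR values, and the input p-values are hypothetical.

```python
from typing import List


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical pathway p-values, for illustration only.
raw = [0.01, 0.04, 0.03, 0.005]
fdr = benjamini_hochberg(raw)
significant = [i for i, q in enumerate(fdr) if q < 0.05]
```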

  8. Integrated Bio-Entity Network: A System for Biological Knowledge Discovery

    PubMed Central

    Bell, Lindsey; Chowdhary, Rajesh; Liu, Jun S.; Niu, Xufeng; Zhang, Jinfeng

    2011-01-01

    A significant part of our biological knowledge is centered on relationships between biological entities (bio-entities) such as proteins, genes, small molecules, pathways, gene ontology (GO) terms and diseases. Accumulated at an increasing speed, the information on bio-entity relationships is archived in different forms at scattered places. Most of such information is buried in scientific literature as unstructured text. Organizing heterogeneous information in a structured form not only facilitates study of biological systems using integrative approaches, but also allows discovery of new knowledge in an automatic and systematic way. In this study, we performed a large scale integration of bio-entity relationship information from both databases containing manually annotated, structured information and automatic information extraction of unstructured text in scientific literature. The relationship information we integrated in this study includes protein–protein interactions, protein/gene regulations, protein–small molecule interactions, protein–GO relationships, protein–pathway relationships, and pathway–disease relationships. The relationship information is organized in a graph data structure, named integrated bio-entity network (IBN), where the vertices are the bio-entities and edges represent their relationships. Under this framework, graph theoretic algorithms can be designed to perform various knowledge discovery tasks. We designed breadth-first search with pruning (BFSP) and most probable path (MPP) algorithms to automatically generate hypotheses—the indirect relationships with high probabilities in the network. We show that IBN can be used to generate plausible hypotheses, which not only help to better understand the complex interactions in biological systems, but also provide guidance for experimental designs. PMID:21738677
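
    The abstract above mentions a most probable path (MPP) search over a network whose edges carry relationship probabilities. One common way to find such a path, sketched below, is to run Dijkstra's algorithm on edge weights of -log(p), so that the shortest path maximises the product of probabilities. This assumes independent edge probabilities and is not claimed to be the authors' MPP algorithm; the node names and probabilities are hypothetical.

```python
import heapq
from math import exp, log
from typing import Dict, List, Optional, Tuple

Graph = Dict[str, Dict[str, float]]  # node -> {neighbour: probability in (0, 1]}


def most_probable_path(graph: Graph, source: str, target: str) -> Tuple[Optional[List[str]], float]:
    """Return (path, probability) maximising the product of edge probabilities."""
    best = {source: 0.0}           # lowest -log(probability) found so far
    parent: Dict[str, str] = {}
    heap = [(0.0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if cost > best.get(node, float("inf")):
            continue               # stale heap entry
        if node == target:
            break
        for nxt, prob in graph.get(node, {}).items():
            new_cost = cost - log(prob)
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                parent[nxt] = node
                heapq.heappush(heap, (new_cost, nxt))
    if target not in best:
        return None, 0.0
    path = [target]
    while path[-1] != source:
        path.append(parent[path[-1]])
    path.reverse()
    return path, exp(-best[target])


# Hypothetical bio-entity network, for illustration only.
network: Graph = {
    "drug_x": {"protein_a": 0.9, "protein_b": 0.5},
    "protein_a": {"pathway_p": 0.8},
    "protein_b": {"pathway_p": 0.99},
    "pathway_p": {"disease_d": 0.7},
}
path, probability = most_probable_path(network, "drug_x", "disease_d")
```

    The indirect drug-to-disease link returned here plays the role of a generated hypothesis; pruning low-probability branches, as in the BFSP variant the abstract names, would be an additional step.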

  9. DrugQuest - a text mining workflow for drug association discovery.

    PubMed

    Papanikolaou, Nikolas; Pavlopoulos, Georgios A; Theodosiou, Theodosios; Vizirianakis, Ioannis S; Iliopoulos, Ioannis

    2016-06-06

    Text mining and data integration methods are gaining ground in the field of health sciences due to the exponential growth of bio-medical literature and information stored in biological databases. While such methods mostly try to extract bioentity associations from PubMed, very few of them are dedicated to mining other types of repositories such as chemical databases. Herein, we apply a text mining approach on the DrugBank database in order to explore drug associations based on the DrugBank "Description", "Indication", "Pharmacodynamics" and "Mechanism of Action" text fields. We apply Named Entity Recognition (NER) techniques on these fields to identify chemicals, proteins, genes, pathways, diseases, and we utilize the TextQuest algorithm to find additional biologically significant words. Using a plethora of similarity and partitional clustering techniques, we group the DrugBank records based on their common terms and investigate possible scenarios why these records are clustered together. Different views such as clustered chemicals based on their textual information, tag clouds consisting of Significant Terms along with the terms that were used for clustering are delivered to the user through a user-friendly web interface. DrugQuest is a text mining tool for knowledge discovery: it is designed to cluster DrugBank records based on text attributes in order to find new associations between drugs. The service is freely available at http://bioinformatics.med.uoc.gr/drugquest .
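
    Grouping records by their common terms, as the abstract above describes, rests on some similarity measure between term sets. The sketch below uses Jaccard similarity as one plausible choice. The abstract does not name the specific measures DrugQuest uses, and the record names and terms here are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairs_above(records: Dict[str, Set[str]], threshold: float) -> List[Tuple[str, str, float]]:
    """Return record pairs whose term sets reach the similarity threshold."""
    names = sorted(records)
    result = []
    for i, x in enumerate(names):
        for y in names[i + 1:]:
            score = jaccard(records[x], records[y])
            if score >= threshold:
                result.append((x, y, score))
    return result


# Hypothetical term sets extracted from record text fields.
records = {
    "drug_a": {"kinase", "inhibitor", "leukemia"},
    "drug_b": {"kinase", "inhibitor", "lymphoma"},
    "drug_c": {"opioid", "receptor"},
}
similar = pairs_above(records, threshold=0.5)
```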

  10. Morphinome Database - The database of proteins altered by morphine administration - An update.

    PubMed

    Bodzon-Kulakowska, Anna; Padrtova, Tereza; Drabik, Anna; Ner-Kluza, Joanna; Antolak, Anna; Kulakowski, Konrad; Suder, Piotr

    2018-04-13

    Morphine is considered a gold standard in pain treatment. Nevertheless, its use could be associated with severe side effects, including drug addiction. Thus, it is very important to understand the molecular mechanism of morphine action in order to develop new methods of pain therapy, or at least to attenuate the side effects of opioids usage. Proteomics allows for the indication of proteins involved in certain biological processes, but the number of items identified in a single study is usually overwhelming. Thus, researchers face the difficult problem of choosing the proteins which are really important for the investigated processes and worth further studies. Therefore, based on the 29 published articles, we created a database of proteins regulated by morphine administration - The Morphinome Database (addiction-proteomics.org). This web tool allows for indicating proteins that were identified during different proteomics studies. Moreover, the collection and organization of such a vast amount of data allows us to find the same proteins that were identified in various studies and to create their ranking, based on the frequency of their identification. STRING and KEGG databases indicated metabolic pathways which those molecules are involved in. This means that those molecular pathways seem to be strongly affected by morphine administration and could be important targets for further investigations. The data about proteins identified by different proteomics studies of molecular changes caused by morphine administration (29 published articles) were gathered in the Morphinome Database. Unification of those data allowed for the identification of proteins that were indicated several times by distinct proteomics studies, which means that they seem to be very well verified and important for the entire process. Those proteins might be now considered promising aims for more detailed studies of their role in the molecular mechanism of morphine action. Copyright © 2018. 
Published by Elsevier B.V.
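
    The ranking described above, in which proteins are ordered by how many independent studies identified them, can be sketched as a simple count. The study identifiers and protein labels below are hypothetical, and the counting rule (each protein counted once per study) is an assumption rather than the database's documented behaviour.

```python
from collections import Counter
from typing import Dict, List, Tuple


def rank_by_frequency(studies: Dict[str, List[str]]) -> List[Tuple[str, int]]:
    """Rank proteins by the number of studies reporting them (ties by name)."""
    counts: Counter = Counter()
    for proteins in studies.values():
        counts.update(set(proteins))  # count each protein once per study
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical study contents, for illustration only.
studies = {
    "study_1": ["P1", "P2", "P2"],
    "study_2": ["P2", "P3"],
    "study_3": ["P1", "P2"],
}
ranking = rank_by_frequency(studies)
```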

  11. A combined computational-experimental analyses of selected metabolic enzymes in Pseudomonas species.

    PubMed

    Perumal, Deepak; Lim, Chu Sing; Chow, Vincent T K; Sakharkar, Kishore R; Sakharkar, Meena K

    2008-09-10

    Comparative genomic analysis has revolutionized our ability to predict the metabolic subsystems that occur in newly sequenced genomes, and to explore the functional roles of the set of genes within each subsystem. These computational predictions can considerably reduce the volume of experimental studies required to assess basic metabolic properties of multiple bacterial species. However, experimental validations are still required to resolve the apparent inconsistencies in the predictions by multiple resources. Here, we present combined computational-experimental analyses on eight completely sequenced Pseudomonas species. Comparative pathway analyses reveal that several pathways within the Pseudomonas species show high plasticity and versatility. Potential bypasses in 11 metabolic pathways were identified. We further confirmed the presence of the enzyme O-acetyl homoserine (thiol) lyase (EC: 2.5.1.49) in P. syringae pv. tomato that revealed inconsistent annotations in KEGG and in the recently published SYSTOMONAS database. These analyses connect and integrate systematic data generation, computational data interpretation, and experimental validation and represent a synergistic and powerful means for conducting biological research.

  12. The Reactome Pathway Knowledgebase

    PubMed Central

    Jupe, Steven; Matthews, Lisa; Sidiropoulos, Konstantinos; Gillespie, Marc; Garapati, Phani; Haw, Robin; Jassal, Bijay; Korninger, Florian; May, Bruce; Milacic, Marija; Roca, Corina Duenas; Rothfels, Karen; Sevilla, Cristoffer; Shamovsky, Veronica; Shorser, Solomon; Varusai, Thawfeek; Viteri, Guilherme; Weiser, Joel

    2018-01-01

    Abstract The Reactome Knowledgebase (https://reactome.org) provides molecular details of signal transduction, transport, DNA replication, metabolism, and other cellular processes as an ordered network of molecular transformations—an extended version of a classic metabolic map, in a single consistent data model. Reactome functions both as an archive of biological processes and as a tool for discovering unexpected functional relationships in data such as gene expression profiles or somatic mutation catalogues from tumor cells. To support the continued brisk growth in the size and complexity of Reactome, we have implemented a graph database, improved performance of data analysis tools, and designed new data structures and strategies to boost diagram viewer performance. To make our website more accessible to human users, we have improved pathway display and navigation by implementing interactive Enhanced High Level Diagrams (EHLDs) with an associated icon library, and subpathway highlighting and zooming, in a simplified and reorganized web site with adaptive design. To encourage re-use of our content, we have enabled export of pathway diagrams as ‘PowerPoint’ files. PMID:29145629

  13. A Systems Biology Approach to Reveal Putative Host-Derived Biomarkers of Periodontitis by Network Topology Characterization of MMP-REDOX/NO and Apoptosis Integrated Pathways.

    PubMed

    Zeidán-Chuliá, Fares; Gürsoy, Mervi; Neves de Oliveira, Ben-Hur; Özdemir, Vural; Könönen, Eija; Gürsoy, Ulvi K

    2015-01-01

    Periodontitis, a formidable global health burden, is a common chronic disease that destroys tooth-supporting tissues. Biomarkers of the early phase of this progressive disease are of utmost importance for global health. In this context, saliva represents a non-invasive biosample. By using systems biology tools, we aimed to (1) identify an integrated interactome between matrix metalloproteinase (MMP)-REDOX/nitric oxide (NO) and apoptosis upstream pathways of periodontal inflammation, and (2) characterize the attendant topological network properties to uncover putative biomarkers to be tested in saliva from patients with periodontitis. Hence, we first generated a protein-protein network model of interactions ("BIOMARK" interactome) by using the STRING 10 database, a search tool for the retrieval of interacting genes/proteins, with "Experiments" and "Databases" as input options and a confidence score of 0.400. Second, we determined the centrality values (closeness, stress, degree or connectivity, and betweenness) for the "BIOMARK" members by using the Cytoscape software. We found Ubiquitin C (UBC), Jun proto-oncogene (JUN), and matrix metalloproteinase-14 (MMP14) as the most central hub- and non-hub-bottlenecks among the 211 genes/proteins of the whole interactome. We conclude that UBC, JUN, and MMP14 are likely an optimal candidate group of host-derived biomarkers, in combination with oral pathogenic bacteria-derived proteins, for detecting periodontitis at its early phase by using salivary samples from patients. These findings therefore have broader relevance for systems medicine in global health as well.
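
    Two of the centrality values named in the abstract above, degree and closeness, can be sketched for an unweighted, undirected network as follows. The graph is a hypothetical star-shaped example, not the "BIOMARK" interactome, and Cytoscape may normalise these measures differently.

```python
from collections import deque
from typing import Dict, Set

Graph = Dict[str, Set[str]]  # undirected adjacency sets


def degree(graph: Graph) -> Dict[str, int]:
    """Number of direct neighbours per node."""
    return {node: len(neighbours) for node, neighbours in graph.items()}


def closeness(graph: Graph, node: str) -> float:
    """(reachable nodes) / (sum of shortest-path distances), via breadth-first search."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        cur = queue.popleft()
        for nxt in graph[cur]:
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


# Hypothetical star network: one hub linked to three peripheral nodes.
star: Graph = {
    "hub": {"a", "b", "c"},
    "a": {"hub"},
    "b": {"hub"},
    "c": {"hub"},
}
degrees = degree(star)
hub_closeness = closeness(star, "hub")
leaf_closeness = closeness(star, "a")
```

    In this toy network the hub scores highest on both measures, which is the pattern used to nominate hub candidates in a larger interactome.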

  14. Use of Graph Database for the Integration of Heterogeneous Biological Data.

    PubMed

    Yoon, Byoung-Ha; Kim, Seon-Kyu; Kim, Seon-Young

    2017-03-01

    Understanding complex relationships among heterogeneous biological data is one of the fundamental goals in biology. In most cases, diverse biological data are stored in relational databases, such as MySQL and Oracle, which store data in multiple tables and then infer relationships by multiple-join statements. Recently, a new type of database, called the graph-based database, was developed to natively represent various kinds of complex relationships, and it is widely used among computer science communities and IT industries. Here, we demonstrate the feasibility of using a graph-based database for complex biological relationships by comparing the performance between MySQL and Neo4j, one of the most widely used graph databases. We collected various biological data (protein-protein interaction, drug-target, gene-disease, etc.) from several existing sources, removed duplicate and redundant data, and finally constructed a graph database containing 114,550 nodes and 82,674,321 relationships. When we tested the query execution performance of MySQL versus Neo4j, we found that Neo4j outperformed MySQL in all cases. While Neo4j exhibited a very fast response for various queries, MySQL exhibited latent or unfinished responses for complex queries with multiple-join statements. These results show that using graph-based databases, such as Neo4j, is an efficient way to store complex biological relationships. Moreover, querying a graph database in diverse ways has the potential to reveal novel relationships among heterogeneous biological data.
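
    The contrast between multiple-join statements and native relationship traversal can be sketched as follows. This is an illustration only: it uses Python's built-in sqlite3 module in place of MySQL and a plain adjacency dictionary in place of Neo4j, the entity names are hypothetical, and it makes no claim about relative performance.

```python
import sqlite3

# Hypothetical heterogeneous relationships (gene -> protein -> drug/disease).
edges = [
    ("geneA", "proteinA"),
    ("proteinA", "drugX"),
    ("proteinA", "diseaseY"),
    ("geneB", "proteinB"),
]

# Relational style: relationships live in a table, and every additional hop
# in a query adds another self-join on that table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edge (src TEXT, dst TEXT)")
conn.executemany("INSERT INTO edge VALUES (?, ?)", edges)
rows = conn.execute(
    "SELECT e2.dst FROM edge e1 JOIN edge e2 ON e1.dst = e2.src "
    "WHERE e1.src = ? ORDER BY e2.dst",
    ("geneA",),
).fetchall()
two_hop_sql = [row[0] for row in rows]
conn.close()

# Graph style: each node keeps direct references to its neighbors, so a
# multi-hop query simply follows those references hop by hop.
adjacency = {}
for src, dst in edges:
    adjacency.setdefault(src, []).append(dst)


def neighbors_at_depth(start, depth):
    """Return the sorted nodes reached after exactly `depth` hops."""
    frontier = [start]
    for _ in range(depth):
        frontier = [nxt for node in frontier for nxt in adjacency.get(node, [])]
    return sorted(frontier)


two_hop_graph = neighbors_at_depth("geneA", 2)
```

    Both approaches return the same two-hop neighbors here; the difference the abstract describes lies in how the cost grows as hops and relationship counts increase.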

  15. Use of Graph Database for the Integration of Heterogeneous Biological Data

    PubMed Central

    Yoon, Byoung-Ha; Kim, Seon-Kyu

    2017-01-01

    Understanding complex relationships among heterogeneous biological data is one of the fundamental goals in biology. In most cases, diverse biological data are stored in relational databases, such as MySQL and Oracle, which store data in multiple tables and then infer relationships by multiple-join statements. Recently, a new type of database, called the graph-based database, was developed to natively represent various kinds of complex relationships, and it is widely used among computer science communities and IT industries. Here, we demonstrate the feasibility of using a graph-based database for complex biological relationships by comparing the performance between MySQL and Neo4j, one of the most widely used graph databases. We collected various biological data (protein-protein interaction, drug-target, gene-disease, etc.) from several existing sources, removed duplicate and redundant data, and finally constructed a graph database containing 114,550 nodes and 82,674,321 relationships. When we tested the query execution performance of MySQL versus Neo4j, we found that Neo4j outperformed MySQL in all cases. While Neo4j exhibited a very fast response for various queries, MySQL exhibited latent or unfinished responses for complex queries with multiple-join statements. These results show that using graph-based databases, such as Neo4j, is an efficient way to store complex biological relationships. Moreover, querying a graph database in diverse ways has the potential to reveal novel relationships among heterogeneous biological data. PMID:28416946

  16. NAViGaTing the Micronome – Using Multiple MicroRNA Prediction Databases to Identify Signalling Pathway-Associated MicroRNAs

    PubMed Central

    Shirdel, Elize A.; Xie, Wing; Mak, Tak W.; Jurisica, Igor

    2011-01-01

    Background MicroRNAs are a class of small RNAs known to regulate gene expression at the transcript level, the protein level, or both. Since microRNA binding is sequence-based but possibly structure-specific, work in this area has resulted in multiple databases storing predicted microRNA:target relationships computed using diverse algorithms. We integrate prediction databases, compare predictions to in vitro data, and use cross-database predictions to model the microRNA:transcript interactome – referred to as the micronome – to study microRNA involvement in well-known signalling pathways as well as associations with disease. We make this data freely available with a flexible user interface as our microRNA Data Integration Portal — mirDIP (http://ophid.utoronto.ca/mirDIP). Results mirDIP integrates prediction databases to elucidate accurate microRNA:target relationships. Using NAViGaTOR to produce interaction networks implicating microRNAs in literature-based, KEGG-based and Reactome-based pathways, we find these signalling pathway networks have significantly more microRNA involvement compared to chance (p<0.05), suggesting microRNAs co-target many genes in a given pathway. Further examination of the micronome shows two distinct classes of microRNAs; universe microRNAs, which are involved in many signalling pathways; and intra-pathway microRNAs, which target multiple genes within one signalling pathway. We find universe microRNAs to have more targets (p<0.0001), to be more studied (p<0.0002), and to have higher degree in the KEGG cancer pathway (p<0.0001), compared to intra-pathway microRNAs. Conclusions Our pathway-based analysis of mirDIP data suggests microRNAs are involved in intra-pathway signalling. We identify two distinct classes of microRNAs, suggesting a hierarchical organization of microRNAs co-targeting genes both within and between pathways, and implying differential involvement of universe and intra-pathway microRNAs at the disease level. 
    PMID:21364759

  17. Differential reconstructed gene interaction networks for deriving toxicity threshold in chemical risk assessment.

    PubMed

    Yang, Yi; Maxwell, Andrew; Zhang, Xiaowei; Wang, Nan; Perkins, Edward J; Zhang, Chaoyang; Gong, Ping

    2013-01-01

    Pathway alterations reflected as changes in gene expression regulation and gene interaction can result from cellular exposure to toxicants. Such information is often used to elucidate toxicological modes of action. From a risk assessment perspective, alterations in biological pathways are a rich resource for setting toxicant thresholds, which may be more sensitive and mechanism-informed than traditional toxicity endpoints. Here we developed a novel differential networks (DNs) approach to connect pathway perturbation with toxicity threshold setting. Our DNs approach consists of 6 steps: time-series gene expression data collection, identification of altered genes, gene interaction network reconstruction, differential edge inference, mapping of genes with differential edges to pathways, and establishment of causal relationships between chemical concentration and perturbed pathways. A one-sample Gaussian process model and a linear regression model were used to identify genes that exhibited significant profile changes across an entire time course and between treatments, respectively. Interaction networks of differentially expressed (DE) genes were reconstructed for different treatments using a state space model and then compared to infer differential edges/interactions. DE genes possessing differential edges were mapped to biological pathways in databases such as KEGG pathways. Using the DNs approach, we analyzed a time-series Escherichia coli live cell gene expression dataset consisting of 4 treatments (control, 10, 100, 1000 mg/L naphthenic acids, NAs) and 18 time points. Through comparison of reconstructed networks and construction of differential networks, 80 genes were identified as DE genes with a significant number of differential edges, and 22 KEGG pathways were altered in a concentration-dependent manner. 
    Some of these pathways were perturbed to a degree as high as 70% even at the lowest exposure concentration, implying a high sensitivity of our DNs approach. Findings from this proof-of-concept study suggest that our approach has a great potential in providing a novel and sensitive tool for threshold setting in chemical risk assessment. In future work, we plan to analyze more time-series datasets with a full spectrum of concentrations and sufficient replications per treatment. The pathway alteration-derived thresholds will also be compared with those derived from apical endpoints such as cell growth rate.
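
    The differential edge inference step of the DNs approach can be pictured with a small sketch. It assumes that the networks reconstructed for two treatments are already available as sets of directed (regulator, target) pairs; the state space model that produces them is not shown, and the gene names are hypothetical.

```python
def differential_edges(control_edges, treated_edges):
    """Return the edges gained and lost in the treated network."""
    control = set(control_edges)
    treated = set(treated_edges)
    return {"gained": treated - control, "lost": control - treated}


def genes_with_differential_edges(diff):
    """Collect every gene that sits on at least one differential edge."""
    genes = set()
    for edge in diff["gained"] | diff["lost"]:
        genes.update(edge)
    return genes


# Hypothetical reconstructed networks for two treatments.
control_network = [("g1", "g2"), ("g2", "g3")]
treated_network = [("g1", "g2"), ("g3", "g4")]

diff = differential_edges(control_network, treated_network)
affected = genes_with_differential_edges(diff)
```

    The affected genes are the ones that would then be mapped to pathways in the next step of the workflow.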

  18. miR2Pathway: A novel analytical method to discover MicroRNA-mediated dysregulated pathways involved in hepatocellular carcinoma.

    PubMed

    Li, Chaoxing; Dinu, Valentin

    2018-05-01

    MicroRNAs (miRNAs) are small, non-coding RNAs involved in the regulation of gene expression at a post-transcriptional level. Recent studies have shown miRNAs as key regulators of a variety of biological processes, such as proliferation, differentiation, apoptosis, metabolism, etc. Aberrantly expressed miRNAs influence individual gene expression level, but rewired miRNA-mRNA connections can influence the activity of biological pathways. Here, we define rewired miRNA-mRNA connections as the differential (rewiring) effects on the activity of biological pathways between hepatocellular carcinoma (HCC) and normal phenotypes. Our work presented here uses a PageRank-based approach to measure the degree of miRNA-mediated dysregulation of biological pathways between HCC and normal samples based on rewired miRNA-mRNA connections. In our study, we regard the degree of miRNA-mediated dysregulation of biological pathways as disease risk of biological pathways. Therefore, we propose a new method, miR2Pathway, to measure and rank the degree of miRNA-mediated dysregulation of biological pathways by measuring the total differential influence of miRNAs on the activity of pathways between HCC and normal states. miR2Pathway proposed here systematically shows the first evidence for a mechanism of biological pathways being dysregulated by rewired miRNA-mRNA connections, and provides new insight into exploring mechanisms behind HCC. Thus, miR2Pathway is a novel method to identify and rank miRNA-dysregulated pathways in HCC. Copyright © 2018 Elsevier Inc. All rights reserved.
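
    The abstract does not spell out how miR2Pathway adapts PageRank, so the following is only a sketch of the generic algorithm it builds on: power iteration with a damping factor over a directed graph. The node names and the damping value of 0.85 are assumptions chosen for illustration.

```python
def pagerank(graph, damping=0.85, iterations=100):
    """Generic PageRank by power iteration.

    `graph` maps every node to the list of nodes it points to; every target
    must also appear as a key.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node starts each round with the teleportation share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = graph[node]
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A node without outgoing edges spreads its rank evenly.
                share = damping * rank[node] / n
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical miRNA -> mRNA -> mRNA influence graph.
toy_graph = {
    "miR-x": ["gene1", "gene2"],
    "gene1": ["gene2"],
    "gene2": [],
}
scores = pagerank(toy_graph)
```

    A differential analysis in the spirit of the abstract would compare such scores between two condition-specific networks, but that comparison is not shown here.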

  19. Software Tool for Researching Annotations of Proteins (STRAP): Open-Source Protein Annotation Software with Data Visualization

    PubMed Central

    Bhatia, Vivek N.; Perlman, David H.; Costello, Catherine E.; McComb, Mark E.

    2009-01-01

    In order that biological meaning may be derived and testable hypotheses may be built from proteomics experiments, assignments of proteins identified by mass spectrometry or other techniques must be supplemented with additional notation, such as information on known protein functions, protein-protein interactions, or biological pathway associations. Collecting, organizing, and interpreting this data often requires the input of experts in the biological field of study, in addition to the time-consuming search for and compilation of information from online protein databases. Furthermore, visualizing this bulk of information can be challenging due to the limited availability of easy-to-use and freely available tools for this process. In response to these constraints, we have undertaken the design of software to automate annotation and visualization of proteomics data in order to accelerate the pace of research. Here we present the Software Tool for Researching Annotations of Proteins (STRAP) – a user-friendly, open-source C# application. STRAP automatically obtains gene ontology (GO) terms associated with proteins in a proteomics results ID list using the freely accessible UniProtKB and EBI GOA databases. STRAP summarizes its output in an easy-to-navigate tabular format that includes meta-information on each protein in addition to complementary GO terminology. Additionally, this information can be edited by the user so that in-house expertise on particular proteins may be integrated into the larger dataset. STRAP provides a sortable tabular view for all terms, as well as graphical representations of GO-term association data in pie (biological process, cellular component and molecular function) and bar charts (cross comparison of sample sets) to aid in the interpretation of large datasets and differential analysis experiments. Furthermore, proteins of interest may be exported as a unique FASTA-formatted file to allow for customizable re-searching of mass spectrometry data, and gene names corresponding to the proteins in the lists may be encoded in the Gaggle microformat for further characterization, including pathway analysis. STRAP, a tutorial, and the C# source code are freely available from http://cpctools.sourceforge.net. PMID:19839595

  20. Biological Networks for Predicting Chemical Hepatocarcinogenicity Using Gene Expression Data from Treated Mice and Relevance across Human and Rat Species

    PubMed Central

    Thomas, Reuben; Thomas, Russell S.; Auerbach, Scott S.; Portier, Christopher J.

    2013-01-01

    Background Several groups have employed genomic data from subchronic chemical toxicity studies in rodents (90 days) to derive gene-centric predictors of chronic toxicity and carcinogenicity. Genes are annotated to belong to biological processes or molecular pathways that are mechanistically well understood and are described in public databases. Objectives To develop a molecular pathway-based prediction model of long term hepatocarcinogenicity using 90-day gene expression data and to evaluate the performance of this model with respect to both intra-species, dose-dependent and cross-species predictions. Methods Genome-wide hepatic mRNA expression was retrospectively measured in B6C3F1 mice following subchronic exposure to twenty-six (26) chemicals (10 were positive, 2 equivocal and 14 negative for liver tumors) previously studied by the US National Toxicology Program. Using these data, a pathway-based predictor model for long-term liver cancer risk was derived using random forests. The prediction model was independently validated on test sets associated with liver cancer risk obtained from mice, rats and humans. Results Using 5-fold cross validation, the developed prediction model had reasonable predictive performance with the area under receiver-operator curve (AUC) equal to 0.66. The developed prediction model was then used to extrapolate the results to data associated with rat and human liver cancer. The extrapolated model worked well for both extrapolated species (AUC value of 0.74 for rats and 0.91 for humans). The prediction models implied a balanced interplay between all pathway responses leading to carcinogenicity predictions. Conclusions Pathway-based prediction models estimated from sub-chronic data hold promise for predicting long-term carcinogenicity and also for its ability to extrapolate results across multiple species. PMID:23737943
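
    The AUC values quoted in this abstract can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. The sketch below computes that quantity directly from pairwise comparisons; the labels and scores are made-up numbers, not data from the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (positive, negative) pairs in which the positive
    case has the higher score; ties count as half a win.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up example: 1 = tumor-positive chemical, 0 = tumor-negative chemical.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.7, 0.2, 0.1]
auc = roc_auc(labels, scores)
```

    An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which gives context for the reported values of 0.66, 0.74 and 0.91.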

  1. Functional phylogenomics analysis of bacteria and archaea using consistent genome annotation with UniFam

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chai, Juanjuan; Kora, Guruprasad; Ahn, Tae-Hyuk

    2014-10-09

    To supply some background, phylogenetic studies have provided detailed knowledge on the evolutionary mechanisms of genes and species in Bacteria and Archaea. However, the evolution of cellular functions, represented by metabolic pathways and biological processes, has not been systematically characterized. Many clades in the prokaryotic tree of life have now been covered by sequenced genomes in GenBank. This enables a large-scale functional phylogenomics study of many computationally inferred cellular functions across all sequenced prokaryotes. Our results show a total of 14,727 GenBank prokaryotic genomes were re-annotated using a new protein family database, UniFam, to obtain consistent functional annotations for accurate comparison. The functional profile of a genome was represented by the biological process Gene Ontology (GO) terms in its annotation. The GO term enrichment analysis differentiated the functional profiles between selected archaeal taxa. 706 prokaryotic metabolic pathways were inferred from these genomes using Pathway Tools and MetaCyc. The consistency between the distribution of metabolic pathways in the genomes and the phylogenetic tree of the genomes was measured using parsimony scores and retention indices. The ancestral functional profiles at the internal nodes of the phylogenetic tree were reconstructed to track the gains and losses of metabolic pathways in evolutionary history. In conclusion, our functional phylogenomics analysis shows divergent functional profiles of taxa and clades. Such function-phylogeny correlation stems from a set of clade-specific cellular functions with low parsimony scores. On the other hand, many cellular functions are sparsely dispersed across many clades with high parsimony scores. These different types of cellular functions have distinct evolutionary patterns reconstructed from the prokaryotic tree.

  2. Biological networks for predicting chemical hepatocarcinogenicity using gene expression data from treated mice and relevance across human and rat species.

    PubMed

    Thomas, Reuben; Thomas, Russell S; Auerbach, Scott S; Portier, Christopher J

    2013-01-01

    Several groups have employed genomic data from subchronic chemical toxicity studies in rodents (90 days) to derive gene-centric predictors of chronic toxicity and carcinogenicity. Genes are annotated to belong to biological processes or molecular pathways that are mechanistically well understood and are described in public databases. To develop a molecular pathway-based prediction model of long term hepatocarcinogenicity using 90-day gene expression data and to evaluate the performance of this model with respect to both intra-species, dose-dependent and cross-species predictions. Genome-wide hepatic mRNA expression was retrospectively measured in B6C3F1 mice following subchronic exposure to twenty-six (26) chemicals (10 were positive, 2 equivocal and 14 negative for liver tumors) previously studied by the US National Toxicology Program. Using these data, a pathway-based predictor model for long-term liver cancer risk was derived using random forests. The prediction model was independently validated on test sets associated with liver cancer risk obtained from mice, rats and humans. Using 5-fold cross validation, the developed prediction model had reasonable predictive performance with the area under receiver-operator curve (AUC) equal to 0.66. The developed prediction model was then used to extrapolate the results to data associated with rat and human liver cancer. The extrapolated model worked well for both extrapolated species (AUC value of 0.74 for rats and 0.91 for humans). The prediction models implied a balanced interplay between all pathway responses leading to carcinogenicity predictions. Pathway-based prediction models estimated from sub-chronic data hold promise for predicting long-term carcinogenicity and also for its ability to extrapolate results across multiple species.

  3. Systematically Studying Kinase Inhibitor Induced Signaling Network Signatures by Integrating Both Therapeutic and Side Effects

    PubMed Central

    Shao, Hongwei; Peng, Tao; Ji, Zhiwei; Su, Jing; Zhou, Xiaobo

    2013-01-01

    Substantial effort in recent years has been devoted to analyzing data-based large-scale biological networks, which provide valuable insight into the topologies of complex biological networks but are rarely context specific and cannot be used to predict the responses of cell signaling proteins to specific ligands or compounds. In this work, we proposed a novel strategy to investigate kinase inhibitor induced pathway signatures by integrating multiplex data in the Library of Integrated Network-based Cellular Signatures (LINCS), e.g. KINOMEscan data and cell proliferation/mitosis imaging data. Using this strategy, we first established a PC9 cell line specific pathway model to investigate the pathway signatures in the PC9 cell line when perturbed by the small molecule kinase inhibitor GW843682. This specific pathway revealed the role of PI3K/AKT in modulating the cell proliferation process and the absence of two anti-proliferation links, which indicated a potential mechanism of abnormal expansion in PC9 cell number. After incorporating a pathway model for side effects on primary human hepatocytes, we used the model to screen 27 kinase inhibitors in the LINCS database, and PF02341066, known as Crizotinib, was finally suggested with an optimal concentration of 4.6 µM to suppress PC9 cancer cell expansion while avoiding severe damage to primary human hepatocytes. Drug combination analysis revealed that the synergistic effect region can be predicted straightforwardly based on a threshold which is an inherent property of each kinase inhibitor. Furthermore, this integration strategy can be easily extended to other specific cell lines to be a powerful tool for drug screening before clinical trials. PMID:24339888

  4. Identification of key genes and pathways associated with neuropathic pain in uninjured dorsal root ganglion by using bioinformatic analysis.

    PubMed

    Chen, Chao-Jin; Liu, De-Zhao; Yao, Wei-Feng; Gu, Yu; Huang, Fei; Hei, Zi-Qing; Li, Xiang

    2017-01-01

    Neuropathic pain is a complex chronic condition occurring post-nervous system damage. The transcriptional reprogramming of injured dorsal root ganglia (DRGs) drives neuropathic pain. However, few comparative analyses using high-throughput platforms have investigated uninjured DRG in neuropathic pain, and potential interactions among differentially expressed genes (DEGs) and pathways were not taken into consideration. The aim of this study was to identify changes in genes and pathways associated with neuropathic pain in uninjured L4 DRG after L5 spinal nerve ligation (SNL) by using bioinformatic analysis. The microarray profile GSE24982 was downloaded from the Gene Expression Omnibus database to identify DEGs between DRGs in SNL and sham rats. The prioritization for these DEGs was performed using the Toppgene database followed by gene ontology and pathway enrichment analyses. The relationships among DEGs from the protein interactive perspective were analyzed using protein-protein interaction (PPI) network and module analysis. Real-time polymerase chain reaction (PCR) and Western blotting were used to confirm the expression of DEGs in the rodent neuropathic pain model. A total of 206 DEGs that might play a role in neuropathic pain were identified in L4 DRG, of which 75 were upregulated and 131 were downregulated. The upregulated DEGs were enriched in biological processes related to transcription regulation and molecular functions such as DNA binding, cell cycle, and the FoxO signaling pathway. Ctnnb1 protein had the highest connectivity degrees in the PPI network. The in vivo studies also validated that mRNA and protein levels of Ctnnb1 were upregulated in both L4 and L5 DRGs. This study provides insight into the functional gene sets and pathways associated with neuropathic pain in L4 uninjured DRG after L5 SNL, which might promote our understanding of the molecular mechanisms underlying the development of neuropathic pain.

  5. Pathway Analysis and Omics Data Visualization Using Pathway Genome Databases: FragariaCyc, a Case Study.

    PubMed

    Naithani, Sushma; Jaiswal, Pankaj

    2017-01-01

    The species-specific plant Pathway Genome Databases (PGDBs) based on the BioCyc platform provide a conceptual model of the cellular metabolic network of an organism. Such frameworks allow analysis of the genome-scale expression data to understand changes in the overall metabolism of an organism (or organs, tissues, and cells) in response to various intrinsic (e.g. developmental and differentiation) and/or extrinsic signals (e.g. pathogens and abiotic stresses) from the surrounding environment. Using FragariaCyc, a pathway database for the diploid strawberry Fragaria vesca, we show (1) the basic navigation across a PGDB; (2) a case study of pathway comparison across plant species; and (3) an example of RNA-Seq data analysis using the Omics Viewer tool. The protocols described here generally apply to other Pathway Tools-based PGDBs.

  6. CellLineNavigator: a workbench for cancer cell line analysis

    PubMed Central

    Krupp, Markus; Itzel, Timo; Maass, Thorsten; Hildebrandt, Andreas; Galle, Peter R.; Teufel, Andreas

    2013-01-01

    The CellLineNavigator database, freely available at http://www.medicalgenomics.org/celllinenavigator, is a web-based workbench for large scale comparisons of a large collection of diverse cell lines. It aims to support experimental design in the fields of genomics, systems biology and translational biomedical research. Currently, this compendium holds genome wide expression profiles of 317 different cancer cell lines, categorized into 57 different pathological states and 28 individual tissues. To enlarge the scope of CellLineNavigator, the database was furthermore closely linked to commonly used bioinformatics databases and knowledge repositories. To ensure easy data access and search ability, a simple data and an intuitive querying interface were implemented. It allows the user to explore and filter gene expression, focusing on pathological or physiological conditions. For a more complex search, the advanced query interface may be used to query for (i) differentially expressed genes; (ii) pathological or physiological conditions; or (iii) gene names or functional attributes, such as Kyoto Encyclopaedia of Genes and Genomes pathway maps. These queries may also be combined. Finally, CellLineNavigator allows additional advanced analysis of differentially regulated genes by a direct link to the Database for Annotation, Visualization and Integrated Discovery (DAVID) Bioinformatics Resources. PMID:23118487

  7. Looking for Cancer Clues in Publicly Accessible Databases

    PubMed Central

    Lemkin, Peter F.; Smythers, Gary W.; Munroe, David J.

    2004-01-01

    What started out as a mere attempt to tentatively identify proteins in experimental cancer-related 2D-PAGE maps developed into VIRTUAL2D, a web-accessible repository for theoretical pI/MW charts for 92 organisms. Using publicly available expression data, we developed a collection of tissue-specific plots based on differential gene expression between normal and diseased states. We use this comparative cancer proteomics knowledge base, known as the tissue molecular anatomy project (TMAP), to uncover threads of cancer markers common to several types of cancer and to relate this information to established biological pathways. PMID:18629065

  8. Looking for cancer clues in publicly accessible databases.

    PubMed

    Medjahed, Djamel; Lemkin, Peter F; Smythers, Gary W; Munroe, David J

    2004-01-01

    What started out as a mere attempt to tentatively identify proteins in experimental cancer-related 2D-PAGE maps developed into VIRTUAL2D, a web-accessible repository for theoretical pI/MW charts for 92 organisms. Using publicly available expression data, we developed a collection of tissue-specific plots based on differential gene expression between normal and diseased states. We use this comparative cancer proteomics knowledge base, known as the tissue molecular anatomy project (TMAP), to uncover threads of cancer markers common to several types of cancer and to relate this information to established biological pathways.

  9. Boosting probabilistic graphical model inference by incorporating prior knowledge from multiple sources.

    PubMed

    Praveen, Paurush; Fröhlich, Holger

    2013-01-01

    Inferring regulatory networks from experimental data via probabilistic graphical models is a popular framework to gain insights into biological systems. However, the inherent noise in experimental data coupled with a limited sample size reduces the performance of network reverse engineering. Prior knowledge from existing sources of biological information can address this low signal to noise problem by biasing the network inference towards biologically plausible network structures. Although integrating various sources of information is desirable, their heterogeneous nature makes this task challenging. We propose two computational methods to incorporate various information sources into a probabilistic consensus structure prior to be used in graphical model inference. Our first model, called Latent Factor Model (LFM), assumes a high degree of correlation among external information sources and reconstructs a hidden variable as a common source in a Bayesian manner. The second model, a Noisy-OR, picks up the strongest support for an interaction among information sources in a probabilistic fashion. Our extensive computational studies on KEGG signaling pathways as well as on gene expression data from breast cancer and yeast heat shock response reveal that both approaches can significantly enhance the reconstruction accuracy of Bayesian Networks compared to other competing methods as well as to the situation without any prior. Our framework allows for using diverse information sources, like pathway databases, GO terms and protein domain data, etc. and is flexible enough to integrate new sources, if available.
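
    The Noisy-OR idea mentioned in this abstract can be sketched with the standard noisy-OR combination rule, under which an interaction is supported unless every source independently fails to support it. The paper's exact parameterization is not given in the abstract, so the function below and its input values are assumptions for illustration.

```python
def noisy_or(supports):
    """Combine per-source support values in [0, 1] with the noisy-OR rule.

    The result is 1 minus the probability that all sources fail to support
    the interaction, treating the sources as independent.
    """
    failure = 1.0
    for p in supports:
        if not 0.0 <= p <= 1.0:
            raise ValueError("support values must lie in [0, 1]")
        failure *= 1.0 - p
    return 1.0 - failure


# Hypothetical support for one edge from three sources,
# e.g. a pathway database, GO term similarity and protein domain data.
edge_prior = noisy_or([0.9, 0.1, 0.1])
```

    A single strong source dominates the result, which matches the abstract's description of picking up the strongest support among the information sources.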

  10. HMDB 4.0: the human metabolome database for 2018.

    PubMed

    Wishart, David S; Feunang, Yannick Djoumbou; Marcu, Ana; Guo, An Chi; Liang, Kevin; Vázquez-Fresno, Rosa; Sajed, Tanvir; Johnson, Daniel; Li, Carin; Karu, Naama; Sayeeda, Zinat; Lo, Elvis; Assempour, Nazanin; Berjanskii, Mark; Singhal, Sandeep; Arndt, David; Liang, Yonjie; Badran, Hasan; Grant, Jason; Serra-Cayuela, Arnau; Liu, Yifeng; Mandal, Rupa; Neveu, Vanessa; Pon, Allison; Knox, Craig; Wilson, Michael; Manach, Claudine; Scalbert, Augustin

    2018-01-04

    The Human Metabolome Database or HMDB (www.hmdb.ca) is a web-enabled metabolomic database containing comprehensive information about human metabolites along with their biological roles, physiological concentrations, disease associations, chemical reactions, metabolic pathways, and reference spectra. First described in 2007, the HMDB is now considered the standard metabolomic resource for human metabolic studies. Over the past decade the HMDB has continued to grow and evolve in response to emerging needs for metabolomics researchers and continuing changes in web standards. This year's update, HMDB 4.0, represents the most significant upgrade to the database in its history. For instance, the number of fully annotated metabolites has increased by nearly threefold, the number of experimental spectra has grown by almost fourfold and the number of illustrated metabolic pathways has grown by a factor of almost 60. Significant improvements have also been made to the HMDB's chemical taxonomy, chemical ontology, spectral viewing, and spectral/text searching tools. A great deal of brand new data has also been added to HMDB 4.0. This includes large quantities of predicted MS/MS and GC-MS reference spectral data as well as predicted (physiologically feasible) metabolite structures to facilitate novel metabolite identification. Additional information on metabolite-SNP interactions and the influence of drugs on metabolite levels (pharmacometabolomics) has also been added. Many other important improvements in the content, the interface, and the performance of the HMDB website have been made and these should greatly enhance its ease of use and its potential applications in nutrition, biochemistry, clinical chemistry, clinical genetics, medicine, and metabolomics science. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  11. Pathways over Time: Functional Genomics Research in an Introductory Laboratory Course.

    PubMed

    Reeves, Todd D; Warner, Douglas M; Ludlow, Larry H; O'Connor, Clare M

    2018-01-01

    National reports have called for the introduction of research experiences throughout the undergraduate curriculum, but practical implementation at many institutions faces challenges associated with sustainability, cost, and large student populations. We describe a novel course-based undergraduate research experience (CURE) that introduces introductory-level students to research in functional genomics in a 3-credit, multisection laboratory class. In the Pathways over Time class project, students study the functional conservation of the methionine biosynthetic pathway between divergent yeast species. Over the five semesters described in this study, students (N = 793) showed statistically significant and sizable growth in content knowledge (d = 1.85) and in self-reported research methods skills (d = 0.65), experimental design, oral and written communication, database use, and collaboration. Statistical analyses indicated that content knowledge growth was larger for underrepresented minority students and that growth in content knowledge, but not research skills, varied by course section. Our findings add to the growing body of evidence that CUREs can support the scientific development of large numbers of students with diverse characteristics. The Pathways over Time project is designed to be sustainable and readily adapted to other institutional settings. © 2018 T. D. Reeves et al. CBE—Life Sciences Education © 2018 The American Society for Cell Biology. This article is distributed by The American Society for Cell Biology under license from the author(s). It is available to the public under an Attribution–Noncommercial–Share Alike 3.0 Unported Creative Commons License (http://creativecommons.org/licenses/by-nc-sa/3.0).
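
    The abstract reports growth as effect sizes d without naming the variant used. As a rough illustration, the sketch below assumes Cohen's d with a pooled standard deviation, which may differ from the study's calculation; the pre- and post-course scores are made-up numbers.

```python
import math
from statistics import mean, stdev


def cohens_d(before, after):
    """Cohen's d: mean difference divided by the pooled standard deviation."""
    n1, n2 = len(before), len(after)
    pooled_variance = (
        (n1 - 1) * stdev(before) ** 2 + (n2 - 1) * stdev(after) ** 2
    ) / (n1 + n2 - 2)
    return (mean(after) - mean(before)) / math.sqrt(pooled_variance)


# Made-up pre-course and post-course test scores.
pre_scores = [2.0, 4.0, 6.0]
post_scores = [5.0, 7.0, 9.0]
effect = cohens_d(pre_scores, post_scores)
```

    Here the means differ by 3 points and the pooled standard deviation is 2, so the effect size is 1.5 standard deviations.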

  12. Network Analysis Tools: from biological networks to clusters and pathways.

    PubMed

    Brohée, Sylvain; Faust, Karoline; Lima-Mendez, Gipsi; Vanderstocken, Gilles; van Helden, Jacques

    2008-01-01

    Network Analysis Tools (NeAT) is a suite of computer tools that integrate various algorithms for the analysis of biological networks: comparison between graphs, between clusters, or between graphs and clusters; network randomization; analysis of degree distribution; network-based clustering and path finding. The tools are interconnected to enable a stepwise analysis of the network through a complete analytical workflow. In this protocol, we present a typical case of utilization, where the tasks above are combined to decipher a protein-protein interaction network retrieved from the STRING database. The results returned by NeAT are typically subnetworks, networks enriched with additional information (i.e., clusters or paths) or tables displaying statistics. Typical networks comprising several thousands of nodes and arcs can be analyzed within a few minutes. The complete protocol can be read and executed in approximately 1 h.
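
    One of the analyses listed above, the degree distribution of a network, can be illustrated with a minimal Python sketch. It is not NeAT code: the function name and the edge-list input format are assumptions made here for illustration only.

```python
from collections import Counter

def degree_distribution(edges):
    """Return {degree: number of nodes with that degree} for an undirected graph.

    `edges` is an iterable of (node_a, node_b) pairs. This input format is an
    assumption for this sketch, not the format used by NeAT.
    """
    degree = Counter()
    for a, b in edges:
        # Each edge contributes one to the degree of both of its endpoints.
        degree[a] += 1
        degree[b] += 1
    # Count how many nodes share each degree value, sorted by degree.
    return dict(sorted(Counter(degree.values()).items()))
```

    Calling the function on a list of interaction pairs, for example pairs exported from an interaction database, gives the counts needed to plot or test the degree distribution.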

  13. A Bioinformatics Facility for NASA

    NASA Technical Reports Server (NTRS)

    Schweighofer, Karl; Pohorille, Andrew

    2006-01-01

    Building on an existing prototype, we have fielded a facility with bioinformatics technologies that will help NASA meet its unique requirements for biological research. This facility consists of a cluster of computers capable of performing computationally intensive tasks, software tools, databases and knowledge management systems. Novel computational technologies for analyzing and integrating new biological data and already existing knowledge have been developed. With continued development and support, the facility will fulfill NASA's strategic bioinformatics needs in astrobiology and space exploration. As a demonstration of these capabilities, we will present a detailed analysis of how spaceflight factors impact gene expression in the liver and kidney for mice flown aboard shuttle flight STS-108. We have found that many genes involved in signal transduction, cell cycle, and development respond to changes in microgravity, but that most metabolic pathways appear unchanged.

  14. VitisCyc: a metabolic pathway knowledgebase for grapevine (Vitis vinifera)

    PubMed Central

    Naithani, Sushma; Raja, Rajani; Waddell, Elijah N.; Elser, Justin; Gouthu, Satyanarayana; Deluc, Laurent G.; Jaiswal, Pankaj

    2014-01-01

    We have developed VitisCyc, a grapevine-specific metabolic pathway database that allows researchers to (i) search and browse the database for its various components such as metabolic pathways, reactions, compounds, genes and proteins, (ii) compare grapevine metabolic networks with other publicly available plant metabolic networks, and (iii) upload, visualize and analyze high-throughput data such as transcriptomes, proteomes, metabolomes etc. using OMICs-Viewer tool. VitisCyc is based on the genome sequence of the nearly homozygous genotype PN40024 of Vitis vinifera “Pinot Noir” cultivar with 12X v1 annotations and was built on BioCyc platform using Pathway Tools software and MetaCyc reference database. Furthermore, VitisCyc was enriched for plant-specific pathways and grape-specific metabolites, reactions and pathways. Currently VitisCyc harbors 68 super pathways, 362 biosynthesis pathways, 118 catabolic pathways, 5 detoxification pathways, 36 energy related pathways and 6 transport pathways, 10,908 enzymes, 2912 enzymatic reactions, 31 transport reactions and 2024 compounds. VitisCyc, as a community resource, can aid in the discovery of candidate genes and pathways that are regulated during plant growth and development, and in response to biotic and abiotic stress signals generated from a plant's immediate environment. VitisCyc version 3.18 is available online at http://pathways.cgrb.oregonstate.edu. PMID:25538713

  15. From genomics to chemical genomics: new developments in KEGG

    PubMed Central

    Kanehisa, Minoru; Goto, Susumu; Hattori, Masahiro; Aoki-Kinoshita, Kiyoko F.; Itoh, Masumi; Kawashima, Shuichi; Katayama, Toshiaki; Araki, Michihiro; Hirakawa, Mika

    2006-01-01

    The increasing amount of genomic and molecular information is the basis for understanding higher-order biological systems, such as the cell and the organism, and their interactions with the environment, as well as for medical, industrial and other practical applications. The KEGG resource provides a reference knowledge base for linking genomes to biological systems, categorized as building blocks in the genomic space (KEGG GENES) and the chemical space (KEGG LIGAND), and wiring diagrams of interaction networks and reaction networks (KEGG PATHWAY). A fourth component, KEGG BRITE, has been formally added to the KEGG suite of databases. This reflects our attempt to computerize functional interpretations as part of the pathway reconstruction process based on the hierarchically structured knowledge about the genomic, chemical and network spaces. In accordance with the new chemical genomics initiatives, the scope of KEGG LIGAND has been significantly expanded to cover both endogenous and exogenous molecules. Specifically, RPAIR contains curated chemical structure transformation patterns extracted from known enzymatic reactions, which would enable analysis of genome-environment interactions, such as the prediction of new reactions and new enzyme genes that would degrade new environmental compounds. Additionally, drug information is now stored separately and linked to new KEGG DRUG structure maps. PMID:16381885

  16. Comparative peptidomic profile between human hypertrophic scar tissue and matched normal skin for identification of endogenous peptides involved in scar pathology.

    PubMed

    Li, Jingyun; Chen, Ling; Li, Qian; Cao, Jing; Gao, Yanli; Li, Jun

    2018-08-01

    Endogenous peptides have recently attracted increasing attention for their participation in various biological processes. Their roles in the pathogenesis of human hypertrophic scar remain poorly understood. In this study, we used liquid chromatography-tandem mass spectrometry to construct a comparative peptidomic profile between human hypertrophic scar tissue and matched normal skin. A total of 179 peptides were significantly differentially expressed in human hypertrophic scar tissue, with 95 upregulated and 84 downregulated peptides between hypertrophic scar tissue and matched normal skin. Further bioinformatics analysis (Gene Ontology and Kyoto Encyclopedia of Genes and Genomes pathway analysis) indicated that precursor proteins of these differentially expressed peptides correlate with cellular process, biological regulation, cell part, binding and structural molecule activity, ribosome, and PPAR signaling pathway occurring during pathological changes of hypertrophic scar. Based on a prediction database, we found that 78 differentially expressed peptides shared homology with antimicrobial peptides and five matched known immunomodulatory peptides. In conclusion, our results show significantly altered expression profiles of peptides in human hypertrophic scar tissue. These peptides may participate in the etiology of hypertrophic scar and provide a beneficial scheme for scar evaluation and treatments. © 2017 Wiley Periodicals, Inc.

  17. Transcriptome differences between enrofloxacin-resistant and enrofloxacin-susceptible strains of Aeromonas hydrophila.

    PubMed

    Zhu, Fengjiao; Yang, Zongying; Zhang, Yiliu; Hu, Kun; Fang, Wenhong

    2017-01-01

    Enrofloxacin is the most commonly used antibiotic to control diseases in aquatic animals caused by A. hydrophila. This study conducted de novo transcriptome sequencing and compared the global transcriptomes of enrofloxacin-resistant and enrofloxacin-susceptible strains. A total of 4,714 unigenes were assembled. Of these, 4,122 were annotated. A total of 3,280 unigenes were assigned to GO, 3,388 unigenes were classified into Cluster of Orthologous Groups of proteins (COG) using BLAST and BLAST2GO software, and 2,568 were mapped onto pathways using the Kyoto Encyclopedia of Genes and Genomes Pathway database. Furthermore, 218 unigenes were deemed to be DEGs. After enrofloxacin treatment, 135 genes were upregulated and 83 genes were downregulated. The GO terms biological process (126 genes) and metabolic process (136 genes) were the most enriched, and the terms for protein folding, response to stress, and SOS response were also significantly enriched. This study identified that enrofloxacin treatment affects multiple biological functions of A. hydrophila. Enrofloxacin resistance in A. hydrophila is closely related to the reduction of intracellular drug accumulation caused by ABC transporters and increased expression of topoisomerase IV.

  18. Transcriptome differences between enrofloxacin-resistant and enrofloxacin-susceptible strains of Aeromonas hydrophila

    PubMed Central

    Yang, Zongying; Zhang, Yiliu; Hu, Kun; Fang, Wenhong

    2017-01-01

    Enrofloxacin is the most commonly used antibiotic to control diseases in aquatic animals caused by A. hydrophila. This study conducted de novo transcriptome sequencing and compared the global transcriptomes of enrofloxacin-resistant and enrofloxacin-susceptible strains. A total of 4,714 unigenes were assembled. Of these, 4,122 were annotated. A total of 3,280 unigenes were assigned to GO, 3,388 unigenes were classified into Cluster of Orthologous Groups of proteins (COG) using BLAST and BLAST2GO software, and 2,568 were mapped onto pathways using the Kyoto Encyclopedia of Genes and Genomes Pathway database. Furthermore, 218 unigenes were deemed to be DEGs. After enrofloxacin treatment, 135 genes were upregulated and 83 genes were downregulated. The GO terms biological process (126 genes) and metabolic process (136 genes) were the most enriched, and the terms for protein folding, response to stress, and SOS response were also significantly enriched. This study identified that enrofloxacin treatment affects multiple biological functions of A. hydrophila. Enrofloxacin resistance in A. hydrophila is closely related to the reduction of intracellular drug accumulation caused by ABC transporters and increased expression of topoisomerase IV. PMID:28708867

  19. A systematic analysis of a mi-RNA inter-pathway regulatory motif

    PubMed Central

    2013-01-01

    Background: The continuing discovery of new types and functions of small non-coding RNAs is suggesting the presence of regulatory mechanisms far more complex than the ones currently used to study and design Gene Regulatory Networks. Just focusing on the roles of micro RNAs (miRNAs), they have been found to be part of several intra-pathway regulatory motifs. However, inter-pathway regulatory mechanisms have been often neglected and require further investigation. Results: In this paper we present the result of a systems biology study aimed at analyzing a high-level inter-pathway regulatory motif called Pathway Protection Loop, not previously described, in which miRNAs seem to play a crucial role in the successful behavior and activation of a pathway. Through the automatic analysis of a large set of publicly available databases, we found statistical evidence that this inter-pathway regulatory motif is very common in several classes of KEGG Homo Sapiens pathways and concurs in creating a complex regulatory network involving several pathways connected by this specific motif. The role of this motif seems also confirmed by a deeper review of other research activities on selected representative pathways. Conclusions: Although previous studies suggested transcriptional regulation mechanism at the pathway level such as the Pathway Protection Loop, a high-level analysis like the one proposed in this paper is still missing. The understanding of higher-level regulatory motifs could, for instance, lead to new approaches in the identification of therapeutic targets because it could unveil new and “indirect” paths to activate or silence a target pathway. However, a lot of work still needs to be done to better uncover this high-level inter-pathway regulation including enlarging the analysis to other small non-coding RNA molecules. PMID:24152805

  20. bioDBnet - Biological Database Network

    Cancer.gov

    bioDBnet is a comprehensive resource of most of the biological databases available from different sites like NCBI, Uniprot, EMBL, Ensembl, Affymetrix. It provides a queryable interface to all the databases available, converts identifiers from one database into another and generates comprehensive reports.

  1. Signalling Network Construction for Modelling Plant Defence Response

    PubMed Central

    Miljkovic, Dragana; Stare, Tjaša; Mozetič, Igor; Podpečan, Vid; Petek, Marko; Witek, Kamil; Dermastia, Marina; Lavrač, Nada; Gruden, Kristina

    2012-01-01

    Plant defence signalling response against various pathogens, including viruses, is a complex phenomenon. In resistant interaction a plant cell perceives the pathogen signal, transduces it within the cell and performs a reprogramming of the cell metabolism leading to the pathogen replication arrest. This work focuses on signalling pathways crucial for the plant defence response, i.e., the salicylic acid, jasmonic acid and ethylene signal transduction pathways, in the Arabidopsis thaliana model plant. The initial signalling network topology was constructed manually by defining the representation formalism, encoding the information from public databases and literature, and composing a pathway diagram. The manually constructed network structure consists of 175 components and 387 reactions. In order to complement the network topology with possibly missing relations, a new approach to automated information extraction from biological literature was developed. This approach, named Bio3graph, allows for automated extraction of biological relations from the literature, resulting in a set of (component1, reaction, component2) triplets and composing a graph structure which can be visualised, compared to the manually constructed topology and examined by the experts. Using a plant defence response vocabulary of components and reaction types, Bio3graph was applied to a set of 9,586 relevant full text articles, resulting in 137 newly detected reactions between the components. Finally, the manually constructed topology and the new reactions were merged to form a network structure consisting of 175 components and 524 reactions. The resulting pathway diagram of plant defence signalling represents a valuable source for further computational modelling and interpretation of omics data. 
The developed Bio3graph approach, implemented as an executable language processing and graph visualisation workflow, is publicly available at http://ropot.ijs.si/bio3graph/ and can be utilised for modelling other biological systems, given that an adequate vocabulary is provided. PMID:23272172
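
    The merge step described above, in which literature-derived (component1, reaction, component2) triplets are combined with a manually built topology, can be sketched in Python as follows. This is an illustrative sketch only, not Bio3graph code: the function names and the triplets used in the usage note are hypothetical toy data.

```python
def merge_networks(manual, extracted):
    """Merge two collections of (component1, reaction, component2) triplets.

    Returns (merged, new), where `new` holds the extracted triplets that were
    not already present in the manual network. Order is preserved and exact
    duplicates are dropped. Hypothetical helper, not part of Bio3graph.
    """
    manual_unique = list(dict.fromkeys(manual))
    manual_set = set(manual_unique)
    new = [t for t in dict.fromkeys(extracted) if t not in manual_set]
    return manual_unique + new, new

def to_adjacency(triplets):
    """Build a directed adjacency map: source -> [(reaction, target), ...]."""
    graph = {}
    for source, reaction, target in triplets:
        graph.setdefault(source, []).append((reaction, target))
        # Register the target too, so nodes without outgoing edges appear.
        graph.setdefault(target, [])
    return graph
```

    For example, merging a toy manual list [("SA", "activates", "NPR1")] with a toy extracted list [("ET", "activates", "EIN3")] returns both triplets as the merged network and the second one as new. This matches the bookkeeping in the abstract, where 387 manually curated reactions plus 137 newly detected ones give 524.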

  2. Co-LncRNA: investigating the lncRNA combinatorial effects in GO annotations and KEGG pathways based on human RNA-Seq data.

    PubMed

    Zhao, Zheng; Bai, Jing; Wu, Aiwei; Wang, Yuan; Zhang, Jinwen; Wang, Zishan; Li, Yongsheng; Xu, Juan; Li, Xia

    2015-01-01

    Long non-coding RNAs (lncRNAs) are emerging as key regulators of diverse biological processes and diseases. However, the combinatorial effects of these molecules in a specific biological function are poorly understood. Identifying co-expressed protein-coding genes of lncRNAs would provide ample insight into lncRNA functions. To facilitate such an effort, we have developed Co-LncRNA, which is a web-based computational tool that allows users to identify GO annotations and KEGG pathways that may be affected by co-expressed protein-coding genes of a single or multiple lncRNAs. LncRNA co-expressed protein-coding genes were first identified in publicly available human RNA-Seq datasets, including 241 datasets across 6560 total individuals representing 28 tissue types/cell lines. Then, the lncRNA combinatorial effects in a given GO annotations or KEGG pathways are taken into account by the simultaneous analysis of multiple lncRNAs in user-selected individual or multiple datasets, which is realized by enrichment analysis. In addition, this software provides a graphical overview of pathways that are modulated by lncRNAs, as well as a specific tool to display the relevant networks between lncRNAs and their co-expressed protein-coding genes. Co-LncRNA also supports users in uploading their own lncRNA and protein-coding gene expression profiles to investigate the lncRNA combinatorial effects. It will be continuously updated with more human RNA-Seq datasets on an annual basis. Taken together, Co-LncRNA provides a web-based application for investigating lncRNA combinatorial effects, which could shed light on their biological roles and could be a valuable resource for this community. Database URL: http://www.bio-bigdata.com/Co-LncRNA/. © The Author(s) 2015. Published by Oxford University Press.
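
    The abstract above does not say which statistical test underlies the enrichment analysis. One common choice for testing whether a gene set is over-represented in a GO term or KEGG pathway is the one-sided hypergeometric test, sketched below in Python under that assumption. The function name and parameter names are hypothetical and are not taken from Co-LncRNA.

```python
from math import comb

def hypergeom_enrichment_pvalue(population, annotated, drawn, overlap):
    """One-sided hypergeometric p-value P(X >= overlap).

    population: number of background genes
    annotated:  background genes annotated to the term or pathway
    drawn:      size of the query gene set (e.g. co-expressed genes)
    overlap:    query genes that carry the annotation
    Assumed test for illustration; not confirmed as the Co-LncRNA method.
    """
    upper = min(annotated, drawn)
    total = comb(population, drawn)
    # Sum the probabilities of observing `overlap` or more annotated genes.
    tail = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total
```

    In practice the p-values from many terms or pathways would also need a multiple-testing correction, which this sketch leaves out.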

  3. cMapper: gene-centric connectivity mapper for EBI-RDF platform.

    PubMed

    Shoaib, Muhammad; Ansari, Adnan Ahmad; Ahn, Sung-Min

    2017-01-15

    In this era of biological big data, data integration has become a common task and a challenge for biologists. The Resource Description Framework (RDF) was developed to enable interoperability of heterogeneous datasets. The EBI-RDF platform enables an efficient data integration of six independent biological databases using RDF technologies and shared ontologies. However, to take advantage of this platform, biologists need to be familiar with RDF technologies and SPARQL query language. To overcome this practical limitation of the EBI-RDF platform, we developed cMapper, a web-based tool that enables biologists to search the EBI-RDF databases in a gene-centric manner without a thorough knowledge of RDF and SPARQL. cMapper allows biologists to search data entities in the EBI-RDF platform that are connected to genes or small molecules of interest in multiple biological contexts. The input to cMapper consists of a set of genes or small molecules, and the output are data entities in six independent EBI-RDF databases connected with the given genes or small molecules in the user's query. cMapper provides output to users in the form of a graph in which nodes represent data entities and the edges represent connections between data entities and inputted set of genes or small molecules. Furthermore, users can apply filters based on database, taxonomy, organ and pathways in order to focus on a core connectivity graph of their interest. Data entities from multiple databases are differentiated based on background colors. cMapper also enables users to investigate shared connections between genes or small molecules of interest. Users can view the output graph on a web browser or download it in either GraphML or JSON formats. cMapper is available as a web application with an integrated MySQL database. The web application was developed using Java and deployed on Tomcat server. We developed the user interface using HTML5, JQuery and the Cytoscape Graph API. 
cMapper can be accessed at http://cmapper.ewostech.net. Readers can download the development manual from the website http://cmapper.ewostech.net/docs/cMapperDocumentation.pdf. Source code is available at https://github.com/muhammadshoaib/cmapper. Contact: smahn@gachon.ac.kr. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  4. Lynx web services for annotations and systems analysis of multi-gene disorders.

    PubMed

    Sulakhe, Dinanath; Taylor, Andrew; Balasubramanian, Sandhya; Feng, Bo; Xie, Bingqing; Börnigen, Daniela; Dave, Utpal J; Foster, Ian T; Gilliam, T Conrad; Maltsev, Natalia

    2014-07-01

    Lynx is a web-based integrated systems biology platform that supports annotation and analysis of experimental data and generation of weighted hypotheses on molecular mechanisms contributing to human phenotypes and disorders of interest. Lynx has integrated multiple classes of biomedical data (genomic, proteomic, pathways, phenotypic, toxicogenomic, contextual and others) from various public databases as well as manually curated data from our group and collaborators (LynxKB). Lynx provides tools for gene list enrichment analysis using multiple functional annotations and network-based gene prioritization. Lynx provides access to the integrated database and the analytical tools via REST based Web Services (http://lynx.ci.uchicago.edu/webservices.html). This comprises data retrieval services for specific functional annotations, services to search across the complete LynxKB (powered by Lucene), and services to access the analytical tools built within the Lynx platform. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  5. Diterpenes and Their Derivatives as Potential Anticancer Agents.

    PubMed

    Islam, Muhammad Torequl

    2017-05-01

    As therapeutic tools, diterpenes and their derivatives have gained much attention from medicinal scientists, owing to their promising and important biological activities. This review compiles the anticancer diterpenes. For this, a search was made with selected keywords in PubMed, Science Direct, Web of Science, Scopus, The American Chemical Society and miscellaneous databases from January 2012 to January 2017 for the published articles. A total of 28,789 published articles were found. Among them, 240 were included in this study. More than 250 important anticancer diterpenes and their derivatives were seen in the databases, acting in the different pathways. Some of them are already under clinical trials, while others are in the nonclinical and/or pre-clinical trials. In conclusion, diterpenes may be one of the lead molecules in the treatment of cancer. Copyright © 2017 John Wiley & Sons, Ltd.

  6. The chemokine receptor CCR1 is identified in mast cell-derived exosomes

    PubMed Central

    Liang, Yuting; Qiao, Longwei; Peng, Xia; Cui, Zelin; Yin, Yue; Liao, Huanjin; Jiang, Min; Li, Li

    2018-01-01

    Mast cells are important effector cells of the immune system, and mast cell-derived exosomes carrying RNAs play a role in immune regulation. However, the molecular function of mast cell-derived exosomes is currently unknown, and here, we identify differentially expressed genes (DEGs) in mast cells and exosomes. We isolated mast cell-derived exosomes through differential centrifugation and screened the DEGs from mast cell-derived exosomes, using the GSE25330 array dataset downloaded from the Gene Expression Omnibus database. Biochemical pathways were analyzed by Gene Ontology (GO) annotation and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway on the online tool DAVID. DEGs-associated protein-protein interaction networks (PPIs) were constructed using the STRING database and Cytoscape software. The genes identified from these bioinformatics analyses were verified by qRT-PCR and Western blot in mast cells and exosomes. We identified 2121 DEGs (843 up and 1278 down-regulated genes) in HMC-1 cell-derived exosomes and HMC-1 cells. The up-regulated DEGs were classified into two significant modules. The chemokine receptor CCR1 was screened as a hub gene and enriched in cytokine-mediated signaling pathway in module one. Seven genes, including CCR1, CD9, KIT, TGFBR1, TLR9, TPSAB1 and TPSB2 were screened and validated through qRT-PCR analysis. We have achieved a comprehensive view of the pivotal genes and pathways in mast cells and exosomes and identified CCR1 as a hub gene in mast cell-derived exosomes. Our results provide novel clues with respect to the biological processes through which mast cell-derived exosomes modulate immune responses. PMID:29511430

  7. The Importance of Biological Databases in Biological Discovery.

    PubMed

    Baxevanis, Andreas D; Bateman, Alex

    2015-06-19

    Biological databases play a central role in bioinformatics. They offer scientists the opportunity to access a wide variety of biologically relevant data, including the genomic sequences of an increasingly broad range of organisms. This unit provides a brief overview of major sequence databases and portals, such as GenBank, the UCSC Genome Browser, and Ensembl. Model organism databases, including WormBase, The Arabidopsis Information Resource (TAIR), and those made available through the Mouse Genome Informatics (MGI) resource, are also covered. Non-sequence-centric databases, such as Online Mendelian Inheritance in Man (OMIM), the Protein Data Bank (PDB), MetaCyc, and the Kyoto Encyclopedia of Genes and Genomes (KEGG), are also discussed. Copyright © 2015 John Wiley & Sons, Inc.

  8. SPARQLGraph: a web-based platform for graphically querying biological Semantic Web databases.

    PubMed

    Schweiger, Dominik; Trajanoski, Zlatko; Pabinger, Stephan

    2014-08-15

    Semantic Web has established itself as a framework for using and sharing data across applications and database boundaries. Here, we present a web-based platform for querying biological Semantic Web databases in a graphical way. SPARQLGraph offers an intuitive drag & drop query builder, which converts the visual graph into a query and executes it on a public endpoint. The tool integrates several publicly available Semantic Web databases, including the databases of the just recently released EBI RDF platform. Furthermore, it provides several predefined template queries for answering biological questions. Users can easily create and save new query graphs, which can also be shared with other researchers. This new graphical way of creating queries for biological Semantic Web databases considerably facilitates usability as it removes the requirement of knowing specific query languages and database structures. The system is freely available at http://sparqlgraph.i-med.ac.at.

  9. Pathway modeling of microarray data: A case study of pathway activity changes in the testis following in utero exposure to dibutyl phthalate (DBP)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ovacik, Meric A.; Sen, Banalata; Euling, Susan Y.

    Pathway activity level analysis, the approach pursued in this study, focuses on all genes that are known to be members of metabolic and signaling pathways as defined by the KEGG database. The pathway activity level analysis entails singular value decomposition (SVD) of the expression data of the genes constituting a given pathway. We explore an extension of the pathway activity methodology for application to time-course microarray data. We show that pathway analysis enhances our ability to detect biologically relevant changes in pathway activity using synthetic data. As a case study, we apply the pathway activity level formulation coupled with significance analysis to microarray data from two different rat testes exposed in utero to Dibutyl Phthalate (DBP). In utero DBP exposure in the rat results in developmental toxicity of a number of male reproductive organs, including the testes. One well-characterized mode of action for DBP and the male reproductive developmental effects is the repression of expression of genes involved in cholesterol transport, steroid biosynthesis and testosterone synthesis that lead to a decreased fetal testicular testosterone. Previous analyses of DBP testes microarray data focused on either individual gene expression changes or changes in the expression of specific genes that are hypothesized, or known, to be important in testicular development and testosterone synthesis. However, a pathway analysis may inform whether there are additional affected pathways that could inform additional modes of action linked to DBP developmental toxicity. We show that pathway activity analysis may be considered for a more comprehensive analysis of microarray data.
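
    The SVD step described above can be illustrated with a short Python sketch using NumPy. It is a minimal example under stated assumptions, not the authors' implementation: the function name, the per-gene standardization and the sign convention are choices made here for illustration, and the significance analysis mentioned in the abstract is not covered.

```python
import numpy as np

def pathway_activity(expression):
    """Summarize one pathway's expression matrix by its leading SVD component.

    expression: array of shape (genes, samples) restricted to the genes of a
    single pathway. Returns (activity, explained), where `activity` has one
    value per sample and `explained` is the fraction of variance captured by
    the first singular value. Standardization and sign handling are
    assumptions of this sketch.
    """
    x = np.asarray(expression, dtype=float)
    # Standardize each gene across samples so no single gene dominates.
    centered = x - x.mean(axis=1, keepdims=True)
    scale = centered.std(axis=1, keepdims=True)
    scale[scale == 0] = 1.0
    z = centered / scale
    # Thin SVD: rows of vt are sample-space patterns ordered by singular value.
    _, s, vt = np.linalg.svd(z, full_matrices=False)
    activity = vt[0]
    # The sign of a singular vector is arbitrary; align it with the mean profile.
    if np.dot(activity, z.mean(axis=0)) < 0:
        activity = -activity
    total = float(np.sum(s ** 2))
    explained = float(s[0] ** 2 / total) if total > 0 else 0.0
    return activity, explained
```

    For time-course data, the returned activity vector can be read as the pathway's dominant temporal profile, and the explained fraction indicates how well that single profile represents the pathway's genes.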

  10. PAMDB: a comprehensive Pseudomonas aeruginosa metabolome database.

    PubMed

    Huang, Weiliang; Brewer, Luke K; Jones, Jace W; Nguyen, Angela T; Marcu, Ana; Wishart, David S; Oglesby-Sherrouse, Amanda G; Kane, Maureen A; Wilks, Angela

    2018-01-04

    The Pseudomonas aeruginosa Metabolome Database (PAMDB, http://pseudomonas.umaryland.edu) is a searchable, richly annotated metabolite database specific to P. aeruginosa. P. aeruginosa is a soil organism and significant opportunistic pathogen that adapts to its environment through a versatile energy metabolism network. Furthermore, P. aeruginosa is a model organism for the study of biofilm formation, quorum sensing, and bioremediation processes, each of which is dependent on unique pathways and metabolites. The PAMDB is modelled on the Escherichia coli (ECMDB), yeast (YMDB) and human (HMDB) metabolome databases and contains >4370 metabolites and 938 pathways with links to over 1260 genes and proteins. The database information was compiled from electronic databases, journal articles and mass spectrometry (MS) metabolomic data obtained in our laboratories. For each metabolite entered, we provide detailed compound descriptions, names and synonyms, structural and physicochemical information, nuclear magnetic resonance (NMR) and MS spectra, enzymes and pathway information, as well as gene and protein sequences. The database allows extensive searching via chemical names, structure and molecular weight, together with gene, protein and pathway relationships. The PAMDB and its future iterations will provide a valuable resource to biologists, natural product chemists and clinicians in identifying active compounds, potential biomarkers and clinical diagnostics. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  11. An Integrated Human/Murine Transcriptome and Pathway Approach To Identify Prenatal Treatments For Down Syndrome.

    PubMed

    Guedj, Faycal; Pennings, Jeroen LA; Massingham, Lauren J; Wick, Heather C; Siegel, Ashley E; Tantravahi, Umadevi; Bianchi, Diana W

    2016-09-02

    Anatomical and functional brain abnormalities begin during fetal life in Down syndrome (DS). We hypothesize that novel prenatal treatments can be identified by targeting signaling pathways that are consistently perturbed in cell types/tissues obtained from human fetuses with DS and mouse embryos. We analyzed transcriptome data from fetuses with trisomy 21, age and sex-matched euploid controls, and embryonic day 15.5 forebrains from Ts1Cje, Ts65Dn, and Dp16 mice. The new datasets were compared to other publicly available datasets from humans with DS. We used the human Connectivity Map (CMap) database and created a murine adaptation to identify FDA-approved drugs that can rescue affected pathways. USP16 and TTC3 were dysregulated in all affected human cells and two mouse models. DS-associated pathway abnormalities were either the result of gene dosage specific effects or the consequence of a global cell stress response with activation of compensatory mechanisms. CMap analyses identified 56 molecules with high predictive scores to rescue abnormal gene expression in both species. Our novel integrated human/murine systems biology approach identified commonly dysregulated genes and pathways. This can help to prioritize therapeutic molecules on which to further test safety and efficacy. Additional studies in human cells are ongoing prior to pre-clinical prenatal treatment in mice.

  12. Challenges in horizontal model integration.

    PubMed

    Kolczyk, Katrin; Conradi, Carsten

    2016-03-11

    Systems Biology has motivated dynamic models of important intracellular processes at the pathway level, for example, in signal transduction and cell cycle control. To answer important biomedical questions, however, one has to go beyond the study of isolated pathways towards the joint study of interacting signaling pathways or the joint study of signal transduction and cell cycle control. Thereby the reuse of established models is preferable, as it will generally reduce the modeling effort and increase the acceptance of the combined model in the field. Obtaining a combined model can be challenging, especially if the submodels are large and/or come from different working groups (as is generally the case, when models stored in established repositories are used). To support this task, we describe a semi-automatic workflow based on established software tools. In particular, two frequent challenges are described: identification of the overlap and subsequent (re)parameterization of the integrated model. The reparameterization step is crucial, if the goal is to obtain a model that can reproduce the data explained by the individual models. For demonstration purposes we apply our workflow to integrate two signaling pathways (EGF and NGF) from the BioModels Database.

  13. Determining conserved metabolic biomarkers from a million database queries.

    PubMed

    Kurczy, Michael E; Ivanisevic, Julijana; Johnson, Caroline H; Uritboonthai, Winnie; Hoang, Linh; Fang, Mingliang; Hicks, Matthew; Aldebot, Anthony; Rinehart, Duane; Mellander, Lisa J; Tautenhahn, Ralf; Patti, Gary J; Spilker, Mary E; Benton, H Paul; Siuzdak, Gary

    2015-12-01

    Metabolite databases provide a unique window into metabolome research, allowing the most commonly searched biomarkers to be catalogued. Omic-scale metabolite profiling, or metabolomics, is finding increased utility in biomarker discovery, largely driven by improvements in analytical technologies and the concurrent developments in bioinformatics. However, the successful translation of biomarkers into clinical or biologically relevant indicators is limited. With the aim of improving the discovery of translatable metabolite biomarkers, we present search analytics for over one million METLIN metabolite database queries. The most common metabolites found in METLIN were cross-correlated against XCMS Online, the widely used cloud-based data processing and pathway analysis platform. Analysis of the METLIN and XCMS common metabolite data has two primary implications: these metabolites might indicate a conserved metabolic response to stressors, and these data may be used to gauge the relative uniqueness of potential biomarkers. METLIN can be accessed at https://metlin.scripps.edu. Contact: siuzdak@scripps.edu. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved.

  14. Systematic analysis of signaling pathways using an integrative environment.

    PubMed

    Visvanathan, Mahesh; Breit, Marc; Pfeifer, Bernhard; Baumgartner, Christian; Modre-Osprian, Robert; Tilg, Bernhard

    2007-01-01

    Understanding the biological processes of signaling pathways as a whole system requires an integrative software environment that has comprehensive capabilities. The environment should include tools for pathway design, visualization, simulation and a knowledge base concerning signaling pathways as one. In this paper we introduce a new integrative environment for the systematic analysis of signaling pathways. This system includes environments for pathway design, visualization, simulation and a knowledge base that combines biological and modeling information concerning signaling pathways that provides the basic understanding of the biological system, its structure and functioning. The system is designed with a client-server architecture. It contains a pathway designing environment and a simulation environment as upper layers with a relational knowledge base as the underlying layer. The TNFa-mediated NF-kB signal transduction pathway model was designed and tested using our integrative framework. It was also useful to define the structure of the knowledge base. Sensitivity analysis of this specific pathway was performed providing simulation data. Then the model was extended showing promising initial results. The proposed system offers a holistic view of pathways containing biological and modeling data. It will help us to perform biological interpretation of the simulation results and thus contribute to a better understanding of the biological system for drug identification.

  15. Gene Network Rewiring to Study Melanoma Stage Progression and Elements Essential for Driving Melanoma

    PubMed Central

    Kaushik, Abhinav; Bhatia, Yashuma; Ali, Shakir; Gupta, Dinesh

    2015-01-01

    Metastatic melanoma patients have a poor prognosis, mainly attributable to the underlying heterogeneity in melanoma driver genes and altered gene expression profiles. These characteristics of melanoma also make the development of drugs and identification of novel drug targets for metastatic melanoma a daunting task. Systems biology offers an alternative approach to re-explore the genes or gene sets that display dysregulated behaviour without being differentially expressed. In this study, we have performed systems biology studies to enhance our knowledge about the conserved property of disease genes or gene sets among mutually exclusive datasets representing melanoma progression. We meta-analysed 642 microarray samples to generate melanoma reconstructed networks representing four different stages of melanoma progression to extract genes with altered molecular circuitry wiring as compared to a normal cellular state. Intriguingly, a majority of the melanoma network-rewired genes are not differentially expressed and the disease genes involved in melanoma progression consistently modulate its activity by rewiring network connections. We found that the shortlisted disease genes in the study show strong and abnormal network connectivity, which increases with disease progression. Moreover, the deviated network properties of the disease gene sets allow ranking/prioritization of different enriched, dysregulated and conserved pathway terms in metastatic melanoma, in agreement with previous findings. Our analysis also reveals the presence of distinct network hubs in different stages of metastasizing tumor for the same set of pathways in the statistically conserved gene sets. The study results are also presented as a freely available database at http://bioinfo.icgeb.res.in/m3db/. The web-based database resource consists of results from the analysis presented here, integrated with Cytoscape Web and user-friendly tools for visualization, retrieval and further analysis. PMID:26558755

  16. VisANT 3.0: new modules for pathway visualization, editing, prediction and construction.

    PubMed

    Hu, Zhenjun; Ng, David M; Yamada, Takuji; Chen, Chunnuan; Kawashima, Shuichi; Mellor, Joe; Linghu, Bolan; Kanehisa, Minoru; Stuart, Joshua M; DeLisi, Charles

    2007-07-01

    With the integration of the KEGG and Predictome databases as well as two search engines for coexpressed genes/proteins using data sets obtained from the Stanford Microarray Database (SMD) and Gene Expression Omnibus (GEO) database, VisANT 3.0 supports exploratory pathway analysis, which includes multi-scale visualization of multiple pathways, editing and annotating pathways using a KEGG compatible visual notation and visualization of expression data in the context of pathways. Expression levels are represented either by color intensity or by nodes with an embedded expression profile. Multiple experiments can be navigated or animated. Known KEGG pathways can be enriched by querying either coexpressed components of known pathway members or proteins with known physical interactions. Predicted pathways for genes/proteins with unknown functions can be inferred from coexpression or physical interaction data. Pathways produced in VisANT can be saved as computer-readable XML format (VisML), graphic images or high-resolution Scalable Vector Graphics (SVG). Pathways in the format of VisML can be securely shared within an interested group or published online using a simple Web link. VisANT is freely available at http://visant.bu.edu.

  17. Discovery of cashmere goat (Capra hircus) microRNAs in skin and hair follicles by Solexa sequencing.

    PubMed

    Yuan, Chao; Wang, Xiaolong; Geng, Rongqing; He, Xiaolin; Qu, Lei; Chen, Yulin

    2013-07-28

    MicroRNAs (miRNAs) are a large family of endogenous, non-coding RNAs, about 22 nucleotides long, which regulate gene expression through sequence-specific base pairing with target mRNAs. Extensive studies have shown that miRNA expression in the skin changes remarkably during distinct stages of the hair cycle in humans, mice, goats and sheep. In this study, the skin tissues were harvested from the three stages of hair follicle cycling (anagen, catagen and telogen) in a fibre-producing goat breed. In total, 63,109,004 raw reads were obtained by Solexa sequencing and 61,125,752 clean reads remained for the small RNA digitalisation analysis. This resulted in the identification of 399 conserved miRNAs; among these, 326 miRNAs were expressed in all three follicular cycling stages, whereas 3, 12 and 11 miRNAs were specifically expressed in anagen, catagen, and telogen, respectively. We also identified 172 potential novel miRNAs by Mireap, 36 miRNAs were expressed in all three cycling stages, whereas 23, 29 and 44 miRNAs were specifically expressed in anagen, catagen, and telogen, respectively. The expression level of five arbitrarily selected miRNAs was analyzed by quantitative PCR, and the results indicated that the expression patterns were consistent with the Solexa sequencing results. Gene Ontology and KEGG pathway analyses indicated that five major biological pathways (Metabolic pathways, Pathways in cancer, MAPK signalling pathway, Endocytosis and Focal adhesion) accounted for 23.08% of target genes among 278 biological functions, indicating that these pathways are likely to play significant roles during hair cycling. During all hair cycle stages of cashmere goats, a large number of conserved and novel miRNAs were identified through a high-throughput sequencing approach. This study enriches the Capra hircus miRNA databases and provides a comprehensive miRNA transcriptome profile in the skin of goats during the hair follicle cycle.

  18. Biological Databases for Human Research

    PubMed Central

    Zou, Dong; Ma, Lina; Yu, Jun; Zhang, Zhang

    2015-01-01

    The completion of the Human Genome Project lays a foundation for systematically studying the human genome from evolutionary history to precision medicine against diseases. With the explosive growth of biological data, there is an increasing number of biological databases that have been developed in aid of human-related research. Here we present a collection of human-related biological databases and provide a mini-review by classifying them into different categories according to their data types. As human-related databases continue to grow not only in count but also in volume, challenges are ahead in big data storage, processing, exchange and curation. PMID:25712261

  19. Identifying relevant data for a biological database: handcrafted rules versus machine learning.

    PubMed

    Sehgal, Aditya Kumar; Das, Sanmay; Noto, Keith; Saier, Milton H; Elkan, Charles

    2011-01-01

    With well over 1,000 specialized biological databases in use today, the task of automatically identifying novel, relevant data for such databases is increasingly important. In this paper, we describe practical machine learning approaches for identifying MEDLINE documents and Swiss-Prot/TrEMBL protein records, for incorporation into a specialized biological database of transport proteins named TCDB. We show that both learning approaches outperform rules created by hand by a human expert. As one of the first case studies involving two different approaches to updating a deployed database, both the methods compared and the results will be of interest to curators of many specialized databases.

  20. GeneAnalytics: An Integrative Gene Set Analysis Tool for Next Generation Sequencing, RNAseq and Microarray Data.

    PubMed

    Ben-Ari Fuchs, Shani; Lieder, Iris; Stelzer, Gil; Mazor, Yaron; Buzhor, Ella; Kaplan, Sergey; Bogoch, Yoel; Plaschkes, Inbar; Shitrit, Alina; Rappaport, Noa; Kohn, Asher; Edgar, Ron; Shenhav, Liraz; Safran, Marilyn; Lancet, Doron; Guan-Golan, Yaron; Warshawsky, David; Shtrichman, Ronit

    2016-03-01

    Postgenomics data are produced in large volumes by life sciences and clinical applications of novel omics diagnostics and therapeutics for precision medicine. To move from "data-to-knowledge-to-innovation," a crucial missing step in the current era is, however, our limited understanding of biological and clinical contexts associated with data. Prominent among the emerging remedies to this challenge are the gene set enrichment tools. This study reports on GeneAnalytics™ (geneanalytics.genecards.org), a comprehensive and easy-to-apply gene set analysis tool for rapid contextualization of expression patterns and functional signatures embedded in the postgenomics Big Data domains, such as Next Generation Sequencing (NGS), RNAseq, and microarray experiments. GeneAnalytics' differentiating features include in-depth evidence-based scoring algorithms, an intuitive user interface and proprietary unified data. GeneAnalytics employs the LifeMap Science's GeneCards suite, including GeneCards® (the human gene database), MalaCards (the human diseases database), and PathCards (the biological pathways database). Expression-based analysis in GeneAnalytics relies on LifeMap Discovery® (the embryonic development and stem cells database), which includes manually curated expression data for normal and diseased tissues, enabling an advanced matching algorithm for gene-tissue association. This assists in evaluating differentiation protocols and discovering biomarkers for tissues and cells. Results are directly linked to gene, disease, or cell "cards" in the GeneCards suite. Future developments aim to enhance the GeneAnalytics algorithm as well as visualizations, employing varied graphical display items. Such attributes make GeneAnalytics a broadly applicable postgenomics data analyses and interpretation tool for translation of data to knowledge-based innovation in various Big Data fields such as precision medicine, ecogenomics, nutrigenomics, pharmacogenomics, vaccinomics, and others yet to emerge on the postgenomics horizon.
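
    The scoring algorithms of GeneAnalytics are described only as proprietary, so the following is not that tool's method. It is a minimal sketch of the generic over-representation test that many gene set enrichment tools build on: a one-sided hypergeometric test asking whether a query gene list overlaps an annotated gene set more than chance would predict. All function names, gene identifiers and set names are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(universe_size, set_size, query_size, overlap):
    """Return P(X >= overlap) for a hypergeometric draw.

    universe_size: number of genes that could have been selected
    set_size:      number of those genes annotated to the pathway/gene set
    query_size:    number of genes in the query list
    overlap:       number of query genes that fall in the gene set
    """
    max_overlap = min(set_size, query_size)
    total = comb(universe_size, query_size)
    tail = sum(
        comb(set_size, k) * comb(universe_size - set_size, query_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


def enrich(query_genes, gene_sets, universe):
    """Rank gene sets by enrichment p-value (smallest first)."""
    universe = set(universe)
    query = set(query_genes) & universe
    results = []
    for name, members in gene_sets.items():
        members = set(members) & universe
        overlap = len(query & members)
        p = hypergeom_enrichment_p(len(universe), len(members), len(query), overlap)
        results.append((name, overlap, p))
    return sorted(results, key=lambda r: r[2])


if __name__ == "__main__":
    # Toy data: 20 hypothetical genes and two hypothetical gene sets.
    universe = [f"g{i}" for i in range(20)]
    gene_sets = {
        "set_A": [f"g{i}" for i in range(0, 5)],
        "set_B": [f"g{i}" for i in range(10, 15)],
    }
    for name, overlap, p in enrich(["g0", "g1", "g2", "g3"], gene_sets, universe):
        print(name, overlap, p)
```

    In practice the p-values from many gene sets would also need a multiple-testing correction, which this sketch omits.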

  1. Constructing biological pathway models with hybrid functional Petri nets.

    PubMed

    Doi, Atsushi; Fujita, Sachie; Matsuno, Hiroshi; Nagasaki, Masao; Miyano, Satoru

    2004-01-01

    In many research projects on modeling and analyzing biological pathways, the Petri net has been recognized as a promising method for representing biological pathways. Since the pioneering works by Reddy et al., 1993, and Hofestädt, 1994, that model metabolic pathways by traditional Petri nets, several enhanced Petri nets such as the colored Petri net, stochastic Petri net, and hybrid Petri net have been used for modeling biological phenomena. Recently, Matsuno et al., 2003b, introduced the hybrid functional Petri net (HFPN) in order to give a more intuitive and natural modeling method for biological pathways than these existing Petri nets. Although that paper demonstrates the effectiveness of HFPN with two examples, a gene regulation mechanism for circadian rhythms and an apoptosis signaling pathway, there has been no detailed explanation of the method of HFPN construction for these examples. The purpose of this paper is to describe a method to construct biological pathways with the HFPN step by step. The method is demonstrated by the well-known glycolytic pathway controlled by the lac operon gene regulatory mechanism.
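
    A minimal sketch of the token-firing rule that underlies Petri net pathway models. It implements only a plain discrete Petri net, not the hybrid functional Petri net (HFPN) of the paper, which additionally allows continuous places and function-valued firing speeds. The function names, the dictionary layout and the toy hexokinase step (glucose + ATP -> G6P + ADP) are illustrative assumptions and are not taken from the paper.

```python
def enabled(marking, transition):
    """A transition is enabled when every input place holds enough tokens."""
    return all(marking.get(place, 0) >= n for place, n in transition["inputs"].items())


def fire(marking, transition):
    """Return the new marking after firing; the old marking is left unchanged."""
    if not enabled(marking, transition):
        raise ValueError("transition is not enabled")
    new_marking = dict(marking)
    for place, n in transition["inputs"].items():
        new_marking[place] = new_marking.get(place, 0) - n  # consume tokens
    for place, n in transition["outputs"].items():
        new_marking[place] = new_marking.get(place, 0) + n  # produce tokens
    return new_marking


if __name__ == "__main__":
    # Hypothetical one-reaction net: places are metabolites, tokens are molecules.
    hexokinase = {
        "inputs": {"glucose": 1, "ATP": 1},
        "outputs": {"G6P": 1, "ADP": 1},
    }
    marking = {"glucose": 2, "ATP": 1}
    marking = fire(marking, hexokinase)
    print(marking)
    print("can fire again:", enabled(marking, hexokinase))
```

    Keeping the marking immutable per step makes it easy to record a trajectory of states, which is how a simulation trace of a pathway would typically be inspected.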

  2. Constructing biological pathway models with hybrid functional petri nets.

    PubMed

    Doi, Atsushi; Fujita, Sachie; Matsuno, Hiroshi; Nagasaki, Masao; Miyano, Satoru

    2011-01-01

    In many research projects on modeling and analyzing biological pathways, the Petri net has been recognized as a promising method for representing biological pathways. Since the pioneering works by Reddy et al., 1993, and Hofestädt, 1994, that model metabolic pathways by traditional Petri nets, several enhanced Petri nets such as the colored Petri net, stochastic Petri net, and hybrid Petri net have been used for modeling biological phenomena. Recently, Matsuno et al., 2003b, introduced the hybrid functional Petri net (HFPN) in order to give a more intuitive and natural modeling method for biological pathways than these existing Petri nets. Although that paper demonstrates the effectiveness of HFPN with two examples, a gene regulation mechanism for circadian rhythms and an apoptosis signaling pathway, there has been no detailed explanation of the method of HFPN construction for these examples. The purpose of this paper is to describe a method to construct biological pathways with the HFPN step by step. The method is demonstrated by the well-known glycolytic pathway controlled by the lac operon gene regulatory mechanism.

  3. An attempt to understand glioma stem cell biology through centrality analysis of a protein interaction network.

    PubMed

    Mallik, Mrinmay Kumar

    2018-02-07

    Biological networks can be analyzed using "Centrality Analysis" to identify the more influential nodes and interactions in the network. This study was undertaken to create and visualize a biological network comprising protein-protein interactions (PPIs) amongst proteins which are preferentially over-expressed in the glioma cancer stem cell component (GCSC) of glioblastomas as compared to the glioma non-stem cancer cell (GNSC) component and then to analyze this network through centrality analyses (CA) in order to identify the essential proteins in this network and their interactions. In addition, this study proposes a new centrality analysis method pertaining exclusively to transcription factors (TFs) and interactions amongst them. Moreover, the relevant molecular functions, biological processes and biochemical pathways amongst these proteins were sought through enrichment analysis. A protein interaction network was created using a list of proteins which have been shown to be preferentially expressed or over-expressed in GCSCs isolated from glioblastomas as compared to the GNSCs. This list, comprising 38 proteins and created using manual literature mining, was submitted to the Reactome FIViz tool, a web-based application integrated into Cytoscape, an open source software platform for visualizing and analyzing molecular interaction networks and biological pathways, to produce the network. This network was subjected to centrality analyses utilizing ranked lists of six centrality measures using the FIViz application and (for the first time) a dedicated centrality analysis plug-in, CytoNCA. The interactions exclusively amongst the transcription factors were analyzed through a newly proposed centrality analysis method called "Gene Expression Associated Degree Centrality Analysis (GEADCA)". Enrichment analysis was performed using the "network function analysis" tool on Reactome. The CA was able to identify a small set of proteins with consistently high centrality ranks that is indicative of their strong influence in the protein-protein interaction network. Similarly, the newly proposed GEADCA helped identify the transcription factors with high centrality values indicative of their key roles in transcriptional regulation. The enrichment studies provided a list of molecular functions, biological processes and biochemical pathways associated with the constructed network. The study shows how pathway-based databases may be used to create and analyze a relevant protein interaction network in glioma cancer stem cells and identify the essential elements within it to gather insights into the molecular interactions that regulate the properties of glioma stem cells. How these insights may be utilized to help the development of future research towards formulation of new management strategies has been discussed from a theoretical standpoint. Copyright © 2017 Elsevier Ltd. All rights reserved.
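
    The abstract does not name the six centrality measures used, and the GEADCA method is not specified here, so the following is only a generic sketch of two widely used measures, degree centrality and betweenness centrality, on an unweighted, undirected interaction network. Betweenness is computed with Brandes' algorithm. The function names and the protein labels in the example are hypothetical; this is not the FIViz or CytoNCA implementation.

```python
from collections import deque


def build_adjacency(edges):
    """Build an undirected adjacency map from (node, node) interaction pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree_centrality(adj):
    """Fraction of the other nodes each node is directly connected to."""
    n = len(adj)
    if n <= 1:
        return {v: 0.0 for v in adj}
    return {v: len(neigh) / (n - 1) for v, neigh in adj.items()}


def betweenness_centrality(adj):
    """Unnormalized betweenness for an unweighted, undirected graph (Brandes)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        order = []                      # nodes in order of non-decreasing distance
        preds = {v: [] for v in adj}    # predecessors on shortest paths from s
        sigma = {v: 0 for v in adj}     # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: b / 2 for v, b in bc.items()}


if __name__ == "__main__":
    # Hypothetical interactions; the labels are placeholders, not real proteins.
    edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P4", "P5")]
    adj = build_adjacency(edges)
    ranked = sorted(betweenness_centrality(adj).items(), key=lambda kv: -kv[1])
    print(ranked)
    print(degree_centrality(adj))
```

    Ranking nodes by several measures and keeping those that score consistently high mirrors, in spirit, the "ranked lists" approach the abstract describes.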

  4. Integrated Analysis of Mutation Data from Various Sources Identifies Key Genes and Signaling Pathways in Hepatocellular Carcinoma

    PubMed Central

    Wei, Lin; Tang, Ruqi; Lian, Baofeng; Zhao, Yingjun; He, Xianghuo; Xie, Lu

    2014-01-01

    Background: Recently, a number of studies have performed genome or exome sequencing of hepatocellular carcinoma (HCC) and identified hundreds or even thousands of mutations in protein-coding genes. However, these studies have only focused on a limited number of candidate genes, and many important mutation resources remain to be explored. Principal Findings: In this study, we integrated mutation data obtained from various sources and performed pathway and network analysis. We identified 113 pathways that were significantly mutated in HCC samples and found that the mutated genes included in these pathways contained high percentages of known cancer genes, and damaging genes and also demonstrated high conservation scores, indicating their important roles in liver tumorigenesis. Five classes of pathways that were mutated most frequently included (a) proliferation and apoptosis related pathways, (b) tumor microenvironment related pathways, (c) neural signaling related pathways, (d) metabolic related pathways, and (e) circadian related pathways. Network analysis further revealed that the mutated genes with the highest betweenness coefficients, such as the well-known cancer genes TP53, CTNNB1 and recently identified novel mutated genes GNAL and the ADCY family, may play key roles in these significantly mutated pathways. Finally, we highlight several key genes (e.g., RPS6KA3 and PCLO) and pathways (e.g., axon guidance) in which the mutations were associated with clinical features. Conclusions: Our workflow illustrates the increased statistical power of integrating multiple studies of the same subject, which can provide biological insights that would otherwise be masked under individual sample sets. This type of bioinformatics approach is consistent with the necessity of making the best use of the ever increasing data provided in valuable databases, such as TCGA, to enhance the speed of deciphering human cancers. PMID:24988079

  5. Integrated analysis of mutation data from various sources identifies key genes and signaling pathways in hepatocellular carcinoma.

    PubMed

    Zhang, Yuannv; Qiu, Zhaoping; Wei, Lin; Tang, Ruqi; Lian, Baofeng; Zhao, Yingjun; He, Xianghuo; Xie, Lu

    2014-01-01

    Recently, a number of studies have performed genome or exome sequencing of hepatocellular carcinoma (HCC) and identified hundreds or even thousands of mutations in protein-coding genes. However, these studies have only focused on a limited number of candidate genes, and many important mutation resources remain to be explored. In this study, we integrated mutation data obtained from various sources and performed pathway and network analysis. We identified 113 pathways that were significantly mutated in HCC samples and found that the mutated genes included in these pathways contained high percentages of known cancer genes, and damaging genes and also demonstrated high conservation scores, indicating their important roles in liver tumorigenesis. Five classes of pathways that were mutated most frequently included (a) proliferation and apoptosis related pathways, (b) tumor microenvironment related pathways, (c) neural signaling related pathways, (d) metabolic related pathways, and (e) circadian related pathways. Network analysis further revealed that the mutated genes with the highest betweenness coefficients, such as the well-known cancer genes TP53, CTNNB1 and recently identified novel mutated genes GNAL and the ADCY family, may play key roles in these significantly mutated pathways. Finally, we highlight several key genes (e.g., RPS6KA3 and PCLO) and pathways (e.g., axon guidance) in which the mutations were associated with clinical features. Our workflow illustrates the increased statistical power of integrating multiple studies of the same subject, which can provide biological insights that would otherwise be masked under individual sample sets. This type of bioinformatics approach is consistent with the necessity of making the best use of the ever increasing data provided in valuable databases, such as TCGA, to enhance the speed of deciphering human cancers.

  6. MetaMapR: pathway independent metabolomic network analysis incorporating unknowns.

    PubMed

    Grapov, Dmitry; Wanichthanarak, Kwanjeera; Fiehn, Oliver

    2015-08-15

    Metabolic network mapping is a widely used approach for integration of metabolomic experimental results with biological domain knowledge. However, current approaches can be limited by biochemical domain or pathway knowledge, which results in sparse disconnected graphs for real world metabolomic experiments. MetaMapR integrates enzymatic transformations with metabolite structural similarity, mass spectral similarity and empirical associations to generate richly connected metabolic networks. This open source, web-based or desktop software, written in the R programming language, leverages KEGG and PubChem databases to derive associations between metabolites even in cases where biochemical domain or molecular annotations are unknown. Network calculation is enhanced through an interface to the Chemical Translation System, which allows metabolite identifier translation between >200 common biochemical databases. Analysis results are presented as interactive visualizations or can be exported as high-quality graphics and numerical tables which can be imported into common network analysis and visualization tools. Freely available at http://dgrapov.github.io/MetaMapR/, together with installation instructions, tutorials and application examples. Requires R and a modern web browser. Contact: ofiehn@ucdavis.edu. © The Author 2015. Published by Oxford University Press. All rights reserved.

  7. The Biological Macromolecule Crystallization Database and NASA Protein Crystal Growth Archive

    PubMed Central

    Gilliland, Gary L.; Tung, Michael; Ladner, Jane

    1996-01-01

    The NIST/NASA/CARB Biological Macromolecule Crystallization Database (BMCD), NIST Standard Reference Database 21, contains crystal data and crystallization conditions for biological macromolecules. The database entries include data abstracted from published crystallographic reports. Each entry consists of information describing the biological macromolecule crystallized and crystal data and the crystallization conditions for each crystal form. The BMCD serves as the NASA Protein Crystal Growth Archive in that it contains protocols and results of crystallization experiments undertaken in microgravity (space). These database entries report the results, whether successful or not, from NASA-sponsored protein crystal growth experiments in microgravity and from microgravity crystallization studies sponsored by other international organizations. The BMCD was designed as a tool to assist x-ray crystallographers in the development of protocols to crystallize biological macromolecules, those that have previously been crystallized, and those that have not been crystallized. PMID:11542472

  8. Genome Expression Pathway Analysis Tool – Analysis and visualization of microarray gene expression data under genomic, proteomic and metabolic context

    PubMed Central

    Weniger, Markus; Engelmann, Julia C; Schultz, Jörg

    2007-01-01

    Background: Regulation of gene expression is relevant to many areas of biology and medicine, in the study of treatments, diseases, and developmental stages. Microarrays can be used to measure the expression level of thousands of mRNAs at the same time, allowing insight into or comparison of different cellular conditions. The data derived from microarray experiments are high-dimensional and often noisy, and interpretation of the results can become intricate. Although programs for the statistical analysis of microarray data exist, most of them lack an integration of analysis results and biological interpretation. Results: We have developed GEPAT, Genome Expression Pathway Analysis Tool, offering an analysis of gene expression data under genomic, proteomic and metabolic context. We provide an integration of statistical methods for data import and data analysis together with a biological interpretation for subsets of probes or single probes on the chip. GEPAT imports various types of oligonucleotide and cDNA array data formats. Different normalization methods can be applied to the data; afterwards, data annotation is performed. After import, GEPAT offers various statistical data analysis methods, such as hierarchical, k-means and PCA clustering, a linear model based t-test or chromosomal profile comparison. The results of the analysis can be interpreted by enrichment of biological terms, pathway analysis or interaction networks. Different biological databases are included, to give various information for each probe on the chip. GEPAT offers no linear work flow, but allows the usage of any subset of probes and samples as a start for a new data analysis. GEPAT relies on established data analysis packages, offers a modular approach for an easy extension, and can be run on a computer grid to allow a large number of users. It is freely available under the LGPL open source license for academic and commercial users at . Conclusion: GEPAT is a modular, scalable and professional-grade software integrating analysis and interpretation of microarray gene expression data. An installation available for academic users can be found at . PMID:17543125

  9. Integrated metabolomics and proteomics highlight altered nicotinamide and polyamine pathways in lung adenocarcinoma

    PubMed Central

    Fahrmann, Johannes F.; Grapov, Dmitry; Wanichthanarak, Kwanjeera; DeFelice, Brian C.; Salemi, Michelle R.; Rom, William N.; Gandara, David R.; Phinney, Brett S.; Fiehn, Oliver; Pass, Harvey

    2017-01-01

    Lung cancer is the leading cause of cancer mortality in the United States with non-small cell lung cancer adenocarcinoma being the most common histological type. Early perturbations in cellular metabolism are a hallmark of cancer, but the extent of these changes in early stage lung adenocarcinoma remains largely unknown. In the current study, an integrated metabolomics and proteomics approach was utilized to characterize the biochemical and molecular alterations between malignant and matched control tissue from 27 subjects diagnosed with early stage lung adenocarcinoma. Differential analysis identified 71 metabolites and 1102 proteins that delineated tumor from control tissue. Integrated results indicated four major metabolic changes in early stage adenocarcinoma: (1) increased glycosylation and glutaminolysis; (2) elevated Nrf2 activation; (3) increase in nicotinic and nicotinamide salvaging pathways; and (4) elevated polyamine biosynthesis linked to differential regulation of the s-adenosylmethionine/nicotinamide methyl-donor pathway. Genomic data from publicly available databases were included to strengthen proteomic findings. Our findings provide insight into the biochemical and molecular biological reprogramming that may accompany early stage lung tumorigenesis and highlight potential therapeutic targets. PMID:28049629

  10. Analysis of the ergosterol biosynthesis pathway cloning, molecular characterization and phylogeny of lanosterol 14 α-demethylase (ERG11) gene of Moniliophthora perniciosa.

    PubMed

    de Oliveira Ceita, Geruza; Vilas-Boas, Laurival Antônio; Castilho, Marcelo Santos; Carazzolle, Marcelo Falsarella; Pirovani, Carlos Priminho; Selbach-Schnadelbach, Alessandra; Gramacho, Karina Peres; Ramos, Pablo Ivan Pereira; Barbosa, Luciana Veiga; Pereira, Gonçalo Amarante Guimarães; Góes-Neto, Aristóteles

    2014-10-01

    The phytopathogenic fungus Moniliophthora perniciosa (Stahel) Aime & Philips-Mora, causal agent of witches' broom disease of cocoa, causes countless damage to cocoa production in Brazil. Molecular studies have attempted to identify genes that play important roles in fungal survival and virulence. In this study, sequences deposited in the M. perniciosa Genome Sequencing Project database were analyzed to identify potential biological targets. For the first time, the ergosterol biosynthetic pathway in M. perniciosa was studied and the lanosterol 14α-demethylase gene (ERG11) that encodes the main enzyme of this pathway and is a target for fungicides was cloned, characterized molecularly and its phylogeny analyzed. ERG11 genomic DNA and cDNA were characterized and sequence analysis of the ERG11 protein identified highly conserved domains typical of this enzyme, such as SRS1, SRS4, EXXR and the heme-binding region (HBR). Comparison of the protein sequences and phylogenetic analysis revealed that the M. perniciosa enzyme was most closely related to that of Coprinopsis cinerea.

  11. Analysis of the ergosterol biosynthesis pathway cloning, molecular characterization and phylogeny of lanosterol 14 α-demethylase (ERG11) gene of Moniliophthora perniciosa

    PubMed Central

    de Oliveira Ceita, Geruza; Vilas-Boas, Laurival Antônio; Castilho, Marcelo Santos; Carazzolle, Marcelo Falsarella; Pirovani, Carlos Priminho; Selbach-Schnadelbach, Alessandra; Gramacho, Karina Peres; Ramos, Pablo Ivan Pereira; Barbosa, Luciana Veiga; Pereira, Gonçalo Amarante Guimarães; Góes-Neto, Aristóteles

    2014-01-01

    The phytopathogenic fungus Moniliophthora perniciosa (Stahel) Aime & Philips-Mora, causal agent of witches’ broom disease of cocoa, causes countless damage to cocoa production in Brazil. Molecular studies have attempted to identify genes that play important roles in fungal survival and virulence. In this study, sequences deposited in the M. perniciosa Genome Sequencing Project database were analyzed to identify potential biological targets. For the first time, the ergosterol biosynthetic pathway in M. perniciosa was studied and the lanosterol 14α-demethylase gene (ERG11) that encodes the main enzyme of this pathway and is a target for fungicides was cloned, characterized molecularly and its phylogeny analyzed. ERG11 genomic DNA and cDNA were characterized and sequence analysis of the ERG11 protein identified highly conserved domains typical of this enzyme, such as SRS1, SRS4, EXXR and the heme-binding region (HBR). Comparison of the protein sequences and phylogenetic analysis revealed that the M. perniciosa enzyme was most closely related to that of Coprinopsis cinerea. PMID:25505843

  12. Effect of curcumin on aged Drosophila melanogaster: a pathway prediction analysis.

    PubMed

    Zhang, Zhi-guo; Niu, Xu-yan; Lu, Ai-ping; Xiao, Gary Guishan

    2015-02-01

    To re-analyze published data in order to explore plausible biological pathways that can explain the anti-aging effect of curcumin. Microarray data generated in another study investigating the effect of curcumin on extending the lifespan of Drosophila melanogaster were further used for pathway prediction analysis. The differentially expressed genes were identified using GeneSpring GX with a criterion of 3.0-fold change. Two Cytoscape plugins, BisoGenet and molecular complex detection (MCODE), were used to establish the protein-protein interaction (PPI) network based upon differential genes in order to detect highly connected regions. The functional annotation clustering tool of the Database for Annotation, Visualization and Integrated Discovery (DAVID) was used for pathway analysis. A total of 87 genes differentially expressed in D. melanogaster treated with curcumin were identified, among which 50 were significantly up-regulated and 37 were markedly down-regulated. Based upon these differential genes, a PPI network was constructed with 1,082 nodes and 2,412 edges. Five highly connected regions in the PPI network were detected by the MCODE algorithm, suggesting that the anti-aging effect of curcumin may be mediated through five different pathways including Notch signaling pathway, basal transcription factors, cell cycle regulation, ribosome, Wnt signaling pathway, and p53 pathway. Genes and their associated pathways in D. melanogaster treated with the anti-aging agent curcumin were identified using the PPI network and the MCODE algorithm, suggesting that curcumin may be developed as an alternative therapeutic medicine for treating aging-associated diseases.
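
    The 3.0-fold change criterion mentioned in this abstract can be illustrated with a minimal Python sketch. The gene names and expression values are invented for illustration, and the helper `classify_by_fold_change` is hypothetical; the normalization and statistics performed by GeneSpring GX are not reproduced here.

```python
def classify_by_fold_change(control, treated, threshold=3.0):
    """Split genes into up- and down-regulated lists by a fold-change cutoff.

    control, treated: dicts mapping gene name -> positive mean expression.
    """
    up, down = [], []
    # Only genes measured in both conditions can be compared.
    for gene in sorted(control.keys() & treated.keys()):
        ratio = treated[gene] / control[gene]
        if ratio >= threshold:
            up.append(gene)
        elif ratio <= 1.0 / threshold:
            down.append(gene)
    return up, down


# Hypothetical expression values, for illustration only.
control = {"geneA": 10.0, "geneB": 90.0, "geneC": 50.0}
treated = {"geneA": 45.0, "geneB": 20.0, "geneC": 60.0}
up, down = classify_by_fold_change(control, treated)
```

    Here geneA (4.5-fold increase) and geneB (4.5-fold decrease) pass the cutoff, while geneC (1.2-fold) does not.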

  13. Integrated genome-wide Alu methylation and transcriptome profiling analyses reveal novel epigenetic regulatory networks associated with autism spectrum disorder.

    PubMed

    Saeliw, Thanit; Tangsuwansri, Chayanin; Thongkorn, Surangrat; Chonchaiya, Weerasak; Suphapeetiporn, Kanya; Mutirangura, Apiwat; Tencomnao, Tewin; Hu, Valerie W; Sarachana, Tewarit

    2018-01-01

    Alu elements are a group of repetitive elements that can influence gene expression through CpG residues and transcription factor binding. Altered gene expression and methylation profiles have been reported in various tissues and cell lines from individuals with autism spectrum disorder (ASD). However, the role of Alu elements in ASD remains unclear. We thus investigated whether Alu elements are associated with altered gene expression profiles in ASD. We obtained five blood-based gene expression profiles from the Gene Expression Omnibus database and human Alu-inserted gene lists from the TranspoGene database. Differentially expressed genes (DEGs) in ASD were identified from each study and overlapped with the human Alu-inserted genes. The biological functions and networks of Alu-inserted DEGs were then predicted by Ingenuity Pathway Analysis (IPA). A combined bisulfite restriction analysis of lymphoblastoid cell lines (LCLs) derived from 36 ASD and 20 sex- and age-matched unaffected individuals was performed to assess the global DNA methylation levels within Alu elements, and the Alu expression levels were determined by quantitative RT-PCR. In ASD blood or blood-derived cells, 320 Alu-inserted genes were reproducibly differentially expressed. Biological function and pathway analysis showed that these genes were significantly associated with neurodevelopmental disorders and neurological functions involved in ASD etiology. Interestingly, estrogen receptor and androgen signaling pathways implicated in the sex bias of ASD, as well as IL-6 signaling and neuroinflammation signaling pathways, were also highlighted. Alu methylation was not significantly different between the ASD and sex- and age-matched control groups. However, significantly altered Alu methylation patterns were observed in ASD cases sub-grouped based on Autism Diagnostic Interview-Revised scores compared with matched controls. 
Quantitative RT-PCR analysis of Alu expression also showed significant differences between ASD subgroups. Interestingly, Alu expression was correlated with methylation status in one phenotypic ASD subgroup. Alu methylation and expression were altered in LCLs from ASD subgroups. Our findings highlight the association of Alu elements with gene dysregulation in ASD blood samples and warrant further investigation. Moreover, the classification of ASD individuals into subgroups based on phenotypes may be beneficial and could provide insights into the still unknown etiology and the underlying mechanisms of ASD.

  14. An Integrative Framework for Bayesian Variable Selection with Informative Priors for Identifying Genes and Pathways

    PubMed Central

    Ander, Bradley P.; Zhang, Xiaoshuai; Xue, Fuzhong; Sharp, Frank R.; Yang, Xiaowei

    2013-01-01

    The discovery of genetic or genomic markers plays a central role in the development of personalized medicine. A notable challenge exists when dealing with the high dimensionality of the data sets, as thousands of genes or millions of genetic variants are collected on a relatively small number of subjects. Traditional gene-wise selection methods using univariate analyses have difficulty incorporating correlational, structural, or functional structures amongst the molecular measures. For microarray gene expression data, we first summarize solutions in dealing with ‘large p, small n’ problems, and then propose an integrative Bayesian variable selection (iBVS) framework for simultaneously identifying causal or marker genes and regulatory pathways. A novel partial least squares (PLS) g-prior for iBVS is developed to allow the incorporation of prior knowledge on gene-gene interactions or functional relationships. From the point of view of systems biology, iBVS enables users to directly target the joint effects of multiple genes and pathways in a hierarchical modeling diagram to predict disease status or phenotype. The estimated posterior selection probabilities offer probabilistic and biological interpretations. Both simulated data and a set of microarray data in predicting stroke status are used in validating the performance of iBVS in a Probit model with binary outcomes. iBVS offers a general framework for effective discovery of various molecular biomarkers by combining data-based statistics and knowledge-based priors. Guidelines on making posterior inferences, determining Bayesian significance levels, and improving computational efficiencies are also discussed. PMID:23844055

  15. An integrative framework for Bayesian variable selection with informative priors for identifying genes and pathways.

    PubMed

    Peng, Bin; Zhu, Dianwen; Ander, Bradley P; Zhang, Xiaoshuai; Xue, Fuzhong; Sharp, Frank R; Yang, Xiaowei

    2013-01-01

    The discovery of genetic or genomic markers plays a central role in the development of personalized medicine. A notable challenge exists when dealing with the high dimensionality of the data sets, as thousands of genes or millions of genetic variants are collected on a relatively small number of subjects. Traditional gene-wise selection methods using univariate analyses have difficulty incorporating correlational, structural, or functional structures amongst the molecular measures. For microarray gene expression data, we first summarize solutions in dealing with 'large p, small n' problems, and then propose an integrative Bayesian variable selection (iBVS) framework for simultaneously identifying causal or marker genes and regulatory pathways. A novel partial least squares (PLS) g-prior for iBVS is developed to allow the incorporation of prior knowledge on gene-gene interactions or functional relationships. From the point of view of systems biology, iBVS enables users to directly target the joint effects of multiple genes and pathways in a hierarchical modeling diagram to predict disease status or phenotype. The estimated posterior selection probabilities offer probabilistic and biological interpretations. Both simulated data and a set of microarray data in predicting stroke status are used in validating the performance of iBVS in a Probit model with binary outcomes. iBVS offers a general framework for effective discovery of various molecular biomarkers by combining data-based statistics and knowledge-based priors. Guidelines on making posterior inferences, determining Bayesian significance levels, and improving computational efficiencies are also discussed.

  16. Bioinformatics Analysis Reveals Distinct Molecular Characteristics of Hepatitis B-Related Hepatocellular Carcinomas from Very Early to Advanced Barcelona Clinic Liver Cancer Stages.

    PubMed

    Kong, Fan-Yun; Wei, Xiao; Zhou, Kai; Hu, Wei; Kou, Yan-Bo; You, Hong-Juan; Liu, Xiao-Mei; Zheng, Kui-Yang; Tang, Ren-Xian

    2016-01-01

    Hepatocellular carcinoma (HCC) is the fifth most common malignancy associated with high mortality. One of the risk factors for HCC is chronic hepatitis B virus (HBV) infection. The treatment strategy for the disease is dependent on the stage of HCC, and the Barcelona Clinic Liver Cancer (BCLC) staging system is used in most HCC cases. However, the molecular characteristics of HBV-related HCC in different BCLC stages are still unknown. Using GSE14520 microarray data from HBV-related HCC cases with BCLC stages from 0 (very early stage) to C (advanced stage) in the Gene Expression Omnibus (GEO) database, differentially expressed genes (DEGs), including common DEGs and unique DEGs in different BCLC stages, were identified. These DEGs were located on different chromosomes. The molecular functions and biological pathways of DEGs were identified by Gene Ontology (GO) analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis, and the interactome networks of DEGs were constructed using the NetVenn online tool. The results revealed that both common DEGs and stage-specific DEGs were associated with various molecular functions and were involved in special biological pathways. In addition, several hub genes were found in the interactome networks of DEGs. The identified DEGs and hub genes promote our understanding of the molecular mechanisms underlying the development of HBV-related HCC through the different BCLC stages, and might be used as staging biomarkers or molecular targets for the treatment of HCC with HBV infection.

  17. Genome-wide genetic analyses highlight mitogen-activated protein kinase (MAPK) signaling in the pathogenesis of endometriosis

    PubMed Central

    Uimari, Outi; Rahmioglu, Nilufer; Nyholt, Dale R.; Vincent, Katy; Missmer, Stacey A.; Becker, Christian; Morris, Andrew P.; Montgomery, Grant W.

    2017-01-01

    Abstract STUDY QUESTION Do genome-wide association study (GWAS) data for endometriosis provide insight into novel biological pathways associated with its pathogenesis? SUMMARY ANSWER GWAS analysis uncovered multiple pathways that are statistically enriched for genetic association signals, analysis of Stage A disease highlighted a novel variant in MAP3K4, while top pathways significantly associated with all endometriosis and Stage A disease included several mitogen-activated protein kinase (MAPK)-related pathways. WHAT IS KNOWN ALREADY Endometriosis is a complex disease with an estimated heritability of 50%. To date, GWAS revealed 10 genomic regions associated with endometriosis, explaining <4% of heritability, while half of the heritability is estimated to be due to common risk variants. Pathway analyses combine the evidence of single variants into gene-based measures, leveraging the aggregate effect of variants in genes and uncovering biological pathways involved in disease pathogenesis. STUDY DESIGN, SIZE, DURATION Pathway analysis was conducted utilizing the International Endogene Consortium GWAS data, comprising 3194 surgically confirmed endometriosis cases and 7060 controls of European ancestry with genotype data imputed up to 1000 Genomes Phase three reference panel. GWAS was performed for all endometriosis cases and for Stage A (revised American Fertility Society (rAFS) I/II, n = 1686) and B (rAFS III/IV, n = 1364) cases separately. The identified significant pathways were compared with pathways previously investigated in the literature through candidate association studies. PARTICIPANTS/MATERIALS, SETTING, METHODS The most comprehensive biological pathway databases, MSigDB (including BioCarta, KEGG, PID, SA, SIG, ST and GO) and PANTHER were utilized to test for enrichment of genetic variants associated with endometriosis. Statistical enrichment analysis was performed using the MAGENTA (Meta-Analysis Gene-set Enrichment of variaNT Associations) software. 
MAIN RESULTS AND THE ROLE OF CHANCE The first genome-wide association analysis for Stage A endometriosis revealed a novel locus, rs144240142 (P = 6.45 × 10^-8, OR = 1.71, 95% CI = 1.23–2.37), an intronic single-nucleotide polymorphism (SNP) within MAP3K4. This SNP was not associated with Stage B disease (P = 0.086). MAP3K4 was also shown to be differentially expressed in eutopic endometrium between Stage A endometriosis cases and controls (P = 3.8 × 10^-4), but not with Stage B disease (P = 0.26). A total of 14 pathways enriched with genetic endometriosis associations were identified (false discovery rate (FDR)-P < 0.05). The pathways associated with any endometriosis were Grb2-Sos provides linkage to MAPK signaling for integrins pathway (P = 2.8 × 10^-5, FDR-P = 3.0 × 10^-3), Wnt signaling (P = 0.026, FDR-P = 0.026) and p130Cas linkage to MAPK signaling for integrins pathway (P = 6.0 × 10^-4, FDR-P = 0.029); with Stage A endometriosis: extracellular signal-regulated kinase (ERK)1 ERK2 MAPK (P = 5.0 × 10^-4, FDR-P = 5.0 × 10^-4) and with Stage B endometriosis: two overlapping pathways that related to extracellular matrix biology: Core matrisome (P = 1.4 × 10^-3, FDR-P = 0.013) and ECM glycoproteins (P = 1.8 × 10^-3, FDR-P = 7.1 × 10^-3). Genes arising from endometriosis candidate gene studies performed to date were enriched for Interleukin signaling pathway (P = 2.3 × 10^-12), Apoptosis signaling pathway (P = 9.7 × 10^-9) and Gonadotropin releasing hormone receptor pathway (P = 1.2 × 10^-6); however, these pathways did not feature in the results based on GWAS data. LARGE SCALE DATA Not applicable. LIMITATIONS, REASONS FOR CAUTION The analysis is restricted to (i) variants in/near genes that can be assigned to pathways, excluding intergenic variants; (ii) the gene-based pathway definition as registered in the databases; (iii) women of European ancestry.
WIDER IMPLICATIONS OF THE FINDINGS The top ranked pathways associated with overall and Stage A endometriosis in particular involve integrin-mediated MAPK activation and intracellular ERK/MAPK acting downstream in the MAPK cascade, both acting in the control of cell division, gene expression, cell movement and survival. Other top enriched pathways in Stage B disease include ECM glycoprotein pathways important for extracellular structure and biochemical support. The results highlight the need for increased efforts to understand the functional role of these pathways in endometriosis pathogenesis, including the investigation of the biological effects of the genetic variants on downstream molecular processes in tissue relevant to endometriosis. Additionally, our results offer further support for the hypothesis of at least partially distinct causal pathophysiology for minimal/mild (rAFS I/II) vs. moderate/severe (rAFS III/IV) endometriosis. STUDY FUNDING/COMPETING INTEREST(S) The genome-wide association data and Wellcome Trust Case Control Consortium (WTCCC) were generated through funding from the Wellcome Trust (WT084766/Z/08/Z, 076113 and 085475) and the National Health and Medical Research Council (NHMRC) of Australia (241944, 339462, 389927, 389875, 389891, 389892, 389938, 443036, 442915, 442981, 496610, 496739, 552485 and 552498). N.R. was funded by a grant from the Medical Research Council UK (MR/K011480/1). A.P.M. is a Wellcome Trust Senior Fellow in Basic Biomedical Science (grant WT098017). All authors declare there are no conflicts of interest. PMID:28333195
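
    The FDR-P < 0.05 threshold used in this abstract controls the false discovery rate across many tested pathways. As a generic illustration, the sketch below implements the standard Benjamini-Hochberg adjustment in Python. This is an assumption made for illustration: the abstract does not say how MAGENTA computes its FDR values, and that procedure may differ. The p-values are invented.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-pathway enrichment p-values, for illustration only.
pathway_p = [0.001, 0.02, 0.03, 0.5]
adjusted = benjamini_hochberg(pathway_p)
significant = [q < 0.05 for q in adjusted]
```

    With these four invented p-values, the first three pathways remain below 0.05 after adjustment and the fourth does not.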

  18. GPCR & company: databases and servers for GPCRs and interacting partners.

    PubMed

    Kowalsman, Noga; Niv, Masha Y

    2014-01-01

    G-protein-coupled receptors (GPCRs) are a large superfamily of membrane receptors that are involved in a wide range of signaling pathways. To fulfill their tasks, GPCRs interact with a variety of partners, including small molecules, lipids and proteins. They are accompanied by different proteins during all phases of their life cycle. Therefore, GPCR interactions with their partners are of great interest in basic cell-signaling research and in drug discovery. Due to the rapid development of computers and internet communication, knowledge and data can be easily shared within the worldwide research community via freely available databases and servers. These provide an abundance of biological, chemical and pharmacological information. This chapter describes the available web resources for investigating GPCR interactions. We review about 40 freely available databases and servers, and provide a few sentences about the essence and the data they supply. For simplification, the databases and servers were grouped under the following topics: general GPCR-ligand interactions; particular families of GPCRs and their ligands; GPCR oligomerization; GPCR interactions with intracellular partners; and structural information on GPCRs. In conclusion, a multitude of useful tools are currently available. Summary tables are provided to ease navigation between the numerous and partially overlapping resources. Suggestions for future enhancements of the online tools include the addition of links from general to specialized databases and enabling usage of user-supplied template for GPCR structural modeling.

  19. Using the Semantic Web for Rapid Integration of WikiPathways with Other Biological Online Data Resources

    PubMed Central

    Waagmeester, Andra; Pico, Alexander R.

    2016-01-01

    The diversity of online resources storing biological data in different formats provides a challenge for bioinformaticians to integrate and analyse their biological data. The semantic web provides a standard to facilitate knowledge integration using statements built as triples describing a relation between two objects. WikiPathways, an online collaborative pathway resource, is now available in the semantic web through a SPARQL endpoint at http://sparql.wikipathways.org. Having biological pathways in the semantic web allows rapid integration with data from other resources that contain information about elements present in pathways using SPARQL queries. In order to convert WikiPathways content into meaningful triples we developed two new vocabularies that capture the graphical representation and the pathway logic, respectively. Each gene, protein, and metabolite in a given pathway is defined with a standard set of identifiers to support linking to several other biological resources in the semantic web. WikiPathways triples were loaded into the Open PHACTS discovery platform and are available through its Web API (https://dev.openphacts.org/docs) to be used in various tools for drug development. We combined various semantic web resources with the newly converted WikiPathways content using a variety of SPARQL query types and third-party resources, such as the Open PHACTS API. The ability to use pathway information to form new links across diverse biological data highlights the utility of integrating WikiPathways in the semantic web. PMID:27336457
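
    To make the SPARQL access described in this abstract concrete, the following Python sketch builds an HTTP GET URL for a small query without sending it. The endpoint URL is the one given in the abstract; the exact service path, the `wp:` prefix IRI and the use of `dc:title` are assumptions about the WikiPathways vocabulary and may not match the current deployment. The helper `build_request_url` is hypothetical.

```python
from urllib.parse import urlencode, urlparse, parse_qs

ENDPOINT = "http://sparql.wikipathways.org"  # endpoint named in the abstract

# Assumed vocabulary: the wp: prefix IRI and dc:title usage are assumptions
# and may differ from the RDF actually served by WikiPathways.
QUERY = """
PREFIX wp: <http://vocabularies.wikipathways.org/wp#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?pathway ?title
WHERE {
  ?pathway a wp:Pathway ;
           dc:title ?title .
}
LIMIT 5
"""


def build_request_url(endpoint, query):
    """Encode a SPARQL query as a GET URL using the standard 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query.strip()})


url = build_request_url(ENDPOINT, QUERY)
```

    The resulting URL could then be fetched with any HTTP client; no request is made here so that the sketch does not depend on the service being reachable.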

  20. Using the Semantic Web for Rapid Integration of WikiPathways with Other Biological Online Data Resources.

    PubMed

    Waagmeester, Andra; Kutmon, Martina; Riutta, Anders; Miller, Ryan; Willighagen, Egon L; Evelo, Chris T; Pico, Alexander R

    2016-06-01

    The diversity of online resources storing biological data in different formats provides a challenge for bioinformaticians to integrate and analyse their biological data. The semantic web provides a standard to facilitate knowledge integration using statements built as triples describing a relation between two objects. WikiPathways, an online collaborative pathway resource, is now available in the semantic web through a SPARQL endpoint at http://sparql.wikipathways.org. Having biological pathways in the semantic web allows rapid integration with data from other resources that contain information about elements present in pathways using SPARQL queries. In order to convert WikiPathways content into meaningful triples we developed two new vocabularies that capture the graphical representation and the pathway logic, respectively. Each gene, protein, and metabolite in a given pathway is defined with a standard set of identifiers to support linking to several other biological resources in the semantic web. WikiPathways triples were loaded into the Open PHACTS discovery platform and are available through its Web API (https://dev.openphacts.org/docs) to be used in various tools for drug development. We combined various semantic web resources with the newly converted WikiPathways content using a variety of SPARQL query types and third-party resources, such as the Open PHACTS API. The ability to use pathway information to form new links across diverse biological data highlights the utility of integrating WikiPathways in the semantic web.

  1. [The interpretation and integration of traditional Chinese phytotherapy into Western-type medicine with the possession of knowledge of the human genome].

    PubMed

    Blázovics, Anna

    2018-05-01

    The terminology of traditional Chinese medicine (TCM) is hardly interpretable in the context of human genome, therefore the human genome program attracted attention towards the Western practice of medicine in China. In the last two decades, several important steps could be observed in China in relation to the approach of traditional Chinese and Western medicine. The Chinese government supports the realization of information databases for research in order to clarify the molecular biology level to detect associations between gene expression signal transduction pathways and protein-protein interactions, and the effects of bioactive components of Chinese drugs and their effectiveness. The values of TCM are becoming more and more important for Western medicine as well, because molecular biological therapies did not redeem themselves, e.g., in tumor therapy. Orv Hetil. 2018; 159(18): 696-702.

  2. CoPub: a literature-based keyword enrichment tool for microarray data analysis.

    PubMed

    Frijters, Raoul; Heupers, Bart; van Beek, Pieter; Bouwhuis, Maurice; van Schaik, René; de Vlieg, Jacob; Polman, Jan; Alkema, Wynand

    2008-07-01

    Medline is a rich information source, from which links between genes and keywords describing biological processes, pathways, drugs, pathologies and diseases can be extracted. We developed a publicly available tool called CoPub that uses the information in the Medline database for the biological interpretation of microarray data. CoPub allows batch input of multiple human, mouse or rat genes and produces lists of keywords from several biomedical thesauri that are significantly correlated with the set of input genes. These lists link to Medline abstracts in which the co-occurring input genes and correlated keywords are highlighted. Furthermore, CoPub can graphically visualize differentially expressed genes and over-represented keywords in a network, providing detailed insight in the relationships between genes and keywords, and revealing the most influential genes as highly connected hubs. CoPub is freely accessible at http://services.nbic.nl/cgi-bin/copub/CoPub.pl.

  3. A System-Level Pathway-Phenotype Association Analysis Using Synthetic Feature Random Forest

    PubMed Central

    Pan, Qinxin; Hu, Ting; Malley, James D.; Andrew, Angeline S.; Karagas, Margaret R.; Moore, Jason H.

    2015-01-01

    As the cost of genome-wide genotyping decreases, the number of genome-wide association studies (GWAS) has increased considerably. However, the transition from GWAS findings to the underlying biology of various phenotypes remains challenging. As a result, due to its system-level interpretability, pathway analysis has become a popular tool for gaining insights on the underlying biology from high-throughput genetic association data. In pathway analyses, gene sets representing particular biological processes are tested for significant associations with a given phenotype. Most existing pathway analysis approaches rely on single-marker statistics and assume that pathways are independent of each other. As biological systems are driven by complex biomolecular interactions, embracing the complex relationships between single-nucleotide polymorphisms (SNPs) and pathways needs to be addressed. To incorporate the complexity of gene-gene interactions and pathway-pathway relationships, we propose a system-level pathway analysis approach, synthetic feature random forest (SF-RF), which is designed to detect pathway-phenotype associations without making assumptions about the relationships among SNPs or pathways. In our approach, the genotypes of SNPs in a particular pathway are aggregated into a synthetic feature representing that pathway via Random Forest (RF). Multiple synthetic features are analyzed using RF simultaneously and the significance of a synthetic feature indicates the significance of the corresponding pathway. We further complement SF-RF with pathway-based Statistical Epistasis Network (SEN) analysis that evaluates interactions among pathways. By investigating the pathway SEN, we hope to gain additional insights into the genetic mechanisms contributing to the pathway-phenotype association. We apply SF-RF to a population-based genetic study of bladder cancer and further investigate the mechanisms that help explain the pathway-phenotype associations using SEN. 
The bladder cancer associated pathways we found are both consistent with existing biological knowledge and reveal novel and plausible hypotheses for future biological validations. PMID:24535726

  4. Investigation of candidate genes for osteoarthritis based on gene expression profiles.

    PubMed

    Dong, Shuanghai; Xia, Tian; Wang, Lei; Zhao, Qinghua; Tian, Jiwei

    2016-12-01

    To explore the mechanism of osteoarthritis (OA) and provide valid biological information for further investigation. Gene expression profile of GSE46750 was downloaded from Gene Expression Omnibus database. The Linear Models for Microarray Data (limma) package (Bioconductor project, http://www.bioconductor.org/packages/release/bioc/html/limma.html) was used to identify differentially expressed genes (DEGs) in inflamed OA samples. Gene Ontology function enrichment analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways enrichment analysis of DEGs were performed based on Database for Annotation, Visualization and Integrated Discovery data, and protein-protein interaction (PPI) network was constructed based on the Search Tool for the Retrieval of Interacting Genes/Proteins database. Regulatory network was screened based on Encyclopedia of DNA Elements. Molecular Complex Detection was used for sub-network screening. Two sub-networks with highest node degree were integrated with transcriptional regulatory network and KEGG functional enrichment analysis was processed for 2 modules. In total, 401 up- and 196 down-regulated DEGs were obtained. Up-regulated DEGs were involved in inflammatory response, while down-regulated DEGs were involved in cell cycle. PPI network with 2392 protein interactions was constructed. Moreover, 10 genes including Interleukin 6 (IL6) and Aurora B kinase (AURKB) were found to be outstanding in PPI network. There are 214 up- and 8 down-regulated transcription factor (TF)-target pairs in the TF regulatory network. Module 1 had TFs including SPI1, PRDM1, and FOS, while module 2 contained FOSL1. The nodes in module 1 were enriched in chemokine signaling pathway, while the nodes in module 2 were mainly enriched in cell cycle. 
The screened DEGs including IL6, AGT, and AURKB might be potential biomarkers for gene therapy for OA by being regulated by TFs such as FOS and SPI1, and participating in the cell cycle and cytokine-cytokine receptor interaction pathway. Copyright © 2016 Turkish Association of Orthopaedics and Traumatology. Production and hosting by Elsevier B.V. All rights reserved.
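
    Pathway enrichment of a DEG list, as described in this abstract, is commonly assessed with an over-representation test. The Python sketch below computes a one-sided hypergeometric p-value. It is a generic illustration, not the exact statistic used by DAVID, and the counts are invented; the function name `enrichment_p_value` is hypothetical.

```python
from math import comb


def enrichment_p_value(n_background, n_pathway, n_degs, n_overlap):
    """P(X >= n_overlap) when n_degs genes are drawn from n_background genes,
    of which n_pathway belong to the pathway (hypergeometric upper tail)."""
    total = comb(n_background, n_degs)
    upper = min(n_pathway, n_degs)
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_degs - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 20 background genes, 5 in the pathway,
# 4 DEGs, of which 3 fall in the pathway.
p = enrichment_p_value(20, 5, 4, 3)
```

    In practice such p-values are computed for every pathway and then corrected for multiple testing.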

  5. SASD: the Synthetic Alternative Splicing Database for identifying novel isoform from proteomics

    PubMed Central

    2013-01-01

    Background Alternative splicing is an important and widespread mechanism for generating protein diversity and regulating protein expression. High-throughput identification and analysis of alternative splicing in the protein level has more advantages than in the mRNA level. The combination of alternative splicing database and tandem mass spectrometry provides a powerful technique for identification, analysis and characterization of potential novel alternative splicing protein isoforms from proteomics. Therefore, based on the peptidomic database of human protein isoforms for proteomics experiments, our objective is to design a new alternative splicing database to 1) provide more coverage of genes, transcripts and alternative splicing, 2) exclusively focus on the alternative splicing, and 3) perform context-specific alternative splicing analysis. Results We used a three-step pipeline to create a synthetic alternative splicing database (SASD) to identify novel alternative splicing isoforms and interpret them at the context of pathway, disease, drug and organ specificity or custom gene set with maximum coverage and exclusive focus on alternative splicing. First, we extracted information on gene structures of all genes in the Ensembl Genes 71 database and incorporated the Integrated Pathway Analysis Database. Then, we compiled artificial splicing transcripts. Lastly, we translated the artificial transcripts into alternative splicing peptides. The SASD is a comprehensive database containing 56,630 genes (Ensembl gene IDs), 95,260 transcripts (Ensembl transcript IDs), and 11,919,779 Alternative Splicing peptides, and also covering about 1,956 pathways, 6,704 diseases, 5,615 drugs, and 52 organs. The database has a web-based user interface that allows users to search, display and download a single gene/transcript/protein, custom gene set, pathway, disease, drug, organ related alternative splicing. 
Moreover, the quality of the database was validated with comparison to other known databases and two case studies: 1) in liver cancer and 2) in breast cancer. Conclusions The SASD provides the scientific community with an efficient means to identify, analyze, and characterize novel Exon Skipping and Intron Retention protein isoforms from mass spectrometry and interpret them at the context of pathway, disease, drug and organ specificity or custom gene set with maximum coverage and exclusive focus on alternative splicing. PMID:24267658
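
    The three-step pipeline described above (gene structures, artificial splicing transcripts, translated peptides) can be illustrated with a toy sketch. This is not the SASD code: the function names and the example exons are hypothetical, only single internal exon skipping is modelled, and translation simply reads frame 0 up to the first stop codon.

```python
from itertools import product

# Standard genetic code, built in TCAG order (third base varies fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def exon_skipping_transcripts(exons):
    """Yield (skipped_index, sequence) for each internal exon that is skipped.

    `exons` is an ordered list of exon sequences (hypothetical input format).
    The first and last exons are always kept.
    """
    for i in range(1, len(exons) - 1):
        yield i, "".join(exons[:i] + exons[i + 1:])


def translate(seq):
    """Translate a DNA sequence in frame 0, stopping at the first stop codon."""
    peptide = []
    for pos in range(0, len(seq) - 2, 3):
        amino_acid = CODON_TABLE[seq[pos:pos + 3]]
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return "".join(peptide)


# Toy gene with three exons (made-up sequences, not real Ensembl data).
toy_exons = ["ATGGCC", "AAAGGG", "TTTTAA"]
canonical_peptide = translate("".join(toy_exons))
skipped = [(i, translate(seq)) for i, seq in exon_skipping_transcripts(toy_exons)]
```

    In a real setting the resulting peptides would be digested in silico and searched against tandem mass spectra; that step is outside the scope of this sketch.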

  6. funRiceGenes dataset for comprehensive understanding and application of rice functional genes.

    PubMed

    Yao, Wen; Li, Guangwei; Yu, Yiming; Ouyang, Yidan

    2018-01-01

    As a main staple food, rice is also a model plant for functional genomic studies of monocots. Decoding of every DNA element of the rice genome is essential for genetic improvement to address increasing food demands. The past 15 years have witnessed extraordinary advances in rice functional genomics. Systematic characterization and proper deposition of every rice gene are vital for both functional studies and crop genetic improvement. We built a comprehensive and accurate dataset of ∼2800 functionally characterized rice genes and ∼5000 members of different gene families by integrating data from available databases and reviewing every publication on rice functional genomic studies. The dataset accounts for 19.2% of the 39 045 annotated protein-coding rice genes, which provides the most exhaustive archive for investigating the functions of rice genes. We also constructed 214 gene interaction networks based on 1841 connections between 1310 genes. The largest network with 762 genes indicated that pleiotropic genes linked different biological pathways. Increasing degree of conservation of the flowering pathway was observed among more closely related plants, implying substantial value of rice genes for future dissection of flowering regulation in other crops. All data are deposited in the funRiceGenes database (https://funricegenes.github.io/). Functionality for advanced search and continuous updating of the database are provided by a Shiny application (http://funricegenes.ncpgr.cn/). The funRiceGenes dataset would enable further exploring of the crosslink between gene functions and natural variations in rice, which can also facilitate breeding design to improve target agronomic traits of rice. © The Authors 2017. Published by Oxford University Press.

  7. A comprehensive map of the influenza A virus replication cycle

    PubMed Central

    2013-01-01

    Background Influenza is a common infectious disease caused by influenza viruses. Annual epidemics cause severe illnesses, deaths, and economic loss around the world. To better defend against influenza viral infection, it is essential to understand its mechanisms and associated host responses. Many studies have been conducted to elucidate these mechanisms, however, the overall picture remains incompletely understood. A systematic understanding of influenza viral infection in host cells is needed to facilitate the identification of influential host response mechanisms and potential drug targets. Description We constructed a comprehensive map of the influenza A virus (‘IAV’) life cycle (‘FluMap’) by undertaking a literature-based, manual curation approach. Based on information obtained from publicly available pathway databases, updated with literature-based information and input from expert virologists and immunologists, FluMap is currently composed of 960 factors (i.e., proteins, mRNAs etc.) and 456 reactions, and is annotated with ~500 papers and curation comments. In addition to detailing the type of molecular interactions, isolate/strain specific data are also available. The FluMap was built with the pathway editor CellDesigner in standard SBML (Systems Biology Markup Language) format and visualized as an SBGN (Systems Biology Graphical Notation) diagram. It is also available as a web service (online map) based on the iPathways+ system to enable community discussion by influenza researchers. We also demonstrate computational network analyses to identify targets using the FluMap. Conclusion The FluMap is a comprehensive pathway map that can serve as a graphically presented knowledge-base and as a platform to analyze functional interactions between IAV and host factors. Publicly available webtools will allow continuous updating to ensure the most reliable representation of the host-virus interaction network. 
The FluMap is available at http://www.influenza-x.org/flumap/. PMID:24088197

  8. Identification of MicroRNAs and Target Genes in the Fruit and Shoot Tip of Lycium chinense: A Traditional Chinese Medicinal Plant

    PubMed Central

    Khaldun, A. B. M.; Huang, Wenjun; Liao, Sihong; Lv, Haiyan; Wang, Ying

    2015-01-01

    Although Lycium chinense (goji berry) is an important traditional Chinese medicinal plant, little genome information is available for this plant, particularly at the small-RNA level. Recent findings indicate that the evolutionary role of miRNAs is very important for a better understanding of gene regulation in different plant species. To elucidate small RNAs and their potential target genes in fruit and shoot tissues, high-throughput RNA sequencing technology was used followed by qRT-PCR and RLM 5’-RACE experiments. A total of 60 conserved miRNAs belonging to 31 families and 30 putative novel miRNAs were identified. A total of 62 significantly differentially expressed miRNAs were identified, of which 15 (14 known and 1 novel) were shoot-specific, and 12 (7 known and 5 novel) were fruit-specific. Additionally, 28 differentially expressed miRNAs were recorded as up-regulated in fruit tissues. The predicted potential targets were involved in a wide range of metabolic and regulatory pathways. GO (Gene Ontology) enrichment analysis and the KEGG (Kyoto Encyclopedia of Genes and Genomes) database revealed that “metabolic pathways” is the most significant pathway with respect to the rich factor and gene numbers. Moreover, five miRNAs were related to fruit maturation, lycopene biosynthesis and signaling pathways, which might be important for the further study of fruit molecular biology. This study is the first to detect known and novel miRNAs, and their potential targets, in L. chinense. The data and findings that are presented here might be a good source for the functional genomic study of medicinal plants and for understanding the links among diversified biological pathways. PMID:25587984

  9. BIOSPIDA: A Relational Database Translator for NCBI.

    PubMed

    Hagen, Matthew S; Lee, Eva K

    2010-11-13

    As the volume and availability of biological databases continue widespread growth, it has become increasingly difficult for research scientists to identify all relevant information for biological entities of interest. Details of nucleotide sequences, gene expression, molecular interactions, and three-dimensional structures are maintained across many different databases. To retrieve all necessary information requires an integrated system that can query multiple databases with minimized overhead. This paper introduces a universal parser and relational schema translator that can be utilized for all NCBI databases in Abstract Syntax Notation (ASN.1). The data models for OMIM, Entrez-Gene, Pubmed, MMDB and GenBank have been successfully converted into relational databases and all are easily linkable helping to answer complex biological questions. These tools facilitate research scientists to locally integrate databases from NCBI without significant workload or development time.

  10. Targetome Analysis Revealed Involvement of MiR-126 in Neurotrophin Signaling Pathway: A Possible Role in Prevention of Glioma Development.

    PubMed

    Rouigari, Maedeh; Dehbashi, Moein; Ghaedi, Kamran; Pourhossein, Meraj

    2018-07-01

    For the first time, we used molecular signaling pathway enrichment analysis to determine the possible involvement of miR-126 and IRS-1 in the neurotrophin pathway. In this prospective study, validated and predicted targets (targetome) of miR-126 were collected by searching the miRtarbase (http://mirtarbase.mbc.nctu.edu.tw/) and miRWalk 2.0 databases, respectively. Then, approximate expression of miR-126 targets in glioma tissue was examined using the UniGene database (http://www.ncbi.nlm.nih.gov/unigene). In silico molecular pathway enrichment analysis was carried out with the DAVID 6.7 database (http://david.abcc.ncifcrf.gov/) to explore which signaling pathways are related to miR-126 targeting and how miR-126 contributes to glioma development. MiR-126 exerts a variety of functions in cancer pathogenesis via suppression of the expression of target genes including PI3K, KRAS, EGFL7, IRS-1 and VEGF. Our bioinformatic studies using the DAVID database showed the involvement of miR-126 target genes in several signaling pathways including cancer pathogenesis, neurotrophin functions, glioma formation, insulin function, focal adhesion production, chemokine synthesis and secretion and regulation of the actin cytoskeleton. Taken together, we concluded that miR-126 enhances the formation of glioma cancer stem cells, probably via down-regulation of IRS-1 in the neurotrophin signaling pathway. Copyright© by Royan Institute. All rights reserved.

  11. The Listeria monocytogenes strain 10403S BioCyc database

    PubMed Central

    Orsi, Renato H.; Bergholz, Teresa M.; Wiedmann, Martin; Boor, Kathryn J.

    2015-01-01

    Listeria monocytogenes is a food-borne pathogen of humans and other animals. The striking ability to survive several stresses usually used for food preservation makes L. monocytogenes one of the biggest concerns to the food industry, while the high mortality of listeriosis in specific groups of humans makes it a great concern for public health. Previous studies have shown that a regulatory network involving alternative sigma (σ) factors and transcription factors is pivotal to stress survival. However, few studies have evaluated the metabolic networks controlled by these regulatory mechanisms. The L. monocytogenes BioCyc database uses the strain 10403S as a model. Computer-generated initial annotation for all genes also allowed for identification, annotation and display of predicted reactions and pathways carried out by a single cell. Further ongoing manual curation based on published data as well as database mining for selected genes allowed the more refined annotation of functions, which, in turn, allowed for annotation of new pathways and fine-tuning of previously defined pathways to more L. monocytogenes-specific pathways. Using RNA-Seq data, several transcription start sites and promoter regions were mapped to the 10403S genome and annotated within the database. Additionally, the identification of promoter regions and a comprehensive review of available literature allowed the annotation of several regulatory interactions involving σ factors and transcription factors. The L. monocytogenes 10403S BioCyc database is a new resource for researchers studying Listeria and related organisms. It allows users to (i) have a comprehensive view of all reactions and pathways predicted to take place within the cell in the cellular overview, as well as to (ii) upload their own data, such as differential expression data, to visualize the data in the scope of predicted pathways and regulatory networks and to carry out enrichment analyses using several different annotations available within the database. Database URL: http://biocyc.org/organism-summary?object=10403S_RAST PMID:25819074

  12. SBCDDB: Sleeping Beauty Cancer Driver Database for gene discovery in mouse models of human cancers

    PubMed Central

    Mann, Michael B

    2018-01-01

    Abstract Large-scale oncogenomic studies have identified few frequently mutated cancer drivers and hundreds of infrequently mutated drivers. Defining the biological context for rare driving events is fundamentally important to increasing our understanding of the druggable pathways in cancer. Sleeping Beauty (SB) insertional mutagenesis is a powerful gene discovery tool used to model human cancers in mice. Our lab and others have published a number of studies that identify cancer drivers from these models using various statistical and computational approaches. Here, we have integrated SB data from primary tumor models into an analysis and reporting framework, the Sleeping Beauty Cancer Driver DataBase (SBCDDB, http://sbcddb.moffitt.org), which identifies drivers in individual tumors or tumor populations. Unique to this effort, the SBCDDB utilizes a single, scalable, statistical analysis method that enables data to be grouped by different biological properties. This allows for SB drivers to be evaluated (and re-evaluated) under different contexts. The SBCDDB provides visual representations highlighting the spatial attributes of transposon mutagenesis and couples this functionality with analysis of gene sets, enabling users to interrogate relationships between drivers. The SBCDDB is a powerful resource for comparative oncogenomic analyses with human cancer genomics datasets for driver prioritization. PMID:29059366

  13. Integrated Computational Analysis of Genes Associated with Human Hereditary Insensitivity to Pain. A Drug Repurposing Perspective

    PubMed Central

    Lötsch, Jörn; Lippmann, Catharina; Kringel, Dario; Ultsch, Alfred

    2017-01-01

    Genes causally involved in human insensitivity to pain provide a unique molecular source of studying the pathophysiology of pain and the development of novel analgesic drugs. The increasing availability of “big data” enables novel research approaches to chronic pain while also requiring novel techniques for data mining and knowledge discovery. We used machine learning to combine the knowledge about n = 20 genes causally involved in human hereditary insensitivity to pain with the knowledge about the functions of thousands of genes. An integrated computational analysis proposed that among the functions of this set of genes, the processes related to nervous system development and to ceramide and sphingosine signaling pathways are particularly important. This is in line with earlier suggestions to use these pathways as therapeutic target in pain. Following identification of the biological processes characterizing hereditary insensitivity to pain, the biological processes were used for a similarity analysis with the functions of n = 4,834 database-queried drugs. Using emergent self-organizing maps, a cluster of n = 22 drugs was identified sharing important functional features with hereditary insensitivity to pain. Several members of this cluster had been implicated in pain in preclinical experiments. Thus, the present concept of machine-learned knowledge discovery for pain research provides biologically plausible results and seems to be suitable for drug discovery by identifying a narrow choice of repurposing candidates, demonstrating that contemporary machine-learned methods offer innovative approaches to knowledge discovery from available evidence. PMID:28848388

  14. GARNET--gene set analysis with exploration of annotation relations.

    PubMed

    Rho, Kyoohyoung; Kim, Bumjin; Jang, Youngjun; Lee, Sanghyun; Bae, Taejeong; Seo, Jihae; Seo, Chaehwa; Lee, Jihyun; Kang, Hyunjung; Yu, Ungsik; Kim, Sunghoon; Lee, Sanghyuk; Kim, Wan Kyu

    2011-02-15

    Gene set analysis is a powerful method of deducing biological meaning for an a priori defined set of genes. Numerous tools have been developed to test statistical enrichment or depletion in specific pathways or gene ontology (GO) terms. Major difficulties towards biological interpretation are integrating diverse types of annotation categories and exploring the relationships between annotation terms of similar information. GARNET (Gene Annotation Relationship NEtwork Tools) is an integrative platform for gene set analysis with many novel features. It includes tools for retrieval of genes from annotation database, statistical analysis & visualization of annotation relationships, and managing gene sets. In an effort to allow access to a full spectrum of amassed biological knowledge, we have integrated a variety of annotation data that include the GO, domain, disease, drug, chromosomal location, and custom-defined annotations. Diverse types of molecular networks (pathways, transcription and microRNA regulations, protein-protein interaction) are also included. The pair-wise relationship between annotation gene sets was calculated using kappa statistics. GARNET consists of three modules--gene set manager, gene set analysis and gene set retrieval, which are tightly integrated to provide virtually automatic analysis for gene sets. A dedicated viewer for annotation network has been developed to facilitate exploration of the related annotations. GARNET (gene annotation relationship network tools) is an integrative platform for diverse types of gene set analysis, where complex relationships among gene annotations can be easily explored with an intuitive network visualization tool (http://garnet.isysbio.org/ or http://ercsb.ewha.ac.kr/garnet/).
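
    The abstract states only that pair-wise relationships between annotation gene sets were calculated using kappa statistics. A minimal sketch of one common reading, Cohen's kappa on binary set membership over a shared gene universe, is given below. The function name, the choice of universe and the toy gene identifiers are assumptions for illustration, not GARNET's implementation.

```python
def cohen_kappa(set_a, set_b, universe):
    """Cohen's kappa for the agreement of two gene sets over a gene universe.

    Each gene in `universe` is scored as member / non-member of each set;
    kappa compares the observed agreement with the agreement expected by chance.
    """
    universe = set(universe)
    if not universe:
        raise ValueError("universe must not be empty")
    a_set = set(set_a) & universe
    b_set = set(set_b) & universe
    n = len(universe)

    both = len(a_set & b_set)       # genes in both sets
    only_a = len(a_set - b_set)     # genes only in A
    only_b = len(b_set - a_set)     # genes only in B
    neither = n - both - only_a - only_b

    observed = (both + neither) / n
    expected = (
        (both + only_a) * (both + only_b)
        + (only_b + neither) * (only_a + neither)
    ) / n ** 2
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


# Toy universe of ten placeholder gene identifiers.
genes = [f"g{i}" for i in range(10)]
term_a = {"g0", "g1", "g2", "g3"}
term_b = {"g2", "g3", "g4", "g5"}
kappa_ab = cohen_kappa(term_a, term_b, genes)   # partial overlap
kappa_aa = cohen_kappa(term_a, term_a, genes)   # identical sets
```

    Values near 1 indicate annotation terms that cover nearly the same genes, and values near 0 indicate overlap no better than chance; any threshold for linking terms in a network would be a further design choice.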

  15. De novo characterization of the pine aphid Cinara pinitabulaeformis Zhang et Zhang transcriptome and analysis of genes relevant to pesticides

    PubMed Central

    Rebeca, Carballar-Lejarazú; Zhu, Xiaoli; Guo, Yajie; Lin, Qiannan; Hu, Xia; Wang, Rong; Liang, Guanghong; Guan, Xiong

    2017-01-01

    The pine aphid Cinara pinitabulaeformis Zhang et Zhang is the main pine pest in China; it causes pine needles to produce dense dew (honeydew), which can lead to sooty mold (black filamentous saprophytic ascomycetes). Although common chemical and physical strategies are used to prevent the disease caused by C. pinitabulaeformis Zhang et Zhang, new strategies based on biological and/or genetic approaches are promising to control and eradicate the disease. However, there is no genomic, proteomic or transcriptomic information to allow the design of new control strategies for this pine aphid. We used next generation sequencing technology to sequence the transcriptome of C. pinitabulaeformis Zhang et Zhang and built a transcriptome database. We identified 80,259 unigenes assigned to Gene Ontology (GO) terms, and information for a total of 11,609 classified unigenes was obtained in the Clusters of Orthologous Groups (COGs). A total of 10,806 annotated unigenes were analyzed to identify the represented biological pathways; among them, 8,845 unigenes matched 228 KEGG pathways. In addition, our data describe propagative viruses, nutrition-related genes, detoxification-related molecules, olfactory-related receptors, stress-related proteins, putative insecticide resistance genes and possible insecticide targets. Moreover, this study provides valuable information about putative insecticide resistance-related genes and for the design of new genetic/biological strategies to manage and control C. pinitabulaeformis Zhang et Zhang populations. PMID:28570707

  16. Boosting Probabilistic Graphical Model Inference by Incorporating Prior Knowledge from Multiple Sources

    PubMed Central

    Praveen, Paurush; Fröhlich, Holger

    2013-01-01

    Inferring regulatory networks from experimental data via probabilistic graphical models is a popular framework to gain insights into biological systems. However, the inherent noise in experimental data coupled with a limited sample size reduces the performance of network reverse engineering. Prior knowledge from existing sources of biological information can address this low signal to noise problem by biasing the network inference towards biologically plausible network structures. Although integrating various sources of information is desirable, their heterogeneous nature makes this task challenging. We propose two computational methods to incorporate various information sources into a probabilistic consensus structure prior to be used in graphical model inference. Our first model, called Latent Factor Model (LFM), assumes a high degree of correlation among external information sources and reconstructs a hidden variable as a common source in a Bayesian manner. The second model, a Noisy-OR, picks up the strongest support for an interaction among information sources in a probabilistic fashion. Our extensive computational studies on KEGG signaling pathways as well as on gene expression data from breast cancer and yeast heat shock response reveal that both approaches can significantly enhance the reconstruction accuracy of Bayesian Networks compared to other competing methods as well as to the situation without any prior. Our framework allows for using diverse information sources, like pathway databases, GO terms and protein domain data, etc. and is flexible enough to integrate new sources, if available. PMID:23826291
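
    The idea behind the Noisy-OR prior, that the strongest support for an interaction among the information sources dominates, can be illustrated with the basic noisy-OR combination rule. The sketch below is a simplified illustration and not the authors' full model: the function name is hypothetical, and it assumes each source independently reports a confidence in [0, 1] for a given edge.

```python
import math


def noisy_or(confidences):
    """Combine per-source edge confidences with the basic noisy-OR rule.

    The edge is considered unsupported only if every source fails to
    support it, so the combined value is 1 - prod(1 - p_i).
    """
    for p in confidences:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"confidence out of range: {p}")
    return 1.0 - math.prod(1.0 - p for p in confidences)


# Toy confidences for one candidate edge from three hypothetical sources
# (for example a pathway database, GO similarity and shared protein domains).
edge_prior = noisy_or([0.2, 0.9, 0.1])
```

    Because a single high-confidence source pushes the result close to 1, this rule behaves like a soft maximum, in contrast to the Latent Factor Model, which the abstract describes as assuming the sources are highly correlated.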

  17. BioBenchmark Toyama 2012: an evaluation of the performance of triple stores on biological data

    PubMed Central

    2014-01-01

    Background Biological databases vary enormously in size and data complexity, from small databases that contain a few million Resource Description Framework (RDF) triples to large databases that contain billions of triples. In this paper, we evaluate whether RDF native stores can be used to meet the needs of a biological database provider. Prior evaluations have used synthetic data with a limited database size. For example, the largest BSBM benchmark uses 1 billion synthetic e-commerce knowledge RDF triples on a single node. However, real-world biological data differs considerably from the simple synthetic data. It is difficult to determine whether the synthetic e-commerce data is sufficient to represent biological databases. Therefore, for this evaluation, we used five real data sets from biological databases. Results We evaluated five triple stores, 4store, Bigdata, Mulgara, Virtuoso, and OWLIM-SE, with five biological data sets, Cell Cycle Ontology, Allie, PDBj, UniProt, and DDBJ, ranging in size from approximately 10 million to 8 billion triples. For each database, we loaded all the data into our single node and prepared the database for use in a classical data warehouse scenario. Then, we ran a series of SPARQL queries against each endpoint and recorded the execution time and the accuracy of the query response. Conclusions Our paper shows that, with appropriate configuration, Virtuoso and OWLIM-SE can satisfy the basic requirements to load and query biological data sets of up to about 8 billion triples on a single node, for the simultaneous access of 64 clients. OWLIM-SE performs best for databases with approximately 11 million triples. For data sets that contain 94 million and 590 million triples, OWLIM-SE and Virtuoso perform best, and neither shows an overwhelming advantage over the other. For data sets of over 4 billion triples, Virtuoso works best. 4store performs well on small data sets with limited features when the number of triples is less than 100 million, but our tests show that its scalability is poor. Bigdata demonstrates average performance and is a good open-source triple store for mid-sized data sets (500 million triples or so). Mulgara shows some fragility. PMID:25089180

  18. BioBenchmark Toyama 2012: an evaluation of the performance of triple stores on biological data.

    PubMed

    Wu, Hongyan; Fujiwara, Toyofumi; Yamamoto, Yasunori; Bolleman, Jerven; Yamaguchi, Atsuko

    2014-01-01

    Biological databases vary enormously in size and data complexity, from small databases that contain a few million Resource Description Framework (RDF) triples to large databases that contain billions of triples. In this paper, we evaluate whether RDF native stores can be used to meet the needs of a biological database provider. Prior evaluations have used synthetic data with a limited database size. For example, the largest BSBM benchmark uses 1 billion synthetic e-commerce knowledge RDF triples on a single node. However, real-world biological data differs considerably from the simple synthetic data. It is difficult to determine whether the synthetic e-commerce data is sufficient to represent biological databases. Therefore, for this evaluation, we used five real data sets from biological databases. We evaluated five triple stores, 4store, Bigdata, Mulgara, Virtuoso, and OWLIM-SE, with five biological data sets, Cell Cycle Ontology, Allie, PDBj, UniProt, and DDBJ, ranging in size from approximately 10 million to 8 billion triples. For each database, we loaded all the data into our single node and prepared the database for use in a classical data warehouse scenario. Then, we ran a series of SPARQL queries against each endpoint and recorded the execution time and the accuracy of the query response. Our paper shows that, with appropriate configuration, Virtuoso and OWLIM-SE can satisfy the basic requirements to load and query biological data sets of up to about 8 billion triples on a single node, for the simultaneous access of 64 clients. OWLIM-SE performs best for databases with approximately 11 million triples. For data sets that contain 94 million and 590 million triples, OWLIM-SE and Virtuoso perform best, and neither shows an overwhelming advantage over the other. For data sets of over 4 billion triples, Virtuoso works best. 4store performs well on small data sets with limited features when the number of triples is less than 100 million, but our tests show that its scalability is poor. Bigdata demonstrates average performance and is a good open-source triple store for mid-sized data sets (500 million triples or so). Mulgara shows some fragility.

  19. Plasma metabolomic profiles of breast cancer patients after short-term limonene intervention.

    PubMed

    Miller, Jessica A; Pappan, Kirk; Thompson, Patricia A; Want, Elizabeth J; Siskos, Alexandros P; Keun, Hector C; Wulff, Jacob; Hu, Chengcheng; Lang, Julie E; Chow, H-H Sherry

    2015-01-01

    Limonene is a lipophilic monoterpene found in high levels in citrus peel. Limonene demonstrates anticancer properties in preclinical models with effects on multiple cellular targets at varying potency. While of interest as a cancer chemopreventive, the biologic activity of limonene in humans is poorly understood. We conducted metabolite profiling in 39 paired (pre/postintervention) plasma samples from early-stage breast cancer patients receiving limonene treatment (2 g QD) before surgical resection of their tumor. Metabolite profiling was conducted using ultra-performance liquid chromatography coupled to a linear trap quadrupole system and gas chromatography-mass spectrometry. Metabolites were identified by comparison of ion features in samples to a standard reference library. Pathway-based interpretation was conducted using the human metabolome database and the MetaCyc database. Of the 397 named metabolites identified, 72 changed significantly with limonene intervention. Class-based changes included significant decreases in adrenal steroids (P < 0.01), and significant increases in bile acids (P ≤ 0.05) and multiple collagen breakdown products (P < 0.001). The pattern of changes also suggested alterations in glucose metabolism. There were 47 metabolites whose change with intervention was significantly correlated to a decrease in cyclin D1, a cell-cycle regulatory protein, in patient tumor tissues (P ≤ 0.05). Here, oral administration of limonene resulted in significant changes in several metabolic pathways. Furthermore, pathway-based changes were related to the change in tissue level cyclin D1 expression. Future controlled clinical trials with limonene are necessary to determine the potential role and mechanisms of limonene in the breast cancer prevention setting. ©2014 American Association for Cancer Research.

  20. Flower bud transcriptome analysis of Sapium sebiferum (Linn.) Roxb. and primary investigation of drought induced flowering: pathway construction and G-quadruplex prediction based on transcriptome.

    PubMed

    Yang, Minglei; Wu, Ying; Jin, Shan; Hou, Jinyan; Mao, Yingji; Liu, Wenbo; Shen, Yangcheng; Wu, Lifang

    2015-01-01

    Sapium sebiferum (Linn.) Roxb. (Chinese Tallow Tree) is a perennial woody tree whose seeds are rich in oil, which holds great potential for biodiesel production. Although it is a traditional woody oil plant, our understanding of S. sebiferum genetics and molecular biology remains scant. In this study, the first comprehensive transcriptome of the S. sebiferum flower has been generated by sequencing and de novo assembly. A total of 149,342 unigenes were generated from raw reads, of which 24,289 unigenes were successfully matched to public databases. A total of 61 MADS box genes and putative pathways involved in S. sebiferum flower development have been identified. An abiotic stress response network was also constructed in this work, in which 2,686 unigenes are involved. As for lipid biosynthesis, 161 unigenes have been identified in fatty acid (FA) and triacylglycerol (TAG) biosynthesis. In addition, the G-quadruplexes in the RNA of S. sebiferum have been predicted. An interesting finding is that stress-induced flowering was observed in S. sebiferum for the first time. According to the results of semi-quantitative PCR, the expression tendencies of the flowering-related genes GA1, AP2 and CRY2 accorded with those of stress-related genes, such as GRX50435 and PRXⅡ39562. This transcriptome provides functional genomic information for further research on S. sebiferum, especially for genetic engineering to shorten the juvenile period and improve yield by regulating flower development. It also offers a useful database for research on other plants of the Euphorbiaceae family.

  1. Nuclear Receptor Signaling Atlas: Opening Access to the Biology of Nuclear Receptor Signaling Pathways.

    PubMed

    Becnel, Lauren B; Darlington, Yolanda F; Ochsner, Scott A; Easton-Marks, Jeremy R; Watkins, Christopher M; McOwiti, Apollo; Kankanamge, Wasula H; Wise, Michael W; DeHart, Michael; Margolis, Ronald N; McKenna, Neil J

    2015-01-01

    Signaling pathways involving nuclear receptors (NRs), their ligands and coregulators, regulate tissue-specific transcriptomes in diverse processes, including development, metabolism, reproduction, the immune response and neuronal function, as well as in their associated pathologies. The Nuclear Receptor Signaling Atlas (NURSA) is a Consortium focused around a Hub website (www.nursa.org) that annotates and integrates diverse 'omics datasets originating from the published literature and NURSA-funded Data Source Projects (NDSPs). These datasets are then exposed to the scientific community on an Open Access basis through user-friendly data browsing and search interfaces. Here, we describe the redesign of the Hub, version 3.0, to deploy "Web 2.0" technologies and add richer, more diverse content. The Molecule Pages, which aggregate information relevant to NR signaling pathways from myriad external databases, have been enhanced to include resources for basic scientists, such as post-translational modification sites and targeting miRNAs, and for clinicians, such as clinical trials. A portal to NURSA's Open Access, PubMed-indexed journal Nuclear Receptor Signaling has been added to facilitate manuscript submissions. Datasets and information on reagents generated by NDSPs are available, as is information concerning periodic new NDSP funding solicitations. Finally, the new website integrates the Transcriptomine analysis tool, which allows for mining of millions of richly annotated public transcriptomic data points in the field, providing an environment for dataset re-use and citation, bench data validation and hypothesis generation. We anticipate that this new release of the NURSA database will have tangible, long term benefits for both basic and clinical research in this field.

  2. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hill, David P.; D’Eustachio, Peter; Berardini, Tanya Z.

    The concept of a biological pathway, an ordered sequence of molecular transformations, is used to collect and represent molecular knowledge for a broad span of organismal biology. Representations of biomedical pathways typically are rich but idiosyncratic presentations of organized knowledge about individual pathways. Meanwhile, biomedical ontologies and associated annotation files are powerful tools that organize molecular information in a logically rigorous form to support computational analysis. The Gene Ontology (GO), representing Molecular Functions, Biological Processes and Cellular Components, incorporates many aspects of biological pathways within its ontological representations. Here we present a methodology for extending and refining the classes in the GO for more comprehensive, consistent and integrated representation of pathways, leveraging knowledge embedded in current pathway representations such as those in the Reactome Knowledgebase and MetaCyc. With carbohydrate metabolic pathways as a use case, we discuss how our representation supports the integration of variant pathway classes into a unified ontological structure that can be used for data comparison and analysis.

  3. Identification of key target genes and pathways in laryngeal carcinoma

    PubMed Central

    Liu, Feng; Du, Jintao; Liu, Jun; Wen, Bei

    2016-01-01

    The purpose of the present study was to screen the key genes associated with laryngeal carcinoma and to investigate the molecular mechanism of laryngeal carcinoma progression. The gene expression profile of GSE10935 [Gene Expression Omnibus (GEO) accession number], including 12 specimens from laryngeal papillomas and 12 specimens from normal laryngeal epithelia controls, was downloaded from the GEO database. Differentially expressed genes (DEGs) were screened in laryngeal papillomas compared with normal controls using Limma package in R language, followed by Gene Ontology (GO) enrichment analysis and pathway enrichment analysis. Furthermore, the protein-protein interaction (PPI) network of DEGs was constructed using Cytoscape software and modules were analyzed using MCODE plugin from the PPI network. Furthermore, significant biological pathway regions (sub-pathway) were identified by using iSubpathwayMiner analysis. A total of 67 DEGs were identified, including 27 up-regulated genes and 40 down-regulated genes and they were involved in different GO terms and pathways. PPI network analysis revealed that Ras association (RalGDS/AF-6) domain family member 1 (RASSF1) was a hub protein. The sub-pathway analysis identified 9 significantly enriched sub-pathways, including glycolysis/gluconeogenesis and nitrogen metabolism. Genes such as phosphoglycerate kinase 1 (PGK1), carbonic anhydrase II (CA2), and carbonic anhydrase XII (CA12) whose node degrees were >10 were identified in the disease risk sub-pathway. Genes in the sub-pathway, such as RASSF1, PGK1, CA2 and CA12 were presumed to serve critical roles in laryngeal carcinoma. The present study identified DEGs and their sub-pathways in the disease, which may serve as potential targets for treatment of laryngeal carcinoma. PMID:27446427

  4. Molecular Signatures of Microbial Metabolism in an Actively Growing, Silicified, Microbial Structure from Yellowstone National Park

    NASA Astrophysics Data System (ADS)

    Ferreira, M.; Creveling, J.; Hilburn, I.; Karlsson, E.; Pepe-Ranney, C.; Spear, J.; Dawson, S.; Geobio2008, I.

    2008-12-01

    Silicified structures that exhibit a putative biologic component in their formation permeate the rock record as stromatolites. We have studied a silicified microbial structure from a hot spring in Yellowstone National Park using phenotypic, phylogenetic, and metagenomic analyses to determine microbial carbon metabolic pathways and the phylogenetic affiliations of microbes present in this unique structure. In this multi-faceted approach, dominant physiologies, specifically with regard to anaerobic and aerobic metabolisms, were inferred from 16S rRNA gene sequences and 454 sequencing data from bulk DNA samples of the structure. Carbon utilization as indicated by ECO Biolog plates showed abundant heterotrophy and heterotrophic diversity throughout the microbial structure. Microbes within the structure are able to utilize all tested sources of carbohydrates, lipids/fatty acids, and protein/amino acids as carbon sources. ECO plate testing of the hot spring water yielded considerably less carbohydrate consumption (only 4 out of 13 tested carbohydrates) and similar lipids/fatty acids and protein/amino acids consumption (2 out of 3 and 5 out of 5 tested sources respectively). Full-length 16S rRNA gene sequences and metagenomic 454 pyrosequencing of community DNA showed limited diversity among primary producers. From the 16S data, the majority of the autotrophs are inferred to utilize the Calvin cycle for CO2 fixation, followed by 3-hydroxypropionate/4-hydroxybutyrate CO2 fixation. However, an analysis of the metagenomic data compared to the KEGG database does not show genes directly involved with Calvin cycle carbon fixation. Further BLAST searches of our data failed to find significant matches within our 6514 metagenomic sequences to known RuBisCo sequences taken from the NCBI database. This is likely due to a far under-sampled dataset of metagenomic sequences, and the low number (958) that had matches to the KEGG pathways database.
    Anaerobic versus aerobic physiology also can be estimated from the 16S clone libraries. Phylogenetic analysis of recovered 16S sequences suggests that 15% of the 16S sequences can be attributed to anaerobic microbes while 42% likely come from aerobes. The remaining 43% of 16S rRNA gene sequences belong to metabolically unassigned phyla both known and novel. This preliminary study demonstrates that the small spatially stratified silicified microbial structure present on the margins of a hot spring contains a rich and complex microbial community with different trophic levels and enzymatic pathways.

  5. Relational Databases: A Transparent Framework for Encouraging Biology Students to Think Informatically

    ERIC Educational Resources Information Center

    Rice, Michael; Gladstone, William; Weir, Michael

    2004-01-01

    We discuss how relational databases constitute an ideal framework for representing and analyzing large-scale genomic data sets in biology. As a case study, we describe a Drosophila splice-site database that we recently developed at Wesleyan University for use in research and teaching. The database stores data about splice sites computed by a…

  6. Search and Discovery Strategies for Biotechnology: the Paradigm Shift

    PubMed Central

    Bull, Alan T.; Ward, Alan C.; Goodfellow, Michael

    2000-01-01

    Profound changes are occurring in the strategies that biotechnology-based industries are deploying in the search for exploitable biology and to discover new products and develop new or improved processes. The advances that have been made in the past decade in areas such as combinatorial chemistry, combinatorial biosynthesis, metabolic pathway engineering, gene shuffling, and directed evolution of proteins have caused some companies to consider withdrawing from natural product screening. In this review we examine the paradigm shift from traditional biology to bioinformatics that is revolutionizing exploitable biology. We conclude that the reinvigorated means of detecting novel organisms, novel chemical structures, and novel biocatalytic activities will ensure that natural products will continue to be a primary resource for biotechnology. The paradigm shift has been driven by a convergence of complementary technologies, exemplified by DNA sequencing and amplification, genome sequencing and annotation, proteome analysis, and phenotypic inventorying, resulting in the establishment of huge databases that can be mined in order to generate useful knowledge such as the identity and characterization of organisms and the identity of biotechnology targets. Concurrently there have been major advances in understanding the extent of microbial diversity, how uncultured organisms might be grown, and how expression of the metabolic potential of microorganisms can be maximized. The integration of information from complementary databases presents a significant challenge. Such integration should facilitate answers to complex questions involving sequence, biochemical, physiological, taxonomic, and ecological information of the sort posed in exploitable biology. 
    The paradigm shift which we discuss is not absolute in the sense that it will replace established microbiology; rather, it reinforces our view that innovative microbiology is essential for releasing the potential of microbial diversity for biotechnology penetration throughout industry. Various of these issues are considered with reference to deep-sea microbiology and biotechnology. PMID:10974127

  7. Towards BioDBcore: a community-defined information specification for biological databases

    PubMed Central

    Gaudet, Pascale; Bairoch, Amos; Field, Dawn; Sansone, Susanna-Assunta; Taylor, Chris; Attwood, Teresa K.; Bateman, Alex; Blake, Judith A.; Bult, Carol J.; Cherry, J. Michael; Chisholm, Rex L.; Cochrane, Guy; Cook, Charles E.; Eppig, Janan T.; Galperin, Michael Y.; Gentleman, Robert; Goble, Carole A.; Gojobori, Takashi; Hancock, John M.; Howe, Douglas G.; Imanishi, Tadashi; Kelso, Janet; Landsman, David; Lewis, Suzanna E.; Mizrachi, Ilene Karsch; Orchard, Sandra; Ouellette, B. F. Francis; Ranganathan, Shoba; Richardson, Lorna; Rocca-Serra, Philippe; Schofield, Paul N.; Smedley, Damian; Southan, Christopher; Tan, Tin Wee; Tatusova, Tatiana; Whetzel, Patricia L.; White, Owen; Yamasaki, Chisato

    2011-01-01

    The present article proposes the adoption of a community-defined, uniform, generic description of the core attributes of biological databases, BioDBCore. The goals of these attributes are to provide a general overview of the database landscape, to encourage consistency and interoperability between resources and to promote the use of semantic and syntactic standards. BioDBCore will make it easier for users to evaluate the scope and relevance of available resources. This new resource will increase the collective impact of the information present in biological databases. PMID:21097465

  8. Towards BioDBcore: a community-defined information specification for biological databases

    PubMed Central

    Gaudet, Pascale; Bairoch, Amos; Field, Dawn; Sansone, Susanna-Assunta; Taylor, Chris; Attwood, Teresa K.; Bateman, Alex; Blake, Judith A.; Bult, Carol J.; Cherry, J. Michael; Chisholm, Rex L.; Cochrane, Guy; Cook, Charles E.; Eppig, Janan T.; Galperin, Michael Y.; Gentleman, Robert; Goble, Carole A.; Gojobori, Takashi; Hancock, John M.; Howe, Douglas G.; Imanishi, Tadashi; Kelso, Janet; Landsman, David; Lewis, Suzanna E.; Karsch Mizrachi, Ilene; Orchard, Sandra; Ouellette, B.F. Francis; Ranganathan, Shoba; Richardson, Lorna; Rocca-Serra, Philippe; Schofield, Paul N.; Smedley, Damian; Southan, Christopher; Tan, Tin W.; Tatusova, Tatiana; Whetzel, Patricia L.; White, Owen; Yamasaki, Chisato

    2011-01-01

    The present article proposes the adoption of a community-defined, uniform, generic description of the core attributes of biological databases, BioDBCore. The goals of these attributes are to provide a general overview of the database landscape, to encourage consistency and interoperability between resources; and to promote the use of semantic and syntactic standards. BioDBCore will make it easier for users to evaluate the scope and relevance of available resources. This new resource will increase the collective impact of the information present in biological databases. PMID:21205783

  9. On determining firing delay time of transitions for Petri net based signaling pathways by introducing stochastic decision rules.

    PubMed

    Miwa, Yoshimasa; Li, Chen; Ge, Qi-Wei; Matsuno, Hiroshi; Miyano, Satoru

    2010-01-01

    Parameter determination is important in modeling and simulating biological pathways including signaling pathways. Parameters are determined according to biological facts obtained from biological experiments and scientific publications. However, such reliable data describing detailed reactions are not reported in most cases. This prompted us to develop a general methodology for determining the parameters of a model in the case that no information on the underlying biological facts is provided. In this study, we use the Petri net approach for modeling signaling pathways, and propose a method to determine firing delay times of transitions for Petri net models of signaling pathways by introducing stochastic decision rules. Petri net technology provides a powerful approach to modeling and simulating various concurrent systems, and has recently been widely accepted as a description method for biological pathways. Our method makes it possible to determine the range of firing delay times that realizes smooth token flows in the Petri net model of a signaling pathway. The availability of this method has been confirmed by the results of an application to the interleukin-1 induced signaling pathway.
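
    To make the role of firing delay times concrete, the following is a minimal sketch of a timed Petri net reduced to a linear chain of places and transitions. It is not the authors' method: the simulation is deterministic rather than based on stochastic decision rules, the function name `simulate_chain` is hypothetical, and reading "smooth token flow" as "tokens do not pile up in intermediate places" is an assumption made for illustration.

```python
def simulate_chain(delays, initial_tokens, steps):
    """Simulate a linear timed Petri net: place 0 -> t0 -> place 1 -> t1 -> ...

    Transition i removes one token from place i when it starts firing and
    adds one token to place i + 1 after delays[i] ticks. A transition can
    only process one token at a time.
    Returns (peak token count per place, final token count per place).
    """
    places = [initial_tokens] + [0] * len(delays)
    busy_until = [None] * len(delays)  # tick at which transition i finishes
    peak = list(places)
    for tick in range(steps):
        # 1. Complete firings that are due at this tick.
        for i, done in enumerate(busy_until):
            if done == tick:
                places[i + 1] += 1
                busy_until[i] = None
        # 2. Start new firings on idle transitions with an available token.
        for i, delay in enumerate(delays):
            if busy_until[i] is None and places[i] > 0:
                places[i] -= 1
                busy_until[i] = tick + delay
        # 3. Record how many tokens are waiting in each place.
        peak = [max(a, b) for a, b in zip(peak, places)]
    return peak, places


# Equal delays: the intermediate place never holds a waiting token.
balanced_peak, balanced_final = simulate_chain([1, 1], 5, 50)
# A slower downstream transition: tokens accumulate in the middle place.
skewed_peak, skewed_final = simulate_chain([1, 3], 5, 50)
print(balanced_peak, skewed_peak)
```

    In this toy chain, matching the downstream delay to the upstream one keeps the middle place empty, while a longer downstream delay lets tokens queue up; choosing a range of delays that avoids such queues is the kind of question the abstract describes.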

  10. On determining firing delay time of transitions for petri net based signaling pathways by introducing stochastic decision rules.

    PubMed

    Miwa, Yoshimasa; Li, Chen; Ge, Qi-Wei; Matsuno, Hiroshi; Miyano, Satoru

    2011-01-01

    Parameter determination is important in modeling and simulating biological pathways including signaling pathways. Parameters are determined according to biological facts obtained from biological experiments and scientific publications. However, such reliable data describing detailed reactions are not reported in most cases. This prompted us to develop a general methodology for determining the parameters of a model in the case that no information on the underlying biological facts is provided. In this study, we use the Petri net approach for modeling signaling pathways, and propose a method to determine firing delay times of transitions for Petri net models of signaling pathways by introducing stochastic decision rules. Petri net technology provides a powerful approach to modeling and simulating various concurrent systems, and has recently been widely accepted as a description method for biological pathways. Our method makes it possible to determine the range of firing delay times that realizes smooth token flows in the Petri net model of a signaling pathway. The availability of this method has been confirmed by the results of an application to the interleukin-1 induced signaling pathway.

  11. Service-based analysis of biological pathways

    PubMed Central

    Zheng, George; Bouguettaya, Athman

    2009-01-01

    Background: Computer-based pathway discovery is concerned with two important objectives: pathway identification and analysis. Conventional mining and modeling approaches aimed at pathway discovery are often effective at achieving either objective, but not both. Such limitations can be effectively tackled by leveraging a Web service-based modeling and mining approach. Results: Inspired by molecular recognition and drug discovery processes, we developed a Web service mining tool, named PathExplorer, to discover potentially interesting biological pathways linking service models of biological processes. The tool uses an innovative approach to identify useful pathways based on graph-based hints and service-based simulation verifying the user's hypotheses. Conclusion: Web service modeling of biological processes allows easy access to and invocation of these processes on the Web. Web service mining techniques described in this paper enable the discovery of biological pathways linking these process service models. Algorithms presented in this paper for automatically highlighting interesting subgraphs within an identified pathway network enable the user to formulate hypotheses, which can be tested using our simulation algorithm, which is also described in this paper. PMID:19796403

  12. Exploring consumer exposure pathways and patterns of use for chemicals in the environment through the Chemical/Product Categories Database

    EPA Pesticide Factsheets

    Exploring consumer exposure pathways and patterns of use for chemicals in the environment through the Chemical/Product Categories Database (CPCat) (Presented by: Kathie Dionisio, Sc.D., NERL, US EPA, Research Triangle Park, NC, 1/23/2014).

  13. RRW: repeated random walks on genome-scale protein networks for local cluster discovery

    PubMed Central

    Macropol, Kathy; Can, Tolga; Singh, Ambuj K

    2009-01-01

    Background: We propose an efficient and biologically sensitive algorithm based on repeated random walks (RRW) for discovering functional modules, e.g., complexes and pathways, within large-scale protein networks. Compared to existing cluster identification techniques, RRW implicitly makes use of network topology, edge weights, and long range interactions between proteins. Results: We apply the proposed technique on a functional network of yeast genes and accurately identify statistically significant clusters of proteins. We validate the biological significance of the results using known complexes in the MIPS complex catalogue database and well-characterized biological processes. We find that 90% of the created clusters have the majority of their catalogued proteins belonging to the same MIPS complex, and about 80% have the majority of their proteins involved in the same biological process. We compare our method to various other clustering techniques, such as the Markov Clustering Algorithm (MCL), and find a significant improvement in the RRW clusters' precision and accuracy values. Conclusion: RRW, which is a technique that exploits the topology of the network, is more precise and robust in finding local clusters. In addition, it has the added flexibility of being able to find multi-functional proteins by allowing overlapping clusters. PMID:19740439
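
    The abstract does not spell out the walk itself, so the following is only a sketch of one common building block for this kind of local cluster discovery: a single random walk with restart on a small weighted graph. It is not the RRW algorithm. The function name, the restart probability of 0.3, and the toy graph are assumptions chosen for illustration.

```python
def random_walk_with_restart(graph, seed, restart=0.3, tol=1e-10, max_iter=1000):
    """Return the stationary visiting probability of each node.

    graph: dict mapping node -> dict of neighbor -> positive edge weight.
    At every step the walker jumps back to `seed` with probability `restart`,
    otherwise it moves to a neighbor with probability proportional to the
    edge weight.
    """
    nodes = list(graph)
    prob = {n: 0.0 for n in nodes}
    prob[seed] = 1.0
    for _ in range(max_iter):
        nxt = {n: 0.0 for n in nodes}
        for u in nodes:
            total = sum(graph[u].values())
            if total == 0:
                # Isolated node: send its walking mass back to the seed.
                nxt[seed] += (1 - restart) * prob[u]
                continue
            for v, w in graph[u].items():
                nxt[v] += (1 - restart) * prob[u] * w / total
        nxt[seed] += restart  # restart mass (probabilities sum to 1)
        change = max(abs(nxt[n] - prob[n]) for n in nodes)
        prob = nxt
        if change < tol:
            break
    return prob


# Toy network: two tightly connected triangles joined by one weak edge.
toy = {
    "A": {"B": 1.0, "C": 1.0},
    "B": {"A": 1.0, "C": 1.0},
    "C": {"A": 1.0, "B": 1.0, "D": 0.1},
    "D": {"C": 0.1, "E": 1.0, "F": 1.0},
    "E": {"D": 1.0, "F": 1.0},
    "F": {"D": 1.0, "E": 1.0},
}
scores = random_walk_with_restart(toy, seed="A")
for node, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(node, round(score, 4))
```

    Nodes in the seed's own triangle receive higher scores than nodes across the weak edge, which illustrates how topology and edge weights together shape the neighborhood a walk-based method would consider for a local cluster.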

  14. Systematic review of interleukin-12, interleukin-17, and interleukin-23 pathway inhibitors for the treatment of moderate-to-severe chronic plaque psoriasis: ustekinumab, briakinumab, tildrakizumab, guselkumab, secukinumab, ixekizumab, and brodalumab.

    PubMed

    Tausend, William; Downing, Christopher; Tyring, Stephen

    2014-01-01

    Monoclonal antibodies known as biologic agents specifically targeted against interleukin-12 (IL-12), interleukin-17A (IL-17), and interleukin-23 (IL-23) have been the focus of research for moderate-to-severe chronic plaque psoriasis in recent years. To discuss the immune-mediated model of psoriasis and to summarize current knowledge of the clinical efficacy and safety of new biologic agents for moderate-to-severe chronic plaque psoriasis. The PubMed database was searched for relevant articles on ustekinumab, briakinumab, tildrakizumab (MK-322), guselkumab, secukinumab, ixekizumab, and brodalumab published between January 2005 and July 2013. Fifty-five articles were identified. These studies suggest that the biologic agents specifically targeting IL-12, IL-17, and IL-23 are efficacious and safe in the treatment of moderate-to-severe psoriasis in adults. Current data from clinical trials suggest that biologic agents targeting IL-12, IL-17, and IL-23 are safe and efficacious drugs for use in moderate-to-severe chronic plaque psoriasis. Long-term data still need to be established.

  15. Psychological and biological responses to race-based social stress as pathways to disparities in educational outcomes.

    PubMed

    Levy, Dorainne J; Heissel, Jennifer A; Richeson, Jennifer A; Adam, Emma K

    2016-09-01

    We present the race-based disparities in stress and sleep in context model (RDSSC), which argues that racial/ethnic disparities in educational achievement and attainment are partially explained by the effects of race-based stressors, such as stereotype threat and perceived discrimination, on psychological and biological responses to stress, which, in turn, impact cognitive functioning and academic performance. Whereas the roles of psychological coping responses, such as devaluation and disidentification, have been theorized in previous work, the present model integrates the roles of biological stress responses, such as changes in stress hormones and sleep hours and quality, to this rich literature. We situate our model of the impact of race-based stress in the broader contexts of other stressors [e.g., stressors associated with socioeconomic status (SES)], developmental histories of stress, and individual and group differences in access to resources, opportunity and employment structures. Considering both psychological and biological responses to race-based stressors, in social contexts, will yield a more comprehensive understanding of the emergence of academic disparities between Whites and racial/ethnic minorities. (PsycINFO Database Record (c) 2016 APA, all rights reserved).

  16. Object-oriented parsing of biological databases with Python.

    PubMed

    Ramu, C; Gemünd, C; Gibson, T J

    2000-07-01

    While database activities in the biological area are increasing rapidly, rather little is done in the area of parsing them in a simple and object-oriented way. We present here an elegant, simple yet powerful way of parsing biological flat-file databases. We have taken EMBL, SWISSPROT and GENBANK as examples. EMBL and SWISS-PROT do not differ much in the format structure. GENBANK has a very different format structure than EMBL and SWISS-PROT. Extracting the desired fields in an entry (for example a sub-sequence with an associated feature) for later analysis is a constant need in the biological sequence-analysis community: this is illustrated with tools to make new splice-site databases. The interface to the parser is abstract in the sense that the access to all the databases is independent from their different formats, since parsing instructions are hidden.
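
    As an illustration of the idea, here is a minimal sketch of object-oriented parsing for an EMBL/SWISS-PROT-style flat file, in which each line begins with a two-letter line code and each entry ends with a `//` line. It is not the parser described in the paper: the `Entry` class, the `parse_entries` function, and the sample record are hypothetical and cover only a small part of the real formats.

```python
from collections import defaultdict


class Entry:
    """One flat-file record, stored as line code -> list of content strings."""

    def __init__(self):
        self.fields = defaultdict(list)

    def get(self, code):
        # Join continuation lines that share the same line code.
        return " ".join(self.fields.get(code, []))

    def sequence(self):
        # Sequence lines carry a blank line code; keep letters only,
        # dropping the spacing and the trailing position counter.
        parts = self.fields.get("  ", [])
        return "".join(ch for part in parts for ch in part if ch.isalpha())


def parse_entries(lines):
    """Yield one Entry per record; records are terminated by a '//' line."""
    entry = Entry()
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("//"):
            yield entry
            entry = Entry()
        elif line.strip():
            # Columns 1-2 hold the line code; content starts at column 6.
            entry.fields[line[:2]].append(line[5:].strip())


# Hypothetical two-record sample in the EMBL line-code layout.
SAMPLE = """\
ID   TEST1; SV 1; linear; mRNA; STD; PLN; 12 BP.
AC   X00001;
DE   Hypothetical example record,
DE   continued description.
SQ   Sequence 12 BP;
     acgtacgtac gt                                                        12
//
ID   TEST2; SV 1; linear; mRNA; STD; PLN; 0 BP.
AC   X00002;
//
"""

entries = list(parse_entries(SAMPLE.splitlines()))
for e in entries:
    print(e.get("AC"), e.get("DE"), e.sequence())
```

    Callers only use `get` and `sequence`, so a second parser for a differently structured format such as GenBank could return objects with the same methods, which is the sense in which the abstract calls the interface independent of the underlying format.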

  17. A systems biology approach identified different regulatory networks targeted by KSHV miR-K12-11 in B cells and endothelial cells.

    PubMed

    Yang, Yajie; Boss, Isaac W; McIntyre, Lauren M; Renne, Rolf

    2014-08-08

    Kaposi's sarcoma associated herpes virus (KSHV) is associated with tumors of endothelial and lymphoid origin. During latent infection, KSHV expresses miR-K12-11, an ortholog of the human tumor gene hsa-miR-155. Both gene products are microRNAs (miRNAs), which are important post-transcriptional regulators that contribute to tissue specific gene expression. Advances in target identification technologies and molecular interaction databases have allowed a systems biology approach to unravel the gene regulatory networks (GRNs) triggered by miR-K12-11 in endothelial and lymphoid cells. Understanding the tissue specific function of miR-K12-11 will help to elucidate underlying mechanisms of KSHV pathogenesis. Ectopic expression of miR-K12-11 differentially affected gene expression in BJAB cells of lymphoid origin and TIVE cells of endothelial origin. Direct miRNA targeting accounted for a small fraction of the observed transcriptome changes: only 29 genes were identified as putative direct targets of miR-K12-11 in both cell types. However, a number of commonly affected biological pathways, such as carbohydrate metabolism and interferon response related signaling, were revealed by gene ontology analysis. Integration of transcriptome profiling, bioinformatic algorithms, and databases of protein-protein interactome from the ENCODE project identified different nodes of GRNs utilized by miR-K12-11 in a tissue-specific fashion. These effector genes, including cancer associated transcription factors and signaling proteins, amplified the regulatory potential of a single miRNA, from a small set of putative direct targets to a larger set of genes. This is the first comparative analysis of miRNA-K12-11's effects in endothelial and B cells, from tissues infected with KSHV in vivo. MiR-K12-11 was able to broadly modulate gene expression in both cell types. 
    Using a systems biology approach, we inferred that miR-K12-11 establishes its GRN by both repressing master TFs and influencing signaling pathways, to counter the host anti-viral response and to promote proliferation and survival of infected cells. The targeted GRNs are more reproducible and informative than target gene identification, and our approach can be applied to other regulatory factors of interest.

  18. THE ADVERSE OUTCOME PATHWAY (AOP) FRAMEWORK: A FRAMEWORK FOR ORGANIZING BIOLOGICAL KNOWLEDGE LEADING TO HEALTH RISKS.

    EPA Science Inventory

    An Adverse Outcome Pathway (AOP) represents the organization of current and newly acquired knowledge of biological pathways. These pathways contain a series of nodes (Key Events, KEs) that when sufficiently altered influence the next node on the pathway, beginning from an Molecul...

  19. ISMB Conference Funding to Support Attendance of Early Researchers and Students

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gaasterland, Terry

    ISMB Conference Funding for Students and Young Scientists. Historical Description: The Intelligent Systems for Molecular Biology (ISMB) conference has provided a general forum for disseminating the latest developments in bioinformatics on an annual basis for the past 22 years. ISMB is a multidisciplinary conference that brings together scientists from computer science, molecular biology, mathematics and statistics. The goal of the ISMB meeting is to bring together biologists and computational scientists in a focus on actual biological problems, i.e., not simply theoretical calculations. The combined focus on “intelligent systems” and actual biological data makes ISMB a unique and highly important meeting. 21 years of experience in holding the conference has resulted in a consistently well-organized, well attended, and highly respected annual conference. "Intelligent systems" include any software which goes beyond straightforward, closed-form algorithms or standard database technologies, and encompasses those that view data in a symbolic fashion, learn from examples, consolidate multiple levels of abstraction, or synthesize results to be cognitively tractable to a human, including the development and application of advanced computational methods for biological problems. Relevant computational techniques include, but are not limited to: machine learning, pattern recognition, knowledge representation, databases, combinatorics, stochastic modeling, string and graph algorithms, linguistic methods, robotics, constraint satisfaction, and parallel computation. Biological areas of interest include molecular structure, genomics, molecular sequence analysis, evolution and phylogenetics, molecular interactions, metabolic pathways, regulatory networks, developmental control, and molecular biology generally.
    Emphasis is placed on the validation of methods using real data sets, on practical applications in the biological sciences, and on development of novel computational techniques. The ISMB conferences are distinguished from many other conferences in computational biology or artificial intelligence by an insistence that the researchers work with real molecular biology data, not theoretical or toy examples; and from many other biological conferences by providing a forum for technical advances as they occur, which otherwise may be shunned until a firm experimental result is published. The resulting intellectual richness and cross-disciplinary diversity provides an important opportunity for both students and senior researchers. ISMB has become the premier conference series in this field with refereed, published proceedings, establishing an infrastructure to promote the growing body of research.

  20. BIOSPIDA: A Relational Database Translator for NCBI

    PubMed Central

    Hagen, Matthew S.; Lee, Eva K.

    2010-01-01

    As the volume and availability of biological databases continue their widespread growth, it has become increasingly difficult for research scientists to identify all relevant information for biological entities of interest. Details of nucleotide sequences, gene expression, molecular interactions, and three-dimensional structures are maintained across many different databases. Retrieving all necessary information requires an integrated system that can query multiple databases with minimized overhead. This paper introduces a universal parser and relational schema translator that can be utilized for all NCBI databases in Abstract Syntax Notation (ASN.1). The data models for OMIM, Entrez-Gene, PubMed, MMDB and GenBank have been successfully converted into relational databases and all are easily linkable, helping to answer complex biological questions. These tools help research scientists locally integrate databases from NCBI without significant workload or development time. PMID:21347013

  1. Modeling biochemical pathways in the gene ontology

    DOE PAGES

    Hill, David P.; D’Eustachio, Peter; Berardini, Tanya Z.; ...

    2016-09-01

    The concept of a biological pathway, an ordered sequence of molecular transformations, is used to collect and represent molecular knowledge for a broad span of organismal biology. Representations of biomedical pathways typically are rich but idiosyncratic presentations of organized knowledge about individual pathways. Meanwhile, biomedical ontologies and associated annotation files are powerful tools that organize molecular information in a logically rigorous form to support computational analysis. The Gene Ontology (GO), representing Molecular Functions, Biological Processes and Cellular Components, incorporates many aspects of biological pathways within its ontological representations. Here we present a methodology for extending and refining the classes in the GO for more comprehensive, consistent and integrated representation of pathways, leveraging knowledge embedded in current pathway representations such as those in the Reactome Knowledgebase and MetaCyc. With carbohydrate metabolic pathways as a use case, we discuss how our representation supports the integration of variant pathway classes into a unified ontological structure that can be used for data comparison and analysis.

  2. 1H NMR-metabolomics: can they be a useful tool in our understanding of cardiac arrest?

    PubMed

    Chalkias, Athanasios; Fanos, Vassilios; Noto, Antonio; Castrén, Maaret; Gulati, Anil; Svavarsdóttir, Hildigunnur; Iacovidou, Nicoletta; Xanthos, Theodoros

    2014-05-01

    This review focuses on the presentation of the emerging technology of metabolomics, a promising tool for identifying the unrevealed biological pathways that lead to cardiac arrest. The electronic bases of PubMed, Scopus, and EMBASE were searched. Research terms were identified using the MeSH database and were combined thereafter. Initial search terms were "cardiac arrest", "cardiopulmonary resuscitation", "post-cardiac arrest syndrome" combined with "metabolomics". Metabolomics allows the monitoring of hundreds of metabolites from tissues or body fluids and already influences research in the field of cardiac metabolism. This approach has elucidated several pathophysiological mechanisms and identified profiles of metabolic changes that can be used to follow the disease processes occurring in the peri-arrest period. This can be achieved through leveraging the strengths of unbiased metabolome-wide scans, which include thousands of final downstream products of gene transcription, enzyme activity and metabolic products of extraneously administered substances, in order to identify a metabolomic fingerprint associated with an increased risk of cardiac arrest. Although this technology is still under development, metabolomics is a promising tool for elucidating biological pathways and discovering clinical biomarkers, strengthening the efforts for optimizing both the prevention and treatment of cardiac arrest. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.

  3. Evolving Strategies for the Incorporation of Bioinformatics within the Undergraduate Cell Biology Curriculum

    ERIC Educational Resources Information Center

    Honts, Jerry E.

    2003-01-01

    Recent advances in genomics and structural biology have resulted in an unprecedented increase in biological data available from Internet-accessible databases. In order to help students effectively use this vast repository of information, undergraduate biology students at Drake University were introduced to bioinformatics software and databases in…

  4. Seeking unique and common biological themes in multiple gene lists or datasets: pathway pattern extraction pipeline for pathway-level comparative analysis.

    PubMed

    Yi, Ming; Mudunuri, Uma; Che, Anney; Stephens, Robert M

    2009-06-29

    One of the challenges in the analysis of microarray data is to integrate and compare the selected (e.g., differential) gene lists from multiple experiments for common or unique underlying biological themes. A common way to approach this problem is to extract common genes from these gene lists and then subject these genes to enrichment analysis to reveal the underlying biology. However, the capacity of this approach is largely restricted by the limited number of common genes shared by datasets from multiple experiments, which could be caused by the complexity of the biological system itself. We now introduce a new Pathway Pattern Extraction Pipeline (PPEP), which extends the existing WPS application by providing a new pathway-level comparative analysis scheme. To facilitate comparing and correlating results from different studies and sources, PPEP contains new interfaces that allow evaluation of the pathway-level enrichment patterns across multiple gene lists. As an exploratory tool, this analysis pipeline may help reveal the underlying biological themes at both the pathway and gene levels. The analysis scheme provided by PPEP begins with multiple gene lists, which may be derived from different studies in terms of the biological contexts, applied technologies, or methodologies. These lists are then subjected to pathway-level comparative analysis for extraction of pathway-level patterns. This analysis pipeline helps to explore the commonality or uniqueness of these lists at the level of pathways or biological processes from different but relevant biological systems using a combination of statistical enrichment measurements, pathway-level pattern extraction, and graphical display of the relationships of genes and their associated pathways as Gene-Term Association Networks (GTANs) within the WPS platform. As a proof of concept, we have used the new method to analyze many datasets from our collaborators as well as some public microarray datasets. 
    This tool provides a new pathway-level analysis scheme for integrative and comparative analysis of data derived from different but relevant systems. The tool is freely available as a Pathway Pattern Extraction Pipeline implemented in our existing software package WPS, which can be obtained at http://www.abcc.ncifcrf.gov/wps/wps_index.php.
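
    The pathway-level comparison described in this abstract can be illustrated with a minimal sketch. The abstract does not say which enrichment statistic PPEP uses, so the one-sided hypergeometric test below is an assumption (it is a common choice for gene-list enrichment), and the function names `hypergeom_pvalue` and `enrichment_matrix` are hypothetical, not part of WPS. The sketch scores every pathway against every gene list, giving a pathway-by-list table in which shared or list-specific patterns can be looked for.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """One-sided hypergeometric p-value P(X >= k).

    N: background size, K: genes in the pathway,
    n: genes in the list, k: observed overlap.
    """
    upper = min(n, K)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


def enrichment_matrix(gene_lists, pathways, background_size):
    """Return {pathway: {list_name: p-value}} for all pathway/list pairs.

    Assumption: every gene belongs to a common background of
    `background_size` genes.
    """
    matrix = {}
    for pw_name, pw_genes in pathways.items():
        pw_set = set(pw_genes)
        row = {}
        for list_name, genes in gene_lists.items():
            gene_set = set(genes)
            overlap = len(gene_set & pw_set)
            row[list_name] = hypergeom_pvalue(
                overlap, len(gene_set), len(pw_set), background_size
            )
        matrix[pw_name] = row
    return matrix


if __name__ == "__main__":
    # Toy data only; gene and pathway names are placeholders.
    lists = {"study_a": ["g1", "g2"], "study_b": ["g3", "g4"]}
    pws = {"pathway_x": ["g1", "g2"]}
    print(enrichment_matrix(lists, pws, background_size=4))
```

    Comparing rows of the resulting table, rather than intersecting the gene lists themselves, is what lets two lists with few genes in common still show a shared pathway-level signal.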

  5. Ouabain rescues rat nephrogenesis during intrauterine growth restriction by regulating the complement and coagulation cascades and calcium signaling pathway.

    PubMed

    Chen, L; Yue, J; Han, X; Li, J; Hu, Y

    2016-02-01

    Intrauterine growth restriction (IUGR) is associated with a reduction in the numbers of nephrons in neonates, which increases the risk of hypertension. Our previous study showed that ouabain protects the development of the embryonic kidney during IUGR. To explore this molecular mechanism, IUGR rats were induced by protein and calorie restriction throughout pregnancy, and ouabain was delivered using a mini osmotic pump. RNA sequencing technology was used to identify the differentially expressed genes (DEGs) of the embryonic kidneys. DEGs were submitted to the Database for Annotation and Visualization and Integrated Discovery, and gene ontology enrichment analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis were conducted. Maternal malnutrition significantly reduced fetal weight, but ouabain treatment had no significant effect on body weight. A total of 322 (177 upregulated and 145 downregulated) DEGs were detected between control and the IUGR group. Meanwhile, 318 DEGs were found to be differentially expressed (180 increased and 138 decreased) between the IUGR group and the ouabain-treated group. KEGG pathway analysis indicated that maternal undernutrition mainly disrupts the complement and coagulation cascades and the calcium signaling pathway, which could be protected by ouabain treatment. Taken together, these two biological pathways may play an important role in nephrogenesis, indicating potential novel therapeutic targets against the unfavorable effects of IUGR.

  6. Prenatal Exposure to Arsenic and Cadmium Impacts Infectious Disease-Related Genes within the Glucocorticoid Receptor Signal Transduction Pathway

    PubMed Central

    Rager, Julia E.; Yosim, Andrew; Fry, Rebecca C.

    2014-01-01

    There is increasing evidence that environmental agents mediate susceptibility to infectious disease. Studies support the impact of prenatal/early life exposure to the environmental metals inorganic arsenic (iAs) and cadmium (Cd) on increased risk for susceptibility to infection. The specific biological mechanisms that underlie such exposure-mediated effects remain understudied. This research aimed to identify key genes/signal transduction pathways that associate prenatal exposure to these toxic metals with changes in infectious disease susceptibility using a Comparative Genomic Enrichment Method (CGEM). Using CGEM an infectious disease gene (IDG) database was developed comprising 1085 genes with known roles in viral, bacterial, and parasitic disease pathways. Subsequently, datasets collected from human pregnancy cohorts exposed to iAs or Cd were examined in relationship to the IDGs, specifically focusing on data representing epigenetic modifications (5-methyl cytosine), genomic perturbations (mRNA expression), and proteomic shifts (protein expression). A set of 82 infection and exposure-related genes was identified and found to be enriched for their role in the glucocorticoid receptor signal transduction pathway. Given their common identification across numerous human cohorts and their known toxicological role in disease, the identified genes within the glucocorticoid signal transduction pathway may underlie altered infectious disease susceptibility associated with prenatal exposures to the toxic metals iAs and Cd in humans. PMID:25479081

  7. Prioritizing GWAS Results: A Review of Statistical Methods and Recommendations for Their Application

    PubMed Central

    Cantor, Rita M.; Lange, Kenneth; Sinsheimer, Janet S.

    2010-01-01

    Genome-wide association studies (GWAS) have rapidly become a standard method for disease gene discovery. A substantial number of recent GWAS indicate that for most disorders, only a few common variants are implicated and the associated SNPs explain only a small fraction of the genetic risk. This review is written from the viewpoint that findings from the GWAS provide preliminary genetic information that is available for additional analysis by statistical procedures that accumulate evidence, and that these secondary analyses are very likely to provide valuable information that will help prioritize the strongest constellations of results. We review and discuss three analytic methods to combine preliminary GWAS statistics to identify genes, alleles, and pathways for deeper investigations. Meta-analysis seeks to pool information from multiple GWAS to increase the chances of finding true positives among the false positives and provides a way to combine associations across GWAS, even when the original data are unavailable. Testing for epistasis within a single GWAS study can identify the stronger results that are revealed when genes interact. Pathway analysis of GWAS results is used to prioritize genes and pathways within a biological context. Following a GWAS, association results can be assigned to pathways and tested in aggregate with computational tools and pathway databases. Reviews of published methods with recommendations for their application are provided within the framework for each approach. PMID:20074509

  8. Knowledge management for systems biology a general and visually driven framework applied to translational medicine.

    PubMed

    Maier, Dieter; Kalus, Wenzel; Wolff, Martin; Kalko, Susana G; Roca, Josep; Marin de Mas, Igor; Turan, Nil; Cascante, Marta; Falciani, Francesco; Hernandez, Miguel; Villà-Freixa, Jordi; Losko, Sascha

    2011-03-05

    To enhance our understanding of complex biological systems like diseases we need to put all of the available data into context and use this to detect relations, pattern and rules which allow predictive hypotheses to be defined. Life science has become a data rich science with information about the behaviour of millions of entities like genes, chemical compounds, diseases, cell types and organs, which are organised in many different databases and/or spread throughout the literature. Existing knowledge such as genotype-phenotype relations or signal transduction pathways must be semantically integrated and dynamically organised into structured networks that are connected with clinical and experimental data. Different approaches to this challenge exist but so far none has proven entirely satisfactory. To address this challenge we previously developed a generic knowledge management framework, BioXM™, which allows the dynamic, graphic generation of domain specific knowledge representation models based on specific objects and their relations supporting annotations and ontologies. Here we demonstrate the utility of BioXM for knowledge management in systems biology as part of the EU FP6 BioBridge project on translational approaches to chronic diseases. From clinical and experimental data, text-mining results and public databases we generate a chronic obstructive pulmonary disease (COPD) knowledge base and demonstrate its use by mining specific molecular networks together with integrated clinical and experimental data. We generate the first semantically integrated COPD specific public knowledge base and find that for the integration of clinical and experimental data with pre-existing knowledge the configuration based set-up enabled by BioXM reduced implementation time and effort for the knowledge base compared to similar systems implemented as classical software development projects. 
    The knowledgebase enables the retrieval of sub-networks including protein-protein interaction, pathway, gene--disease and gene--compound data which are used for subsequent data analysis, modelling and simulation. Pre-structured queries and reports enhance usability; establishing their use in everyday clinical settings requires further simplification with a browser based interface which is currently under development.

  9. Knowledge management for systems biology a general and visually driven framework applied to translational medicine

    PubMed Central

    2011-01-01

    Background To enhance our understanding of complex biological systems like diseases we need to put all of the available data into context and use this to detect relations, pattern and rules which allow predictive hypotheses to be defined. Life science has become a data rich science with information about the behaviour of millions of entities like genes, chemical compounds, diseases, cell types and organs, which are organised in many different databases and/or spread throughout the literature. Existing knowledge such as genotype - phenotype relations or signal transduction pathways must be semantically integrated and dynamically organised into structured networks that are connected with clinical and experimental data. Different approaches to this challenge exist but so far none has proven entirely satisfactory. Results To address this challenge we previously developed a generic knowledge management framework, BioXM™, which allows the dynamic, graphic generation of domain specific knowledge representation models based on specific objects and their relations supporting annotations and ontologies. Here we demonstrate the utility of BioXM for knowledge management in systems biology as part of the EU FP6 BioBridge project on translational approaches to chronic diseases. From clinical and experimental data, text-mining results and public databases we generate a chronic obstructive pulmonary disease (COPD) knowledge base and demonstrate its use by mining specific molecular networks together with integrated clinical and experimental data. Conclusions We generate the first semantically integrated COPD specific public knowledge base and find that for the integration of clinical and experimental data with pre-existing knowledge the configuration based set-up enabled by BioXM reduced implementation time and effort for the knowledge base compared to similar systems implemented as classical software development projects. 
    The knowledgebase enables the retrieval of sub-networks including protein-protein interaction, pathway, gene - disease and gene - compound data which are used for subsequent data analysis, modelling and simulation. Pre-structured queries and reports enhance usability; establishing their use in everyday clinical settings requires further simplification with a browser based interface which is currently under development. PMID:21375767

  10. CellNetVis: a web tool for visualization of biological networks using force-directed layout constrained by cellular components.

    PubMed

    Heberle, Henry; Carazzolle, Marcelo Falsarella; Telles, Guilherme P; Meirelles, Gabriela Vaz; Minghim, Rosane

    2017-09-13

    The advent of "omics" science has brought new perspectives in contemporary biology through the high-throughput analyses of molecular interactions, providing new clues in protein/gene function and in the organization of biological pathways. Biomolecular interaction networks, or graphs, are simple abstract representations where the components of a cell (e.g. proteins, metabolites etc.) are represented by nodes and their interactions are represented by edges. An appropriate visualization of data is crucial for understanding such networks, since pathways are related to functions that occur in specific regions of the cell. The force-directed layout is an important and widely used technique to draw networks according to their topologies. Placing the networks into cellular compartments helps to quickly identify where network elements are located and, more specifically, concentrated. Currently, only a few tools provide the capability of visually organizing networks by cellular compartments. Most of them cannot handle large and dense networks. Even for small networks with hundreds of nodes the available tools are not able to reposition the network while the user is interacting, limiting the visual exploration capability. Here we propose CellNetVis, a web tool to easily display biological networks in a cell diagram employing a constrained force-directed layout algorithm. The tool is freely available and open-source. It was originally designed for networks generated by the Integrated Interactome System and can be used with networks from other databases, such as InnateDB. CellNetVis has been shown to be applicable for dynamic investigation of complex networks over a consistent representation of a cell on the Web, with capabilities not matched elsewhere.

  11. Financing a future for public biological data.

    PubMed

    Ellis, L B; Kalumbi, D

    1999-09-01

    The public web-based biological database infrastructure is a source of both wonder and worry. Users delight in the ever increasing amounts of information available; database administrators and curators worry about long-term financial support. An earlier study of 153 biological databases (Ellis and Kalumbi, Nature Biotechnol., 16, 1323-1324, 1998) determined that near future (1-5 year) funding for over two-thirds of them was uncertain. More detailed data are required to determine the magnitude of the problem and offer possible solutions. This study examines the finances and use statistics of a few of these organizations in more depth, and reviews several economic models that may help sustain them. Six organizations were studied. Their administrative overhead is fairly low; non-administrative personnel and computer-related costs account for 77% of expenses. One smaller, more specialized US database, in 1997, had 60% of total access from US domains; a majority (56%) of its US accesses came from commercial domains, although only 2% of the 153 databases originally studied received any industrial support. The most popular model used to gain industrial support is asymmetric pricing: preferentially charging the commercial users of a database. At least five biological databases have recently begun using this model. Advertising is another model which may be useful for the more general, more heavily used sites. Microcommerce has promise, especially for databases that do not attract advertisers, but needs further testing. The least income reported for any of the databases studied was $50,000/year; applying this rate to 400 biological databases (a lower limit of the number of such databases, many of which require far larger resources) would mean an annual support need of at least $20 million. To obtain this level of support is challenging, yet failure to accept the challenge could be catastrophic. Contact: lynda@tc.umn.edu

  12. Poplar Wood Rays Are Involved in Seasonal Remodeling of Tree Physiology

    PubMed Central

    Larisch, Christina; Dittrich, Marcus; Wildhagen, Henning; Lautner, Silke; Fromm, Jörg; Polle, Andrea; Hedrich, Rainer; Rennenberg, Heinz; Müller, Tobias; Ache, Peter

    2012-01-01

    Understanding seasonality and longevity is a major challenge in tree biology. In woody species, growth phases and dormancy follow one another consecutively. In the oldest living individuals, the annual cycle may run for more than 1,000 years. So far, however, not much is known about the processes triggering reactivation from dormancy. In this study, we focused on wood rays, which are known to play an important role in tree development. The transition phase from dormancy to flowering in early spring was compared with the phase of active growth in summer. Rays from wood samples of poplar (Populus × canescens) were enriched by laser microdissection, and transcripts were monitored by poplar whole-genome microarrays. The resulting seasonally varying complex expression and metabolite patterns were subjected to pathway analyses. In February, the metabolic pathways related to flower induction were high, indicating that reactivation from dormancy was already taking place at this time of the year. In July, the pathways related to active growth, like lignin biosynthesis, nitrogen assimilation, and defense, were enriched. Based on “marker” genes identified in our pathway analyses, we were able to validate periodical changes in wood samples by quantitative polymerase chain reaction. These studies, and the resulting ray database, provide new insights into the steps underlying the seasonality of poplar trees. PMID:22992511

  13. Corruption, development and governance indicators predict invasive species risk from trade

    PubMed Central

    Brenton-Rule, Evan C.; Barbieri, Rafael F.; Lester, Philip J.

    2016-01-01

    Invasive species have an enormous global impact, with international trade being the leading pathway for their introduction. Current multinational trade deals under negotiation will dramatically change trading partnerships and pathways. These changes have considerable potential to influence biological invasions and global biodiversity. Using a database of 47 328 interceptions spanning 10 years, we demonstrate how development and governance socio-economic indicators of trading partners can predict exotic species interceptions. For import pathways associated with vegetable material, a significantly higher risk of exotic species interceptions was associated with countries that are poorly regulated, have more forest cover and have surprisingly low corruption. Corruption and indicators such as political stability or adherence to rule of law were important in vehicle or timber import pathways. These results will be of considerable value to policy makers, primarily by shifting quarantine procedures to focus on countries of high risk based on their socio-economic status. Further, using New Zealand as an example, we demonstrate how a ninefold reduction in incursions could be achieved if socio-economic indicators were used to select trade partners. International trade deals that ignore governance and development indicators may facilitate introductions and biodiversity loss. Development and governance within countries clearly have biodiversity implications beyond borders. PMID:27306055

  14. Corruption, development and governance indicators predict invasive species risk from trade.

    PubMed

    Brenton-Rule, Evan C; Barbieri, Rafael F; Lester, Philip J

    2016-06-15

    Invasive species have an enormous global impact, with international trade being the leading pathway for their introduction. Current multinational trade deals under negotiation will dramatically change trading partnerships and pathways. These changes have considerable potential to influence biological invasions and global biodiversity. Using a database of 47 328 interceptions spanning 10 years, we demonstrate how development and governance socio-economic indicators of trading partners can predict exotic species interceptions. For import pathways associated with vegetable material, a significantly higher risk of exotic species interceptions was associated with countries that are poorly regulated, have more forest cover and have surprisingly low corruption. Corruption and indicators such as political stability or adherence to rule of law were important in vehicle or timber import pathways. These results will be of considerable value to policy makers, primarily by shifting quarantine procedures to focus on countries of high risk based on their socio-economic status. Further, using New Zealand as an example, we demonstrate how a ninefold reduction in incursions could be achieved if socio-economic indicators were used to select trade partners. International trade deals that ignore governance and development indicators may facilitate introductions and biodiversity loss. Development and governance within countries clearly have biodiversity implications beyond borders. © 2016 The Author(s).

  15. Identifying pathways affected by cancer mutations.

    PubMed

    Iengar, Prathima

    2017-12-16

    Mutations in 15 cancers, sourced from the COSMIC Whole Genomes database, and 297 human pathways, arranged into pathway groups based on the processes they orchestrate, and sourced from the KEGG pathway database, have together been used to identify pathways affected by cancer mutations. Genes studied in ≥15, and mutated in ≥10 samples of a cancer have been considered recurrently mutated, and pathways with recurrently mutated genes have been considered affected in the cancer. Novel doughnut plots have been presented which enable visualization of the extent to which pathways and genes, in each pathway group, are targeted, in each cancer. The 'organismal systems' pathway group (including organism-level pathways; e.g., nervous system) is the most targeted, more than even the well-recognized signal transduction, cell-cycle and apoptosis, and DNA repair pathway groups. The important, yet poorly-recognized, role played by the group merits attention. Pathways affected in ≥7 cancers yielded insights into processes affected. Copyright © 2017 Elsevier Inc. All rights reserved.
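
    The selection rule stated in this abstract (a gene is recurrently mutated in a cancer if it was studied in at least 15 samples and mutated in at least 10; a pathway is affected if it contains such a gene) can be written down as a short sketch. The input layout and the function names `recurrently_mutated` and `affected_pathways` are assumptions made for illustration; the gene counts and pathway names in the example are toy placeholders, not COSMIC or KEGG data.

```python
def recurrently_mutated(gene_stats, min_studied=15, min_mutated=10):
    """Return the set of genes passing both thresholds for one cancer.

    gene_stats: {gene: (samples_studied, samples_mutated)}
    The default thresholds follow the abstract (>=15 studied, >=10 mutated).
    """
    return {
        gene
        for gene, (studied, mutated) in gene_stats.items()
        if studied >= min_studied and mutated >= min_mutated
    }


def affected_pathways(pathways, recurrent_genes):
    """Map each affected pathway to its recurrently mutated member genes.

    pathways: {pathway_name: iterable of gene names}
    A pathway with no recurrently mutated gene is left out.
    """
    affected = {}
    for name, genes in pathways.items():
        hits = sorted(set(genes) & recurrent_genes)
        if hits:
            affected[name] = hits
    return affected


if __name__ == "__main__":
    # Toy counts for a single hypothetical cancer.
    stats = {"GENE_A": (20, 12), "GENE_B": (14, 12), "GENE_C": (30, 9)}
    toy_pathways = {"pathway_1": ["GENE_A", "GENE_C"], "pathway_2": ["GENE_B"]}
    recurrent = recurrently_mutated(stats)
    print(recurrent)
    print(affected_pathways(toy_pathways, recurrent))
```

    Running the same two steps per cancer and counting, for each pathway, the number of cancers in which it appears would give the "affected in ≥7 cancers" tally mentioned at the end of the abstract.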

  16. DrugPath: a database for academic investigators to match oncology molecular targets with drugs in development.

    PubMed

    Shah, Eric D; Fisch, Brandon M A; Arceci, Robert J; Buckley, Jonathan D; Reaman, Gregory H; Sorensen, Poul H; Triche, Timothy J; Reynolds, C Patrick

    2014-05-01

    Academic laboratories are developing increasingly large amounts of data that describe the genomic landscape and gene expression patterns of various types of cancers. Such data can potentially identify novel oncology molecular targets in cancer types that may not be the primary focus of a drug sponsor's initial research for an investigational new drug. Obtaining preclinical data that point toward the potential for a given molecularly targeted agent, or a novel combination of agents requires knowledge of drugs currently in development in both the academic and commercial sectors. We have developed the DrugPath database ( http://www.drugpath.org ) as a comprehensive, free-of-charge resource for academic investigators to identify agents being developed in academics or industry that may act against molecular targets of interest. DrugPath data on molecular targets overlay the Michigan Molecular Interactions ( http://mimi.ncibi.org ) gene-gene interaction map to facilitate identification of related agents in the same pathway. The database catalogs 2,081 drug development programs representing 751 drug sponsors and 722 molecular and genetic targets. DrugPath should assist investigators in identifying and obtaining drugs acting on specific molecular targets for biological and preclinical therapeutic studies.

  17. A proteomics study of barley powdery mildew haustoria.

    PubMed

    Godfrey, Dale; Zhang, Ziguo; Saalbach, Gerhard; Thordal-Christensen, Hans

    2009-06-01

    A number of fungal and oomycete plant pathogens of major economic importance feed on their hosts by means of haustoria, which they place inside living plant cells. The underlying mechanisms are poorly understood, partly due to difficulty in preparing haustoria. We have therefore developed a procedure for isolating haustoria from the barley powdery mildew fungus (Blumeria graminis f.sp. hordei, Bgh). We subsequently aimed to understand the molecular mechanisms of haustoria through a study of their proteome. Extracted proteins were digested using trypsin, separated by LC, and analysed by MS/MS. Searches of a custom Bgh EST sequence database and the NCBI-NR fungal protein database, using the MS/MS data, identified 204 haustoria proteins. The majority of the proteins appear to have roles in protein metabolic pathways and biological energy production. Surprisingly, pyruvate decarboxylase (PDC), involved in alcoholic fermentation and commonly abundant in fungi and plants, was absent in our Bgh proteome data set. A sequence encoding this enzyme was also absent in our EST sequence database. Significantly, BLAST searches of the recently available Bgh genome sequence data also failed to identify a sequence encoding this enzyme, strongly indicating that Bgh does not have a gene for PDC.

  18. BioWarehouse: a bioinformatics database warehouse toolkit

    PubMed Central

    Lee, Thomas J; Pouliot, Yannick; Wagner, Valerie; Gupta, Priyanka; Stringer-Calvert, David WJ; Tenenbaum, Jessica D; Karp, Peter D

    2006-01-01

    Background This article addresses the problem of interoperation of heterogeneous bioinformatics databases. Results We introduce BioWarehouse, an open source toolkit for constructing bioinformatics database warehouses using the MySQL and Oracle relational database managers. BioWarehouse integrates its component databases into a common representational framework within a single database management system, thus enabling multi-database queries using the Structured Query Language (SQL) but also facilitating a variety of database integration tasks such as comparative analysis and data mining. BioWarehouse currently supports the integration of a pathway-centric set of databases including ENZYME, KEGG, and BioCyc, and in addition the UniProt, GenBank, NCBI Taxonomy, and CMR databases, and the Gene Ontology. Loader tools, written in the C and JAVA languages, parse and load these databases into a relational database schema. The loaders also apply a degree of semantic normalization to their respective source data, decreasing semantic heterogeneity. The schema supports the following bioinformatics datatypes: chemical compounds, biochemical reactions, metabolic pathways, proteins, genes, nucleic acid sequences, features on protein and nucleic-acid sequences, organisms, organism taxonomies, and controlled vocabularies. As an application example, we applied BioWarehouse to determine the fraction of biochemically characterized enzyme activities for which no sequences exist in the public sequence databases. The answer is that no sequence exists for 36% of enzyme activities for which EC numbers have been assigned. These gaps in sequence data significantly limit the accuracy of genome annotation and metabolic pathway prediction, and are a barrier for metabolic engineering. Complex queries of this type provide examples of the value of the data warehousing approach to bioinformatics research. 
    Conclusion BioWarehouse embodies significant progress on the database integration problem for bioinformatics. PMID:16556315

  19. BioWarehouse: a bioinformatics database warehouse toolkit.

    PubMed

    Lee, Thomas J; Pouliot, Yannick; Wagner, Valerie; Gupta, Priyanka; Stringer-Calvert, David W J; Tenenbaum, Jessica D; Karp, Peter D

    2006-03-23

    This article addresses the problem of interoperation of heterogeneous bioinformatics databases. We introduce BioWarehouse, an open source toolkit for constructing bioinformatics database warehouses using the MySQL and Oracle relational database managers. BioWarehouse integrates its component databases into a common representational framework within a single database management system, thus enabling multi-database queries using the Structured Query Language (SQL) but also facilitating a variety of database integration tasks such as comparative analysis and data mining. BioWarehouse currently supports the integration of a pathway-centric set of databases including ENZYME, KEGG, and BioCyc, and in addition the UniProt, GenBank, NCBI Taxonomy, and CMR databases, and the Gene Ontology. Loader tools, written in the C and JAVA languages, parse and load these databases into a relational database schema. The loaders also apply a degree of semantic normalization to their respective source data, decreasing semantic heterogeneity. The schema supports the following bioinformatics datatypes: chemical compounds, biochemical reactions, metabolic pathways, proteins, genes, nucleic acid sequences, features on protein and nucleic-acid sequences, organisms, organism taxonomies, and controlled vocabularies. As an application example, we applied BioWarehouse to determine the fraction of biochemically characterized enzyme activities for which no sequences exist in the public sequence databases. The answer is that no sequence exists for 36% of enzyme activities for which EC numbers have been assigned. These gaps in sequence data significantly limit the accuracy of genome annotation and metabolic pathway prediction, and are a barrier for metabolic engineering. Complex queries of this type provide examples of the value of the data warehousing approach to bioinformatics research. BioWarehouse embodies significant progress on the database integration problem for bioinformatics.
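
    The application example in this abstract, finding the fraction of EC-numbered enzyme activities with no known sequence, is the kind of cross-database question a single SQL query can answer once the sources share one schema. The sketch below is only an illustration of that idea: it uses Python's built-in `sqlite3` instead of MySQL or Oracle, and the two-table schema (`enzyme_activity`, `protein_sequence`) and the placeholder EC identifiers are invented for the example and are not the BioWarehouse schema.

```python
import sqlite3


def make_toy_db():
    """Build an in-memory database with a hypothetical two-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE enzyme_activity (ec_number TEXT PRIMARY KEY);
        CREATE TABLE protein_sequence (
            id INTEGER PRIMARY KEY,
            ec_number TEXT
        );
        -- Placeholder identifiers, not real EC numbers.
        INSERT INTO enzyme_activity VALUES ('EC_A'), ('EC_B'), ('EC_C'), ('EC_D');
        INSERT INTO protein_sequence (ec_number)
            VALUES ('EC_A'), ('EC_A'), ('EC_B');
        """
    )
    return conn


def fraction_without_sequence(conn):
    """Fraction of enzyme activities that have no associated sequence."""
    row = conn.execute(
        """
        SELECT AVG(CASE WHEN s.ec_number IS NULL THEN 1.0 ELSE 0.0 END)
        FROM enzyme_activity AS e
        LEFT JOIN (SELECT DISTINCT ec_number FROM protein_sequence) AS s
            ON s.ec_number = e.ec_number
        """
    ).fetchone()
    return row[0]


if __name__ == "__main__":
    # Two of the four toy activities lack a sequence.
    print(fraction_without_sequence(make_toy_db()))
```

    The left join against the distinct EC numbers keeps each activity to one row, so the average of the 0/1 flag is the fraction directly; in the toy data it plays the role of the 36% figure reported in the abstract.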

  20. Computer applications making rapid advances in high throughput microbial proteomics (HTMP).

    PubMed

    Anandkumar, Balakrishna; Haga, Steve W; Wu, Hui-Fen

    2014-02-01

    The last few decades have seen the rise of widely-available proteomics tools. From new data acquisition devices, such as MALDI-MS and 2DE, to new database search software, these new products have paved the way for high throughput microbial proteomics (HTMP). These tools are enabling researchers to gain new insights into microbial metabolism, and are opening up new areas of study, such as protein-protein interactions (interactomics) discovery. Computer software is a key part of these emerging fields. This review considers: 1) software tools for identifying the proteome, such as MASCOT or PDQuest, 2) online databases of proteomes, such as SWISS-PROT, Proteome Web, or the Proteomics Facility of the Pathogen Functional Genomics Resource Center, and 3) software tools for applying proteomic data, such as PSI-BLAST or VESPA. These tools allow for research in network biology, protein identification, functional annotation, target identification/validation, protein expression, protein structural analysis, metabolic pathway engineering and drug discovery.

  1. CellBase, a comprehensive collection of RESTful web services for retrieving relevant biological information from heterogeneous sources.

    PubMed

    Bleda, Marta; Tarraga, Joaquin; de Maria, Alejandro; Salavert, Francisco; Garcia-Alonso, Luz; Celma, Matilde; Martin, Ainoha; Dopazo, Joaquin; Medina, Ignacio

    2012-07-01

    In recent years, advances in high-throughput technologies have produced an unprecedented growth in the number and size of repositories and databases storing relevant biological data. Today, there is more biological information than ever but, unfortunately, the current status of many of these repositories is far from optimal. Some of the most common problems are that the information is spread across many small databases, that standards frequently differ among repositories, and that some databases are no longer supported or contain information that is too specific and unconnected. In addition, data size is increasingly becoming an obstacle when accessing or storing biological data. All these issues make it very difficult to extract and integrate information from different sources, to analyze experiments or to access and query this information in a programmatic way. CellBase provides a solution to the growing need for integration by easing access to biological data. CellBase implements a set of RESTful web services that query a centralized database containing the most relevant biological data sources. The database is hosted on our servers and is regularly updated. CellBase documentation can be found at http://docs.bioinfo.cipf.es/projects/cellbase.

  2. Relation extraction for biological pathway construction using node2vec.

    PubMed

    Kim, Munui; Baek, Seung Han; Song, Min

    2018-06-13

    Systems biology is an important field for understanding whole biological mechanisms composed of interactions between biological components. One approach for understanding complex and diverse mechanisms is to analyze biological pathways. However, because these pathways consist of important interactions and information on these interactions is disseminated in a large number of biomedical reports, text-mining techniques are essential for extracting these relationships automatically. In this study, we applied node2vec, an algorithmic framework for feature learning in networks, for relationship extraction. To this end, we extracted genes from paper abstracts using pkde4j, a text-mining tool for detecting entities and relationships. Using the extracted genes, a co-occurrence network was constructed and node2vec was used with the network to generate a latent representation. To demonstrate the efficacy of node2vec in extracting relationships between genes, performance was evaluated for gene-gene interactions involved in a type 2 diabetes pathway. Moreover, we compared the results of node2vec to those of baseline methods such as co-occurrence and DeepWalk. Node2vec outperformed existing methods in detecting relationships in the type 2 diabetes pathway, demonstrating that this method is appropriate for capturing the relatedness between pairs of biological entities involved in biological pathways. The results demonstrated that node2vec is useful for automatic pathway construction.
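
    The two steps described above (building a gene co-occurrence network, then sampling it with node2vec-style biased random walks) can be sketched as follows. This is a minimal illustration in pure Python, not the authors' pipeline: the gene lists are toy input, the function names are hypothetical, and the embedding step (feeding the walks to a skip-gram model) is omitted. The return parameter p and in-out parameter q follow the usual node2vec convention.

```python
import itertools
import random
from collections import defaultdict


def cooccurrence_graph(abstract_genes):
    """Build a weighted graph: edge weight = number of abstracts naming both genes."""
    graph = defaultdict(lambda: defaultdict(int))
    for genes in abstract_genes:
        for a, b in itertools.combinations(sorted(set(genes)), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def biased_walk(graph, start, length, p=1.0, q=1.0, rng=None):
    """Sample one second-order random walk with node2vec-style p/q biasing."""
    rng = rng or random.Random()
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbors = sorted(graph[cur])
        if not neighbors:
            break  # isolated node: the walk cannot continue
        weights = []
        for nxt in neighbors:
            w = float(graph[cur][nxt])
            if len(walk) > 1:
                prev = walk[-2]
                if nxt == prev:
                    w /= p  # returning to the previous node
                elif nxt not in graph[prev]:
                    w /= q  # moving two hops away from the previous node
                # otherwise nxt is also a neighbor of prev: weight unchanged
            weights.append(w)
        walk.append(rng.choices(neighbors, weights=weights)[0])
    return walk


if __name__ == "__main__":
    # Toy input: genes detected in three hypothetical abstracts.
    toy = [["INS", "GCK"], ["INS", "GCK", "HNF1A"], ["TCF7L2", "INS"]]
    g = cooccurrence_graph(toy)
    print(biased_walk(g, "INS", 6, p=0.5, q=2.0, rng=random.Random(0)))
```

    A small q pushes walks outward (exploration), while a small p keeps them close to the previous node; with p = q = 1 the walk reduces to a weighted first-order walk of the kind DeepWalk uses.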

  3. miRPathDB: a new dictionary on microRNAs and target pathways.

    PubMed

    Backes, Christina; Kehl, Tim; Stöckel, Daniel; Fehlmann, Tobias; Schneider, Lara; Meese, Eckart; Lenhof, Hans-Peter; Keller, Andreas

    2017-01-04

    In the last decade, miRNAs and their regulatory mechanisms have been intensively studied and many tools for the analysis of miRNAs and their targets have been developed. We previously presented a dictionary on single miRNAs and their putative target pathways. Since then, the number of miRNAs has tripled and the knowledge on miRNAs and targets has grown substantially. This, along with changes in pathway resources such as KEGG, leads to an improved understanding of miRNAs, their target genes and related pathways. Here, we introduce the miRNA Pathway Dictionary Database (miRPathDB), freely accessible at https://mpd.bioinf.uni-sb.de/. With the database we aim to complement available target pathway web-servers by giving researchers easy access to information on which pathways are regulated by a miRNA, which miRNAs target a pathway and how specific these regulations are. The database contains a large number of miRNAs (2595 human miRNAs), different miRNA target sets (14 773 experimentally validated target genes as well as 19 281 predicted target genes) and a broad selection of functional biochemical categories (KEGG-, WikiPathways-, BioCarta-, SMPDB-, PID-, Reactome pathways, functional categories from gene ontology (GO), protein families from Pfam and chromosomal locations totaling 12 875 categories). In addition to Homo sapiens, Mus musculus data are also stored and can be compared to human target pathways. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  4. An improved hybrid of particle swarm optimization and the gravitational search algorithm to produce a kinetic parameter estimation of aspartate biochemical pathways.

    PubMed

    Ismail, Ahmad Muhaimin; Mohamad, Mohd Saberi; Abdul Majid, Hairudin; Abas, Khairul Hamimah; Deris, Safaai; Zaki, Nazar; Mohd Hashim, Siti Zaiton; Ibrahim, Zuwairie; Remli, Muhammad Akmal

    2017-12-01

    Mathematical modelling is fundamental to understanding the dynamic behavior and regulation of the biochemical metabolisms and pathways that are found in biological systems. Pathways are used to describe complex processes that involve many parameters. It is important to have an accurate and complete set of parameters that describe the characteristics of a given model. However, measuring these parameters is typically difficult and even impossible in some cases. Furthermore, the experimental data are often incomplete and also suffer from experimental noise. These shortcomings make it challenging to identify the best-fit parameters that can represent the actual biological processes involved in biological systems. Computational approaches are required to estimate these parameters. The estimation is converted into a multimodal optimization problem that requires a global optimization algorithm that can avoid local solutions. These local solutions can lead to a bad fit when calibrating a model. Although the model itself can potentially match a set of experimental data, a high-performance estimation algorithm is required to improve the quality of the solutions. This paper describes an improved hybrid of particle swarm optimization and the gravitational search algorithm (IPSOGSA) to improve the efficiency of a global optimum (the best set of kinetic parameter values) search. The findings suggest that the proposed algorithm is capable of narrowing down the search space by exploiting the feasible solution areas. Hence, the proposed algorithm is able to achieve a near-optimal set of parameters at a fast convergence speed. The proposed algorithm was tested and evaluated based on two aspartate pathways that were obtained from the BioModels Database. The results show that the proposed algorithm outperformed other standard optimization algorithms in terms of accuracy and near-optimal kinetic parameter estimation. Nevertheless, the proposed algorithm is only expected to work well in small scale systems. In addition, the results of this study can be used to estimate kinetic parameter values in the stage of model selection for different experimental conditions. Copyright © 2017 Elsevier B.V. All rights reserved.
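
    To make the idea of kinetic parameter estimation as a global optimization problem concrete, here is a minimal sketch using plain particle swarm optimization. It is NOT the authors' IPSOGSA hybrid (there is no gravitational search component), and the Michaelis-Menten rate law, the synthetic noise-free data and all parameter values are assumptions chosen for illustration rather than taken from the aspartate pathway models.

```python
import random


def michaelis_menten(s, vmax, km):
    """Toy rate law standing in for a pathway model (an assumption)."""
    return vmax * s / (km + s)


def sse(params, data):
    """Sum of squared errors between model output and observations."""
    vmax, km = params
    return sum((michaelis_menten(s, vmax, km) - v) ** 2 for s, v in data)


def pso(loss, bounds, n_particles=30, n_iter=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Plain global-best PSO with positions clamped to the given bounds."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_loss = [loss(p) for p in pos]
    g = min(range(n_particles), key=pbest_loss.__getitem__)
    gbest, gbest_loss = pbest[g][:], pbest_loss[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            current = loss(pos[i])
            if current < pbest_loss[i]:
                pbest[i], pbest_loss[i] = pos[i][:], current
                if current < gbest_loss:
                    gbest, gbest_loss = pos[i][:], current
    return gbest, gbest_loss


# Synthetic observations generated from assumed "true" values vmax=2.0, km=0.5.
DATA = [(s, michaelis_menten(s, 2.0, 0.5)) for s in (0.1, 0.2, 0.5, 1.0, 2.0, 5.0)]

if __name__ == "__main__":
    best, best_loss = pso(lambda p: sse(p, DATA), bounds=[(0.01, 10.0), (0.01, 10.0)])
    print(best, best_loss)
```

    With real, noisy measurements the loss surface can have several local minima, which is the situation the abstract's hybrid algorithm is meant to handle; this sketch only shows the basic swarm update.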

  5. PathJam: a new service for integrating biological pathway information.

    PubMed

    Glez-Peña, Daniel; Reboiro-Jato, Miguel; Domínguez, Rubén; Gómez-López, Gonzalo; Pisano, David G; Fdez-Riverola, Florentino

    2010-10-28

    Biological pathways are crucial to much of today's scientific research, including the study of specific biological processes related to human diseases. PathJam is a new comprehensive and freely accessible web-server application integrating scattered human pathway annotation from several public sources. The tool has been designed both (i) to be intuitive for wet-lab users, providing statistical enrichment analysis of pathway annotations, and (ii) to support the development of new integrative pathway applications. PathJam’s unique features and advantages include interactive graphs linking pathways and genes of interest, downloadable results in fully compatible formats, GSEA compatible output files and a standardized RESTful API.

  6. Differential receptor dependencies: expression and significance of muscarinic M1 receptors in the biology of prostate cancer.

    PubMed

    Mannan Baig, Abdul; Khan, Naveed A; Effendi, Vardah; Rana, Zohaib; Ahmad, H R; Abbas, Farhat

    2017-01-01

    Recent reports on acetylcholine muscarinic receptor subtype 3 (CHRM3) have shown its growth-promoting role in prostate cancer. Additional studies report the proliferative effect of the cholinergic agonist carbachol on prostate cancer by its agonistic action on CHRM3. This study shows that the type 1 acetylcholine muscarinic receptor (CHRM1) contributes toward the proliferation and growth of prostate cancer. We used growth and cytotoxic assays, the prostate cancer microarray database and the homology of downstream pathways among CHRM subtypes to uncover multiple signals leading to the growth of prostate cancer. Growth assays showed that pilocarpine stimulates the proliferation of prostate cancer. Moreover, they showed that carbachol exerts an additional agonistic action on the nicotinic cholinergic receptor of prostate cancer cells that can be blocked by tubocurarine. With the use of selective CHRM1 antagonists such as pirenzepine and dicyclomine, a considerable inhibition of proliferation of prostate cancer cell lines was observed at dicyclomine doses ranging from 15 to 60 µg/ml. The microarray database of prostate cancer shows a dominant expression of CHRM1 in prostate cancer compared with other cholinergic subtypes. The bioinformatics of prostate cancer and CHRM pathways shows that the downstream signalling includes PIP3-AKT-CaM-mediated growth in LNCaP and PC3 cells. Our study suggests that antagonism of CHRM1 may be a potential therapeutic strategy against prostate cancer.

  7. The new follow-on-biologics law: a section by section analysis of the patent litigation provisions in the Biologics Price Competition and Innovation Act of 2009.

    PubMed

    Dougherty, Michael P

    2010-01-01

    An abbreviated pathway for the approval of biosimilar biological products, often called "follow-on biologics," has been enacted into law as part of the health care legislation recently passed by Congress and signed by the President. The subtitle of the health care bill establishing this approval pathway, the Biologics Price Competition and Innovation Act of 2009, includes many provisions governing the identification of patents relevant to a given biosimilar biological product and the assertion of those patents in infringement suits. This article provides a section-by-section analysis of the patent-related provisions of the new approval pathway for biosimilar biological products, and points out several ways in which the new law differs fundamentally from the Hatch-Waxman Act, which provides the approval pathway for generic versions of small molecule drugs.

  8. Bioinformatics Analysis of Protein Phosphorylation in Plant Systems Biology Using P3DB.

    PubMed

    Yao, Qiuming; Xu, Dong

    2017-01-01

    Protein phosphorylation is one of the most pervasive protein post-translational modification events in plant cells. It is involved in many plant biological processes, such as plant growth, organ development, and plant immunology, by regulating or switching signaling and metabolic pathways. High-throughput experimental methods like mass spectrometry can easily characterize hundreds to thousands of phosphorylation events in a single experiment. With the increasing volume of the data sets, Plant Protein Phosphorylation DataBase (P3DB, http://p3db.org) provides a comprehensive, systematic, and interactive online platform to deposit, query, analyze, and visualize these phosphorylation events in many plant species. It stores the protein phosphorylation sites in the context of identified mass spectra, phosphopeptides, and phosphoproteins contributed from various plant proteome studies. In addition, P3DB associates these plant phosphorylation sites with protein physicochemical information in the protein charts and tertiary structures, while various protein annotations from hierarchical kinase phosphatase families, protein domains, and gene ontology are also added into the database. P3DB not only provides rich information, but also interconnects the data and visualizes them as networks in a systems biology context. Currently, P3DB includes the KiC (Kinase Client) assay network, the protein-protein interaction network, the kinase-substrate network, the phosphatase-substrate network, and the protein domain co-occurrence network. All of these are available to query for and visualize existing phosphorylation events. Although P3DB only hosts experimentally identified phosphorylation data, it provides a plant phosphorylation prediction model for any unknown queries on the fly. P3DB is an entry point for the plant phosphorylation community to deposit and visualize any customized data sets within this systems biology framework. P3DB has become one of the major bioinformatics platforms for protein phosphorylation in plant biology.

  9. The Listeria monocytogenes strain 10403S BioCyc database.

    PubMed

    Orsi, Renato H; Bergholz, Teresa M; Wiedmann, Martin; Boor, Kathryn J

    2015-01-01

    Listeria monocytogenes is a food-borne pathogen of humans and other animals. The striking ability to survive several stresses usually used for food preservation makes L. monocytogenes one of the biggest concerns to the food industry, while the high mortality of listeriosis in specific groups of humans makes it a great concern for public health. Previous studies have shown that a regulatory network involving alternative sigma (σ) factors and transcription factors is pivotal to stress survival. However, few studies have evaluated the metabolic networks controlled by these regulatory mechanisms. The L. monocytogenes BioCyc database uses the strain 10403S as a model. Computer-generated initial annotation for all genes also allowed for identification, annotation and display of predicted reactions and pathways carried out by a single cell. Further ongoing manual curation based on published data as well as database mining for selected genes allowed the more refined annotation of functions, which, in turn, allowed for annotation of new pathways and fine-tuning of previously defined pathways to more L. monocytogenes-specific pathways. Using RNA-Seq data, several transcription start sites and promoter regions were mapped to the 10403S genome and annotated within the database. Additionally, the identification of promoter regions and a comprehensive review of available literature allowed the annotation of several regulatory interactions involving σ factors and transcription factors. The L. monocytogenes 10403S BioCyc database is a new resource for researchers studying Listeria and related organisms. It allows users to (i) have a comprehensive view of all reactions and pathways predicted to take place within the cell in the cellular overview, as well as to (ii) upload their own data, such as differential expression data, to visualize the data in the scope of predicted pathways and regulatory networks and to carry out enrichment analyses using several different annotations available within the database. © The Author(s) 2015. Published by Oxford University Press.

  10. Biological agents database in the armed forces.

    PubMed

    Niemcewicz, Marcin; Kocik, Janusz; Bielecka, Anna; Wierciński, Michał

    2014-10-01

    Rapid detection and identification of a biological agent during both natural and deliberate outbreaks is crucial for implementing appropriate control measures and procedures to mitigate the spread of disease. Determining pathogen etiology may not only support epidemiological investigation and human safety, but also enhance forensic efforts in pathogen tracing, evidence collection and correct inference. The article presents the objectives of the Biological Agents Database, which was developed for the Ministry of National Defense of the Republic of Poland within the European Defence Agency framework. The Biological Agents Database is an electronic catalogue of genetic markers of highly dangerous pathogens and biological agents of concern as weapons of mass destruction, which enables full identification of biological threats emerging in Poland and in locations where Polish troops are active. The database is a supportive tool for tracing the origin of biological agents and for rapidly identifying the agent causing a disease of unknown etiology. It also supports diagnosis, analysis, response and the exchange of information between institutions that use the information it contains. It can therefore be used not only for military purposes, but also in a civilian environment.

  11. The relationship between inadvertent ingestion and dermal exposure pathways: a new integrated conceptual model and a database of dermal and oral transfer efficiencies.

    PubMed

    Gorman Ng, Melanie; Semple, Sean; Cherrie, John W; Christopher, Yvette; Northage, Christine; Tielemans, Erik; Veroughstraete, Violaine; Van Tongeren, Martie

    2012-11-01

    Occupational inadvertent ingestion exposure is ingestion exposure due to contact between the mouth and contaminated hands or objects. Although individuals are typically oblivious to their exposure by this route, it is a potentially significant source of occupational exposure for some substances. Due to the continual flux of saliva through the oral cavity and the non-specificity of biological monitoring to routes of exposure, direct measurement of exposure by the inadvertent ingestion route is challenging; predictive models may be required to assess exposure. The work described in this manuscript has been carried out as part of a project to develop a predictive model for estimating inadvertent ingestion exposure in the workplace. As inadvertent ingestion exposure mainly arises from hand-to-mouth contact, it is closely linked to dermal exposure. We present a new integrated conceptual model for dermal and inadvertent ingestion exposure that should help to increase our understanding of ingestion exposure and our ability to simultaneously estimate exposure by the dermal and ingestion routes. The conceptual model consists of eight compartments (source, air, surface contaminant layer, outer clothing contaminant layer, inner clothing contaminant layer, hands and arms layer, perioral layer, and oral cavity) and nine mass transport processes (emission, deposition, resuspension or evaporation, transfer, removal, redistribution, decontamination, penetration and/or permeation, and swallowing) that describe event-based movement of substances between compartments. This conceptual model is intended to guide the development of predictive exposure models that estimate exposure from both the dermal and the inadvertent ingestion pathways. For exposure by these pathways, the efficiency of transfer of materials between compartments (for example from surfaces to hands, or from hands to the mouth) is an important determinant of exposure. A database of transfer efficiency data relevant for dermal and inadvertent ingestion exposure was developed, containing 534 empirically measured transfer efficiencies obtained between 1980 and 2010 and reported in the peer-reviewed and grey literature. The majority of the reported transfer efficiencies (84%) relate to transfer between surfaces and hands, but the database also includes efficiencies for other transfer scenarios, including surface-to-glove, hand-to-mouth, and skin-to-skin. While the conceptual model can provide a framework for a predictive exposure assessment model, the database provides detailed information on transfer efficiencies between the various compartments. Together, the conceptual model and the database provide a basis for the development of a quantitative tool to estimate inadvertent ingestion exposure in the workplace.

  12. Bioinformatics analysis on molecular mechanism of Rheum officinale in treatment of jaundice

    NASA Astrophysics Data System (ADS)

    Shan, Si; Tu, Jun; Nie, Peng; Yan, Xiaojun

    2017-01-01

    Objective: To study the molecular mechanism of Rheum officinale in the treatment of jaundice by building molecular networks and comparing canonical pathways. Methods: Target proteins of Rheum officinale and genes related to jaundice were retrieved online from the PubChem and Gene databases, respectively. Molecular network and canonical pathway comparison analyses were performed with Ingenuity Pathway Analysis (IPA). Results: The molecular networks of Rheum officinale and jaundice were complex and multifunctional. Forty target proteins of Rheum officinale and 33 Homo sapiens genes related to jaundice were found in the databases. Nineteen pathways were common to both networks. Rheum officinale could regulate endothelial differentiation, Interleukin-1B (IL-1B) and Tumor Necrosis Factor (TNF) in these pathways. Conclusions: Rheum officinale treats jaundice by regulating many effective nodes of the apoptotic pathway and of pathways related to cellular immunity.

  13. Gene Polymorphism Studies in a Teaching Laboratory

    NASA Astrophysics Data System (ADS)

    Shultz, Jeffry

    2009-02-01

    I present a laboratory procedure for illustrating transcription, post-transcriptional modification, gene conservation, and comparative genetics for use in undergraduate biology education. Students are individually assigned genes in a targeted biochemical pathway, for which they design and test polymerase chain reaction (PCR) primers. In this example, students used genes annotated for the steroid biosynthesis pathway in soybean. The authoritative Kyoto Encyclopedia of Genes and Genomes (KEGG) interactive database and other online resources were used to design primers based first on soybean expressed sequence tags (ESTs), then on ESTs from an alternate organism if soybean sequence was unavailable. Students designed a total of 50 gene-based primer pairs (37 soybean, 13 alternative) and tested these for polymorphism state and similarity between two soybean and two pea lines. Student assessment was based on acquisition of laboratory skills and successful project completion. This simple procedure illustrates conservation of genes and is not limited to soybean or pea. Cost-per-student estimates are included, along with a detailed protocol and flow diagram of the procedure.

  14. Linking disease-associated genes to regulatory networks via promoter organization

    PubMed Central

    Döhr, S.; Klingenhoff, A.; Maier, H.; de Angelis, M. Hrabé; Werner, T.; Schneider, R.

    2005-01-01

    Pathway- or disease-associated genes may participate in more than one transcriptional co-regulation network. Such gene groups can be readily obtained by literature analysis or by high-throughput techniques such as microarrays or protein-interaction mapping. We developed a strategy that defines regulatory networks by in silico promoter analysis, finding potentially co-regulated subgroups without a priori knowledge. Pairs of transcription factor binding sites conserved in orthologous genes (vertically) as well as in promoter sequences of co-regulated genes (horizontally) were used as seeds for the development of promoter models representing potential co-regulation. This approach was applied to a Maturity Onset Diabetes of the Young (MODY)-associated gene list, which yielded two models connecting functionally interacting genes within MODY-related insulin/glucose signaling pathways. Additional genes functionally connected to our initial gene list were identified by database searches with these promoter models. Thus, data-driven in silico promoter analysis allowed integrating molecular mechanisms with biological functions of the cell. PMID:15701758

  15. Differentiating pathway-specific from nonspecific effects in high-throughput toxicity data: A foundation for prioritizing adverse outcome pathway development

    EPA Science Inventory

    The U.S. Environmental Protection Agency’s ToxCast program has screened thousands of chemicals for biological activity, primarily using high-throughput in vitro bioassays. Adverse outcome pathways (AOPs) offer a means to link pathway-specific biological activities with potential ...

  16. Re-thinking organisms: The impact of databases on model organism biology.

    PubMed

    Leonelli, Sabina; Ankeny, Rachel A

    2012-03-01

    Community databases have become crucial to the collection, ordering and retrieval of data gathered on model organisms, as well as to the ways in which these data are interpreted and used across a range of research contexts. This paper analyses the impact of community databases on research practices in model organism biology by focusing on the history and current use of four community databases: FlyBase, Mouse Genome Informatics, WormBase and The Arabidopsis Information Resource. We discuss the standards used by the curators of these databases for what counts as reliable evidence, acceptable terminology, appropriate experimental set-ups and adequate materials (e.g., specimens). On the one hand, these choices are informed by the collaborative research ethos characterising most model organism communities. On the other hand, the deployment of these standards in databases reinforces this ethos and gives it concrete and precise instantiations by shaping the skills, practices, values and background knowledge required of the database users. We conclude that the increasing reliance on community databases as vehicles to circulate data is having a major impact on how researchers conduct and communicate their research, which affects how they understand the biology of model organisms and its relation to the biology of other species. Copyright © 2011 Elsevier Ltd. All rights reserved.

  17. New tools and methods for direct programmatic access to the dbSNP relational database.

    PubMed

    Saccone, Scott F; Quan, Jiaxi; Mehta, Gaurang; Bolze, Raphael; Thomas, Prasanth; Deelman, Ewa; Tischfield, Jay A; Rice, John P

    2011-01-01

    Genome-wide association studies often incorporate information from public biological databases in order to provide a biological reference for interpreting the results. The dbSNP database is an extensive source of information on single nucleotide polymorphisms (SNPs) for many different organisms, including humans. We have developed free software that will download and install a local MySQL implementation of the dbSNP relational database for a specified organism. We have also designed a system for classifying dbSNP tables in terms of common tasks we wish to accomplish using the database. For each task we have designed a small set of custom tables that facilitate task-related queries and provide entity-relationship diagrams for each task composed from the relevant dbSNP tables. In order to expose these concepts and methods to a wider audience we have developed web tools for querying the database and browsing documentation on the tables and columns to clarify the relevant relational structure. All web tools and software are freely available to the public at http://cgsmd.isi.edu/dbsnpq. Resources such as these for programmatically querying biological databases are essential for viably integrating biological information into genetic association experiments on a genome-wide scale.

  18. Accurate atom-mapping computation for biochemical reactions.

    PubMed

    Latendresse, Mario; Malerich, Jeremiah P; Travers, Mike; Karp, Peter D

    2012-11-26

    The complete atom mapping of a chemical reaction is a bijection of the reactant atoms to the product atoms that specifies the terminus of each reactant atom. Atom mapping of biochemical reactions is useful for many applications of systems biology, in particular for metabolic engineering, where synthesizing new biochemical pathways has to take into account the number of carbon atoms from a source compound that are conserved in the synthesis of a target compound. Rapid, accurate computation of the atom mapping(s) of a biochemical reaction remains elusive despite significant work on this topic. In particular, past researchers did not validate the accuracy of mapping algorithms. We introduce a new method for computing atom mappings called the minimum weighted edit-distance (MWED) metric. The metric is based on bond propensity to react and computes biochemically valid atom mappings for a large percentage of biochemical reactions. MWED models can be formulated efficiently as Mixed-Integer Linear Programs (MILPs). We have demonstrated this approach on 7501 reactions of the MetaCyc database for which 87% of the models could be solved in less than 10 s. For 2.1% of the reactions, we found multiple optimal atom mappings. We show that the error rate is 0.9% (22 reactions) by comparing these atom mappings to 2446 atom mappings of the manually curated Kyoto Encyclopedia of Genes and Genomes (KEGG) RPAIR database. To our knowledge, our computational atom-mapping approach is the most accurate and among the fastest published to date. The atom-mapping data will be available in the MetaCyc database later in 2012; the atom-mapping software will be available within the Pathway Tools software later in 2012.
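
    The notion of an atom mapping as a cost-minimizing bijection can be illustrated with a small brute-force sketch. It is not the MWED/MILP method of the abstract: every bond change costs 1 (the abstract's metric weights changes by bond propensity to react), the search enumerates all permutations and so only suits tiny examples, and the function names and toy molecule are hypothetical.

```python
from itertools import permutations


def edit_cost(mapping, reactant_bonds, product_bonds):
    """Count bonds broken plus bonds formed under a reactant-to-product mapping."""
    mapped = {frozenset((mapping[a], mapping[b])) for a, b in reactant_bonds}
    target = {frozenset(bond) for bond in product_bonds}
    return len(mapped ^ target)  # symmetric difference = broken + formed


def best_atom_mappings(reactant_atoms, reactant_bonds, product_atoms, product_bonds):
    """Return (minimum cost, all element-preserving bijections reaching it).

    Assumes a balanced reaction: both sides list the same number of atoms.
    """
    n = len(reactant_atoms)
    best_cost, best = None, []
    for perm in permutations(range(n)):
        # A valid mapping must send each atom to an atom of the same element.
        if any(reactant_atoms[i] != product_atoms[perm[i]] for i in range(n)):
            continue
        cost = edit_cost(perm, reactant_bonds, product_bonds)
        if best_cost is None or cost < best_cost:
            best_cost, best = cost, [perm]
        elif cost == best_cost:
            best.append(perm)
    return best_cost, best


if __name__ == "__main__":
    # Toy case: a C-C-O chain whose atoms are listed in reverse order (O-C-C).
    print(best_atom_mappings(["C", "C", "O"], [(0, 1), (1, 2)],
                             ["O", "C", "C"], [(0, 1), (1, 2)]))
```

    Returning every tied optimum mirrors the abstract's observation that some reactions admit multiple optimal atom mappings.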

  19. Development of a gene expression database and related analysis programs for evaluation of anticancer compounds.

    PubMed

    Ushijima, Masaru; Mashima, Tetsuo; Tomida, Akihiro; Dan, Shingo; Saito, Sakae; Furuno, Aki; Tsukahara, Satomi; Seimiya, Hiroyuki; Yamori, Takao; Matsuura, Masaaki

    2013-03-01

    Genome-wide transcriptional expression analysis is a powerful strategy for characterizing the biological activity of anticancer compounds. It is often instructive to identify gene sets involved in the activity of a given drug compound for comparison with different compounds. Currently, however, there is no comprehensive gene expression database and related application system that is (i) specialized in anticancer agents, (ii) easy to use, and (iii) open to the public. To develop a public gene expression database of antitumor agents, we first examined gene expression profiles in human cancer cells after exposure to 35 compounds including 25 clinically used anticancer agents. Gene signatures were extracted that were classified as upregulated or downregulated after exposure to the drug. Hierarchical clustering showed that drugs with similar mechanisms of action, such as genotoxic drugs, were clustered. Connectivity map analysis further revealed that our gene signature data reflected modes of action of the respective agents. Together with the database, we developed analysis programs that calculate scores for ranking changes in gene expression and for searching statistically significant pathways from the Kyoto Encyclopedia of Genes and Genomes database in order to analyze the datasets more easily. Our database and the analysis programs are available online at our website (http://scads.jfcr.or.jp/db/cs/). Using these systems, we successfully showed that proteasome inhibitors are selectively classified as endoplasmic reticulum stress inducers and induce atypical endoplasmic reticulum stress. Thus, our public access database and related analysis programs constitute a set of efficient tools to evaluate the mode of action of novel compounds and identify promising anticancer lead compounds. © 2012 Japanese Cancer Association.

  20. Psychosocial stress in pregnancy and preterm birth: associations and mechanisms

    PubMed Central

    Shapiro, Gabriel D.; Fraser, William D.; Frasch, Martin G.; Séguin, Jean R.

    2016-01-01

    Aims: Psychosocial stress during pregnancy (PSP) is a risk factor of growing interest in the etiology of preterm birth (PTB). This literature review assesses the published evidence concerning the association between PSP and PTB, highlighting established and hypothesized physiological pathways mediating this association. Method: The PubMed and Web of Science databases were searched using the keywords “psychosocial stress”, “pregnancy”, “pregnancy stress”, “preterm”, “preterm birth”, “gestational age”, “anxiety”, and “social support”. After applying the exclusion criteria, the search produced 107 articles. Results: The association of PSP with PTB varied according to the dimensions and timing of PSP. Stronger associations were generally found in early pregnancy, and most studies demonstrating positive results found moderate effect sizes, with risk ratios between 1.2 and 2.1. Subjective perception of stress and pregnancy-related anxiety appeared to be the stress measures most closely associated with PTB. Potential physiological pathways identified included behavioral, infectious, neuroinflammatory, and neuroendocrine mechanisms. Conclusions: Future research should examine the biological pathways of these different psychosocial stress dimensions and at multiple time points across pregnancy. Culture-independent characterization of the vaginal microbiome and noninvasive monitoring of cholinergic activity represent two exciting frontiers in this research. PMID:24216160

  1. Comparative Transcriptomic Characterization of the Early Development in Pacific White Shrimp Litopenaeus vannamei

    PubMed Central

    Wei, Jiankai; Zhang, Xiaojun; Yu, Yang; Huang, Hao; Li, Fuhua; Xiang, Jianhai

    2014-01-01

    Penaeid shrimp have a distinctive metamorphosis stage during early development. Although morphological and biochemical studies of this ontogeny have been conducted for decades, research at the gene expression level is still scarce. In this study, we investigated the transcriptomes of five continuous developmental stages in Pacific white shrimp (Litopenaeus vannamei) with high-throughput Illumina sequencing technology. The reads were assembled and clustered into 66,815 unigenes, of which 32,398 have putative homologues in the nr database, 14,981 have been classified into diverse functional categories by Gene Ontology (GO) annotation and 26,257 have been associated with 255 pathways by KEGG pathway mapping. Meanwhile, the differentially expressed genes (DEGs) between adjacent developmental stages were identified and gene expression patterns were clustered. Through GO term enrichment analysis, KEGG pathway enrichment analysis and functional gene profiling, the physiological changes during shrimp metamorphosis could be better understood, especially histogenesis, diet transition, muscle development and exoskeleton reconstruction. In conclusion, this is the first study to characterize the integrated transcriptomic profiles during early development of penaeid shrimp, and these findings will serve as significant references for shrimp developmental biology and aquaculture research. PMID:25197823

  2. Entourage: Visualizing Relationships between Biological Pathways using Contextual Subsets

    PubMed Central

    Lex, Alexander; Partl, Christian; Kalkofen, Denis; Streit, Marc; Gratzl, Samuel; Wassermann, Anne Mai; Schmalstieg, Dieter; Pfister, Hanspeter

    2014-01-01

    Biological pathway maps are highly relevant tools for many tasks in molecular biology. They reduce the complexity of the overall biological network by partitioning it into smaller manageable parts. While this reduction of complexity is their biggest strength, it is, at the same time, their biggest weakness. By removing what is deemed not important for the primary function of the pathway, biologists lose the ability to follow and understand cross-talks between pathways. Considering these cross-talks is, however, critical in many analysis scenarios, such as judging effects of drugs. In this paper we introduce Entourage, a novel visualization technique that provides contextual information lost due to the artificial partitioning of the biological network, but at the same time limits the presented information to what is relevant to the analyst’s task. We use one pathway map as the focus of an analysis and allow a larger set of contextual pathways. For these context pathways we only show the contextual subsets, i.e., the parts of the graph that are relevant to a selection. Entourage suggests related pathways based on similarities and highlights parts of a pathway that are interesting in terms of mapped experimental data. We visualize interdependencies between pathways using stubs of visual links, which we found effective yet not obtrusive. By combining this approach with visualization of experimental data, we can provide domain experts with a highly valuable tool. We demonstrate the utility of Entourage with case studies conducted with a biochemist who researches the effects of drugs on pathways. We show that the technique is well suited to investigate interdependencies between pathways and to analyze, understand, and predict the effect that drugs have on different cell types. Fig. 1: Entourage showing the Glioma pathway in detail and contextual information of multiple related pathways. PMID:24051820

  3. Exploration of the Anti-Inflammatory Drug Space Through Network Pharmacology: Applications for Drug Repurposing

    PubMed Central

    de Anda-Jáuregui, Guillermo; Guo, Kai; McGregor, Brett A.; Hur, Junguk

    2018-01-01

    The quintessential biological response to disease is inflammation. It is a driver and an important element in a wide range of pathological states. Pharmacological management of inflammation is therefore central in the clinical setting. Anti-inflammatory drugs modulate specific molecules involved in the inflammatory response; these drugs are traditionally classified as steroidal and non-steroidal drugs. However, the effects of these drugs are rarely limited to their canonical targets, affecting other molecules and altering biological functions with system-wide effects that can lead to the emergence of secondary therapeutic applications or adverse drug reactions (ADRs). In this study, relationships among anti-inflammatory drugs, functional pathways, and ADRs were explored through network models. We integrated structural drug information, experimental anti-inflammatory drug perturbation gene expression profiles obtained from the Connectivity Map and Library of Integrated Network-Based Cellular Signatures, functional pathways in the Kyoto Encyclopedia of Genes and Genomes (KEGG) and Reactome databases, as well as adverse reaction information from the U.S. Food and Drug Administration (FDA) Adverse Event Reporting System (FAERS). The network models comprise nodes representing anti-inflammatory drugs, functional pathways, and adverse effects. We identified structural and gene perturbation similarities linking anti-inflammatory drugs. Functional pathways were connected to drugs by implementing Gene Set Enrichment Analysis (GSEA). Drugs and adverse effects were connected based on the proportional reporting ratio (PRR) of an adverse effect in response to a given drug. Through these network models, relationships among anti-inflammatory drugs, their functional effects at the pathway level, and their adverse effects were explored. These networks comprise 70 different anti-inflammatory drugs, 462 functional pathways, and 1,175 ADRs. 
Network-based properties, such as degree, clustering coefficient, and node strength, were used to identify new therapeutic applications within and beyond the anti-inflammatory context, as well as ADR risk for these drugs, helping to select better repurposing candidates. Based on these parameters, we identified naproxen, meloxicam, etodolac, tenoxicam, flufenamic acid, fenoprofen, and nabumetone as candidates for drug repurposing with lower ADR risk. This network-based analysis pipeline provides a novel way to explore the effects of drugs in a therapeutic space. PMID:29545755
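
The abstract above links drugs to adverse effects through the proportional reporting ratio (PRR). As a rough illustration only, the sketch below computes the PRR from a standard 2x2 table of spontaneous reports. The function name and the counts are hypothetical and are not taken from the study, which does not state the thresholds or data handling it used.

```python
def proportional_reporting_ratio(a: int, b: int, c: int, d: int) -> float:
    """Compute the PRR from a 2x2 table of spontaneous reports.

    a: reports of the adverse effect of interest for the drug of interest
    b: reports of all other adverse effects for that drug
    c: reports of the adverse effect of interest for all other drugs
    d: reports of all other adverse effects for all other drugs
    """
    if a + b == 0 or c == 0:
        raise ValueError("PRR is undefined when a + b == 0 or c == 0")
    # Share of the drug's reports that mention the effect, divided by the
    # same share among reports for all other drugs.
    return (a / (a + b)) / (c / (c + d))


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
example_prr = proportional_reporting_ratio(a=20, b=180, c=100, d=9900)
```

With these made-up counts, 10% of the drug's reports mention the effect against 1% for all other drugs, so the ratio is about 10. A PRR well above 1 suggests disproportionate reporting, not causation.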

  4. Exploration of the Anti-Inflammatory Drug Space Through Network Pharmacology: Applications for Drug Repurposing.

    PubMed

    de Anda-Jáuregui, Guillermo; Guo, Kai; McGregor, Brett A; Hur, Junguk

    2018-01-01

    The quintessential biological response to disease is inflammation. It is a driver and an important element in a wide range of pathological states. Pharmacological management of inflammation is therefore central in the clinical setting. Anti-inflammatory drugs modulate specific molecules involved in the inflammatory response; these drugs are traditionally classified as steroidal and non-steroidal drugs. However, the effects of these drugs are rarely limited to their canonical targets, affecting other molecules and altering biological functions with system-wide effects that can lead to the emergence of secondary therapeutic applications or adverse drug reactions (ADRs). In this study, relationships among anti-inflammatory drugs, functional pathways, and ADRs were explored through network models. We integrated structural drug information, experimental anti-inflammatory drug perturbation gene expression profiles obtained from the Connectivity Map and Library of Integrated Network-Based Cellular Signatures, functional pathways in the Kyoto Encyclopedia of Genes and Genomes (KEGG) and Reactome databases, as well as adverse reaction information from the U.S. Food and Drug Administration (FDA) Adverse Event Reporting System (FAERS). The network models comprise nodes representing anti-inflammatory drugs, functional pathways, and adverse effects. We identified structural and gene perturbation similarities linking anti-inflammatory drugs. Functional pathways were connected to drugs by implementing Gene Set Enrichment Analysis (GSEA). Drugs and adverse effects were connected based on the proportional reporting ratio (PRR) of an adverse effect in response to a given drug. Through these network models, relationships among anti-inflammatory drugs, their functional effects at the pathway level, and their adverse effects were explored. These networks comprise 70 different anti-inflammatory drugs, 462 functional pathways, and 1,175 ADRs. 
Network-based properties, such as degree, clustering coefficient, and node strength, were used to identify new therapeutic applications within and beyond the anti-inflammatory context, as well as ADR risk for these drugs, helping to select better repurposing candidates. Based on these parameters, we identified naproxen, meloxicam, etodolac, tenoxicam, flufenamic acid, fenoprofen, and nabumetone as candidates for drug repurposing with lower ADR risk. This network-based analysis pipeline provides a novel way to explore the effects of drugs in a therapeutic space.

  5. Construction of a Linux based chemical and biological information system.

    PubMed

    Molnár, László; Vágó, István; Fehér, András

    2003-01-01

    A chemical and biological information system with a Web-based easy-to-use interface and corresponding databases has been developed. The constructed system incorporates all chemical, numerical and textual data related to the chemical compounds, including numerical biological screen results. Users can search the database by traditional textual/numerical and/or substructure or similarity queries through the web interface. To build our chemical database management system, we utilized existing IT components such as ORACLE or Tripos SYBYL for database management and Zope application server for the web interface. We chose Linux as the main platform, however, almost every component can be used under various operating systems.

  6. De Novo Transcriptomic Analysis of Peripheral Blood Lymphocytes from the Chinese Goose: Gene Discovery and Immune System Pathway Description

    PubMed Central

    Tariq, Mansoor; Chen, Rong; Yuan, Hongyu; Liu, Yanjie; Wu, Yanan; Wang, Junya; Xia, Chun

    2015-01-01

    Background: The Chinese goose is one of the most economically important poultry birds and is a natural reservoir for many avian viruses. However, the nature and regulation of the innate and adaptive immune systems of this waterfowl species are not completely understood due to limited information on the goose genome. Recently, transcriptome sequencing technology has been applied in genomic studies focused on novel gene discovery. Thus, this study described the transcriptome of the goose peripheral blood lymphocytes to identify immunity-relevant genes. Principal Findings: The transcriptome of the goose peripheral blood lymphocytes was sequenced by Illumina-Solexa technology and assembled de novo. In total, 211,198 unigenes were assembled from the 69.36 million cleaned reads. The average length, N50 size and the maximum length of the assembled unigenes were 687 bp, 1,298 bp and 18,992 bp, respectively. A total of 36,854 unigenes showed similarity by BLAST search against the NCBI non-redundant (Nr) protein database. For functional classification, 163,161 unigenes were classified into three Gene Ontology (GO) categories and 67 subcategories. A total of 15,334 unigenes were annotated into 25 eukaryotic orthologous groups (KOGs) categories. The Kyoto Encyclopedia of Genes and Genomes (KEGG) database annotated 39,585 unigenes into six biological functional groups and 308 pathways. Among the 2,757 unigenes that participated in the 15 immune system KEGG pathways, 125 of the most important immune-relevant genes were summarized and analyzed by STRING analysis to identify gene interactions and relationships. Moreover, 10 genes were confirmed by PCR and analyzed. Of these 125 unigenes, 109 unigenes, approximately 87%, were not previously identified in the goose. Conclusion: This de novo transcriptome analysis could provide important Chinese goose sequence information and highlights the value of new gene discovery, pathways investigation and immune system gene identification, and comparison with other avian species as useful tools to understand the goose immune system. PMID:25816068

  7. Genic and Intergenic SSR Database Generation, SNPs Determination and Pathway Annotations, in Date Palm (Phoenix dactylifera L.).

    PubMed

    Mokhtar, Morad M; Adawy, Sami S; El-Assal, Salah El-Din S; Hussein, Ebtissam H A

    2016-01-01

    The present investigation used bioinformatics tools to identify and characterize simple sequence repeats (SSRs) within the third version of the date palm genome and to develop a new SSR primer database. In addition, single nucleotide polymorphisms (SNPs) located within the SSR flanking regions were identified. Moreover, the pathways for the sequences assigned by SSR primers, the biological functions and gene interactions were determined. A total of 172,075 SSR motifs were identified in the date palm genome sequence, with a frequency of 450.97 SSRs per Mb. Of these, 130,014 SSRs (75.6%) were located within the intergenic regions, with a frequency of 499 SSRs per Mb, while only 42,061 SSRs (24.4%) were located within the genic regions, with a frequency of 347.5 SSRs per Mb. A total of 111,403 SSR primer pairs were designed, representing 291.9 SSR primers per Mb. Of the 111,403, only 31,380 SSR primers were in the genic regions, while 80,023 primers were in the intergenic regions. A total of 250,507 SNPs were identified in 84,172 SSR flanking regions, which represents 75.55% of the total SSR flanking regions. Of 12,274 genes, only 463 genes comprising 896 SSR primers were mapped onto 111 pathways using the KEGG database. The most abundant enzymes were identified in the pathway related to the biosynthesis of antibiotics. We tested 1031 SSR primers using both publicly available date palm genome sequences as templates in in silico PCR reactions. For in vitro validation, 31 SSR primers among those used in the in silico PCR were synthesized and tested for their ability to detect polymorphism among six Egyptian date palm cultivars. All tested primers successfully amplified products, but only 18 primers detected polymorphic amplicons among the studied date palm cultivars.
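
The counts and densities in the abstract above can be cross-checked with simple arithmetic. The sketch below uses only figures quoted in the abstract. The implied genome size is a derived value, not one the abstract reports, and the variable names are illustrative.

```python
# Figures quoted in the abstract.
total_ssrs = 172_075
ssr_per_mb = 450.97
intergenic_ssrs = 130_014
genic_ssrs = 42_061
primer_pairs = 111_403

# Derived value (not stated in the abstract): sequence length implied by
# the total SSR count and the overall SSR density.
implied_genome_mb = total_ssrs / ssr_per_mb

# Percentage of SSRs falling in intergenic and genic regions.
intergenic_share = 100 * intergenic_ssrs / total_ssrs
genic_share = 100 * genic_ssrs / total_ssrs

# Primer density over the same implied sequence length.
primers_per_mb = primer_pairs / implied_genome_mb

print(f"implied sequence length: {implied_genome_mb:.1f} Mb")
print(f"intergenic: {intergenic_share:.1f}%  genic: {genic_share:.1f}%")
print(f"primer pairs per Mb: {primers_per_mb:.1f}")
```

The shares reproduce the reported 75.6% and 24.4%, and the primer density comes out within about 0.1 of the reported 291.9 per Mb, so the quoted figures are mutually consistent.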

  8. Affected pathways and transcriptional regulators in gene expression response to an ultra-marathon trail: Global and independent activity approaches

    PubMed Central

    Roca, Emma; Brotons, Daniel; Soria, Jose Manuel; Perera, Alexandre

    2017-01-01

    Gene expression (GE) analyses on blood samples from marathon and half-marathon runners have reported significant impacts on the immune and inflammatory systems. An ultra-marathon trail (UMT) represents a greater effort due to its more testing conditions. For the first time, we report the genome-wide GE profiling in a group of 16 runners participating in an 82 km UMT competition. We quantified their differential GE profile before and after the race using HuGene2.0st microarrays (Affymetrix Inc., California, US). The results obtained were decomposed by means of an independent component analysis (ICA) targeting independent expression modes. We observed significant differences in the expression levels of 5,084 protein coding genes resulting in an overrepresentation of 14% of the human biological pathways from the Kyoto Encyclopedia of Genes and Genomes database. These were mainly clustered on terms related to protein synthesis repression, altered immune system and infectious disease-related mechanisms. In a second analysis, 27 out of the 196 transcriptional regulators (TRs) included in the Open Regulatory Annotation database were overrepresented. Among these TRs, we identified transcription factors from the hypoxia-inducible factors (HIF) family EPAS1 (p<0.01) and HIF1A (p<0.001), and others jointly described in the gluconeogenesis program such as HNF4 (p<0.001), EGR1 (p<0.001), CEBPA (p<0.001) and a highly specific TR, YY1 (p<0.01). The five independent components, obtained from ICA, further revealed a down-regulation of 10 genes distributed across complexes I, III and V of the electron transport chain. This mitochondrial activity reduction is compatible with HIF-1 system activation. The vascular endothelial growth factor (VEGF) pathway, known to be regulated by HIF, also emerged (p<0.05). Additionally, and related to the brain rewarding circuit, the endocannabinoid signalling pathway was overrepresented (p<0.05). PMID:29028836

  9. Plasma metabolomic profiles of breast cancer patients after short-term limonene intervention

    PubMed Central

    Miller, Jessica A.; Pappan, Kirk; Thompson, Patricia A.; Want, Elizabeth J.; Siskos, Alexandros; Keun, Hector C.; Wulff, Jacob; Hu, Chengcheng; Lang, Julie E.; Chow, H-H. Sherry

    2014-01-01

    Limonene is a lipophilic monoterpene found in high levels in citrus peel. Limonene demonstrates anti-cancer properties in preclinical models with effects on multiple cellular targets at varying potency. While of interest as a cancer chemopreventive, the biological activity of limonene in humans is poorly understood. We conducted metabolite profiling in 39 paired (pre/post-intervention) plasma samples from early-stage breast cancer patients receiving limonene treatment (2 g QD) before surgical resection of their tumor. Metabolite profiling was conducted using ultra-performance liquid chromatography (UPLC) coupled to a linear trap quadrupole (LTQ) system and gas chromatography mass spectrometry (GC-MS). Metabolites were identified by comparison of ion features in samples to a standard reference library. Pathway-based interpretation was conducted using the human metabolome database (HMDB) and the MetaCyc database. Of the 397 named metabolites identified, 72 changed significantly with limonene intervention. Class-based changes included significant decreases in adrenal steroids (P’s<0.01), and significant increases in bile acids (P’s≤0.05) and multiple collagen breakdown products (P’s<0.001). The pattern of changes also suggested alterations in glucose metabolism. There were 47 metabolites whose change with intervention was significantly correlated to a decrease in cyclin D1, a cell cycle regulatory protein, in patient tumor tissues (P’s≤0.05). Here, oral administration of limonene resulted in significant changes in several metabolic pathways. Further, pathway-based changes were related to the change in tissue level cyclin D1 expression. Future controlled clinical trials with limonene are necessary to determine the potential role and mechanisms of limonene in the breast cancer prevention setting. PMID:25388013

  10. Nuclear Receptor Signaling Atlas: Opening Access to the Biology of Nuclear Receptor Signaling Pathways

    PubMed Central

    Becnel, Lauren B.; Darlington, Yolanda F.; Ochsner, Scott A.; Easton-Marks, Jeremy R.; Watkins, Christopher M.; McOwiti, Apollo; Kankanamge, Wasula H.; Wise, Michael W.; DeHart, Michael; Margolis, Ronald N.; McKenna, Neil J.

    2015-01-01

    Signaling pathways involving nuclear receptors (NRs), their ligands and coregulators, regulate tissue-specific transcriptomes in diverse processes, including development, metabolism, reproduction, the immune response and neuronal function, as well as in their associated pathologies. The Nuclear Receptor Signaling Atlas (NURSA) is a Consortium focused around a Hub website (www.nursa.org) that annotates and integrates diverse ‘omics datasets originating from the published literature and NURSA-funded Data Source Projects (NDSPs). These datasets are then exposed to the scientific community on an Open Access basis through user-friendly data browsing and search interfaces. Here, we describe the redesign of the Hub, version 3.0, to deploy “Web 2.0” technologies and add richer, more diverse content. The Molecule Pages, which aggregate information relevant to NR signaling pathways from myriad external databases, have been enhanced to include resources for basic scientists, such as post-translational modification sites and targeting miRNAs, and for clinicians, such as clinical trials. A portal to NURSA’s Open Access, PubMed-indexed journal Nuclear Receptor Signaling has been added to facilitate manuscript submissions. Datasets and information on reagents generated by NDSPs are available, as is information concerning periodic new NDSP funding solicitations. Finally, the new website integrates the Transcriptomine analysis tool, which allows for mining of millions of richly annotated public transcriptomic data points in the field, providing an environment for dataset re-use and citation, bench data validation and hypothesis generation. We anticipate that this new release of the NURSA database will have tangible, long term benefits for both basic and clinical research in this field. PMID:26325041

  11. Identification of microRNAs and genes associated with hyperandrogenism in the follicular fluid of women with polycystic ovary syndrome.

    PubMed

    Xue, Yunping; Lv, Juan; Xu, Pengfei; Gu, Lin; Cao, Jian; Xu, Lingling; Xue, Kai; Li, Qian

    2018-05-01

    Polycystic ovary syndrome (PCOS) is a common reproductive endocrine disease, which is characterized by hyperandrogenism (HA), chronic anovulation, polycystic ovaries, insulin resistance, and obesity. At present, the mechanism by which PCOS/HA occurs has not been fully elucidated; thus, the mechanisms behind and interventions for HA in PCOS are current hot topics in research. MiRNAs have recently been shown to serve as diagnostic or prognostic biomarkers in patients with cancer. Thus, we are currently focused on studying the altered expression of miRNAs in follicular fluid and their correlation with HA in PCOS. Illumina deep sequencing technology was used to profile miRNAs in the follicular fluid of women with PCOS/HA and of women in a control group. Target prediction databases were then used to analyse the target genes of differentially expressed miRNAs, and GO analysis and the KEGG pathway database were used to identify the functions and the main biochemical and signalling pathways of differentially expressed target genes. The expression levels of 263 miRNAs were significantly different (>2-fold up-regulated or <0.5-fold down-regulated, P < 0.05) between the two groups of women. For example, the expression levels of miRNAs (200a-3p, 10b-3p, 200b-3p, 29c-3p, 99a-3p, and 125a-5p) were significantly increased, while there was a decreased expression of miR-105-3p in PCOS patients with respect to the control. The literature has shown that these seven miRNAs are associated with HA in PCOS. Furthermore, 31,770 genes were predicted to be targets of the 263 differentially expressed microRNAs. GO analysis and the KEGG pathway database showed involvement of these target genes in HA in PCOS. These results suggest the presence of differentially expressed miRNAs in the follicular fluid of women with PCOS/HA versus women in the control group.
The potential role of these microRNAs was elucidated using bioinformatics tools and was found to be involved in the regulation of different pathways, biological functions, and cellular components underlying PCOS. The results of this research may reveal new mechanisms of PCOS/HA and suggest potential treatment targets. © 2017 Wiley Periodicals, Inc.
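
The selection rule quoted in the abstract above (more than 2-fold up, or less than 0.5-fold down, with P < 0.05) can be written as a small filter. This is a minimal sketch under stated assumptions: the function name, the record layout and the example values are hypothetical, and the abstract does not say how fold changes or P values were computed.

```python
def classify_mirna(fold_change: float, p_value: float, alpha: float = 0.05) -> str:
    """Label one miRNA using the thresholds quoted in the abstract.

    fold_change is the PCOS/HA-to-control expression ratio (hypothetical input).
    """
    if p_value >= alpha:
        return "unchanged"
    if fold_change > 2.0:
        return "up"
    if fold_change < 0.5:
        return "down"
    return "unchanged"


# Hypothetical records: (name, fold change, P value). Not data from the study.
hypothetical_records = [
    ("miR-A", 3.1, 0.01),
    ("miR-B", 0.4, 0.03),
    ("miR-C", 2.5, 0.20),
    ("miR-D", 1.2, 0.001),
]

calls = {name: classify_mirna(fc, p) for name, fc, p in hypothetical_records}
```

Only miR-A and miR-B pass both criteria here. miR-C has a large fold change but fails the significance cut, and miR-D is significant but changes too little.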

  12. Role of miR-452-5p in the tumorigenesis of prostate cancer: A study based on the Cancer Genome Atlas (TCGA), Gene Expression Omnibus (GEO), and bioinformatics analysis.

    PubMed

    Gao, Li; Zhang, Li-Jie; Li, Sheng-Hua; Wei, Li-Li; Luo, Bin; He, Rong-Quan; Xia, Shuang

    2018-03-06

    MiR-452-5p has been reported to be down-regulated in prostate cancer, affecting the development of this type of cancer. However, the molecular mechanism of miR-452-5p in prostate cancer remains unclear. Therefore, we investigated the network of target genes of miR-452-5p in prostate cancer using bioinformatics analyses. We first analyzed the expression profiles and prognostic value of miR-452-5p in prostate cancer tissues from a public database. Gene Ontology (GO), the Kyoto Encyclopedia of Genes and Genomes (KEGG), PANTHER pathway analyses, and a disease ontology (DG) analysis were performed to find the molecular functions of the target genes from GSE datasets and miRWalk. Finally, we validated hub genes from the protein-protein interaction (PPI) networks of the target genes in the Human Protein Atlas (HPA) database and Gene Expression Profiling Interactive Analysis (GEPIA). The optimal target genes were narrowed down by taking the genes common to the up-regulated genes from GEPIA, the down-regulated genes from GSE datasets, and the predicted genes in miRWalk. Based on mining of GEO and ArrayExpress microarray chips and miRNA-Seq data in the TCGA database, which includes 1007 prostate cancer samples and 387 non-cancer samples, miR-452-5p is shown to be down-regulated in prostate cancer. GO, KEGG, and PANTHER pathway analyses suggested that the target genes might participate in important biological processes, such as transforming growth factor beta signaling and the positive regulation of brown fat cell differentiation and mesenchymal cell differentiation, as well as the Ras signaling pathway and pathways regulating the pluripotency of stem cells and arrhythmogenic right ventricular cardiomyopathy (ARVC). Nine genes (GABBR, PNISR, NTSR1, DOCK1, EREG, SFRP1, PTGS2, LEF1, and BMP2) were defined as hub genes in the PPI network. Three genes (FAM174B, SLC30A4, and SLIT1) were jointly shared by GEPIA, the GSE datasets, and miRWalk.
Down-regulated miR-452-5p might play an essential role in the tumorigenesis of prostate cancer. Copyright © 2018. Published by Elsevier GmbH.

  13. Application of the ToxMiner Database: Network Analysis of ...

    EPA Pesticide Factsheets

    The US EPA ToxCast program is using in vitro HTS (High-Throughput Screening) methods to profile and model bioactivity of environmental chemicals. The main goals of the ToxCast program are to generate predictive signatures of toxicity, and ultimately provide rapid and cost-effective alternatives to animal testing. The chemicals selected for Phase I are composed largely of a diverse set of pesticide active ingredients, which had sufficient supporting in vivo data included as part of their registration process with the EPA. Other miscellaneous chemicals of environmental concern were also included. Application of HTS to environmental toxicants is a novel approach to predictive toxicology and health risk assessment, and differs from what is required for drug efficacy screening in that biochemical interactions of environmental chemicals are sometimes weaker than those seen with drugs and their intended targets. Additionally, the chemical space covered by environmental chemicals is much broader compared to that of pharmaceuticals. The ToxMiner database has been created and added to the EPA’s ACToR (Aggregated Computational Toxicology Resource) chemical database. One purpose of the ToxMiner database is to link biological, metabolic and cellular pathway data to genes and in vitro assay data for the initial subset of chemicals screened in the ToxCast Phase I HTS assays. Also included in ToxMiner is human disease information, which correlates with ToxCast assays that tar

  14. Motif discovery with data mining in 3D protein structure databases: discovery, validation and prediction of the U-shape zinc binding ("Huf-Zinc") motif.

    PubMed

    Maurer-Stroh, Sebastian; Gao, He; Han, Hao; Baeten, Lies; Schymkowitz, Joost; Rousseau, Frederic; Zhang, Louxin; Eisenhaber, Frank

    2013-02-01

    Data mining in protein databases derived from more fundamental protein 3D structure and sequence databases has considerable unearthed potential for the discovery of sequence motif--structural motif--function relationships, as the finding of the U-shape (Huf-Zinc) motif, originally a small student's project, exemplifies. The metal ion zinc is critically involved in universal biological processes, ranging from protein-DNA complexes and transcription regulation to enzymatic catalysis and metabolic pathways. Proteins have evolved a series of motifs to specifically recognize and bind zinc ions. Many of these so-called zinc fingers are structurally independent globular domains with discontinuous binding motifs made up of residues mostly far apart in sequence. Through a systematic approach starting from the BRIX structure fragment database, we discovered that there exists another predictable subset of zinc-binding motifs that not only have a conserved continuous sequence pattern but also share a characteristic local conformation, despite being included in totally different overall folds. While this does not allow general prediction of all Zn binding motifs, an HMM-based web server, Huf-Zinc, is available for prediction of these novel, as well as conventional, zinc finger motifs in protein sequences. The Huf-Zinc webserver can be freely accessed through this URL (http://mendel.bii.a-star.edu.sg/METHODS/hufzinc/).

  15. Application of the ToxMiner Database: Network Analysis ...

    EPA Pesticide Factsheets

    The US EPA ToxCast program is using in vitro HTS (High-Throughput Screening) methods to profile and model bioactivity of environmental chemicals. The main goals of the ToxCast program are to generate predictive signatures of toxicity, and ultimately provide rapid and cost-effective alternatives to animal testing. The chemicals selected for Phase I are composed largely of a diverse set of pesticide active ingredients, which had sufficient supporting in vivo data included as part of their registration process with the EPA. Other miscellaneous chemicals of environmental concern were also included. Application of HTS to environmental toxicants is a novel approach to predictive toxicology and health risk assessment, and differs from what is required for drug efficacy screening in that biochemical interactions of environmental chemicals are sometimes weaker than those seen with drugs and their intended targets. Additionally, the chemical space covered by environmental chemicals is much broader compared to that of pharmaceuticals. The ToxMiner database has been created and added to the EPA’s ACToR (Aggregated Computational Toxicology Resource) chemical database. One purpose of the ToxMiner database is to link biological, metabolic, and cellular pathway data to genes and in vitro assay data for the initial subset of chemicals screened in the ToxCast Phase I HTS assays. Also included in ToxMiner is human disease information, which correlates with ToxCast assays that ta

  16. Differentiating pathway-specific from non-specific effects in high-throughput toxicity data: A foundation for prioritizing adverse outcome pathway development

    EPA Science Inventory

    The U.S. Environmental Protection Agency’s ToxCast program has screened thousands of chemicals for biological activity, primarily using high-throughput in vitro bioassays. Adverse outcome pathways (AOPs) offer a means to link pathway-specific biological activities with pote...

  17. Mining and integration of pathway diagrams from imaging data.

    PubMed

    Kozhenkov, Sergey; Baitaluk, Michael

    2012-03-01

    Pathway diagrams from PubMed and World Wide Web (WWW) contain valuable highly curated information difficult to reach without tools specifically designed and customized for the biological semantics and high-content density of the images. There is currently no search engine or tool that can analyze pathway images, extract their pathway components (molecules, genes, proteins, organelles, cells, organs, etc.) and indicate their relationships. Here, we describe a resource of pathway diagrams retrieved from article and web-page images through optical character recognition, in conjunction with data mining and data integration methods. The recognized pathways are integrated into the BiologicalNetworks research environment linking them to a wealth of data available in the BiologicalNetworks' knowledgebase, which integrates data from >100 public data sources and the biomedical literature. Multiple search and analytical tools are available that allow the recognized cellular pathways, molecular networks and cell/tissue/organ diagrams to be studied in the context of integrated knowledge, experimental data and the literature. BiologicalNetworks software and the pathway repository are freely available at www.biologicalnetworks.org. Supplementary data are available at Bioinformatics online.

  18. Data warehousing in molecular biology.

    PubMed

    Schönbach, C; Kowalski-Saunders, P; Brusic, V

    2000-05-01

    In the business and healthcare sectors data warehousing has provided effective solutions for information usage and knowledge discovery from databases. However, data warehousing applications in the biological research and development (R&D) sector are lagging far behind. The fuzziness and complexity of biological data represent a major challenge in data warehousing for molecular biology. By combining experiences in other domains with our findings from building a model database, we have defined the requirements for data warehousing in molecular biology.

  19. The 2015 Nucleic Acids Research Database Issue and molecular biology database collection.

    PubMed

    Galperin, Michael Y; Rigden, Daniel J; Fernández-Suárez, Xosé M

    2015-01-01

    The 2015 Nucleic Acids Research Database Issue contains 172 papers that include descriptions of 56 new molecular biology databases, and updates on 115 databases whose descriptions have been previously published in NAR or other journals. Following the classification that has been introduced last year in order to simplify navigation of the entire issue, these articles are divided into eight subject categories. This year's highlights include RNAcentral, an international community portal to various databases on noncoding RNA; ValidatorDB, a validation database for protein structures and their ligands; SASBDB, a primary repository for small-angle scattering data of various macromolecular complexes; MoonProt, a database of 'moonlighting' proteins, and two new databases of protein-protein and other macromolecular complexes, ComPPI and the Complex Portal. This issue also includes an unusually high number of cancer-related databases and other databases dedicated to genomic basics of disease and potential drugs and drug targets. The size of NAR online Molecular Biology Database Collection, http://www.oxfordjournals.org/nar/database/a/, remained approximately the same, following the addition of 74 new resources and removal of 77 obsolete web sites. The entire Database Issue is freely available online on the Nucleic Acids Research web site (http://nar.oxfordjournals.org/). Published by Oxford University Press on behalf of Nucleic Acids Research 2014. This work is written by (a) US Government employee(s) and is in the public domain in the US.

  20. PROFESS: a PROtein Function, Evolution, Structure and Sequence database

    PubMed Central

    Triplet, Thomas; Shortridge, Matthew D.; Griep, Mark A.; Stark, Jaime L.; Powers, Robert; Revesz, Peter

    2010-01-01

    The proliferation of biological databases and the easy access enabled by the Internet is having a beneficial impact on biological sciences and transforming the way research is conducted. There are ∼1100 molecular biology databases dispersed throughout the Internet. To assist in the functional, structural and evolutionary analysis of the abundant number of novel proteins continually identified from whole-genome sequencing, we introduce the PROFESS (PROtein Function, Evolution, Structure and Sequence) database. Our database is designed to be versatile and expandable and will not confine analysis to a pre-existing set of data relationships. A fundamental component of this approach is the development of an intuitive query system that incorporates a variety of similarity functions capable of generating data relationships not conceived during the creation of the database. The utility of PROFESS is demonstrated by the analysis of the structural drift of homologous proteins and the identification of potential pancreatic cancer therapeutic targets based on the observation of protein–protein interaction networks. Database URL: http://cse.unl.edu/∼profess/ PMID:20624718

  1. A taxonomy of visualization tasks for the analysis of biological pathway data.

    PubMed

    Murray, Paul; McGee, Fintan; Forbes, Angus G

    2017-02-15

    Understanding complicated networks of interactions and chemical components is essential to solving contemporary problems in modern biology, especially in domains such as cancer and systems research. In these domains, biological pathway data is used to represent chains of interactions that occur within a given biological process. Visual representations can help researchers understand, interact with, and reason about these complex pathways in a number of ways. At the same time, these datasets offer unique challenges for visualization, due to their complexity and heterogeneity. Here, we present a taxonomy of tasks that are regularly performed by researchers who work with biological pathway data. The generation of these tasks was done in conjunction with interviews with several domain experts in biology. These tasks require further classification than is provided by existing taxonomies. We also examine existing visualization techniques that support each task, and we discuss gaps in the existing visualization space revealed by our taxonomy. Our taxonomy is designed to support the development and design of future biological pathway visualization applications. We conclude by suggesting future research directions based on our taxonomy and motivated by the comments received from our domain experts.

  2. Application of synthetic biology for production of chemicals in yeast Saccharomyces cerevisiae.

    PubMed

    Li, Mingji; Borodina, Irina

    2015-02-01

    Synthetic biology and metabolic engineering enable generation of novel cell factories that efficiently convert renewable feedstocks into biofuels, bulk, and fine chemicals, thus creating the basis for a biosustainable economy independent of fossil resources. While over a hundred proof-of-concept chemicals have been made in yeast, only a very small fraction of those has reached commercial-scale production so far. The limiting factor is the high research cost associated with the development of a robust cell factory that can produce the desired chemical at high titer, rate, and yield. Synthetic biology has the potential to bring down this cost by improving our ability to predictably engineer biological systems. This review highlights synthetic biology applications for design, assembly, and optimization of non-native biochemical pathways in baker's yeast Saccharomyces cerevisiae. We describe computational tools for the prediction of biochemical pathways, molecular biology methods for assembly of DNA parts into pathways, and for introducing the pathways into the host, and finally approaches for optimizing performance of the introduced pathways. © FEMS 2015. All rights reserved. For permissions, please e-mail: journals.permission@oup.com.

  3. Challenges of the information age: the impact of false discovery on pathway identification.

    PubMed

    Rog, Colin J; Chekuri, Srinivasa C; Edgerton, Mary E

    2012-11-21

    Pathways with members that have known relevance to a disease are used to support hypotheses generated from analyses of gene expression and proteomic studies. Using cancer as an example, the pitfalls of searching pathways databases as support for genes and proteins that could represent false discoveries are explored. The frequency with which networks could be generated from 100 instances each of randomly selected five and ten genes sets as input to MetaCore, a commercial pathways database, was measured. A PubMed search enumerated cancer-related literature published for any gene in the networks. Using three, two, and one maximum intervening step between input genes to populate the network, networks were generated with frequencies of 97%, 77%, and 7% using ten gene sets and 73%, 27%, and 1% using five gene sets. PubMed reported an average of 4225 cancer-related articles per network gene. This can be attributed to the richly populated pathways databases and the interest in the molecular basis of cancer. As information sources become enriched, they are more likely to generate plausible mechanisms for false discoveries.

  4. The unconventional antimicrobial peptides of the classical propionibacteria.

    PubMed

    Faye, Therese; Holo, Helge; Langsrud, Thor; Nes, Ingolf F; Brede, Dag A

    2011-02-01

    The classical propionibacteria produce genetically unique antimicrobial peptides, whose biological activities are without equivalents, and to which there are no homologous sequences in public databases. In this review, we summarize the genetics, biochemistry, biosynthesis, and biological activities of three extensively studied antimicrobial peptides from propionibacteria. The propionicin T1 peptide constitutes a bona fide example of an unmodified general secretory pathway (sec)-dependent bacteriocin, which is bactericidal towards all tested species of propionibacteria except Propionibacterium freudenreichii. The PAMP antimicrobial peptide represents a novel concept within bacterial antagonism, where an inactive precursor protein is secreted in large amounts, and which activation appears to rely on subsequent processing by proteases in its resident milieu. Propionicin F is a negatively charged bacteriocin that displays an intraspecies bactericidal inhibition spectrum. The biosynthesis of propionicin F appears to proceed through a series of unusual events requiring both N- and C-terminal processing of a precursor protein, which probably requires the radical SAM superfamily enzyme PcfB.

  5. Genomic atlas of the human plasma proteome.

    PubMed

    Sun, Benjamin B; Maranville, Joseph C; Peters, James E; Stacey, David; Staley, James R; Blackshaw, James; Burgess, Stephen; Jiang, Tao; Paige, Ellie; Surendran, Praveen; Oliver-Williams, Clare; Kamat, Mihir A; Prins, Bram P; Wilcox, Sheri K; Zimmerman, Erik S; Chi, An; Bansal, Narinder; Spain, Sarah L; Wood, Angela M; Morrell, Nicholas W; Bradley, John R; Janjic, Nebojsa; Roberts, David J; Ouwehand, Willem H; Todd, John A; Soranzo, Nicole; Suhre, Karsten; Paul, Dirk S; Fox, Caroline S; Plenge, Robert M; Danesh, John; Runz, Heiko; Butterworth, Adam S

    2018-06-01

    Although plasma proteins have important roles in biological processes and are the direct targets of many drugs, the genetic factors that control inter-individual variation in plasma protein levels are not well understood. Here we characterize the genetic architecture of the human plasma proteome in healthy blood donors from the INTERVAL study. We identify 1,927 genetic associations with 1,478 proteins, a fourfold increase on existing knowledge, including trans associations for 1,104 proteins. To understand the consequences of perturbations in plasma protein levels, we apply an integrated approach that links genetic variation with biological pathway, disease, and drug databases. We show that protein quantitative trait loci overlap with gene expression quantitative trait loci, as well as with disease-associated loci, and find evidence that protein biomarkers have causal roles in disease using Mendelian randomization analysis. By linking genetic factors to diseases via specific proteins, our analyses highlight potential therapeutic targets, opportunities for matching existing drugs with new disease indications, and potential safety concerns for drugs under development.

  6. Adverse outcome pathway (AOP) development I: Strategies and principles

    EPA Science Inventory

    An adverse outcome pathway (AOP) is a conceptual framework that organizes existing knowledge concerning biologically plausible, and empirically-supported, links between molecular-level perturbation of a biological system and an adverse outcome at a level of biological organizatio...

  7. The role of drug profiles as similarity metrics: applications to repurposing, adverse effects detection and drug-drug interactions.

    PubMed

    Vilar, Santiago; Hripcsak, George

    2017-07-01

    Explosion of the availability of big data sources along with the development in computational methods provides a useful framework to study drugs' actions, such as interactions with pharmacological targets and off-targets. Databases related to protein interactions, adverse effects and genomic profiles are available to be used for the construction of computational models. In this article, we focus on the description of biological profiles for drugs that can be used as a system to compare similarity and create methods to predict and analyze drugs' actions. We highlight profiles constructed with different biological data, such as target-protein interactions, gene expression measurements, adverse effects and disease profiles. We focus on the discovery of new targets or pathways for drugs already in the pharmaceutical market, also called drug repurposing, in the interaction with off-targets responsible for adverse reactions and in drug-drug interaction analysis. The current and future applications, strengths and challenges facing all these methods are also discussed. Biological profiles or signatures are an important source of data generation to deeply analyze biological actions with important implications in drug-related studies. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  8. Identification of Key Transcription Factors Associated with Lung Squamous Cell Carcinoma

    PubMed Central

    Zhang, Feng; Chen, Xia; Wei, Ke; Liu, Daoming; Xu, Xiaodong; Zhang, Xing; Shi, Hong

    2017-01-01

    Background: Lung squamous cell carcinoma (lung SCC) is a common type of lung cancer, but its mechanism of pathogenesis is unclear. The aim of this study was to identify key transcription factors in lung SCC and elucidate its mechanism. Material/Methods: Six published microarray datasets of lung SCC were downloaded from Gene Expression Omnibus (GEO) for integrated bioinformatics analysis. Significance analysis of microarrays was used to identify differentially expressed genes (DEGs) between lung SCC and normal controls. The biological functions and signaling pathways of DEGs were mapped in the Gene Ontology and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database, respectively. A transcription factor gene regulatory network was used to obtain insights into the functions of DEGs. Results: A total of 1,011 genes, including 539 upregulated genes and 462 downregulated genes, were filtered as DEGs between lung SCC and normal controls. DEGs were significantly enriched in cell cycle, DNA replication, p53 signaling pathway, pathways in cancer, adherens junction, and cell adhesion molecules signaling pathways. There were 57 transcription factors identified, which were used to construct a regulatory network. The network consisted of 736 interactions between 49 transcription factors and 486 DEGs. NFIC, BRCA1, and NFATC2 were the top 3 transcription factors that had the highest connectivity with DEGs and that regulated 83, 82, and 75 DEGs in the network, respectively. Conclusions: NFIC, BRCA1, and NFATC2 might be the key transcription factors in the development of lung SCC by regulating the genes involved in cell cycle and DNA replication pathways. PMID:28081052
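
    The connectivity ranking mentioned in the abstract above (transcription factors ordered by how many DEGs they regulate) can be illustrated with a minimal sketch. The function name `rank_regulators` and the toy edge list are hypothetical and are not taken from the study; the sketch only assumes that the network is available as (transcription factor, target gene) pairs.

```python
def rank_regulators(edges):
    """Rank transcription factors by the number of distinct targets they regulate."""
    targets = {}
    for tf, gene in edges:
        # A set ignores repeated (tf, gene) pairs, so each target counts once.
        targets.setdefault(tf, set()).add(gene)
    degree = {tf: len(genes) for tf, genes in targets.items()}
    # Highest degree first; ties are broken alphabetically for a stable order.
    return sorted(degree.items(), key=lambda item: (-item[1], item[0]))


# Toy network with made-up identifiers (not data from the study).
toy_edges = [
    ("TF_A", "g1"), ("TF_A", "g2"), ("TF_A", "g3"),
    ("TF_B", "g1"), ("TF_B", "g2"),
    ("TF_C", "g3"),
    ("TF_A", "g1"),  # duplicate edge, counted once
]
ranking = rank_regulators(toy_edges)
print(ranking)
```

    Counting distinct targets rather than raw edges keeps duplicated interactions from inflating a regulator's apparent connectivity.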

  9. Transcriptome analysis reveals enrichment of genes associated with auditory system in swimbladder of channel catfish.

    PubMed

    Yang, Yujia; Wang, Xiaozhu; Liu, Yang; Fu, Qiang; Tian, Changxu; Wu, Chenglong; Shi, Huitong; Yuan, Zihao; Tan, Suxu; Liu, Shikai; Gao, Dongya; Dunham, Rex; Liu, Zhanjiang

    2018-04-30

    In aquatic organisms, hearing is an important sense for acoustic communications and detection of sound-emitting predators and prey. Channel catfish is a dominant aquaculture species in the United States. As channel catfish can hear sounds of relatively high frequency, it serves as a good model for studying auditory mechanisms. In catfishes, Weberian ossicles connect the swimbladder to the inner ear to transfer the forced vibrations and improve hearing ability. In this study, we examined the transcriptional profiles of channel catfish swimbladder and four other tissues (gill, liver, skin, and intestine). We identified a total of 1777 genes that exhibited a preferential expression pattern in the swimbladder of channel catfish. Based on Gene Ontology enrichment analysis, many of the swimbladder-enriched genes were categorized into sensory perception of sound, auditory behavior, response to auditory stimulus, or detection of mechanical stimulus involved in sensory perception of sound, such as coch, kcnq4, sptbn1, sptbn4, dnm1, ush2a, and col11a1. Six signaling pathways associated with hearing (Glutamatergic synapse, GABAergic synapse pathways, Axon guidance, cAMP signaling pathway, Ionotropic glutamate receptor pathway, and Metabotropic glutamate receptor group III pathway) were over-represented in KEGG and PANTHER databases. Protein interaction prediction revealed an interactive relationship among the swimbladder-enriched genes and genes involved in sensory perception of sound. This study identified a set of genes and signaling pathways associated with the auditory system in the swimbladder of channel catfish and provides resources for further study on the biological and physiological roles in catfish swimbladder. Copyright © 2018 Elsevier Inc. All rights reserved.

  10. ISAAC - InterSpecies Analysing Application using Containers.

    PubMed

    Baier, Herbert; Schultz, Jörg

    2014-01-15

    Information about genes, transcripts and proteins is spread over a wide variety of databases. Different tools have been developed using these databases to identify biological signals in gene lists from large scale analysis. Mostly, they search for enrichments of specific features. But these tools do not allow an explorative walk through different views or changes to the gene lists according to newly emerging stories. To fill this niche, we have developed ISAAC, the InterSpecies Analysing Application using Containers. The central idea of this web based tool is to enable the analysis of sets of genes, transcripts and proteins under different biological viewpoints and to interactively modify these sets at any point of the analysis. Detailed history and snapshot information allows tracing each action. Furthermore, one can easily switch back to previous states and perform new analyses. Currently, sets can be viewed in the context of genomes, protein functions, protein interactions, pathways, regulation, diseases and drugs. Additionally, users can switch between species with an automatic, orthology based translation of existing gene sets. As today's research usually is performed in larger teams and consortia, ISAAC provides group based functionalities. Here, sets as well as results of analyses can be exchanged between members of groups. ISAAC fills the gap between primary databases and tools for the analysis of large gene lists. With its highly modular, JavaEE based design, the implementation of new modules is straightforward. Furthermore, ISAAC comes with an extensive web-based administration interface including tools for the integration of third party data. Thus, a local installation is easily feasible. In summary, ISAAC is tailor made for highly explorative interactive analyses of gene, transcript and protein sets in a collaborative environment.

  11. Featured Article: Genotation: Actionable knowledge for the scientific reader.

    PubMed

    Nagahawatte, Panduka; Willis, Ethan; Sakauye, Mark; Jose, Rony; Chen, Hao; Davis, Robert L

    2016-06-01

    We present an article viewer application that allows a scientific reader to easily discover and share knowledge by linking genomics-related concepts to knowledge of disparate biomedical databases. High-throughput data streams generated by technical advancements have contributed to scientific knowledge discovery at an unprecedented rate. Biomedical Informaticists have created a diverse set of databases to store and retrieve the discovered knowledge. The diversity and abundance of such resources present biomedical researchers with a challenge in knowledge discovery. These challenges highlight a need for a better informatics solution. We use a text mining algorithm, Genomine, to identify gene symbols from the text of a journal article. The identified symbols are supplemented with information from the GenoDB knowledgebase. Self-updating GenoDB contains information from NCBI Gene, Clinvar, Medgen, dbSNP, KEGG, PharmGKB, Uniprot, and Hugo Gene databases. The journal viewer is a web application accessible via a web browser. The features described herein are accessible on www.genotation.org. The Genomine algorithm identifies gene symbols with an accuracy shown by an F-score of 0.65. GenoDB currently contains information regarding 59,905 gene symbols, 5633 drug-gene relationships, 5981 gene-disease relationships, and 713 pathways. This application provides scientific readers with actionable knowledge related to concepts of a manuscript. The reader will be able to save and share supplements to be visualized in a graphical manner. This provides convenient access to details of complex biological phenomena, enabling biomedical researchers to generate novel hypotheses to further our knowledge in human health. This manuscript presents a novel application that integrates genomic, proteomic, and pharmacogenomic information to supplement content of a biomedical manuscript and enable readers to automatically discover actionable knowledge. © 2016 by the Society for Experimental Biology and Medicine.
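
    For readers unfamiliar with the F-score quoted in the record above, the sketch below shows how such a score is commonly computed for symbol extraction, as the harmonic mean of precision and recall. The abstract does not state which F-score variant or evaluation set was used, so the balanced F1 form, the function name `f_score`, and the example symbol sets are all assumptions made for illustration.

```python
def f_score(predicted, gold):
    """Balanced F-score (F1) of a predicted symbol set against a gold-standard set."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0  # avoids division by zero when nothing matches
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: "CAT" is an English word wrongly tagged as a gene symbol,
# while "KRAS" and "MYC" were missed by the tagger.
predicted_symbols = {"BRCA1", "TP53", "EGFR", "CAT"}
gold_symbols = {"BRCA1", "TP53", "EGFR", "KRAS", "MYC"}
score = f_score(predicted_symbols, gold_symbols)
print(round(score, 3))
```

    Here precision is 3/4 and recall is 3/5, so the score sits between the two and is pulled toward the lower value.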

  12. Impact of constitutional copy number variants on biological pathway evolution.

    PubMed

    Poptsova, Maria; Banerjee, Samprit; Gokcumen, Omer; Rubin, Mark A; Demichelis, Francesca

    2013-01-23

    Inherited Copy Number Variants (CNVs) can modulate the expression levels of individual genes. However, little is known about how CNVs alter biological pathways and how this varies across different populations. To trace potential evolutionary changes of well-described biological pathways, we jointly queried the genomes and the transcriptomes of a collection of individuals with Caucasian, Asian or Yoruban descent combining high-resolution array and sequencing data. We implemented an enrichment analysis of pathways accounting for CNVs and genes sizes and detected significant enrichment not only in signal transduction and extracellular biological processes, but also in metabolism pathways. Upon the estimation of CNV population differentiation (CNVs with different polymorphism frequencies across populations), we evaluated that 22% of the pathways contain at least one gene that is proximal to a CNV (CNV-gene pair) that shows significant population differentiation. The majority of these CNV-gene pairs belong to signal transduction pathways and 6% of the CNV-gene pairs show statistical association between the copy number states and the transcript levels. The analysis suggested possible examples of positive selection within individual populations including NF-kB, MAPK signaling pathways, and Alu/L1 retrotransposition factors. Altogether, our results suggest that constitutional CNVs may modulate subtle pathway changes through specific pathway enzymes, which may become fixed in some populations.
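
    As a rough illustration of what a pathway enrichment test computes, the sketch below uses a plain hypergeometric tail probability. This is a simplified stand-in: the analysis described above additionally accounts for CNV and gene sizes, which this sketch does not attempt. The function name `hypergeom_enrichment_p` and all numbers are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, pathway_size, hits, hits_in_pathway):
    """P(X >= hits_in_pathway) when `hits` genes are drawn without replacement
    from `population` genes, of which `pathway_size` belong to the pathway."""
    upper = min(pathway_size, hits)
    total = comb(population, hits)
    tail = sum(
        comb(pathway_size, k) * comb(population - pathway_size, hits - k)
        for k in range(hits_in_pathway, upper + 1)
    )
    return tail / total


# Toy numbers: 20 genes overall, 5 in the pathway, 4 genes flagged
# (for example, genes near a CNV), 3 of which fall in the pathway.
p_value = hypergeom_enrichment_p(population=20, pathway_size=5, hits=4, hits_in_pathway=3)
print(p_value)
```

    A small tail probability indicates that the overlap between the flagged genes and the pathway is larger than expected under random sampling; in practice such p-values would also need correction for testing many pathways.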

  13. Impact of constitutional copy number variants on biological pathway evolution

    PubMed Central

    2013-01-01

    Background: Inherited Copy Number Variants (CNVs) can modulate the expression levels of individual genes. However, little is known about how CNVs alter biological pathways and how this varies across different populations. To trace potential evolutionary changes of well-described biological pathways, we jointly queried the genomes and the transcriptomes of a collection of individuals with Caucasian, Asian or Yoruban descent combining high-resolution array and sequencing data. Results: We implemented an enrichment analysis of pathways accounting for CNVs and genes sizes and detected significant enrichment not only in signal transduction and extracellular biological processes, but also in metabolism pathways. Upon the estimation of CNV population differentiation (CNVs with different polymorphism frequencies across populations), we evaluated that 22% of the pathways contain at least one gene that is proximal to a CNV (CNV-gene pair) that shows significant population differentiation. The majority of these CNV-gene pairs belong to signal transduction pathways and 6% of the CNV-gene pairs show statistical association between the copy number states and the transcript levels. Conclusions: The analysis suggested possible examples of positive selection within individual populations including NF-kB, MAPK signaling pathways, and Alu/L1 retrotransposition factors. Altogether, our results suggest that constitutional CNVs may modulate subtle pathway changes through specific pathway enzymes, which may become fixed in some populations. PMID:23342974

  14. New tools and methods for direct programmatic access to the dbSNP relational database

    PubMed Central

    Saccone, Scott F.; Quan, Jiaxi; Mehta, Gaurang; Bolze, Raphael; Thomas, Prasanth; Deelman, Ewa; Tischfield, Jay A.; Rice, John P.

    2011-01-01

    Genome-wide association studies often incorporate information from public biological databases in order to provide a biological reference for interpreting the results. The dbSNP database is an extensive source of information on single nucleotide polymorphisms (SNPs) for many different organisms, including humans. We have developed free software that will download and install a local MySQL implementation of the dbSNP relational database for a specified organism. We have also designed a system for classifying dbSNP tables in terms of common tasks we wish to accomplish using the database. For each task we have designed a small set of custom tables that facilitate task-related queries and provide entity-relationship diagrams for each task composed from the relevant dbSNP tables. In order to expose these concepts and methods to a wider audience we have developed web tools for querying the database and browsing documentation on the tables and columns to clarify the relevant relational structure. All web tools and software are freely available to the public at http://cgsmd.isi.edu/dbsnpq. Resources such as these for programmatically querying biological databases are essential for viably integrating biological information into genetic association experiments on a genome-wide scale. PMID:21037260

  15. Comprehensive Gene expression meta-analysis and integrated bioinformatic approaches reveal shared signatures between thrombosis and myeloproliferative disorders

    PubMed Central

    Jha, Prabhash Kumar; Vijay, Aatira; Sahu, Anita; Ashraf, Mohammad Zahid

    2016-01-01

    Thrombosis is a leading cause of morbidity and mortality in patients with myeloproliferative disorders (MPDs), particularly polycythemia vera (PV) and essential thrombocythemia (ET). Despite the attempts to establish a link between them, the shared biological mechanisms are yet to be characterized. An integrated gene expression meta-analysis of five independent publicly available microarray data of the three diseases was conducted to identify shared gene expression signatures and overlapping biological processes. Using INMEX bioinformatic tool, based on combined Effect Size (ES) approaches, we identified a total of 1,157 differentially expressed genes (DEGs) (697 overexpressed and 460 underexpressed genes) shared between the three diseases. EnrichR tool’s rich library was used for comprehensive functional enrichment and pathway analysis which revealed “mRNA Splicing” and “SUMO E3 ligases SUMOylate target proteins” among the most enriched terms. Network based meta-analysis identified MYC and FN1 to be the most highly ranked hub genes. Our results reveal that the alterations in biomarkers of the coagulation cascade like F2R, PROS1, SELPLG and ITGB2 were common between the three diseases. Interestingly, the study has generated a novel database of candidate genetic markers, pathways and transcription factors shared between thrombosis and MPDs, which might aid in the development of prognostic therapeutic biomarkers. PMID:27892526
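
    The combined effect size idea mentioned in the abstract above can be sketched with a fixed-effect, inverse-variance pooling of per-study effect sizes for a single gene. This is a generic textbook formulation offered under assumptions; it is not claimed to be the exact procedure applied by the INMEX tool, and the function name and input values are hypothetical.

```python
from math import sqrt


def combine_fixed_effect(effects, variances):
    """Pool per-study effect sizes with inverse-variance weights (fixed-effect model).

    Returns the pooled effect, its standard error, and the z statistic."""
    weights = [1.0 / v for v in variances]
    weight_sum = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / weight_sum
    standard_error = sqrt(1.0 / weight_sum)
    return pooled, standard_error, pooled / standard_error


# Made-up effect sizes for one gene in two studies with equal variances.
pooled_effect, pooled_se, z_stat = combine_fixed_effect([0.5, 1.0], [0.25, 0.25])
print(pooled_effect, pooled_se, z_stat)
```

    Studies with smaller variance receive larger weights, so precise estimates dominate the pooled value; a random-effects model would add a between-study variance term to each weight.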

  16. miRegulome: a knowledge-base of miRNA regulomics and analysis.

    PubMed

    Barh, Debmalya; Kamapantula, Bhanu; Jain, Neha; Nalluri, Joseph; Bhattacharya, Antaripa; Juneja, Lucky; Barve, Neha; Tiwari, Sandeep; Miyoshi, Anderson; Azevedo, Vasco; Blum, Kenneth; Kumar, Anil; Silva, Artur; Ghosh, Preetam

    2015-08-05

    miRNAs regulate post transcriptional gene expression by targeting multiple mRNAs and hence can modulate multiple signalling pathways, biological processes, and patho-physiologies. Therefore, understanding of miRNA regulatory networks is essential in order to modulate the functions of a miRNA. The focus of several existing databases is to provide information on specific aspects of miRNA regulation. However, an integrated resource on the miRNA regulome is currently not available to facilitate the exploration and understanding of miRNA regulomics. miRegulome attempts to bridge this gap. The current version of miRegulome v1.0 provides details on the entire regulatory modules of miRNAs altered in response to chemical treatments and transcription factors, based on validated data manually curated from published literature. Modules of miRegulome (upstream regulators, downstream targets, miRNA regulated pathways, functions, diseases, etc) are hyperlinked to an appropriate external resource and are displayed visually to provide a comprehensive understanding. Four analysis tools are incorporated to identify relationships among different modules based on user specified datasets. miRegulome and its tools are helpful in understanding the biology of miRNAs and will also facilitate the discovery of biomarkers and therapeutics. With added features in upcoming releases, miRegulome will be an essential resource to the scientific community. http://bnet.egr.vcu.edu/miRegulome.

  17. Review of family relational stress and pediatric asthma: the value of biopsychosocial systemic models.

    PubMed

    Wood, Beatrice L; Miller, Bruce D; Lehman, Heather K

    2015-06-01

    Asthma is the most common chronic disease in children. Despite dramatic advances in pharmacological treatments, asthma remains a leading public health problem, especially in socially disadvantaged minority populations. Some experts believe that this health gap is due to the failure to address the impact of stress on the disease. Asthma is a complex disease that is influenced by multilevel factors, but the nature of these factors and their interrelations are not well understood. This paper aims to integrate social, psychological, and biological literatures on relations between family/parental stress and pediatric asthma, and to illustrate the utility of multilevel systemic models for guiding treatment and stimulating future research. We used electronic database searches and conducted an integrated analysis of selected epidemiological, longitudinal, and empirical studies. Evidence is substantial for the effects of family/parental stress on asthma mediated by both disease management and psychobiological stress pathways. However, integrative models containing specific pathways are scarce. We present two multilevel models, with supporting data, as potential prototypes for other such models. We conclude that these multilevel systems models may be of substantial heuristic value in organizing investigations of, and clinical approaches to, the complex social-biological aspects of family stress in pediatric asthma. However, additional systemic models are needed, and the models presented herein could serve as prototypes for model development. © 2015 Family Process Institute.

  18. New perspectives in toxicological information management, and the role of ISSTOX databases in assessing chemical mutagenicity and carcinogenicity.

    PubMed

    Benigni, Romualdo; Battistelli, Chiara Laura; Bossa, Cecilia; Tcheremenskaia, Olga; Crettaz, Pierre

    2013-07-01

    Currently, the public has access to a variety of databases containing mutagenicity and carcinogenicity data. These resources are crucial for the toxicologists and regulators involved in the risk assessment of chemicals, which necessitates access to all the relevant literature, and the capability to search across toxicity databases using both biological and chemical criteria. Towards the larger goal of screening chemicals for a wide range of toxicity end points of potential interest, publicly available resources across a large spectrum of biological and chemical data space must be effectively harnessed with current and evolving information technologies (i.e. systematised, integrated and mined), if long-term screening and prediction objectives are to be achieved. A key to rapid progress in the field of chemical toxicity databases is that of combining information technology with the chemical structure as identifier of the molecules. This permits an enormous range of operations (e.g. retrieving chemicals or chemical classes, describing the content of databases, finding similar chemicals, crossing biological and chemical interrogations, etc.) that other more classical databases cannot allow. This article describes the progress in the technology of toxicity databases, including the concepts of Chemical Relational Database and Toxicological Standardized Controlled Vocabularies (Ontology). Then it describes the ISSTOX cluster of toxicological databases at the Istituto Superiore di Sanitá. It consists of freely available databases characterised by the use of modern information technologies and by curation of the quality of the biological data. Finally, this article provides examples of analyses and results made possible by ISSTOX.

  19. Transcriptome Sequencing in a Tibetan Barley Landrace with High Resistance to Powdery Mildew

    PubMed Central

    Zeng, Xing-Quan; Luo, Xiao-Mei; Wang, Yu-Lin; Xu, Qi-Jun; Bai, Li-Jun; Yuan, Hong-Jun; Tashi, Nyima

    2014-01-01

    Hulless barley is an important cereal crop worldwide, especially in Tibet of China. However, this crop is usually susceptible to powdery mildew caused by Blumeria graminis f. sp. hordei. In this study, we aimed to understand the functions and pathways of genes involved in the disease resistance by transcriptome sequencing of a Tibetan barley landrace with high resistance to powdery mildew. A total of 831 significant differentially expressed genes were found in the infected seedlings, covering 19 functions. Either “cell,” “cell part,” and “extracellular region” in the cellular component category or “binding” and “catalytic” in the category of molecular function as well as “metabolic process” and “cellular process” in the biological process category together demonstrated that these functions may be involved in the resistance to powdery mildew of the hulless barley. In addition, 330 KEGG pathways were found using BLASTx with an E-value cut-off of <10⁻⁵. Among them, three pathways, namely, “photosynthesis,” “plant-pathogen interaction,” and “photosynthesis-antenna proteins” had significant matches in the database. Significant expressions of the three pathways were detected at 24 h, 48 h, and 96 h after infection, respectively. These results indicated a complex process of barley response to powdery mildew infection. PMID:25587568
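
    The BLASTx E-value cut-off quoted above can be illustrated with a small filter over BLAST tabular output. This is a sketch that assumes the standard 12-column tabular layout, in which the 11th column is the E-value; the function name and the sample rows are hypothetical and are not data from the study.

```python
import csv
import io

EVALUE_CUTOFF = 1e-5  # threshold quoted in the abstract


def filter_hits(tabular_text, cutoff=EVALUE_CUTOFF):
    """Return (query, subject, evalue) for hits with E-value below cutoff.

    Assumes the standard 12-column BLAST tabular layout, where the
    11th column (index 10) holds the E-value.
    """
    hits = []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        evalue = float(row[10])
        if evalue < cutoff:
            hits.append((row[0], row[1], evalue))
    return hits


# Hypothetical rows for illustration only.
SAMPLE = (
    "unigene1\tprotA\t85.0\t200\t30\t0\t1\t600\t1\t200\t1e-50\t250\n"
    "unigene2\tprotB\t40.0\t50\t30\t0\t1\t150\t1\t50\t0.002\t35\n"
)
```

    Only the first sample row passes the filter, because 0.002 is above the cut-off.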

  20. Integrated pathway-based approach identifies association between genomic regions at CTCF and CACNB2 and schizophrenia.

    PubMed

    Juraeva, Dilafruz; Haenisch, Britta; Zapatka, Marc; Frank, Josef; Witt, Stephanie H; Mühleisen, Thomas W; Treutlein, Jens; Strohmaier, Jana; Meier, Sandra; Degenhardt, Franziska; Giegling, Ina; Ripke, Stephan; Leber, Markus; Lange, Christoph; Schulze, Thomas G; Mössner, Rainald; Nenadic, Igor; Sauer, Heinrich; Rujescu, Dan; Maier, Wolfgang; Børglum, Anders; Ophoff, Roel; Cichon, Sven; Nöthen, Markus M; Rietschel, Marcella; Mattheisen, Manuel; Brors, Benedikt

    2014-06-01

    In the present study, an integrated hierarchical approach was applied to: (1) identify pathways associated with susceptibility to schizophrenia; (2) detect genes that may be potentially affected in these pathways since they contain an associated polymorphism; and (3) annotate the functional consequences of such single-nucleotide polymorphisms (SNPs) in the affected genes or their regulatory regions. The Global Test was applied to detect schizophrenia-associated pathways using discovery and replication datasets comprising 5,040 and 5,082 individuals of European ancestry, respectively. Information concerning functional gene-sets was retrieved from the Kyoto Encyclopedia of Genes and Genomes, Gene Ontology, and the Molecular Signatures Database. Fourteen of the gene-sets or pathways identified in the discovery dataset were confirmed in the replication dataset. These include functional processes involved in transcriptional regulation and gene expression, synapse organization, cell adhesion, and apoptosis. For two genes, i.e. CTCF and CACNB2, evidence for association with schizophrenia was available (at the gene-level) in both the discovery study and published data from the Psychiatric Genomics Consortium schizophrenia study. Furthermore, these genes mapped to four of the 14 presently identified pathways. Several of the SNPs assigned to CTCF and CACNB2 have potential functional consequences, and a gene in close proximity to CACNB2, i.e. ARL5B, was identified as a potential gene of interest. Application of the present hierarchical approach thus allowed: (1) identification of novel biological gene-sets or pathways with potential involvement in the etiology of schizophrenia, as well as replication of these findings in an independent cohort; (2) detection of genes of interest for future follow-up studies; and (3) the highlighting of novel genes in previously reported candidate regions for schizophrenia.

  1. Pathway analysis of high-throughput biological data within a Bayesian network framework.

    PubMed

    Isci, Senol; Ozturk, Cengizhan; Jones, Jon; Otu, Hasan H

    2011-06-15

    Most current approaches to high-throughput biological data (HTBD) analysis either perform individual gene/protein analysis or, gene/protein set enrichment analysis for a list of biologically relevant molecules. Bayesian Networks (BNs) capture linear and non-linear interactions, handle stochastic events accounting for noise, and focus on local interactions, which can be related to causal inference. Here, we describe for the first time an algorithm that models biological pathways as BNs and identifies pathways that best explain given HTBD by scoring fitness of each network. Proposed method takes into account the connectivity and relatedness between nodes of the pathway through factoring pathway topology in its model. Our simulations using synthetic data demonstrated robustness of our approach. We tested proposed method, Bayesian Pathway Analysis (BPA), on human microarray data regarding renal cell carcinoma (RCC) and compared our results with gene set enrichment analysis. BPA was able to find broader and more specific pathways related to RCC. Accompanying BPA software (BPAS) package is freely available for academic use at http://bumil.boun.edu.tr/bpa.

  2. dEMBF: A Comprehensive Database of Enzymes of Microalgal Biofuel Feedstock.

    PubMed

    Misra, Namrata; Panda, Prasanna Kumar; Parida, Bikram Kumar; Mishra, Barada Kanta

    2016-01-01

    Microalgae have attracted wide attention as one of the most versatile renewable feedstocks for production of biofuel. To develop genetically engineered high lipid yielding algal strains, a thorough understanding of the lipid biosynthetic pathway and the underpinning enzymes is essential. In this work, we have systematically mined the genomes of fifteen diverse algal species belonging to Chlorophyta, Heterokontophyta, Rhodophyta, and Haptophyta, to identify and annotate the putative enzymes of lipid metabolic pathway. Consequently, we have also developed a database, dEMBF (Database of Enzymes of Microalgal Biofuel Feedstock), which catalogues the complete list of identified enzymes along with their computed annotation details including length, hydrophobicity, amino acid composition, subcellular location, gene ontology, KEGG pathway, orthologous group, Pfam domain, intron-exon organization, transmembrane topology, and secondary/tertiary structural data. Furthermore, to facilitate functional and evolutionary study of these enzymes, a collection of built-in applications for BLAST search, motif identification, sequence and phylogenetic analysis have been seamlessly integrated into the database. dEMBF is the first database that brings together all enzymes responsible for lipid synthesis from available algal genomes, and provides an integrative platform for enzyme inquiry and analysis. This database will be extremely useful for algal biofuel research. It can be accessed at http://bbprof.immt.res.in/embf.

  3. dEMBF: A Comprehensive Database of Enzymes of Microalgal Biofuel Feedstock

    PubMed Central

    Misra, Namrata; Panda, Prasanna Kumar; Parida, Bikram Kumar; Mishra, Barada Kanta

    2016-01-01

    Microalgae have attracted wide attention as one of the most versatile renewable feedstocks for production of biofuel. To develop genetically engineered high lipid yielding algal strains, a thorough understanding of the lipid biosynthetic pathway and the underpinning enzymes is essential. In this work, we have systematically mined the genomes of fifteen diverse algal species belonging to Chlorophyta, Heterokontophyta, Rhodophyta, and Haptophyta, to identify and annotate the putative enzymes of lipid metabolic pathway. Consequently, we have also developed a database, dEMBF (Database of Enzymes of Microalgal Biofuel Feedstock), which catalogues the complete list of identified enzymes along with their computed annotation details including length, hydrophobicity, amino acid composition, subcellular location, gene ontology, KEGG pathway, orthologous group, Pfam domain, intron-exon organization, transmembrane topology, and secondary/tertiary structural data. Furthermore, to facilitate functional and evolutionary study of these enzymes, a collection of built-in applications for BLAST search, motif identification, sequence and phylogenetic analysis have been seamlessly integrated into the database. dEMBF is the first database that brings together all enzymes responsible for lipid synthesis from available algal genomes, and provides an integrative platform for enzyme inquiry and analysis. This database will be extremely useful for algal biofuel research. It can be accessed at http://bbprof.immt.res.in/embf. PMID:26727469

  4. An Integrative data mining approach to identifying Adverse Outcome Pathway (AOP) Signatures

    EPA Science Inventory

    The Adverse Outcome Pathway (AOP) framework is a tool for making biological connections and summarizing key information across different levels of biological organization to connect biological perturbations at the molecular level to adverse outcomes for an individual or populatio...

  5. Interaction of Herbal Compounds with Biological Targets: A Case Study with Berberine

    PubMed Central

    Chen, Xiao-Wu; Di, Yuan Ming; Zhang, Jian; Zhou, Zhi-Wei; Li, Chun Guang; Zhou, Shu-Feng

    2012-01-01

    Berberine is one of the main alkaloids found in the Chinese herb Huang lian (Rhizoma Coptidis), which has been reported to have multiple pharmacological activities. This study aimed to analyze the molecular targets of berberine based on literature data followed by a pathway analysis using the PANTHER program. PANTHER analysis of berberine targets showed that the most frequent classes of molecular function include receptor binding, kinase activity, protein binding, transcription activity, DNA binding, and kinase regulator activity. Based on the biological process classification of in vitro berberine targets, those targets related to signal transduction, intracellular signalling cascade, cell surface receptor-linked signal transduction, cell motion, cell cycle control, immunity system process, and protein metabolic process are most frequently involved. In addition, berberine was found to interact with a mixture of biological pathways, such as Alzheimer's disease-presenilin and -secretase pathways, angiogenesis, apoptosis signalling pathway, FAS signalling pathway, Huntington disease, inflammation mediated by chemokine and cytokine signalling pathways, interleukin signalling pathway, and p53 pathways. We also explored the possible mechanism of action for the anti-diabetic effect of berberine. Further studies are warranted to elucidate the mechanisms of action of berberine using a systems biology approach. PMID:23213296

  6. GOSAP: Gene Ontology-Based Semantic Alignment of Biological Pathways.

    PubMed

    Gamalielsson, Jonas; Olsson, Bjorn

    2008-01-01

    We present a new method for semantic comparison of biological pathways, aiming to discover evolutionary conservation of pathways between species. Our method uses all three sub-ontologies of Gene Ontology (GO) and a measure of semantic similarity to calculate match scores between gene products. These scores are used for finding local pairwise pathway alignments. This approach has the advantage of being applicable to all types of pathways where nodes are gene products, e.g., regulatory pathways, signalling pathways and metabolic enzyme-to-enzyme pathways. We demonstrate the usefulness of the method using regulatory and metabolic pathways from E. coli and S. cerevisiae as examples.

  7. De novo assembly and functional annotation of Myrciaria dubia fruit transcriptome reveals multiple metabolic pathways for L-ascorbic acid biosynthesis.

    PubMed

    Castro, Juan C; Maddox, J Dylan; Cobos, Marianela; Requena, David; Zimic, Mirko; Bombarely, Aureliano; Imán, Sixto A; Cerdeira, Luis A; Medina, Andersson E

    2015-11-24

    Myrciaria dubia is an Amazonian fruit shrub that produces numerous bioactive phytochemicals, but is best known for its high L-ascorbic acid (AsA) content in fruits. Pronounced variation in AsA content has been observed both within and among individuals, but the genetic factors responsible for this variation are largely unknown. The goals of this research, therefore, were to assemble, characterize, and annotate the fruit transcriptome of M. dubia in order to reconstruct metabolic pathways and determine if multiple pathways contribute to AsA biosynthesis. In total 24,551,882 high-quality sequence reads were de novo assembled into 70,048 unigenes (mean length = 1150 bp, N50 = 1775 bp). Assembled sequences were annotated using BLASTX against public databases such as TAIR, GR-protein, FB, MGI, RGD, ZFIN, SGN, WB, TIGR_CMR, and JCVI-CMR with 75.2 % of unigenes having annotations. Of the three core GO annotation categories, biological processes comprised 53.6 % of the total assigned annotations, whereas cellular components and molecular functions comprised 23.3 and 23.1 %, respectively. Based on the KEGG pathway assignment of the functionally annotated transcripts, five metabolic pathways for AsA biosynthesis were identified: animal-like pathway, myo-inositol pathway, L-gulose pathway, D-mannose/L-galactose pathway, and uronic acid pathway. All transcripts coding enzymes involved in the ascorbate-glutathione cycle were also identified. Finally, we used the assembly to identify 6314 genic microsatellites and 23,481 high quality SNPs. This study describes the first next-generation sequencing effort and transcriptome annotation of a non-model Amazonian plant that is relevant for AsA production and other bioactive phytochemicals. Genes encoding key enzymes were successfully identified and metabolic pathways involved in biosynthesis of AsA, anthocyanins, and other metabolic pathways have been reconstructed. The identification of these genes and pathways is in agreement with the empirically observed capability of M. dubia to synthesize and accumulate AsA and other important molecules, and adds to our current knowledge of the molecular biology and biochemistry of their production in plants. By providing insights into the mechanisms underpinning these metabolic processes, these results can be used to direct efforts to genetically manipulate this organism in order to enhance the production of these bioactive phytochemicals. The accumulation of AsA precursor and discovery of genes associated with their biosynthesis and metabolism in M. dubia is intriguing and worthy of further investigation. The sequences and pathways produced here present the genetic framework required for further studies. Quantitative transcriptomics in concert with studies of the genome, proteome, and metabolome under conditions that stimulate production and accumulation of AsA and their precursors are needed to provide a more comprehensive view of how these pathways for AsA metabolism are regulated and linked in this species.
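
    The N50 statistic reported for the assembly above can be computed from a list of sequence lengths as follows. This is a minimal sketch using the usual definition (the length of the sequence at which the running total, taken from longest to shortest, first reaches half of the assembly size); the function name and the example lengths are illustrative, not values from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are visited from longest to shortest; the N50 is the
    length of the sequence at which the running total first reaches
    at least half of the summed length of all sequences.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must contain at least one sequence")


# Toy example: total length 54, half is 27, reached by 10 + 9 + 8.
toy_n50 = n50([2, 3, 4, 5, 6, 7, 8, 9, 10])
```

    Unlike the mean length, N50 is weighted toward long sequences, which is why the two values reported for the same assembly differ.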

  8. Application of in Vitro Biotransformation Data and ...

    EPA Pesticide Factsheets

    The adverse biological effects of toxic substances are dependent upon the exposure concentration and the duration of exposure. Pharmacokinetic models can quantitatively relate the external concentration of a toxicant in the environment to the internal dose of the toxicant in the target tissues of an exposed organism. The exposure concentration of a toxic substance is usually not the same as the concentration of the active form of the toxicant that reaches the target tissues following absorption, distribution, and biotransformation of the parent toxicant. Biotransformation modulates the biological activity of chemicals through bioactivation and detoxication pathways. Many toxicants require biotransformation to exert their adverse biological effects. Considerable species differences in biotransformation and other pharmacokinetic processes can make extrapolation of toxicity data from laboratory animals to humans problematic. Additionally, interindividual differences in biotransformation among human populations with diverse genetics and lifestyles can lead to considerable variability in the bioactivation of toxic chemicals. Compartmental pharmacokinetic models of animals and humans are needed to understand the quantitative relationships between chemical exposure and target tissue dose as well as animal to human differences and interindividual differences in human populations. The data-based compartmental pharmacokinetic models widely used in clinical pharmacology ha

  9. Differential Proteome Analysis of a Flor Yeast Strain under Biofilm Formation.

    PubMed

    Moreno-García, Jaime; Mauricio, Juan Carlos; Moreno, Juan; García-Martínez, Teresa

    2017-03-28

    Several Saccharomyces cerevisiae strains (flor yeasts) form a biofilm (flor velum) on the surface of Sherry wines after fermentation, when glucose is depleted. This flor velum is fundamental to biological aging of these particular wines. In this study, we identify abundant proteins in the formation of the biofilm of an industrial flor yeast strain. A database search was carried out to identify enriched "biological process" and "cellular component" Gene Ontology terms (GO terms) and "pathways" for the flor yeast. The most abundant proteins detected were largely involved in respiration, translation, stress damage prevention and repair, amino acid metabolism (glycine, isoleucine, leucine and arginine), glycolysis/gluconeogenesis and biosynthesis of vitamin B9 (folate). These proteins were located in cellular components such as the peroxisome, mitochondria, vacuole, cell wall and extracellular region, the last two being directly related to flor formation. Proteins like Bgl2p, Gcv3p, Hyp2p, Mdh1p, Suc2p and Ygp1p were quantified at very high levels. This study reveals some expected processes and provides new and important information for the design of conditions and genetic constructs of flor yeasts to improve cellular survival and thus optimize the biological aging of Sherry wine.

  10. Biological Pathways

    MedlinePlus


  11. Inferring gene and protein interactions using PubMed citations and consensus Bayesian networks.

    PubMed

    Deeter, Anthony; Dalman, Mark; Haddad, Joseph; Duan, Zhong-Hui

    2017-01-01

    The PubMed database offers an extensive set of publication data that can be useful, yet inherently complex to use without automated computational techniques. Data repositories such as the Genomic Data Commons (GDC) and the Gene Expression Omnibus (GEO) offer experimental data storage and retrieval as well as curated gene expression profiles. Genetic interaction databases, including Reactome and Ingenuity Pathway Analysis, offer pathway and experiment data analysis using data curated from these publications and data repositories. We have created a method to generate and analyze consensus networks, inferring potential gene interactions, using large numbers of Bayesian networks generated by data mining publications in the PubMed database. Through the concept of network resolution, these consensus networks can be tailored to represent possible genetic interactions. We designed a set of experiments to confirm that our method is stable across variation in both sample and topological input sizes. Using gene product interactions from the KEGG pathway database and data mining PubMed publication abstracts, we verify that regardless of the network resolution or the inferred consensus network, our method is capable of inferring meaningful gene interactions through consensus Bayesian network generation with multiple, randomized topological orderings. Our method can not only confirm the existence of currently accepted interactions, but has the potential to hypothesize new ones as well. We show our method confirms the existence of known gene interactions such as JAK-STAT-PI3K-AKT-mTOR, infers novel gene interactions such as RAS- Bcl-2 and RAS-AKT, and found significant pathway-pathway interactions between the JAK-STAT signaling and Cardiac Muscle Contraction KEGG pathways.
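
    The idea of building a consensus from many inferred networks can be sketched with a simple edge-frequency threshold. This is an illustrative assumption only: the abstract does not define "network resolution", so the `min_fraction` parameter below is a hypothetical stand-in rather than the authors' method, and the gene labels are placeholders.

```python
from collections import Counter


def consensus_edges(networks, min_fraction):
    """Return edges present in at least min_fraction of the networks.

    networks: list of collections of (source, target) edges, one
    collection per inferred network.
    min_fraction: hypothetical threshold between 0 and 1.
    """
    counts = Counter(edge for network in networks for edge in set(network))
    needed = min_fraction * len(networks)
    return {edge for edge, count in counts.items() if count >= needed}


# Three hypothetical inferred networks over placeholder genes.
example_networks = [
    {("geneA", "geneB"), ("geneB", "geneC")},
    {("geneA", "geneB")},
    {("geneA", "geneB"), ("geneC", "geneD")},
]
strict = consensus_edges(example_networks, 0.6)  # edges in most networks
loose = consensus_edges(example_networks, 0.3)   # edges seen at least once
```

    Raising the threshold keeps only edges that recur across many networks; lowering it admits edges supported by fewer of them.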

  12. Analysis of expressed sequence tags from Maize mosaic rhabdovirus-infected gut tissues of Peregrinus maidis reveals the presence of key components of insect innate immunity.

    PubMed

    Whitfield, A E; Rotenberg, D; Aritua, V; Hogenhout, S A

    2011-04-01

    The corn planthopper, Peregrinus maidis, causes direct feeding damage to plants and transmits Maize mosaic rhabdovirus (MMV) in a persistent-propagative manner. MMV must cross several insect tissue layers for successful transmission to occur, and the gut serves as an important barrier for rhabdovirus transmission. In order to facilitate the identification of proteins that may interact with MMV either by facilitating acquisition or responding to virus infection, we generated and analysed the gut transcriptome of P. maidis. From two normalized cDNA libraries, we generated a P. maidis gut transcriptome composed of 20,771 expressed sequence tags (ESTs). Assembly of the sequences yielded 1860 contigs and 14,032 singletons, and biological roles were assigned to 5793 (36%). Comparison of P. maidis ESTs with other insect amino acid sequences revealed that P. maidis shares greatest sequence similarity with another hemipteran, the brown planthopper Nilaparvata lugens. We identified 202 P. maidis transcripts with putative homology to proteins associated with insect innate immunity, including those implicated in the Toll, Imd, JAK/STAT, Jnk and the small-interfering RNA-mediated pathways. Sequence comparisons between our P. maidis gut EST collection and the currently available National Center for Biotechnology Information EST database collection for Ni. lugens revealed that a pathogen recognition receptor in the Imd pathway, peptidoglycan recognition protein-long class (PGRP-LC), is present in these two members of the family Delphacidae; however, these recognition receptors are lacking in the model hemipteran Acyrthosiphon pisum. In addition, we identified sequences in the P. maidis gut transcriptome that share significant amino acid sequence similarities with the rhabdovirus receptor molecule, acetylcholine receptor (AChR), found in other hosts. This EST analysis sheds new light on immune response pathways in hemipteran guts that will be useful for further dissecting innate defence response pathways to rhabdovirus infection. © 2011 The Authors. Insect Molecular Biology © 2011 The Royal Entomological Society.

  13. ASD: a comprehensive database of allosteric proteins and modulators

    PubMed Central

    Huang, Zhimin; Zhu, Liang; Cao, Yan; Wu, Geng; Liu, Xinyi; Chen, Yingyi; Wang, Qi; Shi, Ting; Zhao, Yaxue; Wang, Yuefei; Li, Weihua; Li, Yixue; Chen, Haifeng; Chen, Guoqiang; Zhang, Jian

    2011-01-01

    Allostery is the most direct, rapid and efficient way of regulating protein function, ranging from the control of metabolic mechanisms to signal-transduction pathways. However, an enormous amount of unsystematic allostery information has deterred scientists who could benefit from this field. Here, we present the AlloSteric Database (ASD), the first online database that provides a central resource for the display, search and analysis of structure, function and related annotation for allosteric molecules. Currently, ASD contains 336 allosteric proteins from 101 species and 8095 modulators in three categories (activators, inhibitors and regulators). Proteins are annotated with a detailed description of allostery, biological process and related diseases, and modulators with binding affinity, physicochemical properties and therapeutic area. Integrating the information of allosteric proteins in ASD should allow for the identification of specific allosteric sites of a given subtype among proteins of the same family that can potentially serve as ideal targets for experimental validation. In addition, modulators curated in ASD can be used to investigate potent allosteric targets for the query compound, and also help chemists to implement structure modifications for novel allosteric drug design. Therefore, ASD could be a platform and a starting point for biologists and medicinal chemists for furthering allosteric research. ASD is freely available at http://mdl.shsmu.edu.cn/ASD/. PMID:21051350

  14. PeroxisomeDB: a database for the peroxisomal proteome, functional genomics and disease

    PubMed Central

    Schlüter, Agatha; Fourcade, Stéphane; Domènech-Estévez, Enric; Gabaldón, Toni; Huerta-Cepas, Jaime; Berthommier, Guillaume; Ripp, Raymond; Wanders, Ronald J. A.; Poch, Olivier; Pujol, Aurora

    2007-01-01

    Peroxisomes are essential organelles of eukaryotic origin, ubiquitously distributed in cells and organisms, playing key roles in lipid and antioxidant metabolism. Loss or malfunction of peroxisomes causes more than 20 fatal inherited conditions. We have created a peroxisomal database that includes the complete peroxisomal proteome of Homo sapiens and Saccharomyces cerevisiae, by gathering, updating and integrating the available genetic and functional information on peroxisomal genes. PeroxisomeDB is structured in interrelated sections ‘Genes’, ‘Functions’, ‘Metabolic pathways’ and ‘Diseases’, that include hyperlinks to selected features of NCBI, ENSEMBL and UCSC databases. We have designed graphical depictions of the main peroxisomal metabolic routes and have included updated flow charts for diagnosis. Precomputed BLAST, PSI-BLAST, multiple sequence alignment (MUSCLE) and phylogenetic trees are provided to assist in direct multispecies comparison to study evolutionary conserved functions and pathways. Highlights of the PeroxisomeDB include new tools developed for facilitating (i) identification of novel peroxisomal proteins, by means of identifying proteins carrying peroxisome targeting signal (PTS) motifs, (ii) detection of peroxisomes in silico, particularly useful for screening the deluge of newly sequenced genomes. PeroxisomeDB should contribute to the systematic characterization of the peroxisomal proteome and facilitate system biology approaches on the organelle. PMID:17135190

  15. KIDFamMap: a database of kinase-inhibitor-disease family maps for kinase inhibitor selectivity and binding mechanisms

    PubMed Central

    Chiu, Yi-Yuan; Lin, Chih-Ta; Huang, Jhang-Wei; Hsu, Kai-Cheng; Tseng, Jen-Hu; You, Syuan-Ren; Yang, Jinn-Moon

    2013-01-01

    Kinases play central roles in signaling pathways and are promising therapeutic targets for many diseases. Designing selective kinase inhibitors is an emergent and challenging task, because kinases share an evolutionary conserved ATP-binding site. KIDFamMap (http://gemdock.life.nctu.edu.tw/KIDFamMap/) is the first database to explore kinase-inhibitor families (KIFs) and kinase-inhibitor-disease (KID) relationships for kinase inhibitor selectivity and mechanisms. This database includes 1208 KIFs, 962 KIDs, 55 603 kinase-inhibitor interactions (KIIs), 35 788 kinase inhibitors, 399 human protein kinases, 339 diseases and 638 disease allelic variants. Here, a KIF can be defined as follows: (i) the kinases in the KIF with significant sequence similarity, (ii) the inhibitors in the KIF with significant topology similarity and (iii) the KIIs in the KIF with significant interaction similarity. The KIIs within a KIF are often conserved on some consensus KIDFamMap anchors, which represent conserved interactions between the kinase subsites and consensus moieties of their inhibitors. Our experimental results reveal that the members of a KIF often possess similar inhibition profiles. The KIDFamMap anchors can reflect kinase conformations types, kinase functions and kinase inhibitor selectivity. We believe that KIDFamMap provides biological insights into kinase inhibitor selectivity and binding mechanisms. PMID:23193279

  16. Plant MetGenMAP: an integrative analysis system for plant systems biology

    USDA-ARS's Scientific Manuscript database

    We have developed a web-based system, Plant MetGenMAP, which can identify significantly altered biochemical pathways and highly affected biological processes, predict functional roles of pathway genes, and potential pathway-related regulatory motifs from transcript and metabolite profile datasets. P...

  17. Using Bioinformatic Approaches to Identify Pathways Targeted by Human Leukemogens

    PubMed Central

    Thomas, Reuben; Phuong, Jimmy; McHale, Cliona M.; Zhang, Luoping

    2012-01-01

    We have applied bioinformatic approaches to identify pathways common to chemical leukemogens and to determine whether leukemogens could be distinguished from non-leukemogenic carcinogens. From all known and probable carcinogens classified by IARC and NTP, we identified 35 carcinogens that were associated with leukemia risk in human studies and 16 non-leukemogenic carcinogens. Using data on gene/protein targets available in the Comparative Toxicogenomics Database (CTD) for 29 of the leukemogens and 11 of the non-leukemogenic carcinogens, we analyzed for enrichment of all 250 human biochemical pathways in the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. The top pathways targeted by the leukemogens included metabolism of xenobiotics by cytochrome P450, glutathione metabolism, neurotrophin signaling pathway, apoptosis, MAPK signaling, Toll-like receptor signaling and various cancer pathways. The 29 leukemogens formed 18 distinct clusters comprising 1 to 3 chemicals that did not correlate with known mechanism of action or with structural similarity as determined by 2D Tanimoto coefficients in the PubChem database. Unsupervised clustering and one-class support vector machines, based on the pathway data, were unable to distinguish the 29 leukemogens from 11 non-leukemogenic known and probable IARC carcinogens. However, using two-class random forests to estimate leukemogen and non-leukemogen patterns, we estimated a 76% chance of distinguishing a random leukemogen/non-leukemogen pair from each other. PMID:22851955
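
    The abstract above does not say which statistic was used to test pathway enrichment. One common choice for this kind of analysis is a one-sided hypergeometric test, sketched below as an illustration only; the function name and the toy numbers are assumptions and are not taken from the study.

```python
from math import comb


def hypergeom_enrichment_p(n_universe, n_pathway, n_targets, n_overlap):
    """Return P(X >= n_overlap) for a hypergeometric draw.

    n_universe: total number of genes considered
    n_pathway:  genes in the universe that belong to the pathway
    n_targets:  genes targeted by the chemical (the sample drawn)
    n_overlap:  targeted genes that fall inside the pathway
    """
    upper = min(n_pathway, n_targets)
    total = comb(n_universe, n_targets)
    # Sum the probability mass for every overlap at least as large as observed.
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_targets - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Toy numbers (hypothetical): 1000 genes, 50 in the pathway,
    # 20 targets, 5 of which are pathway members.
    print(hypergeom_enrichment_p(1000, 50, 20, 5))
```

    In practice the resulting p-values would be corrected for testing many pathways at once; that step is omitted here.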

  18. Sig2BioPAX: Java tool for converting flat files to BioPAX Level 3 format.

    PubMed

    Webb, Ryan L; Ma'ayan, Avi

    2011-03-21

    The World Wide Web plays a critical role in enabling molecular, cell, systems and computational biologists to exchange, search, visualize, integrate, and analyze experimental data. Such efforts can be further enhanced through the development of semantic web concepts. The semantic web idea is to enable machines to understand data through the development of protocol free data exchange formats such as Resource Description Framework (RDF) and the Web Ontology Language (OWL). These standards provide formal descriptors of objects, object properties and their relationships within a specific knowledge domain. However, the overhead of converting datasets typically stored in data tables such as Excel, text or PDF into RDF or OWL formats is not trivial for non-specialists and as such produces a barrier to seamless data exchange between researchers, databases and analysis tools. This problem is particularly of importance in the field of network systems biology where biochemical interactions between genes and their protein products are abstracted to networks. For the purpose of converting biochemical interactions into the BioPAX format, which is the leading standard developed by the computational systems biology community, we developed an open-source command line tool that takes as input tabular data describing different types of molecular biochemical interactions. The tool converts such interactions into the BioPAX level 3 OWL format. We used the tool to convert several existing and new mammalian networks of protein interactions, signalling pathways, and transcriptional regulatory networks into BioPAX. Some of these networks were deposited into PathwayCommons, a repository for consolidating and organizing biochemical networks. The software tool Sig2BioPAX is a resource that enables experimental and computational systems biologists to contribute their identified networks and pathways of molecular interactions for integration and reuse with the rest of the research community.

  19. Curation of inhibitor-target data: process and impact on pathway analysis.

    PubMed

    Devidas, Sreenivas

    2009-01-01

    The past decade has seen a significant emergence in the availability and use of pathway analysis tools. The workflow supported by most of these tools is limited to either of the following: (a) a network of genes based on the input data set, or (b) the resultant network filtered down by a few criteria such as (but not limited to) (i) disease association of the genes in the network; (ii) targets known to be the target of one or more launched drugs; (iii) targets known to be the target of one or more compounds in clinical trials; and (iv) targets reasonably known to be potential candidate or clinical biomarkers. Almost all the tools in use today are biased towards the biological side and contain little, if any, information on the chemical inhibitors associated with the components of a given biological network. This limitation has several causes: (1) the number of inhibitors that have been published or patented is probably several-fold (probably greater than 10-fold) larger than the number of published protein-protein interactions, and curation of such data is both expensive and time consuming and could impact ROI significantly; (2) the non-standardization associated with protein and gene names makes mapping far from straightforward; (3) the number of patented and published inhibitors across target classes increases by over a million per year, so keeping the databases current becomes a monumental problem; and (4) modifications are required in the product architectures to accommodate chemistry-related content. GVK Bio has, over the past 7 years, curated the compound-target data that is necessary for the addition of such compound-centric workflows. This chapter focuses on identification, curation and utility of such data.

  20. The 2014 Nucleic Acids Research Database Issue and an updated NAR online Molecular Biology Database Collection.

    PubMed

    Fernández-Suárez, Xosé M; Rigden, Daniel J; Galperin, Michael Y

    2014-01-01

    The 2014 Nucleic Acids Research Database Issue includes descriptions of 58 new molecular biology databases and recent updates to 123 databases previously featured in NAR or other journals. For convenience, the issue is now divided into eight sections that reflect major subject categories. Among the highlights of this issue are six databases of the transcription factor binding sites in various organisms and updates on such popular databases as CAZy, Database of Genomic Variants (DGV), dbGaP, DrugBank, KEGG, miRBase, Pfam, Reactome, SEED, TCDB and UniProt. There is a strong block of structural databases, which includes, among others, the new RNA Bricks database, updates on PDBe, PDBsum, ArchDB, Gene3D, ModBase, Nucleic Acid Database and the recently revived iPfam database. An update on the NCBI's MMDB describes VAST+, an improved tool for protein structure comparison. Two articles highlight the development of the Structural Classification of Proteins (SCOP) database: one describes SCOPe, which automates assignment of new structures to the existing SCOP hierarchy; the other one describes the first version of SCOP2, with its more flexible approach to classifying protein structures. This issue also includes a collection of articles on bacterial taxonomy and metagenomics, which includes updates on the List of Prokaryotic Names with Standing in Nomenclature (LPSN), Ribosomal Database Project (RDP), the Silva/LTP project and several new metagenomics resources. The NAR online Molecular Biology Database Collection, http://www.oxfordjournals.org/nar/database/c/, has been expanded to 1552 databases. The entire Database Issue is freely available online on the Nucleic Acids Research website (http://nar.oxfordjournals.org/).

  1. Analysis of expressed sequence tags (ESTs) from cocoa (Theobroma cacao L) upon infection with Phytophthora megakarya.

    PubMed

    Naganeeswaran, Sudalaimuthu Asari; Subbian, Elain Apshara; Ramaswamy, Manimekalai

    2012-01-01

    Phytophthora megakarya, the causative agent of cacao black pod disease in West African countries, causes an extensive loss of yield. In this study we have analyzed 4 libraries of ESTs derived from Phytophthora megakarya-infected cocoa leaf and pod tissues. In total, 6379 redundant sequences were retrieved from the ESTtik database and EST processing was performed using the seqclean tool. Clustering and assembling using CAP3 generated 3333 non-redundant (907 contigs and 2426 singletons) sequences. The primary sequence analysis of the 3333 non-redundant sequences showed that the GC percentage was 42.7 and the sequence length ranged from 101 to 2576 nucleotides. Further, functional analyses (Blast, Interproscan, Gene ontology and KEGG search) were executed and 1230 orthologous genes were annotated. In total, 272 enzymes corresponding to 114 metabolic pathways were identified. Functional annotation revealed that most of the sequences are related to molecular function, stress response and biological processes. The annotated enzymes are aldehyde dehydrogenase (E.C: 1.2.1.3), catalase (E.C: 1.11.1.6), acetyl-CoA C-acetyltransferase (E.C: 2.3.1.9), threonine ammonia-lyase (E.C: 4.3.1.19), acetolactate synthase (E.C: 2.2.1.6), O-methyltransferase (E.C: 2.1.1.68) which play an important role in amino acid biosynthesis and phenyl propanoid biosynthesis. All this information was stored in a MySQL database management system to be used in future for reconstruction of the biotic stress response pathway in cocoa.
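
    The primary sequence statistics mentioned above (GC percentage and length range) are simple to compute once the non-redundant sequences are in memory. The sketch below is illustrative only: the function names and toy sequences are assumptions, and it is not the authors' pipeline.

```python
def gc_percent(sequences):
    """Return the pooled GC percentage over all sequences (0.0 if empty)."""
    total = sum(len(seq) for seq in sequences)
    if total == 0:
        return 0.0
    gc = sum(seq.upper().count("G") + seq.upper().count("C") for seq in sequences)
    return 100.0 * gc / total


def length_range(sequences):
    """Return (shortest, longest) sequence length."""
    lengths = [len(seq) for seq in sequences]
    return min(lengths), max(lengths)


if __name__ == "__main__":
    toy = ["GGCC", "AATT", "ATGCATGC"]  # hypothetical sequences
    print(gc_percent(toy), length_range(toy))
```

    Note that this pools bases across all sequences; averaging the per-sequence GC percentages would weight short and long sequences equally and can give a different figure.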

  2. Genome-wide identification, classification, and functional analysis of the basic helix-loop-helix transcription factors in the cattle, Bos Taurus.

    PubMed

    Li, Fengmei; Liu, Wuyi

    2017-06-01

    The basic helix-loop-helix (bHLH) transcription factors (TFs) form a huge superfamily and play crucial roles in many essential developmental, genetic, and physiological-biochemical processes of eukaryotes. In total, 109 putative bHLH TFs were identified and categorized successfully in the genomic databases of cattle, Bos Taurus, after removing redundant sequences and merging genetic isoforms. Through phylogenetic analyses, 105 proteins among these bHLH TFs were classified into 44 families with 46, 25, 14, 3, 13, and 4 members in the high-order groups A, B, C, D, E, and F, respectively. The remaining 4 bHLH proteins were sorted out as 'orphans.' Next, these 109 putative bHLH proteins identified were further characterized as significantly enriched in 524 significant Gene Ontology (GO) annotations (corrected P value ≤ 0.05) and 21 significantly enriched pathways (corrected P value ≤ 0.05) that had been mapped by the web server KOBAS 2.0. Furthermore, 95 bHLH proteins were further screened and analyzed together with two uncharacterized proteins in the STRING online database to reconstruct the protein-protein interaction network of cattle bHLH TFs. Ultimately, 89 bHLH proteins were fully mapped in a network with 67 biological process, 13 molecular functions, 5 KEGG pathways, 12 PFAM protein domains, and 25 INTERPRO classified protein domains and features. These results provide much useful information and a good reference for further functional investigations and updated researches on cattle bHLH TFs.

  3. The 2018 Nucleic Acids Research database issue and the online molecular biology database collection.

    PubMed

    Rigden, Daniel J; Fernández, Xosé M

    2018-01-04

    The 2018 Nucleic Acids Research Database Issue contains 181 papers spanning molecular biology. Among them, 82 are new and 84 are updates describing resources that appeared in the Issue previously. The remaining 15 cover databases most recently published elsewhere. Databases in the area of nucleic acids include 3DIV for visualisation of data on genome 3D structure and RNArchitecture, a hierarchical classification of RNA families. Protein databases include the established SMART, ELM and MEROPS while GPCRdb and the newcomer STCRDab cover families of biomedical interest. In the area of metabolism, HMDB and Reactome both report new features while PULDB appears in NAR for the first time. This issue also contains reports on genomics resources including Ensembl, the UCSC Genome Browser and ENCODE. Update papers from the IUPHAR/BPS Guide to Pharmacology and DrugBank are highlights of the drug and drug target section while a number of proteomics databases including proteomicsDB are also covered. The entire Database Issue is freely available online on the Nucleic Acids Research website (https://academic.oup.com/nar). The NAR online Molecular Biology Database Collection has been updated, reviewing 138 entries, adding 88 new resources and eliminating 47 discontinued URLs, bringing the current total to 1737 databases. It is available at http://www.oxfordjournals.org/nar/database/c/. © The Author(s) 2018. Published by Oxford University Press on behalf of Nucleic Acids Research.

  4. Bibliographical database of radiation biological dosimetry and risk assessment: Part 1, through June 1988

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Straume, T.; Ricker, Y.; Thut, M.

    1988-08-29

    This database was constructed to support research in radiation biological dosimetry and risk assessment. Relevant publications were identified through detailed searches of national and international electronic databases and through our personal knowledge of the subject. Publications were numbered and key worded, and referenced in an electronic data-retrieval system that permits quick access through computerized searches on publication number, authors, key words, title, year, and journal name. Photocopies of all publications contained in the database are maintained in a file that is numerically arranged by citation number. This report of the database is provided as a useful reference and overview. It should be emphasized that the database will grow as new citations are added to it. With that in mind, we arranged this report in order of ascending citation number so that follow-up reports will simply extend this document. The database cites 1212 publications. Publications are from 119 different scientific journals; 27 of these journals are cited at least 5 times. It also contains references to 42 books and published symposia, and 129 reports. Information relevant to radiation biological dosimetry and risk assessment is widely distributed among the scientific literature, although a few journals clearly dominate. The four journals publishing the largest number of relevant papers are Health Physics, Mutation Research, Radiation Research, and International Journal of Radiation Biology. Publications in Health Physics make up almost 10% of the current database.

  5. LeishCyc: a guide to building a metabolic pathway database and visualization of metabolomic data.

    PubMed

    Saunders, Eleanor C; MacRae, James I; Naderer, Thomas; Ng, Milica; McConville, Malcolm J; Likić, Vladimir A

    2012-01-01

    The complexity of the metabolic networks in even the simplest organisms has raised new challenges in organizing metabolic information. To address this, specialized computer frameworks have been developed to capture, manage, and visualize metabolic knowledge. The leading databases of metabolic information are those organized under the umbrella of the BioCyc project, which consists of the reference database MetaCyc, and a number of pathway/genome databases (PGDBs) each focussed on a specific organism. A number of PGDBs have been developed for bacterial, fungal, and protozoan pathogens, greatly facilitating dissection of the metabolic potential of these organisms and the identification of new drug targets. Leishmania are protozoan parasites belonging to the family Trypanosomatidae that cause a broad spectrum of diseases in humans. In this work we use the LeishCyc database, the BioCyc database for Leishmania major, to describe how to build a BioCyc database from genomic sequences and associated annotations. By using metabolomic data generated in our group, we show how such databases can be utilized to elucidate specific changes in parasite metabolism.

  6. BioCarian: search engine for exploratory searches in heterogeneous biological databases.

    PubMed

    Zaki, Nazar; Tennakoon, Chandana

    2017-10-02

    There are a large number of biological databases publicly available for scientists on the web. Also, there are many private databases generated in the course of research projects. These databases are in a wide variety of formats. Web standards have evolved in recent times and semantic web technologies are now available to interconnect diverse and heterogeneous sources of data. Therefore, integration and querying of biological databases can be facilitated by techniques used in the semantic web. Heterogeneous databases can be converted into Resource Description Framework (RDF) and queried using the SPARQL language. Searching for exact queries in these databases is trivial. However, exploratory searches need customized solutions, especially when multiple databases are involved. This process is cumbersome and time consuming for those without a sufficient background in computer science. In this context, a search engine facilitating exploratory searches of databases would be of great help to the scientific community. We present BioCarian, an efficient and user-friendly search engine for performing exploratory searches on biological databases. The search engine is an interface for SPARQL queries over RDF databases. We note that many of the databases can be converted to tabular form. We first convert the tabular databases to RDF. The search engine provides a graphical interface based on facets to explore the converted databases. The facet interface is more advanced than conventional facets. It allows complex queries to be constructed, and has additional features like ranking of facet values based on several criteria, visually indicating the relevance of a facet value and presenting the most important facet values when a large number of choices are available. For advanced users, SPARQL queries can be run directly on the databases. Using this feature, users will be able to incorporate federated searches of SPARQL endpoints.
    We used the search engine to do an exploratory search on previously published viral integration data and were able to deduce the main conclusions of the original publication. BioCarian is accessible via http://www.biocarian.com. We have developed a search engine to explore RDF databases that can be used by both novice and advanced users.
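
    The step of converting a tabular database to RDF can be pictured as turning each table cell into one subject-predicate-object triple. The sketch below writes such triples in N-Triples syntax using only the Python standard library. It is a minimal illustration, not BioCarian's converter: the namespace http://example.org/, the function name and the toy column names are all hypothetical, and identifiers and column names are assumed to be safe to embed in an IRI.

```python
import csv
import io

BASE = "http://example.org/"  # hypothetical namespace, not used by BioCarian


def rows_to_ntriples(tsv_text, id_column):
    """Convert tab-separated text into a list of N-Triples statements.

    Each row becomes a subject; every other non-empty cell becomes a
    triple whose predicate is derived from the column name.
    Assumes a well-formed table with a header row.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    triples = []
    for row in reader:
        subject = f"<{BASE}record/{row[id_column]}>"
        for column, value in row.items():
            if column == id_column or not value:
                continue
            predicate = f"<{BASE}field/{column}>"
            # Escape backslashes and double quotes inside the literal.
            literal = value.replace("\\", "\\\\").replace('"', '\\"')
            triples.append(f'{subject} {predicate} "{literal}" .')
    return triples


if __name__ == "__main__":
    table = "id\tgene\tvirus\nr1\tgeneA\tvirusX\n"  # toy data
    for line in rows_to_ntriples(table, "id"):
        print(line)
```

    Triples produced this way can be loaded into any RDF store and then queried with SPARQL, which is the role the search engine's facet interface plays for non-specialist users.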

  7. Frameworks for organizing exposure and toxicity data - the Aggregate Exposure Pathway (AEP) and the Adverse Outcome Pathway (AOP)

    EPA Science Inventory

    The Adverse Outcome Pathway (AOP) framework organizes existing knowledge regarding a series of biological events, starting with a molecular initiating event (MIE) and ending at an adverse outcome. The AOP framework provides a biological context to interpret in vitro toxicity dat...

  8. ODG: Omics database generator - a tool for generating, querying, and analyzing multi-omics comparative databases to facilitate biological understanding.

    PubMed

    Guhlin, Joseph; Silverstein, Kevin A T; Zhou, Peng; Tiffin, Peter; Young, Nevin D

    2017-08-10

    Rapid generation of omics data in recent years has resulted in vast amounts of disconnected datasets without systemic integration and knowledge building, while individual groups have made customized, annotated datasets available on the web with few ways to link them to in-lab datasets. With so many research groups generating their own data, the ability to relate it to the larger genomic and comparative genomic context is becoming increasingly crucial to make full use of the data. The Omics Database Generator (ODG) allows users to create customized databases that utilize published genomics data integrated with experimental data which can be queried using a flexible graph database. When provided with omics and experimental data, ODG will create a comparative, multi-dimensional graph database. ODG can import definitions and annotations from other sources such as InterProScan, the Gene Ontology, ENZYME, UniPathway, and others. This annotation data can be especially useful for studying new or understudied species for which transcripts have only been predicted, and rapidly give additional layers of annotation to predicted genes. In better studied species, ODG can perform syntenic annotation translations or rapidly identify characteristics of a set of genes or nucleotide locations, such as hits from an association study. ODG provides a web-based user-interface for configuring the data import and for querying the database. Queries can also be run from the command-line and the database can be queried directly through programming language hooks available for most languages. ODG supports most common genomic formats as well as a generic, easy-to-use tab-separated value format for user-provided annotations. ODG is a user-friendly database generation and query tool that adapts to the supplied data to produce a comparative genomic database or multi-layered annotation database.
    ODG provides rapid comparative genomic annotation and is therefore particularly useful for non-model or understudied species. For species for which more data are available, ODG can be used to conduct complex multi-omics, pattern-matching queries.

  9. Disentangling the multigenic and pleiotropic nature of molecular function

    PubMed Central

    2015-01-01

    Background: Biological processes at the molecular level are usually represented by molecular interaction networks. Function is organised and modularity identified based on network topology; however, this approach often fails to account for the dynamic and multifunctional nature of molecular components. For example, a molecule engaging in spatially or temporally independent functions may be inappropriately clustered into a single functional module. To capture biologically meaningful sets of interacting molecules, we use experimentally defined pathways as spatial/temporal units of molecular activity. Results: We defined functional profiles of Saccharomyces cerevisiae based on a minimal set of Gene Ontology terms sufficient to represent each pathway's genes. The Gene Ontology terms were used to annotate 271 pathways, accounting for pathway multi-functionality and gene pleiotropy. Pathways were then arranged into a network, linked by shared functionality. Of the genes in our data set, 44% appeared in multiple pathways performing a diverse set of functions. Linking pathways by overlapping functionality revealed a modular network with energy metabolism forming a sparse centre, surrounded by several denser clusters comprised of regulatory and metabolic pathways. Signalling pathways formed a relatively discrete cluster connected to the centre of the network. Genetic interactions were enriched within the clusters of pathways by a factor of 5.5, confirming the organisation of our pathway network is biologically significant. Conclusions: Our representation of molecular function according to pathway relationships enables analysis of gene/protein activity in the context of specific functional roles, as an alternative to typical molecule-centric graph-based methods. The pathway network demonstrates the cooperation of multiple pathways to perform biological processes and organises pathways into functionally related clusters with interdependent outcomes. PMID:26678917
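
    The idea of linking pathways by shared functionality can be sketched as follows: each pathway carries a set of functional terms, and two pathways are joined by an edge whenever their term sets overlap. This is a minimal illustration under that assumption; the function names, pathway names and term labels are hypothetical, and the study's own term selection and linking criteria may differ.

```python
from itertools import combinations


def link_pathways(pathway_terms):
    """Return {(pathway_a, pathway_b): shared_terms} for overlapping pairs.

    pathway_terms maps a pathway name to the set of functional terms
    (for example Gene Ontology terms) that annotate it.
    """
    edges = {}
    for a, b in combinations(sorted(pathway_terms), 2):
        shared = pathway_terms[a] & pathway_terms[b]
        if shared:
            edges[(a, b)] = shared
    return edges


def fraction_in_multiple(pathway_genes):
    """Return the fraction of genes that occur in more than one pathway."""
    counts = {}
    for genes in pathway_genes.values():
        for gene in genes:
            counts[gene] = counts.get(gene, 0) + 1
    if not counts:
        return 0.0
    return sum(1 for n in counts.values() if n > 1) / len(counts)


if __name__ == "__main__":
    terms = {  # toy annotations with placeholder term labels
        "pathway_1": {"term_A", "term_B"},
        "pathway_2": {"term_B", "term_C"},
        "pathway_3": {"term_D"},
    }
    print(link_pathways(terms))
```

    The second helper mirrors the kind of count behind the statement that a share of genes appear in multiple pathways.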

  10. MitProNet: A Knowledgebase and Analysis Platform of Proteome, Interactome and Diseases for Mammalian Mitochondria

    PubMed Central

    Mao, Song; Chai, Xiaoqiang; Hu, Yuling; Hou, Xugang; Tang, Yiheng; Bi, Cheng; Li, Xiao

    2014-01-01

    Mitochondrion plays a central role in diverse biological processes in most eukaryotes, and its dysfunctions are critically involved in a large number of diseases and the aging process. A systematic identification of mitochondrial proteomes and characterization of functional linkages among mitochondrial proteins are fundamental in understanding the mechanisms underlying biological functions and human diseases associated with mitochondria. Here we present a database MitProNet which provides a comprehensive knowledgebase for mitochondrial proteome, interactome and human diseases. First an inventory of mammalian mitochondrial proteins was compiled by widely collecting proteomic datasets, and the proteins were classified by machine learning to achieve a high-confidence list of mitochondrial proteins. The current version of MitProNet covers 1124 high-confidence proteins, and the remainders were further classified as middle- or low-confidence. An organelle-specific network of functional linkages among mitochondrial proteins was then generated by integrating genomic features encoded by a wide range of datasets including genomic context, gene expression profiles, protein-protein interactions, functional similarity and metabolic pathways. The functional-linkage network should be a valuable resource for the study of biological functions of mitochondrial proteins and human mitochondrial diseases. Furthermore, we utilized the network to predict candidate genes for mitochondrial diseases using prioritization algorithms. All proteins, functional linkages and disease candidate genes in MitProNet were annotated according to the information collected from their original sources including GO, GEO, OMIM, KEGG, MIPS, HPRD and so on. MitProNet features a user-friendly graphic visualization interface to present functional analysis of linkage networks. 
As an up-to-date database and analysis platform, MitProNet should be particularly helpful in comprehensive studies of complicated biological mechanisms underlying mitochondrial functions and human mitochondrial diseases. MitProNet is freely accessible at http://bio.scu.edu.cn:8085/MitProNet. PMID:25347823

  11. FragariaCyc: A Metabolic Pathway Database for Woodland Strawberry Fragaria vesca

    PubMed Central

    Naithani, Sushma; Partipilo, Christina M.; Raja, Rajani; Elser, Justin L.; Jaiswal, Pankaj

    2016-01-01

    FragariaCyc is a strawberry-specific cellular metabolic network based on the annotated genome sequence of Fragaria vesca L. ssp. vesca, accession Hawaii 4. It was built on the Pathway-Tools platform using MetaCyc as the reference. Experimental evidence from published literature was used for supporting/editing existing entities and for the addition of new pathways, enzymes, reactions, compounds, and small molecules in the database. To date, FragariaCyc comprises 66 super-pathways, 488 unique pathways, 2348 metabolic reactions, 3507 enzymes, and 2134 compounds. In addition to searching and browsing FragariaCyc, researchers can compare pathways across various plant metabolic networks and analyze their data using the Omics Viewer tool. We view FragariaCyc as a resource for the community of researchers working with strawberry and related fruit crops. It can help in understanding the regulation of overall metabolism of the strawberry plant during development and in response to diseases and abiotic stresses. FragariaCyc is available online at http://pathways.cgrb.oregonstate.edu. PMID:26973684

  12. KEGGtranslator: visualizing and converting the KEGG PATHWAY database to various formats.

    PubMed

    Wrzodek, Clemens; Dräger, Andreas; Zell, Andreas

    2011-08-15

    The KEGG PATHWAY database provides a widely used service for metabolic and nonmetabolic pathways. It contains manually drawn pathway maps with information about the genes, reactions and relations contained therein. To store these pathways, KEGG uses KGML, a proprietary XML-format. Parsers and translators are needed to process the pathway maps for usage in other applications and algorithms. We have developed KEGGtranslator, an easy-to-use stand-alone application that can visualize and convert KGML formatted XML-files into multiple output formats. Unlike other translators, KEGGtranslator supports a plethora of output formats, is able to augment the information in translated documents (e.g. MIRIAM annotations) beyond the scope of the KGML document, and amends missing components to fragmentary reactions within the pathway to allow simulations on those. KEGGtranslator is freely available as a Java™ Web Start application and for download at http://www.cogsys.cs.uni-tuebingen.de/software/KEGGtranslator/. KGML files can be downloaded from within the application. Contact: clemens.wrzodek@uni-tuebingen.de. Supplementary data are available at Bioinformatics online.
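
    To give a sense of what a KGML parser has to do, the sketch below reads a small hand-written fragment and extracts an edge list with Python's standard XML parser. It is not KEGGtranslator's code. The fragment is an assumption modelled on KGML's entry and relation elements, the identifiers in it are placeholders, and real KGML files carry further elements (graphics, reactions) that are ignored here.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified fragment in the style of KGML (placeholder IDs).
KGML_EXAMPLE = """<pathway name="path:example" org="ex" number="00000">
  <entry id="1" name="ex:geneA" type="gene"/>
  <entry id="2" name="ex:geneB" type="gene"/>
  <relation entry1="1" entry2="2" type="PPrel">
    <subtype name="activation" value="--&gt;"/>
  </relation>
</pathway>"""


def kgml_to_edges(kgml_text):
    """Return a list of (source_name, target_name, [subtype names])."""
    root = ET.fromstring(kgml_text)
    # Map the numeric entry ids used by relations to entry names.
    names = {entry.get("id"): entry.get("name") for entry in root.iter("entry")}
    edges = []
    for relation in root.iter("relation"):
        subtypes = [s.get("name") for s in relation.findall("subtype")]
        edges.append(
            (names[relation.get("entry1")], names[relation.get("entry2")], subtypes)
        )
    return edges


if __name__ == "__main__":
    print(kgml_to_edges(KGML_EXAMPLE))
```

    An edge list like this is the usual starting point for writing other output formats; a full translator would also have to handle the additional annotation and reaction completion described in the abstract.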

  13. The usefulness of ozone treatment in spinal pain

    PubMed Central

    Bocci, Velio; Borrelli, Emma; Zanardi, Iacopo; Travagli, Valter

    2015-01-01

    Objective: The aim of this review is to elucidate the biochemical, molecular, immunological, and pharmaceutical mechanisms of action of ozone dissolved in biological fluids. Studies performed during the last two decades allow the drawing of a comprehensive framework for understanding and recommending the integration of ozone therapy for spinal pain. Methods: An in-depth screening of primary sources of information online – via SciFinder Scholar, Google Scholar, and Scopus databases as well as Embase, PubMed, and the Cochrane Database of Systematic Reviews – was performed. In this review, the most significant papers of the last 25 years are presented and their proposals critically evaluated, regardless of the bibliometric impact of the journals. Results: The efficacy of standard treatments combined with the unique capacity of ozone therapy to reactivate the innate antioxidant system is the key to correcting the oxidative stress typical of chronic inflammatory diseases. Pain pathways and control systems of algesic signals after ozone administration are described. Conclusion: This paper favors the full insertion of ozone therapy into pharmaceutical sciences, rather than as either an alternative or an esoteric approach. PMID:26028964

  14. The State of the Art of the Zebrafish Model for Toxicology and Toxicologic Pathology Research—Advantages and Current Limitations

    PubMed Central

    Spitsbergen, Jan M.; Kent, Michael L.

    2007-01-01

    The zebrafish (Danio rerio) is now the pre-eminent vertebrate model system for clarification of the roles of specific genes and signaling pathways in development. The zebrafish genome will be completely sequenced within the next 1–2 years. Together with the substantial historical database regarding basic developmental biology, toxicology, and gene transfer, the rich foundation of molecular genetic and genomic data makes zebrafish a powerful model system for clarifying mechanisms in toxicity. In contrast to the highly advanced knowledge base on molecular developmental genetics in zebrafish, our database regarding infectious and noninfectious diseases and pathologic lesions in zebrafish lags far behind the information available on most other domestic mammalian and avian species, particularly rodents. Currently, minimal data are available regarding spontaneous neoplasm rates or spontaneous aging lesions in any of the commonly used wild-type or mutant lines of zebrafish. Therefore, to fully utilize the potential of zebrafish as an animal model for understanding human development, disease, and toxicology we must greatly advance our knowledge on zebrafish diseases and pathology. PMID:12597434

  15. Low Frequency Variants, Collapsed Based on Biological Knowledge, Uncover Complexity of Population Stratification in 1000 Genomes Project Data

    PubMed Central

    Moore, Carrie B.; Wallace, John R.; Wolfe, Daniel J.; Frase, Alex T.; Pendergrass, Sarah A.; Weiss, Kenneth M.; Ritchie, Marylyn D.

    2013-01-01

    Analyses investigating low frequency variants have the potential for explaining additional genetic heritability of many complex human traits. However, the natural frequencies of rare variation between human populations strongly confound genetic analyses. We have applied a novel collapsing method to identify biological features with low frequency variant burden differences in thirteen populations sequenced by the 1000 Genomes Project. Our flexible collapsing tool utilizes expert biological knowledge from multiple publicly available database sources to direct feature selection. Variants were collapsed according to genetically driven features, such as evolutionary conserved regions, regulatory regions, genes, and pathways. We have conducted an extensive comparison of low frequency variant burden differences (MAF<0.03) between populations from 1000 Genomes Project Phase I data. We found that on average 26.87% of gene bins, 35.47% of intergenic bins, 42.85% of pathway bins, 14.86% of ORegAnno regulatory bins, and 5.97% of evolutionary conserved regions show statistically significant differences in low frequency variant burden across populations from the 1000 Genomes Project. The proportion of bins with significant differences in low frequency burden depends on the ancestral similarity of the two populations compared and types of features tested. Even closely related populations had notable differences in low frequency burden, but fewer differences than populations from different continents. Furthermore, conserved or functionally relevant regions had fewer significant differences in low frequency burden than regions under less evolutionary constraint. This degree of low frequency variant differentiation across diverse populations and feature elements highlights the critical importance of considering population stratification in the new era of DNA sequencing and low frequency variant genomic analyses. PMID:24385916
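
The collapsing idea in the abstract above can be illustrated with a minimal sketch that counts low-frequency variants (MAF < 0.03, the threshold quoted in the abstract) inside named genomic bins. The tuple layout, bin coordinates, and function name are hypothetical, and this is not the authors' tool, which draws its features from curated databases.

```python
from collections import Counter

MAF_THRESHOLD = 0.03  # threshold quoted in the abstract


def collapse_low_frequency(variants, bins, maf_threshold=MAF_THRESHOLD):
    """Count variants with MAF below the threshold in each bin.

    variants: iterable of (chrom, position, maf) tuples (hypothetical layout)
    bins: dict mapping bin name -> (chrom, start, end), inclusive coordinates
    """
    counts = Counter({name: 0 for name in bins})
    for chrom, pos, maf in variants:
        if not (0.0 < maf < maf_threshold):
            continue  # skip monomorphic sites and common variants
        for name, (b_chrom, start, end) in bins.items():
            # A variant may fall in several overlapping bins and is counted in each
            if chrom == b_chrom and start <= pos <= end:
                counts[name] += 1
    return dict(counts)
```

Per-bin counts like these could then be compared between two populations with a statistical test of the reader's choice; the abstract does not specify which test was used.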

  16. Characteristics of genomic signatures derived using univariate methods and mechanistically anchored functional descriptors for predicting drug- and xenobiotic-induced nephrotoxicity.

    PubMed

    Shi, Weiwei; Bugrim, Andrej; Nikolsky, Yuri; Nikolskya, Tatiana; Brennan, Richard J

    2008-01-01

    The ideal toxicity biomarker is composed of the properties of prediction (is detected prior to traditional pathological signs of injury), accuracy (high sensitivity and specificity), and mechanistic relationships to the endpoint measured (biological relevance). Gene expression-based toxicity biomarkers ("signatures") have shown good predictive power and accuracy, but are difficult to interpret biologically. We have compared different statistical methods of feature selection with knowledge-based approaches, using GeneGo's database of canonical pathway maps, to generate gene sets for the classification of renal tubule toxicity. The gene set selection algorithms include four univariate analyses: t-statistics, fold-change, B-statistics, and RankProd, and their combination and overlap for the identification of differentially expressed probes. Enrichment analysis following the results of the four univariate analyses, Hotelling T-square test, and, finally, out-of-bag selection, a variant of cross-validation, were used to identify canonical pathway maps (sets of genes coordinately involved in key biological processes) with classification power. Differentially expressed genes identified by the different statistical univariate analyses all generated reasonably performing classifiers of tubule toxicity. Maps identified by enrichment analysis or Hotelling T-square had lower classification power, but highlighted perturbed lipid homeostasis as a common discriminator of nephrotoxic treatments. The out-of-bag method yielded the best functionally integrated classifier. The map "ephrins signaling" performed comparably to a classifier derived using sparse linear programming, a machine learning algorithm, and represents a signaling network specifically involved in renal tubule development and integrity. Such functional descriptors of toxicity promise to better integrate predictive toxicogenomics with mechanistic analysis, facilitating the interpretation and risk assessment of predictive genomic investigations.

  17. [Advance in flavonoids biosynthetic pathway and synthetic biology].

    PubMed

    Zou, Li-Qiu; Wang, Cai-Xia; Kuang, Xue-Jun; Li, Ying; Sun, Chao

    2016-11-01

    Flavonoids are valuable components of medicinal plants and possess a variety of pharmacological activities, including anti-tumor, antioxidant and anti-inflammatory activities. The flavonoid biosynthetic pathway is well understood: 2S-flavanones, including naringenin and pinocembrin, form the skeleton of other flavonoids and are converted into them through branched metabolic pathways. Elucidation of the flavonoid biosynthetic pathway lays a solid foundation for flavonoid synthetic biology. A few flavonoids, such as naringenin, pinocembrin and fisetin, have been produced in Escherichia coli or yeast with synthetic biology technologies. Synthetic biology will provide a new way to obtain valuable flavonoids and promote the research and development of flavonoid drugs and health products, allowing flavonoids to play more important roles in human diet and health. Copyright © by the Chinese Pharmaceutical Association.

  18. A novel method to identify pathways associated with renal cell carcinoma based on a gene co-expression network

    PubMed Central

    RUAN, XIYUN; LI, HONGYUN; LIU, BO; CHEN, JIE; ZHANG, SHIBAO; SUN, ZEQIANG; LIU, SHUANGQING; SUN, FAHAI; LIU, QINGYONG

    2015-01-01

    The aim of the present study was to develop a novel method for identifying pathways associated with renal cell carcinoma (RCC) based on a gene co-expression network. A framework was established where a co-expression network was derived from the database as well as various co-expression approaches. First, the backbone of the network based on differentially expressed (DE) genes between RCC patients and normal controls was constructed using the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) database. The differentially co-expressed links were detected by Pearson’s correlation, the empirical Bayesian (EB) approach and Weighted Gene Co-expression Network Analysis (WGCNA). The co-expressed gene pairs were merged by a rank-based algorithm. We obtained 842, 371, 2,883 and 1,595 co-expressed gene pairs from the co-expression networks of the STRING database, Pearson’s correlation, the EB method and WGCNA, respectively. Two hundred and eighty-one differentially co-expressed (DC) gene pairs were obtained from the merged network using this novel method. Pathway enrichment analysis based on the Kyoto Encyclopedia of Genes and Genomes (KEGG) database and the network enrichment analysis (NEA) method were performed to verify the feasibility of the merged method. Results of the KEGG and NEA pathway analyses showed that the network was associated with RCC. The suggested method was computationally efficient in identifying pathways associated with RCC and has been identified as a useful complement to traditional co-expression analysis. PMID:26058425
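
As a rough illustration of only the Pearson's correlation step mentioned above, the sketch below computes pairwise correlations between gene expression profiles and keeps pairs whose absolute correlation reaches a cutoff. The cutoff of 0.8, the gene labels, and the function names are assumptions made for illustration; the EB approach, WGCNA, and the rank-based merging are not shown.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant profile: correlation is undefined, treated as 0 here
    return sxy / sqrt(sxx * syy)


def coexpressed_pairs(expression, cutoff=0.8):
    """Return (gene1, gene2, r) for every gene pair with |r| >= cutoff.

    expression: dict mapping gene name -> list of expression values,
    one value per sample, in the same sample order for every gene.
    """
    pairs = []
    for g1, g2 in combinations(sorted(expression), 2):
        r = pearson(expression[g1], expression[g2])
        if abs(r) >= cutoff:
            pairs.append((g1, g2, r))
    return pairs
```

A differential co-expression analysis would compute such pairs separately for patients and controls and compare them; that comparison is left out of this sketch.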

  19. Bayesian parameter estimation for nonlinear modelling of biological pathways.

    PubMed

    Ghasemi, Omid; Lindsey, Merry L; Yang, Tianyi; Nguyen, Nguyen; Huang, Yufei; Jin, Yu-Fang

    2011-01-01

    The availability of temporal measurements on biological experiments has significantly promoted research areas in systems biology. To gain insight into the interaction and regulation of biological systems, mathematical frameworks such as ordinary differential equations have been widely applied to model biological pathways and interpret the temporal data. Hill equations are the preferred formats to represent the reaction rate in differential equation frameworks, due to their simple structures and their capabilities for easy fitting to saturated experimental measurements. However, Hill equations are highly nonlinearly parameterized functions, and parameters in these functions cannot be measured easily. Additionally, because of their high nonlinearity, adaptive parameter estimation algorithms developed for linear parameterized differential equations cannot be applied. Therefore, parameter estimation in nonlinearly parameterized differential equation models for biological pathways is both challenging and rewarding. In this study, we propose a Bayesian parameter estimation algorithm to estimate parameters in nonlinear mathematical models for biological pathways using time series data. We used the Runge-Kutta method to transform differential equations to difference equations assuming a known structure of the differential equations. This transformation allowed us to generate predictions dependent on previous states and to apply a Bayesian approach, namely, the Markov chain Monte Carlo (MCMC) method. We applied this approach to the biological pathways involved in the left ventricle (LV) response to myocardial infarction (MI) and verified our algorithm by estimating two parameters in a Hill equation embedded in the nonlinear model. We further evaluated our estimation performance with different parameter settings and signal to noise ratios. Our results demonstrated the effectiveness of the algorithm for both linearly and nonlinearly parameterized dynamic systems. Our proposed Bayesian algorithm successfully estimated parameters in nonlinear mathematical models for biological pathways. This method can be further extended to high order systems and thus provides a useful tool to analyze biological dynamics and extract information using temporal data.
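
The general recipe described above (discretize the differential equation with a Runge-Kutta step, then sample parameters with MCMC) can be sketched on a toy problem. Everything specific here is an assumption for illustration: the one-variable self-repressing model, the fixed Hill coefficient n = 2 and decay rate d = 0.5, the Gaussian noise level, the flat prior on positive values, and all function names. It is not the authors' LV/MI model or their exact algorithm.

```python
import math
import random


def hill_rhs(x, vmax, K, n=2.0, d=0.5):
    """dx/dt for a toy pathway: Hill-type repression of production minus linear decay."""
    return vmax * K ** n / (K ** n + x ** n) - d * x


def rk4_step(f, x, h):
    """One classical fourth-order Runge-Kutta step for dx/dt = f(x)."""
    k1 = f(x)
    k2 = f(x + 0.5 * h * k1)
    k3 = f(x + 0.5 * h * k2)
    k4 = f(x + h * k3)
    return x + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0


def simulate(vmax, K, x0=0.0, h=0.1, steps=50):
    """Difference-equation form of the model: x[k+1] = RK4(x[k])."""
    xs = [x0]
    for _ in range(steps):
        xs.append(rk4_step(lambda x: hill_rhs(x, vmax, K), xs[-1], h))
    return xs


def log_likelihood(params, data, sigma=0.05):
    """Gaussian log-likelihood (up to a constant); -inf outside the positive quadrant."""
    vmax, K = params
    if vmax <= 0 or K <= 0:
        return -math.inf
    pred = simulate(vmax, K, x0=data[0], steps=len(data) - 1)
    sse = sum((p - y) ** 2 for p, y in zip(pred, data))
    return -0.5 * sse / sigma ** 2


def metropolis(data, start, n_iter=2000, step=0.05, seed=0):
    """Random-walk Metropolis sampler for (vmax, K) under a flat prior on positive values."""
    rng = random.Random(seed)
    current = tuple(start)
    current_ll = log_likelihood(current, data)
    if current_ll == -math.inf:
        raise ValueError("start must have positive vmax and K")
    chain = []
    for _ in range(n_iter):
        proposal = tuple(p + rng.gauss(0.0, step) for p in current)
        proposal_ll = log_likelihood(proposal, data)
        # Accept with probability min(1, posterior ratio)
        if rng.random() < math.exp(min(0.0, proposal_ll - current_ll)):
            current, current_ll = proposal, proposal_ll
        chain.append(current)
    return chain
```

A typical use would generate or load a time series, call `metropolis(data, start=(1.5, 1.5))`, discard an initial burn-in portion of the chain, and summarize the remaining samples (for example by their mean). How quickly the chain settles depends on the proposal step size and noise level, so these would need tuning for any real data set.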

  20. De novo assembly of pen shell (Atrina pectinata) transcriptome and screening of its genic microsatellites

    NASA Astrophysics Data System (ADS)

    Sun, Xiujun; Li, Dongming; Liu, Zhihong; Zhou, Liqing; Wu, Biao; Yang, Aiguo

    2017-10-01

    The pen shell (Atrina pectinata) is a large wedge-shaped bivalve belonging to the family Pinnidae. Due to its large and nutritious adductor muscle, it is a popular seafood with high commercial value in Asia-Pacific countries. However, limited genomic and transcriptomic data have hampered its genetic investigation. In this study, the transcriptome of A. pectinata was deeply sequenced using Illumina paired-end sequencing technology. After assembly, a total of 127263 unigenes were obtained. Functional annotation indicated that the highest percentage of unigenes (18.60%) was annotated in the GO database, followed by 18.44% in the PFAM database and 17.04% in the NR database. There were 270 biological pathways matched with those in the KEGG database. Furthermore, a total of 23452 potential simple sequence repeats (SSRs) were identified, of which the most abundant type was mono-nucleotide repeats (12902, 55.01%), followed by di-nucleotide (8132, 34.68%), tri-nucleotide (2010, 8.57%), tetra-nucleotide (401, 1.71%), and penta-nucleotide (7, 0.03%) repeats. Sixty SSRs were selected for validating and developing genic SSR markers, of which 23 showed polymorphism in a cultured population, with average observed and expected heterozygosities of 0.412 and 0.579, respectively. In this study, we established the first comprehensive transcript dataset of A. pectinata genes. Our results demonstrated that RNA-Seq is a fast and cost-effective method for genic SSR development in non-model species.
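
To make the SSR screening step concrete, here is a minimal regular-expression sketch that reports mono- to penta-nucleotide repeats in a single sequence. The minimum repeat counts are assumed values chosen for illustration (the abstract does not state the thresholds used), and the function name is hypothetical; it is not the tool the authors ran.

```python
import re

# Assumed minimum repeat counts per motif length; not taken from the abstract
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return a sorted list of (start, motif, repeat_count) for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # A motif of unit_len bases followed by at least (min_rep - 1) exact copies
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            unit = match.group(1)
            # Skip motifs that are themselves repeats of a shorter motif (e.g. "AA", "ATAT")
            if any(unit == unit[:k] * (unit_len // k)
                   for k in range(1, unit_len) if unit_len % k == 0):
                continue
            hits.append((match.start(), unit, len(match.group(0)) // unit_len))
    return sorted(hits)
```

Start positions are 0-based. Because the scan runs left to right, a repeat is reported with whichever rotation of the motif is encountered first, so the flanking bases can shift the reported start by a base or two.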
