Sample records for protein interaction database

  1. A Brief Review of RNA–Protein Interaction Database Resources

    PubMed Central

    Yi, Ying; Zhao, Yue; Huang, Yan; Wang, Dong

    2017-01-01

    RNA–Protein interactions play critical roles in various biological processes. By collecting and analyzing the RNA–Protein interactions and binding sites from experiments and predictions, RNA–Protein interaction databases have become an essential resource for the exploration of the transcriptional and post-transcriptional regulatory network. Here, we briefly review several widely used RNA–Protein interaction database resources developed in recent years to provide a guide of these databases. The content and major functions in databases are presented. The brief description of database helps users to quickly choose the database containing information they interested. In short, these RNA–Protein interaction database resources are continually updated, but the current state shows the efforts to identify and analyze the large amount of RNA–Protein interactions. PMID:29657278

  2. The HUPO PSI's molecular interaction format--a community standard for the representation of protein interaction data.

    PubMed

    Hermjakob, Henning; Montecchi-Palazzi, Luisa; Bader, Gary; Wojcik, Jérôme; Salwinski, Lukasz; Ceol, Arnaud; Moore, Susan; Orchard, Sandra; Sarkans, Ugis; von Mering, Christian; Roechert, Bernd; Poux, Sylvain; Jung, Eva; Mersch, Henning; Kersey, Paul; Lappe, Michael; Li, Yixue; Zeng, Rong; Rana, Debashis; Nikolski, Macha; Husi, Holger; Brun, Christine; Shanker, K; Grant, Seth G N; Sander, Chris; Bork, Peer; Zhu, Weimin; Pandey, Akhilesh; Brazma, Alvis; Jacq, Bernard; Vidal, Marc; Sherman, David; Legrain, Pierre; Cesareni, Gianni; Xenarios, Ioannis; Eisenberg, David; Steipe, Boris; Hogue, Chris; Apweiler, Rolf

    2004-02-01

    A major goal of proteomics is the complete description of the protein interaction network underlying cell physiology. A large number of small scale and, more recently, large-scale experiments have contributed to expanding our understanding of the nature of the interaction network. However, the necessary data integration across experiments is currently hampered by the fragmentation of publicly available protein interaction data, which exists in different formats in databases, on authors' websites or sometimes only in print publications. Here, we propose a community standard data model for the representation and exchange of protein interaction data. This data model has been jointly developed by members of the Proteomics Standards Initiative (PSI), a work group of the Human Proteome Organization (HUPO), and is supported by major protein interaction data providers, in particular the Biomolecular Interaction Network Database (BIND), Cellzome (Heidelberg, Germany), the Database of Interacting Proteins (DIP), Dana Farber Cancer Institute (Boston, MA, USA), the Human Protein Reference Database (HPRD), Hybrigenics (Paris, France), the European Bioinformatics Institute's (EMBL-EBI, Hinxton, UK) IntAct, the Molecular Interactions (MINT, Rome, Italy) database, the Protein-Protein Interaction Database (PPID, Edinburgh, UK) and the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING, EMBL, Heidelberg, Germany).

  3. SynechoNET: integrated protein-protein interaction database of a model cyanobacterium Synechocystis sp. PCC 6803.

    PubMed

    Kim, Woo-Yeon; Kang, Sungsoo; Kim, Byoung-Chul; Oh, Jeehyun; Cho, Seongwoong; Bhak, Jong; Choi, Jong-Soon

    2008-01-01

    Cyanobacteria are model organisms for studying photosynthesis, carbon and nitrogen assimilation, evolution of plant plastids, and adaptability to environmental stresses. Despite many studies on cyanobacteria, there is no web-based database of their regulatory and signaling protein-protein interaction networks to date. We report a database and website SynechoNET that provides predicted protein-protein interactions. SynechoNET shows cyanobacterial domain-domain interactions as well as their protein-level interactions using the model cyanobacterium, Synechocystis sp. PCC 6803. It predicts the protein-protein interactions using public interaction databases that contain mutually complementary and redundant data. Furthermore, SynechoNET provides information on transmembrane topology, signal peptide, and domain structure in order to support the analysis of regulatory membrane proteins. Such biological information can be queried and visualized in user-friendly web interfaces that include the interactive network viewer and search pages by keyword and functional category. SynechoNET is an integrated protein-protein interaction database designed to analyze regulatory membrane proteins in cyanobacteria. It provides a platform for biologists to extend the genomic data of cyanobacteria by predicting interaction partners, membrane association, and membrane topology of Synechocystis proteins. SynechoNET is freely available at http://synechocystis.org/ or directly at http://bioportal.kobic.kr/SynechoNET/.

  4. DOMMINO 2.0: integrating structurally resolved protein-, RNA-, and DNA-mediated macromolecular interactions

    PubMed Central

    Kuang, Xingyan; Dhroso, Andi; Han, Jing Ginger; Shyu, Chi-Ren; Korkin, Dmitry

    2016-01-01

    Macromolecular interactions are formed between proteins, DNA and RNA molecules. Being a principle building block in macromolecular assemblies and pathways, the interactions underlie most of cellular functions. Malfunctioning of macromolecular interactions is also linked to a number of diseases. Structural knowledge of the macromolecular interaction allows one to understand the interaction’s mechanism, determine its functional implications and characterize the effects of genetic variations, such as single nucleotide polymorphisms, on the interaction. Unfortunately, until now the interactions mediated by different types of macromolecules, e.g. protein–protein interactions or protein–DNA interactions, are collected into individual and unrelated structural databases. This presents a significant obstacle in the analysis of macromolecular interactions. For instance, the homogeneous structural interaction databases prevent scientists from studying structural interactions of different types but occurring in the same macromolecular complex. Here, we introduce DOMMINO 2.0, a structural Database Of Macro-Molecular INteractiOns. Compared to DOMMINO 1.0, a comprehensive database on protein-protein interactions, DOMMINO 2.0 includes the interactions between all three basic types of macromolecules extracted from PDB files. DOMMINO 2.0 is automatically updated on a weekly basis. It currently includes ∼1 040 000 interactions between two polypeptide subunits (e.g. domains, peptides, termini and interdomain linkers), ∼43 000 RNA-mediated interactions, and ∼12 000 DNA-mediated interactions. All protein structures in the database are annotated using SCOP and SUPERFAMILY family annotation. As a result, protein-mediated interactions involving protein domains, interdomain linkers, C- and N- termini, and peptides are identified. Our database provides an intuitive web interface, allowing one to investigate interactions at three different resolution levels: whole subunit network, binary interaction and interaction interface. Database URL: http://dommino.org PMID:26827237

  5. An affinity-structure database of helix-turn-helix: DNA complexes with a universal coordinate system.

    PubMed

    AlQuraishi, Mohammed; Tang, Shengdong; Xia, Xide

    2015-11-19

    Molecular interactions between proteins and DNA molecules underlie many cellular processes, including transcriptional regulation, chromosome replication, and nucleosome positioning. Computational analyses of protein-DNA interactions rely on experimental data characterizing known protein-DNA interactions structurally and biochemically. While many databases exist that contain either structural or biochemical data, few integrate these two data sources in a unified fashion. Such integration is becoming increasingly critical with the rapid growth of structural and biochemical data, and the emergence of algorithms that rely on the synthesis of multiple data types to derive computational models of molecular interactions. We have developed an integrated affinity-structure database in which the experimental and quantitative DNA binding affinities of helix-turn-helix proteins are mapped onto the crystal structures of the corresponding protein-DNA complexes. This database provides access to: (i) protein-DNA structures, (ii) quantitative summaries of protein-DNA binding affinities using position weight matrices, and (iii) raw experimental data of protein-DNA binding instances. Critically, this database establishes a correspondence between experimental structural data and quantitative binding affinity data at the single basepair level. Furthermore, we present a novel alignment algorithm that structurally aligns the protein-DNA complexes in the database and creates a unified residue-level coordinate system for comparing the physico-chemical environments at the interface between complexes. Using this unified coordinate system, we compute the statistics of atomic interactions at the protein-DNA interface of helix-turn-helix proteins. We provide an interactive website for visualization, querying, and analyzing this database, and a downloadable version to facilitate programmatic analysis. This database will facilitate the analysis of protein-DNA interactions and the development of programmatic computational methods that capitalize on integration of structural and biochemical datasets. The database can be accessed at http://ProteinDNA.hms.harvard.edu.

  6. Merging in-silico and in vitro salivary protein complex partners using the STRING database: A tutorial.

    PubMed

    Crosara, Karla Tonelli Bicalho; Moffa, Eduardo Buozi; Xiao, Yizhi; Siqueira, Walter Luiz

    2018-01-16

    Protein-protein interaction is a common physiological mechanism for protection and actions of proteins in an organism. The identification and characterization of protein-protein interactions in different organisms is necessary to better understand their physiology and to determine their efficacy. In a previous in vitro study using mass spectrometry, we identified 43 proteins that interact with histatin 1. Six previously documented interactors were confirmed and 37 novel partners were identified. In this tutorial, we aimed to demonstrate the usefulness of the STRING database for studying protein-protein interactions. We used an in-silico approach along with the STRING database (http://string-db.org/) and successfully performed a fast simulation of a novel constructed histatin 1 protein-protein network, including both the previously known and the predicted interactors, along with our newly identified interactors. Our study highlights the advantages and importance of applying bioinformatics tools to merge in-silico tactics with experimental in vitro findings for rapid advancement of our knowledge about protein-protein interactions. Our findings also indicate that bioinformatics tools such as the STRING protein network database can help predict potential interactions between proteins and thus serve as a guide for future steps in our exploration of the Human Interactome. Our study highlights the usefulness of the STRING protein database for studying protein-protein interactions. The STRING database can collect and integrate data about known and predicted protein-protein associations from many organisms, including both direct (physical) and indirect (functional) interactions, in an easy-to-use interface. Copyright © 2017 Elsevier B.V. All rights reserved.

  7. NPIDB: Nucleic acid-Protein Interaction DataBase.

    PubMed

    Kirsanov, Dmitry D; Zanegina, Olga N; Aksianov, Evgeniy A; Spirin, Sergei A; Karyagina, Anna S; Alexeevski, Andrei V

    2013-01-01

    The Nucleic acid-Protein Interaction DataBase (http://npidb.belozersky.msu.ru/) contains information derived from structures of DNA-protein and RNA-protein complexes extracted from the Protein Data Bank (3846 complexes in October 2012). It provides a web interface and a set of tools for extracting biologically meaningful characteristics of nucleoprotein complexes. The content of the database is updated weekly. The current version of the Nucleic acid-Protein Interaction DataBase is an upgrade of the version published in 2007. The improvements include a new web interface, new tools for calculation of intermolecular interactions, a classification of SCOP families that contains DNA-binding protein domains and data on conserved water molecules on the DNA-protein interface.

  8. SNAPPI-DB: a database and API of Structures, iNterfaces and Alignments for Protein–Protein Interactions

    PubMed Central

    Jefferson, Emily R.; Walsh, Thomas P.; Roberts, Timothy J.; Barton, Geoffrey J.

    2007-01-01

    SNAPPI-DB, a high performance database of Structures, iNterfaces and Alignments of Protein–Protein Interactions, and its associated Java Application Programming Interface (API) is described. SNAPPI-DB contains structural data, down to the level of atom co-ordinates, for each structure in the Protein Data Bank (PDB) together with associated data including SCOP, CATH, Pfam, SWISSPROT, InterPro, GO terms, Protein Quaternary Structures (PQS) and secondary structure information. Domain–domain interactions are stored for multiple domain definitions and are classified by their Superfamily/Family pair and interaction interface. Each set of classified domain–domain interactions has an associated multiple structure alignment for each partner. The API facilitates data access via PDB entries, domains and domain–domain interactions. Rapid development, fast database access and the ability to perform advanced queries without the requirement for complex SQL statements are provided via an object oriented database and the Java Data Objects (JDO) API. SNAPPI-DB contains many features which are not available in other databases of structural protein–protein interactions. It has been applied in three studies on the properties of protein–protein interactions and is currently being employed to train a protein–protein interaction predictor and a functional residue predictor. The database, API and manual are available for download at: . PMID:17202171

  9. An affinity-structure database of helix-turn-helix: DNA complexes with a universal coordinate system

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    AlQuraishi, Mohammed; Tang, Shengdong; Xia, Xide

    Molecular interactions between proteins and DNA molecules underlie many cellular processes, including transcriptional regulation, chromosome replication, and nucleosome positioning. Computational analyses of protein-DNA interactions rely on experimental data characterizing known protein-DNA interactions structurally and biochemically. While many databases exist that contain either structural or biochemical data, few integrate these two data sources in a unified fashion. Such integration is becoming increasingly critical with the rapid growth of structural and biochemical data, and the emergence of algorithms that rely on the synthesis of multiple data types to derive computational models of molecular interactions. We have developed an integrated affinity-structure database inmore » which the experimental and quantitative DNA binding affinities of helix-turn-helix proteins are mapped onto the crystal structures of the corresponding protein-DNA complexes. This database provides access to: (i) protein-DNA structures, (ii) quantitative summaries of protein-DNA binding affinities using position weight matrices, and (iii) raw experimental data of protein-DNA binding instances. Critically, this database establishes a correspondence between experimental structural data and quantitative binding affinity data at the single basepair level. Furthermore, we present a novel alignment algorithm that structurally aligns the protein-DNA complexes in the database and creates a unified residue-level coordinate system for comparing the physico-chemical environments at the interface between complexes. Using this unified coordinate system, we compute the statistics of atomic interactions at the protein-DNA interface of helix-turn-helix proteins. We provide an interactive website for visualization, querying, and analyzing this database, and a downloadable version to facilitate programmatic analysis. Lastly, this database will facilitate the analysis of protein-DNA interactions and the development of programmatic computational methods that capitalize on integration of structural and biochemical datasets. The database can be accessed at http://ProteinDNA.hms.harvard.edu.« less

  10. An affinity-structure database of helix-turn-helix: DNA complexes with a universal coordinate system

    DOE PAGES

    AlQuraishi, Mohammed; Tang, Shengdong; Xia, Xide

    2015-11-19

    Molecular interactions between proteins and DNA molecules underlie many cellular processes, including transcriptional regulation, chromosome replication, and nucleosome positioning. Computational analyses of protein-DNA interactions rely on experimental data characterizing known protein-DNA interactions structurally and biochemically. While many databases exist that contain either structural or biochemical data, few integrate these two data sources in a unified fashion. Such integration is becoming increasingly critical with the rapid growth of structural and biochemical data, and the emergence of algorithms that rely on the synthesis of multiple data types to derive computational models of molecular interactions. We have developed an integrated affinity-structure database inmore » which the experimental and quantitative DNA binding affinities of helix-turn-helix proteins are mapped onto the crystal structures of the corresponding protein-DNA complexes. This database provides access to: (i) protein-DNA structures, (ii) quantitative summaries of protein-DNA binding affinities using position weight matrices, and (iii) raw experimental data of protein-DNA binding instances. Critically, this database establishes a correspondence between experimental structural data and quantitative binding affinity data at the single basepair level. Furthermore, we present a novel alignment algorithm that structurally aligns the protein-DNA complexes in the database and creates a unified residue-level coordinate system for comparing the physico-chemical environments at the interface between complexes. Using this unified coordinate system, we compute the statistics of atomic interactions at the protein-DNA interface of helix-turn-helix proteins. We provide an interactive website for visualization, querying, and analyzing this database, and a downloadable version to facilitate programmatic analysis. Lastly, this database will facilitate the analysis of protein-DNA interactions and the development of programmatic computational methods that capitalize on integration of structural and biochemical datasets. The database can be accessed at http://ProteinDNA.hms.harvard.edu.« less

  11. Role for protein–protein interaction databases in human genetics

    PubMed Central

    Pattin, Kristine A; Moore, Jason H

    2010-01-01

    Proteomics and the study of protein–protein interactions are becoming increasingly important in our effort to understand human diseases on a system-wide level. Thanks to the development and curation of protein-interaction databases, up-to-date information on these interaction networks is accessible and publicly available to the scientific community. As our knowledge of protein–protein interactions increases, it is important to give thought to the different ways that these resources can impact biomedical research. In this article, we highlight the importance of protein–protein interactions in human genetics and genetic epidemiology. Since protein–protein interactions demonstrate one of the strongest functional relationships between genes, combining genomic data with available proteomic data may provide us with a more in-depth understanding of common human diseases. In this review, we will discuss some of the fundamentals of protein interactions, the databases that are publicly available and how information from these databases can be used to facilitate genome-wide genetic studies. PMID:19929610

  12. Kinase Pathway Database: An Integrated Protein-Kinase and NLP-Based Protein-Interaction Resource

    PubMed Central

    Koike, Asako; Kobayashi, Yoshiyuki; Takagi, Toshihisa

    2003-01-01

    Protein kinases play a crucial role in the regulation of cellular functions. Various kinds of information about these molecules are important for understanding signaling pathways and organism characteristics. We have developed the Kinase Pathway Database, an integrated database involving major completely sequenced eukaryotes. It contains the classification of protein kinases and their functional conservation, ortholog tables among species, protein–protein, protein–gene, and protein–compound interaction data, domain information, and structural information. It also provides an automatic pathway graphic image interface. The protein, gene, and compound interactions are automatically extracted from abstracts for all genes and proteins by natural-language processing (NLP).The method of automatic extraction uses phrase patterns and the GENA protein, gene, and compound name dictionary, which was developed by our group. With this database, pathways are easily compared among species using data with more than 47,000 protein interactions and protein kinase ortholog tables. The database is available for querying and browsing at http://kinasedb.ontology.ims.u-tokyo.ac.jp/. PMID:12799355

  13. HitPredict version 4: comprehensive reliability scoring of physical protein-protein interactions from more than 100 species.

    PubMed

    López, Yosvany; Nakai, Kenta; Patil, Ashwini

    2015-01-01

    HitPredict is a consolidated resource of experimentally identified, physical protein-protein interactions with confidence scores to indicate their reliability. The study of genes and their inter-relationships using methods such as network and pathway analysis requires high quality protein-protein interaction information. Extracting reliable interactions from most of the existing databases is challenging because they either contain only a subset of the available interactions, or a mixture of physical, genetic and predicted interactions. Automated integration of interactions is further complicated by varying levels of accuracy of database content and lack of adherence to standard formats. To address these issues, the latest version of HitPredict provides a manually curated dataset of 398 696 physical associations between 70 808 proteins from 105 species. Manual confirmation was used to resolve all issues encountered during data integration. For improved reliability assessment, this version combines a new score derived from the experimental information of the interactions with the original score based on the features of the interacting proteins. The combined interaction score performs better than either of the individual scores in HitPredict as well as the reliability score of another similar database. HitPredict provides a web interface to search proteins and visualize their interactions, and the data can be downloaded for offline analysis. Data usability has been enhanced by mapping protein identifiers across multiple reference databases. Thus, the latest version of HitPredict provides a significantly larger, more reliable and usable dataset of protein-protein interactions from several species for the study of gene groups. Database URL: http://hintdb.hgc.jp/htp. © The Author(s) 2015. Published by Oxford University Press.

  14. iRefWeb: interactive analysis of consolidated protein interaction data and their supporting evidence

    PubMed Central

    Turner, Brian; Razick, Sabry; Turinsky, Andrei L.; Vlasblom, James; Crowdy, Edgard K.; Cho, Emerson; Morrison, Kyle; Wodak, Shoshana J.

    2010-01-01

    We present iRefWeb, a web interface to protein interaction data consolidated from 10 public databases: BIND, BioGRID, CORUM, DIP, IntAct, HPRD, MINT, MPact, MPPI and OPHID. iRefWeb enables users to examine aggregated interactions for a protein of interest, and presents various statistical summaries of the data across databases, such as the number of organism-specific interactions, proteins and cited publications. Through links to source databases and supporting evidence, researchers may gauge the reliability of an interaction using simple criteria, such as the detection methods, the scale of the study (high- or low-throughput) or the number of cited publications. Furthermore, iRefWeb compares the information extracted from the same publication by different databases, and offers means to follow-up possible inconsistencies. We provide an overview of the consolidated protein–protein interaction landscape and show how it can be automatically cropped to aid the generation of meaningful organism-specific interactomes. iRefWeb can be accessed at: http://wodaklab.org/iRefWeb. Database URL: http://wodaklab.org/iRefWeb/ PMID:20940177

  15. A Novel Biclustering Approach to Association Rule Mining for Predicting HIV-1–Human Protein Interactions

    PubMed Central

    Mukhopadhyay, Anirban; Maulik, Ujjwal; Bandyopadhyay, Sanghamitra

    2012-01-01

    Identification of potential viral-host protein interactions is a vital and useful approach towards development of new drugs targeting those interactions. In recent days, computational tools are being utilized for predicting viral-host interactions. Recently a database containing records of experimentally validated interactions between a set of HIV-1 proteins and a set of human proteins has been published. The problem of predicting new interactions based on this database is usually posed as a classification problem. However, posing the problem as a classification one suffers from the lack of biologically validated negative interactions. Therefore it will be beneficial to use the existing database for predicting new viral-host interactions without the need of negative samples. Motivated by this, in this article, the HIV-1–human protein interaction database has been analyzed using association rule mining. The main objective is to identify a set of association rules both among the HIV-1 proteins and among the human proteins, and use these rules for predicting new interactions. In this regard, a novel association rule mining technique based on biclustering has been proposed for discovering frequent closed itemsets followed by the association rules from the adjacency matrix of the HIV-1–human interaction network. Novel HIV-1–human interactions have been predicted based on the discovered association rules and tested for biological significance. For validation of the predicted new interactions, gene ontology-based and pathway-based studies have been performed. These studies show that the human proteins which are predicted to interact with a particular viral protein share many common biological activities. Moreover, literature survey has been used for validation purpose to identify some predicted interactions that are already validated experimentally but not present in the database. Comparison with other prediction methods is also discussed. PMID:22539940

  16. RAIN: RNA–protein Association and Interaction Networks

    PubMed Central

    Junge, Alexander; Refsgaard, Jan C.; Garde, Christian; Pan, Xiaoyong; Santos, Alberto; Alkan, Ferhat; Anthon, Christian; von Mering, Christian; Workman, Christopher T.; Jensen, Lars Juhl; Gorodkin, Jan

    2017-01-01

    Protein association networks can be inferred from a range of resources including experimental data, literature mining and computational predictions. These types of evidence are emerging for non-coding RNAs (ncRNAs) as well. However, integration of ncRNAs into protein association networks is challenging due to data heterogeneity. Here, we present a database of ncRNA–RNA and ncRNA–protein interactions and its integration with the STRING database of protein–protein interactions. These ncRNA associations cover four organisms and have been established from curated examples, experimental data, interaction predictions and automatic literature mining. RAIN uses an integrative scoring scheme to assign a confidence score to each interaction. We demonstrate that RAIN outperforms the underlying microRNA-target predictions in inferring ncRNA interactions. RAIN can be operated through an easily accessible web interface and all interaction data can be downloaded. Database URL: http://rth.dk/resources/rain PMID:28077569

  17. The BioGRID interaction database: 2017 update

    PubMed Central

    Chatr-aryamontri, Andrew; Oughtred, Rose; Boucher, Lorrie; Rust, Jennifer; Chang, Christie; Kolas, Nadine K.; O'Donnell, Lara; Oster, Sara; Theesfeld, Chandra; Sellam, Adnane; Stark, Chris; Breitkreutz, Bobby-Joe; Dolinski, Kara; Tyers, Mike

    2017-01-01

    The Biological General Repository for Interaction Datasets (BioGRID: https://thebiogrid.org) is an open access database dedicated to the annotation and archival of protein, genetic and chemical interactions for all major model organism species and humans. As of September 2016 (build 3.4.140), the BioGRID contains 1 072 173 genetic and protein interactions, and 38 559 post-translational modifications, as manually annotated from 48 114 publications. This dataset represents interaction records for 66 model organisms and represents a 30% increase compared to the previous 2015 BioGRID update. BioGRID curates the biomedical literature for major model organism species, including humans, with a recent emphasis on central biological processes and specific human diseases. To facilitate network-based approaches to drug discovery, BioGRID now incorporates 27 501 chemical–protein interactions for human drug targets, as drawn from the DrugBank database. A new dynamic interaction network viewer allows the easy navigation and filtering of all genetic and protein interaction data, as well as for bioactive compounds and their established targets. BioGRID data are directly downloadable without restriction in a variety of standardized formats and are freely distributed through partner model organism databases and meta-databases. PMID:27980099

  18. Integrating In Silico Resources to Map a Signaling Network

    PubMed Central

    Liu, Hanqing; Beck, Tim N.; Golemis, Erica A.; Serebriiskii, Ilya G.

    2013-01-01

    The abundance of publicly available life science databases offer a wealth of information that can support interpretation of experimentally derived data and greatly enhance hypothesis generation. Protein interaction and functional networks are not simply new renditions of existing data: they provide the opportunity to gain insights into the specific physical and functional role a protein plays as part of the biological system. In this chapter, we describe different in silico tools that can quickly and conveniently retrieve data from existing data repositories and discuss how the available tools are best utilized for different purposes. While emphasizing protein-protein interaction databases (e.g., BioGrid and IntAct), we also introduce metasearch platforms such as STRING and GeneMANIA, pathway databases (e.g., BioCarta and Pathway Commons), text mining approaches (e.g., PubMed and Chilibot), and resources for drug-protein interactions, genetic information for model organisms and gene expression information based on microarray data mining. Furthermore, we provide a simple step-by-step protocol to building customized protein-protein interaction networks in Cytoscape, a powerful network assembly and visualization program, integrating data retrieved from these various databases. As we illustrate, generation of composite interaction networks enables investigators to extract significantly more information about a given biological system than utilization of a single database or sole reliance on primary literature. PMID:24233784

  19. Integrated web visualizations for protein-protein interaction databases.

    PubMed

    Jeanquartier, Fleur; Jean-Quartier, Claire; Holzinger, Andreas

    2015-06-16

    Understanding living systems is crucial for curing diseases. To achieve this task we have to understand biological networks based on protein-protein interactions. Bioinformatics has come up with a great amount of databases and tools that support analysts in exploring protein-protein interactions on an integrated level for knowledge discovery. They provide predictions and correlations, indicate possibilities for future experimental research and fill the gaps to complete the picture of biochemical processes. There are numerous and huge databases of protein-protein interactions used to gain insights into answering some of the many questions of systems biology. Many computational resources integrate interaction data with additional information on molecular background. However, the vast number of diverse Bioinformatics resources poses an obstacle to the goal of understanding. We present a survey of databases that enable the visual analysis of protein networks. We selected M=10 out of N=53 resources supporting visualization, and we tested against the following set of criteria: interoperability, data integration, quantity of possible interactions, data visualization quality and data coverage. The study reveals differences in usability, visualization features and quality as well as the quantity of interactions. StringDB is the recommended first choice. CPDB presents a comprehensive dataset and IntAct lets the user change the network layout. A comprehensive comparison table is available via web. The supplementary table can be accessed on http://tinyurl.com/PPI-DB-Comparison-2015. Only some web resources featuring graph visualization can be successfully applied to interactive visual analysis of protein-protein interaction. Study results underline the necessity for further enhancements of visualization integration in biochemical analysis tools. Identified challenges are data comprehensiveness, confidence, interactive feature and visualization maturing.

  20. An ontology-based search engine for protein-protein interactions

    PubMed Central

    2010-01-01

    Background Keyword matching or ID matching is the most common searching method in a large database of protein-protein interactions. They are purely syntactic methods, and retrieve the records in the database that contain a keyword or ID specified in a query. Such syntactic search methods often retrieve too few search results or no results despite many potential matches present in the database. Results We have developed a new method for representing protein-protein interactions and the Gene Ontology (GO) using modified Gödel numbers. This representation is hidden from users but enables a search engine using the representation to efficiently search protein-protein interactions in a biologically meaningful way. Given a query protein with optional search conditions expressed in one or more GO terms, the search engine finds all the interaction partners of the query protein by unique prime factorization of the modified Gödel numbers representing the query protein and the search conditions. Conclusion Representing the biological relations of proteins and their GO annotations by modified Gödel numbers makes a search engine efficiently find all protein-protein interactions by prime factorization of the numbers. Keyword matching or ID matching search methods often miss the interactions involving a protein that has no explicit annotations matching the search condition, but our search engine retrieves such interactions as well if they satisfy the search condition with a more specific term in the ontology. PMID:20122195

  1. An ontology-based search engine for protein-protein interactions.

    PubMed

    Park, Byungkyu; Han, Kyungsook

    2010-01-18

    Keyword matching or ID matching is the most common searching method in a large database of protein-protein interactions. They are purely syntactic methods, and retrieve the records in the database that contain a keyword or ID specified in a query. Such syntactic search methods often retrieve too few search results or no results despite many potential matches present in the database. We have developed a new method for representing protein-protein interactions and the Gene Ontology (GO) using modified Gödel numbers. This representation is hidden from users but enables a search engine using the representation to efficiently search protein-protein interactions in a biologically meaningful way. Given a query protein with optional search conditions expressed in one or more GO terms, the search engine finds all the interaction partners of the query protein by unique prime factorization of the modified Gödel numbers representing the query protein and the search conditions. Representing the biological relations of proteins and their GO annotations by modified Gödel numbers makes a search engine efficiently find all protein-protein interactions by prime factorization of the numbers. Keyword matching or ID matching search methods often miss the interactions involving a protein that has no explicit annotations matching the search condition, but our search engine retrieves such interactions as well if they satisfy the search condition with a more specific term in the ontology.

  2. A combined database related and de novo MS-identification of yeast mannose-1-phosphate guanyltransferase PSA1 interaction partners at different phases of batch cultivation

    NASA Astrophysics Data System (ADS)

    Parviainen, Ville; Joenväärä, Sakari; Peltoniemi, Hannu; Mattila, Pirkko; Renkonen, Risto

    2009-04-01

    Mass spectrometry-based proteomic research has become one of the main methods in protein-protein interaction research. Several high throughput studies have established an interaction landscape of exponentially growing Baker's yeast culture. However, many of the protein-protein interactions are likely to change in different environmental conditions. In order to examine the dynamic nature of the protein interactions we isolated the protein complexes of mannose-1-phosphate guanyltransferase PSA1 from Saccharomyces cerevisiae at four different time points during batch cultivation. We used the tandem affinity purification (TAP)-method to purify the complexes and subjected the tryptic peptides to LC-MS/MS. The resulting peak lists were analyzed with two different methods: the database related protein identification program X!Tandem and the de novo sequencing program Lutefisk. We observed significant changes in the interactome of PSA1 during the batch cultivation and identified altogether 74 proteins interacting with PSA1 of which only six were found to interact during all time points. All the other proteins showed a more dynamic nature of binding activity. In this study we also demonstrate the benefit of using both database related and de novo methods in the protein interaction research to enhance both the quality and the quantity of observations.

  3. hPDI: a database of experimental human protein-DNA interactions.

    PubMed

    Xie, Zhi; Hu, Shaohui; Blackshaw, Seth; Zhu, Heng; Qian, Jiang

    2010-01-15

    The human protein DNA Interactome (hPDI) database holds experimental protein-DNA interaction data for humans identified by protein microarray assays. The unique characteristics of hPDI are that it contains consensus DNA-binding sequences not only for nearly 500 human transcription factors but also for >500 unconventional DNA-binding proteins, which are completely uncharacterized previously. Users can browse, search and download a subset or the entire data via a web interface. This database is freely accessible for any academic purposes. http://bioinfo.wilmer.jhu.edu/PDI/.

  4. sc-PDB-Frag: a database of protein-ligand interaction patterns for Bioisosteric replacements.

    PubMed

    Desaphy, Jérémy; Rognan, Didier

    2014-07-28

    Bioisosteric replacement plays an important role in medicinal chemistry by keeping the biological activity of a molecule while changing either its core scaffold or substituents, thereby facilitating lead optimization and patenting. Bioisosteres are classically chosen in order to keep the main pharmacophoric moieties of the substructure to replace. However, notably when changing a scaffold, no attention is usually paid as whether all atoms of the reference scaffold are equally important for binding to the desired target. We herewith propose a novel database for bioisosteric replacement (scPDBFrag), capitalizing on our recently published structure-based approach to scaffold hopping, focusing on interaction pattern graphs. Protein-bound ligands are first fragmented and the interaction of the corresponding fragments with their protein environment computed-on-the-fly. Using an in-house developed graph alignment tool, interaction patterns graphs can be compared, aligned, and sorted by decreasing similarity to any reference. In the herein presented sc-PDB-Frag database ( http://bioinfo-pharma.u-strasbg.fr/scPDBFrag ), fragments, interaction patterns, alignments, and pairwise similarity scores have been extracted from the sc-PDB database of 8077 druggable protein-ligand complexes and further stored in a relational database. We herewith present the database, its Web implementation, and procedures for identifying true bioisosteric replacements based on conserved interaction patterns.

  5. The Protein Identifier Cross-Referencing (PICR) service: reconciling protein identifiers across multiple source databases.

    PubMed

    Côté, Richard G; Jones, Philip; Martens, Lennart; Kerrien, Samuel; Reisinger, Florian; Lin, Quan; Leinonen, Rasko; Apweiler, Rolf; Hermjakob, Henning

    2007-10-18

    Each major protein database uses its own conventions when assigning protein identifiers. Resolving the various, potentially unstable, identifiers that refer to identical proteins is a major challenge. This is a common problem when attempting to unify datasets that have been annotated with proteins from multiple data sources or querying data providers with one flavour of protein identifiers when the source database uses another. Partial solutions for protein identifier mapping exist but they are limited to specific species or techniques and to a very small number of databases. As a result, we have not found a solution that is generic enough and broad enough in mapping scope to suit our needs. We have created the Protein Identifier Cross-Reference (PICR) service, a web application that provides interactive and programmatic (SOAP and REST) access to a mapping algorithm that uses the UniProt Archive (UniParc) as a data warehouse to offer protein cross-references based on 100% sequence identity to proteins from over 70 distinct source databases loaded into UniParc. Mappings can be limited by source database, taxonomic ID and activity status in the source database. Users can copy/paste or upload files containing protein identifiers or sequences in FASTA format to obtain mappings using the interactive interface. Search results can be viewed in simple or detailed HTML tables or downloaded as comma-separated values (CSV) or Microsoft Excel (XLS) files suitable for use in a local database or a spreadsheet. Alternatively, a SOAP interface is available to integrate PICR functionality in other applications, as is a lightweight REST interface. We offer a publicly available service that can interactively map protein identifiers and protein sequences to the majority of commonly used protein databases. Programmatic access is available through a standards-compliant SOAP interface or a lightweight REST interface. The PICR interface, documentation and code examples are available at http://www.ebi.ac.uk/Tools/picr.

  6. The Protein Identifier Cross-Referencing (PICR) service: reconciling protein identifiers across multiple source databases

    PubMed Central

    Côté, Richard G; Jones, Philip; Martens, Lennart; Kerrien, Samuel; Reisinger, Florian; Lin, Quan; Leinonen, Rasko; Apweiler, Rolf; Hermjakob, Henning

    2007-01-01

    Background Each major protein database uses its own conventions when assigning protein identifiers. Resolving the various, potentially unstable, identifiers that refer to identical proteins is a major challenge. This is a common problem when attempting to unify datasets that have been annotated with proteins from multiple data sources or querying data providers with one flavour of protein identifiers when the source database uses another. Partial solutions for protein identifier mapping exist but they are limited to specific species or techniques and to a very small number of databases. As a result, we have not found a solution that is generic enough and broad enough in mapping scope to suit our needs. Results We have created the Protein Identifier Cross-Reference (PICR) service, a web application that provides interactive and programmatic (SOAP and REST) access to a mapping algorithm that uses the UniProt Archive (UniParc) as a data warehouse to offer protein cross-references based on 100% sequence identity to proteins from over 70 distinct source databases loaded into UniParc. Mappings can be limited by source database, taxonomic ID and activity status in the source database. Users can copy/paste or upload files containing protein identifiers or sequences in FASTA format to obtain mappings using the interactive interface. Search results can be viewed in simple or detailed HTML tables or downloaded as comma-separated values (CSV) or Microsoft Excel (XLS) files suitable for use in a local database or a spreadsheet. Alternatively, a SOAP interface is available to integrate PICR functionality in other applications, as is a lightweight REST interface. Conclusion We offer a publicly available service that can interactively map protein identifiers and protein sequences to the majority of commonly used protein databases. Programmatic access is available through a standards-compliant SOAP interface or a lightweight REST interface. The PICR interface, documentation and code examples are available at . PMID:17945017

  7. SpirPro: A Spirulina proteome database and web-based tools for the analysis of protein-protein interactions at the metabolic level in Spirulina (Arthrospira) platensis C1.

    PubMed

    Senachak, Jittisak; Cheevadhanarak, Supapon; Hongsthong, Apiradee

    2015-07-29

    Spirulina (Arthrospira) platensis is the only cyanobacterium that in addition to being studied at the molecular level and subjected to gene manipulation, can also be mass cultivated in outdoor ponds for commercial use as a food supplement. Thus, encountering environmental changes, including temperature stresses, is common during the mass production of Spirulina. The use of cyanobacteria as an experimental platform, especially for photosynthetic gene manipulation in plants and bacteria, is becoming increasingly important. Understanding the mechanisms and protein-protein interaction networks that underlie low- and high-temperature responses is relevant to Spirulina mass production. To accomplish this goal, high-throughput techniques such as OMICs analyses are used. Thus, large datasets must be collected, managed and subjected to information extraction. Therefore, databases including (i) proteomic analysis and protein-protein interaction (PPI) data and (ii) domain/motif visualization tools are required for potential use in temperature response models for plant chloroplasts and photosynthetic bacteria. A web-based repository was developed including an embedded database, SpirPro, and tools for network visualization. Proteome data were analyzed integrated with protein-protein interactions and/or metabolic pathways from KEGG. The repository provides various information, ranging from raw data (2D-gel images) to associated results, such as data from interaction and/or pathway analyses. This integration allows in silico analyses of protein-protein interactions affected at the metabolic level and, particularly, analyses of interactions between and within the affected metabolic pathways under temperature stresses for comparative proteomic analysis. The developed tool, which is coded in HTML with CSS/JavaScript and depicted in Scalable Vector Graphics (SVG), is designed for interactive analysis and exploration of the constructed network. SpirPro is publicly available on the web at http://spirpro.sbi.kmutt.ac.th . SpirPro is an analysis platform containing an integrated proteome and PPI database that provides the most comprehensive data on this cyanobacterium at the systematic level. As an integrated database, SpirPro can be applied in various analyses, such as temperature stress response networking analysis in cyanobacterial models and interacting domain-domain analysis between proteins of interest.

  8. DB-PABP: a database of polyanion-binding proteins

    PubMed Central

    Fang, Jianwen; Dong, Yinghua; Salamat-Miller, Nazila; Russell Middaugh, C.

    2008-01-01

    The interactions between polyanions (PAs) and polyanion-binding proteins (PABPs) have been found to play significant roles in many essential biological processes including intracellular organization, transport and protein folding. Furthermore, many neurodegenerative disease-related proteins are PABPs. Thus, a better understanding of PA/PABP interactions may not only enhance our understandings of biological systems but also provide new clues to these deadly diseases. The literature in this field is widely scattered, suggesting the need for a comprehensive and searchable database of PABPs. The DB-PABP is a comprehensive, manually curated and searchable database of experimentally characterized PABPs. It is freely available and can be accessed online at http://pabp.bcf.ku.edu/DB_PABP/. The DB-PABP was implemented as a MySQL relational database. An interactive web interface was created using Java Server Pages (JSP). The search page of the database is organized into a main search form and a section for utilities. The main search form enables custom searches via four menus: protein names, polyanion names, the source species of the proteins and the methods used to discover the interactions. Available utilities include a commonality matrix, a function of listing PABPs by the number of interacting polyanions and a string search for author surnames. The DB-PABP is maintained at the University of Kansas. We encourage users to provide feedback and submit new data and references. PMID:17916573

  9. DB-PABP: a database of polyanion-binding proteins.

    PubMed

    Fang, Jianwen; Dong, Yinghua; Salamat-Miller, Nazila; Middaugh, C Russell

    2008-01-01

    The interactions between polyanions (PAs) and polyanion-binding proteins (PABPs) have been found to play significant roles in many essential biological processes including intracellular organization, transport and protein folding. Furthermore, many neurodegenerative disease-related proteins are PABPs. Thus, a better understanding of PA/PABP interactions may not only enhance our understandings of biological systems but also provide new clues to these deadly diseases. The literature in this field is widely scattered, suggesting the need for a comprehensive and searchable database of PABPs. The DB-PABP is a comprehensive, manually curated and searchable database of experimentally characterized PABPs. It is freely available and can be accessed online at http://pabp.bcf.ku.edu/DB_PABP/. The DB-PABP was implemented as a MySQL relational database. An interactive web interface was created using Java Server Pages (JSP). The search page of the database is organized into a main search form and a section for utilities. The main search form enables custom searches via four menus: protein names, polyanion names, the source species of the proteins and the methods used to discover the interactions. Available utilities include a commonality matrix, a function of listing PABPs by the number of interacting polyanions and a string search for author surnames. The DB-PABP is maintained at the University of Kansas. We encourage users to provide feedback and submit new data and references.

  10. PARPs database: A LIMS systems for protein-protein interaction data mining or laboratory information management system

    PubMed Central

    Droit, Arnaud; Hunter, Joanna M; Rouleau, Michèle; Ethier, Chantal; Picard-Cloutier, Aude; Bourgais, David; Poirier, Guy G

    2007-01-01

    Background In the "post-genome" era, mass spectrometry (MS) has become an important method for the analysis of proteins and the rapid advancement of this technique, in combination with other proteomics methods, results in an increasing amount of proteome data. This data must be archived and analysed using specialized bioinformatics tools. Description We herein describe "PARPs database," a data analysis and management pipeline for liquid chromatography tandem mass spectrometry (LC-MS/MS) proteomics. PARPs database is a web-based tool whose features include experiment annotation, protein database searching, protein sequence management, as well as data-mining of the peptides and proteins identified. Conclusion Using this pipeline, we have successfully identified several interactions of biological significance between PARP-1 and other proteins, namely RFC-1, 2, 3, 4 and 5. PMID:18093328

  11. HPIminer: A text mining system for building and visualizing human protein interaction networks and pathways.

    PubMed

    Subramani, Suresh; Kalpana, Raja; Monickaraj, Pankaj Moses; Natarajan, Jeyakumar

    2015-04-01

    The knowledge on protein-protein interactions (PPI) and their related pathways are equally important to understand the biological functions of the living cell. Such information on human proteins is highly desirable to understand the mechanism of several diseases such as cancer, diabetes, and Alzheimer's disease. Because much of that information is buried in biomedical literature, an automated text mining system for visualizing human PPI and pathways is highly desirable. In this paper, we present HPIminer, a text mining system for visualizing human protein interactions and pathways from biomedical literature. HPIminer extracts human PPI information and PPI pairs from biomedical literature, and visualize their associated interactions, networks and pathways using two curated databases HPRD and KEGG. To our knowledge, HPIminer is the first system to build interaction networks from literature as well as curated databases. Further, the new interactions mined only from literature and not reported earlier in databases are highlighted as new. A comparative study with other similar tools shows that the resultant network is more informative and provides additional information on interacting proteins and their associated networks. Copyright © 2015 Elsevier Inc. All rights reserved.

  12. Protein-protein interaction predictions using text mining methods.

    PubMed

    Papanikolaou, Nikolas; Pavlopoulos, Georgios A; Theodosiou, Theodosios; Iliopoulos, Ioannis

    2015-03-01

    It is beyond any doubt that proteins and their interactions play an essential role in most complex biological processes. The understanding of their function individually, but also in the form of protein complexes is of a great importance. Nowadays, despite the plethora of various high-throughput experimental approaches for detecting protein-protein interactions, many computational methods aiming to predict new interactions have appeared and gained interest. In this review, we focus on text-mining based computational methodologies, aiming to extract information for proteins and their interactions from public repositories such as literature and various biological databases. We discuss their strengths, their weaknesses and how they complement existing experimental techniques by simultaneously commenting on the biological databases which hold such information and the benchmark datasets that can be used for evaluating new tools. Copyright © 2014 Elsevier Inc. All rights reserved.

  13. PICKLE 2.0: A human protein-protein interaction meta-database employing data integration via genetic information ontology

    PubMed Central

    Gioutlakis, Aris; Klapa, Maria I.

    2017-01-01

    It has been acknowledged that source databases recording experimentally supported human protein-protein interactions (PPIs) exhibit limited overlap. Thus, the reconstruction of a comprehensive PPI network requires appropriate integration of multiple heterogeneous primary datasets, presenting the PPIs at various genetic reference levels. Existing PPI meta-databases perform integration via normalization; namely, PPIs are merged after converted to a certain target level. Hence, the node set of the integrated network depends each time on the number and type of the combined datasets. Moreover, the irreversible a priori normalization process hinders the identification of normalization artifacts in the integrated network, which originate from the nonlinearity characterizing the genetic information flow. PICKLE (Protein InteraCtion KnowLedgebasE) 2.0 implements a new architecture for this recently introduced human PPI meta-database. Its main novel feature over the existing meta-databases is its approach to primary PPI dataset integration via genetic information ontology. Building upon the PICKLE principles of using the reviewed human complete proteome (RHCP) of UniProtKB/Swiss-Prot as the reference protein interactor set, and filtering out protein interactions with low probability of being direct based on the available evidence, PICKLE 2.0 first assembles the RHCP genetic information ontology network by connecting the corresponding genes, nucleotide sequences (mRNAs) and proteins (UniProt entries) and then integrates PPI datasets by superimposing them on the ontology network without any a priori transformations. Importantly, this process allows the resulting heterogeneous integrated network to be reversibly normalized to any level of genetic reference without loss of the original information, the latter being used for identification of normalization biases, and enables the appraisal of potential false positive interactions through PPI source database cross-checking. The PICKLE web-based interface (www.pickle.gr) allows for the simultaneous query of multiple entities and provides integrated human PPI networks at either the protein (UniProt) or the gene level, at three PPI filtering modes. PMID:29023571

  14. Interactome of the hepatitis C virus: Literature mining with ANDSystem.

    PubMed

    Saik, Olga V; Ivanisenko, Timofey V; Demenkov, Pavel S; Ivanisenko, Vladimir A

    2016-06-15

    A study of the molecular genetics mechanisms of host-pathogen interactions is of paramount importance in developing drugs against viral diseases. Currently, the literature contains a huge amount of information that describes interactions between HCV and human proteins. In addition, there are many factual databases that contain experimentally verified data on HCV-host interactions. The sources of such data are the original data along with the data manually extracted from the literature. However, the manual analysis of scientific publications is time consuming and, because of this, databases created with such an approach often do not have complete information. One of the most promising methods to provide actualisation and completeness of information is text mining. Here, with the use of a previously developed method by the authors using ANDSystem, an automated extraction of information on the interactions between HCV and human proteins was conducted. As a data source for the text mining approach, PubMed abstracts and full text articles were used. Additionally, external factual databases were analyzed. On the basis of this analysis, a special version of ANDSystem, extended with the HCV interactome, was created. The HCV interactome contains information about the interactions between 969 human and 11 HCV proteins. Among the 969 proteins, 153 'new' proteins were found not previously referred to in any external databases of protein-protein interactions for HCV-host interactions. Thus, the extended ANDSystem possesses a more comprehensive detailing of HCV-host interactions versus other existing databases. It was interesting that HCV proteins more preferably interact with human proteins that were already involved in a large number of protein-protein interactions as well as those associated with many diseases. Among human proteins of the HCV interactome, there were a large number of proteins regulated by microRNAs. It turned out that the results obtained for protein-protein interactions and microRNA-regulation did not depend on how well the proteins were studied, while protein-disease interactions appeared to be dependent on the level of study. In particular, the mean number of diseases linked to well-studied proteins (proteins were considered well-studied if they were mentioned in 50 or more PubMed publications) from the HCV interactome was 20.8, significantly exceeding the mean number of associations with diseases (10.1) for the total set of well-studied human proteins present in ANDSystem. For proteins not highly poorly-studied investigated, proteins from the HCV interactome (each protein was referred to in less than 50 publications) distribution of the number of diseases associated with them had no statistically significant differences from the distribution of the number of diseases associated with poorly-studied proteins based on the total set of human proteins stored in ANDSystem. With this, the average number of associations with diseases for the HCV interactome and the total set of human proteins were 0.3 and 0.2, respectively. Thus, ANDSystem, extended with the HCV interactome, can be helpful in a wide range of issues related to analyzing HCV-host interactions in the search for anti-HCV drug targets. The demo version of the extended ANDSystem covered here containing only interactions between human proteins, genes, metabolites, diseases, miRNAs and molecular-genetic pathways, as well as interactions between human proteins/genes and HCV proteins, is freely available at the following web address: http://www-bionet.sscc.ru/psd/andhcv/. Copyright © 2015 The Authors. Published by Elsevier B.V. All rights reserved.

  15. Protein-protein interaction networks: unraveling the wiring of molecular machines within the cell.

    PubMed

    De Las Rivas, Javier; Fontanillo, Celia

    2012-11-01

    Mapping and understanding of the protein interaction networks with their key modules and hubs can provide deeper insights into the molecular machinery underlying complex phenotypes. In this article, we present the basic characteristics and definitions of protein networks, starting with a distinction of the different types of associations between proteins. We focus the review on protein-protein interactions (PPIs), a subset of associations defined as physical contacts between proteins that occur by selective molecular docking in a particular biological context. We present such definition as opposed to other types of protein associations derived from regulatory, genetic, structural or functional relations. To determine PPIs, a variety of binary and co-complex methods exist; however, not all the technologies provide the same information and data quality. A way of increasing confidence in a given protein interaction is to integrate orthogonal experimental evidences. The use of several complementary methods testing each single interaction assesses the accuracy of PPI data and tries to minimize the occurrence of false interactions. Following this approach there have been important efforts to unify primary databases of experimentally proven PPIs into integrated databases. These meta-databases provide a measure of the confidence of interactions based on the number of experimental proofs that report them. As a conclusion, we can state that integrated information allows the building of more reliable interaction networks. Identification of communities, cliques, modules and hubs by analysing the topological parameters and graph properties of the protein networks allows the discovery of central/critical nodes, which are candidates to regulate cellular flux and dynamics.

  16. Genes2Networks: connecting lists of gene symbols using mammalian protein interactions databases.

    PubMed

    Berger, Seth I; Posner, Jeremy M; Ma'ayan, Avi

    2007-10-04

    In recent years, mammalian protein-protein interaction network databases have been developed. The interactions in these databases are either extracted manually from low-throughput experimental biomedical research literature, extracted automatically from literature using techniques such as natural language processing (NLP), generated experimentally using high-throughput methods such as yeast-2-hybrid screens, or interactions are predicted using an assortment of computational approaches. Genes or proteins identified as significantly changing in proteomic experiments, or identified as susceptibility disease genes in genomic studies, can be placed in the context of protein interaction networks in order to assign these genes and proteins to pathways and protein complexes. Genes2Networks is a software system that integrates the content of ten mammalian interaction network datasets. Filtering techniques to prune low-confidence interactions were implemented. Genes2Networks is delivered as a web-based service using AJAX. The system can be used to extract relevant subnetworks created from "seed" lists of human Entrez gene symbols. The output includes a dynamic linkable three color web-based network map, with a statistical analysis report that identifies significant intermediate nodes used to connect the seed list. Genes2Networks is powerful web-based software that can help experimental biologists to interpret lists of genes and proteins such as those commonly produced through genomic and proteomic experiments, as well as lists of genes and proteins associated with disease processes. This system can be used to find relationships between genes and proteins from seed lists, and predict additional genes or proteins that may play key roles in common pathways or protein complexes.

  17. UbSRD: The Ubiquitin Structural Relational Database.

    PubMed

    Harrison, Joseph S; Jacobs, Tim M; Houlihan, Kevin; Van Doorslaer, Koenraad; Kuhlman, Brian

    2016-02-22

    The structurally defined ubiquitin-like homology fold (UBL) can engage in several unique protein-protein interactions and many of these complexes have been characterized with high-resolution techniques. Using Rosetta's structural classification tools, we have created the Ubiquitin Structural Relational Database (UbSRD), an SQL database of features for all 509 UBL-containing structures in the PDB, allowing users to browse these structures by protein-protein interaction and providing a platform for quantitative analysis of structural features. We used UbSRD to define the recognition features of ubiquitin (UBQ) and SUMO observed in the PDB and the orientation of the UBQ tail while interacting with certain types of proteins. While some of the interaction surfaces on UBQ and SUMO overlap, each molecule has distinct features that aid in molecular discrimination. Additionally, we find that the UBQ tail is malleable and can adopt a variety of conformations upon binding. UbSRD is accessible as an online resource at rosettadesign.med.unc.edu/ubsrd. Copyright © 2015 Elsevier Ltd. All rights reserved.

  18. SInCRe—structural interactome computational resource for Mycobacterium tuberculosis

    PubMed Central

    Metri, Rahul; Hariharaputran, Sridhar; Ramakrishnan, Gayatri; Anand, Praveen; Raghavender, Upadhyayula S.; Ochoa-Montaño, Bernardo; Higueruelo, Alicia P.; Sowdhamini, Ramanathan; Chandra, Nagasuma R.; Blundell, Tom L.; Srinivasan, Narayanaswamy

    2015-01-01

    We have developed an integrated database for Mycobacterium tuberculosis H37Rv (Mtb) that collates information on protein sequences, domain assignments, functional annotation and 3D structural information along with protein–protein and protein–small molecule interactions. SInCRe (Structural Interactome Computational Resource) is developed out of CamBan (Cambridge and Bangalore) collaboration. The motivation for development of this database is to provide an integrated platform to allow easily access and interpretation of data and results obtained by all the groups in CamBan in the field of Mtb informatics. In-house algorithms and databases developed independently by various academic groups in CamBan are used to generate Mtb-specific datasets and are integrated in this database to provide a structural dimension to studies on tuberculosis. The SInCRe database readily provides information on identification of functional domains, genome-scale modelling of structures of Mtb proteins and characterization of the small-molecule binding sites within Mtb. The resource also provides structure-based function annotation, information on small-molecule binders including FDA (Food and Drug Administration)-approved drugs, protein–protein interactions (PPIs) and natural compounds that bind to pathogen proteins potentially and result in weakening or elimination of host–pathogen protein–protein interactions. Together they provide prerequisites for identification of off-target binding. Database URL: http://proline.biochem.iisc.ernet.in/sincre PMID:26130660

  19. A protein relational database and protein family knowledge bases to facilitate structure-based design analyses.

    PubMed

    Mobilio, Dominick; Walker, Gary; Brooijmans, Natasja; Nilakantan, Ramaswamy; Denny, R Aldrin; Dejoannis, Jason; Feyfant, Eric; Kowticwar, Rupesh K; Mankala, Jyoti; Palli, Satish; Punyamantula, Sairam; Tatipally, Maneesh; John, Reji K; Humblet, Christine

    2010-08-01

    The Protein Data Bank is the most comprehensive source of experimental macromolecular structures. It can, however, be difficult at times to locate relevant structures with the Protein Data Bank search interface. This is particularly true when searching for complexes containing specific interactions between protein and ligand atoms. Moreover, searching within a family of proteins can be tedious. For example, one cannot search for some conserved residue as residue numbers vary across structures. We describe herein three databases, Protein Relational Database, Kinase Knowledge Base, and Matrix Metalloproteinase Knowledge Base, containing protein structures from the Protein Data Bank. In Protein Relational Database, atom-atom distances between protein and ligand have been precalculated allowing for millisecond retrieval based on atom identity and distance constraints. Ring centroids, centroid-centroid and centroid-atom distances and angles have also been included permitting queries for pi-stacking interactions and other structural motifs involving rings. Other geometric features can be searched through the inclusion of residue pair and triplet distances. In Kinase Knowledge Base and Matrix Metalloproteinase Knowledge Base, the catalytic domains have been aligned into common residue numbering schemes. Thus, by searching across Protein Relational Database and Kinase Knowledge Base, one can easily retrieve structures wherein, for example, a ligand of interest is making contact with the gatekeeper residue.

  20. MIPS: analysis and annotation of proteins from whole genomes in 2005.

    PubMed

    Mewes, H W; Frishman, D; Mayer, K F X; Münsterkötter, M; Noubibou, O; Pagel, P; Rattei, T; Oesterheld, M; Ruepp, A; Stümpflen, V

    2006-01-01

    The Munich Information Center for Protein Sequences (MIPS at the GSF), Neuherberg, Germany, provides resources related to genome information. Manually curated databases for several reference organisms are maintained. Several of these databases are described elsewhere in this and other recent NAR database issues. In a complementary effort, a comprehensive set of >400 genomes automatically annotated with the PEDANT system are maintained. The main goal of our current work on creating and maintaining genome databases is to extend gene centered information to information on interactions within a generic comprehensive framework. We have concentrated our efforts along three lines (i) the development of suitable comprehensive data structures and database technology, communication and query tools to include a wide range of different types of information enabling the representation of complex information such as functional modules or networks Genome Research Environment System, (ii) the development of databases covering computable information such as the basic evolutionary relations among all genes, namely SIMAP, the sequence similarity matrix and the CABiNet network analysis framework and (iii) the compilation and manual annotation of information related to interactions such as protein-protein interactions or other types of relations (e.g. MPCDB, MPPI, CYGD). All databases described and the detailed descriptions of our projects can be accessed through the MIPS WWW server (http://mips.gsf.de).

  1. DBSecSys 2.0: a database of Burkholderia mallei and Burkholderia pseudomallei secretion systems.

    PubMed

    Memišević, Vesna; Kumar, Kamal; Zavaljevski, Nela; DeShazer, David; Wallqvist, Anders; Reifman, Jaques

    2016-09-20

    Burkholderia mallei and B. pseudomallei are the causative agents of glanders and melioidosis, respectively, diseases with high morbidity and mortality rates. B. mallei and B. pseudomallei are closely related genetically; B. mallei evolved from an ancestral strain of B. pseudomallei by genome reduction and adaptation to an obligate intracellular lifestyle. Although these two bacteria cause different diseases, they share multiple virulence factors, including bacterial secretion systems, which represent key components of bacterial pathogenicity. Despite recent progress, the secretion system proteins for B. mallei and B. pseudomallei, their pathogenic mechanisms of action, and host factors are not well characterized. We previously developed a manually curated database, DBSecSys, of bacterial secretion system proteins for B. mallei. Here, we report an expansion of the database with corresponding information about B. pseudomallei. DBSecSys 2.0 contains comprehensive literature-based and computationally derived information about B. mallei ATCC 23344 and literature-based and computationally derived information about B. pseudomallei K96243. The database contains updated information for 163 B. mallei proteins from the previous database and 61 additional B. mallei proteins, and new information for 281 B. pseudomallei proteins associated with 5 secretion systems, their 1,633 human- and murine-interacting targets, and 2,400 host-B. mallei interactions and 2,286 host-B. pseudomallei interactions. The database also includes information about 13 pathogenic mechanisms of action for B. mallei and B. pseudomallei secretion system proteins inferred from the available literature or computationally. Additionally, DBSecSys 2.0 provides details about 82 virulence attenuation experiments for 52 B. mallei secretion system proteins and 98 virulence attenuation experiments for 61 B. pseudomallei secretion system proteins. We updated the Web interface and data access layer to speed-up users' search of detailed information for orthologous proteins related to secretion systems of the two pathogens. The updates of DBSecSys 2.0 provide unique capabilities to access comprehensive information about secretion systems of B. mallei and B. pseudomallei. They enable studies and comparisons of corresponding proteins of these two closely related pathogens and their host-interacting partners. The database is available at http://dbsecsys.bhsai.org .

  2. Text mining for metabolic pathways, signaling cascades, and protein networks.

    PubMed

    Hoffmann, Robert; Krallinger, Martin; Andres, Eduardo; Tamames, Javier; Blaschke, Christian; Valencia, Alfonso

    2005-05-10

    The complexity of the information stored in databases and publications on metabolic and signaling pathways, the high throughput of experimental data, and the growing number of publications make it imperative to provide systems to help the researcher navigate through these interrelated information resources. Text-mining methods have started to play a key role in the creation and maintenance of links between the information stored in biological databases and its original sources in the literature. These links will be extremely useful for database updating and curation, especially if a number of technical problems can be solved satisfactorily, including the identification of protein and gene names (entities in general) and the characterization of their types of interactions. The first generation of openly accessible text-mining systems, such as iHOP (Information Hyperlinked over Proteins), provides additional functions to facilitate the reconstruction of protein interaction networks, combine database and text information, and support the scientist in the formulation of novel hypotheses. The next challenge is the generation of comprehensive information regarding the general function of signaling pathways and protein interaction networks.

  3. DBSecSys: a database of Burkholderia mallei secretion systems.

    PubMed

    Memišević, Vesna; Kumar, Kamal; Cheng, Li; Zavaljevski, Nela; DeShazer, David; Wallqvist, Anders; Reifman, Jaques

    2014-07-16

    Bacterial pathogenicity represents a major public health concern worldwide. Secretion systems are a key component of bacterial pathogenicity, as they provide the means for bacterial proteins to penetrate host-cell membranes and insert themselves directly into the host cells' cytosol. Burkholderia mallei is a Gram-negative bacterium that uses multiple secretion systems during its host infection life cycle. To date, the identities of secretion system proteins for B. mallei are not well known, and their pathogenic mechanisms of action and host factors are largely uncharacterized. We present the Database of Burkholderia malleiSecretion Systems (DBSecSys), a compilation of manually curated and computationally predicted bacterial secretion system proteins and their host factors. Currently, DBSecSys contains comprehensive experimentally and computationally derived information about B. mallei strain ATCC 23344. The database includes 143 B. mallei proteins associated with five secretion systems, their 1,635 human and murine interacting targets, and the corresponding 2,400 host-B. mallei interactions. The database also includes information about 10 pathogenic mechanisms of action for B. mallei secretion system proteins inferred from the available literature. Additionally, DBSecSys provides details about 42 virulence attenuation experiments for 27 B. mallei secretion system proteins. Users interact with DBSecSys through a Web interface that allows for data browsing, querying, visualizing, and downloading. DBSecSys provides a comprehensive, systematically organized resource of experimental and computational data associated with B. mallei secretion systems. It provides the unique ability to study secretion systems not only through characterization of their corresponding pathogen proteins, but also through characterization of their host-interacting partners.The database is available at https://applications.bhsai.org/dbsecsys.

  4. GALT protein database: querying structural and functional features of GALT enzyme.

    PubMed

    d'Acierno, Antonio; Facchiano, Angelo; Marabotti, Anna

    2014-09-01

    Knowledge of the impact of variations on protein structure can enhance the comprehension of the mechanisms of genetic diseases related to that protein. Here, we present a new version of GALT Protein Database, a Web-accessible data repository for the storage and interrogation of structural effects of variations of the enzyme galactose-1-phosphate uridylyltransferase (GALT), the impairment of which leads to classic Galactosemia, a rare genetic disease. This new version of this database now contains the models of 201 missense variants of GALT enzyme, including heterozygous variants, and it allows users not only to retrieve information about the missense variations affecting this protein, but also to investigate their impact on substrate binding, intersubunit interactions, stability, and other structural features. In addition, it allows the interactive visualization of the models of variants collected into the database. We have developed additional tools to improve the use of the database by nonspecialized users. This Web-accessible database (http://bioinformatica.isa.cnr.it/GALT/GALT2.0) represents a model of tools potentially suitable for application to other proteins that are involved in human pathologies and that are subjected to genetic variations. © 2014 WILEY PERIODICALS, INC.

  5. An updated version of NPIDB includes new classifications of DNA–protein complexes and their families

    PubMed Central

    Zanegina, Olga; Kirsanov, Dmitriy; Baulin, Eugene; Karyagina, Anna; Alexeevski, Andrei; Spirin, Sergey

    2016-01-01

    The recent upgrade of nucleic acid–protein interaction database (NPIDB, http://npidb.belozersky.msu.ru/) includes a newly elaborated classification of complexes of protein domains with double-stranded DNA and a classification of families of related complexes. Our classifications are based on contacting structural elements of both DNA: the major groove, the minor groove and the backbone; and protein: helices, beta-strands and unstructured segments. We took into account both hydrogen bonds and hydrophobic interaction. The analyzed material contains 1942 structures of protein domains from 748 PDB entries. We have identified 97 interaction modes of individual protein domain–DNA complexes and 17 DNA–protein interaction classes of protein domain families. We analyzed the sources of diversity of DNA–protein interaction modes in different complexes of one protein domain family. The observed interaction mode is sometimes influenced by artifacts of crystallization or diversity in secondary structure assignment. The interaction classes of domain families are more stable and thus possess more biological sense than a classification of single complexes. Integration of the classification into NPIDB allows the user to browse the database according to the interacting structural elements of DNA and protein molecules. For each family, we present average DNA shape parameters in contact zones with domains of the family. PMID:26656949

  6. MEGADOCK-Web: an integrated database of high-throughput structure-based protein-protein interaction predictions.

    PubMed

    Hayashi, Takanori; Matsuzaki, Yuri; Yanagisawa, Keisuke; Ohue, Masahito; Akiyama, Yutaka

    2018-05-08

    Protein-protein interactions (PPIs) play several roles in living cells, and computational PPI prediction is a major focus of many researchers. The three-dimensional (3D) structure and binding surface are important for the design of PPI inhibitors. Therefore, rigid body protein-protein docking calculations for two protein structures are expected to allow elucidation of PPIs different from known complexes in terms of 3D structures because known PPI information is not explicitly required. We have developed rapid PPI prediction software based on protein-protein docking, called MEGADOCK. In order to fully utilize the benefits of computational PPI predictions, it is necessary to construct a comprehensive database to gather prediction results and their predicted 3D complex structures and to make them easily accessible. Although several databases exist that provide predicted PPIs, the previous databases do not contain a sufficient number of entries for the purpose of discovering novel PPIs. In this study, we constructed an integrated database of MEGADOCK PPI predictions, named MEGADOCK-Web. MEGADOCK-Web provides more than 10 times the number of PPI predictions than previous databases and enables users to conduct PPI predictions that cannot be found in conventional PPI prediction databases. In MEGADOCK-Web, there are 7528 protein chains and 28,331,628 predicted PPIs from all possible combinations of those proteins. Each protein structure is annotated with PDB ID, chain ID, UniProt AC, related KEGG pathway IDs, and known PPI pairs. Additionally, MEGADOCK-Web provides four powerful functions: 1) searching precalculated PPI predictions, 2) providing annotations for each predicted protein pair with an experimentally known PPI, 3) visualizing candidates that may interact with the query protein on biochemical pathways, and 4) visualizing predicted complex structures through a 3D molecular viewer. MEGADOCK-Web provides a huge amount of comprehensive PPI predictions based on docking calculations with biochemical pathways and enables users to easily and quickly assess PPI feasibilities by archiving PPI predictions. MEGADOCK-Web also promotes the discovery of new PPIs and protein functions and is freely available for use at http://www.bi.cs.titech.ac.jp/megadock-web/ .

  7. The systematic annotation of the three main GPCR families in Reactome.

    PubMed

    Jassal, Bijay; Jupe, Steven; Caudy, Michael; Birney, Ewan; Stein, Lincoln; Hermjakob, Henning; D'Eustachio, Peter

    2010-07-29

    Reactome is an open-source, freely available database of human biological pathways and processes. A major goal of our work is to provide an integrated view of cellular signalling processes that spans from ligand-receptor interactions to molecular readouts at the level of metabolic and transcriptional events. To this end, we have built the first catalogue of all human G protein-coupled receptors (GPCRs) known to bind endogenous or natural ligands. The UniProt database has records for 797 proteins classified as GPCRs and sorted into families A/1, B/2 and C/3 on the basis of amino acid sequence. To these records we have added details from the IUPHAR database and our own manual curation of relevant literature to create reactions in which 563 GPCRs bind ligands and also interact with specific G-proteins to initiate signalling cascades. We believe the remaining 234 GPCRs are true orphans. The Reactome GPCR pathway can be viewed as a detailed interactive diagram and can be exported in many forms. It provides a template for the orthology-based inference of GPCR reactions for diverse model organism species, and can be overlaid with protein-protein interaction and gene expression datasets to facilitate overrepresentation studies and other forms of pathway analysis. Database URL: http://www.reactome.org.

  8. An interactive mutation database for human coagulation factor IX provides novel insights into the phenotypes and genetics of hemophilia B.

    PubMed

    Rallapalli, P M; Kemball-Cook, G; Tuddenham, E G; Gomez, K; Perkins, S J

    2013-07-01

    Factor IX (FIX) is important in the coagulation cascade, being activated to FIXa on cleavage. Defects in the human F9 gene frequently lead to hemophilia B. To assess 1113 unique F9 mutations corresponding to 3721 patient entries in a new and up-to-date interactive web database alongside the FIXa protein structure. The mutations database was built using MySQL and structural analyses were based on a homology model for the human FIXa structure based on closely-related crystal structures. Mutations have been found in 336 (73%) out of 461 residues in FIX. There were 812 unique point mutations, 182 deletions, 54 polymorphisms, 39 insertions and 26 others that together comprise a total of 1113 unique variants. The 64 unique mild severity mutations in the mature protein with known circulating protein phenotypes include 15 (23%) quantitative type I mutations and 41 (64%) predominantly qualitative type II mutations. Inhibitors were described in 59 reports (1.6%) corresponding to 25 unique mutations. The interactive database provides insights into mechanisms of hemophilia B. Type II mutations are deduced to disrupt predominantly those structural regions involved with functional interactions. The interactive features of the database will assist in making judgments about patient management. © 2013 International Society on Thrombosis and Haemostasis.

  9. Interactive and Versatile Navigation of Structural Databases.

    PubMed

    Korb, Oliver; Kuhn, Bernd; Hert, Jérôme; Taylor, Neil; Cole, Jason; Groom, Colin; Stahl, Martin

    2016-05-12

    We present CSD-CrossMiner, a novel tool for pharmacophore-based searches in crystal structure databases. Intuitive pharmacophore queries describing, among others, protein-ligand interaction patterns, ligand scaffolds, or protein environments can be built and modified interactively. Matching crystal structures are overlaid onto the query and visualized as soon as they are available, enabling the researcher to quickly modify a hypothesis on the fly. We exemplify the utility of the approach by showing applications relevant to real-world drug discovery projects, including the identification of novel fragments for a specific protein environment or scaffold hopping. The ability to concurrently search protein-ligand binding sites extracted from the Protein Data Bank (PDB) and small organic molecules from the Cambridge Structural Database (CSD) using the same pharmacophore query further emphasizes the flexibility of CSD-CrossMiner. We believe that CSD-CrossMiner closes an important gap in mining structural data and will allow users to extract more value from the growing number of available crystal structures.

  10. GPCR & company: databases and servers for GPCRs and interacting partners.

    PubMed

    Kowalsman, Noga; Niv, Masha Y

    2014-01-01

    G-protein-coupled receptors (GPCRs) are a large superfamily of membrane receptors that are involved in a wide range of signaling pathways. To fulfill their tasks, GPCRs interact with a variety of partners, including small molecules, lipids and proteins. They are accompanied by different proteins during all phases of their life cycle. Therefore, GPCR interactions with their partners are of great interest in basic cell-signaling research and in drug discovery.Due to the rapid development of computers and internet communication, knowledge and data can be easily shared within the worldwide research community via freely available databases and servers. These provide an abundance of biological, chemical and pharmacological information.This chapter describes the available web resources for investigating GPCR interactions. We review about 40 freely available databases and servers, and provide a few sentences about the essence and the data they supply. For simplification, the databases and servers were grouped under the following topics: general GPCR-ligand interactions; particular families of GPCRs and their ligands; GPCR oligomerization; GPCR interactions with intracellular partners; and structural information on GPCRs. In conclusion, a multitude of useful tools are currently available. Summary tables are provided to ease navigation between the numerous and partially overlapping resources. Suggestions for future enhancements of the online tools include the addition of links from general to specialized databases and enabling usage of user-supplied template for GPCR structural modeling.

  11. UniPROBE, update 2011: expanded content and search tools in the online database of protein-binding microarray data on protein-DNA interactions.

    PubMed

    Robasky, Kimberly; Bulyk, Martha L

    2011-01-01

    The Universal PBM Resource for Oligonucleotide-Binding Evaluation (UniPROBE) database is a centralized repository of information on the DNA-binding preferences of proteins as determined by universal protein-binding microarray (PBM) technology. Each entry for a protein (or protein complex) in UniPROBE provides the quantitative preferences for all possible nucleotide sequence variants ('words') of length k ('k-mers'), as well as position weight matrix (PWM) and graphical sequence logo representations of the k-mer data. In this update, we describe >130% expansion of the database content, incorporation of a protein BLAST (blastp) tool for finding protein sequence matches in UniPROBE, the introduction of UniPROBE accession numbers and additional database enhancements. The UniPROBE database is available at http://uniprobe.org.

  12. PCPPI: a comprehensive database for the prediction of Penicillium-crop protein-protein interactions.

    PubMed

    Yue, Junyang; Zhang, Danfeng; Ban, Rongjun; Ma, Xiaojing; Chen, Danyang; Li, Guangwei; Liu, Jia; Wisniewski, Michael; Droby, Samir; Liu, Yongsheng

    2017-01-01

    Penicillium expansum , the causal agent of blue mold, is one of the most prevalent post-harvest pathogens, infecting a wide range of crops after harvest. In response, crops have evolved various defense systems to protect themselves against this and other pathogens. Penicillium -crop interaction is a multifaceted process and mediated by pathogen- and host-derived proteins. Identification and characterization of the inter-species protein-protein interactions (PPIs) are fundamental to elucidating the molecular mechanisms underlying infection processes between P. expansum and plant crops. Here, we have developed PCPPI, the Penicillium -Crop Protein-Protein Interactions database, which is constructed based on the experimentally determined orthologous interactions in pathogen-plant systems and available domain-domain interactions (DDIs) in each PPI. Thus far, it stores information on 9911 proteins, 439 904 interactions and seven host species, including apple, kiwifruit, maize, pear, rice, strawberry and tomato. Further analysis through the gene ontology (GO) annotation indicated that proteins with more interacting partners tend to execute the essential function. Significantly, semantic statistics of the GO terms also provided strong support for the accuracy of our predicted interactions in PCPPI. We believe that all the PCPPI datasets are helpful to facilitate the study of pathogen-crop interactions and freely available to the research community. : http://bdg.hfut.edu.cn/pcppi/index.html. © The Author(s) 2017. Published by Oxford University Press.

  13. FARE-CAFE: a database of functional and regulatory elements of cancer-associated fusion events.

    PubMed

    Korla, Praveen Kumar; Cheng, Jack; Huang, Chien-Hung; Tsai, Jeffrey J P; Liu, Yu-Hsuan; Kurubanjerdjit, Nilubon; Hsieh, Wen-Tsong; Chen, Huey-Yi; Ng, Ka-Lok

    2015-01-01

    Chromosomal translocation (CT) is of enormous clinical interest because this disorder is associated with various major solid tumors and leukemia. A tumor-specific fusion gene event may occur when a translocation joins two separate genes. Currently, various CT databases provide information about fusion genes and their genomic elements. However, no database of the roles of fusion genes, in terms of essential functional and regulatory elements in oncogenesis, is available. FARE-CAFE is a unique combination of CTs, fusion proteins, protein domains, domain-domain interactions, protein-protein interactions, transcription factors and microRNAs, with subsequent experimental information, which cannot be found in any other CT database. Genomic DNA information including, for example, manually collected exact locations of the first and second break points, sequences and karyotypes of fusion genes are included. FARE-CAFE will substantially facilitate the cancer biologist's mission of elucidating the pathogenesis of various types of cancer. This database will ultimately help to develop 'novel' therapeutic approaches. Database URL: http://ppi.bioinfo.asia.edu.tw/FARE-CAFE. © The Author(s) 2015. Published by Oxford University Press.

  14. NRF2-ome: an integrated web resource to discover protein interaction and regulatory networks of NRF2.

    PubMed

    Türei, Dénes; Papp, Diána; Fazekas, Dávid; Földvári-Nagy, László; Módos, Dezső; Lenti, Katalin; Csermely, Péter; Korcsmáros, Tamás

    2013-01-01

    NRF2 is the master transcriptional regulator of oxidative and xenobiotic stress responses. NRF2 has important roles in carcinogenesis, inflammation, and neurodegenerative diseases. We developed an online resource, NRF2-ome, to provide an integrated and systems-level database for NRF2. The database contains manually curated and predicted interactions of NRF2 as well as data from external interaction databases. We integrated NRF2 interactome with NRF2 target genes, NRF2 regulating TFs, and miRNAs. We connected NRF2-ome to signaling pathways to allow mapping upstream NRF2 regulatory components that could directly or indirectly influence NRF2 activity totaling 35,967 protein-protein and signaling interactions. The user-friendly website allows researchers without computational background to search, browse, and download the database. The database can be downloaded in SQL, CSV, BioPAX, SBML, PSI-MI, and in a Cytoscape CYS file formats. We illustrated the applicability of the website by suggesting a posttranscriptional negative feedback of NRF2 by MAFG protein and raised the possibility of a connection between NRF2 and the JAK/STAT pathway through STAT1 and STAT3. NRF2-ome can also be used as an evaluation tool to help researchers and drug developers to understand the hidden regulatory mechanisms in the complex network of NRF2.

  15. MIPS: analysis and annotation of proteins from whole genomes.

    PubMed

    Mewes, H W; Amid, C; Arnold, R; Frishman, D; Güldener, U; Mannhaupt, G; Münsterkötter, M; Pagel, P; Strack, N; Stümpflen, V; Warfsmann, J; Ruepp, A

    2004-01-01

    The Munich Information Center for Protein Sequences (MIPS-GSF), Neuherberg, Germany, provides protein sequence-related information based on whole-genome analysis. The main focus of the work is directed toward the systematic organization of sequence-related attributes as gathered by a variety of algorithms, primary information from experimental data together with information compiled from the scientific literature. MIPS maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the database of complete cDNAs (German Human Genome Project, NGFN), the database of mammalian protein-protein interactions (MPPI), the database of FASTA homologies (SIMAP), and the interface for the fast retrieval of protein-associated information (QUIPOS). The Arabidopsis thaliana database, the rice database, the plant EST databases (MATDB, MOsDB, SPUTNIK), as well as the databases for the comprehensive set of genomes (PEDANT genomes) are described elsewhere in the 2003 and 2004 NAR database issues, respectively. All databases described, and the detailed descriptions of our projects can be accessed through the MIPS web server (http://mips.gsf.de).

  16. The BioGRID Interaction Database: 2011 update

    PubMed Central

    Stark, Chris; Breitkreutz, Bobby-Joe; Chatr-aryamontri, Andrew; Boucher, Lorrie; Oughtred, Rose; Livstone, Michael S.; Nixon, Julie; Van Auken, Kimberly; Wang, Xiaodong; Shi, Xiaoqi; Reguly, Teresa; Rust, Jennifer M.; Winter, Andrew; Dolinski, Kara; Tyers, Mike

    2011-01-01

    The Biological General Repository for Interaction Datasets (BioGRID) is a public database that archives and disseminates genetic and protein interaction data from model organisms and humans (http://www.thebiogrid.org). BioGRID currently holds 347 966 interactions (170 162 genetic, 177 804 protein) curated from both high-throughput data sets and individual focused studies, as derived from over 23 000 publications in the primary literature. Complete coverage of the entire literature is maintained for budding yeast (Saccharomyces cerevisiae), fission yeast (Schizosaccharomyces pombe) and thale cress (Arabidopsis thaliana), and efforts to expand curation across multiple metazoan species are underway. The BioGRID houses 48 831 human protein interactions that have been curated from 10 247 publications. Current curation drives are focused on particular areas of biology to enable insights into conserved networks and pathways that are relevant to human health. The BioGRID 3.0 web interface contains new search and display features that enable rapid queries across multiple data types and sources. An automated Interaction Management System (IMS) is used to prioritize, coordinate and track curation across international sites and projects. BioGRID provides interaction data to several model organism databases, resources such as Entrez-Gene and other interaction meta-databases. The entire BioGRID 3.0 data collection may be downloaded in multiple file formats, including PSI MI XML. Source code for BioGRID 3.0 is freely available without any restrictions. PMID:21071413

  17. SAFE Software and FED Database to Uncover Protein-Protein Interactions using Gene Fusion Analysis.

    PubMed

    Tsagrasoulis, Dimosthenis; Danos, Vasilis; Kissa, Maria; Trimpalis, Philip; Koumandou, V Lila; Karagouni, Amalia D; Tsakalidis, Athanasios; Kossida, Sophia

    2012-01-01

    Domain Fusion Analysis takes advantage of the fact that certain proteins in a given proteome A, are found to have statistically significant similarity with two separate proteins in another proteome B. In other words, the result of a fusion event between two separate proteins in proteome B is a specific full-length protein in proteome A. In such a case, it can be safely concluded that the protein pair has a common biological function or even interacts physically. In this paper, we present the Fusion Events Database (FED), a database for the maintenance and retrieval of fusion data both in prokaryotic and eukaryotic organisms and the Software for the Analysis of Fusion Events (SAFE), a computational platform implemented for the automated detection, filtering and visualization of fusion events (both available at: http://www.bioacademy.gr/bioinformatics/projects/ProteinFusion/index.htm). Finally, we analyze the proteomes of three microorganisms using these tools in order to demonstrate their functionality.

  18. SAFE Software and FED Database to Uncover Protein-Protein Interactions using Gene Fusion Analysis

    PubMed Central

    Tsagrasoulis, Dimosthenis; Danos, Vasilis; Kissa, Maria; Trimpalis, Philip; Koumandou, V. Lila; Karagouni, Amalia D.; Tsakalidis, Athanasios; Kossida, Sophia

    2012-01-01

    Domain Fusion Analysis takes advantage of the fact that certain proteins in a given proteome A, are found to have statistically significant similarity with two separate proteins in another proteome B. In other words, the result of a fusion event between two separate proteins in proteome B is a specific full-length protein in proteome A. In such a case, it can be safely concluded that the protein pair has a common biological function or even interacts physically. In this paper, we present the Fusion Events Database (FED), a database for the maintenance and retrieval of fusion data both in prokaryotic and eukaryotic organisms and the Software for the Analysis of Fusion Events (SAFE), a computational platform implemented for the automated detection, filtering and visualization of fusion events (both available at: http://www.bioacademy.gr/bioinformatics/projects/ProteinFusion/index.htm). Finally, we analyze the proteomes of three microorganisms using these tools in order to demonstrate their functionality. PMID:22267904

  19. MIPS: analysis and annotation of proteins from whole genomes in 2005

    PubMed Central

    Mewes, H. W.; Frishman, D.; Mayer, K. F. X.; Münsterkötter, M.; Noubibou, O.; Pagel, P.; Rattei, T.; Oesterheld, M.; Ruepp, A.; Stümpflen, V.

    2006-01-01

    The Munich Information Center for Protein Sequences (MIPS at the GSF), Neuherberg, Germany, provides resources related to genome information. Manually curated databases for several reference organisms are maintained. Several of these databases are described elsewhere in this and other recent NAR database issues. In a complementary effort, a comprehensive set of >400 genomes automatically annotated with the PEDANT system are maintained. The main goal of our current work on creating and maintaining genome databases is to extend gene centered information to information on interactions within a generic comprehensive framework. We have concentrated our efforts along three lines (i) the development of suitable comprehensive data structures and database technology, communication and query tools to include a wide range of different types of information enabling the representation of complex information such as functional modules or networks Genome Research Environment System, (ii) the development of databases covering computable information such as the basic evolutionary relations among all genes, namely SIMAP, the sequence similarity matrix and the CABiNet network analysis framework and (iii) the compilation and manual annotation of information related to interactions such as protein–protein interactions or other types of relations (e.g. MPCDB, MPPI, CYGD). All databases described and the detailed descriptions of our projects can be accessed through the MIPS WWW server (). PMID:16381839

  20. Comprehensive data resources and analytical tools for pathological association of aminoacyl tRNA synthetases with cancer

    PubMed Central

    Lee, Ji-Hyun; You, Sungyong; Hyeon, Do Young; Kang, Byeongsoo; Kim, Hyerim; Park, Kyoung Mii; Han, Byungwoo; Hwang, Daehee; Kim, Sunghoon

    2015-01-01

    Mammalian cells have cytoplasmic and mitochondrial aminoacyl-tRNA synthetases (ARSs) that catalyze aminoacylation of tRNAs during protein synthesis. Despite their housekeeping functions in protein synthesis, recently, ARSs and ARS-interacting multifunctional proteins (AIMPs) have been shown to play important roles in disease pathogenesis through their interactions with disease-related molecules. However, there are lacks of data resources and analytical tools that can be used to examine disease associations of ARS/AIMPs. Here, we developed an Integrated Database for ARSs (IDA), a resource database including cancer genomic/proteomic and interaction data of ARS/AIMPs. IDA includes mRNA expression, somatic mutation, copy number variation and phosphorylation data of ARS/AIMPs and their interacting proteins in various cancers. IDA further includes an array of analytical tools for exploration of disease association of ARS/AIMPs, identification of disease-associated ARS/AIMP interactors and reconstruction of ARS-dependent disease-perturbed network models. Therefore, IDA provides both comprehensive data resources and analytical tools for understanding potential roles of ARS/AIMPs in cancers. Database URL: http://ida.biocon.re.kr/, http://ars.biocon.re.kr/ PMID:25824651

  1. MIPS: analysis and annotation of proteins from whole genomes

    PubMed Central

    Mewes, H. W.; Amid, C.; Arnold, R.; Frishman, D.; Güldener, U.; Mannhaupt, G.; Münsterkötter, M.; Pagel, P.; Strack, N.; Stümpflen, V.; Warfsmann, J.; Ruepp, A.

    2004-01-01

    The Munich Information Center for Protein Sequences (MIPS-GSF), Neuherberg, Germany, provides protein sequence-related information based on whole-genome analysis. The main focus of the work is directed toward the systematic organization of sequence-related attributes as gathered by a variety of algorithms, primary information from experimental data together with information compiled from the scientific literature. MIPS maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the database of complete cDNAs (German Human Genome Project, NGFN), the database of mammalian protein–protein interactions (MPPI), the database of FASTA homologies (SIMAP), and the interface for the fast retrieval of protein-associated information (QUIPOS). The Arabidopsis thaliana database, the rice database, the plant EST databases (MATDB, MOsDB, SPUTNIK), as well as the databases for the comprehensive set of genomes (PEDANT genomes) are described elsewhere in the 2003 and 2004 NAR database issues, respectively. All databases described, and the detailed descriptions of our projects can be accessed through the MIPS web server (http://mips.gsf.de). PMID:14681354

  2. SIMAP—the database of all-against-all protein sequence similarities and annotations with new interfaces and increased coverage

    PubMed Central

    Arnold, Roland; Goldenberg, Florian; Mewes, Hans-Werner; Rattei, Thomas

    2014-01-01

    The Similarity Matrix of Proteins (SIMAP, http://mips.gsf.de/simap/) database has been designed to massively accelerate computationally expensive protein sequence analysis tasks in bioinformatics. It provides pre-calculated sequence similarities interconnecting the entire known protein sequence universe, complemented by pre-calculated protein features and domains, similarity clusters and functional annotations. SIMAP covers all major public protein databases as well as many consistently re-annotated metagenomes from different repositories. As of September 2013, SIMAP contains >163 million proteins corresponding to ∼70 million non-redundant sequences. SIMAP uses the sensitive FASTA search heuristics, the Smith–Waterman alignment algorithm, the InterPro database of protein domain models and the BLAST2GO functional annotation algorithm. SIMAP assists biologists by facilitating the interactive exploration of the protein sequence universe. Web-Service and DAS interfaces allow connecting SIMAP with any other bioinformatic tool and resource. All-against-all protein sequence similarity matrices of project-specific protein collections are generated on request. Recent improvements allow SIMAP to cover the rapidly growing sequenced protein sequence universe. New Web-Service interfaces enhance the connectivity of SIMAP. Novel tools for interactive extraction of protein similarity networks have been added. Open access to SIMAP is provided through the web portal; the portal also contains instructions and links for software access and flat file downloads. PMID:24165881

  3. JET2 Viewer: a database of predicted multiple, possibly overlapping, protein-protein interaction sites for PDB structures.

    PubMed

    Ripoche, Hugues; Laine, Elodie; Ceres, Nicoletta; Carbone, Alessandra

    2017-01-04

    The database JET2 Viewer, openly accessible at http://www.jet2viewer.upmc.fr/, reports putative protein binding sites for all three-dimensional (3D) structures available in the Protein Data Bank (PDB). This knowledge base was generated by applying the computational method JET 2 at large-scale on more than 20 000 chains. JET 2 strategy yields very precise predictions of interacting surfaces and unravels their evolutionary process and complexity. JET2 Viewer provides an online intelligent display, including interactive 3D visualization of the binding sites mapped onto PDB structures and suitable files recording JET 2 analyses. Predictions were evaluated on more than 15 000 experimentally characterized protein interfaces. This is, to our knowledge, the largest evaluation of a protein binding site prediction method. The overall performance of JET 2 on all interfaces are: Sen = 52.52, PPV = 51.24, Spe = 80.05, Acc = 75.89. The data can be used to foster new strategies for protein-protein interactions modulation and interaction surface redesign. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  4. Cytoscape: a software environment for integrated models of biomolecular interaction networks.

    PubMed

    Shannon, Paul; Markiel, Andrew; Ozier, Owen; Baliga, Nitin S; Wang, Jonathan T; Ramage, Daniel; Amin, Nada; Schwikowski, Benno; Ideker, Trey

    2003-11-01

    Cytoscape is an open source software project for integrating biomolecular interaction networks with high-throughput expression data and other molecular states into a unified conceptual framework. Although applicable to any system of molecular components and interactions, Cytoscape is most powerful when used in conjunction with large databases of protein-protein, protein-DNA, and genetic interactions that are increasingly available for humans and model organisms. Cytoscape's software Core provides basic functionality to layout and query the network; to visually integrate the network with expression profiles, phenotypes, and other molecular states; and to link the network to databases of functional annotations. The Core is extensible through a straightforward plug-in architecture, allowing rapid development of additional computational analyses and features. Several case studies of Cytoscape plug-ins are surveyed, including a search for interaction pathways correlating with changes in gene expression, a study of protein complexes involved in cellular recovery to DNA damage, inference of a combined physical/functional interaction network for Halobacterium, and an interface to detailed stochastic/kinetic gene regulatory models.

  5. AIM: a comprehensive Arabidopsis interactome module database and related interologs in plants.

    PubMed

    Wang, Yi; Thilmony, Roger; Zhao, Yunjun; Chen, Guoping; Gu, Yong Q

    2014-01-01

    Systems biology analysis of protein modules is important for understanding the functional relationships between proteins in the interactome. Here, we present a comprehensive database named AIM for Arabidopsis (Arabidopsis thaliana) interactome modules. The database contains almost 250,000 modules that were generated using multiple analysis methods and integration of microarray expression data. All the modules in AIM are well annotated using multiple gene function knowledge databases. AIM provides a user-friendly interface for different types of searches and offers a powerful graphical viewer for displaying module networks linked to the enrichment annotation terms. Both interactive Venn diagram and power graph viewer are integrated into the database for easy comparison of modules. In addition, predicted interologs from other plant species (homologous proteins from different species that share a conserved interaction module) are available for each Arabidopsis module. AIM is a powerful systems biology platform for obtaining valuable insights into the function of proteins in Arabidopsis and other plants using the modules of the Arabidopsis interactome. Database URL:http://probes.pw.usda.gov/AIM Published by Oxford University Press 2014. This work is written by US Government employees and is in the public domain in the US.

  6. DNAproDB: an interactive tool for structural analysis of DNA–protein complexes

    PubMed Central

    Sagendorf, Jared M.

    2017-01-01

    Abstract Many biological processes are mediated by complex interactions between DNA and proteins. Transcription factors, various polymerases, nucleases and histones recognize and bind DNA with different levels of binding specificity. To understand the physical mechanisms that allow proteins to recognize DNA and achieve their biological functions, it is important to analyze structures of DNA–protein complexes in detail. DNAproDB is a web-based interactive tool designed to help researchers study these complexes. DNAproDB provides an automated structure-processing pipeline that extracts structural features from DNA–protein complexes. The extracted features are organized in structured data files, which are easily parsed with any programming language or viewed in a browser. We processed a large number of DNA–protein complexes retrieved from the Protein Data Bank and created the DNAproDB database to store this data. Users can search the database by combining features of the DNA, protein or DNA–protein interactions at the interface. Additionally, users can upload their own structures for processing privately and securely. DNAproDB provides several interactive and customizable tools for creating visualizations of the DNA–protein interface at different levels of abstraction that can be exported as high quality figures. All functionality is documented and freely accessible at http://dnaprodb.usc.edu. PMID:28431131

  7. MIPS: a database for protein sequences and complete genomes.

    PubMed Central

    Mewes, H W; Hani, J; Pfeiffer, F; Frishman, D

    1998-01-01

    The MIPS group [Munich Information Center for Protein Sequences of the German National Center for Environment and Health (GSF)] at the Max-Planck-Institute for Biochemistry, Martinsried near Munich, Germany, is involved in a number of data collection activities, including a comprehensive database of the yeast genome, a database reflecting the progress in sequencing the Arabidopsis thaliana genome, the systematic analysis of other small genomes and the collection of protein sequence data within the framework of the PIR-International Protein Sequence Database (described elsewhere in this volume). Through its WWW server (http://www.mips.biochem.mpg.de ) MIPS provides access to a variety of generic databases, including a database of protein families as well as automatically generated data by the systematic application of sequence analysis algorithms. The yeast genome sequence and its related information was also compiled on CD-ROM to provide dynamic interactive access to the 16 chromosomes of the first eukaryotic genome unraveled. PMID:9399795

  8. mpMoRFsDB: a database of molecular recognition features in membrane proteins.

    PubMed

    Gypas, Foivos; Tsaousis, Georgios N; Hamodrakas, Stavros J

    2013-10-01

    Molecular recognition features (MoRFs) are small, intrinsically disordered regions in proteins that undergo a disorder-to-order transition on binding to their partners. MoRFs are involved in protein-protein interactions and may function as the initial step in molecular recognition. The aim of this work was to collect, organize and store all membrane proteins that contain MoRFs. Membrane proteins constitute ∼30% of fully sequenced proteomes and are responsible for a wide variety of cellular functions. MoRFs were classified according to their secondary structure, after interacting with their partners. We identified MoRFs in transmembrane and peripheral membrane proteins. The position of transmembrane protein MoRFs was determined in relation to a protein's topology. All information was stored in a publicly available mySQL database with a user-friendly web interface. A Jmol applet is integrated for visualization of the structures. mpMoRFsDB provides valuable information related to disorder-based protein-protein interactions in membrane proteins. http://bioinformatics.biol.uoa.gr/mpMoRFsDB

  9. ChiTaRS-3.1-the enhanced chimeric transcripts and RNA-seq database matched with protein-protein interactions.

    PubMed

    Gorohovski, Alessandro; Tagore, Somnath; Palande, Vikrant; Malka, Assaf; Raviv-Shay, Dorith; Frenkel-Morgenstern, Milana

    2017-01-04

    Discovery of chimeric RNAs, which are produced by chromosomal translocations as well as the joining of exons from different genes by trans-splicing, has added a new level of complexity to our study and understanding of the transcriptome. The enhanced ChiTaRS-3.1 database (http://chitars.md.biu.ac.il) is designed to make widely accessible a wealth of mined data on chimeric RNAs, with easy-to-use analytical tools built-in. The database comprises 34 922: chimeric transcripts along with 11 714: cancer breakpoints. In this latest version, we have included multiple cross-references to GeneCards, iHop, PubMed, NCBI, Ensembl, OMIM, RefSeq and the Mitelman collection for every entry in the 'Full Collection'. In addition, for every chimera, we have added a predicted Chimeric Protein-Protein Interaction (ChiPPI) network, which allows for easy visualization of protein partners of both parental and fusion proteins for all human chimeras. The database contains a comprehensive annotation for 34 922: chimeric transcripts from eight organisms, and includes the manual annotation of 200 sense-antiSense (SaS) chimeras. The current improvements in the content and functionality to the ChiTaRS database make it a central resource for the study of chimeric transcripts and fusion proteins. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  10. ChemProt-2.0: visual navigation in a disease chemical biology database

    PubMed Central

    Kim Kjærulff, Sonny; Wich, Louis; Kringelum, Jens; Jacobsen, Ulrik P.; Kouskoumvekaki, Irene; Audouze, Karine; Lund, Ole; Brunak, Søren; Oprea, Tudor I.; Taboureau, Olivier

    2013-01-01

    ChemProt-2.0 (http://www.cbs.dtu.dk/services/ChemProt-2.0) is a public available compilation of multiple chemical–protein annotation resources integrated with diseases and clinical outcomes information. The database has been updated to >1.15 million compounds with 5.32 millions bioactivity measurements for 15 290 proteins. Each protein is linked to quality-scored human protein–protein interactions data based on more than half a million interactions, for studying diseases and biological outcomes (diseases, pathways and GO terms) through protein complexes. In ChemProt-2.0, therapeutic effects as well as adverse drug reactions have been integrated allowing for suggesting proteins associated to clinical outcomes. New chemical structure fingerprints were computed based on the similarity ensemble approach. Protein sequence similarity search was also integrated to evaluate the promiscuity of proteins, which can help in the prediction of off-target effects. Finally, the database was integrated into a visual interface that enables navigation of the pharmacological space for small molecules. Filtering options were included in order to facilitate and to guide dynamic search of specific queries. PMID:23185041

  11. Rigid-Docking Approaches to Explore Protein-Protein Interaction Space.

    PubMed

    Matsuzaki, Yuri; Uchikoga, Nobuyuki; Ohue, Masahito; Akiyama, Yutaka

    Protein-protein interactions play core roles in living cells, especially in the regulatory systems. As information on proteins has rapidly accumulated on publicly available databases, much effort has been made to obtain a better picture of protein-protein interaction networks using protein tertiary structure data. Predicting relevant interacting partners from their tertiary structure is a challenging task and computer science methods have the potential to assist with this. Protein-protein rigid docking has been utilized by several projects, docking-based approaches having the advantages that they can suggest binding poses of predicted binding partners which would help in understanding the interaction mechanisms and that comparing docking results of both non-binders and binders can lead to understanding the specificity of protein-protein interactions from structural viewpoints. In this review we focus on explaining current computational prediction methods to predict pairwise direct protein-protein interactions that form protein complexes.

  12. DenHunt - A Comprehensive Database of the Intricate Network of Dengue-Human Interactions

    PubMed Central

    Arjunan, Selvam; Sastri, Narayan P.; Chandra, Nagasuma

    2016-01-01

    Dengue virus (DENV) is a human pathogen and its etiology has been widely established. There are many interactions between DENV and human proteins that have been reported in literature. However, no publicly accessible resource for efficiently retrieving the information is yet available. In this study, we mined all publicly available dengue–human interactions that have been reported in the literature into a database called DenHunt. We retrieved 682 direct interactions of human proteins with dengue viral components, 382 indirect interactions and 4120 differentially expressed human genes in dengue infected cell lines and patients. We have illustrated the importance of DenHunt by mapping the dengue–human interactions on to the host interactome and observed that the virus targets multiple host functional complexes of important cellular processes such as metabolism, immune system and signaling pathways suggesting a potential role of these interactions in viral pathogenesis. We also observed that 7 percent of the dengue virus interacting human proteins are also associated with other infectious and non-infectious diseases. Finally, the understanding that comes from such analyses could be used to design better strategies to counteract the diseases caused by dengue virus. The whole dataset has been catalogued in a searchable database, called DenHunt (http://proline.biochem.iisc.ernet.in/DenHunt/). PMID:27618709

  13. DenHunt - A Comprehensive Database of the Intricate Network of Dengue-Human Interactions.

    PubMed

    Karyala, Prashanthi; Metri, Rahul; Bathula, Christopher; Yelamanchi, Syam K; Sahoo, Lipika; Arjunan, Selvam; Sastri, Narayan P; Chandra, Nagasuma

    2016-09-01

    Dengue virus (DENV) is a human pathogen and its etiology has been widely established. There are many interactions between DENV and human proteins that have been reported in literature. However, no publicly accessible resource for efficiently retrieving the information is yet available. In this study, we mined all publicly available dengue-human interactions that have been reported in the literature into a database called DenHunt. We retrieved 682 direct interactions of human proteins with dengue viral components, 382 indirect interactions and 4120 differentially expressed human genes in dengue infected cell lines and patients. We have illustrated the importance of DenHunt by mapping the dengue-human interactions on to the host interactome and observed that the virus targets multiple host functional complexes of important cellular processes such as metabolism, immune system and signaling pathways suggesting a potential role of these interactions in viral pathogenesis. We also observed that 7 percent of the dengue virus interacting human proteins are also associated with other infectious and non-infectious diseases. Finally, the understanding that comes from such analyses could be used to design better strategies to counteract the diseases caused by dengue virus. The whole dataset has been catalogued in a searchable database, called DenHunt (http://proline.biochem.iisc.ernet.in/DenHunt/).

  14. DITOP: drug-induced toxicity related protein database.

    PubMed

    Zhang, Jing-Xian; Huang, Wei-Juan; Zeng, Jing-Hua; Huang, Wen-Hui; Wang, Yi; Zhao, Rui; Han, Bu-Cong; Liu, Qing-Feng; Chen, Yu-Zong; Ji, Zhi-Liang

    2007-07-01

    Drug-induced toxicity related proteins (DITRPs) are proteins that mediate adverse drug reactions (ADRs) or toxicities through their binding to drugs or reactive metabolites. Collection of these proteins facilitates better understanding of the molecular mechanisms of drug-induced toxicity and the rational drug discovery. Drug-induced toxicity related protein database (DITOP) is such a database that is intending to provide comprehensive information of DITRPs. Currently, DITOP contains 1501 records, covering 618 distinct literature-reported DITRPs, 529 drugs/ligands and 418 distinct toxicity terms. These proteins were confirmed experimentally to interact with drugs or their reactive metabolites, thus directly or indirectly cause adverse effects or toxicities. Five major types of drug-induced toxicities or ADRs are included in DITOP, which are the idiosyncratic adverse drug reactions, the dose-dependent toxicities, the drug-drug interactions, the immune-mediated adverse drug effects (IMADEs) and the toxicities caused by genetic susceptibility. Molecular mechanisms underlying the toxicity and cross-links to related resources are also provided while available. Moreover, a series of user-friendly interfaces were designed for flexible retrieval of DITRPs-related information. The DITOP can be accessed freely at http://bioinf.xmu.edu.cn/databases/ADR/index.html. Supplementary data are available at Bioinformatics online.

  15. FARE-CAFE: a database of functional and regulatory elements of cancer-associated fusion events

    PubMed Central

    Korla, Praveen Kumar; Cheng, Jack; Huang, Chien-Hung; Tsai, Jeffrey J. P.; Liu, Yu-Hsuan; Kurubanjerdjit, Nilubon; Hsieh, Wen-Tsong; Chen, Huey-Yi; Ng, Ka-Lok

    2015-01-01

    Chromosomal translocation (CT) is of enormous clinical interest because this disorder is associated with various major solid tumors and leukemia. A tumor-specific fusion gene event may occur when a translocation joins two separate genes. Currently, various CT databases provide information about fusion genes and their genomic elements. However, no database of the roles of fusion genes, in terms of essential functional and regulatory elements in oncogenesis, is available. FARE-CAFE is a unique combination of CTs, fusion proteins, protein domains, domain–domain interactions, protein–protein interactions, transcription factors and microRNAs, with subsequent experimental information, which cannot be found in any other CT database. Genomic DNA information including, for example, manually collected exact locations of the first and second break points, sequences and karyotypes of fusion genes are included. FARE-CAFE will substantially facilitate the cancer biologist’s mission of elucidating the pathogenesis of various types of cancer. This database will ultimately help to develop ‘novel’ therapeutic approaches. Database URL: http://ppi.bioinfo.asia.edu.tw/FARE-CAFE PMID:26384373

  16. SANSparallel: interactive homology search against Uniprot

    PubMed Central

    Somervuo, Panu; Holm, Liisa

    2015-01-01

    Proteins evolve by mutations and natural selection. The network of sequence similarities is a rich source for mining homologous relationships that inform on protein structure and function. There are many servers available to browse the network of homology relationships but one has to wait up to a minute for results. The SANSparallel webserver provides protein sequence database searches with immediate response and professional alignment visualization by third-party software. The output is a list, pairwise alignment or stacked alignment of sequence-similar proteins from Uniprot, UniRef90/50, Swissprot or Protein Data Bank. The stacked alignments are viewed in Jalview or as sequence logos. The database search uses the suffix array neighborhood search (SANS) method, which has been re-implemented as a client-server, improved and parallelized. The method is extremely fast and as sensitive as BLAST above 50% sequence identity. Benchmarks show that the method is highly competitive compared to previously published fast database search programs: UBLAST, DIAMOND, LAST, LAMBDA, RAPSEARCH2 and BLAT. The web server can be accessed interactively or programmatically at http://ekhidna2.biocenter.helsinki.fi/cgi-bin/sans/sans.cgi. It can be used to make protein functional annotation pipelines more efficient, and it is useful in interactive exploration of the detailed evidence supporting the annotation of particular proteins of interest. PMID:25855811

  17. PROFESS: a PROtein Function, Evolution, Structure and Sequence database

    PubMed Central

    Triplet, Thomas; Shortridge, Matthew D.; Griep, Mark A.; Stark, Jaime L.; Powers, Robert; Revesz, Peter

    2010-01-01

    The proliferation of biological databases and the easy access enabled by the Internet is having a beneficial impact on biological sciences and transforming the way research is conducted. There are ∼1100 molecular biology databases dispersed throughout the Internet. To assist in the functional, structural and evolutionary analysis of the abundant number of novel proteins continually identified from whole-genome sequencing, we introduce the PROFESS (PROtein Function, Evolution, Structure and Sequence) database. Our database is designed to be versatile and expandable and will not confine analysis to a pre-existing set of data relationships. A fundamental component of this approach is the development of an intuitive query system that incorporates a variety of similarity functions capable of generating data relationships not conceived during the creation of the database. The utility of PROFESS is demonstrated by the analysis of the structural drift of homologous proteins and the identification of potential pancreatic cancer therapeutic targets based on the observation of protein–protein interaction networks. Database URL: http://cse.unl.edu/∼profess/ PMID:20624718

  18. DIMA 3.0: Domain Interaction Map.

    PubMed

    Luo, Qibin; Pagel, Philipp; Vilne, Baiba; Frishman, Dmitrij

    2011-01-01

    Domain Interaction MAp (DIMA, available at http://webclu.bio.wzw.tum.de/dima) is a database of predicted and known interactions between protein domains. It integrates 5807 structurally known interactions imported from the iPfam and 3did databases and 46,900 domain interactions predicted by four computational methods: domain phylogenetic profiling, domain pair exclusion algorithm correlated mutations and domain interaction prediction in a discriminative way. Additionally predictions are filtered to exclude those domain pairs that are reported as non-interacting by the Negatome database. The DIMA Web site allows to calculate domain interaction networks either for a domain of interest or for entire organisms, and to explore them interactively using the Flash-based Cytoscape Web software.

  19. IntNetDB v1.0: an integrated protein-protein interaction network database generated by a probabilistic model

    PubMed Central

    Xia, Kai; Dong, Dong; Han, Jing-Dong J

    2006-01-01

    Background Although protein-protein interaction (PPI) networks have been explored by various experimental methods, the maps so built are still limited in coverage and accuracy. To further expand the PPI network and to extract more accurate information from existing maps, studies have been carried out to integrate various types of functional relationship data. A frequently updated database of computationally analyzed potential PPIs to provide biological researchers with rapid and easy access to analyze original data as a biological network is still lacking. Results By applying a probabilistic model, we integrated 27 heterogeneous genomic, proteomic and functional annotation datasets to predict PPI networks in human. In addition to previously studied data types, we show that phenotypic distances and genetic interactions can also be integrated to predict PPIs. We further built an easy-to-use, updatable integrated PPI database, the Integrated Network Database (IntNetDB) online, to provide automatic prediction and visualization of PPI network among genes of interest. The networks can be visualized in SVG (Scalable Vector Graphics) format for zooming in or out. IntNetDB also provides a tool to extract topologically highly connected network neighborhoods from a specific network for further exploration and research. Using the MCODE (Molecular Complex Detections) algorithm, 190 such neighborhoods were detected among all the predicted interactions. The predicted PPIs can also be mapped to worm, fly and mouse interologs. Conclusion IntNetDB includes 180,010 predicted protein-protein interactions among 9,901 human proteins and represents a useful resource for the research community. Our study has increased prediction coverage by five-fold. IntNetDB also provides easy-to-use network visualization and analysis tools that allow biological researchers unfamiliar with computational biology to access and analyze data over the internet. The web interface of IntNetDB is freely accessible at . Visualization requires Mozilla version 1.8 (or higher) or Internet Explorer with installation of SVGviewer. PMID:17112386

  20. Databases and Associated Tools for Glycomics and Glycoproteomics.

    PubMed

    Lisacek, Frederique; Mariethoz, Julien; Alocci, Davide; Rudd, Pauline M; Abrahams, Jodie L; Campbell, Matthew P; Packer, Nicolle H; Ståhle, Jonas; Widmalm, Göran; Mullen, Elaine; Adamczyk, Barbara; Rojas-Macias, Miguel A; Jin, Chunsheng; Karlsson, Niclas G

    2017-01-01

    The access to biodatabases for glycomics and glycoproteomics has proven to be essential for current glycobiological research. This chapter presents available databases that are devoted to different aspects of glycobioinformatics. This includes oligosaccharide sequence databases, experimental databases, 3D structure databases (of both glycans and glycorelated proteins) and association of glycans with tissue, disease, and proteins. Specific search protocols are also provided using tools associated with experimental databases for converting primary glycoanalytical data to glycan structural information. In particular, researchers using glycoanalysis methods by U/HPLC (GlycoBase), MS (GlycoWorkbench, UniCarb-DB, GlycoDigest), and NMR (CASPER) will benefit from this chapter. In addition we also include information on how to utilize glycan structural information to query databases that associate glycans with proteins (UniCarbKB) and with interactions with pathogens (SugarBind).

  1. THGS: a web-based database of Transmembrane Helices in Genome Sequences

    PubMed Central

    Fernando, S. A.; Selvarani, P.; Das, Soma; Kumar, Ch. Kiran; Mondal, Sukanta; Ramakumar, S.; Sekar, K.

    2004-01-01

    Transmembrane Helices in Genome Sequences (THGS) is an interactive web-based database, developed to search the transmembrane helices in the user-interested gene sequences available in the Genome Database (GDB). The proposed database has provision to search sequence motifs in transmembrane and globular proteins. In addition, the motif can be searched in the other sequence databases (Swiss-Prot and PIR) or in the macromolecular structure database, Protein Data Bank (PDB). Further, the 3D structure of the corresponding queried motif, if it is available in the solved protein structures deposited in the Protein Data Bank, can also be visualized using the widely used graphics package RASMOL. All the sequence databases used in the present work are updated frequently and hence the results produced are up to date. The database THGS is freely available via the world wide web and can be accessed at http://pranag.physics.iisc.ernet.in/thgs/ or http://144.16.71.10/thgs/. PMID:14681375

  2. Web server to identify similarity of amino acid motifs to compounds (SAAMCO).

    PubMed

    Casey, Fergal P; Davey, Norman E; Baran, Ivan; Varekova, Radka Svobodova; Shields, Denis C

    2008-07-01

    Protein-protein interactions are fundamental in mediating biological processes including metabolism, cell growth, and signaling. To be able to selectively inhibit or induce protein activity or complex formation is a key feature in controlling disease. For those situations in which protein-protein interactions derive substantial affinity from short linear peptide sequences, or motifs, we can develop search algorithms for peptidomimetic compounds that resemble the short peptide's structure but are not compromised by poor pharmacological properties. SAAMCO is a Web service ( http://bioware.ucd.ie/ approximately saamco) that facilitates the screening of motifs with known structures against bioactive compound databases. It is built on an algorithm that defines compound similarity based on the presence of appropriate amino acid side chain fragments and a favorable Root Mean Squared Deviation (RMSD) between compound and motif structure. The methodology is efficient as the available compound databases are preprocessed and fast regular expression searches filter potential matches before time-intensive 3D superposition is performed. The required input information is minimal, and the compound databases have been selected to maximize the availability of information on biological activity. "Hits" are accompanied with a visualization window and links to source database entries. Motif matching can be defined on partial or full similarity which will increase or reduce respectively the number of potential mimetic compounds. The Web server provides the functionality for rapid screening of known or putative interaction motifs against prepared compound libraries using a novel search algorithm. The tabulated results can be analyzed by linking to appropriate databases and by visualization.

  3. Domain fusion analysis by applying relational algebra to protein sequence and domain databases

    PubMed Central

    Truong, Kevin; Ikura, Mitsuhiko

    2003-01-01

    Background Domain fusion analysis is a useful method to predict functionally linked proteins that may be involved in direct protein-protein interactions or in the same metabolic or signaling pathway. As separate domain databases like BLOCKS, PROSITE, Pfam, SMART, PRINTS-S, ProDom, TIGRFAMs, and amalgamated domain databases like InterPro continue to grow in size and quality, a computational method to perform domain fusion analysis that leverages on these efforts will become increasingly powerful. Results This paper proposes a computational method employing relational algebra to find domain fusions in protein sequence databases. The feasibility of this method was illustrated on the SWISS-PROT+TrEMBL sequence database using domain predictions from the Pfam HMM (hidden Markov model) database. We identified 235 and 189 putative functionally linked protein partners in H. sapiens and S. cerevisiae, respectively. From scientific literature, we were able to confirm many of these functional linkages, while the remainder offer testable experimental hypothesis. Results can be viewed at . Conclusion As the analysis can be computed quickly on any relational database that supports standard SQL (structured query language), it can be dynamically updated along with the sequence and domain databases, thereby improving the quality of predictions over time. PMID:12734020

  4. Complex network theory for the identification and assessment of candidate protein targets.

    PubMed

    McGarry, Ken; McDonald, Sharon

    2018-06-01

    In this work we use complex network theory to provide a statistical model of the connectivity patterns of human proteins and their interaction partners. Our intention is to identify important proteins that may be predisposed to be potential candidates as drug targets for therapeutic interventions. Target proteins usually have more interaction partners than non-target proteins, but there are no hard-and-fast rules for defining the actual number of interactions. We devise a statistical measure for identifying hub proteins, we score our target proteins with gene ontology annotations. The important druggable protein targets are likely to have similar biological functions that can be assessed for their potential therapeutic value. Our system provides a statistical analysis of the local and distant neighborhood protein interactions of the potential targets using complex network measures. This approach builds a more accurate model of drug-to-target activity and therefore the likely impact on treating diseases. We integrate high quality protein interaction data from the HINT database and disease associated proteins from the DrugTarget database. Other sources include biological knowledge from Gene Ontology and drug information from DrugBank. The problem is a very challenging one since the data is highly imbalanced between target proteins and the more numerous nontargets. We use undersampling on the training data and build Random Forest classifier models which are used to identify previously unclassified target proteins. We validate and corroborate these findings from the available literature. Copyright © 2018 Elsevier Ltd. All rights reserved.

  5. BIND: the Biomolecular Interaction Network Database

    PubMed Central

    Bader, Gary D.; Betel, Doron; Hogue, Christopher W. V.

    2003-01-01

    The Biomolecular Interaction Network Database (BIND: http://bind.ca) archives biomolecular interaction, complex and pathway information. A web-based system is available to query, view and submit records. BIND continues to grow with the addition of individual submissions as well as interaction data from the PDB and a number of large-scale interaction and complex mapping experiments using yeast two hybrid, mass spectrometry, genetic interactions and phage display. We have developed a new graphical analysis tool that provides users with a view of the domain composition of proteins in interaction and complex records to help relate functional domains to protein interactions. An interaction network clustering tool has also been developed to help focus on regions of interest. Continued input from users has helped further mature the BIND data specification, which now includes the ability to store detailed information about genetic interactions. The BIND data specification is available as ASN.1 and XML DTD. PMID:12519993

  6. MIPS: a database for protein sequences, homology data and yeast genome information.

    PubMed Central

    Mewes, H W; Albermann, K; Heumann, K; Liebl, S; Pfeiffer, F

    1997-01-01

    The MIPS group (Martinsried Institute for Protein Sequences) at the Max-Planck-Institute for Biochemistry, Martinsried near Munich, Germany, collects, processes and distributes protein sequence data within the framework of the tripartite association of the PIR-International Protein Sequence Database (,). MIPS contributes nearly 50% of the data input to the PIR-International Protein Sequence Database. The database is distributed on CD-ROM together with PATCHX, an exhaustive supplement of unique, unverified protein sequences from external sources compiled by MIPS. Through its WWW server (http://www.mips.biochem.mpg.de/ ) MIPS permits internet access to sequence databases, homology data and to yeast genome information. (i) Sequence similarity results from the FASTA program () are stored in the FASTA database for all proteins from PIR-International and PATCHX. The database is dynamically maintained and permits instant access to FASTA results. (ii) Starting with FASTA database queries, proteins have been classified into families and superfamilies (PROT-FAM). (iii) The HPT (hashed position tree) data structure () developed at MIPS is a new approach for rapid sequence and pattern searching. (iv) MIPS provides access to the sequence and annotation of the complete yeast genome (), the functional classification of yeast genes (FunCat) and its graphical display, the 'Genome Browser' (). A CD-ROM based on the JAVA programming language providing dynamic interactive access to the yeast genome and the related protein sequences has been compiled and is available on request. PMID:9016498

  7. Protein interactions in 3D: from interface evolution to drug discovery.

    PubMed

    Winter, Christof; Henschel, Andreas; Tuukkanen, Anne; Schroeder, Michael

    2012-09-01

    Over the past 10years, much research has been dedicated to the understanding of protein interactions. Large-scale experiments to elucidate the global structure of protein interaction networks have been complemented by detailed studies of protein interaction interfaces. Understanding the evolution of interfaces allows one to identify convergently evolved interfaces which are evolutionary unrelated but share a few key residues and hence have common binding partners. Understanding interaction interfaces and their evolution is an important basis for pharmaceutical applications in drug discovery. Here, we review the algorithms and databases on 3D protein interactions and discuss in detail applications in interface evolution, drug discovery, and interface prediction. Copyright © 2012 Elsevier Inc. All rights reserved.

  8. Oligomerisation status and evolutionary conservation of interfaces of protein structural domain superfamilies.

    PubMed

    Sukhwal, Anshul; Sowdhamini, Ramanathan

    2013-07-01

    Protein-protein interactions are important in carrying out many biological processes and functions. These interactions may be either permanent or of temporary nature. Several studies have employed tools like solvent accessibility and graph theory to identify these interactions, but still more studies need to be performed to quantify and validate them. Although we now have many databases available with predicted and experimental results on protein-protein interactions, we still do not have many databases which focus on providing structural details of the interacting complexes, their oligomerisation state and homologues. In this work, protein-protein interactions have been thoroughly investigated within the structural regime and quantified for their strength using calculated pseudoenergies. The PPCheck server, an in-house webserver, has been used for calculating the pseudoenergies like van der Waals, hydrogen bonds and electrostatic energy based on distances between atoms of amino acids from two interacting proteins. PPCheck can be visited at . Based on statistical data, as obtained by studying established protein-protein interacting complexes from earlier studies, we came to a conclusion that an average protein-protein interface consisted of about 51 to 150 amino acid residues and the generalized energy per residue ranged from -2 kJ mol(-1) to -6 kJ mol(-1). We found that some of the proteins have an exceptionally higher number of amino acids at the interface and it was purely because of their elaborate interface or extended topology i.e. some of their secondary structure regions or loops were either inter-mixing or running parallel to one another or they were taking part in domain swapping. Residue networks were prepared for all the amino acids of the interacting proteins involved in different types of interactions (like van der Waals, hydrogen-bonding, electrostatic or intramolecular interactions) and were analysed between the query domain-interacting partner pair and its remote homologue-interacting partner pair. We found that, in exceptional cases, homologous proteins belonging to the same superfamily, but with remote sequence similarity, can share similar interfaces.

  9. TranscriptomeBrowser 3.0: introducing a new compendium of molecular interactions and a new visualization tool for the study of gene regulatory networks.

    PubMed

    Lepoivre, Cyrille; Bergon, Aurélie; Lopez, Fabrice; Perumal, Narayanan B; Nguyen, Catherine; Imbert, Jean; Puthier, Denis

    2012-01-31

    Deciphering gene regulatory networks by in silico approaches is a crucial step in the study of the molecular perturbations that occur in diseases. The development of regulatory maps is a tedious process requiring the comprehensive integration of various evidences scattered over biological databases. Thus, the research community would greatly benefit from having a unified database storing known and predicted molecular interactions. Furthermore, given the intrinsic complexity of the data, the development of new tools offering integrated and meaningful visualizations of molecular interactions is necessary to help users drawing new hypotheses without being overwhelmed by the density of the subsequent graph. We extend the previously developed TranscriptomeBrowser database with a set of tables containing 1,594,978 human and mouse molecular interactions. The database includes: (i) predicted regulatory interactions (computed by scanning vertebrate alignments with a set of 1,213 position weight matrices), (ii) potential regulatory interactions inferred from systematic analysis of ChIP-seq experiments, (iii) regulatory interactions curated from the literature, (iv) predicted post-transcriptional regulation by micro-RNA, (v) protein kinase-substrate interactions and (vi) physical protein-protein interactions. In order to easily retrieve and efficiently analyze these interactions, we developed In-teractomeBrowser, a graph-based knowledge browser that comes as a plug-in for Transcriptome-Browser. The first objective of InteractomeBrowser is to provide a user-friendly tool to get new insight into any gene list by providing a context-specific display of putative regulatory and physical interactions. To achieve this, InteractomeBrowser relies on a "cell compartments-based layout" that makes use of a subset of the Gene Ontology to map gene products onto relevant cell compartments. This layout is particularly powerful for visual integration of heterogeneous biological information and is a productive avenue in generating new hypotheses. The second objective of InteractomeBrowser is to fill the gap between interaction databases and dynamic modeling. It is thus compatible with the network analysis software Cytoscape and with the Gene Interaction Network simulation software (GINsim). We provide examples underlying the benefits of this visualization tool for large gene set analysis related to thymocyte differentiation. The InteractomeBrowser plugin is a powerful tool to get quick access to a knowledge database that includes both predicted and validated molecular interactions. InteractomeBrowser is available through the TranscriptomeBrowser framework and can be found at: http://tagc.univ-mrs.fr/tbrowser/. Our database is updated on a regular basis.

  10. Atomic analysis of protein-protein interfaces with known inhibitors: the 2P2I database.

    PubMed

    Bourgeas, Raphaël; Basse, Marie-Jeanne; Morelli, Xavier; Roche, Philippe

    2010-03-09

    In the last decade, the inhibition of protein-protein interactions (PPIs) has emerged from both academic and private research as a new way to modulate the activity of proteins. Inhibitors of these original interactions are certainly the next generation of highly innovative drugs that will reach the market in the next decade. However, in silico design of such compounds still remains challenging. Here we describe this particular PPI chemical space through the presentation of 2P2I(DB), a hand-curated database dedicated to the structure of PPIs with known inhibitors. We have analyzed protein/protein and protein/inhibitor interfaces in terms of geometrical parameters, atom and residue properties, buried accessible surface area and other biophysical parameters. The interfaces found in 2P2I(DB) were then compared to those of representative datasets of heterodimeric complexes. We propose a new classification of PPIs with known inhibitors into two classes depending on the number of segments present at the interface and corresponding to either a single secondary structure element or to a more globular interacting domain. 2P2I(DB) complexes share global shape properties with standard transient heterodimer complexes, but their accessible surface areas are significantly smaller. No major conformational changes are seen between the different states of the proteins. The interfaces are more hydrophobic than general PPI's interfaces, with less charged residues and more non-polar atoms. Finally, fifty percent of the complexes in the 2P2I(DB) dataset possess more hydrogen bonds than typical protein-protein complexes. Potential areas of study for the future are proposed, which include a new classification system consisting of specific families and the identification of PPI targets with high druggability potential based on key descriptors of the interaction. 2P2I database stores structural information about PPIs with known inhibitors and provides a useful tool for biologists to assess the potential druggability of their interfaces. The database can be accessed at http://2p2idb.cnrs-mrs.fr.

  11. SANSparallel: interactive homology search against Uniprot.

    PubMed

    Somervuo, Panu; Holm, Liisa

    2015-07-01

    Proteins evolve by mutations and natural selection. The network of sequence similarities is a rich source for mining homologous relationships that inform on protein structure and function. There are many servers available to browse the network of homology relationships but one has to wait up to a minute for results. The SANSparallel webserver provides protein sequence database searches with immediate response and professional alignment visualization by third-party software. The output is a list, pairwise alignment or stacked alignment of sequence-similar proteins from Uniprot, UniRef90/50, Swissprot or Protein Data Bank. The stacked alignments are viewed in Jalview or as sequence logos. The database search uses the suffix array neighborhood search (SANS) method, which has been re-implemented as a client-server, improved and parallelized. The method is extremely fast and as sensitive as BLAST above 50% sequence identity. Benchmarks show that the method is highly competitive compared to previously published fast database search programs: UBLAST, DIAMOND, LAST, LAMBDA, RAPSEARCH2 and BLAT. The web server can be accessed interactively or programmatically at http://ekhidna2.biocenter.helsinki.fi/cgi-bin/sans/sans.cgi. It can be used to make protein functional annotation pipelines more efficient, and it is useful in interactive exploration of the detailed evidence supporting the annotation of particular proteins of interest. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  12. The Protein Disease Database of human body fluids: II. Computer methods and data issues.

    PubMed

    Lemkin, P F; Orr, G A; Goldstein, M P; Creed, G J; Myrick, J E; Merril, C R

    1995-01-01

    The Protein Disease Database (PDD) is a relational database of proteins and diseases. With this database it is possible to screen for quantitative protein abnormalities associated with disease states. These quantitative relationships use data drawn from the peer-reviewed biomedical literature. Assays may also include those observed in high-resolution electrophoretic gels that offer the potential to quantitate many proteins in a single test as well as data gathered by enzymatic or immunologic assays. We are using the Internet World Wide Web (WWW) and the Web browser paradigm as an access method for wide distribution and querying of the Protein Disease Database. The WWW hypertext transfer protocol and its Common Gateway Interface make it possible to build powerful graphical user interfaces that can support easy-to-use data retrieval using query specification forms or images. The details of these interactions are totally transparent to the users of these forms. Using a client-server SQL relational database, user query access, initial data entry and database maintenance are all performed over the Internet with a Web browser. We discuss the underlying design issues, mapping mechanisms and assumptions that we used in constructing the system, data entry, access to the database server, security, and synthesis of derived two-dimensional gel image maps and hypertext documents resulting from SQL database searches.

  13. Stacking and T-shape competition in aromatic-aromatic amino acid interactions.

    PubMed

    Chelli, Riccardo; Gervasio, Francesco Luigi; Procacci, Piero; Schettino, Vincenzo

    2002-05-29

    The potential of mean force of interacting aromatic amino acids is calculated using molecular dynamics simulations. The free energy surface is determined in order to study stacking and T-shape competition for phenylalanine-phenylalanine (Phe-Phe), phenylalanine-tyrosine (Phe-Tyr), and tyrosine-tyrosine (Tyr-Tyr) complexes in vacuo, water, carbon tetrachloride, and methanol. Stacked structures are favored in all solvents with the exception of the Tyr-Tyr complex in carbon tetrachloride, where T-shaped structures are also important. The effect of anchoring the two alpha-carbons (C(alpha)) at selected distances is investigated. We find that short and large C(alpha)-C(alpha) distances favor stacked and T-shaped structures, respectively. We analyze a set of 2396 protein structures resolved experimentally. Comparison of theoretical free energies for the complexes to the experimental analogue shows that Tyr-Tyr interaction occurs mainly at the protein surface, while Phe-Tyr and Phe-Phe interactions are more frequent in the hydrophobic protein core. This is confirmed by the Voronoi polyhedron analysis on the database protein structures. As found from the free energy calculation, analysis of the protein database has shown that proximal and distal interacting aromatic residues are predominantly stacked and T-shaped, respectively.

  14. AN OVERVIEW OF COMPUTATIONAL LIFE SCIENCE DATABASES & EXCHANGE FORMATS OF RELEVANCE TO CHEMICAL BIOLOGY RESEARCH

    PubMed Central

    Hall, Aaron Smalter; Shan, Yunfeng; Lushington, Gerald; Visvanathan, Mahesh

    2016-01-01

    Databases and exchange formats describing biological entities such as chemicals and proteins, along with their relationships, are a critical component of research in life sciences disciplines, including chemical biology wherein small information about small molecule properties converges with cellular and molecular biology. Databases for storing biological entities are growing not only in size, but also in type, with many similarities between them and often subtle differences. The data formats available to describe and exchange these entities are numerous as well. In general, each format is optimized for a particular purpose or database, and hence some understanding of these formats is required when choosing one for research purposes. This paper reviews a selection of different databases and data formats with the goal of summarizing their purposes, features, and limitations. Databases are reviewed under the categories of 1) protein interactions, 2) metabolic pathways, 3) chemical interactions, and 4) drug discovery. Representation formats will be discussed according to those describing chemical structures, and those describing genomic/proteomic entities. PMID:22934944

  15. An overview of computational life science databases & exchange formats of relevance to chemical biology research.

    PubMed

    Smalter Hall, Aaron; Shan, Yunfeng; Lushington, Gerald; Visvanathan, Mahesh

    2013-03-01

    Databases and exchange formats describing biological entities such as chemicals and proteins, along with their relationships, are a critical component of research in life sciences disciplines, including chemical biology wherein small information about small molecule properties converges with cellular and molecular biology. Databases for storing biological entities are growing not only in size, but also in type, with many similarities between them and often subtle differences. The data formats available to describe and exchange these entities are numerous as well. In general, each format is optimized for a particular purpose or database, and hence some understanding of these formats is required when choosing one for research purposes. This paper reviews a selection of different databases and data formats with the goal of summarizing their purposes, features, and limitations. Databases are reviewed under the categories of 1) protein interactions, 2) metabolic pathways, 3) chemical interactions, and 4) drug discovery. Representation formats will be discussed according to those describing chemical structures, and those describing genomic/proteomic entities.

  16. Comprehensive curation and analysis of global interaction networks in Saccharomyces cerevisiae

    PubMed Central

    Reguly, Teresa; Breitkreutz, Ashton; Boucher, Lorrie; Breitkreutz, Bobby-Joe; Hon, Gary C; Myers, Chad L; Parsons, Ainslie; Friesen, Helena; Oughtred, Rose; Tong, Amy; Stark, Chris; Ho, Yuen; Botstein, David; Andrews, Brenda; Boone, Charles; Troyanskya, Olga G; Ideker, Trey; Dolinski, Kara; Batada, Nizar N; Tyers, Mike

    2006-01-01

    Background The study of complex biological networks and prediction of gene function has been enabled by high-throughput (HTP) methods for detection of genetic and protein interactions. Sparse coverage in HTP datasets may, however, distort network properties and confound predictions. Although a vast number of well substantiated interactions are recorded in the scientific literature, these data have not yet been distilled into networks that enable system-level inference. Results We describe here a comprehensive database of genetic and protein interactions, and associated experimental evidence, for the budding yeast Saccharomyces cerevisiae, as manually curated from over 31,793 abstracts and online publications. This literature-curated (LC) dataset contains 33,311 interactions, on the order of all extant HTP datasets combined. Surprisingly, HTP protein-interaction datasets currently achieve only around 14% coverage of the interactions in the literature. The LC network nevertheless shares attributes with HTP networks, including scale-free connectivity and correlations between interactions, abundance, localization, and expression. We find that essential genes or proteins are enriched for interactions with other essential genes or proteins, suggesting that the global network may be functionally unified. This interconnectivity is supported by a substantial overlap of protein and genetic interactions in the LC dataset. We show that the LC dataset considerably improves the predictive power of network-analysis approaches. The full LC dataset is available at the BioGRID () and SGD () databases. Conclusion Comprehensive datasets of biological interactions derived from the primary literature provide critical benchmarks for HTP methods, augment functional prediction, and reveal system-level attributes of biological networks. PMID:16762047

  17. CREDO: a structural interactomics database for drug discovery

    PubMed Central

    Schreyer, Adrian M.; Blundell, Tom L.

    2013-01-01

    CREDO is a unique relational database storing all pairwise atomic interactions of inter- as well as intra-molecular contacts between small molecules and macromolecules found in experimentally determined structures from the Protein Data Bank. These interactions are integrated with further chemical and biological data. The database implements useful data structures and algorithms such as cheminformatics routines to create a comprehensive analysis platform for drug discovery. The database can be accessed through a web-based interface, downloads of data sets and web services at http://www-cryst.bioc.cam.ac.uk/credo. Database URL: http://www-cryst.bioc.cam.ac.uk/credo PMID:23868908

  18. Functional Interaction Network Construction and Analysis for Disease Discovery.

    PubMed

    Wu, Guanming; Haw, Robin

    2017-01-01

    Network-based approaches project seemingly unrelated genes or proteins onto a large-scale network context, therefore providing a holistic visualization and analysis platform for genomic data generated from high-throughput experiments, reducing the dimensionality of data via using network modules and increasing the statistic analysis power. Based on the Reactome database, the most popular and comprehensive open-source biological pathway knowledgebase, we have developed a highly reliable protein functional interaction network covering around 60 % of total human genes and an app called ReactomeFIViz for Cytoscape, the most popular biological network visualization and analysis platform. In this chapter, we describe the detailed procedures on how this functional interaction network is constructed by integrating multiple external data sources, extracting functional interactions from human curated pathway databases, building a machine learning classifier called a Naïve Bayesian Classifier, predicting interactions based on the trained Naïve Bayesian Classifier, and finally constructing the functional interaction database. We also provide an example on how to use ReactomeFIViz for performing network-based data analysis for a list of genes.

  19. ImmunemiR - A Database of Prioritized Immune miRNA Disease Associations and its Interactome.

    PubMed

    Prabahar, Archana; Natarajan, Jeyakumar

    2017-01-01

    MicroRNAs are the key regulators of gene expression and their abnormal expression in the immune system may be associated with several human diseases such as inflammation, cancer and autoimmune diseases. Elucidation of miRNA disease association through the interactome will deepen the understanding of its disease mechanisms. A specialized database for immune miRNAs is highly desirable to demonstrate the immune miRNA disease associations in the interactome. miRNAs specific to immune related diseases were retrieved from curated databases such as HMDD, miR2disease and PubMed literature based on MeSH classification of immune system diseases. The additional data such as miRNA target genes, genes coding protein-protein interaction information were compiled from related resources. Further, miRNAs were prioritized to specific immune diseases using random walk ranking algorithm. In total 245 immune miRNAs associated with 92 OMIM disease categories were identified from external databases. The resultant data were compiled as ImmunemiR, a database of prioritized immune miRNA disease associations. This database provides both text based annotation information and network visualization of its interactome. To our knowledge, ImmunemiR is the first available database to provide a comprehensive repository of human immune disease associated miRNAs with network visualization options of its target genes, protein-protein interactions (PPI) and its disease associations. It is freely available at http://www.biominingbu.org/immunemir/. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  20. PROXiMATE: a database of mutant protein-protein complex thermodynamics and kinetics.

    PubMed

    Jemimah, Sherlyn; Yugandhar, K; Michael Gromiha, M

    2017-09-01

    We have developed PROXiMATE, a database of thermodynamic data for more than 6000 missense mutations in 174 heterodimeric protein-protein complexes, supplemented with interaction network data from STRING database, solvent accessibility, sequence, structural and functional information, experimental conditions and literature information. Additional features include complex structure visualization, search and display options, download options and a provision for users to upload their data. The database is freely available at http://www.iitm.ac.in/bioinfo/PROXiMATE/ . The website is implemented in Python, and supports recent versions of major browsers such as IE10, Firefox, Chrome and Opera. gromiha@iitm.ac.in. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  1. Mapping protein-protein interactions with phage-displayed combinatorial peptide libraries.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kay, B. K.; Castagnoli, L.; Biosciences Division

    This unit describes the process and analysis of affinity selecting bacteriophage M13 from libraries displaying combinatorial peptides fused to either a minor or major capsid protein. Direct affinity selection uses target protein bound to a microtiter plate followed by purification of selected phage by ELISA. Alternatively, there is a bead-based affinity selection method. These methods allow one to readily isolate peptide ligands that bind to a protein target of interest and use the consensus sequence to search proteomic databases for putative interacting proteins.

  2. Domain fusion analysis by applying relational algebra to protein sequence and domain databases.

    PubMed

    Truong, Kevin; Ikura, Mitsuhiko

    2003-05-06

    Domain fusion analysis is a useful method to predict functionally linked proteins that may be involved in direct protein-protein interactions or in the same metabolic or signaling pathway. As separate domain databases like BLOCKS, PROSITE, Pfam, SMART, PRINTS-S, ProDom, TIGRFAMs, and amalgamated domain databases like InterPro continue to grow in size and quality, a computational method to perform domain fusion analysis that leverages on these efforts will become increasingly powerful. This paper proposes a computational method employing relational algebra to find domain fusions in protein sequence databases. The feasibility of this method was illustrated on the SWISS-PROT+TrEMBL sequence database using domain predictions from the Pfam HMM (hidden Markov model) database. We identified 235 and 189 putative functionally linked protein partners in H. sapiens and S. cerevisiae, respectively. From scientific literature, we were able to confirm many of these functional linkages, while the remainder offer testable experimental hypothesis. Results can be viewed at http://calcium.uhnres.utoronto.ca/pi. As the analysis can be computed quickly on any relational database that supports standard SQL (structured query language), it can be dynamically updated along with the sequence and domain databases, thereby improving the quality of predictions over time.

  3. PCPPI: a comprehensive database for the prediction of Penicillium-crop protein-protein interactions

    USDA-ARS?s Scientific Manuscript database

    Penicillium expansum, the causal agent of blue mold, is one of the most prevalent postharvest pathogens infecting a wide range of crops after harvest. In response, crops have evolved various defense systems to protect themselves against this and other pathogens. Penicillium-crop interaction is a m...

  4. Exploiting the Proteome to Improve the Genome-Wide Genetic Analysis of Epistasis in Common Human Diseases

    PubMed Central

    Pattin, Kristine A.; Moore, Jason H.

    2009-01-01

    One of the central goals of human genetics is the identification of loci with alleles or genotypes that confer increased susceptibility. The availability of dense maps of single-nucleotide polymorphisms (SNPs) along with high-throughput genotyping technologies has set the stage for routine genome-wide association studies that are expected to significantly improve our ability to identify susceptibility loci. Before this promise can be realized, there are some significant challenges that need to be addressed. We address here the challenge of detecting epistasis or gene-gene interactions in genome-wide association studies. Discovering epistatic interactions in high dimensional datasets remains a challenge due to the computational complexity resulting from the analysis of all possible combinations of SNPs. One potential way to overcome the computational burden of a genome-wide epistasis analysis would be to devise a logical way to prioritize the many SNPs in a dataset so that the data may be analyzed more efficiently and yet still retain important biological information. One of the strongest demonstrations of the functional relationship between genes is protein-protein interaction. Thus, it is plausible that the expert knowledge extracted from protein interaction databases may allow for a more efficient analysis of genome-wide studies as well as facilitate the biological interpretation of the data. In this review we will discuss the challenges of detecting epistasis in genome-wide genetic studies and the means by which we propose to apply expert knowledge extracted from protein interaction databases to facilitate this process. We explore some of the fundamentals of protein interactions and the databases that are publicly available. PMID:18551320

  5. ARCPHdb: A comprehensive protein database for SF1 and SF2 helicase from archaea.

    PubMed

    Moukhtar, Mirna; Chaar, Wafi; Abdel-Razzak, Ziad; Khalil, Mohamad; Taha, Samir; Chamieh, Hala

    2017-01-01

    Superfamily 1 and Superfamily 2 helicases, two of the largest helicase protein families, play vital roles in many biological processes including replication, transcription and translation. Study of helicase proteins in the model microorganisms of archaea have largely contributed to the understanding of their function, architecture and assembly. Based on a large phylogenomics approach, we have identified and classified all SF1 and SF2 protein families in ninety five sequenced archaea genomes. Here we developed an online webserver linked to a specialized protein database named ARCPHdb to provide access for SF1 and SF2 helicase families from archaea. ARCPHdb was implemented using MySQL relational database. Web interfaces were developed using Netbeans. Data were stored according to UniProt accession numbers, NCBI Ref Seq ID, PDB IDs and Entrez Databases. A user-friendly interactive web interface has been developed to browse, search and download archaeal helicase protein sequences, their available 3D structure models, and related documentation available in the literature provided by ARCPHdb. The database provides direct links to matching external databases. The ARCPHdb is the first online database to compile all protein information on SF1 and SF2 helicase from archaea in one platform. This database provides essential resource information for all researchers interested in the field. Copyright © 2016 Elsevier Ltd. All rights reserved.

  6. Rice proteome analysis: a step toward functional analysis of the rice genome.

    PubMed

    Komatsu, Setsuko; Tanaka, Naoki

    2005-03-01

    The technique of proteome analysis using 2-DE has the power to monitor global changes that occur in the protein complement of tissues and subcellular compartments. In this review, we describe construction of the rice proteome database, the cataloging of rice proteins, and the functional characterization of some of the proteins identified. Initially, proteins extracted from various tissues and organelles were separated by 2-DE and an image analyzer was used to construct a display or reference map of the proteins. The rice proteome database currently contains 23 reference maps based on 2-DE of proteins from different rice tissues and subcellular compartments. These reference maps comprise 13 129 rice proteins, and the amino acid sequences of 5092 of these proteins are entered in the database. Major proteins involved in growth or stress responses have been identified by using a proteomics approach and some of these proteins have unique functions. Furthermore, initial work has also begun on analyzing the phosphoproteome and protein-protein interactions in rice. The information obtained from the rice proteome database will aid in the molecular cloning of rice genes and in predicting the function of unknown proteins.

  7. sc-PDB: a 3D-database of ligandable binding sites—10 years on

    PubMed Central

    Desaphy, Jérémy; Bret, Guillaume; Rognan, Didier; Kellenberger, Esther

    2015-01-01

    The sc-PDB database (available at http://bioinfo-pharma.u-strasbg.fr/scPDB/) is a comprehensive and up-to-date selection of ligandable binding sites of the Protein Data Bank. Sites are defined from complexes between a protein and a pharmacological ligand. The database provides the all-atom description of the protein, its ligand, their binding site and their binding mode. Currently, the sc-PDB archive registers 9283 binding sites from 3678 unique proteins and 5608 unique ligands. The sc-PDB database was publicly launched in 2004 with the aim of providing structure files suitable for computational approaches to drug design, such as docking. During the last 10 years we have improved and standardized the processes for (i) identifying binding sites, (ii) correcting structures, (iii) annotating protein function and ligand properties and (iv) characterizing their binding mode. This paper presents the latest enhancements in the database, specifically pertaining to the representation of molecular interaction and to the similarity between ligand/protein binding patterns. The new website puts emphasis in pictorial analysis of data. PMID:25300483

  8. ProMateus—an open research approach to protein-binding sites analysis

    PubMed Central

    Neuvirth, Hani; Heinemann, Uri; Birnbaum, David; Tishby, Naftali; Schreiber, Gideon

    2007-01-01

    The development of bioinformatic tools by individual labs results in the abundance of parallel programs for the same task. For example, identification of binding site regions between interacting proteins is done using: ProMate, WHISCY, PPI-Pred, PINUP and others. All servers first identify unique properties of binding sites and then incorporate them into a predictor. Obviously, the resulting prediction would improve if the most suitable parameters from each of those predictors would be incorporated into one server. However, because of the variation in methods and databases, this is currently not feasible. Here, the protein-binding site prediction server is extended into a general protein-binding sites research tool, ProMateus. This web tool, based on ProMate's infrastructure enables the easy exploration and incorporation of new features and databases by the user, providing an evaluation of the benefit of individual features and their combination within a set framework. This transforms the individual research into a community exercise, bringing out the best from all users for optimized predictions. The analysis is demonstrated on a database of protein protein and protein-DNA interactions. This approach is basically different from that used in generating meta-servers. The implications of the open-research approach are discussed. ProMateus is available at http://bip.weizmann.ac.il/promate. PMID:17488838

  9. Target identification in Fusobacterium nucleatum by subtractive genomics approach and enrichment analysis of host-pathogen protein-protein interactions.

    PubMed

    Kumar, Amit; Thotakura, Pragna Lakshmi; Tiwary, Basant Kumar; Krishna, Ramadas

    2016-05-12

    Fusobacterium nucleatum, a well studied bacterium in periodontal diseases, appendicitis, gingivitis, osteomyelitis and pregnancy complications has recently gained attention due to its association with colorectal cancer (CRC) progression. Treatment with berberine was shown to reverse F. nucleatum-induced CRC progression in mice by balancing the growth of opportunistic pathogens in tumor microenvironment. Intestinal microbiota imbalance and the infections caused by F. nucleatum might be regulated by therapeutic intervention. Hence, we aimed to predict drug target proteins in F. nucleatum, through subtractive genomics approach and host-pathogen protein-protein interactions (HP-PPIs). We also carried out enrichment analysis of host interacting partners to hypothesize the possible mechanisms involved in CRC progression due to F. nucleatum. In subtractive genomics approach, the essential, virulence and resistance related proteins were retrieved from RefSeq proteome of F. nucleatum by searching against Database of Essential Genes (DEG), Virulence Factor Database (VFDB) and Antibiotic Resistance Gene-ANNOTation (ARG-ANNOT) tool respectively. A subsequent hierarchical screening to identify non-human homologous, metabolic pathway-independent/pathway-specific and druggable proteins resulted in eight pathway-independent and 27 pathway-specific druggable targets. Co-aggregation of F. nucleatum with host induces proinflammatory gene expression thereby potentiates tumorigenesis. Hence, proteins from IBDsite, a database for inflammatory bowel disease (IBD) research and those involved in colorectal adenocarcinoma as interpreted from The Cancer Genome Atlas (TCGA) were retrieved to predict drug targets based on HP-PPIs with F. nucleatum proteome. Prediction of HP-PPIs exhibited 186 interactions contributed by 103 host and 76 bacterial proteins. Bacterial interacting partners were accounted as putative targets. And enrichment analysis of host interacting partners showed statistically enriched terms that were in positive correlation with CRC, atherosclerosis, cardiovascular, osteoporosis, Alzheimer's and other diseases. Subtractive genomics analysis provided a set of target proteins suggested to be indispensable for survival and pathogenicity of F. nucleatum. These target proteins might be considered for designing potent inhibitors to abrogate F. nucleatum infections. From enrichment analysis, it was hypothesized that F. nucleatum infection might enhance CRC progression by simultaneously regulating multiple signaling cascades which could lead to up-regulation of proinflammatory responses, oncogenes, modulation of host immune defense mechanism and suppression of DNA repair system.

  10. Database resources of the National Center for Biotechnology Information

    PubMed Central

    2015-01-01

    The National Center for Biotechnology Information (NCBI) provides a large suite of online resources for biological information and data, including the GenBank® nucleic acid sequence database and the PubMed database of citations and abstracts for published life science journals. Additional NCBI resources focus on literature (Bookshelf, PubMed Central (PMC) and PubReader); medical genetics (ClinVar, dbMHC, the Genetic Testing Registry, HIV-1/Human Protein Interaction Database and MedGen); genes and genomics (BioProject, BioSample, dbSNP, dbVar, Epigenomics, Gene, Gene Expression Omnibus (GEO), Genome, HomoloGene, the Map Viewer, Nucleotide, PopSet, Probe, RefSeq, Sequence Read Archive, the Taxonomy Browser, Trace Archive and UniGene); and proteins and chemicals (Biosystems, COBALT, the Conserved Domain Database (CDD), the Conserved Domain Architecture Retrieval Tool (CDART), the Molecular Modeling Database (MMDB), Protein Clusters, Protein and the PubChem suite of small molecule databases). The Entrez system provides search and retrieval operations for many of these databases. Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of these resources can be accessed through the NCBI home page at http://www.ncbi.nlm.nih.gov. PMID:25398906

  11. Database resources of the National Center for Biotechnology Information

    PubMed Central

    2016-01-01

    The National Center for Biotechnology Information (NCBI) provides a large suite of online resources for biological information and data, including the GenBank® nucleic acid sequence database and the PubMed database of citations and abstracts for published life science journals. Additional NCBI resources focus on literature (PubMed Central (PMC), Bookshelf and PubReader), health (ClinVar, dbGaP, dbMHC, the Genetic Testing Registry, HIV-1/Human Protein Interaction Database and MedGen), genomes (BioProject, Assembly, Genome, BioSample, dbSNP, dbVar, Epigenomics, the Map Viewer, Nucleotide, Probe, RefSeq, Sequence Read Archive, the Taxonomy Browser and the Trace Archive), genes (Gene, Gene Expression Omnibus (GEO), HomoloGene, PopSet and UniGene), proteins (Protein, the Conserved Domain Database (CDD), COBALT, Conserved Domain Architecture Retrieval Tool (CDART), the Molecular Modeling Database (MMDB) and Protein Clusters) and chemicals (Biosystems and the PubChem suite of small molecule databases). The Entrez system provides search and retrieval operations for most of these databases. Augmenting many of the web applications are custom implementations of the BLAST program optimized to search specialized datasets. All of these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov. PMID:26615191

  12. The Biomolecular Interaction Network Database and related tools 2005 update

    PubMed Central

    Alfarano, C.; Andrade, C. E.; Anthony, K.; Bahroos, N.; Bajec, M.; Bantoft, K.; Betel, D.; Bobechko, B.; Boutilier, K.; Burgess, E.; Buzadzija, K.; Cavero, R.; D'Abreo, C.; Donaldson, I.; Dorairajoo, D.; Dumontier, M. J.; Dumontier, M. R.; Earles, V.; Farrall, R.; Feldman, H.; Garderman, E.; Gong, Y.; Gonzaga, R.; Grytsan, V.; Gryz, E.; Gu, V.; Haldorsen, E.; Halupa, A.; Haw, R.; Hrvojic, A.; Hurrell, L.; Isserlin, R.; Jack, F.; Juma, F.; Khan, A.; Kon, T.; Konopinsky, S.; Le, V.; Lee, E.; Ling, S.; Magidin, M.; Moniakis, J.; Montojo, J.; Moore, S.; Muskat, B.; Ng, I.; Paraiso, J. P.; Parker, B.; Pintilie, G.; Pirone, R.; Salama, J. J.; Sgro, S.; Shan, T.; Shu, Y.; Siew, J.; Skinner, D.; Snyder, K.; Stasiuk, R.; Strumpf, D.; Tuekam, B.; Tao, S.; Wang, Z.; White, M.; Willis, R.; Wolting, C.; Wong, S.; Wrong, A.; Xin, C.; Yao, R.; Yates, B.; Zhang, S.; Zheng, K.; Pawson, T.; Ouellette, B. F. F.; Hogue, C. W. V.

    2005-01-01

    The Biomolecular Interaction Network Database (BIND) (http://bind.ca) archives biomolecular interaction, reaction, complex and pathway information. Our aim is to curate the details about molecular interactions that arise from published experimental research and to provide this information, as well as tools to enable data analysis, freely to researchers worldwide. BIND data are curated into a comprehensive machine-readable archive of computable information and provides users with methods to discover interactions and molecular mechanisms. BIND has worked to develop new methods for visualization that amplify the underlying annotation of genes and proteins to facilitate the study of molecular interaction networks. BIND has maintained an open database policy since its inception in 1999. Data growth has proceeded at a tremendous rate, approaching over 100 000 records. New services provided include a new BIND Query and Submission interface, a Standard Object Access Protocol service and the Small Molecule Interaction Database (http://smid.blueprint.org) that allows users to determine probable small molecule binding sites of new sequences and examine conserved binding residues. PMID:15608229

  13. Data in support of the identification of neuronal and astrocyte proteins interacting with extracellularly applied oligomeric and fibrillar α-synuclein assemblies by mass spectrometry

    PubMed Central

    Shrivastava, Amulya Nidhi; Redeker, Virginie; Fritz, Nicolas; Pieri, Laura; Almeida, Leandro G.; Spolidoro, Maria; Liebmann, Thomas; Bousset, Luc; Renner, Marianne; Léna, Clément; Aperia, Anita; Melki, Ronald; Triller, Antoine

    2016-01-01

    α-Synuclein (α-syn) is the principal component of Lewy bodies, the pathophysiological hallmark of individuals affected by Parkinson disease (PD). This neuropathologic form of α-syn contributes to PD progression and propagation of α-syn assemblies between neurons. The data we present here support the proteomic analysis used to identify neuronal proteins that specifically interact with extracellularly applied oligomeric or fibrillar α-syn assemblies (conditions 1 and 2, respectively) (doi: 10.15252/embj.201591397[1]). α-syn assemblies and their cellular partner proteins were pulled down from neuronal cell lysed shortly after exposure to exogenous α-syn assemblies and the associated proteins were identified by mass spectrometry using a shotgun proteomic-based approach. We also performed experiments on pure cultures of astrocytes to identify astrocyte-specific proteins interacting with oligomeric or fibrillar α-syn (conditions 3 and 4, respectively). For each condition, proteins interacting selectively with α-syn assemblies were identified by comparison to proteins pulled-down from untreated cells used as controls. The mass spectrometry data, the database search and the peak lists have been deposited to the ProteomeXchange Consortium database via the PRIDE partner repository with the dataset identifiers PRIDE: PXD002256 to PRIDE: PXD002263 and doi: 10.6019/PXD002256 to 10.6019/PXD002263. PMID:26958642

  14. Data in support of the identification of neuronal and astrocyte proteins interacting with extracellularly applied oligomeric and fibrillar α-synuclein assemblies by mass spectrometry.

    PubMed

    Shrivastava, Amulya Nidhi; Redeker, Virginie; Fritz, Nicolas; Pieri, Laura; Almeida, Leandro G; Spolidoro, Maria; Liebmann, Thomas; Bousset, Luc; Renner, Marianne; Léna, Clément; Aperia, Anita; Melki, Ronald; Triller, Antoine

    2016-06-01

    α-Synuclein (α-syn) is the principal component of Lewy bodies, the pathophysiological hallmark of individuals affected by Parkinson disease (PD). This neuropathologic form of α-syn contributes to PD progression and propagation of α-syn assemblies between neurons. The data we present here support the proteomic analysis used to identify neuronal proteins that specifically interact with extracellularly applied oligomeric or fibrillar α-syn assemblies (conditions 1 and 2, respectively) (doi: 10.15252/embj.201591397[1]). α-syn assemblies and their cellular partner proteins were pulled down from neuronal cell lysed shortly after exposure to exogenous α-syn assemblies and the associated proteins were identified by mass spectrometry using a shotgun proteomic-based approach. We also performed experiments on pure cultures of astrocytes to identify astrocyte-specific proteins interacting with oligomeric or fibrillar α-syn (conditions 3 and 4, respectively). For each condition, proteins interacting selectively with α-syn assemblies were identified by comparison to proteins pulled-down from untreated cells used as controls. The mass spectrometry data, the database search and the peak lists have been deposited to the ProteomeXchange Consortium database via the PRIDE partner repository with the dataset identifiers PRIDE: PXD002256 to PRIDE: PXD002263 and doi: 10.6019/PXD002256 to 10.6019/PXD002263.

  15. VaProS: a database-integration approach for protein/genome information retrieval.

    PubMed

    Gojobori, Takashi; Ikeo, Kazuho; Katayama, Yukie; Kawabata, Takeshi; Kinjo, Akira R; Kinoshita, Kengo; Kwon, Yeondae; Migita, Ohsuke; Mizutani, Hisashi; Muraoka, Masafumi; Nagata, Koji; Omori, Satoshi; Sugawara, Hideaki; Yamada, Daichi; Yura, Kei

    2016-12-01

    Life science research now heavily relies on all sorts of databases for genome sequences, transcription, protein three-dimensional (3D) structures, protein-protein interactions, phenotypes and so forth. The knowledge accumulated by all the omics research is so vast that a computer-aided search of data is now a prerequisite for starting a new study. In addition, a combinatory search throughout these databases has a chance to extract new ideas and new hypotheses that can be examined by wet-lab experiments. By virtually integrating the related databases on the Internet, we have built a new web application that facilitates life science researchers for retrieving experts' knowledge stored in the databases and for building a new hypothesis of the research target. This web application, named VaProS, puts stress on the interconnection between the functional information of genome sequences and protein 3D structures, such as structural effect of the gene mutation. In this manuscript, we present the notion of VaProS, the databases and tools that can be accessed without any knowledge of database locations and data formats, and the power of search exemplified in quest of the molecular mechanisms of lysosomal storage disease. VaProS can be freely accessed at http://p4d-info.nig.ac.jp/vapros/ .

  16. sc-PDB: a 3D-database of ligandable binding sites--10 years on.

    PubMed

    Desaphy, Jérémy; Bret, Guillaume; Rognan, Didier; Kellenberger, Esther

    2015-01-01

    The sc-PDB database (available at http://bioinfo-pharma.u-strasbg.fr/scPDB/) is a comprehensive and up-to-date selection of ligandable binding sites of the Protein Data Bank. Sites are defined from complexes between a protein and a pharmacological ligand. The database provides the all-atom description of the protein, its ligand, their binding site and their binding mode. Currently, the sc-PDB archive registers 9283 binding sites from 3678 unique proteins and 5608 unique ligands. The sc-PDB database was publicly launched in 2004 with the aim of providing structure files suitable for computational approaches to drug design, such as docking. During the last 10 years we have improved and standardized the processes for (i) identifying binding sites, (ii) correcting structures, (iii) annotating protein function and ligand properties and (iv) characterizing their binding mode. This paper presents the latest enhancements in the database, specifically pertaining to the representation of molecular interaction and to the similarity between ligand/protein binding patterns. The new website puts emphasis in pictorial analysis of data. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  17. Columba: an integrated database of proteins, structures, and annotations.

    PubMed

    Trissl, Silke; Rother, Kristian; Müller, Heiko; Steinke, Thomas; Koch, Ina; Preissner, Robert; Frömmel, Cornelius; Leser, Ulf

    2005-03-31

    Structural and functional research often requires the computation of sets of protein structures based on certain properties of the proteins, such as sequence features, fold classification, or functional annotation. Compiling such sets using current web resources is tedious because the necessary data are spread over many different databases. To facilitate this task, we have created COLUMBA, an integrated database of annotations of protein structures. COLUMBA currently integrates twelve different databases, including PDB, KEGG, Swiss-Prot, CATH, SCOP, the Gene Ontology, and ENZYME. The database can be searched using either keyword search or data source-specific web forms. Users can thus quickly select and download PDB entries that, for instance, participate in a particular pathway, are classified as containing a certain CATH architecture, are annotated as having a certain molecular function in the Gene Ontology, and whose structures have a resolution under a defined threshold. The results of queries are provided in both machine-readable extensible markup language and human-readable format. The structures themselves can be viewed interactively on the web. The COLUMBA database facilitates the creation of protein structure data sets for many structure-based studies. It allows to combine queries on a number of structure-related databases not covered by other projects at present. Thus, information on both many and few protein structures can be used efficiently. The web interface for COLUMBA is available at http://www.columba-db.de.

  18. JAIL: a structure-based interface library for macromolecules.

    PubMed

    Günther, Stefan; von Eichborn, Joachim; May, Patrick; Preissner, Robert

    2009-01-01

    The increasing number of solved macromolecules provides a solid number of 3D interfaces, if all types of molecular contacts are being considered. JAIL annotates three different kinds of macromolecular interfaces, those between interacting protein domains, interfaces of different protein chains and interfaces between proteins and nucleic acids. This results in a total number of about 184,000 database entries. All the interfaces can easily be identified by a detailed search form or by a hierarchical tree that describes the protein domain architectures classified by the SCOP database. Visual inspection of the interfaces is possible via an interactive protein viewer. Furthermore, large scale analyses are supported by an implemented sequential and by a structural clustering. Similar interfaces as well as non-redundant interfaces can be easily picked out. Additionally, the sequential conservation of binding sites was also included in the database and is retrievable via Jmol. A comprehensive download section allows the composition of representative data sets with user defined parameters. The huge data set in combination with various search options allow a comprehensive view on all interfaces between macromolecules included in the Protein Data Bank (PDB). The download of the data sets supports numerous further investigations in macromolecular recognition. JAIL is publicly available at http://bioinformatics.charite.de/jail.

  19. MitoNuc: a database of nuclear genes coding for mitochondrial proteins. Update 2002.

    PubMed

    Attimonelli, Marcella; Catalano, Domenico; Gissi, Carmela; Grillo, Giorgio; Licciulli, Flavio; Liuni, Sabino; Santamaria, Monica; Pesole, Graziano; Saccone, Cecilia

    2002-01-01

    Mitochondria, besides their central role in energy metabolism, have recently been found to be involved in a number of basic processes of cell life and to contribute to the pathogenesis of many degenerative diseases. All functions of mitochondria depend on the interaction of nuclear and organelle genomes. Mitochondrial genomes have been extensively sequenced and analysed and data have been collected in several specialised databases. In order to collect information on nuclear coded mitochondrial proteins we developed MitoNuc, a database containing detailed information on sequenced nuclear genes coding for mitochondrial proteins in Metazoa. The MitoNuc database can be retrieved through SRS and is available via the web site http://bighost.area.ba.cnr.it/mitochondriome where other mitochondrial databases developed by our group, the complete list of the sequenced mitochondrial genomes, links to other mitochondrial sites and related information, are available. The MitoAln database, related to MitoNuc in the previous release, reporting the multiple alignments of the relevant homologous protein coding regions, is no longer supported in the present release. In order to keep the links among entries in MitoNuc from homologous proteins, a new field in the database has been defined: the cluster identifier, an alpha numeric code used to identify each cluster of homologous proteins. A comment field derived from the corresponding SWISS-PROT entry has been introduced; this reports clinical data related to dysfunction of the protein. The logic scheme of MitoNuc database has been implemented in the ORACLE DBMS. This will allow the end-users to retrieve data through a friendly interface that will be soon implemented.

  20. RAID: a comprehensive resource for human RNA-associated (RNA–RNA/RNA–protein) interaction

    PubMed Central

    Zhang, Xiaomeng; Wu, Deng; Chen, Liqun; Li, Xiang; Yang, Jinxurong; Fan, Dandan; Dong, Tingting; Liu, Mingyue; Tan, Puwen; Xu, Jintian; Yi, Ying; Wang, Yuting; Zou, Hua; Hu, Yongfei; Fan, Kaili; Kang, Juanjuan; Huang, Yan; Miao, Zhengqiang; Bi, Miaoman; Jin, Nana; Li, Kongning; Li, Xia; Xu, Jianzhen; Wang, Dong

    2014-01-01

    Transcriptomic analyses have revealed an unexpected complexity in the eukaryote transcriptome, which includes not only protein-coding transcripts but also an expanding catalog of noncoding RNAs (ncRNAs). Diverse coding and noncoding RNAs (ncRNAs) perform functions through interaction with each other in various cellular processes. In this project, we have developed RAID (http://www.rna-society.org/raid), an RNA-associated (RNA–RNA/RNA–protein) interaction database. RAID intends to provide the scientific community with all-in-one resources for efficient browsing and extraction of the RNA-associated interactions in human. This version of RAID contains more than 6100 RNA-associated interactions obtained by manually reviewing more than 2100 published papers, including 4493 RNA–RNA interactions and 1619 RNA–protein interactions. Each entry contains detailed information on an RNA-associated interaction, including RAID ID, RNA/protein symbol, RNA/protein categories, validated method, expressing tissue, literature references (Pubmed IDs), and detailed functional description. Users can query, browse, analyze, and manipulate RNA-associated (RNA–RNA/RNA–protein) interaction. RAID provides a comprehensive resource of human RNA-associated (RNA–RNA/RNA–protein) interaction network. Furthermore, this resource will help in uncovering the generic organizing principles of cellular function network. PMID:24803509

  1. Gene essentiality and the topology of protein interaction networks

    PubMed Central

    Coulomb, Stéphane; Bauer, Michel; Bernard, Denis; Marsolier-Kergoat, Marie-Claude

    2005-01-01

    The mechanistic bases for gene essentiality and for cell mutational resistance have long been disputed. The recent availability of large protein interaction databases has fuelled the analysis of protein interaction networks and several authors have proposed that gene dispensability could be strongly related to some topological parameters of these networks. However, many results were based on protein interaction data whose biases were not taken into account. In this article, we show that the essentiality of a gene in yeast is poorly related to the number of interactants (or degree) of the corresponding protein and that the physiological consequences of gene deletions are unrelated to several other properties of proteins in the interaction networks, such as the average degrees of their nearest neighbours, their clustering coefficients or their relative distances. We also found that yeast protein interaction networks lack degree correlation, i.e. a propensity for their vertices to associate according to their degrees. Gene essentiality and more generally cell resistance against mutations thus seem largely unrelated to many parameters of protein network topology. PMID:16087428

  2. In silico study of protein to protein interaction analysis of AMP-activated protein kinase and mitochondrial activity in three different farm animal species

    NASA Astrophysics Data System (ADS)

    Prastowo, S.; Widyas, N.

    2018-03-01

    AMP-activated protein kinase (AMPK) is cellular energy censor which works based on ATP and AMP concentration. This protein interacts with mitochondria in determine its activity to generate energy for cell metabolism purposes. For that, this paper aims to compare the protein to protein interaction of AMPK and mitochondrial activity genes in the metabolism of known animal farm (domesticated) that are cattle (Bos taurus), pig (Sus scrofa) and chicken (Gallus gallus). In silico study was done using STRING V.10 as prominent protein interaction database, followed with biological function comparison in KEGG PATHWAY database. Set of genes (12 in total) were used as input analysis that are PRKAA1, PRKAA2, PRKAB1, PRKAB2, PRKAG1, PRKAG2, PRKAG3, PPARGC1, ACC, CPT1B, NRF2 and SOD. The first 7 genes belong to gene in AMPK family, while the last 5 belong to mitochondrial activity genes. The protein interaction result shows 11, 8 and 5 metabolism pathways in Bos taurus, Sus scrofa and Gallus gallus, respectively. The top pathway in Bos taurus is AMPK signaling pathway (10 genes), Sus scrofa is Adipocytokine signaling pathway (8 genes) and Gallus gallus is FoxO signaling pathway (5 genes). Moreover, the common pathways found in those 3 species are Adipocytokine signaling pathway, Insulin signaling pathway and FoxO signaling pathway. Genes clustered in Adipocytokine and Insulin signaling pathway are PRKAA2, PPARGC1A, PRKAB1 and PRKAG2. While, in FoxO signaling pathway are PRKAA2, PRKAB1, PRKAG2. According to that, we found PRKAA2, PRKAB1 and PRKAG2 are the common genes. Based on the bioinformatics analysis, we can demonstrate that protein to protein interaction shows distinct different of metabolism in different species. However, further validation is needed to give a clear explanation.

  3. dbAMEPNI: a database of alanine mutagenic effects for protein–nucleic acid interactions

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Liu, Ling; Xiong, Yi; Gao, Hongyun

    Protein–nucleic acid interactions play essential roles in various biological activities such as gene regulation, transcription, DNA repair and DNA packaging. Understanding the effects of amino acid substitutions on protein–nucleic acid binding affinities can help elucidate the molecular mechanism of protein–nucleic acid recognition. Until now, no comprehensive and updated database of quantitative binding data on alanine mutagenic effects for protein–nucleic acid interactions is publicly accessible. Thus, we developed a new database of Alanine Mutagenic Effects for Protein-Nucleic Acid Interactions (dbAMEPNI). dbAMEPNI is a manually curated, literature-derived database, comprising over 577 alanine mutagenic data with experimentally determined binding affinities for protein–nucleic acidmore » complexes. Here, it contains several important parameters, such as dissociation constant (Kd), Gibbs free energy change (ΔΔG), experimental conditions and structural parameters of mutant residues. In addition, the database provides an extended dataset of 282 single alanine mutations with only qualitative data (or descriptive effects) of thermodynamic information.« less

  4. dbAMEPNI: a database of alanine mutagenic effects for protein–nucleic acid interactions

    DOE PAGES

    Liu, Ling; Xiong, Yi; Gao, Hongyun; ...

    2018-04-02

    Protein–nucleic acid interactions play essential roles in various biological activities such as gene regulation, transcription, DNA repair and DNA packaging. Understanding the effects of amino acid substitutions on protein–nucleic acid binding affinities can help elucidate the molecular mechanism of protein–nucleic acid recognition. Until now, no comprehensive and updated database of quantitative binding data on alanine mutagenic effects for protein–nucleic acid interactions is publicly accessible. Thus, we developed a new database of Alanine Mutagenic Effects for Protein-Nucleic Acid Interactions (dbAMEPNI). dbAMEPNI is a manually curated, literature-derived database, comprising over 577 alanine mutagenic data with experimentally determined binding affinities for protein–nucleic acidmore » complexes. Here, it contains several important parameters, such as dissociation constant (Kd), Gibbs free energy change (ΔΔG), experimental conditions and structural parameters of mutant residues. In addition, the database provides an extended dataset of 282 single alanine mutations with only qualitative data (or descriptive effects) of thermodynamic information.« less

  5. Integration of multiple biological features yields high confidence human protein interactome.

    PubMed

    Karagoz, Kubra; Sevimoglu, Tuba; Arga, Kazim Yalcin

    2016-08-21

    The biological function of a protein is usually determined by its physical interaction with other proteins. Protein-protein interactions (PPIs) are identified through various experimental methods and are stored in curated databases. The noisiness of the existing PPI data is evident, and it is essential that a more reliable data is generated. Furthermore, the selection of a set of PPIs at different confidence levels might be necessary for many studies. Although different methodologies were introduced to evaluate the confidence scores for binary interactions, a highly reliable, almost complete PPI network of Homo sapiens is not proposed yet. The quality and coverage of human protein interactome need to be improved to be used in various disciplines, especially in biomedicine. In the present work, we propose an unsupervised statistical approach to assign confidence scores to PPIs of H. sapiens. To achieve this goal PPI data from six different databases were collected and a total of 295,288 non-redundant interactions between 15,950 proteins were acquired. The present scoring system included the context information that was assigned to PPIs derived from eight biological attributes. A high confidence network, which included 147,923 binary interactions between 13,213 proteins, had scores greater than the cutoff value of 0.80, for which sensitivity, specificity, and coverage were 94.5%, 80.9%, and 82.8%, respectively. We compared the present scoring method with others for evaluation. Reducing the noise inherent in experimental PPIs via our scoring scheme increased the accuracy significantly. As it was demonstrated through the assessment of process and cancer subnetworks, this study allows researchers to construct and analyze context-specific networks via valid PPI sets and one can easily achieve subnetworks around proteins of interest at a specified confidence level. Copyright © 2016 Elsevier Ltd. All rights reserved.

  6. ATtRACT-a database of RNA-binding proteins and associated motifs.

    PubMed

    Giudice, Girolamo; Sánchez-Cabo, Fátima; Torroja, Carlos; Lara-Pezzi, Enrique

    2016-01-01

    RNA-binding proteins (RBPs) play a crucial role in key cellular processes, including RNA transport, splicing, polyadenylation and stability. Understanding the interaction between RBPs and RNA is key to improve our knowledge of RNA processing, localization and regulation in a global manner. Despite advances in recent years, a unified non-redundant resource that includes information on experimentally validated motifs, RBPs and integrated tools to exploit this information is lacking. Here, we developed a database named ATtRACT (available athttp://attract.cnic.es) that compiles information on 370 RBPs and 1583 RBP consensus binding motifs, 192 of which are not present in any other database. To populate ATtRACT we (i) extracted and hand-curated experimentally validated data from CISBP-RNA, SpliceAid-F, RBPDB databases, (ii) integrated and updated the unavailable ASD database and (iii) extracted information from Protein-RNA complexes present in Protein Data Bank database through computational analyses. ATtRACT provides also efficient algorithms to search a specific motif and scan one or more RNA sequences at a time. It also allows discoveringde novomotifs enriched in a set of related sequences and compare them with the motifs included in the database.Database URL:http:// attract. cnic. es. © The Author(s) 2016. Published by Oxford University Press.

  7. EcoCyc: a comprehensive database resource for Escherichia coli

    PubMed Central

    Keseler, Ingrid M.; Collado-Vides, Julio; Gama-Castro, Socorro; Ingraham, John; Paley, Suzanne; Paulsen, Ian T.; Peralta-Gil, Martín; Karp, Peter D.

    2005-01-01

    The EcoCyc database (http://EcoCyc.org/) is a comprehensive source of information on the biology of the prototypical model organism Escherichia coli K12. The mission for EcoCyc is to contain both computable descriptions of, and detailed comments describing, all genes, proteins, pathways and molecular interactions in E.coli. Through ongoing manual curation, extensive information such as summary comments, regulatory information, literature citations and evidence types has been extracted from 8862 publications and added to Version 8.5 of the EcoCyc database. The EcoCyc database can be accessed through a World Wide Web interface, while the downloadable Pathway Tools software and data files enable computational exploration of the data and provide enhanced querying capabilities that web interfaces cannot support. For example, EcoCyc contains carefully curated information that can be used as training sets for bioinformatics prediction of entities such as promoters, operons, genetic networks, transcription factor binding sites, metabolic pathways, functionally related genes, protein complexes and protein–ligand interactions. PMID:15608210

  8. Protein annotation from protein interaction networks and Gene Ontology.

    PubMed

    Nguyen, Cao D; Gardiner, Katheleen J; Cios, Krzysztof J

    2011-10-01

    We introduce a novel method for annotating protein function that combines Naïve Bayes and association rules, and takes advantage of the underlying topology in protein interaction networks and the structure of graphs in the Gene Ontology. We apply our method to proteins from the Human Protein Reference Database (HPRD) and show that, in comparison with other approaches, it predicts protein functions with significantly higher recall with no loss of precision. Specifically, it achieves 51% precision and 60% recall versus 45% and 26% for Majority and 24% and 61% for χ²-statistics, respectively. Copyright © 2011 Elsevier Inc. All rights reserved.

  9. PIPE: a protein–protein interaction passage extraction module for BioCreative challenge

    PubMed Central

    Chu, Chun-Han; Su, Yu-Chen; Chen, Chien Chin; Hsu, Wen-Lian

    2016-01-01

    Identifying the interactions between proteins mentioned in biomedical literatures is one of the frequently discussed topics of text mining in the life science field. In this article, we propose PIPE, an interaction pattern generation module used in the Collaborative Biocurator Assistant Task at BioCreative V (http://www.biocreative.org/) to capture frequent protein-protein interaction (PPI) patterns within text. We also present an interaction pattern tree (IPT) kernel method that integrates the PPI patterns with convolution tree kernel (CTK) to extract PPIs. Methods were evaluated on LLL, IEPA, HPRD50, AIMed and BioInfer corpora using cross-validation, cross-learning and cross-corpus evaluation. Empirical evaluations demonstrate that our method is effective and outperforms several well-known PPI extraction methods. Database URL: PMID:27524807

  10. Semantic integration to identify overlapping functional modules in protein interaction networks

    PubMed Central

    Cho, Young-Rae; Hwang, Woochang; Ramanathan, Murali; Zhang, Aidong

    2007-01-01

    Background The systematic analysis of protein-protein interactions can enable a better understanding of cellular organization, processes and functions. Functional modules can be identified from the protein interaction networks derived from experimental data sets. However, these analyses are challenging because of the presence of unreliable interactions and the complex connectivity of the network. The integration of protein-protein interactions with the data from other sources can be leveraged for improving the effectiveness of functional module detection algorithms. Results We have developed novel metrics, called semantic similarity and semantic interactivity, which use Gene Ontology (GO) annotations to measure the reliability of protein-protein interactions. The protein interaction networks can be converted into a weighted graph representation by assigning the reliability values to each interaction as a weight. We presented a flow-based modularization algorithm to efficiently identify overlapping modules in the weighted interaction networks. The experimental results show that the semantic similarity and semantic interactivity of interacting pairs were positively correlated with functional co-occurrence. The effectiveness of the algorithm for identifying modules was evaluated using functional categories from the MIPS database. We demonstrated that our algorithm had higher accuracy compared to other competing approaches. Conclusion The integration of protein interaction networks with GO annotation data and the capability of detecting overlapping modules substantially improve the accuracy of module identification. PMID:17650343

  11. Revealing the potential pathogenesis of glioma by utilizing a glioma associated protein-protein interaction network.

    PubMed

    Pan, Weiran; Li, Gang; Yang, Xiaoxiao; Miao, Jinming

    2015-04-01

    This study aims to explore the potential mechanism of glioma through bioinformatic approaches. The gene expression profile (GSE4290) of glioma tumor and non-tumor samples was downloaded from Gene Expression Omnibus database. A total of 180 samples were available, including 23 non-tumor and 157 tumor samples. Then the raw data were preprocessed using robust multiarray analysis, and 8,890 differentially expressed genes (DEGs) were identified by using t-test (false discovery rate < 0.0005). Furthermore, 16 known glioma related genes were abstracted from Genetic Association Database. After mapping 8,890 DEGs and 16 known glioma related genes to Human Protein Reference Database, a glioma associated protein-protein interaction network (GAPN) was constructed. In addition, 51 sub-networks in GAPN were screened out through Molecular Complex Detection (score ≥ 1), and sub-network 1 was found to have the closest interaction (score = 3). What' more, for the top 10 sub-networks, Gene Ontology (GO) enrichment analysis (p value < 0.05) was performed, and DEGs involved in sub-network 1 and 2, such as BRMS1L and CCNA1, were predicted to regulate cell growth, cell cycle, and DNA replication via interacting with known glioma related genes. Finally, the overlaps of DEGs and human essential, housekeeping, tissue-specific genes were calculated (p value = 1.0, 1.0, and 0.00014, respectively) and visualized by Venn Diagram package in R. About 61% of human tissue-specific genes were DEGs as well. This research shed new light on the pathogenesis of glioma based on DEGs and GAPN, and our findings might provide potential targets for clinical glioma treatment.

  12. HypoxiaDB: a database of hypoxia-regulated proteins

    PubMed Central

    Khurana, Pankaj; Sugadev, Ragumani; Jain, Jaspreet; Singh, Shashi Bala

    2013-01-01

    There has been intense interest in the cellular response to hypoxia, and a large number of differentially expressed proteins have been identified through various high-throughput experiments. These valuable data are scattered, and there have been no systematic attempts to document the various proteins regulated by hypoxia. Compilation, curation and annotation of these data are important in deciphering their role in hypoxia and hypoxia-related disorders. Therefore, we have compiled HypoxiaDB, a database of hypoxia-regulated proteins. It is a comprehensive, manually-curated, non-redundant catalog of proteins whose expressions are shown experimentally to be altered at different levels and durations of hypoxia. The database currently contains 72 000 manually curated entries taken on 3500 proteins extracted from 73 peer-reviewed publications selected from PubMed. HypoxiaDB is distinctive from other generalized databases: (i) it compiles tissue-specific protein expression changes under different levels and duration of hypoxia. Also, it provides manually curated literature references to support the inclusion of the protein in the database and establish its association with hypoxia. (ii) For each protein, HypoxiaDB integrates data on gene ontology, KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway, protein–protein interactions, protein family (Pfam), OMIM (Online Mendelian Inheritance in Man), PDB (Protein Data Bank) structures and homology to other sequenced genomes. (iii) It also provides pre-compiled information on hypoxia-proteins, which otherwise requires tedious computational analysis. This includes information like chromosomal location, identifiers like Entrez, HGNC, Unigene, Uniprot, Ensembl, Vega, GI numbers and Genbank accession numbers associated with the protein. These are further cross-linked to respective public databases augmenting HypoxiaDB to the external repositories. (iv) In addition, HypoxiaDB provides an online sequence-similarity search tool for users to compare their protein sequences with HypoxiaDB protein database. We hope that HypoxiaDB will enrich our knowledge about hypoxia-related biology and eventually will lead to the development of novel hypothesis and advancements in diagnostic and therapeutic activities. HypoxiaDB is freely accessible for academic and non-profit users via http://www.hypoxiadb.com. Database URL: http://www.hypoxiadb.com PMID:24178989

  13. Update of KDBI: Kinetic Data of Bio-molecular Interaction database

    PubMed Central

    Kumar, Pankaj; Han, B. C.; Shi, Z.; Jia, J.; Wang, Y. P.; Zhang, Y. T.; Liang, L.; Liu, Q. F.; Ji, Z. L.; Chen, Y. Z.

    2009-01-01

    Knowledge of the kinetics of biomolecular interactions is important for facilitating the study of cellular processes and underlying molecular events, and is essential for quantitative study and simulation of biological systems. Kinetic Data of Bio-molecular Interaction database (KDBI) has been developed to provide information about experimentally determined kinetic data of protein–protein, protein–nucleic acid, protein–ligand, nucleic acid–ligand binding or reaction events described in the literature. To accommodate increasing demand for studying and simulating biological systems, numerous improvements and updates have been made to KDBI, including new ways to access data by pathway and molecule names, data file in System Biology Markup Language format, more efficient search engine, access to published parameter sets of simulation models of 63 pathways, and 2.3-fold increase of data (19 263 entries of 10 532 distinctive biomolecular binding and 11 954 interaction events, involving 2635 proteins/protein complexes, 847 nucleic acids, 1603 small molecules and 45 multi-step processes). KDBI is publically available at http://bidd.nus.edu.sg/group/kdbi/kdbi.asp. PMID:18971255

  14. Changes in the Proteome of Xylem Sap in Brassica oleracea in Response to Fusarium oxysporum Stress

    PubMed Central

    Pu, Zijing; Ino, Yoko; Kimura, Yayoi; Tago, Asumi; Shimizu, Motoki; Natsume, Satoshi; Sano, Yoshitaka; Fujimoto, Ryo; Kaneko, Kentaro; Shea, Daniel J.; Fukai, Eigo; Fuji, Shin-Ichi; Hirano, Hisashi; Okazaki, Keiichi

    2016-01-01

    Fusarium oxysporum f.sp. conlutinans (Foc) is a serious root-invading and xylem-colonizing fungus that causes yellowing in Brassica oleracea. To comprehensively understand the interaction between F. oxysporum and B. oleracea, composition of the xylem sap proteome of the non-infected and Foc-infected plants was investigated in both resistant and susceptible cultivars using liquid chromatography-tandem mass spectrometry (LC-MS/MS) after in-solution digestion of xylem sap proteins. Whole genome sequencing of Foc was carried out and generated a predicted Foc protein database. The predicted Foc protein database was then combined with the public B. oleracea and B. rapa protein databases downloaded from Uniprot and used for protein identification. About 200 plant proteins were identified in the xylem sap of susceptible and resistant plants. Comparison between the non-infected and Foc-infected samples revealed that Foc infection causes changes to the protein composition in B. oleracea xylem sap where repressed proteins accounted for a greater proportion than those of induced in both the susceptible and resistant reactions. The analysis on the proteins with concentration change > = 2-fold indicated a large portion of up- and down-regulated proteins were those acting on carbohydrates. Proteins with leucine-rich repeats and legume lectin domains were mainly induced in both resistant and susceptible system, so was the case of thaumatins. Twenty-five Foc proteins were identified in the infected xylem sap and 10 of them were cysteine-containing secreted small proteins that are good candidates for virulence and/or avirulence effectors. The findings of differential response of protein contents in the xylem sap between the non-infected and Foc-infected samples as well as the Foc candidate effectors secreted in xylem provide valuable insights into B. oleracea-Foc interactions. PMID:26870056

  15. Changes in the Proteome of Xylem Sap in Brassica oleracea in Response to Fusarium oxysporum Stress.

    PubMed

    Pu, Zijing; Ino, Yoko; Kimura, Yayoi; Tago, Asumi; Shimizu, Motoki; Natsume, Satoshi; Sano, Yoshitaka; Fujimoto, Ryo; Kaneko, Kentaro; Shea, Daniel J; Fukai, Eigo; Fuji, Shin-Ichi; Hirano, Hisashi; Okazaki, Keiichi

    2016-01-01

    Fusarium oxysporum f.sp. conlutinans (Foc) is a serious root-invading and xylem-colonizing fungus that causes yellowing in Brassica oleracea. To comprehensively understand the interaction between F. oxysporum and B. oleracea, composition of the xylem sap proteome of the non-infected and Foc-infected plants was investigated in both resistant and susceptible cultivars using liquid chromatography-tandem mass spectrometry (LC-MS/MS) after in-solution digestion of xylem sap proteins. Whole genome sequencing of Foc was carried out and generated a predicted Foc protein database. The predicted Foc protein database was then combined with the public B. oleracea and B. rapa protein databases downloaded from Uniprot and used for protein identification. About 200 plant proteins were identified in the xylem sap of susceptible and resistant plants. Comparison between the non-infected and Foc-infected samples revealed that Foc infection causes changes to the protein composition in B. oleracea xylem sap where repressed proteins accounted for a greater proportion than those of induced in both the susceptible and resistant reactions. The analysis on the proteins with concentration change > = 2-fold indicated a large portion of up- and down-regulated proteins were those acting on carbohydrates. Proteins with leucine-rich repeats and legume lectin domains were mainly induced in both resistant and susceptible system, so was the case of thaumatins. Twenty-five Foc proteins were identified in the infected xylem sap and 10 of them were cysteine-containing secreted small proteins that are good candidates for virulence and/or avirulence effectors. The findings of differential response of protein contents in the xylem sap between the non-infected and Foc-infected samples as well as the Foc candidate effectors secreted in xylem provide valuable insights into B. oleracea-Foc interactions.

  16. Prediction of Heterodimeric Protein Complexes from Weighted Protein-Protein Interaction Networks Using Novel Features and Kernel Functions

    PubMed Central

    Ruan, Peiying; Hayashida, Morihiro; Maruyama, Osamu; Akutsu, Tatsuya

    2013-01-01

    Since many proteins express their functional activity by interacting with other proteins and forming protein complexes, it is very useful to identify sets of proteins that form complexes. For that purpose, many prediction methods for protein complexes from protein-protein interactions have been developed such as MCL, MCODE, RNSC, PCP, RRW, and NWE. These methods have dealt with only complexes with size of more than three because the methods often are based on some density of subgraphs. However, heterodimeric protein complexes that consist of two distinct proteins occupy a large part according to several comprehensive databases of known complexes. In this paper, we propose several feature space mappings from protein-protein interaction data, in which each interaction is weighted based on reliability. Furthermore, we make use of prior knowledge on protein domains to develop feature space mappings, domain composition kernel and its combination kernel with our proposed features. We perform ten-fold cross-validation computational experiments. These results suggest that our proposed kernel considerably outperforms the naive Bayes-based method, which is the best existing method for predicting heterodimeric protein complexes. PMID:23776458

  17. Catalytic site identification—a web server to identify catalytic site structural matches throughout PDB

    PubMed Central

    Kirshner, Daniel A.; Nilmeier, Jerome P.; Lightstone, Felice C.

    2013-01-01

    The catalytic site identification web server provides the innovative capability to find structural matches to a user-specified catalytic site among all Protein Data Bank proteins rapidly (in less than a minute). The server also can examine a user-specified protein structure or model to identify structural matches to a library of catalytic sites. Finally, the server provides a database of pre-calculated matches between all Protein Data Bank proteins and the library of catalytic sites. The database has been used to derive a set of hypothesized novel enzymatic function annotations. In all cases, matches and putative binding sites (protein structure and surfaces) can be visualized interactively online. The website can be accessed at http://catsid.llnl.gov. PMID:23680785

  18. Catalytic site identification--a web server to identify catalytic site structural matches throughout PDB.

    PubMed

    Kirshner, Daniel A; Nilmeier, Jerome P; Lightstone, Felice C

    2013-07-01

    The catalytic site identification web server provides the innovative capability to find structural matches to a user-specified catalytic site among all Protein Data Bank proteins rapidly (in less than a minute). The server also can examine a user-specified protein structure or model to identify structural matches to a library of catalytic sites. Finally, the server provides a database of pre-calculated matches between all Protein Data Bank proteins and the library of catalytic sites. The database has been used to derive a set of hypothesized novel enzymatic function annotations. In all cases, matches and putative binding sites (protein structure and surfaces) can be visualized interactively online. The website can be accessed at http://catsid.llnl.gov.

  19. ActiveDriverDB: human disease mutations and genome variation in post-translational modification sites of proteins

    PubMed Central

    Krassowski, Michal; Paczkowska, Marta; Cullion, Kim; Huang, Tina; Dzneladze, Irakli; Ouellette, B F Francis; Yamada, Joseph T; Fradet-Turcotte, Amelie

    2018-01-01

    Abstract Interpretation of genetic variation is needed for deciphering genotype-phenotype associations, mechanisms of inherited disease, and cancer driver mutations. Millions of single nucleotide variants (SNVs) in human genomes are known and thousands are associated with disease. An estimated 21% of disease-associated amino acid substitutions corresponding to missense SNVs are located in protein sites of post-translational modifications (PTMs), chemical modifications of amino acids that extend protein function. ActiveDriverDB is a comprehensive human proteo-genomics database that annotates disease mutations and population variants through the lens of PTMs. We integrated >385,000 published PTM sites with ∼3.6 million substitutions from The Cancer Genome Atlas (TCGA), the ClinVar database of disease genes, and human genome sequencing projects. The database includes site-specific interaction networks of proteins, upstream enzymes such as kinases, and drugs targeting these enzymes. We also predicted network-rewiring impact of mutations by analyzing gains and losses of kinase-bound sequence motifs. ActiveDriverDB provides detailed visualization, filtering, browsing and searching options for studying PTM-associated mutations. Users can upload mutation datasets interactively and use our application programming interface in pipelines. Integrative analysis of mutations and PTMs may help decipher molecular mechanisms of phenotypes and disease, as exemplified by case studies of TP53, BRCA2 and VHL. The open-source database is available at https://www.ActiveDriverDB.org. PMID:29126202

  20. Global De Novo Protein-Protein Interactome Elucidates Interactions of Drought-Responsive Proteins in Horse Gram (Macrotyloma uniflorum).

    PubMed

    Bhardwaj, Jyoti; Gangwar, Indu; Panzade, Ganesh; Shankar, Ravi; Yadav, Sudesh Kumar

    2016-06-03

    Inspired by the availability of de novo transcriptome of horse gram (Macrotyloma uniflorum) and recent developments in systems biology studies, the first ever global protein-protein interactome (PPI) map was constructed for this highly drought-tolerant legume. Large-scale studies of PPIs and the constructed database would provide rationale behind the interplay at cascading translational levels for drought stress-adaptive mechanisms in horse gram. Using a bidirectional approach (interolog and domain-based), a high-confidence interactome map and database for horse gram was constructed. Available transcriptomic information for shoot and root tissues of a sensitive (M-191; genotype 1) and a drought-tolerant (M-249; genotype 2) genotype of horse gram was utilized to draw comparative PPI subnetworks under drought stress. High-confidence 6804 interactions were predicted among 1812 proteins covering about one-fourth of the horse gram proteome. The highest number of interactions (33.86%) in horse gram interactome matched with Arabidopsis PPI data. The top five hub nodes mostly included ubiquitin and heat-shock-related proteins. Higher numbers of PPIs were found to be responsive in shoot tissue (416) and root tissue (2228) of genotype 2 compared with shoot tissue (136) and root tissue (579) of genotype 1. Characterization of PPIs using gene ontology analysis revealed that kinase and transferase activities involved in signal transduction, cellular processes, nucleocytoplasmic transport, protein ubiquitination, and localization of molecules were most responsive to drought stress. Hence, these could be framed in stress adaptive mechanisms of horse gram. Being the first legume global PPI map, it would provide new insights into gene and protein regulatory networks for drought stress tolerance mechanisms in horse gram. Information compiled in the form of database (MauPIR) will provide the much needed high-confidence systems biology information for horse gram genes, proteins, and involved processes. This information would ease the effort and increase the efficacy for similar studies on other legumes. Public access is available at http://14.139.59.221/MauPIR/ .

  1. Evolution and function of CAG/polyglutamine repeats in protein–protein interaction networks

    PubMed Central

    Schaefer, Martin H.; Wanker, Erich E.; Andrade-Navarro, Miguel A.

    2012-01-01

    Expanded runs of consecutive trinucleotide CAG repeats encoding polyglutamine (polyQ) stretches are observed in the genes of a large number of patients with different genetic diseases such as Huntington's and several Ataxias. Protein aggregation, which is a key feature of most of these diseases, is thought to be triggered by these expanded polyQ sequences in disease-related proteins. However, polyQ tracts are a normal feature of many human proteins, suggesting that they have an important cellular function. To clarify the potential function of polyQ repeats in biological systems, we systematically analyzed available information stored in sequence and protein interaction databases. By integrating genomic, phylogenetic, protein interaction network and functional information, we obtained evidence that polyQ tracts in proteins stabilize protein interactions. This happens most likely through structural changes whereby the polyQ sequence extends a neighboring coiled-coil region to facilitate its interaction with a coiled-coil region in another protein. Alteration of this important biological function due to polyQ expansion results in gain of abnormal interactions, leading to pathological effects like protein aggregation. Our analyses suggest that research on polyQ proteins should shift focus from expanded polyQ proteins into the characterization of the influence of the wild-type polyQ on protein interactions. PMID:22287626

  2. RAID v2.0: an updated resource of RNA-associated interactions across organisms

    PubMed Central

    Yi, Ying; Zhao, Yue; Li, Chunhua; Zhang, Lin; Huang, Huiying; Li, Yana; Liu, Lanlan; Hou, Ping; Cui, Tianyu; Tan, Puwen; Hu, Yongfei; Zhang, Ting; Huang, Yan; Li, Xiaobo; Yu, Jia; Wang, Dong

    2017-01-01

    With the development of biotechnologies and computational prediction algorithms, the number of experimental and computational prediction RNA-associated interactions has grown rapidly in recent years. However, diverse RNA-associated interactions are scattered over a wide variety of resources and organisms, whereas a fully comprehensive view of diverse RNA-associated interactions is still not available for any species. Hence, we have updated the RAID database to version 2.0 (RAID v2.0, www.rna-society.org/raid/) by integrating experimental and computational prediction interactions from manually reading literature and other database resources under one common framework. The new developments in RAID v2.0 include (i) over 850-fold RNA-associated interactions, an enhancement compared to the previous version; (ii) numerous resources integrated with experimental or computational prediction evidence for each RNA-associated interaction; (iii) a reliability assessment for each RNA-associated interaction based on an integrative confidence score; and (iv) an increase of species coverage to 60. Consequently, RAID v2.0 recruits more than 5.27 million RNA-associated interactions, including more than 4 million RNA–RNA interactions and more than 1.2 million RNA–protein interactions, referring to nearly 130 000 RNA/protein symbols across 60 species. PMID:27899615

  3. Feature selection and classification of protein-protein complexes based on their binding affinities using machine learning approaches.

    PubMed

    Yugandhar, K; Gromiha, M Michael

    2014-09-01

    Protein-protein interactions are intrinsic to virtually every cellular process. Predicting the binding affinity of protein-protein complexes is one of the challenging problems in computational and molecular biology. In this work, we related sequence features of protein-protein complexes with their binding affinities using machine learning approaches. We set up a database of 185 protein-protein complexes for which the interacting pairs are heterodimers and their experimental binding affinities are available. On the other hand, we have developed a set of 610 features from the sequences of protein complexes and utilized Ranker search method, which is the combination of Attribute evaluator and Ranker method for selecting specific features. We have analyzed several machine learning algorithms to discriminate protein-protein complexes into high and low affinity groups based on their Kd values. Our results showed a 10-fold cross-validation accuracy of 76.1% with the combination of nine features using support vector machines. Further, we observed accuracy of 83.3% on an independent test set of 30 complexes. We suggest that our method would serve as an effective tool for identifying the interacting partners in protein-protein interaction networks and human-pathogen interactions based on the strength of interactions. © 2014 Wiley Periodicals, Inc.

  4. The structure and dipole moment of globular proteins in solution and crystalline states: use of NMR and X-ray databases for the numerical calculation of dipole moment.

    PubMed

    Takashima, S

    2001-04-05

    The large dipole moment of globular proteins has been well known because of the detailed studies using dielectric relaxation and electro-optical methods. The search for the origin of these dipolemoments, however, must be based on the detailed knowledge on protein structure with atomic resolutions. At present, we have two sources of information on the structure of protein molecules: (1) x-ray databases obtained in crystalline state; (2) NMR databases obtained in solution state. While x-ray databases consist of only one model, NMR databases, because of the fluctuation of the protein folding in solution, consist of a number of models, thus enabling the computation of dipole moment repeated for all these models. The aim of this work, using these databases, is the detailed investigation on the interdependence between the structure and dipole moment of protein molecules. The dipole moment of protein molecules has roughly two components: one dipole moment is due to surface charges and the other, core dipole moment, is due to polar groups such as N--H and C==O bonds. The computation of surface charge dipole moment consists of two steps: (A) calculation of the pK shifts of charged groups for electrostatic interactions and (B) calculation of the dipole moment using the pK corrected for electrostatic shifts. The dipole moments of several proteins were computed using both NMR and x-ray databases. The dipole moments of these two sets of calculations are, with a few exceptions, in good agreement with one another and also with measured dipole moments.

  5. RAID: a comprehensive resource for human RNA-associated (RNA-RNA/RNA-protein) interaction.

    PubMed

    Zhang, Xiaomeng; Wu, Deng; Chen, Liqun; Li, Xiang; Yang, Jinxurong; Fan, Dandan; Dong, Tingting; Liu, Mingyue; Tan, Puwen; Xu, Jintian; Yi, Ying; Wang, Yuting; Zou, Hua; Hu, Yongfei; Fan, Kaili; Kang, Juanjuan; Huang, Yan; Miao, Zhengqiang; Bi, Miaoman; Jin, Nana; Li, Kongning; Li, Xia; Xu, Jianzhen; Wang, Dong

    2014-07-01

    Transcriptomic analyses have revealed an unexpected complexity in the eukaryote transcriptome, which includes not only protein-coding transcripts but also an expanding catalog of noncoding RNAs (ncRNAs). Diverse coding and noncoding RNAs (ncRNAs) perform functions through interaction with each other in various cellular processes. In this project, we have developed RAID (http://www.rna-society.org/raid), an RNA-associated (RNA-RNA/RNA-protein) interaction database. RAID intends to provide the scientific community with all-in-one resources for efficient browsing and extraction of the RNA-associated interactions in human. This version of RAID contains more than 6100 RNA-associated interactions obtained by manually reviewing more than 2100 published papers, including 4493 RNA-RNA interactions and 1619 RNA-protein interactions. Each entry contains detailed information on an RNA-associated interaction, including RAID ID, RNA/protein symbol, RNA/protein categories, validated method, expressing tissue, literature references (Pubmed IDs), and detailed functional description. Users can query, browse, analyze, and manipulate RNA-associated (RNA-RNA/RNA-protein) interaction. RAID provides a comprehensive resource of human RNA-associated (RNA-RNA/RNA-protein) interaction network. Furthermore, this resource will help in uncovering the generic organizing principles of cellular function network. © 2014 Zhang et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society.

  6. Identification of compound-protein interactions through the analysis of gene ontology, KEGG enrichment for proteins and molecular fragments of compounds.

    PubMed

    Chen, Lei; Zhang, Yu-Hang; Zheng, Mingyue; Huang, Tao; Cai, Yu-Dong

    2016-12-01

    Compound-protein interactions play important roles in every cell via the recognition and regulation of specific functional proteins. The correct identification of compound-protein interactions can lead to a good comprehension of this complicated system and provide useful input for the investigation of various attributes of compounds and proteins. In this study, we attempted to understand this system by extracting properties from both proteins and compounds, in which proteins were represented by gene ontology and KEGG pathway enrichment scores and compounds were represented by molecular fragments. Advanced feature selection methods, including minimum redundancy maximum relevance, incremental feature selection, and the basic machine learning algorithm random forest, were used to analyze these properties and extract core factors for the determination of actual compound-protein interactions. Compound-protein interactions reported in The Binding Databases were used as positive samples. To improve the reliability of the results, the analytic procedure was executed five times using different negative samples. Simultaneously, five optimal prediction methods based on a random forest and yielding maximum MCCs of approximately 77.55 % were constructed and may be useful tools for the prediction of compound-protein interactions. This work provides new clues to understanding the system of compound-protein interactions by analyzing extracted core features. Our results indicate that compound-protein interactions are related to biological processes involving immune, developmental and hormone-associated pathways.

  7. PLI: a web-based tool for the comparison of protein-ligand interactions observed on PDB structures.

    PubMed

    Gallina, Anna Maria; Bisignano, Paola; Bergamino, Maurizio; Bordo, Domenico

    2013-02-01

    A large fraction of the entries contained in the Protein Data Bank describe proteins in complex with low molecular weight molecules such as physiological compounds or synthetic drugs. In many cases, the same molecule is found in distinct protein-ligand complexes. There is an increasing interest in Medicinal Chemistry in comparing protein binding sites to get insight on interactions that modulate the binding specificity, as this structural information can be correlated with other experimental data of biochemical or physiological nature and may help in rational drug design. The web service protein-ligand interaction presented here provides a tool to analyse and compare the binding pockets of homologous proteins in complex with a selected ligand. The information is deduced from protein-ligand complexes present in the Protein Data Bank and stored in the underlying database. Freely accessible at http://bioinformatics.istge.it/pli/.

  8. Database resources of the National Center for Biotechnology Information.

    PubMed

    2016-01-04

    The National Center for Biotechnology Information (NCBI) provides a large suite of online resources for biological information and data, including the GenBank(®) nucleic acid sequence database and the PubMed database of citations and abstracts for published life science journals. Additional NCBI resources focus on literature (PubMed Central (PMC), Bookshelf and PubReader), health (ClinVar, dbGaP, dbMHC, the Genetic Testing Registry, HIV-1/Human Protein Interaction Database and MedGen), genomes (BioProject, Assembly, Genome, BioSample, dbSNP, dbVar, Epigenomics, the Map Viewer, Nucleotide, Probe, RefSeq, Sequence Read Archive, the Taxonomy Browser and the Trace Archive), genes (Gene, Gene Expression Omnibus (GEO), HomoloGene, PopSet and UniGene), proteins (Protein, the Conserved Domain Database (CDD), COBALT, Conserved Domain Architecture Retrieval Tool (CDART), the Molecular Modeling Database (MMDB) and Protein Clusters) and chemicals (Biosystems and the PubChem suite of small molecule databases). The Entrez system provides search and retrieval operations for most of these databases. Augmenting many of the web applications are custom implementations of the BLAST program optimized to search specialized datasets. All of these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov. Published by Oxford University Press on behalf of Nucleic Acids Research 2015. This work is written by (a) US Government employee(s) and is in the public domain in the US.

  9. Database resources of the National Center for Biotechnology Information.

    PubMed

    2015-01-01

    The National Center for Biotechnology Information (NCBI) provides a large suite of online resources for biological information and data, including the GenBank(®) nucleic acid sequence database and the PubMed database of citations and abstracts for published life science journals. Additional NCBI resources focus on literature (Bookshelf, PubMed Central (PMC) and PubReader); medical genetics (ClinVar, dbMHC, the Genetic Testing Registry, HIV-1/Human Protein Interaction Database and MedGen); genes and genomics (BioProject, BioSample, dbSNP, dbVar, Epigenomics, Gene, Gene Expression Omnibus (GEO), Genome, HomoloGene, the Map Viewer, Nucleotide, PopSet, Probe, RefSeq, Sequence Read Archive, the Taxonomy Browser, Trace Archive and UniGene); and proteins and chemicals (Biosystems, COBALT, the Conserved Domain Database (CDD), the Conserved Domain Architecture Retrieval Tool (CDART), the Molecular Modeling Database (MMDB), Protein Clusters, Protein and the PubChem suite of small molecule databases). The Entrez system provides search and retrieval operations for many of these databases. Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of these resources can be accessed through the NCBI home page at http://www.ncbi.nlm.nih.gov. Published by Oxford University Press on behalf of Nucleic Acids Research 2014. This work is written by (a) US Government employee(s) and is in the public domain in the US.

  10. PodNet, a protein-protein interaction network of the podocyte.

    PubMed

    Warsow, Gregor; Endlich, Nicole; Schordan, Eric; Schordan, Sandra; Chilukoti, Ravi K; Homuth, Georg; Moeller, Marcus J; Fuellen, Georg; Endlich, Karlhans

    2013-07-01

    Interactions between proteins crucially determine cellular structure and function. Differential analysis of the interactome may help elucidate molecular mechanisms during disease development; however, this analysis necessitates mapping of expression data on protein-protein interaction networks. These networks do not exist for the podocyte; therefore, we built PodNet, a literature-based mouse podocyte network in Cytoscape format. Using database protein-protein interactions, we expanded PodNet to XPodNet with enhanced connectivity. In order to test the performance of XPodNet in differential interactome analysis, we examined podocyte developmental differentiation and the effect of cell culture. Transcriptomes of podocytes in 10 different states were mapped on XPodNet and analyzed with the Cytoscape plugin ExprEssence, based on the law of mass action. Interactions between slit diaphragm proteins are most significantly upregulated during podocyte development and most significantly downregulated in culture. On the other hand, our analysis revealed that interactions lost during podocyte differentiation are not regained in culture, suggesting a loss rather than a reversal of differentiation for podocytes in culture. Thus, we have developed PodNet as a valuable tool for differential interactome analysis in podocytes, and we have identified established and unexplored regulated interactions in developing and cultured podocytes.

  11. Prediction of host - pathogen protein interactions between Mycobacterium tuberculosis and Homo sapiens using sequence motifs.

    PubMed

    Huo, Tong; Liu, Wei; Guo, Yu; Yang, Cheng; Lin, Jianping; Rao, Zihe

    2015-03-26

    Emergence of multiple drug resistant strains of M. tuberculosis (MDR-TB) threatens to derail global efforts aimed at reigning in the pathogen. Co-infections of M. tuberculosis with HIV are difficult to treat. To counter these new challenges, it is essential to study the interactions between M. tuberculosis and the host to learn how these bacteria cause disease. We report a systematic flow to predict the host pathogen interactions (HPIs) between M. tuberculosis and Homo sapiens based on sequence motifs. First, protein sequences were used as initial input for identifying the HPIs by 'interolog' method. HPIs were further filtered by prediction of domain-domain interactions (DDIs). Functional annotations of protein and publicly available experimental results were applied to filter the remaining HPIs. Using such a strategy, 118 pairs of HPIs were identified, which involve 43 proteins from M. tuberculosis and 48 proteins from Homo sapiens. A biological interaction network between M. tuberculosis and Homo sapiens was then constructed using the predicted inter- and intra-species interactions based on the 118 pairs of HPIs. Finally, a web accessible database named PATH (Protein interactions of M. tuberculosis and Human) was constructed to store these predicted interactions and proteins. This interaction network will facilitate the research on host-pathogen protein-protein interactions, and may throw light on how M. tuberculosis interacts with its host.

  12. Ranking support vector machine for multiple kernels output combination in protein-protein interaction extraction from biomedical literature.

    PubMed

    Yang, Zhihao; Lin, Yuan; Wu, Jiajin; Tang, Nan; Lin, Hongfei; Li, Yanpeng

    2011-10-01

    Knowledge about protein-protein interactions (PPIs) unveils the molecular mechanisms of biological processes. However, the volume and content of published biomedical literature on protein interactions is expanding rapidly, making it increasingly difficult for interaction database curators to detect and curate protein interaction information manually. We present a multiple kernel learning-based approach for automatic PPI extraction from biomedical literature. The approach combines the following kernels: feature-based, tree, and graph and combines their output with Ranking support vector machine (SVM). Experimental evaluations show that the features in individual kernels are complementary and the kernel combined with Ranking SVM achieves better performance than those of the individual kernels, equal weight combination and optimal weight combination. Our approach can achieve state-of-the-art performance with respect to the comparable evaluations, with 64.88% F-score and 88.02% AUC on the AImed corpus. Copyright © 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  13. Reconstituting protein interaction networks using parameter-dependent domain-domain interactions

    PubMed Central

    2013-01-01

    Background We can describe protein-protein interactions (PPIs) as sets of distinct domain-domain interactions (DDIs) that mediate the physical interactions between proteins. Experimental data confirm that DDIs are more consistent than their corresponding PPIs, lending support to the notion that analyses of DDIs may improve our understanding of PPIs and lead to further insights into cellular function, disease, and evolution. However, currently available experimental DDI data cover only a small fraction of all existing PPIs and, in the absence of structural data, determining which particular DDI mediates any given PPI is a challenge. Results We present two contributions to the field of domain interaction analysis. First, we introduce a novel computational strategy to merge domain annotation data from multiple databases. We show that when we merged yeast domain annotations from six annotation databases we increased the average number of domains per protein from 1.05 to 2.44, bringing it closer to the estimated average value of 3. Second, we introduce a novel computational method, parameter-dependent DDI selection (PADDS), which, given a set of PPIs, extracts a small set of domain pairs that can reconstruct the original set of protein interactions, while attempting to minimize false positives. Based on a set of PPIs from multiple organisms, our method extracted 27% more experimentally detected DDIs than existing computational approaches. Conclusions We have provided a method to merge domain annotation data from multiple sources, ensuring large and consistent domain annotation for any given organism. Moreover, we provided a method to extract a small set of DDIs from the underlying set of PPIs and we showed that, in contrast to existing approaches, our method was not biased towards DDIs with low or high occurrence counts. Finally, we used these two methods to highlight the influence of the underlying annotation density on the characteristics of extracted DDIs. Although increased annotations greatly expanded the possible DDIs, the lack of knowledge of the true biological false positive interactions still prevents an unambiguous assignment of domain interactions responsible for all protein network interactions. Executable files and examples are given at: http://www.bhsai.org/downloads/padds/ PMID:23651452

  14. CellMap visualizes protein-protein interactions and subcellular localization

    PubMed Central

    Dallago, Christian; Goldberg, Tatyana; Andrade-Navarro, Miguel Angel; Alanis-Lobato, Gregorio; Rost, Burkhard

    2018-01-01

    Many tools visualize protein-protein interaction (PPI) networks. The tool introduced here, CellMap, adds one crucial novelty by visualizing PPI networks in the context of subcellular localization, i.e. the location in the cell or cellular component in which a PPI happens. Users can upload images of cells and define areas of interest against which PPIs for selected proteins are displayed (by default on a cartoon of a cell). Annotations of localization are provided by the user or through our in-house database. The visualizer and server are written in JavaScript, making CellMap easy to customize and to extend by researchers and developers. PMID:29497493

  15. PlaMoM: a comprehensive database compiles plant mobile macromolecules.

    PubMed

    Guan, Daogang; Yan, Bin; Thieme, Christoph; Hua, Jingmin; Zhu, Hailong; Boheler, Kenneth R; Zhao, Zhongying; Kragler, Friedrich; Xia, Yiji; Zhang, Shoudong

    2017-01-04

    In plants, various phloem-mobile macromolecules including noncoding RNAs, mRNAs and proteins are suggested to act as important long-distance signals in regulating crucial physiological and morphological transition processes such as flowering, plant growth and stress responses. Given recent advances in high-throughput sequencing technologies, numerous mobile macromolecules have been identified in diverse plant species from different plant families. However, most of the identified mobile macromolecules are not annotated in current versions of species-specific databases and are only available as non-searchable datasheets. To facilitate study of the mobile signaling macromolecules, we compiled the PlaMoM (Plant Mobile Macromolecules) database, a resource that provides convenient and interactive search tools allowing users to retrieve, to analyze and also to predict mobile RNAs/proteins. Each entry in the PlaMoM contains detailed information such as nucleotide/amino acid sequences, ortholog partners, related experiments, gene functions and literature. For the model plant Arabidopsis thaliana, protein-protein interactions of mobile transcripts are presented as interactive molecular networks. Furthermore, PlaMoM provides a built-in tool to identify potential RNA mobility signals such as tRNA-like structures. The current version of PlaMoM compiles a total of 17 991 mobile macromolecules from 14 plant species/ecotypes from published data and literature. PlaMoM is available at http://www.systembioinfo.org/plamom/. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  16. Databases for Microbiologists

    DOE PAGES

    Zhulin, Igor B.

    2015-05-26

    Databases play an increasingly important role in biology. They archive, store, maintain, and share information on genes, genomes, expression data, protein sequences and structures, metabolites and reactions, interactions, and pathways. All these data are critically important to microbiologists. Furthermore, microbiology has its own databases that deal with model microorganisms, microbial diversity, physiology, and pathogenesis. Thousands of biological databases are currently available, and it becomes increasingly difficult to keep up with their development. Finally, the purpose of this minireview is to provide a brief survey of current databases that are of interest to microbiologists.

  17. Databases for Microbiologists

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhulin, Igor B.

    Databases play an increasingly important role in biology. They archive, store, maintain, and share information on genes, genomes, expression data, protein sequences and structures, metabolites and reactions, interactions, and pathways. All these data are critically important to microbiologists. Furthermore, microbiology has its own databases that deal with model microorganisms, microbial diversity, physiology, and pathogenesis. Thousands of biological databases are currently available, and it becomes increasingly difficult to keep up with their development. Finally, the purpose of this minireview is to provide a brief survey of current databases that are of interest to microbiologists.

  18. Databases for Microbiologists

    PubMed Central

    2015-01-01

    Databases play an increasingly important role in biology. They archive, store, maintain, and share information on genes, genomes, expression data, protein sequences and structures, metabolites and reactions, interactions, and pathways. All these data are critically important to microbiologists. Furthermore, microbiology has its own databases that deal with model microorganisms, microbial diversity, physiology, and pathogenesis. Thousands of biological databases are currently available, and it becomes increasingly difficult to keep up with their development. The purpose of this minireview is to provide a brief survey of current databases that are of interest to microbiologists. PMID:26013493

  19. INTERSPIA: a web application for exploring the dynamics of protein-protein interactions among multiple species.

    PubMed

    Kwon, Daehong; Lee, Daehwan; Kim, Juyeon; Lee, Jongin; Sim, Mikang; Kim, Jaebum

    2018-05-09

    Proteins perform biological functions through cascading interactions with each other by forming protein complexes. As a result, interactions among proteins, called protein-protein interactions (PPIs) are not completely free from selection constraint during evolution. Therefore, the identification and analysis of PPI changes during evolution can give us new insight into the evolution of functions. Although many algorithms, databases and websites have been developed to help the study of PPIs, most of them are limited to visualize the structure and features of PPIs in a chosen single species with limited functions in the visualization perspective. This leads to difficulties in the identification of different patterns of PPIs in different species and their functional consequences. To resolve these issues, we developed a web application, called INTER-Species Protein Interaction Analysis (INTERSPIA). Given a set of proteins of user's interest, INTERSPIA first discovers additional proteins that are functionally associated with the input proteins and searches for different patterns of PPIs in multiple species through a server-side pipeline, and second visualizes the dynamics of PPIs in multiple species using an easy-to-use web interface. INTERSPIA is freely available at http://bioinfo.konkuk.ac.kr/INTERSPIA/.

  20. FunSimMat: a comprehensive functional similarity database

    PubMed Central

    Schlicker, Andreas; Albrecht, Mario

    2008-01-01

    Functional similarity based on Gene Ontology (GO) annotation is used in diverse applications like gene clustering, gene expression data analysis, protein interaction prediction and evaluation. However, there exists no comprehensive resource of functional similarity values although such a database would facilitate the use of functional similarity measures in different applications. Here, we describe FunSimMat (Functional Similarity Matrix, http://funsimmat.bioinf.mpi-inf.mpg.de/), a large new database that provides several different semantic similarity measures for GO terms. It offers various precomputed functional similarity values for proteins contained in UniProtKB and for protein families in Pfam and SMART. The web interface allows users to efficiently perform both semantic similarity searches with GO terms and functional similarity searches with proteins or protein families. All results can be downloaded in tab-delimited files for use with other tools. An additional XML–RPC interface gives automatic online access to FunSimMat for programs and remote services. PMID:17932054

  1. SCOWLP classification: Structural comparison and analysis of protein binding regions

    PubMed Central

    Teyra, Joan; Paszkowski-Rogacz, Maciej; Anders, Gerd; Pisabarro, M Teresa

    2008-01-01

    Background Detailed information about protein interactions is critical for our understanding of the principles governing protein recognition mechanisms. The structures of many proteins have been experimentally determined in complex with different ligands bound either in the same or different binding regions. Thus, the structural interactome requires the development of tools to classify protein binding regions. A proper classification may provide a general view of the regions that a protein uses to bind others and also facilitate a detailed comparative analysis of the interacting information for specific protein binding regions at atomic level. Such classification might be of potential use for deciphering protein interaction networks, understanding protein function, rational engineering and design. Description Protein binding regions (PBRs) might be ideally described as well-defined separated regions that share no interacting residues one another. However, PBRs are often irregular, discontinuous and can share a wide range of interacting residues among them. The criteria to define an individual binding region can be often arbitrary and may differ from other binding regions within a protein family. Therefore, the rational behind protein interface classification should aim to fulfil the requirements of the analysis to be performed. We extract detailed interaction information of protein domains, peptides and interfacial solvent from the SCOWLP database and we classify the PBRs of each domain family. For this purpose, we define a similarity index based on the overlapping of interacting residues mapped in pair-wise structural alignments. We perform our classification with agglomerative hierarchical clustering using the complete-linkage method. Our classification is calculated at different similarity cut-offs to allow flexibility in the analysis of PBRs, feature especially interesting for those protein families with conflictive binding regions. The hierarchical classification of PBRs is implemented into the SCOWLP database and extends the SCOP classification with three additional family sub-levels: Binding Region, Interface and Contacting Domains. SCOWLP contains 9,334 binding regions distributed within 2,561 families. In 65% of the cases we observe families containing more than one binding region. Besides, 22% of the regions are forming complex with more than one different protein family. Conclusion The current SCOWLP classification and its web application represent a framework for the study of protein interfaces and comparative analysis of protein family binding regions. This comparison can be performed at atomic level and allows the user to study interactome conservation and variability. The new SCOWLP classification may be of great utility for reconstruction of protein complexes, understanding protein networks and ligand design. SCOWLP will be updated with every SCOP release. The web application is available at . PMID:18182098

  2. AFAL: a web service for profiling amino acids surrounding ligands in proteins

    NASA Astrophysics Data System (ADS)

    Arenas-Salinas, Mauricio; Ortega-Salazar, Samuel; Gonzales-Nilo, Fernando; Pohl, Ehmke; Holmes, David S.; Quatrini, Raquel

    2014-11-01

    With advancements in crystallographic technology and the increasing wealth of information populating structural databases, there is an increasing need for prediction tools based on spatial information that will support the characterization of proteins and protein-ligand interactions. Herein, a new web service is presented termed amino acid frequency around ligand (AFAL) for determining amino acids type and frequencies surrounding ligands within proteins deposited in the Protein Data Bank and for assessing the atoms and atom-ligand distances involved in each interaction (availability: http://structuralbio.utalca.cl/AFAL/index.html). AFAL allows the user to define a wide variety of filtering criteria (protein family, source organism, resolution, sequence redundancy and distance) in order to uncover trends and evolutionary differences in amino acid preferences that define interactions with particular ligands. Results obtained from AFAL provide valuable statistical information about amino acids that may be responsible for establishing particular ligand-protein interactions. The analysis will enable investigators to compare ligand-binding sites of different proteins and to uncover general as well as specific interaction patterns from existing data. Such patterns can be used subsequently to predict ligand binding in proteins that currently have no structural information and to refine the interpretation of existing protein models. The application of AFAL is illustrated by the analysis of proteins interacting with adenosine-5'-triphosphate.

  3. AFAL: a web service for profiling amino acids surrounding ligands in proteins.

    PubMed

    Arenas-Salinas, Mauricio; Ortega-Salazar, Samuel; Gonzales-Nilo, Fernando; Pohl, Ehmke; Holmes, David S; Quatrini, Raquel

    2014-11-01

    With advancements in crystallographic technology and the increasing wealth of information populating structural databases, there is an increasing need for prediction tools based on spatial information that will support the characterization of proteins and protein-ligand interactions. Herein, a new web service is presented termed amino acid frequency around ligand (AFAL) for determining amino acids type and frequencies surrounding ligands within proteins deposited in the Protein Data Bank and for assessing the atoms and atom-ligand distances involved in each interaction (availability: http://structuralbio.utalca.cl/AFAL/index.html ). AFAL allows the user to define a wide variety of filtering criteria (protein family, source organism, resolution, sequence redundancy and distance) in order to uncover trends and evolutionary differences in amino acid preferences that define interactions with particular ligands. Results obtained from AFAL provide valuable statistical information about amino acids that may be responsible for establishing particular ligand-protein interactions. The analysis will enable investigators to compare ligand-binding sites of different proteins and to uncover general as well as specific interaction patterns from existing data. Such patterns can be used subsequently to predict ligand binding in proteins that currently have no structural information and to refine the interpretation of existing protein models. The application of AFAL is illustrated by the analysis of proteins interacting with adenosine-5'-triphosphate.

  4. A Small-molecule Inhibitor, 5′-O-Tritylthymidine, targets FAK and Mdm-2 Interaction, and Blocks Breast and Colon Tumorigenesis in vivo

    PubMed Central

    Golubovskaya, Vita; Palma, Nadia L.; Zheng, Min; Ho, Baotran; Magis, Andrew; Ostrov, David; Cance, William G.

    2013-01-01

    Focal Adhesion Kinase (FAK) is overexpressed in many types of tumors and plays an important role in survival. We developed a novel approach, targeting FAK-protein interactions by computer modeling and screening of NCI small molecule drug database. In this report we targeted FAK and Mdm-2 protein interaction to decrease tumor growth. By macromolecular modeling we found a model of FAK and Mdm-2 interaction and performed screening of >200,000 small molecule compounds from NCI database with drug-like characteristics, targeting the FAK-Mdm-2 interaction. We identified 5′-O-Tritylthymidine, called M13 compound that significantly decreased viability in different cancer cells. M13 was docked into the pocket of FAK and Mdm-2 interaction and was directly bound to the FAK-N terminal domain by ForteBio Octet assay. In addition, M13 compound affected FAK and Mdm-2 levels and decreased complex of FAK and Mdm-2 proteins in breast and colon cancer cells. M13 re-activated p53 activity inhibited by FAK with Mdm-2 promoter. M13 decreased viability, clonogenicity, increased detachment and apoptosis in a dose-dependent manner in BT474 breast and in HCT116 colon cancer cells in vitro. M13 decreased FAK, activated p53 and caspase-8 in both cell lines. In addition, M13 decreased breast and colon tumor growth in vivo. M13 activated p53 and decreased FAK in tumor samples consistent with decreased tumor growth. The data demonstrate a novel approach for targeting FAK and Mdm-2 protein interaction, provide a model of FAK and Mdm-2 interaction, identify M13 compound targeting this interaction and decreasing tumor growth that is critical for future targeted therapeutics. PMID:22292771

  5. Reconstruction of a Functional Human Gene Network, with an Application for Prioritizing Positional Candidate Genes

    PubMed Central

    Franke, Lude; Bakel, Harm van; Fokkens, Like; de Jong, Edwin D.; Egmont-Petersen, Michael; Wijmenga, Cisca

    2006-01-01

    Most common genetic disorders have a complex inheritance and may result from variants in many genes, each contributing only weak effects to the disease. Pinpointing these disease genes within the myriad of susceptibility loci identified in linkage studies is difficult because these loci may contain hundreds of genes. However, in any disorder, most of the disease genes will be involved in only a few different molecular pathways. If we know something about the relationships between the genes, we can assess whether some genes (which may reside in different loci) functionally interact with each other, indicating a joint basis for the disease etiology. There are various repositories of information on pathway relationships. To consolidate this information, we developed a functional human gene network that integrates information on genes and the functional relationships between genes, based on data from the Kyoto Encyclopedia of Genes and Genomes, the Biomolecular Interaction Network Database, Reactome, the Human Protein Reference Database, the Gene Ontology database, predicted protein-protein interactions, human yeast two-hybrid interactions, and microarray coexpressions. We applied this network to interrelate positional candidate genes from different disease loci and then tested 96 heritable disorders for which the Online Mendelian Inheritance in Man database reported at least three disease genes. Artificial susceptibility loci, each containing 100 genes, were constructed around each disease gene, and we used the network to rank these genes on the basis of their functional interactions. By following up the top five genes per artificial locus, we were able to detect at least one known disease gene in 54% of the loci studied, representing a 2.8-fold increase over random selection. This suggests that our method can significantly reduce the cost and effort of pinpointing true disease genes in analyses of disorders for which numerous loci have been reported but for which most of the genes are unknown. PMID:16685651

  6. The 24th annual Nucleic Acids Research database issue: a look back and upcoming changes

    PubMed Central

    Rigden, Daniel J

    2017-01-01

    Abstract This year's Database Issue of Nucleic Acids Research contains 152 papers that include descriptions of 54 new databases and update papers on 98 databases, of which 16 have not been previously featured in NAR. As always, these databases cover a broad range of molecular biology subjects, including genome structure, gene expression and its regulation, proteins, protein domains, and protein–protein interactions. Following the recent trend, an increasing number of new and established databases deal with the issues of human health, from cancer-causing mutations to drugs and drug targets. In accordance with this trend, three recently compiled databases that have been selected by NAR reviewers and editors as ‘breakthrough’ contributions, denovo-db, the Monarch Initiative, and Open Targets, cover human de novo gene variants, disease-related phenotypes in model organisms, and a bioinformatics platform for therapeutic target identification and validation, respectively. We expect these databases to attract the attention of numerous researchers working in various areas of genetics and genomics. Looking back at the past 12 years, we present here the ‘golden set’ of databases that have consistently served as authoritative, comprehensive, and convenient data resources widely used by the entire community and offer some lessons on what makes a successful database. The Database Issue is freely available online at the https://academic.oup.com/nar web site. An updated version of the NAR Molecular Biology Database Collection is available at http://www.oxfordjournals.org/nar/database/a/. PMID:28053160

  7. RAID v2.0: an updated resource of RNA-associated interactions across organisms.

    PubMed

    Yi, Ying; Zhao, Yue; Li, Chunhua; Zhang, Lin; Huang, Huiying; Li, Yana; Liu, Lanlan; Hou, Ping; Cui, Tianyu; Tan, Puwen; Hu, Yongfei; Zhang, Ting; Huang, Yan; Li, Xiaobo; Yu, Jia; Wang, Dong

    2017-01-04

    With the development of biotechnologies and computational prediction algorithms, the number of experimental and computational prediction RNA-associated interactions has grown rapidly in recent years. However, diverse RNA-associated interactions are scattered over a wide variety of resources and organisms, whereas a fully comprehensive view of diverse RNA-associated interactions is still not available for any species. Hence, we have updated the RAID database to version 2.0 (RAID v2.0, www.rna-society.org/raid/) by integrating experimental and computational prediction interactions from manually reading literature and other database resources under one common framework. The new developments in RAID v2.0 include (i) over 850-fold RNA-associated interactions, an enhancement compared to the previous version; (ii) numerous resources integrated with experimental or computational prediction evidence for each RNA-associated interaction; (iii) a reliability assessment for each RNA-associated interaction based on an integrative confidence score; and (iv) an increase of species coverage to 60. Consequently, RAID v2.0 recruits more than 5.27 million RNA-associated interactions, including more than 4 million RNA-RNA interactions and more than 1.2 million RNA-protein interactions, referring to nearly 130 000 RNA/protein symbols across 60 species. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  8. Properties of polyproline II, a secondary structure element implicated in protein-protein interactions.

    PubMed

    Cubellis, M V; Caillez, F; Blundell, T L; Lovell, S C

    2005-03-01

    The polyproline II (PPII) conformation of protein backbone is an important secondary structure type. It is unusual in that, due to steric constraints, its main-chain hydrogen-bond donors and acceptors cannot easily be satisfied. It is unable to make local hydrogen bonds, in a manner similar to that of alpha-helices, and it cannot easily satisfy the hydrogen-bonding potential of neighboring residues in polyproline conformation in a manner analogous to beta-strands. Here we describe an analysis of polyproline conformations using the HOMSTRAD database of structurally aligned proteins. This allows us not only to determine amino acid propensities from a much larger database than previously but also to investigate conservation of amino acids in polyproline conformations, and the conservation of the conformation itself. Although proline is common in polyproline helices, helices without proline represent 46% of the total. No other amino acid appears to be greatly preferred; glycine and aromatic amino acids have low propensities for PPII. Accordingly, the hydrogen-bonding potential of PPII main-chain is mainly satisfied by water molecules and by other parts of the main-chain. Side-chain to main-chain interactions are mostly nonlocal. Interestingly, the increased number of nonsatisfied H-bond donors and acceptors (as compared with alpha-helices and beta-strands) makes PPII conformers well suited to take part in protein-protein interactions. Copyright 2005 Wiley-Liss, Inc.

  9. Identifying essential proteins based on sub-network partition and prioritization by integrating subcellular localization information.

    PubMed

    Li, Min; Li, Wenkai; Wu, Fang-Xiang; Pan, Yi; Wang, Jianxin

    2018-06-14

    Essential proteins are important participants in various life activities and play a vital role in the survival and reproduction of living organisms. Identification of essential proteins from protein-protein interaction (PPI) networks has great significance to facilitate the study of human complex diseases, the design of drugs and the development of bioinformatics and computational science. Studies have shown that highly connected proteins in a PPI network tend to be essential. A series of computational methods have been proposed to identify essential proteins by analyzing topological structures of PPI networks. However, the high noise in the PPI data can degrade the accuracy of essential protein prediction. Moreover, proteins must be located in the appropriate subcellular localization to perform their functions, and only when the proteins are located in the same subcellular localization, it is possible that they can interact with each other. In this paper, we propose a new network-based essential protein discovery method based on sub-network partition and prioritization by integrating subcellular localization information, named SPP. The proposed method SPP was tested on two different yeast PPI networks obtained from DIP database and BioGRID database. The experimental results show that SPP can effectively reduce the effect of false positives in PPI networks and predict essential proteins more accurately compared with other existing computational methods DC, BC, CC, SC, EC, IC, NC. Copyright © 2018 Elsevier Ltd. All rights reserved.

  10. AntiJen: a quantitative immunology database integrating functional, thermodynamic, kinetic, biophysical, and cellular data

    PubMed Central

    Toseland, Christopher P; Clayton, Debra J; McSparron, Helen; Hemsley, Shelley L; Blythe, Martin J; Paine, Kelly; Doytchinova, Irini A; Guan, Pingping; Hattotuwagama, Channa K; Flower, Darren R

    2005-01-01

    AntiJen is a database system focused on the integration of kinetic, thermodynamic, functional, and cellular data within the context of immunology and vaccinology. Compared to its progenitor JenPep, the interface has been completely rewritten and redesigned and now offers a wider variety of search methods, including a nucleotide and a peptide BLAST search. In terms of data archived, AntiJen has a richer and more complete breadth, depth, and scope, and this has seen the database increase to over 31,000 entries. AntiJen provides the most complete and up-to-date dataset of its kind. While AntiJen v2.0 retains a focus on both T cell and B cell epitopes, its greatest novelty is the archiving of continuous quantitative data on a variety of immunological molecular interactions. This includes thermodynamic and kinetic measures of peptide binding to TAP and the Major Histocompatibility Complex (MHC), peptide-MHC complexes binding to T cell receptors, antibodies binding to protein antigens and general immunological protein-protein interactions. The database also contains quantitative specificity data from position-specific peptide libraries and biophysical data, in the form of diffusion co-efficients and cell surface copy numbers, on MHCs and other immunological molecules. The uses of AntiJen include the design of vaccines and diagnostics, such as tetramers, and other laboratory reagents, as well as helping parameterize the bioinformatic or mathematical in silico modeling of the immune system. The database is accessible from the URL: . PMID:16305757

  11. Topology and weights in a protein domain interaction network--a novel way to predict protein interactions.

    PubMed

    Wuchty, Stefan

    2006-05-23

    While the analysis of unweighted biological webs as diverse as genetic, protein and metabolic networks allowed spectacular insights in the inner workings of a cell, biological networks are not only determined by their static grid of links. In fact, we expect that the heterogeneity in the utilization of connections has a major impact on the organization of cellular activities as well. We consider a web of interactions between protein domains of the Protein Family database (PFAM), which are weighted by a probability score. We apply metrics that combine the static layout and the weights of the underlying interactions. We observe that unweighted measures as well as their weighted counterparts largely share the same trends in the underlying domain interaction network. However, we only find weak signals that weights and the static grid of interactions are connected entities. Therefore assuming that a protein interaction is governed by a single domain interaction, we observe strong and significant correlations of the highest scoring domain interaction and the confidence of protein interactions in the underlying interactions of yeast and fly. Modeling an interaction between proteins if we find a high scoring protein domain interaction we obtain 1, 428 protein interactions among 361 proteins in the human malaria parasite Plasmodium falciparum. Assessing their quality by a logistic regression method we observe that increasing confidence of predicted interactions is accompanied by high scoring domain interactions and elevated levels of functional similarity and evolutionary conservation. Our results indicate that probability scores are randomly distributed, allowing to treat static grid and weights of domain interactions as separate entities. In particular, these finding confirms earlier observations that a protein interaction is a matter of a single interaction event on domain level. As an immediate application, we show a simple way to predict potential protein interactions by utilizing expectation scores of single domain interactions.

  12. UniPROBE, update 2015: new tools and content for the online database of protein-binding microarray data on protein-DNA interactions.

    PubMed

    Hume, Maxwell A; Barrera, Luis A; Gisselbrecht, Stephen S; Bulyk, Martha L

    2015-01-01

    The Universal PBM Resource for Oligonucleotide Binding Evaluation (UniPROBE) serves as a convenient source of information on published data generated using universal protein-binding microarray (PBM) technology, which provides in vitro data about the relative DNA-binding preferences of transcription factors for all possible sequence variants of a length k ('k-mers'). The database displays important information about the proteins and displays their DNA-binding specificity data in terms of k-mers, position weight matrices and graphical sequence logos. This update to the database documents the growth of UniPROBE since the last update 4 years ago, and introduces a variety of new features and tools, including a new streamlined pipeline that facilitates data deposition by universal PBM data generators in the research community, a tool that generates putative nonbinding (i.e. negative control) DNA sequences for one or more proteins and novel motifs obtained by analyzing the PBM data using the BEEML-PBM algorithm for motif inference. The UniPROBE database is available at http://uniprobe.org. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  13. KEGG orthology-based annotation of the predicted proteome of Acropora digitifera: ZoophyteBase - an open access and searchable database of a coral genome

    PubMed Central

    2013-01-01

    Background Contemporary coral reef research has firmly established that a genomic approach is urgently needed to better understand the effects of anthropogenic environmental stress and global climate change on coral holobiont interactions. Here we present KEGG orthology-based annotation of the complete genome sequence of the scleractinian coral Acropora digitifera and provide the first comprehensive view of the genome of a reef-building coral by applying advanced bioinformatics. Description Sequences from the KEGG database of protein function were used to construct hidden Markov models. These models were used to search the predicted proteome of A. digitifera to establish complete genomic annotation. The annotated dataset is published in ZoophyteBase, an open access format with different options for searching the data. A particularly useful feature is the ability to use a Google-like search engine that links query words to protein attributes. We present features of the annotation that underpin the molecular structure of key processes of coral physiology that include (1) regulatory proteins of symbiosis, (2) planula and early developmental proteins, (3) neural messengers, receptors and sensory proteins, (4) calcification and Ca2+-signalling proteins, (5) plant-derived proteins, (6) proteins of nitrogen metabolism, (7) DNA repair proteins, (8) stress response proteins, (9) antioxidant and redox-protective proteins, (10) proteins of cellular apoptosis, (11) microbial symbioses and pathogenicity proteins, (12) proteins of viral pathogenicity, (13) toxins and venom, (14) proteins of the chemical defensome and (15) coral epigenetics. Conclusions We advocate that providing annotation in an open-access searchable database available to the public domain will give an unprecedented foundation to interrogate the fundamental molecular structure and interactions of coral symbiosis and allow critical questions to be addressed at the genomic level based on combined aspects of evolutionary, developmental, metabolic, and environmental perspectives. PMID:23889801

  14. Modeling and simulating networks of interdependent protein interactions.

    PubMed

    Stöcker, Bianca K; Köster, Johannes; Zamir, Eli; Rahmann, Sven

    2018-05-21

    Protein interactions are fundamental building blocks of biochemical reaction systems underlying cellular functions. The complexity and functionality of these systems emerge not only from the protein interactions themselves but also from the dependencies between these interactions, as generated by allosteric effects or mutual exclusion due to steric hindrance. Therefore, formal models for integrating and utilizing information about interaction dependencies are of high interest. Here, we describe an approach for endowing protein networks with interaction dependencies using propositional logic, thereby obtaining constrained protein interaction networks ("constrained networks"). The construction of these networks is based on public interaction databases as well as text-mined information about interaction dependencies. We present an efficient data structure and algorithm to simulate protein complex formation in constrained networks. The efficiency of the model allows fast simulation and facilitates the analysis of many proteins in large networks. In addition, this approach enables the simulation of perturbation effects, such as knockout of single or multiple proteins and changes of protein concentrations. We illustrate how our model can be used to analyze a constrained human adhesome protein network, which is responsible for the formation of diverse and dynamic cell-matrix adhesion sites. By comparing protein complex formation under known interaction dependencies versus without dependencies, we investigate how these dependencies shape the resulting repertoire of protein complexes. Furthermore, our model enables investigating how the interplay of network topology with interaction dependencies influences the propagation of perturbation effects across a large biochemical system. Our simulation software CPINSim (for Constrained Protein Interaction Network Simulator) is available under the MIT license at http://github.com/BiancaStoecker/cpinsim and as a Bioconda package (https://bioconda.github.io).

  15. PlaMoM: a comprehensive database compiles plant mobile macromolecules

    PubMed Central

    Guan, Daogang; Yan, Bin; Thieme, Christoph; Hua, Jingmin; Zhu, Hailong; Boheler, Kenneth R.; Zhao, Zhongying; Kragler, Friedrich; Xia, Yiji; Zhang, Shoudong

    2017-01-01

    In plants, various phloem-mobile macromolecules including noncoding RNAs, mRNAs and proteins are suggested to act as important long-distance signals in regulating crucial physiological and morphological transition processes such as flowering, plant growth and stress responses. Given recent advances in high-throughput sequencing technologies, numerous mobile macromolecules have been identified in diverse plant species from different plant families. However, most of the identified mobile macromolecules are not annotated in current versions of species-specific databases and are only available as non-searchable datasheets. To facilitate study of the mobile signaling macromolecules, we compiled the PlaMoM (Plant Mobile Macromolecules) database, a resource that provides convenient and interactive search tools allowing users to retrieve, to analyze and also to predict mobile RNAs/proteins. Each entry in the PlaMoM contains detailed information such as nucleotide/amino acid sequences, ortholog partners, related experiments, gene functions and literature. For the model plant Arabidopsis thaliana, protein–protein interactions of mobile transcripts are presented as interactive molecular networks. Furthermore, PlaMoM provides a built-in tool to identify potential RNA mobility signals such as tRNA-like structures. The current version of PlaMoM compiles a total of 17 991 mobile macromolecules from 14 plant species/ecotypes from published data and literature. PlaMoM is available at http://www.systembioinfo.org/plamom/. PMID:27924044

  16. ISAAC - InterSpecies Analysing Application using Containers.

    PubMed

    Baier, Herbert; Schultz, Jörg

    2014-01-15

    Information about genes, transcripts and proteins is spread over a wide variety of databases. Different tools have been developed using these databases to identify biological signals in gene lists from large scale analysis. Mostly, they search for enrichments of specific features. But, these tools do not allow an explorative walk through different views and to change the gene lists according to newly upcoming stories. To fill this niche, we have developed ISAAC, the InterSpecies Analysing Application using Containers. The central idea of this web based tool is to enable the analysis of sets of genes, transcripts and proteins under different biological viewpoints and to interactively modify these sets at any point of the analysis. Detailed history and snapshot information allows tracing each action. Furthermore, one can easily switch back to previous states and perform new analyses. Currently, sets can be viewed in the context of genomes, protein functions, protein interactions, pathways, regulation, diseases and drugs. Additionally, users can switch between species with an automatic, orthology based translation of existing gene sets. As todays research usually is performed in larger teams and consortia, ISAAC provides group based functionalities. Here, sets as well as results of analyses can be exchanged between members of groups. ISAAC fills the gap between primary databases and tools for the analysis of large gene lists. With its highly modular, JavaEE based design, the implementation of new modules is straight forward. Furthermore, ISAAC comes with an extensive web-based administration interface including tools for the integration of third party data. Thus, a local installation is easily feasible. In summary, ISAAC is tailor made for highly explorative interactive analyses of gene, transcript and protein sets in a collaborative environment.

  17. APID interactomes: providing proteome-based interactomes with controlled quality for multiple species and derived networks

    PubMed Central

    Alonso-López, Diego; Gutiérrez, Miguel A.; Lopes, Katia P.; Prieto, Carlos; Santamaría, Rodrigo; De Las Rivas, Javier

    2016-01-01

    APID (Agile Protein Interactomes DataServer) is an interactive web server that provides unified generation and delivery of protein interactomes mapped to their respective proteomes. This resource is a new, fully redesigned server that includes a comprehensive collection of protein interactomes for more than 400 organisms (25 of which include more than 500 interactions) produced by the integration of only experimentally validated protein–protein physical interactions. For each protein–protein interaction (PPI) the server includes currently reported information about its experimental validation to allow selection and filtering at different quality levels. As a whole, it provides easy access to the interactomes from specific species and includes a global uniform compendium of 90,379 distinct proteins and 678,441 singular interactions. APID integrates and unifies PPIs from major primary databases of molecular interactions, from other specific repositories and also from experimentally resolved 3D structures of protein complexes where more than two proteins were identified. For this purpose, a collection of 8,388 structures were analyzed to identify specific PPIs. APID also includes a new graph tool (based on Cytoscape.js) for visualization and interactive analyses of PPI networks. The server does not require registration and it is freely available for use at http://apid.dep.usal.es. PMID:27131791

  18. The salivary secretome of the tsetse fly Glossina pallidipes (Diptera: Glossinidae) infected by salivary gland hypertrophy virus.

    PubMed

    Kariithi, Henry M; Ince, Ikbal A; Boeren, Sjef; Abd-Alla, Adly M M; Parker, Andrew G; Aksoy, Serap; Vlak, Just M; Oers, Monique M van

    2011-11-01

    The competence of the tsetse fly Glossina pallidipes (Diptera; Glossinidae) to acquire salivary gland hypertrophy virus (SGHV), to support virus replication and successfully transmit the virus depends on complex interactions between Glossina and SGHV macromolecules. Critical requisites to SGHV transmission are its replication and secretion of mature virions into the fly's salivary gland (SG) lumen. However, secretion of host proteins is of equal importance for successful transmission and requires cataloging of G. pallidipes secretome proteins from hypertrophied and non-hypertrophied SGs. After electrophoretic profiling and in-gel trypsin digestion, saliva proteins were analyzed by nano-LC-MS/MS. MaxQuant/Andromeda search of the MS data against the non-redundant (nr) GenBank database and a G. morsitans morsitans SG EST database, yielded a total of 521 hits, 31 of which were SGHV-encoded. On a false discovery rate limit of 1% and detection threshold of least 2 unique peptides per protein, the analysis resulted in 292 Glossina and 25 SGHV MS-supported proteins. When annotated by the Blast2GO suite, at least one gene ontology (GO) term could be assigned to 89.9% (285/317) of the detected proteins. Five (∼1.8%) Glossina and three (∼12%) SGHV proteins remained without a predicted function after blast searches against the nr database. Sixty-five of the 292 detected Glossina proteins contained an N-terminal signal/secretion peptide sequence. Eight of the SGHV proteins were predicted to be non-structural (NS), and fourteen are known structural (VP) proteins. SGHV alters the protein expression pattern in Glossina. The G. pallidipes SG secretome encompasses a spectrum of proteins that may be required during the SGHV infection cycle. These detected proteins have putative interactions with at least 21 of the 25 SGHV-encoded proteins. Our findings opens venues for developing novel SGHV mitigation strategies to block SGHV infections in tsetse production facilities such as using SGHV-specific antibodies and phage display-selected gut epithelia-binding peptides.

  19. New insights into protein-protein interaction data lead to increased estimates of the S. cerevisiae interactome size.

    PubMed

    Sambourg, Laure; Thierry-Mieg, Nicolas

    2010-12-21

    As protein interactions mediate most cellular mechanisms, protein-protein interaction networks are essential in the study of cellular processes. Consequently, several large-scale interactome mapping projects have been undertaken, and protein-protein interactions are being distilled into databases through literature curation; yet protein-protein interaction data are still far from comprehensive, even in the model organism Saccharomyces cerevisiae. Estimating the interactome size is important for evaluating the completeness of current datasets, in order to measure the remaining efforts that are required. We examined the yeast interactome from a new perspective, by taking into account how thoroughly proteins have been studied. We discovered that the set of literature-curated protein-protein interactions is qualitatively different when restricted to proteins that have received extensive attention from the scientific community. In particular, these interactions are less often supported by yeast two-hybrid, and more often by more complex experiments such as biochemical activity assays. Our analysis showed that high-throughput and literature-curated interactome datasets are more correlated than commonly assumed, but that this bias can be corrected for by focusing on well-studied proteins. We thus propose a simple and reliable method to estimate the size of an interactome, combining literature-curated data involving well-studied proteins with high-throughput data. It yields an estimate of at least 37, 600 direct physical protein-protein interactions in S. cerevisiae. Our method leads to higher and more accurate estimates of the interactome size, as it accounts for interactions that are genuine yet difficult to detect with commonly-used experimental assays. This shows that we are even further from completing the yeast interactome map than previously expected.

  20. Virtual Interactomics of Proteins from Biochemical Standpoint

    PubMed Central

    Kubrycht, Jaroslav; Sigler, Karel; Souček, Pavel

    2012-01-01

    Virtual interactomics represents a rapidly developing scientific area on the boundary line of bioinformatics and interactomics. Protein-related virtual interactomics then comprises instrumental tools for prediction, simulation, and networking of the majority of interactions important for structural and individual reproduction, differentiation, recognition, signaling, regulation, and metabolic pathways of cells and organisms. Here, we describe the main areas of virtual protein interactomics, that is, structurally based comparative analysis and prediction of functionally important interacting sites, mimotope-assisted and combined epitope prediction, molecular (protein) docking studies, and investigation of protein interaction networks. Detailed information about some interesting methodological approaches and online accessible programs or databases is displayed in our tables. Considerable part of the text deals with the searches for common conserved or functionally convergent protein regions and subgraphs of conserved interaction networks, new outstanding trends and clinically interesting results. In agreement with the presented data and relationships, virtual interactomic tools improve our scientific knowledge, help us to formulate working hypotheses, and they frequently also mediate variously important in silico simulations. PMID:22928109

  1. DEPPDB - DNA electrostatic potential properties database. Electrostatic properties of genome DNA elements.

    PubMed

    Osypov, Alexander A; Krutinin, Gleb G; Krutinina, Eugenia A; Kamzolova, Svetlana G

    2012-04-01

    Electrostatic properties of genome DNA are important to its interactions with different proteins, in particular, related to transcription. DEPPDB - DNA Electrostatic Potential (and other Physical) Properties Database - provides information on the electrostatic and other physical properties of genome DNA combined with its sequence and annotation of biological and structural properties of genomes and their elements. Genomes are organized on taxonomical basis, supporting comparative and evolutionary studies. Currently, DEPPDB contains all completely sequenced bacterial, viral, mitochondrial, and plastids genomes according to the NCBI RefSeq, and some model eukaryotic genomes. Data for promoters, regulation sites, binding proteins, etc., are incorporated from established DBs and literature. The database is complemented by analytical tools. User sequences calculations are available. Case studies discovered electrostatics complementing DNA bending in E.coli plasmid BNT2 promoter functioning, possibly affecting host-environment metabolic switch. Transcription factors binding sites gravitate to high potential regions, confirming the electrostatics universal importance in protein-DNA interactions beyond the classical promoter-RNA polymerase recognition and regulation. Other genome elements, such as terminators, also show electrostatic peculiarities. Most intriguing are gene starts, exhibiting taxonomic correlations. The necessity of the genome electrostatic properties studies is discussed.

  2. VisANT 3.0: new modules for pathway visualization, editing, prediction and construction.

    PubMed

    Hu, Zhenjun; Ng, David M; Yamada, Takuji; Chen, Chunnuan; Kawashima, Shuichi; Mellor, Joe; Linghu, Bolan; Kanehisa, Minoru; Stuart, Joshua M; DeLisi, Charles

    2007-07-01

    With the integration of the KEGG and Predictome databases as well as two search engines for coexpressed genes/proteins using data sets obtained from the Stanford Microarray Database (SMD) and Gene Expression Omnibus (GEO) database, VisANT 3.0 supports exploratory pathway analysis, which includes multi-scale visualization of multiple pathways, editing and annotating pathways using a KEGG compatible visual notation and visualization of expression data in the context of pathways. Expression levels are represented either by color intensity or by nodes with an embedded expression profile. Multiple experiments can be navigated or animated. Known KEGG pathways can be enriched by querying either coexpressed components of known pathway members or proteins with known physical interactions. Predicted pathways for genes/proteins with unknown functions can be inferred from coexpression or physical interaction data. Pathways produced in VisANT can be saved as computer-readable XML format (VisML), graphic images or high-resolution Scalable Vector Graphics (SVG). Pathways in the format of VisML can be securely shared within an interested group or published online using a simple Web link. VisANT is freely available at http://visant.bu.edu.

  3. In silico analysis of protein toxin and bacteriocins from Lactobacillus paracasei SD1 genome and available online databases

    PubMed Central

    Surachat, Komwit; Sangket, Unitsa; Deachamag, Panchalika; Chotigeat, Wilaiwan

    2017-01-01

    Lactobacillus paracasei SD1 is a potential probiotic strain due to its ability to survive several conditions in human dental cavities. To ascertain its safety for human use, we therefore performed a comprehensive bioinformatics analysis and characterization of the bacterial protein toxins produced by this strain. We report the complete genome of Lactobacillus paracasei SD1 and its comparison to other Lactobacillus genomes. Additionally, we identify and analyze its protein toxins and antimicrobial proteins using reliable online database resources and establish its phylogenetic relationship with other bacterial genomes. Our investigation suggests that this strain is safe for human use and contains several bacteriocins that confer health benefits to the host. An in silico analysis of protein-protein interactions between the target bacteriocins and the microbial proteins gtfB and luxS of Streptococcus mutans was performed and is discussed here. PMID:28837656

  4. Detection of functionally important regions in "hypothetical proteins" of known structure.

    PubMed

    Nimrod, Guy; Schushan, Maya; Steinberg, David M; Ben-Tal, Nir

    2008-12-10

    Structural genomics initiatives provide ample structures of "hypothetical proteins" (i.e., proteins of unknown function) at an ever increasing rate. However, without function annotation, this structural goldmine is of little use to biologists who are interested in particular molecular systems. To this end, we used (an improved version of) the PatchFinder algorithm for the detection of functional regions on the protein surface, which could mediate its interactions with, e.g., substrates, ligands, and other proteins. Examination, using a data set of annotated proteins, showed that PatchFinder outperforms similar methods. We collected 757 structures of hypothetical proteins and their predicted functional regions in the N-Func database. Inspection of several of these regions demonstrated that they are useful for function prediction. For example, we suggested an interprotein interface and a putative nucleotide-binding site. A web-server implementation of PatchFinder and the N-Func database are available at http://patchfinder.tau.ac.il/.

  5. Identification of proteins interacting with lactate dehydrogenase in claw muscle of the porcelain crab Petrolisthes cinctipes

    PubMed Central

    Cayenne, Andrea P.; Gabert, Beverly; Stillman, Jonathon H.

    2011-01-01

    Biochemical adaptation of enzymes involves conservation of activity, stability and affinity across a wide range of intracellular and environmental conditions. Enzyme adaptation by alteration of primary structure is well known, but the roles of protein-protein interactions in enzyme adaptation are less well understood. Interspecific differences in thermal stability of lactate dehydrogenase (LDH) in porcelain crabs (genus Petrolisthes) are related to intrinsic differences among LDH molecules and by interactions with other stabilizing proteins. Here, we identified proteins that interact with LDH in porcelain crab claw muscle tissue using co-immunoprecipitation, and showed LDH exists in high molecular weight complexes using size exclusion chromatography and Western blot analyses. Co-immunoprecipitated proteins were separated using 2D SDS PAGE and analyzed by LC/ESI using peptide MS/MS. Peptide MS/MS ions were compared to an EST database for Petrolisthes cinctipes to identify proteins. Identified proteins included cytoskeletal elements, glycolytic enzymes, a phosphagen kinase, and the respiratory protein hemocyanin. Our results support the hypothesis that LDH interacts with glycolytic enzymes in a metabolon structured by cytoskeletal elements that may also include the enzyme for transfer of the adenylate charge in glycolytically produced ATP. Those interactions may play specific roles in biochemical adaptation of glycolytic enzymes. PMID:21968246

  6. Phthalic acid chemical probes synthesized for protein-protein interaction analysis.

    PubMed

    Liang, Shih-Shin; Liao, Wei-Ting; Kuo, Chao-Jen; Chou, Chi-Hsien; Wu, Chin-Jen; Wang, Hui-Min

    2013-06-24

    Plasticizers are additives that are used to increase the flexibility of plastic during manufacturing. However, in injection molding processes, plasticizers cannot be generated with monomers because they can peel off from the plastics into the surrounding environment, water, or food, or become attached to skin. Among the various plasticizers that are used, 1,2-benzenedicarboxylic acid (phthalic acid) is a typical precursor to generate phthalates. In addition, phthalic acid is a metabolite of diethylhexyl phthalate (DEHP). According to Gene_Ontology gene/protein database, phthalates can cause genital diseases, cardiotoxicity, hepatotoxicity, nephrotoxicity, etc. In this study, a silanized linker (3-aminopropyl triethoxyslane, APTES) was deposited on silicon dioxides (SiO2) particles and phthalate chemical probes were manufactured from phthalic acid and APTES-SiO2. These probes could be used for detecting proteins that targeted phthalic acid and for protein-protein interactions. The phthalic acid chemical probes we produced were incubated with epithelioid cell lysates of normal rat kidney (NRK-52E cells) to detect the interactions between phthalic acid and NRK-52E extracted proteins. These chemical probes interacted with a number of chaperones such as protein disulfide-isomerase A6, heat shock proteins, and Serpin H1. Ingenuity Pathways Analysis (IPA) software showed that these chemical probes were a practical technique for protein-protein interaction analysis.

  7. Functional annotation from the genome sequence of the giant panda.

    PubMed

    Huo, Tong; Zhang, Yinjie; Lin, Jianping

    2012-08-01

    The giant panda is one of the most critically endangered species due to the fragmentation and loss of its habitat. Studying the functions of proteins in this animal, especially specific trait-related proteins, is therefore necessary to protect the species. In this work, the functions of these proteins were investigated using the genome sequence of the giant panda. Data on 21,001 proteins and their functions were stored in the Giant Panda Protein Database, in which the proteins were divided into two groups: 20,179 proteins whose functions can be predicted by GeneScan formed the known-function group, whereas 822 proteins whose functions cannot be predicted by GeneScan comprised the unknown-function group. For the known-function group, we further classified the proteins by molecular function, biological process, cellular component, and tissue specificity. For the unknown-function group, we developed a strategy in which the proteins were filtered by cross-Blast to identify panda-specific proteins under the assumption that proteins related to the panda-specific traits in the unknown-function group exist. After this filtering procedure, we identified 32 proteins (2 of which are membrane proteins) specific to the giant panda genome as compared against the dog and horse genomes. Based on their amino acid sequences, these 32 proteins were further analyzed by functional classification using SVM-Prot, motif prediction using MyHits, and interacting protein prediction using the Database of Interacting Proteins. Nineteen proteins were predicted to be zinc-binding proteins, thus affecting the activities of nucleic acids. The 32 panda-specific proteins will be further investigated by structural and functional analysis.

  8. Protein interaction networks from literature mining

    NASA Astrophysics Data System (ADS)

    Ihara, Sigeo

    2005-03-01

    The ability to accurately predict and understand physiological changes in the biological network system in response to disease or drug therapeutics is of crucial importance in life science. The extensive amount of gene expression data generated from even a single microarray experiment often proves difficult to fully interpret and comprehend the biological significance. An increasing knowledge of protein interactions stored in the PubMed database, as well as the advancement of natural language processing, however, makes it possible to construct protein interaction networks from the gene expression information that are essential for understanding the biological meaning. From the in house literature mining system we have developed, the protein interaction network for humans was constructed. By analysis based on the graph-theoretical characterization of the total interaction network in literature, we found that the network is scale-free and semantic long-ranged interactions (i.e. inhibit, induce) between proteins dominate in the total interaction network, reducing the degree exponent. Interaction networks generated based on scientific text in which the interaction event is ambiguously described result in disconnected networks. In contrast interaction networks based on text in which the interaction events are clearly stated result in strongly connected networks. The results of protein-protein interaction networks obtained in real applications from microarray experiments are discussed: For example, comparisons of the gene expression data indicative of either a good or a poor prognosis for acute lymphoblastic leukemia with MLL rearrangements, using our system, showed newly discovered signaling cross-talk.

  9. A Viral-Human Interactome Based on Structural Motif-Domain Interactions Captures the Human Infectome

    PubMed Central

    Guo, Xianwu; Rodríguez-Pérez, Mario A.

    2013-01-01

    Protein interactions between a pathogen and its host are fundamental in the establishment of the pathogen and underline the infection mechanism. In the present work, we developed a single predictive model for building a host-viral interactome based on the identification of structural descriptors from motif-domain interactions of protein complexes deposited in the Protein Data Bank (PDB). The structural descriptors were used for searching, in a database of protein sequences of human and five clinically important viruses; therefore, viral and human proteins sharing a descriptor were predicted as interacting proteins. The analysis of the host-viral interactome allowed to identify a set of new interactions that further explain molecular mechanism associated with viral infections and showed that it was able to capture human proteins already associated to viral infections (human infectome) and non-infectious diseases (human diseasome). The analysis of human proteins targeted by viral proteins in the context of a human interactome showed that their neighbors are enriched in proteins reported with differential expression under infection and disease conditions. It is expected that the findings of this work will contribute to the development of systems biology for infectious diseases, and help guide the rational identification and prioritization of novel drug targets. PMID:23951184

  10. MIPS: curated databases and comprehensive secondary data resources in 2010.

    PubMed

    Mewes, H Werner; Ruepp, Andreas; Theis, Fabian; Rattei, Thomas; Walter, Mathias; Frishman, Dmitrij; Suhre, Karsten; Spannagl, Manuel; Mayer, Klaus F X; Stümpflen, Volker; Antonov, Alexey

    2011-01-01

    The Munich Information Center for Protein Sequences (MIPS at the Helmholtz Center for Environmental Health, Neuherberg, Germany) has many years of experience in providing annotated collections of biological data. Selected data sets of high relevance, such as model genomes, are subjected to careful manual curation, while the bulk of high-throughput data is annotated by automatic means. High-quality reference resources developed in the past and still actively maintained include Saccharomyces cerevisiae, Neurospora crassa and Arabidopsis thaliana genome databases as well as several protein interaction data sets (MPACT, MPPI and CORUM). More recent projects are PhenomiR, the database on microRNA-related phenotypes, and MIPS PlantsDB for integrative and comparative plant genome research. The interlinked resources SIMAP and PEDANT provide homology relationships as well as up-to-date and consistent annotation for 38,000,000 protein sequences. PPLIPS and CCancer are versatile tools for proteomics and functional genomics interfacing to a database of compilations from gene lists extracted from literature. A novel literature-mining tool, EXCERBT, gives access to structured information on classified relations between genes, proteins, phenotypes and diseases extracted from Medline abstracts by semantic analysis. All databases described here, as well as the detailed descriptions of our projects can be accessed through the MIPS WWW server (http://mips.helmholtz-muenchen.de).

  11. MIPS: curated databases and comprehensive secondary data resources in 2010

    PubMed Central

    Mewes, H. Werner; Ruepp, Andreas; Theis, Fabian; Rattei, Thomas; Walter, Mathias; Frishman, Dmitrij; Suhre, Karsten; Spannagl, Manuel; Mayer, Klaus F.X.; Stümpflen, Volker; Antonov, Alexey

    2011-01-01

    The Munich Information Center for Protein Sequences (MIPS at the Helmholtz Center for Environmental Health, Neuherberg, Germany) has many years of experience in providing annotated collections of biological data. Selected data sets of high relevance, such as model genomes, are subjected to careful manual curation, while the bulk of high-throughput data is annotated by automatic means. High-quality reference resources developed in the past and still actively maintained include Saccharomyces cerevisiae, Neurospora crassa and Arabidopsis thaliana genome databases as well as several protein interaction data sets (MPACT, MPPI and CORUM). More recent projects are PhenomiR, the database on microRNA-related phenotypes, and MIPS PlantsDB for integrative and comparative plant genome research. The interlinked resources SIMAP and PEDANT provide homology relationships as well as up-to-date and consistent annotation for 38 000 000 protein sequences. PPLIPS and CCancer are versatile tools for proteomics and functional genomics interfacing to a database of compilations from gene lists extracted from literature. A novel literature-mining tool, EXCERBT, gives access to structured information on classified relations between genes, proteins, phenotypes and diseases extracted from Medline abstracts by semantic analysis. All databases described here, as well as the detailed descriptions of our projects can be accessed through the MIPS WWW server (http://mips.helmholtz-muenchen.de). PMID:21109531

  12. Gene: a gene-centered information resource at NCBI.

    PubMed

    Brown, Garth R; Hem, Vichet; Katz, Kenneth S; Ovetsky, Michael; Wallin, Craig; Ermolaeva, Olga; Tolstoy, Igor; Tatusova, Tatiana; Pruitt, Kim D; Maglott, Donna R; Murphy, Terence D

    2015-01-01

    The National Center for Biotechnology Information's (NCBI) Gene database (www.ncbi.nlm.nih.gov/gene) integrates gene-specific information from multiple data sources. NCBI Reference Sequence (RefSeq) genomes for viruses, prokaryotes and eukaryotes are the primary foundation for Gene records in that they form the critical association between sequence and a tracked gene upon which additional functional and descriptive content is anchored. Additional content is integrated based on the genomic location and RefSeq transcript and protein sequence data. The content of a Gene record represents the integration of curation and automated processing from RefSeq, collaborating model organism databases, consortia such as Gene Ontology, and other databases within NCBI. Records in Gene are assigned unique, tracked integers as identifiers. The content (citations, nomenclature, genomic location, gene products and their attributes, phenotypes, sequences, interactions, variation details, maps, expression, homologs, protein domains and external databases) is available via interactive browsing through NCBI's Entrez system, via NCBI's Entrez programming utilities (E-Utilities and Entrez Direct) and for bulk transfer by FTP. Published by Oxford University Press on behalf of Nucleic Acids Research 2014. This work is written by (a) US Government employee(s) and is in the public domain in the US.

  13. An emerging cyberinfrastructure for biodefense pathogen and pathogen-host data.

    PubMed

    Zhang, C; Crasta, O; Cammer, S; Will, R; Kenyon, R; Sullivan, D; Yu, Q; Sun, W; Jha, R; Liu, D; Xue, T; Zhang, Y; Moore, M; McGarvey, P; Huang, H; Chen, Y; Zhang, J; Mazumder, R; Wu, C; Sobral, B

    2008-01-01

    The NIAID-funded Biodefense Proteomics Resource Center (RC) provides storage, dissemination, visualization and analysis capabilities for the experimental data deposited by seven Proteomics Research Centers (PRCs). The data and its publication is to support researchers working to discover candidates for the next generation of vaccines, therapeutics and diagnostics against NIAID's Category A, B and C priority pathogens. The data includes transcriptional profiles, protein profiles, protein structural data and host-pathogen protein interactions, in the context of the pathogen life cycle in vivo and in vitro. The database has stored and supported host or pathogen data derived from Bacillus, Brucella, Cryptosporidium, Salmonella, SARS, Toxoplasma, Vibrio and Yersinia, human tissue libraries, and mouse macrophages. These publicly available data cover diverse data types such as mass spectrometry, yeast two-hybrid (Y2H), gene expression profiles, X-ray and NMR determined protein structures and protein expression clones. The growing database covers over 23 000 unique genes/proteins from different experiments and organisms. All of the genes/proteins are annotated and integrated across experiments using UniProt Knowledgebase (UniProtKB) accession numbers. The web-interface for the database enables searching, querying and downloading at the level of experiment, group and individual gene(s)/protein(s) via UniProtKB accession numbers or protein function keywords. The system is accessible at http://www.proteomicsresource.org/.

  14. Combining active learning and semi-supervised learning techniques to extract protein interaction sentences.

    PubMed

    Song, Min; Yu, Hwanjo; Han, Wook-Shin

    2011-11-24

    Protein-protein interaction (PPI) extraction has been a focal point of many biomedical research and database curation tools. Both Active Learning and Semi-supervised SVMs have recently been applied to extract PPI automatically. In this paper, we explore combining the AL with the SSL to improve the performance of the PPI task. We propose a novel PPI extraction technique called PPISpotter by combining Deterministic Annealing-based SSL and an AL technique to extract protein-protein interaction. In addition, we extract a comprehensive set of features from MEDLINE records by Natural Language Processing (NLP) techniques, which further improve the SVM classifiers. In our feature selection technique, syntactic, semantic, and lexical properties of text are incorporated into feature selection that boosts the system performance significantly. By conducting experiments with three different PPI corpuses, we show that PPISpotter is superior to the other techniques incorporated into semi-supervised SVMs such as Random Sampling, Clustering, and Transductive SVMs by precision, recall, and F-measure. Our system is a novel, state-of-the-art technique for efficiently extracting protein-protein interaction pairs.

  15. sc-PDB: a database for identifying variations and multiplicity of 'druggable' binding sites in proteins.

    PubMed

    Meslamani, Jamel; Rognan, Didier; Kellenberger, Esther

    2011-05-01

    The sc-PDB database is an annotated archive of druggable binding sites extracted from the Protein Data Bank. It contains all-atoms coordinates for 8166 protein-ligand complexes, chosen for their geometrical and physico-chemical properties. The sc-PDB provides a functional annotation for proteins, a chemical description for ligands and the detailed intermolecular interactions for complexes. The sc-PDB now includes a hierarchical classification of all the binding sites within a functional class. The sc-PDB entries were first clustered according to the protein name indifferent of the species. For each cluster, we identified dissimilar sites (e.g. catalytic and allosteric sites of an enzyme). SCOPE AND APPLICATIONS: The classification of sc-PDB targets by binding site diversity was intended to facilitate chemogenomics approaches to drug design. In ligand-based approaches, it avoids comparing ligands that do not share the same binding site. In structure-based approaches, it permits to quantitatively evaluate the diversity of the binding site definition (variations in size, sequence and/or structure). The sc-PDB database is freely available at: http://bioinfo-pharma.u-strasbg.fr/scPDB.

  16. MDB: the Metalloprotein Database and Browser at The Scripps Research Institute

    PubMed Central

    Castagnetto, Jesus M.; Hennessy, Sean W.; Roberts, Victoria A.; Getzoff, Elizabeth D.; Tainer, John A.; Pique, Michael E.

    2002-01-01

    The Metalloprotein Database and Browser (MDB; http://metallo.scripps.edu) at The Scripps Research Institute is a web-accessible resource for metalloprotein research. It offers the scientific community quantitative information on geometrical parameters of metal-binding sites in protein structures available from the Protein Data Bank (PDB). The MDB also offers analytical tools for the examination of trends or patterns in the indexed metal-binding sites. A user can perform interactive searches, metal-site structure visualization (via a Java applet), and analysis of the quantitative data by accessing the MDB through a web browser without requiring an external application or platform-dependent plugin. The MDB also has a non-interactive interface with which other web sites and network-aware applications can seamlessly incorporate data or statistical analysis results from metal-binding sites. The information contained in the MDB is periodically updated with automated algorithms that find and index metal sites from new protein structures released by the PDB. PMID:11752342

  17. Proteome-wide Subcellular Topologies of E. coli Polypeptides Database (STEPdb)*

    PubMed Central

    Orfanoudaki, Georgia; Economou, Anastassios

    2014-01-01

    Cell compartmentalization serves both the isolation and the specialization of cell functions. After synthesis in the cytoplasm, over a third of all proteins are targeted to other subcellular compartments. Knowing how proteins are distributed within the cell and how they interact is a prerequisite for understanding it as a whole. Surface and secreted proteins are important pathogenicity determinants. Here we present the STEP database (STEPdb) that contains a comprehensive characterization of subcellular localization and topology of the complete proteome of Escherichia coli. Two widely used E. coli proteomes (K-12 and BL21) are presented organized into thirteen subcellular classes. STEPdb exploits the wealth of genetic, proteomic, biochemical, and functional information on protein localization, secretion, and targeting in E. coli, one of the best understood model organisms. Subcellular annotations were derived from a combination of bioinformatics prediction, proteomic, biochemical, functional, topological data and extensive literature re-examination that were refined through manual curation. Strong experimental support for the location of 1553 out of 4303 proteins was based on 426 articles and some experimental indications for another 526. Annotations were provided for another 320 proteins based on firm bioinformatic predictions. STEPdb is the first database that contains an extensive set of peripheral IM proteins (PIM proteins) and includes their graphical visualization into complexes, cellular functions, and interactions. It also summarizes all currently known protein export machineries of E. coli K-12 and pairs them, where available, with the secretory proteins that use them. It catalogs the Sec- and TAT-utilizing secretomes and summarizes their topological features such as signal peptides and transmembrane regions, transmembrane topologies and orientations. It also catalogs physicochemical and structural features that influence topology such as abundance, solubility, disorder, heat resistance, and structural domain families. Finally, STEPdb incorporates prediction tools for topology (TMHMM, SignalP, and Phobius) and disorder (IUPred) and implements the BLAST2STEP that performs protein homology searches against the STEPdb. PMID:25210196

  18. Atlas - a data warehouse for integrative bioinformatics.

    PubMed

    Shah, Sohrab P; Huang, Yong; Xu, Tao; Yuen, Macaire M S; Ling, John; Ouellette, B F Francis

    2005-02-21

    We present a biological data warehouse called Atlas that locally stores and integrates biological sequences, molecular interactions, homology information, functional annotations of genes, and biological ontologies. The goal of the system is to provide data, as well as a software infrastructure for bioinformatics research and development. The Atlas system is based on relational data models that we developed for each of the source data types. Data stored within these relational models are managed through Structured Query Language (SQL) calls that are implemented in a set of Application Programming Interfaces (APIs). The APIs include three languages: C++, Java, and Perl. The methods in these API libraries are used to construct a set of loader applications, which parse and load the source datasets into the Atlas database, and a set of toolbox applications which facilitate data retrieval. Atlas stores and integrates local instances of GenBank, RefSeq, UniProt, Human Protein Reference Database (HPRD), Biomolecular Interaction Network Database (BIND), Database of Interacting Proteins (DIP), Molecular Interactions Database (MINT), IntAct, NCBI Taxonomy, Gene Ontology (GO), Online Mendelian Inheritance in Man (OMIM), LocusLink, Entrez Gene and HomoloGene. The retrieval APIs and toolbox applications are critical components that offer end-users flexible, easy, integrated access to this data. We present use cases that use Atlas to integrate these sources for genome annotation, inference of molecular interactions across species, and gene-disease associations. The Atlas biological data warehouse serves as data infrastructure for bioinformatics research and development. It forms the backbone of the research activities in our laboratory and facilitates the integration of disparate, heterogeneous biological sources of data enabling new scientific inferences. Atlas achieves integration of diverse data sets at two levels. First, Atlas stores data of similar types using common data models, enforcing the relationships between data types. Second, integration is achieved through a combination of APIs, ontology, and tools. The Atlas software is freely available under the GNU General Public License at: http://bioinformatics.ubc.ca/atlas/

  19. Atlas – a data warehouse for integrative bioinformatics

    PubMed Central

    Shah, Sohrab P; Huang, Yong; Xu, Tao; Yuen, Macaire MS; Ling, John; Ouellette, BF Francis

    2005-01-01

    Background We present a biological data warehouse called Atlas that locally stores and integrates biological sequences, molecular interactions, homology information, functional annotations of genes, and biological ontologies. The goal of the system is to provide data, as well as a software infrastructure for bioinformatics research and development. Description The Atlas system is based on relational data models that we developed for each of the source data types. Data stored within these relational models are managed through Structured Query Language (SQL) calls that are implemented in a set of Application Programming Interfaces (APIs). The APIs include three languages: C++, Java, and Perl. The methods in these API libraries are used to construct a set of loader applications, which parse and load the source datasets into the Atlas database, and a set of toolbox applications which facilitate data retrieval. Atlas stores and integrates local instances of GenBank, RefSeq, UniProt, Human Protein Reference Database (HPRD), Biomolecular Interaction Network Database (BIND), Database of Interacting Proteins (DIP), Molecular Interactions Database (MINT), IntAct, NCBI Taxonomy, Gene Ontology (GO), Online Mendelian Inheritance in Man (OMIM), LocusLink, Entrez Gene and HomoloGene. The retrieval APIs and toolbox applications are critical components that offer end-users flexible, easy, integrated access to this data. We present use cases that use Atlas to integrate these sources for genome annotation, inference of molecular interactions across species, and gene-disease associations. Conclusion The Atlas biological data warehouse serves as data infrastructure for bioinformatics research and development. It forms the backbone of the research activities in our laboratory and facilitates the integration of disparate, heterogeneous biological sources of data enabling new scientific inferences. Atlas achieves integration of diverse data sets at two levels. First, Atlas stores data of similar types using common data models, enforcing the relationships between data types. Second, integration is achieved through a combination of APIs, ontology, and tools. The Atlas software is freely available under the GNU General Public License at: PMID:15723693

  20. An automated method for finding molecular complexes in large protein interaction networks

    PubMed Central

    Bader, Gary D; Hogue, Christopher WV

    2003-01-01

    Background Recent advances in proteomics technologies such as two-hybrid, phage display and mass spectrometry have enabled us to create a detailed map of biomolecular interaction networks. Initial mapping efforts have already produced a wealth of data. As the size of the interaction set increases, databases and computational methods will be required to store, visualize and analyze the information in order to effectively aid in knowledge discovery. Results This paper describes a novel graph theoretic clustering algorithm, "Molecular Complex Detection" (MCODE), that detects densely connected regions in large protein-protein interaction networks that may represent molecular complexes. The method is based on vertex weighting by local neighborhood density and outward traversal from a locally dense seed protein to isolate the dense regions according to given parameters. The algorithm has the advantage over other graph clustering methods of having a directed mode that allows fine-tuning of clusters of interest without considering the rest of the network and allows examination of cluster interconnectivity, which is relevant for protein networks. Protein interaction and complex information from the yeast Saccharomyces cerevisiae was used for evaluation. Conclusion Dense regions of protein interaction networks can be found, based solely on connectivity data, many of which correspond to known protein complexes. The algorithm is not affected by a known high rate of false positives in data from high-throughput interaction techniques. The program is available from . PMID:12525261

  1. Peeping into Human Renal Calcium Oxalate Stone Matrix: Characterization of Novel Proteins Involved in the Intricate Mechanism of Urolithiasis

    PubMed Central

    Tandon, Chanderdeep

    2013-01-01

    Background The increasing number of patients suffering from urolithiasis represents one of the major challenges which nephrologists face worldwide today. For enhancing therapeutic outcomes of this disease, the pathogenic basis for the formation of renal stones is the need of hour. Proteins are found as major component in human renal stone matrix and are considered to have a potential role in crystal–membrane interaction, crystal growth and stone formation but their role in urolithiasis still remains obscure. Methods Proteins were isolated from the matrix of human CaOx containing kidney stones. Proteins having MW>3 kDa were subjected to anion exchange chromatography followed by molecular-sieve chromatography. The effect of these purified proteins was tested against CaOx nucleation and growth and on oxalate injured Madin–Darby Canine Kidney (MDCK) renal epithelial cells for their activity. Proteins were identified by Matrix-assisted laser desorption/ionization-time of flight (MALDI-TOF MS) followed by database search with MASCOT server. In silico molecular interaction studies with CaOx crystals were also investigated. Results Five proteins were identified from the matrix of calcium oxalate kidney stones by MALDI-TOF MS followed by database search with MASCOT server with the competence to control the stone formation process. Out of which two proteins were promoters, two were inhibitors and one protein had a dual activity of both inhibition and promotion towards CaOx nucleation and growth. Further molecular modelling calculations revealed the mode of interaction of these proteins with CaOx at the molecular level. Conclusions We identified and characterized Ethanolamine-phosphate cytidylyltransferase, Ras GTPase-activating-like protein, UDP-glucose:glycoprotein glucosyltransferase 2, RIMS-binding protein 3A, Macrophage-capping protein as novel proteins from the matrix of human calcium oxalate stone which play a critical role in kidney stone formation. Thus, these proteins having potential to modulate calcium oxalate crystallization will throw light on understanding and controlling urolithiasis in humans. PMID:23894559

  2. Improvements in the Protein Identifier Cross-Reference service.

    PubMed

    Wein, Samuel P; Côté, Richard G; Dumousseau, Marine; Reisinger, Florian; Hermjakob, Henning; Vizcaíno, Juan A

    2012-07-01

    The Protein Identifier Cross-Reference (PICR) service is a tool that allows users to map protein identifiers, protein sequences and gene identifiers across over 100 different source databases. PICR takes input through an interactive website as well as Representational State Transfer (REST) and Simple Object Access Protocol (SOAP) services. It returns the results as HTML pages, XLS and CSV files. It has been in production since 2007 and has been recently enhanced to add new functionality and increase the number of databases it covers. Protein subsequences can be Basic Local Alignment Search Tool (BLAST) against the UniProt Knowledgebase (UniProtKB) to provide an entry point to the standard PICR mapping algorithm. In addition, gene identifiers from UniProtKB and Ensembl can now be submitted as input or mapped to as output from PICR. We have also implemented a 'best-guess' mapping algorithm for UniProt. In this article, we describe the usefulness of PICR, how these changes have been implemented, and the corresponding additions to the web services. Finally, we explain that the number of source databases covered by PICR has increased from the initial 73 to the current 102. New resources include several new species-specific Ensembl databases as well as the Ensembl Genome ones. PICR can be accessed at http://www.ebi.ac.uk/Tools/picr/.

  3. CPLA 1.0: an integrated database of protein lysine acetylation.

    PubMed

    Liu, Zexian; Cao, Jun; Gao, Xinjiao; Zhou, Yanhong; Wen, Longping; Yang, Xiangjiao; Yao, Xuebiao; Ren, Jian; Xue, Yu

    2011-01-01

    As a reversible post-translational modification (PTM) discovered decades ago, protein lysine acetylation was known for its regulation of transcription through the modification of histones. Recent studies discovered that lysine acetylation targets broad substrates and especially plays an essential role in cellular metabolic regulation. Although acetylation is comparable with other major PTMs such as phosphorylation, an integrated resource still remains to be developed. In this work, we presented the compendium of protein lysine acetylation (CPLA) database for lysine acetylated substrates with their sites. From the scientific literature, we manually collected 7151 experimentally identified acetylation sites in 3311 targets. We statistically studied the regulatory roles of lysine acetylation by analyzing the Gene Ontology (GO) and InterPro annotations. Combined with protein-protein interaction information, we systematically discovered a potential human lysine acetylation network (HLAN) among histone acetyltransferases (HATs), substrates and histone deacetylases (HDACs). In particular, there are 1862 triplet relationships of HAT-substrate-HDAC retrieved from the HLAN, at least 13 of which were previously experimentally verified. The online services of CPLA database was implemented in PHP + MySQL + JavaScript, while the local packages were developed in JAVA 1.5 (J2SE 5.0). The CPLA database is freely available for all users at: http://cpla.biocuckoo.org.

  4. Genome scale enzyme–metabolite and drug–target interaction predictions using the signature molecular descriptor

    DOE PAGES

    Faulon, Jean-Loup; Misra, Milind; Martin, Shawn; ...

    2007-11-23

    Motivation: Identifying protein enzymatic or pharmacological activities are important areas of research in biology and chemistry. Biological and chemical databases are increasingly being populated with linkages between protein sequences and chemical structures. Additionally, there is now sufficient information to apply machine-learning techniques to predict interactions between chemicals and proteins at a genome scale. Current machine-learning techniques use as input either protein sequences and structures or chemical information. We propose here a method to infer protein–chemical interactions using heterogeneous input consisting of both protein sequence and chemical information. Results: Our method relies on expressing proteins and chemicals with a common cheminformaticsmore » representation. We demonstrate our approach by predicting whether proteins can catalyze reactions not present in training sets. We also predict whether a given drug can bind a target, in the absence of prior binding information for that drug and target. Lastly, such predictions cannot be made with current machine-learning techniques requiring binding information for individual reactions or individual targets.« less

  5. dbPTM 2016: 10-year anniversary of a resource for post-translational modification of proteins.

    PubMed

    Huang, Kai-Yao; Su, Min-Gang; Kao, Hui-Ju; Hsieh, Yun-Chung; Jhong, Jhih-Hua; Cheng, Kuang-Hao; Huang, Hsien-Da; Lee, Tzong-Yi

    2016-01-04

    Owing to the importance of the post-translational modifications (PTMs) of proteins in regulating biological processes, the dbPTM (http://dbPTM.mbc.nctu.edu.tw/) was developed as a comprehensive database of experimentally verified PTMs from several databases with annotations of potential PTMs for all UniProtKB protein entries. For this 10th anniversary of dbPTM, the updated resource provides not only a comprehensive dataset of experimentally verified PTMs, supported by the literature, but also an integrative interface for accessing all available databases and tools that are associated with PTM analysis. As well as collecting experimental PTM data from 14 public databases, this update manually curates over 12 000 modified peptides, including the emerging S-nitrosylation, S-glutathionylation and succinylation, from approximately 500 research articles, which were retrieved by text mining. As the number of available PTM prediction methods increases, this work compiles a non-homologous benchmark dataset to evaluate the predictive power of online PTM prediction tools. An increasing interest in the structural investigation of PTM substrate sites motivated the mapping of all experimental PTM peptides to protein entries of Protein Data Bank (PDB) based on database identifier and sequence identity, which enables users to examine spatially neighboring amino acids, solvent-accessible surface area and side-chain orientations for PTM substrate sites on tertiary structures. Since drug binding in PDB is annotated, this update identified over 1100 PTM sites that are associated with drug binding. The update also integrates metabolic pathways and protein-protein interactions to support the PTM network analysis for a group of proteins. Finally, the web interface is redesigned and enhanced to facilitate access to this resource. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  6. Topological and organizational properties of the products of house-keeping and tissue-specific genes in protein-protein interaction networks.

    PubMed

    Lin, Wen-Hsien; Liu, Wei-Chung; Hwang, Ming-Jing

    2009-03-11

    Human cells of various tissue types differ greatly in morphology despite having the same set of genetic information. Some genes are expressed in all cell types to perform house-keeping functions, while some are selectively expressed to perform tissue-specific functions. In this study, we wished to elucidate how proteins encoded by human house-keeping genes and tissue-specific genes are organized in human protein-protein interaction networks. We constructed protein-protein interaction networks for different tissue types using two gene expression datasets and one protein-protein interaction database. We then calculated three network indices of topological importance, the degree, closeness, and betweenness centralities, to measure the network position of proteins encoded by house-keeping and tissue-specific genes, and quantified their local connectivity structure. Compared to a random selection of proteins, house-keeping gene-encoded proteins tended to have a greater number of directly interacting neighbors and occupy network positions in several shortest paths of interaction between protein pairs, whereas tissue-specific gene-encoded proteins did not. In addition, house-keeping gene-encoded proteins tended to connect with other house-keeping gene-encoded proteins in all tissue types, whereas tissue-specific gene-encoded proteins also tended to connect with other tissue-specific gene-encoded proteins, but only in approximately half of the tissue types examined. Our analysis showed that house-keeping gene-encoded proteins tend to occupy important network positions, while those encoded by tissue-specific genes do not. The biological implications of our findings were discussed and we proposed a hypothesis regarding how cells organize their protein tools in protein-protein interaction networks. Our results led us to speculate that house-keeping gene-encoded proteins might form a core in human protein-protein interaction networks, while clusters of tissue-specific gene-encoded proteins are attached to the core at more peripheral positions of the networks.

  7. Multi-level machine learning prediction of protein-protein interactions in Saccharomyces cerevisiae.

    PubMed

    Zubek, Julian; Tatjewski, Marcin; Boniecki, Adam; Mnich, Maciej; Basu, Subhadip; Plewczynski, Dariusz

    2015-01-01

    Accurate identification of protein-protein interactions (PPI) is the key step in understanding proteins' biological functions, which are typically context-dependent. Many existing PPI predictors rely on aggregated features from protein sequences, however only a few methods exploit local information about specific residue contacts. In this work we present a two-stage machine learning approach for prediction of protein-protein interactions. We start with the carefully filtered data on protein complexes available for Saccharomyces cerevisiae in the Protein Data Bank (PDB) database. First, we build linear descriptions of interacting and non-interacting sequence segment pairs based on their inter-residue distances. Secondly, we train machine learning classifiers to predict binary segment interactions for any two short sequence fragments. The final prediction of the protein-protein interaction is done using the 2D matrix representation of all-against-all possible interacting sequence segments of both analysed proteins. The level-I predictor achieves 0.88 AUC for micro-scale, i.e., residue-level prediction. The level-II predictor improves the results further by a more complex learning paradigm. We perform 30-fold macro-scale, i.e., protein-level cross-validation experiment. The level-II predictor using PSIPRED-predicted secondary structure reaches 0.70 precision, 0.68 recall, and 0.70 AUC, whereas other popular methods provide results below 0.6 threshold (recall, precision, AUC). Our results demonstrate that multi-scale sequence features aggregation procedure is able to improve the machine learning results by more than 10% as compared to other sequence representations. Prepared datasets and source code for our experimental pipeline are freely available for download from: http://zubekj.github.io/mlppi/ (open source Python implementation, OS independent).

  8. ZikaBase: An integrated ZIKV- Human Interactome Map database.

    PubMed

    Gurumayum, Sanathoi; Brahma, Rahul; Naorem, Leimarembi Devi; Muthaiyan, Mathavan; Gopal, Jeyakodi; Venkatesan, Amouda

    2018-01-15

    Re-emergence of ZIKV has caused infections in more than 1.5 million people. The molecular mechanism and pathogenesis of ZIKV is not well explored due to unavailability of adequate model and lack of publically accessible resources to provide information of ZIKV-Human protein interactome map till today. This study made an attempt to curate the ZIKV-Human interaction proteins from published literatures and RNA-Seq data. 11 direct interaction, 12 associated genes are retrieved from literatures and 3742 Differentially Expressed Genes (DEGs) are obtained from RNA-Seq analysis. The genes have been analyzed to construct the ZIKV-Human Interactome Map. The importance of the study has been illustrated by the enrichment analysis and observed that direct interaction and associated genes are enriched in viral entry into host cell. Also, ZIKV infection modulates 32% signal and 27% immune system pathways. The integrated database, ZikaBase has been developed to help the virology research community and accessible at https://test5.bicpu.edu.in. Copyright © 2017 Elsevier Inc. All rights reserved.

  9. The online Tabloid Proteome: an annotated database of protein associations

    PubMed Central

    Turan, Demet; Tavernier, Jan

    2018-01-01

    Abstract A complete knowledge of the proteome can only be attained by determining the associations between proteins, along with the nature of these associations (e.g. physical contact in protein–protein interactions, participation in complex formation or different roles in the same pathway). Despite extensive efforts in elucidating direct protein interactions, our knowledge on the complete spectrum of protein associations remains limited. We therefore developed a new approach that detects protein associations from identifications obtained after re-processing of large-scale, public mass spectrometry-based proteomics data. Our approach infers protein association based on the co-occurrence of proteins across many different proteomics experiments, and provides information that is almost completely complementary to traditional direct protein interaction studies. We here present a web interface to query and explore the associations derived from this method, called the online Tabloid Proteome. The online Tabloid Proteome also integrates biological knowledge from several existing resources to annotate our derived protein associations. The online Tabloid Proteome is freely available through a user-friendly web interface, which provides intuitive navigation and data exploration options for the user at http://iomics.ugent.be/tabloidproteome. PMID:29040688

  10. JavaProtein Dossier: a novel web-based data visualization tool for comprehensive analysis of protein structure

    PubMed Central

    Neshich, Goran; Rocchia, Walter; Mancini, Adauto L.; Yamagishi, Michel E. B.; Kuser, Paula R.; Fileto, Renato; Baudet, Christian; Pinto, Ivan P.; Montagner, Arnaldo J.; Palandrani, Juliana F.; Krauchenco, Joao N.; Torres, Renato C.; Souza, Savio; Togawa, Roberto C.; Higa, Roberto H.

    2004-01-01

    JavaProtein Dossier (JPD) is a new concept, database and visualization tool providing one of the largest collections of the physicochemical parameters describing proteins' structure, stability, function and interaction with other macromolecules. By collecting as many descriptors/parameters as possible within a single database, we can achieve a better use of the available data and information. Furthermore, data grouping allows us to generate different parameters with the potential to provide new insights into the sequence–structure–function relationship. In JPD, residue selection can be performed according to multiple criteria. JPD can simultaneously display and analyze all the physicochemical parameters of any pair of structures, using precalculated structural alignments, allowing direct parameter comparison at corresponding amino acid positions among homologous structures. In order to focus on the physicochemical (and consequently pharmacological) profile of proteins, visualization tools (showing the structure and structural parameters) also had to be optimized. Our response to this challenge was the use of Java technology with its exceptional level of interactivity. JPD is freely accessible (within the Gold Sting Suite) at http://sms.cbi.cnptia.embrapa.br, http://mirrors.rcsb.org/SMS, http://trantor.bioc.columbia.edu/SMS and http://www.es.embnet.org/SMS/ (Option: JavaProtein Dossier). PMID:15215458

  11. The Knowledge-Integrated Network Biomarkers Discovery for Major Adverse Cardiac Events

    PubMed Central

    Jin, Guangxu; Zhou, Xiaobo; Wang, Honghui; Zhao, Hong; Cui, Kemi; Zhang, Xiang-Sun; Chen, Luonan; Hazen, Stanley L.; Li, King; Wong, Stephen T. C.

    2010-01-01

    The mass spectrometry (MS) technology in clinical proteomics is very promising for discovery of new biomarkers for diseases management. To overcome the obstacles of data noises in MS analysis, we proposed a new approach of knowledge-integrated biomarker discovery using data from Major Adverse Cardiac Events (MACE) patients. We first built up a cardiovascular-related network based on protein information coming from protein annotations in Uniprot, protein–protein interaction (PPI), and signal transduction database. Distinct from the previous machine learning methods in MS data processing, we then used statistical methods to discover biomarkers in cardiovascular-related network. Through the tradeoff between known protein information and data noises in mass spectrometry data, we finally could firmly identify those high-confident biomarkers. Most importantly, aided by protein–protein interaction network, that is, cardiovascular-related network, we proposed a new type of biomarkers, that is, network biomarkers, composed of a set of proteins and the interactions among them. The candidate network biomarkers can classify the two groups of patients more accurately than current single ones without consideration of biological molecular interaction. PMID:18665624

  12. PASS2: an automated database of protein alignments organised as structural superfamilies.

    PubMed

    Bhaduri, Anirban; Pugalenthi, Ganesan; Sowdhamini, Ramanathan

    2004-04-02

    The functional selection and three-dimensional structural constraints of proteins in nature often relates to the retention of significant sequence similarity between proteins of similar fold and function despite poor sequence identity. Organization of structure-based sequence alignments for distantly related proteins, provides a map of the conserved and critical regions of the protein universe that is useful for the analysis of folding principles, for the evolutionary unification of protein families and for maximizing the information return from experimental structure determination. The Protein Alignment organised as Structural Superfamily (PASS2) database represents continuously updated, structural alignments for evolutionary related, sequentially distant proteins. An automated and updated version of PASS2 is, in direct correspondence with SCOP 1.63, consisting of sequences having identity below 40% among themselves. Protein domains have been grouped into 628 multi-member superfamilies and 566 single member superfamilies. Structure-based sequence alignments for the superfamilies have been obtained using COMPARER, while initial equivalencies have been derived from a preliminary superposition using LSQMAN or STAMP 4.0. The final sequence alignments have been annotated for structural features using JOY4.0. The database is supplemented with sequence relatives belonging to different genomes, conserved spatially interacting and structural motifs, probabilistic hidden markov models of superfamilies based on the alignments and useful links to other databases. Probabilistic models and sensitive position specific profiles obtained from reliable superfamily alignments aid annotation of remote homologues and are useful tools in structural and functional genomics. PASS2 presents the phylogeny of its members both based on sequence and structural dissimilarities. Clustering of members allows us to understand diversification of the family members. The search engine has been improved for simpler browsing of the database. The database resolves alignments among the structural domains consisting of evolutionarily diverged set of sequences. Availability of reliable sequence alignments of distantly related proteins despite poor sequence identity and single-member superfamilies permit better sampling of structures in libraries for fold recognition of new sequences and for the understanding of protein structure-function relationships of individual superfamilies. PASS2 is accessible at http://www.ncbs.res.in/~faculty/mini/campass/pass2.html

  13. ORFer--retrieval of protein sequences and open reading frames from GenBank and storage into relational databases or text files.

    PubMed

    Büssow, Konrad; Hoffmann, Steve; Sievert, Volker

    2002-12-19

    Functional genomics involves the parallel experimentation with large sets of proteins. This requires management of large sets of open reading frames as a prerequisite of the cloning and recombinant expression of these proteins. A Java program was developed for retrieval of protein and nucleic acid sequences and annotations from NCBI GenBank, using the XML sequence format. Annotations retrieved by ORFer include sequence name, organism and also the completeness of the sequence. The program has a graphical user interface, although it can be used in a non-interactive mode. For protein sequences, the program also extracts the open reading frame sequence, if available, and checks its correct translation. ORFer accepts user input in the form of single or lists of GenBank GI identifiers or accession numbers. It can be used to extract complete sets of open reading frames and protein sequences from any kind of GenBank sequence entry, including complete genomes or chromosomes. Sequences are either stored with their features in a relational database or can be exported as text files in Fasta or tabulator delimited format. The ORFer program is freely available at http://www.proteinstrukturfabrik.de/orfer. The ORFer program allows for fast retrieval of DNA sequences, protein sequences and their open reading frames and sequence annotations from GenBank. Furthermore, storage of sequences and features in a relational database is supported. Such a database can supplement a laboratory information system (LIMS) with appropriate sequence information.

  14. ProXL (Protein Cross-Linking Database): A Platform for Analysis, Visualization, and Sharing of Protein Cross-Linking Mass Spectrometry Data

    PubMed Central

    2016-01-01

    ProXL is a Web application and accompanying database designed for sharing, visualizing, and analyzing bottom-up protein cross-linking mass spectrometry data with an emphasis on structural analysis and quality control. ProXL is designed to be independent of any particular software pipeline. The import process is simplified by the use of the ProXL XML data format, which shields developers of data importers from the relative complexity of the relational database schema. The database and Web interfaces function equally well for any software pipeline and allow data from disparate pipelines to be merged and contrasted. ProXL includes robust public and private data sharing capabilities, including a project-based interface designed to ensure security and facilitate collaboration among multiple researchers. ProXL provides multiple interactive and highly dynamic data visualizations that facilitate structural-based analysis of the observed cross-links as well as quality control. ProXL is open-source, well-documented, and freely available at https://github.com/yeastrc/proxl-web-app. PMID:27302480

  15. ProXL (Protein Cross-Linking Database): A Platform for Analysis, Visualization, and Sharing of Protein Cross-Linking Mass Spectrometry Data.

    PubMed

    Riffle, Michael; Jaschob, Daniel; Zelter, Alex; Davis, Trisha N

    2016-08-05

    ProXL is a Web application and accompanying database designed for sharing, visualizing, and analyzing bottom-up protein cross-linking mass spectrometry data with an emphasis on structural analysis and quality control. ProXL is designed to be independent of any particular software pipeline. The import process is simplified by the use of the ProXL XML data format, which shields developers of data importers from the relative complexity of the relational database schema. The database and Web interfaces function equally well for any software pipeline and allow data from disparate pipelines to be merged and contrasted. ProXL includes robust public and private data sharing capabilities, including a project-based interface designed to ensure security and facilitate collaboration among multiple researchers. ProXL provides multiple interactive and highly dynamic data visualizations that facilitate structural-based analysis of the observed cross-links as well as quality control. ProXL is open-source, well-documented, and freely available at https://github.com/yeastrc/proxl-web-app .

  16. Investigation of candidate genes for osteoarthritis based on gene expression profiles.

    PubMed

    Dong, Shuanghai; Xia, Tian; Wang, Lei; Zhao, Qinghua; Tian, Jiwei

    2016-12-01

    To explore the mechanism of osteoarthritis (OA) and provide valid biological information for further investigation. Gene expression profile of GSE46750 was downloaded from Gene Expression Omnibus database. The Linear Models for Microarray Data (limma) package (Bioconductor project, http://www.bioconductor.org/packages/release/bioc/html/limma.html) was used to identify differentially expressed genes (DEGs) in inflamed OA samples. Gene Ontology function enrichment analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways enrichment analysis of DEGs were performed based on Database for Annotation, Visualization and Integrated Discovery data, and protein-protein interaction (PPI) network was constructed based on the Search Tool for the Retrieval of Interacting Genes/Proteins database. Regulatory network was screened based on Encyclopedia of DNA Elements. Molecular Complex Detection was used for sub-network screening. Two sub-networks with highest node degree were integrated with transcriptional regulatory network and KEGG functional enrichment analysis was processed for 2 modules. In total, 401 up- and 196 down-regulated DEGs were obtained. Up-regulated DEGs were involved in inflammatory response, while down-regulated DEGs were involved in cell cycle. PPI network with 2392 protein interactions was constructed. Moreover, 10 genes including Interleukin 6 (IL6) and Aurora B kinase (AURKB) were found to be outstanding in PPI network. There are 214 up- and 8 down-regulated transcription factor (TF)-target pairs in the TF regulatory network. Module 1 had TFs including SPI1, PRDM1, and FOS, while module 2 contained FOSL1. The nodes in module 1 were enriched in chemokine signaling pathway, while the nodes in module 2 were mainly enriched in cell cycle. The screened DEGs including IL6, AGT, and AURKB might be potential biomarkers for gene therapy for OA by being regulated by TFs such as FOS and SPI1, and participating in the cell cycle and cytokine-cytokine receptor interaction pathway. Copyright © 2016 Turkish Association of Orthopaedics and Traumatology. Production and hosting by Elsevier B.V. All rights reserved.

  17. ModuleRole: a tool for modulization, role determination and visualization in protein-protein interaction networks.

    PubMed

    Li, Guipeng; Li, Ming; Zhang, Yiwei; Wang, Dong; Li, Rong; Guimerà, Roger; Gao, Juntao Tony; Zhang, Michael Q

    2014-01-01

    Rapidly increasing amounts of (physical and genetic) protein-protein interaction (PPI) data are produced by various high-throughput techniques, and interpretation of these data remains a major challenge. In order to gain insight into the organization and structure of the resultant large complex networks formed by interacting molecules, using simulated annealing, a method based on the node connectivity, we developed ModuleRole, a user-friendly web server tool which finds modules in PPI network and defines the roles for every node, and produces files for visualization in Cytoscape and Pajek. For given proteins, it analyzes the PPI network from BioGRID database, finds and visualizes the modules these proteins form, and then defines the role every node plays in this network, based on two topological parameters Participation Coefficient and Z-score. This is the first program which provides interactive and very friendly interface for biologists to find and visualize modules and roles of proteins in PPI network. It can be tested online at the website http://www.bioinfo.org/modulerole/index.php, which is free and open to all users and there is no login requirement, with demo data provided by "User Guide" in the menu Help. Non-server application of this program is considered for high-throughput data with more than 200 nodes or user's own interaction datasets. Users are able to bookmark the web link to the result page and access at a later time. As an interactive and highly customizable application, ModuleRole requires no expert knowledge in graph theory on the user side and can be used in both Linux and Windows system, thus a very useful tool for biologist to analyze and visualize PPI networks from databases such as BioGRID. ModuleRole is implemented in Java and C, and is freely available at http://www.bioinfo.org/modulerole/index.php. Supplementary information (user guide, demo data) is also available at this website. API for ModuleRole used for this program can be obtained upon request.

  18. BIPS: BIANA Interolog Prediction Server. A tool for protein-protein interaction inference.

    PubMed

    Garcia-Garcia, Javier; Schleker, Sylvia; Klein-Seetharaman, Judith; Oliva, Baldo

    2012-07-01

    Protein-protein interactions (PPIs) play a crucial role in biology, and high-throughput experiments have greatly increased the coverage of known interactions. Still, identification of complete inter- and intraspecies interactomes is far from being complete. Experimental data can be complemented by the prediction of PPIs within an organism or between two organisms based on the known interactions of the orthologous genes of other organisms (interologs). Here, we present the BIANA (Biologic Interactions and Network Analysis) Interolog Prediction Server (BIPS), which offers a web-based interface to facilitate PPI predictions based on interolog information. BIPS benefits from the capabilities of the framework BIANA to integrate the several PPI-related databases. Additional metadata can be used to improve the reliability of the predicted interactions. Sensitivity and specificity of the server have been calculated using known PPIs from different interactomes using a leave-one-out approach. The specificity is between 72 and 98%, whereas sensitivity varies between 1 and 59%, depending on the sequence identity cut-off used to calculate similarities between sequences. BIPS is freely accessible at http://sbi.imim.es/BIPS.php.

  19. Locating overlapping dense subgraphs in gene (protein) association networks and predicting novel protein functional groups among these subgraphs

    NASA Astrophysics Data System (ADS)

    Palla, Gergely; Derenyi, Imre; Farkas, Illes J.; Vicsek, Tamas

    2006-03-01

    Most tasks in a cell are performed not by individual proteins, but by functional groups of proteins (either physically interacting with each other or associated in other ways). In gene (protein) association networks these groups show up as sets of densely connected nodes. In the yeast, Saccharomyces cerevisiae, known physically interacting groups of proteins (called protein complexes) strongly overlap: the total number of proteins contained by these complexes by far underestimates the sum of their sizes (2750 vs. 8932). Thus, most functional groups of proteins, both physically interacting and other, are likely to share many of their members with other groups. However, current algorithms searching for dense groups of nodes in networks usually exclude overlaps. With the aim to discover both novel functions of individual proteins and novel protein functional groups we combine in protein association networks (i) a search for overlapping dense subgraphs based on the Clique Percolation Method (CPM) (Palla, G., et.al. Nature 435, 814-818 (2005), http://angel.elte.hu/clustering), which explicitly allows for overlaps among the groups, and (ii) a verification and characterization of the identified groups of nodes (proteins) with the help of standard annotation databases listing known functions.

  20. Analysis of Protein–Protein Interactions in MCF-7 and MDA-MB-231 Cell Lines Using Phthalic Acid Chemical Probes

    PubMed Central

    Liang, Shih-Shin; Wang, Tsu-Nai; Tsai, Eing-Mei

    2014-01-01

    Phthalates are a class of plasticizers that have been characterized as endocrine disrupters, and are associated with genital diseases, cardiotoxicity, hepatotoxicity, and nephrotoxicity in the GeneOntology gene/protein database. In this study, we synthesized phthalic acid chemical probes and demonstrated differing protein–protein interactions between MCF-7 cells and MDA-MB-231 breast cancer cell lines. Phthalic acid chemical probes were synthesized using silicon dioxide particle carriers, which were modified using the silanized linker 3-aminopropyl triethoxyslane (APTES). Incubation with cell lysates from breast cancer cell lines revealed interactions between phthalic acid and cellular proteins in MCF-7 and MDA-MB-231 cells. Subsequent proteomics analyses indicated 22 phthalic acid-binding proteins in both cell types, including heat shock cognate 71-kDa protein, ATP synthase subunit beta, and heat shock protein HSP 90-beta. In addition, 21 MCF-7-specific and 32 MDA-MB-231 specific phthalic acid-binding proteins were identified, including related proteasome proteins, heat shock 70-kDa protein, and NADPH dehydrogenase and ribosomal correlated proteins, ras-related proteins, and members of the heat shock protein family, respectively. PMID:25402641

  1. Receptor Tyrosine Kinase MET Interactome and Neurodevelopmental Disorder Partners at the Developing Synapse

    PubMed Central

    Xie, Zhihui; Li, Jing; Baker, Jonathan; Eagleson, Kathie L.; Coba, Marcelo P.; Levitt, Pat

    2016-01-01

    Background Atypical synapse development and plasticity are implicated in many neurodevelopmental disorders (NDDs). NDD-associated, high confidence risk genes have been identified, yet little is known about functional relationships at the level of protein-protein interactions, which are the dominant molecular bases responsible for mediating circuit development. Methods Proteomics in three independent developing neocortical synaptosomal preparations identified putative interacting proteins of the ligand-activated MET receptor tyrosine kinase, an autism risk gene that mediates synapse development. The candidates were translated into interactome networks and analyzed bioinformatically. Additionally, three independent quantitative proximity ligation assays (PLA) in cultured neurons and four independent immunoprecipitation analyses of synaptosomes validated protein interactions. Results Approximately 11% (8/72) of MET-interacting proteins, including SHANK3, SYNGAP1 and GRIN2B, are associated with NDDs. Proteins in the MET interactome were translated into a novel MET interactome network based on human protein-protein interaction databases. High confidence genes from different NDD datasets that encode synaptosomal proteins were analyzed for being enriched in MET interactome proteins. This was found for autism, but not schizophrenia, bipolar disorder, major depressive disorder or attentional deficit hyperactivity disorder. There is correlated gene expression between MET and its interactive partners in developing human temporal and visual neocortices, but not with highly expressed genes that are not in the interactome. PLA and biochemical analyses demonstrate that MET-protein partner interactions are dynamically regulated by receptor activation. Conclusions The results provide a novel molecular framework for deciphering the functional relations of key regulators of synaptogenesis that contribute to both typical cortical development and to NDDs. PMID:27086544

  2. SIRIUS. An automated method for the analysis of the preferred packing arrangements between protein groups.

    PubMed

    Singh, J; Thornton, J M

    1990-02-05

    Automated methods have been developed to determine the preferred packing arrangement between interacting protein groups. A suite of FORTRAN programs, SIRIUS, is described for calculating and analysing the geometries of interacting protein groups using crystallographically derived atomic co-ordinates. The programs involved in calculating the geometries search for interacting pairs of protein groups using a distance criterion, and then calculate the spatial disposition and orientation of the pair. The second set of programs is devoted to analysis. This involves calculating the observed and expected distributions of the angles and assessing the statistical significance of the difference between the two. A database of the geometries of the 400 combinations of side-chain to side-chain interaction has been created. The approach used in analysing the geometrical information is illustrated here with specific examples of interactions between side-chains, peptide groups and particular types of atom. At the side-chain level, an analysis of aromatic-amino interactions, and the interactions of peptide carbonyl groups with arginine residues is presented. At the atomic level the analyses include the spatial disposition of oxygen atoms around tyrosine residues, and the frequency and type of contact between carbon, nitrogen and oxygen atoms. This information is currently being applied to the modelling of protein interactions.

  3. The Salivary Secretome of the Tsetse Fly Glossina pallidipes (Diptera: Glossinidae) Infected by Salivary Gland Hypertrophy Virus

    PubMed Central

    Kariithi, Henry M.; Ince, Ikbal A.; Boeren, Sjef; Abd-Alla, Adly M. M.; Parker, Andrew G.; Aksoy, Serap; Vlak, Just M.; van Oers, Monique M.

    2011-01-01

    Background The competence of the tsetse fly Glossina pallidipes (Diptera; Glossinidae) to acquire salivary gland hypertrophy virus (SGHV), to support virus replication and successfully transmit the virus depends on complex interactions between Glossina and SGHV macromolecules. Critical requisites to SGHV transmission are its replication and secretion of mature virions into the fly's salivary gland (SG) lumen. However, secretion of host proteins is of equal importance for successful transmission and requires cataloging of G. pallidipes secretome proteins from hypertrophied and non-hypertrophied SGs. Methodology/Principal Findings After electrophoretic profiling and in-gel trypsin digestion, saliva proteins were analyzed by nano-LC-MS/MS. MaxQuant/Andromeda search of the MS data against the non-redundant (nr) GenBank database and a G. morsitans morsitans SG EST database, yielded a total of 521 hits, 31 of which were SGHV-encoded. On a false discovery rate limit of 1% and detection threshold of least 2 unique peptides per protein, the analysis resulted in 292 Glossina and 25 SGHV MS-supported proteins. When annotated by the Blast2GO suite, at least one gene ontology (GO) term could be assigned to 89.9% (285/317) of the detected proteins. Five (∼1.8%) Glossina and three (∼12%) SGHV proteins remained without a predicted function after blast searches against the nr database. Sixty-five of the 292 detected Glossina proteins contained an N-terminal signal/secretion peptide sequence. Eight of the SGHV proteins were predicted to be non-structural (NS), and fourteen are known structural (VP) proteins. Conclusions/Significance SGHV alters the protein expression pattern in Glossina. The G. pallidipes SG secretome encompasses a spectrum of proteins that may be required during the SGHV infection cycle. These detected proteins have putative interactions with at least 21 of the 25 SGHV-encoded proteins. Our findings opens venues for developing novel SGHV mitigation strategies to block SGHV infections in tsetse production facilities such as using SGHV-specific antibodies and phage display-selected gut epithelia-binding peptides. PMID:22132244

  4. Database resources of the National Center for Biotechnology Information.

    PubMed

    Sayers, Eric W; Barrett, Tanya; Benson, Dennis A; Bolton, Evan; Bryant, Stephen H; Canese, Kathi; Chetvernin, Vyacheslav; Church, Deanna M; Dicuccio, Michael; Federhen, Scott; Feolo, Michael; Fingerman, Ian M; Geer, Lewis Y; Helmberg, Wolfgang; Kapustin, Yuri; Krasnov, Sergey; Landsman, David; Lipman, David J; Lu, Zhiyong; Madden, Thomas L; Madej, Tom; Maglott, Donna R; Marchler-Bauer, Aron; Miller, Vadim; Karsch-Mizrachi, Ilene; Ostell, James; Panchenko, Anna; Phan, Lon; Pruitt, Kim D; Schuler, Gregory D; Sequeira, Edwin; Sherry, Stephen T; Shumway, Martin; Sirotkin, Karl; Slotta, Douglas; Souvorov, Alexandre; Starchenko, Grigory; Tatusova, Tatiana A; Wagner, Lukas; Wang, Yanli; Wilbur, W John; Yaschenko, Eugene; Ye, Jian

    2012-01-01

    In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data made available through the NCBI Website. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central (PMC), Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Primer-BLAST, COBALT, Splign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, dbVar, Epigenomics, Genome and related tools, the Map Viewer, Model Maker, Evidence Viewer, Trace Archive, Sequence Read Archive, BioProject, BioSample, Retroviral Genotyping Tools, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus (GEO), Probe, Online Mendelian Inheritance in Animals (OMIA), the Molecular Modeling Database (MMDB), the Conserved Domain Database (CDD), the Conserved Domain Architecture Retrieval Tool (CDART), Biosystems, Protein Clusters and the PubChem suite of small molecule databases. Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.

  5. Database resources of the National Center for Biotechnology Information

    PubMed Central

    Acland, Abigail; Agarwala, Richa; Barrett, Tanya; Beck, Jeff; Benson, Dennis A.; Bollin, Colleen; Bolton, Evan; Bryant, Stephen H.; Canese, Kathi; Church, Deanna M.; Clark, Karen; DiCuccio, Michael; Dondoshansky, Ilya; Federhen, Scott; Feolo, Michael; Geer, Lewis Y.; Gorelenkov, Viatcheslav; Hoeppner, Marilu; Johnson, Mark; Kelly, Christopher; Khotomlianski, Viatcheslav; Kimchi, Avi; Kimelman, Michael; Kitts, Paul; Krasnov, Sergey; Kuznetsov, Anatoliy; Landsman, David; Lipman, David J.; Lu, Zhiyong; Madden, Thomas L.; Madej, Tom; Maglott, Donna R.; Marchler-Bauer, Aron; Karsch-Mizrachi, Ilene; Murphy, Terence; Ostell, James; O'Sullivan, Christopher; Panchenko, Anna; Phan, Lon; Pruitt, Don Preussm Kim D.; Rubinstein, Wendy; Sayers, Eric W.; Schneider, Valerie; Schuler, Gregory D.; Sequeira, Edwin; Sherry, Stephen T.; Shumway, Martin; Sirotkin, Karl; Siyan, Karanjit; Slotta, Douglas; Soboleva, Alexandra; Soussov, Vladimir; Starchenko, Grigory; Tatusova, Tatiana A.; Trawick, Bart W.; Vakatov, Denis; Wang, Yanli; Ward, Minghong; John Wilbur, W.; Yaschenko, Eugene; Zbicz, Kerry

    2014-01-01

    In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI, http://www.ncbi.nlm.nih.gov) provides analysis and retrieval resources for the data in GenBank and other biological data made available through the NCBI Web site. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central, PubReader, Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link, Primer-BLAST, COBALT, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, dbVar, Epigenomics, the Genetic Testing Registry, Genome and related tools, the Map Viewer, Trace Archive, Sequence Read Archive, BioProject, BioSample, ClinVar, MedGen, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus, Probe, Online Mendelian Inheritance in Animals, the Molecular Modeling Database, the Conserved Domain Database, the Conserved Domain Architecture Retrieval Tool, Biosystems, Protein Clusters and the PubChem suite of small molecule databases. Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All these resources can be accessed through the NCBI home page. PMID:24259429

  6. Insight into the transcriptome of Arthrobotrys conoides using high throughput sequencing.

    PubMed

    Ramesh, Pandit; Reena, Patel; Amitbikram, Mohapatra; Chaitanya, Joshi; Anju, Kunjadia

    2015-12-01

    Arthrobotrys conoides is a nematode-trapping fungus belonging to Orbiliales, Ascomycota group, and traps prey nematodes by means of adhesive network. Fungus has a potential to be used as a biocontrol agent against plant parasitic nematodes. In the present study, we characterized the transcriptome of A. conoides using high-throughput sequencing technology and characterized its virulence unigenes. Total 7,255 cDNA contigs with an average length of 425 bp were generated and 6184 (61.81%) transcripts were functionally annotated and characterized. Majority of unigenes were found analogous to the genes of plant pathogenic fungi. A total of 1749 transcripts were found to be orthologous with eukaryotic proteins of KOG database. Several carbohydrate active enzymes and peptidases were identified. We also analyzed classically and nonclassically secreted proteins and confirmed by BLASTP against fungal secretome database. A total of 916 contigs were analogous to 556 unique proteins of Pathogen Host Interaction (PHI) database. Further, we identified 91 unigenes homologous to the database of fungal virulence factor (DFVF). A total of 104 putative protein kinases coding transcripts were identified by BLASTP against KinBase database, which are major players in signaling pathways. This study provides a comprehensive look at the transcriptome of A. conoides and the identified unigenes might have a role in catching and killing prey nematodes by A. conoides. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  7. Amyloid precursor protein interaction network in human testis: sentinel proteins for male reproduction.

    PubMed

    Silva, Joana Vieira; Yoon, Sooyeon; Domingues, Sara; Guimarães, Sofia; Goltsev, Alexander V; da Cruz E Silva, Edgar Figueiredo; Mendes, José Fernando F; da Cruz E Silva, Odete Abreu Beirão; Fardilha, Margarida

    2015-01-16

    Amyloid precursor protein (APP) is widely recognized for playing a central role in Alzheimer's disease pathogenesis. Although APP is expressed in several tissues outside the human central nervous system, the functions of APP and its family members in other tissues are still poorly understood. APP is involved in several biological functions which might be potentially important for male fertility, such as cell adhesion, cell motility, signaling, and apoptosis. Furthermore, APP superfamily members are known to be associated with fertility. Knowledge on the protein networks of APP in human testis and spermatozoa will shed light on the function of APP in the male reproductive system. We performed a Yeast Two-Hybrid screen and a database search to study the interaction network of APP in human testis and sperm. To gain insights into the role of APP superfamily members in fertility, the study was extended to APP-like protein 2 (APLP2). We analyzed several topological properties of the APP interaction network and the biological and physiological properties of the proteins in the APP interaction network were also specified by gene ontologyand pathways analyses. We classified significant features related to the human male reproduction for the APP interacting proteins and identified modules of proteins with similar functional roles which may show cooperative behavior for male fertility. The present work provides the first report on the APP interactome in human testis. Our approach allowed the identification of novel interactions and recognition of key APP interacting proteins for male reproduction, particularly in sperm-oocyte interaction.

  8. HippDB: a database of readily targeted helical protein-protein interactions.

    PubMed

    Bergey, Christina M; Watkins, Andrew M; Arora, Paramjit S

    2013-11-01

    HippDB catalogs every protein-protein interaction whose structure is available in the Protein Data Bank and which exhibits one or more helices at the interface. The Web site accepts queries on variables such as helix length and sequence, and it provides computational alanine scanning and change in solvent-accessible surface area values for every interfacial residue. HippDB is intended to serve as a starting point for structure-based small molecule and peptidomimetic drug development. HippDB is freely available on the web at http://www.nyu.edu/projects/arora/hippdb. The Web site is implemented in PHP, MySQL and Apache. Source code freely available for download at http://code.google.com/p/helidb, implemented in Perl and supported on Linux. arora@nyu.edu.

  9. An emerging cyberinfrastructure for biodefense pathogen and pathogen–host data

    PubMed Central

    Zhang, C.; Crasta, O.; Cammer, S.; Will, R.; Kenyon, R.; Sullivan, D.; Yu, Q.; Sun, W.; Jha, R.; Liu, D.; Xue, T.; Zhang, Y.; Moore, M.; McGarvey, P.; Huang, H.; Chen, Y.; Zhang, J.; Mazumder, R.; Wu, C.; Sobral, B.

    2008-01-01

    The NIAID-funded Biodefense Proteomics Resource Center (RC) provides storage, dissemination, visualization and analysis capabilities for the experimental data deposited by seven Proteomics Research Centers (PRCs). The data and its publication is to support researchers working to discover candidates for the next generation of vaccines, therapeutics and diagnostics against NIAID's Category A, B and C priority pathogens. The data includes transcriptional profiles, protein profiles, protein structural data and host–pathogen protein interactions, in the context of the pathogen life cycle in vivo and in vitro. The database has stored and supported host or pathogen data derived from Bacillus, Brucella, Cryptosporidium, Salmonella, SARS, Toxoplasma, Vibrio and Yersinia, human tissue libraries, and mouse macrophages. These publicly available data cover diverse data types such as mass spectrometry, yeast two-hybrid (Y2H), gene expression profiles, X-ray and NMR determined protein structures and protein expression clones. The growing database covers over 23 000 unique genes/proteins from different experiments and organisms. All of the genes/proteins are annotated and integrated across experiments using UniProt Knowledgebase (UniProtKB) accession numbers. The web-interface for the database enables searching, querying and downloading at the level of experiment, group and individual gene(s)/protein(s) via UniProtKB accession numbers or protein function keywords. The system is accessible at http://www.proteomicsresource.org/. PMID:17984082

  10. Proteomics screening of adenosine triphosphate-interacting proteins in the liver of diazinon-treated rats.

    PubMed

    Pourtaji, A; Robati, R Yazdian; Lari, P; Hosseinzadeh, H; Ramezani, M; Abnous, K

    2016-10-01

    Diazinon (DZN) is one of the most important organophosphorus compounds used to control pests in agriculture in many countries. Several studies have shown that exposure to DZN may alter protein expression in the liver. In order to further investigate the mechanism of DZN toxicity, differentially expressed ATP-interacting proteins, following subacute exposure to toxin, were separated and identified in rat liver. Male rats were equally divided into four groups: control (corn oil) and DZN (15 mg/kg) by gavage once a day for 4 weeks. After homogenization of liver tissue, lysates were incubated ATP-sepharose beads. After several washes, ATP-interacting proteins were eluted and separated on 2-D polyacrylamide gels. Deferentially expressed proteins were cut and identified using matrix-assisted laser desorption/ionization/time-of-flight and Mascot database. Identified proteins were classified according to their biological process using protein analysis through evolutionary relationships (PANTHER) Web site. In this work, we showed that several key proteins involved in biological processes such as antioxidant system, oxidative stress, apoptosis, and metabolism were differentially expressed after subacute exposure to DZN. © The Author(s) 2015.

  11. Novel Gbeta Mimic Kelch Proteins (Gpb1 and Gpb2 Connect G-Protein Signaling to Ras via Yeast Neurofibromin Homologs Ira1 and Ira2: A Model for Human NF1

    DTIC Science & Technology

    2008-03-01

    tractable fungal model system, Cryptococcus neoformans, and identified two kelch repeat homologs that are involved in mating (Kem1 and Kem2). To...find kelch-repeat proteins involved in G protein signaling, Cryptococcus homologues of Gpb1/2, which interacts with and negatively regulates the G...protein alpha subunit, Gpa2, in S. cerevisiae, were searched by BLAST (tblastn) in Cryptococcus genome database of serotype A (Duke University Medical

  12. Physical and in silico approaches identify DNA-PK in a Tax DNA-damage response interactome

    PubMed Central

    Ramadan, Emad; Ward, Michael; Guo, Xin; Durkin, Sarah S; Sawyer, Adam; Vilela, Marcelo; Osgood, Christopher; Pothen, Alex; Semmes, Oliver J

    2008-01-01

    Background We have initiated an effort to exhaustively map interactions between HTLV-1 Tax and host cellular proteins. The resulting Tax interactome will have significant utility toward defining new and understanding known activities of this important viral protein. In addition, the completion of a full Tax interactome will also help shed light upon the functional consequences of these myriad Tax activities. The physical mapping process involved the affinity isolation of Tax complexes followed by sequence identification using tandem mass spectrometry. To date we have mapped 250 cellular components within this interactome. Here we present our approach to prioritizing these interactions via an in silico culling process. Results We first constructed an in silico Tax interactome comprised of 46 literature-confirmed protein-protein interactions. This number was then reduced to four Tax-interactions suspected to play a role in DNA damage response (Rad51, TOP1, Chk2, 53BP1). The first-neighbor and second-neighbor interactions of these four proteins were assembled from available human protein interaction databases. Through an analysis of betweenness and closeness centrality measures, and numbers of interactions, we ranked proteins in the first neighborhood. When this rank list was compared to the list of physical Tax-binding proteins, DNA-PK was the highest ranked protein common to both lists. An overlapping clustering of the Tax-specific second-neighborhood protein network showed DNA-PK to be one of three bridge proteins that link multiple clusters in the DNA damage response network. Conclusion The interaction of Tax with DNA-PK represents an important biological paradigm as suggested via consensus findings in vivo and in silico. We present this methodology as an approach to discovery and as a means of validating components of a consensus Tax interactome. PMID:18922151

  13. Identification of Potent Chloride Intracellular Channel Protein 1 Inhibitors from Traditional Chinese Medicine through Structure-Based Virtual Screening and Molecular Dynamics Analysis

    PubMed Central

    Wan, Minghui; Liao, Dongjiang; Peng, Guilin; Xu, Xin; Yin, Weiqiang; Guo, Guixin; Jiang, Funeng; Zhong, Weide

    2017-01-01

    Chloride intracellular channel 1 (CLIC1) is involved in the development of most aggressive human tumors, including gastric, colon, lung, liver, and glioblastoma cancers. It has become an attractive new therapeutic target for several types of cancer. In this work, we aim to identify natural products as potent CLIC1 inhibitors from Traditional Chinese Medicine (TCM) database using structure-based virtual screening and molecular dynamics (MD) simulation. First, structure-based docking was employed to screen the refined TCM database and the top 500 TCM compounds were obtained and reranked by X-Score. Then, 30 potent hits were achieved from the top 500 TCM compounds using cluster and ligand-protein interaction analysis. Finally, MD simulation was employed to validate the stability of interactions between each hit and CLIC1 protein from docking simulation, and Molecular Mechanics/Generalized Born Surface Area (MM-GBSA) analysis was used to refine the virtual hits. Six TCM compounds with top MM-GBSA scores and ideal-binding models were confirmed as the final hits. Our study provides information about the interaction between TCM compounds and CLIC1 protein, which may be helpful for further experimental investigations. In addition, the top 6 natural products structural scaffolds could serve as building blocks in designing drug-like molecules for CLIC1 inhibition. PMID:29147652

  14. Identification of Histone Deacetylase (HDAC) as a drug target against MRSA via interolog method of protein-protein interaction prediction.

    PubMed

    Uddin, Reaz; Tariq, Syeda Sumayya; Azam, Syed Sikander; Wadood, Abdul; Moin, Syed Tarique

    2017-08-30

    Patently, Protein-Protein Interactions (PPIs) lie at the core of significant biological functions and make the foundation of host-pathogen relationships. Hence, the current study is aimed to use computational biology techniques to predict host-pathogen Protein-Protein Interactions (HP-PPIs) between MRSA and Humans as potential drug targets ultimately proposing new possible inhibitors against them. As a matter of fact this study is based on the Interolog method which implies that homologous proteins retain their ability to interact. A distant homolog approach based on Interolog method was employed to speculate MRSA protein homologs in Humans using PSI-BLAST. In addition the protein interaction partners of these homologs as listed in Database of Interacting Proteins (DIP) were predicted to interact with MRSA as well. Moreover, a direct approach using BLAST was also applied so as to attain further confidence in the strategy. Consequently, the common HP-PPIs predicted by both approaches are suggested as potential drug targets (22%) whereas, the unique HP-PPIs estimated only through distant homolog approach are presented as novel drug targets (12%). Furthermore, the most repeated entry in our results was found to be MRSA Histone Deacetylase (HDAC) which was then modeled using SWISS-MODEL. Eventually, small molecules from ZINC, selected randomly, were docked against HDAC using Auto Dock and are suggested as potential binders (inhibitors) based on their energetic profiles. Thus the current study provides basis for further in-depth analysis of such data which not only include MRSA but other deadly pathogens as well. Copyright © 2017 Elsevier B.V. All rights reserved.

  15. Identification of interacting proteins of the TaFVE protein involved in spike development in bread wheat.

    PubMed

    Zheng, Yong-Sheng; Lu, Yu-Qing; Meng, Ying-Ying; Zhang, Rong-Zhi; Zhang, Han; Sun, Jia-Mei; Wang, Mu-Mu; Li, Li-Hui; Li, Ru-Yu

    2017-05-01

    WD-40 repeat-containing protein MSI4 (FVE)/MSI4 plays important roles in determining flowering time in Arabidopsis. However, its function is unexplored in wheat. In the present study, coimmunoprecipitation and nanoscale liquid chromatography coupled to MS/MS were used to identify FVE in wheat (TaFVE)-interacting or associated proteins. Altogether 89 differentially expressed proteins showed the same downregulated expression trends as TaFVE in wheat line 5660M. Among them, 62 proteins were further predicted to be involved in the interaction network of TaFVE and 11 proteins have been shown to be potential TaFVE interactors based on curated databases and experimentally determined in other species by the STRING. Both yeast two-hybrid assay and bimolecular fluorescence complementation assay showed that histone deacetylase 6 and histone deacetylase 15 directly interacted with TaFVE. Multiple chromatin-remodelling proteins and polycomb group proteins were also identified and predicted to interact with TaFVE. These results showed that TaFVE directly interacted with multiple proteins to form multiple complexes to regulate spike developmental process, e.g. histone deacetylate, chromatin-remodelling and polycomb repressive complex 2 complexes. In addition, multiple flower development regulation factors (e.g. flowering locus K homology domain, flowering time control protein FPA, FY, flowering time control protein FCA, APETALA 1) involved in floral transition were also identified in the present study. Taken together, these results further elucidate the regulatory functions of TaFVE and help reveal the genetic mechanisms underlying wheat spike differentiation. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  16. Key genes and pathways in measles and their interaction with environmental chemicals.

    PubMed

    Zhang, Rongqiang; Jiang, Hualin; Li, Fengying; Su, Ning; Ding, Yi; Mao, Xiang; Ren, Dan; Wang, Jing

    2018-06-01

    The aim of the present study was to explore key genes that may have a role in the pathology of measles virus infection and to clarify the interaction networks between environmental factors and differentially expressed genes (DEGs). After screening the database of the Gene Expression Omnibus of the National Center for Biotechnology Information, the dataset GSE5808 was downloaded and analyzed. A global normalization method was performed to minimize data inconsistencies and heterogeneity. DEGs during different stages of measles virus infection were explored using R software (v3.4.0). Gene Ontology and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis of the DEGs were performed using Cytoscape 3.4.0 software. A protein-protein interaction (PPI) network of the DEGs was obtained from the STRING database v9.05. A total of 43 DEGs were obtained from four analyzed sample groups, including 10 highly expressed genes and 33 genes with decreased expression. The most enriched pathways based on KEGG analysis were fatty acid elongation, cytokine-cytokine receptor interaction and RNA degradation. The genes mentioned in the PPI network were mainly associated with protein binding and chemokine activity. A total of 219 chemicals were identified that may, jointly or on their own, interact with the 6 DEGs between the control group and patients with measles (at hospital entry), including benzo(a)pyrene (BaP) and tetrachlorodibenzodioxin (TCDD). In conclusion, the present study revealed that chemokines and environmental chemicals, e.g. BaP and TCDD, may affect the development of measles.

  17. Drug-Like Protein–Protein Interaction Modulators: Challenges and Opportunities for Drug Discovery and Chemical Biology

    PubMed Central

    Villoutreix, Bruno O; Kuenemann, Melaine A; Poyet, Jean-Luc; Bruzzoni-Giovanelli, Heriberto; Labbé, Céline; Lagorce, David; Sperandio, Olivier; Miteva, Maria A

    2014-01-01

    Fundamental processes in living cells are largely controlled by macromolecular interactions and among them, protein–protein interactions (PPIs) have a critical role while their dysregulations can contribute to the pathogenesis of numerous diseases. Although PPIs were considered as attractive pharmaceutical targets already some years ago, they have been thus far largely unexploited for therapeutic interventions with low molecular weight compounds. Several limiting factors, from technological hurdles to conceptual barriers, are known, which, taken together, explain why research in this area has been relatively slow. However, this last decade, the scientific community has challenged the dogma and became more enthusiastic about the modulation of PPIs with small drug-like molecules. In fact, several success stories were reported both, at the preclinical and clinical stages. In this review article, written for the 2014 International Summer School in Chemoinformatics (Strasbourg, France), we discuss in silico tools (essentially post 2012) and databases that can assist the design of low molecular weight PPI modulators (these tools can be found at www.vls3d.com). We first introduce the field of protein–protein interaction research, discuss key challenges and comment recently reported in silico packages, protocols and databases dedicated to PPIs. Then, we illustrate how in silico methods can be used and combined with experimental work to identify PPI modulators. PMID:25254076

  18. The role of stabilization centers in protein thermal stability

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Magyar, Csaba; Gromiha, M. Michael; Sávoly, Zoltán

    2016-02-26

    The definition of stabilization centers was introduced almost two decades ago. They are centers of noncovalent long range interaction clusters, believed to have a role in maintaining the three-dimensional structure of proteins by preventing their decay due to their cooperative long range interactions. Here, this hypothesis is investigated from the viewpoint of thermal stability for the first time, using a large protein thermodynamics database. The positions of amino acids belonging to stabilization centers are correlated with available experimental thermodynamic data on protein thermal stability. Our analysis suggests that stabilization centers, especially solvent exposed ones, do contribute to the thermal stabilizationmore » of proteins. - Highlights: • Stabilization centers contribute to thermal stabilization of protein structures. • Stabilization center content correlates with melting temperature of proteins. • Exposed stabilization center content correlates with stability even in hyperthermophiles. • Stability changing mutations are frequently found at stabilization centers.« less

  19. atBioNet--an integrated network analysis tool for genomics and biomarker discovery.

    PubMed

    Ding, Yijun; Chen, Minjun; Liu, Zhichao; Ding, Don; Ye, Yanbin; Zhang, Min; Kelly, Reagan; Guo, Li; Su, Zhenqiang; Harris, Stephen C; Qian, Feng; Ge, Weigong; Fang, Hong; Xu, Xiaowei; Tong, Weida

    2012-07-20

    Large amounts of mammalian protein-protein interaction (PPI) data have been generated and are available for public use. From a systems biology perspective, Proteins/genes interactions encode the key mechanisms distinguishing disease and health, and such mechanisms can be uncovered through network analysis. An effective network analysis tool should integrate different content-specific PPI databases into a comprehensive network format with a user-friendly platform to identify key functional modules/pathways and the underlying mechanisms of disease and toxicity. atBioNet integrates seven publicly available PPI databases into a network-specific knowledge base. Knowledge expansion is achieved by expanding a user supplied proteins/genes list with interactions from its integrated PPI network. The statistically significant functional modules are determined by applying a fast network-clustering algorithm (SCAN: a Structural Clustering Algorithm for Networks). The functional modules can be visualized either separately or together in the context of the whole network. Integration of pathway information enables enrichment analysis and assessment of the biological function of modules. Three case studies are presented using publicly available disease gene signatures as a basis to discover new biomarkers for acute leukemia, systemic lupus erythematosus, and breast cancer. The results demonstrated that atBioNet can not only identify functional modules and pathways related to the studied diseases, but this information can also be used to hypothesize novel biomarkers for future analysis. atBioNet is a free web-based network analysis tool that provides a systematic insight into proteins/genes interactions through examining significant functional modules. The identified functional modules are useful for determining underlying mechanisms of disease and biomarker discovery. It can be accessed at: http://www.fda.gov/ScienceResearch/BioinformaticsTools/ucm285284.htm.

  20. Patterns of HIV-1 Protein Interaction Identify Perturbed Host-Cellular Subsystems

    PubMed Central

    MacPherson, Jamie I.; Dickerson, Jonathan E.; Pinney, John W.; Robertson, David L.

    2010-01-01

    Human immunodeficiency virus type 1 (HIV-1) exploits a diverse array of host cell functions in order to replicate. This is mediated through a network of virus-host interactions. A variety of recent studies have catalogued this information. In particular the HIV-1, Human Protein Interaction Database (HHPID) has provided a unique depth of protein interaction detail. However, as a map of HIV-1 infection, the HHPID is problematic, as it contains curation error and redundancy; in addition, it is based on a heterogeneous set of experimental methods. Based on identifying shared patterns of HIV-host interaction, we have developed a novel methodology to delimit the core set of host-cellular functions and their associated perturbation from the HHPID. Initially, using biclustering, we identify 279 significant sets of host proteins that undergo the same types of interaction. The functional cohesiveness of these protein sets was validated using a human protein-protein interaction network, gene ontology annotation and sequence similarity. Next, using a distance measure, we group host protein sets and identify 37 distinct higher-level subsystems. We further demonstrate the biological significance of these subsystems by cross-referencing with global siRNA screens that have been used to detect host factors necessary for HIV-1 replication, and investigate the seemingly small intersect between these data sets. Our results highlight significant host-cell subsystems that are perturbed during the course of HIV-1 infection. Moreover, we characterise the patterns of interaction that contribute to these perturbations. Thus, our work disentangles the complex set of HIV-1-host protein interactions in the HHPID, reconciles these with siRNA screens and provides an accessible and interpretable map of infection. PMID:20686668

  1. Molecular differences between mature and immature dental pulp cells: Bioinformatics and preliminary results.

    PubMed

    Chen, Long; Jiang, Yifeng; Du, Zhen

    2018-04-01

    Although previous studies have demonstrated that dental pulp stem cells (DPSCs) from mature and immature teeth exhibit potential for multi-directional differentiation, the molecular and biological difference between the DPSCs from mature and immature permanent teeth has not been fully investigated. In the present study, 500 differentially expressed genes from dental pulp cells (DPCs) in mature and immature permanent teeth were obtained from the Gene Expression Omnibus online database. Based on bioinformatics analysis using the Database for Annotation, Visualization and Integrated Discovery, these genes were divided into a number of subgroups associated with immunity, inflammation and cell signaling. The results of the present study suggest that immune features, response to infection and cell signaling may be different in DPCs from mature and immature permanent teeth; furthermore, DPCs from immature permanent teeth may be more suitable for use in tissue engineering or stem cell therapy. The Online Mendelian Inheritance in Man database stated that Sonic Hedgehog (SHH), a differentially expressed gene in DPCs from mature and immature permanent teeth, serves a crucial role in the development of craniofacial tissues, including teeth, which further confirmed that SHH may cause DPCs from mature and immature permanent teeth to exhibit different biological characteristics. The Search Tool for the Retrieval of Interacting Genes/Proteins database revealed that SHH has functional protein associations with a number of other proteins, including Glioma-associated oncogene (GLI)1, GLI2, growth arrest-specific protein 1, bone morphogenetic protein (BMP)2 and BMP4, in mice and humans. It was also demonstrated that SHH may interact with other genes to regulate the biological characteristics of DPCs. The results of the present study may provide a useful reference basis for selecting suitable DPSCs and molecules for the treatment of these cells to optimize features for tissue engineering or stem cell therapy. Quantitative polymerase chain reaction should be performed to confirm the differential expression of these genes prior to the beginning of a functional study.

  2. CPLA 1.0: an integrated database of protein lysine acetylation

    PubMed Central

    Liu, Zexian; Cao, Jun; Gao, Xinjiao; Zhou, Yanhong; Wen, Longping; Yang, Xiangjiao; Yao, Xuebiao; Ren, Jian; Xue, Yu

    2011-01-01

    As a reversible post-translational modification (PTM) discovered decades ago, protein lysine acetylation was known for its regulation of transcription through the modification of histones. Recent studies discovered that lysine acetylation targets broad substrates and especially plays an essential role in cellular metabolic regulation. Although acetylation is comparable with other major PTMs such as phosphorylation, an integrated resource still remains to be developed. In this work, we presented the compendium of protein lysine acetylation (CPLA) database for lysine acetylated substrates with their sites. From the scientific literature, we manually collected 7151 experimentally identified acetylation sites in 3311 targets. We statistically studied the regulatory roles of lysine acetylation by analyzing the Gene Ontology (GO) and InterPro annotations. Combined with protein–protein interaction information, we systematically discovered a potential human lysine acetylation network (HLAN) among histone acetyltransferases (HATs), substrates and histone deacetylases (HDACs). In particular, there are 1862 triplet relationships of HAT-substrate-HDAC retrieved from the HLAN, at least 13 of which were previously experimentally verified. The online services of CPLA database was implemented in PHP + MySQL + JavaScript, while the local packages were developed in JAVA 1.5 (J2SE 5.0). The CPLA database is freely available for all users at: http://cpla.biocuckoo.org. PMID:21059677

  3. The Molecule Pages database

    PubMed Central

    Saunders, Brian; Lyon, Stephen; Day, Matthew; Riley, Brenda; Chenette, Emily; Subramaniam, Shankar

    2008-01-01

    The UCSD-Nature Signaling Gateway Molecule Pages (http://www.signaling-gateway.org/molecule) provides essential information on more than 3800 mammalian proteins involved in cellular signaling. The Molecule Pages contain expert-authored and peer-reviewed information based on the published literature, complemented by regularly updated information derived from public data source references and sequence analysis. The expert-authored data includes both a full-text review about the molecule, with citations, and highly structured data for bioinformatics interrogation, including information on protein interactions and states, transitions between states and protein function. The expert-authored pages are anonymously peer reviewed by the Nature Publishing Group. The Molecule Pages data is present in an object-relational database format and is freely accessible to the authors, the reviewers and the public from a web browser that serves as a presentation layer. The Molecule Pages are supported by several applications that along with the database and the interfaces form a multi-tier architecture. The Molecule Pages and the Signaling Gateway are routinely accessed by a very large research community. PMID:17965093

  4. The Molecule Pages database.

    PubMed

    Saunders, Brian; Lyon, Stephen; Day, Matthew; Riley, Brenda; Chenette, Emily; Subramaniam, Shankar; Vadivelu, Ilango

    2008-01-01

    The UCSD-Nature Signaling Gateway Molecule Pages (http://www.signaling-gateway.org/molecule) provides essential information on more than 3800 mammalian proteins involved in cellular signaling. The Molecule Pages contain expert-authored and peer-reviewed information based on the published literature, complemented by regularly updated information derived from public data source references and sequence analysis. The expert-authored data includes both a full-text review about the molecule, with citations, and highly structured data for bioinformatics interrogation, including information on protein interactions and states, transitions between states and protein function. The expert-authored pages are anonymously peer reviewed by the Nature Publishing Group. The Molecule Pages data is present in an object-relational database format and is freely accessible to the authors, the reviewers and the public from a web browser that serves as a presentation layer. The Molecule Pages are supported by several applications that along with the database and the interfaces form a multi-tier architecture. The Molecule Pages and the Signaling Gateway are routinely accessed by a very large research community.

  5. Inferring gene and protein interactions using PubMed citations and consensus Bayesian networks.

    PubMed

    Deeter, Anthony; Dalman, Mark; Haddad, Joseph; Duan, Zhong-Hui

    2017-01-01

    The PubMed database offers an extensive set of publication data that can be useful, yet inherently complex to use without automated computational techniques. Data repositories such as the Genomic Data Commons (GDC) and the Gene Expression Omnibus (GEO) offer experimental data storage and retrieval as well as curated gene expression profiles. Genetic interaction databases, including Reactome and Ingenuity Pathway Analysis, offer pathway and experiment data analysis using data curated from these publications and data repositories. We have created a method to generate and analyze consensus networks, inferring potential gene interactions, using large numbers of Bayesian networks generated by data mining publications in the PubMed database. Through the concept of network resolution, these consensus networks can be tailored to represent possible genetic interactions. We designed a set of experiments to confirm that our method is stable across variation in both sample and topological input sizes. Using gene product interactions from the KEGG pathway database and data mining PubMed publication abstracts, we verify that regardless of the network resolution or the inferred consensus network, our method is capable of inferring meaningful gene interactions through consensus Bayesian network generation with multiple, randomized topological orderings. Our method can not only confirm the existence of currently accepted interactions, but has the potential to hypothesize new ones as well. We show our method confirms the existence of known gene interactions such as JAK-STAT-PI3K-AKT-mTOR, infers novel gene interactions such as RAS- Bcl-2 and RAS-AKT, and found significant pathway-pathway interactions between the JAK-STAT signaling and Cardiac Muscle Contraction KEGG pathways.

  6. Forging the Basis for Developing Protein-Ligand Interaction Scoring Functions.

    PubMed

    Liu, Zhihai; Su, Minyi; Han, Li; Liu, Jie; Yang, Qifan; Li, Yan; Wang, Renxiao

    2017-02-21

    In structure-based drug design, scoring functions are widely used for fast evaluation of protein-ligand interactions. They are often applied in combination with molecular docking and de novo design methods. Since the early 1990s, a whole spectrum of protein-ligand interaction scoring functions have been developed. Regardless of their technical difference, scoring functions all need data sets combining protein-ligand complex structures and binding affinity data for parametrization and validation. However, data sets of this kind used to be rather limited in terms of size and quality. On the other hand, standard metrics for evaluating scoring function used to be ambiguous. Scoring functions are often tested in molecular docking or even virtual screening trials, which do not directly reflect the genuine quality of scoring functions. Collectively, these underlying obstacles have impeded the invention of more advanced scoring functions. In this Account, we describe our long-lasting efforts to overcome these obstacles, which involve two related projects. On the first project, we have created the PDBbind database. It is the first database that systematically annotates the protein-ligand complexes in the Protein Data Bank (PDB) with experimental binding data. This database has been updated annually since its first public release in 2004. The latest release (version 2016) provides binding data for 16 179 biomolecular complexes in PDB. Data sets provided by PDBbind have been applied to many computational and statistical studies on protein-ligand interaction and various subjects. In particular, it has become a major data resource for scoring function development. On the second project, we have established the Comparative Assessment of Scoring Functions (CASF) benchmark for scoring function evaluation. Our key idea is to decouple the "scoring" process from the "sampling" process, so scoring functions can be tested in a relatively pure context to reflect their quality. In our latest work on this track, i.e. CASF-2013, the performance of a scoring function was quantified in four aspects, including "scoring power", "ranking power", "docking power", and "screening power". All four performance tests were conducted on a test set containing 195 high-quality protein-ligand complexes selected from PDBbind. A panel of 20 standard scoring functions were tested as demonstration. Importantly, CASF is designed to be an open-access benchmark, with which scoring functions developed by different researchers can be compared on the same grounds. Indeed, it has become a popular choice for scoring function validation in recent years. Despite the considerable progress that has been made so far, the performance of today's scoring functions still does not meet people's expectations in many aspects. There is a constant demand for more advanced scoring functions. Our efforts have helped to overcome some obstacles underlying scoring function development so that the researchers in this field can move forward faster. We will continue to improve the PDBbind database and the CASF benchmark in the future to keep them as useful community resources.

  7. PCoM-DB Update: A Protein Co-Migration Database for Photosynthetic Organisms.

    PubMed

    Takabayashi, Atsushi; Takabayashi, Saeka; Takahashi, Kaori; Watanabe, Mai; Uchida, Hiroko; Murakami, Akio; Fujita, Tomomichi; Ikeuchi, Masahiko; Tanaka, Ayumi

    2017-01-01

    The identification of protein complexes is important for the understanding of protein structure and function and the regulation of cellular processes. We used blue-native PAGE and tandem mass spectrometry to identify protein complexes systematically, and built a web database, the protein co-migration database (PCoM-DB, http://pcomdb.lowtem.hokudai.ac.jp/proteins/top), to provide prediction tools for protein complexes. PCoM-DB provides migration profiles for any given protein of interest, and allows users to compare them with migration profiles of other proteins, showing the oligomeric states of proteins and thus identifying potential interaction partners. The initial version of PCoM-DB (launched in January 2013) included protein complex data for Synechocystis whole cells and Arabidopsis thaliana thylakoid membranes. Here we report PCoM-DB version 2.0, which includes new data sets and analytical tools. Additional data are included from whole cells of the pelagic marine picocyanobacterium Prochlorococcus marinus, the thermophilic cyanobacterium Thermosynechococcus elongatus, the unicellular green alga Chlamydomonas reinhardtii and the bryophyte Physcomitrella patens. The Arabidopsis protein data now include data for intact mitochondria, intact chloroplasts, chloroplast stroma and chloroplast envelopes. The new tools comprise a multiple-protein search form and a heat map viewer for protein migration profiles. Users can compare migration profiles of a protein of interest among different organelles or compare migration profiles among different proteins within the same sample. For Arabidopsis proteins, users can compare migration profiles of a protein of interest with putative homologous proteins from non-Arabidopsis organisms. The updated PCoM-DB will help researchers find novel protein complexes and estimate their evolutionary changes in the green lineage. © The Author 2017. Published by Oxford University Press on behalf of Japanese Society of Plant Physiologists. All rights reserved. For permissions, please email: journals.permissions@oup.com.

  8. ReLiance: a machine learning and literature-based prioritization of receptor—ligand pairings

    PubMed Central

    Iacucci, Ernesto; Tranchevent, Léon-Charles; Popovic, Dusan; Pavlopoulos, Georgios A.; De Moor, Bart; Schneider, Reinhard; Moreau, Yves

    2012-01-01

    Motivation: The prediction of receptor—ligand pairings is an important area of research as intercellular communications are mediated by the successful interaction of these key proteins. As the exhaustive assaying of receptor—ligand pairs is impractical, a computational approach to predict pairings is necessary. We propose a workflow to carry out this interaction prediction task, using a text mining approach in conjunction with a state of the art prediction method, as well as a widely accessible and comprehensive dataset. Among several modern classifiers, random forests have been found to be the best at this prediction task. The training of this classifier was carried out using an experimentally validated dataset of Database of Ligand-Receptor Partners (DLRP) receptor—ligand pairs. New examples, co-cited with the training receptors and ligands, are then classified using the trained classifier. After applying our method, we find that we are able to successfully predict receptor—ligand pairs within the GPCR family with a balanced accuracy of 0.96. Upon further inspection, we find several supported interactions that were not present in the Database of Interacting Proteins (DIPdatabase). We have measured the balanced accuracy of our method resulting in high quality predictions stored in the available database ReLiance. Availability: http://homes.esat.kuleuven.be/~bioiuser/ReLianceDB/index.php Contact: yves.moreau@esat.kuleuven.be; ernesto.iacucci@gmail.com Supplementary information: Supplementary data are available at Bioinformatics online. PMID:22962483

  9. Feature genes in metastatic breast cancer identified by MetaDE and SVM classifier methods.

    PubMed

    Tuo, Youlin; An, Ning; Zhang, Ming

    2018-03-01

    The aim of the present study was to investigate the feature genes in metastatic breast cancer samples. A total of 5 expression profiles of metastatic breast cancer samples were downloaded from the Gene Expression Omnibus database, which were then analyzed using the MetaQC and MetaDE packages in R language. The feature genes between metastasis and non‑metastasis samples were screened under the threshold of P<0.05. Based on the protein‑protein interactions (PPIs) in the Biological General Repository for Interaction Datasets, Human Protein Reference Database and Biomolecular Interaction Network Database, the PPI network of the feature genes was constructed. The feature genes identified by topological characteristics were then used for support vector machine (SVM) classifier training and verification. The accuracy of the SVM classifier was then evaluated using another independent dataset from The Cancer Genome Atlas database. Finally, function and pathway enrichment analyses for genes in the SVM classifier were performed. A total of 541 feature genes were identified between metastatic and non‑metastatic samples. The top 10 genes with the highest betweenness centrality values in the PPI network of feature genes were Nuclear RNA Export Factor 1, cyclin‑dependent kinase 2 (CDK2), myelocytomatosis proto‑oncogene protein (MYC), Cullin 5, SHC Adaptor Protein 1, Clathrin heavy chain, Nucleolin, WD repeat domain 1, proteasome 26S subunit non‑ATPase 2 and telomeric repeat binding factor 2. The cyclin‑dependent kinase inhibitor 1A (CDKN1A), E2F transcription factor 1 (E2F1), and MYC interacted with CDK2. The SVM classifier constructed by the top 30 feature genes was able to distinguish metastatic samples from non‑metastatic samples [correct rate, specificity, positive predictive value and negative predictive value >0.89; sensitivity >0.84; area under the receiver operating characteristic curve (AUROC) >0.96]. The verification of the SVM classifier in an independent dataset (35 metastatic samples and 143 non‑metastatic samples) revealed an accuracy of 94.38% and AUROC of 0.958. Cell cycle associated functions and pathways were the most significant terms of the 30 feature genes. A SVM classifier was constructed to assess the possibility of breast cancer metastasis, which presented high accuracy in several independent datasets. CDK2, CDKN1A, E2F1 and MYC were indicated as the potential feature genes in metastatic breast cancer.

  10. cisPath: an R/Bioconductor package for cloud users for visualization and management of functional protein interaction networks.

    PubMed

    Wang, Likun; Yang, Luhe; Peng, Zuohan; Lu, Dan; Jin, Yan; McNutt, Michael; Yin, Yuxin

    2015-01-01

    With the burgeoning development of cloud technology and services, there are an increasing number of users who prefer cloud to run their applications. All software and associated data are hosted on the cloud, allowing users to access them via a web browser from any computer, anywhere. This paper presents cisPath, an R/Bioconductor package deployed on cloud servers for client users to visualize, manage, and share functional protein interaction networks. With this R package, users can easily integrate downloaded protein-protein interaction information from different online databases with private data to construct new and personalized interaction networks. Additional functions allow users to generate specific networks based on private databases. Since the results produced with the use of this package are in the form of web pages, cloud users can easily view and edit the network graphs via the browser, using a mouse or touch screen, without the need to download them to a local computer. This package can also be installed and run on a local desktop computer. Depending on user preference, results can be publicized or shared by uploading to a web server or cloud driver, allowing other users to directly access results via a web browser. This package can be installed and run on a variety of platforms. Since all network views are shown in web pages, such package is particularly useful for cloud users. The easy installation and operation is an attractive quality for R beginners and users with no previous experience with cloud services.

  11. cisPath: an R/Bioconductor package for cloud users for visualization and management of functional protein interaction networks

    PubMed Central

    2015-01-01

    Background With the burgeoning development of cloud technology and services, there are an increasing number of users who prefer cloud to run their applications. All software and associated data are hosted on the cloud, allowing users to access them via a web browser from any computer, anywhere. This paper presents cisPath, an R/Bioconductor package deployed on cloud servers for client users to visualize, manage, and share functional protein interaction networks. Results With this R package, users can easily integrate downloaded protein-protein interaction information from different online databases with private data to construct new and personalized interaction networks. Additional functions allow users to generate specific networks based on private databases. Since the results produced with the use of this package are in the form of web pages, cloud users can easily view and edit the network graphs via the browser, using a mouse or touch screen, without the need to download them to a local computer. This package can also be installed and run on a local desktop computer. Depending on user preference, results can be publicized or shared by uploading to a web server or cloud driver, allowing other users to directly access results via a web browser. Conclusions This package can be installed and run on a variety of platforms. Since all network views are shown in web pages, such package is particularly useful for cloud users. The easy installation and operation is an attractive quality for R beginners and users with no previous experience with cloud services. PMID:25708840

  12. Receptor Tyrosine Kinase MET Interactome and Neurodevelopmental Disorder Partners at the Developing Synapse.

    PubMed

    Xie, Zhihui; Li, Jing; Baker, Jonathan; Eagleson, Kathie L; Coba, Marcelo P; Levitt, Pat

    2016-12-15

    Atypical synapse development and plasticity are implicated in many neurodevelopmental disorders (NDDs). NDD-associated, high-confidence risk genes have been identified, yet little is known about functional relationships at the level of protein-protein interactions, which are the dominant molecular bases responsible for mediating circuit development. Proteomics in three independent developing neocortical synaptosomal preparations identified putative interacting proteins of the ligand-activated MET receptor tyrosine kinase, an autism risk gene that mediates synapse development. The candidates were translated into interactome networks and analyzed bioinformatically. Additionally, three independent quantitative proximity ligation assays in cultured neurons and four independent immunoprecipitation analyses of synaptosomes validated protein interactions. Approximately 11% (8/72) of MET-interacting proteins, including SHANK3, SYNGAP1, and GRIN2B, are associated with NDDs. Proteins in the MET interactome were translated into a novel MET interactome network based on human protein-protein interaction databases. High-confidence genes from different NDD datasets that encode synaptosomal proteins were analyzed for being enriched in MET interactome proteins. This was found for autism but not schizophrenia, bipolar disorder, major depressive disorder, or attention-deficit/hyperactivity disorder. There is correlated gene expression between MET and its interactive partners in developing human temporal and visual neocortices but not with highly expressed genes that are not in the interactome. Proximity ligation assays and biochemical analyses demonstrate that MET-protein partner interactions are dynamically regulated by receptor activation. The results provide a novel molecular framework for deciphering the functional relations of key regulators of synaptogenesis that contribute to both typical cortical development and to NDDs. Copyright © 2016 Society of Biological Psychiatry. Published by Elsevier Inc. All rights reserved.

  13. Genetic analysis of the cytoplasmic dynein subunit families.

    PubMed

    Pfister, K Kevin; Shah, Paresh R; Hummerich, Holger; Russ, Andreas; Cotton, James; Annuar, Azlina Ahmad; King, Stephen M; Fisher, Elizabeth M C

    2006-01-01

    Cytoplasmic dyneins, the principal microtubule minus-end-directed motor proteins of the cell, are involved in many essential cellular processes. The major form of this enzyme is a complex of at least six protein subunits, and in mammals all but one of the subunits are encoded by at least two genes. Here we review current knowledge concerning the subunits, their interactions, and their functional roles as derived from biochemical and genetic analyses. We also carried out extensive database searches to look for new genes and to clarify anomalies in the databases. Our analysis documents evolutionary relationships among the dynein subunits of mammals and other model organisms, and sheds new light on the role of this diverse group of proteins, highlighting the existence of two cytoplasmic dynein complexes with distinct cellular roles.

  14. Genetic Analysis of the Cytoplasmic Dynein Subunit Families

    PubMed Central

    Pfister, K. Kevin; Shah, Paresh R; Hummerich, Holger; Russ, Andreas; Cotton, James; Annuar, Azlina Ahmad; King, Stephen M; Fisher, Elizabeth M. C

    2006-01-01

    Cytoplasmic dyneins, the principal microtubule minus-end-directed motor proteins of the cell, are involved in many essential cellular processes. The major form of this enzyme is a complex of at least six protein subunits, and in mammals all but one of the subunits are encoded by at least two genes. Here we review current knowledge concerning the subunits, their interactions, and their functional roles as derived from biochemical and genetic analyses. We also carried out extensive database searches to look for new genes and to clarify anomalies in the databases. Our analysis documents evolutionary relationships among the dynein subunits of mammals and other model organisms, and sheds new light on the role of this diverse group of proteins, highlighting the existence of two cytoplasmic dynein complexes with distinct cellular roles. PMID:16440056

  15. HotRegion: a database of predicted hot spot clusters.

    PubMed

    Cukuroglu, Engin; Gursoy, Attila; Keskin, Ozlem

    2012-01-01

    Hot spots are energetically important residues at protein interfaces and they are not randomly distributed across the interface but rather clustered. These clustered hot spots form hot regions. Hot regions are important for the stability of protein complexes, as well as providing specificity to binding sites. We propose a database called HotRegion, which provides the hot region information of the interfaces by using predicted hot spot residues, and structural properties of these interface residues such as pair potentials of interface residues, accessible surface area (ASA) and relative ASA values of interface residues of both monomer and complex forms of proteins. Also, the 3D visualization of the interface and interactions among hot spot residues are provided. HotRegion is accessible at http://prism.ccbb.ku.edu.tr/hotregion.

  16. Real-Time Ligand Binding Pocket Database Search Using Local Surface Descriptors

    PubMed Central

    Chikhi, Rayan; Sael, Lee; Kihara, Daisuke

    2010-01-01

    Due to the increasing number of structures of unknown function accumulated by ongoing structural genomics projects, there is an urgent need for computational methods for characterizing protein tertiary structures. As functions of many of these proteins are not easily predicted by conventional sequence database searches, a legitimate strategy is to utilize structure information in function characterization. Of a particular interest is prediction of ligand binding to a protein, as ligand molecule recognition is a major part of molecular function of proteins. Predicting whether a ligand molecule binds a protein is a complex problem due to the physical nature of protein-ligand interactions and the flexibility of both binding sites and ligand molecules. However, geometric and physicochemical complementarity is observed between the ligand and its binding site in many cases. Therefore, ligand molecules which bind to a local surface site in a protein can be predicted by finding similar local pockets of known binding ligands in the structure database. Here, we present two representations of ligand binding pockets and utilize them for ligand binding prediction by pocket shape comparison. These representations are based on mapping of surface properties of binding pockets, which are compactly described either by the two dimensional pseudo-Zernike moments or the 3D Zernike descriptors. These compact representations allow a fast real-time pocket searching against a database. Thorough benchmark study employing two different datasets show that our representations are competitive with the other existing methods. Limitations and potentials of the shape-based methods as well as possible improvements are discussed. PMID:20455259

  17. Real-time ligand binding pocket database search using local surface descriptors.

    PubMed

    Chikhi, Rayan; Sael, Lee; Kihara, Daisuke

    2010-07-01

    Because of the increasing number of structures of unknown function accumulated by ongoing structural genomics projects, there is an urgent need for computational methods for characterizing protein tertiary structures. As functions of many of these proteins are not easily predicted by conventional sequence database searches, a legitimate strategy is to utilize structure information in function characterization. Of particular interest is prediction of ligand binding to a protein, as ligand molecule recognition is a major part of molecular function of proteins. Predicting whether a ligand molecule binds a protein is a complex problem due to the physical nature of protein-ligand interactions and the flexibility of both binding sites and ligand molecules. However, geometric and physicochemical complementarity is observed between the ligand and its binding site in many cases. Therefore, ligand molecules which bind to a local surface site in a protein can be predicted by finding similar local pockets of known binding ligands in the structure database. Here, we present two representations of ligand binding pockets and utilize them for ligand binding prediction by pocket shape comparison. These representations are based on mapping of surface properties of binding pockets, which are compactly described either by the two-dimensional pseudo-Zernike moments or the three-dimensional Zernike descriptors. These compact representations allow a fast real-time pocket searching against a database. Thorough benchmark studies employing two different datasets show that our representations are competitive with the other existing methods. Limitations and potentials of the shape-based methods as well as possible improvements are discussed.

  18. P2P proteomics -- data sharing for enhanced protein identification

    PubMed Central

    2012-01-01

    Background In order to tackle the important and challenging problem in proteomics of identifying known and new protein sequences using high-throughput methods, we propose a data-sharing platform that uses fully distributed P2P technologies to share specifications of peer-interaction protocols and service components. By using such a platform, information to be searched is no longer centralised in a few repositories but gathered from experiments in peer proteomics laboratories, which can subsequently be searched by fellow researchers. Methods The system distributively runs a data-sharing protocol specified in the Lightweight Communication Calculus underlying the system through which researchers interact via message passing. For this, researchers interact with the system through particular components that link to database querying systems based on BLAST and/or OMSSA and GUI-based visualisation environments. We have tested the proposed platform with data drawn from preexisting MS/MS data reservoirs from the 2006 ABRF (Association of Biomolecular Resource Facilities) test sample, which was extensively tested during the ABRF Proteomics Standards Research Group 2006 worldwide survey. In particular we have taken the data available from a subset of proteomics laboratories of Spain's National Institute for Proteomics, ProteoRed, a network for the coordination, integration and development of the Spanish proteomics facilities. Results and Discussion We performed queries against nine databases including seven ProteoRed proteomics laboratories, the NCBI Swiss-Prot database and the local database of the CSIC/UAB Proteomics Laboratory. A detailed analysis of the results indicated the presence of a protein that was supported by other NCBI matches and highly scored matches in several proteomics labs. The analysis clearly indicated that the protein was a relatively high concentrated contaminant that could be present in the ABRF sample. This fact is evident from the information that could be derived from the proposed P2P proteomics system, however it is not straightforward to arrive to the same conclusion by conventional means as it is difficult to discard organic contamination of samples. The actual presence of this contaminant was only stated after the ABRF study of all the identifications reported by the laboratories. PMID:22293032

  19. Application of clustering methods: Regularized Markov clustering (R-MCL) for analyzing dengue virus similarity

    NASA Astrophysics Data System (ADS)

    Lestari, D.; Raharjo, D.; Bustamam, A.; Abdillah, B.; Widhianto, W.

    2017-07-01

    Dengue virus consists of 10 different constituent proteins and are classified into 4 major serotypes (DEN 1 - DEN 4). This study was designed to perform clustering against 30 protein sequences of dengue virus taken from Virus Pathogen Database and Analysis Resource (VIPR) using Regularized Markov Clustering (R-MCL) algorithm and then we analyze the result. By using Python program 3.4, R-MCL algorithm produces 8 clusters with more than one centroid in several clusters. The number of centroid shows the density level of interaction. Protein interactions that are connected in a tissue, form a complex protein that serves as a specific biological process unit. The analysis of result shows the R-MCL clustering produces clusters of dengue virus family based on the similarity role of their constituent protein, regardless of serotypes.

  20. CCProf: exploring conformational change profile of proteins

    PubMed Central

    Chang, Che-Wei; Chou, Chai-Wei; Chang, Darby Tien-Hao

    2016-01-01

    In many biological processes, proteins have important interactions with various molecules such as proteins, ions or ligands. Many proteins undergo conformational changes upon these interactions, where regions with large conformational changes are critical to the interactions. This work presents the CCProf platform, which provides conformational changes of entire proteins, named conformational change profile (CCP) in the context. CCProf aims to be a platform where users can study potential causes of novel conformational changes. It provides 10 biological features, including conformational change, potential binding target site, secondary structure, conservation, disorder propensity, hydropathy propensity, sequence domain, structural domain, phosphorylation site and catalytic site. All these information are integrated into a well-aligned view, so that researchers can capture important relevance between different biological features visually. The CCProf contains 986 187 protein structure pairs for 3123 proteins. In addition, CCProf provides a 3D view in which users can see the protein structures before and after conformational changes as well as binding targets that induce conformational changes. All information (e.g. CCP, binding targets and protein structures) shown in CCProf, including intermediate data are available for download to expedite further analyses. Database URL: http://zoro.ee.ncku.edu.tw/ccprof/ PMID:27016699

  1. Construction of phosphorylation interaction networks by text mining of full-length articles using the eFIP system.

    PubMed

    Tudor, Catalina O; Ross, Karen E; Li, Gang; Vijay-Shanker, K; Wu, Cathy H; Arighi, Cecilia N

    2015-01-01

    Protein phosphorylation is a reversible post-translational modification where a protein kinase adds a phosphate group to a protein, potentially regulating its function, localization and/or activity. Phosphorylation can affect protein-protein interactions (PPIs), abolishing interaction with previous binding partners or enabling new interactions. Extracting phosphorylation information coupled with PPI information from the scientific literature will facilitate the creation of phosphorylation interaction networks of kinases, substrates and interacting partners, toward knowledge discovery of functional outcomes of protein phosphorylation. Increasingly, PPI databases are interested in capturing the phosphorylation state of interacting partners. We have previously developed the eFIP (Extracting Functional Impact of Phosphorylation) text mining system, which identifies phosphorylated proteins and phosphorylation-dependent PPIs. In this work, we present several enhancements for the eFIP system: (i) text mining for full-length articles from the PubMed Central open-access collection; (ii) the integration of the RLIMS-P 2.0 system for the extraction of phosphorylation events with kinase, substrate and site information; (iii) the extension of the PPI module with new trigger words/phrases describing interactions and (iv) the addition of the iSimp tool for sentence simplification to aid in the matching of syntactic patterns. We enhance the website functionality to: (i) support searches based on protein roles (kinases, substrates, interacting partners) or using keywords; (ii) link protein entities to their corresponding UniProt identifiers if mapped and (iii) support visual exploration of phosphorylation interaction networks using Cytoscape. The evaluation of eFIP on full-length articles achieved 92.4% precision, 76.5% recall and 83.7% F-measure on 100 article sections. To demonstrate eFIP for knowledge extraction and discovery, we constructed phosphorylation-dependent interaction networks involving 14-3-3 proteins identified from cancer-related versus diabetes-related articles. Comparison of the phosphorylation interaction network of kinases, phosphoproteins and interactants obtained from eFIP searches, along with enrichment analysis of the protein set, revealed several shared interactions, highlighting common pathways discussed in the context of both diseases. © The Author(s) 2015. Published by Oxford University Press.

  2. Non-transcriptional interactions of Hox proteins: inventory, facts, and future directions.

    PubMed

    Rezsohazy, René

    2014-01-01

    Hox proteins are conserved homeodomain transcription factors involved in the control of embryo patterning, organ development, and cell differentiation during animal development and adult life. Although recognizably active in gene regulation, accumulating reports support that Hox proteins are also active in controlling other molecular processes like mRNA translation, DNA repair, initiation of DNA replication, and possibly modulation of signal transduction. Here we review experimental evidence as well as databases entries indicative of non-transcriptional activities of Hox proteins. Copyright © 2013 Wiley Periodicals, Inc.

  3. 3DNALandscapes: a database for exploring the conformational features of DNA.

    PubMed

    Zheng, Guohui; Colasanti, Andrew V; Lu, Xiang-Jun; Olson, Wilma K

    2010-01-01

    3DNALandscapes, located at: http://3DNAscapes.rutgers.edu, is a new database for exploring the conformational features of DNA. In contrast to most structural databases, which archive the Cartesian coordinates and/or derived parameters and images for individual structures, 3DNALandscapes enables searches of conformational information across multiple structures. The database contains a wide variety of structural parameters and molecular images, computed with the 3DNA software package and known to be useful for characterizing and understanding the sequence-dependent spatial arrangements of the DNA sugar-phosphate backbone, sugar-base side groups, base pairs, base-pair steps, groove structure, etc. The data comprise all DNA-containing structures--both free and bound to proteins, drugs and other ligands--currently available in the Protein Data Bank. The web interface allows the user to link, report, plot and analyze this information from numerous perspectives and thereby gain insight into DNA conformation, deformability and interactions in different sequence and structural contexts. The data accumulated from known, well-resolved DNA structures can serve as useful benchmarks for the analysis and simulation of new structures. The collective data can also help to understand how DNA deforms in response to proteins and other molecules and undergoes conformational rearrangements.

  4. IMGT/3Dstructure-DB and IMGT/StructuralQuery, a database and a tool for immunoglobulin, T cell receptor and MHC structural data

    PubMed Central

    Kaas, Quentin; Ruiz, Manuel; Lefranc, Marie-Paule

    2004-01-01

    IMGT/3Dstructure-DB and IMGT/Structural-Query are a novel 3D structure database and a new tool for immunological proteins. They are part of IMGT, the international ImMunoGenetics information system®, a high-quality integrated knowledge resource specializing in immunoglobulins (IG), T cell receptors (TR), major histocompatibility complex (MHC) and related proteins of the immune system (RPI) of human and other vertebrate species, which consists of databases, Web resources and interactive on-line tools. IMGT/3Dstructure-DB data are described according to the IMGT Scientific chart rules based on the IMGT-ONTOLOGY concepts. IMGT/3Dstructure-DB provides IMGT gene and allele identification of IG, TR and MHC proteins with known 3D structures, domain delimitations, amino acid positions according to the IMGT unique numbering and renumbered coordinate flat files. Moreover IMGT/3Dstructure-DB provides 2D graphical representations (or Collier de Perles) and results of contact analysis. The IMGT/StructuralQuery tool allows search of this database based on specific structural characteristics. IMGT/3Dstructure-DB and IMGT/StructuralQuery are freely available at http://imgt.cines.fr. PMID:14681396

  5. The VirusBanker database uses a Java program to allow flexible searching through Bunyaviridae sequences.

    PubMed

    Fourment, Mathieu; Gibbs, Mark J

    2008-02-05

    Viruses of the Bunyaviridae have segmented negative-stranded RNA genomes and several of them cause significant disease. Many partial sequences have been obtained from the segments so that GenBank searches give complex results. Sequence databases usually use HTML pages to mediate remote sorting, but this approach can be limiting and may discourage a user from exploring a database. The VirusBanker database contains Bunyaviridae sequences and alignments and is presented as two spreadsheets generated by a Java program that interacts with a MySQL database on a server. Sequences are displayed in rows and may be sorted using information that is displayed in columns and includes data relating to the segment, gene, protein, species, strain, sequence length, terminal sequence and date and country of isolation. Bunyaviridae sequences and alignments may be downloaded from the second spreadsheet with titles defined by the user from the columns, or viewed when passed directly to the sequence editor, Jalview. VirusBanker allows large datasets of aligned nucleotide and protein sequences from the Bunyaviridae to be compiled and winnowed rapidly using criteria that are formulated heuristically.

  6. Indel PDB: a database of structural insertions and deletions derived from sequence alignments of closely related proteins.

    PubMed

    Hsing, Michael; Cherkasov, Artem

    2008-06-25

    Insertions and deletions (indels) represent a common type of sequence variations, which are less studied and pose many important biological questions. Recent research has shown that the presence of sizable indels in protein sequences may be indicative of protein essentiality and their role in protein interaction networks. Examples of utilization of indels for structure-based drug design have also been recently demonstrated. Nonetheless many structural and functional characteristics of indels remain less researched or unknown. We have created a web-based resource, Indel PDB, representing a structural database of insertions/deletions identified from the sequence alignments of highly similar proteins found in the Protein Data Bank (PDB). Indel PDB utilized large amounts of available structural information to characterize 1-, 2- and 3-dimensional features of indel sites. Indel PDB contains 117,266 non-redundant indel sites extracted from 11,294 indel-containing proteins. Unlike loop databases, Indel PDB features more indel sequences with secondary structures including alpha-helices and beta-sheets in addition to loops. The insertion fragments have been characterized by their sequences, lengths, locations, secondary structure composition, solvent accessibility, protein domain association and three dimensional structures. By utilizing the data available in Indel PDB, we have studied and presented here several sequence and structural features of indels. We anticipate that Indel PDB will not only enable future functional studies of indels, but will also assist protein modeling efforts and identification of indel-directed drug binding sites.

  7. Prediction of binding constants of protein ligands: A fast method for the prioritization of hits obtained from de novo design or 3D database search programs

    NASA Astrophysics Data System (ADS)

    Böhm, Hans-Joachim

    1998-07-01

    A dataset of 82 protein-ligand complexes of known 3D structure and binding constant Ki was analysed to elucidate the important factors that determine the strength of protein-ligand interactions. The following parameters were investigated: the number and geometry of hydrogen bonds and ionic interactions between the protein and the ligand, the size of the lipophilic contact surface, the flexibility of the ligand, the electrostatic potential in the binding site, water molecules in the binding site, cavities along the protein-ligand interface and specific interactions between aromatic rings. Based on these parameters, a new empirical scoring function is presented that estimates the free energy of binding for a protein-ligand complex of known 3D structure. The function distinguishes between buried and solvent accessible hydrogen bonds. It tolerates deviations in the hydrogen bond geometry of up to 0.25 Å in the length and up to 30 °Cs in the hydrogen bond angle without penalizing the score. The new energy function reproduces the binding constants (ranging from 3.7 × 10-2 M to 1 × 10-14 M, corresponding to binding energies between -8 and -80 kJ/mol) of the dataset with a standard deviation of 7.3 kJ/mol corresponding to 1.3 orders of magnitude in binding affinity. The function can be evaluated very fast and is therefore also suitable for the application in a 3D database search or de novo ligand design program such as LUDI. The physical significance of the individual contributions is discussed.

  8. iLIR@viral: A web resource for LIR motif-containing proteins in viruses.

    PubMed

    Jacomin, Anne-Claire; Samavedam, Siva; Charles, Hannah; Nezis, Ioannis P

    2017-10-03

    Macroautophagy/autophagy has been shown to mediate the selective lysosomal degradation of pathogenic bacteria and viruses (xenophagy), and to contribute to the activation of innate and adaptative immune responses. Autophagy can serve as an antiviral defense mechanism but also as a proviral process during infection. Atg8-family proteins play a central role in the autophagy process due to their ability to interact with components of the autophagy machinery as well as selective autophagy receptors and adaptor proteins. Such interactions are usually mediated through LC3-interacting region (LIR) motifs. So far, only one viral protein has been experimentally shown to have a functional LIR motif, leaving open a vast field for investigation. Here, we have developed the iLIR@viral database ( http://ilir.uk/virus/ ) as a freely accessible web resource listing all the putative canonical LIR motifs identified in viral proteins. Additionally, we used a curated text-mining analysis of the literature to identify novel putative LIR motif-containing proteins (LIRCPs) in viruses. We anticipate that iLIR@viral will assist with elucidating the full complement of LIRCPs in viruses.

  9. Deppdb--DNA electrostatic potential properties database: electrostatic properties of genome DNA.

    PubMed

    Osypov, Alexander A; Krutinin, Gleb G; Kamzolova, Svetlana G

    2010-06-01

    The electrostatic properties of genome DNA influence its interactions with different proteins, in particular, the regulation of transcription by RNA-polymerases. DEPPDB--DNA Electrostatic Potential Properties Database--was developed to hold and provide all available information on the electrostatic properties of genome DNA combined with its sequence and annotation of biological and structural properties of genome elements and whole genomes. Genomes in DEPPDB are organized on a taxonomical basis. Currently, the database contains all the completely sequenced bacterial and viral genomes according to NCBI RefSeq. General properties of the genome DNA electrostatic potential profile and principles of its formation are revealed. This potential correlates with the GC content but does not correspond to it exactly and strongly depends on both the sequence arrangement and its context (flanking regions). Analysis of the promoter regions for bacterial and viral RNA polymerases revealed a correspondence between the scale of these proteins' physical properties and electrostatic profile patterns. We also discovered a direct correlation between the potential value and the binding frequency of RNA polymerase to DNA, supporting the idea of the role of electrostatics in these interactions. This matches a pronounced tendency of the promoter regions to possess higher values of the electrostatic potential.

  10. RNA Bricks—a database of RNA 3D motifs and their interactions

    PubMed Central

    Chojnowski, Grzegorz; Waleń, Tomasz; Bujnicki, Janusz M.

    2014-01-01

    The RNA Bricks database (http://iimcb.genesilico.pl/rnabricks), stores information about recurrent RNA 3D motifs and their interactions, found in experimentally determined RNA structures and in RNA–protein complexes. In contrast to other similar tools (RNA 3D Motif Atlas, RNA Frabase, Rloom) RNA motifs, i.e. ‘RNA bricks’ are presented in the molecular environment, in which they were determined, including RNA, protein, metal ions, water molecules and ligands. All nucleotide residues in RNA bricks are annotated with structural quality scores that describe real-space correlation coefficients with the electron density data (if available), backbone geometry and possible steric conflicts, which can be used to identify poorly modeled residues. The database is also equipped with an algorithm for 3D motif search and comparison. The algorithm compares spatial positions of backbone atoms of the user-provided query structure and of stored RNA motifs, without relying on sequence or secondary structure information. This enables the identification of local structural similarities among evolutionarily related and unrelated RNA molecules. Besides, the search utility enables searching ‘RNA bricks’ according to sequence similarity, and makes it possible to identify motifs with modified ribonucleotide residues at specific positions. PMID:24220091

  11. MIPS: analysis and annotation of genome information in 2007

    PubMed Central

    Mewes, H. W.; Dietmann, S.; Frishman, D.; Gregory, R.; Mannhaupt, G.; Mayer, K. F. X.; Münsterkötter, M.; Ruepp, A.; Spannagl, M.; Stümpflen, V.; Rattei, T.

    2008-01-01

    The Munich Information Center for Protein Sequences (MIPS-GSF, Neuherberg, Germany) combines automatic processing of large amounts of sequences with manual annotation of selected model genomes. Due to the massive growth of the available data, the depth of annotation varies widely between independent databases. Also, the criteria for the transfer of information from known to orthologous sequences are diverse. To cope with the task of global in-depth genome annotation has become unfeasible. Therefore, our efforts are dedicated to three levels of annotation: (i) the curation of selected genomes, in particular from fungal and plant taxa (e.g. CYGD, MNCDB, MatDB), (ii) the comprehensive, consistent, automatic annotation employing exhaustive methods for the computation of sequence similarities and sequence-related attributes as well as the classification of individual sequences (SIMAP, PEDANT and FunCat) and (iii) the compilation of manually curated databases for protein interactions based on scrutinized information from the literature to serve as an accepted set of reliable annotated interaction data (MPACT, MPPI, CORUM). All databases and tools described as well as the detailed descriptions of our projects can be accessed through the MIPS web server (http://mips.gsf.de). PMID:18158298

  12. MIPS: analysis and annotation of genome information in 2007.

    PubMed

    Mewes, H W; Dietmann, S; Frishman, D; Gregory, R; Mannhaupt, G; Mayer, K F X; Münsterkötter, M; Ruepp, A; Spannagl, M; Stümpflen, V; Rattei, T

    2008-01-01

    The Munich Information Center for Protein Sequences (MIPS-GSF, Neuherberg, Germany) combines automatic processing of large amounts of sequences with manual annotation of selected model genomes. Due to the massive growth of the available data, the depth of annotation varies widely between independent databases. Also, the criteria for the transfer of information from known to orthologous sequences are diverse. To cope with the task of global in-depth genome annotation has become unfeasible. Therefore, our efforts are dedicated to three levels of annotation: (i) the curation of selected genomes, in particular from fungal and plant taxa (e.g. CYGD, MNCDB, MatDB), (ii) the comprehensive, consistent, automatic annotation employing exhaustive methods for the computation of sequence similarities and sequence-related attributes as well as the classification of individual sequences (SIMAP, PEDANT and FunCat) and (iii) the compilation of manually curated databases for protein interactions based on scrutinized information from the literature to serve as an accepted set of reliable annotated interaction data (MPACT, MPPI, CORUM). All databases and tools described as well as the detailed descriptions of our projects can be accessed through the MIPS web server (http://mips.gsf.de).

  13. Proteomic analysis of sweet algerian apricot kernels (Prunus armeniaca L.) by combinatorial peptide ligand libraries and LC-MS/MS.

    PubMed

    Ghorab, Hamida; Lammi, Carmen; Arnoldi, Anna; Kabouche, Zahia; Aiello, Gilda

    2018-01-15

    An investigation on the proteome of the sweet kernel of apricot, based on equalisation with combinatorial peptide ligand libraries (CPLLs), SDS-PAGE, nLC-ESI-MS/MS, and database search, permitted identifying 175 proteins. Gene ontology analysis indicated that their main molecular functions are in nucleotide binding (20.9%), hydrolase activities (10.6%), kinase activities (7%), and catalytic activity (5.6%). A protein-protein association network analysis using STRING software permitted to build an interactomic map of all detected proteins, characterised by 34 interactions. In order to forecast the potential health benefits deriving from the consumption of these proteins, the two most abundant, i.e. Prunin 1 and 2, were enzymatically digested in silico predicting 10 and 14 peptides, respectively. Searching their sequences in the database BIOPEP, it was possible to suggest a variety of bioactivities, including dipeptidyl peptidase-IV (DPP-IV) and angiotensin converting enzyme I (ACE) inhibition, glucose uptake stimulation and antioxidant properties. Copyright © 2017 Elsevier Ltd. All rights reserved.

  14. Computer applications making rapid advances in high throughput microbial proteomics (HTMP).

    PubMed

    Anandkumar, Balakrishna; Haga, Steve W; Wu, Hui-Fen

    2014-02-01

    The last few decades have seen the rise of widely-available proteomics tools. From new data acquisition devices, such as MALDI-MS and 2DE to new database searching softwares, these new products have paved the way for high throughput microbial proteomics (HTMP). These tools are enabling researchers to gain new insights into microbial metabolism, and are opening up new areas of study, such as protein-protein interactions (interactomics) discovery. Computer software is a key part of these emerging fields. This current review considers: 1) software tools for identifying the proteome, such as MASCOT or PDQuest, 2) online databases of proteomes, such as SWISS-PROT, Proteome Web, or the Proteomics Facility of the Pathogen Functional Genomics Resource Center, and 3) software tools for applying proteomic data, such as PSI-BLAST or VESPA. These tools allow for research in network biology, protein identification, functional annotation, target identification/validation, protein expression, protein structural analysis, metabolic pathway engineering and drug discovery.

  15. Database resources of the National Center for Biotechnology Information

    PubMed Central

    Sayers, Eric W.; Barrett, Tanya; Benson, Dennis A.; Bolton, Evan; Bryant, Stephen H.; Canese, Kathi; Chetvernin, Vyacheslav; Church, Deanna M.; DiCuccio, Michael; Federhen, Scott; Feolo, Michael; Fingerman, Ian M.; Geer, Lewis Y.; Helmberg, Wolfgang; Kapustin, Yuri; Krasnov, Sergey; Landsman, David; Lipman, David J.; Lu, Zhiyong; Madden, Thomas L.; Madej, Tom; Maglott, Donna R.; Marchler-Bauer, Aron; Miller, Vadim; Karsch-Mizrachi, Ilene; Ostell, James; Panchenko, Anna; Phan, Lon; Pruitt, Kim D.; Schuler, Gregory D.; Sequeira, Edwin; Sherry, Stephen T.; Shumway, Martin; Sirotkin, Karl; Slotta, Douglas; Souvorov, Alexandre; Starchenko, Grigory; Tatusova, Tatiana A.; Wagner, Lukas; Wang, Yanli; Wilbur, W. John; Yaschenko, Eugene; Ye, Jian

    2012-01-01

    In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data made available through the NCBI Website. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central (PMC), Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Primer-BLAST, COBALT, Splign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, dbVar, Epigenomics, Genome and related tools, the Map Viewer, Model Maker, Evidence Viewer, Trace Archive, Sequence Read Archive, BioProject, BioSample, Retroviral Genotyping Tools, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus (GEO), Probe, Online Mendelian Inheritance in Animals (OMIA), the Molecular Modeling Database (MMDB), the Conserved Domain Database (CDD), the Conserved Domain Architecture Retrieval Tool (CDART), Biosystems, Protein Clusters and the PubChem suite of small molecule databases. Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov. PMID:22140104

  16. Database resources of the National Center for Biotechnology Information

    PubMed Central

    2013-01-01

    In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI, http://www.ncbi.nlm.nih.gov) provides analysis and retrieval resources for the data in GenBank and other biological data made available through the NCBI web site. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central, Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Primer-BLAST, COBALT, Splign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, dbVar, Epigenomics, the Genetic Testing Registry, Genome and related tools, the Map Viewer, Model Maker, Evidence Viewer, Trace Archive, Sequence Read Archive, BioProject, BioSample, Retroviral Genotyping Tools, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus, Probe, Online Mendelian Inheritance in Animals, the Molecular Modeling Database, the Conserved Domain Database, the Conserved Domain Architecture Retrieval Tool, Biosystems, Protein Clusters and the PubChem suite of small molecule databases. Augmenting many of the web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of these resources can be accessed through the NCBI home page. PMID:23193264

  17. NPInter v3.0: an upgraded database of noncoding RNA-associated interactions

    PubMed Central

    Hao, Yajing; Wu, Wei; Li, Hui; Yuan, Jiao; Luo, Jianjun; Zhao, Yi; Chen, Runsheng

    2016-01-01

    Despite the fact that a large quantity of noncoding RNAs (ncRNAs) have been identified, their functions remain unclear. To enable researchers to have a better understanding of ncRNAs’ functions, we updated the NPInter database to version 3.0, which contains experimentally verified interactions between ncRNAs (excluding tRNAs and rRNAs), especially long noncoding RNAs (lncRNAs) and other biomolecules (proteins, mRNAs, miRNAs and genomic DNAs). In NPInter v3.0, interactions pertaining to ncRNAs are not only manually curated from scientific literature but also curated from high-throughput technologies. In addition, we also curated lncRNA–miRNA interactions from in silico predictions supported by AGO CLIP-seq data. When compared with NPInter v2.0, the interactions are more informative (with additional information on tissues or cell lines, binding sites, conservation, co-expression values and other features) and more organized (with divisions on data sets by data sources, tissues or cell lines, experiments and other criteria). NPInter v3.0 expands the data set to 491,416 interactions in 188 tissues (or cell lines) from 68 kinds of experimental technologies. NPInter v3.0 also improves the user interface and adds new web services, including a local UCSC Genome Browser to visualize binding sites. Additionally, NPInter v3.0 defined a high-confidence set of interactions and predicted the functions of lncRNAs in human and mouse based on the interactions curated in the database. NPInter v3.0 is available at http://www.bioinfo.org/NPInter/. Database URL: http://www.bioinfo.org/NPInter/ PMID:27087310

  18. Regulatory interactions between long noncoding RNA LINC00968 and miR-9-3p in non-small cell lung cancer: A bioinformatic analysis based on miRNA microarray, GEO and TCGA.

    PubMed

    Li, Dong-Yao; Chen, Wen-Jie; Shang, Jun; Chen, Gang; Li, Shi-Kang

    2018-06-01

    Long non-coding RNAs (lncRNAs) have been demonstrated to mediate carcinogenesis in various types of cancer. However, the regulatory role of lncRNA LINC00968 in lung adenocarcinoma remains unclear. The microRNA (miRNA) expression in LINC00968-overexpressing human lung adenocarcinoma A549 cells was detected using miRNA microarray analysis. miR-9-3p was selected for further analysis, and its expression was verified in the Gene Expression Omnibus (GEO) database. In addition, the regulatory axis of LINC00968 was validated using The Cancer Genome Atlas (TCGA) database. Results of the GEO database indicated miR-9-3p expression in lung adenocarcinoma was significantly higher compared with normal tissues. Functional enrichment analyses of the target genes of miR-9-3p indicated protein binding and the AMP-activated protein kinase pathway were the most enriched Gene Ontology and KEGG terms, respectively. Combining target genes with the correlated genes of LINC00968 and miR-9-3p, 120 objective genes were obtained, which were used to construct a protein-protein interaction (PPI) network. Cyclin A2 (CCNA2) was identified to have a vital role in the PPI network. Significant correlations were detected between LINC00968, miR-9-3p and CCNA2 in lung adenocarcinoma. The LINC00968/miR-9-3p/CCNA2 regulatory axis provides a new foundation for further evaluating the regulatory mechanisms of LINC00968 in lung adenocarcinoma.

  19. Key genes and pathways in measles and their interaction with environmental chemicals

    PubMed Central

    Zhang, Rongqiang; Jiang, Hualin; Li, Fengying; Su, Ning; Ding, Yi; Mao, Xiang; Ren, Dan; Wang, Jing

    2018-01-01

    The aim of the present study was to explore key genes that may have a role in the pathology of measles virus infection and to clarify the interaction networks between environmental factors and differentially expressed genes (DEGs). After screening the database of the Gene Expression Omnibus of the National Center for Biotechnology Information, the dataset GSE5808 was downloaded and analyzed. A global normalization method was performed to minimize data inconsistencies and heterogeneity. DEGs during different stages of measles virus infection were explored using R software (v3.4.0). Gene Ontology and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis of the DEGs were performed using Cytoscape 3.4.0 software. A protein-protein interaction (PPI) network of the DEGs was obtained from the STRING database v9.05. A total of 43 DEGs were obtained from four analyzed sample groups, including 10 highly expressed genes and 33 genes with decreased expression. The most enriched pathways based on KEGG analysis were fatty acid elongation, cytokine-cytokine receptor interaction and RNA degradation. The genes mentioned in the PPI network were mainly associated with protein binding and chemokine activity. A total of 219 chemicals were identified that may, jointly or on their own, interact with the 6 DEGs between the control group and patients with measles (at hospital entry), including benzo(a)pyrene (BaP) and tetrachlorodibenzodioxin (TCDD). In conclusion, the present study revealed that chemokines and environmental chemicals, e.g. BaP and TCDD, may affect the development of measles. PMID:29805511

  20. Inferring gene and protein interactions using PubMed citations and consensus Bayesian networks

    PubMed Central

    Dalman, Mark; Haddad, Joseph; Duan, Zhong-Hui

    2017-01-01

    The PubMed database offers an extensive set of publication data that can be useful, yet inherently complex to use without automated computational techniques. Data repositories such as the Genomic Data Commons (GDC) and the Gene Expression Omnibus (GEO) offer experimental data storage and retrieval as well as curated gene expression profiles. Genetic interaction databases, including Reactome and Ingenuity Pathway Analysis, offer pathway and experiment data analysis using data curated from these publications and data repositories. We have created a method to generate and analyze consensus networks, inferring potential gene interactions, using large numbers of Bayesian networks generated by data mining publications in the PubMed database. Through the concept of network resolution, these consensus networks can be tailored to represent possible genetic interactions. We designed a set of experiments to confirm that our method is stable across variation in both sample and topological input sizes. Using gene product interactions from the KEGG pathway database and data mining PubMed publication abstracts, we verify that regardless of the network resolution or the inferred consensus network, our method is capable of inferring meaningful gene interactions through consensus Bayesian network generation with multiple, randomized topological orderings. Our method can not only confirm the existence of currently accepted interactions, but has the potential to hypothesize new ones as well. We show our method confirms the existence of known gene interactions such as JAK-STAT-PI3K-AKT-mTOR, infers novel gene interactions such as RAS- Bcl-2 and RAS-AKT, and found significant pathway-pathway interactions between the JAK-STAT signaling and Cardiac Muscle Contraction KEGG pathways. PMID:29049295

  1. ProteinInferencer: Confident protein identification and multiple experiment comparison for large scale proteomics projects.

    PubMed

    Zhang, Yaoyang; Xu, Tao; Shan, Bing; Hart, Jonathan; Aslanian, Aaron; Han, Xuemei; Zong, Nobel; Li, Haomin; Choi, Howard; Wang, Dong; Acharya, Lipi; Du, Lisa; Vogt, Peter K; Ping, Peipei; Yates, John R

    2015-11-03

    Shotgun proteomics generates valuable information from large-scale and target protein characterizations, including protein expression, protein quantification, protein post-translational modifications (PTMs), protein localization, and protein-protein interactions. Typically, peptides derived from proteolytic digestion, rather than intact proteins, are analyzed by mass spectrometers because peptides are more readily separated, ionized and fragmented. The amino acid sequences of peptides can be interpreted by matching the observed tandem mass spectra to theoretical spectra derived from a protein sequence database. Identified peptides serve as surrogates for their proteins and are often used to establish what proteins were present in the original mixture and to quantify protein abundance. Two major issues exist for assigning peptides to their originating protein. The first issue is maintaining a desired false discovery rate (FDR) when comparing or combining multiple large datasets generated by shotgun analysis and the second issue is properly assigning peptides to proteins when homologous proteins are present in the database. Herein we demonstrate a new computational tool, ProteinInferencer, which can be used for protein inference with both small- or large-scale data sets to produce a well-controlled protein FDR. In addition, ProteinInferencer introduces confidence scoring for individual proteins, which makes protein identifications evaluable. This article is part of a Special Issue entitled: Computational Proteomics. Copyright © 2015. Published by Elsevier B.V.

  2. Electrostatic potential calculation for biomolecules--creating a database of pre-calculated values reported on a per residue basis for all PDB protein structures.

    PubMed

    Rocchia, W; Neshich, G

    2007-10-05

    STING and Java Protein Dossier provide a collection of physical-chemical parameters, describing protein structure, stability, function, and interaction, considered one of the most comprehensive among the available protein databases of similar type. Particular attention in STING is paid to the electrostatic potential. It makes use of DelPhi, a well-known tool that calculates this physical-chemical quantity for biomolecules by solving the Poisson Boltzmann equation. In this paper, we describe a modification to the DelPhi program aimed at integrating it within the STING environment. We also outline how the "amino acid electrostatic potential" and the "surface amino acid electrostatic potential" are calculated (over all Protein Data Bank (PDB) content) and how the corresponding values are made searchable in STING_DB. In addition, we show that the STING and Java Protein Dossier are also capable of providing these particular parameter values for the analysis of protein structures modeled in computers or being experimentally solved, but not yet deposited in the PDB. Furthermore, we compare the calculated electrostatic potential values obtained by using the earlier version of DelPhi and those by STING, for the biologically relevant case of lysozyme-antibody interaction. Finally, we describe the STING capacity to make queries (at both residue and atomic levels) across the whole PDB, by looking at a specific case where the electrostatic potential parameter plays a crucial role in terms of a particular protein function, such as ligand binding. BlueStar STING is available at http://www.cbi.cnptia.embrapa.br.

  3. LenVarDB: database of length-variant protein domains.

    PubMed

    Mutt, Eshita; Mathew, Oommen K; Sowdhamini, Ramanathan

    2014-01-01

    Protein domains are functionally and structurally independent modules, which add to the functional variety of proteins. This array of functional diversity has been enabled by evolutionary changes, such as amino acid substitutions or insertions or deletions, occurring in these protein domains. Length variations (indels) can introduce changes at structural, functional and interaction levels. LenVarDB (freely available at http://caps.ncbs.res.in/lenvardb/) traces these length variations, starting from structure-based sequence alignments in our Protein Alignments organized as Structural Superfamilies (PASS2) database, across 731 structural classification of proteins (SCOP)-based protein domain superfamilies connected to 2 730 625 sequence homologues. Alignment of sequence homologues corresponding to a structural domain is available, starting from a structure-based sequence alignment of the superfamily. Orientation of the length-variant (indel) regions in protein domains can be visualized by mapping them on the structure and on the alignment. Knowledge about location of length variations within protein domains and their visual representation will be useful in predicting changes within structurally or functionally relevant sites, which may ultimately regulate protein function. Non-technical summary: Evolutionary changes bring about natural changes to proteins that may be found in many organisms. Such changes could be reflected as amino acid substitutions or insertions-deletions (indels) in protein sequences. LenVarDB is a database that provides an early overview of observed length variations that were set among 731 protein families and after examining >2 million sequences. Indels are followed up to observe if they are close to the active site such that they can affect the activity of proteins. Inclusion of such information can aid the design of bioengineering experiments.

  4. Space-related pharma-motifs for fast search of protein binding motifs and polypharmacological targets

    PubMed Central

    2012-01-01

    Background To discover a compound inhibiting multiple proteins (i.e. polypharmacological targets) is a new paradigm for the complex diseases (e.g. cancers and diabetes). In general, the polypharmacological proteins often share similar local binding environments and motifs. As the exponential growth of the number of protein structures, to find the similar structural binding motifs (pharma-motifs) is an emergency task for drug discovery (e.g. side effects and new uses for old drugs) and protein functions. Results We have developed a Space-Related Pharmamotifs (called SRPmotif) method to recognize the binding motifs by searching against protein structure database. SRPmotif is able to recognize conserved binding environments containing spatially discontinuous pharma-motifs which are often short conserved peptides with specific physico-chemical properties for protein functions. Among 356 pharma-motifs, 56.5% interacting residues are highly conserved. Experimental results indicate that 81.1% and 92.7% polypharmacological targets of each protein-ligand complex are annotated with same biological process (BP) and molecular function (MF) terms, respectively, based on Gene Ontology (GO). Our experimental results show that the identified pharma-motifs often consist of key residues in functional (active) sites and play the key roles for protein functions. The SRPmotif is available at http://gemdock.life.nctu.edu.tw/SRP/. Conclusions SRPmotif is able to identify similar pharma-interfaces and pharma-motifs sharing similar binding environments for polypharmacological targets by rapidly searching against the protein structure database. Pharma-motifs describe the conservations of binding environments for drug discovery and protein functions. Additionally, these pharma-motifs provide the clues for discovering new sequence-based motifs to predict protein functions from protein sequence databases. We believe that SRPmotif is useful for elucidating protein functions and drug discovery. PMID:23281852

  5. Space-related pharma-motifs for fast search of protein binding motifs and polypharmacological targets.

    PubMed

    Chiu, Yi-Yuan; Lin, Chun-Yu; Lin, Chih-Ta; Hsu, Kai-Cheng; Chang, Li-Zen; Yang, Jinn-Moon

    2012-01-01

    To discover a compound inhibiting multiple proteins (i.e. polypharmacological targets) is a new paradigm for the complex diseases (e.g. cancers and diabetes). In general, the polypharmacological proteins often share similar local binding environments and motifs. As the exponential growth of the number of protein structures, to find the similar structural binding motifs (pharma-motifs) is an emergency task for drug discovery (e.g. side effects and new uses for old drugs) and protein functions. We have developed a Space-Related Pharmamotifs (called SRPmotif) method to recognize the binding motifs by searching against protein structure database. SRPmotif is able to recognize conserved binding environments containing spatially discontinuous pharma-motifs which are often short conserved peptides with specific physico-chemical properties for protein functions. Among 356 pharma-motifs, 56.5% interacting residues are highly conserved. Experimental results indicate that 81.1% and 92.7% polypharmacological targets of each protein-ligand complex are annotated with same biological process (BP) and molecular function (MF) terms, respectively, based on Gene Ontology (GO). Our experimental results show that the identified pharma-motifs often consist of key residues in functional (active) sites and play the key roles for protein functions. The SRPmotif is available at http://gemdock.life.nctu.edu.tw/SRP/. SRPmotif is able to identify similar pharma-interfaces and pharma-motifs sharing similar binding environments for polypharmacological targets by rapidly searching against the protein structure database. Pharma-motifs describe the conservations of binding environments for drug discovery and protein functions. Additionally, these pharma-motifs provide the clues for discovering new sequence-based motifs to predict protein functions from protein sequence databases. We believe that SRPmotif is useful for elucidating protein functions and drug discovery.

  6. Protein structure similarity from Principle Component Correlation analysis.

    PubMed

    Zhou, Xiaobo; Chou, James; Wong, Stephen T C

    2006-01-25

    Owing to rapid expansion of protein structure databases in recent years, methods of structure comparison are becoming increasingly effective and important in revealing novel information on functional properties of proteins and their roles in the grand scheme of evolutionary biology. Currently, the structural similarity between two proteins is measured by the root-mean-square-deviation (RMSD) in their best-superimposed atomic coordinates. RMSD is the golden rule of measuring structural similarity when the structures are nearly identical; it, however, fails to detect the higher order topological similarities in proteins evolved into different shapes. We propose new algorithms for extracting geometrical invariants of proteins that can be effectively used to identify homologous protein structures or topologies in order to quantify both close and remote structural similarities. We measure structural similarity between proteins by correlating the principle components of their secondary structure interaction matrix. In our approach, the Principle Component Correlation (PCC) analysis, a symmetric interaction matrix for a protein structure is constructed with relationship parameters between secondary elements that can take the form of distance, orientation, or other relevant structural invariants. When using a distance-based construction in the presence or absence of encoded N to C terminal sense, there are strong correlations between the principle components of interaction matrices of structurally or topologically similar proteins. The PCC method is extensively tested for protein structures that belong to the same topological class but are significantly different by RMSD measure. The PCC analysis can also differentiate proteins having similar shapes but different topological arrangements. Additionally, we demonstrate that when using two independently defined interaction matrices, comparison of their maximum eigenvalues can be highly effective in clustering structurally or topologically similar proteins. We believe that the PCC analysis of interaction matrix is highly flexible in adopting various structural parameters for protein structure comparison.

  7. PDB-Ligand: a ligand database based on PDB for the automated and customized classification of ligand-binding structures.

    PubMed

    Shin, Jae-Min; Cho, Doo-Ho

    2005-01-01

    PDB-Ligand (http://www.idrtech.com/PDB-Ligand/) is a three-dimensional structure database of small molecular ligands that are bound to larger biomolecules deposited in the Protein Data Bank (PDB). It is also a database tool that allows one to browse, classify, superimpose and visualize these structures. As of May 2004, there are about 4870 types of small molecular ligands, experimentally determined as a complex with protein or DNA in the PDB. The proteins that a given ligand binds are often homologous and present the same binding structure to the ligand. However, there are also many instances wherein a given ligand binds to two or more unrelated proteins, or to the same or homologous protein in different binding environments. PDB-Ligand serves as an interactive structural analysis and clustering tool for all the ligand-binding structures in the PDB. PDB-Ligand also provides an easier way to obtain a number of different structure alignments of many related ligand-binding structures based on a simple and flexible ligand clustering method. PDB-Ligand will be a good resource for both a better interpretation of ligand-binding structures and the development of better scoring functions to be used in many drug discovery applications.

  8. Contextualization of drug-mediator relations using evidence networks.

    PubMed

    Tran, Hai Joey; Speyer, Gil; Kiefer, Jeff; Kim, Seungchan

    2017-05-31

    Genomic analysis of drug response can provide unique insights into therapies that can be used to match the "right drug to the right patient." However, the process of discovering such therapeutic insights using genomic data is not straightforward and represents an area of active investigation. EDDY (Evaluation of Differential DependencY), a statistical test to detect differential statistical dependencies, is one method that leverages genomic data to identify differential genetic dependencies. EDDY has been used in conjunction with the Cancer Therapeutics Response Portal (CTRP), a dataset with drug-response measurements for more than 400 small molecules, and RNAseq data of cell lines in the Cancer Cell Line Encyclopedia (CCLE) to find potential drug-mediator pairs. Mediators were identified as genes that showed significant change in genetic statistical dependencies within annotated pathways between drug sensitive and drug non-sensitive cell lines, and the results are presented as a public web-portal (EDDY-CTRP). However, the interpretability of drug-mediator pairs currently hinders further exploration of these potentially valuable results. In this study, we address this challenge by constructing evidence networks built with protein and drug interactions from the STITCH and STRING interaction databases. STITCH and STRING are sister databases that catalog known and predicted drug-protein interactions and protein-protein interactions, respectively. Using these two databases, we have developed a method to construct evidence networks to "explain" the relation between a drug and a mediator.  RESULTS: We applied this approach to drug-mediator relations discovered in EDDY-CTRP analysis and identified evidence networks for ~70% of drug-mediator pairs where most mediators were not known direct targets for the drug. Constructed evidence networks enable researchers to contextualize the drug-mediator pair with current research and knowledge. Using evidence networks, we were able to improve the interpretability of the EDDY-CTRP results by linking the drugs and mediators with genes associated with both the drug and the mediator. We anticipate that these evidence networks will help inform EDDY-CTRP results and enhance the generation of important insights to drug sensitivity that will lead to improved precision medicine applications.

  9. The BioGRID interaction database: 2013 update.

    PubMed

    Chatr-Aryamontri, Andrew; Breitkreutz, Bobby-Joe; Heinicke, Sven; Boucher, Lorrie; Winter, Andrew; Stark, Chris; Nixon, Julie; Ramage, Lindsay; Kolas, Nadine; O'Donnell, Lara; Reguly, Teresa; Breitkreutz, Ashton; Sellam, Adnane; Chen, Daici; Chang, Christie; Rust, Jennifer; Livstone, Michael; Oughtred, Rose; Dolinski, Kara; Tyers, Mike

    2013-01-01

    The Biological General Repository for Interaction Datasets (BioGRID: http//thebiogrid.org) is an open access archive of genetic and protein interactions that are curated from the primary biomedical literature for all major model organism species. As of September 2012, BioGRID houses more than 500 000 manually annotated interactions from more than 30 model organisms. BioGRID maintains complete curation coverage of the literature for the budding yeast Saccharomyces cerevisiae, the fission yeast Schizosaccharomyces pombe and the model plant Arabidopsis thaliana. A number of themed curation projects in areas of biomedical importance are also supported. BioGRID has established collaborations and/or shares data records for the annotation of interactions and phenotypes with most major model organism databases, including Saccharomyces Genome Database, PomBase, WormBase, FlyBase and The Arabidopsis Information Resource. BioGRID also actively engages with the text-mining community to benchmark and deploy automated tools to expedite curation workflows. BioGRID data are freely accessible through both a user-defined interactive interface and in batch downloads in a wide variety of formats, including PSI-MI2.5 and tab-delimited files. BioGRID records can also be interrogated and analyzed with a series of new bioinformatics tools, which include a post-translational modification viewer, a graphical viewer, a REST service and a Cytoscape plugin.

  10. A gene network bioinformatics analysis for pemphigoid autoimmune blistering diseases.

    PubMed

    Barone, Antonio; Toti, Paolo; Giuca, Maria Rita; Derchi, Giacomo; Covani, Ugo

    2015-07-01

    In this theoretical study, a text mining search and clustering analysis of data related to genes potentially involved in human pemphigoid autoimmune blistering diseases (PAIBD) was performed using web tools to create a gene/protein interaction network. The Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) database was employed to identify a final set of PAIBD-involved genes and to calculate the overall significant interactions among genes: for each gene, the weighted number of links, or WNL, was registered and a clustering procedure was performed using the WNL analysis. Genes were ranked in class (leader, B, C, D and so on, up to orphans). An ontological analysis was performed for the set of 'leader' genes. Using the above-mentioned data network, 115 genes represented the final set; leader genes numbered 7 (intercellular adhesion molecule 1 (ICAM-1), interferon gamma (IFNG), interleukin (IL)-2, IL-4, IL-6, IL-8 and tumour necrosis factor (TNF)), class B genes were 13, whereas the orphans were 24. The ontological analysis attested that the molecular action was focused on extracellular space and cell surface, whereas the activation and regulation of the immunity system was widely involved. Despite the limited knowledge of the present pathologic phenomenon, attested by the presence of 24 genes revealing no protein-protein direct or indirect interactions, the network showed significant pathways gathered in several subgroups: cellular components, molecular functions, biological processes and the pathologic phenomenon obtained from the Kyoto Encyclopaedia of Genes and Genomes (KEGG) database. The molecular basis for PAIBD was summarised and expanded, which will perhaps give researchers promising directions for the identification of new therapeutic targets.

  11. MFIB: a repository of protein complexes with mutual folding induced by binding.

    PubMed

    Fichó, Erzsébet; Reményi, István; Simon, István; Mészáros, Bálint

    2017-11-15

    It is commonplace that intrinsically disordered proteins (IDPs) are involved in crucial interactions in the living cell. However, the study of protein complexes formed exclusively by IDPs is hindered by the lack of data and such analyses remain sporadic. Systematic studies benefited other types of protein-protein interactions paving a way from basic science to therapeutics; yet these efforts require reliable datasets that are currently lacking for synergistically folding complexes of IDPs. Here we present the Mutual Folding Induced by Binding (MFIB) database, the first systematic collection of complexes formed exclusively by IDPs. MFIB contains an order of magnitude more data than any dataset used in corresponding studies and offers a wide coverage of known IDP complexes in terms of flexibility, oligomeric composition and protein function from all domains of life. The included complexes are grouped using a hierarchical classification and are complemented with structural and functional annotations. MFIB is backed by a firm development team and infrastructure, and together with possible future community collaboration it will provide the cornerstone for structural and functional studies of IDP complexes. MFIB is freely accessible at http://mfib.enzim.ttk.mta.hu/. The MFIB application is hosted by Apache web server and was implemented in PHP. To enrich querying features and to enhance backend performance a MySQL database was also created. simon.istvan@ttk.mta.hu, meszaros.balint@ttk.mta.hu. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press.

  12. ExtraTrain: a database of Extragenic regions and Transcriptional information in prokaryotic organisms

    PubMed Central

    Pareja, Eduardo; Pareja-Tobes, Pablo; Manrique, Marina; Pareja-Tobes, Eduardo; Bonal, Javier; Tobes, Raquel

    2006-01-01

    Background Transcriptional regulation processes are the principal mechanisms of adaptation in prokaryotes. In these processes, the regulatory proteins and the regulatory DNA signals located in extragenic regions are the key elements involved. As all extragenic spaces are putative regulatory regions, ExtraTrain covers all extragenic regions of available genomes and regulatory proteins from bacteria and archaea included in the UniProt database. Description ExtraTrain provides integrated and easily manageable information for 679816 extragenic regions and for the genes delimiting each of them. In addition ExtraTrain supplies a tool to explore extragenic regions, named Palinsight, oriented to detect and search palindromic patterns. This interactive visual tool is totally integrated in the database, allowing the search for regulatory signals in user defined sets of extragenic regions. The 26046 regulatory proteins included in ExtraTrain belong to the families AraC/XylS, ArsR, AsnC, Cold shock domain, CRP-FNR, DeoR, GntR, IclR, LacI, LuxR, LysR, MarR, MerR, NtrC/Fis, OmpR and TetR. The database follows the InterPro criteria to define these families. The information about regulators includes manually curated sets of references specifically associated to regulator entries. In order to achieve a sustainable and maintainable knowledge database ExtraTrain is a platform open to the contribution of knowledge by the scientific community providing a system for the incorporation of textual knowledge. Conclusion ExtraTrain is a new database for exploring Extragenic regions and Transcriptional information in bacteria and archaea. ExtraTrain database is available at . PMID:16539733

  13. DIBS: a repository of disordered binding sites mediating interactions with ordered proteins.

    PubMed

    Schad, Eva; Fichó, Erzsébet; Pancsa, Rita; Simon, István; Dosztányi, Zsuzsanna; Mészáros, Bálint

    2018-02-01

    Intrinsically Disordered Proteins (IDPs) mediate crucial protein-protein interactions, most notably in signaling and regulation. As their importance is increasingly recognized, the detailed analyses of specific IDP interactions opened up new opportunities for therapeutic targeting. Yet, large scale information about IDP-mediated interactions in structural and functional details are lacking, hindering the understanding of the mechanisms underlying this distinct binding mode. Here, we present DIBS, the first comprehensive, curated collection of complexes between IDPs and ordered proteins. DIBS not only describes by far the highest number of cases, it also provides the dissociation constants of their interactions, as well as the description of potential post-translational modifications modulating the binding strength and linear motifs involved in the binding. Together with the wide range of structural and functional annotations, DIBS will provide the cornerstone for structural and functional studies of IDP complexes. DIBS is freely accessible at http://dibs.enzim.ttk.mta.hu/. The DIBS application is hosted by Apache web server and was implemented in PHP. To enrich querying features and to enhance backend performance a MySQL database was also created. dosztanyi@caesar.elte.hu or bmeszaros@caesar.elte.hu. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press.

  14. The Comparative Toxicogenomics Database (CTD): A Resource for Comparative Toxicological Studies

    PubMed Central

    CJ, Mattingly; MC, Rosenstein; GT, Colby; JN, Forrest; JL, Boyer

    2006-01-01

    The etiology of most chronic diseases involves interactions between environmental factors and genes that modulate important biological processes (Olden and Wilson, 2000). We are developing the publicly available Comparative Toxicogenomics Database (CTD) to promote understanding about the effects of environmental chemicals on human health. CTD identifies interactions between chemicals and genes and facilitates cross-species comparative studies of these genes. The use of diverse animal models and cross-species comparative sequence studies has been critical for understanding basic physiological mechanisms and gene and protein functions. Similarly, these approaches will be valuable for exploring the molecular mechanisms of action of environmental chemicals and the genetic basis of differential susceptibility. PMID:16902965

  15. Identifying functionally informative evolutionary sequence profiles.

    PubMed

    Gil, Nelson; Fiser, Andras

    2018-04-15

    Multiple sequence alignments (MSAs) can provide essential input to many bioinformatics applications, including protein structure prediction and functional annotation. However, the optimal selection of sequences to obtain biologically informative MSAs for such purposes is poorly explored, and has traditionally been performed manually. We present Selection of Alignment by Maximal Mutual Information (SAMMI), an automated, sequence-based approach to objectively select an optimal MSA from a large set of alternatives sampled from a general sequence database search. The hypothesis of this approach is that the mutual information among MSA columns will be maximal for those MSAs that contain the most diverse set possible of the most structurally and functionally homogeneous protein sequences. SAMMI was tested to select MSAs for functional site residue prediction by analysis of conservation patterns on a set of 435 proteins obtained from protein-ligand (peptides, nucleic acids and small substrates) and protein-protein interaction databases. Availability and implementation: A freely accessible program, including source code, implementing SAMMI is available at https://github.com/nelsongil92/SAMMI.git. andras.fiser@einstein.yu.edu. Supplementary data are available at Bioinformatics online.

  16. EADB: An Estrogenic Activity Database for Assessing Potential Endocrine Activity

    EPA Science Inventory

    Endocrine-active chemicals can potentially have adverse effects on both humans and wildlife. They can interfere with the body’s endocrine system through direct or indirect interactions with many protein targets. Estrogen receptors (ERs) are one of the major targets, and many ...

  17. Protein sequences insight into heavy metal tolerance in Cronobacter sakazakii BAA-894 encoded by plasmid pESA3.

    PubMed

    Chaturvedi, Navaneet; Kajsik, Michal; Forsythe, Stephen; Pandey, Paras Nath

    2015-12-01

    The recently annotated genome of the bacterium Cronobacter sakazakii BAA-894 suggests that the organism has the ability to bind heavy metals. This study demonstrates heavy metal tolerance in C. sakazakii, in which proteins with the heavy metal interaction were recognized by computational and experimental study. As the result, approximately one-fourth of proteins encoded on the plasmid pESA3 are proposed to have potential interaction with heavy metals. Interaction between heavy metals and predicted proteins was further corroborated using protein crystal structures from protein data bank database and comparison of metal-binding ligands. In addition, a phylogenetic study was undertaken for the toxic heavy metals, arsenic, cadmium, lead and mercury, which generated relatedness clustering for lead, cadmium and arsenic. Laboratory studies confirmed the organism's tolerance to tellurite, copper and silver. These experimental and computational study data extend our understanding of the genes encoding for proteins of this important neonatal pathogen and provide further insights into the genotypes associated with features that can contribute to its persistence in the environment. The information will be of value for future environmental protection from heavy toxic metals.

  18. Feature generation and representations for protein-protein interaction classification.

    PubMed

    Lan, Man; Tan, Chew Lim; Su, Jian

    2009-10-01

    Automatic detecting protein-protein interaction (PPI) relevant articles is a crucial step for large-scale biological database curation. The previous work adopted POS tagging, shallow parsing and sentence splitting techniques, but they achieved worse performance than the simple bag-of-words representation. In this paper, we generated and investigated multiple types of feature representations in order to further improve the performance of PPI text classification task. Besides the traditional domain-independent bag-of-words approach and the term weighting methods, we also explored other domain-dependent features, i.e. protein-protein interaction trigger keywords, protein named entities and the advanced ways of incorporating Natural Language Processing (NLP) output. The integration of these multiple features has been evaluated on the BioCreAtIvE II corpus. The experimental results showed that both the advanced way of using NLP output and the integration of bag-of-words and NLP output improved the performance of text classification. Specifically, in comparison with the best performance achieved in the BioCreAtIvE II IAS, the feature-level and classifier-level integration of multiple features improved the performance of classification 2.71% and 3.95%, respectively.

  19. Investigation of potent lead for acquired immunodeficiency syndrome from traditional Chinese medicine.

    PubMed

    Hung, Tzu-Chieh; Lee, Wen-Yuan; Chen, Kuen-Bao; Chan, Yueh-Chiu; Chen, Calvin Yu-Chian

    2014-01-01

    Acquired immunodeficiency syndrome (AIDS), caused by human immunodeficiency virus (HIV), has become, because of the rapid spread of the disease, a serious global problem and cannot be treated. Recent studies indicate that VIF is a protein of HIV to prevent all of human immunity to attack HIV. Molecular compounds of traditional Chinese medicine (TCM) database filtered through molecular docking and molecular dynamics simulations to inhibit VIF can protect against HIV. Glutamic acid, plantagoguanidinic acid, and Aurantiamide acetate based docking score higher with other TCM compounds selected. Molecular dynamics are useful for analysis and detection ligand interactions. According to the docking position, hydrophobic interactions, hydrogen bonding changes, and structure variation, the study try to select the efficacy of traditional Chinese medicine compound Aurantiamide acetate is better than the other for protein-ligand interactions to maintain the protein composition, based on changes in the structure.

  20. FragFit: a web-application for interactive modeling of protein segments into cryo-EM density maps.

    PubMed

    Tiemann, Johanna K S; Rose, Alexander S; Ismer, Jochen; Darvish, Mitra D; Hilal, Tarek; Spahn, Christian M T; Hildebrand, Peter W

    2018-05-21

    Cryo-electron microscopy (cryo-EM) is a standard method to determine the three-dimensional structures of molecular complexes. However, easy to use tools for modeling of protein segments into cryo-EM maps are sparse. Here, we present the FragFit web-application, a web server for interactive modeling of segments of up to 35 amino acids length into cryo-EM density maps. The fragments are provided by a regularly updated database containing at the moment about 1 billion entries extracted from PDB structures and can be readily integrated into a protein structure. Fragments are selected based on geometric criteria, sequence similarity and fit into a given cryo-EM density map. Web-based molecular visualization with the NGL Viewer allows interactive selection of fragments. The FragFit web-application, accessible at http://proteinformatics.de/FragFit, is free and open to all users, without any login requirements.

  1. Detecting Coevolution in and among Protein Domains

    PubMed Central

    Yeang, Chen-Hsiang; Haussler, David

    2007-01-01

    Correlated changes of nucleic or amino acids have provided strong information about the structures and interactions of molecules. Despite the rich literature in coevolutionary sequence analysis, previous methods often have to trade off between generality, simplicity, phylogenetic information, and specific knowledge about interactions. Furthermore, despite the evidence of coevolution in selected protein families, a comprehensive screening of coevolution among all protein domains is still lacking. We propose an augmented continuous-time Markov process model for sequence coevolution. The model can handle different types of interactions, incorporate phylogenetic information and sequence substitution, has only one extra free parameter, and requires no knowledge about interaction rules. We employ this model to large-scale screenings on the entire protein domain database (Pfam). Strikingly, with 0.1 trillion tests executed, the majority of the inferred coevolving protein domains are functionally related, and the coevolving amino acid residues are spatially coupled. Moreover, many of the coevolving positions are located at functionally important sites of proteins/protein complexes, such as the subunit linkers of superoxide dismutase, the tRNA binding sites of ribosomes, the DNA binding region of RNA polymerase, and the active and ligand binding sites of various enzymes. The results suggest sequence coevolution manifests structural and functional constraints of proteins. The intricate relations between sequence coevolution and various selective constraints are worth pursuing at a deeper level. PMID:17983264

  2. Effect of reference genome selection on the performance of computational methods for genome-wide protein-protein interaction prediction.

    PubMed

    Muley, Vijaykumar Yogesh; Ranjan, Akash

    2012-01-01

    Recent progress in computational methods for predicting physical and functional protein-protein interactions has provided new insights into the complexity of biological processes. Most of these methods assume that functionally interacting proteins are likely to have a shared evolutionary history. This history can be traced out for the protein pairs of a query genome by correlating different evolutionary aspects of their homologs in multiple genomes known as the reference genomes. These methods include phylogenetic profiling, gene neighborhood and co-occurrence of the orthologous protein coding genes in the same cluster or operon. These are collectively known as genomic context methods. On the other hand a method called mirrortree is based on the similarity of phylogenetic trees between two interacting proteins. Comprehensive performance analyses of these methods have been frequently reported in literature. However, very few studies provide insight into the effect of reference genome selection on detection of meaningful protein interactions. We analyzed the performance of four methods and their variants to understand the effect of reference genome selection on prediction efficacy. We used six sets of reference genomes, sampled in accordance with phylogenetic diversity and relationship between organisms from 565 bacteria. We used Escherichia coli as a model organism and the gold standard datasets of interacting proteins reported in DIP, EcoCyc and KEGG databases to compare the performance of the prediction methods. Higher performance for predicting protein-protein interactions was achievable even with 100-150 bacterial genomes out of 565 genomes. Inclusion of archaeal genomes in the reference genome set improves performance. We find that in order to obtain a good performance, it is better to sample few genomes of related genera of prokaryotes from the large number of available genomes. Moreover, such a sampling allows for selecting 50-100 genomes for comparable accuracy of predictions when computational resources are limited.

  3. Connecting proteins with drug-like compounds: Open source drug discovery workflows with BindingDB and KNIME

    PubMed Central

    Berthold, Michael R.; Hedrick, Michael P.; Gilson, Michael K.

    2015-01-01

    Today’s large, public databases of protein–small molecule interaction data are creating important new opportunities for data mining and integration. At the same time, new graphical user interface-based workflow tools offer facile alternatives to custom scripting for informatics and data analysis. Here, we illustrate how the large protein-ligand database BindingDB may be incorporated into KNIME workflows as a step toward the integration of pharmacological data with broader biomolecular analyses. Thus, we describe a collection of KNIME workflows that access BindingDB data via RESTful webservices and, for more intensive queries, via a local distillation of the full BindingDB dataset. We focus in particular on the KNIME implementation of knowledge-based tools to generate informed hypotheses regarding protein targets of bioactive compounds, based on notions of chemical similarity. A number of variants of this basic approach are tested for seven existing drugs with relatively ill-defined therapeutic targets, leading to replication of some previously confirmed results and discovery of new, high-quality hits. Implications for future development are discussed. Database URL: www.bindingdb.org PMID:26384374

  4. TrypsNetDB: An integrated framework for the functional characterization of trypanosomatid proteins

    PubMed Central

    Gazestani, Vahid H.; Yip, Chun Wai; Nikpour, Najmeh; Berghuis, Natasha

    2017-01-01

    Trypanosomatid parasites cause serious infections in humans and production losses in livestock. Due to the high divergence from other eukaryotes, such as humans and model organisms, the functional roles of many trypanosomatid proteins cannot be predicted by homology-based methods, rendering a significant portion of their proteins as uncharacterized. Recent technological advances have led to the availability of multiple systematic and genome-wide datasets on trypanosomatid parasites that are informative regarding the biological role(s) of their proteins. Here, we report TrypsNetDB (http://trypsNetDB.org), a web-based resource for the functional annotation of 16 different species/strains of trypanosomatid parasites. The database not only visualizes the network context of the queried protein(s) in an intuitive way but also examines the response of the represented network in more than 50 different biological contexts and its enrichment for various biological terms and pathways, protein sequence signatures, and potential RNA regulatory elements. The interactome core of the database, as of Jan 23, 2017, contains 101,187 interactions among 13,395 trypanosomatid proteins inferred from 97 genome-wide and focused studies on the interactome of these organisms. PMID:28158179

  5. Atomic interaction networks in the core of protein domains and their native folds.

    PubMed

    Soundararajan, Venkataramanan; Raman, Rahul; Raguram, S; Sasisekharan, V; Sasisekharan, Ram

    2010-02-23

    Vastly divergent sequences populate a majority of protein folds. In the quest to identify features that are conserved within protein domains belonging to the same fold, we set out to examine the entire protein universe on a fold-by-fold basis. We report that the atomic interaction network in the solvent-unexposed core of protein domains are fold-conserved, extraordinary sequence divergence notwithstanding. Further, we find that this feature, termed protein core atomic interaction network (or PCAIN) is significantly distinguishable across different folds, thus appearing to be "signature" of a domain's native fold. As part of this study, we computed the PCAINs for 8698 representative protein domains from families across the 1018 known protein folds to construct our seed database and an automated framework was developed for PCAIN-based characterization of the protein fold universe. A test set of randomly selected domains that are not in the seed database was classified with over 97% accuracy, independent of sequence divergence. As an application of this novel fold signature, a PCAIN-based scoring scheme was developed for comparative (homology-based) structure prediction, with 1-2 angstroms (mean 1.61A) C(alpha) RMSD generally observed between computed structures and reference crystal structures. Our results are consistent across the full spectrum of test domains including those from recent CASP experiments and most notably in the 'twilight' and 'midnight' zones wherein <30% and <10% target-template sequence identity prevails (mean twilight RMSD of 1.69A). We further demonstrate the utility of the PCAIN protocol to derive biological insight into protein structure-function relationships, by modeling the structure of the YopM effector novel E3 ligase (NEL) domain from plague-causative bacterium Yersinia Pestis and discussing its implications for host adaptive and innate immune modulation by the pathogen. Considering the several high-throughput, sequence-identity-independent applications demonstrated in this work, we suggest that the PCAIN is a fundamental fold feature that could be a valuable addition to the arsenal of protein modeling and analysis tools.

  6. Atomic Interaction Networks in the Core of Protein Domains and Their Native Folds

    PubMed Central

    Soundararajan, Venkataramanan; Raman, Rahul; Raguram, S.; Sasisekharan, V.; Sasisekharan, Ram

    2010-01-01

    Vastly divergent sequences populate a majority of protein folds. In the quest to identify features that are conserved within protein domains belonging to the same fold, we set out to examine the entire protein universe on a fold-by-fold basis. We report that the atomic interaction network in the solvent-unexposed core of protein domains are fold-conserved, extraordinary sequence divergence notwithstanding. Further, we find that this feature, termed protein core atomic interaction network (or PCAIN) is significantly distinguishable across different folds, thus appearing to be “signature” of a domain's native fold. As part of this study, we computed the PCAINs for 8698 representative protein domains from families across the 1018 known protein folds to construct our seed database and an automated framework was developed for PCAIN-based characterization of the protein fold universe. A test set of randomly selected domains that are not in the seed database was classified with over 97% accuracy, independent of sequence divergence. As an application of this novel fold signature, a PCAIN-based scoring scheme was developed for comparative (homology-based) structure prediction, with 1–2 angstroms (mean 1.61A) Cα RMSD generally observed between computed structures and reference crystal structures. Our results are consistent across the full spectrum of test domains including those from recent CASP experiments and most notably in the ‘twilight’ and ‘midnight’ zones wherein <30% and <10% target-template sequence identity prevails (mean twilight RMSD of 1.69A). We further demonstrate the utility of the PCAIN protocol to derive biological insight into protein structure-function relationships, by modeling the structure of the YopM effector novel E3 ligase (NEL) domain from plague-causative bacterium Yersinia Pestis and discussing its implications for host adaptive and innate immune modulation by the pathogen. Considering the several high-throughput, sequence-identity-independent applications demonstrated in this work, we suggest that the PCAIN is a fundamental fold feature that could be a valuable addition to the arsenal of protein modeling and analysis tools. PMID:20186337

  7. Analysis of Structural Features Contributing to Weak Affinities of Ubiquitin/Protein Interactions.

    PubMed

    Cohen, Ariel; Rosenthal, Eran; Shifman, Julia M

    2017-11-10

    Ubiquitin is a small protein that enables one of the most common post-translational modifications, where the whole ubiquitin molecule is attached to various target proteins, forming mono- or polyubiquitin conjugations. As a prototypical multispecific protein, ubiquitin interacts non-covalently with a variety of proteins in the cell, including ubiquitin-modifying enzymes and ubiquitin receptors that recognize signals from ubiquitin-conjugated substrates. To enable recognition of multiple targets and to support fast dissociation from the ubiquitin modifying enzymes, ubiquitin/protein interactions are characterized with low affinities, frequently in the higher μM and lower mM range. To determine how structure encodes low binding affinity of ubiquitin/protein complexes, we analyzed structures of more than a hundred such complexes compiled in the Ubiquitin Structural Relational Database. We calculated various structure-based features of ubiquitin/protein binding interfaces and compared them to the same features of general protein-protein interactions (PPIs) with various functions and generally higher affinities. Our analysis shows that ubiquitin/protein binding interfaces on average do not differ in size and shape complementarity from interfaces of higher-affinity PPIs. However, they contain fewer favorable hydrogen bonds and more unfavorable hydrophobic/charge interactions. We further analyzed how binding interfaces change upon affinity maturation of ubiquitin toward its target proteins. We demonstrate that while different features are improved in different experiments, the majority of the evolved complexes exhibit better shape complementarity and hydrogen bond pattern compared to wild-type complexes. Our analysis helps to understand how low-affinity PPIs have evolved and how they could be converted into high-affinity PPIs. Copyright © 2017 Elsevier Ltd. All rights reserved.

  8. Proteomics of xenografted human breast cancer indicates novel targets related to tamoxifen resistance.

    PubMed

    Besada, Vladimir; Diaz, Maylin; Becker, Michael; Ramos, Yassel; Castellanos-Serra, Lila; Fichtner, Iduna

    2006-02-01

    Tamoxifen is the most frequently used drug for hormone therapy of breast cancer patients, even though a high percentage of women are (or become) refractory to this treatment. The proteins involved in tamoxifen resistance of breast tumor cells as well as the mechanisms by which they interact, are still unknown. Some years ago, we established the xenograft breast tumor 3366, sensitive to tamoxifen and the 3366/TAM, resistant to tamoxifen, derived after two years of in vivo passages of the parental 3366 under tamoxifen treatment. Here, we compare the protein expression levels of both xenografts. 2-DE of proteins from total cell extracts showed very high reproducibility among tumors from each group (tamoxifen sensitive and tamoxifen resistant). The heuristic clustering analysis of these gels pooled them correctly in both groups. Twelve proteins were found up-regulated in the tamoxifen-resistant line, while nine were down-regulated. The proteins differentially expressed were identified by MS and sequence database analysis. Biological functions of these proteins are related to cell-cell adhesion and interaction, signal transduction, DNA and protein synthesis machinery, mitochondrial respiratory chain, oxidative stress processes and apoptosis. Three of the identified proteins (ALG-2 interacting protein and two GDP-dissociation inhibitors) could be directly involved in the resistance phenomenon.

  9. How much do we know about the coupling of G-proteins to serotonin receptors?

    PubMed Central

    2014-01-01

    Serotonin receptors are G-protein-coupled receptors (GPCRs) involved in a variety of psychiatric disorders. G-proteins, heterotrimeric complexes that couple to multiple receptors, are activated when their receptor is bound by the appropriate ligand. Activation triggers a cascade of further signalling events that ultimately result in cell function changes. Each of the several known G-protein types can activate multiple pathways. Interestingly, since several G-proteins can couple to the same serotonin receptor type, receptor activation can result in induction of different pathways. To reach a better understanding of the role, interactions and expression of G-proteins a literature search was performed in order to list all the known heterotrimeric combinations and serotonin receptor complexes. Public databases were analysed to collect transcript and protein expression data relating to G-proteins in neural tissues. Only a very small number of heterotrimeric combinations and G-protein-receptor complexes out of the possible thousands suggested by expression data analysis have been examined experimentally. In addition this has mostly been obtained using insect, hamster, rat and, to a lesser extent, human cell lines. Besides highlighting which interactions have not been explored, our findings suggest additional possible interactions that should be examined based on our expression data analysis. PMID:25011628

  10. How much do we know about the coupling of G-proteins to serotonin receptors?

    PubMed

    Giulietti, Matteo; Vivenzio, Viviana; Piva, Francesco; Principato, Giovanni; Bellantuono, Cesario; Nardi, Bernardo

    2014-07-10

    Serotonin receptors are G-protein-coupled receptors (GPCRs) involved in a variety of psychiatric disorders. G-proteins, heterotrimeric complexes that couple to multiple receptors, are activated when their receptor is bound by the appropriate ligand. Activation triggers a cascade of further signalling events that ultimately result in cell function changes. Each of the several known G-protein types can activate multiple pathways. Interestingly, since several G-proteins can couple to the same serotonin receptor type, receptor activation can result in induction of different pathways. To reach a better understanding of the role, interactions and expression of G-proteins a literature search was performed in order to list all the known heterotrimeric combinations and serotonin receptor complexes. Public databases were analysed to collect transcript and protein expression data relating to G-proteins in neural tissues. Only a very small number of heterotrimeric combinations and G-protein-receptor complexes out of the possible thousands suggested by expression data analysis have been examined experimentally. In addition this has mostly been obtained using insect, hamster, rat and, to a lesser extent, human cell lines. Besides highlighting which interactions have not been explored, our findings suggest additional possible interactions that should be examined based on our expression data analysis.

  11. Annotating Mutational Effects on Proteins and Protein Interactions: Designing Novel and Revisiting Existing Protocols.

    PubMed

    Li, Minghui; Goncearenco, Alexander; Panchenko, Anna R

    2017-01-01

    In this review we describe a protocol to annotate the effects of missense mutations on proteins, their functions, stability, and binding. For this purpose we present a collection of the most comprehensive databases which store different types of sequencing data on missense mutations, we discuss their relationships, possible intersections, and unique features. Next, we suggest an annotation workflow using the state-of-the art methods and highlight their usability, advantages, and limitations for different cases. Finally, we address a particularly difficult problem of deciphering the molecular mechanisms of mutations on proteins and protein complexes to understand the origins and mechanisms of diseases.

  12. The structure of Ca2+-loaded S100A2 at 1.3-Å resolution.

    PubMed

    Koch, Michael; Fritz, Günter

    2012-05-01

    S100A2 is an EF-hand calcium ion (Ca(2+))-binding protein that activates the tumour suppressor p53. In order to understand the molecular mechanisms underlying the Ca(2+) -induced activation of S100A2, the structure of Ca(2+)-bound S100A2 was determined at 1.3 Å resolution by X-ray crystallography. The structure was compared with Ca(2+) -free S100A2 and with other S100 proteins. Binding of Ca(2+) to S100A2 induces small structural changes in the N-terminal EF-hand, but a large conformational change in the C-terminal EF-hand, reorienting helix III by approximately 90°. This movement is accompanied by the exposure of a hydrophobic cavity between helix III and helix IV that represents the target protein interaction site. This molecular reorganization is associated with the breaking and new formation of intramolecular hydrophobic contacts. The target binding site exhibits unique features; in particular, the hydrophobic cavity is larger than in other Ca(2+)-loaded S100 proteins. The structural data underline that the shape and size of the hydrophobic cavity are major determinants for target specificity of S100 proteins and suggest that the binding mode for S100A2 is different from that of other p53-interacting S100 proteins. Database Structural data are available in the Protein Data Bank database under the accession number 4DUQ © 2012 The Authors Journal compilation © 2012 FEBS.

  13. Comparative analysis of protein-protein interactions in the defense response of rice and wheat.

    PubMed

    Cantu, Dario; Yang, Baoju; Ruan, Randy; Li, Kun; Menzo, Virginia; Fu, Daolin; Chern, Mawsheng; Ronald, Pamela C; Dubcovsky, Jorge

    2013-03-12

    Despite the importance of wheat as a major staple crop and the negative impact of diseases on its production worldwide, the genetic mechanisms and gene interactions involved in the resistance response in wheat are still poorly understood. The complete sequence of the rice genome has provided an extremely useful parallel road map for genetic and genomics studies in wheat. The recent construction of a defense response interactome in rice has the potential to further enhance the translation of advances in rice to wheat and other grasses. The objective of this study was to determine the degree of conservation in the protein-protein interactions in the rice and wheat defense response interactomes. As entry points we selected proteins that serve as key regulators of the rice defense response: the RAR1/SGT1/HSP90 protein complex, NPR1, XA21, and XB12 (XA21 interacting protein 12). Using available wheat sequence databases and phylogenetic analyses we identified and cloned the wheat orthologs of these four rice proteins, including recently duplicated paralogs, and their known direct interactors and tested 86 binary protein interactions using yeast-two-hybrid (Y2H) assays. All interactions between wheat proteins were further tested using in planta bimolecular fluorescence complementation (BiFC). Eighty three percent of the known rice interactions were confirmed when wheat proteins were tested with rice interactors and 76% were confirmed using wheat protein pairs. All interactions in the RAR1/SGT1/ HSP90, NPR1 and XB12 nodes were confirmed for the identified orthologous wheat proteins, whereas only forty four percent of the interactions were confirmed in the interactome node centered on XA21. We hypothesize that this reduction may be associated with a different sub-functionalization history of the multiple duplications that occurred in this gene family after the divergence of the wheat and rice lineages. The observed high conservation of interactions between proteins that serve as key regulators of the rice defense response suggests that the existing rice interactome can be used to predict interactions in wheat. Such predictions are less reliable for nodes that have undergone a different history of duplications and sub-functionalization in the two lineages.

  14. Using porphyrin-amino acid pairs to model the electrochemistry of heme proteins: experimental and theoretical investigations.

    PubMed

    Samajdar, Rudra N; Manogaran, Dhivya; Yashonath, S; Bhattacharyya, Aninda J

    2018-04-18

    Quasi reversibility in electrochemical cycling between different oxidation states of iron is an often seen characteristic of iron containing heme proteins that bind dioxygen. Surprisingly, the system becomes fully reversible in the bare iron-porphyrin complex: hemin. This leads to the speculation that the polypeptide bulk (globin) around the iron-porphyrin active site in these heme proteins is probably responsible for the electrochemical quasi reversibility. To understand the effect of such polypeptide bulk on iron-porphyrin, we study the interaction of specific amino acids with the hemin center in solution. We choose three representative amino acids-histidine (a well-known iron coordinator in bio-inorganic systems), tryptophan (a well-known fluoroprobe for proteins), and cysteine (a redox-active organic molecule). The interactions of these amino acids with hemin are studied using electrochemistry, spectroscopy, and density functional theory. The results indicate that among these three, the interaction of histidine with the iron center is strongest. Further, histidine maintains the electrochemical reversibility of iron. On the other hand, tryptophan and cysteine interact weakly with the iron center but disturb the electrochemical reversibility by contributing their own redox active processes to the system. Put together, this study attempts to understand the molecular interactions that can control electrochemical reversibility in heme proteins. The results obtained here from the three representative amino acids can be scaled up to build a heme-amino acid interaction database that may predict the electrochemical properties of any protein with a defined polypeptide sequence.

  15. Identification of GRB2 and GAB1 Coexpression as an Unfavorable Prognostic Factor for Hepatocellular Carcinoma by a Combination of Expression Profile and Network Analysis

    PubMed Central

    Yang, Mei; Wang, Danhua; Yu, Lingxiang; Guo, Chaonan; Guo, Xiaodong; Lin, Na

    2013-01-01

    Aim To screen novel markers for hepatocellular carcinoma (HCC) by a combination of expression profile, interaction network analysis and clinical validation. Methods HCC significant molecules which are differentially expressed or had genetic variations in HCC tissues were obtained from five existing HCC related databases (OncoDB.HCC, HCC.net, dbHCCvar, EHCO and Liverome). Then, the protein-protein interaction (PPI) network of these molecules was constructed. Three topological features of the network ('Degree', 'Betweenness', and 'Closeness') and the k-core algorithm were used to screen candidate HCC markers which play crucial roles in tumorigenesis of HCC. Furthermore, the clinical significance of two candidate HCC markers growth factor receptor-bound 2 (GRB2) and GRB2-associated-binding protein 1 (GAB1) was validated. Results In total, 6179 HCC significant genes and 977 HCC significant proteins were collected from existing HCC related databases. After network analysis, 331 candidate HCC markers were identified. Especially, GAB1 has the highest k-coreness suggesting its central localization in HCC related network, and the interaction between GRB2 and GAB1 has the largest edge-betweenness implying it may be biologically important to the function of HCC related network. As the results of clinical validation, the expression levels of both GRB2 and GAB1 proteins were significantly higher in HCC tissues than those in their adjacent nonneoplastic tissues. More importantly, the combined GRB2 and GAB1 protein expression was significantly associated with aggressive tumor progression and poor prognosis in patients with HCC. Conclusion This study provided an integrative analysis by combining expression profile and interaction network analysis to identify a list of biologically significant HCC related markers and pathways. Further experimental validation indicated that the aberrant expression of GRB2 and GAB1 proteins may be strongly related to tumor progression and prognosis in patients with HCC. The overexpression of GRB2 in combination with upregulation of GAB1 may be an unfavorable prognostic factor for HCC. PMID:24391994

  16. Differential gene expression analysis in glioblastoma cells and normal human brain cells based on GEO database.

    PubMed

    Wang, Anping; Zhang, Guibin

    2017-11-01

    The differentially expressed genes between glioblastoma (GBM) cells and normal human brain cells were investigated to performed pathway analysis and protein interaction network analysis for the differentially expressed genes. GSE12657 and GSE42656 gene chips, which contain gene expression profile of GBM were obtained from Gene Expression Omniub (GEO) database of National Center for Biotechnology Information (NCBI). The 'limma' data packet in 'R' software was used to analyze the differentially expressed genes in the two gene chips, and gene integration was performed using 'RobustRankAggreg' package. Finally, pheatmap software was used for heatmap analysis and Cytoscape, DAVID, STRING and KOBAS were used for protein-protein interaction, Gene Ontology (GO) and KEGG analyses. As results: i) 702 differentially expressed genes were identified in GSE12657, among those genes, 548 were significantly upregulated and 154 were significantly downregulated (p<0.01, fold-change >1), and 1,854 differentially expressed genes were identified in GSE42656, among the genes, 1,068 were significantly upregulated and 786 were significantly downregulated (p<0.01, fold-change >1). A total of 167 differentially expressed genes including 100 upregulated genes and 67 downregulated genes were identified after gene integration, and the genes showed significantly different expression levels in GBM compared with normal human brain cells (p<0.05). ii) Interactions between the protein products of 101 differentially expressed genes were identified using STRING and expression network was established. A key gene, called CALM3, was identified by Cytoscape software. iii) GO enrichment analysis showed that differentially expressed genes were mainly enriched in 'neurotransmitter:sodium symporter activity' and 'neurotransmitter transporter activity', which can affect the activity of neurotransmitter transportation. KEGG pathway analysis showed that the differentially expressed genes were mainly enriched in 'protein processing in endoplasmic reticulum', which can affect protein processing in endoplasmic reticulum. The results showed that: i) 167 differentially expressed genes were identified from two gene chips after integration; and ii) protein interaction network was established, and GO and KEGG pathway analyses were successfully performed to identify and annotate the key gene, which provide new insights for the studies on GBN at gene level.

  17. Rosette Assay: Highly Customizable Dot-Blot for SH2 Domain Screening.

    PubMed

    Ng, Khong Y; Machida, Kazuya

    2017-01-01

    With a growing number of high-throughput studies, structural analyses, and availability of protein-protein interaction databases, it is now possible to apply web-based prediction tools to SH2 domain-interactions. However, in silico prediction is not always reliable and requires experimental validation. Rosette assay is a dot blot-based reverse-phase assay developed for the assessment of binding between SH2 domains and their ligands. It is conveniently customizable, allowing for low- to high-throughput analysis of interactions between various numbers of SH2 domains and their ligands, e.g., short peptides, purified proteins, and cell lysates. The binding assay is performed in a 96-well plate (MBA or MWA apparatus) in which a sample spotted membrane is incubated with up to 96 labeled SH2 domains. Bound domains are detected and quantified using a chemiluminescence or near-infrared fluorescence (IR) imaging system. In this chapter, we describe a practical protocol for rosette assay to assess interactions between synthesized tyrosine phosphorylated peptides and a library of GST-tagged SH2 domains. Since the methodology is not confined to assessment of SH2-pTyr interactions, rosette assay can be broadly utilized for ligand and drug screening using different protein interaction domains or antibodies.

  18. Protein-protein interaction site prediction in Homo sapiens and E. coli using an interaction-affinity based membership function in fuzzy SVM.

    PubMed

    Sriwastava, Brijesh Kumar; Basu, Subhadip; Maulik, Ujjwal

    2015-10-01

    Protein-protein interaction (PPI) site prediction aids to ascertain the interface residues that participate in interaction processes. Fuzzy support vector machine (F-SVM) is proposed as an effective method to solve this problem, and we have shown that the performance of the classical SVM can be enhanced with the help of an interaction-affinity based fuzzy membership function. The performances of both SVM and F-SVM on the PPI databases of the Homo sapiens and E. coli organisms are evaluated and estimated the statistical significance of the developed method over classical SVM and other fuzzy membership-based SVM methods available in the literature. Our membership function uses the residue-level interaction affinity scores for each pair of positive and negative sequence fragments. The average AUC scores in the 10-fold cross-validation experiments are measured as 79.94% and 80.48% for the Homo sapiens and E. coli organisms respectively. On the independent test datasets, AUC scores are obtained as 76.59% and 80.17% respectively for the two organisms. In almost all cases, the developed F-SVM method improves the performances obtained by the corresponding classical SVM and the other classifiers, available in the literature.

  19. YPD™, PombePD™ and WormPD™: model organism volumes of the BioKnowledge™ Library, an integrated resource for protein information

    PubMed Central

    Costanzo, Maria C.; Crawford, Matthew E.; Hirschman, Jodi E.; Kranz, Janice E.; Olsen, Philip; Robertson, Laura S.; Skrzypek, Marek S.; Braun, Burkhard R.; Hopkins, Kelley Lennon; Kondu, Pinar; Lengieza, Carey; Lew-Smith, Jodi E.; Tillberg, Michael; Garrels, James I.

    2001-01-01

    The BioKnowledge Library is a relational database and web site (http://www.proteome.com) composed of protein-specific information collected from the scientific literature. Each Protein Report on the web site summarizes and displays published information about a single protein, including its biochemical function, role in the cell and in the whole organism, localization, mutant phenotype and genetic interactions, regulation, domains and motifs, interactions with other proteins and other relevant data. This report describes four species-specific volumes of the BioKnowledge Library, concerned with the model organisms Saccharo­myces cerevisiae (YPD), Schizosaccharomyces pombe (PombePD) and Caenorhabditis elegans (WormPD), and with the fungal pathogen Candida albicans (CalPD™). Protein Reports of each species are unified in format, easily searchable and extensively cross-referenced between species. The relevance of these comprehensively curated resources to analysis of proteins in other species is discussed, and is illustrated by a survey of model organism proteins that have similarity to human proteins involved in disease. PMID:11125054

  20. Characterization of the Saccharomyces cerevisiae ATP-Interactome using the iTRAQ-SPROX Technique

    NASA Astrophysics Data System (ADS)

    Geer, M. Ariel; Fitzgerald, Michael C.

    2016-02-01

    The stability of proteins from rates of oxidation (SPROX) technique was used in combination with an isobaric mass tagging strategy to identify adenosine triphosphate (ATP) interacting proteins in the Saccharomyces cerevisiae proteome. The SPROX methodology utilized in this work enabled 373 proteins in a yeast cell lysate to be assayed for ATP interactions (both direct and indirect) using the non-hydrolyzable ATP analog, adenylyl imidodiphosphate (AMP-PNP). A total of 28 proteins were identified with AMP-PNP-induced thermodynamic stability changes. These protein hits included 14 proteins that were previously annotated as ATP-binding proteins in the Saccharomyces Genome Database (SGD). The 14 non-annotated ATP-binding proteins included nine proteins that were previously found to be ATP-sensitive in an earlier SPROX study using a stable isotope labeling with amino acids in cell culture (SILAC)-based approach. A bioinformatics analysis of the protein hits identified here and in the earlier SILAC-SPROX experiments revealed that many of the previously annotated ATP-binding protein hits were kinases, ligases, and chaperones. In contrast, many of the newly discovered ATP-sensitive proteins were not from these protein classes, but rather were hydrolases, oxidoreductases, and nucleic acid-binding proteins.

  1. Identification of a cytoplasmic interaction partner of the large regulatory proteins Rep78/Rep68 of adeno-associated virus type 2 (AAV-2)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Weger, Stefan; Hammer, Eva; Goetz, Anne

    2007-05-25

    Through yeast two-hybrid analysis and coimmunoprecipitation studies, we have identified a novel cellular AAV-2 Rep78/Rep68 interaction partner located predominantly in the cytoplasm. In public databases, it has been assigned as KCTD5, because of a region of high similarity to the cytoplasmic tetramerization domain of voltage-gated potassium channels. Whereas Rep/KCTD5 interaction relied on the region surrounding the Rep nuclear localization signal, nuclear accumulation of Rep was not required. Wildtype Rep78/Rep68 proteins induced the translocation of large portions of KCTD5 into the nucleus pointing to functional interactions both in the cytoplasm and the nucleus. In line with an anticipated functional interference inmore » the cytoplasm, KCTD5 overexpression completely abrogated Rep68-mediated posttranscriptional activation of a HIV-LTR driven luciferase reporter gene. Our study expands the panel of already identified nuclear Rep interaction partners to a cytoplasmic protein, which raises the awareness that important steps in the AAV life cycle may be regulated in this compartment.« less

  2. Identification and analysis of potential targets in Streptococcus sanguinis using computer aided protein data analysis.

    PubMed

    Chowdhury, Md Rabiul Hossain; Bhuiyan, Md IqbalKaiser; Saha, Ayan; Mosleh, Ivan Mhai; Mondol, Sobuj; Ahmed, C M Sabbir

    2014-01-01

    Streptococcus sanguinis is a Gram-positive, facultative aerobic bacterium that is a member of the viridans streptococcus group. It is found in human mouths in dental plaque, which accounts for both dental cavities and bacterial endocarditis, and which entails a mortality rate of 25%. Although a range of remedial mediators have been found to control this organism, the effectiveness of agents such as penicillin, amoxicillin, trimethoprim-sulfamethoxazole, and erythromycin, was observed. The emphasis of this investigation was on finding substitute and efficient remedial approaches for the total destruction of this bacterium. In this computational study, various databases and online software were used to ascertain some specific targets of S. sanguinis. Particularly, the Kyoto Encyclopedia of Genes and Genomes databases were applied to determine human nonhomologous proteins, as well as the metabolic pathways involved with those proteins. Different software such as Phyre2, CastP, DoGSiteScorer, the Protein Function Predictor server, and STRING were utilized to evaluate the probable active drug binding site with its known function and protein-protein interaction. In this study, among 218 essential proteins of this pathogenic bacterium, 81 nonhomologous proteins were accrued, and 15 proteins that are unique in several metabolic pathways of S. sanguinis were isolated through metabolic pathway analysis. Furthermore, four essentially membrane-bound unique proteins that are involved in distinct metabolic pathways were revealed by this research. Active sites and druggable pockets of these selected proteins were investigated with bioinformatic techniques. In addition, this study also mentions the activity of those proteins, as well as their interactions with the other proteins. Our findings helped to identify the type of protein to be considered as an efficient drug target. This study will pave the way for researchers to develop and discover more effective and specific therapeutic agents against S. sanguinis.

  3. Functional structural motifs for protein-ligand, protein-protein, and protein-nucleic acid interactions and their connection to supersecondary structures.

    PubMed

    Kinjo, Akira R; Nakamura, Haruki

    2013-01-01

    Protein functions are mediated by interactions between proteins and other molecules. One useful approach to analyze protein functions is to compare and classify the structures of interaction interfaces of proteins. Here, we describe the procedures for compiling a database of interface structures and efficiently comparing the interface structures. To do so requires a good understanding of the data structures of the Protein Data Bank (PDB). Therefore, we also provide a detailed account of the PDB exchange dictionary necessary for extracting data that are relevant for analyzing interaction interfaces and secondary structures. We identify recurring structural motifs by classifying similar interface structures, and we define a coarse-grained representation of supersecondary structures (SSS) which represents a sequence of two or three secondary structure elements including their relative orientations as a string of four to seven letters. By examining the correspondence between structural motifs and SSS strings, we show that no SSS string has particularly high propensity to be found interaction interfaces in general, indicating any SSS can be used as a binding interface. When individual structural motifs are examined, there are some SSS strings that have high propensity for particular groups of structural motifs. In addition, it is shown that while the SSS strings found in particular structural motifs for nonpolymer and protein interfaces are as abundant as in other structural motifs that belong to the same subunit, structural motifs for nucleic acid interfaces exhibit somewhat stronger preference for SSS strings. In regard to protein folds, many motif-specific SSS strings were found across many folds, suggesting that SSS may be a useful description to investigate the universality of ligand binding modes.

  4. Recent advances in proteomics of cereals.

    PubMed

    Bansal, Monika; Sharma, Madhu; Kanwar, Priyanka; Goyal, Aakash

    Cereals contribute a major part of human nutrition and are considered as an integral source of energy for human diets. With genomic databases already available in cereals such as rice, wheat, barley, and maize, the focus has now moved to proteome analysis. Proteomics studies involve the development of appropriate databases based on developing suitable separation and purification protocols, identification of protein functions, and can confirm their functional networks based on already available data from other sources. Tremendous progress has been made in the past decade in generating huge data-sets for covering interactions among proteins, protein composition of various organs and organelles, quantitative and qualitative analysis of proteins, and to characterize their modulation during plant development, biotic, and abiotic stresses. Proteomics platforms have been used to identify and improve our understanding of various metabolic pathways. This article gives a brief review of efforts made by different research groups on comparative descriptive and functional analysis of proteomics applications achieved in the cereal science so far.

  5. Cry-Bt identifier: a biological database for PCR detection of Cry genes present in transgenic plants.

    PubMed

    Singh, Vinay Kumar; Ambwani, Sonu; Marla, Soma; Kumar, Anil

    2009-10-23

    We describe the development of a user friendly tool that would assist in the retrieval of information relating to Cry genes in transgenic crops. The tool also helps in detection of transformed Cry genes from Bacillus thuringiensis present in transgenic plants by providing suitable designed primers for PCR identification of these genes. The tool designed based on relational database model enables easy retrieval of information from the database with simple user queries. The tool also enables users to access related information about Cry genes present in various databases by interacting with different sources (nucleotide sequences, protein sequence, sequence comparison tools, published literature, conserved domains, evolutionary and structural data). http://insilicogenomics.in/Cry-btIdentifier/welcome.html.

  6. Classification of ligand molecules in PDB with fast heuristic graph match algorithm COMPLIG.

    PubMed

    Saito, Mihoko; Takemura, Naomi; Shirai, Tsuyoshi

    2012-12-14

    A fast heuristic graph-matching algorithm, COMPLIG, was devised to classify the small-molecule ligands in the Protein Data Bank (PDB), which are currently not properly classified on structure basis. By concurrently classifying proteins and ligands, we determined the most appropriate parameter for categorizing ligands to be more than 60% identity of atoms and bonds between molecules, and we classified 11,585 types of ligands into 1946 clusters. Although the large clusters were composed of nucleotides or amino acids, a significant presence of drug compounds was also observed. Application of the system to classify the natural ligand status of human proteins in the current database suggested that, at most, 37% of the experimental structures of human proteins were in complex with natural ligands. However, protein homology- and/or ligand similarity-based modeling was implied to provide models of natural interactions for an additional 28% of the total, which might be used to increase the knowledge of intrinsic protein-metabolite interactions. Copyright © 2012 Elsevier Ltd. All rights reserved.

  7. Global analysis of host-pathogen interactions that regulate early stage HIV-1 replication

    PubMed Central

    König, Renate; Zhou, Yingyao; Elleder, Daniel; Diamond, Tracy L.; Bonamy, Ghislain M.C.; Irelan, Jeffrey T.; Chiang, Chih-yuan; Tu, Buu P.; De Jesus, Paul D.; Lilley, Caroline E.; Seidel, Shannon; Opaluch, Amanda M.; Caldwell, Jeremy S.; Weitzman, Matthew D.; Kuhen, Kelli L.; Bandyopadhyay, Sourav; Ideker, Trey; Orth, Anthony P.; Miraglia, Loren J.; Bushman, Frederic D.; Young, John A.; Chanda, Sumit K.

    2008-01-01

    Human Immunodeficiency Viruses (HIV-1 and HIV-2) rely upon host-encoded proteins to facilitate their replication. Here we combined genome-wide siRNA analyses with interrogation of human interactome databases to assemble a host-pathogen biochemical network containing 213 confirmed host cellular factors and 11 HIV-1-encoded proteins. Protein complexes that regulate ubiquitin conjugation, proteolysis, DNA damage response and RNA splicing were identified as important modulators of early stage HIV-1 infection. Additionally, over 40 new factors were shown to specifically influence initiation and/or kinetics of HIV-1 DNA synthesis, including cytoskeletal regulatory proteins, modulators of post-translational modification, and nucleic acid binding proteins. Finally, fifteen proteins with diverse functional roles, including nuclear transport, prostaglandin synthesis, ubiquitination, and transcription, were found to influence nuclear import or viral DNA integration. Taken together, the multi-scale approach described here has uncovered multiprotein virus-host interactions that likely act in concert to facilitate early steps of HIV-1 infection. PMID:18854154

  8. The VirusBanker database uses a Java program to allow flexible searching through Bunyaviridae sequences

    PubMed Central

    Fourment, Mathieu; Gibbs, Mark J

    2008-01-01

    Background Viruses of the Bunyaviridae have segmented negative-stranded RNA genomes and several of them cause significant disease. Many partial sequences have been obtained from the segments so that GenBank searches give complex results. Sequence databases usually use HTML pages to mediate remote sorting, but this approach can be limiting and may discourage a user from exploring a database. Results The VirusBanker database contains Bunyaviridae sequences and alignments and is presented as two spreadsheets generated by a Java program that interacts with a MySQL database on a server. Sequences are displayed in rows and may be sorted using information that is displayed in columns and includes data relating to the segment, gene, protein, species, strain, sequence length, terminal sequence and date and country of isolation. Bunyaviridae sequences and alignments may be downloaded from the second spreadsheet with titles defined by the user from the columns, or viewed when passed directly to the sequence editor, Jalview. Conclusion VirusBanker allows large datasets of aligned nucleotide and protein sequences from the Bunyaviridae to be compiled and winnowed rapidly using criteria that are formulated heuristically. PMID:18251994

  9. Recent progress and future directions in protein-protein docking.

    PubMed

    Ritchie, David W

    2008-02-01

    This article gives an overview of recent progress in protein-protein docking and it identifies several directions for future research. Recent results from the CAPRI blind docking experiments show that docking algorithms are steadily improving in both reliability and accuracy. Current docking algorithms employ a range of efficient search and scoring strategies, including e.g. fast Fourier transform correlations, geometric hashing, and Monte Carlo techniques. These approaches can often produce a relatively small list of up to a few thousand orientations, amongst which a near-native binding mode is often observed. However, despite the use of improved scoring functions which typically include models of desolvation, hydrophobicity, and electrostatics, current algorithms still have difficulty in identifying the correct solution from the list of false positives, or decoys. Nonetheless, significant progress is being made through better use of bioinformatics, biochemical, and biophysical information such as e.g. sequence conservation analysis, protein interaction databases, alanine scanning, and NMR residual dipolar coupling restraints to help identify key binding residues. Promising new approaches to incorporate models of protein flexibility during docking are being developed, including the use of molecular dynamics snapshots, rotameric and off-rotamer searches, internal coordinate mechanics, and principal component analysis based techniques. Some investigators now use explicit solvent models in their docking protocols. Many of these approaches can be computationally intensive, although new silicon chip technologies such as programmable graphics processor units are beginning to offer competitive alternatives to conventional high performance computer systems. As cryo-EM techniques improve apace, docking NMR and X-ray protein structures into low resolution EM density maps is helping to bridge the resolution gap between these complementary techniques. The use of symmetry and fragment assembly constraints are also helping to make possible docking-based predictions of large multimeric protein complexes. In the near future, the closer integration of docking algorithms with protein interface prediction software, structural databases, and sequence analysis techniques should help produce better predictions of protein interaction networks and more accurate structural models of the fundamental molecular interactions within the cell.

  10. Application of Ferulic Acid for Alzheimer’s Disease: Combination of Text Mining and Experimental Validation

    PubMed Central

    Meng, Guilin; Meng, Xiulin; Ma, Xiaoye; Zhang, Gengping; Hu, Xiaolin; Jin, Aiping; Liu, Xueyuan

    2018-01-01

    Alzheimer’s disease (AD) is an increasing concern in human health. Despite significant research, highly effective drugs to treat AD are lacking. The present study describes the text mining process to identify drug candidates from a traditional Chinese medicine (TCM) database, along with associated protein target mechanisms. We carried out text mining to identify literatures that referenced both AD and TCM and focused on identifying compounds and protein targets of interest. After targeting one potential TCM candidate, corresponding protein-protein interaction (PPI) networks were assembled in STRING to decipher the most possible mechanism of action. This was followed by validation using Western blot and co-immunoprecipitation in an AD cell model. The text mining strategy using a vast amount of AD-related literature and the TCM database identified curcumin, whose major component was ferulic acid (FA). This was used as a key candidate compound for further study. Using the top calculated interaction score in STRING, BACE1 and MMP2 were implicated in the activity of FA in AD. Exposure of SHSY5Y-APP cells to FA resulted in the decrease in expression levels of BACE-1 and APP, while the expression of MMP-2 and MMP-9 increased in a dose-dependent manner. This suggests that FA induced BACE1 and MMP2 pathways maybe novel potential mechanisms involved in AD. The text mining of literature and TCM database related to AD suggested FA as a promising TCM ingredient for the treatment of AD. Potential mechanisms interconnected and integrated with Aβ aggregation inhibition and extracellular matrix remodeling underlying the activity of FA were identified using in vitro studies. PMID:29896095

  11. Analysis of gene expression profile microarray data in complex regional pain syndrome.

    PubMed

    Tan, Wulin; Song, Yiyan; Mo, Chengqiang; Jiang, Shuangjian; Wang, Zhongxing

    2017-09-01

    The aim of the present study was to predict key genes and proteins associated with complex regional pain syndrome (CRPS) using bioinformatics analysis. The gene expression profiling microarray data, GSE47603, which included peripheral blood samples from 4 patients with CRPS and 5 healthy controls, was obtained from the Gene Expression Omnibus (GEO) database. The differentially expressed genes (DEGs) in CRPS patients compared with healthy controls were identified using the GEO2R online tool. Functional enrichment analysis was then performed using The Database for Annotation Visualization and Integrated Discovery online tool. Protein‑protein interaction (PPI) network analysis was subsequently performed using Search Tool for the Retrieval of Interaction Genes database and analyzed with Cytoscape software. A total of 257 DEGs were identified, including 243 upregulated genes and 14 downregulated ones. Genes in the human leukocyte antigen (HLA) family were most significantly differentially expressed. Enrichment analysis demonstrated that signaling pathways, including immune response, cell motion, adhesion and angiogenesis were associated with CRPS. PPI network analysis revealed that key genes, including early region 1A binding protein p300 (EP300), CREB‑binding protein (CREBBP), signal transducer and activator of transcription (STAT)3, STAT5A and integrin α M were associated with CRPS. The results suggest that the immune response may therefore serve an important role in CRPS development. In addition, genes in the HLA family, such as HLA‑DQB1 and HLA‑DRB1, may present potential biomarkers for the diagnosis of CRPS. Furthermore, EP300, its paralog CREBBP, and the STAT family genes, STAT3 and STAT5 may be important in the development of CRPS.

  12. Application of Ferulic Acid for Alzheimer's Disease: Combination of Text Mining and Experimental Validation.

    PubMed

    Meng, Guilin; Meng, Xiulin; Ma, Xiaoye; Zhang, Gengping; Hu, Xiaolin; Jin, Aiping; Zhao, Yanxin; Liu, Xueyuan

    2018-01-01

    Alzheimer's disease (AD) is an increasing concern in human health. Despite significant research, highly effective drugs to treat AD are lacking. The present study describes the text mining process to identify drug candidates from a traditional Chinese medicine (TCM) database, along with associated protein target mechanisms. We carried out text mining to identify literatures that referenced both AD and TCM and focused on identifying compounds and protein targets of interest. After targeting one potential TCM candidate, corresponding protein-protein interaction (PPI) networks were assembled in STRING to decipher the most possible mechanism of action. This was followed by validation using Western blot and co-immunoprecipitation in an AD cell model. The text mining strategy using a vast amount of AD-related literature and the TCM database identified curcumin, whose major component was ferulic acid (FA). This was used as a key candidate compound for further study. Using the top calculated interaction score in STRING, BACE1 and MMP2 were implicated in the activity of FA in AD. Exposure of SHSY5Y-APP cells to FA resulted in the decrease in expression levels of BACE-1 and APP, while the expression of MMP-2 and MMP-9 increased in a dose-dependent manner. This suggests that FA induced BACE1 and MMP2 pathways maybe novel potential mechanisms involved in AD. The text mining of literature and TCM database related to AD suggested FA as a promising TCM ingredient for the treatment of AD. Potential mechanisms interconnected and integrated with Aβ aggregation inhibition and extracellular matrix remodeling underlying the activity of FA were identified using in vitro studies.

  13. Structural insights of the MLF1/14-3-3 interaction.

    PubMed

    Molzan, Manuela; Weyand, Michael; Rose, Rolf; Ottmann, Christian

    2012-02-01

    Myeloid leukaemia factor 1 (MLF1) binds to 14-3-3 adapter proteins by a sequence surrounding Ser34 with the functional consequences of this interaction largely unknown. We present here the high-resolution crystal structure of this binding motif [MLF1(29-42)pSer34] in complex with 14-3-3ε and analyse the interaction with isothermal titration calorimetry. Fragment-based ligand discovery employing crystals of the binary 14-3-3ε/MLF1(29-42)pSer34 complex was used to identify a molecule that binds to the interface rim of the two proteins, potentially representing the starting point for the development of a small molecule that stabilizes the MLF1/14-3-3 protein-protein interaction. Such a compound might be used as a chemical biology tool to further analyse the 14-3-3/MLF1 interaction without the use of genetic methods. Database Structural data are available in the Protein Data Bank under the accession number(s) 3UAL [14-3-3ε/MLF1(29-42)pSer34 complex] and 3UBW [14-3-3ε/MLF1(29-42)pSer34/3-pyrrolidinol complex] Structured digital abstract •  14-3-3 epsilon and MLF1 bind by x-ray crystallography (View interaction) •  14-3-3 epsilon and MLF1 bind by isothermal titration calorimetry (View Interaction: 1, 2). © 2011 The Authors Journal compilation © 2011 FEBS.

  14. Use of Graph Database for the Integration of Heterogeneous Biological Data.

    PubMed

    Yoon, Byoung-Ha; Kim, Seon-Kyu; Kim, Seon-Young

    2017-03-01

    Understanding complex relationships among heterogeneous biological data is one of the fundamental goals in biology. In most cases, diverse biological data are stored in relational databases, such as MySQL and Oracle, which store data in multiple tables and then infer relationships by multiple-join statements. Recently, a new type of database, called the graph-based database, was developed to natively represent various kinds of complex relationships, and it is widely used among computer science communities and IT industries. Here, we demonstrate the feasibility of using a graph-based database for complex biological relationships by comparing the performance between MySQL and Neo4j, one of the most widely used graph databases. We collected various biological data (protein-protein interaction, drug-target, gene-disease, etc.) from several existing sources, removed duplicate and redundant data, and finally constructed a graph database containing 114,550 nodes and 82,674,321 relationships. When we tested the query execution performance of MySQL versus Neo4j, we found that Neo4j outperformed MySQL in all cases. While Neo4j exhibited a very fast response for various queries, MySQL exhibited latent or unfinished responses for complex queries with multiple-join statements. These results show that using graph-based databases, such as Neo4j, is an efficient way to store complex biological relationships. Moreover, querying a graph database in diverse ways has the potential to reveal novel relationships among heterogeneous biological data.

  15. Use of Graph Database for the Integration of Heterogeneous Biological Data

    PubMed Central

    Yoon, Byoung-Ha; Kim, Seon-Kyu

    2017-01-01

    Understanding complex relationships among heterogeneous biological data is one of the fundamental goals in biology. In most cases, diverse biological data are stored in relational databases, such as MySQL and Oracle, which store data in multiple tables and then infer relationships by multiple-join statements. Recently, a new type of database, called the graph-based database, was developed to natively represent various kinds of complex relationships, and it is widely used among computer science communities and IT industries. Here, we demonstrate the feasibility of using a graph-based database for complex biological relationships by comparing the performance between MySQL and Neo4j, one of the most widely used graph databases. We collected various biological data (protein-protein interaction, drug-target, gene-disease, etc.) from several existing sources, removed duplicate and redundant data, and finally constructed a graph database containing 114,550 nodes and 82,674,321 relationships. When we tested the query execution performance of MySQL versus Neo4j, we found that Neo4j outperformed MySQL in all cases. While Neo4j exhibited a very fast response for various queries, MySQL exhibited latent or unfinished responses for complex queries with multiple-join statements. These results show that using graph-based databases, such as Neo4j, is an efficient way to store complex biological relationships. Moreover, querying a graph database in diverse ways has the potential to reveal novel relationships among heterogeneous biological data. PMID:28416946

  16. Database resources of the National Center for Biotechnology Information.

    PubMed

    Sayers, Eric W; Barrett, Tanya; Benson, Dennis A; Bolton, Evan; Bryant, Stephen H; Canese, Kathi; Chetvernin, Vyacheslav; Church, Deanna M; DiCuccio, Michael; Federhen, Scott; Feolo, Michael; Fingerman, Ian M; Geer, Lewis Y; Helmberg, Wolfgang; Kapustin, Yuri; Landsman, David; Lipman, David J; Lu, Zhiyong; Madden, Thomas L; Madej, Tom; Maglott, Donna R; Marchler-Bauer, Aron; Miller, Vadim; Mizrachi, Ilene; Ostell, James; Panchenko, Anna; Phan, Lon; Pruitt, Kim D; Schuler, Gregory D; Sequeira, Edwin; Sherry, Stephen T; Shumway, Martin; Sirotkin, Karl; Slotta, Douglas; Souvorov, Alexandre; Starchenko, Grigory; Tatusova, Tatiana A; Wagner, Lukas; Wang, Yanli; Wilbur, W John; Yaschenko, Eugene; Ye, Jian

    2011-01-01

    In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data made available through the NCBI Web site. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central (PMC), Entrez Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Primer-BLAST, COBALT, Electronic PCR, OrfFinder, Splign, ProSplign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, dbVar, Epigenomics, Cancer Chromosomes, Entrez Genomes and related tools, the Map Viewer, Model Maker, Evidence Viewer, Trace Archive, Sequence Read Archive, Retroviral Genotyping Tools, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus (GEO), Entrez Probe, GENSAT, Online Mendelian Inheritance in Man (OMIM), Online Mendelian Inheritance in Animals (OMIA), the Molecular Modeling Database (MMDB), the Conserved Domain Database (CDD), the Conserved Domain Architecture Retrieval Tool (CDART), IBIS, Biosystems, Peptidome, OMSSA, Protein Clusters and the PubChem suite of small molecule databases. Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.

  17. Surfing the Protein-Protein Interaction Surface Using Docking Methods: Application to the Design of PPI Inhibitors.

    PubMed

    Sable, Rushikesh; Jois, Seetharama

    2015-06-23

    Blocking protein-protein interactions (PPI) using small molecules or peptides modulates biochemical pathways and has therapeutic significance. PPI inhibition for designing drug-like molecules is a new area that has been explored extensively during the last decade. Considering the number of available PPI inhibitor databases and the limited number of 3D structures available for proteins, docking and scoring methods play a major role in designing PPI inhibitors as well as stabilizers. Docking methods are used in the design of PPI inhibitors at several stages of finding a lead compound, including modeling the protein complex, screening for hot spots on the protein-protein interaction interface and screening small molecules or peptides that bind to the PPI interface. There are three major challenges to the use of docking on the relatively flat surfaces of PPI. In this review we will provide some examples of the use of docking in PPI inhibitor design as well as its limitations. The combination of experimental and docking methods with improved scoring function has thus far resulted in few success stories of PPI inhibitors for therapeutic purposes. Docking algorithms used for PPI are in the early stages, however, and as more data are available docking will become a highly promising area in the design of PPI inhibitors or stabilizers.

  18. PPCM: Combing multiple classifiers to improve protein-protein interaction prediction

    DOE PAGES

    Yao, Jianzhuang; Guo, Hong; Yang, Xiaohan

    2015-08-01

    Determining protein-protein interaction (PPI) in biological systems is of considerable importance, and prediction of PPI has become a popular research area. Although different classifiers have been developed for PPI prediction, no single classifier seems to be able to predict PPI with high confidence. We postulated that by combining individual classifiers the accuracy of PPI prediction could be improved. We developed a method called protein-protein interaction prediction classifiers merger (PPCM), and this method combines output from two PPI prediction tools, GO2PPI and Phyloprof, using Random Forests algorithm. The performance of PPCM was tested by area under the curve (AUC) using anmore » assembled Gold Standard database that contains both positive and negative PPI pairs. Our AUC test showed that PPCM significantly improved the PPI prediction accuracy over the corresponding individual classifiers. We found that additional classifiers incorporated into PPCM could lead to further improvement in the PPI prediction accuracy. Furthermore, cross species PPCM could achieve competitive and even better prediction accuracy compared to the single species PPCM. This study established a robust pipeline for PPI prediction by integrating multiple classifiers using Random Forests algorithm. Ultimately, this pipeline will be useful for predicting PPI in nonmodel species.« less

  19. Biomine: predicting links between biological entities using network models of heterogeneous databases.

    PubMed

    Eronen, Lauri; Toivonen, Hannu

    2012-06-06

    Biological databases contain large amounts of data concerning the functions and associations of genes and proteins. Integration of data from several such databases into a single repository can aid the discovery of previously unknown connections spanning multiple types of relationships and databases. Biomine is a system that integrates cross-references from several biological databases into a graph model with multiple types of edges, such as protein interactions, gene-disease associations and gene ontology annotations. Edges are weighted based on their type, reliability, and informativeness. We present Biomine and evaluate its performance in link prediction, where the goal is to predict pairs of nodes that will be connected in the future, based on current data. In particular, we formulate protein interaction prediction and disease gene prioritization tasks as instances of link prediction. The predictions are based on a proximity measure computed on the integrated graph. We consider and experiment with several such measures, and perform a parameter optimization procedure where different edge types are weighted to optimize link prediction accuracy. We also propose a novel method for disease-gene prioritization, defined as finding a subset of candidate genes that cluster together in the graph. We experimentally evaluate Biomine by predicting future annotations in the source databases and prioritizing lists of putative disease genes. The experimental results show that Biomine has strong potential for predicting links when a set of selected candidate links is available. The predictions obtained using the entire Biomine dataset are shown to clearly outperform ones obtained using any single source of data alone, when different types of links are suitably weighted. In the gene prioritization task, an established reference set of disease-associated genes is useful, but the results show that under favorable conditions, Biomine can also perform well when no such information is available.The Biomine system is a proof of concept. Its current version contains 1.1 million entities and 8.1 million relations between them, with focus on human genetics. Some of its functionalities are available in a public query interface at http://biomine.cs.helsinki.fi, allowing searching for and visualizing connections between given biological entities.

  20. ROCS: a Reproducibility Index and Confidence Score for Interaction Proteomics Studies

    PubMed Central

    2012-01-01

    Background Affinity-Purification Mass-Spectrometry (AP-MS) provides a powerful means of identifying protein complexes and interactions. Several important challenges exist in interpreting the results of AP-MS experiments. First, the reproducibility of AP-MS experimental replicates can be low, due both to technical variability and the dynamic nature of protein interactions in the cell. Second, the identification of true protein-protein interactions in AP-MS experiments is subject to inaccuracy due to high false negative and false positive rates. Several experimental approaches can be used to mitigate these drawbacks, including the use of replicated and control experiments and relative quantification to sensitively distinguish true interacting proteins from false ones. Methods To address the issues of reproducibility and accuracy of protein-protein interactions, we introduce a two-step method, called ROCS, which makes use of Indicator Prey Proteins to select reproducible AP-MS experiments, and of Confidence Scores to select specific protein-protein interactions. The Indicator Prey Proteins account for measures of protein identifiability as well as protein reproducibility, effectively allowing removal of outlier experiments that contribute noise and affect downstream inferences. The filtered set of experiments is then used in the Protein-Protein Interaction (PPI) scoring step. Prey protein scoring is done by computing a Confidence Score, which accounts for the probability of occurrence of prey proteins in the bait experiments relative to the control experiment, where the significance cutoff parameter is estimated by simultaneously controlling false positives and false negatives against metrics of false discovery rate and biological coherence respectively. In summary, the ROCS method relies on automatic objective criterions for parameter estimation and error-controlled procedures. Results We illustrate the performance of our method by applying it to five previously published AP-MS experiments, each containing well characterized protein interactions, allowing for systematic benchmarking of ROCS. We show that our method may be used on its own to make accurate identification of specific, biologically relevant protein-protein interactions, or in combination with other AP-MS scoring methods to significantly improve inferences. Conclusions Our method addresses important issues encountered in AP-MS datasets, making ROCS a very promising tool for this purpose, either on its own or in conjunction with other methods. We anticipate that our methodology may be used more generally in proteomics studies and databases, where experimental reproducibility issues arise. The method is implemented in the R language, and is available as an R package called “ROCS”, freely available from the CRAN repository http://cran.r-project.org/. PMID:22682516

  1. Bioinformatics Analysis of Protein Phosphorylation in Plant Systems Biology Using P3DB.

    PubMed

    Yao, Qiuming; Xu, Dong

    2017-01-01

    Protein phosphorylation is one of the most pervasive protein post-translational modification events in plant cells. It is involved in many plant biological processes, such as plant growth, organ development, and plant immunology, by regulating or switching signaling and metabolic pathways. High-throughput experimental methods like mass spectrometry can easily characterize hundreds to thousands of phosphorylation events in a single experiment. With the increasing volume of the data sets, Plant Protein Phosphorylation DataBase (P3DB, http://p3db.org ) provides a comprehensive, systematic, and interactive online platform to deposit, query, analyze, and visualize these phosphorylation events in many plant species. It stores the protein phosphorylation sites in the context of identified mass spectra, phosphopeptides, and phosphoproteins contributed from various plant proteome studies. In addition, P3DB associates these plant phosphorylation sites to protein physicochemical information in the protein charts and tertiary structures, while various protein annotations from hierarchical kinase phosphatase families, protein domains, and gene ontology are also added into the database. P3DB not only provides rich information, but also interconnects and provides visualization of the data in networks, in systems biology context. Currently, P3DB includes the KiC (Kinase Client) assay network, the protein-protein interaction network, the kinase-substrate network, the phosphatase-substrate network, and the protein domain co-occurrence network. All of these are available to query for and visualize existing phosphorylation events. Although P3DB only hosts experimentally identified phosphorylation data, it provides a plant phosphorylation prediction model for any unknown queries on the fly. P3DB is an entry point to the plant phosphorylation community to deposit and visualize any customized data sets within this systems biology framework. Nowadays, P3DB has become one of the major bioinformatics platforms of protein phosphorylation in plant biology.

  2. Identification and analysis of potential targets in Streptococcus sanguinis using computer aided protein data analysis

    PubMed Central

    Chowdhury, Md Rabiul Hossain; Bhuiyan, Md IqbalKaiser; Saha, Ayan; Mosleh, Ivan MHAI; Mondol, Sobuj; Ahmed, C M Sabbir

    2014-01-01

    Purpose Streptococcus sanguinis is a Gram-positive, facultative aerobic bacterium that is a member of the viridans streptococcus group. It is found in human mouths in dental plaque, which accounts for both dental cavities and bacterial endocarditis, and which entails a mortality rate of 25%. Although a range of remedial mediators have been found to control this organism, the effectiveness of agents such as penicillin, amoxicillin, trimethoprim–sulfamethoxazole, and erythromycin, was observed. The emphasis of this investigation was on finding substitute and efficient remedial approaches for the total destruction of this bacterium. Materials and methods In this computational study, various databases and online software were used to ascertain some specific targets of S. sanguinis. Particularly, the Kyoto Encyclopedia of Genes and Genomes databases were applied to determine human nonhomologous proteins, as well as the metabolic pathways involved with those proteins. Different software such as Phyre2, CastP, DoGSiteScorer, the Protein Function Predictor server, and STRING were utilized to evaluate the probable active drug binding site with its known function and protein–protein interaction. Results In this study, among 218 essential proteins of this pathogenic bacterium, 81 nonhomologous proteins were accrued, and 15 proteins that are unique in several metabolic pathways of S. sanguinis were isolated through metabolic pathway analysis. Furthermore, four essentially membrane-bound unique proteins that are involved in distinct metabolic pathways were revealed by this research. Active sites and druggable pockets of these selected proteins were investigated with bioinformatic techniques. In addition, this study also mentions the activity of those proteins, as well as their interactions with the other proteins. Conclusion Our findings helped to identify the type of protein to be considered as an efficient drug target. This study will pave the way for researchers to develop and discover more effective and specific therapeutic agents against S. sanguinis. PMID:25473301

  3. PharmDB-K: Integrated Bio-Pharmacological Network Database for Traditional Korean Medicine

    PubMed Central

    Lee, Ji-Hyun; Park, Kyoung Mii; Han, Dong-Jin; Bang, Nam Young; Kim, Do-Hee; Na, Hyeongjin; Lim, Semi; Kim, Tae Bum; Kim, Dae Gyu; Kim, Hyun-Jung; Chung, Yeonseok; Sung, Sang Hyun; Surh, Young-Joon; Kim, Sunghoon; Han, Byung Woo

    2015-01-01

    Despite the growing attention given to Traditional Medicine (TM) worldwide, there is no well-known, publicly available, integrated bio-pharmacological Traditional Korean Medicine (TKM) database for researchers in drug discovery. In this study, we have constructed PharmDB-K, which offers comprehensive information relating to TKM-associated drugs (compound), disease indication, and protein relationships. To explore the underlying molecular interaction of TKM, we integrated fourteen different databases, six Pharmacopoeias, and literature, and established a massive bio-pharmacological network for TKM and experimentally validated some cases predicted from the PharmDB-K analyses. Currently, PharmDB-K contains information about 262 TKMs, 7,815 drugs, 3,721 diseases, 32,373 proteins, and 1,887 side effects. One of the unique sets of information in PharmDB-K includes 400 indicator compounds used for standardization of herbal medicine. Furthermore, we are operating PharmDB-K via phExplorer (a network visualization software) and BioMart (a data federation framework) for convenient search and analysis of the TKM network. Database URL: http://pharmdb-k.org, http://biomart.i-pharm.org. PMID:26555441

  4. Disease gene classification with metagraph representations.

    PubMed

    Kircali Ata, Sezin; Fang, Yuan; Wu, Min; Li, Xiao-Li; Xiao, Xiaokui

    2017-12-01

    Protein-protein interaction (PPI) networks play an important role in studying the functional roles of proteins, including their association with diseases. However, protein interaction networks are not sufficient without the support of additional biological knowledge for proteins such as their molecular functions and biological processes. To complement and enrich PPI networks, we propose to exploit biological properties of individual proteins. More specifically, we integrate keywords describing protein properties into the PPI network, and construct a novel PPI-Keywords (PPIK) network consisting of both proteins and keywords as two different types of nodes. As disease proteins tend to have a similar topological characteristics on the PPIK network, we further propose to represent proteins with metagraphs. Different from a traditional network motif or subgraph, a metagraph can capture a particular topological arrangement involving the interactions/associations between both proteins and keywords. Based on the novel metagraph representations for proteins, we further build classifiers for disease protein classification through supervised learning. Our experiments on three different PPI databases demonstrate that the proposed method consistently improves disease protein prediction across various classifiers, by 15.3% in AUC on average. It outperforms the baselines including the diffusion-based methods (e.g., RWR) and the module-based methods by 13.8-32.9% for overall disease protein prediction. For predicting breast cancer genes, it outperforms RWR, PRINCE and the module-based baselines by 6.6-14.2%. Finally, our predictions also turn out to have better correlations with literature findings from PubMed. Copyright © 2017 Elsevier Inc. All rights reserved.

  5. Comprehensive inventory of protein complexes in the Protein Data Bank from consistent classification of interfaces.

    PubMed

    Bordner, Andrew J; Gorin, Andrey A

    2008-05-12

    Protein-protein interactions are ubiquitous and essential for all cellular processes. High-resolution X-ray crystallographic structures of protein complexes can reveal the details of their function and provide a basis for many computational and experimental approaches. Differentiation between biological and non-biological contacts and reconstruction of the intact complex is a challenging computational problem. A successful solution can provide additional insights into the fundamental principles of biological recognition and reduce errors in many algorithms and databases utilizing interaction information extracted from the Protein Data Bank (PDB). We have developed a method for identifying protein complexes in the PDB X-ray structures by a four step procedure: (1) comprehensively collecting all protein-protein interfaces; (2) clustering similar protein-protein interfaces together; (3) estimating the probability that each cluster is relevant based on a diverse set of properties; and (4) combining these scores for each PDB entry in order to predict the complex structure. The resulting clusters of biologically relevant interfaces provide a reliable catalog of evolutionary conserved protein-protein interactions. These interfaces, as well as the predicted protein complexes, are available from the Protein Interface Server (PInS) website (see Availability and requirements section). Our method demonstrates an almost two-fold reduction of the annotation error rate as evaluated on a large benchmark set of complexes validated from the literature. We also estimate relative contributions of each interface property to the accurate discrimination of biologically relevant interfaces and discuss possible directions for further improving the prediction method.

  6. FEZ2 has acquired additional protein interaction partners relative to FEZ1: functional and evolutionary implications.

    PubMed

    Alborghetti, Marcos R; Furlan, Ariane S; Kobarg, Jörg

    2011-03-08

    The FEZ (fasciculation and elongation protein zeta) family designation was purposed by Bloom and Horvitz by genetic analysis of C. elegans unc-76. Similar human sequences were identified in the expressed sequence tag database as FEZ1 and FEZ2. The unc-76 function is necessary for normal axon fasciculation and is required for axon-axon interactions. Indeed, the loss of UNC-76 function results in defects in axonal transport. The human FEZ1 protein has been shown to rescue defects caused by unc-76 mutations in nematodes, indicating that both UNC-76 and FEZ1 are evolutionarily conserved in their function. Until today, little is known about FEZ2 protein function. Using the yeast two-hybrid system we demonstrate here conserved evolutionary features among orthologs and non-conserved features between paralogs of the FEZ family of proteins, by comparing the interactome profiles of the C-terminals of human FEZ1, FEZ2 and UNC-76 from C. elegans. Furthermore, we correlate our data with an analysis of the molecular evolution of the FEZ protein family in the animal kingdom. We found that FEZ2 interacted with 59 proteins and that of these only 40 interacted with FEZ1. Of the 40 FEZ1 interacting proteins, 36 (90%), also interacted with UNC-76 and none of the 19 FEZ2 specific proteins interacted with FEZ1 or UNC-76. This together with the duplication of unc-76 gene in the ancestral line of chordates suggests that FEZ2 is in the process of acquiring new additional functions. The results provide also an explanation for the dramatic difference between C. elegans and D. melanogaster unc-76 mutants on one hand, which cause serious defects in the nervous system, and the mouse FEZ1 -/- knockout mice on the other, which show no morphological and no strong behavioural phenotype. Likely, the ubiquitously expressed FEZ2 can completely compensate the lack of neuronal FEZ1, since it can interact with all FEZ1 interacting proteins and additional 19 proteins.

  7. FEZ2 Has Acquired Additional Protein Interaction Partners Relative to FEZ1: Functional and Evolutionary Implications

    PubMed Central

    Alborghetti, Marcos R.; Furlan, Ariane S.; Kobarg, Jörg

    2011-01-01

    Background The FEZ (fasciculation and elongation protein zeta) family designation was purposed by Bloom and Horvitz by genetic analysis of C. elegans unc-76. Similar human sequences were identified in the expressed sequence tag database as FEZ1 and FEZ2. The unc-76 function is necessary for normal axon fasciculation and is required for axon-axon interactions. Indeed, the loss of UNC-76 function results in defects in axonal transport. The human FEZ1 protein has been shown to rescue defects caused by unc-76 mutations in nematodes, indicating that both UNC-76 and FEZ1 are evolutionarily conserved in their function. Until today, little is known about FEZ2 protein function. Methodology/Principal Findings Using the yeast two-hybrid system we demonstrate here conserved evolutionary features among orthologs and non-conserved features between paralogs of the FEZ family of proteins, by comparing the interactome profiles of the C-terminals of human FEZ1, FEZ2 and UNC-76 from C. elegans. Furthermore, we correlate our data with an analysis of the molecular evolution of the FEZ protein family in the animal kingdom. Conclusions/Significance We found that FEZ2 interacted with 59 proteins and that of these only 40 interacted with FEZ1. Of the 40 FEZ1 interacting proteins, 36 (90%), also interacted with UNC-76 and none of the 19 FEZ2 specific proteins interacted with FEZ1 or UNC-76. This together with the duplication of unc-76 gene in the ancestral line of chordates suggests that FEZ2 is in the process of acquiring new additional functions. The results provide also an explanation for the dramatic difference between C. elegans and D. melanogaster unc-76 mutants on one hand, which cause serious defects in the nervous system, and the mouse FEZ1 -/- knockout mice on the other, which show no morphological and no strong behavioural phenotype. Likely, the ubiquitously expressed FEZ2 can completely compensate the lack of neuronal FEZ1, since it can interact with all FEZ1 interacting proteins and additional 19 proteins. PMID:21408165

  8. MPIC: a mitochondrial protein import components database for plant and non-plant species.

    PubMed

    Murcha, Monika W; Narsai, Reena; Devenish, James; Kubiszewski-Jakubiak, Szymon; Whelan, James

    2015-01-01

    In the 2 billion years since the endosymbiotic event that gave rise to mitochondria, variations in mitochondrial protein import have evolved across different species. With the genomes of an increasing number of plant species sequenced, it is possible to gain novel insights into mitochondrial protein import pathways. We have generated the Mitochondrial Protein Import Components (MPIC) Database (DB; http://www.plantenergy.uwa.edu.au/applications/mpic) providing searchable information on the protein import apparatus of plant and non-plant mitochondria. An in silico analysis was carried out, comparing the mitochondrial protein import apparatus from 24 species representing various lineages from Saccharomyces cerevisiae (yeast) and algae to Homo sapiens (human) and higher plants, including Arabidopsis thaliana (Arabidopsis), Oryza sativa (rice) and other more recently sequenced plant species. Each of these species was extensively searched and manually assembled for analysis in the MPIC DB. The database presents an interactive diagram in a user-friendly manner, allowing users to select their import component of interest. The MPIC DB presents an extensive resource facilitating detailed investigation of the mitochondrial protein import machinery and allowing patterns of conservation and divergence to be recognized that would otherwise have been missed. To demonstrate the usefulness of the MPIC DB, we present a comparative analysis of the mitochondrial protein import machinery in plants and non-plant species, revealing plant-specific features that have evolved. © The Author 2014. Published by Oxford University Press on behalf of Japanese Society of Plant Physiologists. All rights reserved. For permissions, please email: journals.permissions@oup.com.

  9. Predicting Ligand Binding Sites on Protein Surfaces by 3-Dimensional Probability Density Distributions of Interacting Atoms

    PubMed Central

    Jian, Jhih-Wei; Elumalai, Pavadai; Pitti, Thejkiran; Wu, Chih Yuan; Tsai, Keng-Chang; Chang, Jeng-Yih; Peng, Hung-Pin; Yang, An-Suei

    2016-01-01

    Predicting ligand binding sites (LBSs) on protein structures, which are obtained either from experimental or computational methods, is a useful first step in functional annotation or structure-based drug design for the protein structures. In this work, the structure-based machine learning algorithm ISMBLab-LIG was developed to predict LBSs on protein surfaces with input attributes derived from the three-dimensional probability density maps of interacting atoms, which were reconstructed on the query protein surfaces and were relatively insensitive to local conformational variations of the tentative ligand binding sites. The prediction accuracy of the ISMBLab-LIG predictors is comparable to that of the best LBS predictors benchmarked on several well-established testing datasets. More importantly, the ISMBLab-LIG algorithm has substantial tolerance to the prediction uncertainties of computationally derived protein structure models. As such, the method is particularly useful for predicting LBSs not only on experimental protein structures without known LBS templates in the database but also on computationally predicted model protein structures with structural uncertainties in the tentative ligand binding sites. PMID:27513851

  10. FastRNABindR: Fast and Accurate Prediction of Protein-RNA Interface Residues.

    PubMed

    El-Manzalawy, Yasser; Abbas, Mostafa; Malluhi, Qutaibah; Honavar, Vasant

    2016-01-01

    A wide range of biological processes, including regulation of gene expression, protein synthesis, and replication and assembly of many viruses are mediated by RNA-protein interactions. However, experimental determination of the structures of protein-RNA complexes is expensive and technically challenging. Hence, a number of computational tools have been developed for predicting protein-RNA interfaces. Some of the state-of-the-art protein-RNA interface predictors rely on position-specific scoring matrix (PSSM)-based encoding of the protein sequences. The computational efforts needed for generating PSSMs severely limits the practical utility of protein-RNA interface prediction servers. In this work, we experiment with two approaches, random sampling and sequence similarity reduction, for extracting a representative reference database of protein sequences from more than 50 million protein sequences in UniRef100. Our results suggest that random sampled databases produce better PSSM profiles (in terms of the number of hits used to generate the profile and the distance of the generated profile to the corresponding profile generated using the entire UniRef100 data as well as the accuracy of the machine learning classifier trained using these profiles). Based on our results, we developed FastRNABindR, an improved version of RNABindR for predicting protein-RNA interface residues using PSSM profiles generated using 1% of the UniRef100 sequences sampled uniformly at random. To the best of our knowledge, FastRNABindR is the only protein-RNA interface residue prediction online server that requires generation of PSSM profiles for query sequences and accepts hundreds of protein sequences per submission. Our approach for determining the optimal BLAST database for a protein-RNA interface residue classification task has the potential of substantially speeding up, and hence increasing the practical utility of, other amino acid sequence based predictors of protein-protein and protein-DNA interfaces.

  11. Reconstruction of the bifidobacterial pan-secretome reveals the network of extracellular interactions between bifidobacteria and the infant gut.

    PubMed

    Lugli, Gabriele Andrea; Mancino, Walter; Milani, Christian; Duranti, Sabrina; Turroni, Francesca; van Sinderen, Douwe; Ventura, Marco

    2018-06-08

    The repertoire of secreted proteins decoded by a microorganism represents proteins released from or associated with the cell's surface. In gut commensals, such as bifidobacteria, these proteins are perceived to be functionally relevant as they regulate the interaction with the gut environment. In the current study, we have screened the predicted proteome of over 300 bifidobacterial strains amongst the currently recognized bifidobacterial species to generate a comprehensive database encompassing bifidobacterial extracellular proteins. A glycobiome analysis of this predicted bifidobacterial secretome revealed that a correlation exists between particular bifidobacterial species and their capability to hydrolyze HMOs and intestinal glyconjugates such as mucin. Furthermore, exploration of metatranscriptomic datasets of the infant gut microbiota allowed the evaluation of the expression of bifidobacterial genes encoding extracellular proteins, represented by ABC transporter substrate-binding proteins and glycoside hydrolases enzymes involved in the degradation of human milk oligosaccharides and mucin. Overall, this study provides insights into how bifidobacteria interact with their natural yet highly complex environment, the infant gut. Importance The ecological success of bifidobacteria relies on the activity of extracellular proteins that are involved in the metabolism of nutrients and the interaction with the environment. To date, information on secreted proteins encoded by bifidobacteria are incomplete and just related to few species. In this study, we reconstructed the bifidobacterial pan-secretome, revealing extracellular proteins that modulate the interaction of bifidobacteria with their natural environment. Furthermore, a survey of secretion system between bifidobacterial genomes allowed the identification of a conserved Sec-dependent secretion machinery in all the analyzed genomes and the Tat protein translocation system in the chromosomes of 23 strains belonging to Bifidobacterium longum subsp. longum and Bifidobacterium aesculapii . Copyright © 2018 American Society for Microbiology.

  12. GEneSTATION 1.0: a synthetic resource of diverse evolutionary and functional genomic data for studying the evolution of pregnancy-associated tissues and phenotypes

    PubMed Central

    Kim, Mara; Cooper, Brian A.; Venkat, Rohit; Phillips, Julie B.; Eidem, Haley R.; Hirbo, Jibril; Nutakki, Sashank; Williams, Scott M.; Muglia, Louis J.; Capra, J. Anthony; Petren, Kenneth; Abbot, Patrick; Rokas, Antonis; McGary, Kriston L.

    2016-01-01

    Mammalian gestation and pregnancy are fast evolving processes that involve the interaction of the fetal, maternal and paternal genomes. Version 1.0 of the GEneSTATION database (http://genestation.org) integrates diverse types of omics data across mammals to advance understanding of the genetic basis of gestation and pregnancy-associated phenotypes and to accelerate the translation of discoveries from model organisms to humans. GEneSTATION is built using tools from the Generic Model Organism Database project, including the biology-aware database CHADO, new tools for rapid data integration, and algorithms that streamline synthesis and user access. GEneSTATION contains curated life history information on pregnancy and reproduction from 23 high-quality mammalian genomes. For every human gene, GEneSTATION contains diverse evolutionary (e.g. gene age, population genetic and molecular evolutionary statistics), organismal (e.g. tissue-specific gene and protein expression, differential gene expression, disease phenotype), and molecular data types (e.g. Gene Ontology Annotation, protein interactions), as well as links to many general (e.g. Entrez, PubMed) and pregnancy disease-specific (e.g. PTBgene, dbPTB) databases. By facilitating the synthesis of diverse functional and evolutionary data in pregnancy-associated tissues and phenotypes and enabling their quick, intuitive, accurate and customized meta-analysis, GEneSTATION provides a novel platform for comprehensive investigation of the function and evolution of mammalian pregnancy. PMID:26567549

  13. In silico analysis of candidate proteins sharing homology with Streptococcus agalactiae proteins and their role in male infertility.

    PubMed

    Parida, Rajeshwari; Samanta, Luna

    2017-02-01

    Leukocytospermia is a physiologic condition defined as human semen with a leukocyte count of >1 x 10 6 cells/ml that is often correlated with male infertility. Moreover, bacteriospermia has been associated with leukocytospermia ultimately leading to male infertility. We have found that semen samples with >1 x 10 6 /ml leukocytes and/or bacteriospermia have oxidative predominance as evidenced by augmented protein carbonyl and lipid peroxidation status of the semen which is implicated in sperm dysfunction. It has been reported that Streptococcus agalactiae is present in bacteriospermic samples. Previous research has shown that human leukocyte antigen beta chain paralog (HLA-DRB) alleles interact best with the infected sperm cells rather than the non-infected cells. Little is known about the interaction of major histocompatibility complex (MHC) present on leukocytes with the sperm upon bacterial infection and how it induces an immunological response which we have addressed by epitope mapping. Therefore, we examined MHC class II derived bacterial peptides which might have human sperm-related functional aspects. Twenty-two S. agalactiae proteins were obtained from PUBMED protein database for our study. Protein sequences with more than two accession numbers were aligned using CLUSTAL Omega to check their conservation pattern. Each protein sequence was then analyzed for T-cell epitope prediction against HLA-DRB alleles using the immune epitope database (IEDB) analysis tool. Out of a plethora of peptides obtained from this analysis, peptides corresponding to proteins of interest such as DNA binding response regulator, hyaluronate lyase and laminin binding protein were screened against the human proteome using Blastp. Interestingly, we have found bacterial peptides sharing homology with human peptides deciphering some of the important sperm functions. Antibodies raised against these probable bacterial antigens of fertility will not only help us understand the mechanism of leukocytospermia/bacteriospermia induced male factor infertility but also open new avenues for immunocontraception. AA: amino acid; ASA: antisperm antibodies; GBS: group B streptococcus; HLA: human leukocyte antigen; HAS3: hyaluronan synthase 3: IEDB: immune epitope database; MAPO2: O 6 -methylguanine-induced apoptosis 2; MHC: major histocompatibility complex; ROS: reactive oxygen species; Rosbin1: round spermatid basic protein 1; S. agalactiae: Streptococcus agalactiae;SA: sperm antigen; SPATA17: spermatogenesis associated protein17; SPNR: spermatid perinuclear RNA binding protein; TEX15: testis-expressed sequence 15 protein; TOPAZ: testis- and ovary-specific PAZ domain-containing protein; TPABP: testis-specific poly-A binding protein; TPAP: testis-specific poly(A) polymerase; WHO: World Health Organization.

  14. Approaches for Defining the Hsp90-dependent Proteome

    PubMed Central

    Hartson, Steven D.; Matts, Robert L.

    2011-01-01

    Hsp90 is the target of ongoing drug discovery studies seeking new compounds to treat cancer, neurodegenerative diseases, and protein folding disorders. To better understand Hsp90’s roles in cellular pathologies and in normal cells, numerous studies have utilized proteomics assays and related high-throughput tools to characterize its physical and functional protein partnerships. This review surveys these studies, and summarizes the strengths and limitations of the individual attacks. We also include downloadable spreadsheets compiling all of the Hsp90-interacting proteins identified in more than 23 studies. These tools include cross-references among gene aliases, human homologues of yeast Hsp90-interacting proteins, hyperlinks to database entries, summaries of canonical pathways that are enriched in the Hsp90 interactome, and additional bioinformatic annotations. In addition to summarizing Hsp90 proteomics studies performed to date and the insights they have provided, we identify gaps in our current understanding of Hsp90-mediated proteostasis. PMID:21906632

  15. Human Mitochondrial Protein Database

    National Institute of Standards and Technology Data Gateway

    SRD 131 Human Mitochondrial Protein Database (Web, free access)   The Human Mitochondrial Protein Database (HMPDb) provides comprehensive data on mitochondrial and human nuclear encoded proteins involved in mitochondrial biogenesis and function. This database consolidates information from SwissProt, LocusLink, Protein Data Bank (PDB), GenBank, Genome Database (GDB), Online Mendelian Inheritance in Man (OMIM), Human Mitochondrial Genome Database (mtDB), MITOMAP, Neuromuscular Disease Center and Human 2-D PAGE Databases. This database is intended as a tool not only to aid in studying the mitochondrion but in studying the associated diseases.

  16. Prediction of Ras-effector interactions using position energy matrices.

    PubMed

    Kiel, Christina; Serrano, Luis

    2007-09-01

    One of the more challenging problems in biology is to determine the cellular protein interaction network. Progress has been made to predict protein-protein interactions based on structural information, assuming that structural similar proteins interact in a similar way. In a previous publication, we have determined a genome-wide Ras-effector interaction network based on homology models, with a high accuracy of predicting binding and non-binding domains. However, for a prediction on a genome-wide scale, homology modelling is a time-consuming process. Therefore, we here successfully developed a faster method using position energy matrices, where based on different Ras-effector X-ray template structures, all amino acids in the effector binding domain are sequentially mutated to all other amino acid residues and the effect on binding energy is calculated. Those pre-calculated matrices can then be used to score for binding any Ras or effector sequences. Based on position energy matrices, the sequences of putative Ras-binding domains can be scanned quickly to calculate an energy sum value. By calibrating energy sum values using quantitative experimental binding data, thresholds can be defined and thus non-binding domains can be excluded quickly. Sequences which have energy sum values above this threshold are considered to be potential binding domains, and could be further analysed using homology modelling. This prediction method could be applied to other protein families sharing conserved interaction types, in order to determine in a fast way large scale cellular protein interaction networks. Thus, it could have an important impact on future in silico structural genomics approaches, in particular with regard to increasing structural proteomics efforts, aiming to determine all possible domain folds and interaction types. All matrices are deposited in the ADAN database (http://adan-embl.ibmc.umh.es/). Supplementary data are available at Bioinformatics online.

  17. The Protein Information Resource: an integrated public resource of functional annotation of proteins

    PubMed Central

    Wu, Cathy H.; Huang, Hongzhan; Arminski, Leslie; Castro-Alvear, Jorge; Chen, Yongxing; Hu, Zhang-Zhi; Ledley, Robert S.; Lewis, Kali C.; Mewes, Hans-Werner; Orcutt, Bruce C.; Suzek, Baris E.; Tsugita, Akira; Vinayaka, C. R.; Yeh, Lai-Su L.; Zhang, Jian; Barker, Winona C.

    2002-01-01

    The Protein Information Resource (PIR) serves as an integrated public resource of functional annotation of protein data to support genomic/proteomic research and scientific discovery. The PIR, in collaboration with the Munich Information Center for Protein Sequences (MIPS) and the Japan International Protein Information Database (JIPID), produces the PIR-International Protein Sequence Database (PSD), the major annotated protein sequence database in the public domain, containing about 250 000 proteins. To improve protein annotation and the coverage of experimentally validated data, a bibliography submission system is developed for scientists to submit, categorize and retrieve literature information. Comprehensive protein information is available from iProClass, which includes family classification at the superfamily, domain and motif levels, structural and functional features of proteins, as well as cross-references to over 40 biological databases. To provide timely and comprehensive protein data with source attribution, we have introduced a non-redundant reference protein database, PIR-NREF. The database consists of about 800 000 proteins collected from PIR-PSD, SWISS-PROT, TrEMBL, GenPept, RefSeq and PDB, with composite protein names and literature data. To promote database interoperability, we provide XML data distribution and open database schema, and adopt common ontologies. The PIR web site (http://pir.georgetown.edu/) features data mining and sequence analysis tools for information retrieval and functional identification of proteins based on both sequence and annotation information. The PIR databases and other files are also available by FTP (ftp://nbrfa.georgetown.edu/pir_databases). PMID:11752247

  18. A database of human genes and a gene network involved in response to tick-borne encephalitis virus infection.

    PubMed

    Ignatieva, Elena V; Igoshin, Alexander V; Yudin, Nikolay S

    2017-12-28

    Tick-borne encephalitis is caused by the neurotropic, positive-sense RNA virus, tick-borne encephalitis virus (TBEV). TBEV infection can lead to a variety of clinical manifestations ranging from slight fever to severe neurological illness. Very little is known about genetic factors predisposing to severe forms of disease caused by TBEV. The aims of the study were to compile a catalog of human genes involved in response to TBEV infection and to rank genes from the catalog based on the number of neighbors in the network of pairwise interactions involving these genes and TBEV RNA or proteins. Based on manual review and curation of scientific publications a catalog comprising 140 human genes involved in response to TBEV infection was developed. To provide access to data on all genes, the TBEVhostDB web resource ( http://icg.nsc.ru/TBEVHostDB/ ) was created. We reconstructed a network formed by pairwise interactions between TBEV virion itself, viral RNA and viral proteins and 140 genes/proteins from TBEVHostDB. Genes were ranked according to the number of interactions in the network. Two genes/proteins (CCR5 and IFNAR1) that had maximal number of interactions were revealed. It was found that the subnetworks formed by CCR5 and IFNAR1 and their neighbors were a fragments of two key pathways functioning during the course of tick-borne encephalitis: (1) the attenuation of interferon-I signaling pathway by the TBEV NS5 protein that targeted peptidase D; (2) proinflammation and tissue damage pathway triggered by chemokine receptor CCR5 interacting with CD4, CCL3, CCL4, CCL2. Among nine genes associated with severe forms of TBEV infection, three genes/proteins (CCR5, IL10, ARID1B) were found to have protein-protein interactions within the network, and two genes/proteins (IFNL3 and the IL10, that was just mentioned) were up- or down-regulated in response to TBEV infection. Based on this finding, potential mechanisms for participation of CCR5, IL10, ARID1B, and IFNL3 in the host response to TBEV infection were suggested. A database comprising 140 human genes involved in response to TBEV infection was compiled and the TBEVHostDB web resource, providing access to all genes was created. This is the first effort of integrating and unifying data on genetic factors that may predispose to severe forms of diseases caused by TBEV. The TBEVHostDB could potentially be used for assessment of risk factors for severe forms of tick-borne encephalitis and for the design of personalized pharmacological strategies for the treatment of TBEV infection.

  19. Protein-protein interaction network of gene expression in the hydrocortisone-treated keloid.

    PubMed

    Chen, Rui; Zhang, Zhiliang; Xue, Zhujia; Wang, Lin; Fu, Mingang; Lu, Yi; Bai, Ling; Zhang, Ping; Fan, Zhihong

    2015-01-01

    In order to explore the molecular mechanism of hydrocortisone in keloid tissue, the gene expression profiles of keloid samples treated with hydrocortisone were subjected to bioinformatics analysis. Firstly, the gene expression profiles (GSE7890) of five samples of keloid treated with hydrocortisone and five untreated keloid samples were downloaded from the Gene Expression Omnibus (GEO) database. Secondly, data were preprocessed using packages in R language and differentially expressed genes (DEGs) were screened using a significance analysis of microarrays (SAM) protocol. Thirdly, the DEGs were subjected to gene ontology (GO) function and KEGG pathway enrichment analysis. Finally, the interactions of DEGs in samples of keloid treated with hydrocortisone were explored in a human protein-protein interaction (PPI) network, and sub-modules of the DEGs interaction network were analyzed using Cytoscape software. Based on the analysis, 572 DEGs in the hydrocortisone-treated samples were screened; most of these were involved in the signal transduction and cell cycle. Furthermore, three critical genes in the module, including COL1A1, NID1, and PRELP, were screened in the PPI network analysis. These findings enhance understanding of the pathogenesis of the keloid and provide references for keloid therapy. © 2015 The International Society of Dermatology.

  20. Computational prediction of protein-protein interactions in Leishmania predicted proteomes.

    PubMed

    Rezende, Antonio M; Folador, Edson L; Resende, Daniela de M; Ruiz, Jeronimo C

    2012-01-01

    The Trypanosomatids parasites Leishmania braziliensis, Leishmania major and Leishmania infantum are important human pathogens. Despite of years of study and genome availability, effective vaccine has not been developed yet, and the chemotherapy is highly toxic. Therefore, it is clear just interdisciplinary integrated studies will have success in trying to search new targets for developing of vaccines and drugs. An essential part of this rationale is related to protein-protein interaction network (PPI) study which can provide a better understanding of complex protein interactions in biological system. Thus, we modeled PPIs for Trypanosomatids through computational methods using sequence comparison against public database of protein or domain interaction for interaction prediction (Interolog Mapping) and developed a dedicated combined system score to address the predictions robustness. The confidence evaluation of network prediction approach was addressed using gold standard positive and negative datasets and the AUC value obtained was 0.94. As result, 39,420, 43,531 and 45,235 interactions were predicted for L. braziliensis, L. major and L. infantum respectively. For each predicted network the top 20 proteins were ranked by MCC topological index. In addition, information related with immunological potential, degree of protein sequence conservation among orthologs and degree of identity compared to proteins of potential parasite hosts was integrated. This information integration provides a better understanding and usefulness of the predicted networks that can be valuable to select new potential biological targets for drug and vaccine development. Network modularity which is a key when one is interested in destabilizing the PPIs for drug or vaccine purposes along with multiple alignments of the predicted PPIs were performed revealing patterns associated with protein turnover. In addition, around 50% of hypothetical protein present in the networks received some degree of functional annotation which represents an important contribution since approximately 60% of Leishmania predicted proteomes has no predicted function.

  1. Cloning and characterization of carboxyl terminus of heat shock cognate 70-interacting protein gene from the silkworm, Bombyx mori.

    PubMed

    Ohsawa, Takeshi; Fujimoto, Shota; Tsunakawa, Akane; Shibano, Yuka; Kawasaki, Hideki; Iwanaga, Masashi

    2016-11-01

    Carboxyl terminus of heat shock cognate 70-interacting protein (CHIP) is an evolutionarily conserved E3 ubiquitin ligase across different eukaryotic species and is known to play a key role in protein quality control. CHIP has two distinct functional domains, an N-terminal tetratricopeptide repeat (TPR) and a C-terminal U-box domain, which are required for the ubiquitination of numerous labile client proteins that are chaperoned by heat shock proteins (HSPs) and heat shock cognate proteins (HSCs). During our screen for CHIP-like proteins in the Bombyx mori databases, we found a novel silkworm gene, Bombyx mori CHIP. Phylogenetic analysis showed that BmCHIP belongs to Lepidopteran lineages. Quantitative reverse transcription-PCR analysis indicated that BmCHIP was relatively highly expressed in the gonad and fat body. A pull-down experiment and auto-ubiquitination assay showed that BmCHIP interacted with BmHSC70 and had E3 ligase activity. Additionally, immunohistochemical analysis revealed that BmCHIP was partially co-localized with ubiquitin in BmN4 cells. These data support that BmCHIP plays an important role in the ubiquitin proteasome system as an E3 ubiquitin ligase in B. mori. Copyright © 2016 Elsevier Inc. All rights reserved.

  2. Molecular Interaction Map of the Mammalian Cell Cycle Control and DNA Repair Systems

    PubMed Central

    Kohn, Kurt W.

    1999-01-01

    Eventually to understand the integrated function of the cell cycle regulatory network, we must organize the known interactions in the form of a diagram, map, and/or database. A diagram convention was designed capable of unambiguous representation of networks containing multiprotein complexes, protein modifications, and enzymes that are substrates of other enzymes. To facilitate linkage to a database, each molecular species is symbolically represented only once in each diagram. Molecular species can be located on the map by means of indexed grid coordinates. Each interaction is referenced to an annotation list where pertinent information and references can be found. Parts of the network are grouped into functional subsystems. The map shows how multiprotein complexes could assemble and function at gene promoter sites and at sites of DNA damage. It also portrays the richness of connections between the p53-Mdm2 subsystem and other parts of the network. PMID:10436023

  3. Predicting Protein Relationships to Human Pathways through a Relational Learning Approach Based on Simple Sequence Features.

    PubMed

    García-Jiménez, Beatriz; Pons, Tirso; Sanchis, Araceli; Valencia, Alfonso

    2014-01-01

    Biological pathways are important elements of systems biology and in the past decade, an increasing number of pathway databases have been set up to document the growing understanding of complex cellular processes. Although more genome-sequence data are becoming available, a large fraction of it remains functionally uncharacterized. Thus, it is important to be able to predict the mapping of poorly annotated proteins to original pathway models. We have developed a Relational Learning-based Extension (RLE) system to investigate pathway membership through a function prediction approach that mainly relies on combinations of simple properties attributed to each protein. RLE searches for proteins with molecular similarities to specific pathway components. Using RLE, we associated 383 uncharacterized proteins to 28 pre-defined human Reactome pathways, demonstrating relative confidence after proper evaluation. Indeed, in specific cases manual inspection of the database annotations and the related literature supported the proposed classifications. Examples of possible additional components of the Electron transport system, Telomere maintenance and Integrin cell surface interactions pathways are discussed in detail. All the human predicted proteins in the 2009 and 2012 releases 30 and 40 of Reactome are available at http://rle.bioinfo.cnio.es.

  4. Proteomic characterization of hempseed (Cannabis sativa L.).

    PubMed

    Aiello, Gilda; Fasoli, Elisa; Boschin, Giovanna; Lammi, Carmen; Zanoni, Chiara; Citterio, Attilio; Arnoldi, Anna

    2016-09-16

    This paper presents an investigation on hempseed proteome. The experimental approach, based on combinatorial peptide ligand libraries (CPLLs), SDS-PAGE separation, nLC-ESI-MS/MS identification, and database search, permitted identifying in total 181 expressed proteins. This very large number of identifications was achieved by searching in two databases: Cannabis sativa L. (56 gene products identified) and Arabidopsis thaliana (125 gene products identified). By performing a protein-protein association network analysis using the STRING software, it was possible to build the first interactomic map of all detected proteins, characterized by 137 nodes and 410 interactions. Finally, a Gene Ontology analysis of the identified species permitted to classify their molecular functions: the great majority is involved in the seed metabolic processes (41%), responses to stimulus (8%), and biological process (7%). Hempseed is an underexploited non-legume protein-rich seed. Although its protein is well known for its digestibility, essential amino acid composition, and useful techno-functional properties, a comprehensive proteome characterization is still lacking. The objective of this work was to fill this knowledge gap and provide information useful for a better exploitation of this seed in different food products. Copyright © 2016 Elsevier B.V. All rights reserved.

  5. Prediction of hot spots in protein interfaces using a random forest model with hybrid features.

    PubMed

    Wang, Lin; Liu, Zhi-Ping; Zhang, Xiang-Sun; Chen, Luonan

    2012-03-01

    Prediction of hot spots in protein interfaces provides crucial information for the research on protein-protein interaction and drug design. Existing machine learning methods generally judge whether a given residue is likely to be a hot spot by extracting features only from the target residue. However, hot spots usually form a small cluster of residues which are tightly packed together at the center of protein interface. With this in mind, we present a novel method to extract hybrid features which incorporate a wide range of information of the target residue and its spatially neighboring residues, i.e. the nearest contact residue in the other face (mirror-contact residue) and the nearest contact residue in the same face (intra-contact residue). We provide a novel random forest (RF) model to effectively integrate these hybrid features for predicting hot spots in protein interfaces. Our method can achieve accuracy (ACC) of 82.4% and Matthew's correlation coefficient (MCC) of 0.482 in Alanine Scanning Energetics Database, and ACC of 77.6% and MCC of 0.429 in Binding Interface Database. In a comparison study, performance of our RF model exceeds other existing methods, such as Robetta, FOLDEF, KFC, KFC2, MINERVA and HotPoint. Of our hybrid features, three physicochemical features of target residues (mass, polarizability and isoelectric point), the relative side-chain accessible surface area and the average depth index of mirror-contact residues are found to be the main discriminative features in hot spots prediction. We also confirm that hot spots tend to form large contact surface areas between two interacting proteins. Source data and code are available at: http://www.aporc.org/doc/wiki/HotSpot.

  6. Mining disease genes using integrated protein-protein interaction and gene-gene co-regulation information.

    PubMed

    Li, Jin; Wang, Limei; Guo, Maozu; Zhang, Ruijie; Dai, Qiguo; Liu, Xiaoyan; Wang, Chunyu; Teng, Zhixia; Xuan, Ping; Zhang, Mingming

    2015-01-01

    In humans, despite the rapid increase in disease-associated gene discovery, a large proportion of disease-associated genes are still unknown. Many network-based approaches have been used to prioritize disease genes. Many networks, such as the protein-protein interaction (PPI), KEGG, and gene co-expression networks, have been used. Expression quantitative trait loci (eQTLs) have been successfully applied for the determination of genes associated with several diseases. In this study, we constructed an eQTL-based gene-gene co-regulation network (GGCRN) and used it to mine for disease genes. We adopted the random walk with restart (RWR) algorithm to mine for genes associated with Alzheimer disease. Compared to the Human Protein Reference Database (HPRD) PPI network alone, the integrated HPRD PPI and GGCRN networks provided faster convergence and revealed new disease-related genes. Therefore, using the RWR algorithm for integrated PPI and GGCRN is an effective method for disease-associated gene mining.

  7. Web3DMol: interactive protein structure visualization based on WebGL.

    PubMed

    Shi, Maoxiang; Gao, Juntao; Zhang, Michael Q

    2017-07-03

    A growing number of web-based databases and tools for protein research are being developed. There is now a widespread need for visualization tools to present the three-dimensional (3D) structure of proteins in web browsers. Here, we introduce our 3D modeling program-Web3DMol-a web application focusing on protein structure visualization in modern web browsers. Users submit a PDB identification code or select a PDB archive from their local disk, and Web3DMol will display and allow interactive manipulation of the 3D structure. Featured functions, such as sequence plot, fragment segmentation, measure tool and meta-information display, are offered for users to gain a better understanding of protein structure. Easy-to-use APIs are available for developers to reuse and extend Web3DMol. Web3DMol can be freely accessed at http://web3dmol.duapp.com/, and the source code is distributed under the MIT license. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  8. Spectroscopic studies on peptides and proteins with cysteine-containing heme regulatory motifs (HRM).

    PubMed

    Schubert, Erik; Florin, Nicole; Duthie, Fraser; Henning Brewitz, H; Kühl, Toni; Imhof, Diana; Hagelueken, Gregor; Schiemann, Olav

    2015-07-01

    The role of heme as a cofactor in enzymatic reactions has been studied for a long time and in great detail. Recently it was discovered that heme can also serve as a signalling molecule in cells but so far only few examples of this regulation have been studied. In order to discover new potentially heme-regulated proteins, we screened protein sequence databases for bacterial proteins that contain sequence features like a Cysteine-Proline (CP) motif, which is known for its heme-binding propensity. Based on this search we synthesized a series of these potential heme regulatory motifs (HRMs). We used cw EPR spectroscopy to investigate whether these sequences do indeed bind to heme and if the spin state of heme is changed upon interaction with the peptides. The corresponding proteins of two potential HRMs, FeoB and GlpF, were expressed and purified and their interaction with heme was studied by cw EPR and UV-Visible (UV-Vis) spectroscopy. Copyright © 2015 Elsevier Inc. All rights reserved.

  9. PACAP Interactions in the Mouse Brain: Implications for Behavioral and Other Disorders

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Acquaah-Mensah, George; Taylor, Ronald C.; Bhave, Sanjiv V.

    2012-01-10

    As an activator of adenylate cyclase, the neuropeptide Pituitary Adenylate Cyclase Activating Peptide (PACAP) impacts levels of cyclic AMP, a key second messenger available in brain cells. PACAP is involved in certain adult behaviors. To elucidate PACAP interactions, a compendium of microarrays representing mRNA expression in the adult mouse whole brain was pooled from the Phenogen database for analysis. A regulatory network was computed based on mutual information between gene pairs using gene expression data across the compendium. Clusters among genes directly linked to PACAP, and probable interactions between corresponding proteins were computed. Database 'experts' affirmed some of the inferredmore » relationships. The findings suggest ADCY7 is probably the adenylate cyclase isoform most relevant to PACAP's action. They also support intervening roles for kinases including GSK3B, PI 3-kinase, SGK3 and AMPK. Other high-confidence interactions are hypothesized for future testing. This new information has implications for certain behavioral and other disorders.« less

  10. Protein Folding and Structure Prediction from the Ground Up: The Atomistic Associative Memory, Water Mediated, Structure and Energy Model.

    PubMed

    Chen, Mingchen; Lin, Xingcheng; Zheng, Weihua; Onuchic, José N; Wolynes, Peter G

    2016-08-25

    The associative memory, water mediated, structure and energy model (AWSEM) is a coarse-grained force field with transferable tertiary interactions that incorporates local in sequence energetic biases using bioinformatically derived structural information about peptide fragments with locally similar sequences that we call memories. The memory information from the protein data bank (PDB) database guides proper protein folding. The structural information about available sequences in the database varies in quality and can sometimes lead to frustrated free energy landscapes locally. One way out of this difficulty is to construct the input fragment memory information from all-atom simulations of portions of the complete polypeptide chain. In this paper, we investigate this approach first put forward by Kwac and Wolynes in a more complete way by studying the structure prediction capabilities of this approach for six α-helical proteins. This scheme which we call the atomistic associative memory, water mediated, structure and energy model (AAWSEM) amounts to an ab initio protein structure prediction method that starts from the ground up without using bioinformatic input. The free energy profiles from AAWSEM show that atomistic fragment memories are sufficient to guide the correct folding when tertiary forces are included. AAWSEM combines the efficiency of coarse-grained simulations on the full protein level with the local structural accuracy achievable from all-atom simulations of only parts of a large protein. The results suggest that a hybrid use of atomistic fragment memory and database memory in structural predictions may well be optimal for many practical applications.

  11. Systematic Differences in Signal Emitting and Receiving Revealed by PageRank Analysis of a Human Protein Interactome

    PubMed Central

    Li, Xiu-Qing

    2012-01-01

    Most protein PageRank studies do not use signal flow direction information in protein interactions because this information was not readily available in large protein databases until recently. Therefore, four questions have yet to be answered: A) What is the general difference between signal emitting and receiving in a protein interactome? B) Which proteins are among the top ranked in directional ranking? C) Are high ranked proteins more evolutionarily conserved than low ranked ones? D) Do proteins with similar ranking tend to have similar subcellular locations? In this study, we address these questions using the forward, reverse, and non-directional PageRank approaches to rank an information-directional network of human proteins and study their evolutionary conservation. The forward ranking gives credit to information receivers, reverse ranking to information emitters, and non-directional ranking mainly to the number of interactions. The protein lists generated by the forward and non-directional rankings are highly correlated, but those by the reverse and non-directional rankings are not. The results suggest that the signal emitting/receiving system is characterized by key-emittings and relatively even receivings in the human protein interactome. Signaling pathway proteins are frequent in top ranked ones. Eight proteins are both informational top emitters and top receivers. Top ranked proteins, except a few species-related novel-function ones, are evolutionarily well conserved. Protein-subunit ranking position reflects subunit function. These results demonstrate the usefulness of different PageRank approaches in characterizing protein networks and provide insights to protein interaction in the cell. PMID:23028653

  12. Molecular modeling of the AhR structure and interactions can shed light on ligand-dependent activation and transformation mechanisms.

    PubMed

    Bonati, Laura; Corrada, Dario; Tagliabue, Sara Giani; Motta, Stefano

    2017-02-01

    Molecular modeling has given important contributions to elucidation of the main stages in the AhR signal transduction pathway. Despite the lack of experimentally determined structures of the AhR functional domains, information derived from homologous systems has been exploited for modeling their structure and interactions. Homology models of the AhR PASB domain have provided information on the binding cavity and contributed to elucidate species-specific differences in ligand binding. Molecular Docking simulations of the ligand binding process have given insights into differences in binding of diverse agonists, antagonists, and selective AhR modulators, and their application to virtual screening of large databases of compounds have allowed identification of novel AhR ligands. Recently available structural information on protein-protein and protein-DNA complexes of other bHLH-PAS systems has opened the way for modeling the AhR:ARNT dimer structure and investigating the mechanisms of AhR transformation and DNA binding. Future research directions should include simulation of the protein dynamics to obtain a more reliable description of intermolecular interactions involved in signal transmission.

  13. Carbohydrate Structure Database: tools for statistical analysis of bacterial, plant and fungal glycomes

    PubMed Central

    Egorova, K.S.; Kondakova, A.N.; Toukach, Ph.V.

    2015-01-01

    Carbohydrates are biological blocks participating in diverse and crucial processes both at cellular and organism levels. They protect individual cells, establish intracellular interactions, take part in the immune reaction and participate in many other processes. Glycosylation is considered as one of the most important modifications of proteins and other biologically active molecules. Still, the data on the enzymatic machinery involved in the carbohydrate synthesis and processing are scattered, and the advance on its study is hindered by the vast bulk of accumulated genetic information not supported by any experimental evidences for functions of proteins that are encoded by these genes. In this article, we present novel instruments for statistical analysis of glycomes in taxa. These tools may be helpful for investigating carbohydrate-related enzymatic activities in various groups of organisms and for comparison of their carbohydrate content. The instruments are developed on the Carbohydrate Structure Database (CSDB) platform and are available freely on the CSDB web-site at http://csdb.glycoscience.ru. Database URL: http://csdb.glycoscience.ru PMID:26337239

  14. 3DSDSCAR--a three dimensional structural database for sialic acid-containing carbohydrates through molecular dynamics simulation.

    PubMed

    Veluraja, Kasinadar; Selvin, Jeyasigamani F A; Venkateshwari, Selvakumar; Priyadarzini, Thanu R K

    2010-09-23

    The inherent flexibility and lack of strong intramolecular interactions of oligosaccharides demand the use of theoretical methods for their structural elucidation. In spite of the developments of theoretical methods, not much research on glycoinformatics is done so far when compared to bioinformatics research on proteins and nucleic acids. We have developed three dimensional structural database for a sialic acid-containing carbohydrates (3DSDSCAR). This is an open-access database that provides 3D structural models of a given sialic acid-containing carbohydrate. At present, 3DSDSCAR contains 60 conformational models, belonging to 14 different sialic acid-containing carbohydrates, deduced through 10 ns molecular dynamics (MD) simulations. The database is available at the URL: http://www.3dsdscar.org. Copyright 2010 Elsevier Ltd. All rights reserved.

  15. Protein Information Resource: a community resource for expert annotation of protein data

    PubMed Central

    Barker, Winona C.; Garavelli, John S.; Hou, Zhenglin; Huang, Hongzhan; Ledley, Robert S.; McGarvey, Peter B.; Mewes, Hans-Werner; Orcutt, Bruce C.; Pfeiffer, Friedhelm; Tsugita, Akira; Vinayaka, C. R.; Xiao, Chunlin; Yeh, Lai-Su L.; Wu, Cathy

    2001-01-01

    The Protein Information Resource, in collaboration with the Munich Information Center for Protein Sequences (MIPS) and the Japan International Protein Information Database (JIPID), produces the most comprehensive and expertly annotated protein sequence database in the public domain, the PIR-International Protein Sequence Database. To provide timely and high quality annotation and promote database interoperability, the PIR-International employs rule-based and classification-driven procedures based on controlled vocabulary and standard nomenclature and includes status tags to distinguish experimentally determined from predicted protein features. The database contains about 200 000 non-redundant protein sequences, which are classified into families and superfamilies and their domains and motifs identified. Entries are extensively cross-referenced to other sequence, classification, genome, structure and activity databases. The PIR web site features search engines that use sequence similarity and database annotation to facilitate the analysis and functional identification of proteins. The PIR-Inter­national databases and search tools are accessible on the PIR web site at http://pir.georgetown.edu/ and at the MIPS web site at http://www.mips.biochem.mpg.de. The PIR-International Protein Sequence Database and other files are also available by FTP. PMID:11125041

  16. Pharmacophore modeling, molecular docking and molecular dynamics studies on natural products database to discover novel skeleton as non-purine xanthine oxidase inhibitors.

    PubMed

    Peng, Jiale; Li, Yaping; Zhou, Yeheng; Zhang, Li; Liu, Xingyong; Zuo, Zhili

    2018-05-29

    Gout is a common inflammatory arthritis caused by the deposition of urate crystals within joints. It is increasingly in prevalence during the past few decades as shown by the epidemiological survey results. Xanthine oxidase (XO) is a key enzyme to transfer hypoxanthine and xanthine to uric acid, whose overproduction leads to gout. Therefore, inhibiting the activity of xanthine oxidase is an important way to reduce the production of urate. In the study, in order to identify the potential natural products targeting XO, pharmacophore modeling was employed to filter databases. Here, two methods, pharmacophore based on ligand and pharmacophore based on receptor-ligand, were constructed by Discovery Studio. Then GOLD was used to refine the potential compounds with higher fitness scores. Finally, molecular docking and dynamics simulations were employed to analyze the interactions between compounds and protein. The best hypothesis was set as a 3D query to screen database, returning 785 and 297 compounds respectively. A merged set of the above 1082 molecules was subjected to molecular docking, which returned 144 hits with high-fitness scores. These molecules were clustered in four main kinds depending on different backbones. What is more, molecular docking showed that the representative compounds established key interactions with the amino acid residues in the protein, and the RMSD and RMSF of molecular dynamics results showed that these compounds can stabilize the protein. The information represented in the study confirmed previous reports. And it may assist to discover and design new backbones as potential XO inhibitors based on natural products.

  17. Mimicking a p53-MDM2 interaction based on a stable immunoglobulin-like domain scaffold.

    PubMed

    Jimenez-Sandoval, Pedro; Madrigal-Carrillo, Ezequiel A; Santamaría-Suárez, Hugo A; Maturana, Daniel; Rentería-González, Itzel; Benitez-Cardoza, Claudia G; Torres-Larios, Alfredo; Brieba, Luis G

    2018-04-26

    Antibodies recognize protein targets with great affinity and specificity. However, posttranslational modifications and the presence of intrinsic disulfide-bonds pose difficulties for their industrial use. The immunoglobulin fold is one of the most ubiquitous folds in nature and it is found in many proteins besides antibodies. An example of a protein family with an immunoglobulin-like fold is the Cysteine Protease Inhibitors (ICP) family I42 of the MEROPs database for protease and protease inhibitors. Members of this protein family are thermostable and do not present internal disulfide bonds. Crystal structures of several ICPs indicate that they resemble the Ig-like domain of the human T cell co-receptor CD8α As ICPs present 2 flexible recognition loops that vary accordingly to their targeted protease, we hypothesize that members of this protein family would be ideal to design peptide aptamers that mimic protein-protein interactions. Herein, we use an ICP variant from Entamoeba histolytica (EhICP1) to mimic the interaction between p53 and MDM2. We found that a 13 amino-acid peptide derived from p53 can be introduced in 2 variable loops (DE, FG) but not the third (BC). Chimeric EhICP1-p53 form a stable complex with MDM2 at a micromolar range. Crystal structure of the EhICP1-p53(FG)-loop variant in complex with MDM2 reveals a swapping subdomain between 2 chimeric molecules, however, the p53 peptide interacts with MDM2 as in previous crystal structures. The structural details of the EhICP1-p53(FG) interaction with MDM2 resemble the interaction between an antibody and MDM2. © 2018 Wiley Periodicals, Inc.

  18. Discovery of multiple interacting partners of gankyrin, a proteasomal chaperone and an oncoprotein--evidence for a common hot spot site at the interface and its functional relevance.

    PubMed

    Nanaware, Padma P; Ramteke, Manoj P; Somavarapu, Arun K; Venkatraman, Prasanna

    2014-07-01

    Gankyrin, a non-ATPase component of the proteasome and a chaperone of proteasome assembly, is also an oncoprotein. Gankyrin regulates a variety of oncogenic signaling pathways in cancer cells and accelerates degradation of tumor suppressor proteins p53 and Rb. Therefore gankyrin may be a unique hub integrating signaling networks with the degradation pathway. To identify new interactions that may be crucial in consolidating its role as an oncogenic hub, crystal structure of gankyrin-proteasome ATPase complex was used to predict novel interacting partners. EEVD, a four amino acid linear sequence seems a hot spot site at this interface. By searching for EEVD in exposed regions of human proteins in PDB database, we predicted 34 novel interactions. Eight proteins were tested and seven of them were found to interact with gankyrin. Affinity of four interactions is high enough for endogenous detection. Others require gankyrin overexpression in HEK 293 cells or occur endogenously in breast cancer cell line- MDA-MB-435, reflecting lower affinity or presence of a deregulated network. Mutagenesis and peptide inhibition confirm that EEVD is the common hot spot site at these interfaces and therefore a potential polypharmacological drug target. In MDA-MB-231 cells in which the endogenous CLIC1 is silenced, trans-expression of Wt protein (CLIC1_EEVD) and not the hot spot site mutant (CLIC1_AAVA) resulted in significant rescue of the migratory potential. Our approach can be extended to identify novel functionally relevant protein-protein interactions, in expansion of oncogenic networks and in identifying potential therapeutic targets. © 2013 Wiley Periodicals, Inc.

  19. Visualization of protein interaction networks: problems and solutions

    PubMed Central

    2013-01-01

    Background Visualization concerns the representation of data visually and is an important task in scientific research. Protein-protein interactions (PPI) are discovered using either wet lab techniques, such mass spectrometry, or in silico predictions tools, resulting in large collections of interactions stored in specialized databases. The set of all interactions of an organism forms a protein-protein interaction network (PIN) and is an important tool for studying the behaviour of the cell machinery. Since graphic representation of PINs may highlight important substructures, e.g. protein complexes, visualization is more and more used to study the underlying graph structure of PINs. Although graphs are well known data structures, there are different open problems regarding PINs visualization: the high number of nodes and connections, the heterogeneity of nodes (proteins) and edges (interactions), the possibility to annotate proteins and interactions with biological information extracted by ontologies (e.g. Gene Ontology) that enriches the PINs with semantic information, but complicates their visualization. Methods In these last years many software tools for the visualization of PINs have been developed. Initially thought for visualization only, some of them have been successively enriched with new functions for PPI data management and PIN analysis. The paper analyzes the main software tools for PINs visualization considering four main criteria: (i) technology, i.e. availability/license of the software and supported OS (Operating System) platforms; (ii) interoperability, i.e. ability to import/export networks in various formats, ability to export data in a graphic format, extensibility of the system, e.g. through plug-ins; (iii) visualization, i.e. supported layout and rendering algorithms and availability of parallel implementation; (iv) analysis, i.e. availability of network analysis functions, such as clustering or mining of the graph, and the possibility to interact with external databases. Results Currently, many tools are available and it is not easy for the users choosing one of them. Some tools offer sophisticated 2D and 3D network visualization making available many layout algorithms, others tools are more data-oriented and support integration of interaction data coming from different sources and data annotation. Finally, some specialistic tools are dedicated to the analysis of pathways and cellular processes and are oriented toward systems biology studies, where the dynamic aspects of the processes being studied are central. Conclusion A current trend is the deployment of open, extensible visualization tools (e.g. Cytoscape), that may be incrementally enriched by the interactomics community with novel and more powerful functions for PIN analysis, through the development of plug-ins. On the other hand, another emerging trend regards the efficient and parallel implementation of the visualization engine that may provide high interactivity and near real-time response time, as in NAViGaTOR. From a technological point of view, open-source, free and extensible tools, like Cytoscape, guarantee a long term sustainability due to the largeness of the developers and users communities, and provide a great flexibility since new functions are continuously added by the developer community through new plug-ins, but the emerging parallel, often closed-source tools like NAViGaTOR, can offer near real-time response time also in the analysis of very huge PINs. PMID:23368786

  20. GeNNet: an integrated platform for unifying scientific workflows and graph databases for transcriptome data analysis

    PubMed Central

    Gadelha, Luiz; Ribeiro-Alves, Marcelo; Porto, Fábio

    2017-01-01

    There are many steps in analyzing transcriptome data, from the acquisition of raw data to the selection of a subset of representative genes that explain a scientific hypothesis. The data produced can be represented as networks of interactions among genes and these may additionally be integrated with other biological databases, such as Protein-Protein Interactions, transcription factors and gene annotation. However, the results of these analyses remain fragmented, imposing difficulties, either for posterior inspection of results, or for meta-analysis by the incorporation of new related data. Integrating databases and tools into scientific workflows, orchestrating their execution, and managing the resulting data and its respective metadata are challenging tasks. Additionally, a great amount of effort is equally required to run in-silico experiments to structure and compose the information as needed for analysis. Different programs may need to be applied and different files are produced during the experiment cycle. In this context, the availability of a platform supporting experiment execution is paramount. We present GeNNet, an integrated transcriptome analysis platform that unifies scientific workflows with graph databases for selecting relevant genes according to the evaluated biological systems. It includes GeNNet-Wf, a scientific workflow that pre-loads biological data, pre-processes raw microarray data and conducts a series of analyses including normalization, differential expression inference, clusterization and gene set enrichment analysis. A user-friendly web interface, GeNNet-Web, allows for setting parameters, executing, and visualizing the results of GeNNet-Wf executions. To demonstrate the features of GeNNet, we performed case studies with data retrieved from GEO, particularly using a single-factor experiment in different analysis scenarios. As a result, we obtained differentially expressed genes for which biological functions were analyzed. The results are integrated into GeNNet-DB, a database about genes, clusters, experiments and their properties and relationships. The resulting graph database is explored with queries that demonstrate the expressiveness of this data model for reasoning about gene interaction networks. GeNNet is the first platform to integrate the analytical process of transcriptome data with graph databases. It provides a comprehensive set of tools that would otherwise be challenging for non-expert users to install and use. Developers can add new functionality to components of GeNNet. The derived data allows for testing previous hypotheses about an experiment and exploring new ones through the interactive graph database environment. It enables the analysis of different data on humans, rhesus, mice and rat coming from Affymetrix platforms. GeNNet is available as an open source platform at https://github.com/raquele/GeNNet and can be retrieved as a software container with the command docker pull quelopes/gennet. PMID:28695067

  1. GeNNet: an integrated platform for unifying scientific workflows and graph databases for transcriptome data analysis.

    PubMed

    Costa, Raquel L; Gadelha, Luiz; Ribeiro-Alves, Marcelo; Porto, Fábio

    2017-01-01

    There are many steps in analyzing transcriptome data, from the acquisition of raw data to the selection of a subset of representative genes that explain a scientific hypothesis. The data produced can be represented as networks of interactions among genes and these may additionally be integrated with other biological databases, such as Protein-Protein Interactions, transcription factors and gene annotation. However, the results of these analyses remain fragmented, imposing difficulties, either for posterior inspection of results, or for meta-analysis by the incorporation of new related data. Integrating databases and tools into scientific workflows, orchestrating their execution, and managing the resulting data and its respective metadata are challenging tasks. Additionally, a great amount of effort is equally required to run in-silico experiments to structure and compose the information as needed for analysis. Different programs may need to be applied and different files are produced during the experiment cycle. In this context, the availability of a platform supporting experiment execution is paramount. We present GeNNet, an integrated transcriptome analysis platform that unifies scientific workflows with graph databases for selecting relevant genes according to the evaluated biological systems. It includes GeNNet-Wf, a scientific workflow that pre-loads biological data, pre-processes raw microarray data and conducts a series of analyses including normalization, differential expression inference, clusterization and gene set enrichment analysis. A user-friendly web interface, GeNNet-Web, allows for setting parameters, executing, and visualizing the results of GeNNet-Wf executions. To demonstrate the features of GeNNet, we performed case studies with data retrieved from GEO, particularly using a single-factor experiment in different analysis scenarios. As a result, we obtained differentially expressed genes for which biological functions were analyzed. The results are integrated into GeNNet-DB, a database about genes, clusters, experiments and their properties and relationships. The resulting graph database is explored with queries that demonstrate the expressiveness of this data model for reasoning about gene interaction networks. GeNNet is the first platform to integrate the analytical process of transcriptome data with graph databases. It provides a comprehensive set of tools that would otherwise be challenging for non-expert users to install and use. Developers can add new functionality to components of GeNNet. The derived data allows for testing previous hypotheses about an experiment and exploring new ones through the interactive graph database environment. It enables the analysis of different data on humans, rhesus, mice and rat coming from Affymetrix platforms. GeNNet is available as an open source platform at https://github.com/raquele/GeNNet and can be retrieved as a software container with the command docker pull quelopes/gennet.

  2. Competing endogenous RNA and interactome bioinformatic analyses on human telomerase.

    PubMed

    Arancio, Walter; Pizzolanti, Giuseppe; Genovese, Swonild Ilenia; Baiamonte, Concetta; Giordano, Carla

    2014-04-01

    We present a classic interactome bioinformatic analysis and a study on competing endogenous (ce) RNAs for hTERT. The hTERT gene codes for the catalytic subunit and limiting component of the human telomerase complex. Human telomerase reverse transcriptase (hTERT) is essential for the integrity of telomeres. Telomere dysfunctions have been widely reported to be involved in aging, cancer, and cellular senescence. The hTERT gene network has been analyzed using the BioGRID interaction database (http://thebiogrid.org/) and related analysis tools such as Osprey (http://biodata.mshri.on.ca/osprey/servlet/Index) and GeneMANIA (http://genemania.org/). The network of interaction of hTERT transcripts has been further analyzed following the competing endogenous (ce) RNA hypotheses (messenger [m] RNAs cross-talk via micro [mi] RNAs) using the miRWalk database and tools (www.ma.uni-heidelberg.de/apps/zmf/mirwalk/). These analyses suggest a role for Akt, nuclear factor-κB (NF-κB), heat shock protein 90 (HSP90), p70/p80 autoantigen, 14-3-3 proteins, and dynein in telomere functions. Roles for histone acetylation/deacetylation and proteoglycan metabolism are also proposed.

  3. Prioritizing PubMed articles for the Comparative Toxicogenomic Database utilizing semantic information.

    PubMed

    Kim, Sun; Kim, Won; Wei, Chih-Hsuan; Lu, Zhiyong; Wilbur, W John

    2012-01-01

    The Comparative Toxicogenomics Database (CTD) contains manually curated literature that describes chemical-gene interactions, chemical-disease relationships and gene-disease relationships. Finding articles containing this information is the first and an important step to assist manual curation efficiency. However, the complex nature of named entities and their relationships make it challenging to choose relevant articles. In this article, we introduce a machine learning framework for prioritizing CTD-relevant articles based on our prior system for the protein-protein interaction article classification task in BioCreative III. To address new challenges in the CTD task, we explore a new entity identification method for genes, chemicals and diseases. In addition, latent topics are analyzed and used as a feature type to overcome the small size of the training set. Applied to the BioCreative 2012 Triage dataset, our method achieved 0.8030 mean average precision (MAP) in the official runs, resulting in the top MAP system among participants. Integrated with PubTator, a Web interface for annotating biomedical literature, the proposed system also received a positive review from the CTD curation team.

  4. The Interactomic Analysis Reveals Pathogenic Protein Networks in Phomopsis longicolla Underlying Seed Decay of Soybean.

    PubMed

    Li, Shuxian; Musungu, Bryan; Lightfoot, David; Ji, Pingsheng

    2018-01-01

    Phomopsis longicolla T. W. Hobbs (syn. Diaporthe longicolla ) is the primary cause of Phomopsis seed decay (PSD) in soybean, Glycine max (L.) Merrill. This disease results in poor seed quality and is one of the most economically important seed diseases in soybean. The objectives of this study were to infer protein-protein interactions (PPI) and to identify conserved global networks and pathogenicity subnetworks in P. longicolla including orthologous pathways for cell signaling and pathogenesis. The interlog method used in the study identified 215,255 unique PPIs among 3,868 proteins. There were 1,414 pathogenicity related genes in P. longicolla identified using the pathogen host interaction (PHI) database. Additionally, 149 plant cell wall degrading enzymes (PCWDE) were detected. The network captured five different classes of carbohydrate degrading enzymes, including the auxiliary activities, carbohydrate esterases, glycoside hydrolases, glycosyl transferases, and carbohydrate binding molecules. From the PPI analysis, novel interacting partners were determined for each of the PCWDE classes. The most predominant class of PCWDE was a group of 60 glycoside hydrolases proteins. The glycoside hydrolase subnetwork was found to be interacting with 1,442 proteins within the network and was among the largest clusters. The orthologous proteins FUS3, HOG, CYP1, SGE1, and the g5566t.1 gene identified in this study could play an important role in pathogenicity. Therefore, the P. longicolla protein interactome (PiPhom) generated in this study can lead to a better understanding of PPIs in soybean pathogens. Furthermore, the PPI may aid in targeting of genes and proteins for further studies of the pathogenicity mechanisms.

  5. In Silico Investigation of Traditional Chinese Medicine Compounds to Inhibit Human Histone Deacetylase 2 for Patients with Alzheimer's Disease

    PubMed Central

    Hung, Tzu-Chieh; Lee, Wen-Yuan; Chen, Kuen-Bao; Chan, Yueh-Chiu; Lee, Cheng-Chun

    2014-01-01

    Human histone deacetylase 2 (HDAC2) has been identified as being associated with Alzheimer's disease (AD), a neuropathic degenerative disease. In this study, we screen the world's largest Traditional Chinese Medicine (TCM) database for natural compounds that may be useful as lead compounds in the search for inhibitors of HDAC2 function. The technique of molecular docking was employed to select the ten top TCM candidates. We used three prediction models, multiple linear regression (MLR), support vector machine (SVM), and the Bayes network toolbox (BNT), to predict the bioactivity of the TCM candidates. Molecular dynamics simulation provides the protein-ligand interactions of compounds. The bioactivity predictions of pIC50 values suggest that the TCM candidatesm, (−)-Bontl ferulate, monomethylcurcumin, and ningposides C, have a greater effect on HDAC2 inhibition. The structure variation caused by the hydrogen bonds and hydrophobic interactions between protein-ligand interactions indicates that these compounds have an inhibitory effect on the protein. PMID:25045700

  6. Signaling gateway molecule pages—a data model perspective

    PubMed Central

    Dinasarapu, Ashok Reddy; Saunders, Brian; Ozerlat, Iley; Azam, Kenan; Subramaniam, Shankar

    2011-01-01

    Summary: The Signaling Gateway Molecule Pages (SGMP) database provides highly structured data on proteins which exist in different functional states participating in signal transduction pathways. A molecule page starts with a state of a native protein, without any modification and/or interactions. New states are formed with every post-translational modification or interaction with one or more proteins, small molecules or class molecules and with each change in cellular location. State transitions are caused by a combination of one or more modifications, interactions and translocations which then might be associated with one or more biological processes. In a characterized biological state, a molecule can function as one of several entities or their combinations, including channel, receptor, enzyme, transcription factor and transporter. We have also exported SGMP data to the Biological Pathway Exchange (BioPAX) and Systems Biology Markup Language (SBML) as well as in our custom XML. Availability: SGMP is available at www.signaling-gateway.org/molecule. Contact: shankar@ucsd.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:21505029

  7. ProBiS-database: precalculated binding site similarities and local pairwise alignments of PDB structures.

    PubMed

    Konc, Janez; Cesnik, Tomo; Konc, Joanna Trykowska; Penca, Matej; Janežič, Dušanka

    2012-02-27

    ProBiS-Database is a searchable repository of precalculated local structural alignments in proteins detected by the ProBiS algorithm in the Protein Data Bank. Identification of functionally important binding regions of the protein is facilitated by structural similarity scores mapped to the query protein structure. PDB structures that have been aligned with a query protein may be rapidly retrieved from the ProBiS-Database, which is thus able to generate hypotheses concerning the roles of uncharacterized proteins. Presented with uncharacterized protein structure, ProBiS-Database can discern relationships between such a query protein and other better known proteins in the PDB. Fast access and a user-friendly graphical interface promote easy exploration of this database of over 420 million local structural alignments. The ProBiS-Database is updated weekly and is freely available online at http://probis.cmm.ki.si/database.

  8. Molecular Quantum Similarity, Chemical Reactivity and Database Screening of 3D Pharmacophores of the Protein Kinases A, B and G from Mycobacterium tuberculosis.

    PubMed

    Morales-Bayuelo, Alejandro

    2017-06-21

    Mycobacterium tuberculosis remains one of the world's most devastating pathogens. For this reason, we developed a study involving 3D pharmacophore searching, selectivity analysis and database screening for a series of anti-tuberculosis compounds, associated with the protein kinases A, B, and G. This theoretical study is expected to shed some light onto some molecular aspects that could contribute to the knowledge of the molecular mechanics behind interactions of these compounds, with anti-tuberculosis activity. Using the Molecular Quantum Similarity field and reactivity descriptors supported in the Density Functional Theory, it was possible to measure the quantification of the steric and electrostatic effects through the Overlap and Coulomb quantitative convergence (alpha and beta) scales. In addition, an analysis of reactivity indices using global and local descriptors was developed, identifying the binding sites and selectivity on these anti-tuberculosis compounds in the active sites. Finally, the reported pharmacophores to PKn A, B and G, were used to carry out database screening, using a database with anti-tuberculosis drugs from the Kelly Chibale research group (http://www.kellychibaleresearch.uct.ac.za/), to find the compounds with affinity for the specific protein targets associated with PKn A, B and G. In this regard, this hybrid methodology (Molecular Mechanic/Quantum Chemistry) shows new insights into drug design that may be useful in the tuberculosis treatment today.

  9. Comprehensive comparative analysis and identification of RNA-binding protein domains: multi-class classification and feature selection.

    PubMed

    Jahandideh, Samad; Srinivasasainagendra, Vinodh; Zhi, Degui

    2012-11-07

    RNA-protein interaction plays an important role in various cellular processes, such as protein synthesis, gene regulation, post-transcriptional gene regulation, alternative splicing, and infections by RNA viruses. In this study, using Gene Ontology Annotated (GOA) and Structural Classification of Proteins (SCOP) databases an automatic procedure was designed to capture structurally solved RNA-binding protein domains in different subclasses. Subsequently, we applied tuned multi-class SVM (TMCSVM), Random Forest (RF), and multi-class ℓ1/ℓq-regularized logistic regression (MCRLR) for analysis and classifying RNA-binding protein domains based on a comprehensive set of sequence and structural features. In this study, we compared prediction accuracy of three different state-of-the-art predictor methods. From our results, TMCSVM outperforms the other methods and suggests the potential of TMCSVM as a useful tool for facilitating the multi-class prediction of RNA-binding protein domains. On the other hand, MCRLR by elucidating importance of features for their contribution in predictive accuracy of RNA-binding protein domains subclasses, helps us to provide some biological insights into the roles of sequences and structures in protein-RNA interactions.

  10. De novo Transcriptome Analysis of Rhizoctonia solani AG1 IA Strain Early Invasion in Zoysia japonica Root.

    PubMed

    Zhu, Chen; Ai, Lin; Wang, Li; Yin, Pingping; Liu, Chenglan; Li, Shanshan; Zeng, Huiming

    2016-01-01

    Zoysia japonica brown spot was caused by necrotrophic fungus Rhizoctonia solani invasion, which led to severe financial loss in city lawn and golf ground maintenance. However, little was known about the molecular mechanism of R. solani pathogenicity in Z. japonica. In this study we examined early stage interaction between R. solani AG1 IA strain and Z. japonica cultivar "Zenith" root by cell ultra-structure analysis, pathogenesis-related proteins assay and transcriptome analysis to explore molecular clues for AG1 IA strain pathogenicity in Z. japonica. No obvious cell structure damage was found in infected roots and most pathogenesis-related protein activities showedg a downward trend especially in 36 h post inoculation, which exhibits AG1 IA strain stealthy invasion characteristic. According to Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) database classification, most DEGs in infected "Zenith" roots dynamically changed especially in three aspects, signal transduction, gene translation, and protein synthesis. Total 3422 unigenes of "Zenith" root were predicted into 14 kinds of resistance (R) gene class. Potential fungal resistance related unigenes of "Zenith" root were involved in ligin biosynthesis, phytoalexin synthesis, oxidative burst, wax biosynthesis, while two down-regulated unigenes encoding leucine-rich repeat receptor protein kinase and subtilisin-like protease might be important for host-derived signal perception to AG1 IA strain invasion. According to Pathogen Host Interaction (PHI) database annotation, 1508 unigenes of AG1 IA strain were predicted and classified into 37 known pathogen species, in addition, unigenes encoding virulence, signaling, host stress tolerance, and potential effector were also predicted. This research uncovered transcriptional profiling during the early phase interaction between R. solani AG1 IA strain and Z. japonica, and will greatly help identify key pathogenicity of AG1 IA strain.

  11. An attempt to understand glioma stem cell biology through centrality analysis of a protein interaction network.

    PubMed

    Mallik, Mrinmay Kumar

    2018-02-07

    Biological networks can be analyzed using "Centrality Analysis" to identify the more influential nodes and interactions in the network. This study was undertaken to create and visualize a biological network comprising of protein-protein interactions (PPIs) amongst proteins which are preferentially over-expressed in glioma cancer stem cell component (GCSC) of glioblastomas as compared to the glioma non-stem cancer cell (GNSC) component and then to analyze this network through centrality analyses (CA) in order to identify the essential proteins in this network and their interactions. In addition, this study proposes a new centrality analysis method pertaining exclusively to transcription factors (TFs) and interactions amongst them. Moreover the relevant molecular functions, biological processes and biochemical pathways amongst these proteins were sought through enrichment analysis. A protein interaction network was created using a list of proteins which have been shown to be preferentially expressed or over-expressed in GCSCs isolated from glioblastomas as compared to the GNSCs. This list comprising of 38 proteins, created using manual literature mining, was submitted to the Reactome FIViz tool, a web based application integrated into Cytoscape, an open source software platform for visualizing and analyzing molecular interaction networks and biological pathways to produce the network. This network was subjected to centrality analyses utilizing ranked lists of six centrality measures using the FIViz application and (for the first time) a dedicated centrality analysis plug-in ; CytoNCA. The interactions exclusively amongst the transcription factors were nalyzed through a newly proposed centrality analysis method called "Gene Expression Associated Degree Centrality Analysis (GEADCA)". Enrichment analysis was performed using the "network function analysis" tool on Reactome. The CA was able to identify a small set of proteins with consistently high centrality ranks that is indicative of their strong influence in the protein protein interaction network. Similarly the newly proposed GEADCA helped identify the transcription factors with high centrality values indicative of their key roles in transcriptional regulation. The enrichment studies provided a list of molecular functions, biological processes and biochemical pathways associated with the constructed network. The study shows how pathway based databases may be used to create and analyze a relevant protein interaction network in glioma cancer stem cells and identify the essential elements within it to gather insights into the molecular interactions that regulate the properties of glioma stem cells. How these insights may be utilized to help the development of future research towards formulation of new management strategies have been discussed from a theoretical standpoint. Copyright © 2017 Elsevier Ltd. All rights reserved.

  12. SBION: A Program for Analyses of Salt-Bridges from Multiple Structure Files.

    PubMed

    Gupta, Parth Sarthi Sen; Mondal, Sudipta; Mondal, Buddhadev; Islam, Rifat Nawaz Ul; Banerjee, Shyamashree; Bandyopadhyay, Amal K

    2014-01-01

    Salt-bridge and network salt-bridge are specific electrostatic interactions that contribute to the overall stability of proteins. In hierarchical protein folding model, these interactions play crucial role in nucleation process. The advent and growth of protein structure database and its availability in public domain made an urgent need for context dependent rapid analysis of salt-bridges. While these analyses on single protein is cumbersome and time-consuming, batch analyses need efficient software for rapid topological scan of a large number of protein for extracting details on (i) fraction of salt-bridge residues (acidic and basic). (ii) Chain specific intra-molecular salt-bridges, (iii) inter-molecular salt-bridges (protein-protein interactions) in all possible binary combinations (iv) network salt-bridges and (v) secondary structure distribution of salt-bridge residues. To the best of our knowledge, such efficient software is not available in public domain. At this juncture, we have developed a program i.e. SBION which can perform all the above mentioned computations for any number of protein with any number of chain at any given distance of ion-pair. It is highly efficient, fast, error-free and user friendly. Finally we would say that our SBION indeed possesses potential for applications in the field of structural and comparative bioinformatics studies. SBION is freely available for non-commercial/academic institutions on formal request to the corresponding author (akbanerjee@biotech.buruniv.ac.in).

  13. Search for novel remedies to augment radiation resistance of inhabitants of Fukushima and Chernobyl disasters: identifying DNA repair protein XRCC4 inhibitors.

    PubMed

    Sun, Mao-Feng; Chen, Hsin-Yi; Tsai, Fuu-Jen; Lui, Shu-Hui; Chen, Chih-Yi; Chen, Calvin Yu-Chian

    2011-10-01

    Two nuclear plant disasters occurring within a span of 25 years threaten health and genome integrity both in Fukushima and Chernobyl. Search for remedies capable of enhancing DNA repair efficiency and radiation resistance in humans appears to be a urgent problem for now. XRCC4 is an important enhancer in promoting repair pathway triggered by DNA double-strand break (DSB). In the context of radiation therapy, active XRCC4 could reduce DSB-mediated apoptotic effect on cancer cells. Hence, developing XRCC4 inhibitors could possibly enhance radiotherapy outcomes. In this study, we screened traditional Chinese medicine (TCM) database, TCM Database@Taiwan, and have identified three potent inhibitor agents against XRCC4. Through molecular dynamics simulation, we have determined that the protein-ligand interactions were focused at Lys188 on chain A and Lys187 on chain B. Intriguingly, the hydrogen bonds for all three ligands fluctuated frequently but were held at close approximation. The pi-cation interactions and ionic interactions mediated by o-hydroxyphenyl and carboxyl functional groups respectively have been demonstrated to play critical roles in stabilizing binding conformations. Based on these results, we reported the identification of potential radiotherapy enhancers from TCM. We further characterized the key binding elements for inhibiting the XRCC4 activities.

  14. The protein interactome of collapsin response mediator protein-2 (CRMP2/DPYSL2) reveals novel partner proteins in brain tissue.

    PubMed

    Martins-de-Souza, Daniel; Cassoli, Juliana S; Nascimento, Juliana M; Hensley, Kenneth; Guest, Paul C; Pinzon-Velasco, Andres M; Turck, Christoph W

    2015-10-01

    Collapsin response mediator protein-2 (CRMP2) is a CNS protein involved in neuronal development, axonal and neuronal growth, cell migration, and protein trafficking. Recent studies have linked perturbations in CRMP2 function to neurodegenerative disorders such as Alzheimer's disease, neuropathic pain, and Batten disease, and to psychiatric disorders such as schizophrenia. Like most proteins, CRMP2 functions though interactions with a molecular network of proteins and other molecules. Here, we have attempted to identify additional proteins of the CRMP2 interactome to provide further leads about its roles in neurological functions. We used a combined co-immunoprecipitation and shotgun proteomic approach in order to identify CRMP2 protein partners. We identified 78 CRMP2 protein partners not previously reported in public protein interaction databases. These were involved in seven biological processes, which included cell signaling, growth, metabolism, trafficking, and immune function, according to Gene Ontology classifications. Furthermore, 32 different molecular functions were found to be associated with these proteins, such as RNA binding, ribosomal functions, transporter activity, receptor activity, serine/threonine phosphatase activity, cell adhesion, cytoskeletal protein binding and catalytic activity. In silico pathway interactome construction revealed a highly connected network with the most overrepresented functions corresponding to semaphorin interactions, along with axon guidance and WNT5A signaling. Taken together, these findings suggest that the CRMP2 pathway is critical for regulating neuronal and synaptic architecture. Further studies along these lines might uncover novel biomarkers and drug targets for use in drug discovery. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  15. Choosing an Optimal Database for Protein Identification from Tandem Mass Spectrometry Data.

    PubMed

    Kumar, Dhirendra; Yadav, Amit Kumar; Dash, Debasis

    2017-01-01

    Database searching is the preferred method for protein identification from digital spectra of mass to charge ratios (m/z) detected for protein samples through mass spectrometers. The search database is one of the major influencing factors in discovering proteins present in the sample and thus in deriving biological conclusions. In most cases the choice of search database is arbitrary. Here we describe common search databases used in proteomic studies and their impact on final list of identified proteins. We also elaborate upon factors like composition and size of the search database that can influence the protein identification process. In conclusion, we suggest that choice of the database depends on the type of inferences to be derived from proteomics data. However, making additional efforts to build a compact and concise database for a targeted question should generally be rewarding in achieving confident protein identifications.

  16. Autophagy Regulatory Network - a systems-level bioinformatics resource for studying the mechanism and regulation of autophagy.

    PubMed

    Türei, Dénes; Földvári-Nagy, László; Fazekas, Dávid; Módos, Dezső; Kubisch, János; Kadlecsik, Tamás; Demeter, Amanda; Lenti, Katalin; Csermely, Péter; Vellai, Tibor; Korcsmáros, Tamás

    2015-01-01

    Autophagy is a complex cellular process having multiple roles, depending on tissue, physiological, or pathological conditions. Major post-translational regulators of autophagy are well known, however, they have not yet been collected comprehensively. The precise and context-dependent regulation of autophagy necessitates additional regulators, including transcriptional and post-transcriptional components that are listed in various datasets. Prompted by the lack of systems-level autophagy-related information, we manually collected the literature and integrated external resources to gain a high coverage autophagy database. We developed an online resource, Autophagy Regulatory Network (ARN; http://autophagy-regulation.org), to provide an integrated and systems-level database for autophagy research. ARN contains manually curated, imported, and predicted interactions of autophagy components (1,485 proteins with 4,013 interactions) in humans. We listed 413 transcription factors and 386 miRNAs that could regulate autophagy components or their protein regulators. We also connected the above-mentioned autophagy components and regulators with signaling pathways from the SignaLink 2 resource. The user-friendly website of ARN allows researchers without computational background to search, browse, and download the database. The database can be downloaded in SQL, CSV, BioPAX, SBML, PSI-MI, and in a Cytoscape CYS file formats. ARN has the potential to facilitate the experimental validation of novel autophagy components and regulators. In addition, ARN helps the investigation of transcription factors, miRNAs and signaling pathways implicated in the control of the autophagic pathway. The list of such known and predicted regulators could be important in pharmacological attempts against cancer and neurodegenerative diseases.

  17. A Systems Biology Approach to Reveal Putative Host-Derived Biomarkers of Periodontitis by Network Topology Characterization of MMP-REDOX/NO and Apoptosis Integrated Pathways.

    PubMed

    Zeidán-Chuliá, Fares; Gürsoy, Mervi; Neves de Oliveira, Ben-Hur; Özdemir, Vural; Könönen, Eija; Gürsoy, Ulvi K

    2015-01-01

    Periodontitis, a formidable global health burden, is a common chronic disease that destroys tooth-supporting tissues. Biomarkers of the early phase of this progressive disease are of utmost importance for global health. In this context, saliva represents a non-invasive biosample. By using systems biology tools, we aimed to (1) identify an integrated interactome between matrix metalloproteinase (MMP)-REDOX/nitric oxide (NO) and apoptosis upstream pathways of periodontal inflammation, and (2) characterize the attendant topological network properties to uncover putative biomarkers to be tested in saliva from patients with periodontitis. Hence, we first generated a protein-protein network model of interactions ("BIOMARK" interactome) by using the STRING 10 database, a search tool for the retrieval of interacting genes/proteins, with "Experiments" and "Databases" as input options and a confidence score of 0.400. Second, we determined the centrality values (closeness, stress, degree or connectivity, and betweenness) for the "BIOMARK" members by using the Cytoscape software. We found Ubiquitin C (UBC), Jun proto-oncogene (JUN), and matrix metalloproteinase-14 (MMP14) as the most central hub- and non-hub-bottlenecks among the 211 genes/proteins of the whole interactome. We conclude that UBC, JUN, and MMP14 are likely an optimal candidate group of host-derived biomarkers, in combination with oral pathogenic bacteria-derived proteins, for detecting periodontitis at its early phase by using salivary samples from patients. These findings therefore have broader relevance for systems medicine in global health as well.

  18. RStrucFam: a web server to associate structure and cognate RNA for RNA-binding proteins from sequence information.

    PubMed

    Ghosh, Pritha; Mathew, Oommen K; Sowdhamini, Ramanathan

    2016-10-07

    RNA-binding proteins (RBPs) interact with their cognate RNA(s) to form large biomolecular assemblies. They are versatile in their functionality and are involved in a myriad of processes inside the cell. RBPs with similar structural features and common biological functions are grouped together into families and superfamilies. It will be useful to obtain an early understanding and association of RNA-binding property of sequences of gene products. Here, we report a web server, RStrucFam, to predict the structure, type of cognate RNA(s) and function(s) of proteins, where possible, from mere sequence information. The web server employs Hidden Markov Model scan (hmmscan) to enable association to a back-end database of structural and sequence families. The database (HMMRBP) comprises of 437 HMMs of RBP families of known structure that have been generated using structure-based sequence alignments and 746 sequence-centric RBP family HMMs. The input protein sequence is associated with structural or sequence domain families, if structure or sequence signatures exist. In case of association of the protein with a family of known structures, output features like, multiple structure-based sequence alignment (MSSA) of the query with all others members of that family is provided. Further, cognate RNA partner(s) for that protein, Gene Ontology (GO) annotations, if any and a homology model of the protein can be obtained. The users can also browse through the database for details pertaining to each family, protein or RNA and their related information based on keyword search or RNA motif search. RStrucFam is a web server that exploits structurally conserved features of RBPs, derived from known family members and imprinted in mathematical profiles, to predict putative RBPs from sequence information. Proteins that fail to associate with such structure-centric families are further queried against the sequence-centric RBP family HMMs in the HMMRBP database. Further, all other essential information pertaining to an RBP, like overall function annotations, are provided. The web server can be accessed at the following link: http://caps.ncbs.res.in/rstrucfam .

  19. The Interactomic Analysis Reveals Pathogenic Protein Networks in Phomopsis longicolla Underlying Seed Decay of Soybean

    PubMed Central

    Li, Shuxian; Musungu, Bryan; Lightfoot, David; Ji, Pingsheng

    2018-01-01

    Phomopsis longicolla T. W. Hobbs (syn. Diaporthe longicolla) is the primary cause of Phomopsis seed decay (PSD) in soybean, Glycine max (L.) Merrill. This disease results in poor seed quality and is one of the most economically important seed diseases in soybean. The objectives of this study were to infer protein–protein interactions (PPI) and to identify conserved global networks and pathogenicity subnetworks in P. longicolla including orthologous pathways for cell signaling and pathogenesis. The interlog method used in the study identified 215,255 unique PPIs among 3,868 proteins. There were 1,414 pathogenicity related genes in P. longicolla identified using the pathogen host interaction (PHI) database. Additionally, 149 plant cell wall degrading enzymes (PCWDE) were detected. The network captured five different classes of carbohydrate degrading enzymes, including the auxiliary activities, carbohydrate esterases, glycoside hydrolases, glycosyl transferases, and carbohydrate binding molecules. From the PPI analysis, novel interacting partners were determined for each of the PCWDE classes. The most predominant class of PCWDE was a group of 60 glycoside hydrolases proteins. The glycoside hydrolase subnetwork was found to be interacting with 1,442 proteins within the network and was among the largest clusters. The orthologous proteins FUS3, HOG, CYP1, SGE1, and the g5566t.1 gene identified in this study could play an important role in pathogenicity. Therefore, the P. longicolla protein interactome (PiPhom) generated in this study can lead to a better understanding of PPIs in soybean pathogens. Furthermore, the PPI may aid in targeting of genes and proteins for further studies of the pathogenicity mechanisms. PMID:29666630

  20. Predicting Drug-Target Interactions for New Drug Compounds Using a Weighted Nearest Neighbor Profile.

    PubMed

    van Laarhoven, Twan; Marchiori, Elena

    2013-01-01

    In silico discovery of interactions between drug compounds and target proteins is of core importance for improving the efficiency of the laborious and costly experimental determination of drug-target interaction. Drug-target interaction data are available for many classes of pharmaceutically useful target proteins including enzymes, ion channels, GPCRs and nuclear receptors. However, current drug-target interaction databases contain a small number of drug-target pairs which are experimentally validated interactions. In particular, for some drug compounds (or targets) there is no available interaction. This motivates the need for developing methods that predict interacting pairs with high accuracy also for these 'new' drug compounds (or targets). We show that a simple weighted nearest neighbor procedure is highly effective for this task. We integrate this procedure into a recent machine learning method for drug-target interaction we developed in previous work. Results of experiments indicate that the resulting method predicts true interactions with high accuracy also for new drug compounds and achieves results comparable or better than those of recent state-of-the-art algorithms. Software is publicly available at http://cs.ru.nl/~tvanlaarhoven/drugtarget2013/.

  1. Kinetic parameters of cholinesterase interactions with organophosphates: retrieval and comparison tools available through ESTHER database: ESTerases, alpha/beta Hydrolase Enzymes and Relatives.

    PubMed

    Chatonnet, A; Hotelier, T; Cousin, X

    1999-05-14

    Cholinesterases are targets for organophosphorus compounds which are used as insecticides, chemical warfare agents and drugs for the treatment of disease such as glaucoma, or parasitic infections. The widespread use of these chemicals explains the growing of this area of research and the ever increasing number of sequences, structures, or biochemical data available. Future advances will depend upon effective management of existing information as well as upon creation of new knowledge. The ESTHER database goal is to facilitate retrieval and comparison of data about structure and function of proteins presenting the alpha/beta hydrolase fold. Protein engineering and in vitro production of enzymes allow direct comparison of biochemical parameters. Kinetic parameters of enzymatic reactions are now included in the database. These parameters can be searched and compared with a table construction tool. ESTHER can be reached through internet (http://www.ensam.inra.fr/cholinesterase). The full database or the specialised X-window Client-server system can be downloaded from our ftp server (ftp://ftp.toulouse.inra.fr./pub/esther). Forms can be used to send updates or corrections directly from the web.

  2. Mapping PDB chains to UniProtKB entries.

    PubMed

    Martin, Andrew C R

    2005-12-01

    UniProtKB/SwissProt is the main resource for detailed annotations of protein sequences. This database provides a jumping-off point to many other resources through the links it provides. Among others, these include other primary databases, secondary databases, the Gene Ontology and OMIM. While a large number of links are provided to Protein Data Bank (PDB) files, obtaining a regularly updated mapping between UniProtKB entries and PDB entries at the chain or residue level is not straightforward. In particular, there is no regularly updated resource which allows a UniProtKB/SwissProt entry to be identified for a given residue of a PDB file. We have created a completely automatically maintained database which maps PDB residues to residues in UniProtKB/SwissProt and UniProtKB/trEMBL entries. The protocol uses links from PDB to UniProtKB, from UniProtKB to PDB and a brute-force sequence scan to resolve PDB chains for which no annotated link is available. Finally the sequences from PDB and UniProtKB are aligned to obtain a residue-level mapping. The resource may be queried interactively or downloaded from http://www.bioinf.org.uk/pdbsws/.

  3. Using the underlying biological organization of the Mycobacterium tuberculosis functional network for protein function prediction.

    PubMed

    Mazandu, Gaston K; Mulder, Nicola J

    2012-07-01

    Despite ever-increasing amounts of sequence and functional genomics data, there is still a deficiency of functional annotation for many newly sequenced proteins. For Mycobacterium tuberculosis (MTB), more than half of its genome is still uncharacterized, which hampers the search for new drug targets within the bacterial pathogen and limits our understanding of its pathogenicity. As for many other genomes, the annotations of proteins in the MTB proteome were generally inferred from sequence homology, which is effective but its applicability has limitations. We have carried out large-scale biological data integration to produce an MTB protein functional interaction network. Protein functional relationships were extracted from the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) database, and additional functional interactions from microarray, sequence and protein signature data. The confidence level of protein relationships in the additional functional interaction data was evaluated using a dynamic data-driven scoring system. This functional network has been used to predict functions of uncharacterized proteins using Gene Ontology (GO) terms, and the semantic similarity between these terms measured using a state-of-the-art GO similarity metric. To achieve better trade-off between improvement of quality, genomic coverage and scalability, this prediction is done by observing the key principles driving the biological organization of the functional network. This study yields a new functionally characterized MTB strain CDC1551 proteome, consisting of 3804 and 3698 proteins out of 4195 with annotations in terms of the biological process and molecular function ontologies, respectively. These data can contribute to research into the Development of effective anti-tubercular drugs with novel biological mechanisms of action. Copyright © 2011 Elsevier B.V. All rights reserved.

  4. A Community Standard Format for the Representation of Protein Affinity Reagents*

    PubMed Central

    Gloriam, David E.; Orchard, Sandra; Bertinetti, Daniela; Björling, Erik; Bongcam-Rudloff, Erik; Borrebaeck, Carl A. K.; Bourbeillon, Julie; Bradbury, Andrew R. M.; de Daruvar, Antoine; Dübel, Stefan; Frank, Ronald; Gibson, Toby J.; Gold, Larry; Haslam, Niall; Herberg, Friedrich W.; Hiltke, Tara; Hoheisel, Jörg D.; Kerrien, Samuel; Koegl, Manfred; Konthur, Zoltán; Korn, Bernhard; Landegren, Ulf; Montecchi-Palazzi, Luisa; Palcy, Sandrine; Rodriguez, Henry; Schweinsberg, Sonja; Sievert, Volker; Stoevesandt, Oda; Taussig, Michael J.; Ueffing, Marius; Uhlén, Mathias; van der Maarel, Silvère; Wingren, Christer; Woollard, Peter; Sherman, David J.; Hermjakob, Henning

    2010-01-01

    Protein affinity reagents (PARs), most commonly antibodies, are essential reagents for protein characterization in basic research, biotechnology, and diagnostics as well as the fastest growing class of therapeutics. Large numbers of PARs are available commercially; however, their quality is often uncertain. In addition, currently available PARs cover only a fraction of the human proteome, and their cost is prohibitive for proteome scale applications. This situation has triggered several initiatives involving large scale generation and validation of antibodies, for example the Swedish Human Protein Atlas and the German Antibody Factory. Antibodies targeting specific subproteomes are being pursued by members of Human Proteome Organisation (plasma and liver proteome projects) and the United States National Cancer Institute (cancer-associated antigens). ProteomeBinders, a European consortium, aims to set up a resource of consistently quality-controlled protein-binding reagents for the whole human proteome. An ultimate PAR database resource would allow consumers to visit one on-line warehouse and find all available affinity reagents from different providers together with documentation that facilitates easy comparison of their cost and quality. However, in contrast to, for example, nucleotide databases among which data are synchronized between the major data providers, current PAR producers, quality control centers, and commercial companies all use incompatible formats, hindering data exchange. Here we propose Proteomics Standards Initiative (PSI)-PAR as a global community standard format for the representation and exchange of protein affinity reagent data. The PSI-PAR format is maintained by the Human Proteome Organisation PSI and was developed within the context of ProteomeBinders by building on a mature proteomics standard format, PSI-molecular interaction, which is a widely accepted and established community standard for molecular interaction data. Further information and documentation are available on the PSI-PAR web site. PMID:19674966

  5. Improving compound-protein interaction prediction by building up highly credible negative samples.

    PubMed

    Liu, Hui; Sun, Jianjiang; Guan, Jihong; Zheng, Jie; Zhou, Shuigeng

    2015-06-15

    Computational prediction of compound-protein interactions (CPIs) is of great importance for drug design and development, as genome-scale experimental validation of CPIs is not only time-consuming but also prohibitively expensive. With the availability of an increasing number of validated interactions, the performance of computational prediction approaches is severely impended by the lack of reliable negative CPI samples. A systematic method of screening reliable negative sample becomes critical to improving the performance of in silico prediction methods. This article aims at building up a set of highly credible negative samples of CPIs via an in silico screening method. As most existing computational models assume that similar compounds are likely to interact with similar target proteins and achieve remarkable performance, it is rational to identify potential negative samples based on the converse negative proposition that the proteins dissimilar to every known/predicted target of a compound are not much likely to be targeted by the compound and vice versa. We integrated various resources, including chemical structures, chemical expression profiles and side effects of compounds, amino acid sequences, protein-protein interaction network and functional annotations of proteins, into a systematic screening framework. We first tested the screened negative samples on six classical classifiers, and all these classifiers achieved remarkably higher performance on our negative samples than on randomly generated negative samples for both human and Caenorhabditis elegans. We then verified the negative samples on three existing prediction models, including bipartite local model, Gaussian kernel profile and Bayesian matrix factorization, and found that the performances of these models are also significantly improved on the screened negative samples. Moreover, we validated the screened negative samples on a drug bioactivity dataset. Finally, we derived two sets of new interactions by training an support vector machine classifier on the positive interactions annotated in DrugBank and our screened negative interactions. The screened negative samples and the predicted interactions provide the research community with a useful resource for identifying new drug targets and a helpful supplement to the current curated compound-protein databases. Supplementary files are available at: http://admis.fudan.edu.cn/negative-cpi/. © The Author 2015. Published by Oxford University Press.

  6. FunRich proteomics software analysis, let the fun begin!

    PubMed

    Benito-Martin, Alberto; Peinado, Héctor

    2015-08-01

    Protein MS analysis is the preferred method for unbiased protein identification. It is normally applied to a large number of both small-scale and high-throughput studies. However, user-friendly computational tools for protein analysis are still needed. In this issue, Mathivanan and colleagues (Proteomics 2015, 15, 2597-2601) report the development of FunRich software, an open-access software that facilitates the analysis of proteomics data, providing tools for functional enrichment and interaction network analysis of genes and proteins. FunRich is a reinterpretation of proteomic software, a standalone tool combining ease of use with customizable databases, free access, and graphical representations. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  7. Update of the androgen receptor gene mutations database.

    PubMed

    Gottlieb, B; Beitel, L K; Lumbroso, R; Pinsky, L; Trifiro, M

    1999-01-01

    The current version of the androgen receptor (AR) gene mutations database is described. The total number of reported mutations has risen from 309 to 374 during the past year. We have expanded the database by adding information on AR-interacting proteins; and we have improved the database by identifying those mutation entries that have been updated. Mutations of unknown significance have now been reported in both the 5' and 3' untranslated regions of the AR gene, and in individuals who are somatic mosaics constitutionally. In addition, single nucleotide polymorphisms, including silent mutations, have been discovered in normal individuals and in individuals with male infertility. A mutation hotspot associated with prostatic cancer has been identified in exon 5. The database is available on the internet (http://www.mcgill.ca/androgendb/), from EMBL-European Bioinformatics Institute (ftp.ebi.ac.uk/pub/databases/androgen), or as a Macintosh FilemakerPro or Word file (MC33@musica.mcgill.ca). Copyright 1999 Wiley-Liss, Inc.

  8. Organizing, exploring, and analyzing antibody sequence data: the case for relational-database managers.

    PubMed

    Owens, John

    2009-01-01

    Technological advances in the acquisition of DNA and protein sequence information and the resulting onrush of data can quickly overwhelm the scientist unprepared for the volume of information that must be evaluated and carefully dissected to discover its significance. Few laboratories have the luxury of dedicated personnel to organize, analyze, or consistently record a mix of arriving sequence data. A methodology based on a modern relational-database manager is presented that is both a natural storage vessel for antibody sequence information and a conduit for organizing and exploring sequence data and accompanying annotation text. The expertise necessary to implement such a plan is equal to that required by electronic word processors or spreadsheet applications. Antibody sequence projects maintained as independent databases are selectively unified by the relational-database manager into larger database families that contribute to local analyses, reports, interactive HTML pages, or exported to facilities dedicated to sophisticated sequence analysis techniques. Database files are transposable among current versions of Microsoft, Macintosh, and UNIX operating systems.

  9. Genome-Wide Prediction and Validation of Peptides That Bind Human Prosurvival Bcl-2 Proteins

    PubMed Central

    DeBartolo, Joe; Taipale, Mikko; Keating, Amy E.

    2014-01-01

    Programmed cell death is regulated by interactions between pro-apoptotic and prosurvival members of the Bcl-2 family. Pro-apoptotic family members contain a weakly conserved BH3 motif that can adopt an alpha-helical structure and bind to a groove on prosurvival partners Bcl-xL, Bcl-w, Bcl-2, Mcl-1 and Bfl-1. Peptides corresponding to roughly 13 reported BH3 motifs have been verified to bind in this manner. Due to their short lengths and low sequence conservation, BH3 motifs are not detected using standard sequence-based bioinformatics approaches. Thus, it is possible that many additional proteins harbor BH3-like sequences that can mediate interactions with the Bcl-2 family. In this work, we used structure-based and data-based Bcl-2 interaction models to find new BH3-like peptides in the human proteome. We used peptide SPOT arrays to test candidate peptides for interaction with one or more of the prosurvival proteins Bcl-xL, Bcl-w, Bcl-2, Mcl-1 and Bfl-1. For the 36 most promising array candidates, we quantified binding to all five human receptors using direct and competition binding assays in solution. All 36 peptides showed evidence of interaction with at least one prosurvival protein, and 22 peptides bound at least one prosurvival protein with a dissociation constant between 1 and 500 nM; many peptides had specificity profiles not previously observed. We also screened the full-length parent proteins of a subset of array-tested peptides for binding to Bcl-xL and Mcl-1. Finally, we used the peptide binding data, in conjunction with previously reported interactions, to assess the affinity and specificity prediction performance of different models. PMID:24967846

  10. Characterization of a Protein Interactome by Co-Immunoprecipitation and Shotgun Mass Spectrometry.

    PubMed

    Maccarrone, Giuseppina; Bonfiglio, Juan Jose; Silberstein, Susana; Turck, Christoph W; Martins-de-Souza, Daniel

    2017-01-01

    Identifying the partners of a given protein (the interactome) may provide leads about the protein's function and the molecular mechanisms in which it is involved. One of the alternative strategies used to characterize protein interactomes consists of co-immunoprecipitation (co-IP) followed by shotgun mass spectrometry. This enables the isolation and identification of a protein target in its native state and its interactome from cells or tissue lysates under physiological conditions. In this chapter, we describe a co-IP protocol for interactome studies that uses an antibody against a protein of interest bound to protein A/G plus agarose beads to isolate a protein complex. The interacting proteins may be further fractionated by SDS-PAGE, followed by in-gel tryptic digestion and nano liquid chromatography high-resolution tandem mass spectrometry (nLC ESI-MS/MS) for identification purposes. The computational tools, strategy for protein identification, and use of interactome databases also will be described.

  11. Identification of key genes and pathways associated with neuropathic pain in uninjured dorsal root ganglion by using bioinformatic analysis.

    PubMed

    Chen, Chao-Jin; Liu, De-Zhao; Yao, Wei-Feng; Gu, Yu; Huang, Fei; Hei, Zi-Qing; Li, Xiang

    2017-01-01

    Neuropathic pain is a complex chronic condition occurring post-nervous system damage. The transcriptional reprogramming of injured dorsal root ganglia (DRGs) drives neuropathic pain. However, few comparative analyses using high-throughput platforms have investigated uninjured DRG in neuropathic pain, and potential interactions among differentially expressed genes (DEGs) and pathways were not taken into consideration. The aim of this study was to identify changes in genes and pathways associated with neuropathic pain in uninjured L4 DRG after L5 spinal nerve ligation (SNL) by using bioinformatic analysis. The microarray profile GSE24982 was downloaded from the Gene Expression Omnibus database to identify DEGs between DRGs in SNL and sham rats. The prioritization for these DEGs was performed using the Toppgene database followed by gene ontology and pathway enrichment analyses. The relationships among DEGs from the protein interactive perspective were analyzed using protein-protein interaction (PPI) network and module analysis. Real-time polymerase chain reaction (PCR) and Western blotting were used to confirm the expression of DEGs in the rodent neuropathic pain model. A total of 206 DEGs that might play a role in neuropathic pain were identified in L4 DRG, of which 75 were upregulated and 131 were downregulated. The upregulated DEGs were enriched in biological processes related to transcription regulation and molecular functions such as DNA binding, cell cycle, and the FoxO signaling pathway. Ctnnb1 protein had the highest connectivity degrees in the PPI network. The in vivo studies also validated that mRNA and protein levels of Ctnnb1 were upregulated in both L4 and L5 DRGs. This study provides insight into the functional gene sets and pathways associated with neuropathic pain in L4 uninjured DRG after L5 SNL, which might promote our understanding of the molecular mechanisms underlying the development of neuropathic pain.

  12. MIPS: a database for genomes and protein sequences.

    PubMed Central

    Mewes, H W; Heumann, K; Kaps, A; Mayer, K; Pfeiffer, F; Stocker, S; Frishman, D

    1999-01-01

    The Munich Information Center for Protein Sequences (MIPS-GSF), Martinsried near Munich, Germany, develops and maintains genome oriented databases. It is commonplace that the amount of sequence data available increases rapidly, but not the capacity of qualified manual annotation at the sequence databases. Therefore, our strategy aims to cope with the data stream by the comprehensive application of analysis tools to sequences of complete genomes, the systematic classification of protein sequences and the active support of sequence analysis and functional genomics projects. This report describes the systematic and up-to-date analysis of genomes (PEDANT), a comprehensive database of the yeast genome (MYGD), a database reflecting the progress in sequencing the Arabidopsis thaliana genome (MATD), the database of assembled, annotated human EST clusters (MEST), and the collection of protein sequence data within the framework of the PIR-International Protein Sequence Database (described elsewhere in this volume). MIPS provides access through its WWW server (http://www.mips.biochem.mpg.de) to a spectrum of generic databases, including the above mentioned as well as a database of protein families (PROTFAM), the MITOP database, and the all-against-all FASTA database. PMID:9847138

  13. Drug-Target Interactions: Prediction Methods and Applications.

    PubMed

    Anusuya, Shanmugam; Kesherwani, Manish; Priya, K Vishnu; Vimala, Antonydhason; Shanmugam, Gnanendra; Velmurugan, Devadasan; Gromiha, M Michael

    2018-01-01

    Identifying the interactions between drugs and target proteins is a key step in drug discovery. This not only aids to understand the disease mechanism, but also helps to identify unexpected therapeutic activity or adverse side effects of drugs. Hence, drug-target interaction prediction becomes an essential tool in the field of drug repurposing. The availability of heterogeneous biological data on known drug-target interactions enabled many researchers to develop various computational methods to decipher unknown drug-target interactions. This review provides an overview on these computational methods for predicting drug-target interactions along with available webservers and databases for drug-target interactions. Further, the applicability of drug-target interactions in various diseases for identifying lead compounds has been outlined. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  14. PCOGR: phylogenetic COG ranking as an online tool to judge the specificity of COGs with respect to freely definable groups of organisms.

    PubMed

    Meereis, Florian; Kaufmann, Michael

    2004-10-15

    The rapidly increasing number of completely sequenced genomes led to the establishment of the COG-database which, based on sequence homologies, assigns similar proteins from different organisms to clusters of orthologous groups (COGs). There are several bioinformatic studies that made use of this database to determine (hyper)thermophile-specific proteins by searching for COGs containing (almost) exclusively proteins from (hyper)thermophilic genomes. However, public software to perform individually definable group-specific searches is not available. The tool described here exactly fills this gap. The software is accessible at http://www.uni-wh.de/pcogr and is linked to the COG-database. The user can freely define two groups of organisms by selecting for each of the (current) 66 organisms to belong either to groupA, to the reference groupB or to be ignored by the algorithm. Then, for all COGs a specificity index is calculated with respect to the specificity to groupA, i. e. high scoring COGs contain proteins from the most of groupA organisms while proteins from the most organisms assigned to groupB are absent. In addition to ranking all COGs according to the user defined specificity criteria, a graphical visualization shows the distribution of all COGs by displaying their abundance as a function of their specificity indexes. This software allows detecting COGs specific to a predefined group of organisms. All COGs are ranked in the order of their specificity and a graphical visualization allows recognizing (i) the presence and abundance of such COGs and (ii) the phylogenetic relationship between groupA- and groupB-organisms. The software also allows detecting putative protein-protein interactions, novel enzymes involved in only partially known biochemical pathways, and alternate enzymes originated by convergent evolution.

  15. Inferring the Brassica rapa Interactome Using Protein–Protein Interaction Data from Arabidopsis thaliana

    PubMed Central

    Yang, Jianhua; Osman, Kim; Iqbal, Mudassar; Stekel, Dov J.; Luo, Zewei; Armstrong, Susan J.; Franklin, F. Chris H.

    2013-01-01

    Following successful completion of the Brassica rapa sequencing project, the next step is to investigate functions of individual genes/proteins. For Arabidopsis thaliana, large amounts of protein–protein interaction (PPI) data are available from the major PPI databases (DBs). It is known that Brassica crop species are closely related to A. thaliana. This provides an opportunity to infer the B. rapa interactome using PPI data available from A. thaliana. In this paper, we present an inferred B. rapa interactome that is based on the A. thaliana PPI data from two resources: (i) A. thaliana PPI data from three major DBs, BioGRID, IntAct, and TAIR. (ii) ortholog-based A. thaliana PPI predictions. Linking between B. rapa and A. thaliana was accomplished in three complementary ways: (i) ortholog predictions, (ii) identification of gene duplication based on synteny and collinearity, and (iii) BLAST sequence similarity search. A complementary approach was also applied, which used known/predicted domain–domain interaction data. Specifically, since the two species are closely related, we used PPI data from A. thaliana to predict interacting domains that might be conserved between the two species. The predicted interactome was investigated for the component that contains known A. thaliana meiotic proteins to demonstrate its usability. PMID:23293649

  16. Prior knowledge based mining functional modules from Yeast PPI networks with gene ontology

    PubMed Central

    2010-01-01

    Background In the literature, there are fruitful algorithmic approaches for identification functional modules in protein-protein interactions (PPI) networks. Because of accumulation of large-scale interaction data on multiple organisms and non-recording interaction data in the existing PPI database, it is still emergent to design novel computational techniques that can be able to correctly and scalably analyze interaction data sets. Indeed there are a number of large scale biological data sets providing indirect evidence for protein-protein interaction relationships. Results The main aim of this paper is to present a prior knowledge based mining strategy to identify functional modules from PPI networks with the aid of Gene Ontology. Higher similarity value in Gene Ontology means that two gene products are more functionally related to each other, so it is better to group such gene products into one functional module. We study (i) to encode the functional pairs into the existing PPI networks; and (ii) to use these functional pairs as pairwise constraints to supervise the existing functional module identification algorithms. Topology-based modularity metric and complex annotation in MIPs will be used to evaluate the identified functional modules by these two approaches. Conclusions The experimental results on Yeast PPI networks and GO have shown that the prior knowledge based learning methods perform better than the existing algorithms. PMID:21172053

  17. MaGnET: Malaria Genome Exploration Tool.

    PubMed

    Sharman, Joanna L; Gerloff, Dietlind L

    2013-09-15

    The Malaria Genome Exploration Tool (MaGnET) is a software tool enabling intuitive 'exploration-style' visualization of functional genomics data relating to the malaria parasite, Plasmodium falciparum. MaGnET provides innovative integrated graphic displays for different datasets, including genomic location of genes, mRNA expression data, protein-protein interactions and more. Any selection of genes to explore made by the user is easily carried over between the different viewers for different datasets, and can be changed interactively at any point (without returning to a search). Free online use (Java Web Start) or download (Java application archive and MySQL database; requires local MySQL installation) at http://malariagenomeexplorer.org joanna.sharman@ed.ac.uk or dgerloff@ffame.org Supplementary data are available at Bioinformatics online.

  18. The Halophile protein database.

    PubMed

    Sharma, Naveen; Farooqi, Mohammad Samir; Chaturvedi, Krishna Kumar; Lal, Shashi Bhushan; Grover, Monendra; Rai, Anil; Pandey, Pankaj

    2014-01-01

    Halophilic archaea/bacteria adapt to different salt concentration, namely extreme, moderate and low. These type of adaptations may occur as a result of modification of protein structure and other changes in different cell organelles. Thus proteins may play an important role in the adaptation of halophilic archaea/bacteria to saline conditions. The Halophile protein database (HProtDB) is a systematic attempt to document the biochemical and biophysical properties of proteins from halophilic archaea/bacteria which may be involved in adaptation of these organisms to saline conditions. In this database, various physicochemical properties such as molecular weight, theoretical pI, amino acid composition, atomic composition, estimated half-life, instability index, aliphatic index and grand average of hydropathicity (Gravy) have been listed. These physicochemical properties play an important role in identifying the protein structure, bonding pattern and function of the specific proteins. This database is comprehensive, manually curated, non-redundant catalogue of proteins. The database currently contains 59 897 proteins properties extracted from 21 different strains of halophilic archaea/bacteria. The database can be accessed through link. Database URL: http://webapp.cabgrid.res.in/protein/ © The Author(s) 2014. Published by Oxford University Press.

  19. Profiling lethal factor interacting proteins from human stomach using T7 phage display screening.

    PubMed

    Cardona-Correa, Albin; Rios-Velazquez, Carlos

    2016-05-01

    The anthrax lethal factor (LF) is a zinc dependent metalloproteinase that cleaves the majority of mitogen-activated protein kinase kinases and a member of NOD-like receptor proteins, inducing cell apoptosis. Despite efforts to fully understand the Bacillus anthracis toxin components, the gastrointestinal (GI) anthrax mechanisms have not been fully elucidated. Previous studies demonstrated gastric ulceration, and a substantial bacterial growth rate in Peyer's patches. However, the complete molecular pathways of the disease that results in tissue damage by LF proteolytic activity remains unclear. In the present study, to identify the profile of the proteins potentially involved in GI anthrax, protein‑protein interactions were investigated using human stomach T7 phage display (T7PD) cDNA libraries. T7PD is a high throughput technique that allows the expression of cloned DNA sequences as peptides on the phage surface, enabling the selection and identification of protein ligands. A wild type and mutant LF (E687A) were used to differentiate interaction sites. A total of 124 clones were identified from 194 interacting‑phages, at both the DNA and protein level, by in silico analysis. Databases revealed that the selected candidates were proteins from different families including lipase, peptidase‑A1 and cation transport families, among others. Furthermore, individual T7PD candidates were tested against LF in order to detect their specificity to the target molecule, resulting in 10 LF‑interacting peptides. With a minimum concentration of LF for interaction at 1 µg/ml, the T7PD isolated pepsin A3 pre‑protein (PAP) demonstrated affinity to both types of LF. In addition, PAP was isolated in various lengths for the same protein, exhibiting common regions following PRALINE alignment. These findings will help elucidate and improve the understanding of the molecular pathogenesis of GI anthrax, and aid in the development of potential therapeutic agents.

  20. Exploring of the molecular mechanism of rhinitis via bioinformatics methods

    PubMed Central

    Song, Yufen; Yan, Zhaohui

    2018-01-01

    The aim of this study was to analyze gene expression profiles for exploring the function and regulatory network of differentially expressed genes (DEGs) in pathogenesis of rhinitis by a bioinformatics method. The gene expression profile of GSE43523 was downloaded from the Gene Expression Omnibus database. The dataset contained 7 seasonal allergic rhinitis samples and 5 non-allergic normal samples. DEGs between rhinitis samples and normal samples were identified via the limma package of R. The webGestal database was used to identify enriched Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways of the DEGs. The differentially co-expressed pairs of the DEGs were identified via the DCGL package in R, and the differential co-expression network was constructed based on these pairs. A protein-protein interaction (PPI) network of the DEGs was constructed based on the Search Tool for the Retrieval of Interacting Genes database. A total of 263 DEGs were identified in rhinitis samples compared with normal samples, including 125 downregulated ones and 138 upregulated ones. The DEGs were enriched in 7 KEGG pathways. 308 differential co-expression gene pairs were obtained. A differential co-expression network was constructed, containing 212 nodes. In total, 148 PPI pairs of the DEGs were identified, and a PPI network was constructed based on these pairs. Bioinformatics methods could help us identify significant genes and pathways related to the pathogenesis of rhinitis. Steroid biosynthesis pathway and metabolic pathways might play important roles in the development of allergic rhinitis (AR). Genes such as CDC42 effector protein 5, solute carrier family 39 member A11 and PR/SET domain 10 might be also associated with the pathogenesis of AR, which provided references for the molecular mechanisms of AR. PMID:29257233

  1. Exploring the Universe of Protein Structures beyond the Protein Data Bank

    PubMed Central

    Cossio, Pilar; Trovato, Antonio; Pietrucci, Fabio; Seno, Flavio; Maritan, Amos; Laio, Alessandro

    2010-01-01

    It is currently believed that the atlas of existing protein structures is faithfully represented in the Protein Data Bank. However, whether this atlas covers the full universe of all possible protein structures is still a highly debated issue. By using a sophisticated numerical approach, we performed an exhaustive exploration of the conformational space of a 60 amino acid polypeptide chain described with an accurate all-atom interaction potential. We generated a database of around 30,000 compact folds with at least of secondary structure corresponding to local minima of the potential energy. This ensemble plausibly represents the universe of protein folds of similar length; indeed, all the known folds are represented in the set with good accuracy. However, we discover that the known folds form a rather small subset, which cannot be reproduced by choosing random structures in the database. Rather, natural and possible folds differ by the contact order, on average significantly smaller in the former. This suggests the presence of an evolutionary bias, possibly related to kinetic accessibility, towards structures with shorter loops between contacting residues. Beside their conceptual relevance, the new structures open a range of practical applications such as the development of accurate structure prediction strategies, the optimization of force fields, and the identification and design of novel folds. PMID:21079678

  2. Mapping small molecule binding data to structural domains

    PubMed Central

    2012-01-01

    Background Large-scale bioactivity/SAR Open Data has recently become available, and this has allowed new analyses and approaches to be developed to help address the productivity and translational gaps of current drug discovery. One of the current limitations of these data is the relative sparsity of reported interactions per protein target, and complexities in establishing clear relationships between bioactivity and targets using bioinformatics tools. We detail in this paper the indexing of targets by the structural domains that bind (or are likely to bind) the ligand within a full-length protein. Specifically, we present a simple heuristic to map small molecule binding to Pfam domains. This profiling can be applied to all proteins within a genome to give some indications of the potential pharmacological modulation and regulation of all proteins. Results In this implementation of our heuristic, ligand binding to protein targets from the ChEMBL database was mapped to structural domains as defined by profiles contained within the Pfam-A database. Our mapping suggests that the majority of assay targets within the current version of the ChEMBL database bind ligands through a small number of highly prevalent domains, and conversely the majority of Pfam domains sampled by our data play no currently established role in ligand binding. Validation studies, carried out firstly against Uniprot entries with expert binding-site annotation and secondly against entries in the wwPDB repository of crystallographic protein structures, demonstrate that our simple heuristic maps ligand binding to the correct domain in about 90 percent of all assessed cases. Using the mappings obtained with our heuristic, we have assembled ligand sets associated with each Pfam domain. Conclusions Small molecule binding has been mapped to Pfam-A domains of protein targets in the ChEMBL bioactivity database. The result of this mapping is an enriched annotation of small molecule bioactivity data and a grouping of activity classes following the Pfam-A specifications of protein domains. This is valuable for data-focused approaches in drug discovery, for example when extrapolating potential targets of a small molecule with known activity against one or few targets, or in the assessment of a potential target for drug discovery or screening studies. PMID:23282026

  3. Reconstruction of metabolic pathways by combining probabilistic graphical model-based and knowledge-based methods

    PubMed Central

    2014-01-01

    Automatic reconstruction of metabolic pathways for an organism from genomics and transcriptomics data has been a challenging and important problem in bioinformatics. Traditionally, known reference pathways can be mapped into an organism-specific ones based on its genome annotation and protein homology. However, this simple knowledge-based mapping method might produce incomplete pathways and generally cannot predict unknown new relations and reactions. In contrast, ab initio metabolic network construction methods can predict novel reactions and interactions, but its accuracy tends to be low leading to a lot of false positives. Here we combine existing pathway knowledge and a new ab initio Bayesian probabilistic graphical model together in a novel fashion to improve automatic reconstruction of metabolic networks. Specifically, we built a knowledge database containing known, individual gene / protein interactions and metabolic reactions extracted from existing reference pathways. Known reactions and interactions were then used as constraints for Bayesian network learning methods to predict metabolic pathways. Using individual reactions and interactions extracted from different pathways of many organisms to guide pathway construction is new and improves both the coverage and accuracy of metabolic pathway construction. We applied this probabilistic knowledge-based approach to construct the metabolic networks from yeast gene expression data and compared its results with 62 known metabolic networks in the KEGG database. The experiment showed that the method improved the coverage of metabolic network construction over the traditional reference pathway mapping method and was more accurate than pure ab initio methods. PMID:25374614

  4. Pollen Lipidomics: Lipid Profiling Exposes a Notable Diversity in 22 Allergenic Pollen and Potential Biomarkers of the Allergic Immune Response

    PubMed Central

    Bashir, Mohamed Elfatih H.; Lui, Jan Hsi; Palnivelu, Ravishankar; Naclerio, Robert M.; Preuss, Daphne

    2013-01-01

    Background/Aim Pollen grains are the male gametophytes that deliver sperm cells to female gametophytes during sexual reproduction of higher plants. Pollen is a major source of aeroallergens and environmental antigens. The pollen coat harbors a plethora of lipids that are required for pollen hydration, germination, and penetration of the stigma by pollen tubes. In addition to proteins, pollen displays a wide array of lipids that interact with the human immune system. Prior searches for pollen allergens have focused on the identification of intracellular allergenic proteins, but have largely overlooked much of the extracellular pollen matrix, a region where the majority of lipid molecules reside. Lipid antigens have attracted attention for their potent immunoregulatory effects. By being in close proximity to allergenic proteins on the pollen surface when they interact with host cells, lipids could modify the antigenic properties of proteins. Methodology/Principal Findings We performed a comparative pollen lipid profiling of 22 commonly allergenic plant species by the use of gas chromatography-mass spectroscopy, followed by detailed data mining and statistical analysis. Three experiments compared pollen lipid profiles. We built a database library of the pollen lipids by matching acquired pollen-lipid mass spectra and retention times with the NIST/EPA/NIH mass-spectral library. We detected, identified, and relatively quantified more than 106 lipid molecular species including fatty acids, n-alkanes, fatty alcohols, and sterols. Pollen-derived lipids stimulation up-regulate cytokines expression of dendritic and natural killer T cells co-culture. Conclusions/Significance Here we report on a lipidomic analysis of pollen lipids that can serve as a database for identifying potential lipid antigens and/or novel candidate molecules involved in allergy. The database provides a resource that facilitates studies on the role of lipids in the immunopathogenesis of allergy. Pollen lipids vary greatly among allergenic species and contain many molecules that have stimulatory or regulatory effects on immune responses. PMID:23469025

  5. The Movable Type Method Applied to Protein-Ligand Binding.

    PubMed

    Zheng, Zheng; Ucisik, Melek N; Merz, Kenneth M

    2013-12-10

    Accurately computing the free energy for biological processes like protein folding or protein-ligand association remains a challenging problem. Both describing the complex intermolecular forces involved and sampling the requisite configuration space make understanding these processes innately difficult. Herein, we address the sampling problem using a novel methodology we term "movable type". Conceptually it can be understood by analogy with the evolution of printing and, hence, the name movable type. For example, a common approach to the study of protein-ligand complexation involves taking a database of intact drug-like molecules and exhaustively docking them into a binding pocket. This is reminiscent of early woodblock printing where each page had to be laboriously created prior to printing a book. However, printing evolved to an approach where a database of symbols (letters, numerals, etc.) was created and then assembled using a movable type system, which allowed for the creation of all possible combinations of symbols on a given page, thereby, revolutionizing the dissemination of knowledge. Our movable type (MT) method involves the identification of all atom pairs seen in protein-ligand complexes and then creating two databases: one with their associated pairwise distant dependent energies and another associated with the probability of how these pairs can combine in terms of bonds, angles, dihedrals and non-bonded interactions. Combining these two databases coupled with the principles of statistical mechanics allows us to accurately estimate binding free energies as well as the pose of a ligand in a receptor. This method, by its mathematical construction, samples all of configuration space of a selected region (the protein active site here) in one shot without resorting to brute force sampling schemes involving Monte Carlo, genetic algorithms or molecular dynamics simulations making the methodology extremely efficient. Importantly, this method explores the free energy surface eliminating the need to estimate the enthalpy and entropy components individually. Finally, low free energy structures can be obtained via a free energy minimization procedure yielding all low free energy poses on a given free energy surface. Besides revolutionizing the protein-ligand docking and scoring problem this approach can be utilized in a wide range of applications in computational biology which involve the computation of free energies for systems with extensive phase spaces including protein folding, protein-protein docking and protein design.

  6. The Nuclear Protein Database (NPD): sub-nuclear localisation and functional annotation of the nuclear proteome

    PubMed Central

    Dellaire, G.; Farrall, R.; Bickmore, W.A.

    2003-01-01

    The Nuclear Protein Database (NPD) is a curated database that contains information on more than 1300 vertebrate proteins that are thought, or are known, to localise to the cell nucleus. Each entry is annotated with information on predicted protein size and isoelectric point, as well as any repeats, motifs or domains within the protein sequence. In addition, information on the sub-nuclear localisation of each protein is provided and the biological and molecular functions are described using Gene Ontology (GO) terms. The database is searchable by keyword, protein name, sub-nuclear compartment and protein domain/motif. Links to other databases are provided (e.g. Entrez, SWISS-PROT, OMIM, PubMed, PubMed Central). Thus, NPD provides a gateway through which the nuclear proteome may be explored. The database can be accessed at http://npd.hgu.mrc.ac.uk and is updated monthly. PMID:12520015

  7. Genome-wide identification, classification, and functional analysis of the basic helix-loop-helix transcription factors in the cattle, Bos Taurus.

    PubMed

    Li, Fengmei; Liu, Wuyi

    2017-06-01

    The basic helix-loop-helix (bHLH) transcription factors (TFs) form a huge superfamily and play crucial roles in many essential developmental, genetic, and physiological-biochemical processes of eukaryotes. In total, 109 putative bHLH TFs were identified and categorized successfully in the genomic databases of cattle, Bos Taurus, after removing redundant sequences and merging genetic isoforms. Through phylogenetic analyses, 105 proteins among these bHLH TFs were classified into 44 families with 46, 25, 14, 3, 13, and 4 members in the high-order groups A, B, C, D, E, and F, respectively. The remaining 4 bHLH proteins were sorted out as 'orphans.' Next, these 109 putative bHLH proteins identified were further characterized as significantly enriched in 524 significant Gene Ontology (GO) annotations (corrected P value ≤ 0.05) and 21 significantly enriched pathways (corrected P value ≤ 0.05) that had been mapped by the web server KOBAS 2.0. Furthermore, 95 bHLH proteins were further screened and analyzed together with two uncharacterized proteins in the STRING online database to reconstruct the protein-protein interaction network of cattle bHLH TFs. Ultimately, 89 bHLH proteins were fully mapped in a network with 67 biological process, 13 molecular functions, 5 KEGG pathways, 12 PFAM protein domains, and 25 INTERPRO classified protein domains and features. These results provide much useful information and a good reference for further functional investigations and updated researches on cattle bHLH TFs.

  8. A PDB-wide, evolution-based assessment of protein-protein interfaces.

    PubMed

    Baskaran, Kumaran; Duarte, Jose M; Biyani, Nikhil; Bliven, Spencer; Capitani, Guido

    2014-10-18

    Thanks to the growth in sequence and structure databases, more than 50 million sequences are now available in UniProt and 100,000 structures in the PDB. Rich information about protein-protein interfaces can be obtained by a comprehensive study of protein contacts in the PDB, their sequence conservation and geometric features. An automated computational pipeline was developed to run our Evolutionary Protein-Protein Interface Classifier (EPPIC) software on the entire PDB and store the results in a relational database, currently containing > 800,000 interfaces. This allows the analysis of interface data on a PDB-wide scale. Two large benchmark datasets of biological interfaces and crystal contacts, each containing about 3000 entries, were automatically generated based on criteria thought to be strong indicators of interface type. The BioMany set of biological interfaces includes NMR dimers solved as crystal structures and interfaces that are preserved across diverse crystal forms, as catalogued by the Protein Common Interface Database (ProtCID) from Xu and Dunbrack. The second dataset, XtalMany, is derived from interfaces that would lead to infinite assemblies and are therefore crystal contacts. BioMany and XtalMany were used to benchmark the EPPIC approach. The performance of EPPIC was also compared to classifications from the Protein Interfaces, Surfaces, and Assemblies (PISA) program on a PDB-wide scale, finding that the two approaches give the same call in about 88% of PDB interfaces. By comparing our safest predictions to the PDB author annotations, we provide a lower-bound estimate of the error rate of biological unit annotations in the PDB. Additionally, we developed a PyMOL plugin for direct download and easy visualization of EPPIC interfaces for any PDB entry. Both the datasets and the PyMOL plugin are available at http://www.eppic-web.org/ewui/\\#downloads. Our computational pipeline allows us to analyze protein-protein contacts and their sequence conservation across the entire PDB. Two new benchmark datasets are provided, which are over an order of magnitude larger than existing manually curated ones. These tools enable the comprehensive study of several aspects of protein-protein contacts in the PDB and represent a basis for future, even larger scale studies of protein-protein interactions.

  9. Versatility and Invariance in the Evolution of Homologous Heteromeric Interfaces

    PubMed Central

    Andreani, Jessica; Faure, Guilhem; Guerois, Raphaël

    2012-01-01

    Evolutionary pressures act on protein complex interfaces so that they preserve their complementarity. Nonetheless, the elementary interactions which compose the interface are highly versatile throughout evolution. Understanding and characterizing interface plasticity across evolution is a fundamental issue which could provide new insights into protein-protein interaction prediction. Using a database of 1,024 couples of close and remote heteromeric structural interologs, we studied protein-protein interactions from a structural and evolutionary point of view. We systematically and quantitatively analyzed the conservation of different types of interface contacts. Our study highlights astonishing plasticity regarding polar contacts at complex interfaces. It also reveals that up to a quarter of the residues switch out of the interface when comparing two homologous complexes. Despite such versatility, we identify two important interface descriptors which correlate with an increased conservation in the evolution of interfaces: apolar patches and contacts surrounding anchor residues. These observations hold true even when restricting the dataset to transiently formed complexes. We show that a combination of six features related either to sequence or to geometric properties of interfaces can be used to rank positions likely to share similar contacts between two interologs. Altogether, our analysis provides important tracks for extracting meaningful information from multiple sequence alignments of conserved binding partners and for discriminating near-native interfaces using evolutionary information. PMID:22952442

  10. In-Culture Cross-Linking of Bacterial Cells Reveals Large-Scale Dynamic Protein-Protein Interactions at the Peptide Level.

    PubMed

    de Jong, Luitzen; de Koning, Edward A; Roseboom, Winfried; Buncherd, Hansuk; Wanner, Martin J; Dapic, Irena; Jansen, Petra J; van Maarseveen, Jan H; Corthals, Garry L; Lewis, Peter J; Hamoen, Leendert W; de Koster, Chris G

    2017-07-07

    Identification of dynamic protein-protein interactions at the peptide level on a proteomic scale is a challenging approach that is still in its infancy. We have developed a system to cross-link cells directly in culture with the special lysine cross-linker bis(succinimidyl)-3-azidomethyl-glutarate (BAMG). We used the Gram-positive model bacterium Bacillus subtilis as an exemplar system. Within 5 min extensive intracellular cross-linking was detected, while intracellular cross-linking in a Gram-negative species, Escherichia coli, was still undetectable after 30 min, in agreement with the low permeability in this organism for lipophilic compounds like BAMG. We were able to identify 82 unique interprotein cross-linked peptides with <1% false discovery rate by mass spectrometry and genome-wide database searching. Nearly 60% of the interprotein cross-links occur in assemblies involved in transcription and translation. Several of these interactions are new, and we identified a binding site between the δ and β' subunit of RNA polymerase close to the downstream DNA channel, providing a clue into how δ might regulate promoter selectivity and promote RNA polymerase recycling. Our methodology opens new avenues to investigate the functional dynamic organization of complex protein assemblies involved in bacterial growth. Data are available via ProteomeXchange with identifier PXD006287.

  11. Prioritization of potential drug targets against P. aeruginosa by core proteomic analysis using computational subtractive genomics and Protein-Protein interaction network.

    PubMed

    Uddin, Reaz; Jamil, Faiza

    2018-06-01

    Pseudomonas aeruginosa is an opportunistic gram-negative bacterium that has the capability to acquire resistance under hostile conditions and become a threat worldwide. It is involved in nosocomial infections. In the current study, potential novel drug targets against P. aeruginosa have been identified using core proteomic analysis and Protein-Protein Interactions (PPIs) studies. The non-redundant reference proteome of 68 strains having complete genome and latest assembly version of P. aeruginosa were downloaded from ftp NCBI RefSeq server in October 2016. The standalone CD-HIT tool was used to cluster ortholog proteins (having >=80% amino acid identity) present in all strains. The pan-proteome was clustered in 12,380 Clusters of Orthologous Proteins (COPs). By using in-house shell scripts, 3252 common COPs were extracted out and designated as clusters of core proteome. The core proteome of PAO1 strain was selected by fetching PAO1's proteome from common COPs. As a result, 1212 proteins were shortlisted that are non-homologous to the human but essential for the survival of the pathogen. Among these 1212 proteins, 321 proteins are conserved hypothetical proteins. Considering their potential as drug target, those 321 hypothetical proteins were selected and their probable functions were characterized. Based on the druggability criteria, 18 proteins were shortlisted. The interacting partners were identified by investigating the PPIs network using STRING v10 database. Subsequently, 8 proteins were shortlisted as 'hub proteins' and proposed as potential novel drug targets against P. aeruginosa. The study is interesting for the scientific community working to identify novel drug targets against MDR pathogens particularly P. aeruginosa. Copyright © 2018 Elsevier Ltd. All rights reserved.

  12. SorghumFDB: sorghum functional genomics database with multidimensional network analysis.

    PubMed

    Tian, Tian; You, Qi; Zhang, Liwei; Yi, Xin; Yan, Hengyu; Xu, Wenying; Su, Zhen

    2016-01-01

    Sorghum (Sorghum bicolor [L.] Moench) has excellent agronomic traits and biological properties, such as heat and drought-tolerance. It is a C4 grass and potential bioenergy-producing plant, which makes it an important crop worldwide. With the sorghum genome sequence released, it is essential to establish a sorghum functional genomics data mining platform. We collected genomic data and some functional annotations to construct a sorghum functional genomics database (SorghumFDB). SorghumFDB integrated knowledge of sorghum gene family classifications (transcription regulators/factors, carbohydrate-active enzymes, protein kinases, ubiquitins, cytochrome P450, monolignol biosynthesis related enzymes, R-genes and organelle-genes), detailed gene annotations, miRNA and target gene information, orthologous pairs in the model plants Arabidopsis, rice and maize, gene loci conversions and a genome browser. We further constructed a dynamic network of multidimensional biological relationships, comprised of the co-expression data, protein-protein interactions and miRNA-target pairs. We took effective measures to combine the network, gene set enrichment and motif analyses to determine the key regulators that participate in related metabolic pathways, such as the lignin pathway, which is a major biological process in bioenergy-producing plants.Database URL: http://structuralbiology.cau.edu.cn/sorghum/index.html. © The Author(s) 2016. Published by Oxford University Press.

  13. DockTrina: docking triangular protein trimers.

    PubMed

    Popov, Petr; Ritchie, David W; Grudinin, Sergei

    2014-01-01

    In spite of the abundance of oligomeric proteins within a cell, the structural characterization of protein-protein interactions is still a challenging task. In particular, many of these interactions involve heteromeric complexes, which are relatively difficult to determine experimentally. Hence there is growing interest in using computational techniques to model such complexes. However, assembling large heteromeric complexes computationally is a highly combinatorial problem. Nonetheless the problem can be simplified greatly by considering interactions between protein trimers. After dimers and monomers, triangular trimers (i.e. trimers with pair-wise contacts between all three pairs of proteins) are the most frequently observed quaternary structural motifs according to the three-dimensional (3D) complex database. This article presents DockTrina, a novel protein docking method for modeling the 3D structures of nonsymmetrical triangular trimers. The method takes as input pair-wise contact predictions from a rigid body docking program. It then scans and scores all possible combinations of pairs of monomers using a very fast root mean square deviation test. Finally, it ranks the predictions using a scoring function which combines triples of pair-wise contact terms and a geometric clash penalty term. The overall approach takes less than 2 min per complex on a modern desktop computer. The method is tested and validated using a benchmark set of 220 bound and seven unbound protein trimer structures. DockTrina will be made available at http://nano-d.inrialpes.fr/software/docktrina. Copyright © 2013 Wiley Periodicals, Inc.

  14. The European Bioinformatics Institute's data resources: towards systems biology.

    PubMed

    Brooksbank, Catherine; Cameron, Graham; Thornton, Janet

    2005-01-01

    Genomic and post-genomic biological research has provided fine-grain insights into the molecular processes of life, but also threatens to drown biomedical researchers in data. Moreover, as new high-throughput technologies are developed, the types of data that are gathered en masse are diversifying. The need to collect, store and curate all this information in ways that allow its efficient retrieval and exploitation is greater than ever. The European Bioinformatics Institute's (EBI's) databases and tools have evolved to meet the changing needs of molecular biologists: since we last wrote about our services in the 2003 issue of Nucleic Acids Research, we have launched new databases covering protein-protein interactions (IntAct), pathways (Reactome) and small molecules (ChEBI). Our existing core databases have continued to evolve to meet the changing needs of biomedical researchers, and we have developed new data-access tools that help biologists to move intuitively through the different data types, thereby helping them to put the parts together to understand biology at the systems level. The EBI's data resources are all available on our website at http://www.ebi.ac.uk.

  15. 'RetinoGenetics': a comprehensive mutation database for genes related to inherited retinal degeneration.

    PubMed

    Ran, Xia; Cai, Wei-Jun; Huang, Xiu-Feng; Liu, Qi; Lu, Fan; Qu, Jia; Wu, Jinyu; Jin, Zi-Bing

    2014-01-01

    Inherited retinal degeneration (IRD), a leading cause of human blindness worldwide, is exceptionally heterogeneous with clinical heterogeneity and genetic variety. During the past decades, tremendous efforts have been made to explore the complex heterogeneity, and massive mutations have been identified in different genes underlying IRD with the significant advancement of sequencing technology. In this study, we developed a comprehensive database, 'RetinoGenetics', which contains informative knowledge about all known IRD-related genes and mutations for IRD. 'RetinoGenetics' currently contains 4270 mutations in 186 genes, with detailed information associated with 164 phenotypes from 934 publications and various types of functional annotations. Then extensive annotations were performed to each gene using various resources, including Gene Ontology, KEGG pathways, protein-protein interaction, mutational annotations and gene-disease network. Furthermore, by using the search functions, convenient browsing ways and intuitive graphical displays, 'RetinoGenetics' could serve as a valuable resource for unveiling the genetic basis of IRD. Taken together, 'RetinoGenetics' is an integrative, informative and updatable resource for IRD-related genetic predispositions. Database URL: http://www.retinogenetics.org/. © The Author(s) 2014. Published by Oxford University Press.

  16. MIPS: a database for genomes and protein sequences

    PubMed Central

    Mewes, H. W.; Frishman, D.; Güldener, U.; Mannhaupt, G.; Mayer, K.; Mokrejs, M.; Morgenstern, B.; Münsterkötter, M.; Rudd, S.; Weil, B.

    2002-01-01

    The Munich Information Center for Protein Sequences (MIPS-GSF, Neuherberg, Germany) continues to provide genome-related information in a systematic way. MIPS supports both national and European sequencing and functional analysis projects, develops and maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences, and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the databases for the comprehensive set of genomes (PEDANT genomes), the database of annotated human EST clusters (HIB), the database of complete cDNAs from the DHGP (German Human Genome Project), as well as the project specific databases for the GABI (Genome Analysis in Plants) and HNB (Helmholtz–Netzwerk Bioinformatik) networks. The Arabidospsis thaliana database (MATDB), the database of mitochondrial proteins (MITOP) and our contribution to the PIR International Protein Sequence Database have been described elsewhere [Schoof et al. (2002) Nucleic Acids Res., 30, 91–93; Scharfe et al. (2000) Nucleic Acids Res., 28, 155–158; Barker et al. (2001) Nucleic Acids Res., 29, 29–32]. All databases described, the protein analysis tools provided and the detailed descriptions of our projects can be accessed through the MIPS World Wide Web server (http://mips.gsf.de). PMID:11752246

  17. MIPS: a database for genomes and protein sequences.

    PubMed

    Mewes, H W; Frishman, D; Güldener, U; Mannhaupt, G; Mayer, K; Mokrejs, M; Morgenstern, B; Münsterkötter, M; Rudd, S; Weil, B

    2002-01-01

    The Munich Information Center for Protein Sequences (MIPS-GSF, Neuherberg, Germany) continues to provide genome-related information in a systematic way. MIPS supports both national and European sequencing and functional analysis projects, develops and maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences, and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the databases for the comprehensive set of genomes (PEDANT genomes), the database of annotated human EST clusters (HIB), the database of complete cDNAs from the DHGP (German Human Genome Project), as well as the project specific databases for the GABI (Genome Analysis in Plants) and HNB (Helmholtz-Netzwerk Bioinformatik) networks. The Arabidospsis thaliana database (MATDB), the database of mitochondrial proteins (MITOP) and our contribution to the PIR International Protein Sequence Database have been described elsewhere [Schoof et al. (2002) Nucleic Acids Res., 30, 91-93; Scharfe et al. (2000) Nucleic Acids Res., 28, 155-158; Barker et al. (2001) Nucleic Acids Res., 29, 29-32]. All databases described, the protein analysis tools provided and the detailed descriptions of our projects can be accessed through the MIPS World Wide Web server (http://mips.gsf.de).

  18. Prediction of protein interaction hot spots using rough set-based multiple criteria linear programming.

    PubMed

    Chen, Ruoying; Zhang, Zhiwang; Wu, Di; Zhang, Peng; Zhang, Xinyang; Wang, Yong; Shi, Yong

    2011-01-21

    Protein-protein interactions are fundamentally important in many biological processes and it is in pressing need to understand the principles of protein-protein interactions. Mutagenesis studies have found that only a small fraction of surface residues, known as hot spots, are responsible for the physical binding in protein complexes. However, revealing hot spots by mutagenesis experiments are usually time consuming and expensive. In order to complement the experimental efforts, we propose a new computational approach in this paper to predict hot spots. Our method, Rough Set-based Multiple Criteria Linear Programming (RS-MCLP), integrates rough sets theory and multiple criteria linear programming to choose dominant features and computationally predict hot spots. Our approach is benchmarked by a dataset of 904 alanine-mutated residues and the results show that our RS-MCLP method performs better than other methods, e.g., MCLP, Decision Tree, Bayes Net, and the existing HotSprint database. In addition, we reveal several biological insights based on our analysis. We find that four features (the change of accessible surface area, percentage of the change of accessible surface area, size of a residue, and atomic contacts) are critical in predicting hot spots. Furthermore, we find that three residues (Tyr, Trp, and Phe) are abundant in hot spots through analyzing the distribution of amino acids. Copyright © 2010 Elsevier Ltd. All rights reserved.

  19. SuperTarget goes quantitative: update on drug–target interactions

    PubMed Central

    Hecker, Nikolai; Ahmed, Jessica; von Eichborn, Joachim; Dunkel, Mathias; Macha, Karel; Eckert, Andreas; Gilson, Michael K.; Bourne, Philip E.; Preissner, Robert

    2012-01-01

    There are at least two good reasons for the on-going interest in drug–target interactions: first, drug-effects can only be fully understood by considering a complex network of interactions to multiple targets (so-called off-target effects) including metabolic and signaling pathways; second, it is crucial to consider drug-target-pathway relations for the identification of novel targets for drug development. To address this on-going need, we have developed a web-based data warehouse named SuperTarget, which integrates drug-related information associated with medical indications, adverse drug effects, drug metabolism, pathways and Gene Ontology (GO) terms for target proteins. At present, the updated database contains >6000 target proteins, which are annotated with >330 000 relations to 196 000 compounds (including approved drugs); the vast majority of interactions include binding affinities and pointers to the respective literature sources. The user interface provides tools for drug screening and target similarity inclusion. A query interface enables the user to pose complex queries, for example, to find drugs that target a certain pathway, interacting drugs that are metabolized by the same cytochrome P450 or drugs that target proteins within a certain affinity range. SuperTarget is available at http://bioinformatics.charite.de/supertarget. PMID:22067455

  20. Sixty-five years of the long march in protein secondary structure prediction: the final stretch?

    PubMed Central

    Yang, Yuedong; Gao, Jianzhao; Wang, Jihua; Heffernan, Rhys; Hanson, Jack; Paliwal, Kuldip; Zhou, Yaoqi

    2018-01-01

    Abstract Protein secondary structure prediction began in 1951 when Pauling and Corey predicted helical and sheet conformations for protein polypeptide backbone even before the first protein structure was determined. Sixty-five years later, powerful new methods breathe new life into this field. The highest three-state accuracy without relying on structure templates is now at 82–84%, a number unthinkable just a few years ago. These improvements came from increasingly larger databases of protein sequences and structures for training, the use of template secondary structure information and more powerful deep learning techniques. As we are approaching to the theoretical limit of three-state prediction (88–90%), alternative to secondary structure prediction (prediction of backbone torsion angles and Cα-atom-based angles and torsion angles) not only has more room for further improvement but also allows direct prediction of three-dimensional fragment structures with constantly improved accuracy. About 20% of all 40-residue fragments in a database of 1199 non-redundant proteins have <6 Å root-mean-squared distance from the native conformations by SPIDER2. More powerful deep learning methods with improved capability of capturing long-range interactions begin to emerge as the next generation of techniques for secondary structure prediction. The time has come to finish off the final stretch of the long march towards protein secondary structure prediction. PMID:28040746

  1. Transporter taxonomy - a comparison of different transport protein classification schemes.

    PubMed

    Viereck, Michael; Gaulton, Anna; Digles, Daniela; Ecker, Gerhard F

    2014-06-01

    Currently, there are more than 800 well characterized human membrane transport proteins (including channels and transporters) and there are estimates that about 10% (approx. 2000) of all human genes are related to transport. Membrane transport proteins are of interest as potential drug targets, for drug delivery, and as a cause of side effects and drug–drug interactions. In light of the development of Open PHACTS, which provides an open pharmacological space, we analyzed selected membrane transport protein classification schemes (Transporter Classification Database, ChEMBL, IUPHAR/BPS Guide to Pharmacology, and Gene Ontology) for their ability to serve as a basis for pharmacology driven protein classification. A comparison of these membrane transport protein classification schemes by using a set of clinically relevant transporters as use-case reveals the strengths and weaknesses of the different taxonomy approaches.

  2. IntegromeDB: an integrated system and biological search engine.

    PubMed

    Baitaluk, Michael; Kozhenkov, Sergey; Dubinina, Yulia; Ponomarenko, Julia

    2012-01-19

    With the growth of biological data in volume and heterogeneity, web search engines become key tools for researchers. However, general-purpose search engines are not specialized for the search of biological data. Here, we present an approach at developing a biological web search engine based on the Semantic Web technologies and demonstrate its implementation for retrieving gene- and protein-centered knowledge. The engine is available at http://www.integromedb.org. The IntegromeDB search engine allows scanning data on gene regulation, gene expression, protein-protein interactions, pathways, metagenomics, mutations, diseases, and other gene- and protein-related data that are automatically retrieved from publicly available databases and web pages using biological ontologies. To perfect the resource design and usability, we welcome and encourage community feedback.

  3. The yeast two hybrid system in a screen for proteins interacting with axolotl (Ambystoma mexicanum) Msx1 during early limb regeneration.

    PubMed

    Abuqarn, Mehtap; Allmeling, Christina; Amshoff, Inga; Menger, Bjoern; Nasser, Inas; Vogt, Peter M; Reimers, Kerstin

    2011-07-01

    Urodele amphibians are exceptional in their ability to regenerate complex body structures such as limbs. Limb regeneration depends on a process called dedifferentiation. Under an inductive wound epidermis terminally differentiated cells transform to pluripotent progenitor cells that coordinately proliferate and eventually redifferentiate to form the new appendage. Recent studies have developed molecular models integrating a set of genes that might have important functions in the control of regenerative cellular plasticity. Among them is Msx1, which induced dedifferentiation in mammalian myotubes in vitro. Herein, we screened for interaction partners of axolotl Msx1 using a yeast two hybrid system. A two hybrid cDNA library of 5-day-old wound epidermis and underlying tissue containing more than 2×10⁶ cDNAs was constructed and used in the screen. 34 resulting cDNA clones were isolated and sequenced. We then compared sequences of the isolated clones to annotated EST contigs of the Salamander EST database (BLASTn) to identify presumptive orthologs. We subsequently searched all no-hit clone sequences against non redundant NCBI sequence databases using BLASTx. It is the first time, that the yeast two hybrid system was adapted to the axolotl animal model and successfully used in a screen for proteins interacting with Msx1 in the context of amphibian limb regeneration. 2011 Elsevier B.V. All rights reserved.

  4. HIV Molecular Immunology 2014

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yusim, Karina; Korber, Bette Tina Marie; Barouch, Dan

    HIV Molecular Immunology is a companion volume to HIV Sequence Compendium. This publication, the 2014 edition, is the PDF version of the web-based HIV Immunology Database (http://www.hiv.lanl.gov/content/immunology/). The web interface for this relational database has many search options, as well as interactive tools to help immunologists design reagents and interpret their results. In the HIV Immunology Database, HIV-specific B-cell and T-cell responses are summarized and annotated. Immunological responses are divided into three parts, CTL, T helper, and antibody. Within these parts, defined epitopes are organized by protein and binding sites within each protein, moving from left to right through themore » coding regions spanning the HIV genome. We include human responses to natural HIV infections, as well as vaccine studies in a range of animal models and human trials. Responses that are not specifically defined, such as responses to whole proteins or monoclonal antibody responses to discontinuous epitopes, are summarized at the end of each protein section. Studies describing general HIV responses to the virus, but not to any specific protein, are included at the end of each part. The annotation includes information such as crossreactivity, escape mutations, antibody sequence, TCR usage, functional domains that overlap with an epitope, immune response associations with rates of progression and therapy, and how specific epitopes were experimentally defined. Basic information such as HLA specificities for T-cell epitopes, isotypes of monoclonal antibodies, and epitope sequences are included whenever possible. All studies that we can find that incorporate the use of a specific monoclonal antibody are included in the entry for that antibody. A single T-cell epitope can have multiple entries, generally one entry per study. Finally, maps of all defined linear epitopes relative to the HXB2 reference proteins are provided.« less

  5. A Systematic Prediction of Drug-Target Interactions Using Molecular Fingerprints and Protein Sequences.

    PubMed

    Huang, Yu-An; You, Zhu-Hong; Chen, Xing

    2018-01-01

    Drug-Target Interactions (DTI) play a crucial role in discovering new drug candidates and finding new proteins to target for drug development. Although the number of detected DTI obtained by high-throughput techniques has been increasing, the number of known DTI is still limited. On the other hand, the experimental methods for detecting the interactions among drugs and proteins are costly and inefficient. Therefore, computational approaches for predicting DTI are drawing increasing attention in recent years. In this paper, we report a novel computational model for predicting the DTI using extremely randomized trees model and protein amino acids information. More specifically, the protein sequence is represented as a Pseudo Substitution Matrix Representation (Pseudo-SMR) descriptor in which the influence of biological evolutionary information is retained. For the representation of drug molecules, a novel fingerprint feature vector is utilized to describe its substructure information. Then the DTI pair is characterized by concatenating the two vector spaces of protein sequence and drug substructure. Finally, the proposed method is explored for predicting the DTI on four benchmark datasets: Enzyme, Ion Channel, GPCRs and Nuclear Receptor. The experimental results demonstrate that this method achieves promising prediction accuracies of 89.85%, 87.87%, 82.99% and 81.67%, respectively. For further evaluation, we compared the performance of Extremely Randomized Trees model with that of the state-of-the-art Support Vector Machine classifier. And we also compared the proposed model with existing computational models, and confirmed 15 potential drug-target interactions by looking for existing databases. The experiment results show that the proposed method is feasible and promising for predicting drug-target interactions for new drug candidate screening based on sizeable features. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  6. Multiple kernel learning in protein-protein interaction extraction from biomedical literature.

    PubMed

    Yang, Zhihao; Tang, Nan; Zhang, Xiao; Lin, Hongfei; Li, Yanpeng; Yang, Zhiwei

    2011-03-01

    Knowledge about protein-protein interactions (PPIs) unveils the molecular mechanisms of biological processes. The volume and content of published biomedical literature on protein interactions is expanding rapidly, making it increasingly difficult for interaction database administrators, responsible for content input and maintenance to detect and manually update protein interaction information. The objective of this work is to develop an effective approach to automatic extraction of PPI information from biomedical literature. We present a weighted multiple kernel learning-based approach for automatic PPI extraction from biomedical literature. The approach combines the following kernels: feature-based, tree, graph and part-of-speech (POS) path. In particular, we extend the shortest path-enclosed tree (SPT) and dependency path tree to capture richer contextual information. Our experimental results show that the combination of SPT and dependency path tree extensions contributes to the improvement of performance by almost 0.7 percentage units in F-score and 2 percentage units in area under the receiver operating characteristics curve (AUC). Combining two or more appropriately weighed individual will further improve the performance. Both on the individual corpus and cross-corpus evaluation our combined kernel can achieve state-of-the-art performance with respect to comparable evaluations, with 64.41% F-score and 88.46% AUC on the AImed corpus. As different kernels calculate the similarity between two sentences from different aspects. Our combined kernel can reduce the risk of missing important features. More specifically, we use a weighted linear combination of individual kernels instead of assigning the same weight to each individual kernel, thus allowing the introduction of each kernel to incrementally contribute to the performance improvement. In addition, SPT and dependency path tree extensions can improve the performance by including richer context information. Copyright © 2010 Elsevier B.V. All rights reserved.

  7. Integration of gel-based and gel-free proteomic data for functional analysis of proteins through Soybean Proteome Database.

    PubMed

    Komatsu, Setsuko; Wang, Xin; Yin, Xiaojian; Nanjo, Yohei; Ohyanagi, Hajime; Sakata, Katsumi

    2017-06-23

    The Soybean Proteome Database (SPD) stores data on soybean proteins obtained with gel-based and gel-free proteomic techniques. The database was constructed to provide information on proteins for functional analyses. The majority of the data is focused on soybean (Glycine max 'Enrei'). The growth and yield of soybean are strongly affected by environmental stresses such as flooding. The database was originally constructed using data on soybean proteins separated by two-dimensional polyacrylamide gel electrophoresis, which is a gel-based proteomic technique. Since 2015, the database has been expanded to incorporate data obtained by label-free mass spectrometry-based quantitative proteomics, which is a gel-free proteomic technique. Here, the portions of the database consisting of gel-free proteomic data are described. The gel-free proteomic database contains 39,212 proteins identified in 63 sample sets, such as temporal and organ-specific samples of soybean plants grown under flooding stress or non-stressed conditions. In addition, data on organellar proteins identified in mitochondria, nuclei, and endoplasmic reticulum are stored. Furthermore, the database integrates multiple omics data such as genomics, transcriptomics, metabolomics, and proteomics. The SPD database is accessible at http://proteome.dc.affrc.go.jp/Soybean/. The Soybean Proteome Database stores data obtained from both gel-based and gel-free proteomic techniques. The gel-free proteomic database comprises 39,212 proteins identified in 63 sample sets, such as different organs of soybean plants grown under flooding stress or non-stressed conditions in a time-dependent manner. In addition, organellar proteins identified in mitochondria, nuclei, and endoplasmic reticulum are stored in the gel-free proteomics database. A total of 44,704 proteins, including 5490 proteins identified using a gel-based proteomic technique, are stored in the SPD. It accounts for approximately 80% of all predicted proteins from genome sequences, though there are over lapped proteins. Based on the demonstrated application of data stored in the database for functional analyses, it is suggested that these data will be useful for analyses of biological mechanisms in soybean. Furthermore, coupled with recent advances in information and communication technology, the usefulness of this database would increase in the analyses of biological mechanisms. Copyright © 2017 Elsevier B.V. All rights reserved.

  8. Investigations on the antiretroviral activity of carbon nanotubes using computational molecular approach.

    PubMed

    Krishnaraj, R Navanietha; Chandran, Saravanan; Pal, Parimal; Berchmans, Sheela

    2014-01-01

    Carbon nanotubes are the interesting class of materials with wide range of applications. They have excellent physical, chemical and electrical properties. Numerous reports were made on the antiviral activities of carbon nanotubes. However the mechanism of antiviral action is still in infancy. Herein we report, our recent novel findings on the molecular interactions of carbon nanotubes with the three key target proteins of HIV using computational chemistry approach. Armchair, chiral and zigzag CNTs were modeled and used as ligands for the interaction studies. The structure of the key proteins involved in HIV mediated infection namely HIV- Vpr, Nef and Gag proteins were collected from the PDB database. The docking studies were performed to quantify the interaction of the CNT with the three different disease targets. Results showed that the carbon nanotubes had high binding affinity to these proteins which confirms the antagonistic molecular interaction of carbon nanotubes to the disease targets. The modeled armchair carbon nanotubes had the binding affinities of -12.4 Kcal/mole, -20 Kcal/mole and -11.7 Kcal/mole with the Vpr, Nef and Gag proteins of HIV. Chiral CNTs also had the maximum affinity of -16.4 Kcal/mole to Nef. The binding affinity of chiral CNTs to Vpr and Gag was found to be -10.9 Kcal/mole and -10.3 Kcal/mole respectively. The zigzag CNTs had the binding affinity of -11.1 Kcal/mole with Vpr, -18.3 Kcal/mole with Nef and -10.9 with Gag respectively. The strong molecular interactions suggest the efficacy of CNTs for targeting the HIV mediated retroviral infections.

  9. Navigating through the Jungle of Allergens: Features and Applications of Allergen Databases.

    PubMed

    Radauer, Christian

    2017-01-01

    The increasing number of available data on allergenic proteins demanded the establishment of structured, freely accessible allergen databases. In this review article, features and applications of 6 of the most widely used allergen databases are discussed. The WHO/IUIS Allergen Nomenclature Database is the official resource of allergen designations. Allergome is the most comprehensive collection of data on allergens and allergen sources. AllergenOnline is aimed at providing a peer-reviewed database of allergen sequences for prediction of allergenicity of proteins, such as those planned to be inserted into genetically modified crops. The Structural Database of Allergenic Proteins (SDAP) provides a database of allergen sequences, structures, and epitopes linked to bioinformatics tools for sequence analysis and comparison. The Immune Epitope Database (IEDB) is the largest repository of T-cell, B-cell, and major histocompatibility complex protein epitopes including epitopes of allergens. AllFam classifies allergens into families of evolutionarily related proteins using definitions from the Pfam protein family database. These databases contain mostly overlapping data, but also show differences in terms of their targeted users, the criteria for including allergens, data shown for each allergen, and the availability of bioinformatics tools. © 2017 S. Karger AG, Basel.

  10. Detection of alternative splice variants at the proteome level in Aspergillus flavus.

    PubMed

    Chang, Kung-Yen; Georgianna, D Ryan; Heber, Steffen; Payne, Gary A; Muddiman, David C

    2010-03-05

    Identification of proteins from proteolytic peptides or intact proteins plays an essential role in proteomics. Researchers use search engines to match the acquired peptide sequences to the target proteins. However, search engines depend on protein databases to provide candidates for consideration. Alternative splicing (AS), the mechanism where the exon of pre-mRNAs can be spliced and rearranged to generate distinct mRNA and therefore protein variants, enable higher eukaryotic organisms, with only a limited number of genes, to have the requisite complexity and diversity at the proteome level. Multiple alternative isoforms from one gene often share common segments of sequences. However, many protein databases only include a limited number of isoforms to keep minimal redundancy. As a result, the database search might not identify a target protein even with high quality tandem MS data and accurate intact precursor ion mass. We computationally predicted an exhaustive list of putative isoforms of Aspergillus flavus proteins from 20 371 expressed sequence tags to investigate whether an alternative splicing protein database can assign a greater proportion of mass spectrometry data. The newly constructed AS database provided 9807 new alternatively spliced variants in addition to 12 832 previously annotated proteins. The searches of the existing tandem MS spectra data set using the AS database identified 29 new proteins encoded by 26 genes. Nine fungal genes appeared to have multiple protein isoforms. In addition to the discovery of splice variants, AS database also showed potential to improve genome annotation. In summary, the introduction of an alternative splicing database helps identify more proteins and unveils more information about a proteome.

  11. Predicting domain-domain interaction based on domain profiles with feature selection and support vector machines

    PubMed Central

    2010-01-01

    Background Protein-protein interaction (PPI) plays essential roles in cellular functions. The cost, time and other limitations associated with the current experimental methods have motivated the development of computational methods for predicting PPIs. As protein interactions generally occur via domains instead of the whole molecules, predicting domain-domain interaction (DDI) is an important step toward PPI prediction. Computational methods developed so far have utilized information from various sources at different levels, from primary sequences, to molecular structures, to evolutionary profiles. Results In this paper, we propose a computational method to predict DDI using support vector machines (SVMs), based on domains represented as interaction profile hidden Markov models (ipHMM) where interacting residues in domains are explicitly modeled according to the three dimensional structural information available at the Protein Data Bank (PDB). Features about the domains are extracted first as the Fisher scores derived from the ipHMM and then selected using singular value decomposition (SVD). Domain pairs are represented by concatenating their selected feature vectors, and classified by a support vector machine trained on these feature vectors. The method is tested by leave-one-out cross validation experiments with a set of interacting protein pairs adopted from the 3DID database. The prediction accuracy has shown significant improvement as compared to InterPreTS (Interaction Prediction through Tertiary Structure), an existing method for PPI prediction that also uses the sequences and complexes of known 3D structure. Conclusions We show that domain-domain interaction prediction can be significantly enhanced by exploiting information inherent in the domain profiles via feature selection based on Fisher scores, singular value decomposition and supervised learning based on support vector machines. Datasets and source code are freely available on the web at http://liao.cis.udel.edu/pub/svdsvm. Implemented in Matlab and supported on Linux and MS Windows. PMID:21034480

  12. Visualisation and graph-theoretic analysis of a large-scale protein structural interactome

    PubMed Central

    Bolser, Dan; Dafas, Panos; Harrington, Richard; Park, Jong; Schroeder, Michael

    2003-01-01

    Background Large-scale protein interaction maps provide a new, global perspective with which to analyse protein function. PSIMAP, the Protein Structural Interactome Map, is a database of all the structurally observed interactions between superfamilies of protein domains with known three-dimensional structure in the PDB. PSIMAP incorporates both functional and evolutionary information into a single network. Results We present a global analysis of PSIMAP using several distinct network measures relating to centrality, interactivity, fault-tolerance, and taxonomic diversity. We found the following results: Centrality: we show that the center and barycenter of PSIMAP do not coincide, and that the superfamilies forming the barycenter relate to very general functions, while those constituting the center relate to enzymatic activity. Interactivity: we identify the P-loop and immunoglobulin superfamilies as the most highly interactive. We successfully use connectivity and cluster index, which characterise the connectivity of a superfamily's neighbourhood, to discover superfamilies of complex I and II. This is particularly significant as the structure of complex I is not yet solved. Taxonomic diversity: we found that highly interactive superfamilies are in general taxonomically very diverse and are thus amongst the oldest. Fault-tolerance: we found that the network is very robust as for the majority of superfamilies removal from the network will not break up the network. Conclusions Overall, we can single out the P-loop containing nucleotide triphosphate hydrolases superfamily as it is the most highly connected and has the highest taxonomic diversity. In addition, this superfamily has the highest interaction rank, is the barycenter of the network (it has the shortest average path to every other superfamily in the network), and is an articulation vertex, whose removal will disconnect the network. More generally, we conclude that the graph-theoretic and taxonomic analysis of PSIMAP is an important step towards the understanding of protein function and could be an important tool for tracing the evolution of life at the molecular level. PMID:14531933

  13. COMPUTING THERAPY FOR PRECISION MEDICINE: COLLABORATIVE FILTERING INTEGRATES AND PREDICTS MULTI-ENTITY INTERACTIONS.

    PubMed

    Regenbogen, Sam; Wilkins, Angela D; Lichtarge, Olivier

    2016-01-01

    Biomedicine produces copious information it cannot fully exploit. Specifically, there is considerable need to integrate knowledge from disparate studies to discover connections across domains. Here, we used a Collaborative Filtering approach, inspired by online recommendation algorithms, in which non-negative matrix factorization (NMF) predicts interactions among chemicals, genes, and diseases only from pairwise information about their interactions. Our approach, applied to matrices derived from the Comparative Toxicogenomics Database, successfully recovered Chemical-Disease, Chemical-Gene, and Disease-Gene networks in 10-fold cross-validation experiments. Additionally, we could predict each of these interaction matrices from the other two. Integrating all three CTD interaction matrices with NMF led to good predictions of STRING, an independent, external network of protein-protein interactions. Finally, this approach could integrate the CTD and STRING interaction data to improve Chemical-Gene cross-validation performance significantly, and, in a time-stamped study, it predicted information added to CTD after a given date, using only data prior to that date. We conclude that collaborative filtering can integrate information across multiple types of biological entities, and that as a first step towards precision medicine it can compute drug repurposing hypotheses.

  14. COMPUTING THERAPY FOR PRECISION MEDICINE: COLLABORATIVE FILTERING INTEGRATES AND PREDICTS MULTI-ENTITY INTERACTIONS

    PubMed Central

    REGENBOGEN, SAM; WILKINS, ANGELA D.; LICHTARGE, OLIVIER

    2015-01-01

    Biomedicine produces copious information it cannot fully exploit. Specifically, there is considerable need to integrate knowledge from disparate studies to discover connections across domains. Here, we used a Collaborative Filtering approach, inspired by online recommendation algorithms, in which non-negative matrix factorization (NMF) predicts interactions among chemicals, genes, and diseases only from pairwise information about their interactions. Our approach, applied to matrices derived from the Comparative Toxicogenomics Database, successfully recovered Chemical-Disease, Chemical-Gene, and Disease-Gene networks in 10-fold cross-validation experiments. Additionally, we could predict each of these interaction matrices from the other two. Integrating all three CTD interaction matrices with NMF led to good predictions of STRING, an independent, external network of protein-protein interactions. Finally, this approach could integrate the CTD and STRING interaction data to improve Chemical-Gene cross-validation performance significantly, and, in a time-stamped study, it predicted information added to CTD after a given date, using only data prior to that date. We conclude that collaborative filtering can integrate information across multiple types of biological entities, and that as a first step towards precision medicine it can compute drug repurposing hypotheses. PMID:26776170

  15. CASTIN: a system for comprehensive analysis of cancer-stromal interactome.

    PubMed

    Komura, Daisuke; Isagawa, Takayuki; Kishi, Kazuki; Suzuki, Ryohei; Sato, Reiko; Tanaka, Mariko; Katoh, Hiroto; Yamamoto, Shogo; Tatsuno, Kenji; Fukayama, Masashi; Aburatani, Hiroyuki; Ishikawa, Shumpei

    2016-11-09

    Cancer microenvironment plays a vital role in cancer development and progression, and cancer-stromal interactions have been recognized as important targets for cancer therapy. However, identifying relevant and druggable cancer-stromal interactions is challenging due to the lack of quantitative methods to analyze whole cancer-stromal interactome. We present CASTIN (CAncer-STromal INteractome analysis), a novel framework for the evaluation of cancer-stromal interactome from RNA-Seq data using cancer xenograft models. For each ligand-receptor interaction which is derived from curated protein-protein interaction database, CASTIN summarizes gene expression profiles of cancer and stroma into three evaluation indices. These indices provide quantitative evaluation and comprehensive visualization of interactome, and thus enable to identify critical cancer-microenvironment interactions, which would be potential drug targets. We applied CASTIN to the dataset of pancreas ductal adenocarcinoma, and successfully characterized the individual cancer in terms of cancer-stromal relationships, and identified both well-known and less-characterized druggable interactions. CASTIN provides comprehensive view of cancer-stromal interactome and is useful to identify critical interactions which may serve as potential drug targets in cancer-microenvironment. CASTIN is available at: http://github.com/tmd-gpat/CASTIN .

  16. Assembling a protein-protein interaction map of the SSU processome from existing datasets.

    PubMed

    Lim, Young H; Charette, J Michael; Baserga, Susan J

    2011-03-10

    The small subunit (SSU) processome is a large ribonucleoprotein complex involved in small ribosomal subunit assembly. It consists of the U3 snoRNA and ∼72 proteins. While most of its components have been identified, the protein-protein interactions (PPIs) among them remain largely unknown, and thus the assembly, architecture and function of the SSU processome remains unclear. We queried PPI databases for SSU processome proteins to quantify the degree to which the three genome-wide high-throughput yeast two-hybrid (HT-Y2H) studies, the genome-wide protein fragment complementation assay (PCA) and the literature-curated (LC) datasets cover the SSU processome interactome. We find that coverage of the SSU processome PPI network is remarkably sparse. Two of the three HT-Y2H studies each account for four and six PPIs between only six of the 72 proteins, while the third study accounts for as little as one PPI and two proteins. The PCA dataset has the highest coverage among the genome-wide studies with 27 PPIs between 25 proteins. The LC dataset was the most extensive, accounting for 34 proteins and 38 PPIs, many of which were validated by independent methods, thereby further increasing their reliability. When the collected data were merged, we found that at least 70% of the predicted PPIs have yet to be determined and 26 proteins (36%) have no known partners. Since the SSU processome is conserved in all Eukaryotes, we also queried HT-Y2H datasets from six additional model organisms, but only four orthologues and three previously known interologous interactions were found. This provides a starting point for further work on SSU processome assembly, and spotlights the need for a more complete genome-wide Y2H analysis.

  17. Assembling a Protein-Protein Interaction Map of the SSU Processome from Existing Datasets

    PubMed Central

    Baserga, Susan J.

    2011-01-01

    Background The small subunit (SSU) processome is a large ribonucleoprotein complex involved in small ribosomal subunit assembly. It consists of the U3 snoRNA and ∼72 proteins. While most of its components have been identified, the protein-protein interactions (PPIs) among them remain largely unknown, and thus the assembly, architecture and function of the SSU processome remains unclear. Methodology We queried PPI databases for SSU processome proteins to quantify the degree to which the three genome-wide high-throughput yeast two-hybrid (HT-Y2H) studies, the genome-wide protein fragment complementation assay (PCA) and the literature-curated (LC) datasets cover the SSU processome interactome. Conclusions We find that coverage of the SSU processome PPI network is remarkably sparse. Two of the three HT-Y2H studies each account for four and six PPIs between only six of the 72 proteins, while the third study accounts for as little as one PPI and two proteins. The PCA dataset has the highest coverage among the genome-wide studies with 27 PPIs between 25 proteins. The LC dataset was the most extensive, accounting for 34 proteins and 38 PPIs, many of which were validated by independent methods, thereby further increasing their reliability. When the collected data were merged, we found that at least 70% of the predicted PPIs have yet to be determined and 26 proteins (36%) have no known partners. Since the SSU processome is conserved in all Eukaryotes, we also queried HT-Y2H datasets from six additional model organisms, but only four orthologues and three previously known interologous interactions were found. This provides a starting point for further work on SSU processome assembly, and spotlights the need for a more complete genome-wide Y2H analysis. PMID:21423703

  18. NetCoffee: a fast and accurate global alignment approach to identify functionally conserved proteins in multiple networks.

    PubMed

    Hu, Jialu; Kehr, Birte; Reinert, Knut

    2014-02-15

    Owing to recent advancements in high-throughput technologies, protein-protein interaction networks of more and more species become available in public databases. The question of how to identify functionally conserved proteins across species attracts a lot of attention in computational biology. Network alignments provide a systematic way to solve this problem. However, most existing alignment tools encounter limitations in tackling this problem. Therefore, the demand for faster and more efficient alignment tools is growing. We present a fast and accurate algorithm, NetCoffee, which allows to find a global alignment of multiple protein-protein interaction networks. NetCoffee searches for a global alignment by maximizing a target function using simulated annealing on a set of weighted bipartite graphs that are constructed using a triplet approach similar to T-Coffee. To assess its performance, NetCoffee was applied to four real datasets. Our results suggest that NetCoffee remedies several limitations of previous algorithms, outperforms all existing alignment tools in terms of speed and nevertheless identifies biologically meaningful alignments. The source code and data are freely available for download under the GNU GPL v3 license at https://code.google.com/p/netcoffee/.

  19. Systematic Mapping and Functional Analysis of a Family of Human Epididymal Secretory Sperm-Located Proteins*

    PubMed Central

    Li, JianYuan; Liu, FuJun; Wang, HaiYan; Liu, Xin; Liu, Juan; Li, Ning; Wan, FengChun; Wang, WenTing; Zhang, ChengLin; Jin, ShaoHua; Liu, Jie; Zhu, Peng; Liu, YunXiang

    2010-01-01

    The mammalian spermatozoon has many cellular compartments, such as head and tail, permitting it to interact with the female reproductive tract and fertilize the egg. It acquires this fertilizing potential during transit through the epididymis, which secretes proteins that coat different sperm domains. Optimal levels of these proteins provide the spermatozoon with its ability to move to, bind to, fuse with, and penetrate the egg; otherwise male infertility results. As few human epididymal proteins have been characterized, this work was performed to generate a database of human epididymal sperm-located proteins involved in maturation. Two-dimensional gel electrophoresis of epididymal tissue and luminal fluid proteins, followed by identification using MALDI-TOF/MS or MALDI-TOF/TOF, revealed over a thousand spots in gels comprising 745 abundant nonstructural proteins, 408 in luminal fluids, of which 207 were present on spermatozoa. Antibodies raised to 619 recombinant or synthetic peptides, used in Western blots, histological sections, and washed sperm preparations to confirm antibody quality and protein expression, indicated their regional location in the epididymal epithelium and highly specific locations on washed functional spermatozoa. Sperm function tests suggested the role of some proteins in motility and protection against oxidative attack. A large database of these proteins, characterized by size, pI, chromosomal location, and function, was given a unified terminology reflecting their sperm domain location. These novel, secreted human epididymal proteins are potential targets for a posttesticular contraceptive acting to provide rapid, reversible, functional sterility in men and they are also biomarkers that could be used in noninvasive assessments of male fertility. PMID:20736409

  20. HAEdb: a novel interactive, locus-specific mutation database for the C1 inhibitor gene.

    PubMed

    Kalmár, Lajos; Hegedüs, Tamás; Farkas, Henriette; Nagy, Melinda; Tordai, Attila

    2005-01-01

    Hereditary angioneurotic edema (HAE) is an autosomal dominant disorder characterized by episodic local subcutaneous and submucosal edema and is caused by the deficiency of the activated C1 esterase inhibitor protein (C1-INH or C1INH; approved gene symbol SERPING1). Published C1-INH mutations are represented in large universal databases (e.g., OMIM, HGMD), but these databases update their data rather infrequently, they are not interactive, and they do not allow searches according to different criteria. The HAEdb, a C1-INH gene mutation database (http://hae.biomembrane.hu) was created to contribute to the following expectations: 1) help the comprehensive collection of information on genetic alterations of the C1-INH gene; 2) create a database in which data can be searched and compared according to several flexible criteria; and 3) provide additional help in new mutation identification. The website uses MySQL, an open-source, multithreaded, relational database management system. The user-friendly graphical interface was written in the PHP web programming language. The website consists of two main parts, the freely browsable search function, and the password-protected data deposition function. Mutations of the C1-INH gene are divided in two parts: gross mutations involving DNA fragments >1 kb, and micro mutations encompassing all non-gross mutations. Several attributes (e.g., affected exon, molecular consequence, family history) are collected for each mutation in a standardized form. This database may facilitate future comprehensive analyses of C1-INH mutations and also provide regular help for molecular diagnostic testing of HAE patients in different centers.

  1. KIDFamMap: a database of kinase-inhibitor-disease family maps for kinase inhibitor selectivity and binding mechanisms

    PubMed Central

    Chiu, Yi-Yuan; Lin, Chih-Ta; Huang, Jhang-Wei; Hsu, Kai-Cheng; Tseng, Jen-Hu; You, Syuan-Ren; Yang, Jinn-Moon

    2013-01-01

    Kinases play central roles in signaling pathways and are promising therapeutic targets for many diseases. Designing selective kinase inhibitors is an emergent and challenging task, because kinases share an evolutionary conserved ATP-binding site. KIDFamMap (http://gemdock.life.nctu.edu.tw/KIDFamMap/) is the first database to explore kinase-inhibitor families (KIFs) and kinase-inhibitor-disease (KID) relationships for kinase inhibitor selectivity and mechanisms. This database includes 1208 KIFs, 962 KIDs, 55 603 kinase-inhibitor interactions (KIIs), 35 788 kinase inhibitors, 399 human protein kinases, 339 diseases and 638 disease allelic variants. Here, a KIF can be defined as follows: (i) the kinases in the KIF with significant sequence similarity, (ii) the inhibitors in the KIF with significant topology similarity and (iii) the KIIs in the KIF with significant interaction similarity. The KIIs within a KIF are often conserved on some consensus KIDFamMap anchors, which represent conserved interactions between the kinase subsites and consensus moieties of their inhibitors. Our experimental results reveal that the members of a KIF often possess similar inhibition profiles. The KIDFamMap anchors can reflect kinase conformations types, kinase functions and kinase inhibitor selectivity. We believe that KIDFamMap provides biological insights into kinase inhibitor selectivity and binding mechanisms. PMID:23193279

  2. Database resources of the National Center for Biotechnology Information

    PubMed Central

    Wheeler, David L.; Barrett, Tanya; Benson, Dennis A.; Bryant, Stephen H.; Canese, Kathi; Chetvernin, Vyacheslav; Church, Deanna M.; DiCuccio, Michael; Edgar, Ron; Federhen, Scott; Geer, Lewis Y.; Helmberg, Wolfgang; Kapustin, Yuri; Kenton, David L.; Khovayko, Oleg; Lipman, David J.; Madden, Thomas L.; Maglott, Donna R.; Ostell, James; Pruitt, Kim D.; Schuler, Gregory D.; Schriml, Lynn M.; Sequeira, Edwin; Sherry, Stephen T.; Sirotkin, Karl; Souvorov, Alexandre; Starchenko, Grigory; Suzek, Tugba O.; Tatusov, Roman; Tatusova, Tatiana A.; Wagner, Lukas; Yaschenko, Eugene

    2006-01-01

    In addition to maintaining the GenBank(R) nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data made available through NCBI's Web site. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central, Entrez Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Electronic PCR, OrfFinder, Spidey, Splign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, Cancer Chromosomes, Entrez Genomes and related tools, the Map Viewer, Model Maker, Evidence Viewer, Clusters of Orthologous Groups, Retroviral Genotyping Tools, HIV-1, Human Protein Interaction Database, SAGEmap, Gene Expression Omnibus, Entrez Probe, GENSAT, Online Mendelian Inheritance in Man, Online Mendelian Inheritance in Animals, the Molecular Modeling Database, the Conserved Domain Database, the Conserved Domain Architecture Retrieval Tool and the PubChem suite of small molecule databases. Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized datasets. All of the resources can be accessed through the NCBI home page at: . PMID:16381840

  3. Database on pharmacophore analysis of active principles, from medicinal plants

    PubMed Central

    Pitchai, Daisy; Manikkam, Rajalakshmi; Rajendran, Sasikala R; Pitchai, Gnanamani

    2010-01-01

    Plants continue to be a major source of medicines, as they have been throughout human history. In the present days, drug discovery from plants involves a multidisciplinary approach combining ethnobotanical, phytochemical and biological techniques to provide us new chemical compounds (lead molecules) for the development of drugs against various pharmacological targets, including cancer, diabetes and its secondary complications. In view of this need in current drug discovery from medicinal plants, here we describe another web database containing the information of pharmacophore analysis of active principles possessing antidiabetic, antimicrobial, anticancerous and antioxidant properties from medicinal plants. The database provides the botanical, taxonomic classification, biochemical as well as pharmacological properties of medicinal plants. Data on antidiabetic, antimicrobial, anti oxidative, anti tumor and anti inflammatory compounds, and their physicochemical properties, SMILES Notation, Lipinski's properties are included in our database. One of the proposed features in the database is the predicted ADMET values and the interaction of bioactive compounds to the target protein. The database alphabetically lists the compound name and also provides tabs separating for anti microbial, antitumor, antidiabetic, and antioxidative compounds. Availability http://www.hccbif.info / PMID:21346859

  4. Survey of Natural Language Processing Techniques in Bioinformatics.

    PubMed

    Zeng, Zhiqiang; Shi, Hua; Wu, Yun; Hong, Zhiling

    2015-01-01

    Informatics methods, such as text mining and natural language processing, are always involved in bioinformatics research. In this study, we discuss text mining and natural language processing methods in bioinformatics from two perspectives. First, we aim to search for knowledge on biology, retrieve references using text mining methods, and reconstruct databases. For example, protein-protein interactions and gene-disease relationship can be mined from PubMed. Then, we analyze the applications of text mining and natural language processing techniques in bioinformatics, including predicting protein structure and function, detecting noncoding RNA. Finally, numerous methods and applications, as well as their contributions to bioinformatics, are discussed for future use by text mining and natural language processing researchers.

  5. The Mitosis and Neurodevelopment Proteins NDE1 and NDEL1 Form Dimers, Tetramers, and Polymers with a Folded Back Structure in Solution*

    PubMed Central

    Soares, Dinesh C.; Bradshaw, Nicholas J.; Zou, Juan; Kennaway, Christopher K.; Hamilton, Russell S.; Chen, Zhuo A.; Wear, Martin A.; Blackburn, Elizabeth A.; Bramham, Janice; Böttcher, Bettina; Millar, J. Kirsty; Barlow, Paul N.; Walkinshaw, Malcolm D.; Rappsilber, Juri; Porteous, David J.

    2012-01-01

    Paralogs NDE1 (nuclear distribution element 1) and NDEL1 (NDE-like 1) are essential for mitosis and neurodevelopment. Both proteins are predicted to have similar structures, based upon high sequence similarity, and they co-complex in mammalian cells. X-ray diffraction studies and homology modeling suggest that their N-terminal regions (residues 8–167) adopt continuous, extended α-helical coiled-coil structures, but no experimentally derived information on the structure of their C-terminal regions or the architecture of the full-length proteins is available. In the case of NDE1, no biophysical data exists. Here we characterize the structural architecture of both full-length proteins utilizing negative stain electron microscopy along with our established paradigm of chemical cross-linking followed by tryptic digestion, mass spectrometry, and database searching, which we enhance using isotope labeling for mixed NDE1-NDEL1. We determined that full-length NDE1 forms needle-like dimers and tetramers in solution, similar to crystal structures of NDEL1, as well as chain-like end-to-end polymers. The C-terminal domain of each protein, required for interaction with key protein partners dynein and DISC1 (disrupted-in-schizophrenia 1), includes a predicted disordered region that allows a bent back structure. This facilitates interaction of the C-terminal region with the N-terminal coiled-coil domain and is in agreement with previous results showing N- and C-terminal regions of NDEL1 and NDE1 cooperating in dynein interaction. It sheds light on recently identified mutations in the NDE1 gene that cause truncation of the encoded protein. Additionally, analysis of mixed NDE1-NDEL1 complexes demonstrates that NDE1 and NDEL1 can interact directly. PMID:22843697

  6. The PMDB Protein Model Database

    PubMed Central

    Castrignanò, Tiziana; De Meo, Paolo D'Onorio; Cozzetto, Domenico; Talamo, Ivano Giuseppe; Tramontano, Anna

    2006-01-01

    The Protein Model Database (PMDB) is a public resource aimed at storing manually built 3D models of proteins. The database is designed to provide access to models published in the scientific literature, together with validating experimental data. It is a relational database and it currently contains >74 000 models for ∼240 proteins. The system is accessible at and allows predictors to submit models along with related supporting evidence and users to download them through a simple and intuitive interface. Users can navigate in the database and retrieve models referring to the same target protein or to different regions of the same protein. Each model is assigned a unique identifier that allows interested users to directly access the data. PMID:16381873

  7. PDB_TM: selection and membrane localization of transmembrane proteins in the protein data bank.

    PubMed

    Tusnády, Gábor E; Dosztányi, Zsuzsanna; Simon, István

    2005-01-01

    PDB_TM is a database for transmembrane proteins with known structures. It aims to collect all transmembrane proteins that are deposited in the protein structure database (PDB) and to determine their membrane-spanning regions. These assignments are based on the TMDET algorithm, which uses only structural information to locate the most likely position of the lipid bilayer and to distinguish between transmembrane and globular proteins. This algorithm was applied to all PDB entries and the results were collected in the PDB_TM database. By using TMDET algorithm, the PDB_TM database can be automatically updated every week, keeping it synchronized with the latest PDB updates. The PDB_TM database is available at http://www.enzim.hu/PDB_TM.

  8. Analysis of Cysteine Redox Post-Translational Modifications in Cell Biology and Drug Pharmacology.

    PubMed

    Wani, Revati; Murray, Brion W

    2017-01-01

    Reversible cysteine oxidation is an emerging class of protein post-translational modification (PTM) that regulates catalytic activity, modulates conformation, impacts protein-protein interactions, and affects subcellular trafficking of numerous proteins. Redox PTMs encompass a broad array of cysteine oxidation reactions with different half-lives, topographies, and reactivities such as S-glutathionylation and sulfoxidation. Recent studies from our group underscore the lesser known effect of redox protein modifications on drug binding. To date, biological studies to understand mechanistic and functional aspects of redox regulation are technically challenging. A prominent issue is the lack of tools for labeling proteins oxidized to select chemotype/oxidant species in cells. Predictive computational tools and curated databases of oxidized proteins are facilitating structural and functional insights into regulation of the network of oxidized proteins or redox proteome. In this chapter, we discuss analytical platforms for studying protein oxidation, suggest computational tools currently available in the field to determine redox sensitive proteins, and begin to illuminate roles of cysteine redox PTMs in drug pharmacology.

  9. Study of intermolecular contacts in proteins and oligomer interfaces and preliminary investigations into the design and production of nanomaterials from proteins

    NASA Astrophysics Data System (ADS)

    Iyer, Ganesh Hariharan

    The first part of this research involved a study of the nature and extent of nonbonded interactions at crystal and oligomer interfaces. A survey was compiled of several characteristics of intersubunit contacts in 58 different oligomeric proteins, and of the intermolecular contacts in 223 protein crystal structures. Routines written in "S" language were utilized for the generation of the observed and expected contacts. The information in the Protein Data Bank (PDB) was extracted using the database management system, Protein Knowledge Base (PKB). Potentials of mean force for atom-atom contacts and residue-residue contacts were derived by comparison of the number of observed interactions with the number expected by mass action. Preference association matrices and log-linear analyses were applied to determine the different factors that could contribute to the overall interactions at the interfaces of oligomers and crystals. Surface patches at oligomer and crystal interfaces were also studied to further investigate the origin of the differences in their stabilities. Total number of atoms in contact and the secondary structure elements involved are similar in the two types of interfaces. Crystal contacts result from more numerous interactions by polar residues, compared with a tendency toward nonpolar amino acid prominent in oligomer interfaces. Contact potentials indicate that hydrophobic interactions at oligomer interfaces favor aromatic amino acids and methionine over aliphatic amino acids; and that crystal contacts form in such a way as to avoid inclusion of hydrophobic interactions. The second part involved the development of a new class of biomaterials from two-dimensional arrays of ordered proteins. Point mutations were planned to introduce cysteine residues at appropriate locations to enable cross-linking at the molecular interface within given crystallographic planes. Crystallization and subsequent cross-linking of the modified protein would lead to the formation of arrays on subsequent dissociation of the crystal. Novel protein architectures can be generated from these cross-linked nanostructures. Experiments with model protein, maltose-binding protein (MBP) were performed to develop purification, cross-linking and crystallization techniques. The long-term goal of this project is to apply the experience gained with MBP to the fabrication of nanomaterials from other, application-specific proteins for ultrafiltration and microelectronic devices.

  10. Proteomic analysis of the Theileria annulata schizont

    PubMed Central

    Witschi, M.; Xia, D.; Sanderson, S.; Baumgartner, M.; Wastling, J.M.; Dobbelaere, D.A.E.

    2013-01-01

    The apicomplexan parasite, Theileria annulata, is the causative agent of tropical theileriosis, a devastating lymphoproliferative disease of cattle. The schizont stage transforms bovine leukocytes and provides an intriguing model to study host/pathogen interactions. The genome of T. annulata has been sequenced and transcriptomic data are rapidly accumulating. In contrast, little is known about the proteome of the schizont, the pathogenic, transforming life cycle stage of the parasite. Using one-dimensional (1-D) gel LC-MS/MS, a proteomic analysis of purified T. annulata schizonts was carried out. In whole parasite lysates, 645 proteins were identified. Proteins with transmembrane domains (TMDs) were under-represented and no proteins with more than four TMDs could be detected. To tackle this problem, Triton X-114 treatment was applied, which facilitates the extraction of membrane proteins, followed by 1-D gel LC-MS/MS. This resulted in the identification of an additional 153 proteins. Half of those had one or more TMD and 30 proteins with more than four TMDs were identified. This demonstrates that Triton X-114 treatment can provide a valuable additional tool for the identification of new membrane proteins in proteomic studies. With two exceptions, all proteins involved in glycolysis and the citric acid cycle were identified. For at least 29% of identified proteins, the corresponding transcripts were not present in the existing expressed sequence tag databases. The proteomics data were integrated into the publicly accessible database resource at EuPathDB (www.eupathdb.org) so that mass spectrometry-based protein expression evidence for T. annulata can be queried alongside transcriptional and other genomics data available for these parasites. PMID:23178997

  11. 3D Complex: A Structural Classification of Protein Complexes

    PubMed Central

    Levy, Emmanuel D; Pereira-Leal, Jose B; Chothia, Cyrus; Teichmann, Sarah A

    2006-01-01

    Most of the proteins in a cell assemble into complexes to carry out their function. It is therefore crucial to understand the physicochemical properties as well as the evolution of interactions between proteins. The Protein Data Bank represents an important source of information for such studies, because more than half of the structures are homo- or heteromeric protein complexes. Here we propose the first hierarchical classification of whole protein complexes of known 3-D structure, based on representing their fundamental structural features as a graph. This classification provides the first overview of all the complexes in the Protein Data Bank and allows nonredundant sets to be derived at different levels of detail. This reveals that between one-half and two-thirds of known structures are multimeric, depending on the level of redundancy accepted. We also analyse the structures in terms of the topological arrangement of their subunits and find that they form a small number of arrangements compared with all theoretically possible ones. This is because most complexes contain four subunits or less, and the large majority are homomeric. In addition, there is a strong tendency for symmetry in complexes, even for heteromeric complexes. Finally, through comparison of Biological Units in the Protein Data Bank with the Protein Quaternary Structure database, we identified many possible errors in quaternary structure assignments. Our classification, available as a database and Web server at http://www.3Dcomplex.org, will be a starting point for future work aimed at understanding the structure and evolution of protein complexes. PMID:17112313

  12. Database-independent Protein Sequencing (DiPS) Enables Full-length de Novo Protein and Antibody Sequence Determination.

    PubMed

    Savidor, Alon; Barzilay, Rotem; Elinger, Dalia; Yarden, Yosef; Lindzen, Moshit; Gabashvili, Alexandra; Adiv Tal, Ophir; Levin, Yishai

    2017-06-01

    Traditional "bottom-up" proteomic approaches use proteolytic digestion, LC-MS/MS, and database searching to elucidate peptide identities and their parent proteins. Protein sequences absent from the database cannot be identified, and even if present in the database, complete sequence coverage is rarely achieved even for the most abundant proteins in the sample. Thus, sequencing of unknown proteins such as antibodies or constituents of metaproteomes remains a challenging problem. To date, there is no available method for full-length protein sequencing, independent of a reference database, in high throughput. Here, we present Database-independent Protein Sequencing, a method for unambiguous, rapid, database-independent, full-length protein sequencing. The method is a novel combination of non-enzymatic, semi-random cleavage of the protein, LC-MS/MS analysis, peptide de novo sequencing, extraction of peptide tags, and their assembly into a consensus sequence using an algorithm named "Peptide Tag Assembler." As proof-of-concept, the method was applied to samples of three known proteins representing three size classes and to a previously un-sequenced, clinically relevant monoclonal antibody. Excluding leucine/isoleucine and glutamic acid/deamidated glutamine ambiguities, end-to-end full-length de novo sequencing was achieved with 99-100% accuracy for all benchmarking proteins and the antibody light chain. Accuracy of the sequenced antibody heavy chain, including the entire variable region, was also 100%, but there was a 23-residue gap in the constant region sequence. © 2017 by The American Society for Biochemistry and Molecular Biology, Inc.

  13. The BioC-BioGRID corpus: full text articles annotated for curation of protein–protein and genetic interactions

    PubMed Central

    Kim, Sun; Chatr-aryamontri, Andrew; Chang, Christie S.; Oughtred, Rose; Rust, Jennifer; Wilbur, W. John; Comeau, Donald C.; Dolinski, Kara; Tyers, Mike

    2017-01-01

    A great deal of information on the molecular genetics and biochemistry of model organisms has been reported in the scientific literature. However, this data is typically described in free text form and is not readily amenable to computational analyses. To this end, the BioGRID database systematically curates the biomedical literature for genetic and protein interaction data. This data is provided in a standardized computationally tractable format and includes structured annotation of experimental evidence. BioGRID curation necessarily involves substantial human effort by expert curators who must read each publication to extract the relevant information. Computational text-mining methods offer the potential to augment and accelerate manual curation. To facilitate the development of practical text-mining strategies, a new challenge was organized in BioCreative V for the BioC task, the collaborative Biocurator Assistant Task. This was a non-competitive, cooperative task in which the participants worked together to build BioC-compatible modules into an integrated pipeline to assist BioGRID curators. As an integral part of this task, a test collection of full text articles was developed that contained both biological entity annotations (gene/protein and organism/species) and molecular interaction annotations (protein–protein and genetic interactions (PPIs and GIs)). This collection, which we call the BioC-BioGRID corpus, was annotated by four BioGRID curators over three rounds of annotation and contains 120 full text articles curated in a dataset representing two major model organisms, namely budding yeast and human. The BioC-BioGRID corpus contains annotations for 6409 mentions of genes and their Entrez Gene IDs, 186 mentions of organism names and their NCBI Taxonomy IDs, 1867 mentions of PPIs and 701 annotations of PPI experimental evidence statements, 856 mentions of GIs and 399 annotations of GI evidence statements. The purpose, characteristics and possible future uses of the BioC-BioGRID corpus are detailed in this report. Database URL: http://bioc.sourceforge.net/BioC-BioGRID.html PMID:28077563

  14. Metagenomic Taxonomy-Guided Database-Searching Strategy for Improving Metaproteomic Analysis.

    PubMed

    Xiao, Jinqiu; Tanca, Alessandro; Jia, Ben; Yang, Runqing; Wang, Bo; Zhang, Yu; Li, Jing

    2018-04-06

    Metaproteomics provides a direct measure of the functional information by investigating all proteins expressed by a microbiota. However, due to the complexity and heterogeneity of microbial communities, it is very hard to construct a sequence database suitable for a metaproteomic study. Using a public database, researchers might not be able to identify proteins from poorly characterized microbial species, while a sequencing-based metagenomic database may not provide adequate coverage for all potentially expressed protein sequences. To address this challenge, we propose a metagenomic taxonomy-guided database-search strategy (MT), in which a merged database is employed, consisting of both taxonomy-guided reference protein sequences from public databases and proteins from metagenome assembly. By applying our MT strategy to a mock microbial mixture, about two times as many peptides were detected as with the metagenomic database only. According to the evaluation of the reliability of taxonomic attribution, the rate of misassignments was comparable to that obtained using an a priori matched database. We also evaluated the MT strategy with a human gut microbial sample, and we found 1.7 times as many peptides as using a standard metagenomic database. In conclusion, our MT strategy allows the construction of databases able to provide high sensitivity and precision in peptide identification in metaproteomic studies, enabling the detection of proteins from poorly characterized species within the microbiota.

  15. A Reference Proteomic Database of Lactobacillus plantarum CMCC-P0002

    PubMed Central

    Tian, Wanhong; Yu, Gang; Liu, Xiankai; Wang, Jie; Feng, Erling; Zhang, Xuemin; Chen, Bei; Zeng, Ming; Wang, Hengliang

    2011-01-01

    Lactobacillus plantarum is a widespread probiotic bacteria found in many fermented food products. In this study, the whole-cell proteins and secretory proteins of L. plantarum were separated by two-dimensional electrophoresis method. A total of 434 proteins were identified by tandem mass spectrometry, including a plasmid-encoded hypothetical protein pLP9000_05. The information of first 20 highest abundance proteins was listed for the further genetic manipulation of L. plantarum, such as construction of high-level expressions system. Furthermore, the first interaction map of L. plantarum was established by Blue-Native/SDS-PAGE technique. A heterodimeric complex composed of maltose phosphorylase Map3 and Map2, and two homodimeric complexes composed of Map3 and Map2 respectively, were identified at the same time, indicating the important roles of these proteins. These findings provided valuable information for the further proteomic researches of L. plantarum. PMID:21998671

  16. The FMRP regulon: from targets to disease convergence

    PubMed Central

    Fernández, Esperanza; Rajan, Nicholas; Bagni, Claudia

    2013-01-01

    The fragile X mental retardation protein (FMRP) is an RNA-binding protein that regulates mRNA metabolism. FMRP has been largely studied in the brain, where the absence of this protein leads to fragile X syndrome, the most frequent form of inherited intellectual disability. Since the identification of the FMRP gene in 1991, many studies have primarily focused on understanding the function/s of this protein. Hundreds of potential FMRP mRNA targets and several interacting proteins have been identified. Here, we report the identification of FMRP mRNA targets in the mammalian brain that support the key role of this protein during brain development and in regulating synaptic plasticity. We compared the genes from databases and genome-wide association studies with the brain FMRP transcriptome, and identified several FMRP mRNA targets associated with autism spectrum disorders, mood disorders and schizophrenia, showing a potential common pathway/s for these apparently different disorders. PMID:24167470

  17. A reference proteomic database of Lactobacillus plantarum CMCC-P0002.

    PubMed

    Zhu, Li; Hu, Wei; Liu, Datao; Tian, Wanhong; Yu, Gang; Liu, Xiankai; Wang, Jie; Feng, Erling; Zhang, Xuemin; Chen, Bei; Zeng, Ming; Wang, Hengliang

    2011-01-01

    Lactobacillus plantarum is a widespread probiotic bacteria found in many fermented food products. In this study, the whole-cell proteins and secretory proteins of L. plantarum were separated by two-dimensional electrophoresis method. A total of 434 proteins were identified by tandem mass spectrometry, including a plasmid-encoded hypothetical protein pLP9000_05. The information of first 20 highest abundance proteins was listed for the further genetic manipulation of L. plantarum, such as construction of high-level expressions system. Furthermore, the first interaction map of L. plantarum was established by Blue-Native/SDS-PAGE technique. A heterodimeric complex composed of maltose phosphorylase Map3 and Map2, and two homodimeric complexes composed of Map3 and Map2 respectively, were identified at the same time, indicating the important roles of these proteins. These findings provided valuable information for the further proteomic researches of L. plantarum.

  18. Proteomic analysis of protein interactions between Eimeria maxima sporozoites and chicken jejunal epithelial cells by shotgun LC-MS/MS.

    PubMed

    Huang, Jingwei; Liu, Tingqi; Li, Ke; Song, Xiaokai; Yan, Ruofeng; Xu, Lixin; Li, Xiangrui

    2018-04-04

    Eimeria maxima initiates infection by invading the jejunal epithelial cells of chicken. However, the proteins involved in invasion remain unknown. The research of the molecules that participate in the interactions between E. maxima sporozoites and host target cells will fill a gap in our understanding of the invasion system of this parasitic pathogen. In the present study, chicken jejunal epithelial cells were isolated and cultured in vitro. Western blot was employed to analyze the soluble proteins of E. maxima sporozoites that bound to chicken jejunal epithelial cells. Co-immunoprecipitation (co-IP) assay was used to separate the E. maxima proteins that bound to chicken jejunal epithelial cells. Shotgun LC-MS/MS technique was used for proteomics identification and Gene Ontology was employed for the bioinformatics analysis. The results of Western blot analysis showed that four proteins bands from jejunal epithelial cells co-cultured with soluble proteins of E. maxima sporozoites were recognized by the positive sera, with molecular weights of 70, 90, 95 and 130 kDa. The co-IP dilutions were analyzed by shotgun LC-MS/MS. A total of 204 proteins were identified in the E. maxima protein database using the MASCOT search engine. Thirty-five proteins including microneme protein 3 and 7 had more than two unique peptide counts and were annotated using Gene Ontology for molecular function, biological process and cellular localization. The results revealed that of the 35 annotated peptides, 22 (62.86%) were associated with binding activity and 15 (42.86%) were involved in catalytic activity. Our findings provide an insight into the interaction between E. maxima and the corresponding host cells and it is important for the understanding of molecular mechanisms underlying E. maxima invasion.

  19. Introducing meta-services for biomedical information extraction

    PubMed Central

    Leitner, Florian; Krallinger, Martin; Rodriguez-Penagos, Carlos; Hakenberg, Jörg; Plake, Conrad; Kuo, Cheng-Ju; Hsu, Chun-Nan; Tsai, Richard Tzong-Han; Hung, Hsi-Chuan; Lau, William W; Johnson, Calvin A; Sætre, Rune; Yoshida, Kazuhiro; Chen, Yan Hua; Kim, Sun; Shin, Soo-Yong; Zhang, Byoung-Tak; Baumgartner, William A; Hunter, Lawrence; Haddow, Barry; Matthews, Michael; Wang, Xinglong; Ruch, Patrick; Ehrler, Frédéric; Özgür, Arzucan; Erkan, Güneş; Radev, Dragomir R; Krauthammer, Michael; Luong, ThaiBinh; Hoffmann, Robert; Sander, Chris; Valencia, Alfonso

    2008-01-01

    We introduce the first meta-service for information extraction in molecular biology, the BioCreative MetaServer (BCMS; ). This prototype platform is a joint effort of 13 research groups and provides automatically generated annotations for PubMed/Medline abstracts. Annotation types cover gene names, gene IDs, species, and protein-protein interactions. The annotations are distributed by the meta-server in both human and machine readable formats (HTML/XML). This service is intended to be used by biomedical researchers and database annotators, and in biomedical language processing. The platform allows direct comparison, unified access, and result aggregation of the annotations. PMID:18834497

  20. Visualization portal for genetic variation (VizGVar): a tool for interactive visualization of SNPs and somatic mutations in exons, genes and protein domains.

    PubMed

    Solano-Román, Antonio; Alfaro-Arias, Verónica; Cruz-Castillo, Carlos; Orozco-Solano, Allan

    2018-03-15

    VizGVar was designed to meet the growing need of the research community for improved genomic and proteomic data viewers that benefit from better information visualization. We implemented a new information architecture and applied user centered design principles to provide a new improved way of visualizing genetic information and protein data related to human disease. VizGVar connects the entire database of Ensembl protein motifs, domains, genes and exons with annotated SNPs and somatic variations from PharmGKB and COSMIC. VizGVar precisely represents genetic variations and their respective location by colored curves to designate different types of variations. The structured hierarchy of biological data is reflected in aggregated patterns through different levels, integrating several layers of information at once. VizGVar provides a new interactive, web-based JavaScript visualization of somatic mutations and protein variation, enabling fast and easy discovery of clinically relevant variation patterns. VizGVar is accessible at http://vizport.io/vizgvar; http://vizport.io/vizgvar/doc/. asolano@broadinstitute.org or allan.orozcosolano@ucr.ac.cr.

  1. Evidence for a strong sulfur-aromatic interaction derived from crystallographic data.

    PubMed

    Zauhar, R J; Colbert, C L; Morgan, R S; Welsh, W J

    2000-03-01

    We have uncovered new evidence for a significant interaction between divalent sulfur atoms and aromatic rings. Our study involves a statistical analysis of interatomic distances and other geometric descriptors derived from entries in the Cambridge Crystallographic Database (F. H. Allen and O. Kennard, Chem. Design Auto. News, 1993, Vol. 8, pp. 1 and 31-37). A set of descriptors was defined sufficient in number and type so as to elucidate completely the preferred geometry of interaction between six-membered aromatic carbon rings and divalent sulfurs for all crystal structures of nonmetal-bearing organic compounds present in the database. In order to test statistical significance, analogous probability distributions for the interaction of the moiety X-CH(2)-X with aromatic rings were computed, and taken a priori to correspond to the null hypothesis of no significant interaction. Tests of significance were carried our pairwise between probability distributions of sulfur-aromatic interaction descriptors and their CH(2)-aromatic analogues using the Smirnov-Kolmogorov nonparametric test (W. W. Daniel, Applied Nonparametric Statistics, Houghton-Mifflin: Boston, New York, 1978, pp. 276-286), and in all cases significance at the 99% confidence level or better was observed. Local maxima of the probability distributions were used to define a preferred geometry of interaction between the divalent sulfur moiety and the aromatic ring. Molecular mechanics studies were performed in an effort to better understand the physical basis of the interaction. This study confirms observations based on statistics of interaction of amino acids in protein crystal structures (R. S. Morgan, C. E. Tatsch, R. H. Gushard, J. M. McAdon, and P. K. Warme, International Journal of Peptide Protein Research, 1978, Vol. 11, pp. 209-217; R. S. Morgan and J. M. McAdon, International Journal of Peptide Protein Research, 1980, Vol. 15, pp. 177-180; K. S. C. Reid, P. F. Lindley, and J. M. Thornton, FEBS Letters, 1985, Vol. 190, pp. 209-213), as well as studies involving molecular mechanics (G. Nemethy and H. A. Scheraga, Biochemistry and Biophysics Research Communications, 1981, Vol. 98, pp. 482-487) and quantum chemical calculations (B. V. Cheney, M. W. Schulz, and J. Cheney, Biochimica Biophysica Acta, 1989, Vol. 996, pp.116-124; J. Pranata, Bioorganic Chemistry, 1997, Vol. 25, pp. 213-219)-all of which point to the possible importance of the sulfur-aromatic interaction. However, the preferred geometry of the interaction, as determined from our analysis of the small-molecule crystal data, differs significantly from that found by other approaches. Copyright 2000 John Wiley & Sons, Inc.

  2. Predicting Protein-Protein Interaction Sites with a Novel Membership Based Fuzzy SVM Classifier.

    PubMed

    Sriwastava, Brijesh K; Basu, Subhadip; Maulik, Ujjwal

    2015-01-01

    Predicting residues that participate in protein-protein interactions (PPI) helps to identify, which amino acids are located at the interface. In this paper, we show that the performance of the classical support vector machine (SVM) algorithm can further be improved with the use of a custom-designed fuzzy membership function, for the partner-specific PPI interface prediction problem. We evaluated the performances of both classical SVM and fuzzy SVM (F-SVM) on the PPI databases of three different model proteomes of Homo sapiens, Escherichia coli and Saccharomyces Cerevisiae and calculated the statistical significance of the developed F-SVM over classical SVM algorithm. We also compared our performance with the available state-of-the-art fuzzy methods in this domain and observed significant performance improvements. To predict interaction sites in protein complexes, local composition of amino acids together with their physico-chemical characteristics are used, where the F-SVM based prediction method exploits the membership function for each pair of sequence fragments. The average F-SVM performance (area under ROC curve) on the test samples in 10-fold cross validation experiment are measured as 77.07, 78.39, and 74.91 percent for the aforementioned organisms respectively. Performances on independent test sets are obtained as 72.09, 73.24 and 82.74 percent respectively. The software is available for free download from http://code.google.com/p/cmater-bioinfo.

  3. Identification and functional analysis of a core gene module associated with hepatitis C virus-induced human hepatocellular carcinoma progression.

    PubMed

    Bai, Gaobo; Zheng, Wenling; Ma, Wenli

    2018-05-01

    Hepatitis C virus (HCV)-induced human hepatocellular carcinoma (HCC) progression may be due to a complex multi-step processes. The developmental mechanism of these processes is worth investigating for the prevention, diagnosis and therapy of HCC. The aim of the present study was to investigate the molecular mechanism underlying the progression of HCV-induced hepatocarcinogenesis. First, the dynamic gene module, consisting of key genes associated with progression between the normal stage and HCC, was identified using the Weighted Gene Co-expression Network Analysis tool from R language. By defining those genes in the module as seeds, the change of co-expression in differentially expressed gene sets in two consecutive stages of pathological progression was examined. Finally, interaction pairs of HCV viral proteins and their directly targeted proteins in the identified module were extracted from the literature and a comprehensive interaction dataset from yeast two-hybrid experiments. By combining the interactions between HCV and their targets, and protein-protein interactions in the Search Tool for the Retrieval of Interacting Genes database (STRING), the HCV-key genes interaction network was constructed and visualized using Cytoscape software 3.2. As a result, a module containing 44 key genes was identified to be associated with HCC progression, due to the dynamic features and functions of those genes in the module. Several important differentially co-expressed gene pairs were identified between non-HCC and HCC stages. In the key genes, cyclin dependent kinase 1 (CDK1), NDC80, cyclin A2 (CCNA2) and rac GTPase activating protein 1 (RACGAP1) were shown to be targeted by the HCV nonstructural proteins NS5A, NS3 and NS5B, respectively. The four genes perform an intermediary role between the HCV viral proteins and the dysfunctional module in the HCV key genes interaction network. These findings provided valuable information for understanding the mechanism of HCV-induced HCC progression and for seeking drug targets for the therapy and prevention of HCC.

  4. Database resources of the National Center for Biotechnology Information

    PubMed Central

    Wheeler, David L.; Barrett, Tanya; Benson, Dennis A.; Bryant, Stephen H.; Canese, Kathi; Chetvernin, Vyacheslav; Church, Deanna M.; DiCuccio, Michael; Edgar, Ron; Federhen, Scott; Feolo, Michael; Geer, Lewis Y.; Helmberg, Wolfgang; Kapustin, Yuri; Khovayko, Oleg; Landsman, David; Lipman, David J.; Madden, Thomas L.; Maglott, Donna R.; Miller, Vadim; Ostell, James; Pruitt, Kim D.; Schuler, Gregory D.; Shumway, Martin; Sequeira, Edwin; Sherry, Steven T.; Sirotkin, Karl; Souvorov, Alexandre; Starchenko, Grigory; Tatusov, Roman L.; Tatusova, Tatiana A.; Wagner, Lukas; Yaschenko, Eugene

    2008-01-01

    In addition to maintaining the GenBank(R) nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data available through NCBI's web site. NCBI resources include Entrez, the Entrez Programming Utilities, My NCBI, PubMed, PubMed Central, Entrez Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link, Electronic PCR, OrfFinder, Spidey, Splign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, Cancer Chromosomes, Entrez Genome, Genome Project and related tools, the Trace, Assembly, and Short Read Archives, the Map Viewer, Model Maker, Evidence Viewer, Clusters of Orthologous Groups, Influenza Viral Resources, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus, Entrez Probe, GENSAT, Database of Genotype and Phenotype, Online Mendelian Inheritance in Man, Online Mendelian Inheritance in Animals, the Molecular Modeling Database, the Conserved Domain Database, the Conserved Domain Architecture Retrieval Tool and the PubChem suite of small molecule databases. Augmenting the web applications are custom implementations of the BLAST program optimized to search specialized data sets. These resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov. PMID:18045790

  5. IntegromeDB: an integrated system and biological search engine

    PubMed Central

    2012-01-01

    Background With the growth of biological data in volume and heterogeneity, web search engines become key tools for researchers. However, general-purpose search engines are not specialized for the search of biological data. Description Here, we present an approach at developing a biological web search engine based on the Semantic Web technologies and demonstrate its implementation for retrieving gene- and protein-centered knowledge. The engine is available at http://www.integromedb.org. Conclusions The IntegromeDB search engine allows scanning data on gene regulation, gene expression, protein-protein interactions, pathways, metagenomics, mutations, diseases, and other gene- and protein-related data that are automatically retrieved from publicly available databases and web pages using biological ontologies. To perfect the resource design and usability, we welcome and encourage community feedback. PMID:22260095

  6. Effective screening strategy using ensembled pharmacophore models combined with cascade docking: application to p53-MDM2 interaction inhibitors.

    PubMed

    Xue, Xin; Wei, Jin-Lian; Xu, Li-Li; Xi, Mei-Yang; Xu, Xiao-Li; Liu, Fang; Guo, Xiao-Ke; Wang, Lei; Zhang, Xiao-Jin; Zhang, Ming-Ye; Lu, Meng-Chen; Sun, Hao-Peng; You, Qi-Dong

    2013-10-28

    Protein-protein interactions (PPIs) play a crucial role in cellular function and form the backbone of almost all biochemical processes. In recent years, protein-protein interaction inhibitors (PPIIs) have represented a treasure trove of potential new drug targets. Unfortunately, there are few successful drugs of PPIIs on the market. Structure-based pharmacophore (SBP) combined with docking has been demonstrated as a useful Virtual Screening (VS) strategy in drug development projects. However, the combination of target complexity and poor binding affinity prediction has thwarted the application of this strategy in the discovery of PPIIs. Here we report an effective VS strategy on p53-MDM2 PPI. First, we built a SBP model based on p53-MDM2 complex cocrystal structures. The model was then simplified by using a Receptor-Ligand complex-based pharmacophore model considering the critical binding features between MDM2 and its small molecular inhibitors. Cascade docking was subsequently applied to improve the hit rate. Based on this strategy, we performed VS on NCI and SPECS databases and successfully discovered 6 novel compounds from 15 hits with the best, compound 1 (NSC 5359), K(i) = 180 ± 50 nM. These compounds can serve as lead compounds for further optimization.

  7. Functional Analysis of OMICs Data and Small Molecule Compounds in an Integrated "Knowledge-Based" Platform.

    PubMed

    Dubovenko, Alexey; Nikolsky, Yuri; Rakhmatulin, Eugene; Nikolskaya, Tatiana

    2017-01-01

    Analysis of NGS and other sequencing data, gene variants, gene expression, proteomics, and other high-throughput (OMICs) data is challenging because of its biological complexity and high level of technical and biological noise. One way to deal with both problems is to perform analysis with a high fidelity annotated knowledgebase of protein interactions, pathways, and functional ontologies. This knowledgebase has to be structured in a computer-readable format and must include software tools for managing experimental data, analysis, and reporting. Here, we present MetaCore™ and Key Pathway Advisor (KPA), an integrated platform for functional data analysis. On the content side, MetaCore and KPA encompass a comprehensive database of molecular interactions of different types, pathways, network models, and ten functional ontologies covering human, mouse, and rat genes. The analytical toolkit includes tools for gene/protein list enrichment analysis, statistical "interactome" tool for the identification of over- and under-connected proteins in the dataset, and a biological network analysis module made up of network generation algorithms and filters. The suite also features Advanced Search, an application for combinatorial search of the database content, as well as a Java-based tool called Pathway Map Creator for drawing and editing custom pathway maps. Applications of MetaCore and KPA include molecular mode of action of disease research, identification of potential biomarkers and drug targets, pathway hypothesis generation, analysis of biological effects for novel small molecule compounds and clinical applications (analysis of large cohorts of patients, and translational and personalized medicine).

  8. A systematic study of chemogenomics of carbohydrates.

    PubMed

    Gu, Jiangyong; Luo, Fang; Chen, Lirong; Yuan, Gu; Xu, Xiaojie

    2014-03-04

    Chemogenomics focuses on the interactions between biologically active molecules and protein targets for drug discovery. Carbohydrates are the most abundant compounds in natural products. Compared with other drugs, the carbohydrate drugs show weaker side effects. Searching for multi-target carbohydrate drugs can be regarded as a solution to improve therapeutic efficacy and safety. In this work, we collected 60 344 carbohydrates from the Universal Natural Products Database (UNPD) and explored the chemical space of carbohydrates by principal component analysis. We found that there is a large quantity of potential lead compounds among carbohydrates. Then we explored the potential of carbohydrates in drug discovery by using a network-based multi-target computational approach. All carbohydrates were docked to 2389 target proteins. The most potential carbohydrates for drug discovery and their indications were predicted based on a docking score-weighted prediction model. We also explored the interactions between carbohydrates and target proteins to find the pathological networks, potential drug candidates and new indications.

  9. Identification of lactoferricin B intracellular targets using an Escherichia coli proteome chip.

    PubMed

    Tu, Yu-Hsuan; Ho, Yu-Hsuan; Chuang, Ying-Chih; Chen, Po-Chung; Chen, Chien-Sheng

    2011-01-01

    Lactoferricin B (LfcinB) is a well-known antimicrobial peptide. Several studies have indicated that it can inhibit bacteria by affecting intracellular activities, but the intracellular targets of this antimicrobial peptide have not been identified. Therefore, we used E. coli proteome chips to identify the intracellular target proteins of LfcinB in a high-throughput manner. We probed LfcinB with E. coli proteome chips and further conducted normalization and Gene Ontology (GO) analyses. The results of the GO analyses showed that the identified proteins were associated with metabolic processes. Moreover, we validated the interactions between LfcinB and chip assay-identified proteins with fluorescence polarization (FP) assays. Sixteen proteins were identified, and an E. coli interaction database (EcID) analysis revealed that the majority of the proteins that interact with these 16 proteins affected the tricarboxylic acid (TCA) cycle. Knockout assays were conducted to further validate the FP assay results. These results showed that phosphoenolpyruvate carboxylase was a target of LfcinB, indicating that one of its mechanisms of action may be associated with pyruvate metabolism. Thus, we used pyruvate assays to conduct an in vivo validation of the relationship between LfcinB and pyruvate level in E. coli. These results showed that E. coli exposed to LfcinB had abnormal pyruvate amounts, indicating that LfcinB caused an accumulation of pyruvate. In conclusion, this study successfully revealed the intracellular targets of LfcinB using an E. coli proteome chip approach.

  10. Identification of Lactoferricin B Intracellular Targets Using an Escherichia coli Proteome Chip

    PubMed Central

    Chen, Po-Chung; Chen, Chien-Sheng

    2011-01-01

    Lactoferricin B (LfcinB) is a well-known antimicrobial peptide. Several studies have indicated that it can inhibit bacteria by affecting intracellular activities, but the intracellular targets of this antimicrobial peptide have not been identified. Therefore, we used E. coli proteome chips to identify the intracellular target proteins of LfcinB in a high-throughput manner. We probed LfcinB with E. coli proteome chips and further conducted normalization and Gene Ontology (GO) analyses. The results of the GO analyses showed that the identified proteins were associated with metabolic processes. Moreover, we validated the interactions between LfcinB and chip assay-identified proteins with fluorescence polarization (FP) assays. Sixteen proteins were identified, and an E. coli interaction database (EcID) analysis revealed that the majority of the proteins that interact with these 16 proteins affected the tricarboxylic acid (TCA) cycle. Knockout assays were conducted to further validate the FP assay results. These results showed that phosphoenolpyruvate carboxylase was a target of LfcinB, indicating that one of its mechanisms of action may be associated with pyruvate metabolism. Thus, we used pyruvate assays to conduct an in vivo validation of the relationship between LfcinB and pyruvate level in E. coli. These results showed that E. coli exposed to LfcinB had abnormal pyruvate amounts, indicating that LfcinB caused an accumulation of pyruvate. In conclusion, this study successfully revealed the intracellular targets of LfcinB using an E. coli proteome chip approach. PMID:22164243

  11. Functional study of hot pepper 26S proteasome subunit RPN7 induced by Tobacco mosaic virus from nuclear proteome analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lee, Boo-Ja; Kwon, Sun Jae; Kim, Sung-Kyu

    Two-dimensional gel electrophoresis (2-DE) was applied for the screening of Tobacco mosaic virus (TMV)-induced hot pepper (Capsicum annuum cv. Bugang) nuclear proteins. From differentially expressed protein spots, we acquired the matched peptide mass fingerprint (PMF) data, analyzed by MALDI-TOF MS, from the non-redundant hot pepper EST protein FASTA database using the VEMS 2.0 software. Among six identified nuclear proteins, the hot pepper 26S proteasome subunit RPN7 (CaRPN7) was subjected to further study. The level of CaRPN7 mRNA was specifically increased during incompatible TMV-P{sub 0} interaction, but not during compatible TMV-P{sub 1.2} interaction. When CaRPN7::GFP fusion protein was targeted in onionmore » cells, the nuclei had been broken into pieces. In the hot pepper leaves, cell death was exacerbated and genomic DNA laddering was induced by Agrobacterium-mediated transient overexpression of CaPRN7. Thus, this report presents that the TMV-induced CaRPN7 may be involved in programmed cell death (PCD) in the hot pepper plant.« less

  12. Complete genome-wide screening and subtractive genomic approach revealed new virulence factors, potential drug targets against bio-war pathogen Brucella melitensis 16M

    PubMed Central

    Pradeepkiran, Jangampalli Adi; Sainath, Sri Bhashyam; Kumar, Konidala Kranthi; Bhaskar, Matcha

    2015-01-01

    Brucella melitensis 16M is a Gram-negative coccobacillus that infects both animals and humans. It causes a disease known as brucellosis, which is characterized by acute febrile illness in humans and causes abortions in livestock. To prevent and control brucellosis, identification of putative drug targets is crucial. The present study aimed to identify drug targets in B. melitensis 16M by using a subtractive genomic approach. We used available database repositories (Database of Essential Genes, Kyoto Encyclopedia of Genes and Genomes Automatic Annotation Server, and Kyoto Encyclopedia of Genes and Genomes) to identify putative genes that are nonhomologous to humans and essential for pathogen B. melitensis 16M. The results revealed that among 3 Mb genome size of pathogen, 53 putative characterized and 13 uncharacterized hypothetical genes were identified; further, from Basic Local Alignment Search Tool protein analysis, one hypothetical protein showed a close resemblance (50%) to Silicibacter pomeroyi DUF1285 family protein (2RE3). A further homology model of the target was constructed using MODELLER 9.12 and optimized through variable target function method by molecular dynamics optimization with simulating annealing. The stereochemical quality of the restrained model was evaluated by PROCHECK, VERIFY-3D, ERRAT, and WHATIF servers. Furthermore, structure-based virtual screening was carried out against the predicted active site of the respective protein using the glycerol structural analogs from the PubChem database. We identified five best inhibitors with strong affinities, stable interactions, and also with reliable drug-like properties. Hence, these leads might be used as the most effective inhibitors of modeled protein. The outcome of the present work of virtual screening of putative gene targets might facilitate design of potential drugs for better treatment against brucellosis. PMID:25834405

  13. Detecting protein-protein interactions in the intact cell of Bacillus subtilis (ATCC 6633).

    PubMed

    Winters, Michael S; Day, R A

    2003-07-01

    The salt bridge, paired group-specific reagent cyanogen (ethanedinitrile; C(2)N(2)) converts naturally occurring pairs of functional groups into covalently linked products. Cyanogen readily permeates cell walls and membranes. When the paired groups are shared between associated proteins, isolation of the covalently linked proteins allows their identity to be assigned. Examination of organisms of known genome sequence permits identification of the linked proteins by mass spectrometric techniques applied to peptides derived from them. The cyanogen-linked proteins were isolated by polyacrylamide gel electrophoresis. Digestion of the isolated proteins with proteases of known specificity afforded sets of peptides that could be analyzed by mass spectrometry. These data were compared with those derived theoretically from the Swiss Protein Database by computer-based comparisons (Protein Prospector; http://prospector.ucsf.edu). Identification of associated proteins in the ribosome of Bacillus subtilis strain ATCC 6633 showed that there is an association homology with the association patterns of the ribosomal proteins of Haloarcula marismortui and Thermus thermophilus. In addition, other proteins involved in protein biosynthesis were shown to be associated with ribosomal proteins.

  14. Detecting Protein-Protein Interactions in the Intact Cell of Bacillus subtilis (ATCC 6633)

    PubMed Central

    Winters, Michael S.; Day, R. A.

    2003-01-01

    The salt bridge, paired group-specific reagent cyanogen (ethanedinitrile; C2N2) converts naturally occurring pairs of functional groups into covalently linked products. Cyanogen readily permeates cell walls and membranes. When the paired groups are shared between associated proteins, isolation of the covalently linked proteins allows their identity to be assigned. Examination of organisms of known genome sequence permits identification of the linked proteins by mass spectrometric techniques applied to peptides derived from them. The cyanogen-linked proteins were isolated by polyacrylamide gel electrophoresis. Digestion of the isolated proteins with proteases of known specificity afforded sets of peptides that could be analyzed by mass spectrometry. These data were compared with those derived theoretically from the Swiss Protein Database by computer-based comparisons (Protein Prospector; http://prospector.ucsf.edu). Identification of associated proteins in the ribosome of Bacillus subtilis strain ATCC 6633 showed that there is an association homology with the association patterns of the ribosomal proteins of Haloarcula marismortui and Thermus thermophilus. In addition, other proteins involved in protein biosynthesis were shown to be associated with ribosomal proteins. PMID:12837803

  15. HIV Molecular Immunology 2015

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yusim, Karina; Korber, Bette Tina; Brander, Christian

    The scope and purpose of the HIV molecular immunology database: HIV Molecular Immunology is a companion volume to HIV Sequence Compendium. This publication, the 2015 edition, is the PDF version of the web-based HIV Immunology Database (http://www.hiv.lanl.gov/ content/immunology/). The web interface for this relational database has many search options, as well as interactive tools to help immunologists design reagents and interpret their results. In the HIV Immunology Database, HIV-specific B-cell and T-cell responses are summarized and annotated. Immunological responses are divided into three parts, CTL, T helper, and antibody. Within these parts, defined epitopes are organized by protein and bindingmore » sites within each protein, moving from left to right through the coding regions spanning the HIV genome. We include human responses to natural HIV infections, as well as vaccine studies in a range of animal models and human trials. Responses that are not specifically defined, such as responses to whole proteins or monoclonal antibody responses to discontinuous epitopes, are summarized at the end of each protein section. Studies describing general HIV responses to the virus, but not to any specific protein, are included at the end of each part. The annotation includes information such as cross-reactivity, escape mutations, antibody sequence, TCR usage, functional domains that overlap with an epitope, immune response associations with rates of progression and therapy, and how specific epitopes were experimentally defined. Basic information such as HLA specificities for T-cell epitopes, isotypes of monoclonal antibodies, and epitope sequences are included whenever possible. All studies that we can find that incorporate the use of a specific monoclonal antibody are included in the entry for that antibody. A single T-cell epitope can have multiple entries, generally one entry per study. Finally, maps of all defined linear epitopes relative to the HXB2 reference proteins are provided. Alignments of CTL, helper T-cell, and antibody epitopes are available through the search interface on our web site at http:// www.hiv.lanl.gov/content/immunology.« less

  16. Drug search for leishmaniasis: a virtual screening approach by grid computing

    NASA Astrophysics Data System (ADS)

    Ochoa, Rodrigo; Watowich, Stanley J.; Flórez, Andrés; Mesa, Carol V.; Robledo, Sara M.; Muskus, Carlos

    2016-07-01

    The trypanosomatid protozoa Leishmania is endemic in 100 countries, with infections causing 2 million new cases of leishmaniasis annually. Disease symptoms can include severe skin and mucosal ulcers, fever, anemia, splenomegaly, and death. Unfortunately, therapeutics approved to treat leishmaniasis are associated with potentially severe side effects, including death. Furthermore, drug-resistant Leishmania parasites have developed in most endemic countries. To address an urgent need for new, safe and inexpensive anti-leishmanial drugs, we utilized the IBM World Community Grid to complete computer-based drug discovery screens (Drug Search for Leishmaniasis) using unique leishmanial proteins and a database of 600,000 drug-like small molecules. Protein structures from different Leishmania species were selected for molecular dynamics (MD) simulations, and a series of conformational "snapshots" were chosen from each MD trajectory to simulate the protein's flexibility. A Relaxed Complex Scheme methodology was used to screen 2000 MD conformations against the small molecule database, producing >1 billion protein-ligand structures. For each protein target, a binding spectrum was calculated to identify compounds predicted to bind with highest average affinity to all protein conformations. Significantly, four different Leishmania protein targets were predicted to strongly bind small molecules, with the strongest binding interactions predicted to occur for dihydroorotate dehydrogenase (LmDHODH; PDB:3MJY). A number of predicted tight-binding LmDHODH inhibitors were tested in vitro and potent selective inhibitors of Leishmania panamensis were identified. These promising small molecules are suitable for further development using iterative structure-based optimization and in vitro/in vivo validation assays.

  17. Drug search for leishmaniasis: a virtual screening approach by grid computing.

    PubMed

    Ochoa, Rodrigo; Watowich, Stanley J; Flórez, Andrés; Mesa, Carol V; Robledo, Sara M; Muskus, Carlos

    2016-07-01

    The trypanosomatid protozoa Leishmania is endemic in ~100 countries, with infections causing ~2 million new cases of leishmaniasis annually. Disease symptoms can include severe skin and mucosal ulcers, fever, anemia, splenomegaly, and death. Unfortunately, therapeutics approved to treat leishmaniasis are associated with potentially severe side effects, including death. Furthermore, drug-resistant Leishmania parasites have developed in most endemic countries. To address an urgent need for new, safe and inexpensive anti-leishmanial drugs, we utilized the IBM World Community Grid to complete computer-based drug discovery screens (Drug Search for Leishmaniasis) using unique leishmanial proteins and a database of 600,000 drug-like small molecules. Protein structures from different Leishmania species were selected for molecular dynamics (MD) simulations, and a series of conformational "snapshots" were chosen from each MD trajectory to simulate the protein's flexibility. A Relaxed Complex Scheme methodology was used to screen ~2000 MD conformations against the small molecule database, producing >1 billion protein-ligand structures. For each protein target, a binding spectrum was calculated to identify compounds predicted to bind with highest average affinity to all protein conformations. Significantly, four different Leishmania protein targets were predicted to strongly bind small molecules, with the strongest binding interactions predicted to occur for dihydroorotate dehydrogenase (LmDHODH; PDB:3MJY). A number of predicted tight-binding LmDHODH inhibitors were tested in vitro and potent selective inhibitors of Leishmania panamensis were identified. These promising small molecules are suitable for further development using iterative structure-based optimization and in vitro/in vivo validation assays.

  18. PhosphoregDB: The tissue and sub-cellular distribution of mammalian protein kinases and phosphatases

    PubMed Central

    Forrest, Alistair RR; Taylor, Darrin F; Fink, J Lynn; Gongora, M Milena; Flegg, Cameron; Teasdale, Rohan D; Suzuki, Harukazu; Kanamori, Mutsumi; Kai, Chikatoshi; Hayashizaki, Yoshihide; Grimmond, Sean M

    2006-01-01

    Background Protein kinases and protein phosphatases are the fundamental components of phosphorylation dependent protein regulatory systems. We have created a database for the protein kinase-like and phosphatase-like loci of mouse that integrates protein sequence, interaction, classification and pathway information with the results of a systematic screen of their sub-cellular localization and tissue specific expression data mined from the GNF tissue atlas of mouse. Results The database lets users query where a specific kinase or phosphatase is expressed at both the tissue and sub-cellular levels. Similarly the interface allows the user to query by tissue, pathway or sub-cellular localization, to reveal which components are co-expressed or co-localized. A review of their expression reveals 30% of these components are detected in all tissues tested while 70% show some level of tissue restriction. Hierarchical clustering of the expression data reveals that expression of these genes can be used to separate the samples into tissues of related lineage, including 3 larger clusters of nervous tissue, developing embryo and cells of the immune system. By overlaying the expression, sub-cellular localization and classification data we examine correlations between class, specificity and tissue restriction and show that tyrosine kinases are more generally expressed in fewer tissues than serine/threonine kinases. Conclusion Together these data demonstrate that cell type specific systems exist to regulate protein phosphorylation and that for accurate modelling and for determination of enzyme substrate relationships the co-location of components needs to be considered. PMID:16504016

  19. Comprehensive Reconstruction and Visualization of Non-Coding Regulatory Networks in Human

    PubMed Central

    Bonnici, Vincenzo; Russo, Francesco; Bombieri, Nicola; Pulvirenti, Alfredo; Giugno, Rosalba

    2014-01-01

    Research attention has been powered to understand the functional roles of non-coding RNAs (ncRNAs). Many studies have demonstrated their deregulation in cancer and other human disorders. ncRNAs are also present in extracellular human body fluids such as serum and plasma, giving them a great potential as non-invasive biomarkers. However, non-coding RNAs have been relatively recently discovered and a comprehensive database including all of them is still missing. Reconstructing and visualizing the network of ncRNAs interactions are important steps to understand their regulatory mechanism in complex systems. This work presents ncRNA-DB, a NoSQL database that integrates ncRNAs data interactions from a large number of well established on-line repositories. The interactions involve RNA, DNA, proteins, and diseases. ncRNA-DB is available at http://ncrnadb.scienze.univr.it/ncrnadb/. It is equipped with three interfaces: web based, command-line, and a Cytoscape app called ncINetView. By accessing only one resource, users can search for ncRNAs and their interactions, build a network annotated with all known ncRNAs and associated diseases, and use all visual and mining features available in Cytoscape. PMID:25540777

  20. Comprehensive reconstruction and visualization of non-coding regulatory networks in human.

    PubMed

    Bonnici, Vincenzo; Russo, Francesco; Bombieri, Nicola; Pulvirenti, Alfredo; Giugno, Rosalba

    2014-01-01

    Research attention has been powered to understand the functional roles of non-coding RNAs (ncRNAs). Many studies have demonstrated their deregulation in cancer and other human disorders. ncRNAs are also present in extracellular human body fluids such as serum and plasma, giving them a great potential as non-invasive biomarkers. However, non-coding RNAs have been relatively recently discovered and a comprehensive database including all of them is still missing. Reconstructing and visualizing the network of ncRNAs interactions are important steps to understand their regulatory mechanism in complex systems. This work presents ncRNA-DB, a NoSQL database that integrates ncRNAs data interactions from a large number of well established on-line repositories. The interactions involve RNA, DNA, proteins, and diseases. ncRNA-DB is available at http://ncrnadb.scienze.univr.it/ncrnadb/. It is equipped with three interfaces: web based, command-line, and a Cytoscape app called ncINetView. By accessing only one resource, users can search for ncRNAs and their interactions, build a network annotated with all known ncRNAs and associated diseases, and use all visual and mining features available in Cytoscape.

Top