Sample records for human protein database

  1. Human Mitochondrial Protein Database

    National Institute of Standards and Technology Data Gateway

    SRD 131 Human Mitochondrial Protein Database (Web, free access)   The Human Mitochondrial Protein Database (HMPDb) provides comprehensive data on mitochondrial and human nuclear encoded proteins involved in mitochondrial biogenesis and function. This database consolidates information from SwissProt, LocusLink, Protein Data Bank (PDB), GenBank, Genome Database (GDB), Online Mendelian Inheritance in Man (OMIM), Human Mitochondrial Genome Database (mtDB), MITOMAP, Neuromuscular Disease Center and Human 2-D PAGE Databases. This database is intended as a tool not only to aid in studying the mitochondrion but in studying the associated diseases.

  2. hPDI: a database of experimental human protein-DNA interactions.

    PubMed

    Xie, Zhi; Hu, Shaohui; Blackshaw, Seth; Zhu, Heng; Qian, Jiang

    2010-01-15

    The human protein DNA Interactome (hPDI) database holds experimental protein-DNA interaction data for humans identified by protein microarray assays. The unique characteristics of hPDI are that it contains consensus DNA-binding sequences not only for nearly 500 human transcription factors but also for >500 unconventional DNA-binding proteins, which are completely uncharacterized previously. Users can browse, search and download a subset or the entire data via a web interface. This database is freely accessible for any academic purposes. http://bioinfo.wilmer.jhu.edu/PDI/.

  3. Role for protein–protein interaction databases in human genetics

    PubMed Central

    Pattin, Kristine A; Moore, Jason H

    2010-01-01

    Proteomics and the study of protein–protein interactions are becoming increasingly important in our effort to understand human diseases on a system-wide level. Thanks to the development and curation of protein-interaction databases, up-to-date information on these interaction networks is accessible and publicly available to the scientific community. As our knowledge of protein–protein interactions increases, it is important to give thought to the different ways that these resources can impact biomedical research. In this article, we highlight the importance of protein–protein interactions in human genetics and genetic epidemiology. Since protein–protein interactions demonstrate one of the strongest functional relationships between genes, combining genomic data with available proteomic data may provide us with a more in-depth understanding of common human diseases. In this review, we will discuss some of the fundamentals of protein interactions, the databases that are publicly available and how information from these databases can be used to facilitate genome-wide genetic studies. PMID:19929610

  4. Meta sequence analysis of human blood peptides and their parent proteins.

    PubMed

    Bowden, Peter; Pendrak, Voitek; Zhu, Peihong; Marshall, John G

    2010-04-18

    Sequence analysis of the blood peptides and their qualities will be key to understanding the mechanisms that contribute to error in LC-ESI-MS/MS. Analysis of peptides and their proteins at the level of sequences is much more direct and informative than the comparison of disparate accession numbers. A portable database of all blood peptide and protein sequences with descriptor fields and gene ontology terms might be useful for designing immunological or MRM assays from human blood. The results of twelve studies of human blood peptides and/or proteins identified by LC-MS/MS and correlated against a disparate array of genetic libraries were parsed and matched to proteins from the human ENSEMBL, SwissProt and RefSeq databases by SQL. The reported peptide and protein sequences were organized into an SQL database with full protein sequences and up to five unique peptides in order of prevalence along with the peptide count for each protein. Structured query language or BLAST was used to acquire descriptive information in current databases. Sampling error at the level of peptides is the largest source of disparity between groups. Chi Square analysis of peptide to protein distributions confirmed the significant agreement between groups on identified proteins. Copyright 2010. Published by Elsevier B.V.

  5. MIPS: a database for genomes and protein sequences

    PubMed Central

    Mewes, H. W.; Frishman, D.; Güldener, U.; Mannhaupt, G.; Mayer, K.; Mokrejs, M.; Morgenstern, B.; Münsterkötter, M.; Rudd, S.; Weil, B.

    2002-01-01

    The Munich Information Center for Protein Sequences (MIPS-GSF, Neuherberg, Germany) continues to provide genome-related information in a systematic way. MIPS supports both national and European sequencing and functional analysis projects, develops and maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences, and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the databases for the comprehensive set of genomes (PEDANT genomes), the database of annotated human EST clusters (HIB), the database of complete cDNAs from the DHGP (German Human Genome Project), as well as the project specific databases for the GABI (Genome Analysis in Plants) and HNB (Helmholtz–Netzwerk Bioinformatik) networks. The Arabidospsis thaliana database (MATDB), the database of mitochondrial proteins (MITOP) and our contribution to the PIR International Protein Sequence Database have been described elsewhere [Schoof et al. (2002) Nucleic Acids Res., 30, 91–93; Scharfe et al. (2000) Nucleic Acids Res., 28, 155–158; Barker et al. (2001) Nucleic Acids Res., 29, 29–32]. All databases described, the protein analysis tools provided and the detailed descriptions of our projects can be accessed through the MIPS World Wide Web server (http://mips.gsf.de). PMID:11752246

  6. MIPS: a database for genomes and protein sequences.

    PubMed

    Mewes, H W; Frishman, D; Güldener, U; Mannhaupt, G; Mayer, K; Mokrejs, M; Morgenstern, B; Münsterkötter, M; Rudd, S; Weil, B

    2002-01-01

    The Munich Information Center for Protein Sequences (MIPS-GSF, Neuherberg, Germany) continues to provide genome-related information in a systematic way. MIPS supports both national and European sequencing and functional analysis projects, develops and maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences, and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the databases for the comprehensive set of genomes (PEDANT genomes), the database of annotated human EST clusters (HIB), the database of complete cDNAs from the DHGP (German Human Genome Project), as well as the project specific databases for the GABI (Genome Analysis in Plants) and HNB (Helmholtz-Netzwerk Bioinformatik) networks. The Arabidospsis thaliana database (MATDB), the database of mitochondrial proteins (MITOP) and our contribution to the PIR International Protein Sequence Database have been described elsewhere [Schoof et al. (2002) Nucleic Acids Res., 30, 91-93; Scharfe et al. (2000) Nucleic Acids Res., 28, 155-158; Barker et al. (2001) Nucleic Acids Res., 29, 29-32]. All databases described, the protein analysis tools provided and the detailed descriptions of our projects can be accessed through the MIPS World Wide Web server (http://mips.gsf.de).

  7. Genic insights from integrated human proteomics in GeneCards.

    PubMed

    Fishilevich, Simon; Zimmerman, Shahar; Kohn, Asher; Iny Stein, Tsippi; Olender, Tsviya; Kolker, Eugene; Safran, Marilyn; Lancet, Doron

    2016-01-01

    GeneCards is a one-stop shop for searchable human gene annotations (http://www.genecards.org/). Data are automatically mined from ∼120 sources and presented in an integrated web card for every human gene. We report the application of recent advances in proteomics to enhance gene annotation and classification in GeneCards. First, we constructed the Human Integrated Protein Expression Database (HIPED), a unified database of protein abundance in human tissues, based on the publically available mass spectrometry (MS)-based proteomics sources ProteomicsDB, Multi-Omics Profiling Expression Database, Protein Abundance Across Organisms and The MaxQuant DataBase. The integrated database, residing within GeneCards, compares favourably with its individual sources, covering nearly 90% of human protein-coding genes. For gene annotation and comparisons, we first defined a protein expression vector for each gene, based on normalized abundances in 69 normal human tissues. This vector is portrayed in the GeneCards expression section as a bar graph, allowing visual inspection and comparison. These data are juxtaposed with transcriptome bar graphs. Using the protein expression vectors, we further defined a pairwise metric that helps assess expression-based pairwise proximity. This new metric for finding functional partners complements eight others, including sharing of pathways, gene ontology (GO) terms and domains, implemented in the GeneCards Suite. In parallel, we calculated proteome-based differential expression, highlighting a subset of tissues that overexpress a gene and subserving gene classification. This textual annotation allows users of VarElect, the suite's next-generation phenotyper, to more effectively discover causative disease variants. Finally, we define the protein-RNA expression ratio and correlation as yet another attribute of every gene in each tissue, adding further annotative information. The results constitute a significant enhancement of several GeneCards sections and help promote and organize the genome-wide structural and functional knowledge of the human proteome. Database URL:http://www.genecards.org/. © The Author(s) 2016. Published by Oxford University Press.

  8. MultitaskProtDB-II: an update of a database of multitasking/moonlighting proteins

    PubMed Central

    Franco-Serrano, Luís; Hernández, Sergio; Calvo, Alejandra; Severi, María A; Ferragut, Gabriela; Pérez-Pons, JosepAntoni; Piñol, Jaume; Pich, Òscar; Mozo-Villarias, Ángel; Amela, Isaac

    2018-01-01

    Abstract Multitasking, or moonlighting, is the capability of some proteins to execute two or more biological functions. MultitaskProtDB-II is a database of multifunctional proteins that has been updated. In the previous version, the information contained was: NCBI and UniProt accession numbers, canonical and additional biological functions, organism, monomeric/oligomeric states, PDB codes and bibliographic references. In the present update, the number of entries has been increased from 288 to 694 moonlighting proteins. MultitaskProtDB-II is continually being curated and updated. The new database also contains the following information: GO descriptors for the canonical and moonlighting functions, three-dimensional structure (for those proteins lacking PDB structure, a model was made using Itasser and Phyre), the involvement of the proteins in human diseases (78% of human moonlighting proteins) and whether the protein is a target of a current drug (48% of human moonlighting proteins). These numbers highlight the importance of these proteins for the analysis and explanation of human diseases and target-directed drug design. Moreover, 25% of the proteins of the database are involved in virulence of pathogenic microorganisms, largely in the mechanism of adhesion to the host. This highlights their importance for the mechanism of microorganism infection and vaccine design. MultitaskProtDB-II is available at http://wallace.uab.es/multitaskII. PMID:29136215

  9. A Novel Biclustering Approach to Association Rule Mining for Predicting HIV-1–Human Protein Interactions

    PubMed Central

    Mukhopadhyay, Anirban; Maulik, Ujjwal; Bandyopadhyay, Sanghamitra

    2012-01-01

    Identification of potential viral-host protein interactions is a vital and useful approach towards development of new drugs targeting those interactions. In recent days, computational tools are being utilized for predicting viral-host interactions. Recently a database containing records of experimentally validated interactions between a set of HIV-1 proteins and a set of human proteins has been published. The problem of predicting new interactions based on this database is usually posed as a classification problem. However, posing the problem as a classification one suffers from the lack of biologically validated negative interactions. Therefore it will be beneficial to use the existing database for predicting new viral-host interactions without the need of negative samples. Motivated by this, in this article, the HIV-1–human protein interaction database has been analyzed using association rule mining. The main objective is to identify a set of association rules both among the HIV-1 proteins and among the human proteins, and use these rules for predicting new interactions. In this regard, a novel association rule mining technique based on biclustering has been proposed for discovering frequent closed itemsets followed by the association rules from the adjacency matrix of the HIV-1–human interaction network. Novel HIV-1–human interactions have been predicted based on the discovered association rules and tested for biological significance. For validation of the predicted new interactions, gene ontology-based and pathway-based studies have been performed. These studies show that the human proteins which are predicted to interact with a particular viral protein share many common biological activities. Moreover, literature survey has been used for validation purpose to identify some predicted interactions that are already validated experimentally but not present in the database. Comparison with other prediction methods is also discussed. PMID:22539940

  10. PICKLE 2.0: A human protein-protein interaction meta-database employing data integration via genetic information ontology

    PubMed Central

    Gioutlakis, Aris; Klapa, Maria I.

    2017-01-01

    It has been acknowledged that source databases recording experimentally supported human protein-protein interactions (PPIs) exhibit limited overlap. Thus, the reconstruction of a comprehensive PPI network requires appropriate integration of multiple heterogeneous primary datasets, presenting the PPIs at various genetic reference levels. Existing PPI meta-databases perform integration via normalization; namely, PPIs are merged after converted to a certain target level. Hence, the node set of the integrated network depends each time on the number and type of the combined datasets. Moreover, the irreversible a priori normalization process hinders the identification of normalization artifacts in the integrated network, which originate from the nonlinearity characterizing the genetic information flow. PICKLE (Protein InteraCtion KnowLedgebasE) 2.0 implements a new architecture for this recently introduced human PPI meta-database. Its main novel feature over the existing meta-databases is its approach to primary PPI dataset integration via genetic information ontology. Building upon the PICKLE principles of using the reviewed human complete proteome (RHCP) of UniProtKB/Swiss-Prot as the reference protein interactor set, and filtering out protein interactions with low probability of being direct based on the available evidence, PICKLE 2.0 first assembles the RHCP genetic information ontology network by connecting the corresponding genes, nucleotide sequences (mRNAs) and proteins (UniProt entries) and then integrates PPI datasets by superimposing them on the ontology network without any a priori transformations. Importantly, this process allows the resulting heterogeneous integrated network to be reversibly normalized to any level of genetic reference without loss of the original information, the latter being used for identification of normalization biases, and enables the appraisal of potential false positive interactions through PPI source database cross-checking. The PICKLE web-based interface (www.pickle.gr) allows for the simultaneous query of multiple entities and provides integrated human PPI networks at either the protein (UniProt) or the gene level, at three PPI filtering modes. PMID:29023571

  11. Sys-BodyFluid: a systematical database for human body fluid proteome research

    PubMed Central

    Li, Su-Jun; Peng, Mao; Li, Hong; Liu, Bo-Shu; Wang, Chuan; Wu, Jia-Rui; Li, Yi-Xue; Zeng, Rong

    2009-01-01

    Recently, body fluids have widely become an important target for proteomic research and proteomic study has produced more and more body fluid related protein data. A database is needed to collect and analyze these proteome data. Thus, we developed this web-based body fluid proteome database Sys-BodyFluid. It contains eleven kinds of body fluid proteomes, including plasma/serum, urine, cerebrospinal fluid, saliva, bronchoalveolar lavage fluid, synovial fluid, nipple aspirate fluid, tear fluid, seminal fluid, human milk and amniotic fluid. Over 10 000 proteins are presented in the Sys-BodyFluid. Sys-BodyFluid provides the detailed protein annotations, including protein description, Gene Ontology, domain information, protein sequence and involved pathways. These proteome data can be retrieved by using protein name, protein accession number and sequence similarity. In addition, users can query between these different body fluids to get the different proteins identification information. Sys-BodyFluid database can facilitate the body fluid proteomics and disease proteomics research as a reference database. It is available at http://www.biosino.org/bodyfluid/. PMID:18978022

  12. Sys-BodyFluid: a systematical database for human body fluid proteome research.

    PubMed

    Li, Su-Jun; Peng, Mao; Li, Hong; Liu, Bo-Shu; Wang, Chuan; Wu, Jia-Rui; Li, Yi-Xue; Zeng, Rong

    2009-01-01

    Recently, body fluids have widely become an important target for proteomic research and proteomic study has produced more and more body fluid related protein data. A database is needed to collect and analyze these proteome data. Thus, we developed this web-based body fluid proteome database Sys-BodyFluid. It contains eleven kinds of body fluid proteomes, including plasma/serum, urine, cerebrospinal fluid, saliva, bronchoalveolar lavage fluid, synovial fluid, nipple aspirate fluid, tear fluid, seminal fluid, human milk and amniotic fluid. Over 10,000 proteins are presented in the Sys-BodyFluid. Sys-BodyFluid provides the detailed protein annotations, including protein description, Gene Ontology, domain information, protein sequence and involved pathways. These proteome data can be retrieved by using protein name, protein accession number and sequence similarity. In addition, users can query between these different body fluids to get the different proteins identification information. Sys-BodyFluid database can facilitate the body fluid proteomics and disease proteomics research as a reference database. It is available at http://www.biosino.org/bodyfluid/.

  13. The HUPO PSI's molecular interaction format--a community standard for the representation of protein interaction data.

    PubMed

    Hermjakob, Henning; Montecchi-Palazzi, Luisa; Bader, Gary; Wojcik, Jérôme; Salwinski, Lukasz; Ceol, Arnaud; Moore, Susan; Orchard, Sandra; Sarkans, Ugis; von Mering, Christian; Roechert, Bernd; Poux, Sylvain; Jung, Eva; Mersch, Henning; Kersey, Paul; Lappe, Michael; Li, Yixue; Zeng, Rong; Rana, Debashis; Nikolski, Macha; Husi, Holger; Brun, Christine; Shanker, K; Grant, Seth G N; Sander, Chris; Bork, Peer; Zhu, Weimin; Pandey, Akhilesh; Brazma, Alvis; Jacq, Bernard; Vidal, Marc; Sherman, David; Legrain, Pierre; Cesareni, Gianni; Xenarios, Ioannis; Eisenberg, David; Steipe, Boris; Hogue, Chris; Apweiler, Rolf

    2004-02-01

    A major goal of proteomics is the complete description of the protein interaction network underlying cell physiology. A large number of small scale and, more recently, large-scale experiments have contributed to expanding our understanding of the nature of the interaction network. However, the necessary data integration across experiments is currently hampered by the fragmentation of publicly available protein interaction data, which exists in different formats in databases, on authors' websites or sometimes only in print publications. Here, we propose a community standard data model for the representation and exchange of protein interaction data. This data model has been jointly developed by members of the Proteomics Standards Initiative (PSI), a work group of the Human Proteome Organization (HUPO), and is supported by major protein interaction data providers, in particular the Biomolecular Interaction Network Database (BIND), Cellzome (Heidelberg, Germany), the Database of Interacting Proteins (DIP), Dana Farber Cancer Institute (Boston, MA, USA), the Human Protein Reference Database (HPRD), Hybrigenics (Paris, France), the European Bioinformatics Institute's (EMBL-EBI, Hinxton, UK) IntAct, the Molecular Interactions (MINT, Rome, Italy) database, the Protein-Protein Interaction Database (PPID, Edinburgh, UK) and the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING, EMBL, Heidelberg, Germany).

  14. The BioGRID interaction database: 2017 update

    PubMed Central

    Chatr-aryamontri, Andrew; Oughtred, Rose; Boucher, Lorrie; Rust, Jennifer; Chang, Christie; Kolas, Nadine K.; O'Donnell, Lara; Oster, Sara; Theesfeld, Chandra; Sellam, Adnane; Stark, Chris; Breitkreutz, Bobby-Joe; Dolinski, Kara; Tyers, Mike

    2017-01-01

    The Biological General Repository for Interaction Datasets (BioGRID: https://thebiogrid.org) is an open access database dedicated to the annotation and archival of protein, genetic and chemical interactions for all major model organism species and humans. As of September 2016 (build 3.4.140), the BioGRID contains 1 072 173 genetic and protein interactions, and 38 559 post-translational modifications, as manually annotated from 48 114 publications. This dataset represents interaction records for 66 model organisms and represents a 30% increase compared to the previous 2015 BioGRID update. BioGRID curates the biomedical literature for major model organism species, including humans, with a recent emphasis on central biological processes and specific human diseases. To facilitate network-based approaches to drug discovery, BioGRID now incorporates 27 501 chemical–protein interactions for human drug targets, as drawn from the DrugBank database. A new dynamic interaction network viewer allows the easy navigation and filtering of all genetic and protein interaction data, as well as for bioactive compounds and their established targets. BioGRID data are directly downloadable without restriction in a variety of standardized formats and are freely distributed through partner model organism databases and meta-databases. PMID:27980099

  15. A comprehensive and scalable database search system for metaproteomics.

    PubMed

    Chatterjee, Sandip; Stupp, Gregory S; Park, Sung Kyu Robin; Ducom, Jean-Christophe; Yates, John R; Su, Andrew I; Wolan, Dennis W

    2016-08-16

    Mass spectrometry-based shotgun proteomics experiments rely on accurate matching of experimental spectra against a database of protein sequences. Existing computational analysis methods are limited in the size of their sequence databases, which severely restricts the proteomic sequencing depth and functional analysis of highly complex samples. The growing amount of public high-throughput sequencing data will only exacerbate this problem. We designed a broadly applicable metaproteomic analysis method (ComPIL) that addresses protein database size limitations. Our approach to overcome this significant limitation in metaproteomics was to design a scalable set of sequence databases assembled for optimal library querying speeds. ComPIL was integrated with a modified version of the search engine ProLuCID (termed "Blazmass") to permit rapid matching of experimental spectra. Proof-of-principle analysis of human HEK293 lysate with a ComPIL database derived from high-quality genomic libraries was able to detect nearly all of the same peptides as a search with a human database (~500x fewer peptides in the database), with a small reduction in sensitivity. We were also able to detect proteins from the adenovirus used to immortalize these cells. We applied our method to a set of healthy human gut microbiome proteomic samples and showed a substantial increase in the number of identified peptides and proteins compared to previous metaproteomic analyses, while retaining a high degree of protein identification accuracy and allowing for a more in-depth characterization of the functional landscape of the samples. The combination of ComPIL with Blazmass allows proteomic searches to be performed with database sizes much larger than previously possible. These large database searches can be applied to complex meta-samples with unknown composition or proteomic samples where unexpected proteins may be identified. The protein database, proteomic search engine, and the proteomic data files for the 5 microbiome samples characterized and discussed herein are open source and available for use and additional analysis.

  16. HPIminer: A text mining system for building and visualizing human protein interaction networks and pathways.

    PubMed

    Subramani, Suresh; Kalpana, Raja; Monickaraj, Pankaj Moses; Natarajan, Jeyakumar

    2015-04-01

    The knowledge on protein-protein interactions (PPI) and their related pathways are equally important to understand the biological functions of the living cell. Such information on human proteins is highly desirable to understand the mechanism of several diseases such as cancer, diabetes, and Alzheimer's disease. Because much of that information is buried in biomedical literature, an automated text mining system for visualizing human PPI and pathways is highly desirable. In this paper, we present HPIminer, a text mining system for visualizing human protein interactions and pathways from biomedical literature. HPIminer extracts human PPI information and PPI pairs from biomedical literature, and visualize their associated interactions, networks and pathways using two curated databases HPRD and KEGG. To our knowledge, HPIminer is the first system to build interaction networks from literature as well as curated databases. Further, the new interactions mined only from literature and not reported earlier in databases are highlighted as new. A comparative study with other similar tools shows that the resultant network is more informative and provides additional information on interacting proteins and their associated networks. Copyright © 2015 Elsevier Inc. All rights reserved.

  17. Inconsistencies in the red blood cell membrane proteome analysis: generation of a database for research and diagnostic applications

    PubMed Central

    Hegedűs, Tamás; Chaubey, Pururawa Mayank; Várady, György; Szabó, Edit; Sarankó, Hajnalka; Hofstetter, Lia; Roschitzki, Bernd; Sarkadi, Balázs

    2015-01-01

    Based on recent results, the determination of the easily accessible red blood cell (RBC) membrane proteins may provide new diagnostic possibilities for assessing mutations, polymorphisms or regulatory alterations in diseases. However, the analysis of the current mass spectrometry-based proteomics datasets and other major databases indicates inconsistencies—the results show large scattering and only a limited overlap for the identified RBC membrane proteins. Here, we applied membrane-specific proteomics studies in human RBC, compared these results with the data in the literature, and generated a comprehensive and expandable database using all available data sources. The integrated web database now refers to proteomic, genetic and medical databases as well, and contains an unexpected large number of validated membrane proteins previously thought to be specific for other tissues and/or related to major human diseases. Since the determination of protein expression in RBC provides a method to indicate pathological alterations, our database should facilitate the development of RBC membrane biomarker platforms and provide a unique resource to aid related further research and diagnostics. Database URL: http://rbcc.hegelab.org PMID:26078478

  18. Tiered Human Integrated Sequence Search Databases for Shotgun Proteomics.

    PubMed

    Deutsch, Eric W; Sun, Zhi; Campbell, David S; Binz, Pierre-Alain; Farrah, Terry; Shteynberg, David; Mendoza, Luis; Omenn, Gilbert S; Moritz, Robert L

    2016-11-04

    The results of analysis of shotgun proteomics mass spectrometry data can be greatly affected by the selection of the reference protein sequence database against which the spectra are matched. For many species there are multiple sources from which somewhat different sequence sets can be obtained. This can lead to confusion about which database is best in which circumstances-a problem especially acute in human sample analysis. All sequence databases are genome-based, with sequences for the predicted gene and their protein translation products compiled. Our goal is to create a set of primary sequence databases that comprise the union of sequences from many of the different available sources and make the result easily available to the community. We have compiled a set of four sequence databases of varying sizes, from a small database consisting of only the ∼20,000 primary isoforms plus contaminants to a very large database that includes almost all nonredundant protein sequences from several sources. This set of tiered, increasingly complete human protein sequence databases suitable for mass spectrometry proteomics sequence database searching is called the Tiered Human Integrated Search Proteome set. In order to evaluate the utility of these databases, we have analyzed two different data sets, one from the HeLa cell line and the other from normal human liver tissue, with each of the four tiers of database complexity. The result is that approximately 0.8%, 1.1%, and 1.5% additional peptides can be identified for Tiers 2, 3, and 4, respectively, as compared with the Tier 1 database, at substantially increasing computational cost. This increase in computational cost may be worth bearing if the identification of sequence variants or the discovery of sequences that are not present in the reviewed knowledge base entries is an important goal of the study. We find that it is useful to search a data set against a simpler database, and then check the uniqueness of the discovered peptides against a more complex database. We have set up an automated system that downloads all the source databases on the first of each month and automatically generates a new set of search databases and makes them available for download at http://www.peptideatlas.org/thisp/ .

  19. Tiered Human Integrated Sequence Search Databases for Shotgun Proteomics

    PubMed Central

    Deutsch, Eric W.; Sun, Zhi; Campbell, David S.; Binz, Pierre-Alain; Farrah, Terry; Shteynberg, David; Mendoza, Luis; Omenn, Gilbert S.; Moritz, Robert L.

    2016-01-01

    The results of analysis of shotgun proteomics mass spectrometry data can be greatly affected by the selection of the reference protein sequence database against which the spectra are matched. For many species there are multiple sources from which somewhat different sequence sets can be obtained. This can lead to confusion about which database is best in which circumstances – a problem especially acute in human sample analysis. All sequence databases are genome-based, with sequences for the predicted gene and their protein translation products compiled. Our goal is to create a set of primary sequence databases that comprise the union of sequences from many of the different available sources and make the result easily available to the community. We have compiled a set of four sequence databases of varying sizes, from a small database consisting of only the ~20,000 primary isoforms plus contaminants to a very large database that includes almost all non-redundant protein sequences from several sources. This set of tiered, increasingly complete human protein sequence databases suitable for mass spectrometry proteomics sequence database searching is called the Tiered Human Integrated Search Proteome set. In order to evaluate the utility of these databases, we have analyzed two different data sets, one from the HeLa cell line and the other from normal human liver tissue, with each of the four tiers of database complexity. The result is that approximately 0.8%, 1.1%, and 1.5% additional peptides can be identified for Tiers 2, 3, and 4, respectively, as compared with the Tier 1 database, at substantially increasing computational cost. This increase in computational cost may be worth bearing if the identification of sequence variants or the discovery of sequences that are not present in the reviewed knowledge base entries is an important goal of the study. We find that it is useful to search a data set against a simpler database, and then check the uniqueness of the discovered peptides against a more complex database. We have set up an automated system that downloads all the source databases on the first of each month and automatically generates a new set of search databases and makes them available for download at http://www.peptideatlas.org/thisp/. PMID:27577934

  20. Interactome of the hepatitis C virus: Literature mining with ANDSystem.

    PubMed

    Saik, Olga V; Ivanisenko, Timofey V; Demenkov, Pavel S; Ivanisenko, Vladimir A

    2016-06-15

    A study of the molecular genetics mechanisms of host-pathogen interactions is of paramount importance in developing drugs against viral diseases. Currently, the literature contains a huge amount of information that describes interactions between HCV and human proteins. In addition, there are many factual databases that contain experimentally verified data on HCV-host interactions. The sources of such data are the original data along with the data manually extracted from the literature. However, the manual analysis of scientific publications is time consuming and, because of this, databases created with such an approach often do not have complete information. One of the most promising methods to provide actualisation and completeness of information is text mining. Here, with the use of a previously developed method by the authors using ANDSystem, an automated extraction of information on the interactions between HCV and human proteins was conducted. As a data source for the text mining approach, PubMed abstracts and full text articles were used. Additionally, external factual databases were analyzed. On the basis of this analysis, a special version of ANDSystem, extended with the HCV interactome, was created. The HCV interactome contains information about the interactions between 969 human and 11 HCV proteins. Among the 969 proteins, 153 'new' proteins were found not previously referred to in any external databases of protein-protein interactions for HCV-host interactions. Thus, the extended ANDSystem possesses a more comprehensive detailing of HCV-host interactions versus other existing databases. It was interesting that HCV proteins more preferably interact with human proteins that were already involved in a large number of protein-protein interactions as well as those associated with many diseases. Among human proteins of the HCV interactome, there were a large number of proteins regulated by microRNAs. It turned out that the results obtained for protein-protein interactions and microRNA-regulation did not depend on how well the proteins were studied, while protein-disease interactions appeared to be dependent on the level of study. In particular, the mean number of diseases linked to well-studied proteins (proteins were considered well-studied if they were mentioned in 50 or more PubMed publications) from the HCV interactome was 20.8, significantly exceeding the mean number of associations with diseases (10.1) for the total set of well-studied human proteins present in ANDSystem. For proteins not highly poorly-studied investigated, proteins from the HCV interactome (each protein was referred to in less than 50 publications) distribution of the number of diseases associated with them had no statistically significant differences from the distribution of the number of diseases associated with poorly-studied proteins based on the total set of human proteins stored in ANDSystem. With this, the average number of associations with diseases for the HCV interactome and the total set of human proteins were 0.3 and 0.2, respectively. Thus, ANDSystem, extended with the HCV interactome, can be helpful in a wide range of issues related to analyzing HCV-host interactions in the search for anti-HCV drug targets. The demo version of the extended ANDSystem covered here containing only interactions between human proteins, genes, metabolites, diseases, miRNAs and molecular-genetic pathways, as well as interactions between human proteins/genes and HCV proteins, is freely available at the following web address: http://www-bionet.sscc.ru/psd/andhcv/. Copyright © 2015 The Authors. Published by Elsevier B.V. All rights reserved.

  1. Proteogenomics: Integrating Next-Generation Sequencing and Mass Spectrometry to Characterize Human Proteomic Variation

    NASA Astrophysics Data System (ADS)

    Sheynkman, Gloria M.; Shortreed, Michael R.; Cesnik, Anthony J.; Smith, Lloyd M.

    2016-06-01

    Mass spectrometry-based proteomics has emerged as the leading method for detection, quantification, and characterization of proteins. Nearly all proteomic workflows rely on proteomic databases to identify peptides and proteins, but these databases typically contain a generic set of proteins that lack variations unique to a given sample, precluding their detection. Fortunately, proteogenomics enables the detection of such proteomic variations and can be defined, broadly, as the use of nucleotide sequences to generate candidate protein sequences for mass spectrometry database searching. Proteogenomics is experiencing heightened significance due to two developments: (a) advances in DNA sequencing technologies that have made complete sequencing of human genomes and transcriptomes routine, and (b) the unveiling of the tremendous complexity of the human proteome as expressed at the levels of genes, cells, tissues, individuals, and populations. We review here the field of human proteogenomics, with an emphasis on its history, current implementations, the types of proteomic variations it reveals, and several important applications.

  2. Proteogenomics: Integrating Next-Generation Sequencing and Mass Spectrometry to Characterize Human Proteomic Variation

    PubMed Central

    Sheynkman, Gloria M.; Shortreed, Michael R.; Cesnik, Anthony J.; Smith, Lloyd M.

    2016-01-01

    Mass spectrometry–based proteomics has emerged as the leading method for detection, quantification, and characterization of proteins. Nearly all proteomic workflows rely on proteomic databases to identify peptides and proteins, but these databases typically contain a generic set of proteins that lack variations unique to a given sample, precluding their detection. Fortunately, proteogenomics enables the detection of such proteomic variations and can be defined, broadly, as the use of nucleotide sequences to generate candidate protein sequences for mass spectrometry database searching. Proteogenomics is experiencing heightened significance due to two developments: (a) advances in DNA sequencing technologies that have made complete sequencing of human genomes and transcriptomes routine, and (b) the unveiling of the tremendous complexity of the human proteome as expressed at the levels of genes, cells, tissues, individuals, and populations. We review here the field of human proteogenomics, with an emphasis on its history, current implementations, the types of proteomic variations it reveals, and several important applications. PMID:27049631

  3. MIPS: a database for genomes and protein sequences.

    PubMed Central

    Mewes, H W; Heumann, K; Kaps, A; Mayer, K; Pfeiffer, F; Stocker, S; Frishman, D

    1999-01-01

    The Munich Information Center for Protein Sequences (MIPS-GSF), Martinsried near Munich, Germany, develops and maintains genome oriented databases. It is commonplace that the amount of sequence data available increases rapidly, but not the capacity of qualified manual annotation at the sequence databases. Therefore, our strategy aims to cope with the data stream by the comprehensive application of analysis tools to sequences of complete genomes, the systematic classification of protein sequences and the active support of sequence analysis and functional genomics projects. This report describes the systematic and up-to-date analysis of genomes (PEDANT), a comprehensive database of the yeast genome (MYGD), a database reflecting the progress in sequencing the Arabidopsis thaliana genome (MATD), the database of assembled, annotated human EST clusters (MEST), and the collection of protein sequence data within the framework of the PIR-International Protein Sequence Database (described elsewhere in this volume). MIPS provides access through its WWW server (http://www.mips.biochem.mpg.de) to a spectrum of generic databases, including the above mentioned as well as a database of protein families (PROTFAM), the MITOP database, and the all-against-all FASTA database. PMID:9847138

  4. MIPS: analysis and annotation of proteins from whole genomes.

    PubMed

    Mewes, H W; Amid, C; Arnold, R; Frishman, D; Güldener, U; Mannhaupt, G; Münsterkötter, M; Pagel, P; Strack, N; Stümpflen, V; Warfsmann, J; Ruepp, A

    2004-01-01

    The Munich Information Center for Protein Sequences (MIPS-GSF), Neuherberg, Germany, provides protein sequence-related information based on whole-genome analysis. The main focus of the work is directed toward the systematic organization of sequence-related attributes as gathered by a variety of algorithms, primary information from experimental data together with information compiled from the scientific literature. MIPS maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the database of complete cDNAs (German Human Genome Project, NGFN), the database of mammalian protein-protein interactions (MPPI), the database of FASTA homologies (SIMAP), and the interface for the fast retrieval of protein-associated information (QUIPOS). The Arabidopsis thaliana database, the rice database, the plant EST databases (MATDB, MOsDB, SPUTNIK), as well as the databases for the comprehensive set of genomes (PEDANT genomes) are described elsewhere in the 2003 and 2004 NAR database issues, respectively. All databases described, and the detailed descriptions of our projects can be accessed through the MIPS web server (http://mips.gsf.de).

  5. LymPHOS 2.0: an update of a phosphosite database of primary human T cells

    PubMed Central

    Nguyen, Tien Dung; Vidal-Cortes, Oriol; Gallardo, Oscar; Abian, Joaquin; Carrascal, Montserrat

    2015-01-01

    LymPHOS is a web-oriented database containing peptide and protein sequences and spectrometric information on the phosphoproteome of primary human T-Lymphocytes. Current release 2.0 contains 15 566 phosphorylation sites from 8273 unique phosphopeptides and 4937 proteins, which correspond to a 45-fold increase over the original database description. It now includes quantitative data on phosphorylation changes after time-dependent treatment with activators of the TCR-mediated signal transduction pathway. Sequence data quality has also been improved with the use of multiple search engines for database searching. LymPHOS can be publicly accessed at http://www.lymphos.org. Database URL: http://www.lymphos.org. PMID:26708986

  6. Identification and Validation of Human Missing Proteins and Peptides in Public Proteome Databases: Data Mining Strategy.

    PubMed

    Elguoshy, Amr; Hirao, Yoshitoshi; Xu, Bo; Saito, Suguru; Quadery, Ali F; Yamamoto, Keiko; Mitsui, Toshiaki; Yamamoto, Tadashi

    2017-12-01

    In an attempt to complete human proteome project (HPP), Chromosome-Centric Human Proteome Project (C-HPP) launched the journey of missing protein (MP) investigation in 2012. However, 2579 and 572 protein entries in the neXtProt (2017-1) are still considered as missing and uncertain proteins, respectively. Thus, in this study, we proposed a pipeline to analyze, identify, and validate human missing and uncertain proteins in open-access transcriptomics and proteomics databases. Analysis of RNA expression pattern for missing proteins in Human protein Atlas showed that 28% of them, such as Olfactory receptor 1I1 ( O60431 ), had no RNA expression, suggesting the necessity to consider uncommon tissues for transcriptomic and proteomic studies. Interestingly, 21% had elevated expression level in a particular tissue (tissue-enriched proteins), indicating the importance of targeting such proteins in their elevated tissues. Additionally, the analysis of RNA expression level for missing proteins showed that 95% had no or low expression level (0-10 transcripts per million), indicating that low abundance is one of the major obstacles facing the detection of missing proteins. Moreover, missing proteins are predicted to generate fewer predicted unique tryptic peptides than the identified proteins. Searching for these predicted unique tryptic peptides that correspond to missing and uncertain proteins in the experimental peptide list of open-access MS-based databases (PA, GPM) resulted in the detection of 402 missing and 19 uncertain proteins with at least two unique peptides (≥9 aa) at <(5 × 10 -4 )% FDR. Finally, matching the native spectra for the experimentally detected peptides with their SRMAtlas synthetic counterparts at three transition sources (QQQ, QTOF, QTRAP) gave us an opportunity to validate 41 missing proteins by ≥2 proteotypic peptides.

  7. Exploring the human seminal plasma proteome: an unexplored gold mine of biomarker for male infertility and male reproduction disorder.

    PubMed

    Gilany, Kambiz; Minai-Tehrani, Arash; Savadi-Shiraz, Elham; Rezadoost, Hassan; Lakpour, Niknam

    2015-01-01

    The human seminal fluid is a complex body fluid. It is not known how many proteins are expressed in the seminal plasma; however in analog with the blood it is possible up to 10,000 proteins are expressed in the seminal plasma. The human seminal fluid is a rich source of potential biomarkers for male infertility and reproduction disorder. In this review, the ongoing list of proteins identified from the human seminal fluid was collected. To date, 4188 redundant proteins of the seminal fluid are identified using different proteomics technology, including 2-DE, SDS-PAGE-LC-MS/MS, MudPIT. However, this was reduced to a database of 2168 non-redundant protein using UniProtKB/Swiss-Prot reviewed database. The core concept of proteome were analyzed including pI, MW, Amino Acids, Chromosome and PTM distribution in the human seminal plasma proteome. Additionally, the biological process, molecular function and KEGG pathway were investigated using DAVID software. Finally, the biomarker identified in different male reproductive system disorder was investigated using proteomics platforms so far. In this study, an attempt was made to update the human seminal plasma proteome database. Our finding showed that human seminal plasma studies used to date seem to have converged on a set of proteins that are repeatedly identified in many studies and that represent only a small fraction of the entire human seminal plasma proteome.

  8. Introducing the CPL/MUW proteome database: interpretation of human liver and liver cancer proteome profiles by referring to isolated primary cells.

    PubMed

    Wimmer, Helge; Gundacker, Nina C; Griss, Johannes; Haudek, Verena J; Stättner, Stefan; Mohr, Thomas; Zwickl, Hannes; Paulitschke, Verena; Baron, David M; Trittner, Wolfgang; Kubicek, Markus; Bayer, Editha; Slany, Astrid; Gerner, Christopher

    2009-06-01

    Interpretation of proteome data with a focus on biomarker discovery largely relies on comparative proteome analyses. Here, we introduce a database-assisted interpretation strategy based on proteome profiles of primary cells. Both 2-D-PAGE and shotgun proteomics are applied. We obtain high data concordance with these two different techniques. When applying mass analysis of tryptic spot digests from 2-D gels of cytoplasmic fractions, we typically identify several hundred proteins. Using the same protein fractions, we usually identify more than thousand proteins by shotgun proteomics. The data consistency obtained when comparing these independent data sets exceeds 99% of the proteins identified in the 2-D gels. Many characteristic differences in protein expression of different cells can thus be independently confirmed. Our self-designed SQL database (CPL/MUW - database of the Clinical Proteomics Laboratories at the Medical University of Vienna accessible via www.meduniwien.ac.at/proteomics/database) facilitates (i) quality management of protein identification data, which are based on MS, (ii) the detection of cell type-specific proteins and (iii) of molecular signatures of specific functional cell states. Here, we demonstrate, how the interpretation of proteome profiles obtained from human liver tissue and hepatocellular carcinoma tissue is assisted by the Clinical Proteomics Laboratories at the Medical University of Vienna-database. Therefore, we suggest that the use of reference experiments supported by a tailored database may substantially facilitate data interpretation of proteome profiling experiments.

  9. Why are they missing? : Bioinformatics characterization of missing human proteins.

    PubMed

    Elguoshy, Amr; Magdeldin, Sameh; Xu, Bo; Hirao, Yoshitoshi; Zhang, Ying; Kinoshita, Naohiko; Takisawa, Yusuke; Nameta, Masaaki; Yamamoto, Keiko; El-Refy, Ali; El-Fiky, Fawzy; Yamamoto, Tadashi

    2016-10-21

    NeXtProt is a web-based protein knowledge platform that supports research on human proteins. NeXtProt (release 2015-04-28) lists 20,060 proteins, among them, 3373 canonical proteins (16.8%) lack credible experimental evidence at protein level (PE2:PE5). Therefore, they are considered as "missing proteins". A comprehensive bioinformatic workflow has been proposed to analyze these "missing" proteins. The aims of current study were to analyze physicochemical properties, existence and distribution of the tryptic cleavage sites, and to pinpoint the signature peptides of the missing proteins. Our findings showed that 23.7% of missing proteins were hydrophobic proteins possessing transmembrane domains (TMD). Also, forty missing entries generate tryptic peptides were either out of mass detection range (>30aa) or mapped to different proteins (<9aa). Additionally, 21% of missing entries didn't generate any unique tryptic peptides. In silico endopeptidase combination strategy increased the possibility of missing proteins identification. Coherently, using both mature protein database and signal peptidome database could be a promising option to identify some missing proteins by targeting their unique N-terminal tryptic peptide from mature protein database and or C-terminus tryptic peptide from signal peptidome database. In conclusion, Identification of missing protein requires additional consideration during sample preparation, extraction, digestion and data analysis to increase its incidence of identification. Copyright © 2016. Published by Elsevier B.V.

  10. PANTHER: a browsable database of gene products organized by biological function, using curated protein family and subfamily classification.

    PubMed

    Thomas, Paul D; Kejariwal, Anish; Campbell, Michael J; Mi, Huaiyu; Diemer, Karen; Guo, Nan; Ladunga, Istvan; Ulitsky-Lazareva, Betty; Muruganujan, Anushya; Rabkin, Steven; Vandergriff, Jody A; Doremieux, Olivier

    2003-01-01

    The PANTHER database was designed for high-throughput analysis of protein sequences. One of the key features is a simplified ontology of protein function, which allows browsing of the database by biological functions. Biologist curators have associated the ontology terms with groups of protein sequences rather than individual sequences. Statistical models (Hidden Markov Models, or HMMs) are built from each of these groups. The advantage of this approach is that new sequences can be automatically classified as they become available. To ensure accurate functional classification, HMMs are constructed not only for families, but also for functionally distinct subfamilies. Multiple sequence alignments and phylogenetic trees, including curator-assigned information, are available for each family. The current version of the PANTHER database includes training sequences from all organisms in the GenBank non-redundant protein database, and the HMMs have been used to classify gene products across the entire genomes of human, and Drosophila melanogaster. The ontology terms and protein families and subfamilies, as well as Drosophila gene c;assifications, can be browsed and searched for free. Due to outstanding contractual obligations, access to human gene classifications and to protein family trees and multiple sequence alignments will temporarily require a nominal registration fee. PANTHER is publicly available on the web at http://panther.celera.com.

  11. MIPS: analysis and annotation of proteins from whole genomes

    PubMed Central

    Mewes, H. W.; Amid, C.; Arnold, R.; Frishman, D.; Güldener, U.; Mannhaupt, G.; Münsterkötter, M.; Pagel, P.; Strack, N.; Stümpflen, V.; Warfsmann, J.; Ruepp, A.

    2004-01-01

    The Munich Information Center for Protein Sequences (MIPS-GSF), Neuherberg, Germany, provides protein sequence-related information based on whole-genome analysis. The main focus of the work is directed toward the systematic organization of sequence-related attributes as gathered by a variety of algorithms, primary information from experimental data together with information compiled from the scientific literature. MIPS maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the database of complete cDNAs (German Human Genome Project, NGFN), the database of mammalian protein–protein interactions (MPPI), the database of FASTA homologies (SIMAP), and the interface for the fast retrieval of protein-associated information (QUIPOS). The Arabidopsis thaliana database, the rice database, the plant EST databases (MATDB, MOsDB, SPUTNIK), as well as the databases for the comprehensive set of genomes (PEDANT genomes) are described elsewhere in the 2003 and 2004 NAR database issues, respectively. All databases described, and the detailed descriptions of our projects can be accessed through the MIPS web server (http://mips.gsf.de). PMID:14681354

  12. Extraction, integration and analysis of alternative splicing and protein structure distributed information

    PubMed Central

    D'Antonio, Matteo; Masseroli, Marco

    2009-01-01

    Background Alternative splicing has been demonstrated to affect most of human genes; different isoforms from the same gene encode for proteins which differ for a limited number of residues, thus yielding similar structures. This suggests possible correlations between alternative splicing and protein structure. In order to support the investigation of such relationships, we have developed the Alternative Splicing and Protein Structure Scrutinizer (PASS), a Web application to automatically extract, integrate and analyze human alternative splicing and protein structure data sparsely available in the Alternative Splicing Database, Ensembl databank and Protein Data Bank. Primary data from these databases have been integrated and analyzed using the Protein Identifier Cross-Reference, BLAST, CLUSTALW and FeatureMap3D software tools. Results A database has been developed to store the considered primary data and the results from their analysis; a system of Perl scripts has been implemented to automatically create and update the database and analyze the integrated data; a Web interface has been implemented to make the analyses easily accessible; a database has been created to manage user accesses to the PASS Web application and store user's data and searches. Conclusion PASS automatically integrates data from the Alternative Splicing Database with protein structure data from the Protein Data Bank. Additionally, it comprehensively analyzes the integrated data with publicly available well-known bioinformatics tools in order to generate structural information of isoform pairs. Further analysis of such valuable information might reveal interesting relationships between alternative splicing and protein structure differences, which may be significantly associated with different functions. PMID:19828075

  13. In silico analysis of protein toxin and bacteriocins from Lactobacillus paracasei SD1 genome and available online databases

    PubMed Central

    Surachat, Komwit; Sangket, Unitsa; Deachamag, Panchalika; Chotigeat, Wilaiwan

    2017-01-01

    Lactobacillus paracasei SD1 is a potential probiotic strain due to its ability to survive several conditions in human dental cavities. To ascertain its safety for human use, we therefore performed a comprehensive bioinformatics analysis and characterization of the bacterial protein toxins produced by this strain. We report the complete genome of Lactobacillus paracasei SD1 and its comparison to other Lactobacillus genomes. Additionally, we identify and analyze its protein toxins and antimicrobial proteins using reliable online database resources and establish its phylogenetic relationship with other bacterial genomes. Our investigation suggests that this strain is safe for human use and contains several bacteriocins that confer health benefits to the host. An in silico analysis of protein-protein interactions between the target bacteriocins and the microbial proteins gtfB and luxS of Streptococcus mutans was performed and is discussed here. PMID:28837656

  14. Exploring Protein Function Using the Saccharomyces Genome Database.

    PubMed

    Wong, Edith D

    2017-01-01

    Elucidating the function of individual proteins will help to create a comprehensive picture of cell biology, as well as shed light on human disease mechanisms, possible treatments, and cures. Due to its compact genome, and extensive history of experimentation and annotation, the budding yeast Saccharomyces cerevisiae is an ideal model organism in which to determine protein function. This information can then be leveraged to infer functions of human homologs. Despite the large amount of research and biological data about S. cerevisiae, many proteins' functions remain unknown. Here, we explore ways to use the Saccharomyces Genome Database (SGD; http://www.yeastgenome.org ) to predict the function of proteins and gain insight into their roles in various cellular processes.

  15. ActiveDriverDB: human disease mutations and genome variation in post-translational modification sites of proteins

    PubMed Central

    Krassowski, Michal; Paczkowska, Marta; Cullion, Kim; Huang, Tina; Dzneladze, Irakli; Ouellette, B F Francis; Yamada, Joseph T; Fradet-Turcotte, Amelie

    2018-01-01

    Abstract Interpretation of genetic variation is needed for deciphering genotype-phenotype associations, mechanisms of inherited disease, and cancer driver mutations. Millions of single nucleotide variants (SNVs) in human genomes are known and thousands are associated with disease. An estimated 21% of disease-associated amino acid substitutions corresponding to missense SNVs are located in protein sites of post-translational modifications (PTMs), chemical modifications of amino acids that extend protein function. ActiveDriverDB is a comprehensive human proteo-genomics database that annotates disease mutations and population variants through the lens of PTMs. We integrated >385,000 published PTM sites with ∼3.6 million substitutions from The Cancer Genome Atlas (TCGA), the ClinVar database of disease genes, and human genome sequencing projects. The database includes site-specific interaction networks of proteins, upstream enzymes such as kinases, and drugs targeting these enzymes. We also predicted network-rewiring impact of mutations by analyzing gains and losses of kinase-bound sequence motifs. ActiveDriverDB provides detailed visualization, filtering, browsing and searching options for studying PTM-associated mutations. Users can upload mutation datasets interactively and use our application programming interface in pipelines. Integrative analysis of mutations and PTMs may help decipher molecular mechanisms of phenotypes and disease, as exemplified by case studies of TP53, BRCA2 and VHL. The open-source database is available at https://www.ActiveDriverDB.org. PMID:29126202

  16. DenHunt - A Comprehensive Database of the Intricate Network of Dengue-Human Interactions

    PubMed Central

    Arjunan, Selvam; Sastri, Narayan P.; Chandra, Nagasuma

    2016-01-01

    Dengue virus (DENV) is a human pathogen and its etiology has been widely established. There are many interactions between DENV and human proteins that have been reported in literature. However, no publicly accessible resource for efficiently retrieving the information is yet available. In this study, we mined all publicly available dengue–human interactions that have been reported in the literature into a database called DenHunt. We retrieved 682 direct interactions of human proteins with dengue viral components, 382 indirect interactions and 4120 differentially expressed human genes in dengue infected cell lines and patients. We have illustrated the importance of DenHunt by mapping the dengue–human interactions on to the host interactome and observed that the virus targets multiple host functional complexes of important cellular processes such as metabolism, immune system and signaling pathways suggesting a potential role of these interactions in viral pathogenesis. We also observed that 7 percent of the dengue virus interacting human proteins are also associated with other infectious and non-infectious diseases. Finally, the understanding that comes from such analyses could be used to design better strategies to counteract the diseases caused by dengue virus. The whole dataset has been catalogued in a searchable database, called DenHunt (http://proline.biochem.iisc.ernet.in/DenHunt/). PMID:27618709

  17. DenHunt - A Comprehensive Database of the Intricate Network of Dengue-Human Interactions.

    PubMed

    Karyala, Prashanthi; Metri, Rahul; Bathula, Christopher; Yelamanchi, Syam K; Sahoo, Lipika; Arjunan, Selvam; Sastri, Narayan P; Chandra, Nagasuma

    2016-09-01

    Dengue virus (DENV) is a human pathogen and its etiology has been widely established. There are many interactions between DENV and human proteins that have been reported in literature. However, no publicly accessible resource for efficiently retrieving the information is yet available. In this study, we mined all publicly available dengue-human interactions that have been reported in the literature into a database called DenHunt. We retrieved 682 direct interactions of human proteins with dengue viral components, 382 indirect interactions and 4120 differentially expressed human genes in dengue infected cell lines and patients. We have illustrated the importance of DenHunt by mapping the dengue-human interactions on to the host interactome and observed that the virus targets multiple host functional complexes of important cellular processes such as metabolism, immune system and signaling pathways suggesting a potential role of these interactions in viral pathogenesis. We also observed that 7 percent of the dengue virus interacting human proteins are also associated with other infectious and non-infectious diseases. Finally, the understanding that comes from such analyses could be used to design better strategies to counteract the diseases caused by dengue virus. The whole dataset has been catalogued in a searchable database, called DenHunt (http://proline.biochem.iisc.ernet.in/DenHunt/).

  18. The 24th annual Nucleic Acids Research database issue: a look back and upcoming changes

    PubMed Central

    Rigden, Daniel J

    2017-01-01

    Abstract This year's Database Issue of Nucleic Acids Research contains 152 papers that include descriptions of 54 new databases and update papers on 98 databases, of which 16 have not been previously featured in NAR. As always, these databases cover a broad range of molecular biology subjects, including genome structure, gene expression and its regulation, proteins, protein domains, and protein–protein interactions. Following the recent trend, an increasing number of new and established databases deal with the issues of human health, from cancer-causing mutations to drugs and drug targets. In accordance with this trend, three recently compiled databases that have been selected by NAR reviewers and editors as ‘breakthrough’ contributions, denovo-db, the Monarch Initiative, and Open Targets, cover human de novo gene variants, disease-related phenotypes in model organisms, and a bioinformatics platform for therapeutic target identification and validation, respectively. We expect these databases to attract the attention of numerous researchers working in various areas of genetics and genomics. Looking back at the past 12 years, we present here the ‘golden set’ of databases that have consistently served as authoritative, comprehensive, and convenient data resources widely used by the entire community and offer some lessons on what makes a successful database. The Database Issue is freely available online at the https://academic.oup.com/nar web site. An updated version of the NAR Molecular Biology Database Collection is available at http://www.oxfordjournals.org/nar/database/a/. PMID:28053160

  19. Identifying the missing proteins in human proteome by biological language model.

    PubMed

    Dong, Qiwen; Wang, Kai; Liu, Xuan

    2016-12-23

    With the rapid development of high-throughput sequencing technology, the proteomics research becomes a trendy field in the post genomics era. It is necessary to identify all the native-encoding protein sequences for further function and pathway analysis. Toward that end, the Human Proteome Organization lunched the Human Protein Project in 2011. However many proteins are hard to be detected by experiment methods, which becomes one of the bottleneck in Human Proteome Project. In consideration of the complicatedness of detecting these missing proteins by using wet-experiment approach, here we use bioinformatics method to pre-filter the missing proteins. Since there are analogy between the biological sequences and natural language, the n-gram models from Natural Language Processing field has been used to filter the missing proteins. The dataset used in this study contains 616 missing proteins from the "uncertain" category of the neXtProt database. There are 102 proteins deduced by the n-gram model, which have high probability to be native human proteins. We perform a detail analysis on the predicted structure and function of these missing proteins and also compare the high probability proteins with other mass spectrum datasets. The evaluation shows that the results reported here are in good agreement with those obtained by other well-established databases. The analysis shows that 102 proteins may be native gene-coding proteins and some of the missing proteins are membrane or natively disordered proteins which are hard to be detected by experiment methods.

  20. The Protein Disease Database of human body fluids: II. Computer methods and data issues.

    PubMed

    Lemkin, P F; Orr, G A; Goldstein, M P; Creed, G J; Myrick, J E; Merril, C R

    1995-01-01

    The Protein Disease Database (PDD) is a relational database of proteins and diseases. With this database it is possible to screen for quantitative protein abnormalities associated with disease states. These quantitative relationships use data drawn from the peer-reviewed biomedical literature. Assays may also include those observed in high-resolution electrophoretic gels that offer the potential to quantitate many proteins in a single test as well as data gathered by enzymatic or immunologic assays. We are using the Internet World Wide Web (WWW) and the Web browser paradigm as an access method for wide distribution and querying of the Protein Disease Database. The WWW hypertext transfer protocol and its Common Gateway Interface make it possible to build powerful graphical user interfaces that can support easy-to-use data retrieval using query specification forms or images. The details of these interactions are totally transparent to the users of these forms. Using a client-server SQL relational database, user query access, initial data entry and database maintenance are all performed over the Internet with a Web browser. We discuss the underlying design issues, mapping mechanisms and assumptions that we used in constructing the system, data entry, access to the database server, security, and synthesis of derived two-dimensional gel image maps and hypertext documents resulting from SQL database searches.

  1. Metagenomic Taxonomy-Guided Database-Searching Strategy for Improving Metaproteomic Analysis.

    PubMed

    Xiao, Jinqiu; Tanca, Alessandro; Jia, Ben; Yang, Runqing; Wang, Bo; Zhang, Yu; Li, Jing

    2018-04-06

    Metaproteomics provides a direct measure of the functional information by investigating all proteins expressed by a microbiota. However, due to the complexity and heterogeneity of microbial communities, it is very hard to construct a sequence database suitable for a metaproteomic study. Using a public database, researchers might not be able to identify proteins from poorly characterized microbial species, while a sequencing-based metagenomic database may not provide adequate coverage for all potentially expressed protein sequences. To address this challenge, we propose a metagenomic taxonomy-guided database-search strategy (MT), in which a merged database is employed, consisting of both taxonomy-guided reference protein sequences from public databases and proteins from metagenome assembly. By applying our MT strategy to a mock microbial mixture, about two times as many peptides were detected as with the metagenomic database only. According to the evaluation of the reliability of taxonomic attribution, the rate of misassignments was comparable to that obtained using an a priori matched database. We also evaluated the MT strategy with a human gut microbial sample, and we found 1.7 times as many peptides as using a standard metagenomic database. In conclusion, our MT strategy allows the construction of databases able to provide high sensitivity and precision in peptide identification in metaproteomic studies, enabling the detection of proteins from poorly characterized species within the microbiota.

  2. An interactive mutation database for human coagulation factor IX provides novel insights into the phenotypes and genetics of hemophilia B.

    PubMed

    Rallapalli, P M; Kemball-Cook, G; Tuddenham, E G; Gomez, K; Perkins, S J

    2013-07-01

    Factor IX (FIX) is important in the coagulation cascade, being activated to FIXa on cleavage. Defects in the human F9 gene frequently lead to hemophilia B. To assess 1113 unique F9 mutations corresponding to 3721 patient entries in a new and up-to-date interactive web database alongside the FIXa protein structure. The mutations database was built using MySQL and structural analyses were based on a homology model for the human FIXa structure based on closely-related crystal structures. Mutations have been found in 336 (73%) out of 461 residues in FIX. There were 812 unique point mutations, 182 deletions, 54 polymorphisms, 39 insertions and 26 others that together comprise a total of 1113 unique variants. The 64 unique mild severity mutations in the mature protein with known circulating protein phenotypes include 15 (23%) quantitative type I mutations and 41 (64%) predominantly qualitative type II mutations. Inhibitors were described in 59 reports (1.6%) corresponding to 25 unique mutations. The interactive database provides insights into mechanisms of hemophilia B. Type II mutations are deduced to disrupt predominantly those structural regions involved with functional interactions. The interactive features of the database will assist in making judgments about patient management. © 2013 International Society on Thrombosis and Haemostasis.

  3. Identifying relevant data for a biological database: handcrafted rules versus machine learning.

    PubMed

    Sehgal, Aditya Kumar; Das, Sanmay; Noto, Keith; Saier, Milton H; Elkan, Charles

    2011-01-01

    With well over 1,000 specialized biological databases in use today, the task of automatically identifying novel, relevant data for such databases is increasingly important. In this paper, we describe practical machine learning approaches for identifying MEDLINE documents and Swiss-Prot/TrEMBL protein records, for incorporation into a specialized biological database of transport proteins named TCDB. We show that both learning approaches outperform rules created by hand by a human expert. As one of the first case studies involving two different approaches to updating a deployed database, both the methods compared and the results will be of interest to curators of many specialized databases.

  4. Knowledge Discovery in Variant Databases Using Inductive Logic Programming

    PubMed Central

    Nguyen, Hoan; Luu, Tien-Dao; Poch, Olivier; Thompson, Julie D.

    2013-01-01

    Understanding the effects of genetic variation on the phenotype of an individual is a major goal of biomedical research, especially for the development of diagnostics and effective therapeutic solutions. In this work, we describe the use of a recent knowledge discovery from database (KDD) approach using inductive logic programming (ILP) to automatically extract knowledge about human monogenic diseases. We extracted background knowledge from MSV3d, a database of all human missense variants mapped to 3D protein structure. In this study, we identified 8,117 mutations in 805 proteins with known three-dimensional structures that were known to be involved in human monogenic disease. Our results help to improve our understanding of the relationships between structural, functional or evolutionary features and deleterious mutations. Our inferred rules can also be applied to predict the impact of any single amino acid replacement on the function of a protein. The interpretable rules are available at http://decrypthon.igbmc.fr/kd4v/. PMID:23589683

  5. Knowledge discovery in variant databases using inductive logic programming.

    PubMed

    Nguyen, Hoan; Luu, Tien-Dao; Poch, Olivier; Thompson, Julie D

    2013-01-01

    Understanding the effects of genetic variation on the phenotype of an individual is a major goal of biomedical research, especially for the development of diagnostics and effective therapeutic solutions. In this work, we describe the use of a recent knowledge discovery from database (KDD) approach using inductive logic programming (ILP) to automatically extract knowledge about human monogenic diseases. We extracted background knowledge from MSV3d, a database of all human missense variants mapped to 3D protein structure. In this study, we identified 8,117 mutations in 805 proteins with known three-dimensional structures that were known to be involved in human monogenic disease. Our results help to improve our understanding of the relationships between structural, functional or evolutionary features and deleterious mutations. Our inferred rules can also be applied to predict the impact of any single amino acid replacement on the function of a protein. The interpretable rules are available at http://decrypthon.igbmc.fr/kd4v/.

  6. The Functional Human C-Terminome

    PubMed Central

    Hedden, Michael; Lyon, Kenneth F.; Brooks, Steven B.; David, Roxanne P.; Limtong, Justin; Newsome, Jacklyn M.; Novakovic, Nemanja; Rajasekaran, Sanguthevar; Thapar, Vishal; Williams, Sean R.; Schiller, Martin R.

    2016-01-01

    All translated proteins end with a carboxylic acid commonly called the C-terminus. Many short functional sequences (minimotifs) are located on or immediately proximal to the C-terminus. However, information about the function of protein C-termini has not been consolidated into a single source. Here, we built a new “C-terminome” database and web system focused on human proteins. Approximately 3,600 C-termini in the human proteome have a minimotif with an established molecular function. To help evaluate the function of the remaining C-termini in the human proteome, we inferred minimotifs identified by experimentation in rodent cells, predicted minimotifs based upon consensus sequence matches, and predicted novel highly repetitive sequences in C-termini. Predictions can be ranked by enrichment scores or Gene Evolutionary Rate Profiling (GERP) scores, a measurement of evolutionary constraint. By searching for new anchored sequences on the last 10 amino acids of proteins in the human proteome with lengths between 3–10 residues and up to 5 degenerate positions in the consensus sequences, we have identified new consensus sequences that predict instances in the majority of human genes. All of this information is consolidated into a database that can be accessed through a C-terminome web system with search and browse functions for minimotifs and human proteins. A known consensus sequence-based predicted function is assigned to nearly half the proteins in the human proteome. Weblink: http://cterminome.bio-toolkit.com. PMID:27050421

  7. Merging in-silico and in vitro salivary protein complex partners using the STRING database: A tutorial.

    PubMed

    Crosara, Karla Tonelli Bicalho; Moffa, Eduardo Buozi; Xiao, Yizhi; Siqueira, Walter Luiz

    2018-01-16

    Protein-protein interaction is a common physiological mechanism for protection and actions of proteins in an organism. The identification and characterization of protein-protein interactions in different organisms is necessary to better understand their physiology and to determine their efficacy. In a previous in vitro study using mass spectrometry, we identified 43 proteins that interact with histatin 1. Six previously documented interactors were confirmed and 37 novel partners were identified. In this tutorial, we aimed to demonstrate the usefulness of the STRING database for studying protein-protein interactions. We used an in-silico approach along with the STRING database (http://string-db.org/) and successfully performed a fast simulation of a novel constructed histatin 1 protein-protein network, including both the previously known and the predicted interactors, along with our newly identified interactors. Our study highlights the advantages and importance of applying bioinformatics tools to merge in-silico tactics with experimental in vitro findings for rapid advancement of our knowledge about protein-protein interactions. Our findings also indicate that bioinformatics tools such as the STRING protein network database can help predict potential interactions between proteins and thus serve as a guide for future steps in our exploration of the Human Interactome. Our study highlights the usefulness of the STRING protein database for studying protein-protein interactions. The STRING database can collect and integrate data about known and predicted protein-protein associations from many organisms, including both direct (physical) and indirect (functional) interactions, in an easy-to-use interface. Copyright © 2017 Elsevier B.V. All rights reserved.

  8. Hum-mPLoc: an ensemble classifier for large-scale human protein subcellular location prediction by incorporating samples with multiple sites.

    PubMed

    Shen, Hong-Bin; Chou, Kuo-Chen

    2007-04-20

    Proteins may simultaneously exist at, or move between, two or more different subcellular locations. Proteins with multiple locations or dynamic feature of this kind are particularly interesting because they may have some very special biological functions intriguing to investigators in both basic research and drug discovery. For instance, among the 6408 human protein entries that have experimentally observed subcellular location annotations in the Swiss-Prot database (version 50.7, released 19-Sept-2006), 973 ( approximately 15%) have multiple location sites. The number of total human protein entries (except those annotated with "fragment" or those with less than 50 amino acids) in the same database is 14,370, meaning a gap of (14,370-6408)=7962 entries for which no knowledge is available about their subcellular locations. Although one can use the computational approach to predict the desired information for the gap, so far all the existing methods for predicting human protein subcellular localization are limited in the case of single location site only. To overcome such a barrier, a new ensemble classifier, named Hum-mPLoc, was developed that can be used to deal with the case of multiple location sites as well. Hum-mPLoc is freely accessible to the public as a web server at http://202.120.37.186/bioinf/hum-multi. Meanwhile, for the convenience of people working in the relevant areas, Hum-mPLoc has been used to identify all human protein entries in the Swiss-Prot database that do not have subcellular location annotations or are annotated as being uncertain. The large-scale results thus obtained have been deposited in a downloadable file prepared with Microsoft Excel and named "Tab_Hum-mPLoc.xls". This file is available at the same website and will be updated twice a year to include new entries of human proteins and reflect the continuous development of Hum-mPLoc.

  9. ImmunemiR - A Database of Prioritized Immune miRNA Disease Associations and its Interactome.

    PubMed

    Prabahar, Archana; Natarajan, Jeyakumar

    2017-01-01

    MicroRNAs are the key regulators of gene expression and their abnormal expression in the immune system may be associated with several human diseases such as inflammation, cancer and autoimmune diseases. Elucidation of miRNA disease association through the interactome will deepen the understanding of its disease mechanisms. A specialized database for immune miRNAs is highly desirable to demonstrate the immune miRNA disease associations in the interactome. miRNAs specific to immune related diseases were retrieved from curated databases such as HMDD, miR2disease and PubMed literature based on MeSH classification of immune system diseases. The additional data such as miRNA target genes, genes coding protein-protein interaction information were compiled from related resources. Further, miRNAs were prioritized to specific immune diseases using random walk ranking algorithm. In total 245 immune miRNAs associated with 92 OMIM disease categories were identified from external databases. The resultant data were compiled as ImmunemiR, a database of prioritized immune miRNA disease associations. This database provides both text based annotation information and network visualization of its interactome. To our knowledge, ImmunemiR is the first available database to provide a comprehensive repository of human immune disease associated miRNAs with network visualization options of its target genes, protein-protein interactions (PPI) and its disease associations. It is freely available at http://www.biominingbu.org/immunemir/. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  10. The Protein-DNA Interface database

    PubMed Central

    2010-01-01

    The Protein-DNA Interface database (PDIdb) is a repository containing relevant structural information of Protein-DNA complexes solved by X-ray crystallography and available at the Protein Data Bank. The database includes a simple functional classification of the protein-DNA complexes that consists of three hierarchical levels: Class, Type and Subtype. This classification has been defined and manually curated by humans based on the information gathered from several sources that include PDB, PubMed, CATH, SCOP and COPS. The current version of the database contains only structures with resolution of 2.5 Å or higher, accounting for a total of 922 entries. The major aim of this database is to contribute to the understanding of the main rules that underlie the molecular recognition process between DNA and proteins. To this end, the database is focused on each specific atomic interface rather than on the separated binding partners. Therefore, each entry in this database consists of a single and independent protein-DNA interface. We hope that PDIdb will be useful to many researchers working in fields such as the prediction of transcription factor binding sites in DNA, the study of specificity determinants that mediate enzyme recognition events, engineering and design of new DNA binding proteins with distinct binding specificity and affinity, among others. Finally, due to its friendly and easy-to-use web interface, we hope that PDIdb will also serve educational and teaching purposes. PMID:20482798

  11. The Protein-DNA Interface database.

    PubMed

    Norambuena, Tomás; Melo, Francisco

    2010-05-18

    The Protein-DNA Interface database (PDIdb) is a repository containing relevant structural information of Protein-DNA complexes solved by X-ray crystallography and available at the Protein Data Bank. The database includes a simple functional classification of the protein-DNA complexes that consists of three hierarchical levels: Class, Type and Subtype. This classification has been defined and manually curated by humans based on the information gathered from several sources that include PDB, PubMed, CATH, SCOP and COPS. The current version of the database contains only structures with resolution of 2.5 A or higher, accounting for a total of 922 entries. The major aim of this database is to contribute to the understanding of the main rules that underlie the molecular recognition process between DNA and proteins. To this end, the database is focused on each specific atomic interface rather than on the separated binding partners. Therefore, each entry in this database consists of a single and independent protein-DNA interface.We hope that PDIdb will be useful to many researchers working in fields such as the prediction of transcription factor binding sites in DNA, the study of specificity determinants that mediate enzyme recognition events, engineering and design of new DNA binding proteins with distinct binding specificity and affinity, among others. Finally, due to its friendly and easy-to-use web interface, we hope that PDIdb will also serve educational and teaching purposes.

  12. GenomewidePDB 2.0: A Newly Upgraded Versatile Proteogenomic Database for the Chromosome-Centric Human Proteome Project.

    PubMed

    Jeong, Seul-Ki; Hancock, William S; Paik, Young-Ki

    2015-09-04

    Since the launch of the Chromosome-centric Human Proteome Project (C-HPP) in 2012, the number of "missing" proteins has fallen to 2932, down from ∼5932 since the number was first counted in 2011. We compared the characteristics of missing proteins with those of already annotated proteins with respect to transcriptional expression pattern and the time periods in which newly identified proteins were annotated. We learned that missing proteins commonly exhibit lower levels of transcriptional expression and less tissue-specific expression compared with already annotated proteins. This makes it more difficult to identify missing proteins as time goes on. One of the C-HPP goals is to identify alternative spliced product of proteins (ASPs), which are usually difficult to find by shot-gun proteomic methods due to their sequence similarities with the representative proteins. To resolve this problem, it may be necessary to use a targeted proteomics approach (e.g., selected and multiple reaction monitoring [S/MRM] assays) and an innovative bioinformatics platform that enables the selection of target peptides for rarely expressed missing proteins or ASPs. Given that the success of efforts to identify missing proteins may rely on more informative public databases, it was necessary to upgrade the available integrative databases. To this end, we attempted to improve the features and utility of GenomewidePDB by integrating transcriptomic information (e.g., alternatively spliced transcripts), annotated peptide information, and an advanced search interface that can find proteins of interest when applying a targeted proteomics strategy. This upgraded version of the database, GenomewidePDB 2.0, may not only expedite identification of the remaining missing proteins but also enhance the exchange of information among the proteome community. GenomewidePDB 2.0 is available publicly at http://genomewidepdb.proteomix.org/.

  13. LocSigDB: a database of protein localization signals

    PubMed Central

    Negi, Simarjeet; Pandey, Sanjit; Srinivasan, Satish M.; Mohammed, Akram; Guda, Chittibabu

    2015-01-01

    LocSigDB (http://genome.unmc.edu/LocSigDB/) is a manually curated database of experimental protein localization signals for eight distinct subcellular locations; primarily in a eukaryotic cell with brief coverage of bacterial proteins. Proteins must be localized at their appropriate subcellular compartment to perform their desired function. Mislocalization of proteins to unintended locations is a causative factor for many human diseases; therefore, collection of known sorting signals will help support many important areas of biomedical research. By performing an extensive literature study, we compiled a collection of 533 experimentally determined localization signals, along with the proteins that harbor such signals. Each signal in the LocSigDB is annotated with its localization, source, PubMed references and is linked to the proteins in UniProt database along with the organism information that contain the same amino acid pattern as the given signal. From LocSigDB webserver, users can download the whole database or browse/search for data using an intuitive query interface. To date, LocSigDB is the most comprehensive compendium of protein localization signals for eight distinct subcellular locations. Database URL: http://genome.unmc.edu/LocSigDB/ PMID:25725059

  14. GALT protein database: querying structural and functional features of GALT enzyme.

    PubMed

    d'Acierno, Antonio; Facchiano, Angelo; Marabotti, Anna

    2014-09-01

    Knowledge of the impact of variations on protein structure can enhance the comprehension of the mechanisms of genetic diseases related to that protein. Here, we present a new version of GALT Protein Database, a Web-accessible data repository for the storage and interrogation of structural effects of variations of the enzyme galactose-1-phosphate uridylyltransferase (GALT), the impairment of which leads to classic Galactosemia, a rare genetic disease. This new version of this database now contains the models of 201 missense variants of GALT enzyme, including heterozygous variants, and it allows users not only to retrieve information about the missense variations affecting this protein, but also to investigate their impact on substrate binding, intersubunit interactions, stability, and other structural features. In addition, it allows the interactive visualization of the models of variants collected into the database. We have developed additional tools to improve the use of the database by nonspecialized users. This Web-accessible database (http://bioinformatica.isa.cnr.it/GALT/GALT2.0) represents a model of tools potentially suitable for application to other proteins that are involved in human pathologies and that are subjected to genetic variations. © 2014 WILEY PERIODICALS, INC.

  15. The systematic annotation of the three main GPCR families in Reactome.

    PubMed

    Jassal, Bijay; Jupe, Steven; Caudy, Michael; Birney, Ewan; Stein, Lincoln; Hermjakob, Henning; D'Eustachio, Peter

    2010-07-29

    Reactome is an open-source, freely available database of human biological pathways and processes. A major goal of our work is to provide an integrated view of cellular signalling processes that spans from ligand-receptor interactions to molecular readouts at the level of metabolic and transcriptional events. To this end, we have built the first catalogue of all human G protein-coupled receptors (GPCRs) known to bind endogenous or natural ligands. The UniProt database has records for 797 proteins classified as GPCRs and sorted into families A/1, B/2 and C/3 on the basis of amino acid sequence. To these records we have added details from the IUPHAR database and our own manual curation of relevant literature to create reactions in which 563 GPCRs bind ligands and also interact with specific G-proteins to initiate signalling cascades. We believe the remaining 234 GPCRs are true orphans. The Reactome GPCR pathway can be viewed as a detailed interactive diagram and can be exported in many forms. It provides a template for the orthology-based inference of GPCR reactions for diverse model organism species, and can be overlaid with protein-protein interaction and gene expression datasets to facilitate overrepresentation studies and other forms of pathway analysis. Database URL: http://www.reactome.org.

  16. Human Variome Project Quality Assessment Criteria for Variation Databases.

    PubMed

    Vihinen, Mauno; Hancock, John M; Maglott, Donna R; Landrum, Melissa J; Schaafsma, Gerard C P; Taschner, Peter

    2016-06-01

    Numerous databases containing information about DNA, RNA, and protein variations are available. Gene-specific variant databases (locus-specific variation databases, LSDBs) are typically curated and maintained for single genes or groups of genes for a certain disease(s). These databases are widely considered as the most reliable information source for a particular gene/protein/disease, but it should also be made clear they may have widely varying contents, infrastructure, and quality. Quality is very important to evaluate because these databases may affect health decision-making, research, and clinical practice. The Human Variome Project (HVP) established a Working Group for Variant Database Quality Assessment. The basic principle was to develop a simple system that nevertheless provides a good overview of the quality of a database. The HVP quality evaluation criteria that resulted are divided into four main components: data quality, technical quality, accessibility, and timeliness. This report elaborates on the developed quality criteria and how implementation of the quality scheme can be achieved. Examples are provided for the current status of the quality items in two different databases, BTKbase, an LSDB, and ClinVar, a central archive of submissions about variants and their clinical significance. © 2016 WILEY PERIODICALS, INC.

  17. Integrated Proteomic Pipeline Using Multiple Search Engines for a Proteogenomic Study with a Controlled Protein False Discovery Rate.

    PubMed

    Park, Gun Wook; Hwang, Heeyoun; Kim, Kwang Hoe; Lee, Ju Yeon; Lee, Hyun Kyoung; Park, Ji Yeong; Ji, Eun Sun; Park, Sung-Kyu Robin; Yates, John R; Kwon, Kyung-Hoon; Park, Young Mok; Lee, Hyoung-Joo; Paik, Young-Ki; Kim, Jin Young; Yoo, Jong Shin

    2016-11-04

    In the Chromosome-Centric Human Proteome Project (C-HPP), false-positive identification by peptide spectrum matches (PSMs) after database searches is a major issue for proteogenomic studies using liquid-chromatography and mass-spectrometry-based large proteomic profiling. Here we developed a simple strategy for protein identification, with a controlled false discovery rate (FDR) at the protein level, using an integrated proteomic pipeline (IPP) that consists of four engrailed steps as follows. First, using three different search engines, SEQUEST, MASCOT, and MS-GF+, individual proteomic searches were performed against the neXtProt database. Second, the search results from the PSMs were combined using statistical evaluation tools including DTASelect and Percolator. Third, the peptide search scores were converted into E-scores normalized using an in-house program. Last, ProteinInferencer was used to filter the proteins containing two or more peptides with a controlled FDR of 1.0% at the protein level. Finally, we compared the performance of the IPP to a conventional proteomic pipeline (CPP) for protein identification using a controlled FDR of <1% at the protein level. Using the IPP, a total of 5756 proteins (vs 4453 using the CPP) including 477 alternative splicing variants (vs 182 using the CPP) were identified from human hippocampal tissue. In addition, a total of 10 missing proteins (vs 7 using the CPP) were identified with two or more unique peptides, and their tryptic peptides were validated using MS/MS spectral pattern from a repository database or their corresponding synthetic peptides. This study shows that the IPP effectively improved the identification of proteins, including alternative splicing variants and missing proteins, in human hippocampal tissues for the C-HPP. All RAW files used in this study were deposited in ProteomeXchange (PXD000395).

  18. In Silico Screening of the Human Gut Metaproteome Identifies Th17-Promoting Peptides Encrypted in Proteins of Commensal Bacteria

    PubMed Central

    Hidalgo-Cantabrana, Claudio; Moro-García, Marco A.; Blanco-Míguez, Aitor; Fdez-Riverola, Florentino; Lourenço, Anália; Alonso-Arias, Rebeca; Sánchez, Borja

    2017-01-01

    Scientific studies focused on the role of the human microbiome over human health have generated billions of gigabits of genetic information during the last decade. Nowadays integration of all this information in public databases and development of pipelines allowing us to biotechnologically exploit this information are urgently needed. Prediction of the potential bioactivity of the products encoded by the human gut microbiome, or metaproteome, is the first step for identifying proteins responsible for the molecular interaction between microorganisms and the immune system. We have recently published the Mechanism of Action of the Human Microbiome (MAHMI) database (http://www.mahmi.org), conceived as a resource compiling peptide sequences with a potential immunomodulatory activity. Fifteen out of the 300 hundred million peptides contained in the MAHMI database were synthesized. These peptides were identified as being encrypted in proteins produced by gut microbiota members, they do not contain cleavage points for the major intestinal endoproteases and displayed high probability to have immunomodulatory bioactivity. The bacterial peptides FR-16 and LR-17 encrypted in proteins from Bifidobacterium longum DJ010A and Bifidobacterium fragilis YCH46 respectively, showed the higher immune modulation capability over human peripheral blood mononuclear cells. Both peptides modulated the immune response toward increases in the Th17 and decreases in the Th1 cell response, together with an induction of IL-22 production. These results strongly suggest the combined use of bioinformatics and in vitro tools as a first stage in the screening of bioactive peptides encrypted in the human gut metaproteome. PMID:28943872

  19. Database resources of the National Center for Biotechnology Information

    PubMed Central

    2015-01-01

    The National Center for Biotechnology Information (NCBI) provides a large suite of online resources for biological information and data, including the GenBank® nucleic acid sequence database and the PubMed database of citations and abstracts for published life science journals. Additional NCBI resources focus on literature (Bookshelf, PubMed Central (PMC) and PubReader); medical genetics (ClinVar, dbMHC, the Genetic Testing Registry, HIV-1/Human Protein Interaction Database and MedGen); genes and genomics (BioProject, BioSample, dbSNP, dbVar, Epigenomics, Gene, Gene Expression Omnibus (GEO), Genome, HomoloGene, the Map Viewer, Nucleotide, PopSet, Probe, RefSeq, Sequence Read Archive, the Taxonomy Browser, Trace Archive and UniGene); and proteins and chemicals (Biosystems, COBALT, the Conserved Domain Database (CDD), the Conserved Domain Architecture Retrieval Tool (CDART), the Molecular Modeling Database (MMDB), Protein Clusters, Protein and the PubChem suite of small molecule databases). The Entrez system provides search and retrieval operations for many of these databases. Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of these resources can be accessed through the NCBI home page at http://www.ncbi.nlm.nih.gov. PMID:25398906

  20. Database resources of the National Center for Biotechnology Information

    PubMed Central

    2016-01-01

    The National Center for Biotechnology Information (NCBI) provides a large suite of online resources for biological information and data, including the GenBank® nucleic acid sequence database and the PubMed database of citations and abstracts for published life science journals. Additional NCBI resources focus on literature (PubMed Central (PMC), Bookshelf and PubReader), health (ClinVar, dbGaP, dbMHC, the Genetic Testing Registry, HIV-1/Human Protein Interaction Database and MedGen), genomes (BioProject, Assembly, Genome, BioSample, dbSNP, dbVar, Epigenomics, the Map Viewer, Nucleotide, Probe, RefSeq, Sequence Read Archive, the Taxonomy Browser and the Trace Archive), genes (Gene, Gene Expression Omnibus (GEO), HomoloGene, PopSet and UniGene), proteins (Protein, the Conserved Domain Database (CDD), COBALT, Conserved Domain Architecture Retrieval Tool (CDART), the Molecular Modeling Database (MMDB) and Protein Clusters) and chemicals (Biosystems and the PubChem suite of small molecule databases). The Entrez system provides search and retrieval operations for most of these databases. Augmenting many of the web applications are custom implementations of the BLAST program optimized to search specialized datasets. All of these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov. PMID:26615191

  1. Analysis of high accuracy, quantitative proteomics data in the MaxQB database.

    PubMed

    Schaab, Christoph; Geiger, Tamar; Stoehr, Gabriele; Cox, Juergen; Mann, Matthias

    2012-03-01

    MS-based proteomics generates rapidly increasing amounts of precise and quantitative information. Analysis of individual proteomic experiments has made great strides, but the crucial ability to compare and store information across different proteome measurements still presents many challenges. For example, it has been difficult to avoid contamination of databases with low quality peptide identifications, to control for the inflation in false positive identifications when combining data sets, and to integrate quantitative data. Although, for example, the contamination with low quality identifications has been addressed by joint analysis of deposited raw data in some public repositories, we reasoned that there should be a role for a database specifically designed for high resolution and quantitative data. Here we describe a novel database termed MaxQB that stores and displays collections of large proteomics projects and allows joint analysis and comparison. We demonstrate the analysis tools of MaxQB using proteome data of 11 different human cell lines and 28 mouse tissues. The database-wide false discovery rate is controlled by adjusting the project specific cutoff scores for the combined data sets. The 11 cell line proteomes together identify proteins expressed from more than half of all human genes. For each protein of interest, expression levels estimated by label-free quantification can be visualized across the cell lines. Similarly, the expression rank order and estimated amount of each protein within each proteome are plotted. We used MaxQB to calculate the signal reproducibility of the detected peptides for the same proteins across different proteomes. Spearman rank correlation between peptide intensity and detection probability of identified proteins was greater than 0.8 for 64% of the proteome, whereas a minority of proteins have negative correlation. This information can be used to pinpoint false protein identifications, independently of peptide database scores. The information contained in MaxQB, including high resolution fragment spectra, is accessible to the community via a user-friendly web interface at http://www.biochem.mpg.de/maxqb.

  2. MutHTP: Mutations in Human Transmembrane Proteins.

    PubMed

    A, Kulandaisamy; S, Binny Priya; R, Sakthivel; Tarnovskaya, Svetlana; Bizin, Ilya; Hönigschmid, Peter; Frishman, Dmitrij; Gromiha, M Michael

    2018-02-01

    We have developed a novel database, MutHTP, which contains information on 183395 disease-associated and 17827 neutral mutations in human transmembrane proteins. For each mutation site MutHTP provides a description of its location with respect to the membrane protein topology, structural environment (if available) and functional features. Comprehensive visualization, search, display and download options are available. The database is publicly available at http://www.iitm.ac.in/bioinfo/MutHTP/. The website is implemented using HTML, PHP and javascript and supports recent versions of all major browsers, such as Firefox, Chrome and Opera. gromiha@iitm.ac.in. Supplementary data are available at Bioinformatics online. © The Author (2018). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  3. Columba: an integrated database of proteins, structures, and annotations.

    PubMed

    Trissl, Silke; Rother, Kristian; Müller, Heiko; Steinke, Thomas; Koch, Ina; Preissner, Robert; Frömmel, Cornelius; Leser, Ulf

    2005-03-31

    Structural and functional research often requires the computation of sets of protein structures based on certain properties of the proteins, such as sequence features, fold classification, or functional annotation. Compiling such sets using current web resources is tedious because the necessary data are spread over many different databases. To facilitate this task, we have created COLUMBA, an integrated database of annotations of protein structures. COLUMBA currently integrates twelve different databases, including PDB, KEGG, Swiss-Prot, CATH, SCOP, the Gene Ontology, and ENZYME. The database can be searched using either keyword search or data source-specific web forms. Users can thus quickly select and download PDB entries that, for instance, participate in a particular pathway, are classified as containing a certain CATH architecture, are annotated as having a certain molecular function in the Gene Ontology, and whose structures have a resolution under a defined threshold. The results of queries are provided in both machine-readable extensible markup language and human-readable format. The structures themselves can be viewed interactively on the web. The COLUMBA database facilitates the creation of protein structure data sets for many structure-based studies. It allows to combine queries on a number of structure-related databases not covered by other projects at present. Thus, information on both many and few protein structures can be used efficiently. The web interface for COLUMBA is available at http://www.columba-db.de.

  4. Construction and Deciphering of Human Phosphorylation-Mediated Signaling Transduction Networks.

    PubMed

    Zhang, Menghuan; Li, Hong; He, Ying; Sun, Han; Xia, Li; Wang, Lishun; Sun, Bo; Ma, Liangxiao; Zhang, Guoqing; Li, Jing; Li, Yixue; Xie, Lu

    2015-07-02

    Protein phosphorylation is the most abundant reversible covalent modification. Human protein kinases participate in almost all biological pathways, and approximately half of the kinases are associated with disease. PhoSigNet was designed to store and display human phosphorylation-mediated signal transduction networks, with additional information related to cancer. It contains 11 976 experimentally validated directed edges and 216 871 phosphorylation sites. Moreover, 3491 differentially expressed proteins in human cancer from dbDEPC, 18 907 human cancer variation sites from CanProVar, and 388 hyperphosphorylation sites from PhosphoSitePlus were collected as annotation information. Compared with other phosphorylation-related databases, PhoSigNet not only takes the kinase-substrate regulatory relationship pairs into account, but also extends regulatory relationships up- and downstream (e.g., from ligand to receptor, from G protein to kinase, and from transcription factor to targets). Furthermore, PhoSigNet allows the user to investigate the impact of phosphorylation modifications on cancer. By using one set of in-house time series phosphoproteomics data, the reconstruction of a conditional and dynamic phosphorylation-mediated signaling network was exemplified. We expect PhoSigNet to be a useful database and analysis platform benefiting both proteomics and cancer studies.

  5. MetaPro-IQ: a universal metaproteomic approach to studying human and mouse gut microbiota.

    PubMed

    Zhang, Xu; Ning, Zhibin; Mayne, Janice; Moore, Jasmine I; Li, Jennifer; Butcher, James; Deeke, Shelley Ann; Chen, Rui; Chiang, Cheng-Kang; Wen, Ming; Mack, David; Stintzi, Alain; Figeys, Daniel

    2016-06-24

    The gut microbiota has been shown to be closely associated with human health and disease. While next-generation sequencing can be readily used to profile the microbiota taxonomy and metabolic potential, metaproteomics is better suited for deciphering microbial biological activities. However, the application of gut metaproteomics has largely been limited due to the low efficiency of protein identification. Thus, a high-performance and easy-to-implement gut metaproteomic approach is required. In this study, we developed a high-performance and universal workflow for gut metaproteome identification and quantification (named MetaPro-IQ) by using the close-to-complete human or mouse gut microbial gene catalog as database and an iterative database search strategy. An average of 38 and 33 % of the acquired tandem mass spectrometry (MS) spectra was confidently identified for the studied mouse stool and human mucosal-luminal interface samples, respectively. In total, we accurately quantified 30,749 protein groups for the mouse metaproteome and 19,011 protein groups for the human metaproteome. Moreover, the MetaPro-IQ approach enabled comparable identifications with the matched metagenome database search strategy that is widely used but needs prior metagenomic sequencing. The response of gut microbiota to high-fat diet in mice was then assessed, which showed distinct metaproteome patterns for high-fat-fed mice and identified 849 proteins as significant responders to high-fat feeding in comparison to low-fat feeding. We present MetaPro-IQ, a metaproteomic approach for highly efficient intestinal microbial protein identification and quantification, which functions as a universal workflow for metaproteomic studies, and will thus facilitate the application of metaproteomics for better understanding the functions of gut microbiota in health and disease.

  6. ApoptoProteomics, an integrated database for analysis of proteomics data obtained from apoptotic cells.

    PubMed

    Arntzen, Magnus Ø; Thiede, Bernd

    2012-02-01

    Apoptosis is the most commonly described form of programmed cell death, and dysfunction is implicated in a large number of human diseases. Many quantitative proteome analyses of apoptosis have been performed to gain insight in proteins involved in the process. This resulted in large and complex data sets that are difficult to evaluate. Therefore, we developed the ApoptoProteomics database for storage, browsing, and analysis of the outcome of large scale proteome analyses of apoptosis derived from human, mouse, and rat. The proteomics data of 52 publications were integrated and unified with protein annotations from UniProt-KB, the caspase substrate database homepage (CASBAH), and gene ontology. Currently, more than 2300 records of more than 1500 unique proteins were included, covering a large proportion of the core signaling pathways of apoptosis. Analysis of the data set revealed a high level of agreement between the reported changes in directionality reported in proteomics studies and expected apoptosis-related function and may disclose proteins without a current recognized involvement in apoptosis based on gene ontology. Comparison between induction of apoptosis by the intrinsic and the extrinsic apoptotic signaling pathway revealed slight differences. Furthermore, proteomics has significantly contributed to the field of apoptosis in identifying hundreds of caspase substrates. The database is available at http://apoptoproteomics.uio.no.

  7. ApoptoProteomics, an Integrated Database for Analysis of Proteomics Data Obtained from Apoptotic Cells*

    PubMed Central

    Arntzen, Magnus Ø.; Thiede, Bernd

    2012-01-01

    Apoptosis is the most commonly described form of programmed cell death, and dysfunction is implicated in a large number of human diseases. Many quantitative proteome analyses of apoptosis have been performed to gain insight in proteins involved in the process. This resulted in large and complex data sets that are difficult to evaluate. Therefore, we developed the ApoptoProteomics database for storage, browsing, and analysis of the outcome of large scale proteome analyses of apoptosis derived from human, mouse, and rat. The proteomics data of 52 publications were integrated and unified with protein annotations from UniProt-KB, the caspase substrate database homepage (CASBAH), and gene ontology. Currently, more than 2300 records of more than 1500 unique proteins were included, covering a large proportion of the core signaling pathways of apoptosis. Analysis of the data set revealed a high level of agreement between the reported changes in directionality reported in proteomics studies and expected apoptosis-related function and may disclose proteins without a current recognized involvement in apoptosis based on gene ontology. Comparison between induction of apoptosis by the intrinsic and the extrinsic apoptotic signaling pathway revealed slight differences. Furthermore, proteomics has significantly contributed to the field of apoptosis in identifying hundreds of caspase substrates. The database is available at http://apoptoproteomics.uio.no. PMID:22067098

  8. In silico analysis of candidate proteins sharing homology with Streptococcus agalactiae proteins and their role in male infertility.

    PubMed

    Parida, Rajeshwari; Samanta, Luna

    2017-02-01

    Leukocytospermia is a physiologic condition defined as human semen with a leukocyte count of >1 x 10 6 cells/ml that is often correlated with male infertility. Moreover, bacteriospermia has been associated with leukocytospermia ultimately leading to male infertility. We have found that semen samples with >1 x 10 6 /ml leukocytes and/or bacteriospermia have oxidative predominance as evidenced by augmented protein carbonyl and lipid peroxidation status of the semen which is implicated in sperm dysfunction. It has been reported that Streptococcus agalactiae is present in bacteriospermic samples. Previous research has shown that human leukocyte antigen beta chain paralog (HLA-DRB) alleles interact best with the infected sperm cells rather than the non-infected cells. Little is known about the interaction of major histocompatibility complex (MHC) present on leukocytes with the sperm upon bacterial infection and how it induces an immunological response which we have addressed by epitope mapping. Therefore, we examined MHC class II derived bacterial peptides which might have human sperm-related functional aspects. Twenty-two S. agalactiae proteins were obtained from PUBMED protein database for our study. Protein sequences with more than two accession numbers were aligned using CLUSTAL Omega to check their conservation pattern. Each protein sequence was then analyzed for T-cell epitope prediction against HLA-DRB alleles using the immune epitope database (IEDB) analysis tool. Out of a plethora of peptides obtained from this analysis, peptides corresponding to proteins of interest such as DNA binding response regulator, hyaluronate lyase and laminin binding protein were screened against the human proteome using Blastp. Interestingly, we have found bacterial peptides sharing homology with human peptides deciphering some of the important sperm functions. Antibodies raised against these probable bacterial antigens of fertility will not only help us understand the mechanism of leukocytospermia/bacteriospermia induced male factor infertility but also open new avenues for immunocontraception. AA: amino acid; ASA: antisperm antibodies; GBS: group B streptococcus; HLA: human leukocyte antigen; HAS3: hyaluronan synthase 3: IEDB: immune epitope database; MAPO2: O 6 -methylguanine-induced apoptosis 2; MHC: major histocompatibility complex; ROS: reactive oxygen species; Rosbin1: round spermatid basic protein 1; S. agalactiae: Streptococcus agalactiae;SA: sperm antigen; SPATA17: spermatogenesis associated protein17; SPNR: spermatid perinuclear RNA binding protein; TEX15: testis-expressed sequence 15 protein; TOPAZ: testis- and ovary-specific PAZ domain-containing protein; TPABP: testis-specific poly-A binding protein; TPAP: testis-specific poly(A) polymerase; WHO: World Health Organization.

  9. dbDSM: a manually curated database for deleterious synonymous mutations.

    PubMed

    Wen, Pengbo; Xiao, Peng; Xia, Junfeng

    2016-06-15

    Synonymous mutations (SMs), which changed the sequence of a gene without directly altering the amino acid sequence of the encoded protein, were thought to have no functional consequences for a long time. They are often assumed to be neutral in models of mutation and selection and were completely ignored in many studies. However, accumulating experimental evidence has demonstrated that these mutations exert their impact on gene functions via splicing accuracy, mRNA stability, translation fidelity, protein folding and expression, and some of these mutations are implicated in human diseases. To the best of our knowledge, there is still no database specially focusing on disease-related SMs. We have developed a new database called dbDSM (database of Deleterious Synonymous Mutation), a continually updated database that collects, curates and manages available human disease-related SM data obtained from published literature. In the current release, dbDSM collects 1936 SM-disease association entries, including 1289 SMs and 443 human diseases from ClinVar, GRASP, GWAS Catalog, GWASdb, PolymiRTS database, PubMed database and Web of Knowledge. Additionally, we provided users a link to download all the data in the dbDSM and a link to submit novel data into the database. We hope dbDSM will be a useful resource for investigating the roles of SMs in human disease. dbDSM is freely available online at http://bioinfo.ahu.edu.cn:8080/dbDSM/index.jsp with all major browser supported. jfxia@ahu.edu.cn Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  10. Human gut endogenous proteins as a potential source of angiotensin-I-converting enzyme (ACE-I)-, renin inhibitory and antioxidant peptides.

    PubMed

    Dave, Lakshmi A; Hayes, Maria; Montoya, Carlos A; Rutherfurd, Shane M; Moughan, Paul J

    2016-02-01

    It is well known that endogenous bioactive proteins and peptides play a substantial role in the body's first line of immunological defence, immune-regulation and normal body functioning. Further, the peptides derived from the luminal digestion of proteins are also important for body function. For example, within the peptide database BIOPEP (http://www.uwm.edu.pl/biochemia/index.php/en/biopep) 12 endogenous antimicrobial and 64 angiotensin-I-converting enzyme (ACE-I) inhibitory peptides derived from human milk and plasma proteins are listed. The antimicrobial peptide database (http://aps.unmc.edu/AP/main.php) lists over 111 human host-defence peptides. Several endogenous proteins are secreted in the gut and are subject to the same gastrointestinal digestion processes as food proteins derived from the diet. The human gut endogenous proteins (GEP) include mucins, serum albumin, digestive enzymes, hormones, and proteins from sloughed off epithelial cells and gut microbiota, and numerous other secreted proteins. To date, much work has been carried out regarding the health altering effects of food-derived bioactive peptides but little attention has been paid to the possibility that GEP may also be a source of bioactive peptides. In this review, we discuss the potential of GEP to constitute a gut cryptome from which bioactive peptides such as ACE-I inhibitory, renin inhibitory and antioxidant peptides may be derived. Copyright © 2015 Elsevier Inc. All rights reserved.

  11. Predicting Protein Relationships to Human Pathways through a Relational Learning Approach Based on Simple Sequence Features.

    PubMed

    García-Jiménez, Beatriz; Pons, Tirso; Sanchis, Araceli; Valencia, Alfonso

    2014-01-01

    Biological pathways are important elements of systems biology and in the past decade, an increasing number of pathway databases have been set up to document the growing understanding of complex cellular processes. Although more genome-sequence data are becoming available, a large fraction of it remains functionally uncharacterized. Thus, it is important to be able to predict the mapping of poorly annotated proteins to original pathway models. We have developed a Relational Learning-based Extension (RLE) system to investigate pathway membership through a function prediction approach that mainly relies on combinations of simple properties attributed to each protein. RLE searches for proteins with molecular similarities to specific pathway components. Using RLE, we associated 383 uncharacterized proteins to 28 pre-defined human Reactome pathways, demonstrating relative confidence after proper evaluation. Indeed, in specific cases manual inspection of the database annotations and the related literature supported the proposed classifications. Examples of possible additional components of the Electron transport system, Telomere maintenance and Integrin cell surface interactions pathways are discussed in detail. All the human predicted proteins in the 2009 and 2012 releases 30 and 40 of Reactome are available at http://rle.bioinfo.cnio.es.

  12. LS-SNP: large-scale annotation of coding non-synonymous SNPs based on multiple information sources.

    PubMed

    Karchin, Rachel; Diekhans, Mark; Kelly, Libusha; Thomas, Daryl J; Pieper, Ursula; Eswar, Narayanan; Haussler, David; Sali, Andrej

    2005-06-15

    The NCBI dbSNP database lists over 9 million single nucleotide polymorphisms (SNPs) in the human genome, but currently contains limited annotation information. SNPs that result in amino acid residue changes (nsSNPs) are of critical importance in variation between individuals, including disease and drug sensitivity. We have developed LS-SNP, a genomic scale software pipeline to annotate nsSNPs. LS-SNP comprehensively maps nsSNPs onto protein sequences, functional pathways and comparative protein structure models, and predicts positions where nsSNPs destabilize proteins, interfere with the formation of domain-domain interfaces, have an effect on protein-ligand binding or severely impact human health. It currently annotates 28,043 validated SNPs that produce amino acid residue substitutions in human proteins from the SwissProt/TrEMBL database. Annotations can be viewed via a web interface either in the context of a genomic region or by selecting sets of SNPs, genes, proteins or pathways. These results are useful for identifying candidate functional SNPs within a gene, haplotype or pathway and in probing molecular mechanisms responsible for functional impacts of nsSNPs. http://www.salilab.org/LS-SNP CONTACT: rachelk@salilab.org http://salilab.org/LS-SNP/supp-info.pdf.

  13. A human protein atlas for normal and cancer tissues based on antibody proteomics.

    PubMed

    Uhlén, Mathias; Björling, Erik; Agaton, Charlotta; Szigyarto, Cristina Al-Khalili; Amini, Bahram; Andersen, Elisabet; Andersson, Ann-Catrin; Angelidou, Pia; Asplund, Anna; Asplund, Caroline; Berglund, Lisa; Bergström, Kristina; Brumer, Harry; Cerjan, Dijana; Ekström, Marica; Elobeid, Adila; Eriksson, Cecilia; Fagerberg, Linn; Falk, Ronny; Fall, Jenny; Forsberg, Mattias; Björklund, Marcus Gry; Gumbel, Kristoffer; Halimi, Asif; Hallin, Inga; Hamsten, Carl; Hansson, Marianne; Hedhammar, My; Hercules, Görel; Kampf, Caroline; Larsson, Karin; Lindskog, Mats; Lodewyckx, Wald; Lund, Jan; Lundeberg, Joakim; Magnusson, Kristina; Malm, Erik; Nilsson, Peter; Odling, Jenny; Oksvold, Per; Olsson, Ingmarie; Oster, Emma; Ottosson, Jenny; Paavilainen, Linda; Persson, Anja; Rimini, Rebecca; Rockberg, Johan; Runeson, Marcus; Sivertsson, Asa; Sköllermo, Anna; Steen, Johanna; Stenvall, Maria; Sterky, Fredrik; Strömberg, Sara; Sundberg, Mårten; Tegel, Hanna; Tourle, Samuel; Wahlund, Eva; Waldén, Annelie; Wan, Jinghong; Wernérus, Henrik; Westberg, Joakim; Wester, Kenneth; Wrethagen, Ulla; Xu, Lan Lan; Hober, Sophia; Pontén, Fredrik

    2005-12-01

    Antibody-based proteomics provides a powerful approach for the functional study of the human proteome involving the systematic generation of protein-specific affinity reagents. We used this strategy to construct a comprehensive, antibody-based protein atlas for expression and localization profiles in 48 normal human tissues and 20 different cancers. Here we report a new publicly available database containing, in the first version, approximately 400,000 high resolution images corresponding to more than 700 antibodies toward human proteins. Each image has been annotated by a certified pathologist to provide a knowledge base for functional studies and to allow queries about protein profiles in normal and disease tissues. Our results suggest it should be possible to extend this analysis to the majority of all human proteins thus providing a valuable tool for medical and biological research.

  14. The BioGRID Interaction Database: 2011 update

    PubMed Central

    Stark, Chris; Breitkreutz, Bobby-Joe; Chatr-aryamontri, Andrew; Boucher, Lorrie; Oughtred, Rose; Livstone, Michael S.; Nixon, Julie; Van Auken, Kimberly; Wang, Xiaodong; Shi, Xiaoqi; Reguly, Teresa; Rust, Jennifer M.; Winter, Andrew; Dolinski, Kara; Tyers, Mike

    2011-01-01

    The Biological General Repository for Interaction Datasets (BioGRID) is a public database that archives and disseminates genetic and protein interaction data from model organisms and humans (http://www.thebiogrid.org). BioGRID currently holds 347 966 interactions (170 162 genetic, 177 804 protein) curated from both high-throughput data sets and individual focused studies, as derived from over 23 000 publications in the primary literature. Complete coverage of the entire literature is maintained for budding yeast (Saccharomyces cerevisiae), fission yeast (Schizosaccharomyces pombe) and thale cress (Arabidopsis thaliana), and efforts to expand curation across multiple metazoan species are underway. The BioGRID houses 48 831 human protein interactions that have been curated from 10 247 publications. Current curation drives are focused on particular areas of biology to enable insights into conserved networks and pathways that are relevant to human health. The BioGRID 3.0 web interface contains new search and display features that enable rapid queries across multiple data types and sources. An automated Interaction Management System (IMS) is used to prioritize, coordinate and track curation across international sites and projects. BioGRID provides interaction data to several model organism databases, resources such as Entrez-Gene and other interaction meta-databases. The entire BioGRID 3.0 data collection may be downloaded in multiple file formats, including PSI MI XML. Source code for BioGRID 3.0 is freely available without any restrictions. PMID:21071413

  15. Systematic Mapping and Functional Analysis of a Family of Human Epididymal Secretory Sperm-Located Proteins*

    PubMed Central

    Li, JianYuan; Liu, FuJun; Wang, HaiYan; Liu, Xin; Liu, Juan; Li, Ning; Wan, FengChun; Wang, WenTing; Zhang, ChengLin; Jin, ShaoHua; Liu, Jie; Zhu, Peng; Liu, YunXiang

    2010-01-01

    The mammalian spermatozoon has many cellular compartments, such as head and tail, permitting it to interact with the female reproductive tract and fertilize the egg. It acquires this fertilizing potential during transit through the epididymis, which secretes proteins that coat different sperm domains. Optimal levels of these proteins provide the spermatozoon with its ability to move to, bind to, fuse with, and penetrate the egg; otherwise male infertility results. As few human epididymal proteins have been characterized, this work was performed to generate a database of human epididymal sperm-located proteins involved in maturation. Two-dimensional gel electrophoresis of epididymal tissue and luminal fluid proteins, followed by identification using MALDI-TOF/MS or MALDI-TOF/TOF, revealed over a thousand spots in gels comprising 745 abundant nonstructural proteins, 408 in luminal fluids, of which 207 were present on spermatozoa. Antibodies raised to 619 recombinant or synthetic peptides, used in Western blots, histological sections, and washed sperm preparations to confirm antibody quality and protein expression, indicated their regional location in the epididymal epithelium and highly specific locations on washed functional spermatozoa. Sperm function tests suggested the role of some proteins in motility and protection against oxidative attack. A large database of these proteins, characterized by size, pI, chromosomal location, and function, was given a unified terminology reflecting their sperm domain location. These novel, secreted human epididymal proteins are potential targets for a posttesticular contraceptive acting to provide rapid, reversible, functional sterility in men and they are also biomarkers that could be used in noninvasive assessments of male fertility. PMID:20736409

  16. Analysis of differential protein expression in normal and neoplastic human breast epithelial cell lines

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Williams, K.; Chubb, C.; Huberman, E.

    High resolution two dimensional get electrophoresis (2DE) and database analysis was used to establish protein expression patterns for cultured normal human mammary epithelial cells and thirteen breast cancer cell lines. The Human Breast Epithelial Cell database contains the 2DE protein patterns, including relative protein abundances, for each cell line, plus a composite pattern that contains all the common and specifically expressed proteins from all the cell lines. Significant differences in protein expression, both qualitative and quantitative, were observed not only between normal cells and tumor cells, but also among the tumor cell lines. Eight percent of the consistently detected proteinsmore » were found in significantly (P < 0.001) variable levels among the cell lines. Using a combination of immunostaining, comigration with purified protein, subcellular fractionation, and amino-terminal protein sequencing, we identified a subset of the differentially expressed proteins. These identified proteins include the cytoskeletal proteins actin, tubulin, vimentin, and cytokeratins. The cell lines can be classified into four distinct groups based on their intermediate filament protein profile. We also identified heat shock proteins; hsp27, hsp60, and hsp70 varied in abundance and in some cases in the relative phosphorylation levels among the cell lines. Finally, we identified IMP dehydrogenase in each of the cell lines, and found the levels of this enzyme in the tumor cell lines elevated 2- to 20-fold relative to the levels in normal cells.« less

  17. BLAST and FASTA similarity searching for multiple sequence alignment.

    PubMed

    Pearson, William R

    2014-01-01

    BLAST, FASTA, and other similarity searching programs seek to identify homologous proteins and DNA sequences based on excess sequence similarity. If two sequences share much more similarity than expected by chance, the simplest explanation for the excess similarity is common ancestry-homology. The most effective similarity searches compare protein sequences, rather than DNA sequences, for sequences that encode proteins, and use expectation values, rather than percent identity, to infer homology. The BLAST and FASTA packages of sequence comparison programs provide programs for comparing protein and DNA sequences to protein databases (the most sensitive searches). Protein and translated-DNA comparisons to protein databases routinely allow evolutionary look back times from 1 to 2 billion years; DNA:DNA searches are 5-10-fold less sensitive. BLAST and FASTA can be run on popular web sites, but can also be downloaded and installed on local computers. With local installation, target databases can be customized for the sequence data being characterized. With today's very large protein databases, search sensitivity can also be improved by searching smaller comprehensive databases, for example, a complete protein set from an evolutionarily neighboring model organism. By default, BLAST and FASTA use scoring strategies target for distant evolutionary relationships; for comparisons involving short domains or queries, or searches that seek relatively close homologs (e.g. mouse-human), shallower scoring matrices will be more effective. Both BLAST and FASTA provide very accurate statistical estimates, which can be used to reliably identify protein sequences that diverged more than 2 billion years ago.

  18. Unbiased Protein Association Study on the Public Human Proteome Reveals Biological Connections between Co-Occurring Protein Pairs

    PubMed Central

    2017-01-01

    Mass-spectrometry-based, high-throughput proteomics experiments produce large amounts of data. While typically acquired to answer specific biological questions, these data can also be reused in orthogonal ways to reveal new biological knowledge. We here present a novel method for such orthogonal data reuse of public proteomics data. Our method elucidates biological relationships between proteins based on the co-occurrence of these proteins across human experiments in the PRIDE database. The majority of the significantly co-occurring protein pairs that were detected by our method have been successfully mapped to existing biological knowledge. The validity of our novel method is substantiated by the extremely few pairs that can be mapped to existing knowledge based on random associations between the same set of proteins. Moreover, using literature searches and the STRING database, we were able to derive meaningful biological associations for unannotated protein pairs that were detected using our method, further illustrating that as-yet unknown associations present highly interesting targets for follow-up analysis. PMID:28480704

  19. Using the NCBI Genome Databases to Compare the Genes for Human & Chimpanzee Beta Hemoglobin

    ERIC Educational Resources Information Center

    Offner, Susan

    2010-01-01

    The beta hemoglobin protein is identical in humans and chimpanzees. In this tutorial, students see that even though the proteins are identical, the genes that code for them are not. There are many more differences in the introns than in the exons, which indicates that coding regions of DNA are more highly conserved than non-coding regions.

  20. Peeping into Human Renal Calcium Oxalate Stone Matrix: Characterization of Novel Proteins Involved in the Intricate Mechanism of Urolithiasis

    PubMed Central

    Tandon, Chanderdeep

    2013-01-01

    Background The increasing number of patients suffering from urolithiasis represents one of the major challenges which nephrologists face worldwide today. For enhancing therapeutic outcomes of this disease, the pathogenic basis for the formation of renal stones is the need of hour. Proteins are found as major component in human renal stone matrix and are considered to have a potential role in crystal–membrane interaction, crystal growth and stone formation but their role in urolithiasis still remains obscure. Methods Proteins were isolated from the matrix of human CaOx containing kidney stones. Proteins having MW>3 kDa were subjected to anion exchange chromatography followed by molecular-sieve chromatography. The effect of these purified proteins was tested against CaOx nucleation and growth and on oxalate injured Madin–Darby Canine Kidney (MDCK) renal epithelial cells for their activity. Proteins were identified by Matrix-assisted laser desorption/ionization-time of flight (MALDI-TOF MS) followed by database search with MASCOT server. In silico molecular interaction studies with CaOx crystals were also investigated. Results Five proteins were identified from the matrix of calcium oxalate kidney stones by MALDI-TOF MS followed by database search with MASCOT server with the competence to control the stone formation process. Out of which two proteins were promoters, two were inhibitors and one protein had a dual activity of both inhibition and promotion towards CaOx nucleation and growth. Further molecular modelling calculations revealed the mode of interaction of these proteins with CaOx at the molecular level. Conclusions We identified and characterized Ethanolamine-phosphate cytidylyltransferase, Ras GTPase-activating-like protein, UDP-glucose:glycoprotein glucosyltransferase 2, RIMS-binding protein 3A, Macrophage-capping protein as novel proteins from the matrix of human calcium oxalate stone which play a critical role in kidney stone formation. Thus, these proteins having potential to modulate calcium oxalate crystallization will throw light on understanding and controlling urolithiasis in humans. PMID:23894559

  1. Proteomic profiling of human pleural effusion using two-dimensional nano liquid chromatography tandem mass spectrometry.

    PubMed

    Tyan, Yu-Chang; Wu, Hsin-Yi; Lai, Wu-Wei; Su, Wu-Chou; Liao, Pao-Chi

    2005-01-01

    Pleural effusion, an accumulation of pleural fluid, contains proteins originated from plasma filtrate and, especially when tissues are damaged, parenchyma interstitial spaces of lungs and/or other organs. This study details protein profiles in human pleural effusion from 43 lung adenocarcinoma patients by a two-dimensional nano-high performance liquid chromatography electrospray ionization tandem mass spectrometry (2D nano-HPLC-ESI-MS/MS) system. The experimental results revealed the identification of 1415 unique proteins from human pleural effusion. Among these 124 proteins identified with higher confidence levels, some proteins have not been reported in plasma and may represent proteins specifically present in pleural effusion. These proteins are valuable for mass identification of differentially expressed proteins involved in proteomics database and screening biomarker to further study in human lung adenocarcinoma. The significance of the use of proteomics analysis of human pleural fluid for the search of new lung cancer marker proteins, and for their simultaneous display and analysis in patients suffering from lung disorders has been examined.

  2. The Human Ageing Genomic Resources: online databases and tools for biogerontologists

    PubMed Central

    de Magalhães, João Pedro; Budovsky, Arie; Lehmann, Gilad; Costa, Joana; Li, Yang; Fraifeld, Vadim; Church, George M.

    2009-01-01

    Summary Ageing is a complex, challenging phenomenon that will require multiple, interdisciplinary approaches to unravel its puzzles. To assist basic research on ageing, we developed the Human Ageing Genomic Resources (HAGR). This work provides an overview of the databases and tools in HAGR and describes how the gerontology research community can employ them. Several recent changes and improvements to HAGR are also presented. The two centrepieces in HAGR are GenAge and AnAge. GenAge is a gene database featuring genes associated with ageing and longevity in model organisms, a curated database of genes potentially associated with human ageing, and a list of genes tested for their association with human longevity. A myriad of biological data and information is included for hundreds of genes, making GenAge a reference for research that reflects our current understanding of the genetic basis of ageing. GenAge can also serve as a platform for the systems biology of ageing, and tools for the visualization of protein-protein interactions are also included. AnAge is a database of ageing in animals, featuring over 4,000 species, primarily assembled as a resource for comparative and evolutionary studies of ageing. Longevity records, developmental and reproductive traits, taxonomic information, basic metabolic characteristics, and key observations related to ageing are included in AnAge. Software is also available to aid researchers in the form of Perl modules to automate numerous tasks and as an SPSS script to analyse demographic mortality data. The Human Ageing Genomic Resources are available online at http://genomics.senescence.info. PMID:18986374

  3. A Bioinformatics Workflow for Variant Peptide Detection in Shotgun Proteomics*

    PubMed Central

    Li, Jing; Su, Zengliu; Ma, Ze-Qiang; Slebos, Robbert J. C.; Halvey, Patrick; Tabb, David L.; Liebler, Daniel C.; Pao, William; Zhang, Bing

    2011-01-01

    Shotgun proteomics data analysis usually relies on database search. However, commonly used protein sequence databases do not contain information on protein variants and thus prevent variant peptides and proteins from been identified. Including known coding variations into protein sequence databases could help alleviate this problem. Based on our recently published human Cancer Proteome Variation Database, we have created a protein sequence database that comprehensively annotates thousands of cancer-related coding variants collected in the Cancer Proteome Variation Database as well as noncancer-specific ones from the Single Nucleotide Polymorphism Database (dbSNP). Using this database, we then developed a data analysis workflow for variant peptide identification in shotgun proteomics. The high risk of false positive variant identifications was addressed by a modified false discovery rate estimation method. Analysis of colorectal cancer cell lines SW480, RKO, and HCT-116 revealed a total of 81 peptides that contain either noncancer-specific or cancer-related variations. Twenty-three out of 26 variants randomly selected from the 81 were confirmed by genomic sequencing. We further applied the workflow on data sets from three individual colorectal tumor specimens. A total of 204 distinct variant peptides were detected, and five carried known cancer-related mutations. Each individual showed a specific pattern of cancer-related mutations, suggesting potential use of this type of information for personalized medicine. Compatibility of the workflow has been tested with four popular database search engines including Sequest, Mascot, X!Tandem, and MyriMatch. In summary, we have developed a workflow that effectively uses existing genomic data to enable variant peptide detection in proteomics. PMID:21389108

  4. ChiTaRS-3.1-the enhanced chimeric transcripts and RNA-seq database matched with protein-protein interactions.

    PubMed

    Gorohovski, Alessandro; Tagore, Somnath; Palande, Vikrant; Malka, Assaf; Raviv-Shay, Dorith; Frenkel-Morgenstern, Milana

    2017-01-04

    Discovery of chimeric RNAs, which are produced by chromosomal translocations as well as the joining of exons from different genes by trans-splicing, has added a new level of complexity to our study and understanding of the transcriptome. The enhanced ChiTaRS-3.1 database (http://chitars.md.biu.ac.il) is designed to make widely accessible a wealth of mined data on chimeric RNAs, with easy-to-use analytical tools built-in. The database comprises 34 922: chimeric transcripts along with 11 714: cancer breakpoints. In this latest version, we have included multiple cross-references to GeneCards, iHop, PubMed, NCBI, Ensembl, OMIM, RefSeq and the Mitelman collection for every entry in the 'Full Collection'. In addition, for every chimera, we have added a predicted Chimeric Protein-Protein Interaction (ChiPPI) network, which allows for easy visualization of protein partners of both parental and fusion proteins for all human chimeras. The database contains a comprehensive annotation for 34 922: chimeric transcripts from eight organisms, and includes the manual annotation of 200 sense-antiSense (SaS) chimeras. The current improvements in the content and functionality to the ChiTaRS database make it a central resource for the study of chimeric transcripts and fusion proteins. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  5. P185-M Protein Identification and Validation of Results in Workflows that Integrate over Various Instruments, Datasets, Search Engines

    PubMed Central

    Hufnagel, P.; Glandorf, J.; Körting, G.; Jabs, W.; Schweiger-Hufnagel, U.; Hahner, S.; Lubeck, M.; Suckau, D.

    2007-01-01

    Analysis of complex proteomes often results in long protein lists, but falls short in measuring the validity of identification and quantification results on a greater number of proteins. Biological and technical replicates are mandatory, as is the combination of the MS data from various workflows (gels, 1D-LC, 2D-LC), instruments (TOF/TOF, trap, qTOF or FTMS), and search engines. We describe a database-driven study that combines two workflows, two mass spectrometers, and four search engines with protein identification following a decoy database strategy. The sample was a tryptically digested lysate (10,000 cells) of a human colorectal cancer cell line. Data from two LC-MALDI-TOF/TOF runs and a 2D-LC-ESI-trap run using capillary and nano-LC columns were submitted to the proteomics software platform ProteinScape. The combined MALDI data and the ESI data were searched using Mascot (Matrix Science), Phenyx (GeneBio), ProteinSolver (Bruker and Protagen), and Sequest (Thermo) against a decoy database generated from IPI-human in order to obtain one protein list across all workflows and search engines at a defined maximum false-positive rate of 5%. ProteinScape combined the data to one LC-MALDI and one LC-ESI dataset. The initial separate searches from the two combined datasets generated eight independent peptide lists. These were compiled into an integrated protein list using the ProteinExtractor algorithm. An initial evaluation of the generated data led to the identification of approximately 1200 proteins. Result integration on a peptide level allowed discrimination of protein isoforms that would not have been possible with a mere combination of protein lists.

  6. Database resources of the National Center for Biotechnology Information.

    PubMed

    2016-01-04

    The National Center for Biotechnology Information (NCBI) provides a large suite of online resources for biological information and data, including the GenBank(®) nucleic acid sequence database and the PubMed database of citations and abstracts for published life science journals. Additional NCBI resources focus on literature (PubMed Central (PMC), Bookshelf and PubReader), health (ClinVar, dbGaP, dbMHC, the Genetic Testing Registry, HIV-1/Human Protein Interaction Database and MedGen), genomes (BioProject, Assembly, Genome, BioSample, dbSNP, dbVar, Epigenomics, the Map Viewer, Nucleotide, Probe, RefSeq, Sequence Read Archive, the Taxonomy Browser and the Trace Archive), genes (Gene, Gene Expression Omnibus (GEO), HomoloGene, PopSet and UniGene), proteins (Protein, the Conserved Domain Database (CDD), COBALT, Conserved Domain Architecture Retrieval Tool (CDART), the Molecular Modeling Database (MMDB) and Protein Clusters) and chemicals (Biosystems and the PubChem suite of small molecule databases). The Entrez system provides search and retrieval operations for most of these databases. Augmenting many of the web applications are custom implementations of the BLAST program optimized to search specialized datasets. All of these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov. Published by Oxford University Press on behalf of Nucleic Acids Research 2015. This work is written by (a) US Government employee(s) and is in the public domain in the US.

  7. Database resources of the National Center for Biotechnology Information.

    PubMed

    2015-01-01

    The National Center for Biotechnology Information (NCBI) provides a large suite of online resources for biological information and data, including the GenBank(®) nucleic acid sequence database and the PubMed database of citations and abstracts for published life science journals. Additional NCBI resources focus on literature (Bookshelf, PubMed Central (PMC) and PubReader); medical genetics (ClinVar, dbMHC, the Genetic Testing Registry, HIV-1/Human Protein Interaction Database and MedGen); genes and genomics (BioProject, BioSample, dbSNP, dbVar, Epigenomics, Gene, Gene Expression Omnibus (GEO), Genome, HomoloGene, the Map Viewer, Nucleotide, PopSet, Probe, RefSeq, Sequence Read Archive, the Taxonomy Browser, Trace Archive and UniGene); and proteins and chemicals (Biosystems, COBALT, the Conserved Domain Database (CDD), the Conserved Domain Architecture Retrieval Tool (CDART), the Molecular Modeling Database (MMDB), Protein Clusters, Protein and the PubChem suite of small molecule databases). The Entrez system provides search and retrieval operations for many of these databases. Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of these resources can be accessed through the NCBI home page at http://www.ncbi.nlm.nih.gov. Published by Oxford University Press on behalf of Nucleic Acids Research 2014. This work is written by (a) US Government employee(s) and is in the public domain in the US.

  8. ChemProt-2.0: visual navigation in a disease chemical biology database

    PubMed Central

    Kim Kjærulff, Sonny; Wich, Louis; Kringelum, Jens; Jacobsen, Ulrik P.; Kouskoumvekaki, Irene; Audouze, Karine; Lund, Ole; Brunak, Søren; Oprea, Tudor I.; Taboureau, Olivier

    2013-01-01

    ChemProt-2.0 (http://www.cbs.dtu.dk/services/ChemProt-2.0) is a public available compilation of multiple chemical–protein annotation resources integrated with diseases and clinical outcomes information. The database has been updated to >1.15 million compounds with 5.32 millions bioactivity measurements for 15 290 proteins. Each protein is linked to quality-scored human protein–protein interactions data based on more than half a million interactions, for studying diseases and biological outcomes (diseases, pathways and GO terms) through protein complexes. In ChemProt-2.0, therapeutic effects as well as adverse drug reactions have been integrated allowing for suggesting proteins associated to clinical outcomes. New chemical structure fingerprints were computed based on the similarity ensemble approach. Protein sequence similarity search was also integrated to evaluate the promiscuity of proteins, which can help in the prediction of off-target effects. Finally, the database was integrated into a visual interface that enables navigation of the pharmacological space for small molecules. Filtering options were included in order to facilitate and to guide dynamic search of specific queries. PMID:23185041

  9. A compatible exon-exon junction database for the identification of exon skipping events using tandem mass spectrum data.

    PubMed

    Mo, Fan; Hong, Xu; Gao, Feng; Du, Lin; Wang, Jun; Omenn, Gilbert S; Lin, Biaoyang

    2008-12-16

    Alternative splicing is an important gene regulation mechanism. It is estimated that about 74% of multi-exon human genes have alternative splicing. High throughput tandem (MS/MS) mass spectrometry provides valuable information for rapidly identifying potentially novel alternatively-spliced protein products from experimental datasets. However, the ability to identify alternative splicing events through tandem mass spectrometry depends on the database against which the spectra are searched. We wrote scripts in perl, Bioperl, mysql and Ensembl API and built a theoretical exon-exon junction protein database to account for all possible combinations of exons for a gene while keeping the frame of translation (i.e., keeping only in-phase exon-exon combinations) from the Ensembl Core Database. Using our liver cancer MS/MS dataset, we identified a total of 488 non-redundant peptides that represent putative exon skipping events. Our exon-exon junction database provides the scientific community with an efficient means to identify novel alternatively spliced (exon skipping) protein isoforms using mass spectrometry data. This database will be useful in annotating genome structures using rapidly accumulating proteomics data.

  10. TrypsNetDB: An integrated framework for the functional characterization of trypanosomatid proteins

    PubMed Central

    Gazestani, Vahid H.; Yip, Chun Wai; Nikpour, Najmeh; Berghuis, Natasha

    2017-01-01

    Trypanosomatid parasites cause serious infections in humans and production losses in livestock. Due to the high divergence from other eukaryotes, such as humans and model organisms, the functional roles of many trypanosomatid proteins cannot be predicted by homology-based methods, rendering a significant portion of their proteins as uncharacterized. Recent technological advances have led to the availability of multiple systematic and genome-wide datasets on trypanosomatid parasites that are informative regarding the biological role(s) of their proteins. Here, we report TrypsNetDB (http://trypsNetDB.org), a web-based resource for the functional annotation of 16 different species/strains of trypanosomatid parasites. The database not only visualizes the network context of the queried protein(s) in an intuitive way but also examines the response of the represented network in more than 50 different biological contexts and its enrichment for various biological terms and pathways, protein sequence signatures, and potential RNA regulatory elements. The interactome core of the database, as of Jan 23, 2017, contains 101,187 interactions among 13,395 trypanosomatid proteins inferred from 97 genome-wide and focused studies on the interactome of these organisms. PMID:28158179

  11. Sequence Complexity of Amyloidogenic Regions in Intrinsically Disordered Human Proteins

    PubMed Central

    Das, Swagata; Pal, Uttam; Das, Supriya; Bagga, Khyati; Roy, Anupam; Mrigwani, Arpita; Maiti, Nakul C.

    2014-01-01

    An amyloidogenic region (AR) in a protein sequence plays a significant role in protein aggregation and amyloid formation. We have investigated the sequence complexity of AR that is present in intrinsically disordered human proteins. More than 80% human proteins in the disordered protein databases (DisProt+IDEAL) contained one or more ARs. With decrease of protein disorder, AR content in the protein sequence was decreased. A probability density distribution analysis and discrete analysis of AR sequences showed that ∼8% residue in a protein sequence was in AR and the region was in average 8 residues long. The residues in the AR were high in sequence complexity and it seldom overlapped with low complexity regions (LCR), which was largely abundant in disorder proteins. The sequences in the AR showed mixed conformational adaptability towards α-helix, β-sheet/strand and coil conformations. PMID:24594841

  12. A Viral-Human Interactome Based on Structural Motif-Domain Interactions Captures the Human Infectome

    PubMed Central

    Guo, Xianwu; Rodríguez-Pérez, Mario A.

    2013-01-01

    Protein interactions between a pathogen and its host are fundamental in the establishment of the pathogen and underline the infection mechanism. In the present work, we developed a single predictive model for building a host-viral interactome based on the identification of structural descriptors from motif-domain interactions of protein complexes deposited in the Protein Data Bank (PDB). The structural descriptors were used for searching, in a database of protein sequences of human and five clinically important viruses; therefore, viral and human proteins sharing a descriptor were predicted as interacting proteins. The analysis of the host-viral interactome allowed to identify a set of new interactions that further explain molecular mechanism associated with viral infections and showed that it was able to capture human proteins already associated to viral infections (human infectome) and non-infectious diseases (human diseasome). The analysis of human proteins targeted by viral proteins in the context of a human interactome showed that their neighbors are enriched in proteins reported with differential expression under infection and disease conditions. It is expected that the findings of this work will contribute to the development of systems biology for infectious diseases, and help guide the rational identification and prioritization of novel drug targets. PMID:23951184

  13. RPG: the Ribosomal Protein Gene database.

    PubMed

    Nakao, Akihiro; Yoshihama, Maki; Kenmochi, Naoya

    2004-01-01

    RPG (http://ribosome.miyazaki-med.ac.jp/) is a new database that provides detailed information about ribosomal protein (RP) genes. It contains data from humans and other organisms, including Drosophila melanogaster, Caenorhabditis elegans, Saccharo myces cerevisiae, Methanococcus jannaschii and Escherichia coli. Users can search the database by gene name and organism. Each record includes sequences (genomic, cDNA and amino acid sequences), intron/exon structures, genomic locations and information about orthologs. In addition, users can view and compare the gene structures of the above organisms and make multiple amino acid sequence alignments. RPG also provides information on small nucleolar RNAs (snoRNAs) that are encoded in the introns of RP genes.

  14. RPG: the Ribosomal Protein Gene database

    PubMed Central

    Nakao, Akihiro; Yoshihama, Maki; Kenmochi, Naoya

    2004-01-01

    RPG (http://ribosome.miyazaki-med.ac.jp/) is a new database that provides detailed information about ribosomal protein (RP) genes. It contains data from humans and other organisms, including Drosophila melanogaster, Caenorhabditis elegans, Saccharo myces cerevisiae, Methanococcus jannaschii and Escherichia coli. Users can search the database by gene name and organism. Each record includes sequences (genomic, cDNA and amino acid sequences), intron/exon structures, genomic locations and information about orthologs. In addition, users can view and compare the gene structures of the above organisms and make multiple amino acid sequence alignments. RPG also provides information on small nucleolar RNAs (snoRNAs) that are encoded in the introns of RP genes. PMID:14681386

  15. e23D: database and visualization of A-to-I RNA editing sites mapped to 3D protein structures.

    PubMed

    Solomon, Oz; Eyal, Eran; Amariglio, Ninette; Unger, Ron; Rechavi, Gidi

    2016-07-15

    e23D, a database of A-to-I RNA editing sites from human, mouse and fly mapped to evolutionary related protein 3D structures, is presented. Genomic coordinates of A-to-I RNA editing sites are converted to protein coordinates and mapped onto 3D structures from PDB or theoretical models from ModBase. e23D allows visualization of the protein structure, modeling of recoding events and orientation of the editing with respect to nearby genomic functional sites from databases of disease causing mutations and genomic polymorphism. http://www.sheba-cancer.org.il/e23D CONTACT: oz.solomon@live.biu.ac.il or Eran.Eyal@sheba.health.gov.il. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  16. Back to the future: the human protein index (HPI) and the agenda for post-proteomic biology.

    PubMed

    Anderson, N G; Matheson, A; Anderson, N L

    2001-01-01

    The effort to produce an index of all human proteins (the human protein index, or HPI) began twenty years ago, before the initiation of the human genome program. Because DNA sequencing technology is inherently simpler and more scalable than protein analytical technology, and because the finiteness of genomes invited a spirit of rapid conquest, the notion of genome sequencing has displaced that of protein databases in the minds of most molecular biologists for the last decade. However, now that the human genome sequence is nearing completion, a major realignment is under way that brings proteins back to the center of biological thinking. Using an influx of new and improved protein technologies--from mass spectrometry to re-engineered two-dimensional (2-D) gel systems, the original objectives of the HPI have been expanded and the time frame for its execution radically shortened. Several additional large scale technology efforts flowing from the HPI are also described.

  17. [Multiplexing mapping of human cDNAs]. Final report, September 1, 1991--February 28, 1994

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Not Available

    Using PCR with automated product analysis, 329 human brain cDNA sequences have been assigned to individual human chromosomes. Primers were designed from single-pass cDNA sequences expressed sequence tags (ESTs). Primers were used in PCR reactions with DNA from somatic cell hybrid mapping panels as templates, often with multiplexing. Many ESTs mapped match sequence database records. To evaluate of these matches, the position of the primers relative to the matching region (In), the BLAST scores and the Poisson probability values of the EST/sequence record match were determined. In cases where the gene product was stringently identified by the sequence match hadmore » already been mapped, the gene locus determined by EST was consistent with the previous position which strongly supports the validity of assigning unknown genes to human chromosomes based on the EST sequence matches. In the present cases mapping the ESTs to a chromosome can also be considered to have mapped the known gene product: rolipram-sensitive cAMP phosphodiesterase, chromosome 1; protein phosphatase 2A{beta}, chromosome 4; alpha-catenin, chromosome 5; the ELE1 oncogene, chromosome 10q11.2 or q2.1-q23; MXII protein, chromosome l0q24-qter; ribosomal protein L18a homologue, chromosome 14; ribosomal protein L3, chromosome 17; and moesin, Xp11-cen. There were also ESTs mapped that were closely related to non-human sequence records. These matches therefore can be considered to identify human counterparts of known gene products, or members of known gene families. Examples of these include membrane proteins, translation-associated proteins, structural proteins, and enzymes. These data then demonstrate that single pass sequence information is sufficient to design PCR primers useful for assigning cDNA sequences to human chromosomes. When the EST sequence matches previous sequence database records, the chromosome assignments of the EST can be used to make preliminary assignments of the human gene to a chromosome.« less

  18. Two-dimensional electrophoretic profiling of normal human kidney glomerulus proteome and construction of an extensible markup language (XML)-based database.

    PubMed

    Yoshida, Yutaka; Miyazaki, Kenji; Kamiie, Junichi; Sato, Masao; Okuizumi, Seiji; Kenmochi, Akihisa; Kamijo, Ken'ichi; Nabetani, Takuji; Tsugita, Akira; Xu, Bo; Zhang, Ying; Yaoita, Eishin; Osawa, Tetsuo; Yamamoto, Tadashi

    2005-03-01

    To contribute to physiology and pathophysiology of the glomerulus of human kidney, we have launched a proteomic study of human glomerulus, and compiled a profile of proteins expressed in the glomerulus of normal human kidney by two-dimensional gel electrophoresis (2-DE) and identification with matrix-assisted laser desorption/ionization-time of flight mass spectrometry (MALDI-TOF MS) and/or liquid chromatography-tandem mass spectrometry (LC-MS/MS). Kidney cortices with normal appearance were obtained from patients under surgical nephrectomy due to renal tumor, and glomeruli were highly purified by a standard sieving method followed by picking-up under a phase-contrast microscope. The glomerular proteins were separated by 2-DE with 24 cm immobilized pH gradient strips in the 3-10 range in the first dimension and 26 x 20 cm sodium dodecyl sulfate polyacrylamide electrophoresis gels of 12.5% in the second dimension. Gels were silver-stained, and valid spots were processed for identification through an integrated robotic system that consisted of a spot picker, an in-gel digester, and a MALDI-TOF MS and / or a LC-MS/MS. From 2-DE gel images of glomeruli of four subjects with no apparent pathologic manifestations, a synthetic gel image of normal glomerular proteins was created. The synthetic gel image contained 1713 valid spots, of which 1559 spots were commonly observed in the respective 2-DE gels. Among the 1559 spots, 347 protein spots, representing 212 proteins, have so far been identified, and used for the construction of an extensible markup language (XML)-based database. The database is deposited on a web site (http://www.sw.nec.co.jp/bio/rd/hgldb/index.html) in a form accessible to researchers to contribute to proteomic studies of human glomerulus in health and disease.

  19. GALT protein database, a bioinformatics resource for the management and analysis of structural features of a galactosemia-related protein and its mutants.

    PubMed

    d'Acierno, Antonio; Facchiano, Angelo; Marabotti, Anna

    2009-06-01

    We describe the GALT-Prot database and its related web-based application that have been developed to collect information about the structural and functional effects of mutations on the human enzyme galactose-1-phosphate uridyltransferase (GALT) involved in the genetic disease named galactosemia type I. Besides a list of missense mutations at gene and protein sequence levels, GALT-Prot reports the analysis results of mutant GALT structures. In addition to the structural information about the wild-type enzyme, the database also includes structures of over 100 single point mutants simulated by means of a computational procedure, and the analysis to each mutant was made with several bioinformatics programs in order to investigate the effect of the mutations. The web-based interface allows querying of the database, and several links are also provided in order to guarantee a high integration with other resources already present on the web. Moreover, the architecture of the database and the web application is flexible and can be easily adapted to store data related to other proteins with point mutations. GALT-Prot is freely available at http://bioinformatica.isa.cnr.it/GALT/.

  20. SASD: the Synthetic Alternative Splicing Database for identifying novel isoform from proteomics

    PubMed Central

    2013-01-01

    Background Alternative splicing is an important and widespread mechanism for generating protein diversity and regulating protein expression. High-throughput identification and analysis of alternative splicing in the protein level has more advantages than in the mRNA level. The combination of alternative splicing database and tandem mass spectrometry provides a powerful technique for identification, analysis and characterization of potential novel alternative splicing protein isoforms from proteomics. Therefore, based on the peptidomic database of human protein isoforms for proteomics experiments, our objective is to design a new alternative splicing database to 1) provide more coverage of genes, transcripts and alternative splicing, 2) exclusively focus on the alternative splicing, and 3) perform context-specific alternative splicing analysis. Results We used a three-step pipeline to create a synthetic alternative splicing database (SASD) to identify novel alternative splicing isoforms and interpret them at the context of pathway, disease, drug and organ specificity or custom gene set with maximum coverage and exclusive focus on alternative splicing. First, we extracted information on gene structures of all genes in the Ensembl Genes 71 database and incorporated the Integrated Pathway Analysis Database. Then, we compiled artificial splicing transcripts. Lastly, we translated the artificial transcripts into alternative splicing peptides. The SASD is a comprehensive database containing 56,630 genes (Ensembl gene IDs), 95,260 transcripts (Ensembl transcript IDs), and 11,919,779 Alternative Splicing peptides, and also covering about 1,956 pathways, 6,704 diseases, 5,615 drugs, and 52 organs. The database has a web-based user interface that allows users to search, display and download a single gene/transcript/protein, custom gene set, pathway, disease, drug, organ related alternative splicing. Moreover, the quality of the database was validated with comparison to other known databases and two case studies: 1) in liver cancer and 2) in breast cancer. Conclusions The SASD provides the scientific community with an efficient means to identify, analyze, and characterize novel Exon Skipping and Intron Retention protein isoforms from mass spectrometry and interpret them at the context of pathway, disease, drug and organ specificity or custom gene set with maximum coverage and exclusive focus on alternative splicing. PMID:24267658

  1. Recent advances in proteomics of cereals.

    PubMed

    Bansal, Monika; Sharma, Madhu; Kanwar, Priyanka; Goyal, Aakash

    Cereals contribute a major part of human nutrition and are considered as an integral source of energy for human diets. With genomic databases already available in cereals such as rice, wheat, barley, and maize, the focus has now moved to proteome analysis. Proteomics studies involve the development of appropriate databases based on developing suitable separation and purification protocols, identification of protein functions, and can confirm their functional networks based on already available data from other sources. Tremendous progress has been made in the past decade in generating huge data-sets for covering interactions among proteins, protein composition of various organs and organelles, quantitative and qualitative analysis of proteins, and to characterize their modulation during plant development, biotic, and abiotic stresses. Proteomics platforms have been used to identify and improve our understanding of various metabolic pathways. This article gives a brief review of efforts made by different research groups on comparative descriptive and functional analysis of proteomics applications achieved in the cereal science so far.

  2. Identification of Human N-Myristoylated Proteins from Human Complementary DNA Resources by Cell-Free and Cellular Metabolic Labeling Analyses.

    PubMed

    Takamitsu, Emi; Otsuka, Motoaki; Haebara, Tatsuki; Yano, Manami; Matsuzaki, Kanako; Kobuchi, Hirotsugu; Moriya, Koko; Utsumi, Toshihiko

    2015-01-01

    To identify physiologically important human N-myristoylated proteins, 90 cDNA clones predicted to encode human N-myristoylated proteins were selected from a human cDNA resource (4,369 Kazusa ORFeome project human cDNA clones) by two bioinformatic N-myristoylation prediction systems, NMT-The MYR Predictor and Myristoylator. After database searches to exclude known human N-myristoylated proteins, 37 cDNA clones were selected as potential human N-myristoylated proteins. The susceptibility of these cDNA clones to protein N-myristoylation was first evaluated using fusion proteins in which the N-terminal ten amino acid residues were fused to an epitope-tagged model protein. Then, protein N-myristoylation of the gene products of full-length cDNAs was evaluated by metabolic labeling experiments both in an insect cell-free protein synthesis system and in transfected human cells. As a result, the products of 13 cDNA clones (FBXL7, PPM1B, SAMM50, PLEKHN, AIFM3, C22orf42, STK32A, FAM131C, DRICH1, MCC1, HID1, P2RX5, STK32B) were found to be human N-myristoylated proteins. Analysis of the role of protein N-myristoylation on the intracellular localization of SAMM50, a mitochondrial outer membrane protein, revealed that protein N-myristoylation was required for proper targeting of SAMM50 to mitochondria. Thus, the strategy used in this study is useful for the identification of physiologically important human N-myristoylated proteins from human cDNA resources.

  3. Identification of Human N-Myristoylated Proteins from Human Complementary DNA Resources by Cell-Free and Cellular Metabolic Labeling Analyses

    PubMed Central

    Takamitsu, Emi; Otsuka, Motoaki; Haebara, Tatsuki; Yano, Manami; Matsuzaki, Kanako; Kobuchi, Hirotsugu; Moriya, Koko; Utsumi, Toshihiko

    2015-01-01

    To identify physiologically important human N-myristoylated proteins, 90 cDNA clones predicted to encode human N-myristoylated proteins were selected from a human cDNA resource (4,369 Kazusa ORFeome project human cDNA clones) by two bioinformatic N-myristoylation prediction systems, NMT-The MYR Predictor and Myristoylator. After database searches to exclude known human N-myristoylated proteins, 37 cDNA clones were selected as potential human N-myristoylated proteins. The susceptibility of these cDNA clones to protein N-myristoylation was first evaluated using fusion proteins in which the N-terminal ten amino acid residues were fused to an epitope-tagged model protein. Then, protein N-myristoylation of the gene products of full-length cDNAs was evaluated by metabolic labeling experiments both in an insect cell-free protein synthesis system and in transfected human cells. As a result, the products of 13 cDNA clones (FBXL7, PPM1B, SAMM50, PLEKHN, AIFM3, C22orf42, STK32A, FAM131C, DRICH1, MCC1, HID1, P2RX5, STK32B) were found to be human N-myristoylated proteins. Analysis of the role of protein N-myristoylation on the intracellular localization of SAMM50, a mitochondrial outer membrane protein, revealed that protein N-myristoylation was required for proper targeting of SAMM50 to mitochondria. Thus, the strategy used in this study is useful for the identification of physiologically important human N-myristoylated proteins from human cDNA resources. PMID:26308446

  4. Peptide reranking with protein-peptide correspondence and precursor peak intensity information.

    PubMed

    Yang, Chao; He, Zengyou; Yang, Can; Yu, Weichuan

    2012-01-01

    Searching tandem mass spectra against a protein database has been a mainstream method for peptide identification. Improving peptide identification results by ranking true Peptide-Spectrum Matches (PSMs) over their false counterparts leads to the development of various reranking algorithms. In peptide reranking, discriminative information is essential to distinguish true PSMs from false PSMs. Generally, most peptide reranking methods obtain discriminative information directly from database search scores or by training machine learning models. Information in the protein database and MS1 spectra (i.e., single stage MS spectra) is ignored. In this paper, we propose to use information in the protein database and MS1 spectra to rerank peptide identification results. To quantitatively analyze their effects to peptide reranking results, three peptide reranking methods are proposed: PPMRanker, PPIRanker, and MIRanker. PPMRanker only uses Protein-Peptide Map (PPM) information from the protein database, PPIRanker only uses Precursor Peak Intensity (PPI) information, and MIRanker employs both PPM information and PPI information. According to our experiments on a standard protein mixture data set, a human data set and a mouse data set, PPMRanker and MIRanker achieve better peptide reranking results than PetideProphet, PeptideProphet+NSP (number of sibling peptides) and a score regularization method SRPI. The source codes of PPMRanker, PPIRanker, and MIRanker, and all supplementary documents are available at our website: http://bioinformatics.ust.hk/pepreranking/. Alternatively, these documents can also be downloaded from: http://sourceforge.net/projects/pepreranking/.

  5. CORUM: the comprehensive resource of mammalian protein complexes

    PubMed Central

    Ruepp, Andreas; Brauner, Barbara; Dunger-Kaltenbach, Irmtraud; Frishman, Goar; Montrone, Corinna; Stransky, Michael; Waegele, Brigitte; Schmidt, Thorsten; Doudieu, Octave Noubibou; Stümpflen, Volker; Mewes, H. Werner

    2008-01-01

    Protein complexes are key molecular entities that integrate multiple gene products to perform cellular functions. The CORUM (http://mips.gsf.de/genre/proj/corum/index.html) database is a collection of experimentally verified mammalian protein complexes. Information is manually derived by critical reading of the scientific literature from expert annotators. Information about protein complexes includes protein complex names, subunits, literature references as well as the function of the complexes. For functional annotation, we use the FunCat catalogue that enables to organize the protein complex space into biologically meaningful subsets. The database contains more than 1750 protein complexes that are built from 2400 different genes, thus representing 12% of the protein-coding genes in human. A web-based system is available to query, view and download the data. CORUM provides a comprehensive dataset of protein complexes for discoveries in systems biology, analyses of protein networks and protein complex-associated diseases. Comparable to the MIPS reference dataset of protein complexes from yeast, CORUM intends to serve as a reference for mammalian protein complexes. PMID:17965090

  6. Proteomic profile of dormant Trichophyton Rubrum conidia

    PubMed Central

    Leng, Wenchuan; Liu, Tao; Li, Rui; Yang, Jian; Wei, Candong; Zhang, Wenliang; Jin, Qi

    2008-01-01

    Background Trichophyton rubrum is the most common dermatophyte causing fungal skin infections in humans. Asexual sporulation is an important means of propagation for T. rubrum, and conidia produced by this way are thought to be the primary cause of human infections. Despite their importance in pathogenesis, the conidia of T. rubrum remain understudied. We intend to intensively investigate the proteome of dormant T. rubrum conidia to characterize its molecular and cellular features and to enhance the development of novel therapeutic strategies. Results The proteome of T. rubrum conidia was analyzed by combining shotgun proteomics with sample prefractionation and multiple enzyme digestion. In total, 1026 proteins were identified. All identified proteins were compared to those in the NCBI non-redundant protein database, the eukaryotic orthologous groups database, and the gene ontology database to obtain functional annotation information. Functional classification revealed that the identified proteins covered nearly all major biological processes. Some proteins were spore specific and related to the survival and dispersal of T. rubrum conidia, and many proteins were important to conidial germination and response to environmental conditions. Conclusion Our results suggest that the proteome of T. rubrum conidia is considerably complex, and that the maintenance of conidial dormancy is an intricate and elaborate process. This data set provides the first global framework for the dormant T. rubrum conidia proteome and is a stepping stone on the way to further study of the molecular mechanisms of T. rubrum conidial germination and the maintenance of conidial dormancy. PMID:18578874

  7. PrionScan: an online database of predicted prion domains in complete proteomes.

    PubMed

    Espinosa Angarica, Vladimir; Angulo, Alfonso; Giner, Arturo; Losilla, Guillermo; Ventura, Salvador; Sancho, Javier

    2014-02-05

    Prions are a particular type of amyloids related to a large variety of important processes in cells, but also responsible for serious diseases in mammals and humans. The number of experimentally characterized prions is still low and corresponds to a handful of examples in microorganisms and mammals. Prion aggregation is mediated by specific protein domains with a remarkable compositional bias towards glutamine/asparagine and against charged residues and prolines. These compositional features have been used to predict new prion proteins in the genomes of different organisms. Despite these efforts, there are only a few available data sources containing prion predictions at a genomic scale. Here we present PrionScan, a new database of predicted prion-like domains in complete proteomes. We have previously developed a predictive methodology to identify and score prionogenic stretches in protein sequences. In the present work, we exploit this approach to scan all the protein sequences in public databases and compile a repository containing relevant information of proteins bearing prion-like domains. The database is updated regularly alongside UniprotKB and in its present version contains approximately 28000 predictions in proteins from different functional categories in more than 3200 organisms from all the taxonomic subdivisions. PrionScan can be used in two different ways: database query and analysis of protein sequences submitted by the users. In the first mode, simple queries allow to retrieve a detailed description of the properties of a defined protein. Queries can also be combined to generate more complex and specific searching patterns. In the second mode, users can submit and analyze their own sequences. It is expected that this database would provide relevant insights on prion functions and regulation from a genome-wide perspective, allowing researches performing cross-species prion biology studies. Our database might also be useful for guiding experimentalists in the identification of new candidates for further experimental characterization.

  8. DDRprot: a database of DNA damage response-related proteins.

    PubMed

    Andrés-León, Eduardo; Cases, Ildefonso; Arcas, Aida; Rojas, Ana M

    2016-01-01

    The DNA Damage Response (DDR) signalling network is an essential system that protects the genome's integrity. The DDRprot database presented here is a resource that integrates manually curated information on the human DDR network and its sub-pathways. For each particular DDR protein, we present detailed information about its function. If involved in post-translational modifications (PTMs) with each other, we depict the position of the modified residue/s in the three-dimensional structures, when resolved structures are available for the proteins. All this information is linked to the original publication from where it was obtained. Phylogenetic information is also shown, including time of emergence and conservation across 47 selected species, family trees and sequence alignments of homologues. The DDRprot database can be queried by different criteria: pathways, species, evolutionary age or involvement in (PTM). Sequence searches using hidden Markov models can be also used.Database URL: http://ddr.cbbio.es. © The Author(s) 2016. Published by Oxford University Press.

  9. CanisOme--The protein signatures of Canis lupus familiaris diseases.

    PubMed

    Fernandes, Mónica; Rosa, Nuno; Esteves, Eduardo; Correia, Maria José; Arrais, Joel; Ribeiro, Paulo; Vala, Helena; Barros, Marlene

    2016-03-16

    Although the applications of Proteomics in Human Biomedicine have been explored for some time now, in animal and veterinary research, the potential of this resource has just started to be explored, especially when companion animal health is considered. In the last years, knowledge on the Canis lupus familiaris proteome has been accumulating in the literature and a resource compiling all this information and critically reviewing it was lacking. This article presents such a resource for the first time. CanisOme is a database of all proteins identified in Canis lupus familiaris tissues, either in health or in disease, annotated with information on the proteins present on the sample and on the donors. This database reunites information on 549 proteins, associated with 63 dog diseases and 33 dog breeds. Examples of how this information may be used to produce new hypothesis on disease mechanisms is presented both through the functional analysis of the proteins quantified in canine cutaneous mast cell tumors and through the study of the interactome of C. lupus familiaris and Leishmania infantum. Therefore, the usefulness of CanisOme for researchers looking for protein biomarkers in dogs and interested in a comprehensive analysis of disease mechanisms is demonstrated. This paper presents CanisOme, a database of proteomic studies with relevant protein annotation, allowing the enlightenment of disease mechanisms and the discovery of novel disease biomarkers for C. lupus familiaris. This knowledge is important not only for the improvement of animal health but also for the use of dogs as models for human health studies. Copyright © 2016 Elsevier B.V. All rights reserved.

  10. Identification of shed proteins from Chinese hamster ovary cells: Application of statistical confidence using human and mouse protein databases

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ahram, Mamoun; Strittmatter, Eric F.; Monroe, Matthew E.

    The shedding process releases ligands, receptors, and other proteins from the surface of the cell and is a mechanism whereby cells communicate. Even though altered regulation of this process has been implicated in several diseases, global approaches to evaluate shed proteins have not been developed. A goal of this study was to identify global changes in shed proteins in media taken from cells exposed to low-doses of radiation in an effort to develop a fundamental understanding of the bystander response. CHO cells were chosen for this study because they have been widely used for radiation studies and since they havemore » been reported to respond to radiation by releasing factors into the media that cause genomic instability and cytotoxicity in unexposed cells, i.e., a bystander effect. Media samples taken for irradiated cells were evaluated using a combination of tandem- and FTICR-mass spectrometry analysis. Since the hamster genome has not been sequenced, mass spectrometry data was searched against the mouse and human proteins databases. Nearly 150 proteins that were identified by tandem mass spectrometry were confirmed by FTICR. When both types of mass spectrometry data were evaluated with a new confidence scoring tool, which is based on discriminant analyses, about 500 protein were identified. Approximately 20% of these identifications were either integral membrane proteins or membrane associated proteins, suggesting that they were derived from the cell surface, hence were likely shed. However, estimates of quantitative changes, based on two independent mass spectrometry approaches, did not identify any protein abundance changes attributable to the bystander effect. Results from this study demonstrate the feasibility of global evaluation of shed proteins using mass spectrometry in conjunction with cross-species protein databases and that significant improvement in peptide/protein identifications is provided by the confidence scoring tool.« less

  11. ZikaBase: An integrated ZIKV- Human Interactome Map database.

    PubMed

    Gurumayum, Sanathoi; Brahma, Rahul; Naorem, Leimarembi Devi; Muthaiyan, Mathavan; Gopal, Jeyakodi; Venkatesan, Amouda

    2018-01-15

    Re-emergence of ZIKV has caused infections in more than 1.5 million people. The molecular mechanism and pathogenesis of ZIKV is not well explored due to unavailability of adequate model and lack of publically accessible resources to provide information of ZIKV-Human protein interactome map till today. This study made an attempt to curate the ZIKV-Human interaction proteins from published literatures and RNA-Seq data. 11 direct interaction, 12 associated genes are retrieved from literatures and 3742 Differentially Expressed Genes (DEGs) are obtained from RNA-Seq analysis. The genes have been analyzed to construct the ZIKV-Human Interactome Map. The importance of the study has been illustrated by the enrichment analysis and observed that direct interaction and associated genes are enriched in viral entry into host cell. Also, ZIKV infection modulates 32% signal and 27% immune system pathways. The integrated database, ZikaBase has been developed to help the virology research community and accessible at https://test5.bicpu.edu.in. Copyright © 2017 Elsevier Inc. All rights reserved.

  12. Protannotator: a semiautomated pipeline for chromosome-wise functional annotation of the "missing" human proteome.

    PubMed

    Islam, Mohammad T; Garg, Gagan; Hancock, William S; Risk, Brian A; Baker, Mark S; Ranganathan, Shoba

    2014-01-03

    The chromosome-centric human proteome project (C-HPP) aims to define the complete set of proteins encoded in each human chromosome. The neXtProt database (September 2013) lists 20,128 proteins for the human proteome, of which 3831 human proteins (∼19%) are considered "missing" according to the standard metrics table (released September 27, 2013). In support of the C-HPP initiative, we have extended the annotation strategy developed for human chromosome 7 "missing" proteins into a semiautomated pipeline to functionally annotate the "missing" human proteome. This pipeline integrates a suite of bioinformatics analysis and annotation software tools to identify homologues and map putative functional signatures, gene ontology, and biochemical pathways. From sequential BLAST searches, we have primarily identified homologues from reviewed nonhuman mammalian proteins with protein evidence for 1271 (33.2%) "missing" proteins, followed by 703 (18.4%) homologues from reviewed nonhuman mammalian proteins and subsequently 564 (14.7%) homologues from reviewed human proteins. Functional annotations for 1945 (50.8%) "missing" proteins were also determined. To accelerate the identification of "missing" proteins from proteomics studies, we generated proteotypic peptides in silico. Matching these proteotypic peptides to ENCODE proteogenomic data resulted in proteomic evidence for 107 (2.8%) of the 3831 "missing proteins, while evidence from a recent membrane proteomic study supported the existence for another 15 "missing" proteins. The chromosome-wise functional annotation of all "missing" proteins is freely available to the scientific community through our web server (http://biolinfo.org/protannotator).

  13. Tandem mass spectrometry of human tryptic blood peptides calculated by a statistical algorithm and captured by a relational database with exploration by a general statistical analysis system.

    PubMed

    Bowden, Peter; Beavis, Ron; Marshall, John

    2009-11-02

    A goodness of fit test may be used to assign tandem mass spectra of peptides to amino acid sequences and to directly calculate the expected probability of mis-identification. The product of the peptide expectation values directly yields the probability that the parent protein has been mis-identified. A relational database could capture the mass spectral data, the best fit results, and permit subsequent calculations by a general statistical analysis system. The many files of the Hupo blood protein data correlated by X!TANDEM against the proteins of ENSEMBL were collected into a relational database. A redundant set of 247,077 proteins and peptides were correlated by X!TANDEM, and that was collapsed to a set of 34,956 peptides from 13,379 distinct proteins. About 6875 distinct proteins were only represented by a single distinct peptide, 2866 proteins showed 2 distinct peptides, and 3454 proteins showed at least three distinct peptides by X!TANDEM. More than 99% of the peptides were associated with proteins that had cumulative expectation values, i.e. probability of false positive identification, of one in one hundred or less. The distribution of peptides per protein from X!TANDEM was significantly different than those expected from random assignment of peptides.

  14. PAMDB: a comprehensive Pseudomonas aeruginosa metabolome database.

    PubMed

    Huang, Weiliang; Brewer, Luke K; Jones, Jace W; Nguyen, Angela T; Marcu, Ana; Wishart, David S; Oglesby-Sherrouse, Amanda G; Kane, Maureen A; Wilks, Angela

    2018-01-04

    The Pseudomonas aeruginosaMetabolome Database (PAMDB, http://pseudomonas.umaryland.edu) is a searchable, richly annotated metabolite database specific to P. aeruginosa. P. aeruginosa is a soil organism and significant opportunistic pathogen that adapts to its environment through a versatile energy metabolism network. Furthermore, P. aeruginosa is a model organism for the study of biofilm formation, quorum sensing, and bioremediation processes, each of which are dependent on unique pathways and metabolites. The PAMDB is modelled on the Escherichia coli (ECMDB), yeast (YMDB) and human (HMDB) metabolome databases and contains >4370 metabolites and 938 pathways with links to over 1260 genes and proteins. The database information was compiled from electronic databases, journal articles and mass spectrometry (MS) metabolomic data obtained in our laboratories. For each metabolite entered, we provide detailed compound descriptions, names and synonyms, structural and physiochemical information, nuclear magnetic resonance (NMR) and MS spectra, enzymes and pathway information, as well as gene and protein sequences. The database allows extensive searching via chemical names, structure and molecular weight, together with gene, protein and pathway relationships. The PAMBD and its future iterations will provide a valuable resource to biologists, natural product chemists and clinicians in identifying active compounds, potential biomarkers and clinical diagnostics. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  15. AllergenOnline: A peer-reviewed, curated allergen database to assess novel food proteins for potential cross-reactivity.

    PubMed

    Goodman, Richard E; Ebisawa, Motohiro; Ferreira, Fatima; Sampson, Hugh A; van Ree, Ronald; Vieths, Stefan; Baumert, Joseph L; Bohle, Barbara; Lalithambika, Sreedevi; Wise, John; Taylor, Steve L

    2016-05-01

    Increasingly regulators are demanding evaluation of potential allergenicity of foods prior to marketing. Primary risks are the transfer of allergens or potentially cross-reactive proteins into new foods. AllergenOnline was developed in 2005 as a peer-reviewed bioinformatics platform to evaluate risks of new dietary proteins in genetically modified organisms (GMO) and novel foods. The process used to identify suspected allergens and evaluate the evidence of allergenicity was refined between 2010 and 2015. Candidate proteins are identified from the NCBI database using keyword searches, the WHO/IUIS nomenclature database and peer reviewed publications. Criteria to classify proteins as allergens are described. Characteristics of the protein, the source and human subjects, test methods and results are evaluated by our expert panel and archived. Food, inhalant, salivary, venom, and contact allergens are included. Users access allergen sequences through links to the NCBI database and relevant references are listed online. Version 16 includes 1956 sequences from 778 taxonomic-protein groups that are accepted with evidence of allergic serum IgE-binding and/or biological activity. AllergenOnline provides a useful peer-reviewed tool for identifying the primary potential risks of allergy for GMOs and novel foods based on criteria described by the Codex Alimentarius Commission (2003). © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  16. Complete genome-wide screening and subtractive genomic approach revealed new virulence factors, potential drug targets against bio-war pathogen Brucella melitensis 16M

    PubMed Central

    Pradeepkiran, Jangampalli Adi; Sainath, Sri Bhashyam; Kumar, Konidala Kranthi; Bhaskar, Matcha

    2015-01-01

    Brucella melitensis 16M is a Gram-negative coccobacillus that infects both animals and humans. It causes a disease known as brucellosis, which is characterized by acute febrile illness in humans and causes abortions in livestock. To prevent and control brucellosis, identification of putative drug targets is crucial. The present study aimed to identify drug targets in B. melitensis 16M by using a subtractive genomic approach. We used available database repositories (Database of Essential Genes, Kyoto Encyclopedia of Genes and Genomes Automatic Annotation Server, and Kyoto Encyclopedia of Genes and Genomes) to identify putative genes that are nonhomologous to humans and essential for pathogen B. melitensis 16M. The results revealed that among 3 Mb genome size of pathogen, 53 putative characterized and 13 uncharacterized hypothetical genes were identified; further, from Basic Local Alignment Search Tool protein analysis, one hypothetical protein showed a close resemblance (50%) to Silicibacter pomeroyi DUF1285 family protein (2RE3). A further homology model of the target was constructed using MODELLER 9.12 and optimized through variable target function method by molecular dynamics optimization with simulating annealing. The stereochemical quality of the restrained model was evaluated by PROCHECK, VERIFY-3D, ERRAT, and WHATIF servers. Furthermore, structure-based virtual screening was carried out against the predicted active site of the respective protein using the glycerol structural analogs from the PubChem database. We identified five best inhibitors with strong affinities, stable interactions, and also with reliable drug-like properties. Hence, these leads might be used as the most effective inhibitors of modeled protein. The outcome of the present work of virtual screening of putative gene targets might facilitate design of potential drugs for better treatment against brucellosis. PMID:25834405

  17. Revealing the potential pathogenesis of glioma by utilizing a glioma associated protein-protein interaction network.

    PubMed

    Pan, Weiran; Li, Gang; Yang, Xiaoxiao; Miao, Jinming

    2015-04-01

    This study aims to explore the potential mechanism of glioma through bioinformatic approaches. The gene expression profile (GSE4290) of glioma tumor and non-tumor samples was downloaded from Gene Expression Omnibus database. A total of 180 samples were available, including 23 non-tumor and 157 tumor samples. Then the raw data were preprocessed using robust multiarray analysis, and 8,890 differentially expressed genes (DEGs) were identified by using t-test (false discovery rate < 0.0005). Furthermore, 16 known glioma related genes were abstracted from Genetic Association Database. After mapping 8,890 DEGs and 16 known glioma related genes to Human Protein Reference Database, a glioma associated protein-protein interaction network (GAPN) was constructed. In addition, 51 sub-networks in GAPN were screened out through Molecular Complex Detection (score ≥ 1), and sub-network 1 was found to have the closest interaction (score = 3). What' more, for the top 10 sub-networks, Gene Ontology (GO) enrichment analysis (p value < 0.05) was performed, and DEGs involved in sub-network 1 and 2, such as BRMS1L and CCNA1, were predicted to regulate cell growth, cell cycle, and DNA replication via interacting with known glioma related genes. Finally, the overlaps of DEGs and human essential, housekeeping, tissue-specific genes were calculated (p value = 1.0, 1.0, and 0.00014, respectively) and visualized by Venn Diagram package in R. About 61% of human tissue-specific genes were DEGs as well. This research shed new light on the pathogenesis of glioma based on DEGs and GAPN, and our findings might provide potential targets for clinical glioma treatment.

  18. The Challenge of Human Spermatozoa Proteome: A Systematic Review.

    PubMed

    Gilany, Kambiz; Minai-Tehrani, Arash; Amini, Mehdi; Agharezaee, Niloofar; Arjmand, Babak

    2017-01-01

    Currently, there are 20,197 human protein-coding genes in the most expertly curated database (UniProtKB/Swiss-Pro). Big efforts have been made by the international consortium, the Chromosome-Centric Human Proteome Project (C-HPP) and independent researchers, to map human proteome. In brief, anno 2017 the human proteome was outlined. The male factor contributes to 50% of infertility in couples. However, there are limited human spermatozoa proteomic studies. Firstly, the development of the mapping of the human spermatozoa was analyzed. The human spermatozoa have been used as a model for missing proteins. It has been shown that human spermatozoa are excellent sources for finding missing proteins. Y chromosome proteome mapping is led by Iran. However, it seems that it is extremely challenging to map the human spermatozoa Y chromosome proteins based on current mass spectrometry-based proteomics technology. Post-translation modifications (PTMs) of human spermatozoa proteome are the most unexplored area and currently the exact role of PTMs in male infertility is unknown. Additionally, the clinical human spermatozoa proteomic analysis, anno 2017 was done in this study.

  19. A Community Standard Format for the Representation of Protein Affinity Reagents*

    PubMed Central

    Gloriam, David E.; Orchard, Sandra; Bertinetti, Daniela; Björling, Erik; Bongcam-Rudloff, Erik; Borrebaeck, Carl A. K.; Bourbeillon, Julie; Bradbury, Andrew R. M.; de Daruvar, Antoine; Dübel, Stefan; Frank, Ronald; Gibson, Toby J.; Gold, Larry; Haslam, Niall; Herberg, Friedrich W.; Hiltke, Tara; Hoheisel, Jörg D.; Kerrien, Samuel; Koegl, Manfred; Konthur, Zoltán; Korn, Bernhard; Landegren, Ulf; Montecchi-Palazzi, Luisa; Palcy, Sandrine; Rodriguez, Henry; Schweinsberg, Sonja; Sievert, Volker; Stoevesandt, Oda; Taussig, Michael J.; Ueffing, Marius; Uhlén, Mathias; van der Maarel, Silvère; Wingren, Christer; Woollard, Peter; Sherman, David J.; Hermjakob, Henning

    2010-01-01

    Protein affinity reagents (PARs), most commonly antibodies, are essential reagents for protein characterization in basic research, biotechnology, and diagnostics as well as the fastest growing class of therapeutics. Large numbers of PARs are available commercially; however, their quality is often uncertain. In addition, currently available PARs cover only a fraction of the human proteome, and their cost is prohibitive for proteome scale applications. This situation has triggered several initiatives involving large scale generation and validation of antibodies, for example the Swedish Human Protein Atlas and the German Antibody Factory. Antibodies targeting specific subproteomes are being pursued by members of Human Proteome Organisation (plasma and liver proteome projects) and the United States National Cancer Institute (cancer-associated antigens). ProteomeBinders, a European consortium, aims to set up a resource of consistently quality-controlled protein-binding reagents for the whole human proteome. An ultimate PAR database resource would allow consumers to visit one on-line warehouse and find all available affinity reagents from different providers together with documentation that facilitates easy comparison of their cost and quality. However, in contrast to, for example, nucleotide databases among which data are synchronized between the major data providers, current PAR producers, quality control centers, and commercial companies all use incompatible formats, hindering data exchange. Here we propose Proteomics Standards Initiative (PSI)-PAR as a global community standard format for the representation and exchange of protein affinity reagent data. The PSI-PAR format is maintained by the Human Proteome Organisation PSI and was developed within the context of ProteomeBinders by building on a mature proteomics standard format, PSI-molecular interaction, which is a widely accepted and established community standard for molecular interaction data. Further information and documentation are available on the PSI-PAR web site. PMID:19674966

  20. IntNetDB v1.0: an integrated protein-protein interaction network database generated by a probabilistic model

    PubMed Central

    Xia, Kai; Dong, Dong; Han, Jing-Dong J

    2006-01-01

    Background Although protein-protein interaction (PPI) networks have been explored by various experimental methods, the maps so built are still limited in coverage and accuracy. To further expand the PPI network and to extract more accurate information from existing maps, studies have been carried out to integrate various types of functional relationship data. A frequently updated database of computationally analyzed potential PPIs to provide biological researchers with rapid and easy access to analyze original data as a biological network is still lacking. Results By applying a probabilistic model, we integrated 27 heterogeneous genomic, proteomic and functional annotation datasets to predict PPI networks in human. In addition to previously studied data types, we show that phenotypic distances and genetic interactions can also be integrated to predict PPIs. We further built an easy-to-use, updatable integrated PPI database, the Integrated Network Database (IntNetDB) online, to provide automatic prediction and visualization of PPI network among genes of interest. The networks can be visualized in SVG (Scalable Vector Graphics) format for zooming in or out. IntNetDB also provides a tool to extract topologically highly connected network neighborhoods from a specific network for further exploration and research. Using the MCODE (Molecular Complex Detections) algorithm, 190 such neighborhoods were detected among all the predicted interactions. The predicted PPIs can also be mapped to worm, fly and mouse interologs. Conclusion IntNetDB includes 180,010 predicted protein-protein interactions among 9,901 human proteins and represents a useful resource for the research community. Our study has increased prediction coverage by five-fold. IntNetDB also provides easy-to-use network visualization and analysis tools that allow biological researchers unfamiliar with computational biology to access and analyze data over the internet. The web interface of IntNetDB is freely accessible at . Visualization requires Mozilla version 1.8 (or higher) or Internet Explorer with installation of SVGviewer. PMID:17112386

  1. DBSecSys 2.0: a database of Burkholderia mallei and Burkholderia pseudomallei secretion systems.

    PubMed

    Memišević, Vesna; Kumar, Kamal; Zavaljevski, Nela; DeShazer, David; Wallqvist, Anders; Reifman, Jaques

    2016-09-20

    Burkholderia mallei and B. pseudomallei are the causative agents of glanders and melioidosis, respectively, diseases with high morbidity and mortality rates. B. mallei and B. pseudomallei are closely related genetically; B. mallei evolved from an ancestral strain of B. pseudomallei by genome reduction and adaptation to an obligate intracellular lifestyle. Although these two bacteria cause different diseases, they share multiple virulence factors, including bacterial secretion systems, which represent key components of bacterial pathogenicity. Despite recent progress, the secretion system proteins for B. mallei and B. pseudomallei, their pathogenic mechanisms of action, and host factors are not well characterized. We previously developed a manually curated database, DBSecSys, of bacterial secretion system proteins for B. mallei. Here, we report an expansion of the database with corresponding information about B. pseudomallei. DBSecSys 2.0 contains comprehensive literature-based and computationally derived information about B. mallei ATCC 23344 and literature-based and computationally derived information about B. pseudomallei K96243. The database contains updated information for 163 B. mallei proteins from the previous database and 61 additional B. mallei proteins, and new information for 281 B. pseudomallei proteins associated with 5 secretion systems, their 1,633 human- and murine-interacting targets, and 2,400 host-B. mallei interactions and 2,286 host-B. pseudomallei interactions. The database also includes information about 13 pathogenic mechanisms of action for B. mallei and B. pseudomallei secretion system proteins inferred from the available literature or computationally. Additionally, DBSecSys 2.0 provides details about 82 virulence attenuation experiments for 52 B. mallei secretion system proteins and 98 virulence attenuation experiments for 61 B. pseudomallei secretion system proteins. We updated the Web interface and data access layer to speed-up users' search of detailed information for orthologous proteins related to secretion systems of the two pathogens. The updates of DBSecSys 2.0 provide unique capabilities to access comprehensive information about secretion systems of B. mallei and B. pseudomallei. They enable studies and comparisons of corresponding proteins of these two closely related pathogens and their host-interacting partners. The database is available at http://dbsecsys.bhsai.org .

  2. Gene and protein nomenclature in public databases

    PubMed Central

    Fundel, Katrin; Zimmer, Ralf

    2006-01-01

    Background Frequently, several alternative names are in use for biological objects such as genes and proteins. Applications like manual literature search, automated text-mining, named entity identification, gene/protein annotation, and linking of knowledge from different information sources require the knowledge of all used names referring to a given gene or protein. Various organism-specific or general public databases aim at organizing knowledge about genes and proteins. These databases can be used for deriving gene and protein name dictionaries. So far, little is known about the differences between databases in terms of size, ambiguities and overlap. Results We compiled five gene and protein name dictionaries for each of the five model organisms (yeast, fly, mouse, rat, and human) from different organism-specific and general public databases. We analyzed the degree of ambiguity of gene and protein names within and between dictionaries, to a lexicon of common English words and domain-related non-gene terms, and we compared different data sources in terms of size of extracted dictionaries and overlap of synonyms between those. The study shows that the number of genes/proteins and synonyms covered in individual databases varies significantly for a given organism, and that the degree of ambiguity of synonyms varies significantly between different organisms. Furthermore, it shows that, despite considerable efforts of co-curation, the overlap of synonyms in different data sources is rather moderate and that the degree of ambiguity of gene names with common English words and domain-related non-gene terms varies depending on the considered organism. Conclusion In conclusion, these results indicate that the combination of data contained in different databases allows the generation of gene and protein name dictionaries that contain significantly more used names than dictionaries obtained from individual data sources. Furthermore, curation of combined dictionaries considerably increases size and decreases ambiguity. The entries of the curated synonym dictionary are available for manual querying, editing, and PubMed- or Google-search via the ProThesaurus-wiki. For automated querying via custom software, we offer a web service and an exemplary client application. PMID:16899134

  3. DNAtraffic--a new database for systems biology of DNA dynamics during the cell life.

    PubMed

    Kuchta, Krzysztof; Barszcz, Daniela; Grzesiuk, Elzbieta; Pomorski, Pawel; Krwawicz, Joanna

    2012-01-01

    DNAtraffic (http://dnatraffic.ibb.waw.pl/) is dedicated to be a unique comprehensive and richly annotated database of genome dynamics during the cell life. It contains extensive data on the nomenclature, ontology, structure and function of proteins related to the DNA integrity mechanisms such as chromatin remodeling, histone modifications, DNA repair and damage response from eight organisms: Homo sapiens, Mus musculus, Drosophila melanogaster, Caenorhabditis elegans, Saccharomyces cerevisiae, Schizosaccharomyces pombe, Escherichia coli and Arabidopsis thaliana. DNAtraffic contains comprehensive information on the diseases related to the assembled human proteins. DNAtraffic is richly annotated in the systemic information on the nomenclature, chemistry and structure of DNA damage and their sources, including environmental agents or commonly used drugs targeting nucleic acids and/or proteins involved in the maintenance of genome stability. One of the DNAtraffic database aim is to create the first platform of the combinatorial complexity of DNA network analysis. Database includes illustrations of pathways, damage, proteins and drugs. Since DNAtraffic is designed to cover a broad spectrum of scientific disciplines, it has to be extensively linked to numerous external data sources. Our database represents the result of the manual annotation work aimed at making the DNAtraffic much more useful for a wide range of systems biology applications.

  4. DNAtraffic—a new database for systems biology of DNA dynamics during the cell life

    PubMed Central

    Kuchta, Krzysztof; Barszcz, Daniela; Grzesiuk, Elzbieta; Pomorski, Pawel; Krwawicz, Joanna

    2012-01-01

    DNAtraffic (http://dnatraffic.ibb.waw.pl/) is dedicated to be a unique comprehensive and richly annotated database of genome dynamics during the cell life. It contains extensive data on the nomenclature, ontology, structure and function of proteins related to the DNA integrity mechanisms such as chromatin remodeling, histone modifications, DNA repair and damage response from eight organisms: Homo sapiens, Mus musculus, Drosophila melanogaster, Caenorhabditis elegans, Saccharomyces cerevisiae, Schizosaccharomyces pombe, Escherichia coli and Arabidopsis thaliana. DNAtraffic contains comprehensive information on the diseases related to the assembled human proteins. DNAtraffic is richly annotated in the systemic information on the nomenclature, chemistry and structure of DNA damage and their sources, including environmental agents or commonly used drugs targeting nucleic acids and/or proteins involved in the maintenance of genome stability. One of the DNAtraffic database aim is to create the first platform of the combinatorial complexity of DNA network analysis. Database includes illustrations of pathways, damage, proteins and drugs. Since DNAtraffic is designed to cover a broad spectrum of scientific disciplines, it has to be extensively linked to numerous external data sources. Our database represents the result of the manual annotation work aimed at making the DNAtraffic much more useful for a wide range of systems biology applications. PMID:22110027

  5. DBSecSys: a database of Burkholderia mallei secretion systems.

    PubMed

    Memišević, Vesna; Kumar, Kamal; Cheng, Li; Zavaljevski, Nela; DeShazer, David; Wallqvist, Anders; Reifman, Jaques

    2014-07-16

    Bacterial pathogenicity represents a major public health concern worldwide. Secretion systems are a key component of bacterial pathogenicity, as they provide the means for bacterial proteins to penetrate host-cell membranes and insert themselves directly into the host cells' cytosol. Burkholderia mallei is a Gram-negative bacterium that uses multiple secretion systems during its host infection life cycle. To date, the identities of secretion system proteins for B. mallei are not well known, and their pathogenic mechanisms of action and host factors are largely uncharacterized. We present the Database of Burkholderia malleiSecretion Systems (DBSecSys), a compilation of manually curated and computationally predicted bacterial secretion system proteins and their host factors. Currently, DBSecSys contains comprehensive experimentally and computationally derived information about B. mallei strain ATCC 23344. The database includes 143 B. mallei proteins associated with five secretion systems, their 1,635 human and murine interacting targets, and the corresponding 2,400 host-B. mallei interactions. The database also includes information about 10 pathogenic mechanisms of action for B. mallei secretion system proteins inferred from the available literature. Additionally, DBSecSys provides details about 42 virulence attenuation experiments for 27 B. mallei secretion system proteins. Users interact with DBSecSys through a Web interface that allows for data browsing, querying, visualizing, and downloading. DBSecSys provides a comprehensive, systematically organized resource of experimental and computational data associated with B. mallei secretion systems. It provides the unique ability to study secretion systems not only through characterization of their corresponding pathogen proteins, but also through characterization of their host-interacting partners.The database is available at https://applications.bhsai.org/dbsecsys.

  6. Detection of Missing Proteins Using the PRIDE Database as a Source of Mass Spectrometry Evidence.

    PubMed

    Garin-Muga, Alba; Odriozola, Leticia; Martínez-Val, Ana; Del Toro, Noemí; Martínez, Rocío; Molina, Manuela; Cantero, Laura; Rivera, Rocío; Garrido, Nicolás; Dominguez, Francisco; Sanchez Del Pino, Manuel M; Vizcaíno, Juan Antonio; Corrales, Fernando J; Segura, Victor

    2016-11-04

    The current catalogue of the human proteome is not yet complete, as experimental proteomics evidence is still elusive for a group of proteins known as the missing proteins. The Human Proteome Project (HPP) has been successfully using technology and bioinformatic resources to improve the characterization of such challenging proteins. In this manuscript, we propose a pipeline starting with the mining of the PRIDE database to select a group of data sets potentially enriched in missing proteins that are subsequently analyzed for protein identification with a method based on the statistical analysis of proteotypic peptides. Spermatozoa and the HEK293 cell line were found to be a promising source of missing proteins and clearly merit further attention in future studies. After the analysis of the selected samples, we found 342 PSMs, suggesting the presence of 97 missing proteins in human spermatozoa or the HEK293 cell line, while only 36 missing proteins were potentially detected in the retina, frontal cortex, aorta thoracica, or placenta. The functional analysis of the missing proteins detected confirmed their tissue specificity, and the validation of a selected set of peptides using targeted proteomics (SRM/MRM assays) further supports the utility of the proposed pipeline. As illustrative examples, DNAH3 and TEPP in spermatozoa, and UNCX and ATAD3C in HEK293 cells were some of the more robust and remarkable identifications in this study. We provide evidence indicating the relevance to carefully analyze the ever-increasing MS/MS data available from PRIDE and other repositories as sources for missing proteins detection in specific biological matrices as revealed for HEK293 cells.

  7. Detection of Missing Proteins Using the PRIDE Database as a Source of Mass Spectrometry Evidence

    PubMed Central

    2016-01-01

    The current catalogue of the human proteome is not yet complete, as experimental proteomics evidence is still elusive for a group of proteins known as the missing proteins. The Human Proteome Project (HPP) has been successfully using technology and bioinformatic resources to improve the characterization of such challenging proteins. In this manuscript, we propose a pipeline starting with the mining of the PRIDE database to select a group of data sets potentially enriched in missing proteins that are subsequently analyzed for protein identification with a method based on the statistical analysis of proteotypic peptides. Spermatozoa and the HEK293 cell line were found to be a promising source of missing proteins and clearly merit further attention in future studies. After the analysis of the selected samples, we found 342 PSMs, suggesting the presence of 97 missing proteins in human spermatozoa or the HEK293 cell line, while only 36 missing proteins were potentially detected in the retina, frontal cortex, aorta thoracica, or placenta. The functional analysis of the missing proteins detected confirmed their tissue specificity, and the validation of a selected set of peptides using targeted proteomics (SRM/MRM assays) further supports the utility of the proposed pipeline. As illustrative examples, DNAH3 and TEPP in spermatozoa, and UNCX and ATAD3C in HEK293 cells were some of the more robust and remarkable identifications in this study. We provide evidence indicating the relevance to carefully analyze the ever-increasing MS/MS data available from PRIDE and other repositories as sources for missing proteins detection in specific biological matrices as revealed for HEK293 cells. PMID:27581094

  8. An emerging cyberinfrastructure for biodefense pathogen and pathogen-host data.

    PubMed

    Zhang, C; Crasta, O; Cammer, S; Will, R; Kenyon, R; Sullivan, D; Yu, Q; Sun, W; Jha, R; Liu, D; Xue, T; Zhang, Y; Moore, M; McGarvey, P; Huang, H; Chen, Y; Zhang, J; Mazumder, R; Wu, C; Sobral, B

    2008-01-01

    The NIAID-funded Biodefense Proteomics Resource Center (RC) provides storage, dissemination, visualization and analysis capabilities for the experimental data deposited by seven Proteomics Research Centers (PRCs). The data and its publication is to support researchers working to discover candidates for the next generation of vaccines, therapeutics and diagnostics against NIAID's Category A, B and C priority pathogens. The data includes transcriptional profiles, protein profiles, protein structural data and host-pathogen protein interactions, in the context of the pathogen life cycle in vivo and in vitro. The database has stored and supported host or pathogen data derived from Bacillus, Brucella, Cryptosporidium, Salmonella, SARS, Toxoplasma, Vibrio and Yersinia, human tissue libraries, and mouse macrophages. These publicly available data cover diverse data types such as mass spectrometry, yeast two-hybrid (Y2H), gene expression profiles, X-ray and NMR determined protein structures and protein expression clones. The growing database covers over 23 000 unique genes/proteins from different experiments and organisms. All of the genes/proteins are annotated and integrated across experiments using UniProt Knowledgebase (UniProtKB) accession numbers. The web-interface for the database enables searching, querying and downloading at the level of experiment, group and individual gene(s)/protein(s) via UniProtKB accession numbers or protein function keywords. The system is accessible at http://www.proteomicsresource.org/.

  9. Overcoming Species Boundaries in Peptide Identification with Bayesian Information Criterion-driven Error-tolerant Peptide Search (BICEPS)*

    PubMed Central

    Renard, Bernhard Y.; Xu, Buote; Kirchner, Marc; Zickmann, Franziska; Winter, Dominic; Korten, Simone; Brattig, Norbert W.; Tzur, Amit; Hamprecht, Fred A.; Steen, Hanno

    2012-01-01

    Currently, the reliable identification of peptides and proteins is only feasible when thoroughly annotated sequence databases are available. Although sequencing capacities continue to grow, many organisms remain without reliable, fully annotated reference genomes required for proteomic analyses. Standard database search algorithms fail to identify peptides that are not exactly contained in a protein database. De novo searches are generally hindered by their restricted reliability, and current error-tolerant search strategies are limited by global, heuristic tradeoffs between database and spectral information. We propose a Bayesian information criterion-driven error-tolerant peptide search (BICEPS) and offer an open source implementation based on this statistical criterion to automatically balance the information of each single spectrum and the database, while limiting the run time. We show that BICEPS performs as well as current database search algorithms when such algorithms are applied to sequenced organisms, whereas BICEPS only uses a remotely related organism database. For instance, we use a chicken instead of a human database corresponding to an evolutionary distance of more than 300 million years (International Chicken Genome Sequencing Consortium (2004) Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution. Nature 432, 695–716). We demonstrate the successful application to cross-species proteomics with a 33% increase in the number of identified proteins for a filarial nematode sample of Litomosoides sigmodontis. PMID:22493179

  10. How many human proteoforms are there?

    PubMed

    Aebersold, Ruedi; Agar, Jeffrey N; Amster, I Jonathan; Baker, Mark S; Bertozzi, Carolyn R; Boja, Emily S; Costello, Catherine E; Cravatt, Benjamin F; Fenselau, Catherine; Garcia, Benjamin A; Ge, Ying; Gunawardena, Jeremy; Hendrickson, Ronald C; Hergenrother, Paul J; Huber, Christian G; Ivanov, Alexander R; Jensen, Ole N; Jewett, Michael C; Kelleher, Neil L; Kiessling, Laura L; Krogan, Nevan J; Larsen, Martin R; Loo, Joseph A; Ogorzalek Loo, Rachel R; Lundberg, Emma; MacCoss, Michael J; Mallick, Parag; Mootha, Vamsi K; Mrksich, Milan; Muir, Tom W; Patrie, Steven M; Pesavento, James J; Pitteri, Sharon J; Rodriguez, Henry; Saghatelian, Alan; Sandoval, Wendy; Schlüter, Hartmut; Sechi, Salvatore; Slavoff, Sarah A; Smith, Lloyd M; Snyder, Michael P; Thomas, Paul M; Uhlén, Mathias; Van Eyk, Jennifer E; Vidal, Marc; Walt, David R; White, Forest M; Williams, Evan R; Wohlschlager, Therese; Wysocki, Vicki H; Yates, Nathan A; Young, Nicolas L; Zhang, Bing

    2018-02-14

    Despite decades of accumulated knowledge about proteins and their post-translational modifications (PTMs), numerous questions remain regarding their molecular composition and biological function. One of the most fundamental queries is the extent to which the combinations of DNA-, RNA- and PTM-level variations explode the complexity of the human proteome. Here, we outline what we know from current databases and measurement strategies including mass spectrometry-based proteomics. In doing so, we examine prevailing notions about the number of modifications displayed on human proteins and how they combine to generate the protein diversity underlying health and disease. We frame central issues regarding determination of protein-level variation and PTMs, including some paradoxes present in the field today. We use this framework to assess existing data and to ask the question, "How many distinct primary structures of proteins (proteoforms) are created from the 20,300 human genes?" We also explore prospects for improving measurements to better regularize protein-level biology and efficiently associate PTMs to function and phenotype.

  11. Analysis of 10,000 ESTs from lymphocytes of the cynomolgus monkey to improve our understanding of its immune system

    PubMed Central

    Chen, Wei-Hua; Wang, Xue-Xia; Lin, Wei; He, Xiao-Wei; Wu, Zhen-Qiang; Lin, Ying; Hu, Song-Nian; Wang, Xiao-Ning

    2006-01-01

    Background The cynomolgus monkey (Macaca fascicularis) is one of the most widely used surrogate animal models for an increasing number of human diseases and vaccines, especially immune-system-related ones. Towards a better understanding of the gene expression background upon its immunogenetics, we constructed a cDNA library from Epstein-Barr virus (EBV)-transformed B lymphocytes of a cynomolgus monkey and sequenced 10,000 randomly picked clones. Results After processing, 8,312 high-quality expressed sequence tags (ESTs) were generated and assembled into 3,728 unigenes. Annotations of these uniquely expressed transcripts demonstrated that out of the 2,524 open reading frame (ORF) positive unigenes (mitochondrial and ribosomal sequences were not included), 98.8% shared significant similarities (E-value less than 1e-10) with the NCBI nucleotide (nt) database, while only 67.7% (E-value less than 1e-5) did so with the NCBI non-redundant protein (nr) database. Further analysis revealed that 90.0% of the unigenes that shared no similarities to the nr database could be assigned to human chromosomes, in which 75 did not match significantly to any cynomolgus monkey and human ESTs. The mapping regions to known human genes on the human genome were described in detail. The protein family and domain analysis revealed that the first, second and fourth of the most abundantly expressed protein families were all assigned to immunoglobulin and major histocompatibility complex (MHC)-related proteins. The expression profiles of these genes were compared with that of homologous genes in human blood, lymph nodes and a RAMOS cell line, which demonstrated expression changes after transformation with EBV. The degree of sequence similarity of the MHC class I and II genes to the human reference sequences was evaluated. The results indicated that class I molecules showed weak amino acid identities (<90%), while class II showed slightly higher ones. Conclusion These results indicated that the genes expressed in the cynomolgus monkey could be used to identify novel protein-coding genes and revise those incomplete or incorrect annotations in the human genome by comparative methods, since the old world monkeys and humans share high similarities at the molecular level, especially within coding regions. The identification of multiple genes involved in the immune response, their sequence variations to the human homologues, and their responses to EBV infection could provide useful information to improve our understanding of the cynomolgus monkey immune system. PMID:16618371

  12. CPLA 1.0: an integrated database of protein lysine acetylation.

    PubMed

    Liu, Zexian; Cao, Jun; Gao, Xinjiao; Zhou, Yanhong; Wen, Longping; Yang, Xiangjiao; Yao, Xuebiao; Ren, Jian; Xue, Yu

    2011-01-01

    As a reversible post-translational modification (PTM) discovered decades ago, protein lysine acetylation was known for its regulation of transcription through the modification of histones. Recent studies discovered that lysine acetylation targets broad substrates and especially plays an essential role in cellular metabolic regulation. Although acetylation is comparable with other major PTMs such as phosphorylation, an integrated resource still remains to be developed. In this work, we presented the compendium of protein lysine acetylation (CPLA) database for lysine acetylated substrates with their sites. From the scientific literature, we manually collected 7151 experimentally identified acetylation sites in 3311 targets. We statistically studied the regulatory roles of lysine acetylation by analyzing the Gene Ontology (GO) and InterPro annotations. Combined with protein-protein interaction information, we systematically discovered a potential human lysine acetylation network (HLAN) among histone acetyltransferases (HATs), substrates and histone deacetylases (HDACs). In particular, there are 1862 triplet relationships of HAT-substrate-HDAC retrieved from the HLAN, at least 13 of which were previously experimentally verified. The online services of CPLA database was implemented in PHP + MySQL + JavaScript, while the local packages were developed in JAVA 1.5 (J2SE 5.0). The CPLA database is freely available for all users at: http://cpla.biocuckoo.org.

  13. pseudoMap: an innovative and comprehensive resource for identification of siRNA-mediated mechanisms in human transcribed pseudogenes.

    PubMed

    Chan, Wen-Ling; Yang, Wen-Kuang; Huang, Hsien-Da; Chang, Jan-Gowth

    2013-01-01

    RNA interference (RNAi) is a gene silencing process within living cells, which is controlled by the RNA-induced silencing complex with a sequence-specific manner. In flies and mice, the pseudogene transcripts can be processed into short interfering RNAs (siRNAs) that regulate protein-coding genes through the RNAi pathway. Following these findings, we construct an innovative and comprehensive database to elucidate siRNA-mediated mechanism in human transcribed pseudogenes (TPGs). To investigate TPG producing siRNAs that regulate protein-coding genes, we mapped the TPGs to small RNAs (sRNAs) that were supported by publicly deep sequencing data from various sRNA libraries and constructed the TPG-derived siRNA-target interactions. In addition, we also presented that TPGs can act as a target for miRNAs that actually regulate the parental gene. To enable the systematic compilation and updating of these results and additional information, we have developed a database, pseudoMap, capturing various types of information, including sequence data, TPG and cognate annotation, deep sequencing data, RNA-folding structure, gene expression profiles, miRNA annotation and target prediction. As our knowledge, pseudoMap is the first database to demonstrate two mechanisms of human TPGs: encoding siRNAs and decoying miRNAs that target the parental gene. pseudoMap is freely accessible at http://pseudomap.mbc.nctu.edu.tw/. Database URL: http://pseudomap.mbc.nctu.edu.tw/

  14. MitProNet: A Knowledgebase and Analysis Platform of Proteome, Interactome and Diseases for Mammalian Mitochondria

    PubMed Central

    Mao, Song; Chai, Xiaoqiang; Hu, Yuling; Hou, Xugang; Tang, Yiheng; Bi, Cheng; Li, Xiao

    2014-01-01

    Mitochondrion plays a central role in diverse biological processes in most eukaryotes, and its dysfunctions are critically involved in a large number of diseases and the aging process. A systematic identification of mitochondrial proteomes and characterization of functional linkages among mitochondrial proteins are fundamental in understanding the mechanisms underlying biological functions and human diseases associated with mitochondria. Here we present a database MitProNet which provides a comprehensive knowledgebase for mitochondrial proteome, interactome and human diseases. First an inventory of mammalian mitochondrial proteins was compiled by widely collecting proteomic datasets, and the proteins were classified by machine learning to achieve a high-confidence list of mitochondrial proteins. The current version of MitProNet covers 1124 high-confidence proteins, and the remainders were further classified as middle- or low-confidence. An organelle-specific network of functional linkages among mitochondrial proteins was then generated by integrating genomic features encoded by a wide range of datasets including genomic context, gene expression profiles, protein-protein interactions, functional similarity and metabolic pathways. The functional-linkage network should be a valuable resource for the study of biological functions of mitochondrial proteins and human mitochondrial diseases. Furthermore, we utilized the network to predict candidate genes for mitochondrial diseases using prioritization algorithms. All proteins, functional linkages and disease candidate genes in MitProNet were annotated according to the information collected from their original sources including GO, GEO, OMIM, KEGG, MIPS, HPRD and so on. MitProNet features a user-friendly graphic visualization interface to present functional analysis of linkage networks. As an up-to-date database and analysis platform, MitProNet should be particularly helpful in comprehensive studies of complicated biological mechanisms underlying mitochondrial functions and human mitochondrial diseases. MitProNet is freely accessible at http://bio.scu.edu.cn:8085/MitProNet. PMID:25347823

  15. Creation of a Human Secretome: A Novel Composite Library of Human Secreted Proteins: Validation Using Ovarian Cancer Gene Expression Data and a Virtual Secretome Array.

    PubMed

    Vathipadiekal, Vinod; Wang, Victoria; Wei, Wei; Waldron, Levi; Drapkin, Ronny; Gillette, Michael; Skates, Steven; Birrer, Michael

    2015-11-01

    To generate a comprehensive "Secretome" of proteins potentially found in the blood and derive a virtual Affymetrix array. To validate the utility of this database for the discovery of novel serum-based biomarkers using ovarian cancer transcriptomic data. The secretome was constructed by aggregating the data from databases of known secreted proteins, transmembrane or membrane proteins, signal peptides, G-protein coupled receptors, or proteins existing in the extracellular region, and the virtual array was generated by mapping them to Affymetrix probeset identifiers. Whole-genome microarray data from ovarian cancer, normal ovarian surface epithelium, and fallopian tube epithelium were used to identify transcripts upregulated in ovarian cancer. We established the secretome from eight public databases and a virtual array consisting of 16,521 Affymetrix U133 Plus 2.0 probesets. Using ovarian cancer transcriptomic data, we identified candidate blood-based biomarkers for ovarian cancer and performed bioinformatic validation by demonstrating rediscovery of known biomarkers including CA125 and HE4. Two novel top biomarkers (FGF18 and GPR172A) were validated in serum samples from an independent patient cohort. We present the secretome, comprising the most comprehensive resource available for protein products that are potentially found in the blood. The associated virtual array can be used to translate gene-expression data into cancer biomarker discovery. A list of blood-based biomarkers for ovarian cancer detection is reported and includes CA125 and HE4. FGF18 and GPR172A were identified and validated by ELISA as being differentially expressed in the serum of ovarian cancer patients compared with controls. ©2015 American Association for Cancer Research.

  16. MAPU: Max-Planck Unified database of organellar, cellular, tissue and body fluid proteomes

    PubMed Central

    Zhang, Yanling; Zhang, Yong; Adachi, Jun; Olsen, Jesper V.; Shi, Rong; de Souza, Gustavo; Pasini, Erica; Foster, Leonard J.; Macek, Boris; Zougman, Alexandre; Kumar, Chanchal; Wiśniewski, Jacek R.; Jun, Wang; Mann, Matthias

    2007-01-01

    Mass spectrometry (MS)-based proteomics has become a powerful technology to map the protein composition of organelles, cell types and tissues. In our department, a large-scale effort to map these proteomes is complemented by the Max-Planck Unified (MAPU) proteome database. MAPU contains several body fluid proteomes; including plasma, urine, and cerebrospinal fluid. Cell lines have been mapped to a depth of several thousand proteins and the red blood cell proteome has also been analyzed in depth. The liver proteome is represented with 3200 proteins. By employing high resolution MS and stringent validation criteria, false positive identification rates in MAPU are lower than 1:1000. Thus MAPU datasets can serve as reference proteomes in biomarker discovery. MAPU contains the peptides identifying each protein, measured masses, scores and intensities and is freely available at using a clickable interface of cell or body parts. Proteome data can be queried across proteomes by protein name, accession number, sequence similarity, peptide sequence and annotation information. More than 4500 mouse and 2500 human proteins have already been identified in at least one proteome. Basic annotation information and links to other public databases are provided in MAPU and we plan to add further analysis tools. PMID:17090601

  17. MARRVEL: Integration of Human and Model Organism Genetic Resources to Facilitate Functional Annotation of the Human Genome.

    PubMed

    Wang, Julia; Al-Ouran, Rami; Hu, Yanhui; Kim, Seon-Young; Wan, Ying-Wooi; Wangler, Michael F; Yamamoto, Shinya; Chao, Hsiao-Tuan; Comjean, Aram; Mohr, Stephanie E; Perrimon, Norbert; Liu, Zhandong; Bellen, Hugo J

    2017-06-01

    One major challenge encountered with interpreting human genetic variants is the limited understanding of the functional impact of genetic alterations on biological processes. Furthermore, there remains an unmet demand for an efficient survey of the wealth of information on human homologs in model organisms across numerous databases. To efficiently assess the large volume of publically available information, it is important to provide a concise summary of the most relevant information in a rapid user-friendly format. To this end, we created MARRVEL (model organism aggregated resources for rare variant exploration). MARRVEL is a publicly available website that integrates information from six human genetic databases and seven model organism databases. For any given variant or gene, MARRVEL displays information from OMIM, ExAC, ClinVar, Geno2MP, DGV, and DECIPHER. Importantly, it curates model organism-specific databases to concurrently display a concise summary regarding the human gene homologs in budding and fission yeast, worm, fly, fish, mouse, and rat on a single webpage. Experiment-based information on tissue expression, protein subcellular localization, biological process, and molecular function for the human gene and homologs in the seven model organisms are arranged into a concise output. Hence, rather than visiting multiple separate databases for variant and gene analysis, users can obtain important information by searching once through MARRVEL. Altogether, MARRVEL dramatically improves efficiency and accessibility to data collection and facilitates analysis of human genes and variants by cross-disciplinary integration of 18 million records available in public databases to facilitate clinical diagnosis and basic research. Copyright © 2017 American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.

  18. Reconstruction of a Functional Human Gene Network, with an Application for Prioritizing Positional Candidate Genes

    PubMed Central

    Franke, Lude; Bakel, Harm van; Fokkens, Like; de Jong, Edwin D.; Egmont-Petersen, Michael; Wijmenga, Cisca

    2006-01-01

    Most common genetic disorders have a complex inheritance and may result from variants in many genes, each contributing only weak effects to the disease. Pinpointing these disease genes within the myriad of susceptibility loci identified in linkage studies is difficult because these loci may contain hundreds of genes. However, in any disorder, most of the disease genes will be involved in only a few different molecular pathways. If we know something about the relationships between the genes, we can assess whether some genes (which may reside in different loci) functionally interact with each other, indicating a joint basis for the disease etiology. There are various repositories of information on pathway relationships. To consolidate this information, we developed a functional human gene network that integrates information on genes and the functional relationships between genes, based on data from the Kyoto Encyclopedia of Genes and Genomes, the Biomolecular Interaction Network Database, Reactome, the Human Protein Reference Database, the Gene Ontology database, predicted protein-protein interactions, human yeast two-hybrid interactions, and microarray coexpressions. We applied this network to interrelate positional candidate genes from different disease loci and then tested 96 heritable disorders for which the Online Mendelian Inheritance in Man database reported at least three disease genes. Artificial susceptibility loci, each containing 100 genes, were constructed around each disease gene, and we used the network to rank these genes on the basis of their functional interactions. By following up the top five genes per artificial locus, we were able to detect at least one known disease gene in 54% of the loci studied, representing a 2.8-fold increase over random selection. This suggests that our method can significantly reduce the cost and effort of pinpointing true disease genes in analyses of disorders for which numerous loci have been reported but for which most of the genes are unknown. PMID:16685651

  19. Evaluation of Potential Infectivity of Alzheimer and Parkinson Disease Proteins in Recipients of Cadaver-Derived Human Growth Hormone

    PubMed Central

    Irwin, David J.; Abrams, Joseph Y.; Schonberger, Lawrence B.; Leschek, Ellen Werber; Mills, James L.; Lee, Virginia M.-Y.; Trojanowski, John Q.

    2013-01-01

    Importance Growing evidence of cell-to-cell transmission of neurodegenerative disease (ND)–associated proteins (NDAPs) (ie, tau, Aβ, and α-synuclein) suggests possible similarities in the infectious prion protein (PrPsc) in spongiform encephalopathies. There are limited data on the potential human-to-human transmission of NDAPs associated with Alzheimer disease (AD) and other non-PrPsc ND. Objective To examine evidence for human-to-human transmission of AD, Parkinson disease (PD), and related NDAPs in cadaveric human growth hormone (c-hGH) recipients. Design We conducted a detailed immunohistochemical analysis of pathological NDAPs other than PrPsc in human pituitary glands. We also searched for ND in recipients of pituitary-derived c-hGH by reviewing the National Hormone and Pituitary Program (NHPP) cohort database and medical literature. Setting University-based academic center and agencies of the US Department of Health and Human Services. Participants Thirty-four routine autopsy subjects (10 non-ND controls and 24 patients with ND) and a US cohort of c-hGH recipients in the NHPP. Main Outcome Measures Detectable NDAPs in human pituitary sections and death certificate reports of non-PrPsc ND in the NHPP database. Results We found mild amounts of pathological tau, Aβ, and α-synuclein deposits in the adeno/neurohypophysis of patients with ND and control patients. No cases of AD or PD were identified, and 3 deaths attributed to amyotrophic lateral sclerosis (ALS) were found among US NHPP c-hGH recipients, including 2 of the 796 decedents in the originally confirmed NHPP c-hGH cohort database. Conclusions and Relevance Despite the likely frequent exposure of c-hGH recipients to NDAPs, and their markedly elevated risk of PrPsc-related disease, this population of NHPP c-hGH recipients does not appear to be at increased risk of AD or PD. We discovered 3 ALS cases of unclear significance among US c-hGH recipients despite the absence of pathological deposits of ALS-associated proteins (TDP-43, FUS, and ubiquilin) in human pituitary glands. In this unique in vivo model of human-to-human transmission, we found no evidence to support concerns that NDAPs underlying AD and PD transmit disease in humans despite evidence of their cell-to-cell transmission in model systems of these disorders. Further monitoring is required to confirm these conclusions. PMID:23380910

  20. Elucidation of cross-species proteomic effects in human and hominin bone proteome identification through a bioinformatics experiment.

    PubMed

    Welker, F

    2018-02-20

    The study of ancient protein sequences is increasingly focused on the analysis of older samples, including those of ancient hominins. The analysis of such ancient proteomes thereby potentially suffers from "cross-species proteomic effects": the loss of peptide and protein identifications at increased evolutionary distances due to a larger number of protein sequence differences between the database sequence and the analyzed organism. Error-tolerant proteomic search algorithms should theoretically overcome this problem at both the peptide and protein level; however, this has not been demonstrated. If error-tolerant searches do not overcome the cross-species proteomic issue then there might be inherent biases in the identified proteomes. Here, a bioinformatics experiment is performed to test this using a set of modern human bone proteomes and three independent searches against sequence databases at increasing evolutionary distances: the human (0 Ma), chimpanzee (6-8 Ma) and orangutan (16-17 Ma) reference proteomes, respectively. Incorrectly suggested amino acid substitutions are absent when employing adequate filtering criteria for mutable Peptide Spectrum Matches (PSMs), but roughly half of the mutable PSMs were not recovered. As a result, peptide and protein identification rates are higher in error-tolerant mode compared to non-error-tolerant searches but did not recover protein identifications completely. Data indicates that peptide length and the number of mutations between the target and database sequences are the main factors influencing mutable PSM identification. The error-tolerant results suggest that the cross-species proteomics problem is not overcome at increasing evolutionary distances, even at the protein level. Peptide and protein loss has the potential to significantly impact divergence dating and proteome comparisons when using ancient samples as there is a bias towards the identification of conserved sequences and proteins. Effects are minimized between moderately divergent proteomes, as indicated by almost complete recovery of informative positions in the search against the chimpanzee proteome (≈90%, 6-8 Ma). This provides a bioinformatic background to future phylogenetic and proteomic analysis of ancient hominin proteomes, including the future description of novel hominin amino acid sequences, but also has negative implications for the study of fast-evolving proteins in hominins, non-hominin animals, and ancient bacterial proteins in evolutionary contexts.

  1. Human milk fortifier with high versus standard protein content for promoting growth of preterm infants: A meta-analysis.

    PubMed

    Liu, Tian-Tian; Dang, Dan; Lv, Xiao-Ming; Wang, Teng-Fei; Du, Jin-Feng; Wu, Hui

    2015-06-01

    To compare the growth of preterm infants fed standard protein-fortified human milk with that containing human milk fortifier (HMF) with a higher-than-standard protein content. Published articles reporting randomized controlled trials and prospective observational intervention studies listed on the PubMed®, Embase®, CINAHL and Cochrane Library databases were searched using the keywords 'fortifier', 'human milk', 'breastfeeding', 'breast milk' and 'human milk fortifier'. The mean difference with 95% confidence intervals was used to compare the effect of HMF with a higher-than-standard protein content on infant growth characteristics. Five studies with 352 infants with birth weight ≤ 1750 g and a gestational age ≤ 34 weeks who were fed human milk were included in this meta-analysis. Infants in the experimental groups given human milk with higher-than-standard protein fortifier achieved significantly greater weight and length at the end of the study, and greater weight gain, length gain, and head circumference gain, compared with control groups fed human milk with the standard HMF. HMF with a higher-than-standard protein content can improve preterm infant growth compared with standard HMF. © The Author(s) 2015 Reprints and permissions: sagepub.co.uk/journalsPermissions.nav.

  2. Consolidation of proteomics data in the Cancer Proteomics database.

    PubMed

    Arntzen, Magnus Ø; Boddie, Paul; Frick, Rahel; Koehler, Christian J; Thiede, Bernd

    2015-11-01

    Cancer is a class of diseases characterized by abnormal cell growth and one of the major reasons for human deaths. Proteins are involved in the molecular mechanisms leading to cancer, furthermore they are affected by anti-cancer drugs, and protein biomarkers can be used to diagnose certain cancer types. Therefore, it is important to explore the proteomics background of cancer. In this report, we developed the Cancer Proteomics database to re-interrogate published proteome studies investigating cancer. The database is divided in three sections related to cancer processes, cancer types, and anti-cancer drugs. Currently, the Cancer Proteomics database contains 9778 entries of 4118 proteins extracted from 143 scientific articles covering all three sections: cell death (cancer process), prostate cancer (cancer type) and platinum-based anti-cancer drugs including carboplatin, cisplatin, and oxaliplatin (anti-cancer drugs). The detailed information extracted from the literature includes basic information about the articles (e.g., PubMed ID, authors, journal name, publication year), information about the samples (type, study/reference, prognosis factor), and the proteomics workflow (Subcellular fractionation, protein, and peptide separation, mass spectrometry, quantification). Useful annotations such as hyperlinks to UniProt and PubMed were included. In addition, many filtering options were established as well as export functions. The database is freely available at http://cancerproteomics.uio.no. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  3. On-the-fly selection of cell-specific enhancers, genes, miRNAs and proteins across the human body using SlideBase

    PubMed Central

    Ienasescu, Hans; Li, Kang; Andersson, Robin; Vitezic, Morana; Rennie, Sarah; Chen, Yun; Vitting-Seerup, Kristoffer; Lagoni, Emil; Boyd, Mette; Bornholdt, Jette; de Hoon, Michiel J. L.; Kawaji, Hideya; Lassmann, Timo; Hayashizaki, Yoshihide; Forrest, Alistair R. R.; Carninci, Piero; Sandelin, Albin

    2016-01-01

    Genomics consortia have produced large datasets profiling the expression of genes, micro-RNAs, enhancers and more across human tissues or cells. There is a need for intuitive tools to select subsets of such data that is the most relevant for specific studies. To this end, we present SlideBase, a web tool which offers a new way of selecting genes, promoters, enhancers and microRNAs that are preferentially expressed/used in a specified set of cells/tissues, based on the use of interactive sliders. With the help of sliders, SlideBase enables users to define custom expression thresholds for individual cell types/tissues, producing sets of genes, enhancers etc. which satisfy these constraints. Changes in slider settings result in simultaneous changes in the selected sets, updated in real time. SlideBase is linked to major databases from genomics consortia, including FANTOM, GTEx, The Human Protein Atlas and BioGPS. Database URL: http://slidebase.binf.ku.dk PMID:28025337

  4. Large-Scale Concatenation cDNA Sequencing

    PubMed Central

    Yu, Wei; Andersson, Björn; Worley, Kim C.; Muzny, Donna M.; Ding, Yan; Liu, Wen; Ricafrente, Jennifer Y.; Wentland, Meredith A.; Lennon, Greg; Gibbs, Richard A.

    1997-01-01

    A total of 100 kb of DNA derived from 69 individual human brain cDNA clones of 0.7–2.0 kb were sequenced by concatenated cDNA sequencing (CCS), whereby multiple individual DNA fragments are sequenced simultaneously in a single shotgun library. The method yielded accurate sequences and a similar efficiency compared with other shotgun libraries constructed from single DNA fragments (>20 kb). Computer analyses were carried out on 65 cDNA clone sequences and their corresponding end sequences to examine both nucleic acid and amino acid sequence similarities in the databases. Thirty-seven clones revealed no DNA database matches, 12 clones generated exact matches (≥98% identity), and 16 clones generated nonexact matches (57%–97% identity) to either known human or other species genes. Of those 28 matched clones, 8 had corresponding end sequences that failed to identify similarities. In a protein similarity search, 27 clone sequences displayed significant matches, whereas only 20 of the end sequences had matches to known protein sequences. Our data indicate that full-length cDNA insert sequences provide significantly more nucleic acid and protein sequence similarity matches than expressed sequence tags (ESTs) for database searching. [All 65 cDNA clone sequences described in this paper have been submitted to the GenBank data library under accession nos. U79240–U79304.] PMID:9110174

  5. An emerging cyberinfrastructure for biodefense pathogen and pathogen–host data

    PubMed Central

    Zhang, C.; Crasta, O.; Cammer, S.; Will, R.; Kenyon, R.; Sullivan, D.; Yu, Q.; Sun, W.; Jha, R.; Liu, D.; Xue, T.; Zhang, Y.; Moore, M.; McGarvey, P.; Huang, H.; Chen, Y.; Zhang, J.; Mazumder, R.; Wu, C.; Sobral, B.

    2008-01-01

    The NIAID-funded Biodefense Proteomics Resource Center (RC) provides storage, dissemination, visualization and analysis capabilities for the experimental data deposited by seven Proteomics Research Centers (PRCs). The data and its publication is to support researchers working to discover candidates for the next generation of vaccines, therapeutics and diagnostics against NIAID's Category A, B and C priority pathogens. The data includes transcriptional profiles, protein profiles, protein structural data and host–pathogen protein interactions, in the context of the pathogen life cycle in vivo and in vitro. The database has stored and supported host or pathogen data derived from Bacillus, Brucella, Cryptosporidium, Salmonella, SARS, Toxoplasma, Vibrio and Yersinia, human tissue libraries, and mouse macrophages. These publicly available data cover diverse data types such as mass spectrometry, yeast two-hybrid (Y2H), gene expression profiles, X-ray and NMR determined protein structures and protein expression clones. The growing database covers over 23 000 unique genes/proteins from different experiments and organisms. All of the genes/proteins are annotated and integrated across experiments using UniProt Knowledgebase (UniProtKB) accession numbers. The web-interface for the database enables searching, querying and downloading at the level of experiment, group and individual gene(s)/protein(s) via UniProtKB accession numbers or protein function keywords. The system is accessible at http://www.proteomicsresource.org/. PMID:17984082

  6. Database resources of the National Center for Biotechnology Information.

    PubMed

    Sayers, Eric W; Barrett, Tanya; Benson, Dennis A; Bolton, Evan; Bryant, Stephen H; Canese, Kathi; Chetvernin, Vyacheslav; Church, Deanna M; Dicuccio, Michael; Federhen, Scott; Feolo, Michael; Fingerman, Ian M; Geer, Lewis Y; Helmberg, Wolfgang; Kapustin, Yuri; Krasnov, Sergey; Landsman, David; Lipman, David J; Lu, Zhiyong; Madden, Thomas L; Madej, Tom; Maglott, Donna R; Marchler-Bauer, Aron; Miller, Vadim; Karsch-Mizrachi, Ilene; Ostell, James; Panchenko, Anna; Phan, Lon; Pruitt, Kim D; Schuler, Gregory D; Sequeira, Edwin; Sherry, Stephen T; Shumway, Martin; Sirotkin, Karl; Slotta, Douglas; Souvorov, Alexandre; Starchenko, Grigory; Tatusova, Tatiana A; Wagner, Lukas; Wang, Yanli; Wilbur, W John; Yaschenko, Eugene; Ye, Jian

    2012-01-01

    In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data made available through the NCBI Website. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central (PMC), Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Primer-BLAST, COBALT, Splign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, dbVar, Epigenomics, Genome and related tools, the Map Viewer, Model Maker, Evidence Viewer, Trace Archive, Sequence Read Archive, BioProject, BioSample, Retroviral Genotyping Tools, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus (GEO), Probe, Online Mendelian Inheritance in Animals (OMIA), the Molecular Modeling Database (MMDB), the Conserved Domain Database (CDD), the Conserved Domain Architecture Retrieval Tool (CDART), Biosystems, Protein Clusters and the PubChem suite of small molecule databases. Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.

  7. Database resources of the National Center for Biotechnology Information

    PubMed Central

    Acland, Abigail; Agarwala, Richa; Barrett, Tanya; Beck, Jeff; Benson, Dennis A.; Bollin, Colleen; Bolton, Evan; Bryant, Stephen H.; Canese, Kathi; Church, Deanna M.; Clark, Karen; DiCuccio, Michael; Dondoshansky, Ilya; Federhen, Scott; Feolo, Michael; Geer, Lewis Y.; Gorelenkov, Viatcheslav; Hoeppner, Marilu; Johnson, Mark; Kelly, Christopher; Khotomlianski, Viatcheslav; Kimchi, Avi; Kimelman, Michael; Kitts, Paul; Krasnov, Sergey; Kuznetsov, Anatoliy; Landsman, David; Lipman, David J.; Lu, Zhiyong; Madden, Thomas L.; Madej, Tom; Maglott, Donna R.; Marchler-Bauer, Aron; Karsch-Mizrachi, Ilene; Murphy, Terence; Ostell, James; O'Sullivan, Christopher; Panchenko, Anna; Phan, Lon; Pruitt, Don Preussm Kim D.; Rubinstein, Wendy; Sayers, Eric W.; Schneider, Valerie; Schuler, Gregory D.; Sequeira, Edwin; Sherry, Stephen T.; Shumway, Martin; Sirotkin, Karl; Siyan, Karanjit; Slotta, Douglas; Soboleva, Alexandra; Soussov, Vladimir; Starchenko, Grigory; Tatusova, Tatiana A.; Trawick, Bart W.; Vakatov, Denis; Wang, Yanli; Ward, Minghong; John Wilbur, W.; Yaschenko, Eugene; Zbicz, Kerry

    2014-01-01

    In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI, http://www.ncbi.nlm.nih.gov) provides analysis and retrieval resources for the data in GenBank and other biological data made available through the NCBI Web site. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central, PubReader, Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link, Primer-BLAST, COBALT, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, dbVar, Epigenomics, the Genetic Testing Registry, Genome and related tools, the Map Viewer, Trace Archive, Sequence Read Archive, BioProject, BioSample, ClinVar, MedGen, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus, Probe, Online Mendelian Inheritance in Animals, the Molecular Modeling Database, the Conserved Domain Database, the Conserved Domain Architecture Retrieval Tool, Biosystems, Protein Clusters and the PubChem suite of small molecule databases. Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All these resources can be accessed through the NCBI home page. PMID:24259429

  8. Functional Interaction Network Construction and Analysis for Disease Discovery.

    PubMed

    Wu, Guanming; Haw, Robin

    2017-01-01

    Network-based approaches project seemingly unrelated genes or proteins onto a large-scale network context, therefore providing a holistic visualization and analysis platform for genomic data generated from high-throughput experiments, reducing the dimensionality of data via using network modules and increasing the statistic analysis power. Based on the Reactome database, the most popular and comprehensive open-source biological pathway knowledgebase, we have developed a highly reliable protein functional interaction network covering around 60 % of total human genes and an app called ReactomeFIViz for Cytoscape, the most popular biological network visualization and analysis platform. In this chapter, we describe the detailed procedures on how this functional interaction network is constructed by integrating multiple external data sources, extracting functional interactions from human curated pathway databases, building a machine learning classifier called a Naïve Bayesian Classifier, predicting interactions based on the trained Naïve Bayesian Classifier, and finally constructing the functional interaction database. We also provide an example on how to use ReactomeFIViz for performing network-based data analysis for a list of genes.

  9. HIV Molecular Immunology 2014

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yusim, Karina; Korber, Bette Tina Marie; Barouch, Dan

    HIV Molecular Immunology is a companion volume to HIV Sequence Compendium. This publication, the 2014 edition, is the PDF version of the web-based HIV Immunology Database (http://www.hiv.lanl.gov/content/immunology/). The web interface for this relational database has many search options, as well as interactive tools to help immunologists design reagents and interpret their results. In the HIV Immunology Database, HIV-specific B-cell and T-cell responses are summarized and annotated. Immunological responses are divided into three parts, CTL, T helper, and antibody. Within these parts, defined epitopes are organized by protein and binding sites within each protein, moving from left to right through themore » coding regions spanning the HIV genome. We include human responses to natural HIV infections, as well as vaccine studies in a range of animal models and human trials. Responses that are not specifically defined, such as responses to whole proteins or monoclonal antibody responses to discontinuous epitopes, are summarized at the end of each protein section. Studies describing general HIV responses to the virus, but not to any specific protein, are included at the end of each part. The annotation includes information such as crossreactivity, escape mutations, antibody sequence, TCR usage, functional domains that overlap with an epitope, immune response associations with rates of progression and therapy, and how specific epitopes were experimentally defined. Basic information such as HLA specificities for T-cell epitopes, isotypes of monoclonal antibodies, and epitope sequences are included whenever possible. All studies that we can find that incorporate the use of a specific monoclonal antibody are included in the entry for that antibody. A single T-cell epitope can have multiple entries, generally one entry per study. Finally, maps of all defined linear epitopes relative to the HXB2 reference proteins are provided.« less

  10. Novel Gbeta Mimic Kelch Proteins (Gpb1 and Gpb2 Connect G-Protein Signaling to Ras via Yeast Neurofibromin Homologs Ira1 and Ira2: A Model for Human NF1

    DTIC Science & Technology

    2008-03-01

    tractable fungal model system, Cryptococcus neoformans, and identified two kelch repeat homologs that are involved in mating (Kem1 and Kem2). To...find kelch-repeat proteins involved in G protein signaling, Cryptococcus homologues of Gpb1/2, which interacts with and negatively regulates the G...protein alpha subunit, Gpa2, in S. cerevisiae, were searched by BLAST (tblastn) in Cryptococcus genome database of serotype A (Duke University Medical

  11. Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders.

    PubMed

    Hamosh, Ada; Scott, Alan F; Amberger, Joanna S; Bocchini, Carol A; McKusick, Victor A

    2005-01-01

    Online Mendelian Inheritance in Man (OMIM) is a comprehensive, authoritative and timely knowledgebase of human genes and genetic disorders compiled to support human genetics research and education and the practice of clinical genetics. Started by Dr Victor A. McKusick as the definitive reference Mendelian Inheritance in Man, OMIM (http://www.ncbi.nlm.nih.gov/omim/) is now distributed electronically by the National Center for Biotechnology Information, where it is integrated with the Entrez suite of databases. Derived from the biomedical literature, OMIM is written and edited at Johns Hopkins University with input from scientists and physicians around the world. Each OMIM entry has a full-text summary of a genetically determined phenotype and/or gene and has numerous links to other genetic databases such as DNA and protein sequence, PubMed references, general and locus-specific mutation databases, HUGO nomenclature, MapViewer, GeneTests, patient support groups and many others. OMIM is an easy and straightforward portal to the burgeoning information in human genetics.

  12. BloodChIP: a database of comparative genome-wide transcription factor binding profiles in human blood cells.

    PubMed

    Chacon, Diego; Beck, Dominik; Perera, Dilmi; Wong, Jason W H; Pimanda, John E

    2014-01-01

    The BloodChIP database (http://www.med.unsw.edu.au/CRCWeb.nsf/page/BloodChIP) supports exploration and visualization of combinatorial transcription factor (TF) binding at a particular locus in human CD34-positive and other normal and leukaemic cells or retrieval of target gene sets for user-defined combinations of TFs across one or more cell types. Increasing numbers of genome-wide TF binding profiles are being added to public repositories, and this trend is likely to continue. For the power of these data sets to be fully harnessed by experimental scientists, there is a need for these data to be placed in context and easily accessible for downstream applications. To this end, we have built a user-friendly database that has at its core the genome-wide binding profiles of seven key haematopoietic TFs in human stem/progenitor cells. These binding profiles are compared with binding profiles in normal differentiated and leukaemic cells. We have integrated these TF binding profiles with chromatin marks and expression data in normal and leukaemic cell fractions. All queries can be exported into external sites to construct TF-gene and protein-protein networks and to evaluate the association of genes with cellular processes and tissue expression.

  13. CPLA 1.0: an integrated database of protein lysine acetylation

    PubMed Central

    Liu, Zexian; Cao, Jun; Gao, Xinjiao; Zhou, Yanhong; Wen, Longping; Yang, Xiangjiao; Yao, Xuebiao; Ren, Jian; Xue, Yu

    2011-01-01

    As a reversible post-translational modification (PTM) discovered decades ago, protein lysine acetylation was known for its regulation of transcription through the modification of histones. Recent studies discovered that lysine acetylation targets broad substrates and especially plays an essential role in cellular metabolic regulation. Although acetylation is comparable with other major PTMs such as phosphorylation, an integrated resource still remains to be developed. In this work, we presented the compendium of protein lysine acetylation (CPLA) database for lysine acetylated substrates with their sites. From the scientific literature, we manually collected 7151 experimentally identified acetylation sites in 3311 targets. We statistically studied the regulatory roles of lysine acetylation by analyzing the Gene Ontology (GO) and InterPro annotations. Combined with protein–protein interaction information, we systematically discovered a potential human lysine acetylation network (HLAN) among histone acetyltransferases (HATs), substrates and histone deacetylases (HDACs). In particular, there are 1862 triplet relationships of HAT-substrate-HDAC retrieved from the HLAN, at least 13 of which were previously experimentally verified. The online services of CPLA database was implemented in PHP + MySQL + JavaScript, while the local packages were developed in JAVA 1.5 (J2SE 5.0). The CPLA database is freely available for all users at: http://cpla.biocuckoo.org. PMID:21059677

  14. Genomic identification of potential targets unique to Candida albicans for the discovery of antifungal agents.

    PubMed

    Tripathi, Himanshu; Luqman, Suaib; Meena, Abha; Khan, Feroz

    2014-01-01

    Despite of modern antifungal therapy, the mortality rates of invasive infection with human fungal pathogen Candida albicans are up to 40%. Studies suggest that drug resistance in the three most common species of human fungal pathogens viz., C. albicans, Aspergillus fumigatus (causing mortality rate up to 90%) and Cryptococcus neoformans (causing mortality rate up to 70%) is due to mutations in the target enzymes or high expression of drug transporter genes. Drug resistance in human fungal pathogens has led to an imperative need for the identification of new targets unique to fungal pathogens. In the present study, we have used a comparative genomics approach to find out potential target proteins unique to C. albicans, an opportunistic fungus responsible for severe infection in immune-compromised human. Interestingly, many target proteins of existing antifungal agents showed orthologs in human cells. To identify unique proteins, we have compared proteome of C. albicans [SC5314] i.e., 14,633 total proteins retrieved from the RefSeq database of NCBI, USA with proteome of human and non-pathogenic yeast Saccharomyces cerevisiae. Results showed that 4,568 proteins were identified unique to C. albicans as compared to those of human and later when these unique proteins were compared with S. cerevisiae proteome, finally 2,161 proteins were identified as unique proteins and after removing repeats total 1,618 unique proteins (42 functionally known, 1,566 hypothetical and 10 unknown) were selected as potential antifungal drug targets unique to C. albicans.

  15. Transporter taxonomy - a comparison of different transport protein classification schemes.

    PubMed

    Viereck, Michael; Gaulton, Anna; Digles, Daniela; Ecker, Gerhard F

    2014-06-01

    Currently, there are more than 800 well characterized human membrane transport proteins (including channels and transporters) and there are estimates that about 10% (approx. 2000) of all human genes are related to transport. Membrane transport proteins are of interest as potential drug targets, for drug delivery, and as a cause of side effects and drug–drug interactions. In light of the development of Open PHACTS, which provides an open pharmacological space, we analyzed selected membrane transport protein classification schemes (Transporter Classification Database, ChEMBL, IUPHAR/BPS Guide to Pharmacology, and Gene Ontology) for their ability to serve as a basis for pharmacology driven protein classification. A comparison of these membrane transport protein classification schemes by using a set of clinically relevant transporters as use-case reveals the strengths and weaknesses of the different taxonomy approaches.

  16. Discovery of candidate KEN-box motifs using cell cycle keyword enrichment combined with native disorder prediction and motif conservation.

    PubMed

    Michael, Sushama; Travé, Gilles; Ramu, Chenna; Chica, Claudia; Gibson, Toby J

    2008-02-15

    KEN-box-mediated target selection is one of the mechanisms used in the proteasomal destruction of mitotic cell cycle proteins via the APC/C complex. While annotating the Eukaryotic Linear Motif resource (ELM, http://elm.eu.org/), we found that KEN motifs were significantly enriched in human protein entries with cell cycle keywords in the UniProt/Swiss-Prot database-implying that KEN-boxes might be more common than reported. Matches to short linear motifs in protein database searches are not, per se, significant. KEN-box enrichment with cell cycle Gene Ontology terms suggests that collectively these motifs are functional but does not prove that any given instance is so. Candidates were surveyed for native disorder prediction using GlobPlot and IUPred and for motif conservation in homologues. Among >25 strong new candidates, the most notable are human HIPK2, CHFR, CDC27, Dab2, Upf2, kinesin Eg5, DNA Topoisomerase 1 and yeast Cdc5 and Swi5. A similar number of weaker candidates were present. These proteins have yet to be tested for APC/C targeted destruction, providing potential new avenues of research.

  17. MAPU: Max-Planck Unified database of organellar, cellular, tissue and body fluid proteomes.

    PubMed

    Zhang, Yanling; Zhang, Yong; Adachi, Jun; Olsen, Jesper V; Shi, Rong; de Souza, Gustavo; Pasini, Erica; Foster, Leonard J; Macek, Boris; Zougman, Alexandre; Kumar, Chanchal; Wisniewski, Jacek R; Jun, Wang; Mann, Matthias

    2007-01-01

    Mass spectrometry (MS)-based proteomics has become a powerful technology to map the protein composition of organelles, cell types and tissues. In our department, a large-scale effort to map these proteomes is complemented by the Max-Planck Unified (MAPU) proteome database. MAPU contains several body fluid proteomes; including plasma, urine, and cerebrospinal fluid. Cell lines have been mapped to a depth of several thousand proteins and the red blood cell proteome has also been analyzed in depth. The liver proteome is represented with 3200 proteins. By employing high resolution MS and stringent validation criteria, false positive identification rates in MAPU are lower than 1:1000. Thus MAPU datasets can serve as reference proteomes in biomarker discovery. MAPU contains the peptides identifying each protein, measured masses, scores and intensities and is freely available at http://www.mapuproteome.com using a clickable interface of cell or body parts. Proteome data can be queried across proteomes by protein name, accession number, sequence similarity, peptide sequence and annotation information. More than 4500 mouse and 2500 human proteins have already been identified in at least one proteome. Basic annotation information and links to other public databases are provided in MAPU and we plan to add further analysis tools.

  18. dbPAF: an integrative database of protein phosphorylation in animals and fungi.

    PubMed

    Ullah, Shahid; Lin, Shaofeng; Xu, Yang; Deng, Wankun; Ma, Lili; Zhang, Ying; Liu, Zexian; Xue, Yu

    2016-03-24

    Protein phosphorylation is one of the most important post-translational modifications (PTMs) and regulates a broad spectrum of biological processes. Recent progresses in phosphoproteomic identifications have generated a flood of phosphorylation sites, while the integration of these sites is an urgent need. In this work, we developed a curated database of dbPAF, containing known phosphorylation sites in H. sapiens, M. musculus, R. norvegicus, D. melanogaster, C. elegans, S. pombe and S. cerevisiae. From the scientific literature and public databases, we totally collected and integrated 54,148 phosphoproteins with 483,001 phosphorylation sites. Multiple options were provided for accessing the data, while original references and other annotations were also present for each phosphoprotein. Based on the new data set, we computationally detected significantly over-represented sequence motifs around phosphorylation sites, predicted potential kinases that are responsible for the modification of collected phospho-sites, and evolutionarily analyzed phosphorylation conservation states across different species. Besides to be largely consistent with previous reports, our results also proposed new features of phospho-regulation. Taken together, our database can be useful for further analyses of protein phosphorylation in human and other model organisms. The dbPAF database was implemented in PHP + MySQL and freely available at http://dbpaf.biocuckoo.org.

  19. (abstract) Modeling Protein Families and Human Genes: Hidden Markov Models and a Little Beyond

    NASA Technical Reports Server (NTRS)

    Baldi, Pierre

    1994-01-01

    We will first give a brief overview of Hidden Markov Models (HMMs) and their use in Computational Molecular Biology. In particular, we will describe a detailed application of HMMs to the G-Protein-Coupled-Receptor Superfamily. We will also describe a number of analytical results on HMMs that can be used in discrimination tests and database mining. We will then discuss the limitations of HMMs and some new directions of research. We will conclude with some recent results on the application of HMMs to human gene modeling and parsing.

  20. The Zebrafish Model Organism Database: new support for human disease models, mutation details, gene expression phenotypes and searching

    PubMed Central

    Howe, Douglas G.; Bradford, Yvonne M.; Eagle, Anne; Fashena, David; Frazer, Ken; Kalita, Patrick; Mani, Prita; Martin, Ryan; Moxon, Sierra Taylor; Paddock, Holly; Pich, Christian; Ramachandran, Sridhar; Ruzicka, Leyla; Schaper, Kevin; Shao, Xiang; Singer, Amy; Toro, Sabrina; Van Slyke, Ceri; Westerfield, Monte

    2017-01-01

    The Zebrafish Model Organism Database (ZFIN; http://zfin.org) is the central resource for zebrafish (Danio rerio) genetic, genomic, phenotypic and developmental data. ZFIN curators provide expert manual curation and integration of comprehensive data involving zebrafish genes, mutants, transgenic constructs and lines, phenotypes, genotypes, gene expressions, morpholinos, TALENs, CRISPRs, antibodies, anatomical structures, models of human disease and publications. We integrate curated, directly submitted, and collaboratively generated data, making these available to zebrafish research community. Among the vertebrate model organisms, zebrafish are superbly suited for rapid generation of sequence-targeted mutant lines, characterization of phenotypes including gene expression patterns, and generation of human disease models. The recent rapid adoption of zebrafish as human disease models is making management of these data particularly important to both the research and clinical communities. Here, we describe recent enhancements to ZFIN including use of the zebrafish experimental conditions ontology, ‘Fish’ records in the ZFIN database, support for gene expression phenotypes, models of human disease, mutation details at the DNA, RNA and protein levels, and updates to the ZFIN single box search. PMID:27899582

  1. Proteomic analysis of bovine nucleolus.

    PubMed

    Patel, Amrutlal K; Olson, Doug; Tikoo, Suresh K

    2010-09-01

    Nucleolus is the most prominent subnuclear structure, which performs a wide variety of functions in the eukaryotic cellular processes. In order to understand the structural and functional role of the nucleoli in bovine cells, we analyzed the proteomic composition of the bovine nucleoli. The nucleoli were isolated from Madin Darby bovine kidney cells and subjected to proteomic analysis by LC-MS/MS after fractionation by SDS-PAGE and strong cation exchange chromatography. Analysis of the data using the Mascot database search and the GPM database search identified 311 proteins in the bovine nucleoli, which contained 22 proteins previously not identified in the proteomic analysis of human nucleoli. Analysis of the identified proteins using the GoMiner software suggested that the bovine nucleoli contained proteins involved in ribosomal biogenesis, cell cycle control, transcriptional, translational and post-translational regulation, transport, and structural organization. Copyright © 2010 Beijing Genomics Institute. Published by Elsevier Ltd. All rights reserved.

  2. Applicability of computational systems biology in toxicology.

    PubMed

    Kongsbak, Kristine; Hadrup, Niels; Audouze, Karine; Vinggaard, Anne Marie

    2014-07-01

    Systems biology as a research field has emerged within the last few decades. Systems biology, often defined as the antithesis of the reductionist approach, integrates information about individual components of a biological system. In integrative systems biology, large data sets from various sources and databases are used to model and predict effects of chemicals on, for instance, human health. In toxicology, computational systems biology enables identification of important pathways and molecules from large data sets; tasks that can be extremely laborious when performed by a classical literature search. However, computational systems biology offers more advantages than providing a high-throughput literature search; it may form the basis for establishment of hypotheses on potential links between environmental chemicals and human diseases, which would be very difficult to establish experimentally. This is possible due to the existence of comprehensive databases containing information on networks of human protein-protein interactions and protein-disease associations. Experimentally determined targets of the specific chemical of interest can be fed into these networks to obtain additional information that can be used to establish hypotheses on links between the chemical and human diseases. Such information can also be applied for designing more intelligent animal/cell experiments that can test the established hypotheses. Here, we describe how and why to apply an integrative systems biology method in the hypothesis-generating phase of toxicological research. © 2014 Nordic Association for the Publication of BCPT (former Nordic Pharmacological Society).

  3. Sequence conservation from human to prokaryotes of Surf1, a protein involved in cytochrome c oxidase assembly, deficient in Leigh syndrome.

    PubMed

    Poyau, A; Buchet, K; Godinot, C

    1999-12-03

    The human SURF1 gene encoding a protein involved in cytochrome c oxidase (COX) assembly, is mutated in most patients presenting Leigh syndrome associated with COX deficiency. Proteins homologous to the human Surf1 have been identified in nine eukaryotes and six prokaryotes using database alignment tools, structure prediction and/or cDNA sequencing. Their sequence comparison revealed a remarkable Surf1 conservation during evolution and put forward at least four highly conserved domains that should be essential for Surf1 function. In Paracoccus denitrificans, the Surf1 homologue is found in the quinol oxidase operon, suggesting that Surf1 is associated with a primitive quinol oxidase which belongs to the same superfamily as cytochrome oxidase.

  4. Genes2Networks: connecting lists of gene symbols using mammalian protein interactions databases.

    PubMed

    Berger, Seth I; Posner, Jeremy M; Ma'ayan, Avi

    2007-10-04

    In recent years, mammalian protein-protein interaction network databases have been developed. The interactions in these databases are either extracted manually from low-throughput experimental biomedical research literature, extracted automatically from literature using techniques such as natural language processing (NLP), generated experimentally using high-throughput methods such as yeast-2-hybrid screens, or interactions are predicted using an assortment of computational approaches. Genes or proteins identified as significantly changing in proteomic experiments, or identified as susceptibility disease genes in genomic studies, can be placed in the context of protein interaction networks in order to assign these genes and proteins to pathways and protein complexes. Genes2Networks is a software system that integrates the content of ten mammalian interaction network datasets. Filtering techniques to prune low-confidence interactions were implemented. Genes2Networks is delivered as a web-based service using AJAX. The system can be used to extract relevant subnetworks created from "seed" lists of human Entrez gene symbols. The output includes a dynamic linkable three color web-based network map, with a statistical analysis report that identifies significant intermediate nodes used to connect the seed list. Genes2Networks is powerful web-based software that can help experimental biologists to interpret lists of genes and proteins such as those commonly produced through genomic and proteomic experiments, as well as lists of genes and proteins associated with disease processes. This system can be used to find relationships between genes and proteins from seed lists, and predict additional genes or proteins that may play key roles in common pathways or protein complexes.

  5. Identification of StARD3 as a Lutein-binding Protein in the Macula of the Primate Retina†

    PubMed Central

    Li, Binxing; Vachali, Preejith; Frederick, Jeanne M.; Bernstein, Paul S.

    2011-01-01

    Lutein, zeaxanthin and their metabolites are the xanthophyll carotenoids that form the macular pigment of the human retina. Epidemiological evidence suggests that high levels of these carotenoids in the diet, serum and macula are associated with decreased risk of age-related macular degeneration (AMD), and the AREDS2 study is prospectively testing this hypothesis. Understanding the biochemical mechanisms underlying the selective uptakes of lutein and zeaxanthin into the human macula may provide important insights into the physiology of the human macula in health and disease. GSTP1 is the macular zeaxanthin-binding protein, but the identity of the human macular lutein-binding protein has remained elusive. Prior identification of the silkworm lutein-binding protein (CBP) as a member of the steroidogenic acute regulatory domain (StARD) protein family, and selective labeling of monkey photoreceptor inner segments by anti-CBP antibody provided an important clue toward identifying the primate retina lutein-binding protein. Homology of CBP to all 15 human StARD proteins was analyzed using database searches, western blotting and immunohistochemistry, and we here provide evidence to identify StARD3 (also known as MLN64) as a human retinal lutein-binding protein. Further, recombinant StARD3 selectively binds lutein with high affinity (KD = 0.45 micromolar) when assessed by surface plasmon resonance (SPR) binding assays. Our results demonstrate previously unrecognized, specific interactions of StARD3 with lutein and provide novel avenues to explore its roles in human macular physiology and disease. PMID:21322544

  6. Screening of missing proteins in the human liver proteome by improved MRM-approach-based targeted proteomics.

    PubMed

    Chen, Chen; Liu, Xiaohui; Zheng, Weimin; Zhang, Lei; Yao, Jun; Yang, Pengyuan

    2014-04-04

    To completely annotate the human genome, the task of identifying and characterizing proteins that currently lack mass spectrometry (MS) evidence is inevitable and urgent. In this study, as the first effort to screen missing proteins in large scale, we developed an approach based on SDS-PAGE followed by liquid chromatography-multiple reaction monitoring (LC-MRM), for screening of those missing proteins with only a single peptide hit in the previous liver proteome data set. Proteins extracted from normal human liver were separated in SDS-PAGE and digested in split gel slice, and the resulting digests were then subjected to LC-schedule MRM analysis. The MRM assays were developed through synthesized crude peptides for target peptides. In total, the expressions of 57 target proteins were confirmed from 185 MRM assays in normal human liver tissues. Among the proved 57 one-hit wonders, 50 proteins are of the minimally redundant set in the PeptideAtlas database, 7 proteins even have none MS-based information previously in various biological processes. We conclude that our SDS-PAGE-MRM workflow can be a powerful approach to screen missing or poorly characterized proteins in different samples and to provide their quantity if detected. The MRM raw data have been uploaded to ISB/SRM Atlas/PASSEL (PXD000648).

  7. IMGT/3Dstructure-DB and IMGT/StructuralQuery, a database and a tool for immunoglobulin, T cell receptor and MHC structural data

    PubMed Central

    Kaas, Quentin; Ruiz, Manuel; Lefranc, Marie-Paule

    2004-01-01

    IMGT/3Dstructure-DB and IMGT/Structural-Query are a novel 3D structure database and a new tool for immunological proteins. They are part of IMGT, the international ImMunoGenetics information system®, a high-quality integrated knowledge resource specializing in immunoglobulins (IG), T cell receptors (TR), major histocompatibility complex (MHC) and related proteins of the immune system (RPI) of human and other vertebrate species, which consists of databases, Web resources and interactive on-line tools. IMGT/3Dstructure-DB data are described according to the IMGT Scientific chart rules based on the IMGT-ONTOLOGY concepts. IMGT/3Dstructure-DB provides IMGT gene and allele identification of IG, TR and MHC proteins with known 3D structures, domain delimitations, amino acid positions according to the IMGT unique numbering and renumbered coordinate flat files. Moreover IMGT/3Dstructure-DB provides 2D graphical representations (or Collier de Perles) and results of contact analysis. The IMGT/StructuralQuery tool allows search of this database based on specific structural characteristics. IMGT/3Dstructure-DB and IMGT/StructuralQuery are freely available at http://imgt.cines.fr. PMID:14681396

  8. IDPT: Insights into potential intrinsically disordered proteins through transcriptomic analysis of genes for prostate carcinoma epigenetic data.

    PubMed

    Mallik, Saurav; Sen, Sagnik; Maulik, Ujjwal

    2016-07-15

    Involvement of intrinsically disordered proteins (IDPs) with various dreadful diseases like cancer is an interesting research topic. In order to gain novel insights into the regulation of IDPs, in this article, we perform a transcriptomic analysis of mRNAs (genes) for transcripts encoding IDPs on a human multi-omics prostate carcinoma dataset having both gene expression and methylation data. In this regard, firstly the genes that consist of both the expression and methylation data, and that are corresponding to the cancer-related prostate-tissue-specific disordered proteins of MobiDb database, are selected. We apply standard t-test for determining differentially expressed genes as well as differentially methylated genes. A network having these genes and their targeter miRNAs from Diana Tarbase v7.0 database and corresponding Transcription Factors from TRANSFAC and ITFP databases, is then built. Thereafter, we perform literature search, and KEGG pathway and Gene Ontology analyses using DAVID database. Finally, we report several significant potential gene-markers (with the corresponding IDPs) that have inverse relationship between differential expression and methylation patterns, and that are hub genes of the TF-miRNA-gene network. Copyright © 2016 Elsevier B.V. All rights reserved.

  9. SNPdbe: constructing an nsSNP functional impacts database.

    PubMed

    Schaefer, Christian; Meier, Alice; Rost, Burkhard; Bromberg, Yana

    2012-02-15

    Many existing databases annotate experimentally characterized single nucleotide polymorphisms (SNPs). Each non-synonymous SNP (nsSNP) changes one amino acid in the gene product (single amino acid substitution;SAAS). This change can either affect protein function or be neutral in that respect. Most polymorphisms lack experimental annotation of their functional impact. Here, we introduce SNPdbe-SNP database of effects, with predictions of computationally annotated functional impacts of SNPs. Database entries represent nsSNPs in dbSNP and 1000 Genomes collection, as well as variants from UniProt and PMD. SAASs come from >2600 organisms; 'human' being the most prevalent. The impact of each SAAS on protein function is predicted using the SNAP and SIFT algorithms and augmented with experimentally derived function/structure information and disease associations from PMD, OMIM and UniProt. SNPdbe is consistently updated and easily augmented with new sources of information. The database is available as an MySQL dump and via a web front end that allows searches with any combination of organism names, sequences and mutation IDs. http://www.rostlab.org/services/snpdbe.

  10. Protein annotation from protein interaction networks and Gene Ontology.

    PubMed

    Nguyen, Cao D; Gardiner, Katheleen J; Cios, Krzysztof J

    2011-10-01

    We introduce a novel method for annotating protein function that combines Naïve Bayes and association rules, and takes advantage of the underlying topology in protein interaction networks and the structure of graphs in the Gene Ontology. We apply our method to proteins from the Human Protein Reference Database (HPRD) and show that, in comparison with other approaches, it predicts protein functions with significantly higher recall with no loss of precision. Specifically, it achieves 51% precision and 60% recall versus 45% and 26% for Majority and 24% and 61% for χ²-statistics, respectively. Copyright © 2011 Elsevier Inc. All rights reserved.

  11. Evidence for a vast peptide overlap between West Nile virus and human proteomes.

    PubMed

    Capone, Giovanni; Pagoni, Maria; Delfino, Antonella Pesce; Kanduc, Darja

    2013-10-01

    The primary amino acid sequence of West Nile virus (WNV) polyprotein, GenBank accession number M12294, was analyzed by computional biology. WNV is a mosquito-borne neurotropic flavivirus that has emerged globally as a significant cause of viral encephalitis in humans. Using pentapeptides as scanning units and the perfect peptide match program from PIR International Protein Sequence Database, we compared the WNV polyprotein and the human proteome. WNV polyprotein showed significant sequence similarities to a number of human proteins. Several of these proteins are involved in embryogenesis, neurite outgrowth, cortical neuron branching, formation of mature synapses, semaphorin interactions, and voltage dependent L-type calcium channel subunits. The biocomputional study suggest that common amino acid segments might represent a potential platform for further studies on the neurological pathophysiology of WNV infections. © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  12. OpenFluDB, a database for human and animal influenza virus

    PubMed Central

    Liechti, Robin; Gleizes, Anne; Kuznetsov, Dmitry; Bougueleret, Lydie; Le Mercier, Philippe; Bairoch, Amos; Xenarios, Ioannis

    2010-01-01

    Although research on influenza lasted for more than 100 years, it is still one of the most prominent diseases causing half a million human deaths every year. With the recent observation of new highly pathogenic H5N1 and H7N7 strains, and the appearance of the influenza pandemic caused by the H1N1 swine-like lineage, a collaborative effort to share observations on the evolution of this virus in both animals and humans has been established. The OpenFlu database (OpenFluDB) is a part of this collaborative effort. It contains genomic and protein sequences, as well as epidemiological data from more than 27 000 isolates. The isolate annotations include virus type, host, geographical location and experimentally tested antiviral resistance. Putative enhanced pathogenicity as well as human adaptation propensity are computed from protein sequences. Each virus isolate can be associated with the laboratories that collected, sequenced and submitted it. Several analysis tools including multiple sequence alignment, phylogenetic analysis and sequence similarity maps enable rapid and efficient mining. The contents of OpenFluDB are supplied by direct user submission, as well as by a daily automatic procedure importing data from public repositories. Additionally, a simple mechanism facilitates the export of OpenFluDB records to GenBank. This resource has been successfully used to rapidly and widely distribute the sequences collected during the recent human swine flu outbreak and also as an exchange platform during the vaccine selection procedure. Database URL: http://openflu.vital-it.ch. PMID:20624713

  13. Exploiting the Proteome to Improve the Genome-Wide Genetic Analysis of Epistasis in Common Human Diseases

    PubMed Central

    Pattin, Kristine A.; Moore, Jason H.

    2009-01-01

    One of the central goals of human genetics is the identification of loci with alleles or genotypes that confer increased susceptibility. The availability of dense maps of single-nucleotide polymorphisms (SNPs) along with high-throughput genotyping technologies has set the stage for routine genome-wide association studies that are expected to significantly improve our ability to identify susceptibility loci. Before this promise can be realized, there are some significant challenges that need to be addressed. We address here the challenge of detecting epistasis or gene-gene interactions in genome-wide association studies. Discovering epistatic interactions in high dimensional datasets remains a challenge due to the computational complexity resulting from the analysis of all possible combinations of SNPs. One potential way to overcome the computational burden of a genome-wide epistasis analysis would be to devise a logical way to prioritize the many SNPs in a dataset so that the data may be analyzed more efficiently and yet still retain important biological information. One of the strongest demonstrations of the functional relationship between genes is protein-protein interaction. Thus, it is plausible that the expert knowledge extracted from protein interaction databases may allow for a more efficient analysis of genome-wide studies as well as facilitate the biological interpretation of the data. In this review we will discuss the challenges of detecting epistasis in genome-wide genetic studies and the means by which we propose to apply expert knowledge extracted from protein interaction databases to facilitate this process. We explore some of the fundamentals of protein interactions and the databases that are publicly available. PMID:18551320

  14. Database resources of the National Center for Biotechnology Information

    PubMed Central

    Sayers, Eric W.; Barrett, Tanya; Benson, Dennis A.; Bolton, Evan; Bryant, Stephen H.; Canese, Kathi; Chetvernin, Vyacheslav; Church, Deanna M.; DiCuccio, Michael; Federhen, Scott; Feolo, Michael; Fingerman, Ian M.; Geer, Lewis Y.; Helmberg, Wolfgang; Kapustin, Yuri; Krasnov, Sergey; Landsman, David; Lipman, David J.; Lu, Zhiyong; Madden, Thomas L.; Madej, Tom; Maglott, Donna R.; Marchler-Bauer, Aron; Miller, Vadim; Karsch-Mizrachi, Ilene; Ostell, James; Panchenko, Anna; Phan, Lon; Pruitt, Kim D.; Schuler, Gregory D.; Sequeira, Edwin; Sherry, Stephen T.; Shumway, Martin; Sirotkin, Karl; Slotta, Douglas; Souvorov, Alexandre; Starchenko, Grigory; Tatusova, Tatiana A.; Wagner, Lukas; Wang, Yanli; Wilbur, W. John; Yaschenko, Eugene; Ye, Jian

    2012-01-01

    In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data made available through the NCBI Website. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central (PMC), Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Primer-BLAST, COBALT, Splign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, dbVar, Epigenomics, Genome and related tools, the Map Viewer, Model Maker, Evidence Viewer, Trace Archive, Sequence Read Archive, BioProject, BioSample, Retroviral Genotyping Tools, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus (GEO), Probe, Online Mendelian Inheritance in Animals (OMIA), the Molecular Modeling Database (MMDB), the Conserved Domain Database (CDD), the Conserved Domain Architecture Retrieval Tool (CDART), Biosystems, Protein Clusters and the PubChem suite of small molecule databases. Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov. PMID:22140104

  15. Database resources of the National Center for Biotechnology Information

    PubMed Central

    2013-01-01

    In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI, http://www.ncbi.nlm.nih.gov) provides analysis and retrieval resources for the data in GenBank and other biological data made available through the NCBI web site. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central, Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Primer-BLAST, COBALT, Splign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, dbVar, Epigenomics, the Genetic Testing Registry, Genome and related tools, the Map Viewer, Model Maker, Evidence Viewer, Trace Archive, Sequence Read Archive, BioProject, BioSample, Retroviral Genotyping Tools, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus, Probe, Online Mendelian Inheritance in Animals, the Molecular Modeling Database, the Conserved Domain Database, the Conserved Domain Architecture Retrieval Tool, Biosystems, Protein Clusters and the PubChem suite of small molecule databases. Augmenting many of the web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of these resources can be accessed through the NCBI home page. PMID:23193264

  16. HIV Molecular Immunology 2015

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yusim, Karina; Korber, Bette Tina; Brander, Christian

    The scope and purpose of the HIV molecular immunology database: HIV Molecular Immunology is a companion volume to HIV Sequence Compendium. This publication, the 2015 edition, is the PDF version of the web-based HIV Immunology Database (http://www.hiv.lanl.gov/ content/immunology/). The web interface for this relational database has many search options, as well as interactive tools to help immunologists design reagents and interpret their results. In the HIV Immunology Database, HIV-specific B-cell and T-cell responses are summarized and annotated. Immunological responses are divided into three parts, CTL, T helper, and antibody. Within these parts, defined epitopes are organized by protein and bindingmore » sites within each protein, moving from left to right through the coding regions spanning the HIV genome. We include human responses to natural HIV infections, as well as vaccine studies in a range of animal models and human trials. Responses that are not specifically defined, such as responses to whole proteins or monoclonal antibody responses to discontinuous epitopes, are summarized at the end of each protein section. Studies describing general HIV responses to the virus, but not to any specific protein, are included at the end of each part. The annotation includes information such as cross-reactivity, escape mutations, antibody sequence, TCR usage, functional domains that overlap with an epitope, immune response associations with rates of progression and therapy, and how specific epitopes were experimentally defined. Basic information such as HLA specificities for T-cell epitopes, isotypes of monoclonal antibodies, and epitope sequences are included whenever possible. All studies that we can find that incorporate the use of a specific monoclonal antibody are included in the entry for that antibody. A single T-cell epitope can have multiple entries, generally one entry per study. Finally, maps of all defined linear epitopes relative to the HXB2 reference proteins are provided. Alignments of CTL, helper T-cell, and antibody epitopes are available through the search interface on our web site at http:// www.hiv.lanl.gov/content/immunology.« less

  17. Combination of Multiple Spectral Libraries Improves the Current Search Methods Used to Identify Missing Proteins in the Chromosome-Centric Human Proteome Project.

    PubMed

    Cho, Jin-Young; Lee, Hyoung-Joo; Jeong, Seul-Ki; Kim, Kwang-Youl; Kwon, Kyung-Hoon; Yoo, Jong Shin; Omenn, Gilbert S; Baker, Mark S; Hancock, William S; Paik, Young-Ki

    2015-12-04

    Approximately 2.9 billion long base-pair human reference genome sequences are known to encode some 20 000 representative proteins. However, 3000 proteins, that is, ~15% of all proteins, have no or very weak proteomic evidence and are still missing. Missing proteins may be present in rare samples in very low abundance or be only temporarily expressed, causing problems in their detection and protein profiling. In particular, some technical limitations cause missing proteins to remain unassigned. For example, current mass spectrometry techniques have high limits and error rates for the detection of complex biological samples. An insufficient proteome coverage in a reference sequence database and spectral library also raises major issues. Thus, the development of a better strategy that results in greater sensitivity and accuracy in the search for missing proteins is necessary. To this end, we used a new strategy, which combines a reference spectral library search and a simulated spectral library search, to identify missing proteins. We built the human iRefSPL, which contains the original human reference spectral library and additional peptide sequence-spectrum match entries from other species. We also constructed the human simSPL, which contains the simulated spectra of 173 907 human tryptic peptides determined by MassAnalyzer (version 2.3.1). To prove the enhanced analytical performance of the combination of the human iRefSPL and simSPL methods for the identification of missing proteins, we attempted to reanalyze the placental tissue data set (PXD000754). The data from each experiment were analyzed using PeptideProphet, and the results were combined using iProphet. For the quality control, we applied the class-specific false-discovery rate filtering method. All of the results were filtered at a false-discovery rate of <1% at the peptide and protein levels. The quality-controlled results were then cross-checked with the neXtProt DB (2014-09-19 release). The two spectral libraries, iRefSPL and simSPL, were designed to ensure no overlap of the proteome coverage. They were shown to be complementary to spectral library searching and significantly increased the number of matches. From this trial, 12 new missing proteins were identified that passed the following criterion: at least 2 peptides of 7 or more amino acids in length or one of 9 or more amino acids in length with one or more unique sequences. Thus, the iRefSPL and simSPL combination can be used to help identify peptides that have not been detected by conventional sequence database searches with improved sensitivity and a low error rate.

  18. Structural Genomics of Protein Phosphatases

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Almo,S.; Bonanno, J.; Sauder, J.

    The New York SGX Research Center for Structural Genomics (NYSGXRC) of the NIGMS Protein Structure Initiative (PSI) has applied its high-throughput X-ray crystallographic structure determination platform to systematic studies of all human protein phosphatases and protein phosphatases from biomedically-relevant pathogens. To date, the NYSGXRC has determined structures of 21 distinct protein phosphatases: 14 from human, 2 from mouse, 2 from the pathogen Toxoplasma gondii, 1 from Trypanosoma brucei, the parasite responsible for African sleeping sickness, and 2 from the principal mosquito vector of malaria in Africa, Anopheles gambiae. These structures provide insights into both normal and pathophysiologic processes, including transcriptionalmore » regulation, regulation of major signaling pathways, neural development, and type 1 diabetes. In conjunction with the contributions of other international structural genomics consortia, these efforts promise to provide an unprecedented database and materials repository for structure-guided experimental and computational discovery of inhibitors for all classes of protein phosphatases.« less

  19. Competing endogenous RNA and interactome bioinformatic analyses on human telomerase.

    PubMed

    Arancio, Walter; Pizzolanti, Giuseppe; Genovese, Swonild Ilenia; Baiamonte, Concetta; Giordano, Carla

    2014-04-01

    We present a classic interactome bioinformatic analysis and a study on competing endogenous (ce) RNAs for hTERT. The hTERT gene codes for the catalytic subunit and limiting component of the human telomerase complex. Human telomerase reverse transcriptase (hTERT) is essential for the integrity of telomeres. Telomere dysfunctions have been widely reported to be involved in aging, cancer, and cellular senescence. The hTERT gene network has been analyzed using the BioGRID interaction database (http://thebiogrid.org/) and related analysis tools such as Osprey (http://biodata.mshri.on.ca/osprey/servlet/Index) and GeneMANIA (http://genemania.org/). The network of interaction of hTERT transcripts has been further analyzed following the competing endogenous (ce) RNA hypotheses (messenger [m] RNAs cross-talk via micro [mi] RNAs) using the miRWalk database and tools (www.ma.uni-heidelberg.de/apps/zmf/mirwalk/). These analyses suggest a role for Akt, nuclear factor-κB (NF-κB), heat shock protein 90 (HSP90), p70/p80 autoantigen, 14-3-3 proteins, and dynein in telomere functions. Roles for histone acetylation/deacetylation and proteoglycan metabolism are also proposed.

  20. Integrative Annotation of 21,037 Human Genes Validated by Full-Length cDNA Clones

    PubMed Central

    Imanishi, Tadashi; Itoh, Takeshi; Suzuki, Yutaka; O'Donovan, Claire; Fukuchi, Satoshi; Koyanagi, Kanako O; Barrero, Roberto A; Tamura, Takuro; Yamaguchi-Kabata, Yumi; Tanino, Motohiko; Yura, Kei; Miyazaki, Satoru; Ikeo, Kazuho; Homma, Keiichi; Kasprzyk, Arek; Nishikawa, Tetsuo; Hirakawa, Mika; Thierry-Mieg, Jean; Thierry-Mieg, Danielle; Ashurst, Jennifer; Jia, Libin; Nakao, Mitsuteru; Thomas, Michael A; Mulder, Nicola; Karavidopoulou, Youla; Jin, Lihua; Kim, Sangsoo; Yasuda, Tomohiro; Lenhard, Boris; Eveno, Eric; Suzuki, Yoshiyuki; Yamasaki, Chisato; Takeda, Jun-ichi; Gough, Craig; Hilton, Phillip; Fujii, Yasuyuki; Sakai, Hiroaki; Tanaka, Susumu; Amid, Clara; Bellgard, Matthew; Bonaldo, Maria de Fatima; Bono, Hidemasa; Bromberg, Susan K; Brookes, Anthony J; Bruford, Elspeth; Carninci, Piero; Chelala, Claude; Couillault, Christine; de Souza, Sandro J.; Debily, Marie-Anne; Devignes, Marie-Dominique; Dubchak, Inna; Endo, Toshinori; Estreicher, Anne; Eyras, Eduardo; Fukami-Kobayashi, Kaoru; R. Gopinath, Gopal; Graudens, Esther; Hahn, Yoonsoo; Han, Michael; Han, Ze-Guang; Hanada, Kousuke; Hanaoka, Hideki; Harada, Erimi; Hashimoto, Katsuyuki; Hinz, Ursula; Hirai, Momoki; Hishiki, Teruyoshi; Hopkinson, Ian; Imbeaud, Sandrine; Inoko, Hidetoshi; Kanapin, Alexander; Kaneko, Yayoi; Kasukawa, Takeya; Kelso, Janet; Kersey, Paul; Kikuno, Reiko; Kimura, Kouichi; Korn, Bernhard; Kuryshev, Vladimir; Makalowska, Izabela; Makino, Takashi; Mano, Shuhei; Mariage-Samson, Regine; Mashima, Jun; Matsuda, Hideo; Mewes, Hans-Werner; Minoshima, Shinsei; Nagai, Keiichi; Nagasaki, Hideki; Nagata, Naoki; Nigam, Rajni; Ogasawara, Osamu; Ohara, Osamu; Ohtsubo, Masafumi; Okada, Norihiro; Okido, Toshihisa; Oota, Satoshi; Ota, Motonori; Ota, Toshio; Otsuki, Tetsuji; Piatier-Tonneau, Dominique; Poustka, Annemarie; Ren, Shuang-Xi; Saitou, Naruya; Sakai, Katsunaga; Sakamoto, Shigetaka; Sakate, Ryuichi; Schupp, Ingo; Servant, Florence; Sherry, Stephen; Shiba, Rie; Shimizu, Nobuyoshi; Shimoyama, Mary; Simpson, Andrew J; Soares, Bento; Steward, Charles; Suwa, Makiko; Suzuki, Mami; Takahashi, Aiko; Tamiya, Gen; Tanaka, Hiroshi; Taylor, Todd; Terwilliger, Joseph D; Unneberg, Per; Veeramachaneni, Vamsi; Watanabe, Shinya; Wilming, Laurens; Yasuda, Norikazu; Yoo, Hyang-Sook; Stodolsky, Marvin; Makalowski, Wojciech; Go, Mitiko; Nakai, Kenta; Takagi, Toshihisa; Kanehisa, Minoru; Sakaki, Yoshiyuki; Quackenbush, John; Okazaki, Yasushi; Hayashizaki, Yoshihide; Hide, Winston; Chakraborty, Ranajit; Nishikawa, Ken; Sugawara, Hideaki; Tateno, Yoshio; Chen, Zhu; Oishi, Michio; Tonellato, Peter; Apweiler, Rolf; Okubo, Kousaku; Wagner, Lukas; Wiemann, Stefan; Strausberg, Robert L; Isogai, Takao; Auffray, Charles; Nomura, Nobuo; Sugano, Sumio

    2004-01-01

    The human genome sequence defines our inherent biological potential; the realization of the biology encoded therein requires knowledge of the function of each gene. Currently, our knowledge in this area is still limited. Several lines of investigation have been used to elucidate the structure and function of the genes in the human genome. Even so, gene prediction remains a difficult task, as the varieties of transcripts of a gene may vary to a great extent. We thus performed an exhaustive integrative characterization of 41,118 full-length cDNAs that capture the gene transcripts as complete functional cassettes, providing an unequivocal report of structural and functional diversity at the gene level. Our international collaboration has validated 21,037 human gene candidates by analysis of high-quality full-length cDNA clones through curation using unified criteria. This led to the identification of 5,155 new gene candidates. It also manifested the most reliable way to control the quality of the cDNA clones. We have developed a human gene database, called the H-Invitational Database (H-InvDB; http://www.h-invitational.jp/). It provides the following: integrative annotation of human genes, description of gene structures, details of novel alternative splicing isoforms, non-protein-coding RNAs, functional domains, subcellular localizations, metabolic pathways, predictions of protein three-dimensional structure, mapping of known single nucleotide polymorphisms (SNPs), identification of polymorphic microsatellite repeats within human genes, and comparative results with mouse full-length cDNAs. The H-InvDB analysis has shown that up to 4% of the human genome sequence (National Center for Biotechnology Information build 34 assembly) may contain misassembled or missing regions. We found that 6.5% of the human gene candidates (1,377 loci) did not have a good protein-coding open reading frame, of which 296 loci are strong candidates for non-protein-coding RNA genes. In addition, among 72,027 uniquely mapped SNPs and insertions/deletions localized within human genes, 13,215 nonsynonymous SNPs, 315 nonsense SNPs, and 452 indels occurred in coding regions. Together with 25 polymorphic microsatellite repeats present in coding regions, they may alter protein structure, causing phenotypic effects or resulting in disease. The H-InvDB platform represents a substantial contribution to resources needed for the exploration of human biology and pathology. PMID:15103394

  1. Comparative bioinformatics analyses and profiling of lysosome-related organelle proteomes

    NASA Astrophysics Data System (ADS)

    Hu, Zhang-Zhi; Valencia, Julio C.; Huang, Hongzhan; Chi, An; Shabanowitz, Jeffrey; Hearing, Vincent J.; Appella, Ettore; Wu, Cathy

    2007-01-01

    Complete and accurate profiling of cellular organelle proteomes, while challenging, is important for the understanding of detailed cellular processes at the organelle level. Mass spectrometry technologies coupled with bioinformatics analysis provide an effective approach for protein identification and functional interpretation of organelle proteomes. In this study, we have compiled human organelle reference datasets from large-scale proteomic studies and protein databases for seven lysosome-related organelles (LROs), as well as the endoplasmic reticulum and mitochondria, for comparative organelle proteome analysis. Heterogeneous sources of human organelle proteins and rodent homologs are mapped to human UniProtKB protein entries based on ID and/or peptide mappings, followed by functional annotation and categorization using the iProXpress proteomic expression analysis system. Cataloging organelle proteomes allows close examination of both shared and unique proteins among various LROs and reveals their functional relevance. The proteomic comparisons show that LROs are a closely related family of organelles. The shared proteins indicate the dynamic and hybrid nature of LROs, while the unique transmembrane proteins may represent additional candidate marker proteins for LROs. This comparative analysis, therefore, provides a basis for hypothesis formulation and experimental validation of organelle proteins and their functional roles.

  2. Proteomic analysis and comparison of the biopsy and autopsy specimen of human brain temporal lobe.

    PubMed

    He, Sizhi; Wang, Qingsong; He, Jintang; Pu, Hai; Yang, Wei; Ji, Jianguo

    2006-09-01

    The proteomic study on human temporal lobe can help us to understand the physiological function of CNS in normal as well as in pathological state. Proteomic tools are potent for the assessment of protein stability post mortem. In this pilot study, the human temporal lobe biopsy specimen with chronic pharmacoresistant temporal lobe epilepsy (TLE) and autopsy specimen in control were separated by 2-DE. Using MALDI-TOF-MS and MS/MS, 375 protein spots were identified which were the products of 267 genes. Six down-regulated and 23 up-regulated protein spots in the autopsy specimen were ascertained after the gel image analysis with the ImageMaster software. A number of proteins that include neurotransmitter metabolic and glycolytic enzymes, cytoprotective proteins and cytoskeleton were found decreased while the precursor of apolipoprotein A-I increased in the TLE brain. We tried several methods to prepare the protein samples and found that DNase and RNase treatment, ultracentrifugation and Amersham clean-up kit purification can improve gel separation quality. This work optimized the sample preparation method and constructed a primary protein database of human temporal lobe and found some proteins with remarkable level change probably involved in the post-mortem process and chronic pharmacoresistant TLE pathogenesis.

  3. The PBII gene of the human salivary proline-rich protein P-B produces another protein, Q504X8, with an opiorphin homolog, QRGPR.

    PubMed

    Saitoh, Eiichi; Sega, Takuya; Imai, Akane; Isemura, Satoko; Kato, Tetsuo; Ochiai, Akihito; Taniguchi, Masayuki

    2018-04-01

    The NCBI gene database and human-transcriptome database for alternative splicing were used to determine the expression of mRNAs for P-B (SMR3B) and variant form of P-B. The translational product from the former mRNA was identified as the protein named P-B, whereas that from the latter has not yet been elucidated. In the present study, we investigated the expression of P-B and its variant form at the protein level. To identify the variant protein of P-B, (1) cationic proteins with a higher isoelectric point in human pooled whole saliva were purified by a two dimensional liquid chromatography; (2) the peptide fragments generated from the in-solution of all proteins digested with trypsin separated and analyzed by MALDI-TOF-MS; and (3) the presence or absence of P-B in individual saliva was examined by 15% SDS-PAGE. The peptide sequences (I 37 PPPYSCTPNMNNCSR 52 , C 53 HHHHKRHHYPCNYCFCYPK 72 , R 59 HHYPCNYCFCYPK 72 and H 60 HYPCNYCFCYPK 72 ) present in the variant protein of P-B were identified. The peptide sequence (G 6 PYPPGPLAPPQPFGPGFVPPPPPPPYGPGR 36 ) in P-B (or the variant) and sequence (I 37 PPPPPAPYGPGIFPPPPPQP 57 ) in P-B were identified. The sum of the sequences identified indicated a 91.23% sequence identity for P-B and 79.76% for the variant. There were cases in which P-B existed in individual saliva, but there were cases in which it did not exist in individual saliva. The variant protein is produced by excising a non-canonical intron (CC-AC pair) from the 3'-noncoding sequence of the PBII gene. Both P-B and the variant are subject to proteolysis in the oral cavity. Copyright © 2018 Elsevier Ltd. All rights reserved.

  4. LIFEdb: a database for functional genomics experiments integrating information from external sources, and serving as a sample tracking system

    PubMed Central

    Bannasch, Detlev; Mehrle, Alexander; Glatting, Karl-Heinz; Pepperkok, Rainer; Poustka, Annemarie; Wiemann, Stefan

    2004-01-01

    We have implemented LIFEdb (http://www.dkfz.de/LIFEdb) to link information regarding novel human full-length cDNAs generated and sequenced by the German cDNA Consortium with functional information on the encoded proteins produced in functional genomics and proteomics approaches. The database also serves as a sample-tracking system to manage the process from cDNA to experimental read-out and data interpretation. A web interface enables the scientific community to explore and visualize features of the annotated cDNAs and ORFs combined with experimental results, and thus helps to unravel new features of proteins with as yet unknown functions. PMID:14681468

  5. IMGT, the international ImMunoGeneTics information system®

    PubMed Central

    Lefranc, Marie-Paule; Giudicelli, Véronique; Kaas, Quentin; Duprat, Elodie; Jabado-Michaloud, Joumana; Scaviner, Dominique; Ginestoux, Chantal; Clément, Oliver; Chaume, Denys; Lefranc, Gérard

    2005-01-01

    The international ImMunoGeneTics information system® (IMGT) (http://imgt.cines.fr), created in 1989, by the Laboratoire d'ImmunoGénétique Moléculaire LIGM (Université Montpellier II and CNRS) at Montpellier, France, is a high-quality integrated knowledge resource specializing in the immunoglobulins (IGs), T cell receptors (TRs), major histocompatibility complex (MHC) of human and other vertebrates, and related proteins of the immune systems (RPI) that belong to the immunoglobulin superfamily (IgSF) and to the MHC superfamily (MhcSF). IMGT includes several sequence databases (IMGT/LIGM-DB, IMGT/PRIMER-DB, IMGT/PROTEIN-DB and IMGT/MHC-DB), one genome database (IMGT/GENE-DB) and one three-dimensional (3D) structure database (IMGT/3Dstructure-DB), Web resources comprising 8000 HTML pages (IMGT Marie-Paule page), and interactive tools. IMGT data are expertly annotated according to the rules of the IMGT Scientific chart, based on the IMGT-ONTOLOGY concepts. IMGT tools are particularly useful for the analysis of the IG and TR repertoires in normal physiological and pathological situations. IMGT is used in medical research (autoimmune diseases, infectious diseases, AIDS, leukemias, lymphomas, myelomas), veterinary research, biotechnology related to antibody engineering (phage displays, combinatorial libraries, chimeric, humanized and human antibodies), diagnostics (clonalities, detection and follow up of residual diseases) and therapeutical approaches (graft, immunotherapy and vaccinology). IMGT is freely available at http://imgt.cines.fr. PMID:15608269

  6. Dataset of potential targets for Mycobacterium tuberculosis H37Rv through comparative genome analysis.

    PubMed

    Asif, Siddiqui M; Asad, Amir; Faizan, Ahmad; Anjali, Malik S; Arvind, Arya; Neelesh, Kapoor; Hirdesh, Kumar; Sanjay, Kumar

    2009-12-31

    Mycobacterium tuberculosis is the causative agent of the disease, tuberculosis and H37Rv is the most studied clinical strain. We use comparative genome analysis of Mycobacterium tuberculosis H37Rv and human for the identification of potential targets dataset. We used DEG (Database of Essential Genes) to identify essential genes in the H37Rv strain. The analysis shows that 628 of the 3989 genes in Mycobacterium tuberculosis H37Rv were found to be essential of which 324 genes lack similarity to the human genome. Subsequently hypothetical proteins were removed through manual curation. This further resulted in a dataset of 135 proteins with essential function and no homology to human.

  7. Lectins in human pathogenic fungi.

    PubMed

    Gallegos, Belém; Martínez, Ruth; Pérez, Laura; Del Socorro Pina, María; Perez, Eduardo; Hernández, Pedro

    2014-01-01

    Lectins are carbohydrate-binding proteins widely distributed in nature. They constitute a highly diverse group of proteins consisting of many different protein families that are, in general, structurally unrelated. In the last few years, mushroom and other fungal lectins have attracted wide attention due to their antitumour, antiproliferative and immunomodulatory activities. The present mini-review provides concise information about recent developments in understanding lectins from human pathogenic fungi. A bibliographic search was performed in the Science Direct and PubMed databases, using the following keywords "lectin", "fungi", "human" and "pathogenic". Lectins present in fungi have been classified; however, the role played by lectins derived from human pathogenic fungi in infectious processes remains uncertain; thus, this is a scientific field requiring more research. This manuscript is part of the series of works presented at the "V International Workshop: Molecular genetic approaches to the study of human pathogenic fungi" (Oaxaca, Mexico, 2012). Copyright © 2013 Revista Iberoamericana de Micología. Published by Elsevier Espana. All rights reserved.

  8. Genomic atlas of the human plasma proteome.

    PubMed

    Sun, Benjamin B; Maranville, Joseph C; Peters, James E; Stacey, David; Staley, James R; Blackshaw, James; Burgess, Stephen; Jiang, Tao; Paige, Ellie; Surendran, Praveen; Oliver-Williams, Clare; Kamat, Mihir A; Prins, Bram P; Wilcox, Sheri K; Zimmerman, Erik S; Chi, An; Bansal, Narinder; Spain, Sarah L; Wood, Angela M; Morrell, Nicholas W; Bradley, John R; Janjic, Nebojsa; Roberts, David J; Ouwehand, Willem H; Todd, John A; Soranzo, Nicole; Suhre, Karsten; Paul, Dirk S; Fox, Caroline S; Plenge, Robert M; Danesh, John; Runz, Heiko; Butterworth, Adam S

    2018-06-01

    Although plasma proteins have important roles in biological processes and are the direct targets of many drugs, the genetic factors that control inter-individual variation in plasma protein levels are not well understood. Here we characterize the genetic architecture of the human plasma proteome in healthy blood donors from the INTERVAL study. We identify 1,927 genetic associations with 1,478 proteins, a fourfold increase on existing knowledge, including trans associations for 1,104 proteins. To understand the consequences of perturbations in plasma protein levels, we apply an integrated approach that links genetic variation with biological pathway, disease, and drug databases. We show that protein quantitative trait loci overlap with gene expression quantitative trait loci, as well as with disease-associated loci, and find evidence that protein biomarkers have causal roles in disease using Mendelian randomization analysis. By linking genetic factors to diseases via specific proteins, our analyses highlight potential therapeutic targets, opportunities for matching existing drugs with new disease indications, and potential safety concerns for drugs under development.

  9. Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders.

    PubMed

    Hamosh, Ada; Scott, Alan F; Amberger, Joanna; Bocchini, Carol; Valle, David; McKusick, Victor A

    2002-01-01

    Online Mendelian Inheritance in Man (OMIM) is a comprehensive, authoritative and timely knowledgebase of human genes and genetic disorders compiled to support research and education in human genomics and the practice of clinical genetics. Started by Dr Victor A. McKusick as the definitive reference Mendelian Inheritance in Man, OMIM (www.ncbi.nlm.nih.gov/omim) is now distributed electronically by the National Center for Biotechnology Information (NCBI), where it is integrated with the Entrez suite of databases. Derived from the biomedical literature, OMIM is written and edited at Johns Hopkins University with input from scientists and physicians around the world. Each OMIM entry has a full-text summary of a genetically determined phenotype and/or gene and has numerous links to other genetic databases such as DNA and protein sequence, PubMed references, general and locus-specific mutation databases, approved gene nomenclature, and the highly detailed mapviewer, as well as patient support groups and many others. OMIM is an easy and straightforward portal to the burgeoning information in human genetics.

  10. Molecular cloning and expression of the CRISP family of proteins in the boar.

    PubMed

    Vadnais, Melissa L; Foster, Douglas N; Roberts, Kenneth P

    2008-12-01

    The family of mammalian cysteine-rich secretory proteins (CRISP) have been well characterized in the rat, mouse, and human. Here we report the molecular cloning and expression analysis of CRISP1, CRISP2, and CRISP3 in the boar. A partial sequence published in the National Center for Biotechnology Information (NCBI) database was used to derive the full-length sequences for CRISP1 and CRISP2 using rapid amplification of cDNA ends. RT-PCR confirmed the expression of these mRNAs in the boar reproductive tract, and real time RT-PCR showed CRISP1 to be highly expressed throughout the epididymis, with CRISP2 highly expressed in the testis. A search of the porcine genomic sequence in the NCBI database identified a BAC (CH242-199E6) encoding the CRISP1 gene. This BAC is derived from porcine Chromosome 7 and is syntenic with the regions of the mouse, rat, and human genomes encoding the CRISP gene family. This BAC was found to encode a third CRISP protein with a predicted amino acid sequence of high similarity to human CRISP3. Using RT-PCR we show that CRISP3 expression in the boar reproductive tract is confined to the prostate. Recombinant porcine (rp) CRISP2 protein was produced and purified. When incubated with capacitated boar sperm, rpCRISP2 induced an acrosome reaction, consistent with its demonstrated ability to alter the activity of calcium channels.

  11. Quantitative Analysis of Human Salivary Gland-Derived Intact Proteome Using Top-Down Mass Spectrometry

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wu, Si; Brown, Joseph N.; Tolic, Nikola

    There are several notable challenges inherent to fully characterizing the entirety of the human saliva proteome using bottom-up approaches, including polymorphic isoforms, post-translational modifications, unique splice variants, deletions, and truncations. To address these challenges, we have developed a top-down based liquid chromatography-mass spectrometry (LC-MS) approach, which cataloged 20 major human salivary proteins with a total of 83 proteoforms, containing a broad range of post-translational modifications. Among these proteins, several previously reported disease biomarker proteins were identified at the intact protein level, such as beta-2 microglobulin (B2M). In addition, intact glycosylated proteoforms of several saliva proteins were also characterized, including intactmore » N-glycosylated protein prolactin inducible protein (PIP) and O-glycosylated acidic protein rich protein (aPRP). These characterized proteoforms constitute an intact saliva proteoform database, which was used for quantitative comparison of intact salivary proteoforms among six healthy individuals. Human parotid (PS) and submandibular/sublingual gland (SMSL) secretion samples (2 μg of protein each) from six healthy individuals were compared using RPLC coupled with the 12T FTICR mass spectrometer. Significantly different protein and PTM patterns were resolved with high reproducibility between PS and SMSL glands. The results from this study provide further insight into the potential mechanisms of PTM pathways in oral glandular secretion, expanding our knowledge of this complex yet easily accessible fluid. Intact protein LC-MS approach presented herein can potentially be applied for rapid and accurate identification of biomarkers from only a few microliters of human glandular saliva.« less

  12. RAID: a comprehensive resource for human RNA-associated (RNA–RNA/RNA–protein) interaction

    PubMed Central

    Zhang, Xiaomeng; Wu, Deng; Chen, Liqun; Li, Xiang; Yang, Jinxurong; Fan, Dandan; Dong, Tingting; Liu, Mingyue; Tan, Puwen; Xu, Jintian; Yi, Ying; Wang, Yuting; Zou, Hua; Hu, Yongfei; Fan, Kaili; Kang, Juanjuan; Huang, Yan; Miao, Zhengqiang; Bi, Miaoman; Jin, Nana; Li, Kongning; Li, Xia; Xu, Jianzhen; Wang, Dong

    2014-01-01

    Transcriptomic analyses have revealed an unexpected complexity in the eukaryote transcriptome, which includes not only protein-coding transcripts but also an expanding catalog of noncoding RNAs (ncRNAs). Diverse coding and noncoding RNAs (ncRNAs) perform functions through interaction with each other in various cellular processes. In this project, we have developed RAID (http://www.rna-society.org/raid), an RNA-associated (RNA–RNA/RNA–protein) interaction database. RAID intends to provide the scientific community with all-in-one resources for efficient browsing and extraction of the RNA-associated interactions in human. This version of RAID contains more than 6100 RNA-associated interactions obtained by manually reviewing more than 2100 published papers, including 4493 RNA–RNA interactions and 1619 RNA–protein interactions. Each entry contains detailed information on an RNA-associated interaction, including RAID ID, RNA/protein symbol, RNA/protein categories, validated method, expressing tissue, literature references (Pubmed IDs), and detailed functional description. Users can query, browse, analyze, and manipulate RNA-associated (RNA–RNA/RNA–protein) interaction. RAID provides a comprehensive resource of human RNA-associated (RNA–RNA/RNA–protein) interaction network. Furthermore, this resource will help in uncovering the generic organizing principles of cellular function network. PMID:24803509

  13. The Protein Information Resource: an integrated public resource of functional annotation of proteins

    PubMed Central

    Wu, Cathy H.; Huang, Hongzhan; Arminski, Leslie; Castro-Alvear, Jorge; Chen, Yongxing; Hu, Zhang-Zhi; Ledley, Robert S.; Lewis, Kali C.; Mewes, Hans-Werner; Orcutt, Bruce C.; Suzek, Baris E.; Tsugita, Akira; Vinayaka, C. R.; Yeh, Lai-Su L.; Zhang, Jian; Barker, Winona C.

    2002-01-01

    The Protein Information Resource (PIR) serves as an integrated public resource of functional annotation of protein data to support genomic/proteomic research and scientific discovery. The PIR, in collaboration with the Munich Information Center for Protein Sequences (MIPS) and the Japan International Protein Information Database (JIPID), produces the PIR-International Protein Sequence Database (PSD), the major annotated protein sequence database in the public domain, containing about 250 000 proteins. To improve protein annotation and the coverage of experimentally validated data, a bibliography submission system is developed for scientists to submit, categorize and retrieve literature information. Comprehensive protein information is available from iProClass, which includes family classification at the superfamily, domain and motif levels, structural and functional features of proteins, as well as cross-references to over 40 biological databases. To provide timely and comprehensive protein data with source attribution, we have introduced a non-redundant reference protein database, PIR-NREF. The database consists of about 800 000 proteins collected from PIR-PSD, SWISS-PROT, TrEMBL, GenPept, RefSeq and PDB, with composite protein names and literature data. To promote database interoperability, we provide XML data distribution and open database schema, and adopt common ontologies. The PIR web site (http://pir.georgetown.edu/) features data mining and sequence analysis tools for information retrieval and functional identification of proteins based on both sequence and annotation information. The PIR databases and other files are also available by FTP (ftp://nbrfa.georgetown.edu/pir_databases). PMID:11752247

  14. PDB explorer -- a web based algorithm for protein annotation viewer and 3D visualization.

    PubMed

    Nayarisseri, Anuraj; Shardiwal, Rakesh Kumar; Yadav, Mukesh; Kanungo, Neha; Singh, Pooja; Shah, Pratik; Ahmed, Sheaza

    2014-12-01

    The PDB file format, is a text format characterizing the three dimensional structures of macro molecules available in the Protein Data Bank (PDB). Determined protein structure are found in coalition with other molecules or ions such as nucleic acids, water, ions, Drug molecules and so on, which therefore can be described in the PDB format and have been deposited in PDB database. PDB is a machine generated file, it's not human readable format, to read this file we need any computational tool to understand it. The objective of our present study is to develop a free online software for retrieval, visualization and reading of annotation of a protein 3D structure which is available in PDB database. Main aim is to create PDB file in human readable format, i.e., the information in PDB file is converted in readable sentences. It displays all possible information from a PDB file including 3D structure of that file. Programming languages and scripting languages like Perl, CSS, Javascript, Ajax, and HTML have been used for the development of PDB Explorer. The PDB Explorer directly parses the PDB file, calling methods for parsed element secondary structure element, atoms, coordinates etc. PDB Explorer is freely available at http://www.pdbexplorer.eminentbio.com/home with no requirement of log-in.

  15. A Brief Review of RNA–Protein Interaction Database Resources

    PubMed Central

    Yi, Ying; Zhao, Yue; Huang, Yan; Wang, Dong

    2017-01-01

    RNA–Protein interactions play critical roles in various biological processes. By collecting and analyzing the RNA–Protein interactions and binding sites from experiments and predictions, RNA–Protein interaction databases have become an essential resource for the exploration of the transcriptional and post-transcriptional regulatory network. Here, we briefly review several widely used RNA–Protein interaction database resources developed in recent years to provide a guide of these databases. The content and major functions in databases are presented. The brief description of database helps users to quickly choose the database containing information they interested. In short, these RNA–Protein interaction database resources are continually updated, but the current state shows the efforts to identify and analyze the large amount of RNA–Protein interactions. PMID:29657278

  16. Genome databases

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Courteau, J.

    1991-10-11

    Since the Genome Project began several years ago, a plethora of databases have been developed or are in the works. They range from the massive Genome Data Base at Johns Hopkins University, the central repository of all gene mapping information, to small databases focusing on single chromosomes or organisms. Some are publicly available, others are essentially private electronic lab notebooks. Still others limit access to a consortium of researchers working on, say, a single human chromosome. An increasing number incorporate sophisticated search and analytical software, while others operate as little more than data lists. In consultation with numerous experts inmore » the field, a list has been compiled of some key genome-related databases. The list was not limited to map and sequence databases but also included the tools investigators use to interpret and elucidate genetic data, such as protein sequence and protein structure databases. Because a major goal of the Genome Project is to map and sequence the genomes of several experimental animals, including E. coli, yeast, fruit fly, nematode, and mouse, the available databases for those organisms are listed as well. The author also includes several databases that are still under development - including some ambitious efforts that go beyond data compilation to create what are being called electronic research communities, enabling many users, rather than just one or a few curators, to add or edit the data and tag it as raw or confirmed.« less

  17. WormQTLHD—a web database for linking human disease to natural variation data in C. elegans

    PubMed Central

    van der Velde, K. Joeri; de Haan, Mark; Zych, Konrad; Arends, Danny; Snoek, L. Basten; Kammenga, Jan E.; Jansen, Ritsert C.; Swertz, Morris A.; Li, Yang

    2014-01-01

    Interactions between proteins are highly conserved across species. As a result, the molecular basis of multiple diseases affecting humans can be studied in model organisms that offer many alternative experimental opportunities. One such organism—Caenorhabditis elegans—has been used to produce much molecular quantitative genetics and systems biology data over the past decade. We present WormQTLHD (Human Disease), a database that quantitatively and systematically links expression Quantitative Trait Loci (eQTL) findings in C. elegans to gene–disease associations in man. WormQTLHD, available online at http://www.wormqtl-hd.org, is a user-friendly set of tools to reveal functionally coherent, evolutionary conserved gene networks. These can be used to predict novel gene-to-gene associations and the functions of genes underlying the disease of interest. We created a new database that links C. elegans eQTL data sets to human diseases (34 337 gene–disease associations from OMIM, DGA, GWAS Central and NHGRI GWAS Catalogue) based on overlapping sets of orthologous genes associated to phenotypes in these two species. We utilized QTL results, high-throughput molecular phenotypes, classical phenotypes and genotype data covering different developmental stages and environments from WormQTL database. All software is available as open source, built on MOLGENIS and xQTL workbench. PMID:24217915

  18. WormQTLHD--a web database for linking human disease to natural variation data in C. elegans.

    PubMed

    van der Velde, K Joeri; de Haan, Mark; Zych, Konrad; Arends, Danny; Snoek, L Basten; Kammenga, Jan E; Jansen, Ritsert C; Swertz, Morris A; Li, Yang

    2014-01-01

    Interactions between proteins are highly conserved across species. As a result, the molecular basis of multiple diseases affecting humans can be studied in model organisms that offer many alternative experimental opportunities. One such organism-Caenorhabditis elegans-has been used to produce much molecular quantitative genetics and systems biology data over the past decade. We present WormQTL(HD) (Human Disease), a database that quantitatively and systematically links expression Quantitative Trait Loci (eQTL) findings in C. elegans to gene-disease associations in man. WormQTL(HD), available online at http://www.wormqtl-hd.org, is a user-friendly set of tools to reveal functionally coherent, evolutionary conserved gene networks. These can be used to predict novel gene-to-gene associations and the functions of genes underlying the disease of interest. We created a new database that links C. elegans eQTL data sets to human diseases (34 337 gene-disease associations from OMIM, DGA, GWAS Central and NHGRI GWAS Catalogue) based on overlapping sets of orthologous genes associated to phenotypes in these two species. We utilized QTL results, high-throughput molecular phenotypes, classical phenotypes and genotype data covering different developmental stages and environments from WormQTL database. All software is available as open source, built on MOLGENIS and xQTL workbench.

  19. Amyloid precursor protein interaction network in human testis: sentinel proteins for male reproduction.

    PubMed

    Silva, Joana Vieira; Yoon, Sooyeon; Domingues, Sara; Guimarães, Sofia; Goltsev, Alexander V; da Cruz E Silva, Edgar Figueiredo; Mendes, José Fernando F; da Cruz E Silva, Odete Abreu Beirão; Fardilha, Margarida

    2015-01-16

    Amyloid precursor protein (APP) is widely recognized for playing a central role in Alzheimer's disease pathogenesis. Although APP is expressed in several tissues outside the human central nervous system, the functions of APP and its family members in other tissues are still poorly understood. APP is involved in several biological functions which might be potentially important for male fertility, such as cell adhesion, cell motility, signaling, and apoptosis. Furthermore, APP superfamily members are known to be associated with fertility. Knowledge on the protein networks of APP in human testis and spermatozoa will shed light on the function of APP in the male reproductive system. We performed a Yeast Two-Hybrid screen and a database search to study the interaction network of APP in human testis and sperm. To gain insights into the role of APP superfamily members in fertility, the study was extended to APP-like protein 2 (APLP2). We analyzed several topological properties of the APP interaction network and the biological and physiological properties of the proteins in the APP interaction network were also specified by gene ontologyand pathways analyses. We classified significant features related to the human male reproduction for the APP interacting proteins and identified modules of proteins with similar functional roles which may show cooperative behavior for male fertility. The present work provides the first report on the APP interactome in human testis. Our approach allowed the identification of novel interactions and recognition of key APP interacting proteins for male reproduction, particularly in sperm-oocyte interaction.

  20. EADB: An Estrogenic Activity Database for Assessing Potential Endocrine Activity

    EPA Science Inventory

    Endocrine-active chemicals can potentially have adverse effects on both humans and wildlife. They can interfere with the body’s endocrine system through direct or indirect interactions with many protein targets. Estrogen receptors (ERs) are one of the major targets, and many ...

  1. Integration of multiple biological features yields high confidence human protein interactome.

    PubMed

    Karagoz, Kubra; Sevimoglu, Tuba; Arga, Kazim Yalcin

    2016-08-21

    The biological function of a protein is usually determined by its physical interaction with other proteins. Protein-protein interactions (PPIs) are identified through various experimental methods and are stored in curated databases. The noisiness of the existing PPI data is evident, and it is essential that a more reliable data is generated. Furthermore, the selection of a set of PPIs at different confidence levels might be necessary for many studies. Although different methodologies were introduced to evaluate the confidence scores for binary interactions, a highly reliable, almost complete PPI network of Homo sapiens is not proposed yet. The quality and coverage of human protein interactome need to be improved to be used in various disciplines, especially in biomedicine. In the present work, we propose an unsupervised statistical approach to assign confidence scores to PPIs of H. sapiens. To achieve this goal PPI data from six different databases were collected and a total of 295,288 non-redundant interactions between 15,950 proteins were acquired. The present scoring system included the context information that was assigned to PPIs derived from eight biological attributes. A high confidence network, which included 147,923 binary interactions between 13,213 proteins, had scores greater than the cutoff value of 0.80, for which sensitivity, specificity, and coverage were 94.5%, 80.9%, and 82.8%, respectively. We compared the present scoring method with others for evaluation. Reducing the noise inherent in experimental PPIs via our scoring scheme increased the accuracy significantly. As it was demonstrated through the assessment of process and cancer subnetworks, this study allows researchers to construct and analyze context-specific networks via valid PPI sets and one can easily achieve subnetworks around proteins of interest at a specified confidence level. Copyright © 2016 Elsevier Ltd. All rights reserved.

  2. Unraveling the Web of Viroinformatics: Computational Tools and Databases in Virus Research

    PubMed Central

    Priyadarshini, Pragya; Vrati, Sudhanshu

    2014-01-01

    The beginning of the second century of research in the field of virology (the first virus was discovered in 1898) was marked by its amalgamation with bioinformatics, resulting in the birth of a new domain—viroinformatics. The availability of more than 100 Web servers and databases embracing all or specific viruses (for example, dengue virus, influenza virus, hepatitis virus, human immunodeficiency virus [HIV], hemorrhagic fever virus [HFV], human papillomavirus [HPV], West Nile virus, etc.) as well as distinct applications (comparative/diversity analysis, viral recombination, small interfering RNA [siRNA]/short hairpin RNA [shRNA]/microRNA [miRNA] studies, RNA folding, protein-protein interaction, structural analysis, and phylotyping and genotyping) will definitely aid the development of effective drugs and vaccines. However, information about their access and utility is not available at any single source or on any single platform. Therefore, a compendium of various computational tools and resources dedicated specifically to virology is presented in this article. PMID:25428870

  3. Unraveling the web of viroinformatics: computational tools and databases in virus research.

    PubMed

    Sharma, Deepak; Priyadarshini, Pragya; Vrati, Sudhanshu

    2015-02-01

    The beginning of the second century of research in the field of virology (the first virus was discovered in 1898) was marked by its amalgamation with bioinformatics, resulting in the birth of a new domain--viroinformatics. The availability of more than 100 Web servers and databases embracing all or specific viruses (for example, dengue virus, influenza virus, hepatitis virus, human immunodeficiency virus [HIV], hemorrhagic fever virus [HFV], human papillomavirus [HPV], West Nile virus, etc.) as well as distinct applications (comparative/diversity analysis, viral recombination, small interfering RNA [siRNA]/short hairpin RNA [shRNA]/microRNA [miRNA] studies, RNA folding, protein-protein interaction, structural analysis, and phylotyping and genotyping) will definitely aid the development of effective drugs and vaccines. However, information about their access and utility is not available at any single source or on any single platform. Therefore, a compendium of various computational tools and resources dedicated specifically to virology is presented in this article. Copyright © 2015, American Society for Microbiology. All Rights Reserved.

  4. Translational analysis of mouse and human placental protein and mRNA reveals distinct molecular pathologies in human preeclampsia.

    PubMed

    Cox, Brian; Sharma, Parveen; Evangelou, Andreas I; Whiteley, Kathie; Ignatchenko, Vladimir; Ignatchenko, Alex; Baczyk, Dora; Czikk, Marie; Kingdom, John; Rossant, Janet; Gramolini, Anthony O; Adamson, S Lee; Kislinger, Thomas

    2011-12-01

    Preeclampsia (PE) adversely impacts ~5% of pregnancies. Despite extensive research, no consistent biomarkers or cures have emerged, suggesting that different molecular mechanisms may cause clinically similar disease. To address this, we undertook a proteomics study with three main goals: (1) to identify a panel of cell surface markers that distinguish the trophoblast and endothelial cells of the placenta in the mouse; (2) to translate this marker set to human via the Human Protein Atlas database; and (3) to utilize the validated human trophoblast markers to identify subgroups of human preeclampsia. To achieve these goals, plasma membrane proteins at the blood tissue interfaces were extracted from placentas using intravascular silica-bead perfusion, and then identified using shotgun proteomics. We identified 1181 plasma membrane proteins, of which 171 were enriched at the maternal blood-trophoblast interface and 192 at the fetal endothelial interface with a 70% conservation of expression in humans. Three distinct molecular subgroups of human preeclampsia were identified in existing human microarray data by using expression patterns of trophoblast-enriched proteins. Analysis of all misexpressed genes revealed divergent dysfunctions including angiogenesis (subgroup 1), MAPK signaling (subgroup 2), and hormone biosynthesis and metabolism (subgroup 3). Subgroup 2 lacked expected changes in known preeclampsia markers (sFLT1, sENG) and uniquely overexpressed GNA12. In an independent set of 40 banked placental specimens, GNA12 was overexpressed during preeclampsia when co-incident with chronic hypertension. In the current study we used a novel translational analysis to integrate mouse and human trophoblast protein expression with human microarray data. This strategy identified distinct molecular pathologies in human preeclampsia. We conclude that clinically similar preeclampsia patients exhibit divergent placental gene expression profiles thus implicating divergent molecular mechanisms in the origins of this disease.

  5. APADB: a database for alternative polyadenylation and microRNA regulation events

    PubMed Central

    Müller, Sören; Rycak, Lukas; Afonso-Grunz, Fabian; Winter, Peter; Zawada, Adam M.; Damrath, Ewa; Scheider, Jessica; Schmäh, Juliane; Koch, Ina; Kahl, Günter; Rotter, Björn

    2014-01-01

    Alternative polyadenylation (APA) is a widespread mechanism that contributes to the sophisticated dynamics of gene regulation. Approximately 50% of all protein-coding human genes harbor multiple polyadenylation (PA) sites; their selective and combinatorial use gives rise to transcript variants with differing length of their 3′ untranslated region (3′UTR). Shortened variants escape UTR-mediated regulation by microRNAs (miRNAs), especially in cancer, where global 3′UTR shortening accelerates disease progression, dedifferentiation and proliferation. Here we present APADB, a database of vertebrate PA sites determined by 3′ end sequencing, using massive analysis of complementary DNA ends. APADB provides (A)PA sites for coding and non-coding transcripts of human, mouse and chicken genes. For human and mouse, several tissue types, including different cancer specimens, are available. APADB records the loss of predicted miRNA binding sites and visualizes next-generation sequencing reads that support each PA site in a genome browser. The database tables can either be browsed according to organism and tissue or alternatively searched for a gene of interest. APADB is the largest database of APA in human, chicken and mouse. The stored information provides experimental evidence for thousands of PA sites and APA events. APADB combines 3′ end sequencing data with prediction algorithms of miRNA binding sites, allowing to further improve prediction algorithms. Current databases lack correct information about 3′UTR lengths, especially for chicken, and APADB provides necessary information to close this gap. Database URL: http://tools.genxpro.net/apadb/ PMID:25052703

  6. Protein Information Resource: a community resource for expert annotation of protein data

    PubMed Central

    Barker, Winona C.; Garavelli, John S.; Hou, Zhenglin; Huang, Hongzhan; Ledley, Robert S.; McGarvey, Peter B.; Mewes, Hans-Werner; Orcutt, Bruce C.; Pfeiffer, Friedhelm; Tsugita, Akira; Vinayaka, C. R.; Xiao, Chunlin; Yeh, Lai-Su L.; Wu, Cathy

    2001-01-01

    The Protein Information Resource, in collaboration with the Munich Information Center for Protein Sequences (MIPS) and the Japan International Protein Information Database (JIPID), produces the most comprehensive and expertly annotated protein sequence database in the public domain, the PIR-International Protein Sequence Database. To provide timely and high quality annotation and promote database interoperability, the PIR-International employs rule-based and classification-driven procedures based on controlled vocabulary and standard nomenclature and includes status tags to distinguish experimentally determined from predicted protein features. The database contains about 200 000 non-redundant protein sequences, which are classified into families and superfamilies and their domains and motifs identified. Entries are extensively cross-referenced to other sequence, classification, genome, structure and activity databases. The PIR web site features search engines that use sequence similarity and database annotation to facilitate the analysis and functional identification of proteins. The PIR-Inter­national databases and search tools are accessible on the PIR web site at http://pir.georgetown.edu/ and at the MIPS web site at http://www.mips.biochem.mpg.de. The PIR-International Protein Sequence Database and other files are also available by FTP. PMID:11125041

  7. HUNT: launch of a full-length cDNA database from the Helix Research Institute.

    PubMed

    Yudate, H T; Suwa, M; Irie, R; Matsui, H; Nishikawa, T; Nakamura, Y; Yamaguchi, D; Peng, Z Z; Yamamoto, T; Nagai, K; Hayashi, K; Otsuki, T; Sugiyama, T; Ota, T; Suzuki, Y; Sugano, S; Isogai, T; Masuho, Y

    2001-01-01

    The Helix Research Institute (HRI) in Japan is releasing 4356 HUman Novel Transcripts and related information in the newly established HUNT database. The institute is a joint research project principally funded by the Japanese Ministry of International Trade and Industry, and the clones were sequenced in the governmental New Energy and Industrial Technology Development Organization (NEDO) Human cDNA Sequencing Project. The HUNT database contains an extensive amount of annotation from advanced analysis and represents an essential bioinformatics contribution towards understanding of the gene function. The HRI human cDNA clones were obtained from full-length enriched cDNA libraries constructed with the oligo-capping method and have resulted in novel full-length cDNA sequences. A large fraction has little similarity to any proteins of known function and to obtain clues about possible function we have developed original analysis procedures. Any putative function deduced here can be validated or refuted by complementary analysis results. The user can also extract information from specific categories like PROSITE patterns, PFAM domains, PSORT localization, transmembrane helices and clones with GENIUS structure assignments. The HUNT database can be accessed at http://www.hri.co.jp/HUNT.

  8. Systematic documentation and analysis of human genetic variation in hemoglobinopathies using the microattribution approach.

    PubMed

    Giardine, Belinda; Borg, Joseph; Higgs, Douglas R; Peterson, Kenneth R; Philipsen, Sjaak; Maglott, Donna; Singleton, Belinda K; Anstee, David J; Basak, A Nazli; Clark, Barnaby; Costa, Flavia C; Faustino, Paula; Fedosyuk, Halyna; Felice, Alex E; Francina, Alain; Galanello, Renzo; Gallivan, Monica V E; Georgitsi, Marianthi; Gibbons, Richard J; Giordano, Piero C; Harteveld, Cornelis L; Hoyer, James D; Jarvis, Martin; Joly, Philippe; Kanavakis, Emmanuel; Kollia, Panagoula; Menzel, Stephan; Miller, Webb; Moradkhani, Kamran; Old, John; Papachatzopoulou, Adamantia; Papadakis, Manoussos N; Papadopoulos, Petros; Pavlovic, Sonja; Perseu, Lucia; Radmilovic, Milena; Riemer, Cathy; Satta, Stefania; Schrijver, Iris; Stojiljkovic, Maja; Thein, Swee Lay; Traeger-Synodinos, Jan; Tully, Ray; Wada, Takahito; Waye, John S; Wiemann, Claudia; Zukic, Branka; Chui, David H K; Wajcman, Henri; Hardison, Ross C; Patrinos, George P

    2011-03-20

    We developed a series of interrelated locus-specific databases to store all published and unpublished genetic variation related to hemoglobinopathies and thalassemia and implemented microattribution to encourage submission of unpublished observations of genetic variation to these public repositories. A total of 1,941 unique genetic variants in 37 genes, encoding globins and other erythroid proteins, are currently documented in these databases, with reciprocal attribution of microcitations to data contributors. Our project provides the first example of implementing microattribution to incentivise submission of all known genetic variation in a defined system. It has demonstrably increased the reporting of human variants, leading to a comprehensive online resource for systematically describing human genetic variation in the globin genes and other genes contributing to hemoglobinopathies and thalassemias. The principles established here will serve as a model for other systems and for the analysis of other common and/or complex human genetic diseases.

  9. G6PDdb, an integrated database of glucose-6-phosphate dehydrogenase (G6PD) mutations.

    PubMed

    Kwok, Colin J; Martin, Andrew C R; Au, Shannon W N; Lam, Veronica M S

    2002-03-01

    G6PDdb (http://www.rubic.rdg.ac.uk/g6pd/ or http://www.bioinf.org.uk/g6pd/) is a newly created web-accessible locus-specific mutation database for the human Glucose-6-phosphate dehydrogenase (G6PD) gene. The relational database integrates up-to-date mutational and structural data from various databanks (GenBank, Protein Data Bank, etc.) with biochemically characterized variants and their associated phenotypes obtained from published literature and the Favism website. An automated analysis of the mutations likely to have a significant impact on the structure of the protein has been performed using a recently developed procedure. The database may be queried online and the full results of the analysis of the structural impact of mutations are available. The web page provides a form for submitting additional mutation data and is linked to resources such as the Favism website, OMIM, HGMD, HGVBASE, and the PDB. This database provides insights into the molecular aspects and clinical significance of G6PD deficiency for researchers and clinicians and the web page functions as a knowledge base relevant to the understanding of G6PD deficiency and its management. Copyright 2002 Wiley-Liss, Inc.

  10. Kin-Driver: a database of driver mutations in protein kinases.

    PubMed

    Simonetti, Franco L; Tornador, Cristian; Nabau-Moretó, Nuria; Molina-Vila, Miguel A; Marino-Buslje, Cristina

    2014-01-01

    Somatic mutations in protein kinases (PKs) are frequent driver events in many human tumors, while germ-line mutations are associated with hereditary diseases. Here we present Kin-driver, the first database that compiles driver mutations in PKs with experimental evidence demonstrating their functional role. Kin-driver is a manual expert-curated database that pays special attention to activating mutations (AMs) and can serve as a validation set to develop new generation tools focused on the prediction of gain-of-function driver mutations. It also offers an easy and intuitive environment to facilitate the visualization and analysis of mutations in PKs. Because all mutations are mapped onto a multiple sequence alignment, analogue positions between kinases can be identified and tentative new mutations can be proposed for studying by transferring annotation. Finally, our database can also be of use to clinical and translational laboratories, helping them to identify uncommon AMs that can correlate with response to new antitumor drugs. The website was developed using PHP and JavaScript, which are supported by all major browsers; the database was built using MySQL server. Kin-driver is available at: http://kin-driver.leloir.org.ar/ © The Author(s) 2014. Published by Oxford University Press.

  11. Quantitative Proteomics Identifies Activation of Hallmark Pathways of Cancer in Patient Melanoma.

    PubMed

    Byrum, Stephanie D; Larson, Signe K; Avaritt, Nathan L; Moreland, Linley E; Mackintosh, Samuel G; Cheung, Wang L; Tackett, Alan J

    2013-03-01

    Molecular pathways regulating melanoma initiation and progression are potential targets of therapeutic development for this aggressive cancer. Identification and molecular analysis of these pathways in patients has been primarily restricted to targeted studies on individual proteins. Here, we report the most comprehensive analysis of formalin-fixed paraffin-embedded human melanoma tissues using quantitative proteomics. From 61 patient samples, we identified 171 proteins varying in abundance among benign nevi, primary melanoma, and metastatic melanoma. Seventy-three percent of these proteins were validated by immunohistochemistry staining of malignant melanoma tissues from the Human Protein Atlas database. Our results reveal that molecular pathways involved with tumor cell proliferation, motility, and apoptosis are mis-regulated in melanoma. These data provide the most comprehensive proteome resource on patient melanoma and reveal insight into the molecular mechanisms driving melanoma progression.

  12. A proteomic chronology of gene expression through the cell cycle in human myeloid leukemia cells.

    PubMed

    Ly, Tony; Ahmad, Yasmeen; Shlien, Adam; Soroka, Dominique; Mills, Allie; Emanuele, Michael J; Stratton, Michael R; Lamond, Angus I

    2014-01-01

    Technological advances have enabled the analysis of cellular protein and RNA levels with unprecedented depth and sensitivity, allowing for an unbiased re-evaluation of gene regulation during fundamental biological processes. Here, we have chronicled the dynamics of protein and mRNA expression levels across a minimally perturbed cell cycle in human myeloid leukemia cells using centrifugal elutriation combined with mass spectrometry-based proteomics and RNA-Seq, avoiding artificial synchronization procedures. We identify myeloid-specific gene expression and variations in protein abundance, isoform expression and phosphorylation at different cell cycle stages. We dissect the relationship between protein and mRNA levels for both bulk gene expression and for over ∼6000 genes individually across the cell cycle, revealing complex, gene-specific patterns. This data set, one of the deepest surveys to date of gene expression in human cells, is presented in an online, searchable database, the Encyclopedia of Proteome Dynamics (http://www.peptracker.com/epd/). DOI: http://dx.doi.org/10.7554/eLife.01630.001.

  13. A proteomic chronology of gene expression through the cell cycle in human myeloid leukemia cells

    PubMed Central

    Ly, Tony; Ahmad, Yasmeen; Shlien, Adam; Soroka, Dominique; Mills, Allie; Emanuele, Michael J; Stratton, Michael R; Lamond, Angus I

    2014-01-01

    Technological advances have enabled the analysis of cellular protein and RNA levels with unprecedented depth and sensitivity, allowing for an unbiased re-evaluation of gene regulation during fundamental biological processes. Here, we have chronicled the dynamics of protein and mRNA expression levels across a minimally perturbed cell cycle in human myeloid leukemia cells using centrifugal elutriation combined with mass spectrometry-based proteomics and RNA-Seq, avoiding artificial synchronization procedures. We identify myeloid-specific gene expression and variations in protein abundance, isoform expression and phosphorylation at different cell cycle stages. We dissect the relationship between protein and mRNA levels for both bulk gene expression and for over ∼6000 genes individually across the cell cycle, revealing complex, gene-specific patterns. This data set, one of the deepest surveys to date of gene expression in human cells, is presented in an online, searchable database, the Encyclopedia of Proteome Dynamics (http://www.peptracker.com/epd/). DOI: http://dx.doi.org/10.7554/eLife.01630.001 PMID:24596151

  14. Cytoscape: a software environment for integrated models of biomolecular interaction networks.

    PubMed

    Shannon, Paul; Markiel, Andrew; Ozier, Owen; Baliga, Nitin S; Wang, Jonathan T; Ramage, Daniel; Amin, Nada; Schwikowski, Benno; Ideker, Trey

    2003-11-01

    Cytoscape is an open source software project for integrating biomolecular interaction networks with high-throughput expression data and other molecular states into a unified conceptual framework. Although applicable to any system of molecular components and interactions, Cytoscape is most powerful when used in conjunction with large databases of protein-protein, protein-DNA, and genetic interactions that are increasingly available for humans and model organisms. Cytoscape's software Core provides basic functionality to layout and query the network; to visually integrate the network with expression profiles, phenotypes, and other molecular states; and to link the network to databases of functional annotations. The Core is extensible through a straightforward plug-in architecture, allowing rapid development of additional computational analyses and features. Several case studies of Cytoscape plug-ins are surveyed, including a search for interaction pathways correlating with changes in gene expression, a study of protein complexes involved in cellular recovery to DNA damage, inference of a combined physical/functional interaction network for Halobacterium, and an interface to detailed stochastic/kinetic gene regulatory models.

  15. Heart research advances using database search engines, Human Protein Atlas and the Sydney Heart Bank.

    PubMed

    Li, Amy; Estigoy, Colleen; Raftery, Mark; Cameron, Darryl; Odeberg, Jacob; Pontén, Fredrik; Lal, Sean; Dos Remedios, Cristobal G

    2013-10-01

    This Methodological Review is intended as a guide for research students who may have just discovered a human "novel" cardiac protein, but it may also help hard-pressed reviewers of journal submissions on a "novel" protein reported in an animal model of human heart failure. Whether you are an expert or not, you may know little or nothing about this particular protein of interest. In this review we provide a strategic guide on how to proceed. We ask: How do you discover what has been published (even in an abstract or research report) about this protein? Everyone knows how to undertake literature searches using PubMed and Medline but these are usually encyclopaedic, often producing long lists of papers, most of which are either irrelevant or only vaguely relevant to your query. Relatively few will be aware of more advanced search engines such as Google Scholar and even fewer will know about Quertle. Next, we provide a strategy for discovering if your "novel" protein is expressed in the normal, healthy human heart, and if it is, we show you how to investigate its subcellular location. This can usually be achieved by visiting the website "Human Protein Atlas" without doing a single experiment. Finally, we provide a pathway to discovering if your protein of interest changes its expression level with heart failure/disease or with ageing. Crown Copyright © 2013. Published by Elsevier B.V. All rights reserved.

  16. Global analysis of the rat and human platelet proteome – the molecular blueprint for illustrating multi-functional platelets and cross-species function evolution

    PubMed Central

    Yu, Yanbao; Leng, Taohua; Yun, Dong; Liu, Na; Yao, Jun; Dai, Ying; Yang, Pengyuan; Chen, Xian

    2013-01-01

    Emerging evidences indicate that blood platelets function in multiple biological processes including immune response, bone metastasis and liver regeneration in addition to their known roles in hemostasis and thrombosis. Global elucidation of platelet proteome will provide the molecular base of these platelet functions. Here, we set up a high throughput platform for maximum exploration of the rat/human platelet proteome using integrated proteomics technologies, and then applied to identify the largest number of the proteins expressed in both rat and human platelets. After stringent statistical filtration, a total of 837 unique proteins matched with at least two unique peptides were precisely identified, making it the first comprehensive protein database so far for rat platelets. Meanwhile, quantitative analyses of the thrombin-stimulated platelets offered great insights into the biological functions of platelet proteins and therefore confirmed our global profiling data. A comparative proteomic analysis between rat and human platelets was also conducted, which revealed not only a significant similarity, but also an across-species evolutionary link that the orthologous proteins representing ‘core proteome’, and the ‘evolutionary proteome’ is actually a relatively static proteome. PMID:20443191

  17. Secretome profiling of primary human skeletal muscle cells.

    PubMed

    Hartwig, Sonja; Raschke, Silja; Knebel, Birgit; Scheler, Mika; Irmler, Martin; Passlack, Waltraud; Muller, Stefan; Hanisch, Franz-Georg; Franz, Thomas; Li, Xinping; Dicken, Hans-Dieter; Eckardt, Kristin; Beckers, Johannes; de Angelis, Martin Hrabe; Weigert, Cora; Häring, Hans-Ulrich; Al-Hasani, Hadi; Ouwens, D Margriet; Eckel, Jürgen; Kotzka, Jorg; Lehr, Stefan

    2014-05-01

    The skeletal muscle is a metabolically active tissue that secretes various proteins. These so-called myokines have been proposed to affect muscle physiology and to exert systemic effects on other tissues and organs. Yet, changes in the secretory profile may participate in the pathophysiology of metabolic diseases. The present study aimed at characterizing the secretome of differentiated primary human skeletal muscle cells (hSkMC) derived from healthy, adult donors combining three different mass spectrometry based non-targeted approaches as well as one antibody based method. This led to the identification of 548 non-redundant proteins in conditioned media from hSkmc. For 501 proteins, significant mRNA expression could be demonstrated. Applying stringent consecutive filtering using SignalP, SecretomeP and ER_retention signal databases, 305 proteins were assigned as potential myokines of which 12 proteins containing a secretory signal peptide were not previously described. This comprehensive profiling study of the human skeletal muscle secretome expands our knowledge of the composition of the human myokinome and may contribute to our understanding of the role of myokines in multiple biological processes. This article is part of a Special Issue entitled: Biomarkers: A Proteomic Challenge. © 2013.

  18. Database resources of the National Center for Biotechnology Information.

    PubMed

    Sayers, Eric W; Barrett, Tanya; Benson, Dennis A; Bolton, Evan; Bryant, Stephen H; Canese, Kathi; Chetvernin, Vyacheslav; Church, Deanna M; DiCuccio, Michael; Federhen, Scott; Feolo, Michael; Fingerman, Ian M; Geer, Lewis Y; Helmberg, Wolfgang; Kapustin, Yuri; Landsman, David; Lipman, David J; Lu, Zhiyong; Madden, Thomas L; Madej, Tom; Maglott, Donna R; Marchler-Bauer, Aron; Miller, Vadim; Mizrachi, Ilene; Ostell, James; Panchenko, Anna; Phan, Lon; Pruitt, Kim D; Schuler, Gregory D; Sequeira, Edwin; Sherry, Stephen T; Shumway, Martin; Sirotkin, Karl; Slotta, Douglas; Souvorov, Alexandre; Starchenko, Grigory; Tatusova, Tatiana A; Wagner, Lukas; Wang, Yanli; Wilbur, W John; Yaschenko, Eugene; Ye, Jian

    2011-01-01

    In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data made available through the NCBI Web site. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central (PMC), Entrez Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Primer-BLAST, COBALT, Electronic PCR, OrfFinder, Splign, ProSplign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, dbVar, Epigenomics, Cancer Chromosomes, Entrez Genomes and related tools, the Map Viewer, Model Maker, Evidence Viewer, Trace Archive, Sequence Read Archive, Retroviral Genotyping Tools, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus (GEO), Entrez Probe, GENSAT, Online Mendelian Inheritance in Man (OMIM), Online Mendelian Inheritance in Animals (OMIA), the Molecular Modeling Database (MMDB), the Conserved Domain Database (CDD), the Conserved Domain Architecture Retrieval Tool (CDART), IBIS, Biosystems, Peptidome, OMSSA, Protein Clusters and the PubChem suite of small molecule databases. Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.

  19. Identification and analysis of potential targets in Streptococcus sanguinis using computer aided protein data analysis.

    PubMed

    Chowdhury, Md Rabiul Hossain; Bhuiyan, Md IqbalKaiser; Saha, Ayan; Mosleh, Ivan Mhai; Mondol, Sobuj; Ahmed, C M Sabbir

    2014-01-01

    Streptococcus sanguinis is a Gram-positive, facultative aerobic bacterium that is a member of the viridans streptococcus group. It is found in human mouths in dental plaque, which accounts for both dental cavities and bacterial endocarditis, and which entails a mortality rate of 25%. Although a range of remedial mediators have been found to control this organism, the effectiveness of agents such as penicillin, amoxicillin, trimethoprim-sulfamethoxazole, and erythromycin, was observed. The emphasis of this investigation was on finding substitute and efficient remedial approaches for the total destruction of this bacterium. In this computational study, various databases and online software were used to ascertain some specific targets of S. sanguinis. Particularly, the Kyoto Encyclopedia of Genes and Genomes databases were applied to determine human nonhomologous proteins, as well as the metabolic pathways involved with those proteins. Different software such as Phyre2, CastP, DoGSiteScorer, the Protein Function Predictor server, and STRING were utilized to evaluate the probable active drug binding site with its known function and protein-protein interaction. In this study, among 218 essential proteins of this pathogenic bacterium, 81 nonhomologous proteins were accrued, and 15 proteins that are unique in several metabolic pathways of S. sanguinis were isolated through metabolic pathway analysis. Furthermore, four essentially membrane-bound unique proteins that are involved in distinct metabolic pathways were revealed by this research. Active sites and druggable pockets of these selected proteins were investigated with bioinformatic techniques. In addition, this study also mentions the activity of those proteins, as well as their interactions with the other proteins. Our findings helped to identify the type of protein to be considered as an efficient drug target. This study will pave the way for researchers to develop and discover more effective and specific therapeutic agents against S. sanguinis.

  20. MPIC: a mitochondrial protein import components database for plant and non-plant species.

    PubMed

    Murcha, Monika W; Narsai, Reena; Devenish, James; Kubiszewski-Jakubiak, Szymon; Whelan, James

    2015-01-01

    In the 2 billion years since the endosymbiotic event that gave rise to mitochondria, variations in mitochondrial protein import have evolved across different species. With the genomes of an increasing number of plant species sequenced, it is possible to gain novel insights into mitochondrial protein import pathways. We have generated the Mitochondrial Protein Import Components (MPIC) Database (DB; http://www.plantenergy.uwa.edu.au/applications/mpic) providing searchable information on the protein import apparatus of plant and non-plant mitochondria. An in silico analysis was carried out, comparing the mitochondrial protein import apparatus from 24 species representing various lineages from Saccharomyces cerevisiae (yeast) and algae to Homo sapiens (human) and higher plants, including Arabidopsis thaliana (Arabidopsis), Oryza sativa (rice) and other more recently sequenced plant species. Each of these species was extensively searched and manually assembled for analysis in the MPIC DB. The database presents an interactive diagram in a user-friendly manner, allowing users to select their import component of interest. The MPIC DB presents an extensive resource facilitating detailed investigation of the mitochondrial protein import machinery and allowing patterns of conservation and divergence to be recognized that would otherwise have been missed. To demonstrate the usefulness of the MPIC DB, we present a comparative analysis of the mitochondrial protein import machinery in plants and non-plant species, revealing plant-specific features that have evolved. © The Author 2014. Published by Oxford University Press on behalf of Japanese Society of Plant Physiologists. All rights reserved. For permissions, please email: journals.permissions@oup.com.

  1. The prescribable drugs with efficacy in experimental epilepsies (PDE3) database for drug repurposing research in epilepsy.

    PubMed

    Sivapalarajah, Shayeeshan; Krishnakumar, Mathangi; Bickerstaffe, Harry; Chan, YikYing; Clarkson, Joseph; Hampden-Martin, Alistair; Mirza, Ahmad; Tanti, Matthew; Marson, Anthony; Pirmohamed, Munir; Mirza, Nasir

    2018-02-01

    Current antiepileptic drugs (AEDs) have several shortcomings. For example, they fail to control seizures in 30% of patients. Hence, there is a need to identify new AEDs. Drug repurposing is the discovery of new indications for approved drugs. This drug "recycling" offers the potential of significant savings in the time and cost of drug development. Many drugs licensed for other indications exhibit antiepileptic efficacy in animal models. Our aim was to create a database of "prescribable" drugs, approved for other conditions, with published evidence of efficacy in animal models of epilepsy, and to collate data that would assist in choosing the most promising candidates for drug repurposing. The database was created by the following: (1) computational literature-mining using novel software that identifies Medline abstracts containing the name of a prescribable drug, a rodent model of epilepsy, and a phrase indicating seizure reduction; then (2) crowdsourced manual curation of the identified abstracts. The final database includes 173 drugs and 500 abstracts. It is made freely available at www.liverpool.ac.uk/D3RE/PDE3. The database is reliable: 94% of the included drugs have corroborative evidence of efficacy in animal models (for example, evidence from multiple independent studies). The database includes many drugs that are appealing candidates for repurposing, as they are widely accepted by prescribers and patients-the database includes half of the 20 most commonly prescribed drugs in England-and they target many proteins involved in epilepsy but not targeted by current AEDs. It is important to note that the drugs are of potential relevance to human epilepsy-the database is highly enriched with drugs that target proteins of known causal human epilepsy genes (Fisher's exact test P-value < 3 × 10 -5 ). We present data to help prioritize the most promising candidates for repurposing from the database. The PDE3 database is an important new resource for drug repurposing research in epilepsy. Wiley Periodicals, Inc. © 2018 International League Against Epilepsy.

  2. ProBiS-database: precalculated binding site similarities and local pairwise alignments of PDB structures.

    PubMed

    Konc, Janez; Cesnik, Tomo; Konc, Joanna Trykowska; Penca, Matej; Janežič, Dušanka

    2012-02-27

    ProBiS-Database is a searchable repository of precalculated local structural alignments in proteins detected by the ProBiS algorithm in the Protein Data Bank. Identification of functionally important binding regions of the protein is facilitated by structural similarity scores mapped to the query protein structure. PDB structures that have been aligned with a query protein may be rapidly retrieved from the ProBiS-Database, which is thus able to generate hypotheses concerning the roles of uncharacterized proteins. Presented with uncharacterized protein structure, ProBiS-Database can discern relationships between such a query protein and other better known proteins in the PDB. Fast access and a user-friendly graphical interface promote easy exploration of this database of over 420 million local structural alignments. The ProBiS-Database is updated weekly and is freely available online at http://probis.cmm.ki.si/database.

  3. A HUPO test sample study reveals common problems in mass spectrometry-based proteomics

    PubMed Central

    Bell, Alexander W.; Deutsch, Eric W.; Au, Catherine E.; Kearney, Robert E.; Beavis, Ron; Sechi, Salvatore; Nilsson, Tommy; Bergeron, John J.M.

    2009-01-01

    We carried out a test sample study to try to identify errors leading to irreproducibility, including incompleteness of peptide sampling, in LC-MS-based proteomics. We distributed a test sample consisting of an equimolar mix of 20 highly purified recombinant human proteins, to 27 laboratories for identification. Each protein contained one or more unique tryptic peptides of 1250 Da to also test for ion selection and sampling in the mass spectrometer. Of the 27 labs, initially only 7 labs reported all 20 proteins correctly, and only 1 lab reported all the tryptic peptides of 1250 Da. Nevertheless, a subsequent centralized analysis of the raw data revealed that all 20 proteins and most of the 1250 Da peptides had in fact been detected by all 27 labs. The centralized analysis allowed us to determine sources of problems encountered in the study, which include missed identifications (false negatives), environmental contamination, database matching, and curation of protein identifications. Improved search engines and databases are likely to increase the fidelity of mass spectrometry-based proteomics. PMID:19448641

  4. NovelFam3000 – Uncharacterized human protein domains conserved across model organisms

    PubMed Central

    Kemmer, Danielle; Podowski, Raf M; Arenillas, David; Lim, Jonathan; Hodges, Emily; Roth, Peggy; Sonnhammer, Erik LL; Höög, Christer; Wasserman, Wyeth W

    2006-01-01

    Background Despite significant efforts from the research community, an extensive portion of the proteins encoded by human genes lack an assigned cellular function. Most metazoan proteins are composed of structural and/or functional domains, of which many appear in multiple proteins. Once a domain is characterized in one protein, the presence of a similar sequence in an uncharacterized protein serves as a basis for inference of function. Thus knowledge of a domain's function, or the protein within which it arises, can facilitate the analysis of an entire set of proteins. Description From the Pfam domain database, we extracted uncharacterized protein domains represented in proteins from humans, worms, and flies. A data centre was created to facilitate the analysis of the uncharacterized domain-containing proteins. The centre both provides researchers with links to dispersed internet resources containing gene-specific experimental data and enables them to post relevant experimental results or comments. For each human gene in the system, a characterization score is posted, allowing users to track the progress of characterization over time or to identify for study uncharacterized domains in well-characterized genes. As a test of the system, a subset of 39 domains was selected for analysis and the experimental results posted to the NovelFam3000 system. For 25 human protein members of these 39 domain families, detailed sub-cellular localizations were determined. Specific observations are presented based on the analysis of the integrated information provided through the online NovelFam3000 system. Conclusion Consistent experimental results between multiple members of a domain family allow for inferences of the domain's functional role. We unite bioinformatics resources and experimental data in order to accelerate the functional characterization of scarcely annotated domain families. PMID:16533400

  5. Identification and analysis of potential targets in Streptococcus sanguinis using computer aided protein data analysis

    PubMed Central

    Chowdhury, Md Rabiul Hossain; Bhuiyan, Md IqbalKaiser; Saha, Ayan; Mosleh, Ivan MHAI; Mondol, Sobuj; Ahmed, C M Sabbir

    2014-01-01

    Purpose Streptococcus sanguinis is a Gram-positive, facultative aerobic bacterium that is a member of the viridans streptococcus group. It is found in human mouths in dental plaque, which accounts for both dental cavities and bacterial endocarditis, and which entails a mortality rate of 25%. Although a range of remedial mediators have been found to control this organism, the effectiveness of agents such as penicillin, amoxicillin, trimethoprim–sulfamethoxazole, and erythromycin, was observed. The emphasis of this investigation was on finding substitute and efficient remedial approaches for the total destruction of this bacterium. Materials and methods In this computational study, various databases and online software were used to ascertain some specific targets of S. sanguinis. Particularly, the Kyoto Encyclopedia of Genes and Genomes databases were applied to determine human nonhomologous proteins, as well as the metabolic pathways involved with those proteins. Different software such as Phyre2, CastP, DoGSiteScorer, the Protein Function Predictor server, and STRING were utilized to evaluate the probable active drug binding site with its known function and protein–protein interaction. Results In this study, among 218 essential proteins of this pathogenic bacterium, 81 nonhomologous proteins were accrued, and 15 proteins that are unique in several metabolic pathways of S. sanguinis were isolated through metabolic pathway analysis. Furthermore, four essentially membrane-bound unique proteins that are involved in distinct metabolic pathways were revealed by this research. Active sites and druggable pockets of these selected proteins were investigated with bioinformatic techniques. In addition, this study also mentions the activity of those proteins, as well as their interactions with the other proteins. Conclusion Our findings helped to identify the type of protein to be considered as an efficient drug target. This study will pave the way for researchers to develop and discover more effective and specific therapeutic agents against S. sanguinis. PMID:25473301

  6. Identification of StARD3 as a lutein-binding protein in the macula of the primate retina.

    PubMed

    Li, Binxing; Vachali, Preejith; Frederick, Jeanne M; Bernstein, Paul S

    2011-04-05

    Lutein, zeaxanthin, and their metabolites are the xanthophyll carotenoids that form the macular pigment of the human retina. Epidemiological evidence suggests that high levels of these carotenoids in the diet, serum, and macula are associated with a decreased risk of age-related macular degeneration (AMD), and the AREDS2 study is prospectively testing this hypothesis. Understanding the biochemical mechanisms underlying the selective uptakes of lutein and zeaxanthin into the human macula may provide important insights into the physiology of the human macula in health and disease. GSTP1 is the macular zeaxanthin-binding protein, but the identity of the human macular lutein-binding protein has remained elusive. Prior identification of the silkworm lutein-binding protein (CBP) as a member of the steroidogenic acute regulatory domain (StARD) protein family and selective labeling of monkey photoreceptor inner segments with an anti-CBP antibody provided an important clue for identifying the primate retina lutein-binding protein. The homology of CBP with all 15 human StARD proteins was analyzed using database searches, Western blotting, and immunohistochemistry, and we here provide evidence to identify StARD3 (also known as MLN64) as a human retinal lutein-binding protein. Antibody to StARD3, N-62 StAR, localizes to all neurons of monkey macular retina and especially cone inner segments and axons, but does not colocalize with the Müller cell marker, glutamine synthetase. Further, recombinant StARD3 selectively binds lutein with high affinity (K(D) = 0.45 μM) when assessed by surface plasmon resonance (SPR) binding assays. Our results demonstrate previously unrecognized, specific interactions of StARD3 with lutein and provide novel avenues for exploring its roles in human macular physiology and disease.

  7. Reconstruction of the experimentally supported human protein interactome: what can we learn?

    PubMed

    Klapa, Maria I; Tsafou, Kalliopi; Theodoridis, Evangelos; Tsakalidis, Athanasios; Moschonas, Nicholas K

    2013-10-02

    Understanding the topology and dynamics of the human protein-protein interaction (PPI) network will significantly contribute to biomedical research, therefore its systematic reconstruction is required. Several meta-databases integrate source PPI datasets, but the protein node sets of their networks vary depending on the PPI data combined. Due to this inherent heterogeneity, the way in which the human PPI network expands via multiple dataset integration has not been comprehensively analyzed. We aim at assembling the human interactome in a global structured way and exploring it to gain insights of biological relevance. First, we defined the UniProtKB manually reviewed human "complete" proteome as the reference protein-node set and then we mined five major source PPI datasets for direct PPIs exclusively between the reference proteins. We updated the protein and publication identifiers and normalized all PPIs to the UniProt identifier level. The reconstructed interactome covers approximately 60% of the human proteome and has a scale-free structure. No apparent differentiating gene functional classification characteristics were identified for the unrepresented proteins. The source dataset integration augments the network mainly in PPIs. Polyubiquitin emerged as the highest-degree node, but the inclusion of most of its identified PPIs may be reconsidered. The high number (>300) of connections of the subsequent fifteen proteins correlates well with their essential biological role. According to the power-law network structure, the unrepresented proteins should mainly have up to four connections with equally poorly-connected interactors. Reconstructing the human interactome based on the a priori definition of the protein nodes enabled us to identify the currently included part of the human "complete" proteome, and discuss the role of the proteins within the network topology with respect to their function. As the network expansion has to comply with the scale-free theory, we suggest that the core of the human interactome has essentially emerged. Thus, it could be employed in systems biology and biomedical research, despite the considerable number of currently unrepresented proteins. The latter are probably involved in specialized physiological conditions, justifying the scarcity of related PPI information, and their identification can assist in designing relevant functional experiments and targeted text mining algorithms.

  8. Choosing an Optimal Database for Protein Identification from Tandem Mass Spectrometry Data.

    PubMed

    Kumar, Dhirendra; Yadav, Amit Kumar; Dash, Debasis

    2017-01-01

    Database searching is the preferred method for protein identification from digital spectra of mass to charge ratios (m/z) detected for protein samples through mass spectrometers. The search database is one of the major influencing factors in discovering proteins present in the sample and thus in deriving biological conclusions. In most cases the choice of search database is arbitrary. Here we describe common search databases used in proteomic studies and their impact on final list of identified proteins. We also elaborate upon factors like composition and size of the search database that can influence the protein identification process. In conclusion, we suggest that choice of the database depends on the type of inferences to be derived from proteomics data. However, making additional efforts to build a compact and concise database for a targeted question should generally be rewarding in achieving confident protein identifications.

  9. An on-line database for human milk composition in China.

    PubMed

    Yin, Shi-An; Yang, Zhen-Yu

    2016-12-01

    Understanding human milk composition is critical for setting nutrient recommended intakes (RNIs) for both infants and lactating women. However, nationwide human milk composition remains unavailable in China. Through cross-sectional study, human milk samples from 11 provinces in China were collected and their compositions were analyzed. Nutritional and health status of the lactating women and their infants were evaluated through questionnaire, physical examination and biochemical indicators. A total of 6,481 breast milk samples including colostrum (1,859), transitional milk (1,235) and mature milk (3,387) were collected. Contents of protein, fat, lactose, total solid and energy of more than 4,500 samples were analyzed using a human milk analyzer. About 2,000 samples were randomly selected for 24 mineral analyses. Free B-vitamins including thiamin, riboflavin, pyridoxal, pyridomine, pyridoxamine, nicotinamide, nicotinic acid, flavin adenine dinucleotide (FAD), biotin and pantothenic acid were analyzed in 1,800 samples. Amino acids (~800) and proteins (alpha-lactoalbumin, beta-casein, and lactoferrin) were analyzed. In addition, serum retinol and carotenoids, 25(OH)D, vitamin B-12, folic acid, ferritin and biochemical indicators (n=1,200 to 2,000) were analysed in the lactating women who provided the breast milk. Ongoing work: Fatty acids (C4-C24), fatsoluble vitamins and carotenoids, are on-going analysis. A regional breast milk compositional database is at an advanced stage of development in China with the intention that it be available on-line.

  10. Construction and Screening of a Lentiviral Secretome Library.

    PubMed

    Liu, Tao; Jia, Panpan; Ma, Huailei; Reed, Sean A; Luo, Xiaozhou; Larman, H Benjamin; Schultz, Peter G

    2017-06-22

    Over 2,000 human proteins are predicted to be secreted, but the biological function of the many of these proteins is still unknown. Moreover, a number of these proteins may act as new therapeutic agents or be targets for the development of therapeutic antibodies. To further explore the extracellular proteome, we have developed a secretome-enriched open reading frame (ORF) library that can be readily screened for autocrine activity in cell-based phenotypic or reporter assays. Next-generation sequencing (NGS) and database analysis predict that the library contains approximately 900 ORFs encoding known secreted proteins (accounting for 77.8% of the library), as well as genes encoding potentially unknown secreted proteins. In a proof-of-principle study, human TF-1 cells were screened for proliferative factors, and the known cytokine GMCSF was identified as a dominant hit. This library offers a relatively low-cost and straightforward approach for functional autocrine screens of secreted proteins. Copyright © 2017 Elsevier Ltd. All rights reserved.

  11. Classification of ligand molecules in PDB with fast heuristic graph match algorithm COMPLIG.

    PubMed

    Saito, Mihoko; Takemura, Naomi; Shirai, Tsuyoshi

    2012-12-14

    A fast heuristic graph-matching algorithm, COMPLIG, was devised to classify the small-molecule ligands in the Protein Data Bank (PDB), which are currently not properly classified on structure basis. By concurrently classifying proteins and ligands, we determined the most appropriate parameter for categorizing ligands to be more than 60% identity of atoms and bonds between molecules, and we classified 11,585 types of ligands into 1946 clusters. Although the large clusters were composed of nucleotides or amino acids, a significant presence of drug compounds was also observed. Application of the system to classify the natural ligand status of human proteins in the current database suggested that, at most, 37% of the experimental structures of human proteins were in complex with natural ligands. However, protein homology- and/or ligand similarity-based modeling was implied to provide models of natural interactions for an additional 28% of the total, which might be used to increase the knowledge of intrinsic protein-metabolite interactions. Copyright © 2012 Elsevier Ltd. All rights reserved.

  12. Identification of osteosarcoma-related specific proteins in serum samples using surface-enhanced laser desorption/ionization-time-of-flight mass spectrometry.

    PubMed

    Gu, Jianli; Li, Jitian; Huang, Manyu; Zhang, Zhiyong; Li, Dongsheng; Song, Guoying; Ding, Xingpo; Li, Wuyin

    2014-01-01

    Osteosarcoma (OS) is the most common malignant bone tumor. To identify OS-related specific proteins for early diagnosis of OS, a novel approach, surface-enhanced laser desorption/ionization-time-of-flight mass spectrometry (SELDI-TOF-MS) to serum samples from 25 OS patients, 16 osteochondroma, and 26 age-matched normal human volunteers as controls, was performed. Two proteins showed a significantly different expression in OS serum samples from control groups. Proteomic profiles and external leave-one-out cross-validation analysis showed that the correct rate of allocation, the sensitivity, and the specificity of diagnosis were 100%. These two proteins were further identified by searching the EPO-KB database, and one of the proteins identified as Serine rich region profile is involved in various cellular signaling cascades and tumor genesis. The presence of these two proteins in OS patients but absence from premalignant and normal human controls implied that they can be potential biomarkers for early diagnosis of OS.

  13. Global analysis of host-pathogen interactions that regulate early stage HIV-1 replication

    PubMed Central

    König, Renate; Zhou, Yingyao; Elleder, Daniel; Diamond, Tracy L.; Bonamy, Ghislain M.C.; Irelan, Jeffrey T.; Chiang, Chih-yuan; Tu, Buu P.; De Jesus, Paul D.; Lilley, Caroline E.; Seidel, Shannon; Opaluch, Amanda M.; Caldwell, Jeremy S.; Weitzman, Matthew D.; Kuhen, Kelli L.; Bandyopadhyay, Sourav; Ideker, Trey; Orth, Anthony P.; Miraglia, Loren J.; Bushman, Frederic D.; Young, John A.; Chanda, Sumit K.

    2008-01-01

    Human Immunodeficiency Viruses (HIV-1 and HIV-2) rely upon host-encoded proteins to facilitate their replication. Here we combined genome-wide siRNA analyses with interrogation of human interactome databases to assemble a host-pathogen biochemical network containing 213 confirmed host cellular factors and 11 HIV-1-encoded proteins. Protein complexes that regulate ubiquitin conjugation, proteolysis, DNA damage response and RNA splicing were identified as important modulators of early stage HIV-1 infection. Additionally, over 40 new factors were shown to specifically influence initiation and/or kinetics of HIV-1 DNA synthesis, including cytoskeletal regulatory proteins, modulators of post-translational modification, and nucleic acid binding proteins. Finally, fifteen proteins with diverse functional roles, including nuclear transport, prostaglandin synthesis, ubiquitination, and transcription, were found to influence nuclear import or viral DNA integration. Taken together, the multi-scale approach described here has uncovered multiprotein virus-host interactions that likely act in concert to facilitate early steps of HIV-1 infection. PMID:18854154

  14. Identifying essential proteins based on sub-network partition and prioritization by integrating subcellular localization information.

    PubMed

    Li, Min; Li, Wenkai; Wu, Fang-Xiang; Pan, Yi; Wang, Jianxin

    2018-06-14

    Essential proteins are important participants in various life activities and play a vital role in the survival and reproduction of living organisms. Identification of essential proteins from protein-protein interaction (PPI) networks has great significance to facilitate the study of human complex diseases, the design of drugs and the development of bioinformatics and computational science. Studies have shown that highly connected proteins in a PPI network tend to be essential. A series of computational methods have been proposed to identify essential proteins by analyzing topological structures of PPI networks. However, the high noise in the PPI data can degrade the accuracy of essential protein prediction. Moreover, proteins must be located in the appropriate subcellular localization to perform their functions, and only when the proteins are located in the same subcellular localization, it is possible that they can interact with each other. In this paper, we propose a new network-based essential protein discovery method based on sub-network partition and prioritization by integrating subcellular localization information, named SPP. The proposed method SPP was tested on two different yeast PPI networks obtained from DIP database and BioGRID database. The experimental results show that SPP can effectively reduce the effect of false positives in PPI networks and predict essential proteins more accurately compared with other existing computational methods DC, BC, CC, SC, EC, IC, NC. Copyright © 2018 Elsevier Ltd. All rights reserved.

  15. Human epidermis is a novel site of phospholipase B expression.

    PubMed

    Maury, Eric; Prévost, Marie Claude; Nauze, Michel; Redoulès, Daniel; Tarroux, Roger; Charvéron, Marie; Salles, Jean Pierre; Perret, Bertrand; Chap, Hugues; Gassama-Diagne, Ama

    2002-07-12

    Phospholipase B (PLB) is an enzyme that displays both phospholipase A(2) and lysophospholipase activities. Analysis of human epidermis homogenates indicated the presence of a 97 kDa PLB protein, as well as a phospholipase A(2) activity, both being enriched in the soluble fraction. Immunolabelling and in situ hybridization experiments showed that this enzyme is expressed in the different layers of epidermis with an accumulation at the dermo-epidermis junction. RT-PCR data indicated that PLB is specifically expressed in natural and reconstructed epidermis. By 3'-RACE-PCR and screening of human genome databases, we obtained a 3600 bp cDNA coding for human PLB highly homologous to already described intestinal brush border PLBs. These data led us to conclude that the soluble PLB corresponds to a proteolytic cleavage of the membrane anchored protein. Altogether, our results provide the first characterization of human PLB which should play an important role in epidermal barrier function.

  16. GEneSTATION 1.0: a synthetic resource of diverse evolutionary and functional genomic data for studying the evolution of pregnancy-associated tissues and phenotypes

    PubMed Central

    Kim, Mara; Cooper, Brian A.; Venkat, Rohit; Phillips, Julie B.; Eidem, Haley R.; Hirbo, Jibril; Nutakki, Sashank; Williams, Scott M.; Muglia, Louis J.; Capra, J. Anthony; Petren, Kenneth; Abbot, Patrick; Rokas, Antonis; McGary, Kriston L.

    2016-01-01

    Mammalian gestation and pregnancy are fast evolving processes that involve the interaction of the fetal, maternal and paternal genomes. Version 1.0 of the GEneSTATION database (http://genestation.org) integrates diverse types of omics data across mammals to advance understanding of the genetic basis of gestation and pregnancy-associated phenotypes and to accelerate the translation of discoveries from model organisms to humans. GEneSTATION is built using tools from the Generic Model Organism Database project, including the biology-aware database CHADO, new tools for rapid data integration, and algorithms that streamline synthesis and user access. GEneSTATION contains curated life history information on pregnancy and reproduction from 23 high-quality mammalian genomes. For every human gene, GEneSTATION contains diverse evolutionary (e.g. gene age, population genetic and molecular evolutionary statistics), organismal (e.g. tissue-specific gene and protein expression, differential gene expression, disease phenotype), and molecular data types (e.g. Gene Ontology Annotation, protein interactions), as well as links to many general (e.g. Entrez, PubMed) and pregnancy disease-specific (e.g. PTBgene, dbPTB) databases. By facilitating the synthesis of diverse functional and evolutionary data in pregnancy-associated tissues and phenotypes and enabling their quick, intuitive, accurate and customized meta-analysis, GEneSTATION provides a novel platform for comprehensive investigation of the function and evolution of mammalian pregnancy. PMID:26567549

  17. SinEx DB: a database for single exon coding sequences in mammalian genomes.

    PubMed

    Jorquera, Roddy; Ortiz, Rodrigo; Ossandon, F; Cárdenas, Juan Pablo; Sepúlveda, Rene; González, Carolina; Holmes, David S

    2016-01-01

    Eukaryotic genes are typically interrupted by intragenic, noncoding sequences termed introns. However, some genes lack introns in their coding sequence (CDS) and are generally known as 'single exon genes' (SEGs). In this work, a SEG is defined as a nuclear, protein-coding gene that lacks introns in its CDS. Whereas, many public databases of Eukaryotic multi-exon genes are available, there are only two specialized databases for SEGs. The present work addresses the need for a more extensive and diverse database by creating SinEx DB, a publicly available, searchable database of predicted SEGs from 10 completely sequenced mammalian genomes including human. SinEx DB houses the DNA and protein sequence information of these SEGs and includes their functional predictions (KOG) and the relative distribution of these functions within species. The information is stored in a relational database built with My SQL Server 5.1.33 and the complete dataset of SEG sequences and their functional predictions are available for downloading. SinEx DB can be interrogated by: (i) a browsable phylogenetic schema, (ii) carrying out BLAST searches to the in-house SinEx DB of SEGs and (iii) via an advanced search mode in which the database can be searched by key words and any combination of searches by species and predicted functions. SinEx DB provides a rich source of information for advancing our understanding of the evolution and function of SEGs.Database URL: www.sinex.cl. © The Author(s) 2016. Published by Oxford University Press.

  18. Systematic analysis of human kinase genes: a large number of genes and alternative splicing events result in functional and structural diversity

    PubMed Central

    Milanesi, Luciano; Petrillo, Mauro; Sepe, Leandra; Boccia, Angelo; D'Agostino, Nunzio; Passamano, Myriam; Di Nardo, Salvatore; Tasco, Gianluca; Casadio, Rita; Paolella, Giovanni

    2005-01-01

    Background Protein kinases are a well defined family of proteins, characterized by the presence of a common kinase catalytic domain and playing a significant role in many important cellular processes, such as proliferation, maintenance of cell shape, apoptosys. In many members of the family, additional non-kinase domains contribute further specialization, resulting in subcellular localization, protein binding and regulation of activity, among others. About 500 genes encode members of the kinase family in the human genome, and although many of them represent well known genes, a larger number of genes code for proteins of more recent identification, or for unknown proteins identified as kinase only after computational studies. Results A systematic in silico study performed on the human genome, led to the identification of 5 genes, on chromosome 1, 11, 13, 15 and 16 respectively, and 1 pseudogene on chromosome X; some of these genes are reported as kinases from NCBI but are absent in other databases, such as KinBase. Comparative analysis of 483 gene regions and subsequent computational analysis, aimed at identifying unannotated exons, indicates that a large number of kinase may code for alternately spliced forms or be incorrectly annotated. An InterProScan automated analysis was perfomed to study domain distribution and combination in the various families. At the same time, other structural features were also added to the annotation process, including the putative presence of transmembrane alpha helices, and the cystein propensity to participate into a disulfide bridge. Conclusion The predicted human kinome was extended by identifiying both additional genes and potential splice variants, resulting in a varied panorama where functionality may be searched at the gene and protein level. Structural analysis of kinase proteins domains as defined in multiple sources together with transmembrane alpha helices and signal peptide prediction provides hints to function assignment. The results of the human kinome analysis are collected in the KinWeb database, available for browsing and searching over the internet, where all results from the comparative analysis and the gene structure annotation are made available, alongside the domain information. Kinases may be searched by domain combinations and the relative genes may be viewed in a graphic browser at various level of magnification up to gene organization on the full chromosome set. PMID:16351747

  19. RAID: a comprehensive resource for human RNA-associated (RNA-RNA/RNA-protein) interaction.

    PubMed

    Zhang, Xiaomeng; Wu, Deng; Chen, Liqun; Li, Xiang; Yang, Jinxurong; Fan, Dandan; Dong, Tingting; Liu, Mingyue; Tan, Puwen; Xu, Jintian; Yi, Ying; Wang, Yuting; Zou, Hua; Hu, Yongfei; Fan, Kaili; Kang, Juanjuan; Huang, Yan; Miao, Zhengqiang; Bi, Miaoman; Jin, Nana; Li, Kongning; Li, Xia; Xu, Jianzhen; Wang, Dong

    2014-07-01

    Transcriptomic analyses have revealed an unexpected complexity in the eukaryote transcriptome, which includes not only protein-coding transcripts but also an expanding catalog of noncoding RNAs (ncRNAs). Diverse coding and noncoding RNAs (ncRNAs) perform functions through interaction with each other in various cellular processes. In this project, we have developed RAID (http://www.rna-society.org/raid), an RNA-associated (RNA-RNA/RNA-protein) interaction database. RAID intends to provide the scientific community with all-in-one resources for efficient browsing and extraction of the RNA-associated interactions in human. This version of RAID contains more than 6100 RNA-associated interactions obtained by manually reviewing more than 2100 published papers, including 4493 RNA-RNA interactions and 1619 RNA-protein interactions. Each entry contains detailed information on an RNA-associated interaction, including RAID ID, RNA/protein symbol, RNA/protein categories, validated method, expressing tissue, literature references (Pubmed IDs), and detailed functional description. Users can query, browse, analyze, and manipulate RNA-associated (RNA-RNA/RNA-protein) interaction. RAID provides a comprehensive resource of human RNA-associated (RNA-RNA/RNA-protein) interaction network. Furthermore, this resource will help in uncovering the generic organizing principles of cellular function network. © 2014 Zhang et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society.

  20. The Human Cell Surfaceome of Breast Tumors

    PubMed Central

    da Cunha, Júlia Pinheiro Chagas; Galante, Pedro Alexandre Favoretto; de Souza, Jorge Estefano Santana; Pieprzyk, Martin; Carraro, Dirce Maria; Old, Lloyd J.; Camargo, Anamaria Aranha; de Souza, Sandro José

    2013-01-01

    Introduction. Cell surface proteins are ideal targets for cancer therapy and diagnosis. We have identified a set of more than 3700 genes that code for transmembrane proteins believed to be at human cell surface. Methods. We used a high-throuput qPCR system for the analysis of 573 cell surface protein-coding genes in 12 primary breast tumors, 8 breast cell lines, and 21 normal human tissues including breast. To better understand the role of these genes in breast tumors, we used a series of bioinformatics strategies to integrates different type, of the datasets, such as KEGG, protein-protein interaction databases, ONCOMINE, and data from, literature. Results. We found that at least 77 genes are overexpressed in breast primary tumors while at least 2 of them have also a restricted expression pattern in normal tissues. We found common signaling pathways that may be regulated in breast tumors through the overexpression of these cell surface protein-coding genes. Furthermore, a comparison was made between the genes found in this report and other genes associated with features clinically relevant for breast tumorigenesis. Conclusions. The expression profiling generated in this study, together with an integrative bioinformatics analysis, allowed us to identify putative targets for breast tumors. PMID:24195083

  1. Highly Disordered Proteins in Prostate Cancer.

    PubMed

    Uversky, Vladimir N; Na, Insung; Landau, Kevin S; Schenck, Ryan O

    2017-01-01

    Prostate cancer is one of the major threats to the man's health. There are several mechanisms of the prostate cancer development characterized by the involvement of various androgen-related and androgen-unrelated factors in prostate cancer pathogenesis and in the metastatic carcinogenesis of prostate. In all these processes, proteins play various important roles, and the KEGG database has information on 88 human proteins experimentally shown to be involved in prostate cancer. It is known that many proteins associated with different human maladies are intrinsically disordered (i.e., they do not have stable secondary and/or tertiary structure in their unbound states). The goal of this review is to consider several highly disordered proteins known to be associated with the prostate cancer pathogenesis in order to better understand the roles of disordered proteins in this disease. We also hope that consideration of the pathology-related proteins from the perspective of intrinsic disorder can potentially lead to future experimental studies of these proteins to find novel pathways associated with prostate cancer. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  2. [In silico identification of molecular mimicry between T-cell epitopes of Neisseria meningitidis B and the human proteome].

    PubMed

    Batista-Duharte, Alexander; Téllez, Bruno; Tamayo, Maybia; Portuondo, Deivys; Cabrera, Osmir; Sierra, Gustavo; Pérez, Oliver

    2013-07-01

    The objective of the study was to determine the T-cell epitopes of four of the most frequent antigenic proteins of the outer membrane of Neisseria meningitidis B, and to identify the most relevant sites for molecular mimicry with T-cell epitopes in humans. In order to do so, an in silico study -a type of study that uses bioinformatic tools- was carried out using SWISS-PROT/TrEMBL, SYFPEITHI and FASTA databases, which helped to determine the protein sequences, CD4 and CD8 T-cell epitope prediction, as well as the molecular mimicry with humans, respectively. Molecular similarity was found in several human proteins present in different organs and tissues such as: liver, skin and epithelial tissues, brain, lymphatic system and testicles. Of these, those found in testicles were more similar, showing the highest frequency of mimetic sequences. This finding shed light on the success of N. meningitidis B to colonize human tissues and the failure of certain vaccines against this bacterium, and it even helps to explain possible autoimmune reactions associated with the infection or vaccination.

  3. Wikidata as a semantic framework for the Gene Wiki initiative.

    PubMed

    Burgstaller-Muehlbacher, Sebastian; Waagmeester, Andra; Mitraka, Elvira; Turner, Julia; Putman, Tim; Leong, Justin; Naik, Chinmay; Pavlidis, Paul; Schriml, Lynn; Good, Benjamin M; Su, Andrew I

    2016-01-01

    Open biological data are distributed over many resources making them challenging to integrate, to update and to disseminate quickly. Wikidata is a growing, open community database which can serve this purpose and also provides tight integration with Wikipedia. In order to improve the state of biological data, facilitate data management and dissemination, we imported all human and mouse genes, and all human and mouse proteins into Wikidata. In total, 59,721 human genes and 73,355 mouse genes have been imported from NCBI and 27,306 human proteins and 16,728 mouse proteins have been imported from the Swissprot subset of UniProt. As Wikidata is open and can be edited by anybody, our corpus of imported data serves as the starting point for integration of further data by scientists, the Wikidata community and citizen scientists alike. The first use case for these data is to populate Wikipedia Gene Wiki infoboxes directly from Wikidata with the data integrated above. This enables immediate updates of the Gene Wiki infoboxes as soon as the data in Wikidata are modified. Although Gene Wiki pages are currently only on the English language version of Wikipedia, the multilingual nature of Wikidata allows for usage of the data we imported in all 280 different language Wikipedias. Apart from the Gene Wiki infobox use case, a SPARQL endpoint and exporting functionality to several standard formats (e.g. JSON, XML) enable use of the data by scientists. In summary, we created a fully open and extensible data resource for human and mouse molecular biology and biochemistry data. This resource enriches all the Wikipedias with structured information and serves as a new linking hub for the biological semantic web. Database URL: https://www.wikidata.org/. © The Author(s) 2016. Published by Oxford University Press.

  4. SelTarbase, a database of human mononucleotide-microsatellite mutations and their potential impact to tumorigenesis and immunology

    PubMed Central

    Woerner, Stefan M.; Yuan, Yan P.; Benner, Axel; Korff, Sebastian; von Knebel Doeberitz, Magnus; Bork, Peer

    2010-01-01

    About 15% of human colorectal cancers and, at varying degrees, other tumor entities as well as nearly all tumors related to Lynch syndrome are hallmarked by microsatellite instability (MSI) as a result of a defective mismatch repair system. The functional impact of resulting mutations depends on their genomic localization. Alterations within coding mononucleotide repeat tracts (MNRs) can lead to protein truncation and formation of neopeptides, whereas alterations within untranslated MNRs can alter transcription level or transcript stability. These mutations may provide selective advantage or disadvantage to affected cells. They may further concern the biology of microsatellite unstable cells, e.g. by generating immunogenic peptides induced by frameshifts mutations. The Selective Targets database (http://www.seltarbase.org) is a curated database of a growing number of public MNR mutation data in microsatellite unstable human tumors. Regression calculations for various MSI–H tumor entities indicating statistically deviant mutation frequencies predict TGFBR2, BAX, ACVR2A and others that are shown or highly suspected to be involved in MSI tumorigenesis. Many useful tools for further analyzing genomic DNA, derived wild-type and mutated cDNAs and peptides are integrated. A comprehensive database of all human coding, untranslated, non-coding RNA- and intronic MNRs (MNR_ensembl) is also included. Herewith, SelTarbase presents as a plenty instrument for MSI-carcinogenesis-related research, diagnostics and therapy. PMID:19820113

  5. Human Single-Chain Fv Immunoconjugates Targeted to a Melanoma-Associated Chondroitin Sulfate Proteoglycan Mediate Specific Lysis of Human Melanoma Cells by Natural Killer Cells and Complement

    NASA Astrophysics Data System (ADS)

    Wang, Baiyang; Chen, Yi-Bin; Ayalon, Oran; Bender, Jeffrey; Garen, Alan

    1999-02-01

    Two antimelanoma immunoconjugates containing a human single-chain Fv (scFv) targeting domain conjugated to the Fc effector domain of human IgG1 were synthesized as secreted two-chain molecules in Chinese hamster ovary and Drosophila S2 cells, and purified by affinity chromatography on protein A. The scFv targeting domains originally were isolated as melanoma-specific clones from a scFv fusion-phage library, derived from the antibody repertoire of a vaccinated melanoma patient. The purified immunoconjugates showed similar binding specificity as did the fusion-phage clones. Binding occurred to human melanoma cells but not to human melanocytes or to several other types of normal cells and tumor cells. A 250-kDa melanoma protein was immunoprecipitated by the immunoconjugates and analyzed by mass spectrometry, using two independent procedures. A screen of protein sequence databases showed an exact match of several peptide masses between the immunoprecipitated protein and the core protein of a chondroitin sulfate proteoglycan, which is expressed on the surface of most human melanoma cells. The Fc effector domain of the immunoconjugates binds natural killer (NK) cells and also the C1q protein that initiates the complement cascade; both NK cells and complement can activate powerful cytolytic responses against the targeted tumor cells. An in vitro cytolysis assay was used to test for an immunoconjugate-dependent specific cytolytic response against cultured human melanoma cells by NK cells and complement. The melanoma cells, but not the human fibroblast cells used as the control, were efficiently lysed by both NK cells and complement in the presence of the immunoconjugates. The in vitro results suggest that the immunoconjugates also could activate a specific cytolytic immune response against melanoma tumors in vivo.

  6. Comprehensive analysis of the N-glycan biosynthetic pathway using bioinformatics to generate UniCorn: A theoretical N-glycan structure database.

    PubMed

    Akune, Yukie; Lin, Chi-Hung; Abrahams, Jodie L; Zhang, Jingyu; Packer, Nicolle H; Aoki-Kinoshita, Kiyoko F; Campbell, Matthew P

    2016-08-05

    Glycan structures attached to proteins are comprised of diverse monosaccharide sequences and linkages that are produced from precursor nucleotide-sugars by a series of glycosyltransferases. Databases of these structures are an essential resource for the interpretation of analytical data and the development of bioinformatics tools. However, with no template to predict what structures are possible the human glycan structure databases are incomplete and rely heavily on the curation of published, experimentally determined, glycan structure data. In this work, a library of 45 human glycosyltransferases was used to generate a theoretical database of N-glycan structures comprised of 15 or less monosaccharide residues. Enzyme specificities were sourced from major online databases including Kyoto Encyclopedia of Genes and Genomes (KEGG) Glycan, Consortium for Functional Glycomics (CFG), Carbohydrate-Active enZymes (CAZy), GlycoGene DataBase (GGDB) and BRENDA. Based on the known activities, more than 1.1 million theoretical structures and 4.7 million synthetic reactions were generated and stored in our database called UniCorn. Furthermore, we analyzed the differences between the predicted glycan structures in UniCorn and those contained in UniCarbKB (www.unicarbkb.org), a database which stores experimentally described glycan structures reported in the literature, and demonstrate that UniCorn can be used to aid in the assignment of ambiguous structures whilst also serving as a discovery database. Copyright © 2016 Elsevier Ltd. All rights reserved.

  7. Comparison of human cell signaling pathway databases—evolution, drawbacks and challenges

    PubMed Central

    Chowdhury, Saikat; Sarkar, Ram Rup

    2015-01-01

    Elucidating the complexities of cell signaling pathways is of immense importance to gain understanding about various biological phenomenon, such as dynamics of gene/protein expression regulation, cell fate determination, embryogenesis and disease progression. The successful completion of human genome project has also helped experimental and theoretical biologists to analyze various important pathways. To advance this study, during the past two decades, systematic collections of pathway data from experimental studies have been compiled and distributed freely by several databases, which also integrate various computational tools for further analysis. Despite significant advancements, there exist several drawbacks and challenges, such as pathway data heterogeneity, annotation, regular update and automated image reconstructions, which motivated us to perform a thorough review on popular and actively functioning 24 cell signaling databases. Based on two major characteristics, pathway information and technical details, freely accessible data from commercial and academic databases are examined to understand their evolution and enrichment. This review not only helps to identify some novel and useful features, which are not yet included in any of the databases but also highlights their current limitations and subsequently propose the reasonable solutions for future database development, which could be useful to the whole scientific community. PMID:25632107

  8. KMeyeDB: a graphical database of mutations in genes that cause eye diseases.

    PubMed

    Kawamura, Takashi; Ohtsubo, Masafumi; Mitsuyama, Susumu; Ohno-Nakamura, Saho; Shimizu, Nobuyoshi; Minoshima, Shinsei

    2010-06-01

    KMeyeDB (http://mutview.dmb.med.keio.ac.jp/) is a database of human gene mutations that cause eye diseases. We have substantially enriched the amount of data in the database, which now contains information about the mutations of 167 human genes causing eye-related diseases including retinitis pigmentosa, cone-rod dystrophy, night blindness, Oguchi disease, Stargardt disease, macular degeneration, Leber congenital amaurosis, corneal dystrophy, cataract, glaucoma, retinoblastoma, Bardet-Biedl syndrome, and Usher syndrome. KMeyeDB is operated using the database software MutationView, which deals with various characters of mutations, gene structure, protein functional domains, and polymerase chain reaction (PCR) primers, as well as clinical data for each case. Users can access the database using an ordinary Internet browser with smooth user-interface, without user registration. The results are displayed on the graphical windows together with statistical calculations. All mutations and associated data have been collected from published articles. Careful data analysis with KMeyeDB revealed many interesting features regarding the mutations in 167 genes that cause 326 different types of eye diseases. Some genes are involved in multiple types of eye diseases, whereas several eye diseases are caused by different mutations in one gene.

  9. Role of PELP1 in EGFR-ER Signaling Crosstalk in Ovarian Cancer Cells

    DTIC Science & Technology

    2009-04-01

    expression of genes involved in metastasis using a focused microarray approach. We have used Human Tumor Metastasis Microarray (Oligo GE array from...ovarian cancer progression. Analysis of human genome databases and SAGE data suggested deregulation of PELP1 expression in ovarian cancer cells...PI3K, and STAT3 in the cytosol. PELP1/MNAR regulates meiosis via its interactions with heterotimeric Gbc protein, androgen receptor (AR), and by

  10. Human Disease Insight: An integrated knowledge-based platform for disease-gene-drug information.

    PubMed

    Tasleem, Munazzah; Ishrat, Romana; Islam, Asimul; Ahmad, Faizan; Hassan, Md Imtaiyaz

    2016-01-01

    The scope of the Human Disease Insight (HDI) database is not limited to researchers or physicians as it also provides basic information to non-professionals and creates disease awareness, thereby reducing the chances of patient suffering due to ignorance. HDI is a knowledge-based resource providing information on human diseases to both scientists and the general public. Here, our mission is to provide a comprehensive human disease database containing most of the available useful information, with extensive cross-referencing. HDI is a knowledge management system that acts as a central hub to access information about human diseases and associated drugs and genes. In addition, HDI contains well-classified bioinformatics tools with helpful descriptions. These integrated bioinformatics tools enable researchers to annotate disease-specific genes and perform protein analysis, search for biomarkers and identify potential vaccine candidates. Eventually, these tools will facilitate the analysis of disease-associated data. The HDI provides two types of search capabilities and includes provisions for downloading, uploading and searching disease/gene/drug-related information. The logistical design of the HDI allows for regular updating. The database is designed to work best with Mozilla Firefox and Google Chrome and is freely accessible at http://humandiseaseinsight.com. Copyright © 2015 King Saud Bin Abdulaziz University for Health Sciences. Published by Elsevier Ltd. All rights reserved.

  11. The Halophile protein database.

    PubMed

    Sharma, Naveen; Farooqi, Mohammad Samir; Chaturvedi, Krishna Kumar; Lal, Shashi Bhushan; Grover, Monendra; Rai, Anil; Pandey, Pankaj

    2014-01-01

    Halophilic archaea/bacteria adapt to different salt concentration, namely extreme, moderate and low. These type of adaptations may occur as a result of modification of protein structure and other changes in different cell organelles. Thus proteins may play an important role in the adaptation of halophilic archaea/bacteria to saline conditions. The Halophile protein database (HProtDB) is a systematic attempt to document the biochemical and biophysical properties of proteins from halophilic archaea/bacteria which may be involved in adaptation of these organisms to saline conditions. In this database, various physicochemical properties such as molecular weight, theoretical pI, amino acid composition, atomic composition, estimated half-life, instability index, aliphatic index and grand average of hydropathicity (Gravy) have been listed. These physicochemical properties play an important role in identifying the protein structure, bonding pattern and function of the specific proteins. This database is comprehensive, manually curated, non-redundant catalogue of proteins. The database currently contains 59 897 proteins properties extracted from 21 different strains of halophilic archaea/bacteria. The database can be accessed through link. Database URL: http://webapp.cabgrid.res.in/protein/ © The Author(s) 2014. Published by Oxford University Press.

  12. Gaining knowledge from previously unexplained spectra-application of the PTM-Explorer software to detect PTM in HUPO BPP MS/MS data.

    PubMed

    Chamrad, Daniel C; Körting, Gerhard; Schäfer, Heike; Stephan, Christian; Thiele, Herbert; Apweiler, Rolf; Meyer, Helmut E; Marcus, Katrin; Blüggel, Martin

    2006-09-01

    A novel software tool named PTM-Explorer has been applied to LC-MS/MS datasets acquired within the Human Proteome Organisation (HUPO) Brain Proteome Project (BPP). PTM-Explorer enables automatic identification of peptide MS/MS spectra that were not explained in typical sequence database searches. The main focus was detection of PTMs, but PTM-Explorer detects also unspecific peptide cleavage, mass measurement errors, experimental modifications, amino acid substitutions, transpeptidation products and unknown mass shifts. To avoid a combinatorial problem the search is restricted to a set of selected protein sequences, which stem from previous protein identifications using a common sequence database search. Prior to application to the HUPO BPP data, PTM-Explorer was evaluated on excellently manually characterized and evaluated LC-MS/MS data sets from Alpha-A-Crystallin gel spots obtained from mouse eye lens. Besides various PTMs including phosphorylation, a wealth of experimental modifications and unspecific cleavage products were successfully detected, completing the primary structure information of the measured proteins. Our results indicate that a large amount of MS/MS spectra that currently remain unidentified in standard database searches contain valuable information that can only be elucidated using suitable software tools.

  13. Identification of learning and memory genes in canine; promoter investigation and determining the selective pressure.

    PubMed

    Seifi Moroudi, Reihane; Masoudi, Ali Akbar; Vaez Torshizi, Rasoul; Zandi, Mohammad

    2014-12-01

    One of the important behaviors of dogs is trainability which is affected by learning and memory genes. These kinds of the genes have not yet been identified in dogs. In the current research, these genes were found in animal models by mining the biological data and scientific literatures. The proteins of these genes were obtained from the UniProt database in dogs and humans. Not all homologous proteins perform similar functions, thus comparison of these proteins was studied in terms of protein families, domains, biological processes, molecular functions, and cellular location of metabolic pathways in Interpro, KEGG, Quick Go and Psort databases. The results showed that some of these proteins have the same performance in the rat or mouse, dog, and human. It is anticipated that the protein of these genes may be effective in learning and memory in dogs. Then, the expression pattern of the recognized genes was investigated in the dog hippocampus using the existing information in the GEO profile. The results showed that BDNF, TAC1 and CCK genes are expressed in the dog hippocampus, therefore, these genes could be strong candidates associated with learning and memory in dogs. Subsequently, due to the importance of the promoter regions in gene function, this region was investigated in the above genes. Analysis of the promoter indicated that the HNF-4 site of BDNF gene and the transcription start site of CCK gene is exposed to methylation. Phylogenetic analysis of protein sequences of these genes showed high similarity in each of these three genes among the studied species. The dN/dS ratio for BDNF, TAC1 and CCK genes indicates a purifying selection during the evolution of the genes.

  14. Differential protein expression in alligator leukocytes in response to bacterial lipopolysaccharide injection.

    PubMed

    Merchant, Mark; Kinney, Clint; Sanders, Paige

    2009-12-01

    Blood was collected from three juvenile alligators (Alligator mississippiensis) before, and again 24h after, injection with bacterial lipopolysaccharide (LPS). The leukocytes were collected from both samples, and the proteins were extracted. Each group of proteins was labeled with a different fluorescent dye and the differences in protein expression were analyzed by two dimensional differential in-gel expressions (2D-DIGE). The proteins which appeared to be increased or decreased by treatment with LPS were selected and analyzed by MALDI-TOF to determine mass and LC-MS/MS to acquire the partial protein sequences. The peptide sequences were compared to the NCBI protein sequence database to determine homology with other sequences from other species. Several proteins of interest appeared to be increased upon LPS stimulation. Proteins with homology to human transgelin-2, fish glucose-6-phosphate dehydrogenase, amphibian α-enolase, alligator lactate dehydrogenase, fish ubiquitin-activating enzyme, and fungal β-tubulin were also increased after LPS injection. Proteins with homology to fish vimentin 4, murine heterogeneous nuclear ribonucleoprotein A3, and avian calreticulin were found to be decreased in response to LPS. In addition, five proteins, four of which were up-regulated (827, 560, 512, and 650%) and one that exhibited repressed expression (307%), did not show homology to any protein in the database, and thus may represent newly discovered proteins. We are using this biochemical approach to isolate and characterize alligator proteins with potential relevant immune function.

  15. Topological and organizational properties of the products of house-keeping and tissue-specific genes in protein-protein interaction networks.

    PubMed

    Lin, Wen-Hsien; Liu, Wei-Chung; Hwang, Ming-Jing

    2009-03-11

    Human cells of various tissue types differ greatly in morphology despite having the same set of genetic information. Some genes are expressed in all cell types to perform house-keeping functions, while some are selectively expressed to perform tissue-specific functions. In this study, we wished to elucidate how proteins encoded by human house-keeping genes and tissue-specific genes are organized in human protein-protein interaction networks. We constructed protein-protein interaction networks for different tissue types using two gene expression datasets and one protein-protein interaction database. We then calculated three network indices of topological importance, the degree, closeness, and betweenness centralities, to measure the network position of proteins encoded by house-keeping and tissue-specific genes, and quantified their local connectivity structure. Compared to a random selection of proteins, house-keeping gene-encoded proteins tended to have a greater number of directly interacting neighbors and occupy network positions in several shortest paths of interaction between protein pairs, whereas tissue-specific gene-encoded proteins did not. In addition, house-keeping gene-encoded proteins tended to connect with other house-keeping gene-encoded proteins in all tissue types, whereas tissue-specific gene-encoded proteins also tended to connect with other tissue-specific gene-encoded proteins, but only in approximately half of the tissue types examined. Our analysis showed that house-keeping gene-encoded proteins tend to occupy important network positions, while those encoded by tissue-specific genes do not. The biological implications of our findings were discussed and we proposed a hypothesis regarding how cells organize their protein tools in protein-protein interaction networks. Our results led us to speculate that house-keeping gene-encoded proteins might form a core in human protein-protein interaction networks, while clusters of tissue-specific gene-encoded proteins are attached to the core at more peripheral positions of the networks.

  16. Proteomic analysis of secreted proteins by human bronchial epithelial cells in response to cadmium toxicity.

    PubMed

    Chen, De-Ju; Xu, Yan-Ming; Zheng, Wei; Huang, Dong-Yang; Wong, Wing-Yan; Tai, William Chi-Shing; Cho, Yong-Yeon; Lau, Andy T Y

    2015-09-01

    For years, many studies have been conducted to investigate the intracellular response of cells challenged with toxic metal(s), yet, the corresponding secretome responses, especially in human lung cells, are largely unexplored. Here, we provide a secretome analysis of human bronchial epithelial cells (BEAS-2B) treated with cadmium chloride (CdCl2 ), with the aim of identifying secreted proteins in response to Cd toxicity. Proteins from control and spent media were separated by two-dimensional electrophoresis and visualized by silver staining. Differentially-secreted proteins were identified by MALDI-TOF-MS analysis and database searching. We characterized, for the first time, the extracellular proteome changes of BEAS-2B dosed with Cd. Our results unveiled that Cd treatment led to the marked upregulation of molecular chaperones, antioxidant enzymes, enzymes associated with glutathione metabolic process, proteins involved in cellular energy metabolism, as well as tumor-suppressors. Pretreatment of cells with the thiol antioxidant glutathione before Cd treatment effectively abrogated the secretion of these proteins and prevented cell death. Taken together, our results demonstrate that Cd causes oxidative stress-induced cytotoxicity; and the differentially-secreted protein signatures could be considered as targets for potential use as extracellular biomarkers upon Cd exposure. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  17. Proteomic analysis of human follicular fluid associated with successful in vitro fertilization.

    PubMed

    Shen, Xiaofang; Liu, Xin; Zhu, Peng; Zhang, Yuhua; Wang, Jiahui; Wang, Yanwei; Wang, Wenting; Liu, Juan; Li, Ning; Liu, Fujun

    2017-07-27

    Human follicular fluid (HFF) provides a key environment for follicle development and oocyte maturation, and contributes to oocyte quality and in vitro fertilization (IVF) outcome. To better understand folliculogenesis in the ovary, a proteomic strategy based on dual reverse phase high performance liquid chromatography (RP-HPLC) coupled to matrix-assisted laser desorption/ionization time-of-flight tandem mass spectrometry (LC-MALDI TOF/TOF MS) was used to investigate the protein profile of HFF from women undergoing successful IVF. A total of 219 unique high-confidence (False Discovery Rate (FDR) < 0.01) HFF proteins were identified by searching the reviewed Swiss-Prot human database (20,183 sequences), and MS data were further verified by western blot. PANTHER showed HFF proteins were involved in complement and coagulation cascade, growth factor and hormone, immunity, and transportation, KEGG indicated their pathway, and STRING demonstrated their interaction networks. In comparison, 32% and 50% of proteins have not been reported in previous human follicular fluid and plasma. Our HFF proteome research provided a new complementary high-confidence dataset of folliculogenesis and oocyte maturation environment. Those proteins associated with innate immunity, complement cascade, blood coagulation, and angiogenesis might serve as the biomarkers of female infertility and IVF outcome, and their pathways facilitated a complete exhibition of reproductive process.

  18. Mining disease genes using integrated protein-protein interaction and gene-gene co-regulation information.

    PubMed

    Li, Jin; Wang, Limei; Guo, Maozu; Zhang, Ruijie; Dai, Qiguo; Liu, Xiaoyan; Wang, Chunyu; Teng, Zhixia; Xuan, Ping; Zhang, Mingming

    2015-01-01

    In humans, despite the rapid increase in disease-associated gene discovery, a large proportion of disease-associated genes are still unknown. Many network-based approaches have been used to prioritize disease genes. Many networks, such as the protein-protein interaction (PPI), KEGG, and gene co-expression networks, have been used. Expression quantitative trait loci (eQTLs) have been successfully applied for the determination of genes associated with several diseases. In this study, we constructed an eQTL-based gene-gene co-regulation network (GGCRN) and used it to mine for disease genes. We adopted the random walk with restart (RWR) algorithm to mine for genes associated with Alzheimer disease. Compared to the Human Protein Reference Database (HPRD) PPI network alone, the integrated HPRD PPI and GGCRN networks provided faster convergence and revealed new disease-related genes. Therefore, using the RWR algorithm for integrated PPI and GGCRN is an effective method for disease-associated gene mining.

  19. Specific identification of Bacillus anthracis strains

    NASA Astrophysics Data System (ADS)

    Krishnamurthy, Thaiya; Deshpande, Samir; Hewel, Johannes; Liu, Hongbin; Wick, Charles H.; Yates, John R., III

    2007-01-01

    Accurate identification of human pathogens is the initial vital step in treating the civilian terrorism victims and military personnel afflicted in biological threat situations. We have applied a powerful multi-dimensional protein identification technology (MudPIT) along with newly generated software termed Profiler to identify the sequences of specific proteins observed for few strains of Bacillus anthracis, a human pathogen. Software termed Profiler was created to initially screen the MudPIT data of B. anthracis strains and establish the observed proteins specific for its strains. A database was also generated using Profiler containing marker proteins of B. anthracis and its strains, which in turn could be used for detecting the organism and its corresponding strains in samples. Analysis of the unknowns by our methodology, combining MudPIT and Profiler, led to the accurate identification of the anthracis strains present in samples. Thus, a new approach for the identification of B. anthracis strains in unknown samples, based on the molecular mass and sequences of marker proteins, has been ascertained.

  20. Systematic Identification and Characterization of Novel Human Skin-Associated Genes Encoding Membrane and Secreted Proteins

    PubMed Central

    Buhren, Bettina Alexandra; Martinez, Cynthia; Schrumpf, Holger; Gasis, Marcia; Grether-Beck, Susanne; Krutmann, Jean

    2013-01-01

    Through bioinformatics analyses of a human gene expression database representing 105 different tissues and cell types, we identified 687 skin-associated genes that are selectively and highly expressed in human skin. Over 50 of these represent uncharacterized genes not previously associated with skin and include a subset that encode novel secreted and plasma membrane proteins. The high levels of skin-associated expression for eight of these novel therapeutic target genes were confirmed by semi-quantitative real time PCR, western blot and immunohistochemical analyses of normal skin and skin-derived cell lines. Four of these are expressed specifically by epidermal keratinocytes; two that encode G-protein-coupled receptors (GPR87 and GPR115), and two that encode secreted proteins (WFDC5 and SERPINB7). Further analyses using cytokine-activated and terminally differentiated human primary keratinocytes or a panel of common inflammatory, autoimmune or malignant skin diseases revealed distinct patterns of regulation as well as disease associations that point to important roles in cutaneous homeostasis and disease. Some of these novel uncharacterized skin genes may represent potential biomarkers or drug targets for the development of future diagnostics or therapeutics. PMID:23840300

  1. The Nuclear Protein Database (NPD): sub-nuclear localisation and functional annotation of the nuclear proteome

    PubMed Central

    Dellaire, G.; Farrall, R.; Bickmore, W.A.

    2003-01-01

    The Nuclear Protein Database (NPD) is a curated database that contains information on more than 1300 vertebrate proteins that are thought, or are known, to localise to the cell nucleus. Each entry is annotated with information on predicted protein size and isoelectric point, as well as any repeats, motifs or domains within the protein sequence. In addition, information on the sub-nuclear localisation of each protein is provided and the biological and molecular functions are described using Gene Ontology (GO) terms. The database is searchable by keyword, protein name, sub-nuclear compartment and protein domain/motif. Links to other databases are provided (e.g. Entrez, SWISS-PROT, OMIM, PubMed, PubMed Central). Thus, NPD provides a gateway through which the nuclear proteome may be explored. The database can be accessed at http://npd.hgu.mrc.ac.uk and is updated monthly. PMID:12520015

  2. Complex network theory for the identification and assessment of candidate protein targets.

    PubMed

    McGarry, Ken; McDonald, Sharon

    2018-06-01

    In this work we use complex network theory to provide a statistical model of the connectivity patterns of human proteins and their interaction partners. Our intention is to identify important proteins that may be predisposed to be potential candidates as drug targets for therapeutic interventions. Target proteins usually have more interaction partners than non-target proteins, but there are no hard-and-fast rules for defining the actual number of interactions. We devise a statistical measure for identifying hub proteins, we score our target proteins with gene ontology annotations. The important druggable protein targets are likely to have similar biological functions that can be assessed for their potential therapeutic value. Our system provides a statistical analysis of the local and distant neighborhood protein interactions of the potential targets using complex network measures. This approach builds a more accurate model of drug-to-target activity and therefore the likely impact on treating diseases. We integrate high quality protein interaction data from the HINT database and disease associated proteins from the DrugTarget database. Other sources include biological knowledge from Gene Ontology and drug information from DrugBank. The problem is a very challenging one since the data is highly imbalanced between target proteins and the more numerous nontargets. We use undersampling on the training data and build Random Forest classifier models which are used to identify previously unclassified target proteins. We validate and corroborate these findings from the available literature. Copyright © 2018 Elsevier Ltd. All rights reserved.

  3. Systematic Differences in Signal Emitting and Receiving Revealed by PageRank Analysis of a Human Protein Interactome

    PubMed Central

    Li, Xiu-Qing

    2012-01-01

    Most protein PageRank studies do not use signal flow direction information in protein interactions because this information was not readily available in large protein databases until recently. Therefore, four questions have yet to be answered: A) What is the general difference between signal emitting and receiving in a protein interactome? B) Which proteins are among the top ranked in directional ranking? C) Are high ranked proteins more evolutionarily conserved than low ranked ones? D) Do proteins with similar ranking tend to have similar subcellular locations? In this study, we address these questions using the forward, reverse, and non-directional PageRank approaches to rank an information-directional network of human proteins and study their evolutionary conservation. The forward ranking gives credit to information receivers, reverse ranking to information emitters, and non-directional ranking mainly to the number of interactions. The protein lists generated by the forward and non-directional rankings are highly correlated, but those by the reverse and non-directional rankings are not. The results suggest that the signal emitting/receiving system is characterized by key-emittings and relatively even receivings in the human protein interactome. Signaling pathway proteins are frequent in top ranked ones. Eight proteins are both informational top emitters and top receivers. Top ranked proteins, except a few species-related novel-function ones, are evolutionarily well conserved. Protein-subunit ranking position reflects subunit function. These results demonstrate the usefulness of different PageRank approaches in characterizing protein networks and provide insights to protein interaction in the cell. PMID:23028653

  4. Histone Code Modulation by Oncogenic PWWP-Domain Protein in Breast Cancers

    DTIC Science & Technology

    2014-08-01

    discs, the Drosophila melanogaster homo- logue of human retinoblastoma binding protein 2. Genetics 2000; 156: 645-663. [10] Zeng J, Ge Z, Wang L...in breast cancer patients (7-11). Earlier, we used genomic analysis of copy number and gene expression to perform a detailed analysis of the 8p11-12...from the 8p11-12 region (14). Very recently, we searched the Cancer Genome Atlas database that contains 744 breast invasive carcinomas. We found DNA or

  5. Motoring through: the role of kinesin superfamily proteins in female meiosis.

    PubMed

    Camlin, Nicole J; McLaughlin, Eileen A; Holt, Janet E

    2017-07-01

    The kinesin motor protein family consists of 14 distinct subclasses and 45 kinesin proteins in humans. A large number of these proteins, or their orthologues, have been shown to possess essential function(s) in both the mitotic and the meiotic cell cycle. Kinesins have important roles in chromosome separation, microtubule dynamics, spindle formation, cytokinesis and cell cycle progression. This article contains a review of the literature with respect to the role of kinesin motor proteins in female meiosis in model species. Throughout, we discuss the function of each class of kinesin proteins during oocyte meiosis, and where such data are not available their role in mitosis is considered. Finally, the review highlights the potential clinical importance of this family of proteins for human oocyte quality. To examine the role of kinesin motor proteins in oocyte meiosis. A search was performed on the Pubmed database for journal articles published between January 1970 and February 2017. Search terms included 'oocyte kinesin' and 'meiosis kinesin' in addition to individual kinesin names with the terms oocyte or meiosis. Within human cells 45 kinesin motor proteins have been discovered, with the role of only 13 of these proteins, or their orthologues, investigated in female meiosis. Furthermore, of these kinesins only half have been examined in mammalian oocytes, despite alterations occurring in gene transcripts or protein expression with maternal ageing, cryopreservation or behavioral conditions, such as binge drinking, for many of them. Kinesin motor proteins have distinct and important roles throughout oocyte meiosis in many non-mammalian model species. However, the functions these proteins have in mammalian meiosis, particularly in humans, are less clear owing to lack of research. This review brings to light the need for more experimental investigation of kinesin motor proteins, particularly those associated with maternal ageing, cryopreservation or exposure to environmental toxicants. © The Author 2017. Published by Oxford University Press on behalf of the European Society of Human Reproduction and Embryology. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  6. Pharmacophore modeling, docking, and principal component analysis based clustering: combined computer-assisted approaches to identify new inhibitors of the human rhinovirus coat protein.

    PubMed

    Steindl, Theodora M; Crump, Carolyn E; Hayden, Frederick G; Langer, Thierry

    2005-10-06

    The development and application of a sophisticated virtual screening and selection protocol to identify potential, novel inhibitors of the human rhinovirus coat protein employing various computer-assisted strategies are described. A large commercially available database of compounds was screened using a highly selective, structure-based pharmacophore model generated with the program Catalyst. A docking study and a principal component analysis were carried out within the software package Cerius and served to validate and further refine the obtained results. These combined efforts led to the selection of six candidate structures, for which in vitro anti-rhinoviral activity could be shown in a biological assay.

  7. Identifying protein phosphorylation sites with kinase substrate specificity on human viruses.

    PubMed

    Bretaña, Neil Arvin; Lu, Cheng-Tsung; Chiang, Chiu-Yun; Su, Min-Gang; Huang, Kai-Yao; Lee, Tzong-Yi; Weng, Shun-Long

    2012-01-01

    Viruses infect humans and progress inside the body leading to various diseases and complications. The phosphorylation of viral proteins catalyzed by host kinases plays crucial regulatory roles in enhancing replication and inhibition of normal host-cell functions. Due to its biological importance, there is a desire to identify the protein phosphorylation sites on human viruses. However, the use of mass spectrometry-based experiments is proven to be expensive and labor-intensive. Furthermore, previous studies which have identified phosphorylation sites in human viruses do not include the investigation of the responsible kinases. Thus, we are motivated to propose a new method to identify protein phosphorylation sites with its kinase substrate specificity on human viruses. The experimentally verified phosphorylation data were extracted from virPTM--a database containing 301 experimentally verified phosphorylation data on 104 human kinase-phosphorylated virus proteins. In an attempt to investigate kinase substrate specificities in viral protein phosphorylation sites, maximal dependence decomposition (MDD) is employed to cluster a large set of phosphorylation data into subgroups containing significantly conserved motifs. The experimental human phosphorylation sites are collected from Phospho.ELM, grouped according to its kinase annotation, and compared with the virus MDD clusters. This investigation identifies human kinases such as CK2, PKB, CDK, and MAPK as potential kinases for catalyzing virus protein substrates as confirmed by published literature. Profile hidden Markov model is then applied to learn a predictive model for each subgroup. A five-fold cross validation evaluation on the MDD-clustered HMMs yields an average accuracy of 84.93% for Serine, and 78.05% for Threonine. Furthermore, an independent testing data collected from UniProtKB and Phospho.ELM is used to make a comparison of predictive performance on three popular kinase-specific phosphorylation site prediction tools. In the independent testing, the high sensitivity and specificity of the proposed method demonstrate the predictive effectiveness of the identified substrate motifs and the importance of investigating potential kinases for viral protein phosphorylation sites.

  8. Automated Gene Ontology annotation for anonymous sequence data.

    PubMed

    Hennig, Steffen; Groth, Detlef; Lehrach, Hans

    2003-07-01

    Gene Ontology (GO) is the most widely accepted attempt to construct a unified and structured vocabulary for the description of genes and their products in any organism. Annotation by GO terms is performed in most of the current genome projects, which besides generality has the advantage of being very convenient for computer based classification methods. However, direct use of GO in small sequencing projects is not easy, especially for species not commonly represented in public databases. We present a software package (GOblet), which performs annotation based on GO terms for anonymous cDNA or protein sequences. It uses the species independent GO structure and vocabulary together with a series of protein databases collected from various sites, to perform a detailed GO annotation by sequence similarity searches. The sensitivity and the reference protein sets can be selected by the user. GOblet runs automatically and is available as a public service on our web server. The paper also addresses the reliability of automated GO annotations by using a reference set of more than 6000 human proteins. The GOblet server is accessible at http://goblet.molgen.mpg.de.

  9. Literature Mining of Pathogenesis-Related Proteins in Human Pathogens for Database Annotation

    DTIC Science & Technology

    2009-10-01

    Salmonella , and Shigella. In most cases the host is human, but may also include other mammal species. 2. Negative literature set of PH-PPIs. Of...cis.udel.edu The objective of Gallus Reactome is to provide a curated set of metabolic and signaling pathways for the chicken . To assist annotators...interested in papers that document pathways in the chicken , abstracts are classified according to the species that were the source of the experimental

  10. An affinity-structure database of helix-turn-helix: DNA complexes with a universal coordinate system.

    PubMed

    AlQuraishi, Mohammed; Tang, Shengdong; Xia, Xide

    2015-11-19

    Molecular interactions between proteins and DNA molecules underlie many cellular processes, including transcriptional regulation, chromosome replication, and nucleosome positioning. Computational analyses of protein-DNA interactions rely on experimental data characterizing known protein-DNA interactions structurally and biochemically. While many databases exist that contain either structural or biochemical data, few integrate these two data sources in a unified fashion. Such integration is becoming increasingly critical with the rapid growth of structural and biochemical data, and the emergence of algorithms that rely on the synthesis of multiple data types to derive computational models of molecular interactions. We have developed an integrated affinity-structure database in which the experimental and quantitative DNA binding affinities of helix-turn-helix proteins are mapped onto the crystal structures of the corresponding protein-DNA complexes. This database provides access to: (i) protein-DNA structures, (ii) quantitative summaries of protein-DNA binding affinities using position weight matrices, and (iii) raw experimental data of protein-DNA binding instances. Critically, this database establishes a correspondence between experimental structural data and quantitative binding affinity data at the single basepair level. Furthermore, we present a novel alignment algorithm that structurally aligns the protein-DNA complexes in the database and creates a unified residue-level coordinate system for comparing the physico-chemical environments at the interface between complexes. Using this unified coordinate system, we compute the statistics of atomic interactions at the protein-DNA interface of helix-turn-helix proteins. We provide an interactive website for visualization, querying, and analyzing this database, and a downloadable version to facilitate programmatic analysis. This database will facilitate the analysis of protein-DNA interactions and the development of programmatic computational methods that capitalize on integration of structural and biochemical datasets. The database can be accessed at http://ProteinDNA.hms.harvard.edu.

  11. Many local pattern texture features: which is better for image-based multilabel human protein subcellular localization classification?

    PubMed

    Yang, Fan; Xu, Ying-Ying; Shen, Hong-Bin

    2014-01-01

    Human protein subcellular location prediction can provide critical knowledge for understanding a protein's function. Since significant progress has been made on digital microscopy, automated image-based protein subcellular location classification is urgently needed. In this paper, we aim to investigate more representative image features that can be effectively used for dealing with the multilabel subcellular image samples. We prepared a large multilabel immunohistochemistry (IHC) image benchmark from the Human Protein Atlas database and tested the performance of different local texture features, including completed local binary pattern, local tetra pattern, and the standard local binary pattern feature. According to our experimental results from binary relevance multilabel machine learning models, the completed local binary pattern, and local tetra pattern are more discriminative for describing IHC images when compared to the traditional local binary pattern descriptor. The combination of these two novel local pattern features and the conventional global texture features is also studied. The enhanced performance of final binary relevance classification model trained on the combined feature space demonstrates that different features are complementary to each other and thus capable of improving the accuracy of classification.

  12. GPU-based cloud service for Smith-Waterman algorithm using frequency distance filtration scheme.

    PubMed

    Lee, Sheng-Ta; Lin, Chun-Yuan; Hung, Che Lun

    2013-01-01

    As the conventional means of analyzing the similarity between a query sequence and database sequences, the Smith-Waterman algorithm is feasible for a database search owing to its high sensitivity. However, this algorithm is still quite time consuming. CUDA programming can improve computations efficiently by using the computational power of massive computing hardware as graphics processing units (GPUs). This work presents a novel Smith-Waterman algorithm with a frequency-based filtration method on GPUs rather than merely accelerating the comparisons yet expending computational resources to handle such unnecessary comparisons. A user friendly interface is also designed for potential cloud server applications with GPUs. Additionally, two data sets, H1N1 protein sequences (query sequence set) and human protein database (database set), are selected, followed by a comparison of CUDA-SW and CUDA-SW with the filtration method, referred to herein as CUDA-SWf. Experimental results indicate that reducing unnecessary sequence alignments can improve the computational time by up to 41%. Importantly, by using CUDA-SWf as a cloud service, this application can be accessed from any computing environment of a device with an Internet connection without time constraints.

  13. Eukaryotic DING Proteins Are Endogenous: An Immunohistological Study in Mouse Tissues

    PubMed Central

    Collombet, Jean-Marc; Elias, Mikael; Gotthard, Guillaume; Four, Elise; Renault, Frédérique; Joffre, Aurélie; Baubichon, Dominique; Rochu, Daniel; Chabrière, Eric

    2010-01-01

    Background DING proteins encompass an intriguing protein family first characterized by their conserved N-terminal sequences. Some of these proteins seem to have key roles in various human diseases, e.g., rheumatoid arthritis, atherosclerosis, HIV suppression. Although this protein family seems to be ubiquitous in eukaryotes, their genes are consistently lacking from genomic databases. Such a lack has considerably hampered functional studies and has fostered therefore the hypothesis that DING proteins isolated from eukaryotes were in fact prokaryotic contaminants. Principal Findings In the framework of our study, we have performed a comprehensive immunological detection of DING proteins in mice. We demonstrate that DING proteins are present in all tissues tested as isoforms of various molecular weights (MWs). Their intracellular localization is tissue-dependant, being exclusively nuclear in neurons, but cytoplasmic and nuclear in other tissues. We also provide evidence that germ-free mouse plasma contains as much DING protein as wild-type. Significance Hence, data herein provide a valuable basis for future investigations aimed at eukaryotic DING proteins, revealing that these proteins seem ubiquitous in mouse tissue. Our results strongly suggest that mouse DING proteins are endogenous. Moreover, the determination in this study of the precise cellular localization of DING proteins constitute a precious evidence to understand their molecular involvements in their related human diseases. PMID:20161715

  14. SM-TF: A structural database of small molecule-transcription factor complexes.

    PubMed

    Xu, Xianjin; Ma, Zhiwei; Sun, Hongmin; Zou, Xiaoqin

    2016-06-30

    Transcription factors (TFs) are the proteins involved in the transcription process, ensuring the correct expression of specific genes. Numerous diseases arise from the dysfunction of specific TFs. In fact, over 30 TFs have been identified as therapeutic targets of about 9% of the approved drugs. In this study, we created a structural database of small molecule-transcription factor (SM-TF) complexes, available online at http://zoulab.dalton.missouri.edu/SM-TF. The 3D structures of the co-bound small molecule and the corresponding binding sites on TFs are provided in the database, serving as a valuable resource to assist structure-based drug design related to TFs. Currently, the SM-TF database contains 934 entries covering 176 TFs from a variety of species. The database is further classified into several subsets by species and organisms. The entries in the SM-TF database are linked to the UniProt database and other sequence-based TF databases. Furthermore, the druggable TFs from human and the corresponding approved drugs are linked to the DrugBank. © 2016 Wiley Periodicals, Inc. © 2016 Wiley Periodicals, Inc.

  15. Reconstruction of the experimentally supported human protein interactome: what can we learn?

    PubMed Central

    2013-01-01

    Background Understanding the topology and dynamics of the human protein-protein interaction (PPI) network will significantly contribute to biomedical research, therefore its systematic reconstruction is required. Several meta-databases integrate source PPI datasets, but the protein node sets of their networks vary depending on the PPI data combined. Due to this inherent heterogeneity, the way in which the human PPI network expands via multiple dataset integration has not been comprehensively analyzed. We aim at assembling the human interactome in a global structured way and exploring it to gain insights of biological relevance. Results First, we defined the UniProtKB manually reviewed human “complete” proteome as the reference protein-node set and then we mined five major source PPI datasets for direct PPIs exclusively between the reference proteins. We updated the protein and publication identifiers and normalized all PPIs to the UniProt identifier level. The reconstructed interactome covers approximately 60% of the human proteome and has a scale-free structure. No apparent differentiating gene functional classification characteristics were identified for the unrepresented proteins. The source dataset integration augments the network mainly in PPIs. Polyubiquitin emerged as the highest-degree node, but the inclusion of most of its identified PPIs may be reconsidered. The high number (>300) of connections of the subsequent fifteen proteins correlates well with their essential biological role. According to the power-law network structure, the unrepresented proteins should mainly have up to four connections with equally poorly-connected interactors. Conclusions Reconstructing the human interactome based on the a priori definition of the protein nodes enabled us to identify the currently included part of the human “complete” proteome, and discuss the role of the proteins within the network topology with respect to their function. As the network expansion has to comply with the scale-free theory, we suggest that the core of the human interactome has essentially emerged. Thus, it could be employed in systems biology and biomedical research, despite the considerable number of currently unrepresented proteins. The latter are probably involved in specialized physiological conditions, justifying the scarcity of related PPI information, and their identification can assist in designing relevant functional experiments and targeted text mining algorithms. PMID:24088582

  16. Identification of differentially expressed proteins during human urinary bladder cancer progression.

    PubMed

    Memon, Ashfaque A; Chang, Jong W; Oh, Bong R; Yoo, Yung J

    2005-01-01

    Comparative proteome analysis was performed between RT4 (grade-1) and T24 (grade-3) bladder cancer cell lines, in an attempt to identify differentially expressed proteins during bladder cancer progression. Among those relatively abundant proteins, seven spots changed more than two-fold reproducibly and identified by peptide mass fingerprinting using mass spectrometry and database search. We found most extensive and reproducible down-regulation of NADP dependent isocitrate dehydrogenase cytoplasmic (IDPc) and peroxiredoxin-II (Prx-II), in poorly differentiated T24 compared to well-differentiated RT4 bladder cancer cell line. Subsequent Western blotting analysis of human biopsy samples from bladder cancer patient revealed significant loss of IDPc and Prx-II in more advance tumor samples, in agreement with data on cell lines. These results suggest that loss of IDPc and Prx-II during tumor development may involve in tumor progression and metastasis. However, additional investigations are needed on large number of human samples to further verify these findings.

  17. Integration of gel-based and gel-free proteomic data for functional analysis of proteins through Soybean Proteome Database.

    PubMed

    Komatsu, Setsuko; Wang, Xin; Yin, Xiaojian; Nanjo, Yohei; Ohyanagi, Hajime; Sakata, Katsumi

    2017-06-23

    The Soybean Proteome Database (SPD) stores data on soybean proteins obtained with gel-based and gel-free proteomic techniques. The database was constructed to provide information on proteins for functional analyses. The majority of the data is focused on soybean (Glycine max 'Enrei'). The growth and yield of soybean are strongly affected by environmental stresses such as flooding. The database was originally constructed using data on soybean proteins separated by two-dimensional polyacrylamide gel electrophoresis, which is a gel-based proteomic technique. Since 2015, the database has been expanded to incorporate data obtained by label-free mass spectrometry-based quantitative proteomics, which is a gel-free proteomic technique. Here, the portions of the database consisting of gel-free proteomic data are described. The gel-free proteomic database contains 39,212 proteins identified in 63 sample sets, such as temporal and organ-specific samples of soybean plants grown under flooding stress or non-stressed conditions. In addition, data on organellar proteins identified in mitochondria, nuclei, and endoplasmic reticulum are stored. Furthermore, the database integrates multiple omics data such as genomics, transcriptomics, metabolomics, and proteomics. The SPD database is accessible at http://proteome.dc.affrc.go.jp/Soybean/. The Soybean Proteome Database stores data obtained from both gel-based and gel-free proteomic techniques. The gel-free proteomic database comprises 39,212 proteins identified in 63 sample sets, such as different organs of soybean plants grown under flooding stress or non-stressed conditions in a time-dependent manner. In addition, organellar proteins identified in mitochondria, nuclei, and endoplasmic reticulum are stored in the gel-free proteomics database. A total of 44,704 proteins, including 5490 proteins identified using a gel-based proteomic technique, are stored in the SPD. It accounts for approximately 80% of all predicted proteins from genome sequences, though there are over lapped proteins. Based on the demonstrated application of data stored in the database for functional analyses, it is suggested that these data will be useful for analyses of biological mechanisms in soybean. Furthermore, coupled with recent advances in information and communication technology, the usefulness of this database would increase in the analyses of biological mechanisms. Copyright © 2017 Elsevier B.V. All rights reserved.

  18. Remodeling Cildb, a popular database for cilia and links for ciliopathies

    PubMed Central

    2014-01-01

    Background New generation technologies in cell and molecular biology generate large amounts of data hard to exploit for individual proteins. This is particularly true for ciliary and centrosomal research. Cildb is a multi–species knowledgebase gathering high throughput studies, which allows advanced searches to identify proteins involved in centrosome, basal body or cilia biogenesis, composition and function. Combined to localization of genetic diseases on human chromosomes given by OMIM links, candidate ciliopathy proteins can be compiled through Cildb searches. Methods Othology between recent versions of the whole proteomes was computed using Inparanoid and ciliary high throughput studies were remapped on these recent versions. Results Due to constant evolution of the ciliary and centrosomal field, Cildb has been recently upgraded twice, with new species whole proteomes and new ciliary studies, and the latter version displays a novel BioMart interface, much more intuitive than the previous ones. Conclusions This already popular database is designed now for easier use and is up to date in regard to high throughput ciliary studies. PMID:25422781

  19. Navigating through the Jungle of Allergens: Features and Applications of Allergen Databases.

    PubMed

    Radauer, Christian

    2017-01-01

    The increasing number of available data on allergenic proteins demanded the establishment of structured, freely accessible allergen databases. In this review article, features and applications of 6 of the most widely used allergen databases are discussed. The WHO/IUIS Allergen Nomenclature Database is the official resource of allergen designations. Allergome is the most comprehensive collection of data on allergens and allergen sources. AllergenOnline is aimed at providing a peer-reviewed database of allergen sequences for prediction of allergenicity of proteins, such as those planned to be inserted into genetically modified crops. The Structural Database of Allergenic Proteins (SDAP) provides a database of allergen sequences, structures, and epitopes linked to bioinformatics tools for sequence analysis and comparison. The Immune Epitope Database (IEDB) is the largest repository of T-cell, B-cell, and major histocompatibility complex protein epitopes including epitopes of allergens. AllFam classifies allergens into families of evolutionarily related proteins using definitions from the Pfam protein family database. These databases contain mostly overlapping data, but also show differences in terms of their targeted users, the criteria for including allergens, data shown for each allergen, and the availability of bioinformatics tools. © 2017 S. Karger AG, Basel.

  20. Detection of alternative splice variants at the proteome level in Aspergillus flavus.

    PubMed

    Chang, Kung-Yen; Georgianna, D Ryan; Heber, Steffen; Payne, Gary A; Muddiman, David C

    2010-03-05

    Identification of proteins from proteolytic peptides or intact proteins plays an essential role in proteomics. Researchers use search engines to match the acquired peptide sequences to the target proteins. However, search engines depend on protein databases to provide candidates for consideration. Alternative splicing (AS), the mechanism where the exon of pre-mRNAs can be spliced and rearranged to generate distinct mRNA and therefore protein variants, enable higher eukaryotic organisms, with only a limited number of genes, to have the requisite complexity and diversity at the proteome level. Multiple alternative isoforms from one gene often share common segments of sequences. However, many protein databases only include a limited number of isoforms to keep minimal redundancy. As a result, the database search might not identify a target protein even with high quality tandem MS data and accurate intact precursor ion mass. We computationally predicted an exhaustive list of putative isoforms of Aspergillus flavus proteins from 20 371 expressed sequence tags to investigate whether an alternative splicing protein database can assign a greater proportion of mass spectrometry data. The newly constructed AS database provided 9807 new alternatively spliced variants in addition to 12 832 previously annotated proteins. The searches of the existing tandem MS spectra data set using the AS database identified 29 new proteins encoded by 26 genes. Nine fungal genes appeared to have multiple protein isoforms. In addition to the discovery of splice variants, AS database also showed potential to improve genome annotation. In summary, the introduction of an alternative splicing database helps identify more proteins and unveils more information about a proteome.

  1. Directed proteomic analysis of the human nucleolus.

    PubMed

    Andersen, Jens S; Lyon, Carol E; Fox, Archa H; Leung, Anthony K L; Lam, Yun Wah; Steen, Hanno; Mann, Matthias; Lamond, Angus I

    2002-01-08

    The nucleolus is a subnuclear organelle containing the ribosomal RNA gene clusters and ribosome biogenesis factors. Recent studies suggest it may also have roles in RNA transport, RNA modification, and cell cycle regulation. Despite over 150 years of research into nucleoli, many aspects of their structure and function remain uncharacterized. We report a proteomic analysis of human nucleoli. Using a combination of mass spectrometry (MS) and sequence database searches, including online analysis of the draft human genome sequence, 271 proteins were identified. Over 30% of the nucleolar proteins were encoded by novel or uncharacterized genes, while the known proteins included several unexpected factors with no previously known nucleolar functions. MS analysis of nucleoli isolated from HeLa cells in which transcription had been inhibited showed that a subset of proteins was enriched. These data highlight the dynamic nature of the nucleolar proteome and show that proteins can either associate with nucleoli transiently or accumulate only under specific metabolic conditions. This extensive proteomic analysis shows that nucleoli have a surprisingly large protein complexity. The many novel factors and separate classes of proteins identified support the view that the nucleolus may perform additional functions beyond its known role in ribosome subunit biogenesis. The data also show that the protein composition of nucleoli is not static and can alter significantly in response to the metabolic state of the cell.

  2. Reverse screening methods to search for the protein targets of chemopreventive compounds

    NASA Astrophysics Data System (ADS)

    Huang, Hongbin; Zhang, Guigui; Zhou, Yuquan; Lin, Chenru; Chen, Suling; Lin, Yutong; Mai, Shangkang; Huang, Zunnan

    2018-05-01

    This article is a systematic review of reverse screening methods used to search for the protein targets of chemopreventive compounds or drugs. Typical chemopreventive compounds include components of traditional Chinese medicine, natural compounds and Food and Drug Administration (FDA)-approved drugs. Such compounds are somewhat selective but are predisposed to bind multiple protein targets distributed throughout diverse signaling pathways in human cells. In contrast to conventional virtual screening, which identifies the ligands of a targeted protein from a compound database, reverse screening is used to identify the potential targets or unintended targets of a given compound from a large number of receptors by examining their known ligands or crystal structures. This method, also known as in silico or computational target fishing, is highly valuable for discovering the target receptors of query molecules from terrestrial or marine natural products, exploring the molecular mechanisms of chemopreventive compounds, finding alternative indications of existing drugs by drug repositioning, and detecting adverse drug reactions and drug toxicity. Reverse screening can be divided into three major groups: shape screening, pharmacophore screening and reverse docking. Several large software packages, such as Schrödinger and Discovery Studio; typical software/network services such as ChemMapper, PharmMapper, idTarget and INVDOCK; and practical databases of known target ligands and receptor crystal structures, such as ChEMBL, BindingDB and the Protein Data Bank (PDB), are available for use in these computational methods. Different programs, online services and databases have different applications and constraints. Here, we conducted a systematic analysis and multilevel classification of the computational programs, online services and compound libraries available for shape screening, pharmacophore screening and reverse docking to enable non-specialist users to quickly learn and grasp the types of calculations used in protein target fishing. In addition, we review the main features of these methods, programs and databases and provide a variety of examples illustrating the application of one or a combination of reverse screening methods for accurate target prediction.

  3. Reverse Screening Methods to Search for the Protein Targets of Chemopreventive Compounds.

    PubMed

    Huang, Hongbin; Zhang, Guigui; Zhou, Yuquan; Lin, Chenru; Chen, Suling; Lin, Yutong; Mai, Shangkang; Huang, Zunnan

    2018-01-01

    This article is a systematic review of reverse screening methods used to search for the protein targets of chemopreventive compounds or drugs. Typical chemopreventive compounds include components of traditional Chinese medicine, natural compounds and Food and Drug Administration (FDA)-approved drugs. Such compounds are somewhat selective but are predisposed to bind multiple protein targets distributed throughout diverse signaling pathways in human cells. In contrast to conventional virtual screening, which identifies the ligands of a targeted protein from a compound database, reverse screening is used to identify the potential targets or unintended targets of a given compound from a large number of receptors by examining their known ligands or crystal structures. This method, also known as in silico or computational target fishing, is highly valuable for discovering the target receptors of query molecules from terrestrial or marine natural products, exploring the molecular mechanisms of chemopreventive compounds, finding alternative indications of existing drugs by drug repositioning, and detecting adverse drug reactions and drug toxicity. Reverse screening can be divided into three major groups: shape screening, pharmacophore screening and reverse docking. Several large software packages, such as Schrödinger and Discovery Studio; typical software/network services such as ChemMapper, PharmMapper, idTarget, and INVDOCK; and practical databases of known target ligands and receptor crystal structures, such as ChEMBL, BindingDB, and the Protein Data Bank (PDB), are available for use in these computational methods. Different programs, online services and databases have different applications and constraints. Here, we conducted a systematic analysis and multilevel classification of the computational programs, online services and compound libraries available for shape screening, pharmacophore screening and reverse docking to enable non-specialist users to quickly learn and grasp the types of calculations used in protein target fishing. In addition, we review the main features of these methods, programs and databases and provide a variety of examples illustrating the application of one or a combination of reverse screening methods for accurate target prediction.

  4. Reverse Screening Methods to Search for the Protein Targets of Chemopreventive Compounds

    PubMed Central

    Huang, Hongbin; Zhang, Guigui; Zhou, Yuquan; Lin, Chenru; Chen, Suling; Lin, Yutong; Mai, Shangkang; Huang, Zunnan

    2018-01-01

    This article is a systematic review of reverse screening methods used to search for the protein targets of chemopreventive compounds or drugs. Typical chemopreventive compounds include components of traditional Chinese medicine, natural compounds and Food and Drug Administration (FDA)-approved drugs. Such compounds are somewhat selective but are predisposed to bind multiple protein targets distributed throughout diverse signaling pathways in human cells. In contrast to conventional virtual screening, which identifies the ligands of a targeted protein from a compound database, reverse screening is used to identify the potential targets or unintended targets of a given compound from a large number of receptors by examining their known ligands or crystal structures. This method, also known as in silico or computational target fishing, is highly valuable for discovering the target receptors of query molecules from terrestrial or marine natural products, exploring the molecular mechanisms of chemopreventive compounds, finding alternative indications of existing drugs by drug repositioning, and detecting adverse drug reactions and drug toxicity. Reverse screening can be divided into three major groups: shape screening, pharmacophore screening and reverse docking. Several large software packages, such as Schrödinger and Discovery Studio; typical software/network services such as ChemMapper, PharmMapper, idTarget, and INVDOCK; and practical databases of known target ligands and receptor crystal structures, such as ChEMBL, BindingDB, and the Protein Data Bank (PDB), are available for use in these computational methods. Different programs, online services and databases have different applications and constraints. Here, we conducted a systematic analysis and multilevel classification of the computational programs, online services and compound libraries available for shape screening, pharmacophore screening and reverse docking to enable non-specialist users to quickly learn and grasp the types of calculations used in protein target fishing. In addition, we review the main features of these methods, programs and databases and provide a variety of examples illustrating the application of one or a combination of reverse screening methods for accurate target prediction. PMID:29868550

  5. iTRAQ Quantitative Proteomic Comparison of Metastatic and Non-Metastatic Uveal Melanoma Tumors

    PubMed Central

    Crabb, John W.; Hu, Bo; Crabb, John S.; Triozzi, Pierre; Saunthararajah, Yogen; Singh, Arun D.

    2015-01-01

    Background Uveal melanoma is the most common malignancy of the adult eye. The overall mortality rate is high because this aggressive cancer often metastasizes before ophthalmic diagnosis. Quantitative proteomic analysis of primary metastasizing and non-metastasizing tumors was pursued for insights into mechanisms and biomarkers of uveal melanoma metastasis. Methods Eight metastatic and 7 non-metastatic human primary uveal melanoma tumors were analyzed by LC MS/MS iTRAQ technology with Bruch’s membrane/choroid complex from normal postmortem eyes as control tissue. Tryptic peptides from tumor and control proteins were labeled with iTRAQ tags, fractionated by cation exchange chromatography, and analyzed by LC MS/MS. Protein identification utilized the Mascot search engine and the human Uni-Prot/Swiss-Protein database with false discovery ≤ 1%; protein quantitation utilized the Mascot weighted average method. Proteins designated differentially expressed exhibited quantitative differences (p ≤ 0.05, t-test) in a training set of five metastatic and five non-metastatic tumors. Logistic regression models developed from the training set were used to classify the metastatic status of five independent tumors. Results Of 1644 proteins identified and quantified in 5 metastatic and 5 non-metastatic tumors, 12 proteins were found uniquely in ≥ 3 metastatic tumors, 28 were found significantly elevated and 30 significantly decreased only in metastatic tumors, and 31 were designated differentially expressed between metastatic and non-metastatic tumors. Logistic regression modeling of differentially expressed collagen alpha-3(VI) and heat shock protein beta-1 allowed correct prediction of metastasis status for each of five independent tumor specimens. Conclusions The present data provide new clues to molecular differences in metastatic and non-metastatic uveal melanoma tumors. While sample size is limited and validation required, the results support collagen alpha-3(VI) and heat shock protein beta-1 as candidate biomarkers of uveal melanoma metastasis and establish a quantitative proteomic database for uveal melanoma primary tumors. PMID:26305875

  6. Finding the Subcellular Location of Barley, Wheat, Rice and Maize Proteins: The Compendium of Crop Proteins with Annotated Locations (cropPAL).

    PubMed

    Hooper, Cornelia M; Castleden, Ian R; Aryamanesh, Nader; Jacoby, Richard P; Millar, A Harvey

    2016-01-01

    Barley, wheat, rice and maize provide the bulk of human nutrition and have extensive industrial use as agricultural products. The genomes of these crops each contains >40,000 genes encoding proteins; however, the major genome databases for these species lack annotation information of protein subcellular location for >80% of these gene products. We address this gap, by constructing the compendium of crop protein subcellular locations called crop Proteins with Annotated Locations (cropPAL). Subcellular location is most commonly determined by fluorescent protein tagging of live cells or mass spectrometry detection in subcellular purifications, but can also be predicted from amino acid sequence or protein expression patterns. The cropPAL database collates 556 published studies, from >300 research institutes in >30 countries that have been previously published, as well as compiling eight pre-computed subcellular predictions for all Hordeum vulgare, Triticum aestivum, Oryza sativa and Zea mays protein sequences. The data collection including metadata for proteins and published studies can be accessed through a search portal http://crop-PAL.org. The subcellular localization information housed in cropPAL helps to depict plant cells as compartmentalized protein networks that can be investigated for improving crop yield and quality, and developing new biotechnological solutions to agricultural challenges. © The Author 2015. Published by Oxford University Press on behalf of Japanese Society of Plant Physiologists. All rights reserved. For permissions, please email: journals.permissions@oup.com.

  7. Decoding the disease-associated proteins encoded in the human chromosome 4.

    PubMed

    Chen, Lien-Chin; Liu, Mei-Ying; Hsiao, Yung-Chin; Choong, Wai-Kok; Wu, Hsin-Yi; Hsu, Wen-Lian; Liao, Pao-Chi; Sung, Ting-Yi; Tsai, Shih-Feng; Yu, Jau-Song; Chen, Yu-Ju

    2013-01-04

    Chromosome 4 is the fourth largest chromosome, containing approximately 191 megabases (~6.4% of the human genome) with 757 protein-coding genes. A number of marker genes for many diseases have been found in this chromosome, including genetic diseases (e.g., hepatocellular carcinoma) and biomedical research (cardiac system, aging, metabolic disorders, immune system, cancer and stem cell) related genes (e.g., oncogenes, growth factors). As a pilot study for the chromosome 4-centric human proteome project (Chr 4-HPP), we present here a systematic analysis of the disease association, protein isoforms, coding single nucleotide polymorphisms of these 757 protein-coding genes and their experimental evidence at the protein level. We also describe how the findings from the chromosome 4 project might be used to drive the biomarker discovery and validation study in disease-oriented projects, using the examples of secretomic and membrane proteomic approaches in cancer research. By integrating with cancer cell secretomes and several other existing databases in the public domain, we identified 141 chromosome 4-encoded proteins as cancer cell-secretable/shedable proteins. Additionally, we also identified 54 chromosome 4-encoded proteins that have been classified as cancer-associated proteins with successful selected or multiple reaction monitoring (SRM/MRM) assays developed. From literature annotation and topology analysis, 271 proteins were recognized as membrane proteins while 27.9% of the 757 proteins do not have any experimental evidence at the protein-level. In summary, the analysis revealed that the chromosome 4 is a rich resource for cancer-associated proteins for biomarker verification projects and for drug target discovery projects.

  8. The Protein Identifier Cross-Referencing (PICR) service: reconciling protein identifiers across multiple source databases.

    PubMed

    Côté, Richard G; Jones, Philip; Martens, Lennart; Kerrien, Samuel; Reisinger, Florian; Lin, Quan; Leinonen, Rasko; Apweiler, Rolf; Hermjakob, Henning

    2007-10-18

    Each major protein database uses its own conventions when assigning protein identifiers. Resolving the various, potentially unstable, identifiers that refer to identical proteins is a major challenge. This is a common problem when attempting to unify datasets that have been annotated with proteins from multiple data sources or querying data providers with one flavour of protein identifiers when the source database uses another. Partial solutions for protein identifier mapping exist but they are limited to specific species or techniques and to a very small number of databases. As a result, we have not found a solution that is generic enough and broad enough in mapping scope to suit our needs. We have created the Protein Identifier Cross-Reference (PICR) service, a web application that provides interactive and programmatic (SOAP and REST) access to a mapping algorithm that uses the UniProt Archive (UniParc) as a data warehouse to offer protein cross-references based on 100% sequence identity to proteins from over 70 distinct source databases loaded into UniParc. Mappings can be limited by source database, taxonomic ID and activity status in the source database. Users can copy/paste or upload files containing protein identifiers or sequences in FASTA format to obtain mappings using the interactive interface. Search results can be viewed in simple or detailed HTML tables or downloaded as comma-separated values (CSV) or Microsoft Excel (XLS) files suitable for use in a local database or a spreadsheet. Alternatively, a SOAP interface is available to integrate PICR functionality in other applications, as is a lightweight REST interface. We offer a publicly available service that can interactively map protein identifiers and protein sequences to the majority of commonly used protein databases. Programmatic access is available through a standards-compliant SOAP interface or a lightweight REST interface. The PICR interface, documentation and code examples are available at http://www.ebi.ac.uk/Tools/picr.

  9. The Protein Identifier Cross-Referencing (PICR) service: reconciling protein identifiers across multiple source databases

    PubMed Central

    Côté, Richard G; Jones, Philip; Martens, Lennart; Kerrien, Samuel; Reisinger, Florian; Lin, Quan; Leinonen, Rasko; Apweiler, Rolf; Hermjakob, Henning

    2007-01-01

    Background Each major protein database uses its own conventions when assigning protein identifiers. Resolving the various, potentially unstable, identifiers that refer to identical proteins is a major challenge. This is a common problem when attempting to unify datasets that have been annotated with proteins from multiple data sources or querying data providers with one flavour of protein identifiers when the source database uses another. Partial solutions for protein identifier mapping exist but they are limited to specific species or techniques and to a very small number of databases. As a result, we have not found a solution that is generic enough and broad enough in mapping scope to suit our needs. Results We have created the Protein Identifier Cross-Reference (PICR) service, a web application that provides interactive and programmatic (SOAP and REST) access to a mapping algorithm that uses the UniProt Archive (UniParc) as a data warehouse to offer protein cross-references based on 100% sequence identity to proteins from over 70 distinct source databases loaded into UniParc. Mappings can be limited by source database, taxonomic ID and activity status in the source database. Users can copy/paste or upload files containing protein identifiers or sequences in FASTA format to obtain mappings using the interactive interface. Search results can be viewed in simple or detailed HTML tables or downloaded as comma-separated values (CSV) or Microsoft Excel (XLS) files suitable for use in a local database or a spreadsheet. Alternatively, a SOAP interface is available to integrate PICR functionality in other applications, as is a lightweight REST interface. Conclusion We offer a publicly available service that can interactively map protein identifiers and protein sequences to the majority of commonly used protein databases. Programmatic access is available through a standards-compliant SOAP interface or a lightweight REST interface. The PICR interface, documentation and code examples are available at . PMID:17945017

  10. Complete-proteome mapping of human influenza A adaptive mutations: implications for human transmissibility of zoonotic strains.

    PubMed

    Miotto, Olivo; Heiny, A T; Albrecht, Randy; García-Sastre, Adolfo; Tan, Tin Wee; August, J Thomas; Brusic, Vladimir

    2010-02-03

    There is widespread concern that H5N1 avian influenza A viruses will emerge as a pandemic threat, if they become capable of human-to-human (H2H) transmission. Avian strains lack this capability, which suggests that it requires important adaptive mutations. We performed a large-scale comparative analysis of proteins from avian and human strains, to produce a catalogue of mutations associated with H2H transmissibility, and to detect their presence in avian isolates. We constructed a dataset of influenza A protein sequences from 92,343 public database records. Human and avian sequence subsets were compared, using a method based on mutual information, to identify characteristic sites where human isolates present conserved mutations. The resulting catalogue comprises 68 characteristic sites in eight internal proteins. Subtype variability prevented the identification of adaptive mutations in the hemagglutinin and neuraminidase proteins. The high number of sites in the ribonucleoprotein complex suggests interdependence between mutations in multiple proteins. Characteristic sites are often clustered within known functional regions, suggesting their functional roles in cellular processes. By isolating and concatenating characteristic site residues, we defined adaptation signatures, which summarize the adaptive potential of specific isolates. Most adaptive mutations emerged within three decades after the 1918 pandemic, and have remained remarkably stable thereafter. Two lineages with stable internal protein constellations have circulated among humans without reassorting. On the contrary, H5N1 avian and swine viruses reassort frequently, causing both gains and losses of adaptive mutations. Human host adaptation appears to be complex and systemic, involving nearly all influenza proteins. Adaptation signatures suggest that the ability of H5N1 strains to infect humans is related to the presence of an unusually high number of adaptive mutations. However, these mutations appear unstable, suggesting low pandemic potential of H5N1 in its current form. In addition, adaptation signatures indicate that pandemic H1N1/09 strain possesses multiple human-transmissibility mutations, though not an unusually high number with respect to swine strains that infected humans in the past. Adaptation signatures provide a novel tool for identifying zoonotic strains with the potential to infect humans.

  11. UniPROBE, update 2011: expanded content and search tools in the online database of protein-binding microarray data on protein-DNA interactions.

    PubMed

    Robasky, Kimberly; Bulyk, Martha L

    2011-01-01

    The Universal PBM Resource for Oligonucleotide-Binding Evaluation (UniPROBE) database is a centralized repository of information on the DNA-binding preferences of proteins as determined by universal protein-binding microarray (PBM) technology. Each entry for a protein (or protein complex) in UniPROBE provides the quantitative preferences for all possible nucleotide sequence variants ('words') of length k ('k-mers'), as well as position weight matrix (PWM) and graphical sequence logo representations of the k-mer data. In this update, we describe >130% expansion of the database content, incorporation of a protein BLAST (blastp) tool for finding protein sequence matches in UniPROBE, the introduction of UniPROBE accession numbers and additional database enhancements. The UniPROBE database is available at http://uniprobe.org.

  12. MIPS: a database for protein sequences, homology data and yeast genome information.

    PubMed Central

    Mewes, H W; Albermann, K; Heumann, K; Liebl, S; Pfeiffer, F

    1997-01-01

    The MIPS group (Martinsried Institute for Protein Sequences) at the Max-Planck-Institute for Biochemistry, Martinsried near Munich, Germany, collects, processes and distributes protein sequence data within the framework of the tripartite association of the PIR-International Protein Sequence Database (,). MIPS contributes nearly 50% of the data input to the PIR-International Protein Sequence Database. The database is distributed on CD-ROM together with PATCHX, an exhaustive supplement of unique, unverified protein sequences from external sources compiled by MIPS. Through its WWW server (http://www.mips.biochem.mpg.de/ ) MIPS permits internet access to sequence databases, homology data and to yeast genome information. (i) Sequence similarity results from the FASTA program () are stored in the FASTA database for all proteins from PIR-International and PATCHX. The database is dynamically maintained and permits instant access to FASTA results. (ii) Starting with FASTA database queries, proteins have been classified into families and superfamilies (PROT-FAM). (iii) The HPT (hashed position tree) data structure () developed at MIPS is a new approach for rapid sequence and pattern searching. (iv) MIPS provides access to the sequence and annotation of the complete yeast genome (), the functional classification of yeast genes (FunCat) and its graphical display, the 'Genome Browser' (). A CD-ROM based on the JAVA programming language providing dynamic interactive access to the yeast genome and the related protein sequences has been compiled and is available on request. PMID:9016498

  13. Differential gene expression analysis in glioblastoma cells and normal human brain cells based on GEO database.

    PubMed

    Wang, Anping; Zhang, Guibin

    2017-11-01

    The differentially expressed genes between glioblastoma (GBM) cells and normal human brain cells were investigated to performed pathway analysis and protein interaction network analysis for the differentially expressed genes. GSE12657 and GSE42656 gene chips, which contain gene expression profile of GBM were obtained from Gene Expression Omniub (GEO) database of National Center for Biotechnology Information (NCBI). The 'limma' data packet in 'R' software was used to analyze the differentially expressed genes in the two gene chips, and gene integration was performed using 'RobustRankAggreg' package. Finally, pheatmap software was used for heatmap analysis and Cytoscape, DAVID, STRING and KOBAS were used for protein-protein interaction, Gene Ontology (GO) and KEGG analyses. As results: i) 702 differentially expressed genes were identified in GSE12657, among those genes, 548 were significantly upregulated and 154 were significantly downregulated (p<0.01, fold-change >1), and 1,854 differentially expressed genes were identified in GSE42656, among the genes, 1,068 were significantly upregulated and 786 were significantly downregulated (p<0.01, fold-change >1). A total of 167 differentially expressed genes including 100 upregulated genes and 67 downregulated genes were identified after gene integration, and the genes showed significantly different expression levels in GBM compared with normal human brain cells (p<0.05). ii) Interactions between the protein products of 101 differentially expressed genes were identified using STRING and expression network was established. A key gene, called CALM3, was identified by Cytoscape software. iii) GO enrichment analysis showed that differentially expressed genes were mainly enriched in 'neurotransmitter:sodium symporter activity' and 'neurotransmitter transporter activity', which can affect the activity of neurotransmitter transportation. KEGG pathway analysis showed that the differentially expressed genes were mainly enriched in 'protein processing in endoplasmic reticulum', which can affect protein processing in endoplasmic reticulum. The results showed that: i) 167 differentially expressed genes were identified from two gene chips after integration; and ii) protein interaction network was established, and GO and KEGG pathway analyses were successfully performed to identify and annotate the key gene, which provide new insights for the studies on GBN at gene level.

  14. NPIDB: Nucleic acid-Protein Interaction DataBase.

    PubMed

    Kirsanov, Dmitry D; Zanegina, Olga N; Aksianov, Evgeniy A; Spirin, Sergei A; Karyagina, Anna S; Alexeevski, Andrei V

    2013-01-01

    The Nucleic acid-Protein Interaction DataBase (http://npidb.belozersky.msu.ru/) contains information derived from structures of DNA-protein and RNA-protein complexes extracted from the Protein Data Bank (3846 complexes in October 2012). It provides a web interface and a set of tools for extracting biologically meaningful characteristics of nucleoprotein complexes. The content of the database is updated weekly. The current version of the Nucleic acid-Protein Interaction DataBase is an upgrade of the version published in 2007. The improvements include a new web interface, new tools for calculation of intermolecular interactions, a classification of SCOP families that contains DNA-binding protein domains and data on conserved water molecules on the DNA-protein interface.

  15. The PMDB Protein Model Database

    PubMed Central

    Castrignanò, Tiziana; De Meo, Paolo D'Onorio; Cozzetto, Domenico; Talamo, Ivano Giuseppe; Tramontano, Anna

    2006-01-01

    The Protein Model Database (PMDB) is a public resource aimed at storing manually built 3D models of proteins. The database is designed to provide access to models published in the scientific literature, together with validating experimental data. It is a relational database and it currently contains >74 000 models for ∼240 proteins. The system is accessible at and allows predictors to submit models along with related supporting evidence and users to download them through a simple and intuitive interface. Users can navigate in the database and retrieve models referring to the same target protein or to different regions of the same protein. Each model is assigned a unique identifier that allows interested users to directly access the data. PMID:16381873

  16. PDB_TM: selection and membrane localization of transmembrane proteins in the protein data bank.

    PubMed

    Tusnády, Gábor E; Dosztányi, Zsuzsanna; Simon, István

    2005-01-01

    PDB_TM is a database for transmembrane proteins with known structures. It aims to collect all transmembrane proteins that are deposited in the protein structure database (PDB) and to determine their membrane-spanning regions. These assignments are based on the TMDET algorithm, which uses only structural information to locate the most likely position of the lipid bilayer and to distinguish between transmembrane and globular proteins. This algorithm was applied to all PDB entries and the results were collected in the PDB_TM database. By using TMDET algorithm, the PDB_TM database can be automatically updated every week, keeping it synchronized with the latest PDB updates. The PDB_TM database is available at http://www.enzim.hu/PDB_TM.

  17. Kinase Pathway Database: An Integrated Protein-Kinase and NLP-Based Protein-Interaction Resource

    PubMed Central

    Koike, Asako; Kobayashi, Yoshiyuki; Takagi, Toshihisa

    2003-01-01

    Protein kinases play a crucial role in the regulation of cellular functions. Various kinds of information about these molecules are important for understanding signaling pathways and organism characteristics. We have developed the Kinase Pathway Database, an integrated database involving major completely sequenced eukaryotes. It contains the classification of protein kinases and their functional conservation, ortholog tables among species, protein–protein, protein–gene, and protein–compound interaction data, domain information, and structural information. It also provides an automatic pathway graphic image interface. The protein, gene, and compound interactions are automatically extracted from abstracts for all genes and proteins by natural-language processing (NLP).The method of automatic extraction uses phrase patterns and the GENA protein, gene, and compound name dictionary, which was developed by our group. With this database, pathways are easily compared among species using data with more than 47,000 protein interactions and protein kinase ortholog tables. The database is available for querying and browsing at http://kinasedb.ontology.ims.u-tokyo.ac.jp/. PMID:12799355

  18. BitterDB: a database of bitter compounds

    PubMed Central

    Wiener, Ayana; Shudler, Marina; Levit, Anat; Niv, Masha Y.

    2012-01-01

    Basic taste qualities like sour, salty, sweet, bitter and umami serve specific functions in identifying food components found in the diet of humans and animals, and are recognized by proteins in the oral cavity. Recognition of bitter taste and aversion to it are thought to protect the organism against the ingestion of poisonous food compounds, which are often bitter. Interestingly, bitter taste receptors are expressed not only in the mouth but also in extraoral tissues, such as the gastrointestinal tract, indicating that they may play a role in digestive and metabolic processes. BitterDB database, available at http://bitterdb.agri.huji.ac.il/bitterdb/, includes over 550 compounds that were reported to taste bitter to humans. The compounds can be searched by name, chemical structure, similarity to other bitter compounds, association with a particular human bitter taste receptor, and so on. The database also contains information on mutations in bitter taste receptors that were shown to influence receptor activation by bitter compounds. The aim of BitterDB is to facilitate studying the chemical features associated with bitterness. These studies may contribute to predicting bitterness of unknown compounds, predicting ligands for bitter receptors from different species and rational design of bitterness modulators. PMID:21940398

  19. Essential proteins and possible therapeutic targets of Wolbachia endosymbiont and development of FiloBase-a comprehensive drug target database for Lymphatic filariasis

    NASA Astrophysics Data System (ADS)

    Sharma, Om Prakash; Kumar, Muthuvel Suresh

    2016-01-01

    Lymphatic filariasis (Lf) is one of the oldest and most debilitating tropical diseases. Millions of people are suffering from this prevalent disease. It is estimated to infect over 120 million people in at least 80 nations of the world through the tropical and subtropical regions. More than one billion people are in danger of getting affected with this life-threatening disease. Several studies were suggested its emerging limitations and resistance towards the available drugs and therapeutic targets for Lf. Therefore, better medicine and drug targets are in demand. We took an initiative to identify the essential proteins of Wolbachia endosymbiont of Brugia malayi, which are indispensable for their survival and non-homologous to human host proteins. In this current study, we have used proteome subtractive approach to screen the possible therapeutic targets for wBm. In addition, numerous literatures were mined in the hunt for potential drug targets, drugs, epitopes, crystal structures, and expressed sequence tag (EST) sequences for filarial causing nematodes. Data obtained from our study were presented in a user friendly database named FiloBase. We hope that information stored in this database may be used for further research and drug development process against filariasis. URL: http://filobase.bicpu.edu.in.

  20. [Locus HS.633957 expression in human gastrointestinal tract and tumors].

    PubMed

    Polev, D E; Krukovskaia, L L; Kozlov, A P

    2011-01-01

    Human locus HS.633957 corresponds to its namesake cluster in the UniGene database http:/www.ncbi.nlm.nih.gov/unigene. It is located on chromosome 7 and is 3.7 tpn in size. It does not seem to encode proteins nor has its function been identified. According to bioinformation evidence, its expression is tumor-specific. PCR assay on kDNA samples from different intact human tissues detected its slight expression in liver, heart, embryonal brain and kidney as well as in a wide spectrum of tumors. This work features locus Hs.633957 expression in different parts of human gastrointestinal tract and tumors.

  1. An affinity-structure database of helix-turn-helix: DNA complexes with a universal coordinate system

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    AlQuraishi, Mohammed; Tang, Shengdong; Xia, Xide

    Molecular interactions between proteins and DNA molecules underlie many cellular processes, including transcriptional regulation, chromosome replication, and nucleosome positioning. Computational analyses of protein-DNA interactions rely on experimental data characterizing known protein-DNA interactions structurally and biochemically. While many databases exist that contain either structural or biochemical data, few integrate these two data sources in a unified fashion. Such integration is becoming increasingly critical with the rapid growth of structural and biochemical data, and the emergence of algorithms that rely on the synthesis of multiple data types to derive computational models of molecular interactions. We have developed an integrated affinity-structure database inmore » which the experimental and quantitative DNA binding affinities of helix-turn-helix proteins are mapped onto the crystal structures of the corresponding protein-DNA complexes. This database provides access to: (i) protein-DNA structures, (ii) quantitative summaries of protein-DNA binding affinities using position weight matrices, and (iii) raw experimental data of protein-DNA binding instances. Critically, this database establishes a correspondence between experimental structural data and quantitative binding affinity data at the single basepair level. Furthermore, we present a novel alignment algorithm that structurally aligns the protein-DNA complexes in the database and creates a unified residue-level coordinate system for comparing the physico-chemical environments at the interface between complexes. Using this unified coordinate system, we compute the statistics of atomic interactions at the protein-DNA interface of helix-turn-helix proteins. We provide an interactive website for visualization, querying, and analyzing this database, and a downloadable version to facilitate programmatic analysis. Lastly, this database will facilitate the analysis of protein-DNA interactions and the development of programmatic computational methods that capitalize on integration of structural and biochemical datasets. The database can be accessed at http://ProteinDNA.hms.harvard.edu.« less

  2. An affinity-structure database of helix-turn-helix: DNA complexes with a universal coordinate system

    DOE PAGES

    AlQuraishi, Mohammed; Tang, Shengdong; Xia, Xide

    2015-11-19

    Molecular interactions between proteins and DNA molecules underlie many cellular processes, including transcriptional regulation, chromosome replication, and nucleosome positioning. Computational analyses of protein-DNA interactions rely on experimental data characterizing known protein-DNA interactions structurally and biochemically. While many databases exist that contain either structural or biochemical data, few integrate these two data sources in a unified fashion. Such integration is becoming increasingly critical with the rapid growth of structural and biochemical data, and the emergence of algorithms that rely on the synthesis of multiple data types to derive computational models of molecular interactions. We have developed an integrated affinity-structure database inmore » which the experimental and quantitative DNA binding affinities of helix-turn-helix proteins are mapped onto the crystal structures of the corresponding protein-DNA complexes. This database provides access to: (i) protein-DNA structures, (ii) quantitative summaries of protein-DNA binding affinities using position weight matrices, and (iii) raw experimental data of protein-DNA binding instances. Critically, this database establishes a correspondence between experimental structural data and quantitative binding affinity data at the single basepair level. Furthermore, we present a novel alignment algorithm that structurally aligns the protein-DNA complexes in the database and creates a unified residue-level coordinate system for comparing the physico-chemical environments at the interface between complexes. Using this unified coordinate system, we compute the statistics of atomic interactions at the protein-DNA interface of helix-turn-helix proteins. We provide an interactive website for visualization, querying, and analyzing this database, and a downloadable version to facilitate programmatic analysis. Lastly, this database will facilitate the analysis of protein-DNA interactions and the development of programmatic computational methods that capitalize on integration of structural and biochemical datasets. The database can be accessed at http://ProteinDNA.hms.harvard.edu.« less

  3. Establishment of an international database for genetic variants in esophageal cancer.

    PubMed

    Vihinen, Mauno

    2016-10-01

    The establishment of a database has been suggested in order to collect, organize, and distribute genetic information about esophageal cancer. The World Organization for Specialized Studies on Diseases of the Esophagus and the Human Variome Project will be in charge of a central database of information about esophageal cancer-related variations from publications, databases, and laboratories; in addition to genetic details, clinical parameters will also be included. The aim will be to get all the central players in research, clinical, and commercial laboratories to contribute. The database will follow established recommendations and guidelines. The database will require a team of dedicated curators with different backgrounds. Numerous layers of systematics will be applied to facilitate computational analyses. The data items will be extensively integrated with other information sources. The database will be distributed as open access to ensure exchange of the data with other databases. Variations will be reported in relation to reference sequences on three levels--DNA, RNA, and protein-whenever applicable. In the first phase, the database will concentrate on genetic variations including both somatic and germline variations for susceptibility genes. Additional types of information can be integrated at a later stage. © 2016 New York Academy of Sciences.

  4. Database-independent Protein Sequencing (DiPS) Enables Full-length de Novo Protein and Antibody Sequence Determination.

    PubMed

    Savidor, Alon; Barzilay, Rotem; Elinger, Dalia; Yarden, Yosef; Lindzen, Moshit; Gabashvili, Alexandra; Adiv Tal, Ophir; Levin, Yishai

    2017-06-01

    Traditional "bottom-up" proteomic approaches use proteolytic digestion, LC-MS/MS, and database searching to elucidate peptide identities and their parent proteins. Protein sequences absent from the database cannot be identified, and even if present in the database, complete sequence coverage is rarely achieved even for the most abundant proteins in the sample. Thus, sequencing of unknown proteins such as antibodies or constituents of metaproteomes remains a challenging problem. To date, there is no available method for full-length protein sequencing, independent of a reference database, in high throughput. Here, we present Database-independent Protein Sequencing, a method for unambiguous, rapid, database-independent, full-length protein sequencing. The method is a novel combination of non-enzymatic, semi-random cleavage of the protein, LC-MS/MS analysis, peptide de novo sequencing, extraction of peptide tags, and their assembly into a consensus sequence using an algorithm named "Peptide Tag Assembler." As proof-of-concept, the method was applied to samples of three known proteins representing three size classes and to a previously un-sequenced, clinically relevant monoclonal antibody. Excluding leucine/isoleucine and glutamic acid/deamidated glutamine ambiguities, end-to-end full-length de novo sequencing was achieved with 99-100% accuracy for all benchmarking proteins and the antibody light chain. Accuracy of the sequenced antibody heavy chain, including the entire variable region, was also 100%, but there was a 23-residue gap in the constant region sequence. © 2017 by The American Society for Biochemistry and Molecular Biology, Inc.

  5. [Isolation and purification of human blood plasma proteins able to form potassium channels in artificial bilayer lipid membrane].

    PubMed

    Venediktova, N I; Kuznetsov, K V; Gritsenko, E N; Gulidova, G P; Mironova, G D

    2012-01-01

    Protein fraction able to induce K(+)-selective transport across bilayer lipid membrane was isolated from human blood plasma with the use of the detergent and proteolytic enzyme-free method developed at our laboratory. After addition of the studied sample to the artificial membrane in the presence of 100 mM KCl, a discrete current change was observed. No channel activity was recorded in the presence of calcium and sodium ions. Channel forming activity of fraction was observed only in the presence of K+. Using a threefold gradient of KCl in the presence of studied proteins the potassium-selective potential balanced by voltage of -29 mV was registered. This value is very close to the theoretical Nernst potential in this case. This means that the examined ion channel is cation-selective. According to data obtained with MS-MALDI-TOF/TOF and database NCBI three protein components were identified in isolated researched sample.

  6. DR-GAS: a database of functional genetic variants and their phosphorylation states in human DNA repair systems.

    PubMed

    Sehgal, Manika; Singh, Tiratha Raj

    2014-04-01

    We present DR-GAS(1), a unique, consolidated and comprehensive DNA repair genetic association studies database of human DNA repair system. It presents information on repair genes, assorted mechanisms of DNA repair, linkage disequilibrium, haplotype blocks, nsSNPs, phosphorylation sites, associated diseases, and pathways involved in repair systems. DNA repair is an intricate process which plays an essential role in maintaining the integrity of the genome by eradicating the damaging effect of internal and external changes in the genome. Hence, it is crucial to extensively understand the intact process of DNA repair, genes involved, non-synonymous SNPs which perhaps affect the function, phosphorylated residues and other related genetic parameters. All the corresponding entries for DNA repair genes, such as proteins, OMIM IDs, literature references and pathways are cross-referenced to their respective primary databases. DNA repair genes and their associated parameters are either represented in tabular or in graphical form through images elucidated by computational and statistical analyses. It is believed that the database will assist molecular biologists, biotechnologists, therapeutic developers and other scientific community to encounter biologically meaningful information, and meticulous contribution of genetic level information towards treacherous diseases in human DNA repair systems. DR-GAS is freely available for academic and research purposes at: http://www.bioinfoindia.org/drgas. Copyright © 2014 Elsevier B.V. All rights reserved.

  7. The relationship between target-class and the physicochemical properties of antibacterial drugs

    PubMed Central

    Mugumbate, Grace; Overington, John P.

    2015-01-01

    The discovery of novel mechanism of action (MOA) antibacterials has been associated with the concept that antibacterial drugs occupy a differentiated region of physicochemical space compared to human-targeted drugs. With, in broad terms, antibacterials having higher molecular weight, lower log P and higher polar surface area (PSA). By analysing the physicochemical properties of about 1700 approved drugs listed in the ChEMBL database, we show, that antibacterials for whose targets are riboproteins (i.e., composed of a complex of RNA and protein) fall outside the conventional human ‘drug-like’ chemical space; whereas antibacterials that modulate bacterial protein targets, generally comply with the ‘rule-of-five’ guidelines for classical oral human drugs. Our analysis suggests a strong target-class association for antibacterials—either protein-targeted or riboprotein-targeted. There is much discussion in the literature on the failure of screening approaches to deliver novel antibacterial lead series, and linkage of this poor success rate for antibacterials with the chemical space properties of screening collections. Our analysis suggests that consideration of target-class may be an underappreciated factor in antibacterial lead discovery, and that in fact bacterial protein-targets may well have similar binding site characteristics to human protein targets, and questions the assumption that larger, more polar compounds are a key part of successful future antibacterial discovery. PMID:25975639

  8. Challenges in the association of human single nucleotide polymorphism mentions with unique database identifiers

    PubMed Central

    2011-01-01

    Background Most information on genomic variations and their associations with phenotypes are covered exclusively in scientific publications rather than in structured databases. These texts commonly describe variations using natural language; database identifiers are seldom mentioned. This complicates the retrieval of variations, associated articles, as well as information extraction, e. g. the search for biological implications. To overcome these challenges, procedures to map textual mentions of variations to database identifiers need to be developed. Results This article describes a workflow for normalization of variation mentions, i.e. the association of them to unique database identifiers. Common pitfalls in the interpretation of single nucleotide polymorphism (SNP) mentions are highlighted and discussed. The developed normalization procedure achieves a precision of 98.1 % and a recall of 67.5% for unambiguous association of variation mentions with dbSNP identifiers on a text corpus based on 296 MEDLINE abstracts containing 527 mentions of SNPs. The annotated corpus is freely available at http://www.scai.fraunhofer.de/snp-normalization-corpus.html. Conclusions Comparable approaches usually focus on variations mentioned on the protein sequence and neglect problems for other SNP mentions. The results presented here indicate that normalizing SNPs described on DNA level is more difficult than the normalization of SNPs described on protein level. The challenges associated with normalization are exemplified with ambiguities and errors, which occur in this corpus. PMID:21992066

  9. 'RetinoGenetics': a comprehensive mutation database for genes related to inherited retinal degeneration.

    PubMed

    Ran, Xia; Cai, Wei-Jun; Huang, Xiu-Feng; Liu, Qi; Lu, Fan; Qu, Jia; Wu, Jinyu; Jin, Zi-Bing

    2014-01-01

    Inherited retinal degeneration (IRD), a leading cause of human blindness worldwide, is exceptionally heterogeneous with clinical heterogeneity and genetic variety. During the past decades, tremendous efforts have been made to explore the complex heterogeneity, and massive mutations have been identified in different genes underlying IRD with the significant advancement of sequencing technology. In this study, we developed a comprehensive database, 'RetinoGenetics', which contains informative knowledge about all known IRD-related genes and mutations for IRD. 'RetinoGenetics' currently contains 4270 mutations in 186 genes, with detailed information associated with 164 phenotypes from 934 publications and various types of functional annotations. Then extensive annotations were performed to each gene using various resources, including Gene Ontology, KEGG pathways, protein-protein interaction, mutational annotations and gene-disease network. Furthermore, by using the search functions, convenient browsing ways and intuitive graphical displays, 'RetinoGenetics' could serve as a valuable resource for unveiling the genetic basis of IRD. Taken together, 'RetinoGenetics' is an integrative, informative and updatable resource for IRD-related genetic predispositions. Database URL: http://www.retinogenetics.org/. © The Author(s) 2014. Published by Oxford University Press.

  10. Investigation of potent lead for acquired immunodeficiency syndrome from traditional Chinese medicine.

    PubMed

    Hung, Tzu-Chieh; Lee, Wen-Yuan; Chen, Kuen-Bao; Chan, Yueh-Chiu; Chen, Calvin Yu-Chian

    2014-01-01

    Acquired immunodeficiency syndrome (AIDS), caused by human immunodeficiency virus (HIV), has become, because of the rapid spread of the disease, a serious global problem and cannot be treated. Recent studies indicate that VIF is a protein of HIV to prevent all of human immunity to attack HIV. Molecular compounds of traditional Chinese medicine (TCM) database filtered through molecular docking and molecular dynamics simulations to inhibit VIF can protect against HIV. Glutamic acid, plantagoguanidinic acid, and Aurantiamide acetate based docking score higher with other TCM compounds selected. Molecular dynamics are useful for analysis and detection ligand interactions. According to the docking position, hydrophobic interactions, hydrogen bonding changes, and structure variation, the study try to select the efficacy of traditional Chinese medicine compound Aurantiamide acetate is better than the other for protein-ligand interactions to maintain the protein composition, based on changes in the structure.

  11. Prioritizing and modelling of putative drug target proteins of Candida albicans by systems biology approach.

    PubMed

    Ismail, Tariq; Fatima, Nighat; Muhammad, Syed Aun; Zaidi, Syed Saoud; Rehman, Nisar; Hussain, Izhar; Tariq, Najam Us Sahr; Amirzada, Imran; Mannan, Abdul

    2018-01-01

    Candida albicans (Candida albicans) is one of the major sources of nosocomial infections in humans which may prove fatal in 30% of cases. The hospital acquired infection is very difficult to treat affectively due to the presence of drug resistant pathogenic strains, therefore there is a need to find alternative drug targets to cure this infection. In silico and computational level frame work was used to prioritize and establish antifungal drug targets of Candida albicans. The identification of putative drug targets was based on acquiring 5090 completely annotated genes of Candida albicans from available databases which were categorized into essential and non-essential genes. The result indicated that 9% of proteins were essential and could become potential candidates for intervention which might result in pathogen eradication. We studied cluster of orthologs and the subtractive genomic analysis of these essential proteins against human genome was made as a reference to minimize the side effects. It was seen that 14% of Candida albicans proteins were evolutionary related to the human proteins while 86% are non-human homologs. In the next step of compatible drug target selections, the non-human homologs were sequentially compared to the human microbiome data to minimize the potential effects against gut flora which accumulated to 38% of the essential genome. The sub-cellular localization of these candidate proteins in fungal cellular systems indicated that 80% of them are cytoplasmic, 10% are mitochondrial and the remaining 10% are associated with the cell wall. The role of these non-human and non-gut flora putative target proteins in Candida albicans biological pathways was studied. Due to their integrated and critical role in Candida albicans replication cycle, four proteins were selected for molecular modeling. For drug designing and development, four high quality and reliable protein models with more than 70% sequence identity were constructed. These proteins are used for the docking studies of the known and new ligands (unpublished data). Our study will be an effective framework for drug target identifications of pathogenic microbial strains and development of new therapies against the infections they cause.

  12. Human protoporphyrinogen oxidase: expression, purification, and characterization of the cloned enzyme.

    PubMed Central

    Dailey, T. A.; Dailey, H. A.

    1996-01-01

    Protoporphyrinogen oxidase (E.C.1.3.3.4) catalyzes the oxygen-dependent oxidation of protoporphyrinogen IX to protoporphyrin IX. The enzyme from human placenta has been cloned, sequenced, expressed in Escherichia coli, purified to homogeneity, and characterized. Northern blot analysis of eight different human tissues show evidence for only a single transcript in all tissue types and the size of this transcript is approximately 1.8 kb. The human cDNA has been inserted into an expression vector for E. coli and the protein produced at high levels in these cells. The protein is found in both membrane and cytoplasmic fractions. The enzyme was purified to homogeneity in the presence of detergents using a metal chelate affinity column. The purified protein is a homodimer composed of subunits of molecular weight of 51,000. The enzyme contains one noncovalently bound FAD per dimer, has a monomer extinction coefficient of 48,000 at 270 nm and contains no detectable redox active metals. The apparent K(m) and Kcat for protoporphyrinogen IX are 1.7 microM and 10.5 min-1, respectively. The enzyme does not use coproporphyrinogen III as a substrate and is inhibited by micromolar concentrations of the herbicide acifluorfen. Protein database searches reveal significant homology between protoporphyrinogen oxidase and monoamine oxidase. PMID:8771201

  13. Leukotriene signaling in the extinct human subspecies Homo denisovan and Homo neanderthalensis. Structural and functional comparison with Homo sapiens.

    PubMed

    Adel, Susan; Kakularam, Kumar Reddy; Horn, Thomas; Reddanna, Pallu; Kuhn, Hartmut; Heydeck, Dagmar

    2015-01-01

    Mammalian lipoxygenases (LOXs) have been implicated in cell differentiation and in the biosynthesis of pro- and anti-inflammatory lipid mediators. The initial draft sequence of the Homo neanderthalensis genome (coverage of 1.3-fold) suggested defective leukotriene signaling in this archaic human subspecies since expression of essential proteins appeared to be corrupted. Meanwhile high quality genomic sequence data became available for two extinct human subspecies (H. neanderthalensis, Homo denisovan) and completion of the human 1000 genome project provided a comprehensive database characterizing the genetic variability of the human genome. For this study we extracted the nucleotide sequences of selected eicosanoid relevant genes (ALOX5, ALOX15, ALOX12, ALOX15B, ALOX12B, ALOXE3, COX1, COX2, LTA4H, LTC4S, ALOX5AP, CYSLTR1, CYSLTR2, BLTR1, BLTR2) from the corresponding databases. Comparison of the deduced amino acid sequences in connection with site-directed mutagenesis studies and structural modeling suggested that the major enzymes and receptors of leukotriene signaling as well as the two cyclooxygenase isoforms were fully functional in these two extinct human subspecies. Copyright © 2014 Elsevier Inc. All rights reserved.

  14. Sequence analysis of serum albumins reveals the molecular evolution of ligand recognition properties.

    PubMed

    Fanali, Gabriella; Ascenzi, Paolo; Bernardi, Giorgio; Fasano, Mauro

    2012-01-01

    Serum albumin (SA) is a circulating protein providing a depot and carrier for many endogenous and exogenous compounds. At least seven major binding sites have been identified by structural and functional investigations mainly in human SA. SA is conserved in vertebrates, with at least 49 entries in protein sequence databases. The multiple sequence analysis of this set of entries leads to the definition of a cladistic tree for the molecular evolution of SA orthologs in vertebrates, thus showing the clustering of the considered species, with lamprey SAs (Lethenteron japonicum and Petromyzon marinus) in a separate outgroup. Sequence analysis aimed at searching conserved domains revealed that most SA sequences are made up by three repeated domains (about 600 residues), as extensively characterized for human SA. On the contrary, lamprey SAs are giant proteins (about 1400 residues) comprising seven repeated domains. The phylogenetic analysis of the SA family reveals a stringent correlation with the taxonomic classification of the species available in sequence databases. A focused inspection of the sequences of ligand binding sites in SA revealed that in all sites most residues involved in ligand binding are conserved, although the versatility towards different ligands could be peculiar of higher organisms. Moreover, the analysis of molecular links between the different sites suggests that allosteric modulation mechanisms could be restricted to higher vertebrates.

  15. Kangaroo – A pattern-matching program for biological sequences

    PubMed Central

    2002-01-01

    Background Biologists are often interested in performing a simple database search to identify proteins or genes that contain a well-defined sequence pattern. Many databases do not provide straightforward or readily available query tools to perform simple searches, such as identifying transcription binding sites, protein motifs, or repetitive DNA sequences. However, in many cases simple pattern-matching searches can reveal a wealth of information. We present in this paper a regular expression pattern-matching tool that was used to identify short repetitive DNA sequences in human coding regions for the purpose of identifying potential mutation sites in mismatch repair deficient cells. Results Kangaroo is a web-based regular expression pattern-matching program that can search for patterns in DNA, protein, or coding region sequences in ten different organisms. The program is implemented to facilitate a wide range of queries with no restriction on the length or complexity of the query expression. The program is accessible on the web at http://bioinfo.mshri.on.ca/kangaroo/ and the source code is freely distributed at http://sourceforge.net/projects/slritools/. Conclusion A low-level simple pattern-matching application can prove to be a useful tool in many research settings. For example, Kangaroo was used to identify potential genetic targets in a human colorectal cancer variant that is characterized by a high frequency of mutations in coding regions containing mononucleotide repeats. PMID:12150718

  16. PPInterFinder--a mining tool for extracting causal relations on human proteins from literature.

    PubMed

    Raja, Kalpana; Subramani, Suresh; Natarajan, Jeyakumar

    2013-01-01

    One of the most common and challenging problem in biomedical text mining is to mine protein-protein interactions (PPIs) from MEDLINE abstracts and full-text research articles because PPIs play a major role in understanding the various biological processes and the impact of proteins in diseases. We implemented, PPInterFinder--a web-based text mining tool to extract human PPIs from biomedical literature. PPInterFinder uses relation keyword co-occurrences with protein names to extract information on PPIs from MEDLINE abstracts and consists of three phases. First, it identifies the relation keyword using a parser with Tregex and a relation keyword dictionary. Next, it automatically identifies the candidate PPI pairs with a set of rules related to PPI recognition. Finally, it extracts the relations by matching the sentence with a set of 11 specific patterns based on the syntactic nature of PPI pair. We find that PPInterFinder is capable of predicting PPIs with the accuracy of 66.05% on AIMED corpus and outperforms most of the existing systems. DATABASE URL: http://www.biomining-bu.in/ppinterfinder/

  17. Neanderthal and Denisova tooth protein variants in present-day humans

    PubMed Central

    Zanolli, Clément; Hourset, Mathilde; Esclassan, Rémi

    2017-01-01

    Environment parameters, diet and genetic factors interact to shape tooth morphostructure. In the human lineage, archaic and modern hominins show differences in dental traits, including enamel thickness, but variability also exists among living populations. Several polymorphisms, in particular in the non-collagenous extracellular matrix proteins of the tooth hard tissues, like enamelin, are involved in dental structure variation and defects and may be associated with dental disorders or susceptibility to caries. To gain insights into the relationships between tooth protein polymorphisms and dental structural morphology and defects, we searched for non-synonymous polymorphisms in tooth proteins from Neanderthal and Denisova hominins. The objective was to identify archaic-specific missense variants that may explain the dental morphostructural variability between extinct and modern humans, and to explore their putative impact on present-day dental phenotypes. Thirteen non-collagenous extracellular matrix proteins specific to hard dental tissues have been selected, searched in the publicly available sequence databases of Neanderthal and Denisova individuals and compared with modern human genome data. A total of 16 non-synonymous polymorphisms were identified in 6 proteins (ameloblastin, amelotin, cementum protein 1, dentin matrix acidic phosphoprotein 1, enamelin and matrix Gla protein). Most of them are encoded by dentin and enamel genes located on chromosome 4, previously reported to show signs of archaic introgression within Africa. Among the variants shared with modern humans, two are ancestral (common with apes) and one is the derived enamelin major variant, T648I (rs7671281), associated with a thinner enamel and specific to the Homo lineage. All the others are specific to Neanderthals and Denisova, and are found at a very low frequency in modern Africans or East and South Asians, suggesting that they may be related to particular dental traits or disease susceptibility in these populations. This modern regional distribution of archaic dental polymorphisms may reflect persistence of archaic variants in some populations and may contribute in part to the geographic dental variations described in modern humans. PMID:28902892

  18. Domain fusion analysis by applying relational algebra to protein sequence and domain databases

    PubMed Central

    Truong, Kevin; Ikura, Mitsuhiko

    2003-01-01

    Background Domain fusion analysis is a useful method to predict functionally linked proteins that may be involved in direct protein-protein interactions or in the same metabolic or signaling pathway. As separate domain databases like BLOCKS, PROSITE, Pfam, SMART, PRINTS-S, ProDom, TIGRFAMs, and amalgamated domain databases like InterPro continue to grow in size and quality, a computational method to perform domain fusion analysis that leverages on these efforts will become increasingly powerful. Results This paper proposes a computational method employing relational algebra to find domain fusions in protein sequence databases. The feasibility of this method was illustrated on the SWISS-PROT+TrEMBL sequence database using domain predictions from the Pfam HMM (hidden Markov model) database. We identified 235 and 189 putative functionally linked protein partners in H. sapiens and S. cerevisiae, respectively. From scientific literature, we were able to confirm many of these functional linkages, while the remainder offer testable experimental hypothesis. Results can be viewed at . Conclusion As the analysis can be computed quickly on any relational database that supports standard SQL (structured query language), it can be dynamically updated along with the sequence and domain databases, thereby improving the quality of predictions over time. PMID:12734020

  19. Distribution of cellular HSV-1 receptor expression in human brain.

    PubMed

    Lathe, Richard; Haas, Juergen G

    2017-06-01

    Herpes simplex virus type 1 (HSV-1) is a neurotropic virus linked to a range of acute and chronic neurological disorders affecting distinct regions of the brain. Unusually, HSV-1 entry into cells requires the interaction of viral proteins glycoprotein D (gD) and glycoprotein B (gB) with distinct cellular receptor proteins. Several different gD and gB receptors have been identified, including TNFRSF14/HVEM and PVRL1/nectin 1 as gD receptors and PILRA, MAG, and MYH9 as gB receptors. We investigated the expression of these receptor molecules in different areas of the adult and developing human brain using online transcriptome databases. Whereas all HSV-1 receptors showed distinct expression patterns in different brain areas, the Allan Brain Atlas (ABA) reported increased expression of both gD and gB receptors in the hippocampus. Specifically, for PVRL1, TNFRFS14, and MYH9, the differential z scores for hippocampal expression, a measure of relative levels of increased expression, rose to 2.9, 2.9, and 2.5, respectively, comparable to the z score for the archetypical hippocampus-enriched mineralocorticoid receptor (NR3C2, z = 3.1). These data were confirmed at the Human Brain Transcriptome (HBT) database, but HBT data indicate that MAG expression is also enriched in hippocampus. The HBT database allowed the developmental pattern of expression to be investigated; we report that all HSV1 receptors markedly increase in expression levels between gestation and the postnatal/adult periods. These results suggest that differential receptor expression levels of several HSV-1 gD and gB receptors in the adult hippocampus are likely to underlie the susceptibility of this brain region to HSV-1 infection.

  20. Appearance of the pituitary factor Pit-1 increases chromatin remodeling at hypersensitive site III in the human GH locus.

    PubMed

    Yang, Xiaoyang; Jin, Yan; Cattini, Peter A

    2010-07-01

    Expression of pituitary and placental members of the human GH and chorionic somatomammotropin (CS) gene family is directed by an upstream remote locus control region (LCR). Pituitary-specific expression of GH requires direct binding of Pit-1 (listed as POU1F1 in the HUGO database) to sequences marked by a hypersensitive site (HS) region (HS I/II) 14.6 kb upstream of the GH-N gene (listed as GH1 in the HUGO database). We used human embryonic kidney 293 (HEK293) cells overexpressing wild-type and mutant Pit-1 proteins as a model system to gain insight into the mechanism by which Pit-1 gains access to the GH LCR. Addition of Pit-1 to these cells increased DNA accessibility at HS III, located 28 kb upstream of the human GH-N gene, in a POU homeodomain-dependent manner, as reflected by effects on histone hyperacetylation and RNA polymerase II activity. Direct binding of Pit-1 to HS III sequences is not supported. However, the potential for binding of ETS family members to this region has been demonstrated, and Pit-1 association with this ETS element in HS III sequences requires the POU homeodomain. Also, both ETS1 and ELK1 co-precipitate from human pituitary extracts using two independent sources of Pit-1 antibodies. Finally, overexpression of ELK1 or Pit-1 expression in HEK293 cells increased GH-N RNA levels. However, while ELK1 overexpression also stimulated placental CS RNA levels, the effect of Pit-1 appeared to correlate with ETS factor levels and target GH-N preferentially. These data are consistent with recruitment and an early role for Pit-1 in remodeling of the GH LCR at the constitutively open HS III through protein-protein interaction.

  1. TOPDOM: database of conservatively located domains and motifs in proteins.

    PubMed

    Varga, Julia; Dobson, László; Tusnády, Gábor E

    2016-09-01

    The TOPDOM database-originally created as a collection of domains and motifs located consistently on the same side of the membranes in α-helical transmembrane proteins-has been updated and extended by taking into consideration consistently localized domains and motifs in globular proteins, too. By taking advantage of the recently developed CCTOP algorithm to determine the type of a protein and predict topology in case of transmembrane proteins, and by applying a thorough search for domains and motifs as well as utilizing the most up-to-date version of all source databases, we managed to reach a 6-fold increase in the size of the whole database and a 2-fold increase in the number of transmembrane proteins. TOPDOM database is available at http://topdom.enzim.hu The webpage utilizes the common Apache, PHP5 and MySQL software to provide the user interface for accessing and searching the database. The database itself is generated on a high performance computer. tusnady.gabor@ttk.mta.hu Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

  2. Mutational analysis of the human nucleotide excision repair gene ERCC1.

    PubMed Central

    Sijbers, A M; van der Spek, P J; Odijk, H; van den Berg, J; van Duin, M; Westerveld, A; Jaspers, N G; Bootsma, D; Hoeijmakers, J H

    1996-01-01

    The human DNA repair protein ERCC1 resides in a complex together with the ERCC4, ERCC11 and XP-F correcting activities, thought to perform the 5' strand incision during nucleotide excision repair (NER). Its yeast counterpart, RAD1-RAD10, has an additional engagement in a mitotic recombination pathway, probably required for repair of DNA cross-links. Mutational analysis revealed that the poorly conserved N-terminal 91 amino acids of ERCC1 are dispensable for both repair functions, in contrast to a deletion of only four residues from the C-terminus. A database search revealed a strongly conserved motif in this C-terminus sharing sequence homology with many DNA break processing proteins, indicating that this part is primarily required for the presumed structure-specific endonuclease activity of ERCC1. Most missense mutations in the central region give rise to an unstable protein (complex). Accordingly, we found that free ERCC1 is very rapidly degraded, suggesting that protein-protein interactions provide stability. Survival experiments show that the removal of cross-links requires less ERCC1 than UV repair. This suggests that the ERCC1-dependent step in cross-link repair occurs outside the context of NER and provides an explanation for the phenotype of the human repair syndrome xeroderma pigmentosum group F. PMID:8811092

  3. COMBREX-DB: an experiment centered database of protein function: knowledge, predictions and knowledge gaps.

    PubMed

    Chang, Yi-Chien; Hu, Zhenjun; Rachlin, John; Anton, Brian P; Kasif, Simon; Roberts, Richard J; Steffen, Martin

    2016-01-04

    The COMBREX database (COMBREX-DB; combrex.bu.edu) is an online repository of information related to (i) experimentally determined protein function, (ii) predicted protein function, (iii) relationships among proteins of unknown function and various types of experimental data, including molecular function, protein structure, and associated phenotypes. The database was created as part of the novel COMBREX (COMputational BRidges to EXperiments) effort aimed at accelerating the rate of gene function validation. It currently holds information on ∼ 3.3 million known and predicted proteins from over 1000 completely sequenced bacterial and archaeal genomes. The database also contains a prototype recommendation system for helping users identify those proteins whose experimental determination of function would be most informative for predicting function for other proteins within protein families. The emphasis on documenting experimental evidence for function predictions, and the prioritization of uncharacterized proteins for experimental testing distinguish COMBREX from other publicly available microbial genomics resources. This article describes updates to COMBREX-DB since an initial description in the 2011 NAR Database Issue. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  4. Molecular differences between mature and immature dental pulp cells: Bioinformatics and preliminary results.

    PubMed

    Chen, Long; Jiang, Yifeng; Du, Zhen

    2018-04-01

    Although previous studies have demonstrated that dental pulp stem cells (DPSCs) from mature and immature teeth exhibit potential for multi-directional differentiation, the molecular and biological difference between the DPSCs from mature and immature permanent teeth has not been fully investigated. In the present study, 500 differentially expressed genes from dental pulp cells (DPCs) in mature and immature permanent teeth were obtained from the Gene Expression Omnibus online database. Based on bioinformatics analysis using the Database for Annotation, Visualization and Integrated Discovery, these genes were divided into a number of subgroups associated with immunity, inflammation and cell signaling. The results of the present study suggest that immune features, response to infection and cell signaling may be different in DPCs from mature and immature permanent teeth; furthermore, DPCs from immature permanent teeth may be more suitable for use in tissue engineering or stem cell therapy. The Online Mendelian Inheritance in Man database stated that Sonic Hedgehog (SHH), a differentially expressed gene in DPCs from mature and immature permanent teeth, serves a crucial role in the development of craniofacial tissues, including teeth, which further confirmed that SHH may cause DPCs from mature and immature permanent teeth to exhibit different biological characteristics. The Search Tool for the Retrieval of Interacting Genes/Proteins database revealed that SHH has functional protein associations with a number of other proteins, including Glioma-associated oncogene (GLI)1, GLI2, growth arrest-specific protein 1, bone morphogenetic protein (BMP)2 and BMP4, in mice and humans. It was also demonstrated that SHH may interact with other genes to regulate the biological characteristics of DPCs. The results of the present study may provide a useful reference basis for selecting suitable DPSCs and molecules for the treatment of these cells to optimize features for tissue engineering or stem cell therapy. Quantitative polymerase chain reaction should be performed to confirm the differential expression of these genes prior to the beginning of a functional study.

  5. Significance of aquaporins’ expression in the prognosis of gastric cancer

    PubMed Central

    Thapa, Saroj; Chetry, Mandika; Huang, Kaiyu; Peng, Yangpei; Wang, Jinsheng; Wang, Jiaoni; Zhou, Yingying; Shen, Yigen; Xue, Yangjing; Ji, Kangting

    2018-01-01

    Gastric carcinoma is one of the most lethal malignancy at present with leading cause of cancer-related deaths worldwide. Aquaporins (AQPs) are a family of small, integral membrane proteins, which have been evidenced to play a crucial role in cell migration and proliferation of different cancer cells including gastric cancers. However, the aberrant expression of specific AQPs and its correlation to detect predictive and prognostic significance in gastric cancer remains elusive. In the present study, we comprehensively explored immunohistochemistry based map of protein expression profiles in normal tissues, cancer and cell lines from publicly available Human Protein Atlas (HPA) database. Moreover, to improve our understanding of general gastric biology and guide to find novel predictive prognostic gastric cancer biomarker, we also retrieved ‘The Kaplan–Meier plotter’ (KM plotter) online database with specific AQPs mRNA to overall survival (OS) in different clinicopathological features. We revealed that ubiquitous expression of AQPs protein can be effective tools to generate gastric cancer biomarker. Furthermore, high level AQP3, AQP9, and AQP11 mRNA expression were correlated with better OS in all gastric patients, whereas AQP0, AQP1, AQP4, AQP5, AQP6, AQP8, and AQP10 mRNA expression were associated with poor OS. With regard to the clinicopathological features including Laurens classification, clinical stage, human epidermal growth factor receptor 2 (HER2) status, and different treatment strategy, we could illustrate significant role of individual AQP mRNA expression in the prognosis of gastric cancer patients. Thus, our results indicated that AQP’s protein and mRNA expression in gastric cancer patients provide effective role to predict prognosis and act as an essential agent to therapeutic strategy. PMID:29678898

  6. Exploring Site-Specific N-Glycosylation Microheterogeneity of Haptoglobin using Glycopeptide CID Tandem Mass Spectra and Glycan Database Search

    PubMed Central

    Chandler, Kevin Brown; Pompach, Petr; Goldman, Radoslav

    2013-01-01

    Glycosylation is a common protein modification with a significant role in many vital cellular processes and human diseases, making the characterization of protein-attached glycan structures important for understanding cell biology and disease processes. Direct analysis of protein N-glycosylation by tandem mass spectrometry of glycopeptides promises site-specific elucidation of N-glycan microheterogeneity, something which detached N-glycan and de-glycosylated peptide analyses cannot provide. However, successful implementation of direct N-glycopeptide analysis by tandem mass spectrometry remains a challenge. In this work, we consider algorithmic techniques for the analysis of LC-MS/MS data acquired from glycopeptide-enriched fractions of enzymatic digests of purified proteins. We implement a computational strategy which takes advantage of the properties of CID fragmentation spectra of N-glycopeptides, matching the MS/MS spectra to peptide-glycan pairs from protein sequences and glycan structure databases. Significantly, we also propose a novel false-discovery-rate estimation technique to estimate and manage the number of false identifications. We use a human glycoprotein standard, haptoglobin, digested with trypsin and GluC, enriched for glycopeptides using HILIC chromatography, and analyzed by LC-MS/MS to demonstrate our algorithmic strategy and evaluate its performance. Our software, GlycoPeptideSearch (GPS), assigned glycopeptide identifications to 246 of the spectra at false-discovery-rate 5.58%, identifying 42 distinct haptoglobin peptide-glycan pairs at each of the four haptoglobin N-linked glycosylation sites. We further demonstrate the effectiveness of this approach by analyzing plasma-derived haptoglobin, identifying 136 N-linked glycopeptide spectra at false-discovery-rate 0.4%, representing 15 distinct glycopeptides on at least three of the four N-linked glycosylation sites. The software, GlycoPeptideSearch, is available for download from http://edwardslab.bmcb.georgetown.edu/GPS. PMID:23829323

  7. SuperSweet—a resource on natural and artificial sweetening agents

    PubMed Central

    Ahmed, Jessica; Preissner, Saskia; Dunkel, Mathias; Worth, Catherine L.; Eckert, Andreas; Preissner, Robert

    2011-01-01

    A vast number of sweet tasting molecules are known, encompassing small compounds, carbohydrates, d-amino acids and large proteins. Carbohydrates play a particularly big role in human diet. The replacement of sugars in food with artificial sweeteners is common and is a general approach to prevent cavities, obesity and associated diseases such as diabetes and hyperlipidemia. Knowledge about the molecular basis of taste may reveal new strategies to overcome diet-induced diseases. In this context, the design of safe, low-calorie sweeteners is particularly important. Here, we provide a comprehensive collection of carbohydrates, artificial sweeteners and other sweet tasting agents like proteins and peptides. Additionally, structural information and properties such as number of calories, therapeutic annotations and a sweetness-index are stored in SuperSweet. Currently, the database consists of more than 8000 sweet molecules. Moreover, the database provides a modeled 3D structure of the sweet taste receptor and binding poses of the small sweet molecules. These binding poses provide hints for the design of new sweeteners. A user-friendly graphical interface allows similarity searching, visualization of docked sweeteners into the receptor etc. A sweetener classification tree and browsing features allow quick requests to be made to the database. The database is freely available at: http://bioinformatics.charite.de/sweet/. PMID:20952410

  8. SuperSweet--a resource on natural and artificial sweetening agents.

    PubMed

    Ahmed, Jessica; Preissner, Saskia; Dunkel, Mathias; Worth, Catherine L; Eckert, Andreas; Preissner, Robert

    2011-01-01

    A vast number of sweet tasting molecules are known, encompassing small compounds, carbohydrates, d-amino acids and large proteins. Carbohydrates play a particularly big role in human diet. The replacement of sugars in food with artificial sweeteners is common and is a general approach to prevent cavities, obesity and associated diseases such as diabetes and hyperlipidemia. Knowledge about the molecular basis of taste may reveal new strategies to overcome diet-induced diseases. In this context, the design of safe, low-calorie sweeteners is particularly important. Here, we provide a comprehensive collection of carbohydrates, artificial sweeteners and other sweet tasting agents like proteins and peptides. Additionally, structural information and properties such as number of calories, therapeutic annotations and a sweetness-index are stored in SuperSweet. Currently, the database consists of more than 8000 sweet molecules. Moreover, the database provides a modeled 3D structure of the sweet taste receptor and binding poses of the small sweet molecules. These binding poses provide hints for the design of new sweeteners. A user-friendly graphical interface allows similarity searching, visualization of docked sweeteners into the receptor etc. A sweetener classification tree and browsing features allow quick requests to be made to the database. The database is freely available at: http://bioinformatics.charite.de/sweet/.

  9. [The interpretation and integration of traditional Chinese phytotherapy into Western-type medicine with the possession of knowledge of the human genome].

    PubMed

    Blázovics, Anna

    2018-05-01

    The terminology of traditional Chinese medicine (TCM) is hardly interpretable in the context of human genome, therefore the human genome program attracted attention towards the Western practice of medicine in China. In the last two decades, several important steps could be observed in China in relation to the approach of traditional Chinese and Western medicine. The Chinese government supports the realization of information databases for research in order to clarify the molecular biology level to detect associations between gene expression signal transduction pathways and protein-protein interactions, and the effects of bioactive components of Chinese drugs and their effectiveness. The values of TCM are becoming more and more important for Western medicine as well, because molecular biological therapies did not redeem themselves, e.g., in tumor therapy. Orv Hetil. 2018; 159(18): 696-702.

  10. A Circular Dichroism Reference Database for Membrane Proteins

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wallace,B.; Wien, F.; Stone, T.

    2006-01-01

    Membrane proteins are a major product of most genomes and the target of a large number of current pharmaceuticals, yet little information exists on their structures because of the difficulty of crystallising them; hence for the most part they have been excluded from structural genomics programme targets. Furthermore, even methods such as circular dichroism (CD) spectroscopy which seek to define secondary structure have not been fully exploited because of technical limitations to their interpretation for membrane embedded proteins. Empirical analyses of circular dichroism (CD) spectra are valuable for providing information on secondary structures of proteins. However, the accuracy of themore » results depends on the appropriateness of the reference databases used in the analyses. Membrane proteins have different spectral characteristics than do soluble proteins as a result of the low dielectric constants of membrane bilayers relative to those of aqueous solutions (Chen & Wallace (1997) Biophys. Chem. 65:65-74). To date, no CD reference database exists exclusively for the analysis of membrane proteins, and hence empirical analyses based on current reference databases derived from soluble proteins are not adequate for accurate analyses of membrane protein secondary structures (Wallace et al (2003) Prot. Sci. 12:875-884). We have therefore created a new reference database of CD spectra of integral membrane proteins whose crystal structures have been determined. To date it contains more than 20 proteins, and spans the range of secondary structures from mostly helical to mostly sheet proteins. This reference database should enable more accurate secondary structure determinations of membrane embedded proteins and will become one of the reference database options in the CD calculation server DICHROWEB (Whitmore & Wallace (2004) NAR 32:W668-673).« less

  11. Endo-beta-N-acetylglucosaminidase, an enzyme involved in processing of free oligosaccharides in the cytosol.

    PubMed

    Suzuki, Tadashi; Yano, Keiichi; Sugimoto, Seiji; Kitajima, Ken; Lennarz, William J; Inoue, Sadako; Inoue, Yasuo; Emori, Yasufumi

    2002-07-23

    Formation of oligosaccharides occurs both in the cytosol and in the lumen of the endoplasmic reticulum (ER). Luminal oligosaccharides are transported into the cytosol to ensure that they do not interfere with proper functioning of the glycan-dependent quality control machinery in the lumen of the ER for newly synthesized glycoproteins. Once in the cytosol, free oligosaccharides are catabolized, possibly to maximize the reutilization of the component sugars. An endo-beta-N-acetylglucosaminidase (ENGase) is a key enzyme involved in the processing of free oligosaccharides in the cytosol. This enzyme activity has been widely described in animal cells, but the gene encoding this enzyme activity has not been reported. Here, we report the identification of the gene encoding human cytosolic ENGase. After 11 steps, the enzyme was purified 150,000-fold to homogeneity from hen oviduct, and several internal amino acid sequences were analyzed. Based on the internal sequence and examination of expressed sequence tag (EST) databases, we identified the human orthologue of the purified protein. The human protein consists of 743 aa and has no apparent signal sequence, supporting the idea that this enzyme is localized in the cytosol. By expressing the cDNA of the putative human ENGase in COS-7 cells, the enzyme activity in the soluble fraction was enhanced 100-fold over the basal level, confirming that the human gene identified indeed encodes for ENGase. Careful gene database surveys revealed the occurrence of ENGase homologues in Drosophila melanogaster, Caenorhabditis elegans, and Arabidopsis thaliana, indicating the broad occurrence of ENGase in higher eukaryotes. This gene was expressed in a variety of human tissues, suggesting that this enzyme is involved in basic biological processes in eukaryotic cells.

  12. Improved orthologous databases to ease protozoan targets inference.

    PubMed

    Kotowski, Nelson; Jardim, Rodrigo; Dávila, Alberto M R

    2015-09-29

    Homology inference helps on identifying similarities, as well as differences among organisms, which provides a better insight on how closely related one might be to another. In addition, comparative genomics pipelines are widely adopted tools designed using different bioinformatics applications and algorithms. In this article, we propose a methodology to build improved orthologous databases with the potential to aid on protozoan target identification, one of the many tasks which benefit from comparative genomics tools. Our analyses are based on OrthoSearch, a comparative genomics pipeline originally designed to infer orthologs through protein-profile comparison, supported by an HMM, reciprocal best hits based approach. Our methodology allows OrthoSearch to confront two orthologous databases and to generate an improved new one. Such can be later used to infer potential protozoan targets through a similarity analysis against the human genome. The protein sequences of Cryptosporidium hominis, Entamoeba histolytica and Leishmania infantum genomes were comparatively analyzed against three orthologous databases: (i) EggNOG KOG, (ii) ProtozoaDB and (iii) Kegg Orthology (KO). That allowed us to create two new orthologous databases, "KO + EggNOG KOG" and "KO + EggNOG KOG + ProtozoaDB", with 16,938 and 27,701 orthologous groups, respectively. Such new orthologous databases were used for a regular OrthoSearch run. By confronting "KO + EggNOG KOG" and "KO + EggNOG KOG + ProtozoaDB" databases and protozoan species we were able to detect the following total of orthologous groups and coverage (relation between the inferred orthologous groups and the species total number of proteins): Cryptosporidium hominis: 1,821 (11 %) and 3,254 (12 %); Entamoeba histolytica: 2,245 (13 %) and 5,305 (19 %); Leishmania infantum: 2,702 (16 %) and 4,760 (17 %). Using our HMM-based methodology and the largest created orthologous database, it was possible to infer 13 orthologous groups which represent potential protozoan targets; these were found because of our distant homology approach. We also provide the number of species-specific, pair-to-pair and core groups from such analyses, depicted in Venn diagrams. The orthologous databases generated by our HMM-based methodology provide a broader dataset, with larger amounts of orthologous groups when compared to the original databases used as input. Those may be used for several homology inference analyses, annotation tasks and protozoan targets identification.

  13. Tandem mass spectrometry for the detection of plant pathogenic fungi and the effects of database composition on protein inferences.

    PubMed

    Padliya, Neerav D; Garrett, Wesley M; Campbell, Kimberly B; Tabb, David L; Cooper, Bret

    2007-11-01

    LC-MS/MS has demonstrated potential for detecting plant pathogens. Unlike PCR or ELISA, LC-MS/MS does not require pathogen-specific reagents for the detection of pathogen-specific proteins and peptides. However, the MS/MS approach we and others have explored does require a protein sequence reference database and database-search software to interpret tandem mass spectra. To evaluate the limitations of database composition on pathogen identification, we analyzed proteins from cultured Ustilago maydis, Phytophthora sojae, Fusarium graminearum, and Rhizoctonia solani by LC-MS/MS. When the search database did not contain sequences for a target pathogen, or contained sequences to related pathogens, target pathogen spectra were reliably matched to protein sequences from nontarget organisms, giving an illusion that proteins from nontarget organisms were identified. Our analysis demonstrates that when database-search software is used as part of the identification process, a paradox exists whereby additional sequences needed to detect a wide variety of possible organisms may lead to more cross-species protein matches and misidentification of pathogens.

  14. PROFESS: a PROtein Function, Evolution, Structure and Sequence database

    PubMed Central

    Triplet, Thomas; Shortridge, Matthew D.; Griep, Mark A.; Stark, Jaime L.; Powers, Robert; Revesz, Peter

    2010-01-01

    The proliferation of biological databases and the easy access enabled by the Internet is having a beneficial impact on biological sciences and transforming the way research is conducted. There are ∼1100 molecular biology databases dispersed throughout the Internet. To assist in the functional, structural and evolutionary analysis of the abundant number of novel proteins continually identified from whole-genome sequencing, we introduce the PROFESS (PROtein Function, Evolution, Structure and Sequence) database. Our database is designed to be versatile and expandable and will not confine analysis to a pre-existing set of data relationships. A fundamental component of this approach is the development of an intuitive query system that incorporates a variety of similarity functions capable of generating data relationships not conceived during the creation of the database. The utility of PROFESS is demonstrated by the analysis of the structural drift of homologous proteins and the identification of potential pancreatic cancer therapeutic targets based on the observation of protein–protein interaction networks. Database URL: http://cse.unl.edu/∼profess/ PMID:20624718

  15. SynechoNET: integrated protein-protein interaction database of a model cyanobacterium Synechocystis sp. PCC 6803.

    PubMed

    Kim, Woo-Yeon; Kang, Sungsoo; Kim, Byoung-Chul; Oh, Jeehyun; Cho, Seongwoong; Bhak, Jong; Choi, Jong-Soon

    2008-01-01

    Cyanobacteria are model organisms for studying photosynthesis, carbon and nitrogen assimilation, evolution of plant plastids, and adaptability to environmental stresses. Despite many studies on cyanobacteria, there is no web-based database of their regulatory and signaling protein-protein interaction networks to date. We report a database and website SynechoNET that provides predicted protein-protein interactions. SynechoNET shows cyanobacterial domain-domain interactions as well as their protein-level interactions using the model cyanobacterium, Synechocystis sp. PCC 6803. It predicts the protein-protein interactions using public interaction databases that contain mutually complementary and redundant data. Furthermore, SynechoNET provides information on transmembrane topology, signal peptide, and domain structure in order to support the analysis of regulatory membrane proteins. Such biological information can be queried and visualized in user-friendly web interfaces that include the interactive network viewer and search pages by keyword and functional category. SynechoNET is an integrated protein-protein interaction database designed to analyze regulatory membrane proteins in cyanobacteria. It provides a platform for biologists to extend the genomic data of cyanobacteria by predicting interaction partners, membrane association, and membrane topology of Synechocystis proteins. SynechoNET is freely available at http://synechocystis.org/ or directly at http://bioportal.kobic.kr/SynechoNET/.

  16. Inborn errors of metabolism and the human interactome: a systems medicine approach.

    PubMed

    Woidy, Mathias; Muntau, Ania C; Gersting, Søren W

    2018-02-05

    The group of inborn errors of metabolism (IEM) displays a marked heterogeneity and IEM can affect virtually all functions and organs of the human organism; however, IEM share that their associated proteins function in metabolism. Most proteins carry out cellular functions by interacting with other proteins, and thus are organized in biological networks. Therefore, diseases are rarely the consequence of single gene mutations but of the perturbations caused in the related cellular network. Systematic approaches that integrate multi-omics and database information into biological networks have successfully expanded our knowledge of complex disorders but network-based strategies have been rarely applied to study IEM. We analyzed IEM on a proteome scale and found that IEM-associated proteins are organized as a network of linked modules within the human interactome of protein interactions, the IEM interactome. Certain IEM disease groups formed self-contained disease modules, which were highly interlinked. On the other hand, we observed disease modules consisting of proteins from many different disease groups in the IEM interactome. Moreover, we explored the overlap between IEM and non-IEM disease genes and applied network medicine approaches to investigate shared biological pathways, clinical signs and symptoms, and links to drug targets. The provided resources may help to elucidate the molecular mechanisms underlying new IEM, to uncover the significance of disease-associated mutations, to identify new biomarkers, and to develop novel therapeutic strategies.

  17. PACSY, a relational database management system for protein structure and chemical shift analysis.

    PubMed

    Lee, Woonghee; Yu, Wookyung; Kim, Suhkmann; Chang, Iksoo; Lee, Weontae; Markley, John L

    2012-10-01

    PACSY (Protein structure And Chemical Shift NMR spectroscopY) is a relational database management system that integrates information from the Protein Data Bank, the Biological Magnetic Resonance Data Bank, and the Structural Classification of Proteins database. PACSY provides three-dimensional coordinates and chemical shifts of atoms along with derived information such as torsion angles, solvent accessible surface areas, and hydrophobicity scales. PACSY consists of six relational table types linked to one another for coherence by key identification numbers. Database queries are enabled by advanced search functions supported by an RDBMS server such as MySQL or PostgreSQL. PACSY enables users to search for combinations of information from different database sources in support of their research. Two software packages, PACSY Maker for database creation and PACSY Analyzer for database analysis, are available from http://pacsy.nmrfam.wisc.edu.

  18. Metaproteomics as a Complementary Approach to Gut Microbiota in Health and Disease

    NASA Astrophysics Data System (ADS)

    Petriz, Bernardo A.; Franco, Octávio L.

    2017-01-01

    Classic studies on phylotype profiling are limited to the identification of microbial constituents, where information is lacking about the molecular interaction of these bacterial communities with the host genome and the possible outcomes in host biology. A range of OMICs approaches have provided great progress linking the microbiota to health and disease. However, the investigation of this context through proteomic mass spectrometry-based tools is still being improved. Therefore, metaproteomics or community proteogenomics has emerged as a complementary approach to metagenomic data, as a field in proteomics aiming to perform large-scale characterization of proteins from environmental microbiota such as the human gut. The advances in molecular separation methods coupled with mass spectrometry (e.g. LC-MS/MS) and proteome bioinformatics have been fundamental in these novel large-scale metaproteomic studies, which have further been performed in a wide range of samples including soil, plant and human environments. Metaproteomic studies will make major progress if a comprehensive database covering the genes and expresses proteins from all gut microbial species is developed. To this end, we here present some of the main limitations of metaproteomic studies in complex microbiota environments such as the gut, also addressing the up-to-date pipelines in sample preparation prior to fractionation/separation and mass spectrometry analysis. In addition, a novel approach to the limitations of metagenomic databases is also discussed. Finally, prospects are addressed regarding the application of metaproteomic analysis using a unified host-microbiome gene database and other meta-OMICs platforms.

  19. MIPS: a database for protein sequences and complete genomes.

    PubMed Central

    Mewes, H W; Hani, J; Pfeiffer, F; Frishman, D

    1998-01-01

    The MIPS group [Munich Information Center for Protein Sequences of the German National Center for Environment and Health (GSF)] at the Max-Planck-Institute for Biochemistry, Martinsried near Munich, Germany, is involved in a number of data collection activities, including a comprehensive database of the yeast genome, a database reflecting the progress in sequencing the Arabidopsis thaliana genome, the systematic analysis of other small genomes and the collection of protein sequence data within the framework of the PIR-International Protein Sequence Database (described elsewhere in this volume). Through its WWW server (http://www.mips.biochem.mpg.de ) MIPS provides access to a variety of generic databases, including a database of protein families as well as automatically generated data by the systematic application of sequence analysis algorithms. The yeast genome sequence and its related information was also compiled on CD-ROM to provide dynamic interactive access to the 16 chromosomes of the first eukaryotic genome unraveled. PMID:9399795

  20. A chromosome-centric human proteome project (C-HPP) to characterize the sets of proteins encoded in chromosome 17.

    PubMed

    Liu, Suli; Im, Hogune; Bairoch, Amos; Cristofanilli, Massimo; Chen, Rui; Deutsch, Eric W; Dalton, Stephen; Fenyo, David; Fanayan, Susan; Gates, Chris; Gaudet, Pascale; Hincapie, Marina; Hanash, Samir; Kim, Hoguen; Jeong, Seul-Ki; Lundberg, Emma; Mias, George; Menon, Rajasree; Mu, Zhaomei; Nice, Edouard; Paik, Young-Ki; Uhlen, Mathias; Wells, Lance; Wu, Shiaw-Lin; Yan, Fangfei; Zhang, Fan; Zhang, Yue; Snyder, Michael; Omenn, Gilbert S; Beavis, Ronald C; Hancock, William S

    2013-01-04

    We report progress assembling the parts list for chromosome 17 and illustrate the various processes that we have developed to integrate available data from diverse genomic and proteomic knowledge bases. As primary resources, we have used GPMDB, neXtProt, PeptideAtlas, Human Protein Atlas (HPA), and GeneCards. All sites share the common resource of Ensembl for the genome modeling information. We have defined the chromosome 17 parts list with the following information: 1169 protein-coding genes, the numbers of proteins confidently identified by various experimental approaches as documented in GPMDB, neXtProt, PeptideAtlas, and HPA, examples of typical data sets obtained by RNASeq and proteomic studies of epithelial derived tumor cell lines (disease proteome) and a normal proteome (peripheral mononuclear cells), reported evidence of post-translational modifications, and examples of alternative splice variants (ASVs). We have constructed a list of the 59 "missing" proteins as well as 201 proteins that have inconclusive mass spectrometric (MS) identifications. In this report we have defined a process to establish a baseline for the incorporation of new evidence on protein identification and characterization as well as related information from transcriptome analyses. This initial list of "missing" proteins that will guide the selection of appropriate samples for discovery studies as well as antibody reagents. Also we have illustrated the significant diversity of protein variants (including post-translational modifications, PTMs) using regions on chromosome 17 that contain important oncogenes. We emphasize the need for mandated deposition of proteomics data in public databases, the further development of improved PTM, ASV, and single nucleotide variant (SNV) databases, and the construction of Web sites that can integrate and regularly update such information. In addition, we describe the distribution of both clustered and scattered sets of protein families on the chromosome. Since chromosome 17 is rich in cancer-associated genes, we have focused the clustering of cancer-associated genes in such genomic regions and have used the ERBB2 amplicon as an example of the value of a proteogenomic approach in which one integrates transcriptomic with proteomic information and captures evidence of coexpression through coordinated regulation.

  1. HypoxiaDB: a database of hypoxia-regulated proteins

    PubMed Central

    Khurana, Pankaj; Sugadev, Ragumani; Jain, Jaspreet; Singh, Shashi Bala

    2013-01-01

    There has been intense interest in the cellular response to hypoxia, and a large number of differentially expressed proteins have been identified through various high-throughput experiments. These valuable data are scattered, and there have been no systematic attempts to document the various proteins regulated by hypoxia. Compilation, curation and annotation of these data are important in deciphering their role in hypoxia and hypoxia-related disorders. Therefore, we have compiled HypoxiaDB, a database of hypoxia-regulated proteins. It is a comprehensive, manually-curated, non-redundant catalog of proteins whose expressions are shown experimentally to be altered at different levels and durations of hypoxia. The database currently contains 72 000 manually curated entries taken on 3500 proteins extracted from 73 peer-reviewed publications selected from PubMed. HypoxiaDB is distinctive from other generalized databases: (i) it compiles tissue-specific protein expression changes under different levels and duration of hypoxia. Also, it provides manually curated literature references to support the inclusion of the protein in the database and establish its association with hypoxia. (ii) For each protein, HypoxiaDB integrates data on gene ontology, KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway, protein–protein interactions, protein family (Pfam), OMIM (Online Mendelian Inheritance in Man), PDB (Protein Data Bank) structures and homology to other sequenced genomes. (iii) It also provides pre-compiled information on hypoxia-proteins, which otherwise requires tedious computational analysis. This includes information like chromosomal location, identifiers like Entrez, HGNC, Unigene, Uniprot, Ensembl, Vega, GI numbers and Genbank accession numbers associated with the protein. These are further cross-linked to respective public databases augmenting HypoxiaDB to the external repositories. (iv) In addition, HypoxiaDB provides an online sequence-similarity search tool for users to compare their protein sequences with HypoxiaDB protein database. We hope that HypoxiaDB will enrich our knowledge about hypoxia-related biology and eventually will lead to the development of novel hypothesis and advancements in diagnostic and therapeutic activities. HypoxiaDB is freely accessible for academic and non-profit users via http://www.hypoxiadb.com. Database URL: http://www.hypoxiadb.com PMID:24178989

  2. Direct Detection of Alternative Open Reading Frames Translation Products in Human Significantly Expands the Proteome

    PubMed Central

    Vanderperre, Benoît; Lucier, Jean-François; Bissonnette, Cyntia; Motard, Julie; Tremblay, Guillaume; Vanderperre, Solène; Wisztorski, Maxence; Salzet, Michel; Boisvert, François-Michel; Roucou, Xavier

    2013-01-01

    A fully mature mRNA is usually associated to a reference open reading frame encoding a single protein. Yet, mature mRNAs contain unconventional alternative open reading frames (AltORFs) located in untranslated regions (UTRs) or overlapping the reference ORFs (RefORFs) in non-canonical +2 and +3 reading frames. Although recent ribosome profiling and footprinting approaches have suggested the significant use of unconventional translation initiation sites in mammals, direct evidence of large-scale alternative protein expression at the proteome level is still lacking. To determine the contribution of alternative proteins to the human proteome, we generated a database of predicted human AltORFs revealing a new proteome mainly composed of small proteins with a median length of 57 amino acids, compared to 344 amino acids for the reference proteome. We experimentally detected a total of 1,259 alternative proteins by mass spectrometry analyses of human cell lines, tissues and fluids. In plasma and serum, alternative proteins represent up to 55% of the proteome and may be a potential unsuspected new source for biomarkers. We observed constitutive co-expression of RefORFs and AltORFs from endogenous genes and from transfected cDNAs, including tumor suppressor p53, and provide evidence that out-of-frame clones representing AltORFs are mistakenly rejected as false positive in cDNAs screening assays. Functional importance of alternative proteins is strongly supported by significant evolutionary conservation in vertebrates, invertebrates, and yeast. Our results imply that coding of multiple proteins in a single gene by the use of AltORFs may be a common feature in eukaryotes, and confirm that translation of unconventional ORFs generates an as yet unexplored proteome. PMID:23950983

  3. Database resources of the National Center for Biotechnology Information

    PubMed Central

    Wheeler, David L.; Barrett, Tanya; Benson, Dennis A.; Bryant, Stephen H.; Canese, Kathi; Chetvernin, Vyacheslav; Church, Deanna M.; DiCuccio, Michael; Edgar, Ron; Federhen, Scott; Geer, Lewis Y.; Helmberg, Wolfgang; Kapustin, Yuri; Kenton, David L.; Khovayko, Oleg; Lipman, David J.; Madden, Thomas L.; Maglott, Donna R.; Ostell, James; Pruitt, Kim D.; Schuler, Gregory D.; Schriml, Lynn M.; Sequeira, Edwin; Sherry, Stephen T.; Sirotkin, Karl; Souvorov, Alexandre; Starchenko, Grigory; Suzek, Tugba O.; Tatusov, Roman; Tatusova, Tatiana A.; Wagner, Lukas; Yaschenko, Eugene

    2006-01-01

    In addition to maintaining the GenBank(R) nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data made available through NCBI's Web site. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central, Entrez Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Electronic PCR, OrfFinder, Spidey, Splign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, Cancer Chromosomes, Entrez Genomes and related tools, the Map Viewer, Model Maker, Evidence Viewer, Clusters of Orthologous Groups, Retroviral Genotyping Tools, HIV-1, Human Protein Interaction Database, SAGEmap, Gene Expression Omnibus, Entrez Probe, GENSAT, Online Mendelian Inheritance in Man, Online Mendelian Inheritance in Animals, the Molecular Modeling Database, the Conserved Domain Database, the Conserved Domain Architecture Retrieval Tool and the PubChem suite of small molecule databases. Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized datasets. All of the resources can be accessed through the NCBI home page at: . PMID:16381840

  4. Database on pharmacophore analysis of active principles, from medicinal plants

    PubMed Central

    Pitchai, Daisy; Manikkam, Rajalakshmi; Rajendran, Sasikala R; Pitchai, Gnanamani

    2010-01-01

    Plants continue to be a major source of medicines, as they have been throughout human history. In the present days, drug discovery from plants involves a multidisciplinary approach combining ethnobotanical, phytochemical and biological techniques to provide us new chemical compounds (lead molecules) for the development of drugs against various pharmacological targets, including cancer, diabetes and its secondary complications. In view of this need in current drug discovery from medicinal plants, here we describe another web database containing the information of pharmacophore analysis of active principles possessing antidiabetic, antimicrobial, anticancerous and antioxidant properties from medicinal plants. The database provides the botanical, taxonomic classification, biochemical as well as pharmacological properties of medicinal plants. Data on antidiabetic, antimicrobial, anti oxidative, anti tumor and anti inflammatory compounds, and their physicochemical properties, SMILES Notation, Lipinski's properties are included in our database. One of the proposed features in the database is the predicted ADMET values and the interaction of bioactive compounds to the target protein. The database alphabetically lists the compound name and also provides tabs separating for anti microbial, antitumor, antidiabetic, and antioxidative compounds. Availability http://www.hccbif.info / PMID:21346859

  5. ARCPHdb: A comprehensive protein database for SF1 and SF2 helicase from archaea.

    PubMed

    Moukhtar, Mirna; Chaar, Wafi; Abdel-Razzak, Ziad; Khalil, Mohamad; Taha, Samir; Chamieh, Hala

    2017-01-01

    Superfamily 1 and Superfamily 2 helicases, two of the largest helicase protein families, play vital roles in many biological processes including replication, transcription and translation. Study of helicase proteins in the model microorganisms of archaea have largely contributed to the understanding of their function, architecture and assembly. Based on a large phylogenomics approach, we have identified and classified all SF1 and SF2 protein families in ninety five sequenced archaea genomes. Here we developed an online webserver linked to a specialized protein database named ARCPHdb to provide access for SF1 and SF2 helicase families from archaea. ARCPHdb was implemented using MySQL relational database. Web interfaces were developed using Netbeans. Data were stored according to UniProt accession numbers, NCBI Ref Seq ID, PDB IDs and Entrez Databases. A user-friendly interactive web interface has been developed to browse, search and download archaeal helicase protein sequences, their available 3D structure models, and related documentation available in the literature provided by ARCPHdb. The database provides direct links to matching external databases. The ARCPHdb is the first online database to compile all protein information on SF1 and SF2 helicase from archaea in one platform. This database provides essential resource information for all researchers interested in the field. Copyright © 2016 Elsevier Ltd. All rights reserved.

  6. Rice proteome database: a step toward functional analysis of the rice genome.

    PubMed

    Komatsu, Setsuko

    2005-09-01

    The technique of proteome analysis using two-dimensional polyacrylamide gel electrophoresis (2D-PAGE) has the power to monitor global changes that occur in the protein complement of tissues and subcellular compartments. In this study, the proteins of rice were cataloged, a rice proteome database was constructed, and a functional characterization of some of the identified proteins was undertaken. Proteins extracted from various tissues and subcellular compartments in rice were separated by 2D-PAGE and an image analyzer was used to construct a display of the proteins. The Rice Proteome Database contains 23 reference maps based on 2D-PAGE of proteins from various rice tissues and subcellular compartments. These reference maps comprise 13129 identified proteins, and the amino acid sequences of 5092 proteins are entered in the database. Major proteins involved in growth or stress responses were identified using the proteome approach. Some of these proteins, including a beta-tubulin, calreticulin, and ribulose-1,5-bisphosphate carboxylase/oxygenase activase in rice, have unexpected functions. The information obtained from the Rice Proteome Database will aid in cloning the genes for and predicting the function of unknown proteins.

  7. In silico analysis of fragile histidine triad involved in regression of carcinoma.

    PubMed

    Rasheed, Muhammad Asif; Tariq, Fatima; Afzal, Sara; Mannanv, Shazia

    2017-04-01

    Hepatocellular carcinoma (HCCa) is a primary malignancy of the liver. Many different proteins are involved in HCCa including insulin growth factor (IGF) II , signal transducers and activators of transcription (STAT) 3, STAT4, mothers against decapentaplegic homolog 4 (SMAD 4), fragile histidine triad (FHIT) and selective internal radiation therapy (SIRT) etc. The present study is based on the bioinformatics analysis of FHIT protein in order to understand the proteomics aspect and improvement of the diagnosis of the disease based on the protein. Different information related to protein were gathered from different databases, including National Centre for Biotechnology Information (NCBI) Gene, Protein and Online Mendelian Inheritance in Man (OMIM) databases, Uniprot database, String database and Kyoto Encyclopedia of Genes and Genomes (KEGG) database. Moreover, the structure of the protein and evaluation of the quality of the structure were included from Easy modeler programme. Hence, this analysis not only helped to gather information related to the protein at one place, but also analysed the structure and quality of the protein to conclude that the protein has a role in carcinoma.

  8. Bioinformatic analysis for allergenicity assessment of Bacillus thuringiensis Cry proteins expressed in insect-resistant food crops.

    PubMed

    Randhawa, Gurinder Jit; Singh, Monika; Grover, Monendra

    2011-02-01

    The novel proteins introduced into the genetically modified (GM) crops need to be evaluated for the potential allergenicity before their introduction into the food chain to address the safety concerns of consumers. At present, there is no single definitive test that can be relied upon to predict allergic response in humans to a new protein; hence a composite approach to allergic response prediction is described in this study. The present study reports on the evaluation of the Cry proteins, encoded by cry1Ac, cry1Ab, cry2Ab, cry1Ca, cry1Fa/cry1Ca hybrid, being expressed in Bt food crops that are under field trials in India, for potential allergenic cross-reactivity using bioinformatics search tools. The sequence identity of amino acids was analyzed using FASTA3 of AllergenOnline version 10.0 and BLASTX of NCBI Entrez to identify any potential sequence matches to allergen proteins. As a step further in the detection of allergens, an independent database of domains in the allergens available in the AllergenOnline database was also developed. The results indicated no significant alignment and similarity of Cry proteins at domain level with any of the known allergens revealing that there is no potential risk of allergenic cross-reactivity. Copyright © 2010 Elsevier Ltd. All rights reserved.

  9. Protein structure analysis of mutations causing inheritable diseases. An e-Science approach with life scientist friendly interfaces.

    PubMed

    Venselaar, Hanka; Te Beek, Tim A H; Kuipers, Remko K P; Hekkelman, Maarten L; Vriend, Gert

    2010-11-08

    Many newly detected point mutations are located in protein-coding regions of the human genome. Knowledge of their effects on the protein's 3D structure provides insight into the protein's mechanism, can aid the design of further experiments, and eventually can lead to the development of new medicines and diagnostic tools. In this article we describe HOPE, a fully automatic program that analyzes the structural and functional effects of point mutations. HOPE collects information from a wide range of information sources including calculations on the 3D coordinates of the protein by using WHAT IF Web services, sequence annotations from the UniProt database, and predictions by DAS services. Homology models are built with YASARA. Data is stored in a database and used in a decision scheme to identify the effects of a mutation on the protein's 3D structure and function. HOPE builds a report with text, figures, and animations that is easy to use and understandable for (bio)medical researchers. We tested HOPE by comparing its output to the results of manually performed projects. In all straightforward cases HOPE performed similar to a trained bioinformatician. The use of 3D structures helps optimize the results in terms of reliability and details. HOPE's results are easy to understand and are presented in a way that is attractive for researchers without an extensive bioinformatics background.

  10. The National NeuroAIDS Tissue Consortium (NNTC) Database: an integrated database for HIV-related studies

    PubMed Central

    Cserhati, Matyas F.; Pandey, Sanjit; Beaudoin, James J.; Baccaglini, Lorena; Guda, Chittibabu; Fox, Howard S.

    2015-01-01

    We herein present the National NeuroAIDS Tissue Consortium-Data Coordinating Center (NNTC-DCC) database, which is the only available database for neuroAIDS studies that contains data in an integrated, standardized form. This database has been created in conjunction with the NNTC, which provides human tissue and biofluid samples to individual researchers to conduct studies focused on neuroAIDS. The database contains experimental datasets from 1206 subjects for the following categories (which are further broken down into subcategories): gene expression, genotype, proteins, endo-exo-chemicals, morphometrics and other (miscellaneous) data. The database also contains a wide variety of downloadable data and metadata for 95 HIV-related studies covering 170 assays from 61 principal investigators. The data represent 76 tissue types, 25 measurement types, and 38 technology types, and reaches a total of 33 017 407 data points. We used the ISA platform to create the database and develop a searchable web interface for querying the data. A gene search tool is also available, which searches for NCBI GEO datasets associated with selected genes. The database is manually curated with many user-friendly features, and is cross-linked to the NCBI, HUGO and PubMed databases. A free registration is required for qualified users to access the database. Database URL: http://nntc-dcc.unmc.edu PMID:26228431

  11. The Universal Protein Resource (UniProt): an expanding universe of protein information.

    PubMed

    Wu, Cathy H; Apweiler, Rolf; Bairoch, Amos; Natale, Darren A; Barker, Winona C; Boeckmann, Brigitte; Ferro, Serenella; Gasteiger, Elisabeth; Huang, Hongzhan; Lopez, Rodrigo; Magrane, Michele; Martin, Maria J; Mazumder, Raja; O'Donovan, Claire; Redaschi, Nicole; Suzek, Baris

    2006-01-01

    The Universal Protein Resource (UniProt) provides a central resource on protein sequences and functional annotation with three database components, each addressing a key need in protein bioinformatics. The UniProt Knowledgebase (UniProtKB), comprising the manually annotated UniProtKB/Swiss-Prot section and the automatically annotated UniProtKB/TrEMBL section, is the preeminent storehouse of protein annotation. The extensive cross-references, functional and feature annotations and literature-based evidence attribution enable scientists to analyse proteins and query across databases. The UniProt Reference Clusters (UniRef) speed similarity searches via sequence space compression by merging sequences that are 100% (UniRef100), 90% (UniRef90) or 50% (UniRef50) identical. Finally, the UniProt Archive (UniParc) stores all publicly available protein sequences, containing the history of sequence data with links to the source databases. UniProt databases continue to grow in size and in availability of information. Recent and upcoming changes to database contents, formats, controlled vocabularies and services are described. New download availability includes all major releases of UniProtKB, sequence collections by taxonomic division and complete proteomes. A bibliography mapping service has been added, and an ID mapping service will be available soon. UniProt databases can be accessed online at http://www.uniprot.org or downloaded at ftp://ftp.uniprot.org/pub/databases/.

  12. Domain fusion analysis by applying relational algebra to protein sequence and domain databases.

    PubMed

    Truong, Kevin; Ikura, Mitsuhiko

    2003-05-06

    Domain fusion analysis is a useful method to predict functionally linked proteins that may be involved in direct protein-protein interactions or in the same metabolic or signaling pathway. As separate domain databases like BLOCKS, PROSITE, Pfam, SMART, PRINTS-S, ProDom, TIGRFAMs, and amalgamated domain databases like InterPro continue to grow in size and quality, a computational method to perform domain fusion analysis that leverages on these efforts will become increasingly powerful. This paper proposes a computational method employing relational algebra to find domain fusions in protein sequence databases. The feasibility of this method was illustrated on the SWISS-PROT+TrEMBL sequence database using domain predictions from the Pfam HMM (hidden Markov model) database. We identified 235 and 189 putative functionally linked protein partners in H. sapiens and S. cerevisiae, respectively. From scientific literature, we were able to confirm many of these functional linkages, while the remainder offer testable experimental hypothesis. Results can be viewed at http://calcium.uhnres.utoronto.ca/pi. As the analysis can be computed quickly on any relational database that supports standard SQL (structured query language), it can be dynamically updated along with the sequence and domain databases, thereby improving the quality of predictions over time.

  13. Protein Bioinformatics Databases and Resources

    PubMed Central

    Chen, Chuming; Huang, Hongzhan; Wu, Cathy H.

    2017-01-01

    Many publicly available data repositories and resources have been developed to support protein related information management, data-driven hypothesis generation and biological knowledge discovery. To help researchers quickly find the appropriate protein related informatics resources, we present a comprehensive review (with categorization and description) of major protein bioinformatics databases in this chapter. We also discuss the challenges and opportunities for developing next-generation protein bioinformatics databases and resources to support data integration and data analytics in the Big Data era. PMID:28150231

  14. Nup93, a Vertebrate Homologue of Yeast Nic96p, Forms a Complex with a Novel 205-kDa Protein and Is Required for Correct Nuclear Pore Assembly

    PubMed Central

    Grandi, Paola; Dang, Tam; Pané, Nelly; Shevchenko, Andrej; Mann, Matthias; Forbes, Douglass; Hurt, Ed

    1997-01-01

    Yeast and vertebrate nuclear pores display significant morphological similarity by electron microscopy, but sequence similarity between the respective proteins has been more difficult to observe. Herein we have identified a vertebrate nucleoporin, Nup93, in both human and Xenopus that has proved to be an evolutionarily related homologue of the yeast nucleoporin Nic96p. Polyclonal antiserum to human Nup93 detects corresponding proteins in human, rat, and Xenopus cells. Immunofluorescence and immunoelectron microscopy localize vertebrate Nup93 at the nuclear basket and at or near the nuclear entry to the gated channel of the pore. Immunoprecipitation from both mammalian and Xenopus cell extracts indicates that a small fraction of Nup93 physically interacts with the nucleoporin p62, just as yeast Nic96p interacts with the yeast p62 homologue. However, a large fraction of vertebrate Nup93 is extracted from pores and is also present in Xenopus egg extracts in complex with a newly discovered 205-kDa protein. Mass spectrometric sequencing of the human 205-kDa protein reveals that this protein is encoded by an open reading frame, KIAAO225, present in the human database. The putative human nucleoporin of 205 kDa has related sequence homologues in Caenorhabditis elegans and Saccharomyces cerevisiae. To analyze the role of the Nup93 complex in the pore, nuclei were assembled that lack the Nup93 complex after immunodepletion of a Xenopus nuclear reconstitution extract. The Nup93-complex–depleted nuclei are clearly defective for correct nuclear pore assembly. From these experiments, we conclude that the vertebrate and yeast pore have significant homology in their functionally important cores and that, with the identification of Nup93 and the 205-kDa protein, we have extended the knowledge of the nearest-neighbor interactions of this core in both yeast and vertebrates. PMID:9348540

  15. Expression of psoriasis-associated fatty acid-binding protein in senescent human dermal microvascular endothelial cells.

    PubMed

    Ha, Moon Kyung; Chung, Kee Yang; Lee, Ju Hee; Bang, Dongsik; Park, Yoon Kee; Lee, Kwang Hoon

    2004-09-01

    Aging is associated with the progressive pathophysiologic modification of endothelial cells. In vitro endothelial cell senescence is accompanied by proliferative activity failure and by perturbations in gene and protein expressions. Moreover, this cellular senescence in culture has been proposed to reflect processes that occur in aging organisms. In order to observe the changing patterns of protein expression in senescent human dermal microvascular endothelial cells (HDMECs), proteins obtained from both early- and late-passaged HDMECs were separated by two-dimensional electrophoresis, visualized by silver staining, and quantified by image processing. Proteins of interest were extracted by in-gel digestion with trypsin and quantified by matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF-MS), by searching the National Center for Biotechnology Information protein-sequence database. More than 2000 spots were detected by 2D electrophoresis within a linear pH range of 3-10. Twenty-two major differentially expressed spots were observed in serially passaged HDMECs and identified with high confidence by MALDI-TOF-MS. One of these spots was found to be a 14-15 kDa psoriasis-associated fatty acid-binding protein (PA-FABP) with high affinity for long-chain fatty acids. The expression of PA-FABP was confirmed to be elevated in senescent HDMECs (passage 20) by fluorescence-activated cell sorting (FACS), confocal laser microscopy, and by immunohistochemistry in aged human skin tissue. Our results suggest that the overexpression of FABP in cultured senescent HDMECs is closely related to skin aging.

  16. Novel signatures of cancer-associated fibroblasts.

    PubMed

    Bozóky, Benedek; Savchenko, Andrii; Csermely, Péter; Korcsmáros, Tamás; Dúl, Zoltán; Pontén, Fredrik; Székely, László; Klein, George

    2013-07-15

    Increasing evidence indicates the importance of the tumor microenvironment, in particular cancer-associated fibroblasts, in cancer development and progression. In our study, we developed a novel, visually based method to identify new immunohistochemical signatures of these fibroblasts. The method employed a protein list based on 759 protein products of genes identified by RNA profiling from our previous study, comparing fibroblasts with differential growth-modulating effect on human cancers cells, and their first neighbors in the human protein interactome. These 2,654 proteins were analyzed in the Human Protein Atlas online database by comparing their immunohistochemical expression patterns in normal versus tumor-associated fibroblasts. Twelve new proteins differentially expressed in cancer-associated fibroblasts were identified (DLG1, BHLHE40, ROCK2, RAB31, AZI2, PKM2, ARHGAP31, ARHGAP26, ITCH, EGLN1, RNF19A and PLOD2), four of them can be connected to the Rho kinase signaling pathway. They were further analyzed in several additional tumor stromata and revealed that the majority showed congruence among the different tumors. Many of them were also positive in normal myofibroblast-like cells. The new signatures can be useful in immunohistochemical analysis of different tumor stromata and may also give us an insight into the pathways activated in them in their true in vivo context. The method itself could be used for other similar analysis to identify proteins expressed in other cell types in tumors and their surrounding microenvironment. Copyright © 2013 UICC.

  17. PACSY, a relational database management system for protein structure and chemical shift analysis

    PubMed Central

    Lee, Woonghee; Yu, Wookyung; Kim, Suhkmann; Chang, Iksoo

    2012-01-01

    PACSY (Protein structure And Chemical Shift NMR spectroscopY) is a relational database management system that integrates information from the Protein Data Bank, the Biological Magnetic Resonance Data Bank, and the Structural Classification of Proteins database. PACSY provides three-dimensional coordinates and chemical shifts of atoms along with derived information such as torsion angles, solvent accessible surface areas, and hydrophobicity scales. PACSY consists of six relational table types linked to one another for coherence by key identification numbers. Database queries are enabled by advanced search functions supported by an RDBMS server such as MySQL or PostgreSQL. PACSY enables users to search for combinations of information from different database sources in support of their research. Two software packages, PACSY Maker for database creation and PACSY Analyzer for database analysis, are available from http://pacsy.nmrfam.wisc.edu. PMID:22903636

  18. A protein relational database and protein family knowledge bases to facilitate structure-based design analyses.

    PubMed

    Mobilio, Dominick; Walker, Gary; Brooijmans, Natasja; Nilakantan, Ramaswamy; Denny, R Aldrin; Dejoannis, Jason; Feyfant, Eric; Kowticwar, Rupesh K; Mankala, Jyoti; Palli, Satish; Punyamantula, Sairam; Tatipally, Maneesh; John, Reji K; Humblet, Christine

    2010-08-01

    The Protein Data Bank is the most comprehensive source of experimental macromolecular structures. It can, however, be difficult at times to locate relevant structures with the Protein Data Bank search interface. This is particularly true when searching for complexes containing specific interactions between protein and ligand atoms. Moreover, searching within a family of proteins can be tedious. For example, one cannot search for some conserved residue as residue numbers vary across structures. We describe herein three databases, Protein Relational Database, Kinase Knowledge Base, and Matrix Metalloproteinase Knowledge Base, containing protein structures from the Protein Data Bank. In Protein Relational Database, atom-atom distances between protein and ligand have been precalculated allowing for millisecond retrieval based on atom identity and distance constraints. Ring centroids, centroid-centroid and centroid-atom distances and angles have also been included permitting queries for pi-stacking interactions and other structural motifs involving rings. Other geometric features can be searched through the inclusion of residue pair and triplet distances. In Kinase Knowledge Base and Matrix Metalloproteinase Knowledge Base, the catalytic domains have been aligned into common residue numbering schemes. Thus, by searching across Protein Relational Database and Kinase Knowledge Base, one can easily retrieve structures wherein, for example, a ligand of interest is making contact with the gatekeeper residue.

  19. An Extensive Survey of Tyrosine Phosphorylation Revealing New Sites in Human Mammary Epithelial Cells

    PubMed Central

    Heibeck, Tyler H.; Ding, Shi-Jian; Opresko, Lee K.; Zhao, Rui; Schepmoes, Athena A.; Yang, Feng; Tolmachev, Aleksey V.; Monroe, Matthew E.; Camp, David G.; Smith, Richard D.; Wiley, H. Steven; Qian, Wei-Jun

    2010-01-01

    Protein tyrosine phosphorylation represents a central regulatory mechanism in cell signaling. Here we present an extensive survey of tyrosine phosphorylation sites in a normal-derived human mammary epithelial cell line by applying anti-phosphotyrosine peptide immunoaffinity purification coupled with high sensitivity capillary liquid chromatography tandem mass spectrometry. A total of 481 tyrosine phosphorylation sites (covered by 716 unique peptides) from 285 proteins were confidently identified in HMEC following the analysis of both the basal condition and acute stimulation with epidermal growth factor (EGF). The estimated false discovery rate was 1.0% as determined by searching against a scrambled database. Comparison of these data with existing literature showed significant agreement for previously reported sites. However, we observed 281 sites that were not previously reported for HMEC cultures and 29 of which have not been reported for any human cell or tissue system. The analysis showed that the majority of highly phosphorylated proteins were relatively low-abundance. Large differences in phosphorylation stoichiometry for sites within the same protein were also observed, raising the possibility of more important functional roles for such highly phosphorylated pTyr sites. By mapping to major signaling networks, such as the EGF receptor and insulin growth factor-1 receptor signaling pathways, many known proteins involved in these pathways were revealed to be tyrosine phosphorylated, which provides interesting targets for future hypothesis-driven and targeted quantitative studies involving tyrosine phosphorylation in HMEC or other human systems. PMID:19534553

  20. THGS: a web-based database of Transmembrane Helices in Genome Sequences

    PubMed Central

    Fernando, S. A.; Selvarani, P.; Das, Soma; Kumar, Ch. Kiran; Mondal, Sukanta; Ramakumar, S.; Sekar, K.

    2004-01-01

    Transmembrane Helices in Genome Sequences (THGS) is an interactive web-based database, developed to search the transmembrane helices in the user-interested gene sequences available in the Genome Database (GDB). The proposed database has provision to search sequence motifs in transmembrane and globular proteins. In addition, the motif can be searched in the other sequence databases (Swiss-Prot and PIR) or in the macromolecular structure database, Protein Data Bank (PDB). Further, the 3D structure of the corresponding queried motif, if it is available in the solved protein structures deposited in the Protein Data Bank, can also be visualized using the widely used graphics package RASMOL. All the sequence databases used in the present work are updated frequently and hence the results produced are up to date. The database THGS is freely available via the world wide web and can be accessed at http://pranag.physics.iisc.ernet.in/thgs/ or http://144.16.71.10/thgs/. PMID:14681375

  1. Database resources of the National Center for Biotechnology Information

    PubMed Central

    Wheeler, David L.; Barrett, Tanya; Benson, Dennis A.; Bryant, Stephen H.; Canese, Kathi; Chetvernin, Vyacheslav; Church, Deanna M.; DiCuccio, Michael; Edgar, Ron; Federhen, Scott; Feolo, Michael; Geer, Lewis Y.; Helmberg, Wolfgang; Kapustin, Yuri; Khovayko, Oleg; Landsman, David; Lipman, David J.; Madden, Thomas L.; Maglott, Donna R.; Miller, Vadim; Ostell, James; Pruitt, Kim D.; Schuler, Gregory D.; Shumway, Martin; Sequeira, Edwin; Sherry, Steven T.; Sirotkin, Karl; Souvorov, Alexandre; Starchenko, Grigory; Tatusov, Roman L.; Tatusova, Tatiana A.; Wagner, Lukas; Yaschenko, Eugene

    2008-01-01

    In addition to maintaining the GenBank(R) nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data available through NCBI's web site. NCBI resources include Entrez, the Entrez Programming Utilities, My NCBI, PubMed, PubMed Central, Entrez Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link, Electronic PCR, OrfFinder, Spidey, Splign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, Cancer Chromosomes, Entrez Genome, Genome Project and related tools, the Trace, Assembly, and Short Read Archives, the Map Viewer, Model Maker, Evidence Viewer, Clusters of Orthologous Groups, Influenza Viral Resources, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus, Entrez Probe, GENSAT, Database of Genotype and Phenotype, Online Mendelian Inheritance in Man, Online Mendelian Inheritance in Animals, the Molecular Modeling Database, the Conserved Domain Database, the Conserved Domain Architecture Retrieval Tool and the PubChem suite of small molecule databases. Augmenting the web applications are custom implementations of the BLAST program optimized to search specialized data sets. These resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov. PMID:18045790

  2. The Human Skeletal Muscle Proteome Project: a reappraisal of the current literature

    PubMed Central

    Gonzalez‐Freire, Marta; Semba, Richard D.; Ubaida‐Mohien, Ceereena; Fabbri, Elisa; Scalzo, Paul; Højlund, Kurt; Dufresne, Craig; Lyashkov, Alexey

    2016-01-01

    Abstract Skeletal muscle is a large organ that accounts for up to half the total mass of the human body. A progressive decline in muscle mass and strength occurs with ageing and in some individuals configures the syndrome of ‘sarcopenia’, a condition that impairs mobility, challenges autonomy, and is a risk factor for mortality. The mechanisms leading to sarcopenia as well as myopathies are still little understood. The Human Skeletal Muscle Proteome Project was initiated with the aim to characterize muscle proteins and how they change with ageing and disease. We conducted an extensive review of the literature and analysed publically available protein databases. A systematic search of peer‐reviewed studies was performed using PubMed. Search terms included ‘human’, ‘skeletal muscle’, ‘proteome’, ‘proteomic(s)’, and ‘mass spectrometry’, ‘liquid chromatography‐mass spectrometry (LC‐MS/MS)’. A catalogue of 5431 non‐redundant muscle proteins identified by mass spectrometry‐based proteomics from 38 peer‐reviewed scientific publications from 2002 to November 2015 was created. We also developed a nosology system for the classification of muscle proteins based on localization and function. Such inventory of proteins should serve as a useful background reference for future research on changes in muscle proteome assessed by quantitative mass spectrometry‐based proteomic approaches that occur with ageing and diseases. This classification and compilation of the human skeletal muscle proteome can be used for the identification and quantification of proteins in skeletal muscle to discover new mechanisms for sarcopenia and specific muscle diseases that can be targeted for the prevention and treatment. PMID:27897395

  3. A database of human genes and a gene network involved in response to tick-borne encephalitis virus infection.

    PubMed

    Ignatieva, Elena V; Igoshin, Alexander V; Yudin, Nikolay S

    2017-12-28

    Tick-borne encephalitis is caused by the neurotropic, positive-sense RNA virus, tick-borne encephalitis virus (TBEV). TBEV infection can lead to a variety of clinical manifestations ranging from slight fever to severe neurological illness. Very little is known about genetic factors predisposing to severe forms of disease caused by TBEV. The aims of the study were to compile a catalog of human genes involved in response to TBEV infection and to rank genes from the catalog based on the number of neighbors in the network of pairwise interactions involving these genes and TBEV RNA or proteins. Based on manual review and curation of scientific publications a catalog comprising 140 human genes involved in response to TBEV infection was developed. To provide access to data on all genes, the TBEVhostDB web resource ( http://icg.nsc.ru/TBEVHostDB/ ) was created. We reconstructed a network formed by pairwise interactions between TBEV virion itself, viral RNA and viral proteins and 140 genes/proteins from TBEVHostDB. Genes were ranked according to the number of interactions in the network. Two genes/proteins (CCR5 and IFNAR1) that had maximal number of interactions were revealed. It was found that the subnetworks formed by CCR5 and IFNAR1 and their neighbors were a fragments of two key pathways functioning during the course of tick-borne encephalitis: (1) the attenuation of interferon-I signaling pathway by the TBEV NS5 protein that targeted peptidase D; (2) proinflammation and tissue damage pathway triggered by chemokine receptor CCR5 interacting with CD4, CCL3, CCL4, CCL2. Among nine genes associated with severe forms of TBEV infection, three genes/proteins (CCR5, IL10, ARID1B) were found to have protein-protein interactions within the network, and two genes/proteins (IFNL3 and the IL10, that was just mentioned) were up- or down-regulated in response to TBEV infection. Based on this finding, potential mechanisms for participation of CCR5, IL10, ARID1B, and IFNL3 in the host response to TBEV infection were suggested. A database comprising 140 human genes involved in response to TBEV infection was compiled and the TBEVHostDB web resource, providing access to all genes was created. This is the first effort of integrating and unifying data on genetic factors that may predispose to severe forms of diseases caused by TBEV. The TBEVHostDB could potentially be used for assessment of risk factors for severe forms of tick-borne encephalitis and for the design of personalized pharmacological strategies for the treatment of TBEV infection.

  4. The National NeuroAIDS Tissue Consortium (NNTC) Database: an integrated database for HIV-related studies.

    PubMed

    Cserhati, Matyas F; Pandey, Sanjit; Beaudoin, James J; Baccaglini, Lorena; Guda, Chittibabu; Fox, Howard S

    2015-01-01

    We herein present the National NeuroAIDS Tissue Consortium-Data Coordinating Center (NNTC-DCC) database, which is the only available database for neuroAIDS studies that contains data in an integrated, standardized form. This database has been created in conjunction with the NNTC, which provides human tissue and biofluid samples to individual researchers to conduct studies focused on neuroAIDS. The database contains experimental datasets from 1206 subjects for the following categories (which are further broken down into subcategories): gene expression, genotype, proteins, endo-exo-chemicals, morphometrics and other (miscellaneous) data. The database also contains a wide variety of downloadable data and metadata for 95 HIV-related studies covering 170 assays from 61 principal investigators. The data represent 76 tissue types, 25 measurement types, and 38 technology types, and reaches a total of 33,017,407 data points. We used the ISA platform to create the database and develop a searchable web interface for querying the data. A gene search tool is also available, which searches for NCBI GEO datasets associated with selected genes. The database is manually curated with many user-friendly features, and is cross-linked to the NCBI, HUGO and PubMed databases. A free registration is required for qualified users to access the database. © The Author(s) 2015. Published by Oxford University Press.

  5. Simrank: Rapid and sensitive general-purpose k-mer search tool

    PubMed Central

    2011-01-01

    Background Terabyte-scale collections of string-encoded data are expected from consortia efforts such as the Human Microbiome Project http://nihroadmap.nih.gov/hmp. Intra- and inter-project data similarity searches are enabled by rapid k-mer matching strategies. Software applications for sequence database partitioning, guide tree estimation, molecular classification and alignment acceleration have benefited from embedded k-mer searches as sub-routines. However, a rapid, general-purpose, open-source, flexible, stand-alone k-mer tool has not been available. Results Here we present a stand-alone utility, Simrank, which allows users to rapidly identify database strings the most similar to query strings. Performance testing of Simrank and related tools against DNA, RNA, protein and human-languages found Simrank 10X to 928X faster depending on the dataset. Conclusions Simrank provides molecular ecologists with a high-throughput, open source choice for comparing large sequence sets to find similarity. PMID:21524302

  6. Open reading frames associated with cancer in the dark matter of the human genome.

    PubMed

    Delgado, Ana Paula; Brandao, Pamela; Chapado, Maria Julia; Hamid, Sheilin; Narayanan, Ramaswamy

    2014-01-01

    The uncharacterized proteins (open reading frames, ORFs) in the human genome offer an opportunity to discover novel targets for cancer. A systematic analysis of the dark matter of the human proteome for druggability and biomarker discovery is crucial to mining the genome. Numerous data mining tools are available to mine these ORFs to develop a comprehensive knowledge base for future target discovery and validation. Using the Genetic Association Database, the ORFs of the human dark matter proteome were screened for evidence of association with neoplasms. The Phenome-Genome Integrator tool was used to establish phenotypic association with disease traits including cancer. Batch analysis of the tools for protein expression analysis, gene ontology and motifs and domains was used to characterize the ORFs. Sixty-two ORFs were identified for neoplasm association. The expression Quantitative Trait Loci (eQTL) analysis identified thirteen ORFs related to cancer traits. Protein expression, motifs and domain analysis and genome-wide association studies verified the relevance of these OncoORFs in diverse tumors. The OncoORFs are also associated with a wide variety of human diseases and disorders. Our results link the OncoORFs to diverse diseases and disorders. This suggests a complex landscape of the uncharacterized proteome in human diseases. These results open the dark matter of the proteome to novel cancer target research. Copyright© 2014, International Institute of Anticancer Research (Dr. John G. Delinasios), All rights reserved.

  7. Novel promoters and coding first exons in DLG2 linked to developmental disorders and intellectual disability.

    PubMed

    Reggiani, Claudio; Coppens, Sandra; Sekhara, Tayeb; Dimov, Ivan; Pichon, Bruno; Lufin, Nicolas; Addor, Marie-Claude; Belligni, Elga Fabia; Digilio, Maria Cristina; Faletra, Flavio; Ferrero, Giovanni Battista; Gerard, Marion; Isidor, Bertrand; Joss, Shelagh; Niel-Bütschi, Florence; Perrone, Maria Dolores; Petit, Florence; Renieri, Alessandra; Romana, Serge; Topa, Alexandra; Vermeesch, Joris Robert; Lenaerts, Tom; Casimir, Georges; Abramowicz, Marc; Bontempi, Gianluca; Vilain, Catheline; Deconinck, Nicolas; Smits, Guillaume

    2017-07-19

    Tissue-specific integrative omics has the potential to reveal new genic elements important for developmental disorders. Two pediatric patients with global developmental delay and intellectual disability phenotype underwent array-CGH genetic testing, both showing a partial deletion of the DLG2 gene. From independent human and murine omics datasets, we combined copy number variations, histone modifications, developmental tissue-specific regulation, and protein data to explore the molecular mechanism at play. Integrating genomics, transcriptomics, and epigenomics data, we describe two novel DLG2 promoters and coding first exons expressed in human fetal brain. Their murine conservation and protein-level evidence allowed us to produce new DLG2 gene models for human and mouse. These new genic elements are deleted in 90% of 29 patients (public and in-house) showing partial deletion of the DLG2 gene. The patients' clinical characteristics expand the neurodevelopmental phenotypic spectrum linked to DLG2 gene disruption to cognitive and behavioral categories. While protein-coding genes are regarded as well known, our work shows that integration of multiple omics datasets can unveil novel coding elements. From a clinical perspective, our work demonstrates that two new DLG2 promoters and exons are crucial for the neurodevelopmental phenotypes associated with this gene. In addition, our work brings evidence for the lack of cross-annotation in human versus mouse reference genomes and nucleotide versus protein databases.

  8. Quokka: a comprehensive tool for rapid and accurate prediction of kinase family-specific phosphorylation sites in the human proteome.

    PubMed

    Li, Fuyi; Li, Chen; Marquez-Lago, Tatiana T; Leier, André; Akutsu, Tatsuya; Purcell, Anthony W; Smith, A Ian; Lithgow, Trevor; Daly, Roger J; Song, Jiangning; Chou, Kuo-Chen

    2018-06-27

    Kinase-regulated phosphorylation is a ubiquitous type of post-translational modification (PTM) in both eukaryotic and prokaryotic cells. Phosphorylation plays fundamental roles in many signalling pathways and biological processes, such as protein degradation and protein-protein interactions. Experimental studies have revealed that signalling defects caused by aberrant phosphorylation are highly associated with a variety of human diseases, especially cancers. In light of this, a number of computational methods aiming to accurately predict protein kinase family-specific or kinase-specific phosphorylation sites have been established, thereby facilitating phosphoproteomic data analysis. In this work, we present Quokka, a novel bioinformatics tool that allows users to rapidly and accurately identify human kinase family-regulated phosphorylation sites. Quokka was developed by using a variety of sequence scoring functions combined with an optimized logistic regression algorithm. We evaluated Quokka based on well-prepared up-to-date benchmark and independent test datasets, curated from the Phospho.ELM and UniProt databases, respectively. The independent test demonstrates that Quokka improves the prediction performance compared with state-of-the-art computational tools for phosphorylation prediction. In summary, our tool provides users with high-quality predicted human phosphorylation sites for hypothesis generation and biological validation. The Quokka webserver and datasets are freely available at http://quokka.erc.monash.edu/. Supplementary data are available at Bioinformatics online.

  9. Introducing meta-services for biomedical information extraction

    PubMed Central

    Leitner, Florian; Krallinger, Martin; Rodriguez-Penagos, Carlos; Hakenberg, Jörg; Plake, Conrad; Kuo, Cheng-Ju; Hsu, Chun-Nan; Tsai, Richard Tzong-Han; Hung, Hsi-Chuan; Lau, William W; Johnson, Calvin A; Sætre, Rune; Yoshida, Kazuhiro; Chen, Yan Hua; Kim, Sun; Shin, Soo-Yong; Zhang, Byoung-Tak; Baumgartner, William A; Hunter, Lawrence; Haddow, Barry; Matthews, Michael; Wang, Xinglong; Ruch, Patrick; Ehrler, Frédéric; Özgür, Arzucan; Erkan, Güneş; Radev, Dragomir R; Krauthammer, Michael; Luong, ThaiBinh; Hoffmann, Robert; Sander, Chris; Valencia, Alfonso

    2008-01-01

    We introduce the first meta-service for information extraction in molecular biology, the BioCreative MetaServer (BCMS; ). This prototype platform is a joint effort of 13 research groups and provides automatically generated annotations for PubMed/Medline abstracts. Annotation types cover gene names, gene IDs, species, and protein-protein interactions. The annotations are distributed by the meta-server in both human and machine readable formats (HTML/XML). This service is intended to be used by biomedical researchers and database annotators, and in biomedical language processing. The platform allows direct comparison, unified access, and result aggregation of the annotations. PMID:18834497

  10. Self-organized neural maps of human protein sequences.

    PubMed Central

    Ferrán, E. A.; Pflugfelder, B.; Ferrara, P.

    1994-01-01

    We have recently described a method based on artificial neural networks to cluster protein sequences into families. The network was trained with Kohonen's unsupervised learning algorithm using, as inputs, the matrix patterns derived from the dipeptide composition of the proteins. We present here a large-scale application of that method to classify the 1,758 human protein sequences stored in the SwissProt database (release 19.0), whose lengths are greater than 50 amino acids. In the final 2-dimensional topologically ordered map of 15 x 15 neurons, proteins belonging to known families were associated with the same neuron or with neighboring ones. Also, as an attempt to reduce the time-consuming learning procedure, we compared 2 learning protocols: one of 500 epochs (100 SUN CPU-hours [CPU-h]), and another one of 30 epochs (6.7 CPU-h). A further reduction of learning-computing time, by a factor of about 3.3, with similar protein clustering results, was achieved using a matrix of 11 x 11 components to represent the sequences. Although network training is time consuming, the classification of a new protein in the final ordered map is very fast (14.6 CPU-seconds). We also show a comparison between the artificial neural network approach and conventional methods of biosequence analysis. PMID:8019421

  11. PARPs database: A LIMS systems for protein-protein interaction data mining or laboratory information management system

    PubMed Central

    Droit, Arnaud; Hunter, Joanna M; Rouleau, Michèle; Ethier, Chantal; Picard-Cloutier, Aude; Bourgais, David; Poirier, Guy G

    2007-01-01

    Background In the "post-genome" era, mass spectrometry (MS) has become an important method for the analysis of proteins and the rapid advancement of this technique, in combination with other proteomics methods, results in an increasing amount of proteome data. This data must be archived and analysed using specialized bioinformatics tools. Description We herein describe "PARPs database," a data analysis and management pipeline for liquid chromatography tandem mass spectrometry (LC-MS/MS) proteomics. PARPs database is a web-based tool whose features include experiment annotation, protein database searching, protein sequence management, as well as data-mining of the peptides and proteins identified. Conclusion Using this pipeline, we have successfully identified several interactions of biological significance between PARP-1 and other proteins, namely RFC-1, 2, 3, 4 and 5. PMID:18093328

  12. The 2015 Nucleic Acids Research Database Issue and molecular biology database collection.

    PubMed

    Galperin, Michael Y; Rigden, Daniel J; Fernández-Suárez, Xosé M

    2015-01-01

    The 2015 Nucleic Acids Research Database Issue contains 172 papers that include descriptions of 56 new molecular biology databases, and updates on 115 databases whose descriptions have been previously published in NAR or other journals. Following the classification that has been introduced last year in order to simplify navigation of the entire issue, these articles are divided into eight subject categories. This year's highlights include RNAcentral, an international community portal to various databases on noncoding RNA; ValidatorDB, a validation database for protein structures and their ligands; SASBDB, a primary repository for small-angle scattering data of various macromolecular complexes; MoonProt, a database of 'moonlighting' proteins, and two new databases of protein-protein and other macromolecular complexes, ComPPI and the Complex Portal. This issue also includes an unusually high number of cancer-related databases and other databases dedicated to genomic basics of disease and potential drugs and drug targets. The size of NAR online Molecular Biology Database Collection, http://www.oxfordjournals.org/nar/database/a/, remained approximately the same, following the addition of 74 new resources and removal of 77 obsolete web sites. The entire Database Issue is freely available online on the Nucleic Acids Research web site (http://nar.oxfordjournals.org/). Published by Oxford University Press on behalf of Nucleic Acids Research 2014. This work is written by (a) US Government employee(s) and is in the public domain in the US.

  13. Novel exon 1 protein-coding regions N-terminally extend human KCNE3 and KCNE4.

    PubMed

    Abbott, Geoffrey W

    2016-08-01

    The 5 human (h)KCNE β subunits each regulate various cation channels and are linked to inherited cardiac arrhythmias. Reported here are previously undiscovered protein-coding regions in exon 1 of hKCNE3 and hKCNE4 that extend their encoded extracellular domains by 44 and 51 residues, which yields full-length proteins of 147 and 221 residues, respectively. Full-length hKCNE3 and hKCNE4 transcript and protein are expressed in multiple human tissues; for hKCNE4, only the longer protein isoform is detectable. Two-electrode voltage-clamp electrophysiology revealed that, when coexpressed in Xenopus laevis oocytes with various potassium channels, the newly discovered segment preserved conversion of KCNQ1 by hKCNE3 to a constitutively open channel, but prevented its inhibition of Kv4.2 and KCNQ4. hKCNE4 slowing of Kv4.2 inactivation and positive-shifted steady-state inactivation were also preserved in the longer form. In contrast, full-length hKCNE4 inhibition of KCNQ1 was limited to 40% at +40 mV vs. 80% inhibition by the shorter form, and augmentation of KCNQ4 activity by hKCNE4 was entirely abolished by the additional segment. Among the genome databases analyzed, the longer KCNE3 is confined to primates; full-length KCNE4 is widespread in vertebrates but is notably absent from Mus musculus Findings highlight unexpected KCNE gene diversity, raise the possibility of dynamic regulation of KCNE partner modulation via splice variation, and suggest that the longer hKCNE3 and hKCNE4 proteins should be adopted in future mechanistic and genetic screening studies.-Abbott, G. W. Novel exon 1 protein-coding regions N-terminally extend human KCNE3 and KCNE4. © FASEB.

  14. The Histone Database: an integrated resource for histones and histone fold-containing proteins

    PubMed Central

    Mariño-Ramírez, Leonardo; Levine, Kevin M.; Morales, Mario; Zhang, Suiyuan; Moreland, R. Travis; Baxevanis, Andreas D.; Landsman, David

    2011-01-01

    Eukaryotic chromatin is composed of DNA and protein components—core histones—that act to compactly pack the DNA into nucleosomes, the fundamental building blocks of chromatin. These nucleosomes are connected to adjacent nucleosomes by linker histones. Nucleosomes are highly dynamic and, through various core histone post-translational modifications and incorporation of diverse histone variants, can serve as epigenetic marks to control processes such as gene expression and recombination. The Histone Sequence Database is a curated collection of sequences and structures of histones and non-histone proteins containing histone folds, assembled from major public databases. Here, we report a substantial increase in the number of sequences and taxonomic coverage for histone and histone fold-containing proteins available in the database. Additionally, the database now contains an expanded dataset that includes archaeal histone sequences. The database also provides comprehensive multiple sequence alignments for each of the four core histones (H2A, H2B, H3 and H4), the linker histones (H1/H5) and the archaeal histones. The database also includes current information on solved histone fold-containing structures. The Histone Sequence Database is an inclusive resource for the analysis of chromatin structure and function focused on histones and histone fold-containing proteins. Database URL: The Histone Sequence Database is freely available and can be accessed at http://research.nhgri.nih.gov/histones/. PMID:22025671

  15. The electric dipole moment of DNA-binding HU protein calculated by the use of an NMR database.

    PubMed

    Takashima, S; Yamaoka, K

    1999-08-30

    Electric birefringence measurements indicated the presence of a large permanent dipole moment in HU protein-DNA complex. In order to substantiate this observation, numerical computation of the dipole moment of HU protein homodimer was carried out by using NMR protein databases. The dipole moments of globular proteins have hitherto been calculated with X-ray databases and NMR data have never been used before. The advantages of NMR databases are: (a) NMR data are obtained, unlike X-ray databases, using protein solutions. Accordingly, this method eliminates the bothersome question as to the possible alteration of the protein structure due to the transition from the crystalline state to the solution state. This question is particularly important for proteins such as HU protein which has some degree of internal flexibility; (b) the three-dimensional coordinates of hydrogen atoms in protein molecules can be determined with a sufficient resolution and this enables the N-H as well as C = O bond moments to be calculated. Since the NMR database of HU protein from Bacillus stearothermophilus consists of 25 models, the surface charge as well as the core dipole moments were computed for each of these structures. The results of these calculations show that the net permanent dipole moments of HU protein homodimer is approximately 500-530 D (1 D = 3.33 x 10(-30) Cm) at pH 7.5 and 600-630 D at the isoelectric point (pH 10.5). These permanent dipole moments are unusually large for a small protein of the size of 19.5 kDa. Nevertheless, the result of numerical calculations is compatible with the electro-optical observation, confirming a very large dipole moment in this protein.

  16. Identification of antipsychotic drug fluspirilene as a potential p53-MDM2 inhibitor: a combined computational and experimental study

    NASA Astrophysics Data System (ADS)

    Patil, Sachin P.; Pacitti, Michael F.; Gilroy, Kevin S.; Ruggiero, John C.; Griffin, Jonathan D.; Butera, Joseph J.; Notarfrancesco, Joseph M.; Tran, Shawn; Stoddart, John W.

    2015-02-01

    The inhibition of tumor suppressor p53 protein due to its direct interaction with oncogenic murine double minute 2 (MDM2) protein, plays a central role in almost 50 % of all human tumor cells. Therefore, pharmacological inhibition of the p53-binding pocket on MDM2, leading to p53 activation, presents an important therapeutic target against these cancers expressing wild-type p53. In this context, the present study utilized an integrated virtual and experimental screening approach to screen a database of approved drugs for potential p53-MDM2 interaction inhibitors. Specifically, using an ensemble rigid-receptor docking approach with four MDM2 protein crystal structures, six drug molecules were identified as possible p53-MDM2 inhibitors. These drug molecules were then subjected to further molecular modeling investigation through flexible-receptor docking followed by Prime/MM-GBSA binding energy analysis. These studies identified fluspirilene, an approved antipsychotic drug, as a top hit with MDM2 binding mode and energy similar to that of a native MDM2 crystal ligand. The molecular dynamics simulations suggested stable binding of fluspirilene to the p53-binding pocket on MDM2 protein. The experimental testing of fluspirilene showed significant growth inhibition of human colon tumor cells in a p53-dependent manner. Fluspirilene also inhibited growth of several other human tumor cell lines in the NCI60 cell line panel. Taken together, these computational and experimental data suggest a potentially novel role of fluspirilene in inhibiting the p53-MDM2 interaction. It is noteworthy here that fluspirilene has a long history of safe human use, thus presenting immediate clinical potential as a cancer therapeutic. Furthermore, fluspirilene could also serve as a structurally-novel lead molecule for the development of more potent, small-molecule p53-MDM2 inhibitors against several types of cancer. Importantly, the combined computational and experimental screening protocol presented in this study may also prove useful for screening other commercially-available compound databases for identification of novel, small molecule p53-MDM2 inhibitors.

  17. MoonProt: a database for proteins that are known to moonlight

    PubMed Central

    Mani, Mathew; Chen, Chang; Amblee, Vaishak; Liu, Haipeng; Mathur, Tanu; Zwicke, Grant; Zabad, Shadi; Patel, Bansi; Thakkar, Jagravi; Jeffery, Constance J.

    2015-01-01

    Moonlighting proteins comprise a class of multifunctional proteins in which a single polypeptide chain performs multiple biochemical functions that are not due to gene fusions, multiple RNA splice variants or pleiotropic effects. The known moonlighting proteins perform a variety of diverse functions in many different cell types and species, and information about their structures and functions is scattered in many publications. We have constructed the manually curated, searchable, internet-based MoonProt Database (http://www.moonlightingproteins.org) with information about the over 200 proteins that have been experimentally verified to be moonlighting proteins. The availability of this organized information provides a more complete picture of what is currently known about moonlighting proteins. The database will also aid researchers in other fields, including determining the functions of genes identified in genome sequencing projects, interpreting data from proteomics projects and annotating protein sequence and structural databases. In addition, information about the structures and functions of moonlighting proteins can be helpful in understanding how novel protein functional sites evolved on an ancient protein scaffold, which can also help in the design of proteins with novel functions. PMID:25324305

  18. Proteomic analysis identifies insulin-like growth factor-binding protein-related protein-1 as a podocyte product.

    PubMed

    Matsumoto, Takayuki; Hess, Sonja; Kajiyama, Hiroshi; Sakairi, Toru; Saleem, Moin A; Mathieson, Peter W; Nojima, Yoshihisa; Kopp, Jeffrey B

    2010-10-01

    The podocyte secretory proteome may influence the phenotype of adjacent podocytes, endothelial cells, parietal epithelial cells, and tubular epithelial cells but has not been systematically characterized. We have initiated studies to characterize this proteome, with the goal of further understanding the podocyte cell biology. We cultured differentiated conditionally immortalized human podocytes and subjected the proteins in conditioned medium to mass spectrometry. At a false discovery rate of <3%, we identified 111 candidates from conditioned medium, including 44 proteins that have signal peptides or are described as secreted proteins in the UniProt database. As validation, we confirmed that one of these proteins, insulin-like growth factor-binding protein-related protein-1 (IGFBP-rP1), was expressed in mRNA and protein of cultured podocytes. In addition, transforming growth factor-β1 stimulation increased IGFBP-rP1 in conditioned medium. We analyzed IGFBP-rP1 glomerular expression in a mouse model of human immunodeficiency virus-associated nephropathy. IGFBP-rP1 was absent from podocytes of normal mice and was expressed in podocytes and pseudocrescents of transgenic mice, where it was coexpressed with desmin, a podocyte injury marker. We conclude that IGFBP-rP1 may be a product of injured podocytes. Further analysis of the podocyte secretory proteome may identify biomarkers of podocyte injury.

  19. Characterization and analysis of posttranslational modifications of the human large cytoplasmic ribosomal subunit proteins by mass spectrometry and Edman sequencing.

    PubMed

    Odintsova, Tatyana I; Müller, Eva-Christina; Ivanov, Anton V; Egorov, Tsezi A; Bienert, Ralf; Vladimirov, Serguei N; Kostka, Susanne; Otto, Albrecht; Wittmann-Liebold, Brigitte; Karpova, Galina G

    2003-04-01

    The 60S ribosomal proteins were isolated from ribosomes of human placenta and separated by reversed phase HPLC. The fractions obtained were subjected to trypsin and Glu-C digestion and analyzed by mass fingerprinting (MALDI-TOF), MS/MS (ESI), and Edman sequencing. Forty-six large subunit proteins were found, 22 of which showed masses in accordance with the SwissProt database (June 2002) masses (proteins L6, L7, L9, L13, L15, L17, L18, L21, L22, L24, L26, L27, L30, L32, L34, L35, L36, L37, L37A, L38, L39, L41). Eleven (proteins L7, L10A, L11, L12, L13A, L23, L23A, L27A, L28, L29, and P0) resulted in mass changes that are consistent with N-terminal loss of methionine, acetylation, internal methylation, or hydroxylation. A loss of methionine without acetylation was found for protein L8 and L17. For nine proteins (L3, L4, L5, L7A, L10, L14, L19, L31, and L40), the molecular masses could not be determined. Proteins P1 and protein L3-like were not identified by the methods applied.

  20. Comparative and quantitative proteomic analysis of normal and degenerated human annulus fibrosus cells.

    PubMed

    Ye, Dongping; Liang, Weiguo; Dai, Libing; Zhou, Longqiang; Yao, Yicun; Zhong, Xin; Chen, Honghui; Xu, Jiake

    2015-05-01

    Degeneration of the intervertebral disc (IVD) is a major chronic medical condition associated with back pain. To better understand the pathogenesis of IVD degeneration, we performed comparative and quantitative proteomic analyses of normal and degenerated human annulus fibrosus (AF) cells and identified proteins that are differentially expressed between them. Annulus fibrosus cells were isolated and cultured from patients with lumbar disc herniation (the experimental group, degenerated AF cells) and scoliosis patients who underwent orthopaedic surgery (the control group, normal AF cells). Comparative proteomic analyses of normal and degenerated cultured AF cells were carried out using 2-D electrophoresis, mass spectrometric analyses, and database searching. Quantitative analyses of silver-stained 2-D electrophoresis gels of normal and degenerated cultured AF cells identified 10 protein spots that showed the most altered differential expression levels between the two groups. Among these, three proteins were decreased, including heat shock cognate 71-kDa protein, glucose-6-phosphate 1-dehydrogenase, and protocadherin-23, whereas seven proteins were increased, including guanine nucleotide-binding protein G(i) subunit α-2, superoxide dismutase, transmembrane protein 51, adenosine receptor A3, 26S protease regulatory subunit 8, lipid phosphate phosphatase-related protein, and fatty acyl-crotonic acid reductase 1. These differentially expressed proteins might be involved in the pathophysiological process of IVD degeneration and have potential values as biomarkers of the degeneration of IVD. © 2015 Wiley Publishing Asia Pty Ltd.

  1. Confronting the catalytic dark matter encoded by sequenced genomes

    PubMed Central

    Ellens, Kenneth W.; Christian, Nils; Singh, Charandeep; Satagopam, Venkata P.

    2017-01-01

    Abstract The post-genomic era has provided researchers with a deluge of protein sequences. However, a significant fraction of the proteins encoded by sequenced genomes remains without an identified function. Here, we aim at determining how many enzymes of uncertain or unknown function are still present in the Saccharomyces cerevisiae and human proteomes. Using information available in the Swiss-Prot, BRENDA and KEGG databases in combination with a Hidden Markov Model-based method, we estimate that >600 yeast and 2000 human proteins (>30% of their proteins of unknown function) are enzymes whose precise function(s) remain(s) to be determined. This illustrates the impressive scale of the ‘unknown enzyme problem’. We extensively review classical biochemical as well as more recent systematic experimental and computational approaches that can be used to support enzyme function discovery research. Finally, we discuss the possible roles of the elusive catalysts in light of recent developments in the fields of enzymology and metabolism as well as the significance of the unknown enzyme problem in the context of metabolic modeling, metabolic engineering and rare disease research. PMID:29059321

  2. PPInterFinder—a mining tool for extracting causal relations on human proteins from literature

    PubMed Central

    Raja, Kalpana; Subramani, Suresh; Natarajan, Jeyakumar

    2013-01-01

    One of the most common and challenging problem in biomedical text mining is to mine protein–protein interactions (PPIs) from MEDLINE abstracts and full-text research articles because PPIs play a major role in understanding the various biological processes and the impact of proteins in diseases. We implemented, PPInterFinder—a web-based text mining tool to extract human PPIs from biomedical literature. PPInterFinder uses relation keyword co-occurrences with protein names to extract information on PPIs from MEDLINE abstracts and consists of three phases. First, it identifies the relation keyword using a parser with Tregex and a relation keyword dictionary. Next, it automatically identifies the candidate PPI pairs with a set of rules related to PPI recognition. Finally, it extracts the relations by matching the sentence with a set of 11 specific patterns based on the syntactic nature of PPI pair. We find that PPInterFinder is capable of predicting PPIs with the accuracy of 66.05% on AIMED corpus and outperforms most of the existing systems. Database URL: http://www.biomining-bu.in/ppinterfinder/ PMID:23325628

  3. Regulatory interactions between long noncoding RNA LINC00968 and miR-9-3p in non-small cell lung cancer: A bioinformatic analysis based on miRNA microarray, GEO and TCGA.

    PubMed

    Li, Dong-Yao; Chen, Wen-Jie; Shang, Jun; Chen, Gang; Li, Shi-Kang

    2018-06-01

    Long non-coding RNAs (lncRNAs) have been demonstrated to mediate carcinogenesis in various types of cancer. However, the regulatory role of lncRNA LINC00968 in lung adenocarcinoma remains unclear. The microRNA (miRNA) expression in LINC00968-overexpressing human lung adenocarcinoma A549 cells was detected using miRNA microarray analysis. miR-9-3p was selected for further analysis, and its expression was verified in the Gene Expression Omnibus (GEO) database. In addition, the regulatory axis of LINC00968 was validated using The Cancer Genome Atlas (TCGA) database. Results of the GEO database indicated miR-9-3p expression in lung adenocarcinoma was significantly higher compared with normal tissues. Functional enrichment analyses of the target genes of miR-9-3p indicated protein binding and the AMP-activated protein kinase pathway were the most enriched Gene Ontology and KEGG terms, respectively. Combining target genes with the correlated genes of LINC00968 and miR-9-3p, 120 objective genes were obtained, which were used to construct a protein-protein interaction (PPI) network. Cyclin A2 (CCNA2) was identified to have a vital role in the PPI network. Significant correlations were detected between LINC00968, miR-9-3p and CCNA2 in lung adenocarcinoma. The LINC00968/miR-9-3p/CCNA2 regulatory axis provides a new foundation for further evaluating the regulatory mechanisms of LINC00968 in lung adenocarcinoma.

  4. Global differential expression of genes located in the Down Syndrome Critical Region in normal human brain

    PubMed Central

    Montoya, Julio Cesar; Fajardo, Dianora; Peña, Angela; Sánchez, Adalberto; Domínguez, Martha C; Satizábal, José María

    2014-01-01

    Background: The information of gene expression obtained from databases, have made possible the extraction and analysis of data related with several molecular processes involving not only in brain homeostasis but its disruption in some neuropathologies; principally in Down syndrome and the Alzheimer disease. Objective: To correlate the levels of transcription of 19 genes located in the Down Syndrome Critical Region (DSCR) with their expression in several substructures of normal human brain. Methods: There were obtained expression profiles of 19 DSCR genes in 42 brain substructures, from gene expression values available at the database of the human brain of the Brain Atlas of the Allen Institute for Brain Sciences", (http://human.brain-map.org/). The co-expression patterns of DSCR genes in brain were calculated by using multivariate statistical methods. Results: Highest levels of gene expression were registered at caudate nucleus, nucleus accumbens and putamen among central areas of cerebral cortex. Increased expression levels of RCAN1 that encode by a protein involved in signal transduction process of the CNS were recorded for PCP4 that participates in the binding to calmodulin and TTC3; a protein that is associated with differentiation of neurons. That previously identified brain structures play a crucial role in the learning process, in different class of memory and in motor skills. Conclusion: The precise regulation of DSCR gene expression is crucial to maintain the brain homeostasis, especially in those areas with high levels of gene expression associated with a remarkable process of learning and cognition. PMID:25767303

  5. Comprehensive Identification of Proteins from MALDI Imaging*

    PubMed Central

    Maier, Stefan K.; Hahne, Hannes; Gholami, Amin Moghaddas; Balluff, Benjamin; Meding, Stephan; Schoene, Cédrik; Walch, Axel K.; Kuster, Bernhard

    2013-01-01

    Matrix-assisted laser desorption/ionization imaging mass spectrometry (MALDI IMS) is a powerful tool for the visualization of proteins in tissues and has demonstrated considerable diagnostic and prognostic value. One main challenge is that the molecular identity of such potential biomarkers mostly remains unknown. We introduce a generic method that removes this issue by systematically identifying the proteins embedded in the MALDI matrix using a combination of bottom-up and top-down proteomics. The analyses of ten human tissues lead to the identification of 1400 abundant and soluble proteins constituting the set of proteins detectable by MALDI IMS including >90% of all IMS biomarkers reported in the literature. Top-down analysis of the matrix proteome identified 124 mostly N- and C-terminally fragmented proteins indicating considerable protein processing activity in tissues. All protein identification data from this study as well as the IMS literature has been deposited into MaTisse, a new publically available database, which we anticipate will become a valuable resource for the IMS community. PMID:23782541

  6. Saturation mutagenesis reveals manifold determinants of exon definition.

    PubMed

    Ke, Shengdong; Anquetil, Vincent; Zamalloa, Jorge Rojas; Maity, Alisha; Yang, Anthony; Arias, Mauricio A; Kalachikov, Sergey; Russo, James J; Ju, Jingyue; Chasin, Lawrence A

    2018-01-01

    To illuminate the extent and roles of exonic sequences in the splicing of human RNA transcripts, we conducted saturation mutagenesis of a 51-nt internal exon in a three-exon minigene. All possible single and tandem dinucleotide substitutions were surveyed. Using high-throughput genetics, 5560 minigene molecules were assayed for splicing in human HEK293 cells. Up to 70% of mutations produced substantial (greater than twofold) phenotypes of either increased or decreased splicing. Of all predicted secondary structural elements, only a single 15-nt stem-loop showed a strong correlation with splicing, acting negatively. The in vitro formation of exon-protein complexes between the mutant molecules and proteins associated with spliceosome formation (U2AF35, U2AF65, U1A, and U1-70K) correlated with splicing efficiencies, suggesting exon definition as the step affected by most mutations. The measured relative binding affinities of dozens of human RNA binding protein domains as reported in the CISBP-RNA database were found to correlate either positively or negatively with splicing efficiency, more than could fit on the 51-nt test exon simultaneously. The large number of these functional protein binding correlations point to a dynamic and heterogeneous population of pre-mRNA molecules, each responding to a particular collection of binding proteins. © 2018 Ke et al.; Published by Cold Spring Harbor Laboratory Press.

  7. A Systematic Review of the Effects of Plant Compared with Animal Protein Sources on Features of Metabolic Syndrome.

    PubMed

    Chalvon-Demersay, Tristan; Azzout-Marniche, Dalila; Arfsten, Judith; Egli, Léonie; Gaudichon, Claire; Karagounis, Leonidas G; Tomé, Daniel

    2017-03-01

    Dietary protein may play an important role in the prevention of metabolic dysfunctions. However, the way in which the protein source affects these dysfunctions has not been clearly established. The aim of the current systematic review was to compare the impact of plant- and animal-sourced dietary proteins on several features of metabolic syndrome in humans. The PubMed database was searched for both chronic and acute interventional studies, as well as observational studies, in healthy humans or those with metabolic dysfunctions, in which the impact of animal and plant protein intake was compared while using the following variables: cholesterolemia and triglyceridemia, blood pressure, glucose homeostasis, and body composition. Based on data extraction, we observed that soy protein consumption (with isoflavones), but not soy protein alone (without isoflavones) or other plant proteins (pea and lupine proteins, wheat gluten), leads to a 3% greater decrease in both total and LDL cholesterol compared with animal-sourced protein ingestion, especially in individuals with high fasting cholesterol concentrations. This observation was made when animal proteins were provided as a whole diet rather than given supplementally. Some observational studies reported an inverse association between plant protein intake and systolic and diastolic blood pressure, but this was not confirmed by intervention studies. Moreover, plant protein (wheat gluten, soy protein) intake as part of a mixed meal resulted in a lower postprandial insulin response than did whey. This systematic review provides some evidence that the intake of soy protein associated with isoflavones may prevent the onset of risk factors associated with cardiovascular disease, i.e., hypercholesterolemia and hypertension, in humans. However, we were not able to draw any further conclusions from the present work on the positive effects of plant proteins relating to glucose homeostasis and body composition. © 2017 American Society for Nutrition.

  8. In Silico Investigation of Traditional Chinese Medicine Compounds to Inhibit Human Histone Deacetylase 2 for Patients with Alzheimer's Disease

    PubMed Central

    Hung, Tzu-Chieh; Lee, Wen-Yuan; Chen, Kuen-Bao; Chan, Yueh-Chiu; Lee, Cheng-Chun

    2014-01-01

    Human histone deacetylase 2 (HDAC2) has been identified as being associated with Alzheimer's disease (AD), a neuropathic degenerative disease. In this study, we screen the world's largest Traditional Chinese Medicine (TCM) database for natural compounds that may be useful as lead compounds in the search for inhibitors of HDAC2 function. The technique of molecular docking was employed to select the ten top TCM candidates. We used three prediction models, multiple linear regression (MLR), support vector machine (SVM), and the Bayes network toolbox (BNT), to predict the bioactivity of the TCM candidates. Molecular dynamics simulation provides the protein-ligand interactions of compounds. The bioactivity predictions of pIC50 values suggest that the TCM candidatesm, (−)-Bontl ferulate, monomethylcurcumin, and ningposides C, have a greater effect on HDAC2 inhibition. The structure variation caused by the hydrogen bonds and hydrophobic interactions between protein-ligand interactions indicates that these compounds have an inhibitory effect on the protein. PMID:25045700

  9. TissueWikiMobile: an Integrative Protein Expression Image Browser for Pathological Knowledge Sharing and Annotation on a Mobile Device

    PubMed Central

    Cheng, Chihwen; Stokes, Todd H.; Hang, Sovandy; Wang, May D.

    2016-01-01

    Doctors need fast and convenient access to medical data. This motivates the use of mobile devices for knowledge retrieval and sharing. We have developed TissueWikiMobile on the Apple iPhone and iPad to seamlessly access TissueWiki, an enormous repository of medical histology images. TissueWiki is a three terabyte database of antibody information and histology images from the Human Protein Atlas (HPA). Using TissueWikiMobile, users are capable of extracting knowledge from protein expression, adding annotations to highlight regions of interest on images, and sharing their professional insight. By providing an intuitive human computer interface, users can efficiently operate TissueWikiMobile to access important biomedical data without losing mobility. TissueWikiMobile furnishes the health community a ubiquitous way to collaborate and share their expert opinions not only on the performance of various antibodies stains but also on histology image annotation. PMID:27532057

  10. HIstome--a relational knowledgebase of human histone proteins and histone modifying enzymes.

    PubMed

    Khare, Satyajeet P; Habib, Farhat; Sharma, Rahul; Gadewal, Nikhil; Gupta, Sanjay; Galande, Sanjeev

    2012-01-01

    Histones are abundant nuclear proteins that are essential for the packaging of eukaryotic DNA into chromosomes. Different histone variants, in combination with their modification 'code', control regulation of gene expression in diverse cellular processes. Several enzymes that catalyze the addition and removal of multiple histone modifications have been discovered in the past decade, enabling investigations of their role(s) in normal cellular processes and diverse pathological conditions. This sudden influx of data, however, has resulted in need of an updated knowledgebase that compiles, organizes and presents curated scientific information to the user in an easily accessible format. Here, we present HIstome, a browsable, manually curated, relational database that provides information about human histone proteins, their sites of modifications, variants and modifying enzymes. HIstome is a knowledgebase of 55 human histone proteins, 106 distinct sites of their post-translational modifications (PTMs) and 152 histone-modifying enzymes. Entries have been grouped into 5 types of histones, 8 types of post-translational modifications and 14 types of enzymes that catalyze addition and removal of these modifications. The resource will be useful for epigeneticists, pharmacologists and clinicians. HIstome: The Histone Infobase is available online at http://www.iiserpune.ac.in/∼coee/histome/ and http://www.actrec.gov.in/histome/.

  11. Toward the Standardization of Mitochondrial Proteomics: The Italian Mitochondrial Human Proteome Project Initiative.

    PubMed

    Alberio, Tiziana; Pieroni, Luisa; Ronci, Maurizio; Banfi, Cristina; Bongarzone, Italia; Bottoni, Patrizia; Brioschi, Maura; Caterino, Marianna; Chinello, Clizia; Cormio, Antonella; Cozzolino, Flora; Cunsolo, Vincenzo; Fontana, Simona; Garavaglia, Barbara; Giusti, Laura; Greco, Viviana; Lucacchini, Antonio; Maffioli, Elisa; Magni, Fulvio; Monteleone, Francesca; Monti, Maria; Monti, Valentina; Musicco, Clara; Petrosillo, Giuseppe; Porcelli, Vito; Saletti, Rosaria; Scatena, Roberto; Soggiu, Alessio; Tedeschi, Gabriella; Zilocchi, Mara; Roncada, Paola; Urbani, Andrea; Fasano, Mauro

    2017-12-01

    The Mitochondrial Human Proteome Project aims at understanding the function of the mitochondrial proteome and its crosstalk with the proteome of other organelles. Being able to choose a suitable and validated enrichment protocol of functional mitochondria, based on the specific needs of the downstream proteomics analysis, would greatly help the researchers in the field. Mitochondrial fractions from ten model cell lines were prepared using three enrichment protocols and analyzed on seven different LC-MS/MS platforms. All data were processed using neXtProt as reference database. The data are available for the Human Proteome Project purposes through the ProteomeXchange Consortium with the identifier PXD007053. The processed data sets were analyzed using a suite of R routines to perform a statistical analysis and to retrieve subcellular and submitochondrial localizations. Although the overall number of identified total and mitochondrial proteins was not significantly dependent on the enrichment protocol, specific line to line differences were observed. Moreover, the protein lists were mapped to a network representing the functional mitochondrial proteome, encompassing mitochondrial proteins and their first interactors. More than 80% of the identified proteins resulted in nodes of this network but with a different ability in coisolating mitochondria-associated structures for each enrichment protocol/cell line pair.

  12. Activity-based mass spectrometric characterization of proteases and inhibitors in human saliva

    PubMed Central

    Sun, Xiuli; Salih, Erdjan; Oppenheim, Frank G.; Helmerhorst, Eva J.

    2009-01-01

    Proteases present in oral fluid effectively modulate the structure and function of some salivary proteins and have been implicated in tissue destruction in oral disease. To identify the proteases operating in the oral environment, proteins in pooled whole saliva supernatant were separated by anion-exchange chromatography and individual fractions were analyzed for proteolytic activity by zymography using salivary histatins as the enzyme substrates. Protein bands displaying proteolytic activity were particularly prominent in the 50–75 kDa region. Individual bands were excised, in-gel trypsinized and subjected to LC/ESI-MS/MS. The data obtained were searched against human, oral microbial and protease databases. A total of 13 proteases were identified all of which were of mammalian origin. Proteases detected in multiple fractions with cleavage specificities toward arginine and lysine residues, were lactotransferrin, kallikrein-1, and human airway trypsin-like protease. Unexpectedly, ten protease inhibitors were co-identified suggesting they were associated with the proteases in the same fractions. The inhibitors found most frequently were alpha-2-macroglobulin-like protein 1, alpha-1-antitrypsin, and leukocyte elastase inhibitor. Regulation of oral fluid proteolysis is highly important given that an inbalance in such activities has been correlated to a variety of pathological conditions including oral cancer. PMID:20011683

  13. Application of 2-dimensional difference gel electrophoresis (2D-DIGE) to the study of thrombin-activated human platelet secretome.

    PubMed

    Della Corte, Anna; Maugeri, Norma; Pampuch, Agnieszka; Cerletti, Chiara; de Gaetano, Giovanni; Rotilio, Domenico

    2008-02-01

    Thrombin is an agonist inducing platelet activation. We combined two-dimensional difference gel electrophoresis (2D-DIGE) and mass spectrometry (MALDI-TOF MS) to analyse differentially expressed proteins secreted from thrombin-stimulated platelets. Human washed platelets, from healthy volunteers, were stimulated with thrombin 0.5 U/ml at 37 degrees C without stirring and the secreted proteins were resolved by 2D-DIGE. By image analysis, 1094 spots were detected in the 2D gel. The spots whose mean intensity showed at least a five-fold change intensity increase or decrease in the thrombin-activated platelet gel in comparison with unstimulated control were digested by trypsin and subjected to MALDI-TOF MS analysis. Peptides from mass spectra of in-gel digest samples were matched against available databases, using the Mascot search engine (Matrix Science) for peptide mass fingerprint. In the activated platelet secretome, transferrin, glutathione-transferase, WD repeat protein, ER-60, thrombospondin-1 precursor and thrombospondin were the most abundant. Also lamin A, a nuclear protein, not previously identified in platelets, appeared to be released. The novel strategy to combine 2D-DIGE with MALDI-TOF MS is a useful approach for a quantitative analysis of the effect of thrombin on the secretome profile of human platelets.

  14. Comparative proteomics of human endothelial cell caveolae and rafts using two-dimensional gel electrophoresis and mass spectrometry.

    PubMed

    Sprenger, Richard R; Speijer, Dave; Back, Jaap Willem; De Koster, Chris G; Pannekoek, Hans; Horrevoets, Anton J G

    2004-01-01

    The human endothelial cell plasma membrane harbors two subdomains of similar lipid composition, caveolae and rafts, both crucially involved in various essential cellular processes like transcytosis, signal transduction and cholesterol homeostasis. Caveolin-enriched membranes, isolated by either cationic silica or buoyant density methods, were explored by comparing large series of two-dimensional (2-D) maps and subsequent identification of over 100 protein spots by matrix-assisted laser desorption/ionization (MALDI) peptide mass fingerprinting. Improved representation and identification of membrane proteins and valuable information on various post-translational modifications was achieved by the presented optimized procedures for solubilization, destaining and database searching/computing. Whereas the cationic silica purification yielded predominantly known endoplasmic reticulum residents, the cold-detergent method yielded a large number of known caveolae residents, including caveolin-1. Thus, a large part of this subproteome was established, including known (trans-)membrane, signal transduction and glycosyl phosphatidylinositol (GPI)-anchored proteins. Several predicted proteins from the human genome were isolated for the first time from biological samples, including SGRP58, SLP-2, C8ORF2, and XRP-2. These findings and various optimized procedures can serve as a reference to study the differential composition of endothelial cell caveolae and rafts, known to be involved in pathologies like cancer and cardiovascular disease.

  15. SpliceDisease database: linking RNA splicing and disease.

    PubMed

    Wang, Juan; Zhang, Jie; Li, Kaibo; Zhao, Wei; Cui, Qinghua

    2012-01-01

    RNA splicing is an important aspect of gene regulation in many organisms. Splicing of RNA is regulated by complicated mechanisms involving numerous RNA-binding proteins and the intricate network of interactions among them. Mutations in cis-acting splicing elements or its regulatory proteins have been shown to be involved in human diseases. Defects in pre-mRNA splicing process have emerged as a common disease-causing mechanism. Therefore, a database integrating RNA splicing and disease associations would be helpful for understanding not only the RNA splicing but also its contribution to disease. In SpliceDisease database, we manually curated 2337 splicing mutation disease entries involving 303 genes and 370 diseases, which have been supported experimentally in 898 publications. The SpliceDisease database provides information including the change of the nucleotide in the sequence, the location of the mutation on the gene, the reference Pubmed ID and detailed description for the relationship among gene mutations, splicing defects and diseases. We standardized the names of the diseases and genes and provided links for these genes to NCBI and UCSC genome browser for further annotation and genomic sequences. For the location of the mutation, we give direct links of the entry to the respective position/region in the genome browser. The users can freely browse, search and download the data in SpliceDisease at http://cmbi.bjmu.edu.cn/sdisease.

  16. MitoNuc: a database of nuclear genes coding for mitochondrial proteins. Update 2002.

    PubMed

    Attimonelli, Marcella; Catalano, Domenico; Gissi, Carmela; Grillo, Giorgio; Licciulli, Flavio; Liuni, Sabino; Santamaria, Monica; Pesole, Graziano; Saccone, Cecilia

    2002-01-01

    Mitochondria, besides their central role in energy metabolism, have recently been found to be involved in a number of basic processes of cell life and to contribute to the pathogenesis of many degenerative diseases. All functions of mitochondria depend on the interaction of nuclear and organelle genomes. Mitochondrial genomes have been extensively sequenced and analysed and data have been collected in several specialised databases. In order to collect information on nuclear coded mitochondrial proteins we developed MitoNuc, a database containing detailed information on sequenced nuclear genes coding for mitochondrial proteins in Metazoa. The MitoNuc database can be retrieved through SRS and is available via the web site http://bighost.area.ba.cnr.it/mitochondriome where other mitochondrial databases developed by our group, the complete list of the sequenced mitochondrial genomes, links to other mitochondrial sites and related information, are available. The MitoAln database, related to MitoNuc in the previous release, reporting the multiple alignments of the relevant homologous protein coding regions, is no longer supported in the present release. In order to keep the links among entries in MitoNuc from homologous proteins, a new field in the database has been defined: the cluster identifier, an alpha numeric code used to identify each cluster of homologous proteins. A comment field derived from the corresponding SWISS-PROT entry has been introduced; this reports clinical data related to dysfunction of the protein. The logic scheme of MitoNuc database has been implemented in the ORACLE DBMS. This will allow the end-users to retrieve data through a friendly interface that will be soon implemented.

  17. SpirPep: an in silico digestion-based platform to assist bioactive peptides discovery from a genome-wide database.

    PubMed

    Anekthanakul, Krittima; Hongsthong, Apiradee; Senachak, Jittisak; Ruengjitchatchawalya, Marasri

    2018-04-20

    Bioactive peptides, including biological sources-derived peptides with different biological activities, are protein fragments that influence the functions or conditions of organisms, in particular humans and animals. Conventional methods of identifying bioactive peptides are time-consuming and costly. To quicken the processes, several bioinformatics tools are recently used to facilitate screening of the potential peptides prior their activity assessment in vitro and/or in vivo. In this study, we developed an efficient computational method, SpirPep, which offers many advantages over the currently available tools. The SpirPep web application tool is a one-stop analysis and visualization facility to assist bioactive peptide discovery. The tool is equipped with 15 customized enzymes and 1-3 miscleavage options, which allows in silico digestion of protein sequences encoded by protein-coding genes from single, multiple, or genome-wide scaling, and then directly classifies the peptides by bioactivity using an in-house database that contains bioactive peptides collected from 13 public databases. With this tool, the resulting peptides are categorized by each selected enzyme, and shown in a tabular format where the peptide sequences can be tracked back to their original proteins. The developed tool and webpages are coded in PHP and HTML with CSS/JavaScript. Moreover, the tool allows protein-peptide alignment visualization by Generic Genome Browser (GBrowse) to display the region and details of the proteins and peptides within each parameter, while considering digestion design for the desirable bioactivity. SpirPep is efficient; it takes less than 20 min to digest 3000 proteins (751,860 amino acids) with 15 enzymes and three miscleavages for each enzyme, and only a few seconds for single enzyme digestion. Obviously, the tool identified more bioactive peptides than that of the benchmarked tool; an example of validated pentapeptide (FLPIL) from LC-MS/MS was demonstrated. The web and database server are available at http://spirpepapp.sbi.kmutt.ac.th . SpirPep, a web-based bioactive peptide discovery application, is an in silico-based tool with an overview of the results. The platform is a one-stop analysis and visualization facility; and offers advantages over the currently available tools. This tool may be useful for further bioactivity analysis and the quantitative discovery of desirable peptides.

  18. Heterogeneous Biomedical Database Integration Using a Hybrid Strategy: A p53 Cantcer Research Database

    PubMed Central

    Bichutskiy, Vadim Y.; Colman, Richard; Brachmann, Rainer K.; Lathrop, Richard H.

    2006-01-01

    Complex problems in life science research give rise to multidisciplinary collaboration, and hence, to the need for heterogeneous database integration. The tumor suppressor p53 is mutated in close to 50% of human cancers, and a small drug-like molecule with the ability to restore native function to cancerous p53 mutants is a long-held medical goal of cancer treatment. The Cancer Research DataBase (CRDB) was designed in support of a project to find such small molecules. As a cancer informatics project, the CRDB involved small molecule data, computational docking results, functional assays, and protein structure data. As an example of the hybrid strategy for data integration, it combined the mediation and data warehousing approaches. This paper uses the CRDB to illustrate the hybrid strategy as a viable approach to heterogeneous data integration in biomedicine, and provides a design method for those considering similar systems. More efficient data sharing implies increased productivity, and, hopefully, improved chances of success in cancer research. (Code and database schemas are freely downloadable, http://www.igb.uci.edu/research/research.html.) PMID:19458771

  19. Using SQL Databases for Sequence Similarity Searching and Analysis.

    PubMed

    Pearson, William R; Mackey, Aaron J

    2017-09-13

    Relational databases can integrate diverse types of information and manage large sets of similarity search results, greatly simplifying genome-scale analyses. By focusing on taxonomic subsets of sequences, relational databases can reduce the size and redundancy of sequence libraries and improve the statistical significance of homologs. In addition, by loading similarity search results into a relational database, it becomes possible to explore and summarize the relationships between all of the proteins in an organism and those in other biological kingdoms. This unit describes how to use relational databases to improve the efficiency of sequence similarity searching and demonstrates various large-scale genomic analyses of homology-related data. It also describes the installation and use of a simple protein sequence database, seqdb_demo, which is used as a basis for the other protocols. The unit also introduces search_demo, a database that stores sequence similarity search results. The search_demo database is then used to explore the evolutionary relationships between E. coli proteins and proteins in other organisms in a large-scale comparative genomic analysis. © 2017 by John Wiley & Sons, Inc. Copyright © 2017 John Wiley & Sons, Inc.

  20. SNAPPI-DB: a database and API of Structures, iNterfaces and Alignments for Protein–Protein Interactions

    PubMed Central

    Jefferson, Emily R.; Walsh, Thomas P.; Roberts, Timothy J.; Barton, Geoffrey J.

    2007-01-01

    SNAPPI-DB, a high performance database of Structures, iNterfaces and Alignments of Protein–Protein Interactions, and its associated Java Application Programming Interface (API) is described. SNAPPI-DB contains structural data, down to the level of atom co-ordinates, for each structure in the Protein Data Bank (PDB) together with associated data including SCOP, CATH, Pfam, SWISSPROT, InterPro, GO terms, Protein Quaternary Structures (PQS) and secondary structure information. Domain–domain interactions are stored for multiple domain definitions and are classified by their Superfamily/Family pair and interaction interface. Each set of classified domain–domain interactions has an associated multiple structure alignment for each partner. The API facilitates data access via PDB entries, domains and domain–domain interactions. Rapid development, fast database access and the ability to perform advanced queries without the requirement for complex SQL statements are provided via an object oriented database and the Java Data Objects (JDO) API. SNAPPI-DB contains many features which are not available in other databases of structural protein–protein interactions. It has been applied in three studies on the properties of protein–protein interactions and is currently being employed to train a protein–protein interaction predictor and a functional residue predictor. The database, API and manual are available for download at: . PMID:17202171

  1. SALAD database: a motif-based database of protein annotations for plant comparative genomics

    PubMed Central

    Mihara, Motohiro; Itoh, Takeshi; Izawa, Takeshi

    2010-01-01

    Proteins often have several motifs with distinct evolutionary histories. Proteins with similar motifs have similar biochemical properties and thus related biological functions. We constructed a unique comparative genomics database termed the SALAD database (http://salad.dna.affrc.go.jp/salad/) from plant-genome-based proteome data sets. We extracted evolutionarily conserved motifs by MEME software from 209 529 protein-sequence annotation groups selected by BLASTP from the proteome data sets of 10 species: rice, sorghum, Arabidopsis thaliana, grape, a lycophyte, a moss, 3 algae, and yeast. Similarity clustering of each protein group was performed by pairwise scoring of the motif patterns of the sequences. The SALAD database provides a user-friendly graphical viewer that displays a motif pattern diagram linked to the resulting bootstrapped dendrogram for each protein group. Amino-acid-sequence-based and nucleotide-sequence-based phylogenetic trees for motif combination alignment, a logo comparison diagram for each clade in the tree, and a Pfam-domain pattern diagram are also available. We also developed a viewer named ‘SALAD on ARRAYs’ to view arbitrary microarray data sets of paralogous genes linked to the same dendrogram in a window. The SALAD database is a powerful tool for comparing protein sequences and can provide valuable hints for biological analysis. PMID:19854933

  2. SALAD database: a motif-based database of protein annotations for plant comparative genomics.

    PubMed

    Mihara, Motohiro; Itoh, Takeshi; Izawa, Takeshi

    2010-01-01

    Proteins often have several motifs with distinct evolutionary histories. Proteins with similar motifs have similar biochemical properties and thus related biological functions. We constructed a unique comparative genomics database termed the SALAD database (http://salad.dna.affrc.go.jp/salad/) from plant-genome-based proteome data sets. We extracted evolutionarily conserved motifs by MEME software from 209,529 protein-sequence annotation groups selected by BLASTP from the proteome data sets of 10 species: rice, sorghum, Arabidopsis thaliana, grape, a lycophyte, a moss, 3 algae, and yeast. Similarity clustering of each protein group was performed by pairwise scoring of the motif patterns of the sequences. The SALAD database provides a user-friendly graphical viewer that displays a motif pattern diagram linked to the resulting bootstrapped dendrogram for each protein group. Amino-acid-sequence-based and nucleotide-sequence-based phylogenetic trees for motif combination alignment, a logo comparison diagram for each clade in the tree, and a Pfam-domain pattern diagram are also available. We also developed a viewer named 'SALAD on ARRAYs' to view arbitrary microarray data sets of paralogous genes linked to the same dendrogram in a window. The SALAD database is a powerful tool for comparing protein sequences and can provide valuable hints for biological analysis.

  3. MEGADOCK-Web: an integrated database of high-throughput structure-based protein-protein interaction predictions.

    PubMed

    Hayashi, Takanori; Matsuzaki, Yuri; Yanagisawa, Keisuke; Ohue, Masahito; Akiyama, Yutaka

    2018-05-08

    Protein-protein interactions (PPIs) play several roles in living cells, and computational PPI prediction is a major focus of many researchers. The three-dimensional (3D) structure and binding surface are important for the design of PPI inhibitors. Therefore, rigid body protein-protein docking calculations for two protein structures are expected to allow elucidation of PPIs different from known complexes in terms of 3D structures because known PPI information is not explicitly required. We have developed rapid PPI prediction software based on protein-protein docking, called MEGADOCK. In order to fully utilize the benefits of computational PPI predictions, it is necessary to construct a comprehensive database to gather prediction results and their predicted 3D complex structures and to make them easily accessible. Although several databases exist that provide predicted PPIs, the previous databases do not contain a sufficient number of entries for the purpose of discovering novel PPIs. In this study, we constructed an integrated database of MEGADOCK PPI predictions, named MEGADOCK-Web. MEGADOCK-Web provides more than 10 times the number of PPI predictions than previous databases and enables users to conduct PPI predictions that cannot be found in conventional PPI prediction databases. In MEGADOCK-Web, there are 7528 protein chains and 28,331,628 predicted PPIs from all possible combinations of those proteins. Each protein structure is annotated with PDB ID, chain ID, UniProt AC, related KEGG pathway IDs, and known PPI pairs. Additionally, MEGADOCK-Web provides four powerful functions: 1) searching precalculated PPI predictions, 2) providing annotations for each predicted protein pair with an experimentally known PPI, 3) visualizing candidates that may interact with the query protein on biochemical pathways, and 4) visualizing predicted complex structures through a 3D molecular viewer. MEGADOCK-Web provides a huge amount of comprehensive PPI predictions based on docking calculations with biochemical pathways and enables users to easily and quickly assess PPI feasibilities by archiving PPI predictions. MEGADOCK-Web also promotes the discovery of new PPIs and protein functions and is freely available for use at http://www.bi.cs.titech.ac.jp/megadock-web/ .

  4. ProteinWorldDB: querying radical pairwise alignments among protein sets from complete genomes.

    PubMed

    Otto, Thomas Dan; Catanho, Marcos; Tristão, Cristian; Bezerra, Márcia; Fernandes, Renan Mathias; Elias, Guilherme Steinberger; Scaglia, Alexandre Capeletto; Bovermann, Bill; Berstis, Viktors; Lifschitz, Sergio; de Miranda, Antonio Basílio; Degrave, Wim

    2010-03-01

    Many analyses in modern biological research are based on comparisons between biological sequences, resulting in functional, evolutionary and structural inferences. When large numbers of sequences are compared, heuristics are often used resulting in a certain lack of accuracy. In order to improve and validate results of such comparisons, we have performed radical all-against-all comparisons of 4 million protein sequences belonging to the RefSeq database, using an implementation of the Smith-Waterman algorithm. This extremely intensive computational approach was made possible with the help of World Community Grid, through the Genome Comparison Project. The resulting database, ProteinWorldDB, which contains coordinates of pairwise protein alignments and their respective scores, is now made available. Users can download, compare and analyze the results, filtered by genomes, protein functions or clusters. ProteinWorldDB is integrated with annotations derived from Swiss-Prot, Pfam, KEGG, NCBI Taxonomy database and gene ontology. The database is a unique and valuable asset, representing a major effort to create a reliable and consistent dataset of cross-comparisons of the whole protein content encoded in hundreds of completely sequenced genomes using a rigorous dynamic programming approach. The database can be accessed through http://proteinworlddb.org

  5. Rice proteome analysis: a step toward functional analysis of the rice genome.

    PubMed

    Komatsu, Setsuko; Tanaka, Naoki

    2005-03-01

    The technique of proteome analysis using 2-DE has the power to monitor global changes that occur in the protein complement of tissues and subcellular compartments. In this review, we describe construction of the rice proteome database, the cataloging of rice proteins, and the functional characterization of some of the proteins identified. Initially, proteins extracted from various tissues and organelles were separated by 2-DE and an image analyzer was used to construct a display or reference map of the proteins. The rice proteome database currently contains 23 reference maps based on 2-DE of proteins from different rice tissues and subcellular compartments. These reference maps comprise 13 129 rice proteins, and the amino acid sequences of 5092 of these proteins are entered in the database. Major proteins involved in growth or stress responses have been identified by using a proteomics approach and some of these proteins have unique functions. Furthermore, initial work has also begun on analyzing the phosphoproteome and protein-protein interactions in rice. The information obtained from the rice proteome database will aid in the molecular cloning of rice genes and in predicting the function of unknown proteins.

  6. Databases and Associated Tools for Glycomics and Glycoproteomics.

    PubMed

    Lisacek, Frederique; Mariethoz, Julien; Alocci, Davide; Rudd, Pauline M; Abrahams, Jodie L; Campbell, Matthew P; Packer, Nicolle H; Ståhle, Jonas; Widmalm, Göran; Mullen, Elaine; Adamczyk, Barbara; Rojas-Macias, Miguel A; Jin, Chunsheng; Karlsson, Niclas G

    2017-01-01

    The access to biodatabases for glycomics and glycoproteomics has proven to be essential for current glycobiological research. This chapter presents available databases that are devoted to different aspects of glycobioinformatics. This includes oligosaccharide sequence databases, experimental databases, 3D structure databases (of both glycans and glycorelated proteins) and association of glycans with tissue, disease, and proteins. Specific search protocols are also provided using tools associated with experimental databases for converting primary glycoanalytical data to glycan structural information. In particular, researchers using glycoanalysis methods by U/HPLC (GlycoBase), MS (GlycoWorkbench, UniCarb-DB, GlycoDigest), and NMR (CASPER) will benefit from this chapter. In addition we also include information on how to utilize glycan structural information to query databases that associate glycans with proteins (UniCarbKB) and with interactions with pathogens (SugarBind).

  7. Differences in expression of retinal pigment epithelium mRNA between normal canines

    PubMed Central

    2004-01-01

    Abstract A reference database of differences in mRNA expression in normal healthy canine retinal pigment epithelium (RPE) has been established. This database identifies non-informative differences in mRNA expression that can be used in screening canine RPE for mutations associated with clinical effects on vision. Complementary DNA (cDNA) pools were prepared from mRNA harvested from RPE, amplified by PCR, and used in a subtractive hybridization protocol (representational differential analysis) to identify differences in RPE mRNA expression between canines. The effect of relatedness of the test canines on the frequency of occurrence of differences was evaluated by using 2 unrelated canines for comparison with 2 female sibling canines of blue heeler/bull terrier lineage. Differentially expressed cDNA species were cloned, sequenced, and identified by comparison to public database entries. The most frequently observed differentially expressed sequence from the unrelated canine comparison was cDNA with 21 base pairs (bp) identical to the human epithelial membrane protein 1 gene (present in 8 of 20 clones). Different clones from the same-sex sibling RPE contained repetitions of several short sequence motifs including the human epithelial membrane protein 1 (4 of 25 clones). Other prevalent differences between sibling RPE included sequences similar to a chicken genetic marker sequence motif (5 of 25), and 6 clones with homology to porcine major histocompatibility loci. In addition to identifying several repetitively occurring, noninformative, differentially expressed RPE mRNA species, the findings confirm that fewer differences occurred between siblings, highlighting the importance of using closely related subjects in representational difference analysis studies. PMID:15352545

  8. Proteome of Caulobacter crescentus cell cycle publicly accessible on SWICZ server.

    PubMed

    Vohradsky, Jiri; Janda, Ivan; Grünenfelder, Björn; Berndt, Peter; Röder, Daniel; Langen, Hanno; Weiser, Jaroslav; Jenal, Urs

    2003-10-01

    Here we present the Swiss-Czech Proteomics Server (SWICZ), which hosts the proteomic database summarizing information about the cell cycle of the aquatic bacterium Caulobacter crescentus. The database provides a searchable tool for easy access of global protein synthesis and protein stability data as examined during the C. crescentus cell cycle. Protein synthesis data collected from five different cell cycle stages were determined for each protein spot as a relative value of the total amount of [(35)S]methionine incorporation. Protein stability of pulse-labeled extracts were measured during a chase period equivalent to one cell cycle unit. Quantitative information for individual proteins together with descriptive data such as protein identities, apparent molecular masses and isoelectric points, were combined with information on protein function, genomic context, and the cell cycle stage, and were then assembled in a relational database with a world wide web interface (http://proteom.biomed.cas.cz), which allows the database records to be searched and displays the recovered information. A total of 1250 protein spots were reproducibly detected on two-dimensional gel electropherograms, 295 of which were identified by mass spectroscopy. The database is accessible either through clickable two-dimensional gel electrophoretic maps or by means of a set of dedicated search engines. Basic characterization of the experimental procedures, data processing, and a comprehensive description of the web site are presented. In its current state, the SWICZ proteome database provides a platform for the incorporation of new data emerging from extended functional studies on the C. crescentus proteome.

  9. Patterns of HIV-1 Protein Interaction Identify Perturbed Host-Cellular Subsystems

    PubMed Central

    MacPherson, Jamie I.; Dickerson, Jonathan E.; Pinney, John W.; Robertson, David L.

    2010-01-01

    Human immunodeficiency virus type 1 (HIV-1) exploits a diverse array of host cell functions in order to replicate. This is mediated through a network of virus-host interactions. A variety of recent studies have catalogued this information. In particular the HIV-1, Human Protein Interaction Database (HHPID) has provided a unique depth of protein interaction detail. However, as a map of HIV-1 infection, the HHPID is problematic, as it contains curation error and redundancy; in addition, it is based on a heterogeneous set of experimental methods. Based on identifying shared patterns of HIV-host interaction, we have developed a novel methodology to delimit the core set of host-cellular functions and their associated perturbation from the HHPID. Initially, using biclustering, we identify 279 significant sets of host proteins that undergo the same types of interaction. The functional cohesiveness of these protein sets was validated using a human protein-protein interaction network, gene ontology annotation and sequence similarity. Next, using a distance measure, we group host protein sets and identify 37 distinct higher-level subsystems. We further demonstrate the biological significance of these subsystems by cross-referencing with global siRNA screens that have been used to detect host factors necessary for HIV-1 replication, and investigate the seemingly small intersect between these data sets. Our results highlight significant host-cell subsystems that are perturbed during the course of HIV-1 infection. Moreover, we characterise the patterns of interaction that contribute to these perturbations. Thus, our work disentangles the complex set of HIV-1-host protein interactions in the HHPID, reconciles these with siRNA screens and provides an accessible and interpretable map of infection. PMID:20686668

  10. Rule Mining Techniques to Predict Prokaryotic Metabolic Pathways.

    PubMed

    Saidi, Rabie; Boudellioua, Imane; Martin, Maria J; Solovyev, Victor

    2017-01-01

    It is becoming more evident that computational methods are needed for the identification and the mapping of pathways in new genomes. We introduce an automatic annotation system (ARBA4Path Association Rule-Based Annotator for Pathways) that utilizes rule mining techniques to predict metabolic pathways across wide range of prokaryotes. It was demonstrated that specific combinations of protein domains (recorded in our rules) strongly determine pathways in which proteins are involved and thus provide information that let us very accurately assign pathway membership (with precision of 0.999 and recall of 0.966) to proteins of a given prokaryotic taxon. Our system can be used to enhance the quality of automatically generated annotations as well as annotating proteins with unknown function. The prediction models are represented in the form of human-readable rules, and they can be used effectively to add absent pathway information to many proteins in UniProtKB/TrEMBL database.

  11. The 86-kilodalton antigen from Schistosoma mansoni is a heat-shock protein homologous to yeast HSP-90.

    PubMed

    Johnson, K S; Wells, K; Bock, J V; Nene, V; Taylor, D W; Cordingley, J S

    1989-08-01

    We report the sequence of a cDNA clone encoding an 86-kDa polypeptide antigen (p86) from Schistosoma mansoni. Fusion proteins made in Escherichia coli are recognized by human infection sera. The reading frame of this antigen is highly homologous to those of the large heat-shock proteins of Saccharomyces cerevisiae (HSP90) and Drosophila melanogaster (HSP83). mRNA encoding p86 increases in response to heat shock of adult worms, as does HSP70. Comparisons of the sequences of HSP70 and HSP83 homologues show that these two families of heat-shock proteins are not significantly related except for the last four amino acid residues, which are Glu-Glu-Val-Asp in every case. This sequence is not found at the carboxy terminus of any other protein in the current databases.

  12. Literature Mining of Pathogenesis-Related Proteins in Human Pathogens for Database Annotation

    DTIC Science & Technology

    2009-10-01

    person shall be subject to any penalty for failing to comply with a collection of information if it does not display a currently valid OMB control...submission and for literature mining result display with automatically tagged abstracts. I. Literature data sets for machine learning algorithm training...mass spectrometry) proteomics data from Burkholderia strains. • Task1 ( M13 -15): Preliminary analysis of the Burkholderia proteomic space

  13. Comprehensive Proteomic Analysis of Human Milk-derived Extracellular Vesicles Unveils a Novel Functional Proteome Distinct from Other Milk Components*

    PubMed Central

    van Herwijnen, Martijn J.C.; Zonneveld, Marijke I.; Goerdayal, Soenita; Nolte – 't Hoen, Esther N.M.; Garssen, Johan; Stahl, Bernd; Maarten Altelaar, A.F.; Redegeld, Frank A.; Wauben, Marca H.M.

    2016-01-01

    Breast milk contains several macromolecular components with distinctive functions, whereby milk fat globules and casein micelles mainly provide nutrition to the newborn, and whey contains molecules that can stimulate the newborn's developing immune system and gastrointestinal tract. Although extracellular vesicles (EV) have been identified in breast milk, their physiological function and composition has not been addressed in detail. EV are submicron sized vehicles released by cells for intercellular communication via selectively incorporated lipids, nucleic acids, and proteins. Because of the difficulty in separating EV from other milk components, an in-depth analysis of the proteome of human milk-derived EV is lacking. In this study, an extensive LC-MS/MS proteomic analysis was performed of EV that had been purified from breast milk of seven individual donors using a recently established, optimized density-gradient-based EV isolation protocol. A total of 1963 proteins were identified in milk-derived EV, including EV-associated proteins like CD9, Annexin A5, and Flotillin-1, with a remarkable overlap between the different donors. Interestingly, 198 of the identified proteins are not present in the human EV database Vesiclepedia, indicating that milk-derived EV harbor proteins not yet identified in EV of different origin. Similarly, the proteome of milk-derived EV was compared with that of other milk components. For this, data from 38 published milk proteomic studies were combined in order to construct the total milk proteome, which consists of 2698 unique proteins. Remarkably, 633 proteins identified in milk-derived EV have not yet been identified in human milk to date. Interestingly, these novel proteins include proteins involved in regulation of cell growth and controlling inflammatory signaling pathways, suggesting that milk-derived EVs could support the newborn's developing gastrointestinal tract and immune system. Overall, this study provides an expansion of the whole milk proteome and illustrates that milk-derived EV are macromolecular components with a unique functional proteome. PMID:27601599

  14. Proteasix: a tool for automated and large-scale prediction of proteases involved in naturally occurring peptide generation.

    PubMed

    Klein, Julie; Eales, James; Zürbig, Petra; Vlahou, Antonia; Mischak, Harald; Stevens, Robert

    2013-04-01

    In this study, we have developed Proteasix, an open-source peptide-centric tool that can be used to predict in silico the proteases involved in naturally occurring peptide generation. We developed a curated cleavage site (CS) database, containing 3500 entries about human protease/CS combinations. On top of this database, we built a tool, Proteasix, which allows CS retrieval and protease associations from a list of peptides. To establish the proof of concept of the approach, we used a list of 1388 peptides identified from human urine samples, and compared the prediction to the analysis of 1003 randomly generated amino acid sequences. Metalloprotease activity was predominantly involved in urinary peptide generation, and more particularly to peptides associated with extracellular matrix remodelling, compared to proteins from other origins. In comparison, random sequences returned almost no results, highlighting the specificity of the prediction. This study provides a tool that can facilitate linking of identified protein fragments to predicted protease activity, and therefore into presumed mechanisms of disease. Experiments are needed to confirm the in silico hypotheses; nevertheless, this approach may be of great help to better understand molecular mechanisms of disease, and define new biomarkers, and therapeutic targets. © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  15. A three-way approach for protein function classification

    PubMed Central

    2017-01-01

    The knowledge of protein functions plays an essential role in understanding biological cells and has a significant impact on human life in areas such as personalized medicine, better crops and improved therapeutic interventions. Due to expense and inherent difficulty of biological experiments, intelligent methods are generally relied upon for automatic assignment of functions to proteins. The technological advancements in the field of biology are improving our understanding of biological processes and are regularly resulting in new features and characteristics that better describe the role of proteins. It is inevitable to neglect and overlook these anticipated features in designing more effective classification techniques. A key issue in this context, that is not being sufficiently addressed, is how to build effective classification models and approaches for protein function prediction by incorporating and taking advantage from the ever evolving biological information. In this article, we propose a three-way decision making approach which provides provisions for seeking and incorporating future information. We considered probabilistic rough sets based models such as Game-Theoretic Rough Sets (GTRS) and Information-Theoretic Rough Sets (ITRS) for inducing three-way decisions. An architecture of protein functions classification with probabilistic rough sets based three-way decisions is proposed and explained. Experiments are carried out on Saccharomyces cerevisiae species dataset obtained from Uniprot database with the corresponding functional classes extracted from the Gene Ontology (GO) database. The results indicate that as the level of biological information increases, the number of deferred cases are reduced while maintaining similar level of accuracy. PMID:28234929

  16. A three-way approach for protein function classification.

    PubMed

    Ur Rehman, Hafeez; Azam, Nouman; Yao, JingTao; Benso, Alfredo

    2017-01-01

    The knowledge of protein functions plays an essential role in understanding biological cells and has a significant impact on human life in areas such as personalized medicine, better crops and improved therapeutic interventions. Due to expense and inherent difficulty of biological experiments, intelligent methods are generally relied upon for automatic assignment of functions to proteins. The technological advancements in the field of biology are improving our understanding of biological processes and are regularly resulting in new features and characteristics that better describe the role of proteins. It is inevitable to neglect and overlook these anticipated features in designing more effective classification techniques. A key issue in this context, that is not being sufficiently addressed, is how to build effective classification models and approaches for protein function prediction by incorporating and taking advantage from the ever evolving biological information. In this article, we propose a three-way decision making approach which provides provisions for seeking and incorporating future information. We considered probabilistic rough sets based models such as Game-Theoretic Rough Sets (GTRS) and Information-Theoretic Rough Sets (ITRS) for inducing three-way decisions. An architecture of protein functions classification with probabilistic rough sets based three-way decisions is proposed and explained. Experiments are carried out on Saccharomyces cerevisiae species dataset obtained from Uniprot database with the corresponding functional classes extracted from the Gene Ontology (GO) database. The results indicate that as the level of biological information increases, the number of deferred cases are reduced while maintaining similar level of accuracy.

  17. KnotProt: a database of proteins with knots and slipknots

    PubMed Central

    Jamroz, Michal; Niemyska, Wanda; Rawdon, Eric J.; Stasiak, Andrzej; Millett, Kenneth C.; Sułkowski, Piotr; Sulkowska, Joanna I.

    2015-01-01

    The protein topology database KnotProt, http://knotprot.cent.uw.edu.pl/, collects information about protein structures with open polypeptide chains forming knots or slipknots. The knotting complexity of the cataloged proteins is presented in the form of a matrix diagram that shows users the knot type of the entire polypeptide chain and of each of its subchains. The pattern visible in the matrix gives the knotting fingerprint of a given protein and permits users to determine, for example, the minimal length of the knotted regions (knot's core size) or the depth of a knot, i.e. how many amino acids can be removed from either end of the cataloged protein structure before converting it from a knot to a different type of knot. In addition, the database presents extensive information about the biological functions, families and fold types of proteins with non-trivial knotting. As an additional feature, the KnotProt database enables users to submit protein or polymer chains and generate their knotting fingerprints. PMID:25361973

  18. Database resources of the National Center for Biotechnology Information.

    PubMed

    Wheeler, David L; Barrett, Tanya; Benson, Dennis A; Bryant, Stephen H; Canese, Kathi; Chetvernin, Vyacheslav; Church, Deanna M; DiCuccio, Michael; Edgar, Ron; Federhen, Scott; Geer, Lewis Y; Kapustin, Yuri; Khovayko, Oleg; Landsman, David; Lipman, David J; Madden, Thomas L; Maglott, Donna R; Ostell, James; Miller, Vadim; Pruitt, Kim D; Schuler, Gregory D; Sequeira, Edwin; Sherry, Steven T; Sirotkin, Karl; Souvorov, Alexandre; Starchenko, Grigory; Tatusov, Roman L; Tatusova, Tatiana A; Wagner, Lukas; Yaschenko, Eugene

    2007-01-01

    In addition to maintaining the GenBank nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data made available through NCBI's Web site. NCBI resources include Entrez, the Entrez Programming Utilities, My NCBI, PubMed, PubMed Central, Entrez Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link(BLink), Electronic PCR, OrfFinder, Spidey, Splign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, Cancer Chromosomes, Entrez Genome, Genome Project and related tools, the Trace and Assembly Archives, the Map Viewer, Model Maker, Evidence Viewer, Clusters of Orthologous Groups (COGs), Viral Genotyping Tools, Influenza Viral Resources, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus (GEO), Entrez Probe, GENSAT, Online Mendelian Inheritance in Man (OMIM), Online Mendelian Inheritance in Animals (OMIA), the Molecular Modeling Database (MMDB), the Conserved Domain Database (CDD), the Conserved Domain Architecture Retrieval Tool (CDART) and the PubChem suite of small molecule databases. Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. These resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.

  19. Database resources of the National Center for Biotechnology Information.

    PubMed

    Sayers, Eric W; Barrett, Tanya; Benson, Dennis A; Bryant, Stephen H; Canese, Kathi; Chetvernin, Vyacheslav; Church, Deanna M; DiCuccio, Michael; Edgar, Ron; Federhen, Scott; Feolo, Michael; Geer, Lewis Y; Helmberg, Wolfgang; Kapustin, Yuri; Landsman, David; Lipman, David J; Madden, Thomas L; Maglott, Donna R; Miller, Vadim; Mizrachi, Ilene; Ostell, James; Pruitt, Kim D; Schuler, Gregory D; Sequeira, Edwin; Sherry, Stephen T; Shumway, Martin; Sirotkin, Karl; Souvorov, Alexandre; Starchenko, Grigory; Tatusova, Tatiana A; Wagner, Lukas; Yaschenko, Eugene; Ye, Jian

    2009-01-01

    In addition to maintaining the GenBank nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data made available through the NCBI web site. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central, Entrez Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Electronic PCR, OrfFinder, Spidey, Splign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, Cancer Chromosomes, Entrez Genomes and related tools, the Map Viewer, Model Maker, Evidence Viewer, Clusters of Orthologous Groups (COGs), Retroviral Genotyping Tools, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus (GEO), Entrez Probe, GENSAT, Online Mendelian Inheritance in Man (OMIM), Online Mendelian Inheritance in Animals (OMIA), the Molecular Modeling Database (MMDB), the Conserved Domain Database (CDD), the Conserved Domain Architecture Retrieval Tool (CDART) and the PubChem suite of small molecule databases. Augmenting many of the web applications is custom implementation of the BLAST program optimized to search specialized data sets. All of the resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.

  20. Integrated Controlling System and Unified Database for High Throughput Protein Crystallography Experiments

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gaponov, Yu.A.; Igarashi, N.; Hiraki, M.

    2004-05-12

    An integrated controlling system and a unified database for high throughput protein crystallography experiments have been developed. Main features of protein crystallography experiments (purification, crystallization, crystal harvesting, data collection, data processing) were integrated into the software under development. All information necessary to perform protein crystallography experiments is stored (except raw X-ray data that are stored in a central data server) in a MySQL relational database. The database contains four mutually linked hierarchical trees describing protein crystals, data collection of protein crystal and experimental data processing. A database editor was designed and developed. The editor supports basic database functions to view,more » create, modify and delete user records in the database. Two search engines were realized: direct search of necessary information in the database and object oriented search. The system is based on TCP/IP secure UNIX sockets with four predefined sending and receiving behaviors, which support communications between all connected servers and clients with remote control functions (creating and modifying data for experimental conditions, data acquisition, viewing experimental data, and performing data processing). Two secure login schemes were designed and developed: a direct method (using the developed Linux clients with secure connection) and an indirect method (using the secure SSL connection using secure X11 support from any operating system with X-terminal and SSH support). A part of the system has been implemented on a new MAD beam line, NW12, at the Photon Factory Advanced Ring for general user experiments.« less

  1. The Comparative Toxicogenomics Database (CTD): A Resource for Comparative Toxicological Studies

    PubMed Central

    CJ, Mattingly; MC, Rosenstein; GT, Colby; JN, Forrest; JL, Boyer

    2006-01-01

    The etiology of most chronic diseases involves interactions between environmental factors and genes that modulate important biological processes (Olden and Wilson, 2000). We are developing the publicly available Comparative Toxicogenomics Database (CTD) to promote understanding about the effects of environmental chemicals on human health. CTD identifies interactions between chemicals and genes and facilitates cross-species comparative studies of these genes. The use of diverse animal models and cross-species comparative sequence studies has been critical for understanding basic physiological mechanisms and gene and protein functions. Similarly, these approaches will be valuable for exploring the molecular mechanisms of action of environmental chemicals and the genetic basis of differential susceptibility. PMID:16902965

  2. EUCLID: automatic classification of proteins in functional classes by their database annotations.

    PubMed

    Tamames, J; Ouzounis, C; Casari, G; Sander, C; Valencia, A

    1998-01-01

    A tool is described for the automatic classification of sequences in functional classes using their database annotations. The Euclid system is based on a simple learning procedure from examples provided by human experts. Euclid is freely available for academics at http://www.gredos.cnb.uam.es/EUCLID, with the corresponding dictionaries for the generation of three, eight and 14 functional classes. E-mail: valencia@cnb.uam.es The results of the EUCLID classification of different genomes are available at http://www.sander.ebi.ac. uk/genequiz/. A detailed description of the different applications mentioned in the text is available at http://www.gredos.cnb.uam. es/EUCLID/Full_Paper

  3. Development of proteome-wide binding reagents for research and diagnostics.

    PubMed

    Taussig, Michael J; Schmidt, Ronny; Cook, Elizabeth A; Stoevesandt, Oda

    2013-12-01

    Alongside MS, antibodies and other specific protein-binding molecules have a special place in proteomics as affinity reagents in a toolbox of applications for determining protein location, quantitative distribution and function (affinity proteomics). The realisation that the range of research antibodies available, while apparently vast is nevertheless still very incomplete and frequently of uncertain quality, has stimulated projects with an objective of raising comprehensive, proteome-wide sets of protein binders. With progress in automation and throughput, a remarkable number of recent publications refer to the practical possibility of selecting binders to every protein encoded in the genome. Here we review the requirements of a pipeline of production of protein binders for the human proteome, including target prioritisation, antigen design, 'next generation' methods, databases and the approaches taken by ongoing projects in Europe and the USA. While the task of generating affinity reagents for all human proteins is complex and demanding, the benefits of well-characterised and quality-controlled pan-proteome binder resources for biomedical research, industry and life sciences in general would be enormous and justify the effort. Given the technical, personnel and financial resources needed to fulfil this aim, expansion of current efforts may best be addressed through large-scale international collaboration. © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  4. Needles in the EST Haystack: Large-Scale Identification and Analysis of Excretory-Secretory (ES) Proteins in Parasitic Nematodes Using Expressed Sequence Tags (ESTs)

    PubMed Central

    Nagaraj, Shivashankar H.; Gasser, Robin B.; Ranganathan, Shoba

    2008-01-01

    Background Parasitic nematodes of humans, other animals and plants continue to impose a significant public health and economic burden worldwide, due to the diseases they cause. Promising antiparasitic drug and vaccine candidates have been discovered from excreted or secreted (ES) proteins released from the parasite and exposed to the immune system of the host. Mining the entire expressed sequence tag (EST) data available from parasitic nematodes represents an approach to discover such ES targets. Methods and Findings In this study, we predicted, using EST2Secretome, a novel, high-throughput, computational workflow system, 4,710 ES proteins from 452,134 ESTs derived from 39 different species of nematodes, parasitic in animals (including humans) or plants. In total, 2,632, 786, and 1,292 ES proteins were predicted for animal-, human-, and plant-parasitic nematodes. Subsequently, we systematically analysed ES proteins using computational methods. Of these 4,710 proteins, 2,490 (52.8%) had orthologues in Caenorhabditis elegans, whereas 621 (13.8%) appeared to be novel, currently having no significant match to any molecule available in public databases. Of the C. elegans homologues, 267 had strong “loss-of-function” phenotypes by RNA interference (RNAi) in this nematode. We could functionally classify 1,948 (41.3%) sequences using the Gene Ontology (GO) terms, establish pathway associations for 573 (12.2%) sequences using Kyoto Encyclopaedia of Genes and Genomes (KEGG), and identify protein interaction partners for 1,774 (37.6%) molecules. We also mapped 758 (16.1%) proteins to protein domains including the nematode-specific protein family “transthyretin-like” and “chromadorea ALT,” considered as vaccine candidates against filariasis in humans. Conclusions We report the large-scale analysis of ES proteins inferred from EST data for a range of parasitic nematodes. This set of ES proteins provides an inventory of known and novel members of ES proteins as a foundation for studies focused on understanding the biology of parasitic nematodes and their interactions with their hosts, as well as for the development of novel drugs or vaccines for parasite intervention and control. PMID:18820748

  5. Identification of functional candidates amongst hypothetical proteins of Treponema pallidum ssp. pallidum.

    PubMed

    Naqvi, Ahmad Abu Turab; Shahbaaz, Mohd; Ahmad, Faizan; Hassan, Md Imtaiyaz

    2015-01-01

    Syphilis is a globally occurring venereal disease, and its infection is propagated through sexual contact. The causative agent of syphilis, Treponema pallidum ssp. pallidum, a Gram-negative sphirochaete, is an obligate human parasite. Genome of T. pallidum ssp. pallidum SS14 strain (RefSeq NC_010741.1) encodes 1,027 proteins, of which 444 proteins are known as hypothetical proteins (HPs), i.e., proteins of unknown functions. Here, we performed functional annotation of HPs of T. pallidum ssp. pallidum using various database, domain architecture predictors, protein function annotators and clustering tools. We have analyzed the sequences of 444 HPs of T. pallidum ssp. pallidum and subsequently predicted the function of 207 HPs with a high level of confidence. However, functions of 237 HPs are predicted with less accuracy. We found various enzymes, transporters, binding proteins in the annotated group of HPs that may be possible molecular targets, facilitating for the survival of pathogen. Our comprehensive analysis helps to understand the mechanism of pathogenesis to provide many novel potential therapeutic interventions.

  6. The top skin-associated genes: a comparative analysis of human and mouse skin transcriptomes.

    PubMed

    Gerber, Peter Arne; Buhren, Bettina Alexandra; Schrumpf, Holger; Homey, Bernhard; Zlotnik, Albert; Hevezi, Peter

    2014-06-01

    The mouse represents a key model system for the study of the physiology and biochemistry of skin. Comparison of skin between mouse and human is critical for interpretation and application of data from mouse experiments to human disease. Here, we review the current knowledge on structure and immunology of mouse and human skin. Moreover, we present a systematic comparison of human and mouse skin transcriptomes. To this end, we have recently used a genome-wide database of human gene expression to identify genes highly expressed in skin, with no, or limited expression elsewhere - human skin-associated genes (hSAGs). Analysis of our set of hSAGs allowed us to generate a comprehensive molecular characterization of healthy human skin. Here, we used a similar database to generate a list of mouse skin-associated genes (mSAGs). A comparative analysis between the top human (n=666) and mouse (n=873) skin-associated genes (SAGs) revealed a total of only 30.2% identity between the two lists. The majority of shared genes encode proteins that participate in structural and barrier functions. Analysis of the top functional annotation terms revealed an overlap for morphogenesis, cell adhesion, structure, and signal transduction. The results of this analysis, discussed in the context of published data, illustrate the diversity between the molecular make up of skin of both species and grants a probable explanation, why results generated in murine in vivo models often fail to translate into the human.

  7. GRBase, a new gene regulation data base available by anonymous ftp.

    PubMed Central

    Collier, B; Danielsen, M

    1994-01-01

    The Gene Regulation Database (GRBase) is a compendium of information on the structure and function of proteins involved in the control of gene expression in eukaryotes. These proteins include transcription factors, proteins involved in signal transduction, and receptors. The database can be obtained by FTP in Filemaker Pro, text, and postscript formats. The database will be expanded in the coming year to include reviews on families of proteins involved in gene regulation and to allow online searching. PMID:7937071

  8. Adenovirus 36 DNA in human adipose tissue.

    PubMed

    Ponterio, E; Cangemi, R; Mariani, S; Casella, G; De Cesare, A; Trovato, F M; Garozzo, A; Gnessi, L

    2015-12-01

    Recent studies have suggested a possible correlation between obesity and adenovirus 36 (Adv36) infection in humans. As information on adenoviral DNA presence in human adipose tissue are limited, we evaluated the presence of Adv36 DNA in adipose tissue of 21 adult overweight or obese patients. Total DNA was extracted from adipose tissue biopsies. Virus detection was performed using PCR protocols with primers against specific Adv36 fiber protein and the viral oncogenic E4orf1 protein nucleotide sequences. Sequences were aligned with the NCBI database and phylogenetic analyses were carried out with MEGA6 software. Adv36 DNA was found in four samples (19%). This study indicates that some individuals carry Adv36 in the visceral adipose tissue. Further studies are needed to determine the specific effect of Adv36 infection on adipocytes, the prevalence of Adv36 infection and its relationship with obesity in the perspective of developing a vaccine that could potentially prevent or mitigate infection.

  9. The UniProtKB guide to the human proteome

    PubMed Central

    Breuza, Lionel; Poux, Sylvain; Estreicher, Anne; Famiglietti, Maria Livia; Magrane, Michele; Tognolli, Michael; Bridge, Alan; Baratin, Delphine; Redaschi, Nicole

    2016-01-01

    Advances in high-throughput and advanced technologies allow researchers to routinely perform whole genome and proteome analysis. For this purpose, they need high-quality resources providing comprehensive gene and protein sets for their organisms of interest. Using the example of the human proteome, we will describe the content of a complete proteome in the UniProt Knowledgebase (UniProtKB). We will show how manual expert curation of UniProtKB/Swiss-Prot is complemented by expert-driven automatic annotation to build a comprehensive, high-quality and traceable resource. We will also illustrate how the complexity of the human proteome is captured and structured in UniProtKB. Database URL: www.uniprot.org PMID:26896845

  10. Mouse cysteine-rich secretory protein 4 (CRISP4): a member of the Crisp family exclusively expressed in the epididymis in an androgen-dependent manner.

    PubMed

    Jalkanen, Jenni; Huhtaniemi, Ilpo; Poutanen, Matti

    2005-05-01

    The final maturation of spermatozoa produced in the testis takes place during their passage through the epididymis. In this process, the proteins secreted into the epididymal lumen along with changes in the pH and salt composition of the epididymal fluid cause several biochemical changes and remodeling of the sperm plasma membrane. The Crisp family is a group of cysteine-rich secretory proteins that previously consisted of three members, one of which-CRISP1-is an epididymal protein shown to attach to the sperm surface in the epididymal lumen and to inhibit gamete membrane fusion. In the present paper, we introduce a new member of the Crisp protein family, CRISP4. The new gene was discovered through in silico analysis of the epididymal expressed sequence tag library deposited in the UniGene database. The peptide sequence of CRISP4 has a signal sequence suggesting that it is secreted into the epididymal lumen and might thus interact with sperm. Unlike the other members of the family, Crisp4 is located on chromosome 1 in a cluster of genes encoding for cysteine-rich proteins. Crisp4 is expressed in the mouse exclusively in epithelial cells of the epididymis in an androgen-dependent manner, and the expression of the gene starts at puberty along with the onset of sperm maturation. The identified murine CRISP4 peptide has high homology with human CRISP1, and the homology is higher than that between murine and human CRISP1, suggesting that CRISP4 represents the mouse counterpart of human CRISP1 and could have similar effects on sperm membrane as mouse and human CRISP1.

  11. Systematic asymmetric nucleotide exchanges produce human mitochondrial RNAs cryptically encoding for overlapping protein coding genes.

    PubMed

    Seligmann, Hervé

    2013-05-07

    GenBank's EST database includes RNAs matching exactly human mitochondrial sequences assuming systematic asymmetric nucleotide exchange-transcription along exchange rules: A→G→C→U/T→A (12 ESTs), A→U/T→C→G→A (4 ESTs), C→G→U/T→C (3 ESTs), and A→C→G→U/T→A (1 EST), no RNAs correspond to other potential asymmetric exchange rules. Hypothetical polypeptides translated from nucleotide-exchanged human mitochondrial protein coding genes align with numerous GenBank proteins, predicted secondary structures resemble their putative GenBank homologue's. Two independent methods designed to detect overlapping genes (one based on nucleotide contents analyses in relation to replicative deamination gradients at third codon positions, and circular code analyses of codon contents based on frame redundancy), confirm nucleotide-exchange-encrypted overlapping genes. Methods converge on which genes are most probably active, and which not, and this for the various exchange rules. Mean EST lengths produced by different nucleotide exchanges are proportional to (a) extents that various bioinformatics analyses confirm the protein coding status of putative overlapping genes; (b) known kinetic chemistry parameters of the corresponding nucleotide substitutions by the human mitochondrial DNA polymerase gamma (nucleotide DNA misinsertion rates); (c) stop codon densities in predicted overlapping genes (stop codon readthrough and exchanging polymerization regulate gene expression by counterbalancing each other). Numerous rarely expressed proteins seem encoded within regular mitochondrial genes through asymmetric nucleotide exchange, avoiding lengthening genomes. Intersecting evidence between several independent approaches confirms the working hypothesis status of gene encryption by systematic nucleotide exchanges. Copyright © 2013 Elsevier Ltd. All rights reserved.

  12. PANTHER-PSEP: predicting disease-causing genetic variants using position-specific evolutionary preservation.

    PubMed

    Tang, Haiming; Thomas, Paul D

    2016-07-15

    PANTHER-PSEP is a new software tool for predicting non-synonymous genetic variants that may play a causal role in human disease. Several previous variant pathogenicity prediction methods have been proposed that quantify evolutionary conservation among homologous proteins from different organisms. PANTHER-PSEP employs a related but distinct metric based on 'evolutionary preservation': homologous proteins are used to reconstruct the likely sequences of ancestral proteins at nodes in a phylogenetic tree, and the history of each amino acid can be traced back in time from its current state to estimate how long that state has been preserved in its ancestors. Here, we describe the PSEP tool, and assess its performance on standard benchmarks for distinguishing disease-associated from neutral variation in humans. On these benchmarks, PSEP outperforms not only previous tools that utilize evolutionary conservation, but also several highly used tools that include multiple other sources of information as well. For predicting pathogenic human variants, the trace back of course starts with a human 'reference' protein sequence, but the PSEP tool can also be applied to predicting deleterious or pathogenic variants in reference proteins from any of the ∼100 other species in the PANTHER database. PANTHER-PSEP is freely available on the web at http://pantherdb.org/tools/csnpScoreForm.jsp Users can also download the command-line based tool at ftp://ftp.pantherdb.org/cSNP_analysis/PSEP/ CONTACT: pdthomas@usc.edu Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  13. Cloning and identification of a cDNA that encodes a novel human protein with thrombospondin type I repeat domain, hPWTSR.

    PubMed

    Chen, Jin-Zhong; Wang, Shu; Tang, Rong; Yang, Quan-Sheng; Zhao, Enpeng; Chao, Yaoqiong; Ying, Kang; Xie, Yi; Mao, Yu-Min

    2002-09-01

    A cDNA was isolated from the fetal brain cDNA library by high throughput cDNA sequencing. The 2390 bp cDNA with an open reading fragment (ORF) of 816 bp encodes a 272 amino acids putative protein with a thrombospondin type I repeat (TSR) domain and a cysteine-rich region at the N-terminus, so it is named hPWTSR. We used Northern blot detected two bands with length of about 3 kb and 4 kb respectively, which expressed in human adult tissues with different intensities. The expression pattern was verified by RT-PCR, revealing that the transcripts were expressed ubiquitously in fetal tissues and human tumor tissues too. However, the transcript was detected neither in ovarian carcinoma GI-102 nor in lung carcinoma LX-1. Blast analysis against NCBI database revealed that the new gene contained at least 5 exons and located in human chromosome 6q22.33. Our results demonstrate that the gene is a novel member of TSR supergene family.

  14. Vascular endothelial cells express isoforms of protein kinase A inhibitor.

    PubMed

    Lum, Hazel; Hao, Zengping; Gayle, Dave; Kumar, Priyadarsini; Patterson, Carolyn E; Uhler, Michael D

    2002-01-01

    The expression and function of the endogenous inhibitor of cAMP-dependent protein kinase (PKI) in endothelial cells are unknown. In this study, overexpression of rabbit muscle PKI gene into endothelial cells inhibited the cAMP-mediated increase and exacerbated thrombin-induced decrease in endothelial barrier function. We investigated PKI expression in human pulmonary artery (HPAECs), foreskin microvessel (HMECs), and brain microvessel endothelial cells (HBMECs). RT-PCR using specific primers for human PKI alpha, human PKI gamma, and mouse PKI beta sequences detected PKI alpha and PKI gamma mRNA in all three cell types. Sequencing and BLAST analysis indicated that forward and reverse DNA strands for PKI alpha and PKI gamma were of >96% identity with database sequences. RNase protection assays showed protection of the 542 nucleotides in HBMEC and HPAEC PKI alpha mRNA and 240 nucleotides in HBMEC, HPAEC, and HMEC PKI gamma mRNA. Western blot analysis indicated that PKI gamma protein was detected in all three cell types, whereas PKI alpha was found in HBMECs. In summary, endothelial cells from three different vascular beds express PKI alpha and PKI gamma, which may be physiologically important in endothelial barrier function.

  15. Phenome-genome association studies of pancreatic cancer: new targets for therapy and diagnosis.

    PubMed

    Narayanan, Ramaswamy

    2015-01-01

    Pancreatic cancer, has a very high mortality rate and requires novel molecular targets for diagnosis and therapy. Genetic association studies over databases offer an attractive starting point for gene discovery. The National Center for Biotechnology Information (NCBI) Phenome Genome Integrator (PheGenI) tool was enriched for pancreatic cancer-associated traits. The genes associated with the trait were characterized using diverse bioinformatics tools for Genome-Wide Association (GWA), transcriptome and proteome profile and protein classes for motif and domain. Two hundred twenty-six genes were identified that had a genetic association with pancreatic cancer in the human genome. This included 25 uncharacterized open reading frames (ORFs). Bioinformatics analysis of these ORFs identified putative druggable proteins and biomarkers including enzymes, transporters and G-protein-coupled receptor signaling proteins. Secreted proteins including a neuroendocrine factor and a chemokine were identified. Five out of these ORFs encompassed non coding RNAs. The ORF protein expression was detected in numerous body fluids, such as ascites, bile, pancreatic juice, milk, plasma, serum and saliva. Transcriptome and proteome analyses showed a correlation of mRNA and protein expression for nine ORFs. Analysis of the Catalogue of Somatic Mutations in Cancer (COSMIC) database revealed a strong correlation across copy number variations and mRNA over-expression for four ORFs. Mining of the International Cancer Gene Consortium (ICGC) database identified somatic mutations in a significant number of pancreatic patients' tumors for most of these ORFs. The pancreatic cancer-associated ORFs were also found to be genetically associated with other neoplasms, including leukemia, malignant melanoma, neuroblastoma and prostate carcinomas, as well as other unrelated diseases and disorders, such as Alzheimer's disease, Crohn's disease, coronary diseases, attention deficit disorder and addiction. Based on Genome-Wide Association Studies (GWAS), copy number variations, somatic mutational status and correlation of gene expression in pancreatic tumors at the mRNA and protein level, expression specificity in normal tissues and detection in body fluids, six ORFs emerged as putative leads for pancreatic cancer. These six targets provide a basis for accelerated drug discovery and diagnostic marker development for pancreatic cancer. Copyright© 2015, International Institute of Anticancer Research (Dr. John G. Delinasios), All rights reserved.

  16. Similarity of a 16.5kDa tegumental protein of the human liver fluke Opisthorchis viverrini to nematode cytoplasmic motility protein.

    PubMed

    Labbunruang, Nipawan; Phadungsil, Wansika; Tesana, Smarn; Smooker, Peter M; Grams, Rudi

    2016-05-01

    Opisthorchis viverrini is the causative agent of human opisthorchiasis in Thailand and long lasting infection with the parasite has been correlated with the development of cholangiocarcinoma. In this work we have molecularly characterized the first member of a protein family carrying two DM9 repeats in this parasite (OvDM9-1). InterPro and other protein family databases describe the DM9 repeat as a protein domain of unknown function that has been first noted in Drosophila melanogaster. Two paralogous proteins have been partially characterized in the genus Fasciola, Fasciola hepatica TP16.5, a novel tegumental antigen in human fascioliasis and, recently F. gigantica DM9-1, a parenchymal protein with structural similarity to nematode cytoplasmic motility protein (MFP2). In this study, we show further evidence that this family of trematode proteins is related to MFP2 in sequence and structure. Soluble recombinant OvDM9-1 was used for structural analyses and for production of specific antisera. The native protein was detected in soluble and insoluble crude worm extracts and in seemingly various oligomeric forms in the latter. The potential for oligomerization was supported by cross-linking experiments of recombinant OvDM9-1. Structure prediction suggested a β-rich secondary structure of the protein and this was supported by a circular dichroism analysis. Molecular modeling in Phyre2 identified both MFP2 domains as distant homologs of OvDM9-1. The protein was located in tegumental type tissue and the cecal epithelium in the mature parasite. Recombinant OvDM9-1 was used as target in indirect ELISA but sera from infected hamsters showed only marginal reactivity towards it. It is proposed that OvDM9-1 and other members of this protein family have a role in cellular transport through functions on the cytoskeleton. Copyright © 2016 Elsevier B.V. All rights reserved.

  17. An expressed sequence tag (EST) data mining strategy succeeding in the discovery of new G-protein coupled receptors.

    PubMed

    Wittenberger, T; Schaller, H C; Hellebrand, S

    2001-03-30

    We have developed a comprehensive expressed sequence tag database search method and used it for the identification of new members of the G-protein coupled receptor superfamily. Our approach proved to be especially useful for the detection of expressed sequence tag sequences that do not encode conserved parts of a protein, making it an ideal tool for the identification of members of divergent protein families or of protein parts without conserved domain structures in the expressed sequence tag database. At least 14 of the expressed sequence tags found with this strategy are promising candidates for new putative G-protein coupled receptors. Here, we describe the sequence and expression analysis of five new members of this receptor superfamily, namely GPR84, GPR86, GPR87, GPR90 and GPR91. We also studied the genomic structure and chromosomal localization of the respective genes applying in silico methods. A cluster of six closely related G-protein coupled receptors was found on the human chromosome 3q24-3q25. It consists of four orphan receptors (GPR86, GPR87, GPR91, and H963), the purinergic receptor P2Y1, and the uridine 5'-diphosphoglucose receptor KIAA0001. It seems likely that these receptors evolved from a common ancestor and therefore might have related ligands. In conclusion, we describe a data mining procedure that proved to be useful for the identification and first characterization of new genes and is well applicable for other gene families. Copyright 2001 Academic Press.

  18. Target identification in Fusobacterium nucleatum by subtractive genomics approach and enrichment analysis of host-pathogen protein-protein interactions.

    PubMed

    Kumar, Amit; Thotakura, Pragna Lakshmi; Tiwary, Basant Kumar; Krishna, Ramadas

    2016-05-12

    Fusobacterium nucleatum, a well studied bacterium in periodontal diseases, appendicitis, gingivitis, osteomyelitis and pregnancy complications has recently gained attention due to its association with colorectal cancer (CRC) progression. Treatment with berberine was shown to reverse F. nucleatum-induced CRC progression in mice by balancing the growth of opportunistic pathogens in tumor microenvironment. Intestinal microbiota imbalance and the infections caused by F. nucleatum might be regulated by therapeutic intervention. Hence, we aimed to predict drug target proteins in F. nucleatum, through subtractive genomics approach and host-pathogen protein-protein interactions (HP-PPIs). We also carried out enrichment analysis of host interacting partners to hypothesize the possible mechanisms involved in CRC progression due to F. nucleatum. In subtractive genomics approach, the essential, virulence and resistance related proteins were retrieved from RefSeq proteome of F. nucleatum by searching against Database of Essential Genes (DEG), Virulence Factor Database (VFDB) and Antibiotic Resistance Gene-ANNOTation (ARG-ANNOT) tool respectively. A subsequent hierarchical screening to identify non-human homologous, metabolic pathway-independent/pathway-specific and druggable proteins resulted in eight pathway-independent and 27 pathway-specific druggable targets. Co-aggregation of F. nucleatum with host induces proinflammatory gene expression thereby potentiates tumorigenesis. Hence, proteins from IBDsite, a database for inflammatory bowel disease (IBD) research and those involved in colorectal adenocarcinoma as interpreted from The Cancer Genome Atlas (TCGA) were retrieved to predict drug targets based on HP-PPIs with F. nucleatum proteome. Prediction of HP-PPIs exhibited 186 interactions contributed by 103 host and 76 bacterial proteins. Bacterial interacting partners were accounted as putative targets. And enrichment analysis of host interacting partners showed statistically enriched terms that were in positive correlation with CRC, atherosclerosis, cardiovascular, osteoporosis, Alzheimer's and other diseases. Subtractive genomics analysis provided a set of target proteins suggested to be indispensable for survival and pathogenicity of F. nucleatum. These target proteins might be considered for designing potent inhibitors to abrogate F. nucleatum infections. From enrichment analysis, it was hypothesized that F. nucleatum infection might enhance CRC progression by simultaneously regulating multiple signaling cascades which could lead to up-regulation of proinflammatory responses, oncogenes, modulation of host immune defense mechanism and suppression of DNA repair system.

  19. Proteomic analysis of cellular soluble proteins from human bronchial smooth muscle cells by combining nondenaturing micro 2DE and quantitative LC-MS/MS. 2. Similarity search between protein maps for the analysis of protein complexes.

    PubMed

    Jin, Ya; Yuan, Qi; Zhang, Jun; Manabe, Takashi; Tan, Wen

    2015-09-01

    Human bronchial smooth muscle cell soluble proteins were analyzed by a combined method of nondenaturing micro 2DE, grid gel-cutting, and quantitative LC-MS/MS and a native protein map was prepared for each of the identified 4323 proteins [1]. A method to evaluate the degree of similarity between the protein maps was developed since we expected the proteins comprising a protein complex would be separated together under nondenaturing conditions. The following procedure was employed using Excel macros; (i) maps that have three or more squares with protein quantity data were selected (2328 maps), (ii) within each map, the quantity values of the squares were normalized setting the highest value to be 1.0, (iii) in comparing a map with another map, the smaller normalized quantity in two corresponding squares was taken and summed throughout the map to give an "overlap score," (iv) each map was compared against all the 2328 maps and the largest overlap score, obtained when a map was compared with itself, was set to be 1.0 thus providing 2328 "overlap factors," (v) step (iv) was repeated for all maps providing 2328 × 2328 matrix of overlap factors. From the matrix, protein pairs that showed overlap factors above 0.65 from both protein sides were selected (431 protein pairs). Each protein pair was searched in a database (UniProtKB) on complex formation and 301 protein pairs, which comprise 35 protein complexes, were found to be documented. These results demonstrated that native protein maps and their similarity search would enable simultaneous analysis of multiple protein complexes in cells. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  20. sc-PDB: a 3D-database of ligandable binding sites—10 years on

    PubMed Central

    Desaphy, Jérémy; Bret, Guillaume; Rognan, Didier; Kellenberger, Esther

    2015-01-01

    The sc-PDB database (available at http://bioinfo-pharma.u-strasbg.fr/scPDB/) is a comprehensive and up-to-date selection of ligandable binding sites of the Protein Data Bank. Sites are defined from complexes between a protein and a pharmacological ligand. The database provides the all-atom description of the protein, its ligand, their binding site and their binding mode. Currently, the sc-PDB archive registers 9283 binding sites from 3678 unique proteins and 5608 unique ligands. The sc-PDB database was publicly launched in 2004 with the aim of providing structure files suitable for computational approaches to drug design, such as docking. During the last 10 years we have improved and standardized the processes for (i) identifying binding sites, (ii) correcting structures, (iii) annotating protein function and ligand properties and (iv) characterizing their binding mode. This paper presents the latest enhancements in the database, specifically pertaining to the representation of molecular interaction and to the similarity between ligand/protein binding patterns. The new website puts emphasis in pictorial analysis of data. PMID:25300483

  1. dbCPG: A web resource for cancer predisposition genes.

    PubMed

    Wei, Ran; Yao, Yao; Yang, Wu; Zheng, Chun-Hou; Zhao, Min; Xia, Junfeng

    2016-06-21

    Cancer predisposition genes (CPGs) are genes in which inherited mutations confer highly or moderately increased risks of developing cancer. Identification of these genes and understanding the biological mechanisms that underlie them is crucial for the prevention, early diagnosis, and optimized management of cancer. Over the past decades, great efforts have been made to identify CPGs through multiple strategies. However, information on these CPGs and their molecular functions is scattered. To address this issue and provide a comprehensive resource for researchers, we developed the Cancer Predisposition Gene Database (dbCPG, Database URL: http://bioinfo.ahu.edu.cn:8080/dbCPG/index.jsp), the first literature-based gene resource for exploring human CPGs. It contains 827 human (724 protein-coding, 23 non-coding, and 80 unknown type genes), 637 rats, and 658 mouse CPGs. Furthermore, data mining was performed to gain insights into the understanding of the CPGs data, including functional annotation, gene prioritization, network analysis of prioritized genes and overlap analysis across multiple cancer types. A user-friendly web interface with multiple browse, search, and upload functions was also developed to facilitate access to the latest information on CPGs. Taken together, the dbCPG database provides a comprehensive data resource for further studies of cancer predisposition genes.

  2. MiCroKit 3.0: an integrated database of midbody, centrosome and kinetochore.

    PubMed

    Ren, Jian; Liu, Zexian; Gao, Xinjiao; Jin, Changjiang; Ye, Mingliang; Zou, Hanfa; Wen, Longping; Zhang, Zhaolei; Xue, Yu; Yao, Xuebiao

    2010-01-01

    During cell division/mitosis, a specific subset of proteins is spatially and temporally assembled into protein super complexes in three distinct regions, i.e. centrosome/spindle pole, kinetochore/centromere and midbody/cleavage furrow/phragmoplast/bud neck, and modulates cell division process faithfully. Although many experimental efforts have been carried out to investigate the characteristics of these proteins, no integrated database was available. Here, we present the MiCroKit database (http://microkit.biocuckoo.org) of proteins that localize in midbody, centrosome and/or kinetochore. We collected into the MiCroKit database experimentally verified microkit proteins from the scientific literature that have unambiguous supportive evidence for subcellular localization under fluorescent microscope. The current version of MiCroKit 3.0 provides detailed information for 1489 microkit proteins from seven model organisms, including Saccharomyces cerevisiae, Schizasaccharomyces pombe, Caenorhabditis elegans, Drosophila melanogaster, Xenopus laevis, Mus musculus and Homo sapiens. Moreover, the orthologous information was provided for these microkit proteins, and could be a useful resource for further experimental identification. The online service of MiCroKit database was implemented in PHP + MySQL + JavaScript, while the local packages were developed in JAVA 1.5 (J2SE 5.0).

  3. MiCroKit 3.0: an integrated database of midbody, centrosome and kinetochore

    PubMed Central

    Liu, Zexian; Gao, Xinjiao; Jin, Changjiang; Ye, Mingliang; Zou, Hanfa; Wen, Longping; Zhang, Zhaolei; Xue, Yu; Yao, Xuebiao

    2010-01-01

    During cell division/mitosis, a specific subset of proteins is spatially and temporally assembled into protein super complexes in three distinct regions, i.e. centrosome/spindle pole, kinetochore/centromere and midbody/cleavage furrow/phragmoplast/bud neck, and modulates cell division process faithfully. Although many experimental efforts have been carried out to investigate the characteristics of these proteins, no integrated database was available. Here, we present the MiCroKit database (http://microkit.biocuckoo.org) of proteins that localize in midbody, centrosome and/or kinetochore. We collected into the MiCroKit database experimentally verified microkit proteins from the scientific literature that have unambiguous supportive evidence for subcellular localization under fluorescent microscope. The current version of MiCroKit 3.0 provides detailed information for 1489 microkit proteins from seven model organisms, including Saccharomyces cerevisiae, Schizasaccharomyces pombe, Caenorhabditis elegans, Drosophila melanogaster, Xenopus laevis, Mus musculus and Homo sapiens. Moreover, the orthologous information was provided for these microkit proteins, and could be a useful resource for further experimental identification. The online service of MiCroKit database was implemented in PHP + MySQL + JavaScript, while the local packages were developed in JAVA 1.5 (J2SE 5.0). PMID:19783819

  4. In silico re-identification of properties of drug target proteins.

    PubMed

    Kim, Baeksoo; Jo, Jihoon; Han, Jonghyun; Park, Chungoo; Lee, Hyunju

    2017-05-31

    Computational approaches in the identification of drug targets are expected to reduce time and effort in drug development. Advances in genomics and proteomics provide the opportunity to uncover properties of druggable genomes. Although several studies have been conducted for distinguishing drug targets from non-drug targets, they mainly focus on the sequences and functional roles of proteins. Many other properties of proteins have not been fully investigated. Using the DrugBank (version 3.0) database containing nearly 6,816 drug entries including 760 FDA-approved drugs and 1822 of their targets and human UniProt/Swiss-Prot databases, we defined 1578 non-redundant drug target and 17,575 non-drug target proteins. To select these non-redundant protein datasets, we built four datasets (A, B, C, and D) by considering clustering of paralogous proteins. We first reassessed the widely used properties of drug target proteins. We confirmed and extended that drug target proteins (1) are likely to have more hydrophobic, less polar, less PEST sequences, and more signal peptide sequences higher and (2) are more involved in enzyme catalysis, oxidation and reduction in cellular respiration, and operational genes. In this study, we proposed new properties (essentiality, expression pattern, PTMs, and solvent accessibility) for effectively identifying drug target proteins. We found that (1) drug targetability and protein essentiality are decoupled, (2) druggability of proteins has high expression level and tissue specificity, and (3) functional post-translational modification residues are enriched in drug target proteins. In addition, to predict the drug targetability of proteins, we exploited two machine learning methods (Support Vector Machine and Random Forest). When we predicted drug targets by combining previously known protein properties and proposed new properties, an F-score of 0.8307 was obtained. When the newly proposed properties are integrated, the prediction performance is improved and these properties are related to drug targets. We believe that our study will provide a new aspect in inferring drug-target interactions.

  5. PolySearch2: a significantly improved text-mining system for discovering associations between human diseases, genes, drugs, metabolites, toxins and more.

    PubMed

    Liu, Yifeng; Liang, Yongjie; Wishart, David

    2015-07-01

    PolySearch2 (http://polysearch.ca) is an online text-mining system for identifying relationships between biomedical entities such as human diseases, genes, SNPs, proteins, drugs, metabolites, toxins, metabolic pathways, organs, tissues, subcellular organelles, positive health effects, negative health effects, drug actions, Gene Ontology terms, MeSH terms, ICD-10 medical codes, biological taxonomies and chemical taxonomies. PolySearch2 supports a generalized 'Given X, find all associated Ys' query, where X and Y can be selected from the aforementioned biomedical entities. An example query might be: 'Find all diseases associated with Bisphenol A'. To find its answers, PolySearch2 searches for associations against comprehensive collections of free-text collections, including local versions of MEDLINE abstracts, PubMed Central full-text articles, Wikipedia full-text articles and US Patent application abstracts. PolySearch2 also searches 14 widely used, text-rich biological databases such as UniProt, DrugBank and Human Metabolome Database to improve its accuracy and coverage. PolySearch2 maintains an extensive thesaurus of biological terms and exploits the latest search engine technology to rapidly retrieve relevant articles and databases records. PolySearch2 also generates, ranks and annotates associative candidates and present results with relevancy statistics and highlighted key sentences to facilitate user interpretation. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  6. Using Bioinformatic Approaches to Identify Pathways Targeted by Human Leukemogens

    PubMed Central

    Thomas, Reuben; Phuong, Jimmy; McHale, Cliona M.; Zhang, Luoping

    2012-01-01

    We have applied bioinformatic approaches to identify pathways common to chemical leukemogens and to determine whether leukemogens could be distinguished from non-leukemogenic carcinogens. From all known and probable carcinogens classified by IARC and NTP, we identified 35 carcinogens that were associated with leukemia risk in human studies and 16 non-leukemogenic carcinogens. Using data on gene/protein targets available in the Comparative Toxicogenomics Database (CTD) for 29 of the leukemogens and 11 of the non-leukemogenic carcinogens, we analyzed for enrichment of all 250 human biochemical pathways in the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. The top pathways targeted by the leukemogens included metabolism of xenobiotics by cytochrome P450, glutathione metabolism, neurotrophin signaling pathway, apoptosis, MAPK signaling, Toll-like receptor signaling and various cancer pathways. The 29 leukemogens formed 18 distinct clusters comprising 1 to 3 chemicals that did not correlate with known mechanism of action or with structural similarity as determined by 2D Tanimoto coefficients in the PubChem database. Unsupervised clustering and one-class support vector machines, based on the pathway data, were unable to distinguish the 29 leukemogens from 11 non-leukemogenic known and probable IARC carcinogens. However, using two-class random forests to estimate leukemogen and non-leukemogen patterns, we estimated a 76% chance of distinguishing a random leukemogen/non-leukemogen pair from each other. PMID:22851955

  7. PolySearch2: a significantly improved text-mining system for discovering associations between human diseases, genes, drugs, metabolites, toxins and more

    PubMed Central

    Liu, Yifeng; Liang, Yongjie; Wishart, David

    2015-01-01

    PolySearch2 (http://polysearch.ca) is an online text-mining system for identifying relationships between biomedical entities such as human diseases, genes, SNPs, proteins, drugs, metabolites, toxins, metabolic pathways, organs, tissues, subcellular organelles, positive health effects, negative health effects, drug actions, Gene Ontology terms, MeSH terms, ICD-10 medical codes, biological taxonomies and chemical taxonomies. PolySearch2 supports a generalized ‘Given X, find all associated Ys’ query, where X and Y can be selected from the aforementioned biomedical entities. An example query might be: ‘Find all diseases associated with Bisphenol A’. To find its answers, PolySearch2 searches for associations against comprehensive collections of free-text collections, including local versions of MEDLINE abstracts, PubMed Central full-text articles, Wikipedia full-text articles and US Patent application abstracts. PolySearch2 also searches 14 widely used, text-rich biological databases such as UniProt, DrugBank and Human Metabolome Database to improve its accuracy and coverage. PolySearch2 maintains an extensive thesaurus of biological terms and exploits the latest search engine technology to rapidly retrieve relevant articles and databases records. PolySearch2 also generates, ranks and annotates associative candidates and present results with relevancy statistics and highlighted key sentences to facilitate user interpretation. PMID:25925572

  8. ProteinWorldDB: querying radical pairwise alignments among protein sets from complete genomes

    PubMed Central

    Otto, Thomas Dan; Catanho, Marcos; Tristão, Cristian; Bezerra, Márcia; Fernandes, Renan Mathias; Elias, Guilherme Steinberger; Scaglia, Alexandre Capeletto; Bovermann, Bill; Berstis, Viktors; Lifschitz, Sergio; de Miranda, Antonio Basílio; Degrave, Wim

    2010-01-01

    Motivation: Many analyses in modern biological research are based on comparisons between biological sequences, resulting in functional, evolutionary and structural inferences. When large numbers of sequences are compared, heuristics are often used resulting in a certain lack of accuracy. In order to improve and validate results of such comparisons, we have performed radical all-against-all comparisons of 4 million protein sequences belonging to the RefSeq database, using an implementation of the Smith–Waterman algorithm. This extremely intensive computational approach was made possible with the help of World Community Grid™, through the Genome Comparison Project. The resulting database, ProteinWorldDB, which contains coordinates of pairwise protein alignments and their respective scores, is now made available. Users can download, compare and analyze the results, filtered by genomes, protein functions or clusters. ProteinWorldDB is integrated with annotations derived from Swiss-Prot, Pfam, KEGG, NCBI Taxonomy database and gene ontology. The database is a unique and valuable asset, representing a major effort to create a reliable and consistent dataset of cross-comparisons of the whole protein content encoded in hundreds of completely sequenced genomes using a rigorous dynamic programming approach. Availability: The database can be accessed through http://proteinworlddb.org Contact: otto@fiocruz.br PMID:20089515

  9. Algorithms for database-dependent search of MS/MS data.

    PubMed

    Matthiesen, Rune

    2013-01-01

    The frequent used bottom-up strategy for identification of proteins and their associated modifications generate nowadays typically thousands of MS/MS spectra that normally are matched automatically against a protein sequence database. Search engines that take as input MS/MS spectra and a protein sequence database are referred as database-dependent search engines. Many programs both commercial and freely available exist for database-dependent search of MS/MS spectra and most of the programs have excellent user documentation. The aim here is therefore to outline the algorithm strategy behind different search engines rather than providing software user manuals. The process of database-dependent search can be divided into search strategy, peptide scoring, protein scoring, and finally protein inference. Most efforts in the literature have been put in to comparing results from different software rather than discussing the underlining algorithms. Such practical comparisons can be cluttered by suboptimal implementation and the observed differences are frequently caused by software parameters settings which have not been set proper to allow even comparison. In other words an algorithmic idea can still be worth considering even if the software implementation has been demonstrated to be suboptimal. The aim in this chapter is therefore to split the algorithms for database-dependent searching of MS/MS data into the above steps so that the different algorithmic ideas become more transparent and comparable. Most search engines provide good implementations of the first three data analysis steps mentioned above, whereas the final step of protein inference are much less developed for most search engines and is in many cases performed by an external software. The final part of this chapter illustrates how protein inference is built into the VEMS search engine and discusses a stand-alone program SIR for protein inference that can import a Mascot search result.

  10. DOMMINO 2.0: integrating structurally resolved protein-, RNA-, and DNA-mediated macromolecular interactions

    PubMed Central

    Kuang, Xingyan; Dhroso, Andi; Han, Jing Ginger; Shyu, Chi-Ren; Korkin, Dmitry

    2016-01-01

    Macromolecular interactions are formed between proteins, DNA and RNA molecules. Being a principle building block in macromolecular assemblies and pathways, the interactions underlie most of cellular functions. Malfunctioning of macromolecular interactions is also linked to a number of diseases. Structural knowledge of the macromolecular interaction allows one to understand the interaction’s mechanism, determine its functional implications and characterize the effects of genetic variations, such as single nucleotide polymorphisms, on the interaction. Unfortunately, until now the interactions mediated by different types of macromolecules, e.g. protein–protein interactions or protein–DNA interactions, are collected into individual and unrelated structural databases. This presents a significant obstacle in the analysis of macromolecular interactions. For instance, the homogeneous structural interaction databases prevent scientists from studying structural interactions of different types but occurring in the same macromolecular complex. Here, we introduce DOMMINO 2.0, a structural Database Of Macro-Molecular INteractiOns. Compared to DOMMINO 1.0, a comprehensive database on protein-protein interactions, DOMMINO 2.0 includes the interactions between all three basic types of macromolecules extracted from PDB files. DOMMINO 2.0 is automatically updated on a weekly basis. It currently includes ∼1 040 000 interactions between two polypeptide subunits (e.g. domains, peptides, termini and interdomain linkers), ∼43 000 RNA-mediated interactions, and ∼12 000 DNA-mediated interactions. All protein structures in the database are annotated using SCOP and SUPERFAMILY family annotation. As a result, protein-mediated interactions involving protein domains, interdomain linkers, C- and N- termini, and peptides are identified. Our database provides an intuitive web interface, allowing one to investigate interactions at three different resolution levels: whole subunit network, binary interaction and interaction interface. Database URL: http://dommino.org PMID:26827237

  11. The structure and dipole moment of globular proteins in solution and crystalline states: use of NMR and X-ray databases for the numerical calculation of dipole moment.

    PubMed

    Takashima, S

    2001-04-05

    The large dipole moment of globular proteins has been well known because of the detailed studies using dielectric relaxation and electro-optical methods. The search for the origin of these dipolemoments, however, must be based on the detailed knowledge on protein structure with atomic resolutions. At present, we have two sources of information on the structure of protein molecules: (1) x-ray databases obtained in crystalline state; (2) NMR databases obtained in solution state. While x-ray databases consist of only one model, NMR databases, because of the fluctuation of the protein folding in solution, consist of a number of models, thus enabling the computation of dipole moment repeated for all these models. The aim of this work, using these databases, is the detailed investigation on the interdependence between the structure and dipole moment of protein molecules. The dipole moment of protein molecules has roughly two components: one dipole moment is due to surface charges and the other, core dipole moment, is due to polar groups such as N--H and C==O bonds. The computation of surface charge dipole moment consists of two steps: (A) calculation of the pK shifts of charged groups for electrostatic interactions and (B) calculation of the dipole moment using the pK corrected for electrostatic shifts. The dipole moments of several proteins were computed using both NMR and x-ray databases. The dipole moments of these two sets of calculations are, with a few exceptions, in good agreement with one another and also with measured dipole moments.

  12. Biomine: predicting links between biological entities using network models of heterogeneous databases.

    PubMed

    Eronen, Lauri; Toivonen, Hannu

    2012-06-06

    Biological databases contain large amounts of data concerning the functions and associations of genes and proteins. Integration of data from several such databases into a single repository can aid the discovery of previously unknown connections spanning multiple types of relationships and databases. Biomine is a system that integrates cross-references from several biological databases into a graph model with multiple types of edges, such as protein interactions, gene-disease associations and gene ontology annotations. Edges are weighted based on their type, reliability, and informativeness. We present Biomine and evaluate its performance in link prediction, where the goal is to predict pairs of nodes that will be connected in the future, based on current data. In particular, we formulate protein interaction prediction and disease gene prioritization tasks as instances of link prediction. The predictions are based on a proximity measure computed on the integrated graph. We consider and experiment with several such measures, and perform a parameter optimization procedure where different edge types are weighted to optimize link prediction accuracy. We also propose a novel method for disease-gene prioritization, defined as finding a subset of candidate genes that cluster together in the graph. We experimentally evaluate Biomine by predicting future annotations in the source databases and prioritizing lists of putative disease genes. The experimental results show that Biomine has strong potential for predicting links when a set of selected candidate links is available. The predictions obtained using the entire Biomine dataset are shown to clearly outperform ones obtained using any single source of data alone, when different types of links are suitably weighted. In the gene prioritization task, an established reference set of disease-associated genes is useful, but the results show that under favorable conditions, Biomine can also perform well when no such information is available.The Biomine system is a proof of concept. Its current version contains 1.1 million entities and 8.1 million relations between them, with focus on human genetics. Some of its functionalities are available in a public query interface at http://biomine.cs.helsinki.fi, allowing searching for and visualizing connections between given biological entities.

  13. A Novel Proteomics Approach to Identify SUMOylated Proteins and Their Modification Sites in Human Cells*

    PubMed Central

    Galisson, Frederic; Mahrouche, Louiza; Courcelles, Mathieu; Bonneil, Eric; Meloche, Sylvain; Chelbi-Alix, Mounira K.; Thibault, Pierre

    2011-01-01

    The small ubiquitin-related modifier (SUMO) is a small group of proteins that are reversibly attached to protein substrates to modify their functions. The large scale identification of protein SUMOylation and their modification sites in mammalian cells represents a significant challenge because of the relatively small number of in vivo substrates and the dynamic nature of this modification. We report here a novel proteomics approach to selectively enrich and identify SUMO conjugates from human cells. We stably expressed different SUMO paralogs in HEK293 cells, each containing a His6 tag and a strategically located tryptic cleavage site at the C terminus to facilitate the recovery and identification of SUMOylated peptides by affinity enrichment and mass spectrometry. Tryptic peptides with short SUMO remnants offer significant advantages in large scale SUMOylome experiments including the generation of paralog-specific fragment ions following CID and ETD activation, and the identification of modified peptides using conventional database search engines such as Mascot. We identified 205 unique protein substrates together with 17 precise SUMOylation sites present in 12 SUMO protein conjugates including three new sites (Lys-380, Lys-400, and Lys-497) on the protein promyelocytic leukemia. Label-free quantitative proteomics analyses on purified nuclear extracts from untreated and arsenic trioxide-treated cells revealed that all identified SUMOylated sites of promyelocytic leukemia were differentially SUMOylated upon stimulation. PMID:21098080

  14. TryTransDB: A web-based resource for transport proteins in Trypanosomatidae.

    PubMed

    Sonar, Krushna; Kabra, Ritika; Singh, Shailza

    2018-03-12

    TryTransDB is a web-based resource that stores transport protein data which can be retrieved using a standalone BLAST tool. We have attempted to create an integrated database that can be a one-stop shop for the researchers working with transport proteins of Trypanosomatidae family. TryTransDB (Trypanosomatidae Transport Protein Database) is a web based comprehensive resource that can fire a BLAST search against most of the transport protein sequences (protein and nucleotide) from Trypanosomatidae family organisms. This web resource further allows to compute a phylogenetic tree by performing multiple sequence alignment (MSA) using CLUSTALW suite embedded in it. Also, cross-linking to other databases helps in gathering more information for a certain transport protein in a single website.

  15. AIM: A comprehensive Arabidopsis Interactome Module database and related interologs in plants

    USDA-ARS?s Scientific Manuscript database

    Systems biology analysis of protein modules is important for understanding the functional relationships between proteins in the interactome. Here, we present a comprehensive database named AIM for Arabidopsis (Arabidopsis thaliana) interactome modules. The database contains almost 250,000 modules th...

  16. VaProS: a database-integration approach for protein/genome information retrieval.

    PubMed

    Gojobori, Takashi; Ikeo, Kazuho; Katayama, Yukie; Kawabata, Takeshi; Kinjo, Akira R; Kinoshita, Kengo; Kwon, Yeondae; Migita, Ohsuke; Mizutani, Hisashi; Muraoka, Masafumi; Nagata, Koji; Omori, Satoshi; Sugawara, Hideaki; Yamada, Daichi; Yura, Kei

    2016-12-01

    Life science research now heavily relies on all sorts of databases for genome sequences, transcription, protein three-dimensional (3D) structures, protein-protein interactions, phenotypes and so forth. The knowledge accumulated by all the omics research is so vast that a computer-aided search of data is now a prerequisite for starting a new study. In addition, a combinatory search throughout these databases has a chance to extract new ideas and new hypotheses that can be examined by wet-lab experiments. By virtually integrating the related databases on the Internet, we have built a new web application that facilitates life science researchers for retrieving experts' knowledge stored in the databases and for building a new hypothesis of the research target. This web application, named VaProS, puts stress on the interconnection between the functional information of genome sequences and protein 3D structures, such as structural effect of the gene mutation. In this manuscript, we present the notion of VaProS, the databases and tools that can be accessed without any knowledge of database locations and data formats, and the power of search exemplified in quest of the molecular mechanisms of lysosomal storage disease. VaProS can be freely accessed at http://p4d-info.nig.ac.jp/vapros/ .

  17. mTM-align: a server for fast protein structure database search and multiple protein structure alignment.

    PubMed

    Dong, Runze; Pan, Shuo; Peng, Zhenling; Zhang, Yang; Yang, Jianyi

    2018-05-21

    With the rapid increase of the number of protein structures in the Protein Data Bank, it becomes urgent to develop algorithms for efficient protein structure comparisons. In this article, we present the mTM-align server, which consists of two closely related modules: one for structure database search and the other for multiple structure alignment. The database search is speeded up based on a heuristic algorithm and a hierarchical organization of the structures in the database. The multiple structure alignment is performed using the recently developed algorithm mTM-align. Benchmark tests demonstrate that our algorithms outperform other peering methods for both modules, in terms of speed and accuracy. One of the unique features for the server is the interplay between database search and multiple structure alignment. The server provides service not only for performing fast database search, but also for making accurate multiple structure alignment with the structures found by the search. For the database search, it takes about 2-5 min for a structure of a medium size (∼300 residues). For the multiple structure alignment, it takes a few seconds for ∼10 structures of medium sizes. The server is freely available at: http://yanglab.nankai.edu.cn/mTM-align/.

  18. NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins

    PubMed Central

    Pruitt, Kim D.; Tatusova, Tatiana; Maglott, Donna R.

    2005-01-01

    The National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database (http://www.ncbi.nlm.nih.gov/RefSeq/) provides a non-redundant collection of sequences representing genomic data, transcripts and proteins. Although the goal is to provide a comprehensive dataset representing the complete sequence information for any given species, the database pragmatically includes sequence data that are currently publicly available in the archival databases. The database incorporates data from over 2400 organisms and includes over one million proteins representing significant taxonomic diversity spanning prokaryotes, eukaryotes and viruses. Nucleotide and protein sequences are explicitly linked, and the sequences are linked to other resources including the NCBI Map Viewer and Gene. Sequences are annotated to include coding regions, conserved domains, variation, references, names, database cross-references, and other features using a combined approach of collaboration and other input from the scientific community, automated annotation, propagation from GenBank and curation by NCBI staff. PMID:15608248

  19. Single step generation of protein arrays from DNA by cell-free expression and in situ immobilisation (PISA method).

    PubMed

    He, M; Taussig, M J

    2001-08-01

    We describe a format for production of protein arrays termed 'protein in situ array' (PISA). A PISA is rapidly generated in one step directly from PCR-generated DNA fragments by cell-free protein expression and in situ immobilisation at a surface. The template for expression is DNA encoding individual proteins or domains, which is produced by PCR using primers designed from information in DNA databases. Coupled transcription and translation is carried out on a surface to which the tagged protein adheres as soon as it is synthesised. Because proteins generated by cell-free synthesis are usually soluble and functional, this method can overcome problems of insolubility or degradation associated with bacterial expression of recombinant proteins. Moreover, the use of PCR-generated DNA enables rapid production of proteins or domains based on genome information alone and will be particularly useful where cloned material is not available. Here we show that human single-chain antibody fragments (three domain, V(H)/K form) and an enzyme (luciferase) can be functionally arrayed by the PISA method.

  20. YPD™, PombePD™ and WormPD™: model organism volumes of the BioKnowledge™ Library, an integrated resource for protein information

    PubMed Central

    Costanzo, Maria C.; Crawford, Matthew E.; Hirschman, Jodi E.; Kranz, Janice E.; Olsen, Philip; Robertson, Laura S.; Skrzypek, Marek S.; Braun, Burkhard R.; Hopkins, Kelley Lennon; Kondu, Pinar; Lengieza, Carey; Lew-Smith, Jodi E.; Tillberg, Michael; Garrels, James I.

    2001-01-01

    The BioKnowledge Library is a relational database and web site (http://www.proteome.com) composed of protein-specific information collected from the scientific literature. Each Protein Report on the web site summarizes and displays published information about a single protein, including its biochemical function, role in the cell and in the whole organism, localization, mutant phenotype and genetic interactions, regulation, domains and motifs, interactions with other proteins and other relevant data. This report describes four species-specific volumes of the BioKnowledge Library, concerned with the model organisms Saccharo­myces cerevisiae (YPD), Schizosaccharomyces pombe (PombePD) and Caenorhabditis elegans (WormPD), and with the fungal pathogen Candida albicans (CalPD™). Protein Reports of each species are unified in format, easily searchable and extensively cross-referenced between species. The relevance of these comprehensively curated resources to analysis of proteins in other species is discussed, and is illustrated by a survey of model organism proteins that have similarity to human proteins involved in disease. PMID:11125054

  1. Application of Ferulic Acid for Alzheimer’s Disease: Combination of Text Mining and Experimental Validation

    PubMed Central

    Meng, Guilin; Meng, Xiulin; Ma, Xiaoye; Zhang, Gengping; Hu, Xiaolin; Jin, Aiping; Liu, Xueyuan

    2018-01-01

    Alzheimer’s disease (AD) is an increasing concern in human health. Despite significant research, highly effective drugs to treat AD are lacking. The present study describes the text mining process to identify drug candidates from a traditional Chinese medicine (TCM) database, along with associated protein target mechanisms. We carried out text mining to identify literatures that referenced both AD and TCM and focused on identifying compounds and protein targets of interest. After targeting one potential TCM candidate, corresponding protein-protein interaction (PPI) networks were assembled in STRING to decipher the most possible mechanism of action. This was followed by validation using Western blot and co-immunoprecipitation in an AD cell model. The text mining strategy using a vast amount of AD-related literature and the TCM database identified curcumin, whose major component was ferulic acid (FA). This was used as a key candidate compound for further study. Using the top calculated interaction score in STRING, BACE1 and MMP2 were implicated in the activity of FA in AD. Exposure of SHSY5Y-APP cells to FA resulted in the decrease in expression levels of BACE-1 and APP, while the expression of MMP-2 and MMP-9 increased in a dose-dependent manner. This suggests that FA induced BACE1 and MMP2 pathways maybe novel potential mechanisms involved in AD. The text mining of literature and TCM database related to AD suggested FA as a promising TCM ingredient for the treatment of AD. Potential mechanisms interconnected and integrated with Aβ aggregation inhibition and extracellular matrix remodeling underlying the activity of FA were identified using in vitro studies. PMID:29896095

  2. A catalog for the transcripts from the venomous structures of the caterpillar Lonomia obliqua: identification of the proteins potentially involved in the coagulation disorder and hemorrhagic syndrome

    PubMed Central

    Veiga, Ana B. G.; Ribeiro, José M. C.; Guimarães, Jorge A.; Francischetti, Ivo M.B.

    2010-01-01

    Accidents with the caterpillar Lonomia obliqua are often associated with a coagulation disorder and hemorrhagic syndrome in humans. In the present study, we have constructed cDNA libraries from two venomous structures of the caterpillar, namely the tegument and the bristle. High-throughput sequencing and bioinformatics analyses were performed in parallel. Over one thousand cDNAs were obtained and clustered to produce a database of 538 contigs and singletons (clusters) for the tegument library and 368 for the bristle library. We have thus identified dozens of full-length cDNAs coding for proteins with sequence homology to snake venom prothrombin activator, trypsin-like enzymes, blood coagulation factors and prophenoloxidase cascade activators. We also report cDNA coding for cysteine proteases, Group III phospholipase A2, C-type lectins, lipocalins, in addition to protease inhibitors including serpins, Kazal-type inhibitors, cystatins and trypsin inhibitor-like molecules. Antibacterial proteins and housekeeping genes are also described. A significant number of sequences were devoid of database matches, suggesting that their biologic function remains to be defined. We also report the N-terminus of the most abundant proteins present in the bristle, tegument, hemolymph, and "cryosecretion". Thus, we have created a catalog that contains the predicted molecular weight, isoelectric point, accession number, and putative function for each selected molecule from the venomous structures of L. obliqua. The role of these molecules in the coagulation disorder and hemorrhagic syndrome caused by envenomation with this caterpillar is discussed. All sequence information and the Supplemental Data, including Figures and Tables with hyperlinks to FASTA-formatted files for each contig and the best match to the Databases, are available at http://www.ncbi.nih.gov/projects/omes. PMID:16023793

  3. Analysis of gene expression profile microarray data in complex regional pain syndrome.

    PubMed

    Tan, Wulin; Song, Yiyan; Mo, Chengqiang; Jiang, Shuangjian; Wang, Zhongxing

    2017-09-01

    The aim of the present study was to predict key genes and proteins associated with complex regional pain syndrome (CRPS) using bioinformatics analysis. The gene expression profiling microarray data, GSE47603, which included peripheral blood samples from 4 patients with CRPS and 5 healthy controls, was obtained from the Gene Expression Omnibus (GEO) database. The differentially expressed genes (DEGs) in CRPS patients compared with healthy controls were identified using the GEO2R online tool. Functional enrichment analysis was then performed using The Database for Annotation Visualization and Integrated Discovery online tool. Protein‑protein interaction (PPI) network analysis was subsequently performed using Search Tool for the Retrieval of Interaction Genes database and analyzed with Cytoscape software. A total of 257 DEGs were identified, including 243 upregulated genes and 14 downregulated ones. Genes in the human leukocyte antigen (HLA) family were most significantly differentially expressed. Enrichment analysis demonstrated that signaling pathways, including immune response, cell motion, adhesion and angiogenesis were associated with CRPS. PPI network analysis revealed that key genes, including early region 1A binding protein p300 (EP300), CREB‑binding protein (CREBBP), signal transducer and activator of transcription (STAT)3, STAT5A and integrin α M were associated with CRPS. The results suggest that the immune response may therefore serve an important role in CRPS development. In addition, genes in the HLA family, such as HLA‑DQB1 and HLA‑DRB1, may present potential biomarkers for the diagnosis of CRPS. Furthermore, EP300, its paralog CREBBP, and the STAT family genes, STAT3 and STAT5 may be important in the development of CRPS.

  4. Application of Ferulic Acid for Alzheimer's Disease: Combination of Text Mining and Experimental Validation.

    PubMed

    Meng, Guilin; Meng, Xiulin; Ma, Xiaoye; Zhang, Gengping; Hu, Xiaolin; Jin, Aiping; Zhao, Yanxin; Liu, Xueyuan

    2018-01-01

    Alzheimer's disease (AD) is an increasing concern in human health. Despite significant research, highly effective drugs to treat AD are lacking. The present study describes the text mining process to identify drug candidates from a traditional Chinese medicine (TCM) database, along with associated protein target mechanisms. We carried out text mining to identify literatures that referenced both AD and TCM and focused on identifying compounds and protein targets of interest. After targeting one potential TCM candidate, corresponding protein-protein interaction (PPI) networks were assembled in STRING to decipher the most possible mechanism of action. This was followed by validation using Western blot and co-immunoprecipitation in an AD cell model. The text mining strategy using a vast amount of AD-related literature and the TCM database identified curcumin, whose major component was ferulic acid (FA). This was used as a key candidate compound for further study. Using the top calculated interaction score in STRING, BACE1 and MMP2 were implicated in the activity of FA in AD. Exposure of SHSY5Y-APP cells to FA resulted in the decrease in expression levels of BACE-1 and APP, while the expression of MMP-2 and MMP-9 increased in a dose-dependent manner. This suggests that FA induced BACE1 and MMP2 pathways maybe novel potential mechanisms involved in AD. The text mining of literature and TCM database related to AD suggested FA as a promising TCM ingredient for the treatment of AD. Potential mechanisms interconnected and integrated with Aβ aggregation inhibition and extracellular matrix remodeling underlying the activity of FA were identified using in vitro studies.

  5. Neural/Bayes network predictor for inheritable cardiac disease pathogenicity and phenotype.

    PubMed

    Burghardt, Thomas P; Ajtai, Katalin

    2018-04-11

    The cardiac muscle sarcomere contains multiple proteins contributing to contraction energy transduction and its regulation during a heartbeat. Inheritable heart disease mutants affect most of them but none more frequently than the ventricular myosin motor and cardiac myosin binding protein c (mybpc3). These co-localizing proteins have mybpc3 playing a regulatory role to the energy transducing motor. Residue substitution and functional domain assignment of each mutation in the protein sequence decides, under the direction of a sensible disease model, phenotype and pathogenicity. The unknown model mechanism is decided here using a method combing neural and Bayes networks. Missense single nucleotide polymorphisms (SNPs) are clues for the disease mechanism summarized in an extensive database collecting mutant sequence location and residue substitution as independent variables that imply the dependent disease phenotype and pathogenicity characteristics in 4 dimensional data points (4ddps). The SNP database contains entries with the majority having one or both dependent data entries unfulfilled. A neural network relating causes (mutant residue location and substitution) and effects (phenotype and pathogenicity) is trained, validated, and optimized using fulfilled 4ddps. It then predicts unfulfilled 4ddps providing the implicit disease model. A discrete Bayes network interprets fulfilled and predicted 4ddps with conditional probabilities for phenotype and pathogenicity given mutation location and residue substitution thus relating the neural network implicit model to explicit features of the motor and mybpc3 sequence and structural domains. Neural/Bayes network forecasting automates disease mechanism modeling by leveraging the world wide human missense SNP database that is in place and expanding. Copyright © 2018 The Authors. Published by Elsevier Ltd.. All rights reserved.

  6. Creation of a federated database of blood proteins: a powerful new tool for finding and characterizing biomarkers in serum

    PubMed Central

    2014-01-01

    Protein biomarkers offer major benefits for diagnosis and monitoring of disease processes. Recent advances in protein mass spectrometry make it feasible to use this very sensitive technology to detect and quantify proteins in blood. To explore the potential of blood biomarkers, we conducted a thorough review to evaluate the reliability of data in the literature and to determine the spectrum of proteins reported to exist in blood with a goal of creating a Federated Database of Blood Proteins (FDBP). A unique feature of our approach is the use of a SQL database for all of the peptide data; the power of the SQL database combined with standard informatic algorithms such as BLAST and the statistical analysis system (SAS) allowed the rapid annotation and analysis of the database without the need to create special programs to manage the data. Our mathematical analysis and review shows that in addition to the usual secreted proteins found in blood, there are many reports of intracellular proteins and good agreement on transcription factors, DNA remodelling factors in addition to cellular receptors and their signal transduction enzymes. Overall, we have catalogued about 12,130 proteins identified by at least one unique peptide, and of these 3858 have 3 or more peptide correlations. The FDBP with annotations should facilitate testing blood for specific disease biomarkers. PMID:24476026

  7. CyanoClust: comparative genome resources of cyanobacteria and plastids.

    PubMed

    Sasaki, Naobumi V; Sato, Naoki

    2010-01-01

    Cyanobacteria, which perform oxygen-evolving photosynthesis as do chloroplasts of plants and algae, are one of the best-studied prokaryotic phyla and one from which many representative genomes have been sequenced. Lack of a suitable comparative genomic database has been a problem in cyanobacterial genomics because many proteins involved in physiological functions such as photosynthesis and nitrogen fixation are not catalogued in commonly used databases, such as Clusters of Orthologous Proteins (COG). CyanoClust is a database of homolog groups in cyanobacteria and plastids that are produced by the program Gclust. We have developed a web-server system for the protein homology database featuring cyanobacteria and plastids. Database URL: http://cyanoclust.c.u-tokyo.ac.jp/.

  8. Oral cancer databases: A comprehensive review.

    PubMed

    Sarode, Gargi S; Sarode, Sachin C; Maniyar, Nikunj; Anand, Rahul; Patil, Shankargouda

    2017-11-29

    Cancer database is a systematic collection and analysis of information on various human cancers at genomic and molecular level that can be utilized to understand various steps in carcinogenesis and for therapeutic advancement in cancer field. Oral cancer is one of the leading causes of morbidity and mortality all over the world. The current research efforts in this field are aimed at cancer etiology and therapy. Advanced genomic technologies including microarrays, proteomics, transcrpitomics, and gene sequencing development have culminated in generation of extensive data and subjection of several genes and microRNAs that are distinctively expressed and this information is stored in the form of various databases. Extensive data from various resources have brought the need for collaboration and data sharing to make effective use of this new knowledge. The current review provides comprehensive information of various publicly accessible databases that contain information pertinent to oral squamous cell carcinoma (OSCC) and databases designed exclusively for OSCC. The databases discussed in this paper are Protein-Coding Gene Databases and microRNA Databases. This paper also describes gene overlap in various databases, which will help researchers to reduce redundancy and focus on only those genes, which are common to more than one databases. We hope such introduction will promote awareness and facilitate the usage of these resources in the cancer research community, and researchers can explore the molecular mechanisms involved in the development of cancer, which can help in subsequent crafting of therapeutic strategies. © 2017 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.

  9. Exceptionally long 5' UTR short tandem repeats specifically linked to primates.

    PubMed

    Namdar-Aligoodarzi, P; Mohammadparast, S; Zaker-Kandjani, B; Talebi Kakroodi, S; Jafari Vesiehsari, M; Ohadi, M

    2015-09-10

    We have previously reported genome-scale short tandem repeats (STRs) in the core promoter interval (i.e. -120 to +1 to the transcription start site) of protein-coding genes that have evolved identically in primates vs. non-primates. Those STRs may function as evolutionary switch codes for primate speciation. In the current study, we used the Ensembl database to analyze the 5' untranslated region (5' UTR) between +1 and +60 of the transcription start site of the entire human protein-coding genes annotated in the GeneCards database, in order to identify "exceptionally long" STRs (≥5-repeats), which may be of selective/adaptive advantage. The importance of this critical interval is its function as core promoter, and its effect on transcription and translation. In order to minimize ascertainment bias, we analyzed the evolutionary status of the human 5' UTR STRs of ≥5-repeats in several species encompassing six major orders and superorders across mammals, including primates, rodents, Scandentia, Laurasiatheria, Afrotheria, and Xenarthra. We introduce primate-specific STRs, and STRs which have expanded from mouse to primates. Identical co-occurrence of the identified STRs of rare average frequency between 0.006 and 0.0001 in primates supports a role for those motifs in processes that diverged primates from other mammals, such as neuronal differentiation (e.g. APOD and FGF4), and craniofacial development (e.g. FILIP1L). A number of the identified STRs of ≥5-repeats may be human-specific (e.g. ZMYM3 and DAZAP1). Future work is warranted to examine the importance of the listed genes in primate/human evolution, development, and disease. Copyright © 2015 Elsevier B.V. All rights reserved.

  10. MultitaskProtDB: a database of multitasking proteins.

    PubMed

    Hernández, Sergio; Ferragut, Gabriela; Amela, Isaac; Perez-Pons, JosepAntoni; Piñol, Jaume; Mozo-Villarias, Angel; Cedano, Juan; Querol, Enrique

    2014-01-01

    We have compiled MultitaskProtDB, available online at http://wallace.uab.es/multitask, to provide a repository where the many multitasking proteins found in the literature can be stored. Multitasking or moonlighting is the capability of some proteins to execute two or more biological functions. Usually, multitasking proteins are experimentally revealed by serendipity. This ability of proteins to perform multitasking functions helps us to understand one of the ways used by cells to perform many complex functions with a limited number of genes. Even so, the study of this phenomenon is complex because, among other things, there is no database of moonlighting proteins. The existence of such a tool facilitates the collection and dissemination of these important data. This work reports the database, MultitaskProtDB, which is designed as a friendly user web page containing >288 multitasking proteins with their NCBI and UniProt accession numbers, canonical and additional biological functions, monomeric/oligomeric states, PDB codes when available and bibliographic references. This database also serves to gain insight into some characteristics of multitasking proteins such as frequencies of the different pairs of functions, phylogenetic conservation and so forth.

  11. KnotProt: a database of proteins with knots and slipknots.

    PubMed

    Jamroz, Michal; Niemyska, Wanda; Rawdon, Eric J; Stasiak, Andrzej; Millett, Kenneth C; Sułkowski, Piotr; Sulkowska, Joanna I

    2015-01-01

    The protein topology database KnotProt, http://knotprot.cent.uw.edu.pl/, collects information about protein structures with open polypeptide chains forming knots or slipknots. The knotting complexity of the cataloged proteins is presented in the form of a matrix diagram that shows users the knot type of the entire polypeptide chain and of each of its subchains. The pattern visible in the matrix gives the knotting fingerprint of a given protein and permits users to determine, for example, the minimal length of the knotted regions (knot's core size) or the depth of a knot, i.e. how many amino acids can be removed from either end of the cataloged protein structure before converting it from a knot to a different type of knot. In addition, the database presents extensive information about the biological functions, families and fold types of proteins with non-trivial knotting. As an additional feature, the KnotProt database enables users to submit protein or polymer chains and generate their knotting fingerprints. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  12. Microarray analysis of genes associated with cell surface NIS protein levels in breast cancer.

    PubMed

    Beyer, Sasha J; Zhang, Xiaoli; Jimenez, Rafael E; Lee, Mei-Ling T; Richardson, Andrea L; Huang, Kun; Jhiang, Sissy M

    2011-10-11

    Na+/I- symporter (NIS)-mediated iodide uptake allows radioiodine therapy for thyroid cancer. NIS is also expressed in breast tumors, raising potential for radionuclide therapy of breast cancer. However, NIS expression in most breast cancers is low and may not be sufficient for radionuclide therapy. We aimed to identify biomarkers associated with NIS expression such that mechanisms underlying NIS modulation in human breast tumors may be elucidated. Published oligonucleotide microarray data within the National Center for Biotechnology Information Gene Expression Omnibus database were analyzed to identify gene expression tightly correlated with NIS mRNA level among human breast tumors. NIS immunostaining was performed in a tissue microarray composed of 28 human breast tumors which had corresponding oligonucleotide microarray data available for each tumor such that gene expression associated with cell surface NIS protein level could be identified. NIS mRNA levels do not vary among breast tumors or when compared to normal breast tissues when detected by Affymetrix oligonucleotide microarray platforms. Cell surface NIS protein levels are much more variable than their corresponding NIS mRNA levels. Despite a limited number of breast tumors examined, our analysis identified cysteinyl-tRNA synthetase as a biomarker that is highly associated with cell surface NIS protein levels in the ER-positive breast cancer subtype. Further investigation on genes associated with cell surface NIS protein levels within each breast cancer molecular subtype may lead to novel targets for selectively increasing NIS expression/function in a subset of breast cancers patients.

  13. Metaproteome analysis of endodontic infections in association with different clinical conditions.

    PubMed

    Provenzano, José Claudio; Siqueira, José F; Rôças, Isabela N; Domingues, Romênia R; Paes Leme, Adriana F; Silva, Márcia R S

    2013-01-01

    Analysis of the metaproteome of microbial communities is important to provide an insight of community physiology and pathogenicity. This study evaluated the metaproteome of endodontic infections associated with acute apical abscesses and asymptomatic apical periodontitis lesions. Proteins persisting or expressed after root canal treatment were also evaluated. Finally, human proteins associated with these infections were identified. Samples were taken from root canals of teeth with asymptomatic apical periodontitis before and after chemomechanical treatment using either NaOCl or chlorhexidine as the irrigant. Samples from abscesses were taken by aspiration of the purulent exudate. Clinical samples were processed for analysis of the exoproteome by using two complementary mass spectrometry platforms: nanoflow liquid chromatography coupled with linear ion trap quadrupole Velos Orbitrap and liquid chromatography-quadrupole time-of-flight. A total of 308 proteins of microbial origin were identified. The number of proteins in abscesses was higher than in asymptomatic cases. In canals irrigated with chlorhexidine, the number of identified proteins decreased substantially, while in the NaOCl group the number of proteins increased. The large majority of microbial proteins found in endodontic samples were related to metabolic and housekeeping processes, including protein synthesis, energy metabolism and DNA processes. Moreover, several other proteins related to pathogenicity and resistance/survival were found, including proteins involved with adhesion, biofilm formation and antibiotic resistance, stress proteins, exotoxins, invasins, proteases and endopeptidases (mostly in abscesses), and an archaeal protein linked to methane production. The majority of human proteins detected were related to cellular processes and metabolism, as well as immune defense. Interrogation of the metaproteome of endodontic microbial communities provides information on the physiology and pathogenicity of the community at the time of sampling. There is a growing need for expanded and more curated protein databases that permit more accurate identifications of proteins in metaproteomic studies.

  14. Reference System of DNA and Protein Sequences on CD-ROM

    NASA Astrophysics Data System (ADS)

    Nasu, Hisanori; Ito, Toshiaki

    DNASIS-DBREF31 is a database for DNA and Protein sequences in the form of optical Compact Disk (CD) ROM, developed and commercialized by Hitachi Software Engineering Co., Ltd. Both nucleic acid base sequences and protein amino acid sequences can be retrieved from a single CD-ROM. Existing database is offered in the form of on-line service, floppy disks, or magnetic tape, all of which have some problems or other, such as usability or storage capacity. DNASIS-DBREF31 newly adopt a CD-ROM as a database device to realize a mass storage and personal use of the database.

  15. Receptor Tyrosine Kinase MET Interactome and Neurodevelopmental Disorder Partners at the Developing Synapse

    PubMed Central

    Xie, Zhihui; Li, Jing; Baker, Jonathan; Eagleson, Kathie L.; Coba, Marcelo P.; Levitt, Pat

    2016-01-01

    Background Atypical synapse development and plasticity are implicated in many neurodevelopmental disorders (NDDs). NDD-associated, high confidence risk genes have been identified, yet little is known about functional relationships at the level of protein-protein interactions, which are the dominant molecular bases responsible for mediating circuit development. Methods Proteomics in three independent developing neocortical synaptosomal preparations identified putative interacting proteins of the ligand-activated MET receptor tyrosine kinase, an autism risk gene that mediates synapse development. The candidates were translated into interactome networks and analyzed bioinformatically. Additionally, three independent quantitative proximity ligation assays (PLA) in cultured neurons and four independent immunoprecipitation analyses of synaptosomes validated protein interactions. Results Approximately 11% (8/72) of MET-interacting proteins, including SHANK3, SYNGAP1 and GRIN2B, are associated with NDDs. Proteins in the MET interactome were translated into a novel MET interactome network based on human protein-protein interaction databases. High confidence genes from different NDD datasets that encode synaptosomal proteins were analyzed for being enriched in MET interactome proteins. This was found for autism, but not schizophrenia, bipolar disorder, major depressive disorder or attentional deficit hyperactivity disorder. There is correlated gene expression between MET and its interactive partners in developing human temporal and visual neocortices, but not with highly expressed genes that are not in the interactome. PLA and biochemical analyses demonstrate that MET-protein partner interactions are dynamically regulated by receptor activation. Conclusions The results provide a novel molecular framework for deciphering the functional relations of key regulators of synaptogenesis that contribute to both typical cortical development and to NDDs. PMID:27086544

  16. Multiple-approaches to the identification and quantification of cytochromes P450 in human liver tissue by mass spectrometry.

    PubMed

    Seibert, Cathrin; Davidson, Brian R; Fuller, Barry J; Patterson, Laurence H; Griffiths, William J; Wang, Yuqin

    2009-04-01

    Here we report the identification and approximate quantification of cytochrome P450 (CYP) proteins in human liver microsomes as determined by nano-LC-MS/MS with application of the exponentially modified protein abundance index (emPAI) algorithm during database searching. Protocols based on 1D-gel protein separation and 2D-LC peptide separation gave comparable results. In total, 18 CYP isoforms were unambiguously identified based on unique peptide matches. Further, we have determined the absolute quantity of two CYP enzymes (2E1 and 1A2) in human liver microsomes using stable-isotope dilution mass spectrometry, where microsomal proteins were separated by 1D-gel electrophoresis, digested with trypsin in the presence of either a CYP2E1- or 1A2-specific stable-isotope labeled tryptic peptide and analyzed by LC-MS/MS. Using multiple reaction monitoring (MRM) for the isotope-labeled tryptic peptides and their natural unlabeled analogues quantification could be performed over the range of 0.1-1.5 pmol on column. Liver microsomes from four individuals were analyzed for CYP2E1 giving values of 88-200 pmol/mg microsomal protein. The CYP1A2 content of microsomes from a further three individuals ranged from 165 to 263 pmol/mg microsomal protein. Although, in this proof-of-concept study for CYP quantification, the two CYP isoforms were quantified from different samples, there are no practical reasons to prevent multiplexing the method to allow the quantification of multiple CYP isoforms in a single sample.

  17. Multiple-approaches to the identification and quantification of cytochromes P450 in human liver tissue by mass spectrometry

    PubMed Central

    Seibert, Cathrin; Davidson, Brian R.; Fuller, Barry J.; Patterson, Laurence H.; Griffiths, William J.; Wang, Yuqin

    2009-01-01

    Here we report the identification and approximate quantification of cytochrome P450 (CYP) proteins in human liver microsomes as determined by nano-LC-MS/MS with application of the exponentially modified protein abundance index (emPAI) algorithm during database searching. Protocols based on 1D-gel protein separation and 2D-LC peptide separation gave comparable results. In total 18 CYP isoforms were unambiguously identified based on unique peptide matches. Further, we have determined the absolute quantity of two CYP enzymes (2E1 and 1A2) in human liver microsomes using stable-isotope dilution mass spectrometry, where microsomal proteins were separated by 1D-gel electrophoresis, digested with trypsin in the presence of either a CYP2E1- or 1A2-specific stable-isotope labelled tryptic peptide and analysed by LC-MS/MS. Using multiple reaction monitoring (MRM) for the isotope-labelled tryptic peptides and their natural unlabelled analogues quantification could be performed over the range of 0.1 – 1.5 pmol on column. Liver microsomes from four individuals were analysed for CYP2E1 giving values of 88 - 200 pmol/mg microsomal protein. The CYP1A2 content of microsomes from a further three individuals ranged from 165 – 263 pmol/mg microsomal protein. Although, in this proof-of-concept study for CYP quantification, the two CYP-isoforms were quantified from different samples, there are no practical reasons to prevent multiplexing the method to allow the quantification of multiple CYP-isoforms in a single sample. PMID:19714871

  18. SGP-1: Prediction and Validation of Homologous Genes Based on Sequence Alignments

    PubMed Central

    Wiehe, Thomas; Gebauer-Jung, Steffi; Mitchell-Olds, Thomas; Guigó, Roderic

    2001-01-01

    Conventional methods of gene prediction rely on the recognition of DNA-sequence signals, the coding potential or the comparison of a genomic sequence with a cDNA, EST, or protein database. Reasons for limited accuracy in many circumstances are species-specific training and the incompleteness of reference databases. Lately, comparative genome analysis has attracted increasing attention. Several analysis tools that are based on human/mouse comparisons are already available. Here, we present a program for the prediction of protein-coding genes, termed SGP-1 (Syntenic Gene Prediction), which is based on the similarity of homologous genomic sequences. In contrast to most existing tools, the accuracy of SGP-1 depends little on species-specific properties such as codon usage or the nucleotide distribution. SGP-1 may therefore be applied to nonstandard model organisms in vertebrates as well as in plants, without the need for extensive parameter training. In addition to predicting genes in large-scale genomic sequences, the program may be useful to validate gene structure annotations from databases. To this end, SGP-1 output also contains comparisons between predicted and annotated gene structures in HTML format. The program can be accessed via a Web server at http://soft.ice.mpg.de/sgp-1. The source code, written in ANSI C, is available on request from the authors. PMID:11544202

  19. The Human Transcript Database: A Catalogue of Full Length cDNA Inserts

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bouckk John; Michael McLeod; Kim Worley

    1999-09-10

    The BCM Search Launcher provided improved access to web-based sequence analysis services during the granting period and beyond. The Search Launcher web site grouped analysis procedures by function and provided default parameters that provided reasonable search results for most applications. For instance, most queries were automatically masked for repeat sequences prior to sequence database searches to avoid spurious matches. In addition to the web-based access and arrangements that were made using the functions easier, the BCM Search Launcher provided unique value-added applications like the BEAUTY sequence database search tool that combined information about protein domains and sequence database search resultsmore » to give an enhanced, more complete picture of the reliability and relative value of the information reported. This enhanced search tool made evaluating search results more straight-forward and consistent. Some of the favorite features of the web site are the sequence utilities and the batch client functionality that allows processing of multiple samples from the command line interface. One measure of the success of the BCM Search Launcher is the number of sites that have adopted the models first developed on the site. The graphic display on the BLAST search from the NCBI web site is one such outgrowth, as is the display of protein domain search results within BLAST search results, and the design of the Biology Workbench application. The logs of usage and comments from users confirm the great utility of this resource.« less

  20. Defining the proteome of human iris, ciliary body, retinal pigment epithelium, and choroid.

    PubMed

    Zhang, Pingbo; Kirby, David; Dufresne, Craig; Chen, Yan; Turner, Randi; Ferri, Sara; Edward, Deepak P; Van Eyk, Jennifer E; Semba, Richard D

    2016-04-01

    The iris is a fine structure that controls the amount of light that enters the eye. The ciliary body controls the shape of the lens and produces aqueous humor. The retinal pigment epithelium and choroid (RPE/choroid) are essential in supporting the retina and absorbing light energy that enters the eye. Proteins were extracted from iris, ciliary body, and RPE/choroid tissues of eyes from five individuals and fractionated using SDS-PAGE. After in-gel digestion, peptides were analyzed using LC-MS/MS on an Orbitrap Elite mass spectrometer. In iris, ciliary body, and RPE/choroid, we identified 2959, 2867, and 2755 nonredundant proteins with peptide and protein false-positive rates of <0.1% and <1%, respectively. Forty-three unambiguous protein isoforms were identified in iris, ciliary body, and RPE/choroid. Four "missing proteins" were identified in ciliary body based on ≥2 proteotypic peptides. The mass spectrometric proteome database of the human iris, ciliary body, and RPE/choroid may serve as a valuable resource for future investigations of the eye in health and disease. The MS proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifiers PXD001424 and PXD002194. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  1. Proteomic and Bioinformatic Profile of Primary Human Oral Epithelial Cells

    PubMed Central

    Ghosh, Santosh K.; Yohannes, Elizabeth; Bebek, Gurkan; Weinberg, Aaron; Jiang, Bin; Willard, Belinda; Chance, Mark R.; Kinter, Michael T.; McCormick, Thomas S.

    2012-01-01

    Wounding of the oral mucosa occurs frequently in a highly septic environment. Remarkably, these wounds heal quickly and the oral cavity, for the most part, remains healthy. Deciphering the normal human oral epithelial cell (NHOEC) proteome is critical for understanding the mechanism(s) of protection elicited when the mucosal barrier is intact, as well as when it is breached. Combining 2D gel electrophoresis with shotgun proteomics resulted in identification of 1662 NHOEC proteins. Proteome annotations were performed based on protein classes, molecular functions, disease association and membership in canonical and metabolic signaling pathways. Comparing the NHOEC proteome with a database of innate immunity-relevant interactions (InnateDB) identified 64 common proteins associated with innate immunity. Comparison with published salivary proteomes revealed that 738/1662 NHOEC proteins were common, suggesting that significant numbers of salivary proteins are of epithelial origin. Gene ontology analysis showed similarities in the distributions of NHOEC and saliva proteomes with regard to biological processes, and molecular functions. We also assessed the inter-individual variability of the NHOEC proteome and observed it to be comparable with other primary cells. The baseline proteome described in this study should serve as a resource for proteome studies of the oral mucosa, especially in relation to disease processes. PMID:23035736

  2. High Dynamic Range Characterization of the Trauma Patient Plasma Proteome

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Liu, Tao; Qian, Weijun; Gritsenko, Marina A.

    2006-06-08

    While human plasma represents an attractive sample for disease biomarker discovery, the extreme complexity and large dynamic range in protein concentrations present significant challenges for characterization, candidate biomarker discovery, and validation. Herein, we describe a strategy that combines immunoaffinity subtraction and chemical fractionation based on cysteinyl peptide and N-glycopeptide captures with 2D-LC-MS/MS to increase the dynamic range of analysis for plasma. Application of this ''divide-and-conquer'' strategy to trauma patient plasma significantly improved the overall dynamic range of detection and resulted in confident identification of 22,267 unique peptides from four different peptide populations (cysteinyl peptides, non-cysteinyl peptides, N-glycopeptides, and non-glycopeptides) thatmore » covered 3654 nonredundant proteins. Numerous low-abundance proteins were identified, exemplified by 78 ''classic'' cytokines and cytokine receptors and by 136 human cell differentiation molecules. Additionally, a total of 2910 different N-glycopeptides that correspond to 662 N-glycoproteins and 1553 N-glycosylation sites were identified. A panel of the proteins identified in this study is known to be involved in inflammation and immune responses. This study established an extensive reference protein database for trauma patients, which provides a foundation for future high-throughput quantitative plasma proteomic studies designed to elucidate the mechanisms that underlie systemic inflammatory responses.« less

  3. Online Mendelian Inheritance in Man (OMIM).

    PubMed

    Hamosh, A; Scott, A F; Amberger, J; Valle, D; McKusick, V A

    2000-01-01

    Online Mendelian Inheritance In Man (OMIM) is a public database of bibliographic information about human genes and genetic disorders. Begun by Dr. Victor McKusick as the authoritative reference Mendelian Inheritance in Man, it is now distributed electronically by the National Center for Biotechnology Information (NCBI). Material in OMIM is derived from the biomedical literature and is written by Dr. McKusick and his colleagues at Johns Hopkins University and elsewhere. Each OMIM entry has a full text summary of a genetic phenotype and/or gene and has copious links to other genetic resources such as DNA and protein sequence, PubMed references, mutation databases, approved gene nomenclature, and more. In addition, NCBI's neighboring feature allows users to identify related articles from PubMed selected on the basis of key words in the OMIM entry. Through its many features, OMIM is increasingly becoming a major gateway for clinicians, students, and basic researchers to the ever-growing literature and resources of human genetics. Copyright 2000 Wiley-Liss, Inc.

  4. Projections for fast protein structure retrieval

    PubMed Central

    Bhattacharya, Sourangshu; Bhattacharyya, Chiranjib; Chandra, Nagasuma R

    2006-01-01

    Background In recent times, there has been an exponential rise in the number of protein structures in databases e.g. PDB. So, design of fast algorithms capable of querying such databases is becoming an increasingly important research issue. This paper reports an algorithm, motivated from spectral graph matching techniques, for retrieving protein structures similar to a query structure from a large protein structure database. Each protein structure is specified by the 3D coordinates of residues of the protein. The algorithm is based on a novel characterization of the residues, called projections, leading to a similarity measure between the residues of the two proteins. This measure is exploited to efficiently compute the optimal equivalences. Results Experimental results show that, the current algorithm outperforms the state of the art on benchmark datasets in terms of speed without losing accuracy. Search results on SCOP 95% nonredundant database, for fold similarity with 5 proteins from different SCOP classes show that the current method performs competitively with the standard algorithm CE. The algorithm is also capable of detecting non-topological similarities between two proteins which is not possible with most of the state of the art tools like Dali. PMID:17254310

  5. Integrating In Silico Resources to Map a Signaling Network

    PubMed Central

    Liu, Hanqing; Beck, Tim N.; Golemis, Erica A.; Serebriiskii, Ilya G.

    2013-01-01

    The abundance of publicly available life science databases offer a wealth of information that can support interpretation of experimentally derived data and greatly enhance hypothesis generation. Protein interaction and functional networks are not simply new renditions of existing data: they provide the opportunity to gain insights into the specific physical and functional role a protein plays as part of the biological system. In this chapter, we describe different in silico tools that can quickly and conveniently retrieve data from existing data repositories and discuss how the available tools are best utilized for different purposes. While emphasizing protein-protein interaction databases (e.g., BioGrid and IntAct), we also introduce metasearch platforms such as STRING and GeneMANIA, pathway databases (e.g., BioCarta and Pathway Commons), text mining approaches (e.g., PubMed and Chilibot), and resources for drug-protein interactions, genetic information for model organisms and gene expression information based on microarray data mining. Furthermore, we provide a simple step-by-step protocol to building customized protein-protein interaction networks in Cytoscape, a powerful network assembly and visualization program, integrating data retrieved from these various databases. As we illustrate, generation of composite interaction networks enables investigators to extract significantly more information about a given biological system than utilization of a single database or sole reliance on primary literature. PMID:24233784

  6. Approaches for Defining the Hsp90-dependent Proteome

    PubMed Central

    Hartson, Steven D.; Matts, Robert L.

    2011-01-01

    Hsp90 is the target of ongoing drug discovery studies seeking new compounds to treat cancer, neurodegenerative diseases, and protein folding disorders. To better understand Hsp90’s roles in cellular pathologies and in normal cells, numerous studies have utilized proteomics assays and related high-throughput tools to characterize its physical and functional protein partnerships. This review surveys these studies, and summarizes the strengths and limitations of the individual attacks. We also include downloadable spreadsheets compiling all of the Hsp90-interacting proteins identified in more than 23 studies. These tools include cross-references among gene aliases, human homologues of yeast Hsp90-interacting proteins, hyperlinks to database entries, summaries of canonical pathways that are enriched in the Hsp90 interactome, and additional bioinformatic annotations. In addition to summarizing Hsp90 proteomics studies performed to date and the insights they have provided, we identify gaps in our current understanding of Hsp90-mediated proteostasis. PMID:21906632

  7. sc-PDB: a 3D-database of ligandable binding sites--10 years on.

    PubMed

    Desaphy, Jérémy; Bret, Guillaume; Rognan, Didier; Kellenberger, Esther

    2015-01-01

    The sc-PDB database (available at http://bioinfo-pharma.u-strasbg.fr/scPDB/) is a comprehensive and up-to-date selection of ligandable binding sites of the Protein Data Bank. Sites are defined from complexes between a protein and a pharmacological ligand. The database provides the all-atom description of the protein, its ligand, their binding site and their binding mode. Currently, the sc-PDB archive registers 9283 binding sites from 3678 unique proteins and 5608 unique ligands. The sc-PDB database was publicly launched in 2004 with the aim of providing structure files suitable for computational approaches to drug design, such as docking. During the last 10 years we have improved and standardized the processes for (i) identifying binding sites, (ii) correcting structures, (iii) annotating protein function and ligand properties and (iv) characterizing their binding mode. This paper presents the latest enhancements in the database, specifically pertaining to the representation of molecular interaction and to the similarity between ligand/protein binding patterns. The new website puts emphasis in pictorial analysis of data. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  8. Trends and Cost of Posterior Cervical Fusions With and Without Recombinant Human Bone Morphogenetic Protein-2 in the US Medicare Population.

    PubMed

    Myhre, Sue Lynn; Buser, Zorica; Meisel, Hans-Joerg; Brodke, Darrel S; Yoon, S Tim; Wang, Jeffrey C; Park, Jong-Beom; Youssef, Jim A

    2017-06-01

    Retrospective database review. To analyze and report the trends and cost of posterior cervical fusions (PCFs) with and without off-label recombinant human bone morphogenetic protein-2 (rhBMP-2) in the Medicare population. Patient records from the PearlDiver database were retrospectively reviewed from January 1, 2005, to December 31, 2012, to distinguish individuals who underwent a PCF with or without rhBMP-2. Total numbers, incidence, age, gender, geographic region, reimbursement, and length of stay were analyzed and summarized. The combined total of non-rhBMP-2 (n = 39 479; 85.51%) and rhBMP-2 PCF (n = 6692; 14.49%) procedures performed between 2005 and 2012 was 46 171. In general, the number of PCFs without rhBMP-2 consistently increased over time, while the number of PCFs with rhBMP-2 had only a slight increase from 2005 to 2012. On average, PCFs without rhBMP-2 were associated with $1197 higher cost than those with rhBMP-2, but the average length of stay was similar (6 days). From 2005 to 2012, the average cost for procedures with and without rhBMP-2 increased by $12 605 and $7291, respectively. The percentage of rhBMP-2 use peaked in 2007 and dwindled until 2010, and declined an additional 2.84% from 2011 to 2012. Multiple age, region, and gender tendencies were observed. To our knowledge, this was the first study to use the PearlDiver database to report incidence and cost trends of PCF procedures. This article provides meaningful trend data on PCFs to surgeons and clinicians, researchers, and patients, as well as functions as a beacon for future research questions.

  9. The European Bioinformatics Institute's data resources 2014.

    PubMed

    Brooksbank, Catherine; Bergman, Mary Todd; Apweiler, Rolf; Birney, Ewan; Thornton, Janet

    2014-01-01

    Molecular Biology has been at the heart of the 'big data' revolution from its very beginning, and the need for access to biological data is a common thread running from the 1965 publication of Dayhoff's 'Atlas of Protein Sequence and Structure' through the Human Genome Project in the late 1990s and early 2000s to today's population-scale sequencing initiatives. The European Bioinformatics Institute (EMBL-EBI; http://www.ebi.ac.uk) is one of three organizations worldwide that provides free access to comprehensive, integrated molecular data sets. Here, we summarize the principles underpinning the development of these public resources and provide an overview of EMBL-EBI's database collection to complement the reviews of individual databases provided elsewhere in this issue.

  10. SIMAP—the database of all-against-all protein sequence similarities and annotations with new interfaces and increased coverage

    PubMed Central

    Arnold, Roland; Goldenberg, Florian; Mewes, Hans-Werner; Rattei, Thomas

    2014-01-01

    The Similarity Matrix of Proteins (SIMAP, http://mips.gsf.de/simap/) database has been designed to massively accelerate computationally expensive protein sequence analysis tasks in bioinformatics. It provides pre-calculated sequence similarities interconnecting the entire known protein sequence universe, complemented by pre-calculated protein features and domains, similarity clusters and functional annotations. SIMAP covers all major public protein databases as well as many consistently re-annotated metagenomes from different repositories. As of September 2013, SIMAP contains >163 million proteins corresponding to ∼70 million non-redundant sequences. SIMAP uses the sensitive FASTA search heuristics, the Smith–Waterman alignment algorithm, the InterPro database of protein domain models and the BLAST2GO functional annotation algorithm. SIMAP assists biologists by facilitating the interactive exploration of the protein sequence universe. Web-Service and DAS interfaces allow connecting SIMAP with any other bioinformatic tool and resource. All-against-all protein sequence similarity matrices of project-specific protein collections are generated on request. Recent improvements allow SIMAP to cover the rapidly growing sequenced protein sequence universe. New Web-Service interfaces enhance the connectivity of SIMAP. Novel tools for interactive extraction of protein similarity networks have been added. Open access to SIMAP is provided through the web portal; the portal also contains instructions and links for software access and flat file downloads. PMID:24165881

  11. Applications of Protein Thermodynamic Database for Understanding Protein Mutant Stability and Designing Stable Mutants.

    PubMed

    Gromiha, M Michael; Anoosha, P; Huang, Liang-Tsung

    2016-01-01

    Protein stability is the free energy difference between unfolded and folded states of a protein, which lies in the range of 5-25 kcal/mol. Experimentally, protein stability is measured with circular dichroism, differential scanning calorimetry, and fluorescence spectroscopy using thermal and denaturant denaturation methods. These experimental data have been accumulated in the form of a database, ProTherm, thermodynamic database for proteins and mutants. It also contains sequence and structure information of a protein, experimental methods and conditions, and literature information. Different features such as search, display, and sorting options and visualization tools have been incorporated in the database. ProTherm is a valuable resource for understanding/predicting the stability of proteins and it can be accessed at http://www.abren.net/protherm/ . ProTherm has been effectively used to examine the relationship among thermodynamics, structure, and function of proteins. We describe the recent progress on the development of methods for understanding/predicting protein stability, such as (1) general trends on mutational effects on stability, (2) relationship between the stability of protein mutants and amino acid properties, (3) applications of protein three-dimensional structures for predicting their stability upon point mutations, (4) prediction of protein stability upon single mutations from amino acid sequence, and (5) prediction methods for addressing double mutants. A list of online resources for predicting has also been provided.

  12. SIBIS: a Bayesian model for inconsistent protein sequence estimation.

    PubMed

    Khenoussi, Walyd; Vanhoutrève, Renaud; Poch, Olivier; Thompson, Julie D

    2014-09-01

    The prediction of protein coding genes is a major challenge that depends on the quality of genome sequencing, the accuracy of the model used to elucidate the exonic structure of the genes and the complexity of the gene splicing process leading to different protein variants. As a consequence, today's protein databases contain a huge amount of inconsistency, due to both natural variants and sequence prediction errors. We have developed a new method, called SIBIS, to detect such inconsistencies based on the evolutionary information in multiple sequence alignments. A Bayesian framework, combined with Dirichlet mixture models, is used to estimate the probability of observing specific amino acids and to detect inconsistent or erroneous sequence segments. We evaluated the performance of SIBIS on a reference set of protein sequences with experimentally validated errors and showed that the sensitivity is significantly higher than previous methods, with only a small loss of specificity. We also assessed a large set of human sequences from the UniProt database and found evidence of inconsistency in 48% of the previously uncharacterized sequences. We conclude that the integration of quality control methods like SIBIS in automatic analysis pipelines will be critical for the robust inference of structural, functional and phylogenetic information from these sequences. Source code, implemented in C on a linux system, and the datasets of protein sequences are freely available for download at http://www.lbgi.fr/∼julie/SIBIS. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  13. Evaluation of techniques for increasing recall in a dictionary approach to gene and protein name identification.

    PubMed

    Schuemie, Martijn J; Mons, Barend; Weeber, Marc; Kors, Jan A

    2007-06-01

    Gene and protein name identification in text requires a dictionary approach to relate synonyms to the same gene or protein, and to link names to external databases. However, existing dictionaries are incomplete. We investigate two complementary methods for automatic generation of a comprehensive dictionary: combination of information from existing gene and protein databases and rule-based generation of spelling variations. Both methods have been reported in literature before, but have hitherto not been combined and evaluated systematically. We combined gene and protein names from several existing databases of four different organisms. The combined dictionaries showed a substantial increase in recall on three different test sets, as compared to any single database. Application of 23 spelling variation rules to the combined dictionaries further increased recall. However, many rules appeared to have no effect and some appear to have a detrimental effect on precision.

  14. The COG database: a tool for genome-scale analysis of protein functions and evolution

    PubMed Central

    Tatusov, Roman L.; Galperin, Michael Y.; Natale, Darren A.; Koonin, Eugene V.

    2000-01-01

    Rational classification of proteins encoded in sequenced genomes is critical for making the genome sequences maximally useful for functional and evolutionary studies. The database of Clusters of Orthologous Groups of proteins (COGs) is an attempt on a phylogenetic classification of the proteins encoded in 21 complete genomes of bacteria, archaea and eukaryotes (http://www.ncbi.nlm.nih.gov/COG ). The COGs were constructed by applying the criterion of consistency of genome-specific best hits to the results of an exhaustive comparison of all protein sequences from these genomes. The database comprises 2091 COGs that include 56–83% of the gene products from each of the complete bacterial and archaeal genomes and ~35% of those from the yeast Saccharomyces cerevisiae genome. The COG database is accompanied by the COGNITOR program that is used to fit new proteins into the COGs and can be applied to functional and phylogenetic annotation of newly sequenced genomes. PMID:10592175

  15. Mining databases for protein aggregation: a review.

    PubMed

    Tsiolaki, Paraskevi L; Nastou, Katerina C; Hamodrakas, Stavros J; Iconomidou, Vassiliki A

    2017-09-01

    Protein aggregation is an active area of research in recent decades, since it is the most common and troubling indication of protein instability. Understanding the mechanisms governing protein aggregation and amyloidogenesis is a key component to the aetiology and pathogenesis of many devastating disorders, including Alzheimer's disease or type 2 diabetes. Protein aggregation data are currently found "scattered" in an increasing number of repositories, since advances in computational biology greatly influence this field of research. This review exploits the various resources of aggregation data and attempts to distinguish and analyze the biological knowledge they contain, by introducing protein-based, fragment-based and disease-based repositories, related to aggregation. In order to gain a broad overview of the available repositories, a novel comprehensive network maps and visualizes the current association between aggregation databases and other important databases and/or tools and discusses the beneficial role of community annotation. The need for unification of aggregation databases in a common platform is also addressed.

  16. Functional insights from the distribution and role of homopeptide repeat-containing proteins

    PubMed Central

    Faux, Noel G.; Bottomley, Stephen P.; Lesk, Arthur M.; Irving, James A.; Morrison, John R.; de la Banda, Maria Garcia; Whisstock, James C.

    2005-01-01

    Expansion of “low complex” repeats of amino acids such as glutamine (Poly-Q) is associated with protein misfolding and the development of degenerative diseases such as Huntington's disease. The mechanism by which such regions promote misfolding remains controversial, the function of many repeat-containing proteins (RCPs) remains obscure, and the role (if any) of repeat regions remains to be determined. Here, a Web-accessible database of RCPs is presented. The distribution and evolution of RCPs that contain homopeptide repeats tracts are considered, and the existence of functional patterns investigated. Generally, it is found that while polyamino acid repeats are extremely rare in prokaryotes, several eukaryote putative homologs of prokaryote RCP—involved in important housekeeping processes—retain the repetitive region, suggesting an ancient origin for certain repeats. Within eukarya, the most common uninterrupted amino acid repeats are glutamine, asparagines, and alanine. Interestingly, while poly-Q repeats are found in vertebrates and nonvertebrates, poly-N repeats are only common in more primitive nonvertebrate organisms, such as insects and nematodes. We have assigned function to eukaryote RCPs using Online Mendelian Inheritance in Man (OMIM), the Human Reference Protein Database (HRPD), FlyBase, and Wormpep. Prokaryote RCPs were annotated using BLASTp searches and Gene Ontology. These data reveal that the majority of RCPs are involved in processes that require the assembly of large, multiprotein complexes, such as transcription and signaling. PMID:15805494

  17. Identification of Potent Chloride Intracellular Channel Protein 1 Inhibitors from Traditional Chinese Medicine through Structure-Based Virtual Screening and Molecular Dynamics Analysis

    PubMed Central

    Wan, Minghui; Liao, Dongjiang; Peng, Guilin; Xu, Xin; Yin, Weiqiang; Guo, Guixin; Jiang, Funeng; Zhong, Weide

    2017-01-01

    Chloride intracellular channel 1 (CLIC1) is involved in the development of most aggressive human tumors, including gastric, colon, lung, liver, and glioblastoma cancers. It has become an attractive new therapeutic target for several types of cancer. In this work, we aim to identify natural products as potent CLIC1 inhibitors from Traditional Chinese Medicine (TCM) database using structure-based virtual screening and molecular dynamics (MD) simulation. First, structure-based docking was employed to screen the refined TCM database and the top 500 TCM compounds were obtained and reranked by X-Score. Then, 30 potent hits were achieved from the top 500 TCM compounds using cluster and ligand-protein interaction analysis. Finally, MD simulation was employed to validate the stability of interactions between each hit and CLIC1 protein from docking simulation, and Molecular Mechanics/Generalized Born Surface Area (MM-GBSA) analysis was used to refine the virtual hits. Six TCM compounds with top MM-GBSA scores and ideal-binding models were confirmed as the final hits. Our study provides information about the interaction between TCM compounds and CLIC1 protein, which may be helpful for further experimental investigations. In addition, the top 6 natural products structural scaffolds could serve as building blocks in designing drug-like molecules for CLIC1 inhibition. PMID:29147652

  18. Systematic detection of positive selection in the human-pathogen interactome and lasting effects on infectious disease susceptibility.

    PubMed

    Corona, Erik; Wang, Liuyang; Ko, Dennis; Patel, Chirag J

    2018-01-01

    Infectious disease has shaped the natural genetic diversity of humans throughout the world. A new approach to capture positive selection driven by pathogens would provide information regarding pathogen exposure in distinct human populations and the constantly evolving arms race between host and disease-causing agents. We created a human pathogen interaction database and used the integrated haplotype score (iHS) to detect recent positive selection in genes that interact with proteins from 26 different pathogens. We used the Human Genome Diversity Panel to identify specific populations harboring pathogen-interacting genes that have undergone positive selection. We found that human genes that interact with 9 pathogen species show evidence of recent positive selection. These pathogens are Yersenia pestis, human immunodeficiency virus (HIV) 1, Zaire ebolavirus, Francisella tularensis, dengue virus, human respiratory syncytial virus, measles virus, Rubella virus, and Bacillus anthracis. For HIV-1, GWAS demonstrate that some naturally selected variants in the host-pathogen protein interaction networks continue to have functional consequences for susceptibility to these pathogens. We show that selected human genes were enriched for HIV susceptibility variants (identified through GWAS), providing further support for the hypothesis that ancient humans were exposed to lentivirus pandemics. Human genes in the Italian, Miao, and Biaka Pygmy populations that interact with Y. pestis show significant signs of selection. These results reveal some of the genetic footprints created by pathogens in the human genome that may have left lasting marks on susceptibility to infectious disease.

  19. ATtRACT-a database of RNA-binding proteins and associated motifs.

    PubMed

    Giudice, Girolamo; Sánchez-Cabo, Fátima; Torroja, Carlos; Lara-Pezzi, Enrique

    2016-01-01

    RNA-binding proteins (RBPs) play a crucial role in key cellular processes, including RNA transport, splicing, polyadenylation and stability. Understanding the interaction between RBPs and RNA is key to improve our knowledge of RNA processing, localization and regulation in a global manner. Despite advances in recent years, a unified non-redundant resource that includes information on experimentally validated motifs, RBPs and integrated tools to exploit this information is lacking. Here, we developed a database named ATtRACT (available athttp://attract.cnic.es) that compiles information on 370 RBPs and 1583 RBP consensus binding motifs, 192 of which are not present in any other database. To populate ATtRACT we (i) extracted and hand-curated experimentally validated data from CISBP-RNA, SpliceAid-F, RBPDB databases, (ii) integrated and updated the unavailable ASD database and (iii) extracted information from Protein-RNA complexes present in Protein Data Bank database through computational analyses. ATtRACT provides also efficient algorithms to search a specific motif and scan one or more RNA sequences at a time. It also allows discoveringde novomotifs enriched in a set of related sequences and compare them with the motifs included in the database.Database URL:http:// attract. cnic. es. © The Author(s) 2016. Published by Oxford University Press.

  20. SInCRe—structural interactome computational resource for Mycobacterium tuberculosis

    PubMed Central

    Metri, Rahul; Hariharaputran, Sridhar; Ramakrishnan, Gayatri; Anand, Praveen; Raghavender, Upadhyayula S.; Ochoa-Montaño, Bernardo; Higueruelo, Alicia P.; Sowdhamini, Ramanathan; Chandra, Nagasuma R.; Blundell, Tom L.; Srinivasan, Narayanaswamy

    2015-01-01

    We have developed an integrated database for Mycobacterium tuberculosis H37Rv (Mtb) that collates information on protein sequences, domain assignments, functional annotation and 3D structural information along with protein–protein and protein–small molecule interactions. SInCRe (Structural Interactome Computational Resource) is developed out of CamBan (Cambridge and Bangalore) collaboration. The motivation for development of this database is to provide an integrated platform to allow easily access and interpretation of data and results obtained by all the groups in CamBan in the field of Mtb informatics. In-house algorithms and databases developed independently by various academic groups in CamBan are used to generate Mtb-specific datasets and are integrated in this database to provide a structural dimension to studies on tuberculosis. The SInCRe database readily provides information on identification of functional domains, genome-scale modelling of structures of Mtb proteins and characterization of the small-molecule binding sites within Mtb. The resource also provides structure-based function annotation, information on small-molecule binders including FDA (Food and Drug Administration)-approved drugs, protein–protein interactions (PPIs) and natural compounds that bind to pathogen proteins potentially and result in weakening or elimination of host–pathogen protein–protein interactions. Together they provide prerequisites for identification of off-target binding. Database URL: http://proline.biochem.iisc.ernet.in/sincre PMID:26130660

Top