Sample records for visualizing protein database

  1. Integrated web visualizations for protein-protein interaction databases.

    PubMed

    Jeanquartier, Fleur; Jean-Quartier, Claire; Holzinger, Andreas

    2015-06-16

    Understanding living systems is crucial for curing diseases. To achieve this task we have to understand biological networks based on protein-protein interactions. Bioinformatics has come up with a great amount of databases and tools that support analysts in exploring protein-protein interactions on an integrated level for knowledge discovery. They provide predictions and correlations, indicate possibilities for future experimental research and fill the gaps to complete the picture of biochemical processes. There are numerous and huge databases of protein-protein interactions used to gain insights into answering some of the many questions of systems biology. Many computational resources integrate interaction data with additional information on molecular background. However, the vast number of diverse Bioinformatics resources poses an obstacle to the goal of understanding. We present a survey of databases that enable the visual analysis of protein networks. We selected M=10 out of N=53 resources supporting visualization, and we tested against the following set of criteria: interoperability, data integration, quantity of possible interactions, data visualization quality and data coverage. The study reveals differences in usability, visualization features and quality as well as the quantity of interactions. StringDB is the recommended first choice. CPDB presents a comprehensive dataset and IntAct lets the user change the network layout. A comprehensive comparison table is available via web. The supplementary table can be accessed on http://tinyurl.com/PPI-DB-Comparison-2015. Only some web resources featuring graph visualization can be successfully applied to interactive visual analysis of protein-protein interaction. Study results underline the necessity for further enhancements of visualization integration in biochemical analysis tools. Identified challenges are data comprehensiveness, confidence, interactive feature and visualization maturing.

  2. HPIminer: A text mining system for building and visualizing human protein interaction networks and pathways.

    PubMed

    Subramani, Suresh; Kalpana, Raja; Monickaraj, Pankaj Moses; Natarajan, Jeyakumar

    2015-04-01

    The knowledge on protein-protein interactions (PPI) and their related pathways are equally important to understand the biological functions of the living cell. Such information on human proteins is highly desirable to understand the mechanism of several diseases such as cancer, diabetes, and Alzheimer's disease. Because much of that information is buried in biomedical literature, an automated text mining system for visualizing human PPI and pathways is highly desirable. In this paper, we present HPIminer, a text mining system for visualizing human protein interactions and pathways from biomedical literature. HPIminer extracts human PPI information and PPI pairs from biomedical literature, and visualize their associated interactions, networks and pathways using two curated databases HPRD and KEGG. To our knowledge, HPIminer is the first system to build interaction networks from literature as well as curated databases. Further, the new interactions mined only from literature and not reported earlier in databases are highlighted as new. A comparative study with other similar tools shows that the resultant network is more informative and provides additional information on interacting proteins and their associated networks. Copyright © 2015 Elsevier Inc. All rights reserved.

  3. ProXL (Protein Cross-Linking Database): A Platform for Analysis, Visualization, and Sharing of Protein Cross-Linking Mass Spectrometry Data

    PubMed Central

    2016-01-01

    ProXL is a Web application and accompanying database designed for sharing, visualizing, and analyzing bottom-up protein cross-linking mass spectrometry data with an emphasis on structural analysis and quality control. ProXL is designed to be independent of any particular software pipeline. The import process is simplified by the use of the ProXL XML data format, which shields developers of data importers from the relative complexity of the relational database schema. The database and Web interfaces function equally well for any software pipeline and allow data from disparate pipelines to be merged and contrasted. ProXL includes robust public and private data sharing capabilities, including a project-based interface designed to ensure security and facilitate collaboration among multiple researchers. ProXL provides multiple interactive and highly dynamic data visualizations that facilitate structural-based analysis of the observed cross-links as well as quality control. ProXL is open-source, well-documented, and freely available at https://github.com/yeastrc/proxl-web-app. PMID:27302480

  4. ProXL (Protein Cross-Linking Database): A Platform for Analysis, Visualization, and Sharing of Protein Cross-Linking Mass Spectrometry Data.

    PubMed

    Riffle, Michael; Jaschob, Daniel; Zelter, Alex; Davis, Trisha N

    2016-08-05

    ProXL is a Web application and accompanying database designed for sharing, visualizing, and analyzing bottom-up protein cross-linking mass spectrometry data with an emphasis on structural analysis and quality control. ProXL is designed to be independent of any particular software pipeline. The import process is simplified by the use of the ProXL XML data format, which shields developers of data importers from the relative complexity of the relational database schema. The database and Web interfaces function equally well for any software pipeline and allow data from disparate pipelines to be merged and contrasted. ProXL includes robust public and private data sharing capabilities, including a project-based interface designed to ensure security and facilitate collaboration among multiple researchers. ProXL provides multiple interactive and highly dynamic data visualizations that facilitate structural-based analysis of the observed cross-links as well as quality control. ProXL is open-source, well-documented, and freely available at https://github.com/yeastrc/proxl-web-app .

  5. CellMap visualizes protein-protein interactions and subcellular localization

    PubMed Central

    Dallago, Christian; Goldberg, Tatyana; Andrade-Navarro, Miguel Angel; Alanis-Lobato, Gregorio; Rost, Burkhard

    2018-01-01

    Many tools visualize protein-protein interaction (PPI) networks. The tool introduced here, CellMap, adds one crucial novelty by visualizing PPI networks in the context of subcellular localization, i.e. the location in the cell or cellular component in which a PPI happens. Users can upload images of cells and define areas of interest against which PPIs for selected proteins are displayed (by default on a cartoon of a cell). Annotations of localization are provided by the user or through our in-house database. The visualizer and server are written in JavaScript, making CellMap easy to customize and to extend by researchers and developers. PMID:29497493

  6. ChemProt-2.0: visual navigation in a disease chemical biology database

    PubMed Central

    Kim Kjærulff, Sonny; Wich, Louis; Kringelum, Jens; Jacobsen, Ulrik P.; Kouskoumvekaki, Irene; Audouze, Karine; Lund, Ole; Brunak, Søren; Oprea, Tudor I.; Taboureau, Olivier

    2013-01-01

    ChemProt-2.0 (http://www.cbs.dtu.dk/services/ChemProt-2.0) is a public available compilation of multiple chemical–protein annotation resources integrated with diseases and clinical outcomes information. The database has been updated to >1.15 million compounds with 5.32 millions bioactivity measurements for 15 290 proteins. Each protein is linked to quality-scored human protein–protein interactions data based on more than half a million interactions, for studying diseases and biological outcomes (diseases, pathways and GO terms) through protein complexes. In ChemProt-2.0, therapeutic effects as well as adverse drug reactions have been integrated allowing for suggesting proteins associated to clinical outcomes. New chemical structure fingerprints were computed based on the similarity ensemble approach. Protein sequence similarity search was also integrated to evaluate the promiscuity of proteins, which can help in the prediction of off-target effects. Finally, the database was integrated into a visual interface that enables navigation of the pharmacological space for small molecules. Filtering options were included in order to facilitate and to guide dynamic search of specific queries. PMID:23185041

  7. e23D: database and visualization of A-to-I RNA editing sites mapped to 3D protein structures.

    PubMed

    Solomon, Oz; Eyal, Eran; Amariglio, Ninette; Unger, Ron; Rechavi, Gidi

    2016-07-15

    e23D, a database of A-to-I RNA editing sites from human, mouse and fly mapped to evolutionary related protein 3D structures, is presented. Genomic coordinates of A-to-I RNA editing sites are converted to protein coordinates and mapped onto 3D structures from PDB or theoretical models from ModBase. e23D allows visualization of the protein structure, modeling of recoding events and orientation of the editing with respect to nearby genomic functional sites from databases of disease causing mutations and genomic polymorphism. http://www.sheba-cancer.org.il/e23D CONTACT: oz.solomon@live.biu.ac.il or Eran.Eyal@sheba.health.gov.il. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  8. VisANT 3.0: new modules for pathway visualization, editing, prediction and construction.

    PubMed

    Hu, Zhenjun; Ng, David M; Yamada, Takuji; Chen, Chunnuan; Kawashima, Shuichi; Mellor, Joe; Linghu, Bolan; Kanehisa, Minoru; Stuart, Joshua M; DeLisi, Charles

    2007-07-01

    With the integration of the KEGG and Predictome databases as well as two search engines for coexpressed genes/proteins using data sets obtained from the Stanford Microarray Database (SMD) and Gene Expression Omnibus (GEO) database, VisANT 3.0 supports exploratory pathway analysis, which includes multi-scale visualization of multiple pathways, editing and annotating pathways using a KEGG compatible visual notation and visualization of expression data in the context of pathways. Expression levels are represented either by color intensity or by nodes with an embedded expression profile. Multiple experiments can be navigated or animated. Known KEGG pathways can be enriched by querying either coexpressed components of known pathway members or proteins with known physical interactions. Predicted pathways for genes/proteins with unknown functions can be inferred from coexpression or physical interaction data. Pathways produced in VisANT can be saved as computer-readable XML format (VisML), graphic images or high-resolution Scalable Vector Graphics (SVG). Pathways in the format of VisML can be securely shared within an interested group or published online using a simple Web link. VisANT is freely available at http://visant.bu.edu.

  9. IntNetDB v1.0: an integrated protein-protein interaction network database generated by a probabilistic model

    PubMed Central

    Xia, Kai; Dong, Dong; Han, Jing-Dong J

    2006-01-01

    Background Although protein-protein interaction (PPI) networks have been explored by various experimental methods, the maps so built are still limited in coverage and accuracy. To further expand the PPI network and to extract more accurate information from existing maps, studies have been carried out to integrate various types of functional relationship data. A frequently updated database of computationally analyzed potential PPIs to provide biological researchers with rapid and easy access to analyze original data as a biological network is still lacking. Results By applying a probabilistic model, we integrated 27 heterogeneous genomic, proteomic and functional annotation datasets to predict PPI networks in human. In addition to previously studied data types, we show that phenotypic distances and genetic interactions can also be integrated to predict PPIs. We further built an easy-to-use, updatable integrated PPI database, the Integrated Network Database (IntNetDB) online, to provide automatic prediction and visualization of PPI network among genes of interest. The networks can be visualized in SVG (Scalable Vector Graphics) format for zooming in or out. IntNetDB also provides a tool to extract topologically highly connected network neighborhoods from a specific network for further exploration and research. Using the MCODE (Molecular Complex Detections) algorithm, 190 such neighborhoods were detected among all the predicted interactions. The predicted PPIs can also be mapped to worm, fly and mouse interologs. Conclusion IntNetDB includes 180,010 predicted protein-protein interactions among 9,901 human proteins and represents a useful resource for the research community. Our study has increased prediction coverage by five-fold. IntNetDB also provides easy-to-use network visualization and analysis tools that allow biological researchers unfamiliar with computational biology to access and analyze data over the internet. The web interface of IntNetDB is freely accessible at . Visualization requires Mozilla version 1.8 (or higher) or Internet Explorer with installation of SVGviewer. PMID:17112386

  10. MEGADOCK-Web: an integrated database of high-throughput structure-based protein-protein interaction predictions.

    PubMed

    Hayashi, Takanori; Matsuzaki, Yuri; Yanagisawa, Keisuke; Ohue, Masahito; Akiyama, Yutaka

    2018-05-08

    Protein-protein interactions (PPIs) play several roles in living cells, and computational PPI prediction is a major focus of many researchers. The three-dimensional (3D) structure and binding surface are important for the design of PPI inhibitors. Therefore, rigid body protein-protein docking calculations for two protein structures are expected to allow elucidation of PPIs different from known complexes in terms of 3D structures because known PPI information is not explicitly required. We have developed rapid PPI prediction software based on protein-protein docking, called MEGADOCK. In order to fully utilize the benefits of computational PPI predictions, it is necessary to construct a comprehensive database to gather prediction results and their predicted 3D complex structures and to make them easily accessible. Although several databases exist that provide predicted PPIs, the previous databases do not contain a sufficient number of entries for the purpose of discovering novel PPIs. In this study, we constructed an integrated database of MEGADOCK PPI predictions, named MEGADOCK-Web. MEGADOCK-Web provides more than 10 times the number of PPI predictions than previous databases and enables users to conduct PPI predictions that cannot be found in conventional PPI prediction databases. In MEGADOCK-Web, there are 7528 protein chains and 28,331,628 predicted PPIs from all possible combinations of those proteins. Each protein structure is annotated with PDB ID, chain ID, UniProt AC, related KEGG pathway IDs, and known PPI pairs. Additionally, MEGADOCK-Web provides four powerful functions: 1) searching precalculated PPI predictions, 2) providing annotations for each predicted protein pair with an experimentally known PPI, 3) visualizing candidates that may interact with the query protein on biochemical pathways, and 4) visualizing predicted complex structures through a 3D molecular viewer. MEGADOCK-Web provides a huge amount of comprehensive PPI predictions based on docking calculations with biochemical pathways and enables users to easily and quickly assess PPI feasibilities by archiving PPI predictions. MEGADOCK-Web also promotes the discovery of new PPIs and protein functions and is freely available for use at http://www.bi.cs.titech.ac.jp/megadock-web/ .

  11. ImmunemiR - A Database of Prioritized Immune miRNA Disease Associations and its Interactome.

    PubMed

    Prabahar, Archana; Natarajan, Jeyakumar

    2017-01-01

    MicroRNAs are the key regulators of gene expression and their abnormal expression in the immune system may be associated with several human diseases such as inflammation, cancer and autoimmune diseases. Elucidation of miRNA disease association through the interactome will deepen the understanding of its disease mechanisms. A specialized database for immune miRNAs is highly desirable to demonstrate the immune miRNA disease associations in the interactome. miRNAs specific to immune related diseases were retrieved from curated databases such as HMDD, miR2disease and PubMed literature based on MeSH classification of immune system diseases. The additional data such as miRNA target genes, genes coding protein-protein interaction information were compiled from related resources. Further, miRNAs were prioritized to specific immune diseases using random walk ranking algorithm. In total 245 immune miRNAs associated with 92 OMIM disease categories were identified from external databases. The resultant data were compiled as ImmunemiR, a database of prioritized immune miRNA disease associations. This database provides both text based annotation information and network visualization of its interactome. To our knowledge, ImmunemiR is the first available database to provide a comprehensive repository of human immune disease associated miRNAs with network visualization options of its target genes, protein-protein interactions (PPI) and its disease associations. It is freely available at http://www.biominingbu.org/immunemir/. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  12. THGS: a web-based database of Transmembrane Helices in Genome Sequences

    PubMed Central

    Fernando, S. A.; Selvarani, P.; Das, Soma; Kumar, Ch. Kiran; Mondal, Sukanta; Ramakumar, S.; Sekar, K.

    2004-01-01

    Transmembrane Helices in Genome Sequences (THGS) is an interactive web-based database, developed to search the transmembrane helices in the user-interested gene sequences available in the Genome Database (GDB). The proposed database has provision to search sequence motifs in transmembrane and globular proteins. In addition, the motif can be searched in the other sequence databases (Swiss-Prot and PIR) or in the macromolecular structure database, Protein Data Bank (PDB). Further, the 3D structure of the corresponding queried motif, if it is available in the solved protein structures deposited in the Protein Data Bank, can also be visualized using the widely used graphics package RASMOL. All the sequence databases used in the present work are updated frequently and hence the results produced are up to date. The database THGS is freely available via the world wide web and can be accessed at http://pranag.physics.iisc.ernet.in/thgs/ or http://144.16.71.10/thgs/. PMID:14681375

  13. PROXiMATE: a database of mutant protein-protein complex thermodynamics and kinetics.

    PubMed

    Jemimah, Sherlyn; Yugandhar, K; Michael Gromiha, M

    2017-09-01

    We have developed PROXiMATE, a database of thermodynamic data for more than 6000 missense mutations in 174 heterodimeric protein-protein complexes, supplemented with interaction network data from STRING database, solvent accessibility, sequence, structural and functional information, experimental conditions and literature information. Additional features include complex structure visualization, search and display options, download options and a provision for users to upload their data. The database is freely available at http://www.iitm.ac.in/bioinfo/PROXiMATE/ . The website is implemented in Python, and supports recent versions of major browsers such as IE10, Firefox, Chrome and Opera. gromiha@iitm.ac.in. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  14. SynechoNET: integrated protein-protein interaction database of a model cyanobacterium Synechocystis sp. PCC 6803.

    PubMed

    Kim, Woo-Yeon; Kang, Sungsoo; Kim, Byoung-Chul; Oh, Jeehyun; Cho, Seongwoong; Bhak, Jong; Choi, Jong-Soon

    2008-01-01

    Cyanobacteria are model organisms for studying photosynthesis, carbon and nitrogen assimilation, evolution of plant plastids, and adaptability to environmental stresses. Despite many studies on cyanobacteria, there is no web-based database of their regulatory and signaling protein-protein interaction networks to date. We report a database and website SynechoNET that provides predicted protein-protein interactions. SynechoNET shows cyanobacterial domain-domain interactions as well as their protein-level interactions using the model cyanobacterium, Synechocystis sp. PCC 6803. It predicts the protein-protein interactions using public interaction databases that contain mutually complementary and redundant data. Furthermore, SynechoNET provides information on transmembrane topology, signal peptide, and domain structure in order to support the analysis of regulatory membrane proteins. Such biological information can be queried and visualized in user-friendly web interfaces that include the interactive network viewer and search pages by keyword and functional category. SynechoNET is an integrated protein-protein interaction database designed to analyze regulatory membrane proteins in cyanobacteria. It provides a platform for biologists to extend the genomic data of cyanobacteria by predicting interaction partners, membrane association, and membrane topology of Synechocystis proteins. SynechoNET is freely available at http://synechocystis.org/ or directly at http://bioportal.kobic.kr/SynechoNET/.

  15. JavaProtein Dossier: a novel web-based data visualization tool for comprehensive analysis of protein structure

    PubMed Central

    Neshich, Goran; Rocchia, Walter; Mancini, Adauto L.; Yamagishi, Michel E. B.; Kuser, Paula R.; Fileto, Renato; Baudet, Christian; Pinto, Ivan P.; Montagner, Arnaldo J.; Palandrani, Juliana F.; Krauchenco, Joao N.; Torres, Renato C.; Souza, Savio; Togawa, Roberto C.; Higa, Roberto H.

    2004-01-01

    JavaProtein Dossier (JPD) is a new concept, database and visualization tool providing one of the largest collections of the physicochemical parameters describing proteins' structure, stability, function and interaction with other macromolecules. By collecting as many descriptors/parameters as possible within a single database, we can achieve a better use of the available data and information. Furthermore, data grouping allows us to generate different parameters with the potential to provide new insights into the sequence–structure–function relationship. In JPD, residue selection can be performed according to multiple criteria. JPD can simultaneously display and analyze all the physicochemical parameters of any pair of structures, using precalculated structural alignments, allowing direct parameter comparison at corresponding amino acid positions among homologous structures. In order to focus on the physicochemical (and consequently pharmacological) profile of proteins, visualization tools (showing the structure and structural parameters) also had to be optimized. Our response to this challenge was the use of Java technology with its exceptional level of interactivity. JPD is freely accessible (within the Gold Sting Suite) at http://sms.cbi.cnptia.embrapa.br, http://mirrors.rcsb.org/SMS, http://trantor.bioc.columbia.edu/SMS and http://www.es.embnet.org/SMS/ (Option: JavaProtein Dossier). PMID:15215458

  16. An affinity-structure database of helix-turn-helix: DNA complexes with a universal coordinate system.

    PubMed

    AlQuraishi, Mohammed; Tang, Shengdong; Xia, Xide

    2015-11-19

    Molecular interactions between proteins and DNA molecules underlie many cellular processes, including transcriptional regulation, chromosome replication, and nucleosome positioning. Computational analyses of protein-DNA interactions rely on experimental data characterizing known protein-DNA interactions structurally and biochemically. While many databases exist that contain either structural or biochemical data, few integrate these two data sources in a unified fashion. Such integration is becoming increasingly critical with the rapid growth of structural and biochemical data, and the emergence of algorithms that rely on the synthesis of multiple data types to derive computational models of molecular interactions. We have developed an integrated affinity-structure database in which the experimental and quantitative DNA binding affinities of helix-turn-helix proteins are mapped onto the crystal structures of the corresponding protein-DNA complexes. This database provides access to: (i) protein-DNA structures, (ii) quantitative summaries of protein-DNA binding affinities using position weight matrices, and (iii) raw experimental data of protein-DNA binding instances. Critically, this database establishes a correspondence between experimental structural data and quantitative binding affinity data at the single basepair level. Furthermore, we present a novel alignment algorithm that structurally aligns the protein-DNA complexes in the database and creates a unified residue-level coordinate system for comparing the physico-chemical environments at the interface between complexes. Using this unified coordinate system, we compute the statistics of atomic interactions at the protein-DNA interface of helix-turn-helix proteins. We provide an interactive website for visualization, querying, and analyzing this database, and a downloadable version to facilitate programmatic analysis. This database will facilitate the analysis of protein-DNA interactions and the development of programmatic computational methods that capitalize on integration of structural and biochemical datasets. The database can be accessed at http://ProteinDNA.hms.harvard.edu.

  17. Mining databases for protein aggregation: a review.

    PubMed

    Tsiolaki, Paraskevi L; Nastou, Katerina C; Hamodrakas, Stavros J; Iconomidou, Vassiliki A

    2017-09-01

    Protein aggregation is an active area of research in recent decades, since it is the most common and troubling indication of protein instability. Understanding the mechanisms governing protein aggregation and amyloidogenesis is a key component to the aetiology and pathogenesis of many devastating disorders, including Alzheimer's disease or type 2 diabetes. Protein aggregation data are currently found "scattered" in an increasing number of repositories, since advances in computational biology greatly influence this field of research. This review exploits the various resources of aggregation data and attempts to distinguish and analyze the biological knowledge they contain, by introducing protein-based, fragment-based and disease-based repositories, related to aggregation. In order to gain a broad overview of the available repositories, a novel comprehensive network maps and visualizes the current association between aggregation databases and other important databases and/or tools and discusses the beneficial role of community annotation. The need for unification of aggregation databases in a common platform is also addressed.

  18. Web server to identify similarity of amino acid motifs to compounds (SAAMCO).

    PubMed

    Casey, Fergal P; Davey, Norman E; Baran, Ivan; Varekova, Radka Svobodova; Shields, Denis C

    2008-07-01

    Protein-protein interactions are fundamental in mediating biological processes including metabolism, cell growth, and signaling. To be able to selectively inhibit or induce protein activity or complex formation is a key feature in controlling disease. For those situations in which protein-protein interactions derive substantial affinity from short linear peptide sequences, or motifs, we can develop search algorithms for peptidomimetic compounds that resemble the short peptide's structure but are not compromised by poor pharmacological properties. SAAMCO is a Web service ( http://bioware.ucd.ie/ approximately saamco) that facilitates the screening of motifs with known structures against bioactive compound databases. It is built on an algorithm that defines compound similarity based on the presence of appropriate amino acid side chain fragments and a favorable Root Mean Squared Deviation (RMSD) between compound and motif structure. The methodology is efficient as the available compound databases are preprocessed and fast regular expression searches filter potential matches before time-intensive 3D superposition is performed. The required input information is minimal, and the compound databases have been selected to maximize the availability of information on biological activity. "Hits" are accompanied with a visualization window and links to source database entries. Motif matching can be defined on partial or full similarity which will increase or reduce respectively the number of potential mimetic compounds. The Web server provides the functionality for rapid screening of known or putative interaction motifs against prepared compound libraries using a novel search algorithm. The tabulated results can be analyzed by linking to appropriate databases and by visualization.

  19. GALT protein database: querying structural and functional features of GALT enzyme.

    PubMed

    d'Acierno, Antonio; Facchiano, Angelo; Marabotti, Anna

    2014-09-01

    Knowledge of the impact of variations on protein structure can enhance the comprehension of the mechanisms of genetic diseases related to that protein. Here, we present a new version of GALT Protein Database, a Web-accessible data repository for the storage and interrogation of structural effects of variations of the enzyme galactose-1-phosphate uridylyltransferase (GALT), the impairment of which leads to classic Galactosemia, a rare genetic disease. This new version of this database now contains the models of 201 missense variants of GALT enzyme, including heterozygous variants, and it allows users not only to retrieve information about the missense variations affecting this protein, but also to investigate their impact on substrate binding, intersubunit interactions, stability, and other structural features. In addition, it allows the interactive visualization of the models of variants collected into the database. We have developed additional tools to improve the use of the database by nonspecialized users. This Web-accessible database (http://bioinformatica.isa.cnr.it/GALT/GALT2.0) represents a model of tools potentially suitable for application to other proteins that are involved in human pathologies and that are subjected to genetic variations. © 2014 WILEY PERIODICALS, INC.

  20. CADB: Conformation Angles DataBase of proteins

    PubMed Central

    Sheik, S. S.; Ananthalakshmi, P.; Bhargavi, G. Ramya; Sekar, K.

    2003-01-01

    Conformation Angles DataBase (CADB) provides an online resource to access data on conformation angles (both main-chain and side-chain) of protein structures in two data sets corresponding to 25% and 90% sequence identity between any two proteins, available in the Protein Data Bank. In addition, the database contains the necessary crystallographic parameters. The package has several flexible options and display facilities to visualize the main-chain and side-chain conformation angles for a particular amino acid residue. The package can also be used to study the interrelationship between the main-chain and side-chain conformation angles. A web based JAVA graphics interface has been deployed to display the user interested information on the client machine. The database is being updated at regular intervals and can be accessed over the World Wide Web interface at the following URL: http://144.16.71.148/cadb/. PMID:12520049

  1. Shewregdb: Database and visualization environment for experimental and predicted regulatory information in Shewanella oneidensis mr-1

    PubMed Central

    Syed, Mustafa H; Karpinets, Tatiana V; Leuze, Michael R; Kora, Guruprasad H; Romine, Margaret R; Uberbacher, Edward C

    2009-01-01

    Shewanella oneidensis MR-1 is an important model organism for environmental research as it has an exceptional metabolic and respiratory versatility regulated by a complex regulatory network. We have developed a database to collect experimental and computational data relating to regulation of gene and protein expression, and, a visualization environment that enables integration of these data types. The regulatory information in the database includes predictions of DNA regulator binding sites, sigma factor binding sites, transcription units, operons, promoters, and RNA regulators including non-coding RNAs, riboswitches, and different types of terminators. Availability http://shewanella-knowledgebase.org:8080/Shewanella/gbrowserLanding.jsp PMID:20198195

  2. Bioinformatics Analysis of Protein Phosphorylation in Plant Systems Biology Using P3DB.

    PubMed

    Yao, Qiuming; Xu, Dong

    2017-01-01

    Protein phosphorylation is one of the most pervasive protein post-translational modification events in plant cells. It is involved in many plant biological processes, such as plant growth, organ development, and plant immunology, by regulating or switching signaling and metabolic pathways. High-throughput experimental methods like mass spectrometry can easily characterize hundreds to thousands of phosphorylation events in a single experiment. With the increasing volume of the data sets, Plant Protein Phosphorylation DataBase (P3DB, http://p3db.org ) provides a comprehensive, systematic, and interactive online platform to deposit, query, analyze, and visualize these phosphorylation events in many plant species. It stores the protein phosphorylation sites in the context of identified mass spectra, phosphopeptides, and phosphoproteins contributed from various plant proteome studies. In addition, P3DB associates these plant phosphorylation sites to protein physicochemical information in the protein charts and tertiary structures, while various protein annotations from hierarchical kinase phosphatase families, protein domains, and gene ontology are also added into the database. P3DB not only provides rich information, but also interconnects and provides visualization of the data in networks, in systems biology context. Currently, P3DB includes the KiC (Kinase Client) assay network, the protein-protein interaction network, the kinase-substrate network, the phosphatase-substrate network, and the protein domain co-occurrence network. All of these are available to query for and visualize existing phosphorylation events. Although P3DB only hosts experimentally identified phosphorylation data, it provides a plant phosphorylation prediction model for any unknown queries on the fly. P3DB is an entry point to the plant phosphorylation community to deposit and visualize any customized data sets within this systems biology framework. Nowadays, P3DB has become one of the major bioinformatics platforms of protein phosphorylation in plant biology.

  3. LenVarDB: database of length-variant protein domains.

    PubMed

    Mutt, Eshita; Mathew, Oommen K; Sowdhamini, Ramanathan

    2014-01-01

    Protein domains are functionally and structurally independent modules, which add to the functional variety of proteins. This array of functional diversity has been enabled by evolutionary changes, such as amino acid substitutions or insertions or deletions, occurring in these protein domains. Length variations (indels) can introduce changes at structural, functional and interaction levels. LenVarDB (freely available at http://caps.ncbs.res.in/lenvardb/) traces these length variations, starting from structure-based sequence alignments in our Protein Alignments organized as Structural Superfamilies (PASS2) database, across 731 structural classification of proteins (SCOP)-based protein domain superfamilies connected to 2 730 625 sequence homologues. Alignment of sequence homologues corresponding to a structural domain is available, starting from a structure-based sequence alignment of the superfamily. Orientation of the length-variant (indel) regions in protein domains can be visualized by mapping them on the structure and on the alignment. Knowledge about location of length variations within protein domains and their visual representation will be useful in predicting changes within structurally or functionally relevant sites, which may ultimately regulate protein function. Non-technical summary: Evolutionary changes bring about natural changes to proteins that may be found in many organisms. Such changes could be reflected as amino acid substitutions or insertions-deletions (indels) in protein sequences. LenVarDB is a database that provides an early overview of observed length variations that were set among 731 protein families and after examining >2 million sequences. Indels are followed up to observe if they are close to the active site such that they can affect the activity of proteins. Inclusion of such information can aid the design of bioengineering experiments.

  4. SAFE Software and FED Database to Uncover Protein-Protein Interactions using Gene Fusion Analysis.

    PubMed

    Tsagrasoulis, Dimosthenis; Danos, Vasilis; Kissa, Maria; Trimpalis, Philip; Koumandou, V Lila; Karagouni, Amalia D; Tsakalidis, Athanasios; Kossida, Sophia

    2012-01-01

    Domain Fusion Analysis takes advantage of the fact that certain proteins in a given proteome A, are found to have statistically significant similarity with two separate proteins in another proteome B. In other words, the result of a fusion event between two separate proteins in proteome B is a specific full-length protein in proteome A. In such a case, it can be safely concluded that the protein pair has a common biological function or even interacts physically. In this paper, we present the Fusion Events Database (FED), a database for the maintenance and retrieval of fusion data both in prokaryotic and eukaryotic organisms and the Software for the Analysis of Fusion Events (SAFE), a computational platform implemented for the automated detection, filtering and visualization of fusion events (both available at: http://www.bioacademy.gr/bioinformatics/projects/ProteinFusion/index.htm). Finally, we analyze the proteomes of three microorganisms using these tools in order to demonstrate their functionality.

  5. SAFE Software and FED Database to Uncover Protein-Protein Interactions using Gene Fusion Analysis

    PubMed Central

    Tsagrasoulis, Dimosthenis; Danos, Vasilis; Kissa, Maria; Trimpalis, Philip; Koumandou, V. Lila; Karagouni, Amalia D.; Tsakalidis, Athanasios; Kossida, Sophia

    2012-01-01

    Domain Fusion Analysis takes advantage of the fact that certain proteins in a given proteome A, are found to have statistically significant similarity with two separate proteins in another proteome B. In other words, the result of a fusion event between two separate proteins in proteome B is a specific full-length protein in proteome A. In such a case, it can be safely concluded that the protein pair has a common biological function or even interacts physically. In this paper, we present the Fusion Events Database (FED), a database for the maintenance and retrieval of fusion data both in prokaryotic and eukaryotic organisms and the Software for the Analysis of Fusion Events (SAFE), a computational platform implemented for the automated detection, filtering and visualization of fusion events (both available at: http://www.bioacademy.gr/bioinformatics/projects/ProteinFusion/index.htm). Finally, we analyze the proteomes of three microorganisms using these tools in order to demonstrate their functionality. PMID:22267904

  6. SpirPro: A Spirulina proteome database and web-based tools for the analysis of protein-protein interactions at the metabolic level in Spirulina (Arthrospira) platensis C1.

    PubMed

    Senachak, Jittisak; Cheevadhanarak, Supapon; Hongsthong, Apiradee

    2015-07-29

    Spirulina (Arthrospira) platensis is the only cyanobacterium that in addition to being studied at the molecular level and subjected to gene manipulation, can also be mass cultivated in outdoor ponds for commercial use as a food supplement. Thus, encountering environmental changes, including temperature stresses, is common during the mass production of Spirulina. The use of cyanobacteria as an experimental platform, especially for photosynthetic gene manipulation in plants and bacteria, is becoming increasingly important. Understanding the mechanisms and protein-protein interaction networks that underlie low- and high-temperature responses is relevant to Spirulina mass production. To accomplish this goal, high-throughput techniques such as OMICs analyses are used. Thus, large datasets must be collected, managed and subjected to information extraction. Therefore, databases including (i) proteomic analysis and protein-protein interaction (PPI) data and (ii) domain/motif visualization tools are required for potential use in temperature response models for plant chloroplasts and photosynthetic bacteria. A web-based repository was developed including an embedded database, SpirPro, and tools for network visualization. Proteome data were analyzed integrated with protein-protein interactions and/or metabolic pathways from KEGG. The repository provides various information, ranging from raw data (2D-gel images) to associated results, such as data from interaction and/or pathway analyses. This integration allows in silico analyses of protein-protein interactions affected at the metabolic level and, particularly, analyses of interactions between and within the affected metabolic pathways under temperature stresses for comparative proteomic analysis. The developed tool, which is coded in HTML with CSS/JavaScript and depicted in Scalable Vector Graphics (SVG), is designed for interactive analysis and exploration of the constructed network. SpirPro is publicly available on the web at http://spirpro.sbi.kmutt.ac.th . SpirPro is an analysis platform containing an integrated proteome and PPI database that provides the most comprehensive data on this cyanobacterium at the systematic level. As an integrated database, SpirPro can be applied in various analyses, such as temperature stress response networking analysis in cyanobacterial models and interacting domain-domain analysis between proteins of interest.

  7. Catalytic site identification—a web server to identify catalytic site structural matches throughout PDB

    PubMed Central

    Kirshner, Daniel A.; Nilmeier, Jerome P.; Lightstone, Felice C.

    2013-01-01

    The catalytic site identification web server provides the innovative capability to find structural matches to a user-specified catalytic site among all Protein Data Bank proteins rapidly (in less than a minute). The server also can examine a user-specified protein structure or model to identify structural matches to a library of catalytic sites. Finally, the server provides a database of pre-calculated matches between all Protein Data Bank proteins and the library of catalytic sites. The database has been used to derive a set of hypothesized novel enzymatic function annotations. In all cases, matches and putative binding sites (protein structure and surfaces) can be visualized interactively online. The website can be accessed at http://catsid.llnl.gov. PMID:23680785

  8. Catalytic site identification--a web server to identify catalytic site structural matches throughout PDB.

    PubMed

    Kirshner, Daniel A; Nilmeier, Jerome P; Lightstone, Felice C

    2013-07-01

    The catalytic site identification web server provides the innovative capability to find structural matches to a user-specified catalytic site among all Protein Data Bank proteins rapidly (in less than a minute). The server also can examine a user-specified protein structure or model to identify structural matches to a library of catalytic sites. Finally, the server provides a database of pre-calculated matches between all Protein Data Bank proteins and the library of catalytic sites. The database has been used to derive a set of hypothesized novel enzymatic function annotations. In all cases, matches and putative binding sites (protein structure and surfaces) can be visualized interactively online. The website can be accessed at http://catsid.llnl.gov.

  9. An affinity-structure database of helix-turn-helix: DNA complexes with a universal coordinate system

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    AlQuraishi, Mohammed; Tang, Shengdong; Xia, Xide

    Molecular interactions between proteins and DNA molecules underlie many cellular processes, including transcriptional regulation, chromosome replication, and nucleosome positioning. Computational analyses of protein-DNA interactions rely on experimental data characterizing known protein-DNA interactions structurally and biochemically. While many databases exist that contain either structural or biochemical data, few integrate these two data sources in a unified fashion. Such integration is becoming increasingly critical with the rapid growth of structural and biochemical data, and the emergence of algorithms that rely on the synthesis of multiple data types to derive computational models of molecular interactions. We have developed an integrated affinity-structure database inmore » which the experimental and quantitative DNA binding affinities of helix-turn-helix proteins are mapped onto the crystal structures of the corresponding protein-DNA complexes. This database provides access to: (i) protein-DNA structures, (ii) quantitative summaries of protein-DNA binding affinities using position weight matrices, and (iii) raw experimental data of protein-DNA binding instances. Critically, this database establishes a correspondence between experimental structural data and quantitative binding affinity data at the single basepair level. Furthermore, we present a novel alignment algorithm that structurally aligns the protein-DNA complexes in the database and creates a unified residue-level coordinate system for comparing the physico-chemical environments at the interface between complexes. Using this unified coordinate system, we compute the statistics of atomic interactions at the protein-DNA interface of helix-turn-helix proteins. We provide an interactive website for visualization, querying, and analyzing this database, and a downloadable version to facilitate programmatic analysis. Lastly, this database will facilitate the analysis of protein-DNA interactions and the development of programmatic computational methods that capitalize on integration of structural and biochemical datasets. The database can be accessed at http://ProteinDNA.hms.harvard.edu.« less

  10. An affinity-structure database of helix-turn-helix: DNA complexes with a universal coordinate system

    DOE PAGES

    AlQuraishi, Mohammed; Tang, Shengdong; Xia, Xide

    2015-11-19

    Molecular interactions between proteins and DNA molecules underlie many cellular processes, including transcriptional regulation, chromosome replication, and nucleosome positioning. Computational analyses of protein-DNA interactions rely on experimental data characterizing known protein-DNA interactions structurally and biochemically. While many databases exist that contain either structural or biochemical data, few integrate these two data sources in a unified fashion. Such integration is becoming increasingly critical with the rapid growth of structural and biochemical data, and the emergence of algorithms that rely on the synthesis of multiple data types to derive computational models of molecular interactions. We have developed an integrated affinity-structure database inmore » which the experimental and quantitative DNA binding affinities of helix-turn-helix proteins are mapped onto the crystal structures of the corresponding protein-DNA complexes. This database provides access to: (i) protein-DNA structures, (ii) quantitative summaries of protein-DNA binding affinities using position weight matrices, and (iii) raw experimental data of protein-DNA binding instances. Critically, this database establishes a correspondence between experimental structural data and quantitative binding affinity data at the single basepair level. Furthermore, we present a novel alignment algorithm that structurally aligns the protein-DNA complexes in the database and creates a unified residue-level coordinate system for comparing the physico-chemical environments at the interface between complexes. Using this unified coordinate system, we compute the statistics of atomic interactions at the protein-DNA interface of helix-turn-helix proteins. We provide an interactive website for visualization, querying, and analyzing this database, and a downloadable version to facilitate programmatic analysis. Lastly, this database will facilitate the analysis of protein-DNA interactions and the development of programmatic computational methods that capitalize on integration of structural and biochemical datasets. The database can be accessed at http://ProteinDNA.hms.harvard.edu.« less

  11. LoopX: A Graphical User Interface-Based Database for Comprehensive Analysis and Comparative Evaluation of Loops from Protein Structures.

    PubMed

    Kadumuri, Rajashekar Varma; Vadrevu, Ramakrishna

    2017-10-01

    Due to their crucial role in function, folding, and stability, protein loops are being targeted for grafting/designing to create novel or alter existing functionality and improve stability and foldability. With a view to facilitate a thorough analysis and effectual search options for extracting and comparing loops for sequence and structural compatibility, we developed, LoopX a comprehensively compiled library of sequence and conformational features of ∼700,000 loops from protein structures. The database equipped with a graphical user interface is empowered with diverse query tools and search algorithms, with various rendering options to visualize the sequence- and structural-level information along with hydrogen bonding patterns, backbone φ, ψ dihedral angles of both the target and candidate loops. Two new features (i) conservation of the polar/nonpolar environment and (ii) conservation of sequence and conformation of specific residues within the loops have also been incorporated in the search and retrieval of compatible loops for a chosen target loop. Thus, the LoopX server not only serves as a database and visualization tool for sequence and structural analysis of protein loops but also aids in extracting and comparing candidate loops for a given target loop based on user-defined search options.

  12. Integrating In Silico Resources to Map a Signaling Network

    PubMed Central

    Liu, Hanqing; Beck, Tim N.; Golemis, Erica A.; Serebriiskii, Ilya G.

    2013-01-01

    The abundance of publicly available life science databases offer a wealth of information that can support interpretation of experimentally derived data and greatly enhance hypothesis generation. Protein interaction and functional networks are not simply new renditions of existing data: they provide the opportunity to gain insights into the specific physical and functional role a protein plays as part of the biological system. In this chapter, we describe different in silico tools that can quickly and conveniently retrieve data from existing data repositories and discuss how the available tools are best utilized for different purposes. While emphasizing protein-protein interaction databases (e.g., BioGrid and IntAct), we also introduce metasearch platforms such as STRING and GeneMANIA, pathway databases (e.g., BioCarta and Pathway Commons), text mining approaches (e.g., PubMed and Chilibot), and resources for drug-protein interactions, genetic information for model organisms and gene expression information based on microarray data mining. Furthermore, we provide a simple step-by-step protocol to building customized protein-protein interaction networks in Cytoscape, a powerful network assembly and visualization program, integrating data retrieved from these various databases. As we illustrate, generation of composite interaction networks enables investigators to extract significantly more information about a given biological system than utilization of a single database or sole reliance on primary literature. PMID:24233784

  13. Visualization portal for genetic variation (VizGVar): a tool for interactive visualization of SNPs and somatic mutations in exons, genes and protein domains.

    PubMed

    Solano-Román, Antonio; Alfaro-Arias, Verónica; Cruz-Castillo, Carlos; Orozco-Solano, Allan

    2018-03-15

    VizGVar was designed to meet the growing need of the research community for improved genomic and proteomic data viewers that benefit from better information visualization. We implemented a new information architecture and applied user centered design principles to provide a new improved way of visualizing genetic information and protein data related to human disease. VizGVar connects the entire database of Ensembl protein motifs, domains, genes and exons with annotated SNPs and somatic variations from PharmGKB and COSMIC. VizGVar precisely represents genetic variations and their respective location by colored curves to designate different types of variations. The structured hierarchy of biological data is reflected in aggregated patterns through different levels, integrating several layers of information at once. VizGVar provides a new interactive, web-based JavaScript visualization of somatic mutations and protein variation, enabling fast and easy discovery of clinically relevant variation patterns. VizGVar is accessible at http://vizport.io/vizgvar; http://vizport.io/vizgvar/doc/. asolano@broadinstitute.org or allan.orozcosolano@ucr.ac.cr.

  14. Applications of Protein Thermodynamic Database for Understanding Protein Mutant Stability and Designing Stable Mutants.

    PubMed

    Gromiha, M Michael; Anoosha, P; Huang, Liang-Tsung

    2016-01-01

    Protein stability is the free energy difference between unfolded and folded states of a protein, which lies in the range of 5-25 kcal/mol. Experimentally, protein stability is measured with circular dichroism, differential scanning calorimetry, and fluorescence spectroscopy using thermal and denaturant denaturation methods. These experimental data have been accumulated in the form of a database, ProTherm, thermodynamic database for proteins and mutants. It also contains sequence and structure information of a protein, experimental methods and conditions, and literature information. Different features such as search, display, and sorting options and visualization tools have been incorporated in the database. ProTherm is a valuable resource for understanding/predicting the stability of proteins and it can be accessed at http://www.abren.net/protherm/ . ProTherm has been effectively used to examine the relationship among thermodynamics, structure, and function of proteins. We describe the recent progress on the development of methods for understanding/predicting protein stability, such as (1) general trends on mutational effects on stability, (2) relationship between the stability of protein mutants and amino acid properties, (3) applications of protein three-dimensional structures for predicting their stability upon point mutations, (4) prediction of protein stability upon single mutations from amino acid sequence, and (5) prediction methods for addressing double mutants. A list of online resources for predicting has also been provided.

  15. IPRStats: visualization of the functional potential of an InterProScan run.

    PubMed

    Kelly, Ryan J; Vincent, David E; Friedberg, Iddo

    2010-12-21

    InterPro is a collection of protein signatures for the classification and automated annotation of proteins. Interproscan is a software tool that scans protein sequences against Interpro member databases using a variety of profile-based, hidden markov model and positional specific score matrix methods. It not only combines a set of analysis tools, but also performs data look-up from various sources, as well as some redundancy removal. Interproscan is robust and scalable, able to perform on any machine from a netbook to a large cluster. However, when performing whole-genome or metagenome analysis, there is a need for a fast statistical visualization of the results to have good initial grasp on the functional potential of the sequences in the analyzed data set. This is especially important when analyzing and comparing metagenomic or metaproteomic data-sets. IPRStats is a tool for the visualization of Interproscan results. Interproscan results are parsed from the Interproscan XML or EBIXML file into an SQLite or MySQL database. The results for each signature database scan are read and displayed as pie-charts or bar charts as summary statistics. A table is also provided, where each entry is a signature (e.g. a Pfam entry) accompanied by one or more Gene Ontology terms, if Interproscan was run using the Gene Ontology option. We present an platform-independent, open source licensed tool that is useful for Interproscan users who wish to view the summary of their results in a rapid and concise fashion.

  16. SSEP: secondary structural elements of proteins

    PubMed Central

    Shanthi, V.; Selvarani, P.; Kiran Kumar, Ch.; Mohire, C. S.; Sekar, K.

    2003-01-01

    SSEP is a comprehensive resource for accessing information related to the secondary structural elements present in the 25 and 90% non-redundant protein chains. The database contains 1771 protein chains from 1670 protein structures and 6182 protein chains from 5425 protein structures in 25 and 90% non-redundant protein chains, respectively. The current version provides information about the α-helical segments and β-strand fragments of varying lengths. In addition, it also contains the information about 310-helix, β- and ν-turns and hairpin loops. The free graphics program RASMOL has been interfaced with the search engine to visualize the three-dimensional structures of the user queried secondary structural fragment. The database is updated regularly and is available through Bioinformatics web server at http://cluster.physics.iisc.ernet.in/ssep/ or http://144.16.71.148/ssep/. PMID:12824336

  17. Proteome reference map and regulation network of neonatal rat cardiomyocyte

    PubMed Central

    Li, Zi-jian; Liu, Ning; Han, Qi-de; Zhang, You-yi

    2011-01-01

    Aim: To study and establish a proteome reference map and regulation network of neonatal rat cardiomyocyte. Methods: Cultured cardiomyocytes of neonatal rats were used. All proteins expressed in the cardiomyocytes were separated and identified by two-dimensional polyacrylamide gel electrophoresis (2-DE) and matrix-assisted laser desorption/ionization-time of flight mass spectrometry (MALDI-TOF MS). Biological networks and pathways of the neonatal rat cardiomyocytes were analyzed using the Ingenuity Pathway Analysis (IPA) program (www.ingenuity.com). A 2-DE database was made accessible on-line by Make2ddb package on a web server. Results: More than 1000 proteins were separated on 2D gels, and 148 proteins were identified. The identified proteins were used for the construction of an extensible markup language-based database. Biological networks and pathways were constructed to analyze the functions associate with cardiomyocyte proteins in the database. The 2-DE database of rat cardiomyocyte proteins can be accessed at http://2d.bjmu.edu.cn. Conclusion: A proteome reference map and regulation network of the neonatal rat cardiomyocytes have been established, which may serve as an international platform for storage, analysis and visualization of cardiomyocyte proteomic data. PMID:21841810

  18. ChiTaRS-3.1-the enhanced chimeric transcripts and RNA-seq database matched with protein-protein interactions.

    PubMed

    Gorohovski, Alessandro; Tagore, Somnath; Palande, Vikrant; Malka, Assaf; Raviv-Shay, Dorith; Frenkel-Morgenstern, Milana

    2017-01-04

    Discovery of chimeric RNAs, which are produced by chromosomal translocations as well as the joining of exons from different genes by trans-splicing, has added a new level of complexity to our study and understanding of the transcriptome. The enhanced ChiTaRS-3.1 database (http://chitars.md.biu.ac.il) is designed to make widely accessible a wealth of mined data on chimeric RNAs, with easy-to-use analytical tools built-in. The database comprises 34 922: chimeric transcripts along with 11 714: cancer breakpoints. In this latest version, we have included multiple cross-references to GeneCards, iHop, PubMed, NCBI, Ensembl, OMIM, RefSeq and the Mitelman collection for every entry in the 'Full Collection'. In addition, for every chimera, we have added a predicted Chimeric Protein-Protein Interaction (ChiPPI) network, which allows for easy visualization of protein partners of both parental and fusion proteins for all human chimeras. The database contains a comprehensive annotation for 34 922: chimeric transcripts from eight organisms, and includes the manual annotation of 200 sense-antiSense (SaS) chimeras. The current improvements in the content and functionality to the ChiTaRS database make it a central resource for the study of chimeric transcripts and fusion proteins. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  19. Interactive and Versatile Navigation of Structural Databases.

    PubMed

    Korb, Oliver; Kuhn, Bernd; Hert, Jérôme; Taylor, Neil; Cole, Jason; Groom, Colin; Stahl, Martin

    2016-05-12

    We present CSD-CrossMiner, a novel tool for pharmacophore-based searches in crystal structure databases. Intuitive pharmacophore queries describing, among others, protein-ligand interaction patterns, ligand scaffolds, or protein environments can be built and modified interactively. Matching crystal structures are overlaid onto the query and visualized as soon as they are available, enabling the researcher to quickly modify a hypothesis on the fly. We exemplify the utility of the approach by showing applications relevant to real-world drug discovery projects, including the identification of novel fragments for a specific protein environment or scaffold hopping. The ability to concurrently search protein-ligand binding sites extracted from the Protein Data Bank (PDB) and small organic molecules from the Cambridge Structural Database (CSD) using the same pharmacophore query further emphasizes the flexibility of CSD-CrossMiner. We believe that CSD-CrossMiner closes an important gap in mining structural data and will allow users to extract more value from the growing number of available crystal structures.

  20. SpirPep: an in silico digestion-based platform to assist bioactive peptides discovery from a genome-wide database.

    PubMed

    Anekthanakul, Krittima; Hongsthong, Apiradee; Senachak, Jittisak; Ruengjitchatchawalya, Marasri

    2018-04-20

    Bioactive peptides, including biological sources-derived peptides with different biological activities, are protein fragments that influence the functions or conditions of organisms, in particular humans and animals. Conventional methods of identifying bioactive peptides are time-consuming and costly. To quicken the processes, several bioinformatics tools are recently used to facilitate screening of the potential peptides prior their activity assessment in vitro and/or in vivo. In this study, we developed an efficient computational method, SpirPep, which offers many advantages over the currently available tools. The SpirPep web application tool is a one-stop analysis and visualization facility to assist bioactive peptide discovery. The tool is equipped with 15 customized enzymes and 1-3 miscleavage options, which allows in silico digestion of protein sequences encoded by protein-coding genes from single, multiple, or genome-wide scaling, and then directly classifies the peptides by bioactivity using an in-house database that contains bioactive peptides collected from 13 public databases. With this tool, the resulting peptides are categorized by each selected enzyme, and shown in a tabular format where the peptide sequences can be tracked back to their original proteins. The developed tool and webpages are coded in PHP and HTML with CSS/JavaScript. Moreover, the tool allows protein-peptide alignment visualization by Generic Genome Browser (GBrowse) to display the region and details of the proteins and peptides within each parameter, while considering digestion design for the desirable bioactivity. SpirPep is efficient; it takes less than 20 min to digest 3000 proteins (751,860 amino acids) with 15 enzymes and three miscleavages for each enzyme, and only a few seconds for single enzyme digestion. Obviously, the tool identified more bioactive peptides than that of the benchmarked tool; an example of validated pentapeptide (FLPIL) from LC-MS/MS was demonstrated. The web and database server are available at http://spirpepapp.sbi.kmutt.ac.th . SpirPep, a web-based bioactive peptide discovery application, is an in silico-based tool with an overview of the results. The platform is a one-stop analysis and visualization facility; and offers advantages over the currently available tools. This tool may be useful for further bioactivity analysis and the quantitative discovery of desirable peptides.

  1. INTERSPIA: a web application for exploring the dynamics of protein-protein interactions among multiple species.

    PubMed

    Kwon, Daehong; Lee, Daehwan; Kim, Juyeon; Lee, Jongin; Sim, Mikang; Kim, Jaebum

    2018-05-09

    Proteins perform biological functions through cascading interactions with each other by forming protein complexes. As a result, interactions among proteins, called protein-protein interactions (PPIs) are not completely free from selection constraint during evolution. Therefore, the identification and analysis of PPI changes during evolution can give us new insight into the evolution of functions. Although many algorithms, databases and websites have been developed to help the study of PPIs, most of them are limited to visualize the structure and features of PPIs in a chosen single species with limited functions in the visualization perspective. This leads to difficulties in the identification of different patterns of PPIs in different species and their functional consequences. To resolve these issues, we developed a web application, called INTER-Species Protein Interaction Analysis (INTERSPIA). Given a set of proteins of user's interest, INTERSPIA first discovers additional proteins that are functionally associated with the input proteins and searches for different patterns of PPIs in multiple species through a server-side pipeline, and second visualizes the dynamics of PPIs in multiple species using an easy-to-use web interface. INTERSPIA is freely available at http://bioinfo.konkuk.ac.kr/INTERSPIA/.

  2. RBscore&NBench: a high-level web server for nucleic acid binding residues prediction with a large-scale benchmarking database.

    PubMed

    Miao, Zhichao; Westhof, Eric

    2016-07-08

    RBscore&NBench combines a web server, RBscore and a database, NBench. RBscore predicts RNA-/DNA-binding residues in proteins and visualizes the prediction scores and features on protein structures. The scoring scheme of RBscore directly links feature values to nucleic acid binding probabilities and illustrates the nucleic acid binding energy funnel on the protein surface. To avoid dataset, binding site definition and assessment metric biases, we compared RBscore with 18 web servers and 3 stand-alone programs on 41 datasets, which demonstrated the high and stable accuracy of RBscore. A comprehensive comparison led us to develop a benchmark database named NBench. The web server is available on: http://ahsoka.u-strasbg.fr/rbscorenbench/. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  3. TranscriptomeBrowser 3.0: introducing a new compendium of molecular interactions and a new visualization tool for the study of gene regulatory networks.

    PubMed

    Lepoivre, Cyrille; Bergon, Aurélie; Lopez, Fabrice; Perumal, Narayanan B; Nguyen, Catherine; Imbert, Jean; Puthier, Denis

    2012-01-31

    Deciphering gene regulatory networks by in silico approaches is a crucial step in the study of the molecular perturbations that occur in diseases. The development of regulatory maps is a tedious process requiring the comprehensive integration of various evidences scattered over biological databases. Thus, the research community would greatly benefit from having a unified database storing known and predicted molecular interactions. Furthermore, given the intrinsic complexity of the data, the development of new tools offering integrated and meaningful visualizations of molecular interactions is necessary to help users drawing new hypotheses without being overwhelmed by the density of the subsequent graph. We extend the previously developed TranscriptomeBrowser database with a set of tables containing 1,594,978 human and mouse molecular interactions. The database includes: (i) predicted regulatory interactions (computed by scanning vertebrate alignments with a set of 1,213 position weight matrices), (ii) potential regulatory interactions inferred from systematic analysis of ChIP-seq experiments, (iii) regulatory interactions curated from the literature, (iv) predicted post-transcriptional regulation by micro-RNA, (v) protein kinase-substrate interactions and (vi) physical protein-protein interactions. In order to easily retrieve and efficiently analyze these interactions, we developed In-teractomeBrowser, a graph-based knowledge browser that comes as a plug-in for Transcriptome-Browser. The first objective of InteractomeBrowser is to provide a user-friendly tool to get new insight into any gene list by providing a context-specific display of putative regulatory and physical interactions. To achieve this, InteractomeBrowser relies on a "cell compartments-based layout" that makes use of a subset of the Gene Ontology to map gene products onto relevant cell compartments. This layout is particularly powerful for visual integration of heterogeneous biological information and is a productive avenue in generating new hypotheses. The second objective of InteractomeBrowser is to fill the gap between interaction databases and dynamic modeling. It is thus compatible with the network analysis software Cytoscape and with the Gene Interaction Network simulation software (GINsim). We provide examples underlying the benefits of this visualization tool for large gene set analysis related to thymocyte differentiation. The InteractomeBrowser plugin is a powerful tool to get quick access to a knowledge database that includes both predicted and validated molecular interactions. InteractomeBrowser is available through the TranscriptomeBrowser framework and can be found at: http://tagc.univ-mrs.fr/tbrowser/. Our database is updated on a regular basis.

  4. Web3DMol: interactive protein structure visualization based on WebGL.

    PubMed

    Shi, Maoxiang; Gao, Juntao; Zhang, Michael Q

    2017-07-03

    A growing number of web-based databases and tools for protein research are being developed. There is now a widespread need for visualization tools to present the three-dimensional (3D) structure of proteins in web browsers. Here, we introduce our 3D modeling program-Web3DMol-a web application focusing on protein structure visualization in modern web browsers. Users submit a PDB identification code or select a PDB archive from their local disk, and Web3DMol will display and allow interactive manipulation of the 3D structure. Featured functions, such as sequence plot, fragment segmentation, measure tool and meta-information display, are offered for users to gain a better understanding of protein structure. Easy-to-use APIs are available for developers to reuse and extend Web3DMol. Web3DMol can be freely accessed at http://web3dmol.duapp.com/, and the source code is distributed under the MIT license. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  5. YPED: An Integrated Bioinformatics Suite and Database for Mass Spectrometry-based Proteomics Research

    PubMed Central

    Colangelo, Christopher M.; Shifman, Mark; Cheung, Kei-Hoi; Stone, Kathryn L.; Carriero, Nicholas J.; Gulcicek, Erol E.; Lam, TuKiet T.; Wu, Terence; Bjornson, Robert D.; Bruce, Can; Nairn, Angus C.; Rinehart, Jesse; Miller, Perry L.; Williams, Kenneth R.

    2015-01-01

    We report a significantly-enhanced bioinformatics suite and database for proteomics research called Yale Protein Expression Database (YPED) that is used by investigators at more than 300 institutions worldwide. YPED meets the data management, archival, and analysis needs of a high-throughput mass spectrometry-based proteomics research ranging from a single laboratory, group of laboratories within and beyond an institution, to the entire proteomics community. The current version is a significant improvement over the first version in that it contains new modules for liquid chromatography–tandem mass spectrometry (LC–MS/MS) database search results, label and label-free quantitative proteomic analysis, and several scoring outputs for phosphopeptide site localization. In addition, we have added both peptide and protein comparative analysis tools to enable pairwise analysis of distinct peptides/proteins in each sample and of overlapping peptides/proteins between all samples in multiple datasets. We have also implemented a targeted proteomics module for automated multiple reaction monitoring (MRM)/selective reaction monitoring (SRM) assay development. We have linked YPED’s database search results and both label-based and label-free fold-change analysis to the Skyline Panorama repository for online spectra visualization. In addition, we have built enhanced functionality to curate peptide identifications into an MS/MS peptide spectral library for all of our protein database search identification results. PMID:25712262

  6. YPED: an integrated bioinformatics suite and database for mass spectrometry-based proteomics research.

    PubMed

    Colangelo, Christopher M; Shifman, Mark; Cheung, Kei-Hoi; Stone, Kathryn L; Carriero, Nicholas J; Gulcicek, Erol E; Lam, TuKiet T; Wu, Terence; Bjornson, Robert D; Bruce, Can; Nairn, Angus C; Rinehart, Jesse; Miller, Perry L; Williams, Kenneth R

    2015-02-01

    We report a significantly-enhanced bioinformatics suite and database for proteomics research called Yale Protein Expression Database (YPED) that is used by investigators at more than 300 institutions worldwide. YPED meets the data management, archival, and analysis needs of a high-throughput mass spectrometry-based proteomics research ranging from a single laboratory, group of laboratories within and beyond an institution, to the entire proteomics community. The current version is a significant improvement over the first version in that it contains new modules for liquid chromatography-tandem mass spectrometry (LC-MS/MS) database search results, label and label-free quantitative proteomic analysis, and several scoring outputs for phosphopeptide site localization. In addition, we have added both peptide and protein comparative analysis tools to enable pairwise analysis of distinct peptides/proteins in each sample and of overlapping peptides/proteins between all samples in multiple datasets. We have also implemented a targeted proteomics module for automated multiple reaction monitoring (MRM)/selective reaction monitoring (SRM) assay development. We have linked YPED's database search results and both label-based and label-free fold-change analysis to the Skyline Panorama repository for online spectra visualization. In addition, we have built enhanced functionality to curate peptide identifications into an MS/MS peptide spectral library for all of our protein database search identification results. Copyright © 2015 The Authors. Production and hosting by Elsevier Ltd.. All rights reserved.

  7. Functional Interaction Network Construction and Analysis for Disease Discovery.

    PubMed

    Wu, Guanming; Haw, Robin

    2017-01-01

    Network-based approaches project seemingly unrelated genes or proteins onto a large-scale network context, therefore providing a holistic visualization and analysis platform for genomic data generated from high-throughput experiments, reducing the dimensionality of data via using network modules and increasing the statistic analysis power. Based on the Reactome database, the most popular and comprehensive open-source biological pathway knowledgebase, we have developed a highly reliable protein functional interaction network covering around 60 % of total human genes and an app called ReactomeFIViz for Cytoscape, the most popular biological network visualization and analysis platform. In this chapter, we describe the detailed procedures on how this functional interaction network is constructed by integrating multiple external data sources, extracting functional interactions from human curated pathway databases, building a machine learning classifier called a Naïve Bayesian Classifier, predicting interactions based on the trained Naïve Bayesian Classifier, and finally constructing the functional interaction database. We also provide an example on how to use ReactomeFIViz for performing network-based data analysis for a list of genes.

  8. ModuleRole: a tool for modulization, role determination and visualization in protein-protein interaction networks.

    PubMed

    Li, Guipeng; Li, Ming; Zhang, Yiwei; Wang, Dong; Li, Rong; Guimerà, Roger; Gao, Juntao Tony; Zhang, Michael Q

    2014-01-01

    Rapidly increasing amounts of (physical and genetic) protein-protein interaction (PPI) data are produced by various high-throughput techniques, and interpretation of these data remains a major challenge. In order to gain insight into the organization and structure of the resultant large complex networks formed by interacting molecules, using simulated annealing, a method based on the node connectivity, we developed ModuleRole, a user-friendly web server tool which finds modules in PPI network and defines the roles for every node, and produces files for visualization in Cytoscape and Pajek. For given proteins, it analyzes the PPI network from BioGRID database, finds and visualizes the modules these proteins form, and then defines the role every node plays in this network, based on two topological parameters Participation Coefficient and Z-score. This is the first program which provides interactive and very friendly interface for biologists to find and visualize modules and roles of proteins in PPI network. It can be tested online at the website http://www.bioinfo.org/modulerole/index.php, which is free and open to all users and there is no login requirement, with demo data provided by "User Guide" in the menu Help. Non-server application of this program is considered for high-throughput data with more than 200 nodes or user's own interaction datasets. Users are able to bookmark the web link to the result page and access at a later time. As an interactive and highly customizable application, ModuleRole requires no expert knowledge in graph theory on the user side and can be used in both Linux and Windows system, thus a very useful tool for biologist to analyze and visualize PPI networks from databases such as BioGRID. ModuleRole is implemented in Java and C, and is freely available at http://www.bioinfo.org/modulerole/index.php. Supplementary information (user guide, demo data) is also available at this website. API for ModuleRole used for this program can be obtained upon request.

  9. Visualization of protein interaction networks: problems and solutions

    PubMed Central

    2013-01-01

    Background Visualization concerns the representation of data visually and is an important task in scientific research. Protein-protein interactions (PPI) are discovered using either wet lab techniques, such mass spectrometry, or in silico predictions tools, resulting in large collections of interactions stored in specialized databases. The set of all interactions of an organism forms a protein-protein interaction network (PIN) and is an important tool for studying the behaviour of the cell machinery. Since graphic representation of PINs may highlight important substructures, e.g. protein complexes, visualization is more and more used to study the underlying graph structure of PINs. Although graphs are well known data structures, there are different open problems regarding PINs visualization: the high number of nodes and connections, the heterogeneity of nodes (proteins) and edges (interactions), the possibility to annotate proteins and interactions with biological information extracted by ontologies (e.g. Gene Ontology) that enriches the PINs with semantic information, but complicates their visualization. Methods In these last years many software tools for the visualization of PINs have been developed. Initially thought for visualization only, some of them have been successively enriched with new functions for PPI data management and PIN analysis. The paper analyzes the main software tools for PINs visualization considering four main criteria: (i) technology, i.e. availability/license of the software and supported OS (Operating System) platforms; (ii) interoperability, i.e. ability to import/export networks in various formats, ability to export data in a graphic format, extensibility of the system, e.g. through plug-ins; (iii) visualization, i.e. supported layout and rendering algorithms and availability of parallel implementation; (iv) analysis, i.e. availability of network analysis functions, such as clustering or mining of the graph, and the possibility to interact with external databases. Results Currently, many tools are available and it is not easy for the users choosing one of them. Some tools offer sophisticated 2D and 3D network visualization making available many layout algorithms, others tools are more data-oriented and support integration of interaction data coming from different sources and data annotation. Finally, some specialistic tools are dedicated to the analysis of pathways and cellular processes and are oriented toward systems biology studies, where the dynamic aspects of the processes being studied are central. Conclusion A current trend is the deployment of open, extensible visualization tools (e.g. Cytoscape), that may be incrementally enriched by the interactomics community with novel and more powerful functions for PIN analysis, through the development of plug-ins. On the other hand, another emerging trend regards the efficient and parallel implementation of the visualization engine that may provide high interactivity and near real-time response time, as in NAViGaTOR. From a technological point of view, open-source, free and extensible tools, like Cytoscape, guarantee a long term sustainability due to the largeness of the developers and users communities, and provide a great flexibility since new functions are continuously added by the developer community through new plug-ins, but the emerging parallel, often closed-source tools like NAViGaTOR, can offer near real-time response time also in the analysis of very huge PINs. PMID:23368786

  10. PCOGR: phylogenetic COG ranking as an online tool to judge the specificity of COGs with respect to freely definable groups of organisms.

    PubMed

    Meereis, Florian; Kaufmann, Michael

    2004-10-15

    The rapidly increasing number of completely sequenced genomes led to the establishment of the COG-database which, based on sequence homologies, assigns similar proteins from different organisms to clusters of orthologous groups (COGs). There are several bioinformatic studies that made use of this database to determine (hyper)thermophile-specific proteins by searching for COGs containing (almost) exclusively proteins from (hyper)thermophilic genomes. However, public software to perform individually definable group-specific searches is not available. The tool described here exactly fills this gap. The software is accessible at http://www.uni-wh.de/pcogr and is linked to the COG-database. The user can freely define two groups of organisms by selecting for each of the (current) 66 organisms to belong either to groupA, to the reference groupB or to be ignored by the algorithm. Then, for all COGs a specificity index is calculated with respect to the specificity to groupA, i. e. high scoring COGs contain proteins from the most of groupA organisms while proteins from the most organisms assigned to groupB are absent. In addition to ranking all COGs according to the user defined specificity criteria, a graphical visualization shows the distribution of all COGs by displaying their abundance as a function of their specificity indexes. This software allows detecting COGs specific to a predefined group of organisms. All COGs are ranked in the order of their specificity and a graphical visualization allows recognizing (i) the presence and abundance of such COGs and (ii) the phylogenetic relationship between groupA- and groupB-organisms. The software also allows detecting putative protein-protein interactions, novel enzymes involved in only partially known biochemical pathways, and alternate enzymes originated by convergent evolution.

  11. Visualizing and Clustering Protein Similarity Networks: Sequences, Structures, and Functions.

    PubMed

    Mai, Te-Lun; Hu, Geng-Ming; Chen, Chi-Ming

    2016-07-01

    Research in the recent decade has demonstrated the usefulness of protein network knowledge in furthering the study of molecular evolution of proteins, understanding the robustness of cells to perturbation, and annotating new protein functions. In this study, we aimed to provide a general clustering approach to visualize the sequence-structure-function relationship of protein networks, and investigate possible causes for inconsistency in the protein classifications based on sequences, structures, and functions. Such visualization of protein networks could facilitate our understanding of the overall relationship among proteins and help researchers comprehend various protein databases. As a demonstration, we clustered 1437 enzymes by their sequences and structures using the minimum span clustering (MSC) method. The general structure of this protein network was delineated at two clustering resolutions, and the second level MSC clustering was found to be highly similar to existing enzyme classifications. The clustering of these enzymes based on sequence, structure, and function information is consistent with each other. For proteases, the Jaccard's similarity coefficient is 0.86 between sequence and function classifications, 0.82 between sequence and structure classifications, and 0.78 between structure and function classifications. From our clustering results, we discussed possible examples of divergent evolution and convergent evolution of enzymes. Our clustering approach provides a panoramic view of the sequence-structure-function network of proteins, helps visualize the relation between related proteins intuitively, and is useful in predicting the structure and function of newly determined protein sequences.

  12. ProteoLens: a visual analytic tool for multi-scale database-driven biological network data mining.

    PubMed

    Huan, Tianxiao; Sivachenko, Andrey Y; Harrison, Scott H; Chen, Jake Y

    2008-08-12

    New systems biology studies require researchers to understand how interplay among myriads of biomolecular entities is orchestrated in order to achieve high-level cellular and physiological functions. Many software tools have been developed in the past decade to help researchers visually navigate large networks of biomolecular interactions with built-in template-based query capabilities. To further advance researchers' ability to interrogate global physiological states of cells through multi-scale visual network explorations, new visualization software tools still need to be developed to empower the analysis. A robust visual data analysis platform driven by database management systems to perform bi-directional data processing-to-visualizations with declarative querying capabilities is needed. We developed ProteoLens as a JAVA-based visual analytic software tool for creating, annotating and exploring multi-scale biological networks. It supports direct database connectivity to either Oracle or PostgreSQL database tables/views, on which SQL statements using both Data Definition Languages (DDL) and Data Manipulation languages (DML) may be specified. The robust query languages embedded directly within the visualization software help users to bring their network data into a visualization context for annotation and exploration. ProteoLens supports graph/network represented data in standard Graph Modeling Language (GML) formats, and this enables interoperation with a wide range of other visual layout tools. The architectural design of ProteoLens enables the de-coupling of complex network data visualization tasks into two distinct phases: 1) creating network data association rules, which are mapping rules between network node IDs or edge IDs and data attributes such as functional annotations, expression levels, scores, synonyms, descriptions etc; 2) applying network data association rules to build the network and perform the visual annotation of graph nodes and edges according to associated data values. We demonstrated the advantages of these new capabilities through three biological network visualization case studies: human disease association network, drug-target interaction network and protein-peptide mapping network. The architectural design of ProteoLens makes it suitable for bioinformatics expert data analysts who are experienced with relational database management to perform large-scale integrated network visual explorations. ProteoLens is a promising visual analytic platform that will facilitate knowledge discoveries in future network and systems biology studies.

  13. HitPredict version 4: comprehensive reliability scoring of physical protein-protein interactions from more than 100 species.

    PubMed

    López, Yosvany; Nakai, Kenta; Patil, Ashwini

    2015-01-01

    HitPredict is a consolidated resource of experimentally identified, physical protein-protein interactions with confidence scores to indicate their reliability. The study of genes and their inter-relationships using methods such as network and pathway analysis requires high quality protein-protein interaction information. Extracting reliable interactions from most of the existing databases is challenging because they either contain only a subset of the available interactions, or a mixture of physical, genetic and predicted interactions. Automated integration of interactions is further complicated by varying levels of accuracy of database content and lack of adherence to standard formats. To address these issues, the latest version of HitPredict provides a manually curated dataset of 398 696 physical associations between 70 808 proteins from 105 species. Manual confirmation was used to resolve all issues encountered during data integration. For improved reliability assessment, this version combines a new score derived from the experimental information of the interactions with the original score based on the features of the interacting proteins. The combined interaction score performs better than either of the individual scores in HitPredict as well as the reliability score of another similar database. HitPredict provides a web interface to search proteins and visualize their interactions, and the data can be downloaded for offline analysis. Data usability has been enhanced by mapping protein identifiers across multiple reference databases. Thus, the latest version of HitPredict provides a significantly larger, more reliable and usable dataset of protein-protein interactions from several species for the study of gene groups. Database URL: http://hintdb.hgc.jp/htp. © The Author(s) 2015. Published by Oxford University Press.

  14. PDBFlex: exploring flexibility in protein structures

    PubMed Central

    Hrabe, Thomas; Li, Zhanwen; Sedova, Mayya; Rotkiewicz, Piotr; Jaroszewski, Lukasz; Godzik, Adam

    2016-01-01

    The PDBFlex database, available freely and with no login requirements at http://pdbflex.org, provides information on flexibility of protein structures as revealed by the analysis of variations between depositions of different structural models of the same protein in the Protein Data Bank (PDB). PDBFlex collects information on all instances of such depositions, identifying them by a 95% sequence identity threshold, performs analysis of their structural differences and clusters them according to their structural similarities for easy analysis. The PDBFlex contains tools and viewers enabling in-depth examination of structural variability including: 2D-scaling visualization of RMSD distances between structures of the same protein, graphs of average local RMSD in the aligned structures of protein chains, graphical presentation of differences in secondary structure and observed structural disorder (unresolved residues), difference distance maps between all sets of coordinates and 3D views of individual structures and simulated transitions between different conformations, the latter displayed using JSMol visualization software. PMID:26615193

  15. Genomic Enzymology: Web Tools for Leveraging Protein Family Sequence-Function Space and Genome Context to Discover Novel Functions.

    PubMed

    Gerlt, John A

    2017-08-22

    The exponentially increasing number of protein and nucleic acid sequences provides opportunities to discover novel enzymes, metabolic pathways, and metabolites/natural products, thereby adding to our knowledge of biochemistry and biology. The challenge has evolved from generating sequence information to mining the databases to integrating and leveraging the available information, i.e., the availability of "genomic enzymology" web tools. Web tools that allow identification of biosynthetic gene clusters are widely used by the natural products/synthetic biology community, thereby facilitating the discovery of novel natural products and the enzymes responsible for their biosynthesis. However, many novel enzymes with interesting mechanisms participate in uncharacterized small-molecule metabolic pathways; their discovery and functional characterization also can be accomplished by leveraging information in protein and nucleic acid databases. This Perspective focuses on two genomic enzymology web tools that assist the discovery novel metabolic pathways: (1) Enzyme Function Initiative-Enzyme Similarity Tool (EFI-EST) for generating sequence similarity networks to visualize and analyze sequence-function space in protein families and (2) Enzyme Function Initiative-Genome Neighborhood Tool (EFI-GNT) for generating genome neighborhood networks to visualize and analyze the genome context in microbial and fungal genomes. Both tools have been adapted to other applications to facilitate target selection for enzyme discovery and functional characterization. As the natural products community has demonstrated, the enzymology community needs to embrace the essential role of web tools that allow the protein and genome sequence databases to be leveraged for novel insights into enzymological problems.

  16. Genomic Enzymology: Web Tools for Leveraging Protein Family Sequence–Function Space and Genome Context to Discover Novel Functions

    PubMed Central

    2017-01-01

    The exponentially increasing number of protein and nucleic acid sequences provides opportunities to discover novel enzymes, metabolic pathways, and metabolites/natural products, thereby adding to our knowledge of biochemistry and biology. The challenge has evolved from generating sequence information to mining the databases to integrating and leveraging the available information, i.e., the availability of “genomic enzymology” web tools. Web tools that allow identification of biosynthetic gene clusters are widely used by the natural products/synthetic biology community, thereby facilitating the discovery of novel natural products and the enzymes responsible for their biosynthesis. However, many novel enzymes with interesting mechanisms participate in uncharacterized small-molecule metabolic pathways; their discovery and functional characterization also can be accomplished by leveraging information in protein and nucleic acid databases. This Perspective focuses on two genomic enzymology web tools that assist the discovery novel metabolic pathways: (1) Enzyme Function Initiative-Enzyme Similarity Tool (EFI-EST) for generating sequence similarity networks to visualize and analyze sequence–function space in protein families and (2) Enzyme Function Initiative-Genome Neighborhood Tool (EFI-GNT) for generating genome neighborhood networks to visualize and analyze the genome context in microbial and fungal genomes. Both tools have been adapted to other applications to facilitate target selection for enzyme discovery and functional characterization. As the natural products community has demonstrated, the enzymology community needs to embrace the essential role of web tools that allow the protein and genome sequence databases to be leveraged for novel insights into enzymological problems. PMID:28826221

  17. ProtaBank: A repository for protein design and engineering data.

    PubMed

    Wang, Connie Y; Chang, Paul M; Ary, Marie L; Allen, Benjamin D; Chica, Roberto A; Mayo, Stephen L; Olafson, Barry D

    2018-03-25

    We present ProtaBank, a repository for storing, querying, analyzing, and sharing protein design and engineering data in an actively maintained and updated database. ProtaBank provides a format to describe and compare all types of protein mutational data, spanning a wide range of properties and techniques. It features a user-friendly web interface and programming layer that streamlines data deposition and allows for batch input and queries. The database schema design incorporates a standard format for reporting protein sequences and experimental data that facilitates comparison of results across different data sets. A suite of analysis and visualization tools are provided to facilitate discovery, to guide future designs, and to benchmark and train new predictive tools and algorithms. ProtaBank will provide a valuable resource to the protein engineering community by storing and safeguarding newly generated data, allowing for fast searching and identification of relevant data from the existing literature, and exploring correlations between disparate data sets. ProtaBank invites researchers to contribute data to the database to make it accessible for search and analysis. ProtaBank is available at https://protabank.org. © 2018 The Authors Protein Science published by Wiley Periodicals, Inc. on behalf of The Protein Society.

  18. MutHTP: Mutations in Human Transmembrane Proteins.

    PubMed

    A, Kulandaisamy; S, Binny Priya; R, Sakthivel; Tarnovskaya, Svetlana; Bizin, Ilya; Hönigschmid, Peter; Frishman, Dmitrij; Gromiha, M Michael

    2018-02-01

    We have developed a novel database, MutHTP, which contains information on 183395 disease-associated and 17827 neutral mutations in human transmembrane proteins. For each mutation site MutHTP provides a description of its location with respect to the membrane protein topology, structural environment (if available) and functional features. Comprehensive visualization, search, display and download options are available. The database is publicly available at http://www.iitm.ac.in/bioinfo/MutHTP/. The website is implemented using HTML, PHP and javascript and supports recent versions of all major browsers, such as Firefox, Chrome and Opera. gromiha@iitm.ac.in. Supplementary data are available at Bioinformatics online. © The Author (2018). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  19. Thermodynamic database for proteins: features and applications.

    PubMed

    Gromiha, M Michael; Sarai, Akinori

    2010-01-01

    We have developed a thermodynamic database for proteins and mutants, ProTherm, which is a collection of a large number of thermodynamic data on protein stability along with the sequence and structure information, experimental methods and conditions, and literature information. This is a valuable resource for understanding/predicting the stability of proteins, and it can be accessible at http://www.gibk26.bse.kyutech.ac.jp/jouhou/Protherm/protherm.html . ProTherm has several features including various search, display, and sorting options and visualization tools. We have analyzed the data in ProTherm to examine the relationship among thermodynamics, structure, and function of proteins. We describe the progress on the development of methods for understanding/predicting protein stability, such as (i) relationship between the stability of protein mutants and amino acid properties, (ii) average assignment method, (iii) empirical energy functions, (iv) torsion, distance, and contact potentials, and (v) machine learning techniques. The list of online resources for predicting protein stability has also been provided.

  20. An emerging cyberinfrastructure for biodefense pathogen and pathogen-host data.

    PubMed

    Zhang, C; Crasta, O; Cammer, S; Will, R; Kenyon, R; Sullivan, D; Yu, Q; Sun, W; Jha, R; Liu, D; Xue, T; Zhang, Y; Moore, M; McGarvey, P; Huang, H; Chen, Y; Zhang, J; Mazumder, R; Wu, C; Sobral, B

    2008-01-01

    The NIAID-funded Biodefense Proteomics Resource Center (RC) provides storage, dissemination, visualization and analysis capabilities for the experimental data deposited by seven Proteomics Research Centers (PRCs). The data and its publication is to support researchers working to discover candidates for the next generation of vaccines, therapeutics and diagnostics against NIAID's Category A, B and C priority pathogens. The data includes transcriptional profiles, protein profiles, protein structural data and host-pathogen protein interactions, in the context of the pathogen life cycle in vivo and in vitro. The database has stored and supported host or pathogen data derived from Bacillus, Brucella, Cryptosporidium, Salmonella, SARS, Toxoplasma, Vibrio and Yersinia, human tissue libraries, and mouse macrophages. These publicly available data cover diverse data types such as mass spectrometry, yeast two-hybrid (Y2H), gene expression profiles, X-ray and NMR determined protein structures and protein expression clones. The growing database covers over 23 000 unique genes/proteins from different experiments and organisms. All of the genes/proteins are annotated and integrated across experiments using UniProt Knowledgebase (UniProtKB) accession numbers. The web-interface for the database enables searching, querying and downloading at the level of experiment, group and individual gene(s)/protein(s) via UniProtKB accession numbers or protein function keywords. The system is accessible at http://www.proteomicsresource.org/.

  1. mpMoRFsDB: a database of molecular recognition features in membrane proteins.

    PubMed

    Gypas, Foivos; Tsaousis, Georgios N; Hamodrakas, Stavros J

    2013-10-01

    Molecular recognition features (MoRFs) are small, intrinsically disordered regions in proteins that undergo a disorder-to-order transition on binding to their partners. MoRFs are involved in protein-protein interactions and may function as the initial step in molecular recognition. The aim of this work was to collect, organize and store all membrane proteins that contain MoRFs. Membrane proteins constitute ∼30% of fully sequenced proteomes and are responsible for a wide variety of cellular functions. MoRFs were classified according to their secondary structure, after interacting with their partners. We identified MoRFs in transmembrane and peripheral membrane proteins. The position of transmembrane protein MoRFs was determined in relation to a protein's topology. All information was stored in a publicly available mySQL database with a user-friendly web interface. A Jmol applet is integrated for visualization of the structures. mpMoRFsDB provides valuable information related to disorder-based protein-protein interactions in membrane proteins. http://bioinformatics.biol.uoa.gr/mpMoRFsDB

  2. PeptideDepot: flexible relational database for visual analysis of quantitative proteomic data and integration of existing protein information.

    PubMed

    Yu, Kebing; Salomon, Arthur R

    2009-12-01

    Recently, dramatic progress has been achieved in expanding the sensitivity, resolution, mass accuracy, and scan rate of mass spectrometers able to fragment and identify peptides through MS/MS. Unfortunately, this enhanced ability to acquire proteomic data has not been accompanied by a concomitant increase in the availability of flexible tools allowing users to rapidly assimilate, explore, and analyze this data and adapt to various experimental workflows with minimal user intervention. Here we fill this critical gap by providing a flexible relational database called PeptideDepot for organization of expansive proteomic data sets, collation of proteomic data with available protein information resources, and visual comparison of multiple quantitative proteomic experiments. Our software design, built upon the synergistic combination of a MySQL database for safe warehousing of proteomic data with a FileMaker-driven graphical user interface for flexible adaptation to diverse workflows, enables proteomic end-users to directly tailor the presentation of proteomic data to the unique analysis requirements of the individual proteomics lab. PeptideDepot may be deployed as an independent software tool or integrated directly with our high throughput autonomous proteomic pipeline used in the automated acquisition and post-acquisition analysis of proteomic data.

  3. DBSecSys: a database of Burkholderia mallei secretion systems.

    PubMed

    Memišević, Vesna; Kumar, Kamal; Cheng, Li; Zavaljevski, Nela; DeShazer, David; Wallqvist, Anders; Reifman, Jaques

    2014-07-16

    Bacterial pathogenicity represents a major public health concern worldwide. Secretion systems are a key component of bacterial pathogenicity, as they provide the means for bacterial proteins to penetrate host-cell membranes and insert themselves directly into the host cells' cytosol. Burkholderia mallei is a Gram-negative bacterium that uses multiple secretion systems during its host infection life cycle. To date, the identities of secretion system proteins for B. mallei are not well known, and their pathogenic mechanisms of action and host factors are largely uncharacterized. We present the Database of Burkholderia malleiSecretion Systems (DBSecSys), a compilation of manually curated and computationally predicted bacterial secretion system proteins and their host factors. Currently, DBSecSys contains comprehensive experimentally and computationally derived information about B. mallei strain ATCC 23344. The database includes 143 B. mallei proteins associated with five secretion systems, their 1,635 human and murine interacting targets, and the corresponding 2,400 host-B. mallei interactions. The database also includes information about 10 pathogenic mechanisms of action for B. mallei secretion system proteins inferred from the available literature. Additionally, DBSecSys provides details about 42 virulence attenuation experiments for 27 B. mallei secretion system proteins. Users interact with DBSecSys through a Web interface that allows for data browsing, querying, visualizing, and downloading. DBSecSys provides a comprehensive, systematically organized resource of experimental and computational data associated with B. mallei secretion systems. It provides the unique ability to study secretion systems not only through characterization of their corresponding pathogen proteins, but also through characterization of their host-interacting partners.The database is available at https://applications.bhsai.org/dbsecsys.

  4. Software Tool for Researching Annotations of Proteins (STRAP): Open-Source Protein Annotation Software with Data Visualization

    PubMed Central

    Bhatia, Vivek N.; Perlman, David H.; Costello, Catherine E.; McComb, Mark E.

    2009-01-01

    In order that biological meaning may be derived and testable hypotheses may be built from proteomics experiments, assignments of proteins identified by mass spectrometry or other techniques must be supplemented with additional notation, such as information on known protein functions, protein-protein interactions, or biological pathway associations. Collecting, organizing, and interpreting this data often requires the input of experts in the biological field of study, in addition to the time-consuming search for and compilation of information from online protein databases. Furthermore, visualizing this bulk of information can be challenging due to the limited availability of easy-to-use and freely available tools for this process. In response to these constraints, we have undertaken the design of software to automate annotation and visualization of proteomics data in order to accelerate the pace of research. Here we present the Software Tool for Researching Annotations of Proteins (STRAP) – a user-friendly, open-source C# application. STRAP automatically obtains gene ontology (GO) terms associated with proteins in a proteomics results ID list using the freely accessible UniProtKB and EBI GOA databases. Summarized in an easy-to-navigate tabular format, STRAP includes meta-information on the protein in addition to complimentary GO terminology. Additionally, this information can be edited by the user so that in-house expertise on particular proteins may be integrated into the larger dataset. STRAP provides a sortable tabular view for all terms, as well as graphical representations of GO-term association data in pie (biological process, cellular component and molecular function) and bar charts (cross comparison of sample sets) to aid in the interpretation of large datasets and differential analyses experiments. Furthermore, proteins of interest may be exported as a unique FASTA-formatted file to allow for customizable re-searching of mass spectrometry data, and gene names corresponding to the proteins in the lists may be encoded in the Gaggle microformat for further characterization, including pathway analysis. STRAP, a tutorial, and the C# source code are freely available from http://cpctools.sourceforge.net. PMID:19839595

  5. Identification of Fur, Aconitase, and Other Proteins Expressed by Mycobacterium tuberculosis under Conditions of Low and High Concentrations of Iron by Combined Two-Dimensional Gel Electrophoresis and Mass Spectrometry

    PubMed Central

    Wong, Diane K.; Lee, Bai-Yu; Horwitz, Marcus A.; Gibson, Bradford W.

    1999-01-01

    Iron plays a critical role in the pathophysiology of Mycobacterium tuberculosis. To gain a better understanding of iron regulation by this organism, we have used two-dimensional (2-D) gel electrophoresis, mass spectrometry, and database searching to study protein expression in M. tuberculosis under conditions of high and low iron concentration. Proteins in cellular extracts from M. tuberculosis Erdman strain grown under low-iron (1 μM) and high-iron (70 μM) conditions were separated by 2-D polyacrylamide gel electrophoresis, which allowed high-resolution separation of several hundred proteins, as visualized by Coomassie staining. The expression of at least 15 proteins was induced, and the expression of at least 12 proteins was decreased under low-iron conditions. In-gel trypsin digestion was performed on these differentially expressed proteins, and the digestion mixtures were analyzed by matrix-assisted laser desorption ionization time-of-flight mass spectrometry to determine the molecular masses of the resulting tryptic peptides. Partial sequence data on some of the peptides were obtained by using after source decay and/or collision-induced dissociation. The fragmentation data were used to search computerized peptide mass and protein sequence databases for known proteins. Ten iron-regulated proteins were identified, including Fur and aconitase proteins, both of which are known to be regulated by iron in other bacterial systems. Our study shows that, where large protein sequence databases are available from genomic studies, the combined use of 2-D gel electrophoresis, mass spectrometry, and database searching to analyze proteins expressed under defined environmental conditions is a powerful tool for identifying expressed proteins and their physiologic relevance. PMID:9864233

  6. MimoSA: a system for minimotif annotation

    PubMed Central

    2010-01-01

    Background Minimotifs are short peptide sequences within one protein, which are recognized by other proteins or molecules. While there are now several minimotif databases, they are incomplete. There are reports of many minimotifs in the primary literature, which have yet to be annotated, while entirely novel minimotifs continue to be published on a weekly basis. Our recently proposed function and sequence syntax for minimotifs enables us to build a general tool that will facilitate structured annotation and management of minimotif data from the biomedical literature. Results We have built the MimoSA application for minimotif annotation. The application supports management of the Minimotif Miner database, literature tracking, and annotation of new minimotifs. MimoSA enables the visualization, organization, selection and editing functions of minimotifs and their attributes in the MnM database. For the literature components, Mimosa provides paper status tracking and scoring of papers for annotation through a freely available machine learning approach, which is based on word correlation. The paper scoring algorithm is also available as a separate program, TextMine. Form-driven annotation of minimotif attributes enables entry of new minimotifs into the MnM database. Several supporting features increase the efficiency of annotation. The layered architecture of MimoSA allows for extensibility by separating the functions of paper scoring, minimotif visualization, and database management. MimoSA is readily adaptable to other annotation efforts that manually curate literature into a MySQL database. Conclusions MimoSA is an extensible application that facilitates minimotif annotation and integrates with the Minimotif Miner database. We have built MimoSA as an application that integrates dynamic abstract scoring with a high performance relational model of minimotif syntax. MimoSA's TextMine, an efficient paper-scoring algorithm, can be used to dynamically rank papers with respect to context. PMID:20565705

  7. SGDB: a database of synthetic genes re-designed for optimizing protein over-expression.

    PubMed

    Wu, Gang; Zheng, Yuanpu; Qureshi, Imran; Zin, Htar Thant; Beck, Tyler; Bulka, Blazej; Freeland, Stephen J

    2007-01-01

    Here we present the Synthetic Gene Database (SGDB): a relational database that houses sequences and associated experimental information on synthetic (artificially engineered) genes from all peer-reviewed studies published to date. At present, the database comprises information from more than 200 published experiments. This resource not only provides reference material to guide experimentalists in designing new genes that improve protein expression, but also offers a dataset for analysis by bioinformaticians who seek to test ideas regarding the underlying factors that influence gene expression. The SGDB was built under MySQL database management system. We also offer an XML schema for standardized data description of synthetic genes. Users can access the database at http://www.evolvingcode.net/codon/sgdb/index.php, or batch downloads all information through XML files. Moreover, users may visually compare the coding sequences of a synthetic gene and its natural counterpart with an integrated web tool at http://www.evolvingcode.net/codon/sgdb/aligner.php, and discuss questions, findings and related information on an associated e-forum at http://www.evolvingcode.net/forum/viewforum.php?f=27.

  8. PDB explorer -- a web based algorithm for protein annotation viewer and 3D visualization.

    PubMed

    Nayarisseri, Anuraj; Shardiwal, Rakesh Kumar; Yadav, Mukesh; Kanungo, Neha; Singh, Pooja; Shah, Pratik; Ahmed, Sheaza

    2014-12-01

    The PDB file format, is a text format characterizing the three dimensional structures of macro molecules available in the Protein Data Bank (PDB). Determined protein structure are found in coalition with other molecules or ions such as nucleic acids, water, ions, Drug molecules and so on, which therefore can be described in the PDB format and have been deposited in PDB database. PDB is a machine generated file, it's not human readable format, to read this file we need any computational tool to understand it. The objective of our present study is to develop a free online software for retrieval, visualization and reading of annotation of a protein 3D structure which is available in PDB database. Main aim is to create PDB file in human readable format, i.e., the information in PDB file is converted in readable sentences. It displays all possible information from a PDB file including 3D structure of that file. Programming languages and scripting languages like Perl, CSS, Javascript, Ajax, and HTML have been used for the development of PDB Explorer. The PDB Explorer directly parses the PDB file, calling methods for parsed element secondary structure element, atoms, coordinates etc. PDB Explorer is freely available at http://www.pdbexplorer.eminentbio.com/home with no requirement of log-in.

  9. HotRegion: a database of predicted hot spot clusters.

    PubMed

    Cukuroglu, Engin; Gursoy, Attila; Keskin, Ozlem

    2012-01-01

    Hot spots are energetically important residues at protein interfaces and they are not randomly distributed across the interface but rather clustered. These clustered hot spots form hot regions. Hot regions are important for the stability of protein complexes, as well as providing specificity to binding sites. We propose a database called HotRegion, which provides the hot region information of the interfaces by using predicted hot spot residues, and structural properties of these interface residues such as pair potentials of interface residues, accessible surface area (ASA) and relative ASA values of interface residues of both monomer and complex forms of proteins. Also, the 3D visualization of the interface and interactions among hot spot residues are provided. HotRegion is accessible at http://prism.ccbb.ku.edu.tr/hotregion.

  10. SANSparallel: interactive homology search against Uniprot

    PubMed Central

    Somervuo, Panu; Holm, Liisa

    2015-01-01

    Proteins evolve by mutations and natural selection. The network of sequence similarities is a rich source for mining homologous relationships that inform on protein structure and function. There are many servers available to browse the network of homology relationships but one has to wait up to a minute for results. The SANSparallel webserver provides protein sequence database searches with immediate response and professional alignment visualization by third-party software. The output is a list, pairwise alignment or stacked alignment of sequence-similar proteins from Uniprot, UniRef90/50, Swissprot or Protein Data Bank. The stacked alignments are viewed in Jalview or as sequence logos. The database search uses the suffix array neighborhood search (SANS) method, which has been re-implemented as a client-server, improved and parallelized. The method is extremely fast and as sensitive as BLAST above 50% sequence identity. Benchmarks show that the method is highly competitive compared to previously published fast database search programs: UBLAST, DIAMOND, LAST, LAMBDA, RAPSEARCH2 and BLAT. The web server can be accessed interactively or programmatically at http://ekhidna2.biocenter.helsinki.fi/cgi-bin/sans/sans.cgi. It can be used to make protein functional annotation pipelines more efficient, and it is useful in interactive exploration of the detailed evidence supporting the annotation of particular proteins of interest. PMID:25855811

  11. ExDom: an integrated database for comparative analysis of the exon–intron structures of protein domains in eukaryotes

    PubMed Central

    Bhasi, Ashwini; Philip, Philge; Manikandan, Vinu; Senapathy, Periannan

    2009-01-01

    We have developed ExDom, a unique database for the comparative analysis of the exon–intron structures of 96 680 protein domains from seven eukaryotic organisms (Homo sapiens, Mus musculus, Bos taurus, Rattus norvegicus, Danio rerio, Gallus gallus and Arabidopsis thaliana). ExDom provides integrated access to exon-domain data through a sophisticated web interface which has the following analytical capabilities: (i) intergenomic and intragenomic comparative analysis of exon–intron structure of domains; (ii) color-coded graphical display of the domain architecture of proteins correlated with their corresponding exon-intron structures; (iii) graphical analysis of multiple sequence alignments of amino acid and coding nucleotide sequences of homologous protein domains from seven organisms; (iv) comparative graphical display of exon distributions within the tertiary structures of protein domains; and (v) visualization of exon–intron structures of alternative transcripts of a gene correlated to variations in the domain architecture of corresponding protein isoforms. These novel analytical features are highly suited for detailed investigations on the exon–intron structure of domains and make ExDom a powerful tool for exploring several key questions concerning the function, origin and evolution of genes and proteins. ExDom database is freely accessible at: http://66.170.16.154/ExDom/. PMID:18984624

  12. Plasma Glycoproteomics Reveals Sepsis Outcomes Linked to Distinct Proteins in Common Pathways

    PubMed Central

    DeLeon-Pennell, Kristine Y.; Nguyen, Nguyen T.; de Castro Brás, Lisandra E.; Flynn, Elizabeth R.; Cannon, Presley L.; Griswold, Michael E.; Jin, Yu-Fang; Puskarich, Michael A.; Jones, Alan E.; Lindsey, Merry L.

    2015-01-01

    Objective Sepsis remains a predominant cause of mortality in the ICU, yet strategies to increase survival have proved largely unsuccessful. This study aimed to identify proteins linked to sepsis outcomes using a glycoproteomic approach to target extracellular proteins that trigger downstream pathways and direct patient outcomes. Design Plasma was obtained from the LacTATEs cohort. N-linked plasma glycopeptides were quantified by solid-phase extraction coupled with mass spectrometry. Glycopeptides were assigned to proteins using RefSeq and visualized in a heat map. Protein differences were validated by immunoblotting, and proteins were mapped for biological processes using Database for Annotation, Visualization and Integrated Discovery and for functional pathways using Kyoto Encyclopedia of Genes and Genomes databases. Setting Hospitalized care. Measurements and Main Results A total of 501 glycopeptides corresponding to 234 proteins were identified. Of these, 66 glycopeptides were unique to the survivor group and corresponded to 54 proteins, 60 were unique to the nonsurvivor group and corresponded to 43 proteins, and 375 were common responses between groups and corresponded to 137 proteins. Immunoblotting showed that nonsurvivors had increased total kininogen; decreased total cathepsin-L1, vascular cell adhesion molecule, periostin, and neutrophil gelatinase–associated lipocalin; and a two-fold decrease in glycosylated clusterin (all p < 0.05). Kyoto Encyclopedia of Genes and Genomes analysis identified six enriched pathways. Interestingly, survivors relied on the extrinsic pathway of the complement and coagulation cascade, whereas nonsurvivors relied on the intrinsic pathway. Conclusion This study identifies proteins linked to patient outcomes and provides insight into unexplored mechanisms that can be investigated for the identification of novel therapeutic targets. (Crit Care Med 2015; XX:00–00) PMID:26086942

  13. An emerging cyberinfrastructure for biodefense pathogen and pathogen–host data

    PubMed Central

    Zhang, C.; Crasta, O.; Cammer, S.; Will, R.; Kenyon, R.; Sullivan, D.; Yu, Q.; Sun, W.; Jha, R.; Liu, D.; Xue, T.; Zhang, Y.; Moore, M.; McGarvey, P.; Huang, H.; Chen, Y.; Zhang, J.; Mazumder, R.; Wu, C.; Sobral, B.

    2008-01-01

    The NIAID-funded Biodefense Proteomics Resource Center (RC) provides storage, dissemination, visualization and analysis capabilities for the experimental data deposited by seven Proteomics Research Centers (PRCs). The data and its publication is to support researchers working to discover candidates for the next generation of vaccines, therapeutics and diagnostics against NIAID's Category A, B and C priority pathogens. The data includes transcriptional profiles, protein profiles, protein structural data and host–pathogen protein interactions, in the context of the pathogen life cycle in vivo and in vitro. The database has stored and supported host or pathogen data derived from Bacillus, Brucella, Cryptosporidium, Salmonella, SARS, Toxoplasma, Vibrio and Yersinia, human tissue libraries, and mouse macrophages. These publicly available data cover diverse data types such as mass spectrometry, yeast two-hybrid (Y2H), gene expression profiles, X-ray and NMR determined protein structures and protein expression clones. The growing database covers over 23 000 unique genes/proteins from different experiments and organisms. All of the genes/proteins are annotated and integrated across experiments using UniProt Knowledgebase (UniProtKB) accession numbers. The web-interface for the database enables searching, querying and downloading at the level of experiment, group and individual gene(s)/protein(s) via UniProtKB accession numbers or protein function keywords. The system is accessible at http://www.proteomicsresource.org/. PMID:17984082

  14. LIFEdb: a database for functional genomics experiments integrating information from external sources, and serving as a sample tracking system

    PubMed Central

    Bannasch, Detlev; Mehrle, Alexander; Glatting, Karl-Heinz; Pepperkok, Rainer; Poustka, Annemarie; Wiemann, Stefan

    2004-01-01

    We have implemented LIFEdb (http://www.dkfz.de/LIFEdb) to link information regarding novel human full-length cDNAs generated and sequenced by the German cDNA Consortium with functional information on the encoded proteins produced in functional genomics and proteomics approaches. The database also serves as a sample-tracking system to manage the process from cDNA to experimental read-out and data interpretation. A web interface enables the scientific community to explore and visualize features of the annotated cDNAs and ORFs combined with experimental results, and thus helps to unravel new features of proteins with as yet unknown functions. PMID:14681468

  15. JAIL: a structure-based interface library for macromolecules.

    PubMed

    Günther, Stefan; von Eichborn, Joachim; May, Patrick; Preissner, Robert

    2009-01-01

    The increasing number of solved macromolecules provides a solid number of 3D interfaces, if all types of molecular contacts are being considered. JAIL annotates three different kinds of macromolecular interfaces, those between interacting protein domains, interfaces of different protein chains and interfaces between proteins and nucleic acids. This results in a total number of about 184,000 database entries. All the interfaces can easily be identified by a detailed search form or by a hierarchical tree that describes the protein domain architectures classified by the SCOP database. Visual inspection of the interfaces is possible via an interactive protein viewer. Furthermore, large scale analyses are supported by an implemented sequential and by a structural clustering. Similar interfaces as well as non-redundant interfaces can be easily picked out. Additionally, the sequential conservation of binding sites was also included in the database and is retrievable via Jmol. A comprehensive download section allows the composition of representative data sets with user defined parameters. The huge data set in combination with various search options allow a comprehensive view on all interfaces between macromolecules included in the Protein Data Bank (PDB). The download of the data sets supports numerous further investigations in macromolecular recognition. JAIL is publicly available at http://bioinformatics.charite.de/jail.

  16. The visualCMAT: A web-server to select and interpret correlated mutations/co-evolving residues in protein families.

    PubMed

    Suplatov, Dmitry; Sharapova, Yana; Timonina, Daria; Kopylov, Kirill; Švedas, Vytas

    2018-04-01

    The visualCMAT web-server was designed to assist experimental research in the fields of protein/enzyme biochemistry, protein engineering, and drug discovery by providing an intuitive and easy-to-use interface to the analysis of correlated mutations/co-evolving residues. Sequence and structural information describing homologous proteins are used to predict correlated substitutions by the Mutual information-based CMAT approach, classify them into spatially close co-evolving pairs, which either form a direct physical contact or interact with the same ligand (e.g. a substrate or a crystallographic water molecule), and long-range correlations, annotate and rank binding sites on the protein surface by the presence of statistically significant co-evolving positions. The results of the visualCMAT are organized for a convenient visual analysis and can be downloaded to a local computer as a content-rich all-in-one PyMol session file with multiple layers of annotation corresponding to bioinformatic, statistical and structural analyses of the predicted co-evolution, or further studied online using the built-in interactive analysis tools. The online interactivity is implemented in HTML5 and therefore neither plugins nor Java are required. The visualCMAT web-server is integrated with the Mustguseal web-server capable of constructing large structure-guided sequence alignments of protein families and superfamilies using all available information about their structures and sequences in public databases. The visualCMAT web-server can be used to understand the relationship between structure and function in proteins, implemented at selecting hotspots and compensatory mutations for rational design and directed evolution experiments to produce novel enzymes with improved properties, and employed at studying the mechanism of selective ligand's binding and allosteric communication between topologically independent sites in protein structures. The web-server is freely available at https://biokinet.belozersky.msu.ru/visualcmat and there are no login requirements.

  17. PeptideDepot: Flexible Relational Database for Visual Analysis of Quantitative Proteomic Data and Integration of Existing Protein Information

    PubMed Central

    Yu, Kebing; Salomon, Arthur R.

    2010-01-01

    Recently, dramatic progress has been achieved in expanding the sensitivity, resolution, mass accuracy, and scan rate of mass spectrometers able to fragment and identify peptides through tandem mass spectrometry (MS/MS). Unfortunately, this enhanced ability to acquire proteomic data has not been accompanied by a concomitant increase in the availability of flexible tools allowing users to rapidly assimilate, explore, and analyze this data and adapt to a variety of experimental workflows with minimal user intervention. Here we fill this critical gap by providing a flexible relational database called PeptideDepot for organization of expansive proteomic data sets, collation of proteomic data with available protein information resources, and visual comparison of multiple quantitative proteomic experiments. Our software design, built upon the synergistic combination of a MySQL database for safe warehousing of proteomic data with a FileMaker-driven graphical user interface for flexible adaptation to diverse workflows, enables proteomic end-users to directly tailor the presentation of proteomic data to the unique analysis requirements of the individual proteomics lab. PeptideDepot may be deployed as an independent software tool or integrated directly with our High Throughput Autonomous Proteomic Pipeline (HTAPP) used in the automated acquisition and post-acquisition analysis of proteomic data. PMID:19834895

  18. Plant Reactome: a resource for plant pathways and comparative analysis

    PubMed Central

    Naithani, Sushma; Preece, Justin; D'Eustachio, Peter; Gupta, Parul; Amarasinghe, Vindhya; Dharmawardhana, Palitha D.; Wu, Guanming; Fabregat, Antonio; Elser, Justin L.; Weiser, Joel; Keays, Maria; Fuentes, Alfonso Munoz-Pomer; Petryszak, Robert; Stein, Lincoln D.; Ware, Doreen; Jaiswal, Pankaj

    2017-01-01

    Plant Reactome (http://plantreactome.gramene.org/) is a free, open-source, curated plant pathway database portal, provided as part of the Gramene project. The database provides intuitive bioinformatics tools for the visualization, analysis and interpretation of pathway knowledge to support genome annotation, genome analysis, modeling, systems biology, basic research and education. Plant Reactome employs the structural framework of a plant cell to show metabolic, transport, genetic, developmental and signaling pathways. We manually curate molecular details of pathways in these domains for reference species Oryza sativa (rice) supported by published literature and annotation of well-characterized genes. Two hundred twenty-two rice pathways, 1025 reactions associated with 1173 proteins, 907 small molecules and 256 literature references have been curated to date. These reference annotations were used to project pathways for 62 model, crop and evolutionarily significant plant species based on gene homology. Database users can search and browse various components of the database, visualize curated baseline expression of pathway-associated genes provided by the Expression Atlas and upload and analyze their Omics datasets. The database also offers data access via Application Programming Interfaces (APIs) and in various standardized pathway formats, such as SBML and BioPAX. PMID:27799469

  19. Genic insights from integrated human proteomics in GeneCards.

    PubMed

    Fishilevich, Simon; Zimmerman, Shahar; Kohn, Asher; Iny Stein, Tsippi; Olender, Tsviya; Kolker, Eugene; Safran, Marilyn; Lancet, Doron

    2016-01-01

    GeneCards is a one-stop shop for searchable human gene annotations (http://www.genecards.org/). Data are automatically mined from ∼120 sources and presented in an integrated web card for every human gene. We report the application of recent advances in proteomics to enhance gene annotation and classification in GeneCards. First, we constructed the Human Integrated Protein Expression Database (HIPED), a unified database of protein abundance in human tissues, based on the publically available mass spectrometry (MS)-based proteomics sources ProteomicsDB, Multi-Omics Profiling Expression Database, Protein Abundance Across Organisms and The MaxQuant DataBase. The integrated database, residing within GeneCards, compares favourably with its individual sources, covering nearly 90% of human protein-coding genes. For gene annotation and comparisons, we first defined a protein expression vector for each gene, based on normalized abundances in 69 normal human tissues. This vector is portrayed in the GeneCards expression section as a bar graph, allowing visual inspection and comparison. These data are juxtaposed with transcriptome bar graphs. Using the protein expression vectors, we further defined a pairwise metric that helps assess expression-based pairwise proximity. This new metric for finding functional partners complements eight others, including sharing of pathways, gene ontology (GO) terms and domains, implemented in the GeneCards Suite. In parallel, we calculated proteome-based differential expression, highlighting a subset of tissues that overexpress a gene and subserving gene classification. This textual annotation allows users of VarElect, the suite's next-generation phenotyper, to more effectively discover causative disease variants. Finally, we define the protein-RNA expression ratio and correlation as yet another attribute of every gene in each tissue, adding further annotative information. The results constitute a significant enhancement of several GeneCards sections and help promote and organize the genome-wide structural and functional knowledge of the human proteome. Database URL:http://www.genecards.org/. © The Author(s) 2016. Published by Oxford University Press.

  20. Analysis of high accuracy, quantitative proteomics data in the MaxQB database.

    PubMed

    Schaab, Christoph; Geiger, Tamar; Stoehr, Gabriele; Cox, Juergen; Mann, Matthias

    2012-03-01

    MS-based proteomics generates rapidly increasing amounts of precise and quantitative information. Analysis of individual proteomic experiments has made great strides, but the crucial ability to compare and store information across different proteome measurements still presents many challenges. For example, it has been difficult to avoid contamination of databases with low quality peptide identifications, to control for the inflation in false positive identifications when combining data sets, and to integrate quantitative data. Although, for example, the contamination with low quality identifications has been addressed by joint analysis of deposited raw data in some public repositories, we reasoned that there should be a role for a database specifically designed for high resolution and quantitative data. Here we describe a novel database termed MaxQB that stores and displays collections of large proteomics projects and allows joint analysis and comparison. We demonstrate the analysis tools of MaxQB using proteome data of 11 different human cell lines and 28 mouse tissues. The database-wide false discovery rate is controlled by adjusting the project specific cutoff scores for the combined data sets. The 11 cell line proteomes together identify proteins expressed from more than half of all human genes. For each protein of interest, expression levels estimated by label-free quantification can be visualized across the cell lines. Similarly, the expression rank order and estimated amount of each protein within each proteome are plotted. We used MaxQB to calculate the signal reproducibility of the detected peptides for the same proteins across different proteomes. Spearman rank correlation between peptide intensity and detection probability of identified proteins was greater than 0.8 for 64% of the proteome, whereas a minority of proteins have negative correlation. This information can be used to pinpoint false protein identifications, independently of peptide database scores. The information contained in MaxQB, including high resolution fragment spectra, is accessible to the community via a user-friendly web interface at http://www.biochem.mpg.de/maxqb.

  1. SANSparallel: interactive homology search against Uniprot.

    PubMed

    Somervuo, Panu; Holm, Liisa

    2015-07-01

    Proteins evolve by mutations and natural selection. The network of sequence similarities is a rich source for mining homologous relationships that inform on protein structure and function. There are many servers available to browse the network of homology relationships but one has to wait up to a minute for results. The SANSparallel webserver provides protein sequence database searches with immediate response and professional alignment visualization by third-party software. The output is a list, pairwise alignment or stacked alignment of sequence-similar proteins from Uniprot, UniRef90/50, Swissprot or Protein Data Bank. The stacked alignments are viewed in Jalview or as sequence logos. The database search uses the suffix array neighborhood search (SANS) method, which has been re-implemented as a client-server, improved and parallelized. The method is extremely fast and as sensitive as BLAST above 50% sequence identity. Benchmarks show that the method is highly competitive compared to previously published fast database search programs: UBLAST, DIAMOND, LAST, LAMBDA, RAPSEARCH2 and BLAT. The web server can be accessed interactively or programmatically at http://ekhidna2.biocenter.helsinki.fi/cgi-bin/sans/sans.cgi. It can be used to make protein functional annotation pipelines more efficient, and it is useful in interactive exploration of the detailed evidence supporting the annotation of particular proteins of interest. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  2. The Gene Expression Omnibus Database.

    PubMed

    Clough, Emily; Barrett, Tanya

    2016-01-01

    The Gene Expression Omnibus (GEO) database is an international public repository that archives and freely distributes high-throughput gene expression and other functional genomics data sets. Created in 2000 as a worldwide resource for gene expression studies, GEO has evolved with rapidly changing technologies and now accepts high-throughput data for many other data applications, including those that examine genome methylation, chromatin structure, and genome-protein interactions. GEO supports community-derived reporting standards that specify provision of several critical study elements including raw data, processed data, and descriptive metadata. The database not only provides access to data for tens of thousands of studies, but also offers various Web-based tools and strategies that enable users to locate data relevant to their specific interests, as well as to visualize and analyze the data. This chapter includes detailed descriptions of methods to query and download GEO data and use the analysis and visualization tools. The GEO homepage is at http://www.ncbi.nlm.nih.gov/geo/.

  3. The Gene Expression Omnibus database

    PubMed Central

    Clough, Emily; Barrett, Tanya

    2016-01-01

    The Gene Expression Omnibus (GEO) database is an international public repository that archives and freely distributes high-throughput gene expression and other functional genomics data sets. Created in 2000 as a worldwide resource for gene expression studies, GEO has evolved with rapidly changing technologies and now accepts high-throughput data for many other data applications, including those that examine genome methylation, chromatin structure, and genome–protein interactions. GEO supports community-derived reporting standards that specify provision of several critical study elements including raw data, processed data, and descriptive metadata. The database not only provides access to data for tens of thousands of studies, but also offers various Web-based tools and strategies that enable users to locate data relevant to their specific interests, as well as to visualize and analyze the data. This chapter includes detailed descriptions of methods to query and download GEO data and use the analysis and visualization tools. The GEO homepage is at http://www.ncbi.nlm.nih.gov/geo/. PMID:27008011

  4. PDB-Ligand: a ligand database based on PDB for the automated and customized classification of ligand-binding structures.

    PubMed

    Shin, Jae-Min; Cho, Doo-Ho

    2005-01-01

    PDB-Ligand (http://www.idrtech.com/PDB-Ligand/) is a three-dimensional structure database of small molecular ligands that are bound to larger biomolecules deposited in the Protein Data Bank (PDB). It is also a database tool that allows one to browse, classify, superimpose and visualize these structures. As of May 2004, there are about 4870 types of small molecular ligands, experimentally determined as a complex with protein or DNA in the PDB. The proteins that a given ligand binds are often homologous and present the same binding structure to the ligand. However, there are also many instances wherein a given ligand binds to two or more unrelated proteins, or to the same or homologous protein in different binding environments. PDB-Ligand serves as an interactive structural analysis and clustering tool for all the ligand-binding structures in the PDB. PDB-Ligand also provides an easier way to obtain a number of different structure alignments of many related ligand-binding structures based on a simple and flexible ligand clustering method. PDB-Ligand will be a good resource for both a better interpretation of ligand-binding structures and the development of better scoring functions to be used in many drug discovery applications.

  5. ActiveDriverDB: human disease mutations and genome variation in post-translational modification sites of proteins

    PubMed Central

    Krassowski, Michal; Paczkowska, Marta; Cullion, Kim; Huang, Tina; Dzneladze, Irakli; Ouellette, B F Francis; Yamada, Joseph T; Fradet-Turcotte, Amelie

    2018-01-01

    Abstract Interpretation of genetic variation is needed for deciphering genotype-phenotype associations, mechanisms of inherited disease, and cancer driver mutations. Millions of single nucleotide variants (SNVs) in human genomes are known and thousands are associated with disease. An estimated 21% of disease-associated amino acid substitutions corresponding to missense SNVs are located in protein sites of post-translational modifications (PTMs), chemical modifications of amino acids that extend protein function. ActiveDriverDB is a comprehensive human proteo-genomics database that annotates disease mutations and population variants through the lens of PTMs. We integrated >385,000 published PTM sites with ∼3.6 million substitutions from The Cancer Genome Atlas (TCGA), the ClinVar database of disease genes, and human genome sequencing projects. The database includes site-specific interaction networks of proteins, upstream enzymes such as kinases, and drugs targeting these enzymes. We also predicted network-rewiring impact of mutations by analyzing gains and losses of kinase-bound sequence motifs. ActiveDriverDB provides detailed visualization, filtering, browsing and searching options for studying PTM-associated mutations. Users can upload mutation datasets interactively and use our application programming interface in pipelines. Integrative analysis of mutations and PTMs may help decipher molecular mechanisms of phenotypes and disease, as exemplified by case studies of TP53, BRCA2 and VHL. The open-source database is available at https://www.ActiveDriverDB.org. PMID:29126202

  6. TrypsNetDB: An integrated framework for the functional characterization of trypanosomatid proteins

    PubMed Central

    Gazestani, Vahid H.; Yip, Chun Wai; Nikpour, Najmeh; Berghuis, Natasha

    2017-01-01

    Trypanosomatid parasites cause serious infections in humans and production losses in livestock. Due to the high divergence from other eukaryotes, such as humans and model organisms, the functional roles of many trypanosomatid proteins cannot be predicted by homology-based methods, rendering a significant portion of their proteins as uncharacterized. Recent technological advances have led to the availability of multiple systematic and genome-wide datasets on trypanosomatid parasites that are informative regarding the biological role(s) of their proteins. Here, we report TrypsNetDB (http://trypsNetDB.org), a web-based resource for the functional annotation of 16 different species/strains of trypanosomatid parasites. The database not only visualizes the network context of the queried protein(s) in an intuitive way but also examines the response of the represented network in more than 50 different biological contexts and its enrichment for various biological terms and pathways, protein sequence signatures, and potential RNA regulatory elements. The interactome core of the database, as of Jan 23, 2017, contains 101,187 interactions among 13,395 trypanosomatid proteins inferred from 97 genome-wide and focused studies on the interactome of these organisms. PMID:28158179

  7. JET2 Viewer: a database of predicted multiple, possibly overlapping, protein-protein interaction sites for PDB structures.

    PubMed

    Ripoche, Hugues; Laine, Elodie; Ceres, Nicoletta; Carbone, Alessandra

    2017-01-04

    The database JET2 Viewer, openly accessible at http://www.jet2viewer.upmc.fr/, reports putative protein binding sites for all three-dimensional (3D) structures available in the Protein Data Bank (PDB). This knowledge base was generated by applying the computational method JET 2 at large-scale on more than 20 000 chains. JET 2 strategy yields very precise predictions of interacting surfaces and unravels their evolutionary process and complexity. JET2 Viewer provides an online intelligent display, including interactive 3D visualization of the binding sites mapped onto PDB structures and suitable files recording JET 2 analyses. Predictions were evaluated on more than 15 000 experimentally characterized protein interfaces. This is, to our knowledge, the largest evaluation of a protein binding site prediction method. The overall performance of JET 2 on all interfaces are: Sen = 52.52, PPV = 51.24, Spe = 80.05, Acc = 75.89. The data can be used to foster new strategies for protein-protein interactions modulation and interaction surface redesign. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  8. Cytoscape: a software environment for integrated models of biomolecular interaction networks.

    PubMed

    Shannon, Paul; Markiel, Andrew; Ozier, Owen; Baliga, Nitin S; Wang, Jonathan T; Ramage, Daniel; Amin, Nada; Schwikowski, Benno; Ideker, Trey

    2003-11-01

    Cytoscape is an open source software project for integrating biomolecular interaction networks with high-throughput expression data and other molecular states into a unified conceptual framework. Although applicable to any system of molecular components and interactions, Cytoscape is most powerful when used in conjunction with large databases of protein-protein, protein-DNA, and genetic interactions that are increasingly available for humans and model organisms. Cytoscape's software Core provides basic functionality to layout and query the network; to visually integrate the network with expression profiles, phenotypes, and other molecular states; and to link the network to databases of functional annotations. The Core is extensible through a straightforward plug-in architecture, allowing rapid development of additional computational analyses and features. Several case studies of Cytoscape plug-ins are surveyed, including a search for interaction pathways correlating with changes in gene expression, a study of protein complexes involved in cellular recovery to DNA damage, inference of a combined physical/functional interaction network for Halobacterium, and an interface to detailed stochastic/kinetic gene regulatory models.

  9. Protein 3D Structure and Electron Microscopy Map Retrieval Using 3D-SURFER2.0 and EM-SURFER.

    PubMed

    Han, Xusi; Wei, Qing; Kihara, Daisuke

    2017-12-08

    With the rapid growth in the number of solved protein structures stored in the Protein Data Bank (PDB) and the Electron Microscopy Data Bank (EMDB), it is essential to develop tools to perform real-time structure similarity searches against the entire structure database. Since conventional structure alignment methods need to sample different orientations of proteins in the three-dimensional space, they are time consuming and unsuitable for rapid, real-time database searches. To this end, we have developed 3D-SURFER and EM-SURFER, which utilize 3D Zernike descriptors (3DZD) to conduct high-throughput protein structure comparison, visualization, and analysis. Taking an atomic structure or an electron microscopy map of a protein or a protein complex as input, the 3DZD of a query protein is computed and compared with the 3DZD of all other proteins in PDB or EMDB. In addition, local geometrical characteristics of a query protein can be analyzed using VisGrid and LIGSITE CSC in 3D-SURFER. This article describes how to use 3D-SURFER and EM-SURFER to carry out protein surface shape similarity searches, local geometric feature analysis, and interpretation of the search results. © 2017 by John Wiley & Sons, Inc. Copyright © 2017 John Wiley & Sons, Inc.

  10. ExtraTrain: a database of Extragenic regions and Transcriptional information in prokaryotic organisms

    PubMed Central

    Pareja, Eduardo; Pareja-Tobes, Pablo; Manrique, Marina; Pareja-Tobes, Eduardo; Bonal, Javier; Tobes, Raquel

    2006-01-01

    Background Transcriptional regulation processes are the principal mechanisms of adaptation in prokaryotes. In these processes, the regulatory proteins and the regulatory DNA signals located in extragenic regions are the key elements involved. As all extragenic spaces are putative regulatory regions, ExtraTrain covers all extragenic regions of available genomes and regulatory proteins from bacteria and archaea included in the UniProt database. Description ExtraTrain provides integrated and easily manageable information for 679816 extragenic regions and for the genes delimiting each of them. In addition ExtraTrain supplies a tool to explore extragenic regions, named Palinsight, oriented to detect and search palindromic patterns. This interactive visual tool is totally integrated in the database, allowing the search for regulatory signals in user defined sets of extragenic regions. The 26046 regulatory proteins included in ExtraTrain belong to the families AraC/XylS, ArsR, AsnC, Cold shock domain, CRP-FNR, DeoR, GntR, IclR, LacI, LuxR, LysR, MarR, MerR, NtrC/Fis, OmpR and TetR. The database follows the InterPro criteria to define these families. The information about regulators includes manually curated sets of references specifically associated to regulator entries. In order to achieve a sustainable and maintainable knowledge database ExtraTrain is a platform open to the contribution of knowledge by the scientific community providing a system for the incorporation of textual knowledge. Conclusion ExtraTrain is a new database for exploring Extragenic regions and Transcriptional information in bacteria and archaea. ExtraTrain database is available at . PMID:16539733

  11. Kin-Driver: a database of driver mutations in protein kinases.

    PubMed

    Simonetti, Franco L; Tornador, Cristian; Nabau-Moretó, Nuria; Molina-Vila, Miguel A; Marino-Buslje, Cristina

    2014-01-01

    Somatic mutations in protein kinases (PKs) are frequent driver events in many human tumors, while germ-line mutations are associated with hereditary diseases. Here we present Kin-driver, the first database that compiles driver mutations in PKs with experimental evidence demonstrating their functional role. Kin-driver is a manual expert-curated database that pays special attention to activating mutations (AMs) and can serve as a validation set to develop new generation tools focused on the prediction of gain-of-function driver mutations. It also offers an easy and intuitive environment to facilitate the visualization and analysis of mutations in PKs. Because all mutations are mapped onto a multiple sequence alignment, analogue positions between kinases can be identified and tentative new mutations can be proposed for studying by transferring annotation. Finally, our database can also be of use to clinical and translational laboratories, helping them to identify uncommon AMs that can correlate with response to new antitumor drugs. The website was developed using PHP and JavaScript, which are supported by all major browsers; the database was built using MySQL server. Kin-driver is available at: http://kin-driver.leloir.org.ar/ © The Author(s) 2014. Published by Oxford University Press.

  12. PharmDB-K: Integrated Bio-Pharmacological Network Database for Traditional Korean Medicine

    PubMed Central

    Lee, Ji-Hyun; Park, Kyoung Mii; Han, Dong-Jin; Bang, Nam Young; Kim, Do-Hee; Na, Hyeongjin; Lim, Semi; Kim, Tae Bum; Kim, Dae Gyu; Kim, Hyun-Jung; Chung, Yeonseok; Sung, Sang Hyun; Surh, Young-Joon; Kim, Sunghoon; Han, Byung Woo

    2015-01-01

    Despite the growing attention given to Traditional Medicine (TM) worldwide, there is no well-known, publicly available, integrated bio-pharmacological Traditional Korean Medicine (TKM) database for researchers in drug discovery. In this study, we have constructed PharmDB-K, which offers comprehensive information relating to TKM-associated drugs (compound), disease indication, and protein relationships. To explore the underlying molecular interaction of TKM, we integrated fourteen different databases, six Pharmacopoeias, and literature, and established a massive bio-pharmacological network for TKM and experimentally validated some cases predicted from the PharmDB-K analyses. Currently, PharmDB-K contains information about 262 TKMs, 7,815 drugs, 3,721 diseases, 32,373 proteins, and 1,887 side effects. One of the unique sets of information in PharmDB-K includes 400 indicator compounds used for standardization of herbal medicine. Furthermore, we are operating PharmDB-K via phExplorer (a network visualization software) and BioMart (a data federation framework) for convenient search and analysis of the TKM network. Database URL: http://pharmdb-k.org, http://biomart.i-pharm.org. PMID:26555441

  13. Plant Reactome: a resource for plant pathways and comparative analysis.

    PubMed

    Naithani, Sushma; Preece, Justin; D'Eustachio, Peter; Gupta, Parul; Amarasinghe, Vindhya; Dharmawardhana, Palitha D; Wu, Guanming; Fabregat, Antonio; Elser, Justin L; Weiser, Joel; Keays, Maria; Fuentes, Alfonso Munoz-Pomer; Petryszak, Robert; Stein, Lincoln D; Ware, Doreen; Jaiswal, Pankaj

    2017-01-04

    Plant Reactome (http://plantreactome.gramene.org/) is a free, open-source, curated plant pathway database portal, provided as part of the Gramene project. The database provides intuitive bioinformatics tools for the visualization, analysis and interpretation of pathway knowledge to support genome annotation, genome analysis, modeling, systems biology, basic research and education. Plant Reactome employs the structural framework of a plant cell to show metabolic, transport, genetic, developmental and signaling pathways. We manually curate molecular details of pathways in these domains for reference species Oryza sativa (rice) supported by published literature and annotation of well-characterized genes. Two hundred twenty-two rice pathways, 1025 reactions associated with 1173 proteins, 907 small molecules and 256 literature references have been curated to date. These reference annotations were used to project pathways for 62 model, crop and evolutionarily significant plant species based on gene homology. Database users can search and browse various components of the database, visualize curated baseline expression of pathway-associated genes provided by the Expression Atlas and upload and analyze their Omics datasets. The database also offers data access via Application Programming Interfaces (APIs) and in various standardized pathway formats, such as SBML and BioPAX. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  14. DNAproDB: an interactive tool for structural analysis of DNA–protein complexes

    PubMed Central

    Sagendorf, Jared M.

    2017-01-01

    Abstract Many biological processes are mediated by complex interactions between DNA and proteins. Transcription factors, various polymerases, nucleases and histones recognize and bind DNA with different levels of binding specificity. To understand the physical mechanisms that allow proteins to recognize DNA and achieve their biological functions, it is important to analyze structures of DNA–protein complexes in detail. DNAproDB is a web-based interactive tool designed to help researchers study these complexes. DNAproDB provides an automated structure-processing pipeline that extracts structural features from DNA–protein complexes. The extracted features are organized in structured data files, which are easily parsed with any programming language or viewed in a browser. We processed a large number of DNA–protein complexes retrieved from the Protein Data Bank and created the DNAproDB database to store this data. Users can search the database by combining features of the DNA, protein or DNA–protein interactions at the interface. Additionally, users can upload their own structures for processing privately and securely. DNAproDB provides several interactive and customizable tools for creating visualizations of the DNA–protein interface at different levels of abstraction that can be exported as high quality figures. All functionality is documented and freely accessible at http://dnaprodb.usc.edu. PMID:28431131

  15. D-peaks: a visual tool to display ChIP-seq peaks along the genome.

    PubMed

    Brohée, Sylvain; Bontempi, Gianluca

    2012-01-01

    ChIP-sequencing is a method of choice to localize the positions of protein binding sites on DNA on a whole genomic scale. The deciphering of the sequencing data produced by this novel technique is challenging and it is achieved by their rigorous interpretation using dedicated tools and adapted visualization programs. Here, we present a bioinformatics tool (D-peaks) that adds several possibilities (including, user-friendliness, high-quality, relative position with respect to the genomic features) to the well-known visualization browsers or databases already existing. D-peaks is directly available through its web interface http://rsat.ulb.ac.be/dpeaks/ as well as a command line tool.

  16. Network portal: a database for storage, analysis and visualization of biological networks

    PubMed Central

    Turkarslan, Serdar; Wurtmann, Elisabeth J.; Wu, Wei-Ju; Jiang, Ning; Bare, J. Christopher; Foley, Karen; Reiss, David J.; Novichkov, Pavel; Baliga, Nitin S.

    2014-01-01

    The ease of generating high-throughput data has enabled investigations into organismal complexity at the systems level through the inference of networks of interactions among the various cellular components (genes, RNAs, proteins and metabolites). The wider scientific community, however, currently has limited access to tools for network inference, visualization and analysis because these tasks often require advanced computational knowledge and expensive computing resources. We have designed the network portal (http://networks.systemsbiology.net) to serve as a modular database for the integration of user uploaded and public data, with inference algorithms and tools for the storage, visualization and analysis of biological networks. The portal is fully integrated into the Gaggle framework to seamlessly exchange data with desktop and web applications and to allow the user to create, save and modify workspaces, and it includes social networking capabilities for collaborative projects. While the current release of the database contains networks for 13 prokaryotic organisms from diverse phylogenetic clades (4678 co-regulated gene modules, 3466 regulators and 9291 cis-regulatory motifs), it will be rapidly populated with prokaryotic and eukaryotic organisms as relevant data become available in public repositories and through user input. The modular architecture, simple data formats and open API support community development of the portal. PMID:24271392

  17. Motivated Proteins: A web application for studying small three-dimensional protein motifs

    PubMed Central

    Leader, David P; Milner-White, E James

    2009-01-01

    Background Small loop-shaped motifs are common constituents of the three-dimensional structure of proteins. Typically they comprise between three and seven amino acid residues, and are defined by a combination of dihedral angles and hydrogen bonding partners. The most abundant of these are αβ-motifs, asx-motifs, asx-turns, β-bulges, β-bulge loops, β-turns, nests, niches, Schellmann loops, ST-motifs, ST-staples and ST-turns. We have constructed a database of such motifs from a range of high-quality protein structures and built a web application as a visual interface to this. Description The web application, Motivated Proteins, provides access to these 12 motifs (with 48 sub-categories) in a database of over 400 representative proteins. Queries can be made for specific categories or sub-categories of motif, motifs in the vicinity of ligands, motifs which include part of an enzyme active site, overlapping motifs, or motifs which include a particular amino acid sequence. Individual proteins can be specified, or, where appropriate, motifs for all proteins listed. The results of queries are presented in textual form as an (X)HTML table, and may be saved as parsable plain text or XML. Motifs can be viewed and manipulated either individually or in the context of the protein in the Jmol applet structural viewer. Cartoons of the motifs imposed on a linear representation of protein secondary structure are also provided. Summary information for the motifs is available, as are histograms of amino acid distribution, and graphs of dihedral angles at individual positions in the motifs. Conclusion Motivated Proteins is a publicly and freely accessible web application that enables protein scientists to study small three-dimensional motifs without requiring knowledge of either Structured Query Language or the underlying database schema. PMID:19210785

  18. Vivaldi: visualization and validation of biomacromolecular NMR structures from the PDB.

    PubMed

    Hendrickx, Pieter M S; Gutmanas, Aleksandras; Kleywegt, Gerard J

    2013-04-01

    We describe Vivaldi (VIsualization and VALidation DIsplay; http://pdbe.org/vivaldi), a web-based service for the analysis, visualization, and validation of NMR structures in the Protein Data Bank (PDB). Vivaldi provides access to model coordinates and several types of experimental NMR data using interactive visualization tools, augmented with structural annotations and model-validation information. The service presents information about the modeled NMR ensemble, validation of experimental chemical shifts, residual dipolar couplings, distance and dihedral angle constraints, as well as validation scores based on empirical knowledge and databases. Vivaldi was designed for both expert NMR spectroscopists and casual non-expert users who wish to obtain a better grasp of the information content and quality of NMR structures in the public archive. Copyright © 2013 Wiley Periodicals, Inc.

  19. Updated regulation curation model at the Saccharomyces Genome Database

    PubMed Central

    Engel, Stacia R; Skrzypek, Marek S; Hellerstedt, Sage T; Wong, Edith D; Nash, Robert S; Weng, Shuai; Binkley, Gail; Sheppard, Travis K; Karra, Kalpana; Cherry, J Michael

    2018-01-01

    Abstract The Saccharomyces Genome Database (SGD) provides comprehensive, integrated biological information for the budding yeast Saccharomyces cerevisiae, along with search and analysis tools to explore these data, enabling the discovery of functional relationships between sequence and gene products in fungi and higher organisms. We have recently expanded our data model for regulation curation to address regulation at the protein level in addition to transcription, and are presenting the expanded data on the ‘Regulation’ pages at SGD. These pages include a summary describing the context under which the regulator acts, manually curated and high-throughput annotations showing the regulatory relationships for that gene and a graphical visualization of its regulatory network and connected networks. For genes whose products regulate other genes or proteins, the Regulation page includes Gene Ontology enrichment analysis of the biological processes in which those targets participate. For DNA-binding transcription factors, we also provide other information relevant to their regulatory function, such as DNA binding site motifs and protein domains. As with other data types at SGD, all regulatory relationships and accompanying data are available through YeastMine, SGD’s data warehouse based on InterMine. Database URL: http://www.yeastgenome.org PMID:29688362

  20. MDB: the Metalloprotein Database and Browser at The Scripps Research Institute

    PubMed Central

    Castagnetto, Jesus M.; Hennessy, Sean W.; Roberts, Victoria A.; Getzoff, Elizabeth D.; Tainer, John A.; Pique, Michael E.

    2002-01-01

    The Metalloprotein Database and Browser (MDB; http://metallo.scripps.edu) at The Scripps Research Institute is a web-accessible resource for metalloprotein research. It offers the scientific community quantitative information on geometrical parameters of metal-binding sites in protein structures available from the Protein Data Bank (PDB). The MDB also offers analytical tools for the examination of trends or patterns in the indexed metal-binding sites. A user can perform interactive searches, metal-site structure visualization (via a Java applet), and analysis of the quantitative data by accessing the MDB through a web browser without requiring an external application or platform-dependent plugin. The MDB also has a non-interactive interface with which other web sites and network-aware applications can seamlessly incorporate data or statistical analysis results from metal-binding sites. The information contained in the MDB is periodically updated with automated algorithms that find and index metal sites from new protein structures released by the PDB. PMID:11752342

  1. Computer systems and methods for the query and visualization multidimensional databases

    DOEpatents

    Stolte, Chris; Tang, Diane L.; Hanrahan, Patrick

    2017-04-25

    A method of generating a data visualization is performed at a computer having a display, one or more processors, and memory. The memory stores one or more programs for execution by the one or more processors. The process receives user specification of a plurality of characteristics of a data visualization. The data visualization is based on data from a multidimensional database. The characteristics specify at least x-position and y-position of data marks corresponding to tuples of data retrieved from the database. The process generates a data visualization according to the specified plurality of characteristics. The data visualization has an x-axis defined based on data for one or more first fields from the database that specify x-position of the data marks and the data visualization has a y-axis defined based on data for one or more second fields from the database that specify y-position of the data marks.

  2. cisPath: an R/Bioconductor package for cloud users for visualization and management of functional protein interaction networks.

    PubMed

    Wang, Likun; Yang, Luhe; Peng, Zuohan; Lu, Dan; Jin, Yan; McNutt, Michael; Yin, Yuxin

    2015-01-01

    With the burgeoning development of cloud technology and services, there are an increasing number of users who prefer cloud to run their applications. All software and associated data are hosted on the cloud, allowing users to access them via a web browser from any computer, anywhere. This paper presents cisPath, an R/Bioconductor package deployed on cloud servers for client users to visualize, manage, and share functional protein interaction networks. With this R package, users can easily integrate downloaded protein-protein interaction information from different online databases with private data to construct new and personalized interaction networks. Additional functions allow users to generate specific networks based on private databases. Since the results produced with the use of this package are in the form of web pages, cloud users can easily view and edit the network graphs via the browser, using a mouse or touch screen, without the need to download them to a local computer. This package can also be installed and run on a local desktop computer. Depending on user preference, results can be publicized or shared by uploading to a web server or cloud driver, allowing other users to directly access results via a web browser. This package can be installed and run on a variety of platforms. Since all network views are shown in web pages, such package is particularly useful for cloud users. The easy installation and operation is an attractive quality for R beginners and users with no previous experience with cloud services.

  3. cisPath: an R/Bioconductor package for cloud users for visualization and management of functional protein interaction networks

    PubMed Central

    2015-01-01

    Background With the burgeoning development of cloud technology and services, there are an increasing number of users who prefer cloud to run their applications. All software and associated data are hosted on the cloud, allowing users to access them via a web browser from any computer, anywhere. This paper presents cisPath, an R/Bioconductor package deployed on cloud servers for client users to visualize, manage, and share functional protein interaction networks. Results With this R package, users can easily integrate downloaded protein-protein interaction information from different online databases with private data to construct new and personalized interaction networks. Additional functions allow users to generate specific networks based on private databases. Since the results produced with the use of this package are in the form of web pages, cloud users can easily view and edit the network graphs via the browser, using a mouse or touch screen, without the need to download them to a local computer. This package can also be installed and run on a local desktop computer. Depending on user preference, results can be publicized or shared by uploading to a web server or cloud driver, allowing other users to directly access results via a web browser. Conclusions This package can be installed and run on a variety of platforms. Since all network views are shown in web pages, such package is particularly useful for cloud users. The easy installation and operation is an attractive quality for R beginners and users with no previous experience with cloud services. PMID:25708840

  4. Update on Genomic Databases and Resources at the National Center for Biotechnology Information.

    PubMed

    Tatusova, Tatiana

    2016-01-01

    The National Center for Biotechnology Information (NCBI), as a primary public repository of genomic sequence data, collects and maintains enormous amounts of heterogeneous data. Data for genomes, genes, gene expressions, gene variation, gene families, proteins, and protein domains are integrated with the analytical, search, and retrieval resources through the NCBI website, text-based search and retrieval system, provides a fast and easy way to navigate across diverse biological databases.Comparative genome analysis tools lead to further understanding of evolution processes quickening the pace of discovery. Recent technological innovations have ignited an explosion in genome sequencing that has fundamentally changed our understanding of the biology of living organisms. This huge increase in DNA sequence data presents new challenges for the information management system and the visualization tools. New strategies have been designed to bring an order to this genome sequence shockwave and improve the usability of associated data.

  5. FASMA: a service to format and analyze sequences in multiple alignments.

    PubMed

    Costantini, Susan; Colonna, Giovanni; Facchiano, Angelo M

    2007-12-01

    Multiple sequence alignments are successfully applied in many studies for under- standing the structural and functional relations among single nucleic acids and protein sequences as well as whole families. Because of the rapid growth of sequence databases, multiple sequence alignments can often be very large and difficult to visualize and analyze. We offer a new service aimed to visualize and analyze the multiple alignments obtained with different external algorithms, with new features useful for the comparison of the aligned sequences as well as for the creation of a final image of the alignment. The service is named FASMA and is available at http://bioinformatica.isa.cnr.it/FASMA/.

  6. Comprehensive Reconstruction and Visualization of Non-Coding Regulatory Networks in Human

    PubMed Central

    Bonnici, Vincenzo; Russo, Francesco; Bombieri, Nicola; Pulvirenti, Alfredo; Giugno, Rosalba

    2014-01-01

    Research attention has been powered to understand the functional roles of non-coding RNAs (ncRNAs). Many studies have demonstrated their deregulation in cancer and other human disorders. ncRNAs are also present in extracellular human body fluids such as serum and plasma, giving them a great potential as non-invasive biomarkers. However, non-coding RNAs have been relatively recently discovered and a comprehensive database including all of them is still missing. Reconstructing and visualizing the network of ncRNAs interactions are important steps to understand their regulatory mechanism in complex systems. This work presents ncRNA-DB, a NoSQL database that integrates ncRNAs data interactions from a large number of well established on-line repositories. The interactions involve RNA, DNA, proteins, and diseases. ncRNA-DB is available at http://ncrnadb.scienze.univr.it/ncrnadb/. It is equipped with three interfaces: web based, command-line, and a Cytoscape app called ncINetView. By accessing only one resource, users can search for ncRNAs and their interactions, build a network annotated with all known ncRNAs and associated diseases, and use all visual and mining features available in Cytoscape. PMID:25540777

  7. Comprehensive reconstruction and visualization of non-coding regulatory networks in human.

    PubMed

    Bonnici, Vincenzo; Russo, Francesco; Bombieri, Nicola; Pulvirenti, Alfredo; Giugno, Rosalba

    2014-01-01

    Research attention has been powered to understand the functional roles of non-coding RNAs (ncRNAs). Many studies have demonstrated their deregulation in cancer and other human disorders. ncRNAs are also present in extracellular human body fluids such as serum and plasma, giving them a great potential as non-invasive biomarkers. However, non-coding RNAs have been relatively recently discovered and a comprehensive database including all of them is still missing. Reconstructing and visualizing the network of ncRNAs interactions are important steps to understand their regulatory mechanism in complex systems. This work presents ncRNA-DB, a NoSQL database that integrates ncRNAs data interactions from a large number of well established on-line repositories. The interactions involve RNA, DNA, proteins, and diseases. ncRNA-DB is available at http://ncrnadb.scienze.univr.it/ncrnadb/. It is equipped with three interfaces: web based, command-line, and a Cytoscape app called ncINetView. By accessing only one resource, users can search for ncRNAs and their interactions, build a network annotated with all known ncRNAs and associated diseases, and use all visual and mining features available in Cytoscape.

  8. Proteome-wide Subcellular Topologies of E. coli Polypeptides Database (STEPdb)*

    PubMed Central

    Orfanoudaki, Georgia; Economou, Anastassios

    2014-01-01

    Cell compartmentalization serves both the isolation and the specialization of cell functions. After synthesis in the cytoplasm, over a third of all proteins are targeted to other subcellular compartments. Knowing how proteins are distributed within the cell and how they interact is a prerequisite for understanding it as a whole. Surface and secreted proteins are important pathogenicity determinants. Here we present the STEP database (STEPdb) that contains a comprehensive characterization of subcellular localization and topology of the complete proteome of Escherichia coli. Two widely used E. coli proteomes (K-12 and BL21) are presented organized into thirteen subcellular classes. STEPdb exploits the wealth of genetic, proteomic, biochemical, and functional information on protein localization, secretion, and targeting in E. coli, one of the best understood model organisms. Subcellular annotations were derived from a combination of bioinformatics prediction, proteomic, biochemical, functional, topological data and extensive literature re-examination that were refined through manual curation. Strong experimental support for the location of 1553 out of 4303 proteins was based on 426 articles and some experimental indications for another 526. Annotations were provided for another 320 proteins based on firm bioinformatic predictions. STEPdb is the first database that contains an extensive set of peripheral IM proteins (PIM proteins) and includes their graphical visualization into complexes, cellular functions, and interactions. It also summarizes all currently known protein export machineries of E. coli K-12 and pairs them, where available, with the secretory proteins that use them. It catalogs the Sec- and TAT-utilizing secretomes and summarizes their topological features such as signal peptides and transmembrane regions, transmembrane topologies and orientations. It also catalogs physicochemical and structural features that influence topology such as abundance, solubility, disorder, heat resistance, and structural domain families. Finally, STEPdb incorporates prediction tools for topology (TMHMM, SignalP, and Phobius) and disorder (IUPred) and implements the BLAST2STEP that performs protein homology searches against the STEPdb. PMID:25210196

  9. Linking microarray reporters with protein functions.

    PubMed

    Gaj, Stan; van Erk, Arie; van Haaften, Rachel I M; Evelo, Chris T A

    2007-09-26

    The analysis of microarray experiments requires accurate and up-to-date functional annotation of the microarray reporters to optimize the interpretation of the biological processes involved. Pathway visualization tools are used to connect gene expression data with existing biological pathways by using specific database identifiers that link reporters with elements in the pathways. This paper proposes a novel method that aims to improve microarray reporter annotation by BLASTing the original reporter sequences against a species-specific EMBL subset, that was derived from and crosslinked back to the highly curated UniProt database. The resulting alignments were filtered using high quality alignment criteria and further compared with the outcome of a more traditional approach, where reporter sequences were BLASTed against EnsEMBL followed by locating the corresponding protein (UniProt) entry for the high quality hits. Combining the results of both methods resulted in successful annotation of > 58% of all reporter sequences with UniProt IDs on two commercial array platforms, increasing the amount of Incyte reporters that could be coupled to Gene Ontology terms from 32.7% to 58.3% and to a local GenMAPP pathway from 9.6% to 16.7%. For Agilent, 35.3% of the total reporters are now linked towards GO nodes and 7.1% on local pathways. Our methods increased the annotation quality of microarray reporter sequences and allowed us to visualize more reporters using pathway visualization tools. Even in cases where the original reporter annotation showed the correct description the new identifiers often allowed improved pathway and Gene Ontology linking. These methods are freely available at http://www.bigcat.unimaas.nl/public/publications/Gaj_Annotation/.

  10. Automatic visualization of 3D geometry contained in online databases

    NASA Astrophysics Data System (ADS)

    Zhang, Jie; John, Nigel W.

    2003-04-01

    In this paper, the application of the Virtual Reality Modeling Language (VRML) for efficient database visualization is analyzed. With the help of JAVA programming, three examples of automatic visualization from a database containing 3-D Geometry are given. The first example is used to create basic geometries. The second example is used to create cylinders with a defined start point and end point. The third example is used to processs data from an old copper mine complex in Cheshire, United Kingdom. Interactive 3-D visualization of all geometric data in an online database is achieved with JSP technology.

  11. PIQMIe: a web server for semi-quantitative proteomics data management and analysis

    PubMed Central

    Kuzniar, Arnold; Kanaar, Roland

    2014-01-01

    We present the Proteomics Identifications and Quantitations Data Management and Integration Service or PIQMIe that aids in reliable and scalable data management, analysis and visualization of semi-quantitative mass spectrometry based proteomics experiments. PIQMIe readily integrates peptide and (non-redundant) protein identifications and quantitations from multiple experiments with additional biological information on the protein entries, and makes the linked data available in the form of a light-weight relational database, which enables dedicated data analyses (e.g. in R) and user-driven queries. Using the web interface, users are presented with a concise summary of their proteomics experiments in numerical and graphical forms, as well as with a searchable protein grid and interactive visualization tools to aid in the rapid assessment of the experiments and in the identification of proteins of interest. The web server not only provides data access through a web interface but also supports programmatic access through RESTful web service. The web server is available at http://piqmie.semiqprot-emc.cloudlet.sara.nl or http://www.bioinformatics.nl/piqmie. This website is free and open to all users and there is no login requirement. PMID:24861615

  12. PIQMIe: a web server for semi-quantitative proteomics data management and analysis.

    PubMed

    Kuzniar, Arnold; Kanaar, Roland

    2014-07-01

    We present the Proteomics Identifications and Quantitations Data Management and Integration Service or PIQMIe that aids in reliable and scalable data management, analysis and visualization of semi-quantitative mass spectrometry based proteomics experiments. PIQMIe readily integrates peptide and (non-redundant) protein identifications and quantitations from multiple experiments with additional biological information on the protein entries, and makes the linked data available in the form of a light-weight relational database, which enables dedicated data analyses (e.g. in R) and user-driven queries. Using the web interface, users are presented with a concise summary of their proteomics experiments in numerical and graphical forms, as well as with a searchable protein grid and interactive visualization tools to aid in the rapid assessment of the experiments and in the identification of proteins of interest. The web server not only provides data access through a web interface but also supports programmatic access through RESTful web service. The web server is available at http://piqmie.semiqprot-emc.cloudlet.sara.nl or http://www.bioinformatics.nl/piqmie. This website is free and open to all users and there is no login requirement. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  13. Bioinformatics analysis of transcriptome dynamics during growth in angus cattle longissimus muscle.

    PubMed

    Moisá, Sonia J; Shike, Daniel W; Graugnard, Daniel E; Rodriguez-Zas, Sandra L; Everts, Robin E; Lewin, Harris A; Faulkner, Dan B; Berger, Larry L; Loor, Juan J

    2013-01-01

    Transcriptome dynamics in the longissimus muscle (LM) of young Angus cattle were evaluated at 0, 60, 120, and 220 days from early-weaning. Bioinformatic analysis was performed using the dynamic impact approach (DIA) by means of Kyoto Encyclopedia of Genes and Genomes (KEGG) and Database for Annotation, Visualization and Integrated Discovery (DAVID) databases. Between 0 to 120 days (growing phase) most of the highly-impacted pathways (eg, ascorbate and aldarate metabolism, drug metabolism, cytochrome P450 and Retinol metabolism) were inhibited. The phase between 120 to 220 days (finishing phase) was characterized by the most striking differences with 3,784 differentially expressed genes (DEGs). Analysis of those DEGs revealed that the most impacted KEGG canonical pathway was glycosylphosphatidylinositol (GPI)-anchor biosynthesis, which was inhibited. Furthermore, inhibition of calpastatin and activation of tyrosine aminotransferase ubiquitination at 220 days promotes proteasomal degradation, while the concurrent activation of ribosomal proteins promotes protein synthesis. Therefore, the balance of these processes likely results in a steady-state of protein turnover during the finishing phase. Results underscore the importance of transcriptome dynamics in LM during growth.

  14. SuperSweet—a resource on natural and artificial sweetening agents

    PubMed Central

    Ahmed, Jessica; Preissner, Saskia; Dunkel, Mathias; Worth, Catherine L.; Eckert, Andreas; Preissner, Robert

    2011-01-01

    A vast number of sweet tasting molecules are known, encompassing small compounds, carbohydrates, d-amino acids and large proteins. Carbohydrates play a particularly big role in human diet. The replacement of sugars in food with artificial sweeteners is common and is a general approach to prevent cavities, obesity and associated diseases such as diabetes and hyperlipidemia. Knowledge about the molecular basis of taste may reveal new strategies to overcome diet-induced diseases. In this context, the design of safe, low-calorie sweeteners is particularly important. Here, we provide a comprehensive collection of carbohydrates, artificial sweeteners and other sweet tasting agents like proteins and peptides. Additionally, structural information and properties such as number of calories, therapeutic annotations and a sweetness-index are stored in SuperSweet. Currently, the database consists of more than 8000 sweet molecules. Moreover, the database provides a modeled 3D structure of the sweet taste receptor and binding poses of the small sweet molecules. These binding poses provide hints for the design of new sweeteners. A user-friendly graphical interface allows similarity searching, visualization of docked sweeteners into the receptor etc. A sweetener classification tree and browsing features allow quick requests to be made to the database. The database is freely available at: http://bioinformatics.charite.de/sweet/. PMID:20952410

  15. SuperSweet--a resource on natural and artificial sweetening agents.

    PubMed

    Ahmed, Jessica; Preissner, Saskia; Dunkel, Mathias; Worth, Catherine L; Eckert, Andreas; Preissner, Robert

    2011-01-01

    A vast number of sweet tasting molecules are known, encompassing small compounds, carbohydrates, d-amino acids and large proteins. Carbohydrates play a particularly big role in human diet. The replacement of sugars in food with artificial sweeteners is common and is a general approach to prevent cavities, obesity and associated diseases such as diabetes and hyperlipidemia. Knowledge about the molecular basis of taste may reveal new strategies to overcome diet-induced diseases. In this context, the design of safe, low-calorie sweeteners is particularly important. Here, we provide a comprehensive collection of carbohydrates, artificial sweeteners and other sweet tasting agents like proteins and peptides. Additionally, structural information and properties such as number of calories, therapeutic annotations and a sweetness-index are stored in SuperSweet. Currently, the database consists of more than 8000 sweet molecules. Moreover, the database provides a modeled 3D structure of the sweet taste receptor and binding poses of the small sweet molecules. These binding poses provide hints for the design of new sweeteners. A user-friendly graphical interface allows similarity searching, visualization of docked sweeteners into the receptor etc. A sweetener classification tree and browsing features allow quick requests to be made to the database. The database is freely available at: http://bioinformatics.charite.de/sweet/.

  16. The BRENDA enzyme information system-From a database to an expert system.

    PubMed

    Schomburg, I; Jeske, L; Ulbrich, M; Placzek, S; Chang, A; Schomburg, D

    2017-11-10

    Enzymes, representing the largest and by far most complex group of proteins, play an essential role in all processes of life, including metabolism, gene expression, cell division, the immune system, and others. Their function, also connected to most diseases or stress control makes them interesting targets for research and applications in biotechnology, medical treatments, or diagnosis. Their functional parameters and other properties are collected, integrated, and made available to the scientific community in the BRaunschweig ENzyme DAtabase (BRENDA). In the last 30 years BRENDA has developed into one of the most highly used biological databases worldwide. The data contents, the process of data acquisition, data integration and control, the ways to access the data, and visualizations provided by the website are described and discussed. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.

  17. ngs.plot: Quick mining and visualization of next-generation sequencing data by integrating genomic databases.

    PubMed

    Shen, Li; Shao, Ningyi; Liu, Xiaochuan; Nestler, Eric

    2014-04-15

    Understanding the relationship between the millions of functional DNA elements and their protein regulators, and how they work in conjunction to manifest diverse phenotypes, is key to advancing our understanding of the mammalian genome. Next-generation sequencing technology is now used widely to probe these protein-DNA interactions and to profile gene expression at a genome-wide scale. As the cost of DNA sequencing continues to fall, the interpretation of the ever increasing amount of data generated represents a considerable challenge. We have developed ngs.plot - a standalone program to visualize enrichment patterns of DNA-interacting proteins at functionally important regions based on next-generation sequencing data. We demonstrate that ngs.plot is not only efficient but also scalable. We use a few examples to demonstrate that ngs.plot is easy to use and yet very powerful to generate figures that are publication ready. We conclude that ngs.plot is a useful tool to help fill the gap between massive datasets and genomic information in this era of big sequencing data.

  18. ngs.plot: Quick mining and visualization of next-generation sequencing data by integrating genomic databases

    PubMed Central

    2014-01-01

    Background Understanding the relationship between the millions of functional DNA elements and their protein regulators, and how they work in conjunction to manifest diverse phenotypes, is key to advancing our understanding of the mammalian genome. Next-generation sequencing technology is now used widely to probe these protein-DNA interactions and to profile gene expression at a genome-wide scale. As the cost of DNA sequencing continues to fall, the interpretation of the ever increasing amount of data generated represents a considerable challenge. Results We have developed ngs.plot – a standalone program to visualize enrichment patterns of DNA-interacting proteins at functionally important regions based on next-generation sequencing data. We demonstrate that ngs.plot is not only efficient but also scalable. We use a few examples to demonstrate that ngs.plot is easy to use and yet very powerful to generate figures that are publication ready. Conclusions We conclude that ngs.plot is a useful tool to help fill the gap between massive datasets and genomic information in this era of big sequencing data. PMID:24735413

  19. Developing Visualization Support System for Teaching/Learning Database Normalization

    ERIC Educational Resources Information Center

    Folorunso, Olusegun; Akinwale, AdioTaofeek

    2010-01-01

    Purpose: In tertiary institution, some students find it hard to learn database design theory, in particular, database normalization. The purpose of this paper is to develop a visualization tool to give students an interactive hands-on experience in database normalization process. Design/methodology/approach: The model-view-controller architecture…

  20. A PDB-wide, evolution-based assessment of protein-protein interfaces.

    PubMed

    Baskaran, Kumaran; Duarte, Jose M; Biyani, Nikhil; Bliven, Spencer; Capitani, Guido

    2014-10-18

    Thanks to the growth in sequence and structure databases, more than 50 million sequences are now available in UniProt and 100,000 structures in the PDB. Rich information about protein-protein interfaces can be obtained by a comprehensive study of protein contacts in the PDB, their sequence conservation and geometric features. An automated computational pipeline was developed to run our Evolutionary Protein-Protein Interface Classifier (EPPIC) software on the entire PDB and store the results in a relational database, currently containing > 800,000 interfaces. This allows the analysis of interface data on a PDB-wide scale. Two large benchmark datasets of biological interfaces and crystal contacts, each containing about 3000 entries, were automatically generated based on criteria thought to be strong indicators of interface type. The BioMany set of biological interfaces includes NMR dimers solved as crystal structures and interfaces that are preserved across diverse crystal forms, as catalogued by the Protein Common Interface Database (ProtCID) from Xu and Dunbrack. The second dataset, XtalMany, is derived from interfaces that would lead to infinite assemblies and are therefore crystal contacts. BioMany and XtalMany were used to benchmark the EPPIC approach. The performance of EPPIC was also compared to classifications from the Protein Interfaces, Surfaces, and Assemblies (PISA) program on a PDB-wide scale, finding that the two approaches give the same call in about 88% of PDB interfaces. By comparing our safest predictions to the PDB author annotations, we provide a lower-bound estimate of the error rate of biological unit annotations in the PDB. Additionally, we developed a PyMOL plugin for direct download and easy visualization of EPPIC interfaces for any PDB entry. Both the datasets and the PyMOL plugin are available at http://www.eppic-web.org/ewui/\\#downloads. Our computational pipeline allows us to analyze protein-protein contacts and their sequence conservation across the entire PDB. Two new benchmark datasets are provided, which are over an order of magnitude larger than existing manually curated ones. These tools enable the comprehensive study of several aspects of protein-protein contacts in the PDB and represent a basis for future, even larger scale studies of protein-protein interactions.

  1. Reconfigurable Image Generator

    NASA Technical Reports Server (NTRS)

    Archdeacon, John L. (Inventor); Iwai, Nelson H. (Inventor); Kato, Kenji H. (Inventor); Sweet, Barbara T. (Inventor)

    2017-01-01

    A RiG may simulate visual conditions of a real world environment, and generate the necessary amount of pixels in a visual simulation at rates up to 120 frames per second. RiG may also include a database generation system capable of producing visual databases suitable to drive the visual fidelity required by the RiG.

  2. Comprehensive Identification of Proteins from MALDI Imaging*

    PubMed Central

    Maier, Stefan K.; Hahne, Hannes; Gholami, Amin Moghaddas; Balluff, Benjamin; Meding, Stephan; Schoene, Cédrik; Walch, Axel K.; Kuster, Bernhard

    2013-01-01

    Matrix-assisted laser desorption/ionization imaging mass spectrometry (MALDI IMS) is a powerful tool for the visualization of proteins in tissues and has demonstrated considerable diagnostic and prognostic value. One main challenge is that the molecular identity of such potential biomarkers mostly remains unknown. We introduce a generic method that removes this issue by systematically identifying the proteins embedded in the MALDI matrix using a combination of bottom-up and top-down proteomics. The analyses of ten human tissues lead to the identification of 1400 abundant and soluble proteins constituting the set of proteins detectable by MALDI IMS including >90% of all IMS biomarkers reported in the literature. Top-down analysis of the matrix proteome identified 124 mostly N- and C-terminally fragmented proteins indicating considerable protein processing activity in tissues. All protein identification data from this study as well as the IMS literature has been deposited into MaTisse, a new publically available database, which we anticipate will become a valuable resource for the IMS community. PMID:23782541

  3. RepeatsDB-lite: a web server for unit annotation of tandem repeat proteins.

    PubMed

    Hirsh, Layla; Paladin, Lisanna; Piovesan, Damiano; Tosatto, Silvio C E

    2018-05-09

    RepeatsDB-lite (http://protein.bio.unipd.it/repeatsdb-lite) is a web server for the prediction of repetitive structural elements and units in tandem repeat (TR) proteins. TRs are a widespread but poorly annotated class of non-globular proteins carrying heterogeneous functions. RepeatsDB-lite extends the prediction to all TR types and strongly improves the performance both in terms of computational time and accuracy over previous methods, with precision above 95% for solenoid structures. The algorithm exploits an improved TR unit library derived from the RepeatsDB database to perform an iterative structural search and assignment. The web interface provides tools for analyzing the evolutionary relationships between units and manually refine the prediction by changing unit positions and protein classification. An all-against-all structure-based sequence similarity matrix is calculated and visualized in real-time for every user edit. Reviewed predictions can be submitted to RepeatsDB for review and inclusion.

  4. SURVEY AND SUMMARY: The morph server: a standardized system for analyzing and visualizing macromolecular motions in a database framework

    PubMed Central

    Krebs, Werner G.; Gerstein, Mark

    2000-01-01

    The number of solved structures of macromolecules that have the same fold and thus exhibit some degree of conformational variability is rapidly increasing. It is consequently advantageous to develop a standardized terminology for describing this variability and automated systems for processing protein structures in different conformations. We have developed such a system as a ‘front-end’ server to our database of macromolecular motions. Our system attempts to describe a protein motion as a rigid-body rotation of a small ‘core’ relative to a larger one, using a set of hinges. The motion is placed in a standardized coordinate system so that all statistics between any two motions are directly comparable. We find that while this model can accommodate most protein motions, it cannot accommodate all; the degree to which a motion can be accommodated provides an aid in classifying it. Furthermore, we perform an adiabatic mapping (a restrained interpolation) between every two conformations. This gives some indication of the extent of the energetic barriers that need to be surmounted in the motion, and as a by-product results in a ‘morph movie’. We make these movies available over the Web to aid in visualization. Many instances of conformational variability occur between proteins with somewhat different sequences. We can accommodate these differences in a rough fashion, generating an ‘evolutionary morph’. Users have already submitted hundreds of examples of protein motions to our server, producing a comprehensive set of statistics. So far the statistics show that the median submitted motion has a rotation of ~10° and a maximum Cα displacement of 17 Å. Almost all involve at least one large torsion angle change of >140°. The server is accessible at http://bioinfo.mbb.yale.edu/MolMovDB PMID:10734184

  5. Integrated remote sensing and visualization (IRSV) system for transportation infrastructure operations and management, phase two, volume 4 : web-based bridge information database--visualization analytics and distributed sensing.

    DOT National Transportation Integrated Search

    2012-03-01

    This report introduces the design and implementation of a Web-based bridge information visual analytics system. This : project integrates Internet, multiple databases, remote sensing, and other visualization technologies. The result : combines a GIS ...

  6. The Biomolecular Interaction Network Database and related tools 2005 update

    PubMed Central

    Alfarano, C.; Andrade, C. E.; Anthony, K.; Bahroos, N.; Bajec, M.; Bantoft, K.; Betel, D.; Bobechko, B.; Boutilier, K.; Burgess, E.; Buzadzija, K.; Cavero, R.; D'Abreo, C.; Donaldson, I.; Dorairajoo, D.; Dumontier, M. J.; Dumontier, M. R.; Earles, V.; Farrall, R.; Feldman, H.; Garderman, E.; Gong, Y.; Gonzaga, R.; Grytsan, V.; Gryz, E.; Gu, V.; Haldorsen, E.; Halupa, A.; Haw, R.; Hrvojic, A.; Hurrell, L.; Isserlin, R.; Jack, F.; Juma, F.; Khan, A.; Kon, T.; Konopinsky, S.; Le, V.; Lee, E.; Ling, S.; Magidin, M.; Moniakis, J.; Montojo, J.; Moore, S.; Muskat, B.; Ng, I.; Paraiso, J. P.; Parker, B.; Pintilie, G.; Pirone, R.; Salama, J. J.; Sgro, S.; Shan, T.; Shu, Y.; Siew, J.; Skinner, D.; Snyder, K.; Stasiuk, R.; Strumpf, D.; Tuekam, B.; Tao, S.; Wang, Z.; White, M.; Willis, R.; Wolting, C.; Wong, S.; Wrong, A.; Xin, C.; Yao, R.; Yates, B.; Zhang, S.; Zheng, K.; Pawson, T.; Ouellette, B. F. F.; Hogue, C. W. V.

    2005-01-01

    The Biomolecular Interaction Network Database (BIND) (http://bind.ca) archives biomolecular interaction, reaction, complex and pathway information. Our aim is to curate the details about molecular interactions that arise from published experimental research and to provide this information, as well as tools to enable data analysis, freely to researchers worldwide. BIND data are curated into a comprehensive machine-readable archive of computable information and provides users with methods to discover interactions and molecular mechanisms. BIND has worked to develop new methods for visualization that amplify the underlying annotation of genes and proteins to facilitate the study of molecular interaction networks. BIND has maintained an open database policy since its inception in 1999. Data growth has proceeded at a tremendous rate, approaching over 100 000 records. New services provided include a new BIND Query and Submission interface, a Standard Object Access Protocol service and the Small Molecule Interaction Database (http://smid.blueprint.org) that allows users to determine probable small molecule binding sites of new sequences and examine conserved binding residues. PMID:15608229

  7. NCBI GEO: mining millions of expression profiles--database and tools.

    PubMed

    Barrett, Tanya; Suzek, Tugba O; Troup, Dennis B; Wilhite, Stephen E; Ngau, Wing-Chi; Ledoux, Pierre; Rudnev, Dmitry; Lash, Alex E; Fujibuchi, Wataru; Edgar, Ron

    2005-01-01

    The Gene Expression Omnibus (GEO) at the National Center for Biotechnology Information (NCBI) is the largest fully public repository for high-throughput molecular abundance data, primarily gene expression data. The database has a flexible and open design that allows the submission, storage and retrieval of many data types. These data include microarray-based experiments measuring the abundance of mRNA, genomic DNA and protein molecules, as well as non-array-based technologies such as serial analysis of gene expression (SAGE) and mass spectrometry proteomic technology. GEO currently holds over 30,000 submissions representing approximately half a billion individual molecular abundance measurements, for over 100 organisms. Here, we describe recent database developments that facilitate effective mining and visualization of these data. Features are provided to examine data from both experiment- and gene-centric perspectives using user-friendly Web-based interfaces accessible to those without computational or microarray-related analytical expertise. The GEO database is publicly accessible through the World Wide Web at http://www.ncbi.nlm.nih.gov/geo.

  8. Molecular differences between mature and immature dental pulp cells: Bioinformatics and preliminary results.

    PubMed

    Chen, Long; Jiang, Yifeng; Du, Zhen

    2018-04-01

    Although previous studies have demonstrated that dental pulp stem cells (DPSCs) from mature and immature teeth exhibit potential for multi-directional differentiation, the molecular and biological difference between the DPSCs from mature and immature permanent teeth has not been fully investigated. In the present study, 500 differentially expressed genes from dental pulp cells (DPCs) in mature and immature permanent teeth were obtained from the Gene Expression Omnibus online database. Based on bioinformatics analysis using the Database for Annotation, Visualization and Integrated Discovery, these genes were divided into a number of subgroups associated with immunity, inflammation and cell signaling. The results of the present study suggest that immune features, response to infection and cell signaling may be different in DPCs from mature and immature permanent teeth; furthermore, DPCs from immature permanent teeth may be more suitable for use in tissue engineering or stem cell therapy. The Online Mendelian Inheritance in Man database stated that Sonic Hedgehog (SHH), a differentially expressed gene in DPCs from mature and immature permanent teeth, serves a crucial role in the development of craniofacial tissues, including teeth, which further confirmed that SHH may cause DPCs from mature and immature permanent teeth to exhibit different biological characteristics. The Search Tool for the Retrieval of Interacting Genes/Proteins database revealed that SHH has functional protein associations with a number of other proteins, including Glioma-associated oncogene (GLI)1, GLI2, growth arrest-specific protein 1, bone morphogenetic protein (BMP)2 and BMP4, in mice and humans. It was also demonstrated that SHH may interact with other genes to regulate the biological characteristics of DPCs. The results of the present study may provide a useful reference basis for selecting suitable DPSCs and molecules for the treatment of these cells to optimize features for tissue engineering or stem cell therapy. Quantitative polymerase chain reaction should be performed to confirm the differential expression of these genes prior to the beginning of a functional study.

  9. Causal biological network database: a comprehensive platform of causal biological network models focused on the pulmonary and vascular systems

    PubMed Central

    Boué, Stéphanie; Talikka, Marja; Westra, Jurjen Willem; Hayes, William; Di Fabio, Anselmo; Park, Jennifer; Schlage, Walter K.; Sewer, Alain; Fields, Brett; Ansari, Sam; Martin, Florian; Veljkovic, Emilija; Kenney, Renee; Peitsch, Manuel C.; Hoeng, Julia

    2015-01-01

    With the wealth of publications and data available, powerful and transparent computational approaches are required to represent measured data and scientific knowledge in a computable and searchable format. We developed a set of biological network models, scripted in the Biological Expression Language, that reflect causal signaling pathways across a wide range of biological processes, including cell fate, cell stress, cell proliferation, inflammation, tissue repair and angiogenesis in the pulmonary and cardiovascular context. This comprehensive collection of networks is now freely available to the scientific community in a centralized web-based repository, the Causal Biological Network database, which is composed of over 120 manually curated and well annotated biological network models and can be accessed at http://causalbionet.com. The website accesses a MongoDB, which stores all versions of the networks as JSON objects and allows users to search for genes, proteins, biological processes, small molecules and keywords in the network descriptions to retrieve biological networks of interest. The content of the networks can be visualized and browsed. Nodes and edges can be filtered and all supporting evidence for the edges can be browsed and is linked to the original articles in PubMed. Moreover, networks may be downloaded for further visualization and evaluation. Database URL: http://causalbionet.com PMID:25887162

  10. MCAW-DB: A glycan profile database capturing the ambiguity of glycan recognition patterns.

    PubMed

    Hosoda, Masae; Takahashi, Yushi; Shiota, Masaaki; Shinmachi, Daisuke; Inomoto, Renji; Higashimoto, Shinichi; Aoki-Kinoshita, Kiyoko F

    2018-05-11

    Glycan-binding protein (GBP) interaction experiments, such as glycan microarrays, are often used to understand glycan recognition patterns. However, oftentimes the interpretation of glycan array experimental data makes it difficult to identify discrete GBP binding patterns due to their ambiguity. It is known that lectins, for example, are non-specific in their binding affinities; the same lectin can bind to different monosaccharides or even different glycan structures. In bioinformatics, several tools to mine the data generated from these sorts of experiments have been developed. These tools take a library of predefined motifs, which are commonly-found glycan patterns such as sialyl-Lewis X, and attempt to identify the motif(s) that are specific to the GBP being analyzed. In our previous work, as opposed to using predefined motifs, we developed the Multiple Carbohydrate Alignment with Weights (MCAW) tool to visualize the state of the glycans being recognized by the GBP under analysis. We previously reported on the effectiveness of our tool and algorithm by analyzing several glycan array datasets from the Consortium of Functional Glycomics (CFG). In this work, we report on our analysis of 1081 data sets which we collected from the CFG, the results of which we have made publicly and freely available as a database called MCAW-DB. We introduce this database, its usage and describe several analysis results. We show how MCAW-DB can be used to analyze glycan-binding patterns of GBPs amidst their ambiguity. For example, the visualization of glycan-binding patterns in MCAW-DB show how they correlate with the concentrations of the samples used in the array experiments. Using MCAW-DB, the patterns of glycans found to bind to various GBP-glycan binding proteins are visualized, indicating the binding "environment" of the glycans. Thus, the ambiguity of glycan recognition is numerically represented, along with the patterns of monosaccharides surrounding the binding region. The profiles in MCAW-DB could potentially be used as predictors of affinity of unknown or novel glycans to particular GBPs by comparing how well they match the existing profiles for those GBPs. Moreover, as the glycan profiles of diseased tissues become available, glycan alignments could also be used to identify glycan biomarkers unique to that tissue. Databases of these alignments may be of great use for drug discovery. Copyright © 2018 The Authors. Published by Elsevier Ltd.. All rights reserved.

  11. Linking microarray reporters with protein functions

    PubMed Central

    Gaj, Stan; van Erk, Arie; van Haaften, Rachel IM; Evelo, Chris TA

    2007-01-01

    Background The analysis of microarray experiments requires accurate and up-to-date functional annotation of the microarray reporters to optimize the interpretation of the biological processes involved. Pathway visualization tools are used to connect gene expression data with existing biological pathways by using specific database identifiers that link reporters with elements in the pathways. Results This paper proposes a novel method that aims to improve microarray reporter annotation by BLASTing the original reporter sequences against a species-specific EMBL subset, that was derived from and crosslinked back to the highly curated UniProt database. The resulting alignments were filtered using high quality alignment criteria and further compared with the outcome of a more traditional approach, where reporter sequences were BLASTed against EnsEMBL followed by locating the corresponding protein (UniProt) entry for the high quality hits. Combining the results of both methods resulted in successful annotation of > 58% of all reporter sequences with UniProt IDs on two commercial array platforms, increasing the amount of Incyte reporters that could be coupled to Gene Ontology terms from 32.7% to 58.3% and to a local GenMAPP pathway from 9.6% to 16.7%. For Agilent, 35.3% of the total reporters are now linked towards GO nodes and 7.1% on local pathways. Conclusion Our methods increased the annotation quality of microarray reporter sequences and allowed us to visualize more reporters using pathway visualization tools. Even in cases where the original reporter annotation showed the correct description the new identifiers often allowed improved pathway and Gene Ontology linking. These methods are freely available at http://www.bigcat.unimaas.nl/public/publications/Gaj_Annotation/. PMID:17897448

  12. The Pathway Tools software.

    PubMed

    Karp, Peter D; Paley, Suzanne; Romero, Pedro

    2002-01-01

    Bioinformatics requires reusable software tools for creating model-organism databases (MODs). The Pathway Tools is a reusable, production-quality software environment for creating a type of MOD called a Pathway/Genome Database (PGDB). A PGDB such as EcoCyc (see http://ecocyc.org) integrates our evolving understanding of the genes, proteins, metabolic network, and genetic network of an organism. This paper provides an overview of the four main components of the Pathway Tools: The PathoLogic component supports creation of new PGDBs from the annotated genome of an organism. The Pathway/Genome Navigator provides query, visualization, and Web-publishing services for PGDBs. The Pathway/Genome Editors support interactive updating of PGDBs. The Pathway Tools ontology defines the schema of PGDBs. The Pathway Tools makes use of the Ocelot object database system for data management services for PGDBs. The Pathway Tools has been used to build PGDBs for 13 organisms within SRI and by external users.

  13. COGcollator: a web server for analysis of distant relationships between homologous protein families.

    PubMed

    Dibrova, Daria V; Konovalov, Kirill A; Perekhvatov, Vadim V; Skulachev, Konstantin V; Mulkidjanian, Armen Y

    2017-11-29

    The Clusters of Orthologous Groups (COGs) of proteins systematize evolutionary related proteins into specific groups with similar functions. However, the available databases do not provide means to assess the extent of similarity between the COGs. We intended to provide a method for identification and visualization of evolutionary relationships between the COGs, as well as a respective web server. Here we introduce the COGcollator, a web tool for identification of evolutionarily related COGs and their further analysis. We demonstrate the utility of this tool by identifying the COGs that contain distant homologs of (i) the catalytic subunit of bacterial rotary membrane ATP synthases and (ii) the DNA/RNA helicases of the superfamily 1. This article was reviewed by Drs. Igor N. Berezovsky, Igor Zhulin and Yuri Wolf.

  14. Revealing the potential pathogenesis of glioma by utilizing a glioma associated protein-protein interaction network.

    PubMed

    Pan, Weiran; Li, Gang; Yang, Xiaoxiao; Miao, Jinming

    2015-04-01

    This study aims to explore the potential mechanism of glioma through bioinformatic approaches. The gene expression profile (GSE4290) of glioma tumor and non-tumor samples was downloaded from Gene Expression Omnibus database. A total of 180 samples were available, including 23 non-tumor and 157 tumor samples. Then the raw data were preprocessed using robust multiarray analysis, and 8,890 differentially expressed genes (DEGs) were identified by using t-test (false discovery rate < 0.0005). Furthermore, 16 known glioma related genes were abstracted from Genetic Association Database. After mapping 8,890 DEGs and 16 known glioma related genes to Human Protein Reference Database, a glioma associated protein-protein interaction network (GAPN) was constructed. In addition, 51 sub-networks in GAPN were screened out through Molecular Complex Detection (score ≥ 1), and sub-network 1 was found to have the closest interaction (score = 3). What' more, for the top 10 sub-networks, Gene Ontology (GO) enrichment analysis (p value < 0.05) was performed, and DEGs involved in sub-network 1 and 2, such as BRMS1L and CCNA1, were predicted to regulate cell growth, cell cycle, and DNA replication via interacting with known glioma related genes. Finally, the overlaps of DEGs and human essential, housekeeping, tissue-specific genes were calculated (p value = 1.0, 1.0, and 0.00014, respectively) and visualized by Venn Diagram package in R. About 61% of human tissue-specific genes were DEGs as well. This research shed new light on the pathogenesis of glioma based on DEGs and GAPN, and our findings might provide potential targets for clinical glioma treatment.

  15. Visualizing the semantic content of large text databases using text maps

    NASA Technical Reports Server (NTRS)

    Combs, Nathan

    1993-01-01

    A methodology for generating text map representations of the semantic content of text databases is presented. Text maps provide a graphical metaphor for conceptualizing and visualizing the contents and data interrelationships of large text databases. Described are a set of experiments conducted against the TIPSTER corpora of Wall Street Journal articles. These experiments provide an introduction to current work in the representation and visualization of documents by way of their semantic content.

  16. Cyclebase 3.0: a multi-organism database on cell-cycle regulation and phenotypes.

    PubMed

    Santos, Alberto; Wernersson, Rasmus; Jensen, Lars Juhl

    2015-01-01

    The eukaryotic cell division cycle is a highly regulated process that consists of a complex series of events and involves thousands of proteins. Researchers have studied the regulation of the cell cycle in several organisms, employing a wide range of high-throughput technologies, such as microarray-based mRNA expression profiling and quantitative proteomics. Due to its complexity, the cell cycle can also fail or otherwise change in many different ways if important genes are knocked out, which has been studied in several microscopy-based knockdown screens. The data from these many large-scale efforts are not easily accessed, analyzed and combined due to their inherent heterogeneity. To address this, we have created Cyclebase--available at http://www.cyclebase.org--an online database that allows users to easily visualize and download results from genome-wide cell-cycle-related experiments. In Cyclebase version 3.0, we have updated the content of the database to reflect changes to genome annotation, added new mRNA and protein expression data, and integrated cell-cycle phenotype information from high-content screens and model-organism databases. The new version of Cyclebase also features a new web interface, designed around an overview figure that summarizes all the cell-cycle-related data for a gene. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  17. FusionHub: A unified web platform for annotation and visualization of gene fusion events in human cancer.

    PubMed

    Panigrahi, Priyabrata; Jere, Abhay; Anamika, Krishanpal

    2018-01-01

    Gene fusion is a chromosomal rearrangement event which plays a significant role in cancer due to the oncogenic potential of the chimeric protein generated through fusions. At present many databases are available in public domain which provides detailed information about known gene fusion events and their functional role. Existing gene fusion detection tools, based on analysis of transcriptomics data usually report a large number of fusion genes as potential candidates, which could be either known or novel or false positives. Manual annotation of these putative genes is indeed time-consuming. We have developed a web platform FusionHub, which acts as integrated search engine interfacing various fusion gene databases and simplifies large scale annotation of fusion genes in a seamless way. In addition, FusionHub provides three ways of visualizing fusion events: circular view, domain architecture view and network view. Design of potential siRNA molecules through ensemble method is another utility integrated in FusionHub that could aid in siRNA-based targeted therapy. FusionHub is freely available at https://fusionhub.persistent.co.in.

  18. A graphical user interface for RAId, a knowledge integrated proteomics analysis suite with accurate statistics.

    PubMed

    Joyce, Brendan; Lee, Danny; Rubio, Alex; Ogurtsov, Aleksey; Alves, Gelio; Yu, Yi-Kuo

    2018-03-15

    RAId is a software package that has been actively developed for the past 10 years for computationally and visually analyzing MS/MS data. Founded on rigorous statistical methods, RAId's core program computes accurate E-values for peptides and proteins identified during database searches. Making this robust tool readily accessible for the proteomics community by developing a graphical user interface (GUI) is our main goal here. We have constructed a graphical user interface to facilitate the use of RAId on users' local machines. Written in Java, RAId_GUI not only makes easy executions of RAId but also provides tools for data/spectra visualization, MS-product analysis, molecular isotopic distribution analysis, and graphing the retrieval versus the proportion of false discoveries. The results viewer displays and allows the users to download the analyses results. Both the knowledge-integrated organismal databases and the code package (containing source code, the graphical user interface, and a user manual) are available for download at https://www.ncbi.nlm.nih.gov/CBBresearch/Yu/downloads/raid.html .

  19. POLYVIEW-MM: web-based platform for animation and analysis of molecular simulations

    PubMed Central

    Porollo, Aleksey; Meller, Jaroslaw

    2010-01-01

    Molecular simulations offer important mechanistic and functional clues in studies of proteins and other macromolecules. However, interpreting the results of such simulations increasingly requires tools that can combine information from multiple structural databases and other web resources, and provide highly integrated and versatile analysis tools. Here, we present a new web server that integrates high-quality animation of molecular motion (MM) with structural and functional analysis of macromolecules. The new tool, dubbed POLYVIEW-MM, enables animation of trajectories generated by molecular dynamics and related simulation techniques, as well as visualization of alternative conformers, e.g. obtained as a result of protein structure prediction methods or small molecule docking. To facilitate structural analysis, POLYVIEW-MM combines interactive view and analysis of conformational changes using Jmol and its tailored extensions, publication quality animation using PyMol, and customizable 2D summary plots that provide an overview of MM, e.g. in terms of changes in secondary structure states and relative solvent accessibility of individual residues in proteins. Furthermore, POLYVIEW-MM integrates visualization with various structural annotations, including automated mapping of known inter-action sites from structural homologs, mapping of cavities and ligand binding sites, transmembrane regions and protein domains. URL: http://polyview.cchmc.org/conform.html. PMID:20504857

  20. VMD DisRg: New User-Friendly Implement for calculation distance and radius of gyration in VMD program

    PubMed Central

    Falsafi-Zadeh, Sajad; Karimi, Zahra; Galehdari, Hamid

    2012-01-01

    Molecular dynamic simulation is a practical and powerful technique for analysis of protein structure. Several programs have been developed to facilitate the mentioned investigation, under them the visual molecular dynamic or VMD is the most frequently used programs. One of the beneficial properties of the VMD is its ability to be extendable by designing new plug-in. We introduce here a new facility of the VMD for distance analysis and radius of gyration of biopolymers such as protein and DNA. Availability The database is available for free at http://trc.ajums.ac.ir/HomePage.aspx/?TabID/=12618/&Site/=trc.ajums.ac/&Lang/=fa-IR PMID:22553393

  1. The Pfam protein families database.

    PubMed

    Punta, Marco; Coggill, Penny C; Eberhardt, Ruth Y; Mistry, Jaina; Tate, John; Boursnell, Chris; Pang, Ningze; Forslund, Kristoffer; Ceric, Goran; Clements, Jody; Heger, Andreas; Holm, Liisa; Sonnhammer, Erik L L; Eddy, Sean R; Bateman, Alex; Finn, Robert D

    2012-01-01

    Pfam is a widely used database of protein families, currently containing more than 13,000 manually curated protein families as of release 26.0. Pfam is available via servers in the UK (http://pfam.sanger.ac.uk/), the USA (http://pfam.janelia.org/) and Sweden (http://pfam.sbc.su.se/). Here, we report on changes that have occurred since our 2010 NAR paper (release 24.0). Over the last 2 years, we have generated 1840 new families and increased coverage of the UniProt Knowledgebase (UniProtKB) to nearly 80%. Notably, we have taken the step of opening up the annotation of our families to the Wikipedia community, by linking Pfam families to relevant Wikipedia pages and encouraging the Pfam and Wikipedia communities to improve and expand those pages. We continue to improve the Pfam website and add new visualizations, such as the 'sunburst' representation of taxonomic distribution of families. In this work we additionally address two topics that will be of particular interest to the Pfam community. First, we explain the definition and use of family-specific, manually curated gathering thresholds. Second, we discuss some of the features of domains of unknown function (also known as DUFs), which constitute a rapidly growing class of families within Pfam.

  2. The new protein topology graph library web server.

    PubMed

    Schäfer, Tim; Scheck, Andreas; Bruneß, Daniel; May, Patrick; Koch, Ina

    2016-02-01

    We present a new, extended version of the Protein Topology Graph Library web server. The Protein Topology Graph Library describes the protein topology on the super-secondary structure level. It allows to compute and visualize protein ligand graphs and search for protein structural motifs. The new server features additional information on ligand binding to secondary structure elements, increased usability and an application programming interface (API) to retrieve data, allowing for an automated analysis of protein topology. The Protein Topology Graph Library server is freely available on the web at http://ptgl.uni-frankfurt.de. The website is implemented in PHP, JavaScript, PostgreSQL and Apache. It is supported by all major browsers. The VPLG software that was used to compute the protein ligand graphs and all other data in the database is available under the GNU public license 2.0 from http://vplg.sourceforge.net. tim.schaefer@bioinformatik.uni-frankfurt.de; ina.koch@bioinformatik.uni-frankfurt.de Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  3. Profiling Oman education data using data visualization technique

    NASA Astrophysics Data System (ADS)

    Alalawi, Sultan Juma Sultan; Shaharanee, Izwan Nizal Mohd; Jamil, Jastini Mohd

    2016-10-01

    This research works presents an innovative data visualization technique to understand and visualize the information of Oman's education data generated from the Ministry of Education Oman "Educational Portal". The Ministry of Education in Sultanate of Oman have huge databases contains massive information. The volume of data in the database increase yearly as many students, teachers and employees enter into the database. The task for discovering and analyzing these vast volumes of data becomes increasingly difficult. Information visualization and data mining offer a better ways in dealing with large volume of information. In this paper, an innovative information visualization technique is developed to visualize the complex multidimensional educational data. Microsoft Excel Dashboard, Visual Basic Application (VBA) and Pivot Table are utilized to visualize the data. Findings from the summarization of the data are presented, and it is argued that information visualization can help related stakeholders to become aware of hidden and interesting information from large amount of data drowning in their educational portal.

  4. Time Series Proteome Profiling

    PubMed Central

    Formolo, Catherine A.; Mintz, Michelle; Takanohashi, Asako; Brown, Kristy J.; Vanderver, Adeline; Halligan, Brian; Hathout, Yetrib

    2014-01-01

    This chapter provides a detailed description of a method used to study temporal changes in the endoplasmic reticulum (ER) proteome of fibroblast cells exposed to ER stress agents (tunicamycin and thapsigargin). Differential stable isotope labeling by amino acids in cell culture (SILAC) is used in combination with crude ER fractionation, SDS–PAGE and LC-MS/MS to define altered protein expression in tunicamycin or thapsigargin treated cells versus untreated cells. Treated and untreated cells are harvested at different time points, mixed at a 1:1 ratio and processed for ER fractionation. Samples containing labeled and unlabeled proteins are separated by SDS–PAGE, bands are digested with trypsin and the resulting peptides analyzed by LC-MS/MS. Proteins are identified using Bioworks software and the Swiss-Prot data-base, whereas ratios of protein expression between treated and untreated cells are quantified using ZoomQuant software. Data visualization is facilitated by GeneSpring software. proteomics PMID:21082445

  5. ChloroKB: A Web Application for the Integration of Knowledge Related to Chloroplast Metabolic Network1[OPEN

    PubMed Central

    Gloaguen, Pauline; Alban, Claude; Ravanel, Stéphane; Seigneurin-Berny, Daphné; Matringe, Michel; Ferro, Myriam; Bruley, Christophe; Rolland, Norbert; Vandenbrouck, Yves

    2017-01-01

    Higher plants, as autotrophic organisms, are effective sources of molecules. They hold great promise for metabolic engineering, but the behavior of plant metabolism at the network level is still incompletely described. Although structural models (stoichiometry matrices) and pathway databases are extremely useful, they cannot describe the complexity of the metabolic context, and new tools are required to visually represent integrated biocurated knowledge for use by both humans and computers. Here, we describe ChloroKB, a Web application (http://chlorokb.fr/) for visual exploration and analysis of the Arabidopsis (Arabidopsis thaliana) metabolic network in the chloroplast and related cellular pathways. The network was manually reconstructed through extensive biocuration to provide transparent traceability of experimental data. Proteins and metabolites were placed in their biological context (spatial distribution within cells, connectivity in the network, participation in supramolecular complexes, and regulatory interactions) using CellDesigner software. The network contains 1,147 reviewed proteins (559 localized exclusively in plastids, 68 in at least one additional compartment, and 520 outside the plastid), 122 proteins awaiting biochemical/genetic characterization, and 228 proteins for which genes have not yet been identified. The visual presentation is intuitive and browsing is fluid, providing instant access to the graphical representation of integrated processes and to a wealth of refined qualitative and quantitative data. ChloroKB will be a significant support for structural and quantitative kinetic modeling, for biological reasoning, when comparing novel data with established knowledge, for computer analyses, and for educational purposes. ChloroKB will be enhanced by continuous updates following contributions from plant researchers. PMID:28442501

  6. Method for the reduction of image content redundancy in large image databases

    DOEpatents

    Tobin, Kenneth William; Karnowski, Thomas P.

    2010-03-02

    A method of increasing information content for content-based image retrieval (CBIR) systems includes the steps of providing a CBIR database, the database having an index for a plurality of stored digital images using a plurality of feature vectors, the feature vectors corresponding to distinct descriptive characteristics of the images. A visual similarity parameter value is calculated based on a degree of visual similarity between features vectors of an incoming image being considered for entry into the database and feature vectors associated with a most similar of the stored images. Based on said visual similarity parameter value it is determined whether to store or how long to store the feature vectors associated with the incoming image in the database.

  7. TRX-LOGOS - a graphical tool to demonstrate DNA information content dependent upon backbone dynamics in addition to base sequence.

    PubMed

    Fortin, Connor H; Schulze, Katharina V; Babbitt, Gregory A

    2015-01-01

    It is now widely-accepted that DNA sequences defining DNA-protein interactions functionally depend upon local biophysical features of DNA backbone that are important in defining sites of binding interaction in the genome (e.g. DNA shape, charge and intrinsic dynamics). However, these physical features of DNA polymer are not directly apparent when analyzing and viewing Shannon information content calculated at single nucleobases in a traditional sequence logo plot. Thus, sequence logos plots are severely limited in that they convey no explicit information regarding the structural dynamics of DNA backbone, a feature often critical to binding specificity. We present TRX-LOGOS, an R software package and Perl wrapper code that interfaces the JASPAR database for computational regulatory genomics. TRX-LOGOS extends the traditional sequence logo plot to include Shannon information content calculated with regard to the dinucleotide-based BI-BII conformation shifts in phosphate linkages on the DNA backbone, thereby adding a visual measure of intrinsic DNA flexibility that can be critical for many DNA-protein interactions. TRX-LOGOS is available as an R graphics module offered at both SourceForge and as a download supplement at this journal. To demonstrate the general utility of TRX logo plots, we first calculated the information content for 416 Saccharomyces cerevisiae transcription factor binding sites functionally confirmed in the Yeastract database and matched to previously published yeast genomic alignments. We discovered that flanking regions contain significantly elevated information content at phosphate linkages than can be observed at nucleobases. We also examined broader transcription factor classifications defined by the JASPAR database, and discovered that many general signatures of transcription factor binding are locally more information rich at the level of DNA backbone dynamics than nucleobase sequence. We used TRX-logos in combination with MEGA 6.0 software for molecular evolutionary genetics analysis to visually compare the human Forkhead box/FOX protein evolution to its binding site evolution. We also compared the DNA binding signatures of human TP53 tumor suppressor determined by two different laboratory methods (SELEX and ChIP-seq). Further analysis of the entire yeast genome, center aligned at the start codon, also revealed a distinct sequence-independent 3 bp periodic pattern in information content, present only in coding region, and perhaps indicative of the non-random organization of the genetic code. TRX-LOGOS is useful in any situation in which important information content in DNA can be better visualized at the positions of phosphate linkages (i.e. dinucleotides) where the dynamic properties of the DNA backbone functions to facilitate DNA-protein interaction.

  8. Windows on the brain: the emerging role of atlases and databases in neuroscience

    NASA Technical Reports Server (NTRS)

    Van Essen, David C.; VanEssen, D. C. (Principal Investigator)

    2002-01-01

    Brain atlases and associated databases have great potential as gateways for navigating, accessing, and visualizing a wide range of neuroscientific data. Recent progress towards realizing this potential includes the establishment of probabilistic atlases, surface-based atlases and associated databases, combined with improvements in visualization capabilities and internet access.

  9. BoreholeAR: A mobile tablet application for effective borehole database visualization using an augmented reality technology

    NASA Astrophysics Data System (ADS)

    Lee, Sangho; Suh, Jangwon; Park, Hyeong-Dong

    2015-03-01

    Boring logs are widely used in geological field studies since the data describes various attributes of underground and surface environments. However, it is difficult to manage multiple boring logs in the field as the conventional management and visualization methods are not suitable for integrating and combining large data sets. We developed an iPad application to enable its user to search the boring log rapidly and visualize them using the augmented reality (AR) technique. For the development of the application, a standard borehole database appropriate for a mobile-based borehole database management system was designed. The application consists of three modules: an AR module, a map module, and a database module. The AR module superimposes borehole data on camera imagery as viewed by the user and provides intuitive visualization of borehole locations. The map module shows the locations of corresponding borehole data on a 2D map with additional map layers. The database module provides data management functions for large borehole databases for other modules. Field survey was also carried out using more than 100,000 borehole data.

  10. MOPED 2.5—An Integrated Multi-Omics Resource: Multi-Omics Profiling Expression Database Now Includes Transcriptomics Data

    PubMed Central

    Montague, Elizabeth; Stanberry, Larissa; Higdon, Roger; Janko, Imre; Lee, Elaine; Anderson, Nathaniel; Choiniere, John; Stewart, Elizabeth; Yandl, Gregory; Broomall, William; Kolker, Natali

    2014-01-01

    Abstract Multi-omics data-driven scientific discovery crucially rests on high-throughput technologies and data sharing. Currently, data are scattered across single omics repositories, stored in varying raw and processed formats, and are often accompanied by limited or no metadata. The Multi-Omics Profiling Expression Database (MOPED, http://moped.proteinspire.org) version 2.5 is a freely accessible multi-omics expression database. Continual improvement and expansion of MOPED is driven by feedback from the Life Sciences Community. In order to meet the emergent need for an integrated multi-omics data resource, MOPED 2.5 now includes gene relative expression data in addition to protein absolute and relative expression data from over 250 large-scale experiments. To facilitate accurate integration of experiments and increase reproducibility, MOPED provides extensive metadata through the Data-Enabled Life Sciences Alliance (DELSA Global, http://delsaglobal.org) metadata checklist. MOPED 2.5 has greatly increased the number of proteomics absolute and relative expression records to over 500,000, in addition to adding more than four million transcriptomics relative expression records. MOPED has an intuitive user interface with tabs for querying different types of omics expression data and new tools for data visualization. Summary information including expression data, pathway mappings, and direct connection between proteins and genes can be viewed on Protein and Gene Details pages. These connections in MOPED provide a context for multi-omics expression data exploration. Researchers are encouraged to submit omics data which will be consistently processed into expression summaries. MOPED as a multi-omics data resource is a pivotal public database, interdisciplinary knowledge resource, and platform for multi-omics understanding. PMID:24910945

  11. Human Mitochondrial Protein Database

    National Institute of Standards and Technology Data Gateway

    SRD 131 Human Mitochondrial Protein Database (Web, free access)   The Human Mitochondrial Protein Database (HMPDb) provides comprehensive data on mitochondrial and human nuclear encoded proteins involved in mitochondrial biogenesis and function. This database consolidates information from SwissProt, LocusLink, Protein Data Bank (PDB), GenBank, Genome Database (GDB), Online Mendelian Inheritance in Man (OMIM), Human Mitochondrial Genome Database (mtDB), MITOMAP, Neuromuscular Disease Center and Human 2-D PAGE Databases. This database is intended as a tool not only to aid in studying the mitochondrion but in studying the associated diseases.

  12. The Protein Information Resource: an integrated public resource of functional annotation of proteins

    PubMed Central

    Wu, Cathy H.; Huang, Hongzhan; Arminski, Leslie; Castro-Alvear, Jorge; Chen, Yongxing; Hu, Zhang-Zhi; Ledley, Robert S.; Lewis, Kali C.; Mewes, Hans-Werner; Orcutt, Bruce C.; Suzek, Baris E.; Tsugita, Akira; Vinayaka, C. R.; Yeh, Lai-Su L.; Zhang, Jian; Barker, Winona C.

    2002-01-01

    The Protein Information Resource (PIR) serves as an integrated public resource of functional annotation of protein data to support genomic/proteomic research and scientific discovery. The PIR, in collaboration with the Munich Information Center for Protein Sequences (MIPS) and the Japan International Protein Information Database (JIPID), produces the PIR-International Protein Sequence Database (PSD), the major annotated protein sequence database in the public domain, containing about 250 000 proteins. To improve protein annotation and the coverage of experimentally validated data, a bibliography submission system is developed for scientists to submit, categorize and retrieve literature information. Comprehensive protein information is available from iProClass, which includes family classification at the superfamily, domain and motif levels, structural and functional features of proteins, as well as cross-references to over 40 biological databases. To provide timely and comprehensive protein data with source attribution, we have introduced a non-redundant reference protein database, PIR-NREF. The database consists of about 800 000 proteins collected from PIR-PSD, SWISS-PROT, TrEMBL, GenPept, RefSeq and PDB, with composite protein names and literature data. To promote database interoperability, we provide XML data distribution and open database schema, and adopt common ontologies. The PIR web site (http://pir.georgetown.edu/) features data mining and sequence analysis tools for information retrieval and functional identification of proteins based on both sequence and annotation information. The PIR databases and other files are also available by FTP (ftp://nbrfa.georgetown.edu/pir_databases). PMID:11752247

  13. A Brief Review of RNA–Protein Interaction Database Resources

    PubMed Central

    Yi, Ying; Zhao, Yue; Huang, Yan; Wang, Dong

    2017-01-01

    RNA–Protein interactions play critical roles in various biological processes. By collecting and analyzing the RNA–Protein interactions and binding sites from experiments and predictions, RNA–Protein interaction databases have become an essential resource for the exploration of the transcriptional and post-transcriptional regulatory network. Here, we briefly review several widely used RNA–Protein interaction database resources developed in recent years to provide a guide of these databases. The content and major functions in databases are presented. The brief description of database helps users to quickly choose the database containing information they interested. In short, these RNA–Protein interaction database resources are continually updated, but the current state shows the efforts to identify and analyze the large amount of RNA–Protein interactions. PMID:29657278

  14. The Human Ageing Genomic Resources: online databases and tools for biogerontologists

    PubMed Central

    de Magalhães, João Pedro; Budovsky, Arie; Lehmann, Gilad; Costa, Joana; Li, Yang; Fraifeld, Vadim; Church, George M.

    2009-01-01

    Summary Ageing is a complex, challenging phenomenon that will require multiple, interdisciplinary approaches to unravel its puzzles. To assist basic research on ageing, we developed the Human Ageing Genomic Resources (HAGR). This work provides an overview of the databases and tools in HAGR and describes how the gerontology research community can employ them. Several recent changes and improvements to HAGR are also presented. The two centrepieces in HAGR are GenAge and AnAge. GenAge is a gene database featuring genes associated with ageing and longevity in model organisms, a curated database of genes potentially associated with human ageing, and a list of genes tested for their association with human longevity. A myriad of biological data and information is included for hundreds of genes, making GenAge a reference for research that reflects our current understanding of the genetic basis of ageing. GenAge can also serve as a platform for the systems biology of ageing, and tools for the visualization of protein-protein interactions are also included. AnAge is a database of ageing in animals, featuring over 4,000 species, primarily assembled as a resource for comparative and evolutionary studies of ageing. Longevity records, developmental and reproductive traits, taxonomic information, basic metabolic characteristics, and key observations related to ageing are included in AnAge. Software is also available to aid researchers in the form of Perl modules to automate numerous tasks and as an SPSS script to analyse demographic mortality data. The Human Ageing Genomic Resources are available online at http://genomics.senescence.info. PMID:18986374

  15. NemaPath: online exploration of KEGG-based metabolic pathways for nematodes

    PubMed Central

    Wylie, Todd; Martin, John; Abubucker, Sahar; Yin, Yong; Messina, David; Wang, Zhengyuan; McCarter, James P; Mitreva, Makedonka

    2008-01-01

    Background Nematode.net is a web-accessible resource for investigating gene sequences from parasitic and free-living nematode genomes. Beyond the well-characterized model nematode C. elegans, over 500,000 expressed sequence tags (ESTs) and nearly 600,000 genome survey sequences (GSSs) have been generated from 36 nematode species as part of the Parasitic Nematode Genomics Program undertaken by the Genome Center at Washington University School of Medicine. However, these sequencing data are not present in most publicly available protein databases, which only include sequences in Swiss-Prot. Swiss-Prot, in turn, relies on GenBank/Embl/DDJP for predicted proteins from complete genomes or full-length proteins. Description Here we present the NemaPath pathway server, a web-based pathway-level visualization tool for navigating putative metabolic pathways for over 30 nematode species, including 27 parasites. The NemaPath approach consists of two parts: 1) a backend tool to align and evaluate nematode genomic sequences (curated EST contigs) against the annotated Kyoto Encyclopedia of Genes and Genomes (KEGG) protein database; 2) a web viewing application that displays annotated KEGG pathway maps based on desired confidence levels of primary sequence similarity as defined by a user. NemaPath also provides cross-referenced access to nematode genome information provided by other tools available on Nematode.net, including: detailed NemaGene EST cluster information; putative translations; GBrowse EST cluster views; links from nematode data to external databases for corresponding synonymous C. elegans counterparts, subject matches in KEGG's gene database, and also KEGG Ontology (KO) identification. Conclusion The NemaPath server hosts metabolic pathway mappings for 30 nematode species and is available on the World Wide Web at . The nematode source sequences used for the metabolic pathway mappings are available via FTP , as provided by the Genome Center at Washington University School of Medicine. PMID:18983679

  16. BloodChIP: a database of comparative genome-wide transcription factor binding profiles in human blood cells.

    PubMed

    Chacon, Diego; Beck, Dominik; Perera, Dilmi; Wong, Jason W H; Pimanda, John E

    2014-01-01

    The BloodChIP database (http://www.med.unsw.edu.au/CRCWeb.nsf/page/BloodChIP) supports exploration and visualization of combinatorial transcription factor (TF) binding at a particular locus in human CD34-positive and other normal and leukaemic cells or retrieval of target gene sets for user-defined combinations of TFs across one or more cell types. Increasing numbers of genome-wide TF binding profiles are being added to public repositories, and this trend is likely to continue. For the power of these data sets to be fully harnessed by experimental scientists, there is a need for these data to be placed in context and easily accessible for downstream applications. To this end, we have built a user-friendly database that has at its core the genome-wide binding profiles of seven key haematopoietic TFs in human stem/progenitor cells. These binding profiles are compared with binding profiles in normal differentiated and leukaemic cells. We have integrated these TF binding profiles with chromatin marks and expression data in normal and leukaemic cell fractions. All queries can be exported into external sites to construct TF-gene and protein-protein networks and to evaluate the association of genes with cellular processes and tissue expression.

  17. Domain Hierarchy and closed Loops (DHcL): a server for exploring hierarchy of protein domain structure

    PubMed Central

    Koczyk, Grzegorz; Berezovsky, Igor N.

    2008-01-01

    Domain hierarchy and closed loops (DHcL) (http://sitron.bccs.uib.no/dhcl/) is a web server that delineates energy hierarchy of protein domain structure and detects domains at different levels of this hierarchy. The server also identifies closed loops and van der Waals locks, which constitute a structural basis for the protein domain hierarchy. The DHcL can be a useful tool for an express analysis of protein structures and their alternative domain decompositions. The user submits a PDB identifier(s) or uploads a 3D protein structure in a PDB format. The results of the analysis are the location of domains at different levels of hierarchy, closed loops, van der Waals locks and their interactive visualization. The server maintains a regularly updated database of domains, closed loop and van der Waals locks for all X-ray structures in PDB. DHcL server is available at: http://sitron.bccs.uib.no/dhcl. PMID:18502776

  18. Causal biological network database: a comprehensive platform of causal biological network models focused on the pulmonary and vascular systems.

    PubMed

    Boué, Stéphanie; Talikka, Marja; Westra, Jurjen Willem; Hayes, William; Di Fabio, Anselmo; Park, Jennifer; Schlage, Walter K; Sewer, Alain; Fields, Brett; Ansari, Sam; Martin, Florian; Veljkovic, Emilija; Kenney, Renee; Peitsch, Manuel C; Hoeng, Julia

    2015-01-01

    With the wealth of publications and data available, powerful and transparent computational approaches are required to represent measured data and scientific knowledge in a computable and searchable format. We developed a set of biological network models, scripted in the Biological Expression Language, that reflect causal signaling pathways across a wide range of biological processes, including cell fate, cell stress, cell proliferation, inflammation, tissue repair and angiogenesis in the pulmonary and cardiovascular context. This comprehensive collection of networks is now freely available to the scientific community in a centralized web-based repository, the Causal Biological Network database, which is composed of over 120 manually curated and well annotated biological network models and can be accessed at http://causalbionet.com. The website accesses a MongoDB, which stores all versions of the networks as JSON objects and allows users to search for genes, proteins, biological processes, small molecules and keywords in the network descriptions to retrieve biological networks of interest. The content of the networks can be visualized and browsed. Nodes and edges can be filtered and all supporting evidence for the edges can be browsed and is linked to the original articles in PubMed. Moreover, networks may be downloaded for further visualization and evaluation. Database URL: http://causalbionet.com © The Author(s) 2015. Published by Oxford University Press.

  19. Expression Atlas: gene and protein expression across multiple studies and organisms

    PubMed Central

    Tang, Y Amy; Bazant, Wojciech; Burke, Melissa; Fuentes, Alfonso Muñoz-Pomer; George, Nancy; Koskinen, Satu; Mohammed, Suhaib; Geniza, Matthew; Preece, Justin; Jarnuczak, Andrew F; Huber, Wolfgang; Stegle, Oliver; Brazma, Alvis; Petryszak, Robert

    2018-01-01

    Abstract Expression Atlas (http://www.ebi.ac.uk/gxa) is an added value database that provides information about gene and protein expression in different species and contexts, such as tissue, developmental stage, disease or cell type. The available public and controlled access data sets from different sources are curated and re-analysed using standardized, open source pipelines and made available for queries, download and visualization. As of August 2017, Expression Atlas holds data from 3,126 studies across 33 different species, including 731 from plants. Data from large-scale RNA sequencing studies including Blueprint, PCAWG, ENCODE, GTEx and HipSci can be visualized next to each other. In Expression Atlas, users can query genes or gene-sets of interest and explore their expression across or within species, tissues, developmental stages in a constitutive or differential context, representing the effects of diseases, conditions or experimental interventions. All processed data matrices are available for direct download in tab-delimited format or as R-data. In addition to the web interface, data sets can now be searched and downloaded through the Expression Atlas R package. Novel features and visualizations include the on-the-fly analysis of gene set overlaps and the option to view gene co-expression in experiments investigating constitutive gene expression across tissues or other conditions. PMID:29165655

  20. Protein Information Resource: a community resource for expert annotation of protein data

    PubMed Central

    Barker, Winona C.; Garavelli, John S.; Hou, Zhenglin; Huang, Hongzhan; Ledley, Robert S.; McGarvey, Peter B.; Mewes, Hans-Werner; Orcutt, Bruce C.; Pfeiffer, Friedhelm; Tsugita, Akira; Vinayaka, C. R.; Xiao, Chunlin; Yeh, Lai-Su L.; Wu, Cathy

    2001-01-01

    The Protein Information Resource, in collaboration with the Munich Information Center for Protein Sequences (MIPS) and the Japan International Protein Information Database (JIPID), produces the most comprehensive and expertly annotated protein sequence database in the public domain, the PIR-International Protein Sequence Database. To provide timely and high quality annotation and promote database interoperability, the PIR-International employs rule-based and classification-driven procedures based on controlled vocabulary and standard nomenclature and includes status tags to distinguish experimentally determined from predicted protein features. The database contains about 200 000 non-redundant protein sequences, which are classified into families and superfamilies and their domains and motifs identified. Entries are extensively cross-referenced to other sequence, classification, genome, structure and activity databases. The PIR web site features search engines that use sequence similarity and database annotation to facilitate the analysis and functional identification of proteins. The PIR-Inter­national databases and search tools are accessible on the PIR web site at http://pir.georgetown.edu/ and at the MIPS web site at http://www.mips.biochem.mpg.de. The PIR-International Protein Sequence Database and other files are also available by FTP. PMID:11125041

  1. Analysis of gene expression profile microarray data in complex regional pain syndrome.

    PubMed

    Tan, Wulin; Song, Yiyan; Mo, Chengqiang; Jiang, Shuangjian; Wang, Zhongxing

    2017-09-01

    The aim of the present study was to predict key genes and proteins associated with complex regional pain syndrome (CRPS) using bioinformatics analysis. The gene expression profiling microarray data, GSE47603, which included peripheral blood samples from 4 patients with CRPS and 5 healthy controls, was obtained from the Gene Expression Omnibus (GEO) database. The differentially expressed genes (DEGs) in CRPS patients compared with healthy controls were identified using the GEO2R online tool. Functional enrichment analysis was then performed using The Database for Annotation Visualization and Integrated Discovery online tool. Protein‑protein interaction (PPI) network analysis was subsequently performed using Search Tool for the Retrieval of Interaction Genes database and analyzed with Cytoscape software. A total of 257 DEGs were identified, including 243 upregulated genes and 14 downregulated ones. Genes in the human leukocyte antigen (HLA) family were most significantly differentially expressed. Enrichment analysis demonstrated that signaling pathways, including immune response, cell motion, adhesion and angiogenesis were associated with CRPS. PPI network analysis revealed that key genes, including early region 1A binding protein p300 (EP300), CREB‑binding protein (CREBBP), signal transducer and activator of transcription (STAT)3, STAT5A and integrin α M were associated with CRPS. The results suggest that the immune response may therefore serve an important role in CRPS development. In addition, genes in the HLA family, such as HLA‑DQB1 and HLA‑DRB1, may present potential biomarkers for the diagnosis of CRPS. Furthermore, EP300, its paralog CREBBP, and the STAT family genes, STAT3 and STAT5 may be important in the development of CRPS.

  2. MMDB: Entrez’s 3D-structure database

    PubMed Central

    Wang, Yanli; Anderson, John B.; Chen, Jie; Geer, Lewis Y.; He, Siqian; Hurwitz, David I.; Liebert, Cynthia A.; Madej, Thomas; Marchler, Gabriele H.; Marchler-Bauer, Aron; Panchenko, Anna R.; Shoemaker, Benjamin A.; Song, James S.; Thiessen, Paul A.; Yamashita, Roxanne A.; Bryant, Stephen H.

    2002-01-01

    Three-dimensional structures are now known within many protein families and it is quite likely, in searching a sequence database, that one will encounter a homolog with known structure. The goal of Entrez’s 3D-structure database is to make this information, and the functional annotation it can provide, easily accessible to molecular biologists. To this end Entrez’s search engine provides three powerful features. (i) Sequence and structure neighbors; one may select all sequences similar to one of interest, for example, and link to any known 3D structures. (ii) Links between databases; one may search by term matching in MEDLINE, for example, and link to 3D structures reported in these articles. (iii) Sequence and structure visualization; identifying a homolog with known structure, one may view molecular-graphic and alignment displays, to infer approximate 3D structure. In this article we focus on two features of Entrez’s Molecular Modeling Database (MMDB) not described previously: links from individual biopolymer chains within 3D structures to a systematic taxonomy of organisms represented in molecular databases, and links from individual chains (and compact 3D domains within them) to structure neighbors, other chains (and 3D domains) with similar 3D structure. MMDB may be accessed at http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Structure. PMID:11752307

  3. Computer systems and methods for the query and visualization of multidimensional databases

    DOEpatents

    Stolte, Chris; Tang, Diane L.; Hanrahan, Patrick

    2006-08-08

    A method and system for producing graphics. A hierarchical structure of a database is determined. A visual table, comprising a plurality of panes, is constructed by providing a specification that is in a language based on the hierarchical structure of the database. In some cases, this language can include fields that are in the database schema. The database is queried to retrieve a set of tuples in accordance with the specification. A subset of the set of tuples is associated with a pane in the plurality of panes.

  4. Computer systems and methods for the query and visualization of multidimensional database

    DOEpatents

    Stolte, Chris; Tang, Diane L.; Hanrahan, Patrick

    2010-05-11

    A method and system for producing graphics. A hierarchical structure of a database is determined. A visual table, comprising a plurality of panes, is constructed by providing a specification that is in a language based on the hierarchical structure of the database. In some cases, this language can include fields that are in the database schema. The database is queried to retrieve a set of tuples in accordance with the specification. A subset of the set of tuples is associated with a pane in the plurality of panes.

  5. MaGnET: Malaria Genome Exploration Tool.

    PubMed

    Sharman, Joanna L; Gerloff, Dietlind L

    2013-09-15

    The Malaria Genome Exploration Tool (MaGnET) is a software tool enabling intuitive 'exploration-style' visualization of functional genomics data relating to the malaria parasite, Plasmodium falciparum. MaGnET provides innovative integrated graphic displays for different datasets, including genomic location of genes, mRNA expression data, protein-protein interactions and more. Any selection of genes to explore made by the user is easily carried over between the different viewers for different datasets, and can be changed interactively at any point (without returning to a search). Free online use (Java Web Start) or download (Java application archive and MySQL database; requires local MySQL installation) at http://malariagenomeexplorer.org joanna.sharman@ed.ac.uk or dgerloff@ffame.org Supplementary data are available at Bioinformatics online.

  6. COGNAT: a web server for comparative analysis of genomic neighborhoods.

    PubMed

    Klimchuk, Olesya I; Konovalov, Kirill A; Perekhvatov, Vadim V; Skulachev, Konstantin V; Dibrova, Daria V; Mulkidjanian, Armen Y

    2017-11-22

    In prokaryotic genomes, functionally coupled genes can be organized in conserved gene clusters enabling their coordinated regulation. Such clusters could contain one or several operons, which are groups of co-transcribed genes. Those genes that evolved from a common ancestral gene by speciation (i.e. orthologs) are expected to have similar genomic neighborhoods in different organisms, whereas those copies of the gene that are responsible for dissimilar functions (i.e. paralogs) could be found in dissimilar genomic contexts. Comparative analysis of genomic neighborhoods facilitates the prediction of co-regulated genes and helps to discern different functions in large protein families. We intended, building on the attribution of gene sequences to the clusters of orthologous groups of proteins (COGs), to provide a method for visualization and comparative analysis of genomic neighborhoods of evolutionary related genes, as well as a respective web server. Here we introduce the COmparative Gene Neighborhoods Analysis Tool (COGNAT), a web server for comparative analysis of genomic neighborhoods. The tool is based on the COG database, as well as the Pfam protein families database. As an example, we show the utility of COGNAT in identifying a new type of membrane protein complex that is formed by paralog(s) of one of the membrane subunits of the NADH:quinone oxidoreductase of type 1 (COG1009) and a cytoplasmic protein of unknown function (COG3002). This article was reviewed by Drs. Igor Zhulin, Uri Gophna and Igor Rogozin.

  7. Comparative analysis of amino acid composition in the active site of nirk gene encoding copper-containing nitrite reductase (CuNiR) in bacterial spp.

    PubMed

    Adhikari, Utpal Kumar; Rahman, M Mizanur

    2017-04-01

    The nirk gene encoding the copper-containing nitrite reductase (CuNiR), a key catalytic enzyme in the environmental denitrification process that helps to produce nitric oxide from nitrite. The molecular mechanism of denitrification process is definitely complex and in this case a theoretical investigation has been conducted to know the sequence information and amino acid composition of the active site of CuNiR enzyme using various Bioinformatics tools. 10 Fasta formatted sequences were retrieved from the NCBI database and the domain and disordered regions identification and phylogenetic analyses were done on these sequences. The comparative modeling of protein was performed through Modeller 9v14 program and visualized by PyMOL tools. Validated protein models were deposited in the Protein Model Database (PMDB) (PMDB id: PM0080150 to PM0080159). Active sites of nirk encoding CuNiR enzyme were identified by Castp server. The PROCHECK showed significant scores for four protein models in the most favored regions of the Ramachandran plot. Active sites and cavities prediction exhibited that the amino acid, namely Glycine, Alanine, Histidine, Aspartic acid, Glutamic acid, Threonine, and Glutamine were common in four predicted protein models. The present in silico study anticipates that active site analyses result will pave the way for further research on the complex denitrification mechanism of the selected species in the experimental laboratory. Copyright © 2016. Published by Elsevier Ltd.

  8. MassSieve: Panning MS/MS peptide data for proteins

    PubMed Central

    Slotta, Douglas J.; McFarland, Melinda A.; Markey, Sanford P.

    2010-01-01

    We present MassSieve, a Java-based platform for visualization and parsimony analysis of single and comparative LC-MS/MS database search engine results. The success of mass spectrometric peptide sequence assignment algorithms has led to the need for a tool to merge and evaluate the increasing data set sizes that result from LC-MS/MS-based shotgun proteomic experiments. MassSieve supports reports from multiple search engines with differing search characteristics, which can increase peptide sequence coverage and/or identify conflicting or ambiguous spectral assignments. PMID:20564260

  9. ProBiS-database: precalculated binding site similarities and local pairwise alignments of PDB structures.

    PubMed

    Konc, Janez; Cesnik, Tomo; Konc, Joanna Trykowska; Penca, Matej; Janežič, Dušanka

    2012-02-27

    ProBiS-Database is a searchable repository of precalculated local structural alignments in proteins detected by the ProBiS algorithm in the Protein Data Bank. Identification of functionally important binding regions of the protein is facilitated by structural similarity scores mapped to the query protein structure. PDB structures that have been aligned with a query protein may be rapidly retrieved from the ProBiS-Database, which is thus able to generate hypotheses concerning the roles of uncharacterized proteins. Presented with uncharacterized protein structure, ProBiS-Database can discern relationships between such a query protein and other better known proteins in the PDB. Fast access and a user-friendly graphical interface promote easy exploration of this database of over 420 million local structural alignments. The ProBiS-Database is updated weekly and is freely available online at http://probis.cmm.ki.si/database.

  10. The Papillomavirus Episteme: a major update to the papillomavirus sequence database.

    PubMed

    Van Doorslaer, Koenraad; Li, Zhiwen; Xirasagar, Sandhya; Maes, Piet; Kaminsky, David; Liou, David; Sun, Qiang; Kaur, Ramandeep; Huyen, Yentram; McBride, Alison A

    2017-01-04

    The Papillomavirus Episteme (PaVE) is a database of curated papillomavirus genomic sequences, accompanied by web-based sequence analysis tools. This update describes the addition of major new features. The papillomavirus genomes within PaVE have been further annotated, and now includes the major spliced mRNA transcripts. Viral genes and transcripts can be visualized on both linear and circular genome browsers. Evolutionary relationships among PaVE reference protein sequences can be analysed using multiple sequence alignments and phylogenetic trees. To assist in viral discovery, PaVE offers a typing tool; a simplified algorithm to determine whether a newly sequenced virus is novel. PaVE also now contains an image library containing gross clinical and histopathological images of papillomavirus infected lesions. Database URL: https://pave.niaid.nih.gov/. Published by Oxford University Press on behalf of Nucleic Acids Research 2016. This work is written by (a) US Government employee(s) and is in the public domain in the US.

  11. Profiling of Histone Post-Translational Modifications in Mouse Brain with High-Resolution Top-Down Mass Spectrometry

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhou, Mowei; Paša-Tolić, Ljiljana; Stenoien, David L.

    Histones play central roles in most chromosomal functions and both their basic biology and roles in disease have been the subject of intense study. Since multiple PTMs along the entire protein sequence are potential regulators of histones, a top-down approach, where intact proteins are analyzed, is ultimately required for complete characterization of proteoforms. However, significant challenges remain for top-down histone analysis primarily because of deficiencies in separation/resolving power and effective identification algorithms. Here, we used state of the art mass spectrometry and a bioinformatics workflow for targeted data analysis and visualization. The workflow uses ProMex for intact mass deconvolution, MSPathFindermore » as search engine, and LcMsSpectator as a data visualization tool. ProMex sums across retention time to maximize sensitivity and accuracy for low abundance species in MS1deconvolution. MSPathFinder searches the MS2 data against protein sequence databases with user-defined modifications. LcMsSpectator presents the results from ProMex and MSPathFinder in a format that allows quick manual evaluation of critical attributes for high-confidence identifications. When complemented with the open-modification tool TopPIC, this workflow enabled identification of novel histone PTMs including tyrosine bromination on histone H4 and H2A, H3 glutathionylation, and mapping of conventional PTMs along the entire protein for many histone subunits.« less

  12. ChloroKB: A Web Application for the Integration of Knowledge Related to Chloroplast Metabolic Network.

    PubMed

    Gloaguen, Pauline; Bournais, Sylvain; Alban, Claude; Ravanel, Stéphane; Seigneurin-Berny, Daphné; Matringe, Michel; Tardif, Marianne; Kuntz, Marcel; Ferro, Myriam; Bruley, Christophe; Rolland, Norbert; Vandenbrouck, Yves; Curien, Gilles

    2017-06-01

    Higher plants, as autotrophic organisms, are effective sources of molecules. They hold great promise for metabolic engineering, but the behavior of plant metabolism at the network level is still incompletely described. Although structural models (stoichiometry matrices) and pathway databases are extremely useful, they cannot describe the complexity of the metabolic context, and new tools are required to visually represent integrated biocurated knowledge for use by both humans and computers. Here, we describe ChloroKB, a Web application (http://chlorokb.fr/) for visual exploration and analysis of the Arabidopsis ( Arabidopsis thaliana ) metabolic network in the chloroplast and related cellular pathways. The network was manually reconstructed through extensive biocuration to provide transparent traceability of experimental data. Proteins and metabolites were placed in their biological context (spatial distribution within cells, connectivity in the network, participation in supramolecular complexes, and regulatory interactions) using CellDesigner software. The network contains 1,147 reviewed proteins (559 localized exclusively in plastids, 68 in at least one additional compartment, and 520 outside the plastid), 122 proteins awaiting biochemical/genetic characterization, and 228 proteins for which genes have not yet been identified. The visual presentation is intuitive and browsing is fluid, providing instant access to the graphical representation of integrated processes and to a wealth of refined qualitative and quantitative data. ChloroKB will be a significant support for structural and quantitative kinetic modeling, for biological reasoning, when comparing novel data with established knowledge, for computer analyses, and for educational purposes. ChloroKB will be enhanced by continuous updates following contributions from plant researchers. © 2017 American Society of Plant Biologists. All Rights Reserved.

  13. E-MSD: an integrated data resource for bioinformatics.

    PubMed

    Golovin, A; Oldfield, T J; Tate, J G; Velankar, S; Barton, G J; Boutselakis, H; Dimitropoulos, D; Fillon, J; Hussain, A; Ionides, J M C; John, M; Keller, P A; Krissinel, E; McNeil, P; Naim, A; Newman, R; Pajon, A; Pineda, J; Rachedi, A; Copeland, J; Sitnov, A; Sobhany, S; Suarez-Uruena, A; Swaminathan, G J; Tagari, M; Tromm, S; Vranken, W; Henrick, K

    2004-01-01

    The Macromolecular Structure Database (MSD) group (http://www.ebi.ac.uk/msd/) continues to enhance the quality and consistency of macromolecular structure data in the Protein Data Bank (PDB) and to work towards the integration of various bioinformatics data resources. We have implemented a simple form-based interface that allows users to query the MSD directly. The MSD 'atlas pages' show all of the information in the MSD for a particular PDB entry. The group has designed new search interfaces aimed at specific areas of interest, such as the environment of ligands and the secondary structures of proteins. We have also implemented a novel search interface that begins to integrate separate MSD search services in a single graphical tool. We have worked closely with collaborators to build a new visualization tool that can present both structure and sequence data in a unified interface, and this data viewer is now used throughout the MSD services for the visualization and presentation of search results. Examples showcasing the functionality and power of these tools are available from tutorial webpages (http://www. ebi.ac.uk/msd-srv/docs/roadshow_tutorial/).

  14. E-MSD: an integrated data resource for bioinformatics

    PubMed Central

    Golovin, A.; Oldfield, T. J.; Tate, J. G.; Velankar, S.; Barton, G. J.; Boutselakis, H.; Dimitropoulos, D.; Fillon, J.; Hussain, A.; Ionides, J. M. C.; John, M.; Keller, P. A.; Krissinel, E.; McNeil, P.; Naim, A.; Newman, R.; Pajon, A.; Pineda, J.; Rachedi, A.; Copeland, J.; Sitnov, A.; Sobhany, S.; Suarez-Uruena, A.; Swaminathan, G. J.; Tagari, M.; Tromm, S.; Vranken, W.; Henrick, K.

    2004-01-01

    The Macromolecular Structure Database (MSD) group (http://www.ebi.ac.uk/msd/) continues to enhance the quality and consistency of macromolecular structure data in the Protein Data Bank (PDB) and to work towards the integration of various bioinformatics data resources. We have implemented a simple form-based interface that allows users to query the MSD directly. The MSD ‘atlas pages’ show all of the information in the MSD for a particular PDB entry. The group has designed new search interfaces aimed at specific areas of interest, such as the environment of ligands and the secondary structures of proteins. We have also implemented a novel search interface that begins to integrate separate MSD search services in a single graphical tool. We have worked closely with collaborators to build a new visualization tool that can present both structure and sequence data in a unified interface, and this data viewer is now used throughout the MSD services for the visualization and presentation of search results. Examples showcasing the functionality and power of these tools are available from tutorial webpages (http://www.ebi.ac.uk/msd-srv/docs/roadshow_tutorial/). PMID:14681397

  15. Intact mass detection, interpretation, and visualization to automate Top-Down proteomics on a large scale

    PubMed Central

    Durbin, Kenneth R.; Tran, John C.; Zamdborg, Leonid; Sweet, Steve M. M.; Catherman, Adam D.; Lee, Ji Eun; Li, Mingxi; Kellie, John F.; Kelleher, Neil L.

    2011-01-01

    Applying high-throughput Top-Down MS to an entire proteome requires a yet-to-be-established model for data processing. Since Top-Down is becoming possible on a large scale, we report our latest software pipeline dedicated to capturing the full value of intact protein data in automated fashion. For intact mass detection, we combine algorithms for processing MS1 data from both isotopically resolved (FT) and charge-state resolved (ion trap) LC-MS data, which are then linked to their fragment ions for database searching using ProSight. Automated determination of human keratin and tubulin isoforms is one result. Optimized for the intricacies of whole proteins, new software modules visualize proteome-scale data based on the LC retention time and intensity of intact masses and enable selective detection of PTMs to automatically screen for acetylation, phosphorylation, and methylation. Software functionality was demonstrated using comparative LC-MS data from yeast strains in addition to human cells undergoing chemical stress. We further these advances as a key aspect of realizing Top-Down MS on a proteomic scale. PMID:20848673

  16. CCProf: exploring conformational change profile of proteins

    PubMed Central

    Chang, Che-Wei; Chou, Chai-Wei; Chang, Darby Tien-Hao

    2016-01-01

    In many biological processes, proteins have important interactions with various molecules such as proteins, ions or ligands. Many proteins undergo conformational changes upon these interactions, where regions with large conformational changes are critical to the interactions. This work presents the CCProf platform, which provides conformational changes of entire proteins, named conformational change profile (CCP) in the context. CCProf aims to be a platform where users can study potential causes of novel conformational changes. It provides 10 biological features, including conformational change, potential binding target site, secondary structure, conservation, disorder propensity, hydropathy propensity, sequence domain, structural domain, phosphorylation site and catalytic site. All these information are integrated into a well-aligned view, so that researchers can capture important relevance between different biological features visually. The CCProf contains 986 187 protein structure pairs for 3123 proteins. In addition, CCProf provides a 3D view in which users can see the protein structures before and after conformational changes as well as binding targets that induce conformational changes. All information (e.g. CCP, binding targets and protein structures) shown in CCProf, including intermediate data are available for download to expedite further analyses. Database URL: http://zoro.ee.ncku.edu.tw/ccprof/ PMID:27016699

  17. Biomine: predicting links between biological entities using network models of heterogeneous databases.

    PubMed

    Eronen, Lauri; Toivonen, Hannu

    2012-06-06

    Biological databases contain large amounts of data concerning the functions and associations of genes and proteins. Integration of data from several such databases into a single repository can aid the discovery of previously unknown connections spanning multiple types of relationships and databases. Biomine is a system that integrates cross-references from several biological databases into a graph model with multiple types of edges, such as protein interactions, gene-disease associations and gene ontology annotations. Edges are weighted based on their type, reliability, and informativeness. We present Biomine and evaluate its performance in link prediction, where the goal is to predict pairs of nodes that will be connected in the future, based on current data. In particular, we formulate protein interaction prediction and disease gene prioritization tasks as instances of link prediction. The predictions are based on a proximity measure computed on the integrated graph. We consider and experiment with several such measures, and perform a parameter optimization procedure where different edge types are weighted to optimize link prediction accuracy. We also propose a novel method for disease-gene prioritization, defined as finding a subset of candidate genes that cluster together in the graph. We experimentally evaluate Biomine by predicting future annotations in the source databases and prioritizing lists of putative disease genes. The experimental results show that Biomine has strong potential for predicting links when a set of selected candidate links is available. The predictions obtained using the entire Biomine dataset are shown to clearly outperform ones obtained using any single source of data alone, when different types of links are suitably weighted. In the gene prioritization task, an established reference set of disease-associated genes is useful, but the results show that under favorable conditions, Biomine can also perform well when no such information is available.The Biomine system is a proof of concept. Its current version contains 1.1 million entities and 8.1 million relations between them, with focus on human genetics. Some of its functionalities are available in a public query interface at http://biomine.cs.helsinki.fi, allowing searching for and visualizing connections between given biological entities.

  18. Influenza Research Database: An integrated bioinformatics resource for influenza virus research.

    PubMed

    Zhang, Yun; Aevermann, Brian D; Anderson, Tavis K; Burke, David F; Dauphin, Gwenaelle; Gu, Zhiping; He, Sherry; Kumar, Sanjeev; Larsen, Christopher N; Lee, Alexandra J; Li, Xiaomei; Macken, Catherine; Mahaffey, Colin; Pickett, Brett E; Reardon, Brian; Smith, Thomas; Stewart, Lucy; Suloway, Christian; Sun, Guangyu; Tong, Lei; Vincent, Amy L; Walters, Bryan; Zaremba, Sam; Zhao, Hongtao; Zhou, Liwei; Zmasek, Christian; Klem, Edward B; Scheuermann, Richard H

    2017-01-04

    The Influenza Research Database (IRD) is a U.S. National Institute of Allergy and Infectious Diseases (NIAID)-sponsored Bioinformatics Resource Center dedicated to providing bioinformatics support for influenza virus research. IRD facilitates the research and development of vaccines, diagnostics and therapeutics against influenza virus by providing a comprehensive collection of influenza-related data integrated from various sources, a growing suite of analysis and visualization tools for data mining and hypothesis generation, personal workbench spaces for data storage and sharing, and active user community support. Here, we describe the recent improvements in IRD including the use of cloud and high performance computing resources, analysis and visualization of user-provided sequence data with associated metadata, predictions of novel variant proteins, annotations of phenotype-associated sequence markers and their predicted phenotypic effects, hemagglutinin (HA) clade classifications, an automated tool for HA subtype numbering conversion, linkouts to disease event data and the addition of host factor and antiviral drug components. All data and tools are freely available without restriction from the IRD website at https://www.fludb.org. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  19. Choosing an Optimal Database for Protein Identification from Tandem Mass Spectrometry Data.

    PubMed

    Kumar, Dhirendra; Yadav, Amit Kumar; Dash, Debasis

    2017-01-01

    Database searching is the preferred method for protein identification from digital spectra of mass to charge ratios (m/z) detected for protein samples through mass spectrometers. The search database is one of the major influencing factors in discovering proteins present in the sample and thus in deriving biological conclusions. In most cases the choice of search database is arbitrary. Here we describe common search databases used in proteomic studies and their impact on final list of identified proteins. We also elaborate upon factors like composition and size of the search database that can influence the protein identification process. In conclusion, we suggest that choice of the database depends on the type of inferences to be derived from proteomics data. However, making additional efforts to build a compact and concise database for a targeted question should generally be rewarding in achieving confident protein identifications.

  20. The Pfam protein families database

    PubMed Central

    Punta, Marco; Coggill, Penny C.; Eberhardt, Ruth Y.; Mistry, Jaina; Tate, John; Boursnell, Chris; Pang, Ningze; Forslund, Kristoffer; Ceric, Goran; Clements, Jody; Heger, Andreas; Holm, Liisa; Sonnhammer, Erik L. L.; Eddy, Sean R.; Bateman, Alex; Finn, Robert D.

    2012-01-01

    Pfam is a widely used database of protein families, currently containing more than 13 000 manually curated protein families as of release 26.0. Pfam is available via servers in the UK (http://pfam.sanger.ac.uk/), the USA (http://pfam.janelia.org/) and Sweden (http://pfam.sbc.su.se/). Here, we report on changes that have occurred since our 2010 NAR paper (release 24.0). Over the last 2 years, we have generated 1840 new families and increased coverage of the UniProt Knowledgebase (UniProtKB) to nearly 80%. Notably, we have taken the step of opening up the annotation of our families to the Wikipedia community, by linking Pfam families to relevant Wikipedia pages and encouraging the Pfam and Wikipedia communities to improve and expand those pages. We continue to improve the Pfam website and add new visualizations, such as the ‘sunburst’ representation of taxonomic distribution of families. In this work we additionally address two topics that will be of particular interest to the Pfam community. First, we explain the definition and use of family-specific, manually curated gathering thresholds. Second, we discuss some of the features of domains of unknown function (also known as DUFs), which constitute a rapidly growing class of families within Pfam. PMID:22127870

  1. LCR-eXXXplorer: a web platform to search, visualize and share data for low complexity regions in protein sequences.

    PubMed

    Kirmitzoglou, Ioannis; Promponas, Vasilis J

    2015-07-01

    Local compositionally biased and low complexity regions (LCRs) in amino acid sequences have initially attracted the interest of researchers due to their implication in generating artifacts in sequence database searches. There is accumulating evidence of the biological significance of LCRs both in physiological and in pathological situations. Nonetheless, LCR-related algorithms and tools have not gained wide appreciation across the research community, partly due to the fact that only a handful of user-friendly software is currently freely available. We developed LCR-eXXXplorer, an extensible online platform attempting to fill this gap. LCR-eXXXplorer offers tools for displaying LCRs from the UniProt/SwissProt knowledgebase, in combination with other relevant protein features, predicted or experimentally verified. Moreover, users may perform powerful queries against a custom designed sequence/LCR-centric database. We anticipate that LCR-eXXXplorer will be a useful starting point in research efforts for the elucidation of the structure, function and evolution of proteins with LCRs. LCR-eXXXplorer is freely available at the URL http://repeat.biol.ucy.ac.cy/lcr-exxxplorer. vprobon@ucy.ac.cy Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.

  2. PATRIC, the bacterial bioinformatics database and analysis resource.

    PubMed

    Wattam, Alice R; Abraham, David; Dalay, Oral; Disz, Terry L; Driscoll, Timothy; Gabbard, Joseph L; Gillespie, Joseph J; Gough, Roger; Hix, Deborah; Kenyon, Ronald; Machi, Dustin; Mao, Chunhong; Nordberg, Eric K; Olson, Robert; Overbeek, Ross; Pusch, Gordon D; Shukla, Maulik; Schulman, Julie; Stevens, Rick L; Sullivan, Daniel E; Vonstein, Veronika; Warren, Andrew; Will, Rebecca; Wilson, Meredith J C; Yoo, Hyun Seung; Zhang, Chengdong; Zhang, Yan; Sobral, Bruno W

    2014-01-01

    The Pathosystems Resource Integration Center (PATRIC) is the all-bacterial Bioinformatics Resource Center (BRC) (http://www.patricbrc.org). A joint effort by two of the original National Institute of Allergy and Infectious Diseases-funded BRCs, PATRIC provides researchers with an online resource that stores and integrates a variety of data types [e.g. genomics, transcriptomics, protein-protein interactions (PPIs), three-dimensional protein structures and sequence typing data] and associated metadata. Datatypes are summarized for individual genomes and across taxonomic levels. All genomes in PATRIC, currently more than 10,000, are consistently annotated using RAST, the Rapid Annotations using Subsystems Technology. Summaries of different data types are also provided for individual genes, where comparisons of different annotations are available, and also include available transcriptomic data. PATRIC provides a variety of ways for researchers to find data of interest and a private workspace where they can store both genomic and gene associations, and their own private data. Both private and public data can be analyzed together using a suite of tools to perform comparative genomic or transcriptomic analysis. PATRIC also includes integrated information related to disease and PPIs. All the data and integrated analysis and visualization tools are freely available. This manuscript describes updates to the PATRIC since its initial report in the 2007 NAR Database Issue.

  3. GermOnline 4.0 is a genomics gateway for germline development, meiosis and the mitotic cell cycle.

    PubMed

    Lardenois, Aurélie; Gattiker, Alexandre; Collin, Olivier; Chalmel, Frédéric; Primig, Michael

    2010-01-01

    GermOnline 4.0 is a cross-species database portal focusing on high-throughput expression data relevant for germline development, the meiotic cell cycle and mitosis in healthy versus malignant cells. It is thus a source of information for life scientists as well as clinicians who are interested in gene expression and regulatory networks. The GermOnline gateway provides unlimited access to information produced with high-density oligonucleotide microarrays (3'-UTR GeneChips), genome-wide protein-DNA binding assays and protein-protein interaction studies in the context of Ensembl genome annotation. Samples used to produce high-throughput expression data and to carry out genome-wide in vivo DNA binding assays are annotated via the MIAME-compliant Multiomics Information Management and Annotation System (MIMAS 3.0). Furthermore, the Saccharomyces Genomics Viewer (SGV) was developed and integrated into the gateway. SGV is a visualization tool that outputs genome annotation and DNA-strand specific expression data produced with high-density oligonucleotide tiling microarrays (Sc_tlg GeneChips) which cover the complete budding yeast genome on both DNA strands. It facilitates the interpretation of expression levels and transcript structures determined for various cell types cultured under different growth and differentiation conditions. Database URL: www.germonline.org/

  4. Fragile X Mental Retardation Protein Is Required to Maintain Visual Conditioning-Induced Behavioral Plasticity by Limiting Local Protein Synthesis

    PubMed Central

    Liu, Han-Hsuan

    2016-01-01

    Fragile X mental retardation protein (FMRP) is thought to regulate neuronal plasticity by limiting dendritic protein synthesis, but direct demonstration of a requirement for FMRP control of local protein synthesis during behavioral plasticity is lacking. Here we tested whether FMRP knockdown in Xenopus optic tectum affects local protein synthesis in vivo and whether FMRP knockdown affects protein synthesis-dependent visual avoidance behavioral plasticity. We tagged newly synthesized proteins by incorporation of the noncanonical amino acid azidohomoalanine and visualized them with fluorescent noncanonical amino acid tagging (FUNCAT). Visual conditioning and FMRP knockdown produce similar increases in FUNCAT in tectal neuropil. Induction of visual conditioning-dependent behavioral plasticity occurs normally in FMRP knockdown animals, but plasticity degrades over 24 h. These results indicate that FMRP affects visual conditioning-induced local protein synthesis and is required to maintain the visual conditioning-induced behavioral plasticity. SIGNIFICANCE STATEMENT Fragile X syndrome (FXS) is the most common form of inherited intellectual disability. Exaggerated dendritic protein synthesis resulting from loss of fragile X mental retardation protein (FMRP) is thought to underlie cognitive deficits in FXS, but no direct evidence has demonstrated that FMRP-regulated dendritic protein synthesis affects behavioral plasticity in intact animals. Xenopus tadpoles exhibit a visual avoidance behavior that improves with visual conditioning in a protein synthesis-dependent manner. We showed that FMRP knockdown and visual conditioning dramatically increase protein synthesis in neuronal processes. Furthermore, induction of visual conditioning-dependent behavioral plasticity occurs normally after FMRP knockdown, but performance rapidly deteriorated in the absence of FMRP. These studies show that FMRP negatively regulates local protein synthesis and is required to maintain visual conditioning-induced behavioral plasticity in vivo. PMID:27383604

  5. Fragile X Mental Retardation Protein Is Required to Maintain Visual Conditioning-Induced Behavioral Plasticity by Limiting Local Protein Synthesis.

    PubMed

    Liu, Han-Hsuan; Cline, Hollis T

    2016-07-06

    Fragile X mental retardation protein (FMRP) is thought to regulate neuronal plasticity by limiting dendritic protein synthesis, but direct demonstration of a requirement for FMRP control of local protein synthesis during behavioral plasticity is lacking. Here we tested whether FMRP knockdown in Xenopus optic tectum affects local protein synthesis in vivo and whether FMRP knockdown affects protein synthesis-dependent visual avoidance behavioral plasticity. We tagged newly synthesized proteins by incorporation of the noncanonical amino acid azidohomoalanine and visualized them with fluorescent noncanonical amino acid tagging (FUNCAT). Visual conditioning and FMRP knockdown produce similar increases in FUNCAT in tectal neuropil. Induction of visual conditioning-dependent behavioral plasticity occurs normally in FMRP knockdown animals, but plasticity degrades over 24 h. These results indicate that FMRP affects visual conditioning-induced local protein synthesis and is required to maintain the visual conditioning-induced behavioral plasticity. Fragile X syndrome (FXS) is the most common form of inherited intellectual disability. Exaggerated dendritic protein synthesis resulting from loss of fragile X mental retardation protein (FMRP) is thought to underlie cognitive deficits in FXS, but no direct evidence has demonstrated that FMRP-regulated dendritic protein synthesis affects behavioral plasticity in intact animals. Xenopus tadpoles exhibit a visual avoidance behavior that improves with visual conditioning in a protein synthesis-dependent manner. We showed that FMRP knockdown and visual conditioning dramatically increase protein synthesis in neuronal processes. Furthermore, induction of visual conditioning-dependent behavioral plasticity occurs normally after FMRP knockdown, but performance rapidly deteriorated in the absence of FMRP. These studies show that FMRP negatively regulates local protein synthesis and is required to maintain visual conditioning-induced behavioral plasticity in vivo. Copyright © 2016 the authors 0270-6474/16/367325-15$15.00/0.

  6. MIPS: analysis and annotation of proteins from whole genomes.

    PubMed

    Mewes, H W; Amid, C; Arnold, R; Frishman, D; Güldener, U; Mannhaupt, G; Münsterkötter, M; Pagel, P; Strack, N; Stümpflen, V; Warfsmann, J; Ruepp, A

    2004-01-01

    The Munich Information Center for Protein Sequences (MIPS-GSF), Neuherberg, Germany, provides protein sequence-related information based on whole-genome analysis. The main focus of the work is directed toward the systematic organization of sequence-related attributes as gathered by a variety of algorithms, primary information from experimental data together with information compiled from the scientific literature. MIPS maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the database of complete cDNAs (German Human Genome Project, NGFN), the database of mammalian protein-protein interactions (MPPI), the database of FASTA homologies (SIMAP), and the interface for the fast retrieval of protein-associated information (QUIPOS). The Arabidopsis thaliana database, the rice database, the plant EST databases (MATDB, MOsDB, SPUTNIK), as well as the databases for the comprehensive set of genomes (PEDANT genomes) are described elsewhere in the 2003 and 2004 NAR database issues, respectively. All databases described, and the detailed descriptions of our projects can be accessed through the MIPS web server (http://mips.gsf.de).

  7. FragFit: a web-application for interactive modeling of protein segments into cryo-EM density maps.

    PubMed

    Tiemann, Johanna K S; Rose, Alexander S; Ismer, Jochen; Darvish, Mitra D; Hilal, Tarek; Spahn, Christian M T; Hildebrand, Peter W

    2018-05-21

    Cryo-electron microscopy (cryo-EM) is a standard method to determine the three-dimensional structures of molecular complexes. However, easy to use tools for modeling of protein segments into cryo-EM maps are sparse. Here, we present the FragFit web-application, a web server for interactive modeling of segments of up to 35 amino acids length into cryo-EM density maps. The fragments are provided by a regularly updated database containing at the moment about 1 billion entries extracted from PDB structures and can be readily integrated into a protein structure. Fragments are selected based on geometric criteria, sequence similarity and fit into a given cryo-EM density map. Web-based molecular visualization with the NGL Viewer allows interactive selection of fragments. The FragFit web-application, accessible at http://proteinformatics.de/FragFit, is free and open to all users, without any login requirements.

  8. A Visual Interface for Querying Heterogeneous Phylogenetic Databases.

    PubMed

    Jamil, Hasan M

    2017-01-01

    Despite the recent growth in the number of phylogenetic databases, access to these wealth of resources remain largely tool or form-based interface driven. It is our thesis that the flexibility afforded by declarative query languages may offer the opportunity to access these repositories in a better way, and to use such a language to pose truly powerful queries in unprecedented ways. In this paper, we propose a substantially enhanced closed visual query language, called PhyQL, that can be used to query phylogenetic databases represented in a canonical form. The canonical representation presented helps capture most phylogenetic tree formats in a convenient way, and is used as the storage model for our PhyloBase database for which PhyQL serves as the query language. We have implemented a visual interface for the end users to pose PhyQL queries using visual icons, and drag and drop operations defined over them. Once a query is posed, the interface translates the visual query into a Datalog query for execution over the canonical database. Responses are returned as hyperlinks to phylogenies that can be viewed in several formats using the tree viewers supported by PhyloBase. Results cached in PhyQL buffer allows secondary querying on the computed results making it a truly powerful querying architecture.

  9. Prediction of novel target genes and pathways involved in bevacizumab-resistant colorectal cancer

    PubMed Central

    Makondi, Precious Takondwa; Lee, Chia-Hwa; Huang, Chien-Yu; Chu, Chi-Ming; Chang, Yu-Jia

    2018-01-01

    Bevacizumab combined with cytotoxic chemotherapy is the backbone of metastatic colorectal cancer (mCRC) therapy; however, its treatment efficacy is hampered by therapeutic resistance. Therefore, understanding the mechanisms underlying bevacizumab resistance is crucial to increasing the therapeutic efficacy of bevacizumab. The Gene Expression Omnibus (GEO) database (dataset, GSE86525) was used to identify the key genes and pathways involved in bevacizumab-resistant mCRC. The GEO2R web tool was used to identify differentially expressed genes (DEGs). Functional and pathway enrichment analyses of the DEGs were performed using the Database for Annotation, Visualization, and Integrated Discovery(DAVID). Protein–protein interaction (PPI) networks were established using the Search Tool for the Retrieval of Interacting Genes/Proteins database(STRING) and visualized using Cytoscape software. A total of 124 DEGs were obtained, 57 of which upregulated and 67 were downregulated. PPI network analysis showed that seven upregulated genes and nine downregulated genes exhibited high PPI degrees. In the functional enrichment, the DEGs were mainly enriched in negative regulation of phosphate metabolic process and positive regulation of cell cycle process gene ontologies (GOs); the enriched pathways were the phosphoinositide 3-kinase-serine/threonine kinase signaling pathway, bladder cancer, and microRNAs in cancer. Cyclin-dependent kinase inhibitor 1A(CDKN1A), toll-like receptor 4 (TLR4), CD19 molecule (CD19), breast cancer 1, early onset (BRCA1), platelet-derived growth factor subunit A (PDGFA), and matrix metallopeptidase 1 (MMP1) were the DEGs involved in the pathways and the PPIs. The clinical validation of the DEGs in mCRC (TNM clinical stages 3 and 4) revealed that high PDGFA expression levels were associated with poor overall survival, whereas high BRCA1 and MMP1 expression levels were associated with favorable progress free survival(PFS). The identified genes and pathways can be potential targets and predictors of therapeutic resistance and prognosis in bevacizumab-treated patients with mCRC. PMID:29342159

  10. A Visual Analytics Paradigm Enabling Trillion-Edge Graph Exploration

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wong, Pak C.; Haglin, David J.; Gillen, David S.

    We present a visual analytics paradigm and a system prototype for exploring web-scale graphs. A web-scale graph is described as a graph with ~one trillion edges and ~50 billion vertices. While there is an aggressive R&D effort in processing and exploring web-scale graphs among internet vendors such as Facebook and Google, visualizing a graph of that scale still remains an underexplored R&D area. The paper describes a nontraditional peek-and-filter strategy that facilitates the exploration of a graph database of unprecedented size for visualization and analytics. We demonstrate that our system prototype can 1) preprocess a graph with ~25 billion edgesmore » in less than two hours and 2) support database query and visualization on the processed graph database afterward. Based on our computational performance results, we argue that we most likely will achieve the one trillion edge mark (a computational performance improvement of 40 times) for graph visual analytics in the near future.« less

  11. Autonomous facial recognition system inspired by human visual system based logarithmical image visualization technique

    NASA Astrophysics Data System (ADS)

    Wan, Qianwen; Panetta, Karen; Agaian, Sos

    2017-05-01

    Autonomous facial recognition system is widely used in real-life applications, such as homeland border security, law enforcement identification and authentication, and video-based surveillance analysis. Issues like low image quality, non-uniform illumination as well as variations in poses and facial expressions can impair the performance of recognition systems. To address the non-uniform illumination challenge, we present a novel robust autonomous facial recognition system inspired by the human visual system based, so called, logarithmical image visualization technique. In this paper, the proposed method, for the first time, utilizes the logarithmical image visualization technique coupled with the local binary pattern to perform discriminative feature extraction for facial recognition system. The Yale database, the Yale-B database and the ATT database are used for computer simulation accuracy and efficiency testing. The extensive computer simulation demonstrates the method's efficiency, accuracy, and robustness of illumination invariance for facial recognition.

  12. Extension modules for storage, visualization and querying of genomic, genetic and breeding data in Tripal databases

    PubMed Central

    Lee, Taein; Cheng, Chun-Huai; Ficklin, Stephen; Yu, Jing; Humann, Jodi; Main, Dorrie

    2017-01-01

    Abstract Tripal is an open-source database platform primarily used for development of genomic, genetic and breeding databases. We report here on the release of the Chado Loader, Chado Data Display and Chado Search modules to extend the functionality of the core Tripal modules. These new extension modules provide additional tools for (1) data loading, (2) customized visualization and (3) advanced search functions for supported data types such as organism, marker, QTL/Mendelian Trait Loci, germplasm, map, project, phenotype, genotype and their respective metadata. The Chado Loader module provides data collection templates in Excel with defined metadata and data loaders with front end forms. The Chado Data Display module contains tools to visualize each data type and the metadata which can be used as is or customized as desired. The Chado Search module provides search and download functionality for the supported data types. Also included are the tools to visualize map and species summary. The use of materialized views in the Chado Search module enables better performance as well as flexibility of data modeling in Chado, allowing existing Tripal databases with different metadata types to utilize the module. These Tripal Extension modules are implemented in the Genome Database for Rosaceae (rosaceae.org), CottonGen (cottongen.org), Citrus Genome Database (citrusgenomedb.org), Genome Database for Vaccinium (vaccinium.org) and the Cool Season Food Legume Database (coolseasonfoodlegume.org). Database URL: https://www.citrusgenomedb.org/, https://www.coolseasonfoodlegume.org/, https://www.cottongen.org/, https://www.rosaceae.org/, https://www.vaccinium.org/

  13. Research resource: Update and extension of a glycoprotein hormone receptors web application.

    PubMed

    Kreuchwig, Annika; Kleinau, Gunnar; Kreuchwig, Franziska; Worth, Catherine L; Krause, Gerd

    2011-04-01

    The SSFA-GPHR (Sequence-Structure-Function-Analysis of Glycoprotein Hormone Receptors) database provides a comprehensive set of mutation data for the glycoprotein hormone receptors (covering the lutropin, the FSH, and the TSH receptors). Moreover, it provides a platform for comparison and investigation of these homologous receptors and helps in understanding protein malfunctions associated with several diseases. Besides extending the data set (> 1100 mutations), the database has been completely redesigned and several novel features and analysis tools have been added to the web site. These tools allow the focused extraction of semiquantitative mutant data from the GPHR subtypes and different experimental approaches. Functional and structural data of the GPHRs are now linked interactively at the web interface, and new tools for data visualization (on three-dimensional protein structures) are provided. The interpretation of functional findings is supported by receptor morphings simulating intramolecular changes during the activation process, which thus help to trace the potential function of each amino acid and provide clues to the local structural environment, including potentially relocated spatial counterpart residues. Furthermore, double and triple mutations are newly included to allow the analysis of their functional effects related to their spatial interrelationship in structures or homology models. A new important feature is the search option and data visualization by interactive and user-defined snake-plots. These new tools allow fast and easy searches for specific functional data and thereby give deeper insights in the mechanisms of hormone binding, signal transduction, and signaling regulation. The web application "Sequence-Structure-Function-Analysis of GPHRs" is accessible on the internet at http://www.ssfa-gphr.de/.

  14. MIPS: a database for genomes and protein sequences.

    PubMed Central

    Mewes, H W; Heumann, K; Kaps, A; Mayer, K; Pfeiffer, F; Stocker, S; Frishman, D

    1999-01-01

    The Munich Information Center for Protein Sequences (MIPS-GSF), Martinsried near Munich, Germany, develops and maintains genome oriented databases. It is commonplace that the amount of sequence data available increases rapidly, but not the capacity of qualified manual annotation at the sequence databases. Therefore, our strategy aims to cope with the data stream by the comprehensive application of analysis tools to sequences of complete genomes, the systematic classification of protein sequences and the active support of sequence analysis and functional genomics projects. This report describes the systematic and up-to-date analysis of genomes (PEDANT), a comprehensive database of the yeast genome (MYGD), a database reflecting the progress in sequencing the Arabidopsis thaliana genome (MATD), the database of assembled, annotated human EST clusters (MEST), and the collection of protein sequence data within the framework of the PIR-International Protein Sequence Database (described elsewhere in this volume). MIPS provides access through its WWW server (http://www.mips.biochem.mpg.de) to a spectrum of generic databases, including the above mentioned as well as a database of protein families (PROTFAM), the MITOP database, and the all-against-all FASTA database. PMID:9847138

  15. NABIC: A New Access Portal to Search, Visualize, and Share Agricultural Genomics Data.

    PubMed

    Seol, Young-Joo; Lee, Tae-Ho; Park, Dong-Suk; Kim, Chang-Kug

    2016-01-01

    The National Agricultural Biotechnology Information Center developed an access portal to search, visualize, and share agricultural genomics data with a focus on South Korean information and resources. The portal features an agricultural biotechnology database containing a wide range of omics data from public and proprietary sources. We collected 28.4 TB of data from 162 agricultural organisms, with 10 types of omics data comprising next-generation sequencing sequence read archive, genome, gene, nucleotide, DNA chip, expressed sequence tag, interactome, protein structure, molecular marker, and single-nucleotide polymorphism datasets. Our genomic resources contain information on five animals, seven plants, and one fungus, which is accessed through a genome browser. We also developed a data submission and analysis system as a web service, with easy-to-use functions and cutting-edge algorithms, including those for handling next-generation sequencing data.

  16. Gene Composer: database software for protein construct design, codon engineering, and gene synthesis

    PubMed Central

    Lorimer, Don; Raymond, Amy; Walchli, John; Mixon, Mark; Barrow, Adrienne; Wallace, Ellen; Grice, Rena; Burgin, Alex; Stewart, Lance

    2009-01-01

    Background To improve efficiency in high throughput protein structure determination, we have developed a database software package, Gene Composer, which facilitates the information-rich design of protein constructs and their codon engineered synthetic gene sequences. With its modular workflow design and numerous graphical user interfaces, Gene Composer enables researchers to perform all common bio-informatics steps used in modern structure guided protein engineering and synthetic gene engineering. Results An interactive Alignment Viewer allows the researcher to simultaneously visualize sequence conservation in the context of known protein secondary structure, ligand contacts, water contacts, crystal contacts, B-factors, solvent accessible area, residue property type and several other useful property views. The Construct Design Module enables the facile design of novel protein constructs with altered N- and C-termini, internal insertions or deletions, point mutations, and desired affinity tags. The modifications can be combined and permuted into multiple protein constructs, and then virtually cloned in silico into defined expression vectors. The Gene Design Module uses a protein-to-gene algorithm that automates the back-translation of a protein amino acid sequence into a codon engineered nucleic acid gene sequence according to a selected codon usage table with minimal codon usage threshold, defined G:C% content, and desired sequence features achieved through synonymous codon selection that is optimized for the intended expression system. The gene-to-oligo algorithm of the Gene Design Module plans out all of the required overlapping oligonucleotides and mutagenic primers needed to synthesize the desired gene constructs by PCR, and for physically cloning them into selected vectors by the most popular subcloning strategies. Conclusion We present a complete description of Gene Composer functionality, and an efficient PCR-based synthetic gene assembly procedure with mis-match specific endonuclease error correction in combination with PIPE cloning. In a sister manuscript we present data on how Gene Composer designed genes and protein constructs can result in improved protein production for structural studies. PMID:19383142

  17. Gene composer: database software for protein construct design, codon engineering, and gene synthesis.

    PubMed

    Lorimer, Don; Raymond, Amy; Walchli, John; Mixon, Mark; Barrow, Adrienne; Wallace, Ellen; Grice, Rena; Burgin, Alex; Stewart, Lance

    2009-04-21

    To improve efficiency in high throughput protein structure determination, we have developed a database software package, Gene Composer, which facilitates the information-rich design of protein constructs and their codon engineered synthetic gene sequences. With its modular workflow design and numerous graphical user interfaces, Gene Composer enables researchers to perform all common bio-informatics steps used in modern structure guided protein engineering and synthetic gene engineering. An interactive Alignment Viewer allows the researcher to simultaneously visualize sequence conservation in the context of known protein secondary structure, ligand contacts, water contacts, crystal contacts, B-factors, solvent accessible area, residue property type and several other useful property views. The Construct Design Module enables the facile design of novel protein constructs with altered N- and C-termini, internal insertions or deletions, point mutations, and desired affinity tags. The modifications can be combined and permuted into multiple protein constructs, and then virtually cloned in silico into defined expression vectors. The Gene Design Module uses a protein-to-gene algorithm that automates the back-translation of a protein amino acid sequence into a codon engineered nucleic acid gene sequence according to a selected codon usage table with minimal codon usage threshold, defined G:C% content, and desired sequence features achieved through synonymous codon selection that is optimized for the intended expression system. The gene-to-oligo algorithm of the Gene Design Module plans out all of the required overlapping oligonucleotides and mutagenic primers needed to synthesize the desired gene constructs by PCR, and for physically cloning them into selected vectors by the most popular subcloning strategies. We present a complete description of Gene Composer functionality, and an efficient PCR-based synthetic gene assembly procedure with mis-match specific endonuclease error correction in combination with PIPE cloning. In a sister manuscript we present data on how Gene Composer designed genes and protein constructs can result in improved protein production for structural studies.

  18. The Halophile protein database.

    PubMed

    Sharma, Naveen; Farooqi, Mohammad Samir; Chaturvedi, Krishna Kumar; Lal, Shashi Bhushan; Grover, Monendra; Rai, Anil; Pandey, Pankaj

    2014-01-01

    Halophilic archaea/bacteria adapt to different salt concentration, namely extreme, moderate and low. These type of adaptations may occur as a result of modification of protein structure and other changes in different cell organelles. Thus proteins may play an important role in the adaptation of halophilic archaea/bacteria to saline conditions. The Halophile protein database (HProtDB) is a systematic attempt to document the biochemical and biophysical properties of proteins from halophilic archaea/bacteria which may be involved in adaptation of these organisms to saline conditions. In this database, various physicochemical properties such as molecular weight, theoretical pI, amino acid composition, atomic composition, estimated half-life, instability index, aliphatic index and grand average of hydropathicity (Gravy) have been listed. These physicochemical properties play an important role in identifying the protein structure, bonding pattern and function of the specific proteins. This database is comprehensive, manually curated, non-redundant catalogue of proteins. The database currently contains 59 897 proteins properties extracted from 21 different strains of halophilic archaea/bacteria. The database can be accessed through link. Database URL: http://webapp.cabgrid.res.in/protein/ © The Author(s) 2014. Published by Oxford University Press.

  19. Cotton QTLdb: a cotton QTL database for QTL analysis, visualization, and comparison between Gossypium hirsutum and G. hirsutum × G. barbadense populations.

    PubMed

    Said, Joseph I; Knapka, Joseph A; Song, Mingzhou; Zhang, Jinfa

    2015-08-01

    A specialized database currently containing more than 2200 QTL is established, which allows graphic presentation, visualization and submission of QTL. In cotton quantitative trait loci (QTL), studies are focused on intraspecific Gossypium hirsutum and interspecific G. hirsutum × G. barbadense populations. These two populations are commercially important for the textile industry and are evaluated for fiber quality, yield, seed quality, resistance, physiological, and morphological trait QTL. With meta-analysis data based on the vast amount of QTL studies in cotton it will be beneficial to organize the data into a functional database for the cotton community. Here we provide a tool for cotton researchers to visualize previously identified QTL and submit their own QTL to the Cotton QTLdb database. The database provides the user with the option of selecting various QTL trait types from either the G. hirsutum or G. hirsutum × G. barbadense populations. Based on the user's QTL trait selection, graphical representations of chromosomes of the population selected are displayed in publication ready images. The database also provides users with trait information on QTL, LOD scores, and explained phenotypic variances for all QTL selected. The CottonQTLdb database provides cotton geneticist and breeders with statistical data on cotton QTL previously identified and provides a visualization tool to view QTL positions on chromosomes. Currently the database (Release 1) contains 2274 QTLs, and succeeding QTL studies will be updated regularly by the curators and members of the cotton community that contribute their data to keep the database current. The database is accessible from http://www.cottonqtldb.org.

  20. SNPversity: a web-based tool for visualizing diversity

    PubMed Central

    Schott, David A; Vinnakota, Abhinav G; Portwood, John L; Andorf, Carson M

    2018-01-01

    Abstract Many stand-alone desktop software suites exist to visualize single nucleotide polymorphism (SNP) diversity, but web-based software that can be easily implemented and used for biological databases is absent. SNPversity was created to answer this need by building an open-source visualization tool that can be implemented on a Unix-like machine and served through a web browser that can be accessible worldwide. SNPversity consists of a HDF5 database back-end for SNPs, a data exchange layer powered by TASSEL libraries that represent data in JSON format, and an interface layer using PHP to visualize SNP information. SNPversity displays data in real-time through a web browser in grids that are color-coded according to a given SNP’s allelic status and mutational state. SNPversity is currently available at MaizeGDB, the maize community’s database, and will be soon available at GrainGenes, the clade-oriented database for Triticeae and Avena species, including wheat, barley, rye, and oat. The code and documentation are uploaded onto github, and they are freely available to the public. We expect that the tool will be highly useful for other biological databases with a similar need to display SNP diversity through their web interfaces. Database URL: https://www.maizegdb.org/snpversity PMID:29688387

  1. Visual outcome of cataract surgery; study from the European Registry of Quality Outcomes for Cataract and Refractive Surgery.

    PubMed

    Lundström, Mats; Barry, Peter; Henry, Ype; Rosen, Paul; Stenevi, Ulf

    2013-05-01

    To analyze the visual outcome after cataract surgery. Cataract surgery clinics in 15 European countries. Database study. Data were drawn from case series of cataract extractions reported to the European Registry of Quality Outcomes for Cataract and Refractive Surgery database. These data were entered into the database via the Web by surgeons or by transfer from existing national registries or electronic medical record systems. The database contains individual anonymous data on preoperative, intraoperative, and postoperative measurements. Data on 368,256 cataract extractions were available for analysis. The best visual outcome was achieved in age groups 40 to 74 years, and men showed a higher percentage of excellent vision (1.0 [20/20] or better) than women. A corrected distance visual acuity (CDVA) of 0.5 (20/40) or better and of 1.0 (20/20) or better was achieved in 94.3% and 61.3% of cases, respectively. Ocular comorbidity and postoperative complications were the strongest influences on the visual outcome; however, surgical complications and ocular changes requiring complex surgery also had a negative influence. Deterioration of visual acuity after the surgery (n= 6112 [1.7% of all cases]) was most common in patients with a good preoperative visual acuity. The visual outcomes of cataract surgery were excellent, with 61.3% of patients achieving a corrected distance visual acuity of 1.0 (20/20) or better. Age and sex influenced the visual outcomes, but the greatest influences were short-term postoperative complications, ocular comorbidity, surgical complications, and complex surgery. A weakness of the study could be that some of the data is self-reported to the registry. Copyright © 2013 ASCRS and ESCRS. Published by Elsevier Inc. All rights reserved.

  2. MIPS: analysis and annotation of proteins from whole genomes

    PubMed Central

    Mewes, H. W.; Amid, C.; Arnold, R.; Frishman, D.; Güldener, U.; Mannhaupt, G.; Münsterkötter, M.; Pagel, P.; Strack, N.; Stümpflen, V.; Warfsmann, J.; Ruepp, A.

    2004-01-01

    The Munich Information Center for Protein Sequences (MIPS-GSF), Neuherberg, Germany, provides protein sequence-related information based on whole-genome analysis. The main focus of the work is directed toward the systematic organization of sequence-related attributes as gathered by a variety of algorithms, primary information from experimental data together with information compiled from the scientific literature. MIPS maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the database of complete cDNAs (German Human Genome Project, NGFN), the database of mammalian protein–protein interactions (MPPI), the database of FASTA homologies (SIMAP), and the interface for the fast retrieval of protein-associated information (QUIPOS). The Arabidopsis thaliana database, the rice database, the plant EST databases (MATDB, MOsDB, SPUTNIK), as well as the databases for the comprehensive set of genomes (PEDANT genomes) are described elsewhere in the 2003 and 2004 NAR database issues, respectively. All databases described, and the detailed descriptions of our projects can be accessed through the MIPS web server (http://mips.gsf.de). PMID:14681354

  3. cPath: open source software for collecting, storing, and querying biological pathways.

    PubMed

    Cerami, Ethan G; Bader, Gary D; Gross, Benjamin E; Sander, Chris

    2006-11-13

    Biological pathways, including metabolic pathways, protein interaction networks, signal transduction pathways, and gene regulatory networks, are currently represented in over 220 diverse databases. These data are crucial for the study of specific biological processes, including human diseases. Standard exchange formats for pathway information, such as BioPAX, CellML, SBML and PSI-MI, enable convenient collection of this data for biological research, but mechanisms for common storage and communication are required. We have developed cPath, an open source database and web application for collecting, storing, and querying biological pathway data. cPath makes it easy to aggregate custom pathway data sets available in standard exchange formats from multiple databases, present pathway data to biologists via a customizable web interface, and export pathway data via a web service to third-party software, such as Cytoscape, for visualization and analysis. cPath is software only, and does not include new pathway information. Key features include: a built-in identifier mapping service for linking identical interactors and linking to external resources; built-in support for PSI-MI and BioPAX standard pathway exchange formats; a web service interface for searching and retrieving pathway data sets; and thorough documentation. The cPath software is freely available under the LGPL open source license for academic and commercial use. cPath is a robust, scalable, modular, professional-grade software platform for collecting, storing, and querying biological pathways. It can serve as the core data handling component in information systems for pathway visualization, analysis and modeling.

  4. Isolation and identification of peanut leaf proteins regulated by water stress.

    PubMed

    Akkasaeng, Chutipong; Tantisuwichwong, Napaporn; Chairam, Issariya; Prakrongrak, Narumon; Jogloy, Sanun; Pathanothai, Aran

    2007-05-15

    Water deficits trigger signaling cascades leading to modulation of protein expression in plant tissues. Identification of peanut leaf proteins regulated by water stress provides some insights of cellular and molecular response of peanut plants to drought stress. Peanut variety Khon Kaen 4, a water-stress sensitive variety, was grown in a growth chamber under controlled environment. Water stress was imposed on day 30 after seedling emergence by withholding watering peanut plants for 6 days as compared to plants adequately supplied with water. Total protein were prepared from a leaflet of fully expanded leaf on the main stem. Proteins were separated in duplicated gels using two-dimensional gel electrophoresis and visualized by silver nitrate staining. Image analysis was performed using ImageMaster 2D Platinum 5.0 to determine proteins regulated by water stress. Molecular mass and isoelectric point of each regulated protein were used in database queries for protein identification. One protein was induced under water stress and the homologous protein was identified as Serine/threonine-protein phosphatase PP 1. Five proteins were down-regulated by water deficit. The homologous proteins were chaperone protein DNAJ, auxin-responsive protein IAA29, peroxidase 43, caffeoyl-CoA O-methyltransferase and SNF1-related protein kinase regulatory subunit beta-2. Down-regulated proteins may be associated with sensitivity of the peanut variety to water stress.

  5. MitProNet: A Knowledgebase and Analysis Platform of Proteome, Interactome and Diseases for Mammalian Mitochondria

    PubMed Central

    Mao, Song; Chai, Xiaoqiang; Hu, Yuling; Hou, Xugang; Tang, Yiheng; Bi, Cheng; Li, Xiao

    2014-01-01

    Mitochondrion plays a central role in diverse biological processes in most eukaryotes, and its dysfunctions are critically involved in a large number of diseases and the aging process. A systematic identification of mitochondrial proteomes and characterization of functional linkages among mitochondrial proteins are fundamental in understanding the mechanisms underlying biological functions and human diseases associated with mitochondria. Here we present a database MitProNet which provides a comprehensive knowledgebase for mitochondrial proteome, interactome and human diseases. First an inventory of mammalian mitochondrial proteins was compiled by widely collecting proteomic datasets, and the proteins were classified by machine learning to achieve a high-confidence list of mitochondrial proteins. The current version of MitProNet covers 1124 high-confidence proteins, and the remainders were further classified as middle- or low-confidence. An organelle-specific network of functional linkages among mitochondrial proteins was then generated by integrating genomic features encoded by a wide range of datasets including genomic context, gene expression profiles, protein-protein interactions, functional similarity and metabolic pathways. The functional-linkage network should be a valuable resource for the study of biological functions of mitochondrial proteins and human mitochondrial diseases. Furthermore, we utilized the network to predict candidate genes for mitochondrial diseases using prioritization algorithms. All proteins, functional linkages and disease candidate genes in MitProNet were annotated according to the information collected from their original sources including GO, GEO, OMIM, KEGG, MIPS, HPRD and so on. MitProNet features a user-friendly graphic visualization interface to present functional analysis of linkage networks. As an up-to-date database and analysis platform, MitProNet should be particularly helpful in comprehensive studies of complicated biological mechanisms underlying mitochondrial functions and human mitochondrial diseases. MitProNet is freely accessible at http://bio.scu.edu.cn:8085/MitProNet. PMID:25347823

  6. WGE: a CRISPR database for genome engineering.

    PubMed

    Hodgkins, Alex; Farne, Anna; Perera, Sajith; Grego, Tiago; Parry-Smith, David J; Skarnes, William C; Iyer, Vivek

    2015-09-15

    The rapid development of CRISPR-Cas9 mediated genome editing techniques has given rise to a number of online and stand-alone tools to find and score CRISPR sites for whole genomes. Here we describe the Wellcome Trust Sanger Institute Genome Editing database (WGE), which uses novel methods to compute, visualize and select optimal CRISPR sites in a genome browser environment. The WGE database currently stores single and paired CRISPR sites and pre-calculated off-target information for CRISPRs located in the mouse and human exomes. Scoring and display of off-target sites is simple, and intuitive, and filters can be applied to identify high-quality CRISPR sites rapidly. WGE also provides a tool for the design and display of gene targeting vectors in the same genome browser, along with gene models, protein translation and variation tracks. WGE is open, extensible and can be set up to compute and present CRISPR sites for any genome. The WGE database is freely available at www.sanger.ac.uk/htgt/wge : vvi@sanger.ac.uk or skarnes@sanger.ac.uk Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.

  7. The Nuclear Protein Database (NPD): sub-nuclear localisation and functional annotation of the nuclear proteome

    PubMed Central

    Dellaire, G.; Farrall, R.; Bickmore, W.A.

    2003-01-01

    The Nuclear Protein Database (NPD) is a curated database that contains information on more than 1300 vertebrate proteins that are thought, or are known, to localise to the cell nucleus. Each entry is annotated with information on predicted protein size and isoelectric point, as well as any repeats, motifs or domains within the protein sequence. In addition, information on the sub-nuclear localisation of each protein is provided and the biological and molecular functions are described using Gene Ontology (GO) terms. The database is searchable by keyword, protein name, sub-nuclear compartment and protein domain/motif. Links to other databases are provided (e.g. Entrez, SWISS-PROT, OMIM, PubMed, PubMed Central). Thus, NPD provides a gateway through which the nuclear proteome may be explored. The database can be accessed at http://npd.hgu.mrc.ac.uk and is updated monthly. PMID:12520015

  8. Visualizing Data and the Online FRED Database

    ERIC Educational Resources Information Center

    Méndez-Carbajo, Diego

    2015-01-01

    The author discusses a pedagogical strategy based on data visualization and analysis in the teaching of intermediate macroeconomics and financial economics. In these short projects, students collect and manipulate economic data from the online Federal Reserve Economic Database (FRED) in order to illustrate theoretical relationships discussed in…

  9. DataHub knowledge based assistance for science visualization and analysis using large distributed databases

    NASA Technical Reports Server (NTRS)

    Handley, Thomas H., Jr.; Collins, Donald J.; Doyle, Richard J.; Jacobson, Allan S.

    1991-01-01

    Viewgraphs on DataHub knowledge based assistance for science visualization and analysis using large distributed databases. Topics covered include: DataHub functional architecture; data representation; logical access methods; preliminary software architecture; LinkWinds; data knowledge issues; expert systems; and data management.

  10. WikiPathways: a multifaceted pathway database bridging metabolomics to other omics research.

    PubMed

    Slenter, Denise N; Kutmon, Martina; Hanspers, Kristina; Riutta, Anders; Windsor, Jacob; Nunes, Nuno; Mélius, Jonathan; Cirillo, Elisa; Coort, Susan L; Digles, Daniela; Ehrhart, Friederike; Giesbertz, Pieter; Kalafati, Marianthi; Martens, Marvin; Miller, Ryan; Nishida, Kozo; Rieswijk, Linda; Waagmeester, Andra; Eijssen, Lars M T; Evelo, Chris T; Pico, Alexander R; Willighagen, Egon L

    2018-01-04

    WikiPathways (wikipathways.org) captures the collective knowledge represented in biological pathways. By providing a database in a curated, machine readable way, omics data analysis and visualization is enabled. WikiPathways and other pathway databases are used to analyze experimental data by research groups in many fields. Due to the open and collaborative nature of the WikiPathways platform, our content keeps growing and is getting more accurate, making WikiPathways a reliable and rich pathway database. Previously, however, the focus was primarily on genes and proteins, leaving many metabolites with only limited annotation. Recent curation efforts focused on improving the annotation of metabolism and metabolic pathways by associating unmapped metabolites with database identifiers and providing more detailed interaction knowledge. Here, we report the outcomes of the continued growth and curation efforts, such as a doubling of the number of annotated metabolite nodes in WikiPathways. Furthermore, we introduce an OpenAPI documentation of our web services and the FAIR (Findable, Accessible, Interoperable and Reusable) annotation of resources to increase the interoperability of the knowledge encoded in these pathways and experimental omics data. New search options, monthly downloads, more links to metabolite databases, and new portals make pathway knowledge more effortlessly accessible to individual researchers and research communities. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  11. MutationAligner: a resource of recurrent mutation hotspots in protein domains in cancer

    PubMed Central

    Gauthier, Nicholas Paul; Reznik, Ed; Gao, Jianjiong; Sumer, Selcuk Onur; Schultz, Nikolaus; Sander, Chris; Miller, Martin L.

    2016-01-01

    The MutationAligner web resource, available at http://www.mutationaligner.org, enables discovery and exploration of somatic mutation hotspots identified in protein domains in currently (mid-2015) more than 5000 cancer patient samples across 22 different tumor types. Using multiple sequence alignments of protein domains in the human genome, we extend the principle of recurrence analysis by aggregating mutations in homologous positions across sets of paralogous genes. Protein domain analysis enhances the statistical power to detect cancer-relevant mutations and links mutations to the specific biological functions encoded in domains. We illustrate how the MutationAligner database and interactive web tool can be used to explore, visualize and analyze mutation hotspots in protein domains across genes and tumor types. We believe that MutationAligner will be an important resource for the cancer research community by providing detailed clues for the functional importance of particular mutations, as well as for the design of functional genomics experiments and for decision support in precision medicine. MutationAligner is slated to be periodically updated to incorporate additional analyses and new data from cancer genomics projects. PMID:26590264

  12. Graphical user interfaces for symbol-oriented database visualization and interaction

    NASA Astrophysics Data System (ADS)

    Brinkschulte, Uwe; Siormanolakis, Marios; Vogelsang, Holger

    1997-04-01

    In this approach, two basic services designed for the engineering of computer based systems are combined: a symbol-oriented man-machine-service and a high speed database-service. The man-machine service is used to build graphical user interfaces (GUIs) for the database service; these interfaces are stored using the database service. The idea is to create a GUI-builder and a GUI-manager for the database service based upon the man-machine service using the concept of symbols. With user-definable and predefined symbols, database contents can be visualized and manipulated in a very flexible and intuitive way. Using the GUI-builder and GUI-manager, a user can build and operate its own graphical user interface for a given database according to its needs without writing a single line of code.

  13. RPPAML/RIMS: A metadata format and an information management system for reverse phase protein arrays

    PubMed Central

    Stanislaus, Romesh; Carey, Mark; Deus, Helena F; Coombes, Kevin; Hennessy, Bryan T; Mills, Gordon B; Almeida, Jonas S

    2008-01-01

    Background Reverse Phase Protein Arrays (RPPA) are convenient assay platforms to investigate the presence of biomarkers in tissue lysates. As with other high-throughput technologies, substantial amounts of analytical data are generated. Over 1000 samples may be printed on a single nitrocellulose slide. Up to 100 different proteins may be assessed using immunoperoxidase or immunoflorescence techniques in order to determine relative amounts of protein expression in the samples of interest. Results In this report an RPPA Information Management System (RIMS) is described and made available with open source software. In order to implement the proposed system, we propose a metadata format known as reverse phase protein array markup language (RPPAML). RPPAML would enable researchers to describe, document and disseminate RPPA data. The complexity of the data structure needed to describe the results and the graphic tools necessary to visualize them require a software deployment distributed between a client and a server application. This was achieved without sacrificing interoperability between individual deployments through the use of an open source semantic database, S3DB. This data service backbone is available to multiple client side applications that can also access other server side deployments. The RIMS platform was designed to interoperate with other data analysis and data visualization tools such as Cytoscape. Conclusion The proposed RPPAML data format hopes to standardize RPPA data. Standardization of data would result in diverse client applications being able to operate on the same set of data. Additionally, having data in a standard format would enable data dissemination and data analysis. PMID:19102773

  14. MIPS: a database for genomes and protein sequences

    PubMed Central

    Mewes, H. W.; Frishman, D.; Güldener, U.; Mannhaupt, G.; Mayer, K.; Mokrejs, M.; Morgenstern, B.; Münsterkötter, M.; Rudd, S.; Weil, B.

    2002-01-01

    The Munich Information Center for Protein Sequences (MIPS-GSF, Neuherberg, Germany) continues to provide genome-related information in a systematic way. MIPS supports both national and European sequencing and functional analysis projects, develops and maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences, and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the databases for the comprehensive set of genomes (PEDANT genomes), the database of annotated human EST clusters (HIB), the database of complete cDNAs from the DHGP (German Human Genome Project), as well as the project specific databases for the GABI (Genome Analysis in Plants) and HNB (Helmholtz–Netzwerk Bioinformatik) networks. The Arabidospsis thaliana database (MATDB), the database of mitochondrial proteins (MITOP) and our contribution to the PIR International Protein Sequence Database have been described elsewhere [Schoof et al. (2002) Nucleic Acids Res., 30, 91–93; Scharfe et al. (2000) Nucleic Acids Res., 28, 155–158; Barker et al. (2001) Nucleic Acids Res., 29, 29–32]. All databases described, the protein analysis tools provided and the detailed descriptions of our projects can be accessed through the MIPS World Wide Web server (http://mips.gsf.de). PMID:11752246

  15. MIPS: a database for genomes and protein sequences.

    PubMed

    Mewes, H W; Frishman, D; Güldener, U; Mannhaupt, G; Mayer, K; Mokrejs, M; Morgenstern, B; Münsterkötter, M; Rudd, S; Weil, B

    2002-01-01

    The Munich Information Center for Protein Sequences (MIPS-GSF, Neuherberg, Germany) continues to provide genome-related information in a systematic way. MIPS supports both national and European sequencing and functional analysis projects, develops and maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences, and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the databases for the comprehensive set of genomes (PEDANT genomes), the database of annotated human EST clusters (HIB), the database of complete cDNAs from the DHGP (German Human Genome Project), as well as the project specific databases for the GABI (Genome Analysis in Plants) and HNB (Helmholtz-Netzwerk Bioinformatik) networks. The Arabidospsis thaliana database (MATDB), the database of mitochondrial proteins (MITOP) and our contribution to the PIR International Protein Sequence Database have been described elsewhere [Schoof et al. (2002) Nucleic Acids Res., 30, 91-93; Scharfe et al. (2000) Nucleic Acids Res., 28, 155-158; Barker et al. (2001) Nucleic Acids Res., 29, 29-32]. All databases described, the protein analysis tools provided and the detailed descriptions of our projects can be accessed through the MIPS World Wide Web server (http://mips.gsf.de).

  16. CoryneRegNet 4.0 – A reference database for corynebacterial gene regulatory networks

    PubMed Central

    Baumbach, Jan

    2007-01-01

    Background Detailed information on DNA-binding transcription factors (the key players in the regulation of gene expression) and on transcriptional regulatory interactions of microorganisms deduced from literature-derived knowledge, computer predictions and global DNA microarray hybridization experiments, has opened the way for the genome-wide analysis of transcriptional regulatory networks. The large-scale reconstruction of these networks allows the in silico analysis of cell behavior in response to changing environmental conditions. We previously published CoryneRegNet, an ontology-based data warehouse of corynebacterial transcription factors and regulatory networks. Initially, it was designed to provide methods for the analysis and visualization of the gene regulatory network of Corynebacterium glutamicum. Results Now we introduce CoryneRegNet release 4.0, which integrates data on the gene regulatory networks of 4 corynebacteria, 2 mycobacteria and the model organism Escherichia coli K12. As the previous versions, CoryneRegNet provides a web-based user interface to access the database content, to allow various queries, and to support the reconstruction, analysis and visualization of regulatory networks at different hierarchical levels. In this article, we present the further improved database content of CoryneRegNet along with novel analysis features. The network visualization feature GraphVis now allows the inter-species comparisons of reconstructed gene regulatory networks and the projection of gene expression levels onto that networks. Therefore, we added stimulon data directly into the database, but also provide Web Service access to the DNA microarray analysis platform EMMA. Additionally, CoryneRegNet now provides a SOAP based Web Service server, which can easily be consumed by other bioinformatics software systems. Stimulons (imported from the database, or uploaded by the user) can be analyzed in the context of known transcriptional regulatory networks to predict putative contradictions or further gene regulatory interactions. Furthermore, it integrates protein clusters by means of heuristically solving the weighted graph cluster editing problem. In addition, it provides Web Service based access to up to date gene annotation data from GenDB. Conclusion The release 4.0 of CoryneRegNet is a comprehensive system for the integrated analysis of procaryotic gene regulatory networks. It is a versatile systems biology platform to support the efficient and large-scale analysis of transcriptional regulation of gene expression in microorganisms. It is publicly available at . PMID:17986320

  17. PDB-Explorer: a web-based interactive map of the protein data bank in shape space.

    PubMed

    Jin, Xian; Awale, Mahendra; Zasso, Michaël; Kostro, Daniel; Patiny, Luc; Reymond, Jean-Louis

    2015-10-23

    The RCSB Protein Data Bank (PDB) provides public access to experimentally determined 3D-structures of biological macromolecules (proteins, peptides and nucleic acids). While various tools are available to explore the PDB, options to access the global structural diversity of the entire PDB and to perceive relationships between PDB structures remain very limited. A 136-dimensional atom pair 3D-fingerprint for proteins (3DP) counting categorized atom pairs at increasing through-space distances was designed to represent the molecular shape of PDB-entries. Nearest neighbor searches examples were reported exemplifying the ability of 3DP-similarity to identify closely related biomolecules from small peptides to enzyme and large multiprotein complexes such as virus particles. The principle component analysis was used to obtain the visualization of PDB in 3DP-space. The 3DP property space groups proteins and protein assemblies according to their 3D-shape similarity, yet shows exquisite ability to distinguish between closely related structures. An interactive website called PDB-Explorer is presented featuring a color-coded interactive map of PDB in 3DP-space. Each pixel of the map contains one or more PDB-entries which are directly visualized as ribbon diagrams when the pixel is selected. The PDB-Explorer website allows performing 3DP-nearest neighbor searches of any PDB-entry or of any structure uploaded as protein-type PDB file. All functionalities on the website are implemented in JavaScript in a platform-independent manner and draw data from a server that is updated daily with the latest PDB additions, ensuring complete and up-to-date coverage. The essentially instantaneous 3DP-similarity search with the PDB-Explorer provides results comparable to those of much slower 3D-alignment algorithms, and automatically clusters proteins from the same superfamilies in tight groups. A chemical space classification of PDB based on molecular shape was obtained using a new atom-pair 3D-fingerprint for proteins and implemented in a web-based database exploration tool comprising an interactive color-coded map of the PDB chemical space and a nearest neighbor search tool. The PDB-Explorer website is freely available at www.cheminfo.org/pdbexplorer and represents an unprecedented opportunity to interactively visualize and explore the structural diversity of the PDB. ᅟ

  18. Integration of gel-based and gel-free proteomic data for functional analysis of proteins through Soybean Proteome Database.

    PubMed

    Komatsu, Setsuko; Wang, Xin; Yin, Xiaojian; Nanjo, Yohei; Ohyanagi, Hajime; Sakata, Katsumi

    2017-06-23

    The Soybean Proteome Database (SPD) stores data on soybean proteins obtained with gel-based and gel-free proteomic techniques. The database was constructed to provide information on proteins for functional analyses. The majority of the data is focused on soybean (Glycine max 'Enrei'). The growth and yield of soybean are strongly affected by environmental stresses such as flooding. The database was originally constructed using data on soybean proteins separated by two-dimensional polyacrylamide gel electrophoresis, which is a gel-based proteomic technique. Since 2015, the database has been expanded to incorporate data obtained by label-free mass spectrometry-based quantitative proteomics, which is a gel-free proteomic technique. Here, the portions of the database consisting of gel-free proteomic data are described. The gel-free proteomic database contains 39,212 proteins identified in 63 sample sets, such as temporal and organ-specific samples of soybean plants grown under flooding stress or non-stressed conditions. In addition, data on organellar proteins identified in mitochondria, nuclei, and endoplasmic reticulum are stored. Furthermore, the database integrates multiple omics data such as genomics, transcriptomics, metabolomics, and proteomics. The SPD database is accessible at http://proteome.dc.affrc.go.jp/Soybean/. The Soybean Proteome Database stores data obtained from both gel-based and gel-free proteomic techniques. The gel-free proteomic database comprises 39,212 proteins identified in 63 sample sets, such as different organs of soybean plants grown under flooding stress or non-stressed conditions in a time-dependent manner. In addition, organellar proteins identified in mitochondria, nuclei, and endoplasmic reticulum are stored in the gel-free proteomics database. A total of 44,704 proteins, including 5490 proteins identified using a gel-based proteomic technique, are stored in the SPD. It accounts for approximately 80% of all predicted proteins from genome sequences, though there are over lapped proteins. Based on the demonstrated application of data stored in the database for functional analyses, it is suggested that these data will be useful for analyses of biological mechanisms in soybean. Furthermore, coupled with recent advances in information and communication technology, the usefulness of this database would increase in the analyses of biological mechanisms. Copyright © 2017 Elsevier B.V. All rights reserved.

  19. Navigating through the Jungle of Allergens: Features and Applications of Allergen Databases.

    PubMed

    Radauer, Christian

    2017-01-01

    The increasing number of available data on allergenic proteins demanded the establishment of structured, freely accessible allergen databases. In this review article, features and applications of 6 of the most widely used allergen databases are discussed. The WHO/IUIS Allergen Nomenclature Database is the official resource of allergen designations. Allergome is the most comprehensive collection of data on allergens and allergen sources. AllergenOnline is aimed at providing a peer-reviewed database of allergen sequences for prediction of allergenicity of proteins, such as those planned to be inserted into genetically modified crops. The Structural Database of Allergenic Proteins (SDAP) provides a database of allergen sequences, structures, and epitopes linked to bioinformatics tools for sequence analysis and comparison. The Immune Epitope Database (IEDB) is the largest repository of T-cell, B-cell, and major histocompatibility complex protein epitopes including epitopes of allergens. AllFam classifies allergens into families of evolutionarily related proteins using definitions from the Pfam protein family database. These databases contain mostly overlapping data, but also show differences in terms of their targeted users, the criteria for including allergens, data shown for each allergen, and the availability of bioinformatics tools. © 2017 S. Karger AG, Basel.

  20. Detection of alternative splice variants at the proteome level in Aspergillus flavus.

    PubMed

    Chang, Kung-Yen; Georgianna, D Ryan; Heber, Steffen; Payne, Gary A; Muddiman, David C

    2010-03-05

    Identification of proteins from proteolytic peptides or intact proteins plays an essential role in proteomics. Researchers use search engines to match the acquired peptide sequences to the target proteins. However, search engines depend on protein databases to provide candidates for consideration. Alternative splicing (AS), the mechanism where the exon of pre-mRNAs can be spliced and rearranged to generate distinct mRNA and therefore protein variants, enable higher eukaryotic organisms, with only a limited number of genes, to have the requisite complexity and diversity at the proteome level. Multiple alternative isoforms from one gene often share common segments of sequences. However, many protein databases only include a limited number of isoforms to keep minimal redundancy. As a result, the database search might not identify a target protein even with high quality tandem MS data and accurate intact precursor ion mass. We computationally predicted an exhaustive list of putative isoforms of Aspergillus flavus proteins from 20 371 expressed sequence tags to investigate whether an alternative splicing protein database can assign a greater proportion of mass spectrometry data. The newly constructed AS database provided 9807 new alternatively spliced variants in addition to 12 832 previously annotated proteins. The searches of the existing tandem MS spectra data set using the AS database identified 29 new proteins encoded by 26 genes. Nine fungal genes appeared to have multiple protein isoforms. In addition to the discovery of splice variants, AS database also showed potential to improve genome annotation. In summary, the introduction of an alternative splicing database helps identify more proteins and unveils more information about a proteome.

  1. The Protein Identifier Cross-Referencing (PICR) service: reconciling protein identifiers across multiple source databases.

    PubMed

    Côté, Richard G; Jones, Philip; Martens, Lennart; Kerrien, Samuel; Reisinger, Florian; Lin, Quan; Leinonen, Rasko; Apweiler, Rolf; Hermjakob, Henning

    2007-10-18

    Each major protein database uses its own conventions when assigning protein identifiers. Resolving the various, potentially unstable, identifiers that refer to identical proteins is a major challenge. This is a common problem when attempting to unify datasets that have been annotated with proteins from multiple data sources or querying data providers with one flavour of protein identifiers when the source database uses another. Partial solutions for protein identifier mapping exist but they are limited to specific species or techniques and to a very small number of databases. As a result, we have not found a solution that is generic enough and broad enough in mapping scope to suit our needs. We have created the Protein Identifier Cross-Reference (PICR) service, a web application that provides interactive and programmatic (SOAP and REST) access to a mapping algorithm that uses the UniProt Archive (UniParc) as a data warehouse to offer protein cross-references based on 100% sequence identity to proteins from over 70 distinct source databases loaded into UniParc. Mappings can be limited by source database, taxonomic ID and activity status in the source database. Users can copy/paste or upload files containing protein identifiers or sequences in FASTA format to obtain mappings using the interactive interface. Search results can be viewed in simple or detailed HTML tables or downloaded as comma-separated values (CSV) or Microsoft Excel (XLS) files suitable for use in a local database or a spreadsheet. Alternatively, a SOAP interface is available to integrate PICR functionality in other applications, as is a lightweight REST interface. We offer a publicly available service that can interactively map protein identifiers and protein sequences to the majority of commonly used protein databases. Programmatic access is available through a standards-compliant SOAP interface or a lightweight REST interface. The PICR interface, documentation and code examples are available at http://www.ebi.ac.uk/Tools/picr.

  2. The Protein Identifier Cross-Referencing (PICR) service: reconciling protein identifiers across multiple source databases

    PubMed Central

    Côté, Richard G; Jones, Philip; Martens, Lennart; Kerrien, Samuel; Reisinger, Florian; Lin, Quan; Leinonen, Rasko; Apweiler, Rolf; Hermjakob, Henning

    2007-01-01

    Background Each major protein database uses its own conventions when assigning protein identifiers. Resolving the various, potentially unstable, identifiers that refer to identical proteins is a major challenge. This is a common problem when attempting to unify datasets that have been annotated with proteins from multiple data sources or querying data providers with one flavour of protein identifiers when the source database uses another. Partial solutions for protein identifier mapping exist but they are limited to specific species or techniques and to a very small number of databases. As a result, we have not found a solution that is generic enough and broad enough in mapping scope to suit our needs. Results We have created the Protein Identifier Cross-Reference (PICR) service, a web application that provides interactive and programmatic (SOAP and REST) access to a mapping algorithm that uses the UniProt Archive (UniParc) as a data warehouse to offer protein cross-references based on 100% sequence identity to proteins from over 70 distinct source databases loaded into UniParc. Mappings can be limited by source database, taxonomic ID and activity status in the source database. Users can copy/paste or upload files containing protein identifiers or sequences in FASTA format to obtain mappings using the interactive interface. Search results can be viewed in simple or detailed HTML tables or downloaded as comma-separated values (CSV) or Microsoft Excel (XLS) files suitable for use in a local database or a spreadsheet. Alternatively, a SOAP interface is available to integrate PICR functionality in other applications, as is a lightweight REST interface. Conclusion We offer a publicly available service that can interactively map protein identifiers and protein sequences to the majority of commonly used protein databases. Programmatic access is available through a standards-compliant SOAP interface or a lightweight REST interface. The PICR interface, documentation and code examples are available at . PMID:17945017

  3. A Graphics Design Framework to Visualize Multi-Dimensional Economic Datasets

    ERIC Educational Resources Information Center

    Chandramouli, Magesh; Narayanan, Badri; Bertoline, Gary R.

    2013-01-01

    This study implements a prototype graphics visualization framework to visualize multidimensional data. This graphics design framework serves as a "visual analytical database" for visualization and simulation of economic models. One of the primary goals of any kind of visualization is to extract useful information from colossal volumes of…

  4. NABIC: A New Access Portal to Search, Visualize, and Share Agricultural Genomics Data

    PubMed Central

    Seol, Young-Joo; Lee, Tae-Ho; Park, Dong-Suk; Kim, Chang-Kug

    2016-01-01

    The National Agricultural Biotechnology Information Center developed an access portal to search, visualize, and share agricultural genomics data with a focus on South Korean information and resources. The portal features an agricultural biotechnology database containing a wide range of omics data from public and proprietary sources. We collected 28.4 TB of data from 162 agricultural organisms, with 10 types of omics data comprising next-generation sequencing sequence read archive, genome, gene, nucleotide, DNA chip, expressed sequence tag, interactome, protein structure, molecular marker, and single-nucleotide polymorphism datasets. Our genomic resources contain information on five animals, seven plants, and one fungus, which is accessed through a genome browser. We also developed a data submission and analysis system as a web service, with easy-to-use functions and cutting-edge algorithms, including those for handling next-generation sequencing data. PMID:26848255

  5. PoSSuM v.2.0: data update and a new function for investigating ligand analogs and target proteins of small-molecule drugs.

    PubMed

    Ito, Jun-ichi; Ikeda, Kazuyoshi; Yamada, Kazunori; Mizuguchi, Kenji; Tomii, Kentaro

    2015-01-01

    PoSSuM (http://possum.cbrc.jp/PoSSuM/) is a database for detecting similar small-molecule binding sites on proteins. Since its initial release in 2011, PoSSuM has grown to provide information related to 49 million pairs of similar binding sites discovered among 5.5 million known and putative binding sites. This enlargement of the database is expected to enhance opportunities for biological and pharmaceutical applications, such as predictions of new functions and drug discovery. In this release, we have provided a new service named PoSSuM drug search (PoSSuMds) at http://possum.cbrc.jp/PoSSuM/drug_search/, in which we selected 194 approved drug compounds retrieved from ChEMBL, and detected their known binding pockets and pockets that are similar to them. Users can access and download all of the search results via a new web interface, which is useful for finding ligand analogs as well as potential target proteins. Furthermore, PoSSuMds enables users to explore the binding pocket universe within PoSSuM. Additionally, we have improved the web interface with new functions, including sortable tables and a viewer for visualizing and downloading superimposed pockets. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  6. TDR Targets: a chemogenomics resource for neglected diseases.

    PubMed

    Magariños, María P; Carmona, Santiago J; Crowther, Gregory J; Ralph, Stuart A; Roos, David S; Shanmugam, Dhanasekaran; Van Voorhis, Wesley C; Agüero, Fernán

    2012-01-01

    The TDR Targets Database (http://tdrtargets.org) has been designed and developed as an online resource to facilitate the rapid identification and prioritization of molecular targets for drug development, focusing on pathogens responsible for neglected human diseases. The database integrates pathogen specific genomic information with functional data (e.g. expression, phylogeny, essentiality) for genes collected from various sources, including literature curation. This information can be browsed and queried using an extensive web interface with functionalities for combining, saving, exporting and sharing the query results. Target genes can be ranked and prioritized using numerical weights assigned to the criteria used for querying. In this report we describe recent updates to the TDR Targets database, including the addition of new genomes (specifically helminths), and integration of chemical structure, property and bioactivity information for biological ligands, drugs and inhibitors and cheminformatic tools for querying and visualizing these chemical data. These changes greatly facilitate exploration of linkages (both known and predicted) between genes and small molecules, yielding insight into whether particular proteins may be druggable, effectively allowing the navigation of chemical space in a genomics context.

  7. TDR Targets: a chemogenomics resource for neglected diseases

    PubMed Central

    Magariños, María P.; Carmona, Santiago J.; Crowther, Gregory J.; Ralph, Stuart A.; Roos, David S.; Shanmugam, Dhanasekaran; Van Voorhis, Wesley C.; Agüero, Fernán

    2012-01-01

    The TDR Targets Database (http://tdrtargets.org) has been designed and developed as an online resource to facilitate the rapid identification and prioritization of molecular targets for drug development, focusing on pathogens responsible for neglected human diseases. The database integrates pathogen specific genomic information with functional data (e.g. expression, phylogeny, essentiality) for genes collected from various sources, including literature curation. This information can be browsed and queried using an extensive web interface with functionalities for combining, saving, exporting and sharing the query results. Target genes can be ranked and prioritized using numerical weights assigned to the criteria used for querying. In this report we describe recent updates to the TDR Targets database, including the addition of new genomes (specifically helminths), and integration of chemical structure, property and bioactivity information for biological ligands, drugs and inhibitors and cheminformatic tools for querying and visualizing these chemical data. These changes greatly facilitate exploration of linkages (both known and predicted) between genes and small molecules, yielding insight into whether particular proteins may be druggable, effectively allowing the navigation of chemical space in a genomics context. PMID:22116064

  8. UniPROBE, update 2011: expanded content and search tools in the online database of protein-binding microarray data on protein-DNA interactions.

    PubMed

    Robasky, Kimberly; Bulyk, Martha L

    2011-01-01

    The Universal PBM Resource for Oligonucleotide-Binding Evaluation (UniPROBE) database is a centralized repository of information on the DNA-binding preferences of proteins as determined by universal protein-binding microarray (PBM) technology. Each entry for a protein (or protein complex) in UniPROBE provides the quantitative preferences for all possible nucleotide sequence variants ('words') of length k ('k-mers'), as well as position weight matrix (PWM) and graphical sequence logo representations of the k-mer data. In this update, we describe >130% expansion of the database content, incorporation of a protein BLAST (blastp) tool for finding protein sequence matches in UniPROBE, the introduction of UniPROBE accession numbers and additional database enhancements. The UniPROBE database is available at http://uniprobe.org.

  9. MEGALEX: A megastudy of visual and auditory word recognition.

    PubMed

    Ferrand, Ludovic; Méot, Alain; Spinelli, Elsa; New, Boris; Pallier, Christophe; Bonin, Patrick; Dufau, Stéphane; Mathôt, Sebastiaan; Grainger, Jonathan

    2018-06-01

    Using the megastudy approach, we report a new database (MEGALEX) of visual and auditory lexical decision times and accuracy rates for tens of thousands of words. We collected visual lexical decision data for 28,466 French words and the same number of pseudowords, and auditory lexical decision data for 17,876 French words and the same number of pseudowords (synthesized tokens were used for the auditory modality). This constitutes the first large-scale database for auditory lexical decision, and the first database to enable a direct comparison of word recognition in different modalities. Different regression analyses were conducted to illustrate potential ways to exploit this megastudy database. First, we compared the proportions of variance accounted for by five word frequency measures. Second, we conducted item-level regression analyses to examine the relative importance of the lexical variables influencing performance in the different modalities (visual and auditory). Finally, we compared the similarities and differences between the two modalities. All data are freely available on our website ( https://sedufau.shinyapps.io/megalex/ ) and are searchable at www.lexique.org , inside the Open Lexique search engine.

  10. Visual Attention Modeling for Stereoscopic Video: A Benchmark and Computational Model.

    PubMed

    Fang, Yuming; Zhang, Chi; Li, Jing; Lei, Jianjun; Perreira Da Silva, Matthieu; Le Callet, Patrick

    2017-10-01

    In this paper, we investigate the visual attention modeling for stereoscopic video from the following two aspects. First, we build one large-scale eye tracking database as the benchmark of visual attention modeling for stereoscopic video. The database includes 47 video sequences and their corresponding eye fixation data. Second, we propose a novel computational model of visual attention for stereoscopic video based on Gestalt theory. In the proposed model, we extract the low-level features, including luminance, color, texture, and depth, from discrete cosine transform coefficients, which are used to calculate feature contrast for the spatial saliency computation. The temporal saliency is calculated by the motion contrast from the planar and depth motion features in the stereoscopic video sequences. The final saliency is estimated by fusing the spatial and temporal saliency with uncertainty weighting, which is estimated by the laws of proximity, continuity, and common fate in Gestalt theory. Experimental results show that the proposed method outperforms the state-of-the-art stereoscopic video saliency detection models on our built large-scale eye tracking database and one other database (DML-ITRACK-3D).

  11. MIPS: a database for protein sequences, homology data and yeast genome information.

    PubMed Central

    Mewes, H W; Albermann, K; Heumann, K; Liebl, S; Pfeiffer, F

    1997-01-01

    The MIPS group (Martinsried Institute for Protein Sequences) at the Max-Planck-Institute for Biochemistry, Martinsried near Munich, Germany, collects, processes and distributes protein sequence data within the framework of the tripartite association of the PIR-International Protein Sequence Database (,). MIPS contributes nearly 50% of the data input to the PIR-International Protein Sequence Database. The database is distributed on CD-ROM together with PATCHX, an exhaustive supplement of unique, unverified protein sequences from external sources compiled by MIPS. Through its WWW server (http://www.mips.biochem.mpg.de/ ) MIPS permits internet access to sequence databases, homology data and to yeast genome information. (i) Sequence similarity results from the FASTA program () are stored in the FASTA database for all proteins from PIR-International and PATCHX. The database is dynamically maintained and permits instant access to FASTA results. (ii) Starting with FASTA database queries, proteins have been classified into families and superfamilies (PROT-FAM). (iii) The HPT (hashed position tree) data structure () developed at MIPS is a new approach for rapid sequence and pattern searching. (iv) MIPS provides access to the sequence and annotation of the complete yeast genome (), the functional classification of yeast genes (FunCat) and its graphical display, the 'Genome Browser' (). A CD-ROM based on the JAVA programming language providing dynamic interactive access to the yeast genome and the related protein sequences has been compiled and is available on request. PMID:9016498

  12. Cross-Link Guided Molecular Modeling with ROSETTA

    PubMed Central

    Leitner, Alexander; Rosenberger, George; Aebersold, Ruedi; Malmström, Lars

    2013-01-01

    Chemical cross-links identified by mass spectrometry generate distance restraints that reveal low-resolution structural information on proteins and protein complexes. The technology to reliably generate such data has become mature and robust enough to shift the focus to the question of how these distance restraints can be best integrated into molecular modeling calculations. Here, we introduce three workflows for incorporating distance restraints generated by chemical cross-linking and mass spectrometry into ROSETTA protocols for comparative and de novo modeling and protein-protein docking. We demonstrate that the cross-link validation and visualization software Xwalk facilitates successful cross-link data integration. Besides the protocols we introduce XLdb, a database of chemical cross-links from 14 different publications with 506 intra-protein and 62 inter-protein cross-links, where each cross-link can be mapped on an experimental structure from the Protein Data Bank. Finally, we demonstrate on a protein-protein docking reference data set the impact of virtual cross-links on protein docking calculations and show that an inter-protein cross-link can reduce on average the RMSD of a docking prediction by 5.0 Å. The methods and results presented here provide guidelines for the effective integration of chemical cross-link data in molecular modeling calculations and should advance the structural analysis of particularly large and transient protein complexes via hybrid structural biology methods. PMID:24069194

  13. NPIDB: Nucleic acid-Protein Interaction DataBase.

    PubMed

    Kirsanov, Dmitry D; Zanegina, Olga N; Aksianov, Evgeniy A; Spirin, Sergei A; Karyagina, Anna S; Alexeevski, Andrei V

    2013-01-01

    The Nucleic acid-Protein Interaction DataBase (http://npidb.belozersky.msu.ru/) contains information derived from structures of DNA-protein and RNA-protein complexes extracted from the Protein Data Bank (3846 complexes in October 2012). It provides a web interface and a set of tools for extracting biologically meaningful characteristics of nucleoprotein complexes. The content of the database is updated weekly. The current version of the Nucleic acid-Protein Interaction DataBase is an upgrade of the version published in 2007. The improvements include a new web interface, new tools for calculation of intermolecular interactions, a classification of SCOP families that contains DNA-binding protein domains and data on conserved water molecules on the DNA-protein interface.

  14. Development of the updated system of city underground pipelines based on Visual Studio

    NASA Astrophysics Data System (ADS)

    Zhang, Jianxiong; Zhu, Yun; Li, Xiangdong

    2009-10-01

    Our city has owned the integrated pipeline network management system with ArcGIS Engine 9.1 as the bottom development platform and with Oracle9i as basic database for storaging data. In this system, ArcGIS SDE9.1 is applied as the spatial data engine, and the system was a synthetic management software developed with Visual Studio visualization procedures development tools. As the pipeline update function of the system has the phenomenon of slower update and even sometimes the data lost, to ensure the underground pipeline data can real-time be updated conveniently and frequently, and the actuality and integrity of the underground pipeline data, we have increased a new update module in the system developed and researched by ourselves. The module has the powerful data update function, and can realize the function of inputting and outputting and rapid update volume of data. The new developed module adopts Visual Studio visualization procedures development tools, and uses access as the basic database to storage data. We can edit the graphics in AutoCAD software, and realize the database update using link between the graphics and the system. Practice shows that the update module has good compatibility with the original system, reliable and high update efficient of the database.

  15. The PMDB Protein Model Database

    PubMed Central

    Castrignanò, Tiziana; De Meo, Paolo D'Onorio; Cozzetto, Domenico; Talamo, Ivano Giuseppe; Tramontano, Anna

    2006-01-01

    The Protein Model Database (PMDB) is a public resource aimed at storing manually built 3D models of proteins. The database is designed to provide access to models published in the scientific literature, together with validating experimental data. It is a relational database and it currently contains >74 000 models for ∼240 proteins. The system is accessible at and allows predictors to submit models along with related supporting evidence and users to download them through a simple and intuitive interface. Users can navigate in the database and retrieve models referring to the same target protein or to different regions of the same protein. Each model is assigned a unique identifier that allows interested users to directly access the data. PMID:16381873

  16. PDB_TM: selection and membrane localization of transmembrane proteins in the protein data bank.

    PubMed

    Tusnády, Gábor E; Dosztányi, Zsuzsanna; Simon, István

    2005-01-01

    PDB_TM is a database for transmembrane proteins with known structures. It aims to collect all transmembrane proteins that are deposited in the protein structure database (PDB) and to determine their membrane-spanning regions. These assignments are based on the TMDET algorithm, which uses only structural information to locate the most likely position of the lipid bilayer and to distinguish between transmembrane and globular proteins. This algorithm was applied to all PDB entries and the results were collected in the PDB_TM database. By using TMDET algorithm, the PDB_TM database can be automatically updated every week, keeping it synchronized with the latest PDB updates. The PDB_TM database is available at http://www.enzim.hu/PDB_TM.

  17. Enrichr: interactive and collaborative HTML5 gene list enrichment analysis tool

    PubMed Central

    2013-01-01

    Background System-wide profiling of genes and proteins in mammalian cells produce lists of differentially expressed genes/proteins that need to be further analyzed for their collective functions in order to extract new knowledge. Once unbiased lists of genes or proteins are generated from such experiments, these lists are used as input for computing enrichment with existing lists created from prior knowledge organized into gene-set libraries. While many enrichment analysis tools and gene-set libraries databases have been developed, there is still room for improvement. Results Here, we present Enrichr, an integrative web-based and mobile software application that includes new gene-set libraries, an alternative approach to rank enriched terms, and various interactive visualization approaches to display enrichment results using the JavaScript library, Data Driven Documents (D3). The software can also be embedded into any tool that performs gene list analysis. We applied Enrichr to analyze nine cancer cell lines by comparing their enrichment signatures to the enrichment signatures of matched normal tissues. We observed a common pattern of up regulation of the polycomb group PRC2 and enrichment for the histone mark H3K27me3 in many cancer cell lines, as well as alterations in Toll-like receptor and interlukin signaling in K562 cells when compared with normal myeloid CD33+ cells. Such analyses provide global visualization of critical differences between normal tissues and cancer cell lines but can be applied to many other scenarios. Conclusions Enrichr is an easy to use intuitive enrichment analysis web-based tool providing various types of visualization summaries of collective functions of gene lists. Enrichr is open source and freely available online at: http://amp.pharm.mssm.edu/Enrichr. PMID:23586463

  18. Enrichr: interactive and collaborative HTML5 gene list enrichment analysis tool.

    PubMed

    Chen, Edward Y; Tan, Christopher M; Kou, Yan; Duan, Qiaonan; Wang, Zichen; Meirelles, Gabriela Vaz; Clark, Neil R; Ma'ayan, Avi

    2013-04-15

    System-wide profiling of genes and proteins in mammalian cells produce lists of differentially expressed genes/proteins that need to be further analyzed for their collective functions in order to extract new knowledge. Once unbiased lists of genes or proteins are generated from such experiments, these lists are used as input for computing enrichment with existing lists created from prior knowledge organized into gene-set libraries. While many enrichment analysis tools and gene-set libraries databases have been developed, there is still room for improvement. Here, we present Enrichr, an integrative web-based and mobile software application that includes new gene-set libraries, an alternative approach to rank enriched terms, and various interactive visualization approaches to display enrichment results using the JavaScript library, Data Driven Documents (D3). The software can also be embedded into any tool that performs gene list analysis. We applied Enrichr to analyze nine cancer cell lines by comparing their enrichment signatures to the enrichment signatures of matched normal tissues. We observed a common pattern of up regulation of the polycomb group PRC2 and enrichment for the histone mark H3K27me3 in many cancer cell lines, as well as alterations in Toll-like receptor and interlukin signaling in K562 cells when compared with normal myeloid CD33+ cells. Such analyses provide global visualization of critical differences between normal tissues and cancer cell lines but can be applied to many other scenarios. Enrichr is an easy to use intuitive enrichment analysis web-based tool providing various types of visualization summaries of collective functions of gene lists. Enrichr is open source and freely available online at: http://amp.pharm.mssm.edu/Enrichr.

  19. APID interactomes: providing proteome-based interactomes with controlled quality for multiple species and derived networks

    PubMed Central

    Alonso-López, Diego; Gutiérrez, Miguel A.; Lopes, Katia P.; Prieto, Carlos; Santamaría, Rodrigo; De Las Rivas, Javier

    2016-01-01

    APID (Agile Protein Interactomes DataServer) is an interactive web server that provides unified generation and delivery of protein interactomes mapped to their respective proteomes. This resource is a new, fully redesigned server that includes a comprehensive collection of protein interactomes for more than 400 organisms (25 of which include more than 500 interactions) produced by the integration of only experimentally validated protein–protein physical interactions. For each protein–protein interaction (PPI) the server includes currently reported information about its experimental validation to allow selection and filtering at different quality levels. As a whole, it provides easy access to the interactomes from specific species and includes a global uniform compendium of 90,379 distinct proteins and 678,441 singular interactions. APID integrates and unifies PPIs from major primary databases of molecular interactions, from other specific repositories and also from experimentally resolved 3D structures of protein complexes where more than two proteins were identified. For this purpose, a collection of 8,388 structures were analyzed to identify specific PPIs. APID also includes a new graph tool (based on Cytoscape.js) for visualization and interactive analyses of PPI networks. The server does not require registration and it is freely available for use at http://apid.dep.usal.es. PMID:27131791

  20. Kinase Pathway Database: An Integrated Protein-Kinase and NLP-Based Protein-Interaction Resource

    PubMed Central

    Koike, Asako; Kobayashi, Yoshiyuki; Takagi, Toshihisa

    2003-01-01

    Protein kinases play a crucial role in the regulation of cellular functions. Various kinds of information about these molecules are important for understanding signaling pathways and organism characteristics. We have developed the Kinase Pathway Database, an integrated database involving major completely sequenced eukaryotes. It contains the classification of protein kinases and their functional conservation, ortholog tables among species, protein–protein, protein–gene, and protein–compound interaction data, domain information, and structural information. It also provides an automatic pathway graphic image interface. The protein, gene, and compound interactions are automatically extracted from abstracts for all genes and proteins by natural-language processing (NLP).The method of automatic extraction uses phrase patterns and the GENA protein, gene, and compound name dictionary, which was developed by our group. With this database, pathways are easily compared among species using data with more than 47,000 protein interactions and protein kinase ortholog tables. The database is available for querying and browsing at http://kinasedb.ontology.ims.u-tokyo.ac.jp/. PMID:12799355

  1. Investigation of candidate genes for osteoarthritis based on gene expression profiles.

    PubMed

    Dong, Shuanghai; Xia, Tian; Wang, Lei; Zhao, Qinghua; Tian, Jiwei

    2016-12-01

    To explore the mechanism of osteoarthritis (OA) and provide valid biological information for further investigation. Gene expression profile of GSE46750 was downloaded from Gene Expression Omnibus database. The Linear Models for Microarray Data (limma) package (Bioconductor project, http://www.bioconductor.org/packages/release/bioc/html/limma.html) was used to identify differentially expressed genes (DEGs) in inflamed OA samples. Gene Ontology function enrichment analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways enrichment analysis of DEGs were performed based on Database for Annotation, Visualization and Integrated Discovery data, and protein-protein interaction (PPI) network was constructed based on the Search Tool for the Retrieval of Interacting Genes/Proteins database. Regulatory network was screened based on Encyclopedia of DNA Elements. Molecular Complex Detection was used for sub-network screening. Two sub-networks with highest node degree were integrated with transcriptional regulatory network and KEGG functional enrichment analysis was processed for 2 modules. In total, 401 up- and 196 down-regulated DEGs were obtained. Up-regulated DEGs were involved in inflammatory response, while down-regulated DEGs were involved in cell cycle. PPI network with 2392 protein interactions was constructed. Moreover, 10 genes including Interleukin 6 (IL6) and Aurora B kinase (AURKB) were found to be outstanding in PPI network. There are 214 up- and 8 down-regulated transcription factor (TF)-target pairs in the TF regulatory network. Module 1 had TFs including SPI1, PRDM1, and FOS, while module 2 contained FOSL1. The nodes in module 1 were enriched in chemokine signaling pathway, while the nodes in module 2 were mainly enriched in cell cycle. The screened DEGs including IL6, AGT, and AURKB might be potential biomarkers for gene therapy for OA by being regulated by TFs such as FOS and SPI1, and participating in the cell cycle and cytokine-cytokine receptor interaction pathway. Copyright © 2016 Turkish Association of Orthopaedics and Traumatology. Production and hosting by Elsevier B.V. All rights reserved.

  2. Automated hierarchical classification of protein domain subfamilies based on functionally-divergent residue signatures

    PubMed Central

    2012-01-01

    Background The NCBI Conserved Domain Database (CDD) consists of a collection of multiple sequence alignments of protein domains that are at various stages of being manually curated into evolutionary hierarchies based on conserved and divergent sequence and structural features. These domain models are annotated to provide insights into the relationships between sequence, structure and function via web-based BLAST searches. Results Here we automate the generation of conserved domain (CD) hierarchies using a combination of heuristic and Markov chain Monte Carlo (MCMC) sampling procedures and starting from a (typically very large) multiple sequence alignment. This procedure relies on statistical criteria to define each hierarchy based on the conserved and divergent sequence patterns associated with protein functional-specialization. At the same time this facilitates the sequence and structural annotation of residues that are functionally important. These statistical criteria also provide a means to objectively assess the quality of CD hierarchies, a non-trivial task considering that the protein subgroups are often very distantly related—a situation in which standard phylogenetic methods can be unreliable. Our aim here is to automatically generate (typically sub-optimal) hierarchies that, based on statistical criteria and visual comparisons, are comparable to manually curated hierarchies; this serves as the first step toward the ultimate goal of obtaining optimal hierarchical classifications. A plot of runtimes for the most time-intensive (non-parallelizable) part of the algorithm indicates a nearly linear time complexity so that, even for the extremely large Rossmann fold protein class, results were obtained in about a day. Conclusions This approach automates the rapid creation of protein domain hierarchies and thus will eliminate one of the most time consuming aspects of conserved domain database curation. At the same time, it also facilitates protein domain annotation by identifying those pattern residues that most distinguish each protein domain subgroup from other related subgroups. PMID:22726767

  3. Visual Search Performance in Patients with Vision Impairment: A Systematic Review.

    PubMed

    Senger, Cassia; Margarido, Maria Rita Rodrigues Alves; De Moraes, Carlos Gustavo; De Fendi, Ligia Issa; Messias, André; Paula, Jayter Silva

    2017-11-01

    Patients with visual impairment are constantly facing challenges to achieve an independent and productive life, which depends upon both a good visual discrimination and search capacities. Given that visual search is a critical skill for several daily tasks and could be used as an index of the overall visual function, we investigated the relationship between vision impairment and visual search performance. A comprehensive search was undertaken using electronic PubMed, EMBASE, LILACS, and Cochrane databases from January 1980 to December 2016, applying the following terms: "visual search", "visual search performance", "visual impairment", "visual exploration", "visual field", "hemianopia", "search time", "vision lost", "visual loss", and "low vision". Two hundred seventy six studies from 12,059 electronic database files were selected, and 40 of them were included in this review. Studies included participants of all ages, both sexes, and the sample sizes ranged from 5 to 199 participants. Visual impairment was associated with worse visual search performance in several ophthalmologic conditions, which were either artificially induced, or related to specific eye and neurological diseases. This systematic review details all the described circumstances interfering with visual search tasks, highlights the need for developing technical standards, and outlines patterns for diagnosis and therapy using visual search capabilities.

  4. Database-independent Protein Sequencing (DiPS) Enables Full-length de Novo Protein and Antibody Sequence Determination.

    PubMed

    Savidor, Alon; Barzilay, Rotem; Elinger, Dalia; Yarden, Yosef; Lindzen, Moshit; Gabashvili, Alexandra; Adiv Tal, Ophir; Levin, Yishai

    2017-06-01

    Traditional "bottom-up" proteomic approaches use proteolytic digestion, LC-MS/MS, and database searching to elucidate peptide identities and their parent proteins. Protein sequences absent from the database cannot be identified, and even if present in the database, complete sequence coverage is rarely achieved even for the most abundant proteins in the sample. Thus, sequencing of unknown proteins such as antibodies or constituents of metaproteomes remains a challenging problem. To date, there is no available method for full-length protein sequencing, independent of a reference database, in high throughput. Here, we present Database-independent Protein Sequencing, a method for unambiguous, rapid, database-independent, full-length protein sequencing. The method is a novel combination of non-enzymatic, semi-random cleavage of the protein, LC-MS/MS analysis, peptide de novo sequencing, extraction of peptide tags, and their assembly into a consensus sequence using an algorithm named "Peptide Tag Assembler." As proof-of-concept, the method was applied to samples of three known proteins representing three size classes and to a previously un-sequenced, clinically relevant monoclonal antibody. Excluding leucine/isoleucine and glutamic acid/deamidated glutamine ambiguities, end-to-end full-length de novo sequencing was achieved with 99-100% accuracy for all benchmarking proteins and the antibody light chain. Accuracy of the sequenced antibody heavy chain, including the entire variable region, was also 100%, but there was a 23-residue gap in the constant region sequence. © 2017 by The American Society for Biochemistry and Molecular Biology, Inc.

  5. Metagenomic Taxonomy-Guided Database-Searching Strategy for Improving Metaproteomic Analysis.

    PubMed

    Xiao, Jinqiu; Tanca, Alessandro; Jia, Ben; Yang, Runqing; Wang, Bo; Zhang, Yu; Li, Jing

    2018-04-06

    Metaproteomics provides a direct measure of the functional information by investigating all proteins expressed by a microbiota. However, due to the complexity and heterogeneity of microbial communities, it is very hard to construct a sequence database suitable for a metaproteomic study. Using a public database, researchers might not be able to identify proteins from poorly characterized microbial species, while a sequencing-based metagenomic database may not provide adequate coverage for all potentially expressed protein sequences. To address this challenge, we propose a metagenomic taxonomy-guided database-search strategy (MT), in which a merged database is employed, consisting of both taxonomy-guided reference protein sequences from public databases and proteins from metagenome assembly. By applying our MT strategy to a mock microbial mixture, about two times as many peptides were detected as with the metagenomic database only. According to the evaluation of the reliability of taxonomic attribution, the rate of misassignments was comparable to that obtained using an a priori matched database. We also evaluated the MT strategy with a human gut microbial sample, and we found 1.7 times as many peptides as using a standard metagenomic database. In conclusion, our MT strategy allows the construction of databases able to provide high sensitivity and precision in peptide identification in metaproteomic studies, enabling the detection of proteins from poorly characterized species within the microbiota.

  6. Merging in-silico and in vitro salivary protein complex partners using the STRING database: A tutorial.

    PubMed

    Crosara, Karla Tonelli Bicalho; Moffa, Eduardo Buozi; Xiao, Yizhi; Siqueira, Walter Luiz

    2018-01-16

    Protein-protein interaction is a common physiological mechanism for protection and actions of proteins in an organism. The identification and characterization of protein-protein interactions in different organisms is necessary to better understand their physiology and to determine their efficacy. In a previous in vitro study using mass spectrometry, we identified 43 proteins that interact with histatin 1. Six previously documented interactors were confirmed and 37 novel partners were identified. In this tutorial, we aimed to demonstrate the usefulness of the STRING database for studying protein-protein interactions. We used an in-silico approach along with the STRING database (http://string-db.org/) and successfully performed a fast simulation of a novel constructed histatin 1 protein-protein network, including both the previously known and the predicted interactors, along with our newly identified interactors. Our study highlights the advantages and importance of applying bioinformatics tools to merge in-silico tactics with experimental in vitro findings for rapid advancement of our knowledge about protein-protein interactions. Our findings also indicate that bioinformatics tools such as the STRING protein network database can help predict potential interactions between proteins and thus serve as a guide for future steps in our exploration of the Human Interactome. Our study highlights the usefulness of the STRING protein database for studying protein-protein interactions. The STRING database can collect and integrate data about known and predicted protein-protein associations from many organisms, including both direct (physical) and indirect (functional) interactions, in an easy-to-use interface. Copyright © 2017 Elsevier B.V. All rights reserved.

  7. cPath: open source software for collecting, storing, and querying biological pathways

    PubMed Central

    Cerami, Ethan G; Bader, Gary D; Gross, Benjamin E; Sander, Chris

    2006-01-01

    Background Biological pathways, including metabolic pathways, protein interaction networks, signal transduction pathways, and gene regulatory networks, are currently represented in over 220 diverse databases. These data are crucial for the study of specific biological processes, including human diseases. Standard exchange formats for pathway information, such as BioPAX, CellML, SBML and PSI-MI, enable convenient collection of this data for biological research, but mechanisms for common storage and communication are required. Results We have developed cPath, an open source database and web application for collecting, storing, and querying biological pathway data. cPath makes it easy to aggregate custom pathway data sets available in standard exchange formats from multiple databases, present pathway data to biologists via a customizable web interface, and export pathway data via a web service to third-party software, such as Cytoscape, for visualization and analysis. cPath is software only, and does not include new pathway information. Key features include: a built-in identifier mapping service for linking identical interactors and linking to external resources; built-in support for PSI-MI and BioPAX standard pathway exchange formats; a web service interface for searching and retrieving pathway data sets; and thorough documentation. The cPath software is freely available under the LGPL open source license for academic and commercial use. Conclusion cPath is a robust, scalable, modular, professional-grade software platform for collecting, storing, and querying biological pathways. It can serve as the core data handling component in information systems for pathway visualization, analysis and modeling. PMID:17101041

  8. Genevar: a database and Java application for the analysis and visualization of SNP-gene associations in eQTL studies.

    PubMed

    Yang, Tsun-Po; Beazley, Claude; Montgomery, Stephen B; Dimas, Antigone S; Gutierrez-Arcelus, Maria; Stranger, Barbara E; Deloukas, Panos; Dermitzakis, Emmanouil T

    2010-10-01

    Genevar (GENe Expression VARiation) is a database and Java tool designed to integrate multiple datasets, and provides analysis and visualization of associations between sequence variation and gene expression. Genevar allows researchers to investigate expression quantitative trait loci (eQTL) associations within a gene locus of interest in real time. The database and application can be installed on a standard computer in database mode and, in addition, on a server to share discoveries among affiliations or the broader community over the Internet via web services protocols. http://www.sanger.ac.uk/resources/software/genevar.

  9. MetaMapp: mapping and visualizing metabolomic data by integrating information from biochemical pathways and chemical and mass spectral similarity

    PubMed Central

    2012-01-01

    Background Exposure to environmental tobacco smoke (ETS) leads to higher rates of pulmonary diseases and infections in children. To study the biochemical changes that may precede lung diseases, metabolomic effects on fetal and maternal lungs and plasma from rats exposed to ETS were compared to filtered air control animals. Genome- reconstructed metabolic pathways may be used to map and interpret dysregulation in metabolic networks. However, mass spectrometry-based non-targeted metabolomics datasets often comprise many metabolites for which links to enzymatic reactions have not yet been reported. Hence, network visualizations that rely on current biochemical databases are incomplete and also fail to visualize novel, structurally unidentified metabolites. Results We present a novel approach to integrate biochemical pathway and chemical relationships to map all detected metabolites in network graphs (MetaMapp) using KEGG reactant pair database, Tanimoto chemical and NIST mass spectral similarity scores. In fetal and maternal lungs, and in maternal blood plasma from pregnant rats exposed to environmental tobacco smoke (ETS), 459 unique metabolites comprising 179 structurally identified compounds were detected by gas chromatography time of flight mass spectrometry (GC-TOF MS) and BinBase data processing. MetaMapp graphs in Cytoscape showed much clearer metabolic modularity and complete content visualization compared to conventional biochemical mapping approaches. Cytoscape visualization of differential statistics results using these graphs showed that overall, fetal lung metabolism was more impaired than lungs and blood metabolism in dams. Fetuses from ETS-exposed dams expressed lower lipid and nucleotide levels and higher amounts of energy metabolism intermediates than control animals, indicating lower biosynthetic rates of metabolites for cell division, structural proteins and lipids that are critical for in lung development. Conclusions MetaMapp graphs efficiently visualizes mass spectrometry based metabolomics datasets as network graphs in Cytoscape, and highlights metabolic alterations that can be associated with higher rate of pulmonary diseases and infections in children prenatally exposed to ETS. The MetaMapp scripts can be accessed at http://metamapp.fiehnlab.ucdavis.edu. PMID:22591066

  10. HASP server: a database and structural visualization platform for comparative models of influenza A hemagglutinin proteins.

    PubMed

    Ambroggio, Xavier I; Dommer, Jennifer; Gopalan, Vivek; Dunham, Eleca J; Taubenberger, Jeffery K; Hurt, Darrell E

    2013-06-18

    Influenza A viruses possess RNA genomes that mutate frequently in response to immune pressures. The mutations in the hemagglutinin genes are particularly significant, as the hemagglutinin proteins mediate attachment and fusion to host cells, thereby influencing viral pathogenicity and species specificity. Large-scale influenza A genome sequencing efforts have been ongoing to understand past epidemics and pandemics and anticipate future outbreaks. Sequencing efforts thus far have generated nearly 9,000 distinct hemagglutinin amino acid sequences. Comparative models for all publicly available influenza A hemagglutinin protein sequences (8,769 to date) were generated using the Rosetta modeling suite. The C-alpha root mean square deviations between a randomly chosen test set of models and their crystallographic templates were less than 2 Å, suggesting that the modeling protocols yielded high-quality results. The models were compiled into an online resource, the Hemagglutinin Structure Prediction (HASP) server. The HASP server was designed as a scientific tool for researchers to visualize hemagglutinin protein sequences of interest in a three-dimensional context. With a built-in molecular viewer, hemagglutinin models can be compared side-by-side and navigated by a corresponding sequence alignment. The models and alignments can be downloaded for offline use and further analysis. The modeling protocols used in the HASP server scale well for large amounts of sequences and will keep pace with expanded sequencing efforts. The conservative approach to modeling and the intuitive search and visualization interfaces allow researchers to quickly analyze hemagglutinin sequences of interest in the context of the most highly related experimental structures, and allow them to directly compare hemagglutinin sequences to each other simultaneously in their two- and three-dimensional contexts. The models and methodology have shown utility in current research efforts and the ongoing aim of the HASP server is to continue to accelerate influenza A research and have a positive impact on global public health.

  11. sRNAdb: A small non-coding RNA database for gram-positive bacteria

    PubMed Central

    2012-01-01

    Background The class of small non-coding RNA molecules (sRNA) regulates gene expression by different mechanisms and enables bacteria to mount a physiological response due to adaptation to the environment or infection. Over the last decades the number of sRNAs has been increasing rapidly. Several databases like Rfam or fRNAdb were extended to include sRNAs as a class of its own. Furthermore new specialized databases like sRNAMap (gram-negative bacteria only) and sRNATarBase (target prediction) were established. To the best of the authors’ knowledge no database focusing on sRNAs from gram-positive bacteria is publicly available so far. Description In order to understand sRNA’s functional and phylogenetic relationships we have developed sRNAdb and provide tools for data analysis and visualization. The data compiled in our database is assembled from experiments as well as from bioinformatics analyses. The software enables comparison and visualization of gene loci surrounding the sRNAs of interest. To accomplish this, we use a client–server based approach. Offline versions of the database including analyses and visualization tools can easily be installed locally on the user’s computer. This feature facilitates customized local addition of unpublished sRNA candidates and related information such as promoters or terminators using tab-delimited files. Conclusion sRNAdb allows a user-friendly and comprehensive comparative analysis of sRNAs from available sequenced gram-positive prokaryotic replicons. Offline versions including analysis and visualization tools facilitate complex user specific bioinformatics analyses. PMID:22883983

  12. In search of the emotional face: anger versus happiness superiority in visual search.

    PubMed

    Savage, Ruth A; Lipp, Ottmar V; Craig, Belinda M; Becker, Stefanie I; Horstmann, Gernot

    2013-08-01

    Previous research has provided inconsistent results regarding visual search for emotional faces, yielding evidence for either anger superiority (i.e., more efficient search for angry faces) or happiness superiority effects (i.e., more efficient search for happy faces), suggesting that these results do not reflect on emotional expression, but on emotion (un-)related low-level perceptual features. The present study investigated possible factors mediating anger/happiness superiority effects; specifically search strategy (fixed vs. variable target search; Experiment 1), stimulus choice (Nimstim database vs. Ekman & Friesen database; Experiments 1 and 2), and emotional intensity (Experiment 3 and 3a). Angry faces were found faster than happy faces regardless of search strategy using faces from the Nimstim database (Experiment 1). By contrast, a happiness superiority effect was evident in Experiment 2 when using faces from the Ekman and Friesen database. Experiment 3 employed angry, happy, and exuberant expressions (Nimstim database) and yielded anger and happiness superiority effects, respectively, highlighting the importance of the choice of stimulus materials. Ratings of the stimulus materials collected in Experiment 3a indicate that differences in perceived emotional intensity, pleasantness, or arousal do not account for differences in search efficiency. Across three studies, the current investigation indicates that prior reports of anger or happiness superiority effects in visual search are likely to reflect on low-level visual features associated with the stimulus materials used, rather than on emotion. PsycINFO Database Record (c) 2013 APA, all rights reserved.

  13. EpitopeViewer: a Java application for the visualization and analysis of immune epitopes in the Immune Epitope Database and Analysis Resource (IEDB).

    PubMed

    Beaver, John E; Bourne, Philip E; Ponomarenko, Julia V

    2007-02-21

    Structural information about epitopes, particularly the three-dimensional (3D) structures of antigens in complex with immune receptors, presents a valuable source of data for immunology. This information is available in the Protein Data Bank (PDB) and provided in curated form by the Immune Epitope Database and Analysis Resource (IEDB). With continued growth in these data and the importance in understanding molecular level interactions of immunological interest there is a need for new specialized molecular visualization and analysis tools. The EpitopeViewer is a platform-independent Java application for the visualization of the three-dimensional structure and sequence of epitopes and analyses of their interactions with antigen-specific receptors of the immune system (antibodies, T cell receptors and MHC molecules). The viewer renders both 3D views and two-dimensional plots of intermolecular interactions between the antigen and receptor(s) by reading curated data from the IEDB and/or calculated on-the-fly from atom coordinates from the PDB. The 3D views and associated interactions can be saved for future use and publication. The EpitopeViewer can be accessed from the IEDB Web site http://www.immuneepitope.org through the quick link 'Browse Records by 3D Structure.' The EpitopeViewer is designed and been tested for use by immunologists with little or no training in molecular graphics. The EpitopeViewer can be launched from most popular Web browsers without user intervention. A Java Runtime Environment (RJE) 1.4.2 or higher is required.

  14. The Yak genome database: an integrative database for studying yak biology and high-altitude adaption

    PubMed Central

    2012-01-01

    Background The yak (Bos grunniens) is a long-haired bovine that lives at high altitudes and is an important source of milk, meat, fiber and fuel. The recent sequencing, assembly and annotation of its genome are expected to further our understanding of the means by which it has adapted to life at high altitudes and its ecologically important traits. Description The Yak Genome Database (YGD) is an internet-based resource that provides access to genomic sequence data and predicted functional information concerning the genes and proteins of Bos grunniens. The curated data stored in the YGD includes genome sequences, predicted genes and associated annotations, non-coding RNA sequences, transposable elements, single nucleotide variants, and three-way whole-genome alignments between human, cattle and yak. YGD offers useful searching and data mining tools, including the ability to search for genes by name or using function keywords as well as GBrowse genome browsers and/or BLAST servers, which can be used to visualize genome regions and identify similar sequences. Sequence data from the YGD can also be downloaded to perform local searches. Conclusions A new yak genome database (YGD) has been developed to facilitate studies on high-altitude adaption and bovine genomics. The database will be continuously updated to incorporate new information such as transcriptome data and population resequencing data. The YGD can be accessed at http://me.lzu.edu.cn/yak. PMID:23134687

  15. atBioNet--an integrated network analysis tool for genomics and biomarker discovery.

    PubMed

    Ding, Yijun; Chen, Minjun; Liu, Zhichao; Ding, Don; Ye, Yanbin; Zhang, Min; Kelly, Reagan; Guo, Li; Su, Zhenqiang; Harris, Stephen C; Qian, Feng; Ge, Weigong; Fang, Hong; Xu, Xiaowei; Tong, Weida

    2012-07-20

    Large amounts of mammalian protein-protein interaction (PPI) data have been generated and are available for public use. From a systems biology perspective, Proteins/genes interactions encode the key mechanisms distinguishing disease and health, and such mechanisms can be uncovered through network analysis. An effective network analysis tool should integrate different content-specific PPI databases into a comprehensive network format with a user-friendly platform to identify key functional modules/pathways and the underlying mechanisms of disease and toxicity. atBioNet integrates seven publicly available PPI databases into a network-specific knowledge base. Knowledge expansion is achieved by expanding a user supplied proteins/genes list with interactions from its integrated PPI network. The statistically significant functional modules are determined by applying a fast network-clustering algorithm (SCAN: a Structural Clustering Algorithm for Networks). The functional modules can be visualized either separately or together in the context of the whole network. Integration of pathway information enables enrichment analysis and assessment of the biological function of modules. Three case studies are presented using publicly available disease gene signatures as a basis to discover new biomarkers for acute leukemia, systemic lupus erythematosus, and breast cancer. The results demonstrated that atBioNet can not only identify functional modules and pathways related to the studied diseases, but this information can also be used to hypothesize novel biomarkers for future analysis. atBioNet is a free web-based network analysis tool that provides a systematic insight into proteins/genes interactions through examining significant functional modules. The identified functional modules are useful for determining underlying mechanisms of disease and biomarker discovery. It can be accessed at: http://www.fda.gov/ScienceResearch/BioinformaticsTools/ucm285284.htm.

  16. Lightweight genome viewer: portable software for browsing genomics data in its chromosomal context

    PubMed Central

    Faith, Jeremiah J; Olson, Andrew J; Gardner, Timothy S; Sachidanandam, Ravi

    2007-01-01

    Background Lightweight genome viewer (lwgv) is a web-based tool for visualization of sequence annotations in their chromosomal context. It performs most of the functions of larger genome browsers, while relying on standard flat-file formats and bypassing the database needs of most visualization tools. Visualization as an aide to discovery requires display of novel data in conjunction with static annotations in their chromosomal context. With database-based systems, displaying dynamic results requires temporary tables that need to be tracked for removal. Results lwgv simplifies the visualization of user-generated results on a local computer. The dynamic results of these analyses are written to transient files, which can import static content from a more permanent file. lwgv is currently used in many different applications, from whole genome browsers to single-gene RNAi design visualization, demonstrating its applicability in a large variety of contexts and scales. Conclusion lwgv provides a lightweight alternative to large genome browsers for visualizing biological annotations and dynamic analyses in their chromosomal context. It is particularly suited for applications ranging from short sequences to medium-sized genomes when the creation and maintenance of a large software and database infrastructure is not necessary or desired. PMID:17877794

  17. Nonlinear dimensionality reduction methods for synthetic biology biobricks' visualization.

    PubMed

    Yang, Jiaoyun; Wang, Haipeng; Ding, Huitong; An, Ning; Alterovitz, Gil

    2017-01-19

    Visualizing data by dimensionality reduction is an important strategy in Bioinformatics, which could help to discover hidden data properties and detect data quality issues, e.g. data noise, inappropriately labeled data, etc. As crowdsourcing-based synthetic biology databases face similar data quality issues, we propose to visualize biobricks to tackle them. However, existing dimensionality reduction methods could not be directly applied on biobricks datasets. Hereby, we use normalized edit distance to enhance dimensionality reduction methods, including Isomap and Laplacian Eigenmaps. By extracting biobricks from synthetic biology database Registry of Standard Biological Parts, six combinations of various types of biobricks are tested. The visualization graphs illustrate discriminated biobricks and inappropriately labeled biobricks. Clustering algorithm K-means is adopted to quantify the reduction results. The average clustering accuracy for Isomap and Laplacian Eigenmaps are 0.857 and 0.844, respectively. Besides, Laplacian Eigenmaps is 5 times faster than Isomap, and its visualization graph is more concentrated to discriminate biobricks. By combining normalized edit distance with Isomap and Laplacian Eigenmaps, synthetic biology biobircks are successfully visualized in two dimensional space. Various types of biobricks could be discriminated and inappropriately labeled biobricks could be determined, which could help to assess crowdsourcing-based synthetic biology databases' quality, and make biobricks selection.

  18. Lightweight genome viewer: portable software for browsing genomics data in its chromosomal context.

    PubMed

    Faith, Jeremiah J; Olson, Andrew J; Gardner, Timothy S; Sachidanandam, Ravi

    2007-09-18

    Lightweight genome viewer (lwgv) is a web-based tool for visualization of sequence annotations in their chromosomal context. It performs most of the functions of larger genome browsers, while relying on standard flat-file formats and bypassing the database needs of most visualization tools. Visualization as an aide to discovery requires display of novel data in conjunction with static annotations in their chromosomal context. With database-based systems, displaying dynamic results requires temporary tables that need to be tracked for removal. lwgv simplifies the visualization of user-generated results on a local computer. The dynamic results of these analyses are written to transient files, which can import static content from a more permanent file. lwgv is currently used in many different applications, from whole genome browsers to single-gene RNAi design visualization, demonstrating its applicability in a large variety of contexts and scales. lwgv provides a lightweight alternative to large genome browsers for visualizing biological annotations and dynamic analyses in their chromosomal context. It is particularly suited for applications ranging from short sequences to medium-sized genomes when the creation and maintenance of a large software and database infrastructure is not necessary or desired.

  19. Visualization and manipulating the image of a formal data structure (FDS)-based database

    NASA Astrophysics Data System (ADS)

    Verdiesen, Franc; de Hoop, Sylvia; Molenaar, Martien

    1994-08-01

    A vector map is a terrain representation with a vector-structured geometry. Molenaar formulated an object-oriented formal data structure for 3D single valued vector maps. This FDS is implemented in a database (Oracle). In this study we describe a methodology for visualizing a FDS-based database and manipulating the image. A data set retrieved by querying the database is converted into an import file for a drawing application. An objective of this study is that an end-user can alter and add terrain objects in the image. The drawing application creates an export file, that is compared with the import file. Differences between these files result in updating the database which involves checks on consistency. In this study Autocad is used for visualizing and manipulating the image of the data set. A computer program has been written for the data exchange and conversion between Oracle and Autocad. The data structure of the FDS is compared to the data structure of Autocad and the data of the FDS is converted into the structure of Autocad equal to the FDS.

  20. Using the clustered circular layout as an informative method for visualizing protein-protein interaction networks.

    PubMed

    Fung, David C Y; Wilkins, Marc R; Hart, David; Hong, Seok-Hee

    2010-07-01

    The force-directed layout is commonly used in computer-generated visualizations of protein-protein interaction networks. While it is good for providing a visual outline of the protein complexes and their interactions, it has two limitations when used as a visual analysis method. The first is poor reproducibility. Repeated running of the algorithm does not necessarily generate the same layout, therefore, demanding cognitive readaptation on the investigator's part. The second limitation is that it does not explicitly display complementary biological information, e.g. Gene Ontology, other than the protein names or gene symbols. Here, we present an alternative layout called the clustered circular layout. Using the human DNA replication protein-protein interaction network as a case study, we compared the two network layouts for their merits and limitations in supporting visual analysis.

  1. The Protein Disease Database of human body fluids: II. Computer methods and data issues.

    PubMed

    Lemkin, P F; Orr, G A; Goldstein, M P; Creed, G J; Myrick, J E; Merril, C R

    1995-01-01

    The Protein Disease Database (PDD) is a relational database of proteins and diseases. With this database it is possible to screen for quantitative protein abnormalities associated with disease states. These quantitative relationships use data drawn from the peer-reviewed biomedical literature. Assays may also include those observed in high-resolution electrophoretic gels that offer the potential to quantitate many proteins in a single test as well as data gathered by enzymatic or immunologic assays. We are using the Internet World Wide Web (WWW) and the Web browser paradigm as an access method for wide distribution and querying of the Protein Disease Database. The WWW hypertext transfer protocol and its Common Gateway Interface make it possible to build powerful graphical user interfaces that can support easy-to-use data retrieval using query specification forms or images. The details of these interactions are totally transparent to the users of these forms. Using a client-server SQL relational database, user query access, initial data entry and database maintenance are all performed over the Internet with a Web browser. We discuss the underlying design issues, mapping mechanisms and assumptions that we used in constructing the system, data entry, access to the database server, security, and synthesis of derived two-dimensional gel image maps and hypertext documents resulting from SQL database searches.

  2. Domain fusion analysis by applying relational algebra to protein sequence and domain databases

    PubMed Central

    Truong, Kevin; Ikura, Mitsuhiko

    2003-01-01

    Background Domain fusion analysis is a useful method to predict functionally linked proteins that may be involved in direct protein-protein interactions or in the same metabolic or signaling pathway. As separate domain databases like BLOCKS, PROSITE, Pfam, SMART, PRINTS-S, ProDom, TIGRFAMs, and amalgamated domain databases like InterPro continue to grow in size and quality, a computational method to perform domain fusion analysis that leverages on these efforts will become increasingly powerful. Results This paper proposes a computational method employing relational algebra to find domain fusions in protein sequence databases. The feasibility of this method was illustrated on the SWISS-PROT+TrEMBL sequence database using domain predictions from the Pfam HMM (hidden Markov model) database. We identified 235 and 189 putative functionally linked protein partners in H. sapiens and S. cerevisiae, respectively. From scientific literature, we were able to confirm many of these functional linkages, while the remainder offer testable experimental hypothesis. Results can be viewed at . Conclusion As the analysis can be computed quickly on any relational database that supports standard SQL (structured query language), it can be dynamically updated along with the sequence and domain databases, thereby improving the quality of predictions over time. PMID:12734020

  3. Computer systems and methods for the query and visualization of multidimensional databases

    DOEpatents

    Stolte, Chris; Tang, Diane L; Hanrahan, Patrick

    2014-04-29

    In response to a user request, a computer generates a graphical user interface on a computer display. A schema information region of the graphical user interface includes multiple operand names, each operand name associated with one or more fields of a multi-dimensional database. A data visualization region of the graphical user interface includes multiple shelves. Upon detecting a user selection of the operand names and a user request to associate each user-selected operand name with a respective shelf in the data visualization region, the computer generates a visual table in the data visualization region in accordance with the associations between the operand names and the corresponding shelves. The visual table includes a plurality of panes, each pane having at least one axis defined based on data for the fields associated with a respective operand name.

  4. Computer systems and methods for the query and visualization of multidimensional databases

    DOEpatents

    Stolte, Chris [Palo Alto, CA; Tang, Diane L [Palo Alto, CA; Hanrahan, Patrick [Portola Valley, CA

    2011-02-01

    In response to a user request, a computer generates a graphical user interface on a computer display. A schema information region of the graphical user interface includes multiple operand names, each operand name associated with one or more fields of a multi-dimensional database. A data visualization region of the graphical user interface includes multiple shelves. Upon detecting a user selection of the operand names and a user request to associate each user-selected operand name with a respective shelf in the data visualization region, the computer generates a visual table in the data visualization region in accordance with the associations between the operand names and the corresponding shelves. The visual table includes a plurality of panes, each pane having at least one axis defined based on data for the fields associated with a respective operand name.

  5. Computer systems and methods for the query and visualization of multidimensional databases

    DOEpatents

    Stolte, Chris [Palo Alto, CA; Tang, Diane L [Palo Alto, CA; Hanrahan, Patrick [Portola Valley, CA

    2012-03-20

    In response to a user request, a computer generates a graphical user interface on a computer display. A schema information region of the graphical user interface includes multiple operand names, each operand name associated with one or more fields of a multi-dimensional database. A data visualization region of the graphical user interface includes multiple shelves. Upon detecting a user selection of the operand names and a user request to associate each user-selected operand name with a respective shelf in the data visualization region, the computer generates a visual table in the data visualization region in accordance with the associations between the operand names and the corresponding shelves. The visual table includes a plurality of panes, each pane having at least one axis defined based on data for the fields associated with a respective operand name.

  6. EuPathDB: the eukaryotic pathogen genomics database resource

    PubMed Central

    Aurrecoechea, Cristina; Barreto, Ana; Basenko, Evelina Y.; Brestelli, John; Brunk, Brian P.; Cade, Shon; Crouch, Kathryn; Doherty, Ryan; Falke, Dave; Fischer, Steve; Gajria, Bindu; Harb, Omar S.; Heiges, Mark; Hertz-Fowler, Christiane; Hu, Sufen; Iodice, John; Kissinger, Jessica C.; Lawrence, Cris; Li, Wei; Pinney, Deborah F.; Pulman, Jane A.; Roos, David S.; Shanmugasundram, Achchuthan; Silva-Franco, Fatima; Steinbiss, Sascha; Stoeckert, Christian J.; Spruill, Drew; Wang, Haiming; Warrenfeltz, Susanne; Zheng, Jie

    2017-01-01

    The Eukaryotic Pathogen Genomics Database Resource (EuPathDB, http://eupathdb.org) is a collection of databases covering 170+ eukaryotic pathogens (protists & fungi), along with relevant free-living and non-pathogenic species, and select pathogen hosts. To facilitate the discovery of meaningful biological relationships, the databases couple preconfigured searches with visualization and analysis tools for comprehensive data mining via intuitive graphical interfaces and APIs. All data are analyzed with the same workflows, including creation of gene orthology profiles, so data are easily compared across data sets, data types and organisms. EuPathDB is updated with numerous new analysis tools, features, data sets and data types. New tools include GO, metabolic pathway and word enrichment analyses plus an online workspace for analysis of personal, non-public, large-scale data. Expanded data content is mostly genomic and functional genomic data while new data types include protein microarray, metabolic pathways, compounds, quantitative proteomics, copy number variation, and polysomal transcriptomics. New features include consistent categorization of searches, data sets and genome browser tracks; redesigned gene pages; effective integration of alternative transcripts; and a EuPathDB Galaxy instance for private analyses of a user's data. Forthcoming upgrades include user workspaces for private integration of data with existing EuPathDB data and improved integration and presentation of host–pathogen interactions. PMID:27903906

  7. Proteomic analysis of secreted proteins by human bronchial epithelial cells in response to cadmium toxicity.

    PubMed

    Chen, De-Ju; Xu, Yan-Ming; Zheng, Wei; Huang, Dong-Yang; Wong, Wing-Yan; Tai, William Chi-Shing; Cho, Yong-Yeon; Lau, Andy T Y

    2015-09-01

    For years, many studies have been conducted to investigate the intracellular response of cells challenged with toxic metal(s), yet, the corresponding secretome responses, especially in human lung cells, are largely unexplored. Here, we provide a secretome analysis of human bronchial epithelial cells (BEAS-2B) treated with cadmium chloride (CdCl2 ), with the aim of identifying secreted proteins in response to Cd toxicity. Proteins from control and spent media were separated by two-dimensional electrophoresis and visualized by silver staining. Differentially-secreted proteins were identified by MALDI-TOF-MS analysis and database searching. We characterized, for the first time, the extracellular proteome changes of BEAS-2B dosed with Cd. Our results unveiled that Cd treatment led to the marked upregulation of molecular chaperones, antioxidant enzymes, enzymes associated with glutathione metabolic process, proteins involved in cellular energy metabolism, as well as tumor-suppressors. Pretreatment of cells with the thiol antioxidant glutathione before Cd treatment effectively abrogated the secretion of these proteins and prevented cell death. Taken together, our results demonstrate that Cd causes oxidative stress-induced cytotoxicity; and the differentially-secreted protein signatures could be considered as targets for potential use as extracellular biomarkers upon Cd exposure. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  8. Protein Expression Profile of Twenty-Week-Old Diabetic db/db and Non-Diabetic Mice Livers: A Proteomic and Bioinformatic Analysis.

    PubMed

    Guzmán-Flores, Juan Manuel; Flores-Pérez, Elsa Cristina; Hernández-Ortiz, Magdalena; Vargas-Ortiz, Katya; Ramírez-Emiliano, Joel; Encarnación-Guevara, Sergio; Pérez-Vázquez, Victoriano

    2018-06-01

    Type 2 diabetes mellitus is characterized by insulin resistance in the liver. Insulin is not only involved in carbohydrate metabolism, it also regulates protein synthesis. This work describes the expression of proteins in the liver of a diabetic mouse and identifies the metabolic pathways involved. Twenty-week-old diabetic db/db mice were hepatectomized, after which proteins were separated by 2D-Polyacrylamide Gel Electrophoresis (2D-PAGE). Spots varying in intensity were analyzed using mass spectrometry, and biological function was assigned by the Database for Annotation, Visualization and Integrated Discovery (DAVID) software. A differential expression of 26 proteins was identified; among these were arginase-1, pyruvate carboxylase, peroxiredoxin-1, regucalcin, and sorbitol dehydrogenase. Bioinformatics analysis indicated that many of these proteins are mitochondrial and participate in metabolic pathways, such as the citrate cycle, the fructose and mannose metabolism, and glycolysis or gluconeogenesis. In addition, these proteins are related to oxidation⁻reduction reactions and molecular function of vitamin binding and amino acid metabolism. In conclusion, the proteomic profile of the liver of diabetic mouse db/db exhibited mainly alterations in the metabolism of carbohydrates and nitrogen. These differences illustrate the heterogeneity of diabetes in its different stages and under different conditions and highlights the need to improve treatments for this disease.

  9. TOPDOM: database of conservatively located domains and motifs in proteins.

    PubMed

    Varga, Julia; Dobson, László; Tusnády, Gábor E

    2016-09-01

    The TOPDOM database-originally created as a collection of domains and motifs located consistently on the same side of the membranes in α-helical transmembrane proteins-has been updated and extended by taking into consideration consistently localized domains and motifs in globular proteins, too. By taking advantage of the recently developed CCTOP algorithm to determine the type of a protein and predict topology in case of transmembrane proteins, and by applying a thorough search for domains and motifs as well as utilizing the most up-to-date version of all source databases, we managed to reach a 6-fold increase in the size of the whole database and a 2-fold increase in the number of transmembrane proteins. TOPDOM database is available at http://topdom.enzim.hu The webpage utilizes the common Apache, PHP5 and MySQL software to provide the user interface for accessing and searching the database. The database itself is generated on a high performance computer. tusnady.gabor@ttk.mta.hu Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

  10. CellNetVis: a web tool for visualization of biological networks using force-directed layout constrained by cellular components.

    PubMed

    Heberle, Henry; Carazzolle, Marcelo Falsarella; Telles, Guilherme P; Meirelles, Gabriela Vaz; Minghim, Rosane

    2017-09-13

    The advent of "omics" science has brought new perspectives in contemporary biology through the high-throughput analyses of molecular interactions, providing new clues in protein/gene function and in the organization of biological pathways. Biomolecular interaction networks, or graphs, are simple abstract representations where the components of a cell (e.g. proteins, metabolites etc.) are represented by nodes and their interactions are represented by edges. An appropriate visualization of data is crucial for understanding such networks, since pathways are related to functions that occur in specific regions of the cell. The force-directed layout is an important and widely used technique to draw networks according to their topologies. Placing the networks into cellular compartments helps to quickly identify where network elements are located and, more specifically, concentrated. Currently, only a few tools provide the capability of visually organizing networks by cellular compartments. Most of them cannot handle large and dense networks. Even for small networks with hundreds of nodes the available tools are not able to reposition the network while the user is interacting, limiting the visual exploration capability. Here we propose CellNetVis, a web tool to easily display biological networks in a cell diagram employing a constrained force-directed layout algorithm. The tool is freely available and open-source. It was originally designed for networks generated by the Integrated Interactome System and can be used with networks from others databases, like InnateDB. CellNetVis has demonstrated to be applicable for dynamic investigation of complex networks over a consistent representation of a cell on the Web, with capabilities not matched elsewhere.

  11. COMBREX-DB: an experiment centered database of protein function: knowledge, predictions and knowledge gaps.

    PubMed

    Chang, Yi-Chien; Hu, Zhenjun; Rachlin, John; Anton, Brian P; Kasif, Simon; Roberts, Richard J; Steffen, Martin

    2016-01-04

    The COMBREX database (COMBREX-DB; combrex.bu.edu) is an online repository of information related to (i) experimentally determined protein function, (ii) predicted protein function, (iii) relationships among proteins of unknown function and various types of experimental data, including molecular function, protein structure, and associated phenotypes. The database was created as part of the novel COMBREX (COMputational BRidges to EXperiments) effort aimed at accelerating the rate of gene function validation. It currently holds information on ∼ 3.3 million known and predicted proteins from over 1000 completely sequenced bacterial and archaeal genomes. The database also contains a prototype recommendation system for helping users identify those proteins whose experimental determination of function would be most informative for predicting function for other proteins within protein families. The emphasis on documenting experimental evidence for function predictions, and the prioritization of uncharacterized proteins for experimental testing distinguish COMBREX from other publicly available microbial genomics resources. This article describes updates to COMBREX-DB since an initial description in the 2011 NAR Database Issue. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  12. The HUPO PSI's molecular interaction format--a community standard for the representation of protein interaction data.

    PubMed

    Hermjakob, Henning; Montecchi-Palazzi, Luisa; Bader, Gary; Wojcik, Jérôme; Salwinski, Lukasz; Ceol, Arnaud; Moore, Susan; Orchard, Sandra; Sarkans, Ugis; von Mering, Christian; Roechert, Bernd; Poux, Sylvain; Jung, Eva; Mersch, Henning; Kersey, Paul; Lappe, Michael; Li, Yixue; Zeng, Rong; Rana, Debashis; Nikolski, Macha; Husi, Holger; Brun, Christine; Shanker, K; Grant, Seth G N; Sander, Chris; Bork, Peer; Zhu, Weimin; Pandey, Akhilesh; Brazma, Alvis; Jacq, Bernard; Vidal, Marc; Sherman, David; Legrain, Pierre; Cesareni, Gianni; Xenarios, Ioannis; Eisenberg, David; Steipe, Boris; Hogue, Chris; Apweiler, Rolf

    2004-02-01

    A major goal of proteomics is the complete description of the protein interaction network underlying cell physiology. A large number of small scale and, more recently, large-scale experiments have contributed to expanding our understanding of the nature of the interaction network. However, the necessary data integration across experiments is currently hampered by the fragmentation of publicly available protein interaction data, which exists in different formats in databases, on authors' websites or sometimes only in print publications. Here, we propose a community standard data model for the representation and exchange of protein interaction data. This data model has been jointly developed by members of the Proteomics Standards Initiative (PSI), a work group of the Human Proteome Organization (HUPO), and is supported by major protein interaction data providers, in particular the Biomolecular Interaction Network Database (BIND), Cellzome (Heidelberg, Germany), the Database of Interacting Proteins (DIP), Dana Farber Cancer Institute (Boston, MA, USA), the Human Protein Reference Database (HPRD), Hybrigenics (Paris, France), the European Bioinformatics Institute's (EMBL-EBI, Hinxton, UK) IntAct, the Molecular Interactions (MINT, Rome, Italy) database, the Protein-Protein Interaction Database (PPID, Edinburgh, UK) and the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING, EMBL, Heidelberg, Germany).

  13. iCAVE: an open source tool for visualizing biomolecular networks in 3D, stereoscopic 3D and immersive 3D

    PubMed Central

    Liluashvili, Vaja; Kalayci, Selim; Fluder, Eugene; Wilson, Manda; Gabow, Aaron

    2017-01-01

    Abstract Visualizations of biomolecular networks assist in systems-level data exploration in many cellular processes. Data generated from high-throughput experiments increasingly inform these networks, yet current tools do not adequately scale with concomitant increase in their size and complexity. We present an open source software platform, interactome-CAVE (iCAVE), for visualizing large and complex biomolecular interaction networks in 3D. Users can explore networks (i) in 3D using a desktop, (ii) in stereoscopic 3D using 3D-vision glasses and a desktop, or (iii) in immersive 3D within a CAVE environment. iCAVE introduces 3D extensions of known 2D network layout, clustering, and edge-bundling algorithms, as well as new 3D network layout algorithms. Furthermore, users can simultaneously query several built-in databases within iCAVE for network generation or visualize their own networks (e.g., disease, drug, protein, metabolite). iCAVE has modular structure that allows rapid development by addition of algorithms, datasets, or features without affecting other parts of the code. Overall, iCAVE is the first freely available open source tool that enables 3D (optionally stereoscopic or immersive) visualizations of complex, dense, or multi-layered biomolecular networks. While primarily designed for researchers utilizing biomolecular networks, iCAVE can assist researchers in any field. PMID:28814063

  14. iCAVE: an open source tool for visualizing biomolecular networks in 3D, stereoscopic 3D and immersive 3D.

    PubMed

    Liluashvili, Vaja; Kalayci, Selim; Fluder, Eugene; Wilson, Manda; Gabow, Aaron; Gümüs, Zeynep H

    2017-08-01

    Visualizations of biomolecular networks assist in systems-level data exploration in many cellular processes. Data generated from high-throughput experiments increasingly inform these networks, yet current tools do not adequately scale with concomitant increase in their size and complexity. We present an open source software platform, interactome-CAVE (iCAVE), for visualizing large and complex biomolecular interaction networks in 3D. Users can explore networks (i) in 3D using a desktop, (ii) in stereoscopic 3D using 3D-vision glasses and a desktop, or (iii) in immersive 3D within a CAVE environment. iCAVE introduces 3D extensions of known 2D network layout, clustering, and edge-bundling algorithms, as well as new 3D network layout algorithms. Furthermore, users can simultaneously query several built-in databases within iCAVE for network generation or visualize their own networks (e.g., disease, drug, protein, metabolite). iCAVE has modular structure that allows rapid development by addition of algorithms, datasets, or features without affecting other parts of the code. Overall, iCAVE is the first freely available open source tool that enables 3D (optionally stereoscopic or immersive) visualizations of complex, dense, or multi-layered biomolecular networks. While primarily designed for researchers utilizing biomolecular networks, iCAVE can assist researchers in any field. © The Authors 2017. Published by Oxford University Press.

  15. PATIKAweb: a Web interface for analyzing biological pathways through advanced querying and visualization.

    PubMed

    Dogrusoz, U; Erson, E Z; Giral, E; Demir, E; Babur, O; Cetintas, A; Colak, R

    2006-02-01

    Patikaweb provides a Web interface for retrieving and analyzing biological pathways in the Patika database, which contains data integrated from various prominent public pathway databases. It features a user-friendly interface, dynamic visualization and automated layout, advanced graph-theoretic queries for extracting biologically important phenomena, local persistence capability and exporting facilities to various pathway exchange formats.

  16. ChemMaps: Towards an approach for visualizing the chemical space based on adaptive satellite compounds

    PubMed Central

    Naveja, J. Jesús; Medina-Franco, José L.

    2017-01-01

    We present a novel approach called ChemMaps for visualizing chemical space based on the similarity matrix of compound datasets generated with molecular fingerprints’ similarity. The method uses a ‘satellites’ approach, where satellites are, in principle, molecules whose similarity to the rest of the molecules in the database provides sufficient information for generating a visualization of the chemical space. Such an approach could help make chemical space visualizations more efficient. We hereby describe a proof-of-principle application of the method to various databases that have different diversity measures. Unsurprisingly, we found the method works better with databases that have low 2D diversity. 3D diversity played a secondary role, although it seems to be more relevant as 2D diversity increases. For less diverse datasets, taking as few as 25% satellites seems to be sufficient for a fair depiction of the chemical space. We propose to iteratively increase the satellites number by a factor of 5% relative to the whole database, and stop when the new and the prior chemical space correlate highly. This Research Note represents a first exploratory step, prior to the full application of this method for several datasets. PMID:28794856

  17. ChemMaps: Towards an approach for visualizing the chemical space based on adaptive satellite compounds.

    PubMed

    Naveja, J Jesús; Medina-Franco, José L

    2017-01-01

    We present a novel approach called ChemMaps for visualizing chemical space based on the similarity matrix of compound datasets generated with molecular fingerprints' similarity. The method uses a 'satellites' approach, where satellites are, in principle, molecules whose similarity to the rest of the molecules in the database provides sufficient information for generating a visualization of the chemical space. Such an approach could help make chemical space visualizations more efficient. We hereby describe a proof-of-principle application of the method to various databases that have different diversity measures. Unsurprisingly, we found the method works better with databases that have low 2D diversity. 3D diversity played a secondary role, although it seems to be more relevant as 2D diversity increases. For less diverse datasets, taking as few as 25% satellites seems to be sufficient for a fair depiction of the chemical space. We propose to iteratively increase the satellites number by a factor of 5% relative to the whole database, and stop when the new and the prior chemical space correlate highly. This Research Note represents a first exploratory step, prior to the full application of this method for several datasets.

  18. MutationAligner: a resource of recurrent mutation hotspots in protein domains in cancer.

    PubMed

    Gauthier, Nicholas Paul; Reznik, Ed; Gao, Jianjiong; Sumer, Selcuk Onur; Schultz, Nikolaus; Sander, Chris; Miller, Martin L

    2016-01-04

    The MutationAligner web resource, available at http://www.mutationaligner.org, enables discovery and exploration of somatic mutation hotspots identified in protein domains in currently (mid-2015) more than 5000 cancer patient samples across 22 different tumor types. Using multiple sequence alignments of protein domains in the human genome, we extend the principle of recurrence analysis by aggregating mutations in homologous positions across sets of paralogous genes. Protein domain analysis enhances the statistical power to detect cancer-relevant mutations and links mutations to the specific biological functions encoded in domains. We illustrate how the MutationAligner database and interactive web tool can be used to explore, visualize and analyze mutation hotspots in protein domains across genes and tumor types. We believe that MutationAligner will be an important resource for the cancer research community by providing detailed clues for the functional importance of particular mutations, as well as for the design of functional genomics experiments and for decision support in precision medicine. MutationAligner is slated to be periodically updated to incorporate additional analyses and new data from cancer genomics projects. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  19. Unraveling the Complexities of Life Sciences Data.

    PubMed

    Higdon, Roger; Haynes, Winston; Stanberry, Larissa; Stewart, Elizabeth; Yandl, Gregory; Howard, Chris; Broomall, William; Kolker, Natali; Kolker, Eugene

    2013-03-01

    The life sciences have entered into the realm of big data and data-enabled science, where data can either empower or overwhelm. These data bring the challenges of the 5 Vs of big data: volume, veracity, velocity, variety, and value. Both independently and through our involvement with DELSA Global (Data-Enabled Life Sciences Alliance, DELSAglobal.org), the Kolker Lab ( kolkerlab.org ) is creating partnerships that identify data challenges and solve community needs. We specialize in solutions to complex biological data challenges, as exemplified by the community resource of MOPED (Model Organism Protein Expression Database, MOPED.proteinspire.org ) and the analysis pipeline of SPIRE (Systematic Protein Investigative Research Environment, PROTEINSPIRE.org ). Our collaborative work extends into the computationally intensive tasks of analysis and visualization of millions of protein sequences through innovative implementations of sequence alignment algorithms and creation of the Protein Sequence Universe tool (PSU). Pushing into the future together with our collaborators, our lab is pursuing integration of multi-omics data and exploration of biological pathways, as well as assigning function to proteins and porting solutions to the cloud. Big data have come to the life sciences; discovering the knowledge in the data will bring breakthroughs and benefits.

  20. A large scale Plasmodium vivax- Saimiri boliviensis trophozoite-schizont transition proteome

    PubMed Central

    Lapp, Stacey A.; Barnwell, John W.; Galinski, Mary R.

    2017-01-01

    Plasmodium vivax is a complex protozoan parasite with over 6,500 genes and stage-specific differential expression. Much of the unique biology of this pathogen remains unknown, including how it modifies and restructures the host reticulocyte. Using a recently published P. vivax reference genome, we report the proteome from two biological replicates of infected Saimiri boliviensis host reticulocytes undergoing transition from the late trophozoite to early schizont stages. Using five database search engines, we identified a total of 2000 P. vivax and 3487 S. boliviensis proteins, making this the most comprehensive P. vivax proteome to date. PlasmoDB GO-term enrichment analysis of proteins identified at least twice by a search engine highlighted core metabolic processes and molecular functions such as glycolysis, translation and protein folding, cell components such as ribosomes, proteasomes and the Golgi apparatus, and a number of vesicle and trafficking related clusters. Database for Annotation, Visualization and Integrated Discovery (DAVID) v6.8 enriched functional annotation clusters of S. boliviensis proteins highlighted vesicle and trafficking-related clusters, elements of the cytoskeleton, oxidative processes and response to oxidative stress, macromolecular complexes such as the proteasome and ribosome, metabolism, translation, and cell death. Host and parasite proteins potentially involved in cell adhesion were also identified. Over 25% of the P. vivax proteins have no functional annotation; this group includes 45 VIR members of the large PIR family. A number of host and pathogen proteins contained highly oxidized or nitrated residues, extending prior trophozoite-enriched stage observations from S. boliviensis infections, and supporting the possibility of oxidative stress in relation to the disease. This proteome significantly expands the size and complexity of the known P. vivax and Saimiri host iRBC proteomes, and provides in-depth data that will be valuable for ongoing research on this parasite’s biology and pathogenesis. PMID:28829774

  1. Geothopica and the interactive analysis and visualization of the updated Italian National Geothermal Database

    NASA Astrophysics Data System (ADS)

    Trumpy, Eugenio; Manzella, Adele

    2017-02-01

    The Italian National Geothermal Database (BDNG), is the largest collection of Italian Geothermal data and was set up in the 1980s. It has since been updated both in terms of content and management tools: information on deep wells and thermal springs (with temperature > 30 °C) are currently organized and stored in a PostgreSQL relational database management system, which guarantees high performance, data security and easy access through different client applications. The BDNG is the core of the Geothopica web site, whose webGIS tool allows different types of user to access geothermal data, to visualize multiple types of datasets, and to perform integrated analyses. The webGIS tool has been recently improved by two specially designed, programmed and implemented visualization tools to display data on well lithology and underground temperatures. This paper describes the contents of the database and its software and data update, as well as the webGIS tool including the new tools for data lithology and temperature visualization. The geoinformation organized in the database and accessible through Geothopica is of use not only for geothermal purposes, but also for any kind of georesource and CO2 storage project requiring the organization of, and access to, deep underground data. Geothopica also supports project developers, researchers, and decision makers in the assessment, management and sustainable deployment of georesources.

  2. A Circular Dichroism Reference Database for Membrane Proteins

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wallace,B.; Wien, F.; Stone, T.

    2006-01-01

    Membrane proteins are a major product of most genomes and the target of a large number of current pharmaceuticals, yet little information exists on their structures because of the difficulty of crystallising them; hence for the most part they have been excluded from structural genomics programme targets. Furthermore, even methods such as circular dichroism (CD) spectroscopy which seek to define secondary structure have not been fully exploited because of technical limitations to their interpretation for membrane embedded proteins. Empirical analyses of circular dichroism (CD) spectra are valuable for providing information on secondary structures of proteins. However, the accuracy of themore » results depends on the appropriateness of the reference databases used in the analyses. Membrane proteins have different spectral characteristics than do soluble proteins as a result of the low dielectric constants of membrane bilayers relative to those of aqueous solutions (Chen & Wallace (1997) Biophys. Chem. 65:65-74). To date, no CD reference database exists exclusively for the analysis of membrane proteins, and hence empirical analyses based on current reference databases derived from soluble proteins are not adequate for accurate analyses of membrane protein secondary structures (Wallace et al (2003) Prot. Sci. 12:875-884). We have therefore created a new reference database of CD spectra of integral membrane proteins whose crystal structures have been determined. To date it contains more than 20 proteins, and spans the range of secondary structures from mostly helical to mostly sheet proteins. This reference database should enable more accurate secondary structure determinations of membrane embedded proteins and will become one of the reference database options in the CD calculation server DICHROWEB (Whitmore & Wallace (2004) NAR 32:W668-673).« less

  3. Tomato functional genomics database (TFGD): a comprehensive collection and analysis package for tomato functional genomics

    USDA-ARS?s Scientific Manuscript database

    Tomato Functional Genomics Database (TFGD; http://ted.bti.cornell.edu) provides a comprehensive systems biology resource to store, mine, analyze, visualize and integrate large-scale tomato functional genomics datasets. The database is expanded from the previously described Tomato Expression Database...

  4. Tandem mass spectrometry for the detection of plant pathogenic fungi and the effects of database composition on protein inferences.

    PubMed

    Padliya, Neerav D; Garrett, Wesley M; Campbell, Kimberly B; Tabb, David L; Cooper, Bret

    2007-11-01

    LC-MS/MS has demonstrated potential for detecting plant pathogens. Unlike PCR or ELISA, LC-MS/MS does not require pathogen-specific reagents for the detection of pathogen-specific proteins and peptides. However, the MS/MS approach we and others have explored does require a protein sequence reference database and database-search software to interpret tandem mass spectra. To evaluate the limitations of database composition on pathogen identification, we analyzed proteins from cultured Ustilago maydis, Phytophthora sojae, Fusarium graminearum, and Rhizoctonia solani by LC-MS/MS. When the search database did not contain sequences for a target pathogen, or contained sequences to related pathogens, target pathogen spectra were reliably matched to protein sequences from nontarget organisms, giving an illusion that proteins from nontarget organisms were identified. Our analysis demonstrates that when database-search software is used as part of the identification process, a paradox exists whereby additional sequences needed to detect a wide variety of possible organisms may lead to more cross-species protein matches and misidentification of pathogens.

  5. PROFESS: a PROtein Function, Evolution, Structure and Sequence database

    PubMed Central

    Triplet, Thomas; Shortridge, Matthew D.; Griep, Mark A.; Stark, Jaime L.; Powers, Robert; Revesz, Peter

    2010-01-01

    The proliferation of biological databases and the easy access enabled by the Internet is having a beneficial impact on biological sciences and transforming the way research is conducted. There are ∼1100 molecular biology databases dispersed throughout the Internet. To assist in the functional, structural and evolutionary analysis of the abundant number of novel proteins continually identified from whole-genome sequencing, we introduce the PROFESS (PROtein Function, Evolution, Structure and Sequence) database. Our database is designed to be versatile and expandable and will not confine analysis to a pre-existing set of data relationships. A fundamental component of this approach is the development of an intuitive query system that incorporates a variety of similarity functions capable of generating data relationships not conceived during the creation of the database. The utility of PROFESS is demonstrated by the analysis of the structural drift of homologous proteins and the identification of potential pancreatic cancer therapeutic targets based on the observation of protein–protein interaction networks. Database URL: http://cse.unl.edu/∼profess/ PMID:20624718

  6. Human Connectome Project Informatics: quality control, database services, and data visualization

    PubMed Central

    Marcus, Daniel S.; Harms, Michael P.; Snyder, Abraham Z.; Jenkinson, Mark; Wilson, J Anthony; Glasser, Matthew F.; Barch, Deanna M.; Archie, Kevin A.; Burgess, Gregory C.; Ramaratnam, Mohana; Hodge, Michael; Horton, William; Herrick, Rick; Olsen, Timothy; McKay, Michael; House, Matthew; Hileman, Michael; Reid, Erin; Harwell, John; Coalson, Timothy; Schindler, Jon; Elam, Jennifer S.; Curtiss, Sandra W.; Van Essen, David C.

    2013-01-01

    The Human Connectome Project (HCP) has developed protocols, standard operating and quality control procedures, and a suite of informatics tools to enable high throughput data collection, data sharing, automated data processing and analysis, and data mining and visualization. Quality control procedures include methods to maintain data collection consistency over time, to measure head motion, and to establish quantitative modality-specific overall quality assessments. Database services developed as customizations of the XNAT imaging informatics platform support both internal daily operations and open access data sharing. The Connectome Workbench visualization environment enables user interaction with HCP data and is increasingly integrated with the HCP's database services. Here we describe the current state of these procedures and tools and their application in the ongoing HCP study. PMID:23707591

  7. PACSY, a relational database management system for protein structure and chemical shift analysis.

    PubMed

    Lee, Woonghee; Yu, Wookyung; Kim, Suhkmann; Chang, Iksoo; Lee, Weontae; Markley, John L

    2012-10-01

    PACSY (Protein structure And Chemical Shift NMR spectroscopY) is a relational database management system that integrates information from the Protein Data Bank, the Biological Magnetic Resonance Data Bank, and the Structural Classification of Proteins database. PACSY provides three-dimensional coordinates and chemical shifts of atoms along with derived information such as torsion angles, solvent accessible surface areas, and hydrophobicity scales. PACSY consists of six relational table types linked to one another for coherence by key identification numbers. Database queries are enabled by advanced search functions supported by an RDBMS server such as MySQL or PostgreSQL. PACSY enables users to search for combinations of information from different database sources in support of their research. Two software packages, PACSY Maker for database creation and PACSY Analyzer for database analysis, are available from http://pacsy.nmrfam.wisc.edu.

  8. MIPS: a database for protein sequences and complete genomes.

    PubMed Central

    Mewes, H W; Hani, J; Pfeiffer, F; Frishman, D

    1998-01-01

    The MIPS group [Munich Information Center for Protein Sequences of the German National Center for Environment and Health (GSF)] at the Max-Planck-Institute for Biochemistry, Martinsried near Munich, Germany, is involved in a number of data collection activities, including a comprehensive database of the yeast genome, a database reflecting the progress in sequencing the Arabidopsis thaliana genome, the systematic analysis of other small genomes and the collection of protein sequence data within the framework of the PIR-International Protein Sequence Database (described elsewhere in this volume). Through its WWW server (http://www.mips.biochem.mpg.de ) MIPS provides access to a variety of generic databases, including a database of protein families as well as automatically generated data by the systematic application of sequence analysis algorithms. The yeast genome sequence and its related information was also compiled on CD-ROM to provide dynamic interactive access to the 16 chromosomes of the first eukaryotic genome unraveled. PMID:9399795

  9. Genevar: a database and Java application for the analysis and visualization of SNP-gene associations in eQTL studies

    PubMed Central

    Yang, Tsun-Po; Beazley, Claude; Montgomery, Stephen B.; Dimas, Antigone S.; Gutierrez-Arcelus, Maria; Stranger, Barbara E.; Deloukas, Panos; Dermitzakis, Emmanouil T.

    2010-01-01

    Summary: Genevar (GENe Expression VARiation) is a database and Java tool designed to integrate multiple datasets, and provides analysis and visualization of associations between sequence variation and gene expression. Genevar allows researchers to investigate expression quantitative trait loci (eQTL) associations within a gene locus of interest in real time. The database and application can be installed on a standard computer in database mode and, in addition, on a server to share discoveries among affiliations or the broader community over the Internet via web services protocols. Availability: http://www.sanger.ac.uk/resources/software/genevar Contact: emmanouil.dermitzakis@unige.ch PMID:20702402

  10. HypoxiaDB: a database of hypoxia-regulated proteins

    PubMed Central

    Khurana, Pankaj; Sugadev, Ragumani; Jain, Jaspreet; Singh, Shashi Bala

    2013-01-01

    There has been intense interest in the cellular response to hypoxia, and a large number of differentially expressed proteins have been identified through various high-throughput experiments. These valuable data are scattered, and there have been no systematic attempts to document the various proteins regulated by hypoxia. Compilation, curation and annotation of these data are important in deciphering their role in hypoxia and hypoxia-related disorders. Therefore, we have compiled HypoxiaDB, a database of hypoxia-regulated proteins. It is a comprehensive, manually-curated, non-redundant catalog of proteins whose expressions are shown experimentally to be altered at different levels and durations of hypoxia. The database currently contains 72 000 manually curated entries taken on 3500 proteins extracted from 73 peer-reviewed publications selected from PubMed. HypoxiaDB is distinctive from other generalized databases: (i) it compiles tissue-specific protein expression changes under different levels and duration of hypoxia. Also, it provides manually curated literature references to support the inclusion of the protein in the database and establish its association with hypoxia. (ii) For each protein, HypoxiaDB integrates data on gene ontology, KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway, protein–protein interactions, protein family (Pfam), OMIM (Online Mendelian Inheritance in Man), PDB (Protein Data Bank) structures and homology to other sequenced genomes. (iii) It also provides pre-compiled information on hypoxia-proteins, which otherwise requires tedious computational analysis. This includes information like chromosomal location, identifiers like Entrez, HGNC, Unigene, Uniprot, Ensembl, Vega, GI numbers and Genbank accession numbers associated with the protein. These are further cross-linked to respective public databases augmenting HypoxiaDB to the external repositories. (iv) In addition, HypoxiaDB provides an online sequence-similarity search tool for users to compare their protein sequences with HypoxiaDB protein database. We hope that HypoxiaDB will enrich our knowledge about hypoxia-related biology and eventually will lead to the development of novel hypothesis and advancements in diagnostic and therapeutic activities. HypoxiaDB is freely accessible for academic and non-profit users via http://www.hypoxiadb.com. Database URL: http://www.hypoxiadb.com PMID:24178989

  11. Development of a database for Louisiana highway bridge scour data : technical summary.

    DOT National Transportation Integrated Search

    1999-10-01

    The objectives of the project included: 1) developed a database with manipulation capabilities such as data retrieval, visualization, and update; 2) Input the existing scour data from DOTD files into the database.

  12. ARCPHdb: A comprehensive protein database for SF1 and SF2 helicase from archaea.

    PubMed

    Moukhtar, Mirna; Chaar, Wafi; Abdel-Razzak, Ziad; Khalil, Mohamad; Taha, Samir; Chamieh, Hala

    2017-01-01

    Superfamily 1 and Superfamily 2 helicases, two of the largest helicase protein families, play vital roles in many biological processes including replication, transcription and translation. Study of helicase proteins in the model microorganisms of archaea have largely contributed to the understanding of their function, architecture and assembly. Based on a large phylogenomics approach, we have identified and classified all SF1 and SF2 protein families in ninety five sequenced archaea genomes. Here we developed an online webserver linked to a specialized protein database named ARCPHdb to provide access for SF1 and SF2 helicase families from archaea. ARCPHdb was implemented using MySQL relational database. Web interfaces were developed using Netbeans. Data were stored according to UniProt accession numbers, NCBI Ref Seq ID, PDB IDs and Entrez Databases. A user-friendly interactive web interface has been developed to browse, search and download archaeal helicase protein sequences, their available 3D structure models, and related documentation available in the literature provided by ARCPHdb. The database provides direct links to matching external databases. The ARCPHdb is the first online database to compile all protein information on SF1 and SF2 helicase from archaea in one platform. This database provides essential resource information for all researchers interested in the field. Copyright © 2016 Elsevier Ltd. All rights reserved.

  13. Rice proteome database: a step toward functional analysis of the rice genome.

    PubMed

    Komatsu, Setsuko

    2005-09-01

    The technique of proteome analysis using two-dimensional polyacrylamide gel electrophoresis (2D-PAGE) has the power to monitor global changes that occur in the protein complement of tissues and subcellular compartments. In this study, the proteins of rice were cataloged, a rice proteome database was constructed, and a functional characterization of some of the identified proteins was undertaken. Proteins extracted from various tissues and subcellular compartments in rice were separated by 2D-PAGE and an image analyzer was used to construct a display of the proteins. The Rice Proteome Database contains 23 reference maps based on 2D-PAGE of proteins from various rice tissues and subcellular compartments. These reference maps comprise 13129 identified proteins, and the amino acid sequences of 5092 proteins are entered in the database. Major proteins involved in growth or stress responses were identified using the proteome approach. Some of these proteins, including a beta-tubulin, calreticulin, and ribulose-1,5-bisphosphate carboxylase/oxygenase activase in rice, have unexpected functions. The information obtained from the Rice Proteome Database will aid in cloning the genes for and predicting the function of unknown proteins.

  14. In silico analysis of fragile histidine triad involved in regression of carcinoma.

    PubMed

    Rasheed, Muhammad Asif; Tariq, Fatima; Afzal, Sara; Mannanv, Shazia

    2017-04-01

    Hepatocellular carcinoma (HCCa) is a primary malignancy of the liver. Many different proteins are involved in HCCa including insulin growth factor (IGF) II , signal transducers and activators of transcription (STAT) 3, STAT4, mothers against decapentaplegic homolog 4 (SMAD 4), fragile histidine triad (FHIT) and selective internal radiation therapy (SIRT) etc. The present study is based on the bioinformatics analysis of FHIT protein in order to understand the proteomics aspect and improvement of the diagnosis of the disease based on the protein. Different information related to protein were gathered from different databases, including National Centre for Biotechnology Information (NCBI) Gene, Protein and Online Mendelian Inheritance in Man (OMIM) databases, Uniprot database, String database and Kyoto Encyclopedia of Genes and Genomes (KEGG) database. Moreover, the structure of the protein and evaluation of the quality of the structure were included from Easy modeler programme. Hence, this analysis not only helped to gather information related to the protein at one place, but also analysed the structure and quality of the protein to conclude that the protein has a role in carcinoma.

  15. Remote visual analysis of large turbulence databases at multiple scales

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pulido, Jesus; Livescu, Daniel; Kanov, Kalin

    The remote analysis and visualization of raw large turbulence datasets is challenging. Current accurate direct numerical simulations (DNS) of turbulent flows generate datasets with billions of points per time-step and several thousand time-steps per simulation. Until recently, the analysis and visualization of such datasets was restricted to scientists with access to large supercomputers. The public Johns Hopkins Turbulence database simplifies access to multi-terabyte turbulence datasets and facilitates the computation of statistics and extraction of features through the use of commodity hardware. In this paper, we present a framework designed around wavelet-based compression for high-speed visualization of large datasets and methodsmore » supporting multi-resolution analysis of turbulence. By integrating common technologies, this framework enables remote access to tools available on supercomputers and over 230 terabytes of DNS data over the Web. Finally, the database toolset is expanded by providing access to exploratory data analysis tools, such as wavelet decomposition capabilities and coherent feature extraction.« less

  16. Remote visual analysis of large turbulence databases at multiple scales

    DOE PAGES

    Pulido, Jesus; Livescu, Daniel; Kanov, Kalin; ...

    2018-06-15

    The remote analysis and visualization of raw large turbulence datasets is challenging. Current accurate direct numerical simulations (DNS) of turbulent flows generate datasets with billions of points per time-step and several thousand time-steps per simulation. Until recently, the analysis and visualization of such datasets was restricted to scientists with access to large supercomputers. The public Johns Hopkins Turbulence database simplifies access to multi-terabyte turbulence datasets and facilitates the computation of statistics and extraction of features through the use of commodity hardware. In this paper, we present a framework designed around wavelet-based compression for high-speed visualization of large datasets and methodsmore » supporting multi-resolution analysis of turbulence. By integrating common technologies, this framework enables remote access to tools available on supercomputers and over 230 terabytes of DNS data over the Web. Finally, the database toolset is expanded by providing access to exploratory data analysis tools, such as wavelet decomposition capabilities and coherent feature extraction.« less

  17. PROTERAN: animated terrain evolution for visual analysis of patterns in protein folding trajectory.

    PubMed

    Zhou, Ruhong; Parida, Laxmi; Kapila, Kush; Mudur, Sudhir

    2007-01-01

    The mechanism of protein folding remains largely a mystery in molecular biology, despite the enormous effort from many groups in the past decades. Currently, the protein folding mechanism is often characterized by calculating the free energy landscape versus various reaction coordinates such as the fraction of native contacts, the radius of gyration and so on. In this paper, we present an integrated approach towards understanding the folding process via visual analysis of patterns of these reaction coordinates. The three disparate processes (1) protein folding simulation, (2) pattern elicitation and (3) visualization of patterns, work in tandem. Thus as the protein folds, the changing landscape in the pattern space can be viewed via the visualization tool, PROTERAN, a program we developed for this purpose. We first present an incremental (on-line) trie-based pattern discovery algorithm to elicit the patterns and then describe the terrain metaphor based visualization tool. Using two example small proteins, a beta-hairpin and a designed protein Trp-cage, we next demonstrate that this combined pattern discovery and visualization approach extracts crucial information about protein folding intermediates and mechanism.

  18. Citrus sinensis annotation project (CAP): a comprehensive database for sweet orange genome.

    PubMed

    Wang, Jia; Chen, Dijun; Lei, Yang; Chang, Ji-Wei; Hao, Bao-Hai; Xing, Feng; Li, Sen; Xu, Qiang; Deng, Xiu-Xin; Chen, Ling-Ling

    2014-01-01

    Citrus is one of the most important and widely grown fruit crop with global production ranking firstly among all the fruit crops in the world. Sweet orange accounts for more than half of the Citrus production both in fresh fruit and processed juice. We have sequenced the draft genome of a double-haploid sweet orange (C. sinensis cv. Valencia), and constructed the Citrus sinensis annotation project (CAP) to store and visualize the sequenced genomic and transcriptome data. CAP provides GBrowse-based organization of sweet orange genomic data, which integrates ab initio gene prediction, EST, RNA-seq and RNA-paired end tag (RNA-PET) evidence-based gene annotation. Furthermore, we provide a user-friendly web interface to show the predicted protein-protein interactions (PPIs) and metabolic pathways in sweet orange. CAP provides comprehensive information beneficial to the researchers of sweet orange and other woody plants, which is freely available at http://citrus.hzau.edu.cn/.

  19. The RCSB Protein Data Bank: views of structural biology for basic and applied research and education

    PubMed Central

    Rose, Peter W.; Prlić, Andreas; Bi, Chunxiao; Bluhm, Wolfgang F.; Christie, Cole H.; Dutta, Shuchismita; Green, Rachel Kramer; Goodsell, David S.; Westbrook, John D.; Woo, Jesse; Young, Jasmine; Zardecki, Christine; Berman, Helen M.; Bourne, Philip E.; Burley, Stephen K.

    2015-01-01

    The RCSB Protein Data Bank (RCSB PDB, http://www.rcsb.org) provides access to 3D structures of biological macromolecules and is one of the leading resources in biology and biomedicine worldwide. Our efforts over the past 2 years focused on enabling a deeper understanding of structural biology and providing new structural views of biology that support both basic and applied research and education. Herein, we describe recently introduced data annotations including integration with external biological resources, such as gene and drug databases, new visualization tools and improved support for the mobile web. We also describe access to data files, web services and open access software components to enable software developers to more effectively mine the PDB archive and related annotations. Our efforts are aimed at expanding the role of 3D structure in understanding biology and medicine. PMID:25428375

  20. The Protein-DNA Interface database

    PubMed Central

    2010-01-01

    The Protein-DNA Interface database (PDIdb) is a repository containing relevant structural information of Protein-DNA complexes solved by X-ray crystallography and available at the Protein Data Bank. The database includes a simple functional classification of the protein-DNA complexes that consists of three hierarchical levels: Class, Type and Subtype. This classification has been defined and manually curated by humans based on the information gathered from several sources that include PDB, PubMed, CATH, SCOP and COPS. The current version of the database contains only structures with resolution of 2.5 Å or higher, accounting for a total of 922 entries. The major aim of this database is to contribute to the understanding of the main rules that underlie the molecular recognition process between DNA and proteins. To this end, the database is focused on each specific atomic interface rather than on the separated binding partners. Therefore, each entry in this database consists of a single and independent protein-DNA interface. We hope that PDIdb will be useful to many researchers working in fields such as the prediction of transcription factor binding sites in DNA, the study of specificity determinants that mediate enzyme recognition events, engineering and design of new DNA binding proteins with distinct binding specificity and affinity, among others. Finally, due to its friendly and easy-to-use web interface, we hope that PDIdb will also serve educational and teaching purposes. PMID:20482798

  1. The Protein-DNA Interface database.

    PubMed

    Norambuena, Tomás; Melo, Francisco

    2010-05-18

    The Protein-DNA Interface database (PDIdb) is a repository containing relevant structural information of Protein-DNA complexes solved by X-ray crystallography and available at the Protein Data Bank. The database includes a simple functional classification of the protein-DNA complexes that consists of three hierarchical levels: Class, Type and Subtype. This classification has been defined and manually curated by humans based on the information gathered from several sources that include PDB, PubMed, CATH, SCOP and COPS. The current version of the database contains only structures with resolution of 2.5 A or higher, accounting for a total of 922 entries. The major aim of this database is to contribute to the understanding of the main rules that underlie the molecular recognition process between DNA and proteins. To this end, the database is focused on each specific atomic interface rather than on the separated binding partners. Therefore, each entry in this database consists of a single and independent protein-DNA interface.We hope that PDIdb will be useful to many researchers working in fields such as the prediction of transcription factor binding sites in DNA, the study of specificity determinants that mediate enzyme recognition events, engineering and design of new DNA binding proteins with distinct binding specificity and affinity, among others. Finally, due to its friendly and easy-to-use web interface, we hope that PDIdb will also serve educational and teaching purposes.

  2. The Universal Protein Resource (UniProt): an expanding universe of protein information.

    PubMed

    Wu, Cathy H; Apweiler, Rolf; Bairoch, Amos; Natale, Darren A; Barker, Winona C; Boeckmann, Brigitte; Ferro, Serenella; Gasteiger, Elisabeth; Huang, Hongzhan; Lopez, Rodrigo; Magrane, Michele; Martin, Maria J; Mazumder, Raja; O'Donovan, Claire; Redaschi, Nicole; Suzek, Baris

    2006-01-01

    The Universal Protein Resource (UniProt) provides a central resource on protein sequences and functional annotation with three database components, each addressing a key need in protein bioinformatics. The UniProt Knowledgebase (UniProtKB), comprising the manually annotated UniProtKB/Swiss-Prot section and the automatically annotated UniProtKB/TrEMBL section, is the preeminent storehouse of protein annotation. The extensive cross-references, functional and feature annotations and literature-based evidence attribution enable scientists to analyse proteins and query across databases. The UniProt Reference Clusters (UniRef) speed similarity searches via sequence space compression by merging sequences that are 100% (UniRef100), 90% (UniRef90) or 50% (UniRef50) identical. Finally, the UniProt Archive (UniParc) stores all publicly available protein sequences, containing the history of sequence data with links to the source databases. UniProt databases continue to grow in size and in availability of information. Recent and upcoming changes to database contents, formats, controlled vocabularies and services are described. New download availability includes all major releases of UniProtKB, sequence collections by taxonomic division and complete proteomes. A bibliography mapping service has been added, and an ID mapping service will be available soon. UniProt databases can be accessed online at http://www.uniprot.org or downloaded at ftp://ftp.uniprot.org/pub/databases/.

  3. Seeing is believing: on the use of image databases for visually exploring plant organelle dynamics.

    PubMed

    Mano, Shoji; Miwa, Tomoki; Nishikawa, Shuh-ichi; Mimura, Tetsuro; Nishimura, Mikio

    2009-12-01

    Organelle dynamics vary dramatically depending on cell type, developmental stage and environmental stimuli, so that various parameters, such as size, number and behavior, are required for the description of the dynamics of each organelle. Imaging techniques are superior to other techniques for describing organelle dynamics because these parameters are visually exhibited. Therefore, as the results can be seen immediately, investigators can more easily grasp organelle dynamics. At present, imaging techniques are emerging as fundamental tools in plant organelle research, and the development of new methodologies to visualize organelles and the improvement of analytical tools and equipment have allowed the large-scale generation of image and movie data. Accordingly, image databases that accumulate information on organelle dynamics are an increasingly indispensable part of modern plant organelle research. In addition, image databases are potentially rich data sources for computational analyses, as image and movie data reposited in the databases contain valuable and significant information, such as size, number, length and velocity. Computational analytical tools support image-based data mining, such as segmentation, quantification and statistical analyses, to extract biologically meaningful information from each database and combine them to construct models. In this review, we outline the image databases that are dedicated to plant organelle research and present their potential as resources for image-based computational analyses.

  4. A web-based data visualization tool for the MIMIC-II database.

    PubMed

    Lee, Joon; Ribey, Evan; Wallace, James R

    2016-02-04

    Although MIMIC-II, a public intensive care database, has been recognized as an invaluable resource for many medical researchers worldwide, becoming a proficient MIMIC-II researcher requires knowledge of SQL programming and an understanding of the MIMIC-II database schema. These are challenging requirements especially for health researchers and clinicians who may have limited computer proficiency. In order to overcome this challenge, our objective was to create an interactive, web-based MIMIC-II data visualization tool that first-time MIMIC-II users can easily use to explore the database. The tool offers two main features: Explore and Compare. The Explore feature enables the user to select a patient cohort within MIMIC-II and visualize the distributions of various administrative, demographic, and clinical variables within the selected cohort. The Compare feature enables the user to select two patient cohorts and visually compare them with respect to a variety of variables. The tool is also helpful to experienced MIMIC-II researchers who can use it to substantially accelerate the cumbersome and time-consuming steps of writing SQL queries and manually visualizing extracted data. Any interested researcher can use the MIMIC-II data visualization tool for free to quickly and conveniently conduct a preliminary investigation on MIMIC-II with a few mouse clicks. Researchers can also use the tool to learn the characteristics of the MIMIC-II patients. Since it is still impossible to conduct multivariable regression inside the tool, future work includes adding analytics capabilities. Also, the next version of the tool will aim to utilize MIMIC-III which contains more data.

  5. Computer systems and methods for the query and visualization of multidimensional databases

    DOEpatents

    Stolte, Chris; Tang, Diane L; Hanrahan, Patrick

    2015-03-03

    A computer displays a graphical user interface on its display. The graphical user interface includes a schema information region and a data visualization region. The schema information region includes multiple operand names, each operand corresponding to one or more fields of a multi-dimensional database that includes at least one data hierarchy. The data visualization region includes a columns shelf and a rows shelf. The computer detects user actions to associate one or more first operands with the columns shelf and to associate one or more second operands with the rows shelf. The computer generates a visual table in the data visualization region in accordance with the user actions. The visual table includes one or more panes. Each pane has an x-axis defined based on data for the one or more first operands, and each pane has a y-axis defined based on data for the one or more second operands.

  6. Computer systems and methods for the query and visualization of multidimensional databases

    DOEpatents

    Stolte, Chris; Tang, Diane L.; Hanrahan, Patrick

    2015-11-10

    A computer displays a graphical user interface on its display. The graphical user interface includes a schema information region and a data visualization region. The schema information region includes a plurality of fields of a multi-dimensional database that includes at least one data hierarchy. The data visualization region includes a columns shelf and a rows shelf. The computer detects user actions to associate one or more first fields with the columns shelf and to associate one or more second fields with the rows shelf. The computer generates a visual table in the data visualization region in accordance with the user actions. The visual table includes one or more panes. Each pane has an x-axis defined based on data for the one or more first fields, and each pane has a y-axis defined based on data for the one or more second fields.

  7. Domain fusion analysis by applying relational algebra to protein sequence and domain databases.

    PubMed

    Truong, Kevin; Ikura, Mitsuhiko

    2003-05-06

    Domain fusion analysis is a useful method to predict functionally linked proteins that may be involved in direct protein-protein interactions or in the same metabolic or signaling pathway. As separate domain databases like BLOCKS, PROSITE, Pfam, SMART, PRINTS-S, ProDom, TIGRFAMs, and amalgamated domain databases like InterPro continue to grow in size and quality, a computational method to perform domain fusion analysis that leverages on these efforts will become increasingly powerful. This paper proposes a computational method employing relational algebra to find domain fusions in protein sequence databases. The feasibility of this method was illustrated on the SWISS-PROT+TrEMBL sequence database using domain predictions from the Pfam HMM (hidden Markov model) database. We identified 235 and 189 putative functionally linked protein partners in H. sapiens and S. cerevisiae, respectively. From scientific literature, we were able to confirm many of these functional linkages, while the remainder offer testable experimental hypothesis. Results can be viewed at http://calcium.uhnres.utoronto.ca/pi. As the analysis can be computed quickly on any relational database that supports standard SQL (structured query language), it can be dynamically updated along with the sequence and domain databases, thereby improving the quality of predictions over time.

  8. Protein Bioinformatics Databases and Resources

    PubMed Central

    Chen, Chuming; Huang, Hongzhan; Wu, Cathy H.

    2017-01-01

    Many publicly available data repositories and resources have been developed to support protein related information management, data-driven hypothesis generation and biological knowledge discovery. To help researchers quickly find the appropriate protein related informatics resources, we present a comprehensive review (with categorization and description) of major protein bioinformatics databases in this chapter. We also discuss the challenges and opportunities for developing next-generation protein bioinformatics databases and resources to support data integration and data analytics in the Big Data era. PMID:28150231

  9. PATRIC, the bacterial bioinformatics database and analysis resource

    PubMed Central

    Wattam, Alice R.; Abraham, David; Dalay, Oral; Disz, Terry L.; Driscoll, Timothy; Gabbard, Joseph L.; Gillespie, Joseph J.; Gough, Roger; Hix, Deborah; Kenyon, Ronald; Machi, Dustin; Mao, Chunhong; Nordberg, Eric K.; Olson, Robert; Overbeek, Ross; Pusch, Gordon D.; Shukla, Maulik; Schulman, Julie; Stevens, Rick L.; Sullivan, Daniel E.; Vonstein, Veronika; Warren, Andrew; Will, Rebecca; Wilson, Meredith J.C.; Yoo, Hyun Seung; Zhang, Chengdong; Zhang, Yan; Sobral, Bruno W.

    2014-01-01

    The Pathosystems Resource Integration Center (PATRIC) is the all-bacterial Bioinformatics Resource Center (BRC) (http://www.patricbrc.org). A joint effort by two of the original National Institute of Allergy and Infectious Diseases-funded BRCs, PATRIC provides researchers with an online resource that stores and integrates a variety of data types [e.g. genomics, transcriptomics, protein–protein interactions (PPIs), three-dimensional protein structures and sequence typing data] and associated metadata. Datatypes are summarized for individual genomes and across taxonomic levels. All genomes in PATRIC, currently more than 10 000, are consistently annotated using RAST, the Rapid Annotations using Subsystems Technology. Summaries of different data types are also provided for individual genes, where comparisons of different annotations are available, and also include available transcriptomic data. PATRIC provides a variety of ways for researchers to find data of interest and a private workspace where they can store both genomic and gene associations, and their own private data. Both private and public data can be analyzed together using a suite of tools to perform comparative genomic or transcriptomic analysis. PATRIC also includes integrated information related to disease and PPIs. All the data and integrated analysis and visualization tools are freely available. This manuscript describes updates to the PATRIC since its initial report in the 2007 NAR Database Issue. PMID:24225323

  10. PACSY, a relational database management system for protein structure and chemical shift analysis

    PubMed Central

    Lee, Woonghee; Yu, Wookyung; Kim, Suhkmann; Chang, Iksoo

    2012-01-01

    PACSY (Protein structure And Chemical Shift NMR spectroscopY) is a relational database management system that integrates information from the Protein Data Bank, the Biological Magnetic Resonance Data Bank, and the Structural Classification of Proteins database. PACSY provides three-dimensional coordinates and chemical shifts of atoms along with derived information such as torsion angles, solvent accessible surface areas, and hydrophobicity scales. PACSY consists of six relational table types linked to one another for coherence by key identification numbers. Database queries are enabled by advanced search functions supported by an RDBMS server such as MySQL or PostgreSQL. PACSY enables users to search for combinations of information from different database sources in support of their research. Two software packages, PACSY Maker for database creation and PACSY Analyzer for database analysis, are available from http://pacsy.nmrfam.wisc.edu. PMID:22903636

  11. A protein relational database and protein family knowledge bases to facilitate structure-based design analyses.

    PubMed

    Mobilio, Dominick; Walker, Gary; Brooijmans, Natasja; Nilakantan, Ramaswamy; Denny, R Aldrin; Dejoannis, Jason; Feyfant, Eric; Kowticwar, Rupesh K; Mankala, Jyoti; Palli, Satish; Punyamantula, Sairam; Tatipally, Maneesh; John, Reji K; Humblet, Christine

    2010-08-01

    The Protein Data Bank is the most comprehensive source of experimental macromolecular structures. It can, however, be difficult at times to locate relevant structures with the Protein Data Bank search interface. This is particularly true when searching for complexes containing specific interactions between protein and ligand atoms. Moreover, searching within a family of proteins can be tedious. For example, one cannot search for some conserved residue as residue numbers vary across structures. We describe herein three databases, Protein Relational Database, Kinase Knowledge Base, and Matrix Metalloproteinase Knowledge Base, containing protein structures from the Protein Data Bank. In Protein Relational Database, atom-atom distances between protein and ligand have been precalculated allowing for millisecond retrieval based on atom identity and distance constraints. Ring centroids, centroid-centroid and centroid-atom distances and angles have also been included permitting queries for pi-stacking interactions and other structural motifs involving rings. Other geometric features can be searched through the inclusion of residue pair and triplet distances. In Kinase Knowledge Base and Matrix Metalloproteinase Knowledge Base, the catalytic domains have been aligned into common residue numbering schemes. Thus, by searching across Protein Relational Database and Kinase Knowledge Base, one can easily retrieve structures wherein, for example, a ligand of interest is making contact with the gatekeeper residue.

  12. Cataloging the biomedical world of pain through semi-automated curation of molecular interactions

    PubMed Central

    Jamieson, Daniel G.; Roberts, Phoebe M.; Robertson, David L.; Sidders, Ben; Nenadic, Goran

    2013-01-01

    The vast collection of biomedical literature and its continued expansion has presented a number of challenges to researchers who require structured findings to stay abreast of and analyze molecular mechanisms relevant to their domain of interest. By structuring literature content into topic-specific machine-readable databases, the aggregate data from multiple articles can be used to infer trends that can be compared and contrasted with similar findings from topic-independent resources. Our study presents a generalized procedure for semi-automatically creating a custom topic-specific molecular interaction database through the use of text mining to assist manual curation. We apply the procedure to capture molecular events that underlie ‘pain’, a complex phenomenon with a large societal burden and unmet medical need. We describe how existing text mining solutions are used to build a pain-specific corpus, extract molecular events from it, add context to the extracted events and assess their relevance. The pain-specific corpus contains 765 692 documents from Medline and PubMed Central, from which we extracted 356 499 unique normalized molecular events, with 261 438 single protein events and 93 271 molecular interactions supplied by BioContext. Event chains are annotated with negation, speculation, anatomy, Gene Ontology terms, mutations, pain and disease relevance, which collectively provide detailed insight into how that event chain is associated with pain. The extracted relations are visualized in a wiki platform (wiki-pain.org) that enables efficient manual curation and exploration of the molecular mechanisms that underlie pain. Curation of 1500 grouped event chains ranked by pain relevance revealed 613 accurately extracted unique molecular interactions that in the future can be used to study the underlying mechanisms involved in pain. Our approach demonstrates that combining existing text mining tools with domain-specific terms and wiki-based visualization can facilitate rapid curation of molecular interactions to create a custom database. Database URL: ••• PMID:23707966

  13. Cataloging the biomedical world of pain through semi-automated curation of molecular interactions.

    PubMed

    Jamieson, Daniel G; Roberts, Phoebe M; Robertson, David L; Sidders, Ben; Nenadic, Goran

    2013-01-01

    The vast collection of biomedical literature and its continued expansion has presented a number of challenges to researchers who require structured findings to stay abreast of and analyze molecular mechanisms relevant to their domain of interest. By structuring literature content into topic-specific machine-readable databases, the aggregate data from multiple articles can be used to infer trends that can be compared and contrasted with similar findings from topic-independent resources. Our study presents a generalized procedure for semi-automatically creating a custom topic-specific molecular interaction database through the use of text mining to assist manual curation. We apply the procedure to capture molecular events that underlie 'pain', a complex phenomenon with a large societal burden and unmet medical need. We describe how existing text mining solutions are used to build a pain-specific corpus, extract molecular events from it, add context to the extracted events and assess their relevance. The pain-specific corpus contains 765 692 documents from Medline and PubMed Central, from which we extracted 356 499 unique normalized molecular events, with 261 438 single protein events and 93 271 molecular interactions supplied by BioContext. Event chains are annotated with negation, speculation, anatomy, Gene Ontology terms, mutations, pain and disease relevance, which collectively provide detailed insight into how that event chain is associated with pain. The extracted relations are visualized in a wiki platform (wiki-pain.org) that enables efficient manual curation and exploration of the molecular mechanisms that underlie pain. Curation of 1500 grouped event chains ranked by pain relevance revealed 613 accurately extracted unique molecular interactions that in the future can be used to study the underlying mechanisms involved in pain. Our approach demonstrates that combining existing text mining tools with domain-specific terms and wiki-based visualization can facilitate rapid curation of molecular interactions to create a custom database. Database URL: •••

  14. An Introductory Classroom Exercise on Protein Molecular Model Visualization and Detailed Analysis of Protein-Ligand Binding

    ERIC Educational Resources Information Center

    Poeylaut-Palena, Andres, A.; de los Angeles Laborde, Maria

    2013-01-01

    A learning module for molecular level analysis of protein structure and ligand/drug interaction through the visualization of X-ray diffraction is presented. Using DeepView as molecular model visualization software, students learn about the general concepts of protein structure. This Biochemistry classroom exercise is designed to be carried out by…

  15. GPCRM: a homology modeling web service with triple membrane-fitted quality assessment of GPCR models.

    PubMed

    Miszta, Przemyslaw; Pasznik, Pawel; Jakowiecki, Jakub; Sztyler, Agnieszka; Latek, Dorota; Filipek, Slawomir

    2018-05-21

    Due to the involvement of G protein-coupled receptors (GPCRs) in most of the physiological and pathological processes in humans they have been attracting a lot of attention from pharmaceutical industry as well as from scientific community. Therefore, the need for new, high quality structures of GPCRs is enormous. The updated homology modeling service GPCRM (http://gpcrm.biomodellab.eu/) meets those expectations by greatly reducing the execution time of submissions (from days to hours/minutes) with nearly the same average quality of obtained models. Additionally, due to three different scoring functions (Rosetta, Rosetta-MP, BCL::Score) it is possible to select accurate models for the required purposes: the structure of the binding site, the transmembrane domain or the overall shape of the receptor. Currently, no other web service for GPCR modeling provides this possibility. GPCRM is continually upgraded in a semi-automatic way and the number of template structures has increased from 20 in 2013 to over 90 including structures the same receptor with different ligands which can influence the structure not only in the on/off manner. Two types of protein viewers can be used for visual inspection of obtained models. The extended sortable tables with available templates provide links to external databases and display ligand-receptor interactions in visual form.

  16. An automated method for finding molecular complexes in large protein interaction networks

    PubMed Central

    Bader, Gary D; Hogue, Christopher WV

    2003-01-01

    Background Recent advances in proteomics technologies such as two-hybrid, phage display and mass spectrometry have enabled us to create a detailed map of biomolecular interaction networks. Initial mapping efforts have already produced a wealth of data. As the size of the interaction set increases, databases and computational methods will be required to store, visualize and analyze the information in order to effectively aid in knowledge discovery. Results This paper describes a novel graph theoretic clustering algorithm, "Molecular Complex Detection" (MCODE), that detects densely connected regions in large protein-protein interaction networks that may represent molecular complexes. The method is based on vertex weighting by local neighborhood density and outward traversal from a locally dense seed protein to isolate the dense regions according to given parameters. The algorithm has the advantage over other graph clustering methods of having a directed mode that allows fine-tuning of clusters of interest without considering the rest of the network and allows examination of cluster interconnectivity, which is relevant for protein networks. Protein interaction and complex information from the yeast Saccharomyces cerevisiae was used for evaluation. Conclusion Dense regions of protein interaction networks can be found, based solely on connectivity data, many of which correspond to known protein complexes. The algorithm is not affected by a known high rate of false positives in data from high-throughput interaction techniques. The program is available from . PMID:12525261

  17. Sys-BodyFluid: a systematical database for human body fluid proteome research

    PubMed Central

    Li, Su-Jun; Peng, Mao; Li, Hong; Liu, Bo-Shu; Wang, Chuan; Wu, Jia-Rui; Li, Yi-Xue; Zeng, Rong

    2009-01-01

    Recently, body fluids have widely become an important target for proteomic research and proteomic study has produced more and more body fluid related protein data. A database is needed to collect and analyze these proteome data. Thus, we developed this web-based body fluid proteome database Sys-BodyFluid. It contains eleven kinds of body fluid proteomes, including plasma/serum, urine, cerebrospinal fluid, saliva, bronchoalveolar lavage fluid, synovial fluid, nipple aspirate fluid, tear fluid, seminal fluid, human milk and amniotic fluid. Over 10 000 proteins are presented in the Sys-BodyFluid. Sys-BodyFluid provides the detailed protein annotations, including protein description, Gene Ontology, domain information, protein sequence and involved pathways. These proteome data can be retrieved by using protein name, protein accession number and sequence similarity. In addition, users can query between these different body fluids to get the different proteins identification information. Sys-BodyFluid database can facilitate the body fluid proteomics and disease proteomics research as a reference database. It is available at http://www.biosino.org/bodyfluid/. PMID:18978022

  18. Sys-BodyFluid: a systematical database for human body fluid proteome research.

    PubMed

    Li, Su-Jun; Peng, Mao; Li, Hong; Liu, Bo-Shu; Wang, Chuan; Wu, Jia-Rui; Li, Yi-Xue; Zeng, Rong

    2009-01-01

    Recently, body fluids have widely become an important target for proteomic research and proteomic study has produced more and more body fluid related protein data. A database is needed to collect and analyze these proteome data. Thus, we developed this web-based body fluid proteome database Sys-BodyFluid. It contains eleven kinds of body fluid proteomes, including plasma/serum, urine, cerebrospinal fluid, saliva, bronchoalveolar lavage fluid, synovial fluid, nipple aspirate fluid, tear fluid, seminal fluid, human milk and amniotic fluid. Over 10,000 proteins are presented in the Sys-BodyFluid. Sys-BodyFluid provides the detailed protein annotations, including protein description, Gene Ontology, domain information, protein sequence and involved pathways. These proteome data can be retrieved by using protein name, protein accession number and sequence similarity. In addition, users can query between these different body fluids to get the different proteins identification information. Sys-BodyFluid database can facilitate the body fluid proteomics and disease proteomics research as a reference database. It is available at http://www.biosino.org/bodyfluid/.

  19. NCBI GEO: archive for functional genomics data sets--10 years on.

    PubMed

    Barrett, Tanya; Troup, Dennis B; Wilhite, Stephen E; Ledoux, Pierre; Evangelista, Carlos; Kim, Irene F; Tomashevsky, Maxim; Marshall, Kimberly A; Phillippy, Katherine H; Sherman, Patti M; Muertter, Rolf N; Holko, Michelle; Ayanbule, Oluwabukunmi; Yefanov, Andrey; Soboleva, Alexandra

    2011-01-01

    A decade ago, the Gene Expression Omnibus (GEO) database was established at the National Center for Biotechnology Information (NCBI). The original objective of GEO was to serve as a public repository for high-throughput gene expression data generated mostly by microarray technology. However, the research community quickly applied microarrays to non-gene-expression studies, including examination of genome copy number variation and genome-wide profiling of DNA-binding proteins. Because the GEO database was designed with a flexible structure, it was possible to quickly adapt the repository to store these data types. More recently, as the microarray community switches to next-generation sequencing technologies, GEO has again adapted to host these data sets. Today, GEO stores over 20,000 microarray- and sequence-based functional genomics studies, and continues to handle the majority of direct high-throughput data submissions from the research community. Multiple mechanisms are provided to help users effectively search, browse, download and visualize the data at the level of individual genes or entire studies. This paper describes recent database enhancements, including new search and data representation tools, as well as a brief review of how the community uses GEO data. GEO is freely accessible at http://www.ncbi.nlm.nih.gov/geo/.

  20. Researchermap: a tool for visualizing author locations using Google maps.

    PubMed

    Rastegar-Mojarad, Majid; Bales, Michael E; Yu, Hong

    2013-01-01

    We hereby present ResearcherMap, a tool to visualize locations of authors of scholarly papers. In response to a query, the system returns a map of author locations. To develop the system we first populated a database of author locations, geocoding institution locations for all available institutional affiliation data in our database. The database includes all authors of Medline papers from 1990 to 2012. We conducted a formative heuristic usability evaluation of the system and measured the system's accuracy and performance. The accuracy of finding the accurate address is 97.5% in our system.

  1. CRAVE: a database, middleware and visualization system for phenotype ontologies.

    PubMed

    Gkoutos, Georgios V; Green, Eain C J; Greenaway, Simon; Blake, Andrew; Mallon, Ann-Marie; Hancock, John M

    2005-04-01

    A major challenge in modern biology is to link genome sequence information to organismal function. In many organisms this is being done by characterizing phenotypes resulting from mutations. Efficiently expressing phenotypic information requires combinatorial use of ontologies. However tools are not currently available to visualize combinations of ontologies. Here we describe CRAVE (Concept Relation Assay Value Explorer), a package allowing storage, active updating and visualization of multiple ontologies. CRAVE is a web-accessible JAVA application that accesses an underlying MySQL database of ontologies via a JAVA persistent middleware layer (Chameleon). This maps the database tables into discrete JAVA classes and creates memory resident, interlinked objects corresponding to the ontology data. These JAVA objects are accessed via calls through the middleware's application programming interface. CRAVE allows simultaneous display and linking of multiple ontologies and searching using Boolean and advanced searches.

  2. LeishCyc: a guide to building a metabolic pathway database and visualization of metabolomic data.

    PubMed

    Saunders, Eleanor C; MacRae, James I; Naderer, Thomas; Ng, Milica; McConville, Malcolm J; Likić, Vladimir A

    2012-01-01

    The complexity of the metabolic networks in even the simplest organisms has raised new challenges in organizing metabolic information. To address this, specialized computer frameworks have been developed to capture, manage, and visualize metabolic knowledge. The leading databases of metabolic information are those organized under the umbrella of the BioCyc project, which consists of the reference database MetaCyc, and a number of pathway/genome databases (PGDBs) each focussed on a specific organism. A number of PGDBs have been developed for bacterial, fungal, and protozoan pathogens, greatly facilitating dissection of the metabolic potential of these organisms and the identification of new drug targets. Leishmania are protozoan parasites belonging to the family Trypanosomatidae that cause a broad spectrum of diseases in humans. In this work we use the LeishCyc database, the BioCyc database for Leishmania major, to describe how to build a BioCyc database from genomic sequences and associated annotations. By using metabolomic data generated in our group, we show how such databases can be utilized to elucidate specific changes in parasite metabolism.

  3. The changing pattern of cataract surgery indications: a 5-year study of 2 cataract surgery databases.

    PubMed

    Lundström, Mats; Goh, Pik-Pin; Henry, Ype; Salowi, Mohamad A; Barry, Peter; Manning, Sonia; Rosen, Paul; Stenevi, Ulf

    2015-01-01

    The aim of this study was to describe changes over time in the indications and outcomes of cataract surgery and to discuss optimal timing for the surgery. Database study. Patients who had undergone cataract extraction in the Netherlands, Sweden, or Malaysia from 2008 through 2012. We analyzed preoperative, surgical, and postoperative data from 2 databases: the European Registry of Quality Outcomes for Cataract and Refractive Surgery (EUREQUO) and the Malaysian National Cataract Registry. The EUREQUO contains complete data from the national cataract registries in the Netherlands and Sweden. Preoperative and postoperative corrected distance visual acuity, preoperative ocular comorbidity in the surgery eye, and capsule complications during surgery. There were substantial differences in indication for surgery between the 3 national data sets. The percentage of eyes with a preoperative best-corrected visual acuity of 20/200 or worse varied from 7.1% to 72%. In all 3 data sets, the visual thresholds for cataract surgery decreased over time by 6% to 28% of the baseline values. The frequency of capsule complications varied between the 3 data sets, from 1.1% to 3.7% in 2008 and from 0.6% to 2.7% in 2012. An increasing postoperative visual acuity was also seen for all 3 data sets. A high frequency of capsule complication was related significantly to poor preoperative visual acuity, and a high frequency of decreased visual acuity after surgery was related significantly to excellent preoperative visual acuity. The 5-year trend in all 3 national data sets showed decreasing visual thresholds for surgery, decreasing surgical complication rates, and increasing visual outcomes regardless of the initial preoperative visual level. Cataract surgery on eyes with poor preoperative visual acuity was related to surgical complications, and cataract surgery on eyes with excellent preoperative visual acuity was related to adverse visual results. Copyright © 2015 American Academy of Ophthalmology. Published by Elsevier Inc. All rights reserved.

  4. PARPs database: A LIMS systems for protein-protein interaction data mining or laboratory information management system

    PubMed Central

    Droit, Arnaud; Hunter, Joanna M; Rouleau, Michèle; Ethier, Chantal; Picard-Cloutier, Aude; Bourgais, David; Poirier, Guy G

    2007-01-01

    Background In the "post-genome" era, mass spectrometry (MS) has become an important method for the analysis of proteins and the rapid advancement of this technique, in combination with other proteomics methods, results in an increasing amount of proteome data. This data must be archived and analysed using specialized bioinformatics tools. Description We herein describe "PARPs database," a data analysis and management pipeline for liquid chromatography tandem mass spectrometry (LC-MS/MS) proteomics. PARPs database is a web-based tool whose features include experiment annotation, protein database searching, protein sequence management, as well as data-mining of the peptides and proteins identified. Conclusion Using this pipeline, we have successfully identified several interactions of biological significance between PARP-1 and other proteins, namely RFC-1, 2, 3, 4 and 5. PMID:18093328

  5. The 2015 Nucleic Acids Research Database Issue and molecular biology database collection.

    PubMed

    Galperin, Michael Y; Rigden, Daniel J; Fernández-Suárez, Xosé M

    2015-01-01

    The 2015 Nucleic Acids Research Database Issue contains 172 papers that include descriptions of 56 new molecular biology databases, and updates on 115 databases whose descriptions have been previously published in NAR or other journals. Following the classification that has been introduced last year in order to simplify navigation of the entire issue, these articles are divided into eight subject categories. This year's highlights include RNAcentral, an international community portal to various databases on noncoding RNA; ValidatorDB, a validation database for protein structures and their ligands; SASBDB, a primary repository for small-angle scattering data of various macromolecular complexes; MoonProt, a database of 'moonlighting' proteins, and two new databases of protein-protein and other macromolecular complexes, ComPPI and the Complex Portal. This issue also includes an unusually high number of cancer-related databases and other databases dedicated to genomic basics of disease and potential drugs and drug targets. The size of NAR online Molecular Biology Database Collection, http://www.oxfordjournals.org/nar/database/a/, remained approximately the same, following the addition of 74 new resources and removal of 77 obsolete web sites. The entire Database Issue is freely available online on the Nucleic Acids Research web site (http://nar.oxfordjournals.org/). Published by Oxford University Press on behalf of Nucleic Acids Research 2014. This work is written by (a) US Government employee(s) and is in the public domain in the US.

  6. The Histone Database: an integrated resource for histones and histone fold-containing proteins

    PubMed Central

    Mariño-Ramírez, Leonardo; Levine, Kevin M.; Morales, Mario; Zhang, Suiyuan; Moreland, R. Travis; Baxevanis, Andreas D.; Landsman, David

    2011-01-01

    Eukaryotic chromatin is composed of DNA and protein components—core histones—that act to compactly pack the DNA into nucleosomes, the fundamental building blocks of chromatin. These nucleosomes are connected to adjacent nucleosomes by linker histones. Nucleosomes are highly dynamic and, through various core histone post-translational modifications and incorporation of diverse histone variants, can serve as epigenetic marks to control processes such as gene expression and recombination. The Histone Sequence Database is a curated collection of sequences and structures of histones and non-histone proteins containing histone folds, assembled from major public databases. Here, we report a substantial increase in the number of sequences and taxonomic coverage for histone and histone fold-containing proteins available in the database. Additionally, the database now contains an expanded dataset that includes archaeal histone sequences. The database also provides comprehensive multiple sequence alignments for each of the four core histones (H2A, H2B, H3 and H4), the linker histones (H1/H5) and the archaeal histones. The database also includes current information on solved histone fold-containing structures. The Histone Sequence Database is an inclusive resource for the analysis of chromatin structure and function focused on histones and histone fold-containing proteins. Database URL: The Histone Sequence Database is freely available and can be accessed at http://research.nhgri.nih.gov/histones/. PMID:22025671

  7. [Better performance of Western blotting: quick vs slow protein transfer, blotting membranes and the visualization methods].

    PubMed

    Kong, Ling-Quan; Pu, Ying-Hui; Ma, Shi-Kun

    2008-01-01

    To study how the choices of the quick vs slow protein transfer, the blotting membranes and the visualization methods influence the performance of Western blotting. The cellular proteins were abstracted from human breast cell line MDA-MB-231 for analysis with Western blotting using quick (2 h) and slow (overnight) protein transfer, different blotting membranes (nitrocellulose, PVDF and nylon membranes) and different visualization methods (ECL and DAB). In Western blotting with slow and quick protein transfer, the prestained marker presented more distinct bands on nitrocellulose membrane than on the nylon and PVDF membranes, and the latter also showed clear bands on the back of the membrane to very likely cause confusion, which did not occur with nitrocellulose membrane. PVDF membrane allowed slightly clearer visualization of the proteins with DAB method as compared with nitrocellulose and nylon membranes, and on the latter two membranes, quick protein transfer was likely to result in somehow irregular bands in comparison with slow protein transfer. With slow protein transfer and chemiluminescence for visualization, all the 3 membranes showed clear background, while with quick protein transfer, nylon membrane gave rise to obvious background noise but the other two membranes did not. Different membranes should be selected for immunoblotting according to the actual needs of the experiment. Slow transfer of the proteins onto the membranes often has better effect than quick transfer, and enhanced chemiluminescence is superior to DAB for protein visualization and allows highly specific and sensitive analysis of the protein expressions.

  8. The electric dipole moment of DNA-binding HU protein calculated by the use of an NMR database.

    PubMed

    Takashima, S; Yamaoka, K

    1999-08-30

    Electric birefringence measurements indicated the presence of a large permanent dipole moment in HU protein-DNA complex. In order to substantiate this observation, numerical computation of the dipole moment of HU protein homodimer was carried out by using NMR protein databases. The dipole moments of globular proteins have hitherto been calculated with X-ray databases and NMR data have never been used before. The advantages of NMR databases are: (a) NMR data are obtained, unlike X-ray databases, using protein solutions. Accordingly, this method eliminates the bothersome question as to the possible alteration of the protein structure due to the transition from the crystalline state to the solution state. This question is particularly important for proteins such as HU protein which has some degree of internal flexibility; (b) the three-dimensional coordinates of hydrogen atoms in protein molecules can be determined with a sufficient resolution and this enables the N-H as well as C = O bond moments to be calculated. Since the NMR database of HU protein from Bacillus stearothermophilus consists of 25 models, the surface charge as well as the core dipole moments were computed for each of these structures. The results of these calculations show that the net permanent dipole moments of HU protein homodimer is approximately 500-530 D (1 D = 3.33 x 10(-30) Cm) at pH 7.5 and 600-630 D at the isoelectric point (pH 10.5). These permanent dipole moments are unusually large for a small protein of the size of 19.5 kDa. Nevertheless, the result of numerical calculations is compatible with the electro-optical observation, confirming a very large dipole moment in this protein.

  9. APADB: a database for alternative polyadenylation and microRNA regulation events

    PubMed Central

    Müller, Sören; Rycak, Lukas; Afonso-Grunz, Fabian; Winter, Peter; Zawada, Adam M.; Damrath, Ewa; Scheider, Jessica; Schmäh, Juliane; Koch, Ina; Kahl, Günter; Rotter, Björn

    2014-01-01

    Alternative polyadenylation (APA) is a widespread mechanism that contributes to the sophisticated dynamics of gene regulation. Approximately 50% of all protein-coding human genes harbor multiple polyadenylation (PA) sites; their selective and combinatorial use gives rise to transcript variants with differing length of their 3′ untranslated region (3′UTR). Shortened variants escape UTR-mediated regulation by microRNAs (miRNAs), especially in cancer, where global 3′UTR shortening accelerates disease progression, dedifferentiation and proliferation. Here we present APADB, a database of vertebrate PA sites determined by 3′ end sequencing, using massive analysis of complementary DNA ends. APADB provides (A)PA sites for coding and non-coding transcripts of human, mouse and chicken genes. For human and mouse, several tissue types, including different cancer specimens, are available. APADB records the loss of predicted miRNA binding sites and visualizes next-generation sequencing reads that support each PA site in a genome browser. The database tables can either be browsed according to organism and tissue or alternatively searched for a gene of interest. APADB is the largest database of APA in human, chicken and mouse. The stored information provides experimental evidence for thousands of PA sites and APA events. APADB combines 3′ end sequencing data with prediction algorithms of miRNA binding sites, allowing to further improve prediction algorithms. Current databases lack correct information about 3′UTR lengths, especially for chicken, and APADB provides necessary information to close this gap. Database URL: http://tools.genxpro.net/apadb/ PMID:25052703

  10. MoonProt: a database for proteins that are known to moonlight

    PubMed Central

    Mani, Mathew; Chen, Chang; Amblee, Vaishak; Liu, Haipeng; Mathur, Tanu; Zwicke, Grant; Zabad, Shadi; Patel, Bansi; Thakkar, Jagravi; Jeffery, Constance J.

    2015-01-01

    Moonlighting proteins comprise a class of multifunctional proteins in which a single polypeptide chain performs multiple biochemical functions that are not due to gene fusions, multiple RNA splice variants or pleiotropic effects. The known moonlighting proteins perform a variety of diverse functions in many different cell types and species, and information about their structures and functions is scattered in many publications. We have constructed the manually curated, searchable, internet-based MoonProt Database (http://www.moonlightingproteins.org) with information about the over 200 proteins that have been experimentally verified to be moonlighting proteins. The availability of this organized information provides a more complete picture of what is currently known about moonlighting proteins. The database will also aid researchers in other fields, including determining the functions of genes identified in genome sequencing projects, interpreting data from proteomics projects and annotating protein sequence and structural databases. In addition, information about the structures and functions of moonlighting proteins can be helpful in understanding how novel protein functional sites evolved on an ancient protein scaffold, which can also help in the design of proteins with novel functions. PMID:25324305

  11. The BioCyc collection of microbial genomes and metabolic pathways.

    PubMed

    Karp, Peter D; Billington, Richard; Caspi, Ron; Fulcher, Carol A; Latendresse, Mario; Kothari, Anamika; Keseler, Ingrid M; Krummenacker, Markus; Midford, Peter E; Ong, Quang; Ong, Wai Kit; Paley, Suzanne M; Subhraveti, Pallavi

    2017-08-17

    BioCyc.org is a microbial genome Web portal that combines thousands of genomes with additional information inferred by computer programs, imported from other databases and curated from the biomedical literature by biologist curators. BioCyc also provides an extensive range of query tools, visualization services and analysis software. Recent advances in BioCyc include an expansion in the content of BioCyc in terms of both the number of genomes and the types of information available for each genome; an expansion in the amount of curated content within BioCyc; and new developments in the BioCyc software tools including redesigned gene/protein pages and metabolite pages; new search tools; a new sequence-alignment tool; a new tool for visualizing groups of related metabolic pathways; and a facility called SmartTables, which enables biologists to perform analyses that previously would have required a programmer's assistance. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  12. A comprehensive and scalable database search system for metaproteomics.

    PubMed

    Chatterjee, Sandip; Stupp, Gregory S; Park, Sung Kyu Robin; Ducom, Jean-Christophe; Yates, John R; Su, Andrew I; Wolan, Dennis W

    2016-08-16

    Mass spectrometry-based shotgun proteomics experiments rely on accurate matching of experimental spectra against a database of protein sequences. Existing computational analysis methods are limited in the size of their sequence databases, which severely restricts the proteomic sequencing depth and functional analysis of highly complex samples. The growing amount of public high-throughput sequencing data will only exacerbate this problem. We designed a broadly applicable metaproteomic analysis method (ComPIL) that addresses protein database size limitations. Our approach to overcome this significant limitation in metaproteomics was to design a scalable set of sequence databases assembled for optimal library querying speeds. ComPIL was integrated with a modified version of the search engine ProLuCID (termed "Blazmass") to permit rapid matching of experimental spectra. Proof-of-principle analysis of human HEK293 lysate with a ComPIL database derived from high-quality genomic libraries was able to detect nearly all of the same peptides as a search with a human database (~500x fewer peptides in the database), with a small reduction in sensitivity. We were also able to detect proteins from the adenovirus used to immortalize these cells. We applied our method to a set of healthy human gut microbiome proteomic samples and showed a substantial increase in the number of identified peptides and proteins compared to previous metaproteomic analyses, while retaining a high degree of protein identification accuracy and allowing for a more in-depth characterization of the functional landscape of the samples. The combination of ComPIL with Blazmass allows proteomic searches to be performed with database sizes much larger than previously possible. These large database searches can be applied to complex meta-samples with unknown composition or proteomic samples where unexpected proteins may be identified. The protein database, proteomic search engine, and the proteomic data files for the 5 microbiome samples characterized and discussed herein are open source and available for use and additional analysis.

  13. Creation of a Genome-Wide Metabolic Pathway Database for Populus trichocarpa Using a New Approach for Reconstruction and Curation of Metabolic Pathways for Plants1[W][OA

    PubMed Central

    Zhang, Peifen; Dreher, Kate; Karthikeyan, A.; Chi, Anjo; Pujar, Anuradha; Caspi, Ron; Karp, Peter; Kirkup, Vanessa; Latendresse, Mario; Lee, Cynthia; Mueller, Lukas A.; Muller, Robert; Rhee, Seung Yon

    2010-01-01

    Metabolic networks reconstructed from sequenced genomes or transcriptomes can help visualize and analyze large-scale experimental data, predict metabolic phenotypes, discover enzymes, engineer metabolic pathways, and study metabolic pathway evolution. We developed a general approach for reconstructing metabolic pathway complements of plant genomes. Two new reference databases were created and added to the core of the infrastructure: a comprehensive, all-plant reference pathway database, PlantCyc, and a reference enzyme sequence database, RESD, for annotating metabolic functions of protein sequences. PlantCyc (version 3.0) includes 714 metabolic pathways and 2,619 reactions from over 300 species. RESD (version 1.0) contains 14,187 literature-supported enzyme sequences from across all kingdoms. We used RESD, PlantCyc, and MetaCyc (an all-species reference metabolic pathway database), in conjunction with the pathway prediction software Pathway Tools, to reconstruct a metabolic pathway database, PoplarCyc, from the recently sequenced genome of Populus trichocarpa. PoplarCyc (version 1.0) contains 321 pathways with 1,807 assigned enzymes. Comparing PoplarCyc (version 1.0) with AraCyc (version 6.0, Arabidopsis [Arabidopsis thaliana]) showed comparable numbers of pathways distributed across all domains of metabolism in both databases, except for a higher number of AraCyc pathways in secondary metabolism and a 1.5-fold increase in carbohydrate metabolic enzymes in PoplarCyc. Here, we introduce these new resources and demonstrate the feasibility of using them to identify candidate enzymes for specific pathways and to analyze metabolite profiling data through concrete examples. These resources can be searched by text or BLAST, browsed, and downloaded from our project Web site (http://plantcyc.org). PMID:20522724

  14. BNDB - the Biochemical Network Database.

    PubMed

    Küntzer, Jan; Backes, Christina; Blum, Torsten; Gerasch, Andreas; Kaufmann, Michael; Kohlbacher, Oliver; Lenhof, Hans-Peter

    2007-10-02

    Technological advances in high-throughput techniques and efficient data acquisition methods have resulted in a massive amount of life science data. The data is stored in numerous databases that have been established over the last decades and are essential resources for scientists nowadays. However, the diversity of the databases and the underlying data models make it difficult to combine this information for solving complex problems in systems biology. Currently, researchers typically have to browse several, often highly focused, databases to obtain the required information. Hence, there is a pressing need for more efficient systems for integrating, analyzing, and interpreting these data. The standardization and virtual consolidation of the databases is a major challenge resulting in a unified access to a variety of data sources. We present the Biochemical Network Database (BNDB), a powerful relational database platform, allowing a complete semantic integration of an extensive collection of external databases. BNDB is built upon a comprehensive and extensible object model called BioCore, which is powerful enough to model most known biochemical processes and at the same time easily extensible to be adapted to new biological concepts. Besides a web interface for the search and curation of the data, a Java-based viewer (BiNA) provides a powerful platform-independent visualization and navigation of the data. BiNA uses sophisticated graph layout algorithms for an interactive visualization and navigation of BNDB. BNDB allows a simple, unified access to a variety of external data sources. Its tight integration with the biochemical network library BN++ offers the possibility for import, integration, analysis, and visualization of the data. BNDB is freely accessible at http://www.bndb.org.

  15. Development of an Open Global Oil and Gas Infrastructure Inventory and Geodatabase

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rose, Kelly

    This submission contains a technical report describing the development process and visual graphics for the Global Oil and Gas Infrastructure database. Access the GOGI database using the following link: https://edx.netl.doe.gov/dataset/global-oil-gas-features-database

  16. hPDI: a database of experimental human protein-DNA interactions.

    PubMed

    Xie, Zhi; Hu, Shaohui; Blackshaw, Seth; Zhu, Heng; Qian, Jiang

    2010-01-15

    The human protein DNA Interactome (hPDI) database holds experimental protein-DNA interaction data for humans identified by protein microarray assays. The unique characteristics of hPDI are that it contains consensus DNA-binding sequences not only for nearly 500 human transcription factors but also for >500 unconventional DNA-binding proteins, which are completely uncharacterized previously. Users can browse, search and download a subset or the entire data via a web interface. This database is freely accessible for any academic purposes. http://bioinfo.wilmer.jhu.edu/PDI/.

  17. MitoNuc: a database of nuclear genes coding for mitochondrial proteins. Update 2002.

    PubMed

    Attimonelli, Marcella; Catalano, Domenico; Gissi, Carmela; Grillo, Giorgio; Licciulli, Flavio; Liuni, Sabino; Santamaria, Monica; Pesole, Graziano; Saccone, Cecilia

    2002-01-01

    Mitochondria, besides their central role in energy metabolism, have recently been found to be involved in a number of basic processes of cell life and to contribute to the pathogenesis of many degenerative diseases. All functions of mitochondria depend on the interaction of nuclear and organelle genomes. Mitochondrial genomes have been extensively sequenced and analysed and data have been collected in several specialised databases. In order to collect information on nuclear coded mitochondrial proteins we developed MitoNuc, a database containing detailed information on sequenced nuclear genes coding for mitochondrial proteins in Metazoa. The MitoNuc database can be retrieved through SRS and is available via the web site http://bighost.area.ba.cnr.it/mitochondriome where other mitochondrial databases developed by our group, the complete list of the sequenced mitochondrial genomes, links to other mitochondrial sites and related information, are available. The MitoAln database, related to MitoNuc in the previous release, reporting the multiple alignments of the relevant homologous protein coding regions, is no longer supported in the present release. In order to keep the links among entries in MitoNuc from homologous proteins, a new field in the database has been defined: the cluster identifier, an alpha numeric code used to identify each cluster of homologous proteins. A comment field derived from the corresponding SWISS-PROT entry has been introduced; this reports clinical data related to dysfunction of the protein. The logic scheme of MitoNuc database has been implemented in the ORACLE DBMS. This will allow the end-users to retrieve data through a friendly interface that will be soon implemented.

  18. HTAPP: High-Throughput Autonomous Proteomic Pipeline

    PubMed Central

    Yu, Kebing; Salomon, Arthur R.

    2011-01-01

    Recent advances in the speed and sensitivity of mass spectrometers and in analytical methods, the exponential acceleration of computer processing speeds, and the availability of genomic databases from an array of species and protein information databases have led to a deluge of proteomic data. The development of a lab-based automated proteomic software platform for the automated collection, processing, storage, and visualization of expansive proteomic datasets is critically important. The high-throughput autonomous proteomic pipeline (HTAPP) described here is designed from the ground up to provide critically important flexibility for diverse proteomic workflows and to streamline the total analysis of a complex proteomic sample. This tool is comprised of software that controls the acquisition of mass spectral data along with automation of post-acquisition tasks such as peptide quantification, clustered MS/MS spectral database searching, statistical validation, and data exploration within a user-configurable lab-based relational database. The software design of HTAPP focuses on accommodating diverse workflows and providing missing software functionality to a wide range of proteomic researchers to accelerate the extraction of biological meaning from immense proteomic data sets. Although individual software modules in our integrated technology platform may have some similarities to existing tools, the true novelty of the approach described here is in the synergistic and flexible combination of these tools to provide an integrated and efficient analysis of proteomic samples. PMID:20336676

  19. Using SQL Databases for Sequence Similarity Searching and Analysis.

    PubMed

    Pearson, William R; Mackey, Aaron J

    2017-09-13

    Relational databases can integrate diverse types of information and manage large sets of similarity search results, greatly simplifying genome-scale analyses. By focusing on taxonomic subsets of sequences, relational databases can reduce the size and redundancy of sequence libraries and improve the statistical significance of homologs. In addition, by loading similarity search results into a relational database, it becomes possible to explore and summarize the relationships between all of the proteins in an organism and those in other biological kingdoms. This unit describes how to use relational databases to improve the efficiency of sequence similarity searching and demonstrates various large-scale genomic analyses of homology-related data. It also describes the installation and use of a simple protein sequence database, seqdb_demo, which is used as a basis for the other protocols. The unit also introduces search_demo, a database that stores sequence similarity search results. The search_demo database is then used to explore the evolutionary relationships between E. coli proteins and proteins in other organisms in a large-scale comparative genomic analysis. © 2017 by John Wiley & Sons, Inc. Copyright © 2017 John Wiley & Sons, Inc.

  20. SNAPPI-DB: a database and API of Structures, iNterfaces and Alignments for Protein–Protein Interactions

    PubMed Central

    Jefferson, Emily R.; Walsh, Thomas P.; Roberts, Timothy J.; Barton, Geoffrey J.

    2007-01-01

    SNAPPI-DB, a high performance database of Structures, iNterfaces and Alignments of Protein–Protein Interactions, and its associated Java Application Programming Interface (API) is described. SNAPPI-DB contains structural data, down to the level of atom co-ordinates, for each structure in the Protein Data Bank (PDB) together with associated data including SCOP, CATH, Pfam, SWISSPROT, InterPro, GO terms, Protein Quaternary Structures (PQS) and secondary structure information. Domain–domain interactions are stored for multiple domain definitions and are classified by their Superfamily/Family pair and interaction interface. Each set of classified domain–domain interactions has an associated multiple structure alignment for each partner. The API facilitates data access via PDB entries, domains and domain–domain interactions. Rapid development, fast database access and the ability to perform advanced queries without the requirement for complex SQL statements are provided via an object oriented database and the Java Data Objects (JDO) API. SNAPPI-DB contains many features which are not available in other databases of structural protein–protein interactions. It has been applied in three studies on the properties of protein–protein interactions and is currently being employed to train a protein–protein interaction predictor and a functional residue predictor. The database, API and manual are available for download at: . PMID:17202171

  1. SALAD database: a motif-based database of protein annotations for plant comparative genomics

    PubMed Central

    Mihara, Motohiro; Itoh, Takeshi; Izawa, Takeshi

    2010-01-01

    Proteins often have several motifs with distinct evolutionary histories. Proteins with similar motifs have similar biochemical properties and thus related biological functions. We constructed a unique comparative genomics database termed the SALAD database (http://salad.dna.affrc.go.jp/salad/) from plant-genome-based proteome data sets. We extracted evolutionarily conserved motifs by MEME software from 209 529 protein-sequence annotation groups selected by BLASTP from the proteome data sets of 10 species: rice, sorghum, Arabidopsis thaliana, grape, a lycophyte, a moss, 3 algae, and yeast. Similarity clustering of each protein group was performed by pairwise scoring of the motif patterns of the sequences. The SALAD database provides a user-friendly graphical viewer that displays a motif pattern diagram linked to the resulting bootstrapped dendrogram for each protein group. Amino-acid-sequence-based and nucleotide-sequence-based phylogenetic trees for motif combination alignment, a logo comparison diagram for each clade in the tree, and a Pfam-domain pattern diagram are also available. We also developed a viewer named ‘SALAD on ARRAYs’ to view arbitrary microarray data sets of paralogous genes linked to the same dendrogram in a window. The SALAD database is a powerful tool for comparing protein sequences and can provide valuable hints for biological analysis. PMID:19854933

  2. SALAD database: a motif-based database of protein annotations for plant comparative genomics.

    PubMed

    Mihara, Motohiro; Itoh, Takeshi; Izawa, Takeshi

    2010-01-01

    Proteins often have several motifs with distinct evolutionary histories. Proteins with similar motifs have similar biochemical properties and thus related biological functions. We constructed a unique comparative genomics database termed the SALAD database (http://salad.dna.affrc.go.jp/salad/) from plant-genome-based proteome data sets. We extracted evolutionarily conserved motifs by MEME software from 209,529 protein-sequence annotation groups selected by BLASTP from the proteome data sets of 10 species: rice, sorghum, Arabidopsis thaliana, grape, a lycophyte, a moss, 3 algae, and yeast. Similarity clustering of each protein group was performed by pairwise scoring of the motif patterns of the sequences. The SALAD database provides a user-friendly graphical viewer that displays a motif pattern diagram linked to the resulting bootstrapped dendrogram for each protein group. Amino-acid-sequence-based and nucleotide-sequence-based phylogenetic trees for motif combination alignment, a logo comparison diagram for each clade in the tree, and a Pfam-domain pattern diagram are also available. We also developed a viewer named 'SALAD on ARRAYs' to view arbitrary microarray data sets of paralogous genes linked to the same dendrogram in a window. The SALAD database is a powerful tool for comparing protein sequences and can provide valuable hints for biological analysis.

  3. Location-Driven Image Retrieval for Images Collected by a Mobile Robot

    NASA Astrophysics Data System (ADS)

    Tanaka, Kanji; Hirayama, Mitsuru; Okada, Nobuhiro; Kondo, Eiji

    Mobile robot teleoperation is a method for a human user to interact with a mobile robot over time and distance. Successful teleoperation depends on how well images taken by the mobile robot are visualized to the user. To enhance the efficiency and flexibility of the visualization, an image retrieval system on such a robot’s image database would be very useful. The main difference of the robot’s image database from standard image databases is that various relevant images exist due to variety of viewing conditions. The main contribution of this paper is to propose an efficient retrieval approach, named location-driven approach, utilizing correlation between visual features and real world locations of images. Combining the location-driven approach with the conventional feature-driven approach, our goal can be viewed as finding an optimal classifier between relevant and irrelevant feature-location pairs. An active learning technique based on support vector machine is extended for this aim.

  4. Expert Knowledge-Based Automatic Sleep Stage Determination by Multi-Valued Decision Making Method

    NASA Astrophysics Data System (ADS)

    Wang, Bei; Sugi, Takenao; Kawana, Fusae; Wang, Xingyu; Nakamura, Masatoshi

    In this study, an expert knowledge-based automatic sleep stage determination system working on a multi-valued decision making method is developed. Visual inspection by a qualified clinician is adopted to obtain the expert knowledge database. The expert knowledge database consists of probability density functions of parameters for various sleep stages. Sleep stages are determined automatically according to the conditional probability. Totally, four subjects were participated. The automatic sleep stage determination results showed close agreements with the visual inspection on sleep stages of awake, REM (rapid eye movement), light sleep and deep sleep. The constructed expert knowledge database reflects the distributions of characteristic parameters which can be adaptive to variable sleep data in hospitals. The developed automatic determination technique based on expert knowledge of visual inspection can be an assistant tool enabling further inspection of sleep disorder cases for clinical practice.

  5. ProteinWorldDB: querying radical pairwise alignments among protein sets from complete genomes.

    PubMed

    Otto, Thomas Dan; Catanho, Marcos; Tristão, Cristian; Bezerra, Márcia; Fernandes, Renan Mathias; Elias, Guilherme Steinberger; Scaglia, Alexandre Capeletto; Bovermann, Bill; Berstis, Viktors; Lifschitz, Sergio; de Miranda, Antonio Basílio; Degrave, Wim

    2010-03-01

    Many analyses in modern biological research are based on comparisons between biological sequences, resulting in functional, evolutionary and structural inferences. When large numbers of sequences are compared, heuristics are often used resulting in a certain lack of accuracy. In order to improve and validate results of such comparisons, we have performed radical all-against-all comparisons of 4 million protein sequences belonging to the RefSeq database, using an implementation of the Smith-Waterman algorithm. This extremely intensive computational approach was made possible with the help of World Community Grid, through the Genome Comparison Project. The resulting database, ProteinWorldDB, which contains coordinates of pairwise protein alignments and their respective scores, is now made available. Users can download, compare and analyze the results, filtered by genomes, protein functions or clusters. ProteinWorldDB is integrated with annotations derived from Swiss-Prot, Pfam, KEGG, NCBI Taxonomy database and gene ontology. The database is a unique and valuable asset, representing a major effort to create a reliable and consistent dataset of cross-comparisons of the whole protein content encoded in hundreds of completely sequenced genomes using a rigorous dynamic programming approach. The database can be accessed through http://proteinworlddb.org

  6. LocSigDB: a database of protein localization signals

    PubMed Central

    Negi, Simarjeet; Pandey, Sanjit; Srinivasan, Satish M.; Mohammed, Akram; Guda, Chittibabu

    2015-01-01

    LocSigDB (http://genome.unmc.edu/LocSigDB/) is a manually curated database of experimental protein localization signals for eight distinct subcellular locations; primarily in a eukaryotic cell with brief coverage of bacterial proteins. Proteins must be localized at their appropriate subcellular compartment to perform their desired function. Mislocalization of proteins to unintended locations is a causative factor for many human diseases; therefore, collection of known sorting signals will help support many important areas of biomedical research. By performing an extensive literature study, we compiled a collection of 533 experimentally determined localization signals, along with the proteins that harbor such signals. Each signal in the LocSigDB is annotated with its localization, source, PubMed references and is linked to the proteins in UniProt database along with the organism information that contain the same amino acid pattern as the given signal. From LocSigDB webserver, users can download the whole database or browse/search for data using an intuitive query interface. To date, LocSigDB is the most comprehensive compendium of protein localization signals for eight distinct subcellular locations. Database URL: http://genome.unmc.edu/LocSigDB/ PMID:25725059

  7. bcl::Cluster : A method for clustering biological molecules coupled with visualization in the Pymol Molecular Graphics System.

    PubMed

    Alexander, Nathan; Woetzel, Nils; Meiler, Jens

    2011-02-01

    Clustering algorithms are used as data analysis tools in a wide variety of applications in Biology. Clustering has become especially important in protein structure prediction and virtual high throughput screening methods. In protein structure prediction, clustering is used to structure the conformational space of thousands of protein models. In virtual high throughput screening, databases with millions of drug-like molecules are organized by structural similarity, e.g. common scaffolds. The tree-like dendrogram structure obtained from hierarchical clustering can provide a qualitative overview of the results, which is important for focusing detailed analysis. However, in practice it is difficult to relate specific components of the dendrogram directly back to the objects of which it is comprised and to display all desired information within the two dimensions of the dendrogram. The current work presents a hierarchical agglomerative clustering method termed bcl::Cluster. bcl::Cluster utilizes the Pymol Molecular Graphics System to graphically depict dendrograms in three dimensions. This allows simultaneous display of relevant biological molecules as well as additional information about the clusters and the members comprising them.

  8. Rice proteome analysis: a step toward functional analysis of the rice genome.

    PubMed

    Komatsu, Setsuko; Tanaka, Naoki

    2005-03-01

    The technique of proteome analysis using 2-DE has the power to monitor global changes that occur in the protein complement of tissues and subcellular compartments. In this review, we describe construction of the rice proteome database, the cataloging of rice proteins, and the functional characterization of some of the proteins identified. Initially, proteins extracted from various tissues and organelles were separated by 2-DE and an image analyzer was used to construct a display or reference map of the proteins. The rice proteome database currently contains 23 reference maps based on 2-DE of proteins from different rice tissues and subcellular compartments. These reference maps comprise 13 129 rice proteins, and the amino acid sequences of 5092 of these proteins are entered in the database. Major proteins involved in growth or stress responses have been identified by using a proteomics approach and some of these proteins have unique functions. Furthermore, initial work has also begun on analyzing the phosphoproteome and protein-protein interactions in rice. The information obtained from the rice proteome database will aid in the molecular cloning of rice genes and in predicting the function of unknown proteins.

  9. Databases and Associated Tools for Glycomics and Glycoproteomics.

    PubMed

    Lisacek, Frederique; Mariethoz, Julien; Alocci, Davide; Rudd, Pauline M; Abrahams, Jodie L; Campbell, Matthew P; Packer, Nicolle H; Ståhle, Jonas; Widmalm, Göran; Mullen, Elaine; Adamczyk, Barbara; Rojas-Macias, Miguel A; Jin, Chunsheng; Karlsson, Niclas G

    2017-01-01

    The access to biodatabases for glycomics and glycoproteomics has proven to be essential for current glycobiological research. This chapter presents available databases that are devoted to different aspects of glycobioinformatics. This includes oligosaccharide sequence databases, experimental databases, 3D structure databases (of both glycans and glycorelated proteins) and association of glycans with tissue, disease, and proteins. Specific search protocols are also provided using tools associated with experimental databases for converting primary glycoanalytical data to glycan structural information. In particular, researchers using glycoanalysis methods by U/HPLC (GlycoBase), MS (GlycoWorkbench, UniCarb-DB, GlycoDigest), and NMR (CASPER) will benefit from this chapter. In addition we also include information on how to utilize glycan structural information to query databases that associate glycans with proteins (UniCarbKB) and with interactions with pathogens (SugarBind).

  10. Proteome of Caulobacter crescentus cell cycle publicly accessible on SWICZ server.

    PubMed

    Vohradsky, Jiri; Janda, Ivan; Grünenfelder, Björn; Berndt, Peter; Röder, Daniel; Langen, Hanno; Weiser, Jaroslav; Jenal, Urs

    2003-10-01

    Here we present the Swiss-Czech Proteomics Server (SWICZ), which hosts the proteomic database summarizing information about the cell cycle of the aquatic bacterium Caulobacter crescentus. The database provides a searchable tool for easy access of global protein synthesis and protein stability data as examined during the C. crescentus cell cycle. Protein synthesis data collected from five different cell cycle stages were determined for each protein spot as a relative value of the total amount of [(35)S]methionine incorporation. Protein stability of pulse-labeled extracts were measured during a chase period equivalent to one cell cycle unit. Quantitative information for individual proteins together with descriptive data such as protein identities, apparent molecular masses and isoelectric points, were combined with information on protein function, genomic context, and the cell cycle stage, and were then assembled in a relational database with a world wide web interface (http://proteom.biomed.cas.cz), which allows the database records to be searched and displays the recovered information. A total of 1250 protein spots were reproducibly detected on two-dimensional gel electropherograms, 295 of which were identified by mass spectroscopy. The database is accessible either through clickable two-dimensional gel electrophoretic maps or by means of a set of dedicated search engines. Basic characterization of the experimental procedures, data processing, and a comprehensive description of the web site are presented. In its current state, the SWICZ proteome database provides a platform for the incorporation of new data emerging from extended functional studies on the C. crescentus proteome.

  11. Visual and Verbal Learning in a Genetic Metabolic Disorder

    ERIC Educational Resources Information Center

    Spilkin, Amy M.; Ballantyne, Angela O.; Trauner, Doris A.

    2009-01-01

    Visual and verbal learning in a genetic metabolic disorder (cystinosis) were examined in the following three studies. The goal of Study I was to provide a normative database and establish the reliability and validity of a new test of visual learning and memory (Visual Learning and Memory Test; VLMT) that was modeled after a widely used test of…

  12. KnotProt: a database of proteins with knots and slipknots

    PubMed Central

    Jamroz, Michal; Niemyska, Wanda; Rawdon, Eric J.; Stasiak, Andrzej; Millett, Kenneth C.; Sułkowski, Piotr; Sulkowska, Joanna I.

    2015-01-01

    The protein topology database KnotProt, http://knotprot.cent.uw.edu.pl/, collects information about protein structures with open polypeptide chains forming knots or slipknots. The knotting complexity of the cataloged proteins is presented in the form of a matrix diagram that shows users the knot type of the entire polypeptide chain and of each of its subchains. The pattern visible in the matrix gives the knotting fingerprint of a given protein and permits users to determine, for example, the minimal length of the knotted regions (knot's core size) or the depth of a knot, i.e. how many amino acids can be removed from either end of the cataloged protein structure before converting it from a knot to a different type of knot. In addition, the database presents extensive information about the biological functions, families and fold types of proteins with non-trivial knotting. As an additional feature, the KnotProt database enables users to submit protein or polymer chains and generate their knotting fingerprints. PMID:25361973

  13. Integrated Controlling System and Unified Database for High Throughput Protein Crystallography Experiments

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gaponov, Yu.A.; Igarashi, N.; Hiraki, M.

    2004-05-12

    An integrated controlling system and a unified database for high throughput protein crystallography experiments have been developed. Main features of protein crystallography experiments (purification, crystallization, crystal harvesting, data collection, data processing) were integrated into the software under development. All information necessary to perform protein crystallography experiments is stored (except raw X-ray data that are stored in a central data server) in a MySQL relational database. The database contains four mutually linked hierarchical trees describing protein crystals, data collection of protein crystal and experimental data processing. A database editor was designed and developed. The editor supports basic database functions to view,more » create, modify and delete user records in the database. Two search engines were realized: direct search of necessary information in the database and object oriented search. The system is based on TCP/IP secure UNIX sockets with four predefined sending and receiving behaviors, which support communications between all connected servers and clients with remote control functions (creating and modifying data for experimental conditions, data acquisition, viewing experimental data, and performing data processing). Two secure login schemes were designed and developed: a direct method (using the developed Linux clients with secure connection) and an indirect method (using the secure SSL connection using secure X11 support from any operating system with X-terminal and SSH support). A part of the system has been implemented on a new MAD beam line, NW12, at the Photon Factory Advanced Ring for general user experiments.« less

  14. Introducing the CPL/MUW proteome database: interpretation of human liver and liver cancer proteome profiles by referring to isolated primary cells.

    PubMed

    Wimmer, Helge; Gundacker, Nina C; Griss, Johannes; Haudek, Verena J; Stättner, Stefan; Mohr, Thomas; Zwickl, Hannes; Paulitschke, Verena; Baron, David M; Trittner, Wolfgang; Kubicek, Markus; Bayer, Editha; Slany, Astrid; Gerner, Christopher

    2009-06-01

    Interpretation of proteome data with a focus on biomarker discovery largely relies on comparative proteome analyses. Here, we introduce a database-assisted interpretation strategy based on proteome profiles of primary cells. Both 2-D-PAGE and shotgun proteomics are applied. We obtain high data concordance with these two different techniques. When applying mass analysis of tryptic spot digests from 2-D gels of cytoplasmic fractions, we typically identify several hundred proteins. Using the same protein fractions, we usually identify more than thousand proteins by shotgun proteomics. The data consistency obtained when comparing these independent data sets exceeds 99% of the proteins identified in the 2-D gels. Many characteristic differences in protein expression of different cells can thus be independently confirmed. Our self-designed SQL database (CPL/MUW - database of the Clinical Proteomics Laboratories at the Medical University of Vienna accessible via www.meduniwien.ac.at/proteomics/database) facilitates (i) quality management of protein identification data, which are based on MS, (ii) the detection of cell type-specific proteins and (iii) of molecular signatures of specific functional cell states. Here, we demonstrate, how the interpretation of proteome profiles obtained from human liver tissue and hepatocellular carcinoma tissue is assisted by the Clinical Proteomics Laboratories at the Medical University of Vienna-database. Therefore, we suggest that the use of reference experiments supported by a tailored database may substantially facilitate data interpretation of proteome profiling experiments.

  15. Extraction, integration and analysis of alternative splicing and protein structure distributed information

    PubMed Central

    D'Antonio, Matteo; Masseroli, Marco

    2009-01-01

    Background Alternative splicing has been demonstrated to affect most of human genes; different isoforms from the same gene encode for proteins which differ for a limited number of residues, thus yielding similar structures. This suggests possible correlations between alternative splicing and protein structure. In order to support the investigation of such relationships, we have developed the Alternative Splicing and Protein Structure Scrutinizer (PASS), a Web application to automatically extract, integrate and analyze human alternative splicing and protein structure data sparsely available in the Alternative Splicing Database, Ensembl databank and Protein Data Bank. Primary data from these databases have been integrated and analyzed using the Protein Identifier Cross-Reference, BLAST, CLUSTALW and FeatureMap3D software tools. Results A database has been developed to store the considered primary data and the results from their analysis; a system of Perl scripts has been implemented to automatically create and update the database and analyze the integrated data; a Web interface has been implemented to make the analyses easily accessible; a database has been created to manage user accesses to the PASS Web application and store user's data and searches. Conclusion PASS automatically integrates data from the Alternative Splicing Database with protein structure data from the Protein Data Bank. Additionally, it comprehensively analyzes the integrated data with publicly available well-known bioinformatics tools in order to generate structural information of isoform pairs. Further analysis of such valuable information might reveal interesting relationships between alternative splicing and protein structure differences, which may be significantly associated with different functions. PMID:19828075

  16. iview: an interactive WebGL visualizer for protein-ligand complex.

    PubMed

    Li, Hongjian; Leung, Kwong-Sak; Nakane, Takanori; Wong, Man-Hon

    2014-02-25

    Visualization of protein-ligand complex plays an important role in elaborating protein-ligand interactions and aiding novel drug design. Most existing web visualizers either rely on slow software rendering, or lack virtual reality support. The vital feature of macromolecular surface construction is also unavailable. We have developed iview, an easy-to-use interactive WebGL visualizer of protein-ligand complex. It exploits hardware acceleration rather than software rendering. It features three special effects in virtual reality settings, namely anaglyph, parallax barrier and oculus rift, resulting in visually appealing identification of intermolecular interactions. It supports four surface representations including Van der Waals surface, solvent excluded surface, solvent accessible surface and molecular surface. Moreover, based on the feature-rich version of iview, we have also developed a neat and tailor-made version specifically for our istar web platform for protein-ligand docking purpose. This demonstrates the excellent portability of iview. Using innovative 3D techniques, we provide a user friendly visualizer that is not intended to compete with professional visualizers, but to enable easy accessibility and platform independence.

  17. GRBase, a new gene regulation data base available by anonymous ftp.

    PubMed Central

    Collier, B; Danielsen, M

    1994-01-01

    The Gene Regulation Database (GRBase) is a compendium of information on the structure and function of proteins involved in the control of gene expression in eukaryotes. These proteins include transcription factors, proteins involved in signal transduction, and receptors. The database can be obtained by FTP in Filemaker Pro, text, and postscript formats. The database will be expanded in the coming year to include reviews on families of proteins involved in gene regulation and to allow online searching. PMID:7937071

  18. Multidimensional structured data visualization method and apparatus, text visualization method and apparatus, method and apparatus for visualizing and graphically navigating the world wide web, method and apparatus for visualizing hierarchies

    DOEpatents

    Risch, John S [Kennewick, WA; Dowson, Scott T [West Richland, WA; Hart, Michelle L [Richland, WA; Hatley, Wes L [Kennewick, WA

    2008-05-13

    A method of displaying correlations among information objects comprises receiving a query against a database; obtaining a query result set; and generating a visualization representing the components of the result set, the visualization including one of a plane and line to represent a data field, nodes representing data values, and links showing correlations among fields and values. Other visualization methods and apparatus are disclosed.

  19. Multidimensional structured data visualization method and apparatus, text visualization method and apparatus, method and apparatus for visualizing and graphically navigating the world wide web, method and apparatus for visualizing hierarchies

    DOEpatents

    Risch, John S [Kennewick, WA; Dowson, Scott T [West Richland, WA

    2012-03-06

    A method of displaying correlations among information objects includes receiving a query against a database; obtaining a query result set; and generating a visualization representing the components of the result set, the visualization including one of a plane and line to represent a data field, nodes representing data values, and links showing correlations among fields and values. Other visualization methods and apparatus are disclosed.

  20. DBMap: a TreeMap-based framework for data navigation and visualization of brain research registry

    NASA Astrophysics Data System (ADS)

    Zhang, Ming; Zhang, Hong; Tjandra, Donny; Wong, Stephen T. C.

    2003-05-01

    The purpose of this study is to investigate and apply a new, intuitive and space-conscious visualization framework to facilitate efficient data presentation and exploration of large-scale data warehouses. We have implemented the DBMap framework for the UCSF Brain Research Registry. Such a novel utility would facilitate medical specialists and clinical researchers in better exploring and evaluating a number of attributes organized in the brain research registry. The current UCSF Brain Research Registry consists of a federation of disease-oriented database modules, including Epilepsy, Brain Tumor, Intracerebral Hemorrphage, and CJD (Creuzfeld-Jacob disease). These database modules organize large volumes of imaging and non-imaging data to support Web-based clinical research. While the data warehouse supports general information retrieval and analysis, there lacks an effective way to visualize and present the voluminous and complex data stored. This study investigates whether the TreeMap algorithm can be adapted to display and navigate categorical biomedical data warehouse or registry. TreeMap is a space constrained graphical representation of large hierarchical data sets, mapped to a matrix of rectangles, whose size and color represent interested database fields. It allows the display of a large amount of numerical and categorical information in limited real estate of computer screen with an intuitive user interface. The paper will describe, DBMap, the proposed new data visualization framework for large biomedical databases. Built upon XML, Java and JDBC technologies, the prototype system includes a set of software modules that reside in the application server tier and provide interface to backend database tier and front-end Web tier of the brain registry.

  1. Interoperability, Data Control and Battlespace Visualization using XML, XSLT and X3D

    DTIC Science & Technology

    2003-09-01

    26 Rosenthal, Arnon, Seligman , Len and Costello, Roger, XML, Databases, and Interoperability, Federal Database Colloquium, AFCEA, San Diego...79 Rosenthal, Arnon, Seligman , Len and Costello, Roger, “XML, Databases, and Interoperability”, Federal Database Colloquium, AFCEA, San Diego, 1999... Linda , Mastering XML, Premium Edition, SYBEX, 2001 Wooldridge, Michael , An Introduction to MultiAgent Systems, Wiley, 2002 PAPERS Abernathy, M

  2. Quality labeled faces in the wild (QLFW): a database for studying face recognition in real-world environments

    NASA Astrophysics Data System (ADS)

    Karam, Lina J.; Zhu, Tong

    2015-03-01

    The varying quality of face images is an important challenge that limits the effectiveness of face recognition technology when applied in real-world applications. Existing face image databases do not consider the effect of distortions that commonly occur in real-world environments. This database (QLFW) represents an initial attempt to provide a set of labeled face images spanning the wide range of quality, from no perceived impairment to strong perceived impairment for face detection and face recognition applications. Types of impairment include JPEG2000 compression, JPEG compression, additive white noise, Gaussian blur and contrast change. Subjective experiments are conducted to assess the perceived visual quality of faces under different levels and types of distortions and also to assess the human recognition performance under the considered distortions. One goal of this work is to enable automated performance evaluation of face recognition technologies in the presence of different types and levels of visual distortions. This will consequently enable the development of face recognition systems that can operate reliably on real-world visual content in the presence of real-world visual distortions. Another goal is to enable the development and assessment of visual quality metrics for face images and for face detection and recognition applications.

  3. Columba: an integrated database of proteins, structures, and annotations.

    PubMed

    Trissl, Silke; Rother, Kristian; Müller, Heiko; Steinke, Thomas; Koch, Ina; Preissner, Robert; Frömmel, Cornelius; Leser, Ulf

    2005-03-31

    Structural and functional research often requires the computation of sets of protein structures based on certain properties of the proteins, such as sequence features, fold classification, or functional annotation. Compiling such sets using current web resources is tedious because the necessary data are spread over many different databases. To facilitate this task, we have created COLUMBA, an integrated database of annotations of protein structures. COLUMBA currently integrates twelve different databases, including PDB, KEGG, Swiss-Prot, CATH, SCOP, the Gene Ontology, and ENZYME. The database can be searched using either keyword search or data source-specific web forms. Users can thus quickly select and download PDB entries that, for instance, participate in a particular pathway, are classified as containing a certain CATH architecture, are annotated as having a certain molecular function in the Gene Ontology, and whose structures have a resolution under a defined threshold. The results of queries are provided in both machine-readable extensible markup language and human-readable format. The structures themselves can be viewed interactively on the web. The COLUMBA database facilitates the creation of protein structure data sets for many structure-based studies. It allows to combine queries on a number of structure-related databases not covered by other projects at present. Thus, information on both many and few protein structures can be used efficiently. The web interface for COLUMBA is available at http://www.columba-db.de.

  4. An Animated Introduction to Relational Databases for Many Majors

    ERIC Educational Resources Information Center

    Dietrich, Suzanne W.; Goelman, Don; Borror, Connie M.; Crook, Sharon M.

    2015-01-01

    Database technology affects many disciplines beyond computer science and business. This paper describes two animations developed with images and color that visually and dynamically introduce fundamental relational database concepts and querying to students of many majors. The goal is for educators in diverse academic disciplines to incorporate the…

  5. Substrate-Driven Mapping of the Degradome by Comparison of Sequence Logos

    PubMed Central

    Fuchs, Julian E.; von Grafenstein, Susanne; Huber, Roland G.; Kramer, Christian; Liedl, Klaus R.

    2013-01-01

    Sequence logos are frequently used to illustrate substrate preferences and specificity of proteases. Here, we employed the compiled substrates of the MEROPS database to introduce a novel metric for comparison of protease substrate preferences. The constructed similarity matrix of 62 proteases can be used to intuitively visualize similarities in protease substrate readout via principal component analysis and construction of protease specificity trees. Since our new metric is solely based on substrate data, we can engraft the protease tree including proteolytic enzymes of different evolutionary origin. Thereby, our analyses confirm pronounced overlaps in substrate recognition not only between proteases closely related on sequence basis but also between proteolytic enzymes of different evolutionary origin and catalytic type. To illustrate the applicability of our approach we analyze the distribution of targets of small molecules from the ChEMBL database in our substrate-based protease specificity trees. We observe a striking clustering of annotated targets in tree branches even though these grouped targets do not necessarily share similarity on protein sequence level. This highlights the value and applicability of knowledge acquired from peptide substrates in drug design of small molecules, e.g., for the prediction of off-target effects or drug repurposing. Consequently, our similarity metric allows to map the degradome and its associated drug target network via comparison of known substrate peptides. The substrate-driven view of protein-protein interfaces is not limited to the field of proteases but can be applied to any target class where a sufficient amount of known substrate data is available. PMID:24244149

  6. sc-PDB: a 3D-database of ligandable binding sites—10 years on

    PubMed Central

    Desaphy, Jérémy; Bret, Guillaume; Rognan, Didier; Kellenberger, Esther

    2015-01-01

    The sc-PDB database (available at http://bioinfo-pharma.u-strasbg.fr/scPDB/) is a comprehensive and up-to-date selection of ligandable binding sites of the Protein Data Bank. Sites are defined from complexes between a protein and a pharmacological ligand. The database provides the all-atom description of the protein, its ligand, their binding site and their binding mode. Currently, the sc-PDB archive registers 9283 binding sites from 3678 unique proteins and 5608 unique ligands. The sc-PDB database was publicly launched in 2004 with the aim of providing structure files suitable for computational approaches to drug design, such as docking. During the last 10 years we have improved and standardized the processes for (i) identifying binding sites, (ii) correcting structures, (iii) annotating protein function and ligand properties and (iv) characterizing their binding mode. This paper presents the latest enhancements in the database, specifically pertaining to the representation of molecular interaction and to the similarity between ligand/protein binding patterns. The new website puts emphasis in pictorial analysis of data. PMID:25300483

  7. MiCroKit 3.0: an integrated database of midbody, centrosome and kinetochore.

    PubMed

    Ren, Jian; Liu, Zexian; Gao, Xinjiao; Jin, Changjiang; Ye, Mingliang; Zou, Hanfa; Wen, Longping; Zhang, Zhaolei; Xue, Yu; Yao, Xuebiao

    2010-01-01

    During cell division/mitosis, a specific subset of proteins is spatially and temporally assembled into protein super complexes in three distinct regions, i.e. centrosome/spindle pole, kinetochore/centromere and midbody/cleavage furrow/phragmoplast/bud neck, and modulates cell division process faithfully. Although many experimental efforts have been carried out to investigate the characteristics of these proteins, no integrated database was available. Here, we present the MiCroKit database (http://microkit.biocuckoo.org) of proteins that localize in midbody, centrosome and/or kinetochore. We collected into the MiCroKit database experimentally verified microkit proteins from the scientific literature that have unambiguous supportive evidence for subcellular localization under fluorescent microscope. The current version of MiCroKit 3.0 provides detailed information for 1489 microkit proteins from seven model organisms, including Saccharomyces cerevisiae, Schizasaccharomyces pombe, Caenorhabditis elegans, Drosophila melanogaster, Xenopus laevis, Mus musculus and Homo sapiens. Moreover, the orthologous information was provided for these microkit proteins, and could be a useful resource for further experimental identification. The online service of MiCroKit database was implemented in PHP + MySQL + JavaScript, while the local packages were developed in JAVA 1.5 (J2SE 5.0).

  8. MiCroKit 3.0: an integrated database of midbody, centrosome and kinetochore

    PubMed Central

    Liu, Zexian; Gao, Xinjiao; Jin, Changjiang; Ye, Mingliang; Zou, Hanfa; Wen, Longping; Zhang, Zhaolei; Xue, Yu; Yao, Xuebiao

    2010-01-01

    During cell division/mitosis, a specific subset of proteins is spatially and temporally assembled into protein super complexes in three distinct regions, i.e. centrosome/spindle pole, kinetochore/centromere and midbody/cleavage furrow/phragmoplast/bud neck, and modulates cell division process faithfully. Although many experimental efforts have been carried out to investigate the characteristics of these proteins, no integrated database was available. Here, we present the MiCroKit database (http://microkit.biocuckoo.org) of proteins that localize in midbody, centrosome and/or kinetochore. We collected into the MiCroKit database experimentally verified microkit proteins from the scientific literature that have unambiguous supportive evidence for subcellular localization under fluorescent microscope. The current version of MiCroKit 3.0 provides detailed information for 1489 microkit proteins from seven model organisms, including Saccharomyces cerevisiae, Schizasaccharomyces pombe, Caenorhabditis elegans, Drosophila melanogaster, Xenopus laevis, Mus musculus and Homo sapiens. Moreover, the orthologous information was provided for these microkit proteins, and could be a useful resource for further experimental identification. The online service of MiCroKit database was implemented in PHP + MySQL + JavaScript, while the local packages were developed in JAVA 1.5 (J2SE 5.0). PMID:19783819

  9. SASD: the Synthetic Alternative Splicing Database for identifying novel isoform from proteomics

    PubMed Central

    2013-01-01

    Background Alternative splicing is an important and widespread mechanism for generating protein diversity and regulating protein expression. High-throughput identification and analysis of alternative splicing in the protein level has more advantages than in the mRNA level. The combination of alternative splicing database and tandem mass spectrometry provides a powerful technique for identification, analysis and characterization of potential novel alternative splicing protein isoforms from proteomics. Therefore, based on the peptidomic database of human protein isoforms for proteomics experiments, our objective is to design a new alternative splicing database to 1) provide more coverage of genes, transcripts and alternative splicing, 2) exclusively focus on the alternative splicing, and 3) perform context-specific alternative splicing analysis. Results We used a three-step pipeline to create a synthetic alternative splicing database (SASD) to identify novel alternative splicing isoforms and interpret them at the context of pathway, disease, drug and organ specificity or custom gene set with maximum coverage and exclusive focus on alternative splicing. First, we extracted information on gene structures of all genes in the Ensembl Genes 71 database and incorporated the Integrated Pathway Analysis Database. Then, we compiled artificial splicing transcripts. Lastly, we translated the artificial transcripts into alternative splicing peptides. The SASD is a comprehensive database containing 56,630 genes (Ensembl gene IDs), 95,260 transcripts (Ensembl transcript IDs), and 11,919,779 Alternative Splicing peptides, and also covering about 1,956 pathways, 6,704 diseases, 5,615 drugs, and 52 organs. The database has a web-based user interface that allows users to search, display and download a single gene/transcript/protein, custom gene set, pathway, disease, drug, organ related alternative splicing. Moreover, the quality of the database was validated with comparison to other known databases and two case studies: 1) in liver cancer and 2) in breast cancer. Conclusions The SASD provides the scientific community with an efficient means to identify, analyze, and characterize novel Exon Skipping and Intron Retention protein isoforms from mass spectrometry and interpret them at the context of pathway, disease, drug and organ specificity or custom gene set with maximum coverage and exclusive focus on alternative splicing. PMID:24267658

  10. Why are they missing? : Bioinformatics characterization of missing human proteins.

    PubMed

    Elguoshy, Amr; Magdeldin, Sameh; Xu, Bo; Hirao, Yoshitoshi; Zhang, Ying; Kinoshita, Naohiko; Takisawa, Yusuke; Nameta, Masaaki; Yamamoto, Keiko; El-Refy, Ali; El-Fiky, Fawzy; Yamamoto, Tadashi

    2016-10-21

    NeXtProt is a web-based protein knowledge platform that supports research on human proteins. NeXtProt (release 2015-04-28) lists 20,060 proteins, among them, 3373 canonical proteins (16.8%) lack credible experimental evidence at protein level (PE2:PE5). Therefore, they are considered as "missing proteins". A comprehensive bioinformatic workflow has been proposed to analyze these "missing" proteins. The aims of current study were to analyze physicochemical properties, existence and distribution of the tryptic cleavage sites, and to pinpoint the signature peptides of the missing proteins. Our findings showed that 23.7% of missing proteins were hydrophobic proteins possessing transmembrane domains (TMD). Also, forty missing entries generate tryptic peptides were either out of mass detection range (>30aa) or mapped to different proteins (<9aa). Additionally, 21% of missing entries didn't generate any unique tryptic peptides. In silico endopeptidase combination strategy increased the possibility of missing proteins identification. Coherently, using both mature protein database and signal peptidome database could be a promising option to identify some missing proteins by targeting their unique N-terminal tryptic peptide from mature protein database and or C-terminus tryptic peptide from signal peptidome database. In conclusion, Identification of missing protein requires additional consideration during sample preparation, extraction, digestion and data analysis to increase its incidence of identification. Copyright © 2016. Published by Elsevier B.V.

  11. Receptor Tyrosine Kinase MET Interactome and Neurodevelopmental Disorder Partners at the Developing Synapse

    PubMed Central

    Xie, Zhihui; Li, Jing; Baker, Jonathan; Eagleson, Kathie L.; Coba, Marcelo P.; Levitt, Pat

    2016-01-01

    Background Atypical synapse development and plasticity are implicated in many neurodevelopmental disorders (NDDs). NDD-associated, high confidence risk genes have been identified, yet little is known about functional relationships at the level of protein-protein interactions, which are the dominant molecular bases responsible for mediating circuit development. Methods Proteomics in three independent developing neocortical synaptosomal preparations identified putative interacting proteins of the ligand-activated MET receptor tyrosine kinase, an autism risk gene that mediates synapse development. The candidates were translated into interactome networks and analyzed bioinformatically. Additionally, three independent quantitative proximity ligation assays (PLA) in cultured neurons and four independent immunoprecipitation analyses of synaptosomes validated protein interactions. Results Approximately 11% (8/72) of MET-interacting proteins, including SHANK3, SYNGAP1 and GRIN2B, are associated with NDDs. Proteins in the MET interactome were translated into a novel MET interactome network based on human protein-protein interaction databases. High confidence genes from different NDD datasets that encode synaptosomal proteins were analyzed for being enriched in MET interactome proteins. This was found for autism, but not schizophrenia, bipolar disorder, major depressive disorder or attentional deficit hyperactivity disorder. There is correlated gene expression between MET and its interactive partners in developing human temporal and visual neocortices, but not with highly expressed genes that are not in the interactome. PLA and biochemical analyses demonstrate that MET-protein partner interactions are dynamically regulated by receptor activation. Conclusions The results provide a novel molecular framework for deciphering the functional relations of key regulators of synaptogenesis that contribute to both typical cortical development and to NDDs. PMID:27086544

  12. Production of a reference transcriptome and transcriptomic database (EdwardsiellaBase) for the lined sea anemone, Edwardsiella lineata, a parasitic cnidarian

    PubMed Central

    2014-01-01

    Background The lined sea anemone Edwardsiella lineata is an informative model system for evolutionary-developmental studies of parasitism. In this species, it is possible to compare alternate developmental pathways leading from a larva to either a free-living polyp or a vermiform parasite that inhabits the mesoglea of a ctenophore host. Additionally, E. lineata is confamilial with the model cnidarian Nematostella vectensis, providing an opportunity for comparative genomic, molecular and organismal studies. Description We generated a reference transcriptome for E. lineata via high-throughput sequencing of RNA isolated from five developmental stages (parasite; parasite-to-larva transition; larva; larva-to-adult transition; adult). The transcriptome comprises 90,440 contigs assembled from >15 billion nucleotides of DNA sequence. Using a molecular clock approach, we estimated the divergence between E. lineata and N. vectensis at 215–364 million years ago. Based on gene ontology and metabolic pathway analyses and gene family surveys (bHLH-PAS, deiodinases, Fox genes, LIM homeodomains, minicollagens, nuclear receptors, Sox genes, and Wnts), the transcriptome of E. lineata is comparable in depth and completeness to N. vectensis. Analyses of protein motifs and revealed extensive conservation between the proteins of these two edwardsiid anemones, although we show the NF-κB protein of E. lineata reflects the ancestral structure, while the NF-κB protein of N. vectensis has undergone a split that separates the DNA-binding domain from the inhibitory domain. All contigs have been deposited in a public database (EdwardsiellaBase), where they may be searched according to contig ID, gene ontology, protein family motif (Pfam), enzyme commission number, and BLAST. The alignment of the raw reads to the contigs can also be visualized via JBrowse. Conclusions The transcriptomic data and database described here provide a platform for studying the evolutionary developmental genomics of a derived parasitic life cycle. In addition, these data from E. lineata will aid in the interpretation of evolutionary novelties in gene sequence or structure that have been reported for the model cnidarian N. vectensis (e.g., the split NF-κB locus). Finally, we include custom computational tools to facilitate the annotation of a transcriptome based on high-throughput sequencing data obtained from a “non-model system.” PMID:24467778

  13. Production of a reference transcriptome and transcriptomic database (EdwardsiellaBase) for the lined sea anemone, Edwardsiella lineata, a parasitic cnidarian.

    PubMed

    Stefanik, Derek J; Lubinski, Tristan J; Granger, Brian R; Byrd, Allyson L; Reitzel, Adam M; DeFilippo, Lukas; Lorenc, Allison; Finnerty, John R

    2014-01-28

    The lined sea anemone Edwardsiella lineata is an informative model system for evolutionary-developmental studies of parasitism. In this species, it is possible to compare alternate developmental pathways leading from a larva to either a free-living polyp or a vermiform parasite that inhabits the mesoglea of a ctenophore host. Additionally, E. lineata is confamilial with the model cnidarian Nematostella vectensis, providing an opportunity for comparative genomic, molecular and organismal studies. We generated a reference transcriptome for E. lineata via high-throughput sequencing of RNA isolated from five developmental stages (parasite; parasite-to-larva transition; larva; larva-to-adult transition; adult). The transcriptome comprises 90,440 contigs assembled from >15 billion nucleotides of DNA sequence. Using a molecular clock approach, we estimated the divergence between E. lineata and N. vectensis at 215-364 million years ago. Based on gene ontology and metabolic pathway analyses and gene family surveys (bHLH-PAS, deiodinases, Fox genes, LIM homeodomains, minicollagens, nuclear receptors, Sox genes, and Wnts), the transcriptome of E. lineata is comparable in depth and completeness to N. vectensis. Analyses of protein motifs and revealed extensive conservation between the proteins of these two edwardsiid anemones, although we show the NF-κB protein of E. lineata reflects the ancestral structure, while the NF-κB protein of N. vectensis has undergone a split that separates the DNA-binding domain from the inhibitory domain. All contigs have been deposited in a public database (EdwardsiellaBase), where they may be searched according to contig ID, gene ontology, protein family motif (Pfam), enzyme commission number, and BLAST. The alignment of the raw reads to the contigs can also be visualized via JBrowse. The transcriptomic data and database described here provide a platform for studying the evolutionary developmental genomics of a derived parasitic life cycle. In addition, these data from E. lineata will aid in the interpretation of evolutionary novelties in gene sequence or structure that have been reported for the model cnidarian N. vectensis (e.g., the split NF-κB locus). Finally, we include custom computational tools to facilitate the annotation of a transcriptome based on high-throughput sequencing data obtained from a "non-model system."

  14. ProteinWorldDB: querying radical pairwise alignments among protein sets from complete genomes

    PubMed Central

    Otto, Thomas Dan; Catanho, Marcos; Tristão, Cristian; Bezerra, Márcia; Fernandes, Renan Mathias; Elias, Guilherme Steinberger; Scaglia, Alexandre Capeletto; Bovermann, Bill; Berstis, Viktors; Lifschitz, Sergio; de Miranda, Antonio Basílio; Degrave, Wim

    2010-01-01

    Motivation: Many analyses in modern biological research are based on comparisons between biological sequences, resulting in functional, evolutionary and structural inferences. When large numbers of sequences are compared, heuristics are often used resulting in a certain lack of accuracy. In order to improve and validate results of such comparisons, we have performed radical all-against-all comparisons of 4 million protein sequences belonging to the RefSeq database, using an implementation of the Smith–Waterman algorithm. This extremely intensive computational approach was made possible with the help of World Community Grid™, through the Genome Comparison Project. The resulting database, ProteinWorldDB, which contains coordinates of pairwise protein alignments and their respective scores, is now made available. Users can download, compare and analyze the results, filtered by genomes, protein functions or clusters. ProteinWorldDB is integrated with annotations derived from Swiss-Prot, Pfam, KEGG, NCBI Taxonomy database and gene ontology. The database is a unique and valuable asset, representing a major effort to create a reliable and consistent dataset of cross-comparisons of the whole protein content encoded in hundreds of completely sequenced genomes using a rigorous dynamic programming approach. Availability: The database can be accessed through http://proteinworlddb.org Contact: otto@fiocruz.br PMID:20089515

  15. Algorithms for database-dependent search of MS/MS data.

    PubMed

    Matthiesen, Rune

    2013-01-01

    The frequent used bottom-up strategy for identification of proteins and their associated modifications generate nowadays typically thousands of MS/MS spectra that normally are matched automatically against a protein sequence database. Search engines that take as input MS/MS spectra and a protein sequence database are referred as database-dependent search engines. Many programs both commercial and freely available exist for database-dependent search of MS/MS spectra and most of the programs have excellent user documentation. The aim here is therefore to outline the algorithm strategy behind different search engines rather than providing software user manuals. The process of database-dependent search can be divided into search strategy, peptide scoring, protein scoring, and finally protein inference. Most efforts in the literature have been put in to comparing results from different software rather than discussing the underlining algorithms. Such practical comparisons can be cluttered by suboptimal implementation and the observed differences are frequently caused by software parameters settings which have not been set proper to allow even comparison. In other words an algorithmic idea can still be worth considering even if the software implementation has been demonstrated to be suboptimal. The aim in this chapter is therefore to split the algorithms for database-dependent searching of MS/MS data into the above steps so that the different algorithmic ideas become more transparent and comparable. Most search engines provide good implementations of the first three data analysis steps mentioned above, whereas the final step of protein inference are much less developed for most search engines and is in many cases performed by an external software. The final part of this chapter illustrates how protein inference is built into the VEMS search engine and discusses a stand-alone program SIR for protein inference that can import a Mascot search result.

  16. DOMMINO 2.0: integrating structurally resolved protein-, RNA-, and DNA-mediated macromolecular interactions

    PubMed Central

    Kuang, Xingyan; Dhroso, Andi; Han, Jing Ginger; Shyu, Chi-Ren; Korkin, Dmitry

    2016-01-01

    Macromolecular interactions are formed between proteins, DNA and RNA molecules. Being a principle building block in macromolecular assemblies and pathways, the interactions underlie most of cellular functions. Malfunctioning of macromolecular interactions is also linked to a number of diseases. Structural knowledge of the macromolecular interaction allows one to understand the interaction’s mechanism, determine its functional implications and characterize the effects of genetic variations, such as single nucleotide polymorphisms, on the interaction. Unfortunately, until now the interactions mediated by different types of macromolecules, e.g. protein–protein interactions or protein–DNA interactions, are collected into individual and unrelated structural databases. This presents a significant obstacle in the analysis of macromolecular interactions. For instance, the homogeneous structural interaction databases prevent scientists from studying structural interactions of different types but occurring in the same macromolecular complex. Here, we introduce DOMMINO 2.0, a structural Database Of Macro-Molecular INteractiOns. Compared to DOMMINO 1.0, a comprehensive database on protein-protein interactions, DOMMINO 2.0 includes the interactions between all three basic types of macromolecules extracted from PDB files. DOMMINO 2.0 is automatically updated on a weekly basis. It currently includes ∼1 040 000 interactions between two polypeptide subunits (e.g. domains, peptides, termini and interdomain linkers), ∼43 000 RNA-mediated interactions, and ∼12 000 DNA-mediated interactions. All protein structures in the database are annotated using SCOP and SUPERFAMILY family annotation. As a result, protein-mediated interactions involving protein domains, interdomain linkers, C- and N- termini, and peptides are identified. Our database provides an intuitive web interface, allowing one to investigate interactions at three different resolution levels: whole subunit network, binary interaction and interaction interface. Database URL: http://dommino.org PMID:26827237

  17. The structure and dipole moment of globular proteins in solution and crystalline states: use of NMR and X-ray databases for the numerical calculation of dipole moment.

    PubMed

    Takashima, S

    2001-04-05

    The large dipole moment of globular proteins has been well known because of the detailed studies using dielectric relaxation and electro-optical methods. The search for the origin of these dipolemoments, however, must be based on the detailed knowledge on protein structure with atomic resolutions. At present, we have two sources of information on the structure of protein molecules: (1) x-ray databases obtained in crystalline state; (2) NMR databases obtained in solution state. While x-ray databases consist of only one model, NMR databases, because of the fluctuation of the protein folding in solution, consist of a number of models, thus enabling the computation of dipole moment repeated for all these models. The aim of this work, using these databases, is the detailed investigation on the interdependence between the structure and dipole moment of protein molecules. The dipole moment of protein molecules has roughly two components: one dipole moment is due to surface charges and the other, core dipole moment, is due to polar groups such as N--H and C==O bonds. The computation of surface charge dipole moment consists of two steps: (A) calculation of the pK shifts of charged groups for electrostatic interactions and (B) calculation of the dipole moment using the pK corrected for electrostatic shifts. The dipole moments of several proteins were computed using both NMR and x-ray databases. The dipole moments of these two sets of calculations are, with a few exceptions, in good agreement with one another and also with measured dipole moments.

  18. Inconsistencies in the red blood cell membrane proteome analysis: generation of a database for research and diagnostic applications

    PubMed Central

    Hegedűs, Tamás; Chaubey, Pururawa Mayank; Várady, György; Szabó, Edit; Sarankó, Hajnalka; Hofstetter, Lia; Roschitzki, Bernd; Sarkadi, Balázs

    2015-01-01

    Based on recent results, the determination of the easily accessible red blood cell (RBC) membrane proteins may provide new diagnostic possibilities for assessing mutations, polymorphisms or regulatory alterations in diseases. However, the analysis of the current mass spectrometry-based proteomics datasets and other major databases indicates inconsistencies—the results show large scattering and only a limited overlap for the identified RBC membrane proteins. Here, we applied membrane-specific proteomics studies in human RBC, compared these results with the data in the literature, and generated a comprehensive and expandable database using all available data sources. The integrated web database now refers to proteomic, genetic and medical databases as well, and contains an unexpected large number of validated membrane proteins previously thought to be specific for other tissues and/or related to major human diseases. Since the determination of protein expression in RBC provides a method to indicate pathological alterations, our database should facilitate the development of RBC membrane biomarker platforms and provide a unique resource to aid related further research and diagnostics. Database URL: http://rbcc.hegelab.org PMID:26078478

  19. Database resources of the National Center for Biotechnology Information

    PubMed Central

    2015-01-01

    The National Center for Biotechnology Information (NCBI) provides a large suite of online resources for biological information and data, including the GenBank® nucleic acid sequence database and the PubMed database of citations and abstracts for published life science journals. Additional NCBI resources focus on literature (Bookshelf, PubMed Central (PMC) and PubReader); medical genetics (ClinVar, dbMHC, the Genetic Testing Registry, HIV-1/Human Protein Interaction Database and MedGen); genes and genomics (BioProject, BioSample, dbSNP, dbVar, Epigenomics, Gene, Gene Expression Omnibus (GEO), Genome, HomoloGene, the Map Viewer, Nucleotide, PopSet, Probe, RefSeq, Sequence Read Archive, the Taxonomy Browser, Trace Archive and UniGene); and proteins and chemicals (Biosystems, COBALT, the Conserved Domain Database (CDD), the Conserved Domain Architecture Retrieval Tool (CDART), the Molecular Modeling Database (MMDB), Protein Clusters, Protein and the PubChem suite of small molecule databases). The Entrez system provides search and retrieval operations for many of these databases. Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of these resources can be accessed through the NCBI home page at http://www.ncbi.nlm.nih.gov. PMID:25398906

  20. Database resources of the National Center for Biotechnology Information

    PubMed Central

    2016-01-01

    The National Center for Biotechnology Information (NCBI) provides a large suite of online resources for biological information and data, including the GenBank® nucleic acid sequence database and the PubMed database of citations and abstracts for published life science journals. Additional NCBI resources focus on literature (PubMed Central (PMC), Bookshelf and PubReader), health (ClinVar, dbGaP, dbMHC, the Genetic Testing Registry, HIV-1/Human Protein Interaction Database and MedGen), genomes (BioProject, Assembly, Genome, BioSample, dbSNP, dbVar, Epigenomics, the Map Viewer, Nucleotide, Probe, RefSeq, Sequence Read Archive, the Taxonomy Browser and the Trace Archive), genes (Gene, Gene Expression Omnibus (GEO), HomoloGene, PopSet and UniGene), proteins (Protein, the Conserved Domain Database (CDD), COBALT, Conserved Domain Architecture Retrieval Tool (CDART), the Molecular Modeling Database (MMDB) and Protein Clusters) and chemicals (Biosystems and the PubChem suite of small molecule databases). The Entrez system provides search and retrieval operations for most of these databases. Augmenting many of the web applications are custom implementations of the BLAST program optimized to search specialized datasets. All of these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov. PMID:26615191

  1. Role for protein–protein interaction databases in human genetics

    PubMed Central

    Pattin, Kristine A; Moore, Jason H

    2010-01-01

    Proteomics and the study of protein–protein interactions are becoming increasingly important in our effort to understand human diseases on a system-wide level. Thanks to the development and curation of protein-interaction databases, up-to-date information on these interaction networks is accessible and publicly available to the scientific community. As our knowledge of protein–protein interactions increases, it is important to give thought to the different ways that these resources can impact biomedical research. In this article, we highlight the importance of protein–protein interactions in human genetics and genetic epidemiology. Since protein–protein interactions demonstrate one of the strongest functional relationships between genes, combining genomic data with available proteomic data may provide us with a more in-depth understanding of common human diseases. In this review, we will discuss some of the fundamentals of protein interactions, the databases that are publicly available and how information from these databases can be used to facilitate genome-wide genetic studies. PMID:19929610

  2. TryTransDB: A web-based resource for transport proteins in Trypanosomatidae.

    PubMed

    Sonar, Krushna; Kabra, Ritika; Singh, Shailza

    2018-03-12

    TryTransDB is a web-based resource that stores transport protein data which can be retrieved using a standalone BLAST tool. We have attempted to create an integrated database that can be a one-stop shop for the researchers working with transport proteins of Trypanosomatidae family. TryTransDB (Trypanosomatidae Transport Protein Database) is a web based comprehensive resource that can fire a BLAST search against most of the transport protein sequences (protein and nucleotide) from Trypanosomatidae family organisms. This web resource further allows to compute a phylogenetic tree by performing multiple sequence alignment (MSA) using CLUSTALW suite embedded in it. Also, cross-linking to other databases helps in gathering more information for a certain transport protein in a single website.

  3. AIM: A comprehensive Arabidopsis Interactome Module database and related interologs in plants

    USDA-ARS?s Scientific Manuscript database

    Systems biology analysis of protein modules is important for understanding the functional relationships between proteins in the interactome. Here, we present a comprehensive database named AIM for Arabidopsis (Arabidopsis thaliana) interactome modules. The database contains almost 250,000 modules th...

  4. VaProS: a database-integration approach for protein/genome information retrieval.

    PubMed

    Gojobori, Takashi; Ikeo, Kazuho; Katayama, Yukie; Kawabata, Takeshi; Kinjo, Akira R; Kinoshita, Kengo; Kwon, Yeondae; Migita, Ohsuke; Mizutani, Hisashi; Muraoka, Masafumi; Nagata, Koji; Omori, Satoshi; Sugawara, Hideaki; Yamada, Daichi; Yura, Kei

    2016-12-01

    Life science research now heavily relies on all sorts of databases for genome sequences, transcription, protein three-dimensional (3D) structures, protein-protein interactions, phenotypes and so forth. The knowledge accumulated by all the omics research is so vast that a computer-aided search of data is now a prerequisite for starting a new study. In addition, a combinatory search throughout these databases has a chance to extract new ideas and new hypotheses that can be examined by wet-lab experiments. By virtually integrating the related databases on the Internet, we have built a new web application that facilitates life science researchers for retrieving experts' knowledge stored in the databases and for building a new hypothesis of the research target. This web application, named VaProS, puts stress on the interconnection between the functional information of genome sequences and protein 3D structures, such as structural effect of the gene mutation. In this manuscript, we present the notion of VaProS, the databases and tools that can be accessed without any knowledge of database locations and data formats, and the power of search exemplified in quest of the molecular mechanisms of lysosomal storage disease. VaProS can be freely accessed at http://p4d-info.nig.ac.jp/vapros/ .

  5. mTM-align: a server for fast protein structure database search and multiple protein structure alignment.

    PubMed

    Dong, Runze; Pan, Shuo; Peng, Zhenling; Zhang, Yang; Yang, Jianyi

    2018-05-21

    With the rapid increase of the number of protein structures in the Protein Data Bank, it becomes urgent to develop algorithms for efficient protein structure comparisons. In this article, we present the mTM-align server, which consists of two closely related modules: one for structure database search and the other for multiple structure alignment. The database search is speeded up based on a heuristic algorithm and a hierarchical organization of the structures in the database. The multiple structure alignment is performed using the recently developed algorithm mTM-align. Benchmark tests demonstrate that our algorithms outperform other peering methods for both modules, in terms of speed and accuracy. One of the unique features for the server is the interplay between database search and multiple structure alignment. The server provides service not only for performing fast database search, but also for making accurate multiple structure alignment with the structures found by the search. For the database search, it takes about 2-5 min for a structure of a medium size (∼300 residues). For the multiple structure alignment, it takes a few seconds for ∼10 structures of medium sizes. The server is freely available at: http://yanglab.nankai.edu.cn/mTM-align/.

  6. Open Access Internet Resources for Nano-Materials Physics Education

    NASA Astrophysics Data System (ADS)

    Moeck, Peter; Seipel, Bjoern; Upreti, Girish; Harvey, Morgan; Garrick, Will

    2006-05-01

    Because a great deal of nano-material science and engineering relies on crystalline materials, materials physicists have to provide their own specific contributions to the National Nanotechnology Initiative. Here we briefly review two freely accessible internet-based crystallographic databases, the Nano-Crystallography Database (http://nanocrystallography.research.pdx.edu) and the Crystallography Open Database (http://crystallography.net). Information on over 34,000 full structure determinations are stored in these two databases in the Crystallographic Information File format. The availability of such crystallographic data on the internet in a standardized format allows for all kinds of web-based crystallographic calculations and visualizations. Two examples of which that are dealt with in this paper are: interactive crystal structure visualizations in three dimensions and calculations of lattice-fringe fingerprints for the identification of unknown nanocrystals from their atomic-resolution transmission electron microscopy images.

  7. NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins

    PubMed Central

    Pruitt, Kim D.; Tatusova, Tatiana; Maglott, Donna R.

    2005-01-01

    The National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database (http://www.ncbi.nlm.nih.gov/RefSeq/) provides a non-redundant collection of sequences representing genomic data, transcripts and proteins. Although the goal is to provide a comprehensive dataset representing the complete sequence information for any given species, the database pragmatically includes sequence data that are currently publicly available in the archival databases. The database incorporates data from over 2400 organisms and includes over one million proteins representing significant taxonomic diversity spanning prokaryotes, eukaryotes and viruses. Nucleotide and protein sequences are explicitly linked, and the sequences are linked to other resources including the NCBI Map Viewer and Gene. Sequences are annotated to include coding regions, conserved domains, variation, references, names, database cross-references, and other features using a combined approach of collaboration and other input from the scientific community, automated annotation, propagation from GenBank and curation by NCBI staff. PMID:15608248

  8. DBAASP v.2: an enhanced database of structure and antimicrobial/cytotoxic activity of natural and synthetic peptides

    PubMed Central

    Pirtskhalava, Malak; Gabrielian, Andrei; Cruz, Phillip; Griggs, Hannah L.; Squires, R. Burke; Hurt, Darrell E.; Grigolava, Maia; Chubinidze, Mindia; Gogoladze, George; Vishnepolsky, Boris; Alekseev, Vsevolod; Rosenthal, Alex; Tartakovsky, Michael

    2016-01-01

    Antimicrobial peptides (AMPs) are anti-infectives that may represent a novel and untapped class of biotherapeutics. Increasing interest in AMPs means that new peptides (natural and synthetic) are discovered faster than ever before. We describe herein a new version of the Database of Antimicrobial Activity and Structure of Peptides (DBAASPv.2, which is freely accessible at http://dbaasp.org). This iteration of the database reports chemical structures and empirically-determined activities (MICs, IC50, etc.) against more than 4200 specific target microbes for more than 2000 ribosomal, 80 non-ribosomal and 5700 synthetic peptides. Of these, the vast majority are monomeric, but nearly 200 of these peptides are found as homo- or heterodimers. More than 6100 of the peptides are linear, but about 515 are cyclic and more than 1300 have other intra-chain covalent bonds. More than half of the entries in the database were added after the resource was initially described, which reflects the recent sharp uptick of interest in AMPs. New features of DBAASPv.2 include: (i) user-friendly utilities and reporting functions, (ii) a ‘Ranking Search’ function to query the database by target species and return a ranked list of peptides with activity against that target and (iii) structural descriptions of the peptides derived from empirical data or calculated by molecular dynamics (MD) simulations. The three-dimensional structural data are critical components for understanding structure–activity relationships and for design of new antimicrobial drugs. We created more than 300 high-throughput MD simulations specifically for inclusion in DBAASP. The resulting structures are described in the database by novel trajectory analysis plots and movies. Another 200+ DBAASP entries have links to the Protein DataBank. All of the structures are easily visualized directly in the web browser. PMID:26578581

  9. Creation of a federated database of blood proteins: a powerful new tool for finding and characterizing biomarkers in serum

    PubMed Central

    2014-01-01

    Protein biomarkers offer major benefits for diagnosis and monitoring of disease processes. Recent advances in protein mass spectrometry make it feasible to use this very sensitive technology to detect and quantify proteins in blood. To explore the potential of blood biomarkers, we conducted a thorough review to evaluate the reliability of data in the literature and to determine the spectrum of proteins reported to exist in blood with a goal of creating a Federated Database of Blood Proteins (FDBP). A unique feature of our approach is the use of a SQL database for all of the peptide data; the power of the SQL database combined with standard informatic algorithms such as BLAST and the statistical analysis system (SAS) allowed the rapid annotation and analysis of the database without the need to create special programs to manage the data. Our mathematical analysis and review shows that in addition to the usual secreted proteins found in blood, there are many reports of intracellular proteins and good agreement on transcription factors, DNA remodelling factors in addition to cellular receptors and their signal transduction enzymes. Overall, we have catalogued about 12,130 proteins identified by at least one unique peptide, and of these 3858 have 3 or more peptide correlations. The FDBP with annotations should facilitate testing blood for specific disease biomarkers. PMID:24476026

  10. CyanoClust: comparative genome resources of cyanobacteria and plastids.

    PubMed

    Sasaki, Naobumi V; Sato, Naoki

    2010-01-01

    Cyanobacteria, which perform oxygen-evolving photosynthesis as do chloroplasts of plants and algae, are one of the best-studied prokaryotic phyla and one from which many representative genomes have been sequenced. Lack of a suitable comparative genomic database has been a problem in cyanobacterial genomics because many proteins involved in physiological functions such as photosynthesis and nitrogen fixation are not catalogued in commonly used databases, such as Clusters of Orthologous Proteins (COG). CyanoClust is a database of homolog groups in cyanobacteria and plastids that are produced by the program Gclust. We have developed a web-server system for the protein homology database featuring cyanobacteria and plastids. Database URL: http://cyanoclust.c.u-tokyo.ac.jp/.

  11. Live Cell Visualization of Multiple Protein-Protein Interactions with BiFC Rainbow.

    PubMed

    Wang, Sheng; Ding, Miao; Xue, Boxin; Hou, Yingping; Sun, Yujie

    2018-05-18

    As one of the most powerful tools to visualize PPIs in living cells, bimolecular fluorescence complementation (BiFC) has gained great advancement during recent years, including deep tissue imaging with far-red or near-infrared fluorescent proteins or super-resolution imaging with photochromic fluorescent proteins. However, little progress has been made toward simultaneous detection and visualization of multiple PPIs in the same cell, mainly due to the spectral crosstalk. In this report, we developed novel BiFC assays based on large-Stokes-shift fluorescent proteins (LSS-FPs) to detect and visualize multiple PPIs in living cells. With the large excitation/emission spectral separation, LSS-FPs can be imaged together with normal Stokes shift fluorescent proteins to realize multicolor BiFC imaging using a simple illumination scheme. We also further demonstrated BiFC rainbow combining newly developed BiFC assays with previously established mCerulean/mVenus-based BiFC assays to achieve detection and visualization of four PPI pairs in the same cell. Additionally, we prove that with the complete spectral separation of mT-Sapphire and CyOFP1, LSS-FP-based BiFC assays can be readily combined with intensity-based FRET measurement to detect ternary protein complex formation with minimal spectral crosstalk. Thus, our newly developed LSS-FP-based BiFC assays not only expand the fluorescent protein toolbox available for BiFC but also facilitate the detection and visualization of multiple protein complex interactions in living cells.

  12. CerebralWeb: a Cytoscape.js plug-in to visualize networks stratified by subcellular localization.

    PubMed

    Frias, Silvia; Bryan, Kenneth; Brinkman, Fiona S L; Lynn, David J

    2015-01-01

    CerebralWeb is a light-weight JavaScript plug-in that extends Cytoscape.js to enable fast and interactive visualization of molecular interaction networks stratified based on subcellular localization or other user-supplied annotation. The application is designed to be easily integrated into any website and is configurable to support customized network visualization. CerebralWeb also supports the automatic retrieval of Cerebral-compatible localizations for human, mouse and bovine genes via a web service and enables the automated parsing of Cytoscape compatible XGMML network files. CerebralWeb currently supports embedded network visualization on the InnateDB (www.innatedb.com) and Allergy and Asthma Portal (allergen.innatedb.com) database and analysis resources. Database tool URL: http://www.innatedb.com/CerebralWeb © The Author(s) 2015. Published by Oxford University Press.

  13. Software Re-Engineering of the Human Factors Analysis and Classification System - (Maintenance Extension) Using Object Oriented Methods in a Microsoft Environment

    DTIC Science & Technology

    2001-09-01

    replication) -- all from Visual Basic and VBA . In fact, we found that the SQL Server engine actually had a plethora of options, most formidable of...2002, the new SQL Server 2000 database engine, and Microsoft Visual Basic.NET. This thesis describes our use of the Spiral Development Model to...versions of Microsoft products? Specifically, the pending release of Microsoft Office 2002, the new SQL Server 2000 database engine, and Microsoft

  14. MultitaskProtDB: a database of multitasking proteins.

    PubMed

    Hernández, Sergio; Ferragut, Gabriela; Amela, Isaac; Perez-Pons, JosepAntoni; Piñol, Jaume; Mozo-Villarias, Angel; Cedano, Juan; Querol, Enrique

    2014-01-01

    We have compiled MultitaskProtDB, available online at http://wallace.uab.es/multitask, to provide a repository where the many multitasking proteins found in the literature can be stored. Multitasking or moonlighting is the capability of some proteins to execute two or more biological functions. Usually, multitasking proteins are experimentally revealed by serendipity. This ability of proteins to perform multitasking functions helps us to understand one of the ways used by cells to perform many complex functions with a limited number of genes. Even so, the study of this phenomenon is complex because, among other things, there is no database of moonlighting proteins. The existence of such a tool facilitates the collection and dissemination of these important data. This work reports the database, MultitaskProtDB, which is designed as a friendly user web page containing >288 multitasking proteins with their NCBI and UniProt accession numbers, canonical and additional biological functions, monomeric/oligomeric states, PDB codes when available and bibliographic references. This database also serves to gain insight into some characteristics of multitasking proteins such as frequencies of the different pairs of functions, phylogenetic conservation and so forth.

  15. KnotProt: a database of proteins with knots and slipknots.

    PubMed

    Jamroz, Michal; Niemyska, Wanda; Rawdon, Eric J; Stasiak, Andrzej; Millett, Kenneth C; Sułkowski, Piotr; Sulkowska, Joanna I

    2015-01-01

    The protein topology database KnotProt, http://knotprot.cent.uw.edu.pl/, collects information about protein structures with open polypeptide chains forming knots or slipknots. The knotting complexity of the cataloged proteins is presented in the form of a matrix diagram that shows users the knot type of the entire polypeptide chain and of each of its subchains. The pattern visible in the matrix gives the knotting fingerprint of a given protein and permits users to determine, for example, the minimal length of the knotted regions (knot's core size) or the depth of a knot, i.e. how many amino acids can be removed from either end of the cataloged protein structure before converting it from a knot to a different type of knot. In addition, the database presents extensive information about the biological functions, families and fold types of proteins with non-trivial knotting. As an additional feature, the KnotProt database enables users to submit protein or polymer chains and generate their knotting fingerprints. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  16. Microarray analysis to identify the similarities and differences of pathogenesis between aortic occlusive disease and abdominal aortic aneurysm.

    PubMed

    Wang, Guofu; Bi, Lechang; Wang, Gaofeng; Huang, Feilai; Lu, Mingjing; Zhu, Kai

    2018-06-01

    Objectives Expression profile of GSE57691 was analyzed to identify the similarities and differences between aortic occlusive disease and abdominal aortic aneurysm. Methods The expression profile of GSE57691 was downloaded from Gene Expression Omnibus database, including 20 small abdominal aortic aneurysm samples, 29 large abdominal aortic aneurysm samples, 9 aortic occlusive disease samples, and 10 control samples. Using the limma package in R, the differentially expressed genes were screened. Followed by enrichment analysis was performed for the differentially expressed genes using database for annotation, visualization, and integrated discovery online tool. Based on string online tool and Cytoscape software, protein-protein interaction network and module analyses were carried out. Moreover, integrated TF platform database and Cytoscape software were used for constructing transcriptional regulatory networks. Results As a result, 1757, 354, and 396 differentially expressed genes separately were identified in aortic occlusive disease, large abdominal aortic aneurysm, and small abdominal aortic aneurysm samples. UBB was significantly enriched in proteolysis related pathways with a high degree in three groups. SPARCL1 was another gene shared by these groups and regulated by NFIA, which had a high degree in transcriptional regulatory network. ACTB, a significant upregulated gene in abdominal aortic aneurysm samples, could be regulated by CLIC4, which was significantly enriched in cell motions. ACLY and NFIB were separately identified in aortic occlusive disease and small abdominal aortic aneurysm samples, and separately enriched in lipid metabolism and negative regulation of cell proliferation. Conclusions The downregulated UBB, NFIA, and SPARCL1 might play key roles in both aortic occlusive disease and abdominal aortic aneurysm, while the upregulated ACTB might only involve in abdominal aortic aneurysm. ACLY and NFIB were specifically involved in aortic occlusive disease and small abdominal aortic aneurysm separately.

  17. Five Years into the Past...Five Years into the Future.

    ERIC Educational Resources Information Center

    Tenopir, Carol

    1988-01-01

    Discusses issues which will have an impact on database searching for the next five years: (1) quality control; (2) more inhouse databases; (3) changes in database visuals; (4) pricing policies; and (5) market changes. A number of favorable and unfavorable changes unlikely to occur within five years are also noted. (MES)

  18. ReNE: A Cytoscape Plugin for Regulatory Network Enhancement

    PubMed Central

    Politano, Gianfranco; Benso, Alfredo; Savino, Alessandro; Di Carlo, Stefano

    2014-01-01

    One of the biggest challenges in the study of biological regulatory mechanisms is the integration, americanmodeling, and analysis of the complex interactions which take place in biological networks. Despite post transcriptional regulatory elements (i.e., miRNAs) are widely investigated in current research, their usage and visualization in biological networks is very limited. Regulatory networks are commonly limited to gene entities. To integrate networks with post transcriptional regulatory data, researchers are therefore forced to manually resort to specific third party databases. In this context, we introduce ReNE, a Cytoscape 3.x plugin designed to automatically enrich a standard gene-based regulatory network with more detailed transcriptional, post transcriptional, and translational data, resulting in an enhanced network that more precisely models the actual biological regulatory mechanisms. ReNE can automatically import a network layout from the Reactome or KEGG repositories, or work with custom pathways described using a standard OWL/XML data format that the Cytoscape import procedure accepts. Moreover, ReNE allows researchers to merge multiple pathways coming from different sources. The merged network structure is normalized to guarantee a consistent and uniform description of the network nodes and edges and to enrich all integrated data with additional annotations retrieved from genome-wide databases like NCBI, thus producing a pathway fully manageable through the Cytoscape environment. The normalized network is then analyzed to include missing transcription factors, miRNAs, and proteins. The resulting enhanced network is still a fully functional Cytoscape network where each regulatory element (transcription factor, miRNA, gene, protein) and regulatory mechanism (up-regulation/down-regulation) is clearly visually identifiable, thus enabling a better visual understanding of its role and the effect in the network behavior. The enhanced network produced by ReNE is exportable in multiple formats for further analysis via third party applications. ReNE can be freely installed from the Cytoscape App Store (http://apps.cytoscape.org/apps/rene) and the full source code is freely available for download through a SVN repository accessible at http://www.sysbio.polito.it/tools_svn/BioInformatics/Rene/releases/. ReNE enhances a network by only integrating data from public repositories, without any inference or prediction. The reliability of the introduced interactions only depends on the reliability of the source data, which is out of control of ReNe developers. PMID:25541727

  19. VitisCyc: a metabolic pathway knowledgebase for grapevine (Vitis vinifera)

    PubMed Central

    Naithani, Sushma; Raja, Rajani; Waddell, Elijah N.; Elser, Justin; Gouthu, Satyanarayana; Deluc, Laurent G.; Jaiswal, Pankaj

    2014-01-01

    We have developed VitisCyc, a grapevine-specific metabolic pathway database that allows researchers to (i) search and browse the database for its various components such as metabolic pathways, reactions, compounds, genes and proteins, (ii) compare grapevine metabolic networks with other publicly available plant metabolic networks, and (iii) upload, visualize and analyze high-throughput data such as transcriptomes, proteomes, metabolomes etc. using OMICs-Viewer tool. VitisCyc is based on the genome sequence of the nearly homozygous genotype PN40024 of Vitis vinifera “Pinot Noir” cultivar with 12X v1 annotations and was built on BioCyc platform using Pathway Tools software and MetaCyc reference database. Furthermore, VitisCyc was enriched for plant-specific pathways and grape-specific metabolites, reactions and pathways. Currently VitisCyc harbors 68 super pathways, 362 biosynthesis pathways, 118 catabolic pathways, 5 detoxification pathways, 36 energy related pathways and 6 transport pathways, 10,908 enzymes, 2912 enzymatic reactions, 31 transport reactions and 2024 compounds. VitisCyc, as a community resource, can aid in the discovery of candidate genes and pathways that are regulated during plant growth and development, and in response to biotic and abiotic stress signals generated from a plant's immediate environment. VitisCyc version 3.18 is available online at http://pathways.cgrb.oregonstate.edu. PMID:25538713

  20. NCBI GEO: archive for functional genomics data sets—10 years on

    PubMed Central

    Barrett, Tanya; Troup, Dennis B.; Wilhite, Stephen E.; Ledoux, Pierre; Evangelista, Carlos; Kim, Irene F.; Tomashevsky, Maxim; Marshall, Kimberly A.; Phillippy, Katherine H.; Sherman, Patti M.; Muertter, Rolf N.; Holko, Michelle; Ayanbule, Oluwabukunmi; Yefanov, Andrey; Soboleva, Alexandra

    2011-01-01

    A decade ago, the Gene Expression Omnibus (GEO) database was established at the National Center for Biotechnology Information (NCBI). The original objective of GEO was to serve as a public repository for high-throughput gene expression data generated mostly by microarray technology. However, the research community quickly applied microarrays to non-gene-expression studies, including examination of genome copy number variation and genome-wide profiling of DNA-binding proteins. Because the GEO database was designed with a flexible structure, it was possible to quickly adapt the repository to store these data types. More recently, as the microarray community switches to next-generation sequencing technologies, GEO has again adapted to host these data sets. Today, GEO stores over 20 000 microarray- and sequence-based functional genomics studies, and continues to handle the majority of direct high-throughput data submissions from the research community. Multiple mechanisms are provided to help users effectively search, browse, download and visualize the data at the level of individual genes or entire studies. This paper describes recent database enhancements, including new search and data representation tools, as well as a brief review of how the community uses GEO data. GEO is freely accessible at http://www.ncbi.nlm.nih.gov/geo/. PMID:21097893

  1. Phi-square Lexical Competition Database (Phi-Lex): an online tool for quantifying auditory and visual lexical competition.

    PubMed

    Strand, Julia F

    2014-03-01

    A widely agreed-upon feature of spoken word recognition is that multiple lexical candidates in memory are simultaneously activated in parallel when a listener hears a word, and that those candidates compete for recognition (Luce, Goldinger, Auer, & Vitevitch, Perception 62:615-625, 2000; Luce & Pisoni, Ear and Hearing 19:1-36, 1998; McClelland & Elman, Cognitive Psychology 18:1-86, 1986). Because the presence of those competitors influences word recognition, much research has sought to quantify the processes of lexical competition. Metrics that quantify lexical competition continuously are more effective predictors of auditory and visual (lipread) spoken word recognition than are the categorical metrics traditionally used (Feld & Sommers, Speech Communication 53:220-228, 2011; Strand & Sommers, Journal of the Acoustical Society of America 130:1663-1672, 2011). A limitation of the continuous metrics is that they are somewhat computationally cumbersome and require access to existing speech databases. This article describes the Phi-square Lexical Competition Database (Phi-Lex): an online, searchable database that provides access to multiple metrics of auditory and visual (lipread) lexical competition for English words, available at www.juliastrand.com/phi-lex .

  2. BLAST and FASTA similarity searching for multiple sequence alignment.

    PubMed

    Pearson, William R

    2014-01-01

    BLAST, FASTA, and other similarity searching programs seek to identify homologous proteins and DNA sequences based on excess sequence similarity. If two sequences share much more similarity than expected by chance, the simplest explanation for the excess similarity is common ancestry-homology. The most effective similarity searches compare protein sequences, rather than DNA sequences, for sequences that encode proteins, and use expectation values, rather than percent identity, to infer homology. The BLAST and FASTA packages of sequence comparison programs provide programs for comparing protein and DNA sequences to protein databases (the most sensitive searches). Protein and translated-DNA comparisons to protein databases routinely allow evolutionary look back times from 1 to 2 billion years; DNA:DNA searches are 5-10-fold less sensitive. BLAST and FASTA can be run on popular web sites, but can also be downloaded and installed on local computers. With local installation, target databases can be customized for the sequence data being characterized. With today's very large protein databases, search sensitivity can also be improved by searching smaller comprehensive databases, for example, a complete protein set from an evolutionarily neighboring model organism. By default, BLAST and FASTA use scoring strategies target for distant evolutionary relationships; for comparisons involving short domains or queries, or searches that seek relatively close homologs (e.g. mouse-human), shallower scoring matrices will be more effective. Both BLAST and FASTA provide very accurate statistical estimates, which can be used to reliably identify protein sequences that diverged more than 2 billion years ago.

  3. Reference System of DNA and Protein Sequences on CD-ROM

    NASA Astrophysics Data System (ADS)

    Nasu, Hisanori; Ito, Toshiaki

    DNASIS-DBREF31 is a database for DNA and Protein sequences in the form of optical Compact Disk (CD) ROM, developed and commercialized by Hitachi Software Engineering Co., Ltd. Both nucleic acid base sequences and protein amino acid sequences can be retrieved from a single CD-ROM. Existing database is offered in the form of on-line service, floppy disks, or magnetic tape, all of which have some problems or other, such as usability or storage capacity. DNASIS-DBREF31 newly adopt a CD-ROM as a database device to realize a mass storage and personal use of the database.

  4. A neotropical Miocene pollen database employing image-based search and semantic modeling.

    PubMed

    Han, Jing Ginger; Cao, Hongfei; Barb, Adrian; Punyasena, Surangi W; Jaramillo, Carlos; Shyu, Chi-Ren

    2014-08-01

    Digital microscopic pollen images are being generated with increasing speed and volume, producing opportunities to develop new computational methods that increase the consistency and efficiency of pollen analysis and provide the palynological community a computational framework for information sharing and knowledge transfer. • Mathematical methods were used to assign trait semantics (abstract morphological representations) of the images of neotropical Miocene pollen and spores. Advanced database-indexing structures were built to compare and retrieve similar images based on their visual content. A Web-based system was developed to provide novel tools for automatic trait semantic annotation and image retrieval by trait semantics and visual content. • Mathematical models that map visual features to trait semantics can be used to annotate images with morphology semantics and to search image databases with improved reliability and productivity. Images can also be searched by visual content, providing users with customized emphases on traits such as color, shape, and texture. • Content- and semantic-based image searches provide a powerful computational platform for pollen and spore identification. The infrastructure outlined provides a framework for building a community-wide palynological resource, streamlining the process of manual identification, analysis, and species discovery.

  5. The RCSB Protein Data Bank: views of structural biology for basic and applied research and education.

    PubMed

    Rose, Peter W; Prlić, Andreas; Bi, Chunxiao; Bluhm, Wolfgang F; Christie, Cole H; Dutta, Shuchismita; Green, Rachel Kramer; Goodsell, David S; Westbrook, John D; Woo, Jesse; Young, Jasmine; Zardecki, Christine; Berman, Helen M; Bourne, Philip E; Burley, Stephen K

    2015-01-01

    The RCSB Protein Data Bank (RCSB PDB, http://www.rcsb.org) provides access to 3D structures of biological macromolecules and is one of the leading resources in biology and biomedicine worldwide. Our efforts over the past 2 years focused on enabling a deeper understanding of structural biology and providing new structural views of biology that support both basic and applied research and education. Herein, we describe recently introduced data annotations including integration with external biological resources, such as gene and drug databases, new visualization tools and improved support for the mobile web. We also describe access to data files, web services and open access software components to enable software developers to more effectively mine the PDB archive and related annotations. Our efforts are aimed at expanding the role of 3D structure in understanding biology and medicine. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  6. APPRIS: annotation of principal and alternative splice isoforms

    PubMed Central

    Rodriguez, Jose Manuel; Maietta, Paolo; Ezkurdia, Iakes; Pietrelli, Alessandro; Wesselink, Jan-Jaap; Lopez, Gonzalo; Valencia, Alfonso; Tress, Michael L.

    2013-01-01

    Here, we present APPRIS (http://appris.bioinfo.cnio.es), a database that houses annotations of human splice isoforms. APPRIS has been designed to provide value to manual annotations of the human genome by adding reliable protein structural and functional data and information from cross-species conservation. The visual representation of the annotations provided by APPRIS for each gene allows annotators and researchers alike to easily identify functional changes brought about by splicing events. In addition to collecting, integrating and analyzing reliable predictions of the effect of splicing events, APPRIS also selects a single reference sequence for each gene, here termed the principal isoform, based on the annotations of structure, function and conservation for each transcript. APPRIS identifies a principal isoform for 85% of the protein-coding genes in the GENCODE 7 release for ENSEMBL. Analysis of the APPRIS data shows that at least 70% of the alternative (non-principal) variants would lose important functional or structural information relative to the principal isoform. PMID:23161672

  7. PRIDE Inspector Toolsuite: Moving Toward a Universal Visualization Tool for Proteomics Data Standard Formats and Quality Assessment of ProteomeXchange Datasets.

    PubMed

    Perez-Riverol, Yasset; Xu, Qing-Wei; Wang, Rui; Uszkoreit, Julian; Griss, Johannes; Sanchez, Aniel; Reisinger, Florian; Csordas, Attila; Ternent, Tobias; Del-Toro, Noemi; Dianes, Jose A; Eisenacher, Martin; Hermjakob, Henning; Vizcaíno, Juan Antonio

    2016-01-01

    The original PRIDE Inspector tool was developed as an open source standalone tool to enable the visualization and validation of mass-spectrometry (MS)-based proteomics data before data submission or already publicly available in the Proteomics Identifications (PRIDE) database. The initial implementation of the tool focused on visualizing PRIDE data by supporting the PRIDE XML format and a direct access to private (password protected) and public experiments in PRIDE.The ProteomeXchange (PX) Consortium has been set up to enable a better integration of existing public proteomics repositories, maximizing its benefit to the scientific community through the implementation of standard submission and dissemination pipelines. Within the Consortium, PRIDE is focused on supporting submissions of tandem MS data. The increasing use and popularity of the new Proteomics Standards Initiative (PSI) data standards such as mzIdentML and mzTab, and the diversity of workflows supported by the PX resources, prompted us to design and implement a new suite of algorithms and libraries that would build upon the success of the original PRIDE Inspector and would enable users to visualize and validate PX "complete" submissions. The PRIDE Inspector Toolsuite supports the handling and visualization of different experimental output files, ranging from spectra (mzML, mzXML, and the most popular peak lists formats) and peptide and protein identification results (mzIdentML, PRIDE XML, mzTab) to quantification data (mzTab, PRIDE XML), using a modular and extensible set of open-source, cross-platform libraries. We believe that the PRIDE Inspector Toolsuite represents a milestone in the visualization and quality assessment of proteomics data. It is freely available at http://github.com/PRIDE-Toolsuite/. © 2016 by The American Society for Biochemistry and Molecular Biology, Inc.

  8. Integration and visualization of systems biology data in context of the genome

    PubMed Central

    2010-01-01

    Background High-density tiling arrays and new sequencing technologies are generating rapidly increasing volumes of transcriptome and protein-DNA interaction data. Visualization and exploration of this data is critical to understanding the regulatory logic encoded in the genome by which the cell dynamically affects its physiology and interacts with its environment. Results The Gaggle Genome Browser is a cross-platform desktop program for interactively visualizing high-throughput data in the context of the genome. Important features include dynamic panning and zooming, keyword search and open interoperability through the Gaggle framework. Users may bookmark locations on the genome with descriptive annotations and share these bookmarks with other users. The program handles large sets of user-generated data using an in-process database and leverages the facilities of SQL and the R environment for importing and manipulating data. A key aspect of the Gaggle Genome Browser is interoperability. By connecting to the Gaggle framework, the genome browser joins a suite of interconnected bioinformatics tools for analysis and visualization with connectivity to major public repositories of sequences, interactions and pathways. To this flexible environment for exploring and combining data, the Gaggle Genome Browser adds the ability to visualize diverse types of data in relation to its coordinates on the genome. Conclusions Genomic coordinates function as a common key by which disparate biological data types can be related to one another. In the Gaggle Genome Browser, heterogeneous data are joined by their location on the genome to create information-rich visualizations yielding insight into genome organization, transcription and its regulation and, ultimately, a better understanding of the mechanisms that enable the cell to dynamically respond to its environment. PMID:20642854

  9. PRIDE Inspector Toolsuite: Moving Toward a Universal Visualization Tool for Proteomics Data Standard Formats and Quality Assessment of ProteomeXchange Datasets*

    PubMed Central

    Perez-Riverol, Yasset; Xu, Qing-Wei; Wang, Rui; Uszkoreit, Julian; Griss, Johannes; Sanchez, Aniel; Reisinger, Florian; Csordas, Attila; Ternent, Tobias; del-Toro, Noemi; Dianes, Jose A.; Eisenacher, Martin; Hermjakob, Henning; Vizcaíno, Juan Antonio

    2016-01-01

    The original PRIDE Inspector tool was developed as an open source standalone tool to enable the visualization and validation of mass-spectrometry (MS)-based proteomics data before data submission or already publicly available in the Proteomics Identifications (PRIDE) database. The initial implementation of the tool focused on visualizing PRIDE data by supporting the PRIDE XML format and a direct access to private (password protected) and public experiments in PRIDE. The ProteomeXchange (PX) Consortium has been set up to enable a better integration of existing public proteomics repositories, maximizing its benefit to the scientific community through the implementation of standard submission and dissemination pipelines. Within the Consortium, PRIDE is focused on supporting submissions of tandem MS data. The increasing use and popularity of the new Proteomics Standards Initiative (PSI) data standards such as mzIdentML and mzTab, and the diversity of workflows supported by the PX resources, prompted us to design and implement a new suite of algorithms and libraries that would build upon the success of the original PRIDE Inspector and would enable users to visualize and validate PX “complete” submissions. The PRIDE Inspector Toolsuite supports the handling and visualization of different experimental output files, ranging from spectra (mzML, mzXML, and the most popular peak lists formats) and peptide and protein identification results (mzIdentML, PRIDE XML, mzTab) to quantification data (mzTab, PRIDE XML), using a modular and extensible set of open-source, cross-platform libraries. We believe that the PRIDE Inspector Toolsuite represents a milestone in the visualization and quality assessment of proteomics data. It is freely available at http://github.com/PRIDE-Toolsuite/. PMID:26545397

  10. A Bioinformatics Workflow for Variant Peptide Detection in Shotgun Proteomics*

    PubMed Central

    Li, Jing; Su, Zengliu; Ma, Ze-Qiang; Slebos, Robbert J. C.; Halvey, Patrick; Tabb, David L.; Liebler, Daniel C.; Pao, William; Zhang, Bing

    2011-01-01

    Shotgun proteomics data analysis usually relies on database search. However, commonly used protein sequence databases do not contain information on protein variants and thus prevent variant peptides and proteins from been identified. Including known coding variations into protein sequence databases could help alleviate this problem. Based on our recently published human Cancer Proteome Variation Database, we have created a protein sequence database that comprehensively annotates thousands of cancer-related coding variants collected in the Cancer Proteome Variation Database as well as noncancer-specific ones from the Single Nucleotide Polymorphism Database (dbSNP). Using this database, we then developed a data analysis workflow for variant peptide identification in shotgun proteomics. The high risk of false positive variant identifications was addressed by a modified false discovery rate estimation method. Analysis of colorectal cancer cell lines SW480, RKO, and HCT-116 revealed a total of 81 peptides that contain either noncancer-specific or cancer-related variations. Twenty-three out of 26 variants randomly selected from the 81 were confirmed by genomic sequencing. We further applied the workflow on data sets from three individual colorectal tumor specimens. A total of 204 distinct variant peptides were detected, and five carried known cancer-related mutations. Each individual showed a specific pattern of cancer-related mutations, suggesting potential use of this type of information for personalized medicine. Compatibility of the workflow has been tested with four popular database search engines including Sequest, Mascot, X!Tandem, and MyriMatch. In summary, we have developed a workflow that effectively uses existing genomic data to enable variant peptide detection in proteomics. PMID:21389108

  11. Achievable Rate Estimation of IEEE 802.11ad Visual Big-Data Uplink Access in Cloud-Enabled Surveillance Applications.

    PubMed

    Kim, Joongheon; Kim, Jong-Kook

    2016-01-01

    This paper addresses the computation procedures for estimating the impact of interference in 60 GHz IEEE 802.11ad uplink access in order to construct visual big-data database from randomly deployed surveillance camera sensing devices. The acquired large-scale massive visual information from surveillance camera devices will be used for organizing big-data database, i.e., this estimation is essential for constructing centralized cloud-enabled surveillance database. This performance estimation study captures interference impacts on the target cloud access points from multiple interference components generated by the 60 GHz wireless transmissions from nearby surveillance camera devices to their associated cloud access points. With this uplink interference scenario, the interference impacts on the main wireless transmission from a target surveillance camera device to its associated target cloud access point with a number of settings are measured and estimated under the consideration of 60 GHz radiation characteristics and antenna radiation pattern models.

  12. Integrative interactive visualization of crystal structure, band structure, and Brillouin zone

    NASA Astrophysics Data System (ADS)

    Hanson, Robert; Hinke, Ben; van Koevering, Matthew; Oses, Corey; Toher, Cormac; Hicks, David; Gossett, Eric; Plata Ramos, Jose; Curtarolo, Stefano; Aflow Collaboration

    The AFLOW library is an open-access database for high throughput ab-initio calculations that serves as a resource for the dissemination of computational results in the area of materials science. Our project aims to create an interactive web-based visualization of any structure in the AFLOW database that has associate band structure data in a way that allows novel simultaneous exploration of the crystal structure, band structure, and Brillouin zone. Interactivity is obtained using two synchronized JSmol implementations, one for the crystal structure and one for the Brillouin zone, along with a D3-based band-structure diagram produced on the fly from data obtained from the AFLOW database. The current website portal (http://aflowlib.mems.duke.edu/users/jmolers/matt/website) allows interactive access and visualization of crystal structure, Brillouin zone and band structure for more than 55,000 inorganic crystal structures. This work was supported by the US Navy Office of Naval Research through a Broad Area Announcement administered by Duke University.

  13. The Nencki Affective Picture System (NAPS): introduction to a novel, standardized, wide-range, high-quality, realistic picture database.

    PubMed

    Marchewka, Artur; Zurawski, Łukasz; Jednoróg, Katarzyna; Grabowska, Anna

    2014-06-01

    Selecting appropriate stimuli to induce emotional states is essential in affective research. Only a few standardized affective stimulus databases have been created for auditory, language, and visual materials. Numerous studies have extensively employed these databases using both behavioral and neuroimaging methods. However, some limitations of the existing databases have recently been reported, including limited numbers of stimuli in specific categories or poor picture quality of the visual stimuli. In the present article, we introduce the Nencki Affective Picture System (NAPS), which consists of 1,356 realistic, high-quality photographs that are divided into five categories (people, faces, animals, objects, and landscapes). Affective ratings were collected from 204 mostly European participants. The pictures were rated according to the valence, arousal, and approach-avoidance dimensions using computerized bipolar semantic slider scales. Normative ratings for the categories are presented for each dimension. Validation of the ratings was obtained by comparing them to ratings generated using the Self-Assessment Manikin and the International Affective Picture System. In addition, the physical properties of the photographs are reported, including luminance, contrast, and entropy. The new database, with accompanying ratings and image parameters, allows researchers to select a variety of visual stimulus materials specific to their experimental questions of interest. The NAPS system is freely accessible to the scientific community for noncommercial use by request at http://naps.nencki.gov.pl .

  14. Projections for fast protein structure retrieval

    PubMed Central

    Bhattacharya, Sourangshu; Bhattacharyya, Chiranjib; Chandra, Nagasuma R

    2006-01-01

    Background In recent times, there has been an exponential rise in the number of protein structures in databases e.g. PDB. So, design of fast algorithms capable of querying such databases is becoming an increasingly important research issue. This paper reports an algorithm, motivated from spectral graph matching techniques, for retrieving protein structures similar to a query structure from a large protein structure database. Each protein structure is specified by the 3D coordinates of residues of the protein. The algorithm is based on a novel characterization of the residues, called projections, leading to a similarity measure between the residues of the two proteins. This measure is exploited to efficiently compute the optimal equivalences. Results Experimental results show that, the current algorithm outperforms the state of the art on benchmark datasets in terms of speed without losing accuracy. Search results on SCOP 95% nonredundant database, for fold similarity with 5 proteins from different SCOP classes show that the current method performs competitively with the standard algorithm CE. The algorithm is also capable of detecting non-topological similarities between two proteins which is not possible with most of the state of the art tools like Dali. PMID:17254310

  15. PANTHER: a browsable database of gene products organized by biological function, using curated protein family and subfamily classification.

    PubMed

    Thomas, Paul D; Kejariwal, Anish; Campbell, Michael J; Mi, Huaiyu; Diemer, Karen; Guo, Nan; Ladunga, Istvan; Ulitsky-Lazareva, Betty; Muruganujan, Anushya; Rabkin, Steven; Vandergriff, Jody A; Doremieux, Olivier

    2003-01-01

    The PANTHER database was designed for high-throughput analysis of protein sequences. One of the key features is a simplified ontology of protein function, which allows browsing of the database by biological functions. Biologist curators have associated the ontology terms with groups of protein sequences rather than individual sequences. Statistical models (Hidden Markov Models, or HMMs) are built from each of these groups. The advantage of this approach is that new sequences can be automatically classified as they become available. To ensure accurate functional classification, HMMs are constructed not only for families, but also for functionally distinct subfamilies. Multiple sequence alignments and phylogenetic trees, including curator-assigned information, are available for each family. The current version of the PANTHER database includes training sequences from all organisms in the GenBank non-redundant protein database, and the HMMs have been used to classify gene products across the entire genomes of human, and Drosophila melanogaster. The ontology terms and protein families and subfamilies, as well as Drosophila gene c;assifications, can be browsed and searched for free. Due to outstanding contractual obligations, access to human gene classifications and to protein family trees and multiple sequence alignments will temporarily require a nominal registration fee. PANTHER is publicly available on the web at http://panther.celera.com.

  16. MinOmics, an Integrative and Immersive Tool for Multi-Omics Analysis.

    PubMed

    Maes, Alexandre; Martinez, Xavier; Druart, Karen; Laurent, Benoist; Guégan, Sean; Marchand, Christophe H; Lemaire, Stéphane D; Baaden, Marc

    2018-06-21

    Proteomic and transcriptomic technologies resulted in massive biological datasets, their interpretation requiring sophisticated computational strategies. Efficient and intuitive real-time analysis remains challenging. We use proteomic data on 1417 proteins of the green microalga Chlamydomonas reinhardtii to investigate physicochemical parameters governing selectivity of three cysteine-based redox post translational modifications (PTM): glutathionylation (SSG), nitrosylation (SNO) and disulphide bonds (SS) reduced by thioredoxins. We aim to understand underlying molecular mechanisms and structural determinants through integration of redox proteome data from gene- to structural level. Our interactive visual analytics approach on an 8.3 m2 display wall of 25 MPixel resolution features stereoscopic three dimensions (3D) representation performed by UnityMol WebGL. Virtual reality headsets complement the range of usage configurations for fully immersive tasks. Our experiments confirm that fast access to a rich cross-linked database is necessary for immersive analysis of structural data. We emphasize the possibility to display complex data structures and relationships in 3D, intrinsic to molecular structure visualization, but less common for omics-network analysis. Our setup is powered by MinOmics, an integrated analysis pipeline and visualization framework dedicated to multi-omics analysis. MinOmics integrates data from various sources into a materialized physical repository. We evaluate its performance, a design criterion for the framework.

  17. sc-PDB: a 3D-database of ligandable binding sites--10 years on.

    PubMed

    Desaphy, Jérémy; Bret, Guillaume; Rognan, Didier; Kellenberger, Esther

    2015-01-01

    The sc-PDB database (available at http://bioinfo-pharma.u-strasbg.fr/scPDB/) is a comprehensive and up-to-date selection of ligandable binding sites of the Protein Data Bank. Sites are defined from complexes between a protein and a pharmacological ligand. The database provides the all-atom description of the protein, its ligand, their binding site and their binding mode. Currently, the sc-PDB archive registers 9283 binding sites from 3678 unique proteins and 5608 unique ligands. The sc-PDB database was publicly launched in 2004 with the aim of providing structure files suitable for computational approaches to drug design, such as docking. During the last 10 years we have improved and standardized the processes for (i) identifying binding sites, (ii) correcting structures, (iii) annotating protein function and ligand properties and (iv) characterizing their binding mode. This paper presents the latest enhancements in the database, specifically pertaining to the representation of molecular interaction and to the similarity between ligand/protein binding patterns. The new website puts emphasis in pictorial analysis of data. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  18. Expression of psoriasis-associated fatty acid-binding protein in senescent human dermal microvascular endothelial cells.

    PubMed

    Ha, Moon Kyung; Chung, Kee Yang; Lee, Ju Hee; Bang, Dongsik; Park, Yoon Kee; Lee, Kwang Hoon

    2004-09-01

    Aging is associated with the progressive pathophysiologic modification of endothelial cells. In vitro endothelial cell senescence is accompanied by proliferative activity failure and by perturbations in gene and protein expressions. Moreover, this cellular senescence in culture has been proposed to reflect processes that occur in aging organisms. In order to observe the changing patterns of protein expression in senescent human dermal microvascular endothelial cells (HDMECs), proteins obtained from both early- and late-passaged HDMECs were separated by two-dimensional electrophoresis, visualized by silver staining, and quantified by image processing. Proteins of interest were extracted by in-gel digestion with trypsin and quantified by matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF-MS), by searching the National Center for Biotechnology Information protein-sequence database. More than 2000 spots were detected by 2D electrophoresis within a linear pH range of 3-10. Twenty-two major differentially expressed spots were observed in serially passaged HDMECs and identified with high confidence by MALDI-TOF-MS. One of these spots was found to be a 14-15 kDa psoriasis-associated fatty acid-binding protein (PA-FABP) with high affinity for long-chain fatty acids. The expression of PA-FABP was confirmed to be elevated in senescent HDMECs (passage 20) by fluorescence-activated cell sorting (FACS), confocal laser microscopy, and by immunohistochemistry in aged human skin tissue. Our results suggest that the overexpression of FABP in cultured senescent HDMECs is closely related to skin aging.

  19. Novel signatures of cancer-associated fibroblasts.

    PubMed

    Bozóky, Benedek; Savchenko, Andrii; Csermely, Péter; Korcsmáros, Tamás; Dúl, Zoltán; Pontén, Fredrik; Székely, László; Klein, George

    2013-07-15

    Increasing evidence indicates the importance of the tumor microenvironment, in particular cancer-associated fibroblasts, in cancer development and progression. In our study, we developed a novel, visually based method to identify new immunohistochemical signatures of these fibroblasts. The method employed a protein list based on 759 protein products of genes identified by RNA profiling from our previous study, comparing fibroblasts with differential growth-modulating effect on human cancers cells, and their first neighbors in the human protein interactome. These 2,654 proteins were analyzed in the Human Protein Atlas online database by comparing their immunohistochemical expression patterns in normal versus tumor-associated fibroblasts. Twelve new proteins differentially expressed in cancer-associated fibroblasts were identified (DLG1, BHLHE40, ROCK2, RAB31, AZI2, PKM2, ARHGAP31, ARHGAP26, ITCH, EGLN1, RNF19A and PLOD2), four of them can be connected to the Rho kinase signaling pathway. They were further analyzed in several additional tumor stromata and revealed that the majority showed congruence among the different tumors. Many of them were also positive in normal myofibroblast-like cells. The new signatures can be useful in immunohistochemical analysis of different tumor stromata and may also give us an insight into the pathways activated in them in their true in vivo context. The method itself could be used for other similar analysis to identify proteins expressed in other cell types in tumors and their surrounding microenvironment. Copyright © 2013 UICC.

  20. Coordinating Council. Fourth Meeting: NACA Documents Database Project

    NASA Technical Reports Server (NTRS)

    1991-01-01

    This NASA Scientific and Technical Information Coordination Council meeting dealt with the topic 'NACA Documents Database Project'. The following presentations were made and reported on: NACA documents database project study plan, AIAA study, the Optimal NACA database, Deficiencies in online file, NACA documents: Availability and Preservation, the NARA Collection: What is in it? and What to do about it?, and NACA foreign documents and availability. Visuals are available for most presentations.

  1. Overview of EVE - the event visualization environment of ROOT

    NASA Astrophysics Data System (ADS)

    Tadel, Matevž

    2010-04-01

    EVE is a high-level visualization library using ROOT's data-processing, GUI and OpenGL interfaces. It is designed as a framework for object management offering hierarchical data organization, object interaction and visualization via GUI and OpenGL representations. Automatic creation of 2D projected views is also supported. On the other hand, it can serve as an event visualization toolkit satisfying most HEP requirements: visualization of geometry, simulated and reconstructed data such as hits, clusters, tracks and calorimeter information. Special classes are available for visualization of raw-data. Object-interaction layer allows for easy selection and highlighting of objects and their derived representations (projections) across several views (3D, Rho-Z, R-Phi). Object-specific tooltips are provided in both GUI and GL views. The visual-configuration layer of EVE is built around a data-base of template objects that can be applied to specific instances of visualization objects to ensure consistent object presentation. The data-base can be retrieved from a file, edited during the framework operation and stored to file. EVE prototype was developed within the ALICE collaboration and has been included into ROOT in December 2007. Since then all EVE components have reached maturity. EVE is used as the base of AliEve visualization framework in ALICE, Firework physics-oriented event-display in CMS, and as the visualization engine of FairRoot in FAIR.

  2. SIMAP—the database of all-against-all protein sequence similarities and annotations with new interfaces and increased coverage

    PubMed Central

    Arnold, Roland; Goldenberg, Florian; Mewes, Hans-Werner; Rattei, Thomas

    2014-01-01

    The Similarity Matrix of Proteins (SIMAP, http://mips.gsf.de/simap/) database has been designed to massively accelerate computationally expensive protein sequence analysis tasks in bioinformatics. It provides pre-calculated sequence similarities interconnecting the entire known protein sequence universe, complemented by pre-calculated protein features and domains, similarity clusters and functional annotations. SIMAP covers all major public protein databases as well as many consistently re-annotated metagenomes from different repositories. As of September 2013, SIMAP contains >163 million proteins corresponding to ∼70 million non-redundant sequences. SIMAP uses the sensitive FASTA search heuristics, the Smith–Waterman alignment algorithm, the InterPro database of protein domain models and the BLAST2GO functional annotation algorithm. SIMAP assists biologists by facilitating the interactive exploration of the protein sequence universe. Web-Service and DAS interfaces allow connecting SIMAP with any other bioinformatic tool and resource. All-against-all protein sequence similarity matrices of project-specific protein collections are generated on request. Recent improvements allow SIMAP to cover the rapidly growing sequenced protein sequence universe. New Web-Service interfaces enhance the connectivity of SIMAP. Novel tools for interactive extraction of protein similarity networks have been added. Open access to SIMAP is provided through the web portal; the portal also contains instructions and links for software access and flat file downloads. PMID:24165881

  3. Evaluation of techniques for increasing recall in a dictionary approach to gene and protein name identification.

    PubMed

    Schuemie, Martijn J; Mons, Barend; Weeber, Marc; Kors, Jan A

    2007-06-01

    Gene and protein name identification in text requires a dictionary approach to relate synonyms to the same gene or protein, and to link names to external databases. However, existing dictionaries are incomplete. We investigate two complementary methods for automatic generation of a comprehensive dictionary: combination of information from existing gene and protein databases and rule-based generation of spelling variations. Both methods have been reported in literature before, but have hitherto not been combined and evaluated systematically. We combined gene and protein names from several existing databases of four different organisms. The combined dictionaries showed a substantial increase in recall on three different test sets, as compared to any single database. Application of 23 spelling variation rules to the combined dictionaries further increased recall. However, many rules appeared to have no effect and some appear to have a detrimental effect on precision.

  4. The COG database: a tool for genome-scale analysis of protein functions and evolution

    PubMed Central

    Tatusov, Roman L.; Galperin, Michael Y.; Natale, Darren A.; Koonin, Eugene V.

    2000-01-01

    Rational classification of proteins encoded in sequenced genomes is critical for making the genome sequences maximally useful for functional and evolutionary studies. The database of Clusters of Orthologous Groups of proteins (COGs) is an attempt on a phylogenetic classification of the proteins encoded in 21 complete genomes of bacteria, archaea and eukaryotes (http://www.ncbi.nlm.nih.gov/COG ). The COGs were constructed by applying the criterion of consistency of genome-specific best hits to the results of an exhaustive comparison of all protein sequences from these genomes. The database comprises 2091 COGs that include 56–83% of the gene products from each of the complete bacterial and archaeal genomes and ~35% of those from the yeast Saccharomyces cerevisiae genome. The COG database is accompanied by the COGNITOR program that is used to fit new proteins into the COGs and can be applied to functional and phylogenetic annotation of newly sequenced genomes. PMID:10592175

  5. ATtRACT-a database of RNA-binding proteins and associated motifs.

    PubMed

    Giudice, Girolamo; Sánchez-Cabo, Fátima; Torroja, Carlos; Lara-Pezzi, Enrique

    2016-01-01

    RNA-binding proteins (RBPs) play a crucial role in key cellular processes, including RNA transport, splicing, polyadenylation and stability. Understanding the interaction between RBPs and RNA is key to improve our knowledge of RNA processing, localization and regulation in a global manner. Despite advances in recent years, a unified non-redundant resource that includes information on experimentally validated motifs, RBPs and integrated tools to exploit this information is lacking. Here, we developed a database named ATtRACT (available athttp://attract.cnic.es) that compiles information on 370 RBPs and 1583 RBP consensus binding motifs, 192 of which are not present in any other database. To populate ATtRACT we (i) extracted and hand-curated experimentally validated data from CISBP-RNA, SpliceAid-F, RBPDB databases, (ii) integrated and updated the unavailable ASD database and (iii) extracted information from Protein-RNA complexes present in Protein Data Bank database through computational analyses. ATtRACT provides also efficient algorithms to search a specific motif and scan one or more RNA sequences at a time. It also allows discoveringde novomotifs enriched in a set of related sequences and compare them with the motifs included in the database.Database URL:http:// attract. cnic. es. © The Author(s) 2016. Published by Oxford University Press.

  6. ECOTOX knowledgebase: New tools for data visualization and database interoperability

    EPA Science Inventory

    The ECOTOXicology knowledgebase (ECOTOX) is a comprehensive, curated database that summarizes toxicology data fromsingle chemical exposure studies to terrestrial and aquatic organisms. The ECOTOX Knowledgebase provides risk assessors and researchers consistent information on toxi...

  7. Data mining and visualization of the Alabama accident database

    DOT National Transportation Integrated Search

    2000-08-01

    The Alabama Department of Public Safety has developed and maintains a centralized database that contain traffic accident data collected from crash report completed by local police officers and state troopers. The Critical Analysis Reporting Environme...

  8. SInCRe—structural interactome computational resource for Mycobacterium tuberculosis

    PubMed Central

    Metri, Rahul; Hariharaputran, Sridhar; Ramakrishnan, Gayatri; Anand, Praveen; Raghavender, Upadhyayula S.; Ochoa-Montaño, Bernardo; Higueruelo, Alicia P.; Sowdhamini, Ramanathan; Chandra, Nagasuma R.; Blundell, Tom L.; Srinivasan, Narayanaswamy

    2015-01-01

    We have developed an integrated database for Mycobacterium tuberculosis H37Rv (Mtb) that collates information on protein sequences, domain assignments, functional annotation and 3D structural information along with protein–protein and protein–small molecule interactions. SInCRe (Structural Interactome Computational Resource) is developed out of CamBan (Cambridge and Bangalore) collaboration. The motivation for development of this database is to provide an integrated platform to allow easily access and interpretation of data and results obtained by all the groups in CamBan in the field of Mtb informatics. In-house algorithms and databases developed independently by various academic groups in CamBan are used to generate Mtb-specific datasets and are integrated in this database to provide a structural dimension to studies on tuberculosis. The SInCRe database readily provides information on identification of functional domains, genome-scale modelling of structures of Mtb proteins and characterization of the small-molecule binding sites within Mtb. The resource also provides structure-based function annotation, information on small-molecule binders including FDA (Food and Drug Administration)-approved drugs, protein–protein interactions (PPIs) and natural compounds that bind to pathogen proteins potentially and result in weakening or elimination of host–pathogen protein–protein interactions. Together they provide prerequisites for identification of off-target binding. Database URL: http://proline.biochem.iisc.ernet.in/sincre PMID:26130660

  9. MetalS(3), a database-mining tool for the identification of structurally similar metal sites.

    PubMed

    Valasatava, Yana; Rosato, Antonio; Cavallaro, Gabriele; Andreini, Claudia

    2014-08-01

    We have developed a database search tool to identify metal sites having structural similarity to a query metal site structure within the MetalPDB database of minimal functional sites (MFSs) contained in metal-binding biological macromolecules. MFSs describe the local environment around the metal(s) independently of the larger context of the macromolecular structure. Such a local environment has a determinant role in tuning the chemical reactivity of the metal, ultimately contributing to the functional properties of the whole system. The database search tool, which we called MetalS(3) (Metal Sites Similarity Search), can be accessed through a Web interface at http://metalweb.cerm.unifi.it/tools/metals3/ . MetalS(3) uses a suitably adapted version of an algorithm that we previously developed to systematically compare the structure of the query metal site with each MFS in MetalPDB. For each MFS, the best superposition is kept. All these superpositions are then ranked according to the MetalS(3) scoring function and are presented to the user in tabular form. The user can interact with the output Web page to visualize the structural alignment or the sequence alignment derived from it. Options to filter the results are available. Test calculations show that the MetalS(3) output correlates well with expectations from protein homology considerations. Furthermore, we describe some usage scenarios that highlight the usefulness of MetalS(3) to obtain mechanistic and functional hints regardless of homology.

  10. iRefWeb: interactive analysis of consolidated protein interaction data and their supporting evidence

    PubMed Central

    Turner, Brian; Razick, Sabry; Turinsky, Andrei L.; Vlasblom, James; Crowdy, Edgard K.; Cho, Emerson; Morrison, Kyle; Wodak, Shoshana J.

    2010-01-01

    We present iRefWeb, a web interface to protein interaction data consolidated from 10 public databases: BIND, BioGRID, CORUM, DIP, IntAct, HPRD, MINT, MPact, MPPI and OPHID. iRefWeb enables users to examine aggregated interactions for a protein of interest, and presents various statistical summaries of the data across databases, such as the number of organism-specific interactions, proteins and cited publications. Through links to source databases and supporting evidence, researchers may gauge the reliability of an interaction using simple criteria, such as the detection methods, the scale of the study (high- or low-throughput) or the number of cited publications. Furthermore, iRefWeb compares the information extracted from the same publication by different databases, and offers means to follow-up possible inconsistencies. We provide an overview of the consolidated protein–protein interaction landscape and show how it can be automatically cropped to aid the generation of meaningful organism-specific interactomes. iRefWeb can be accessed at: http://wodaklab.org/iRefWeb. Database URL: http://wodaklab.org/iRefWeb/ PMID:20940177

  11. sapFinder: an R/Bioconductor package for detection of variant peptides in shotgun proteomics experiments.

    PubMed

    Wen, Bo; Xu, Shaohang; Sheynkman, Gloria M; Feng, Qiang; Lin, Liang; Wang, Quanhui; Xu, Xun; Wang, Jun; Liu, Siqi

    2014-11-01

    Single nucleotide variations (SNVs) located within a reading frame can result in single amino acid polymorphisms (SAPs), leading to alteration of the corresponding amino acid sequence as well as function of a protein. Accurate detection of SAPs is an important issue in proteomic analysis at the experimental and bioinformatic level. Herein, we present sapFinder, an R software package, for detection of the variant peptides based on tandem mass spectrometry (MS/MS)-based proteomics data. This package automates the construction of variation-associated databases from public SNV repositories or sample-specific next-generation sequencing (NGS) data and the identification of SAPs through database searching, post-processing and generation of HTML-based report with visualized interface. sapFinder is implemented as a Bioconductor package in R. The package and the vignette can be downloaded at http://bioconductor.org/packages/devel/bioc/html/sapFinder.html and are provided under a GPL-2 license. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  12. Pathway Tools version 13.0: integrated software for pathway/genome informatics and systems biology

    PubMed Central

    Paley, Suzanne M.; Krummenacker, Markus; Latendresse, Mario; Dale, Joseph M.; Lee, Thomas J.; Kaipa, Pallavi; Gilham, Fred; Spaulding, Aaron; Popescu, Liviu; Altman, Tomer; Paulsen, Ian; Keseler, Ingrid M.; Caspi, Ron

    2010-01-01

    Pathway Tools is a production-quality software environment for creating a type of model-organism database called a Pathway/Genome Database (PGDB). A PGDB such as EcoCyc integrates the evolving understanding of the genes, proteins, metabolic network and regulatory network of an organism. This article provides an overview of Pathway Tools capabilities. The software performs multiple computational inferences including prediction of metabolic pathways, prediction of metabolic pathway hole fillers and prediction of operons. It enables interactive editing of PGDBs by DB curators. It supports web publishing of PGDBs, and provides a large number of query and visualization tools. The software also supports comparative analyses of PGDBs, and provides several systems biology analyses of PGDBs including reachability analysis of metabolic networks, and interactive tracing of metabolites through a metabolic network. More than 800 PGDBs have been created using Pathway Tools by scientists around the world, many of which are curated DBs for important model organisms. Those PGDBs can be exchanged using a peer-to-peer DB sharing system called the PGDB Registry. PMID:19955237

  13. CARD 2017: expansion and model-centric curation of the comprehensive antibiotic resistance database

    PubMed Central

    Jia, Baofeng; Raphenya, Amogelang R.; Alcock, Brian; Waglechner, Nicholas; Guo, Peiyao; Tsang, Kara K.; Lago, Briony A.; Dave, Biren M.; Pereira, Sheldon; Sharma, Arjun N.; Doshi, Sachin; Courtot, Mélanie; Lo, Raymond; Williams, Laura E.; Frye, Jonathan G.; Elsayegh, Tariq; Sardar, Daim; Westman, Erin L.; Pawlowski, Andrew C.; Johnson, Timothy A.; Brinkman, Fiona S.L.; Wright, Gerard D.; McArthur, Andrew G.

    2017-01-01

    The Comprehensive Antibiotic Resistance Database (CARD; http://arpcard.mcmaster.ca) is a manually curated resource containing high quality reference data on the molecular basis of antimicrobial resistance (AMR), with an emphasis on the genes, proteins and mutations involved in AMR. CARD is ontologically structured, model centric, and spans the breadth of AMR drug classes and resistance mechanisms, including intrinsic, mutation-driven and acquired resistance. It is built upon the Antibiotic Resistance Ontology (ARO), a custom built, interconnected and hierarchical controlled vocabulary allowing advanced data sharing and organization. Its design allows the development of novel genome analysis tools, such as the Resistance Gene Identifier (RGI) for resistome prediction from raw genome sequence. Recent improvements include extensive curation of additional reference sequences and mutations, development of a unique Model Ontology and accompanying AMR detection models to power sequence analysis, new visualization tools, and expansion of the RGI for detection of emergent AMR threats. CARD curation is updated monthly based on an interplay of manual literature curation, computational text mining, and genome analysis. PMID:27789705

  14. Exploiting genomic data to identify proteins involved in abalone reproduction.

    PubMed

    Mendoza-Porras, Omar; Botwright, Natasha A; McWilliam, Sean M; Cook, Mathew T; Harris, James O; Wijffels, Gene; Colgrave, Michelle L

    2014-08-28

    Aside from their critical role in reproduction, abalone gonads serve as an indicator of sexual maturity and energy balance, two key considerations for effective abalone culture. Temperate abalone farmers face issues with tank restocking with highly marketable abalone owing to inefficient spawning induction methods. The identification of key proteins in sexually mature abalone will serve as the foundation for a greater understanding of reproductive biology. Addressing this knowledge gap is the first step towards improving abalone aquaculture methods. Proteomic profiling of female and male gonads of greenlip abalone, Haliotis laevigata, was undertaken using liquid chromatography-mass spectrometry. Owing to the incomplete nature of abalone protein databases, in addition to searching against two publicly available databases, a custom database comprising genomic data was used. Overall, 162 and 110 proteins were identified in females and males respectively with 40 proteins common to both sexes. For proteins involved in sexual maturation, sperm and egg structure, motility, acrosomal reaction and fertilization, 23 were identified only in females, 18 only in males and 6 were common. Gene ontology analysis revealed clear differences between the female and male protein profiles reflecting a higher rate of protein synthesis in the ovary and higher metabolic activity in the testis. A comprehensive mass spectrometry-based analysis was performed to profile the abalone gonad proteome providing the foundation for future studies of reproduction in abalone. Key proteins involved in both reproduction and energy balance were identified. Genomic resources were utilised to build a database of molluscan proteins yielding >60% more protein identifications than in a standard workflow employing public protein databases. Copyright © 2014 Elsevier B.V. All rights reserved.

  15. An Examination of Job Skills Posted on Internet Databases: Implications for Information Systems Degree Programs.

    ERIC Educational Resources Information Center

    Liu, Xia; Liu, Lai C.; Koong, Kai S.; Lu, June

    2003-01-01

    Analysis of 300 information technology job postings in two Internet databases identified the following skill categories: programming languages (Java, C/C++, and Visual Basic were most frequent); website development (57% sought SQL and HTML skills); databases (nearly 50% required Oracle); networks (only Windows NT or wide-area/local-area networks);…

  16. Meta sequence analysis of human blood peptides and their parent proteins.

    PubMed

    Bowden, Peter; Pendrak, Voitek; Zhu, Peihong; Marshall, John G

    2010-04-18

    Sequence analysis of the blood peptides and their qualities will be key to understanding the mechanisms that contribute to error in LC-ESI-MS/MS. Analysis of peptides and their proteins at the level of sequences is much more direct and informative than the comparison of disparate accession numbers. A portable database of all blood peptide and protein sequences with descriptor fields and gene ontology terms might be useful for designing immunological or MRM assays from human blood. The results of twelve studies of human blood peptides and/or proteins identified by LC-MS/MS and correlated against a disparate array of genetic libraries were parsed and matched to proteins from the human ENSEMBL, SwissProt and RefSeq databases by SQL. The reported peptide and protein sequences were organized into an SQL database with full protein sequences and up to five unique peptides in order of prevalence along with the peptide count for each protein. Structured query language or BLAST was used to acquire descriptive information in current databases. Sampling error at the level of peptides is the largest source of disparity between groups. Chi Square analysis of peptide to protein distributions confirmed the significant agreement between groups on identified proteins. Copyright 2010. Published by Elsevier B.V.

  17. 3D visualization of molecular structures in the MOGADOC database

    NASA Astrophysics Data System (ADS)

    Vogt, Natalja; Popov, Evgeny; Rudert, Rainer; Kramer, Rüdiger; Vogt, Jürgen

    2010-08-01

    The MOGADOC database (Molecular Gas-Phase Documentation) is a powerful tool to retrieve information about compounds which have been studied in the gas-phase by electron diffraction, microwave spectroscopy and molecular radio astronomy. Presently the database contains over 34,500 bibliographic references (from the beginning of each method) for about 10,000 inorganic, organic and organometallic compounds and structural data (bond lengths, bond angles, dihedral angles, etc.) for about 7800 compounds. Most of the implemented molecular structures are given in a three-dimensional (3D) presentation. To create or edit and visualize the 3D images of molecules, new tools (special editor and Java-based 3D applet) were developed. Molecular structures in internal coordinates were converted to those in Cartesian coordinates.

  18. FARE-CAFE: a database of functional and regulatory elements of cancer-associated fusion events.

    PubMed

    Korla, Praveen Kumar; Cheng, Jack; Huang, Chien-Hung; Tsai, Jeffrey J P; Liu, Yu-Hsuan; Kurubanjerdjit, Nilubon; Hsieh, Wen-Tsong; Chen, Huey-Yi; Ng, Ka-Lok

    2015-01-01

    Chromosomal translocation (CT) is of enormous clinical interest because this disorder is associated with various major solid tumors and leukemia. A tumor-specific fusion gene event may occur when a translocation joins two separate genes. Currently, various CT databases provide information about fusion genes and their genomic elements. However, no database of the roles of fusion genes, in terms of essential functional and regulatory elements in oncogenesis, is available. FARE-CAFE is a unique combination of CTs, fusion proteins, protein domains, domain-domain interactions, protein-protein interactions, transcription factors and microRNAs, with subsequent experimental information, which cannot be found in any other CT database. Genomic DNA information including, for example, manually collected exact locations of the first and second break points, sequences and karyotypes of fusion genes are included. FARE-CAFE will substantially facilitate the cancer biologist's mission of elucidating the pathogenesis of various types of cancer. This database will ultimately help to develop 'novel' therapeutic approaches. Database URL: http://ppi.bioinfo.asia.edu.tw/FARE-CAFE. © The Author(s) 2015. Published by Oxford University Press.

  19. An attempt to understand glioma stem cell biology through centrality analysis of a protein interaction network.

    PubMed

    Mallik, Mrinmay Kumar

    2018-02-07

    Biological networks can be analyzed using "Centrality Analysis" to identify the more influential nodes and interactions in the network. This study was undertaken to create and visualize a biological network comprising of protein-protein interactions (PPIs) amongst proteins which are preferentially over-expressed in glioma cancer stem cell component (GCSC) of glioblastomas as compared to the glioma non-stem cancer cell (GNSC) component and then to analyze this network through centrality analyses (CA) in order to identify the essential proteins in this network and their interactions. In addition, this study proposes a new centrality analysis method pertaining exclusively to transcription factors (TFs) and interactions amongst them. Moreover the relevant molecular functions, biological processes and biochemical pathways amongst these proteins were sought through enrichment analysis. A protein interaction network was created using a list of proteins which have been shown to be preferentially expressed or over-expressed in GCSCs isolated from glioblastomas as compared to the GNSCs. This list comprising of 38 proteins, created using manual literature mining, was submitted to the Reactome FIViz tool, a web based application integrated into Cytoscape, an open source software platform for visualizing and analyzing molecular interaction networks and biological pathways to produce the network. This network was subjected to centrality analyses utilizing ranked lists of six centrality measures using the FIViz application and (for the first time) a dedicated centrality analysis plug-in ; CytoNCA. The interactions exclusively amongst the transcription factors were nalyzed through a newly proposed centrality analysis method called "Gene Expression Associated Degree Centrality Analysis (GEADCA)". Enrichment analysis was performed using the "network function analysis" tool on Reactome. The CA was able to identify a small set of proteins with consistently high centrality ranks that is indicative of their strong influence in the protein protein interaction network. Similarly the newly proposed GEADCA helped identify the transcription factors with high centrality values indicative of their key roles in transcriptional regulation. The enrichment studies provided a list of molecular functions, biological processes and biochemical pathways associated with the constructed network. The study shows how pathway based databases may be used to create and analyze a relevant protein interaction network in glioma cancer stem cells and identify the essential elements within it to gather insights into the molecular interactions that regulate the properties of glioma stem cells. How these insights may be utilized to help the development of future research towards formulation of new management strategies have been discussed from a theoretical standpoint. Copyright © 2017 Elsevier Ltd. All rights reserved.

  20. Protein-Protein Interaction Network and Gene Ontology

    NASA Astrophysics Data System (ADS)

    Choi, Yunkyu; Kim, Seok; Yi, Gwan-Su; Park, Jinah

    Evolution of computer technologies makes it possible to access a large amount and various kinds of biological data via internet such as DNA sequences, proteomics data and information discovered about them. It is expected that the combination of various data could help researchers find further knowledge about them. Roles of a visualization system are to invoke human abilities to integrate information and to recognize certain patterns in the data. Thus, when the various kinds of data are examined and analyzed manually, an effective visualization system is an essential part. One instance of these integrated visualizations can be combination of protein-protein interaction (PPI) data and Gene Ontology (GO) which could help enhance the analysis of PPI network. We introduce a simple but comprehensive visualization system that integrates GO and PPI data where GO and PPI graphs are visualized side-by-side and supports quick reference functions between them. Furthermore, the proposed system provides several interactive visualization methods for efficiently analyzing the PPI network and GO directedacyclic- graph such as context-based browsing and common ancestors finding.

  1. Follicle Online: an integrated database of follicle assembly, development and ovulation.

    PubMed

    Hua, Juan; Xu, Bo; Yang, Yifan; Ban, Rongjun; Iqbal, Furhan; Cooke, Howard J; Zhang, Yuanwei; Shi, Qinghua

    2015-01-01

    Folliculogenesis is an important part of ovarian function as it provides the oocytes for female reproductive life. Characterizing genes/proteins involved in folliculogenesis is fundamental for understanding the mechanisms associated with this biological function and to cure the diseases associated with folliculogenesis. A large number of genes/proteins associated with folliculogenesis have been identified from different species. However, no dedicated public resource is currently available for folliculogenesis-related genes/proteins that are validated by experiments. Here, we are reporting a database 'Follicle Online' that provides the experimentally validated gene/protein map of the folliculogenesis in a number of species. Follicle Online is a web-based database system for storing and retrieving folliculogenesis-related experimental data. It provides detailed information for 580 genes/proteins (from 23 model organisms, including Homo sapiens, Mus musculus, Rattus norvegicus, Mesocricetus auratus, Bos Taurus, Drosophila and Xenopus laevis) that have been reported to be involved in folliculogenesis, POF (premature ovarian failure) and PCOS (polycystic ovary syndrome). The literature was manually curated from more than 43,000 published articles (till 1 March 2014). The Follicle Online database is implemented in PHP + MySQL + JavaScript and this user-friendly web application provides access to the stored data. In summary, we have developed a centralized database that provides users with comprehensive information about genes/proteins involved in folliculogenesis. This database can be accessed freely and all the stored data can be viewed without any registration. Database URL: http://mcg.ustc.edu.cn/sdap1/follicle/index.php © The Author(s) 2015. Published by Oxford University Press.

  2. Follicle Online: an integrated database of follicle assembly, development and ovulation

    PubMed Central

    Hua, Juan; Xu, Bo; Yang, Yifan; Ban, Rongjun; Iqbal, Furhan; Zhang, Yuanwei; Shi, Qinghua

    2015-01-01

    Folliculogenesis is an important part of ovarian function as it provides the oocytes for female reproductive life. Characterizing genes/proteins involved in folliculogenesis is fundamental for understanding the mechanisms associated with this biological function and to cure the diseases associated with folliculogenesis. A large number of genes/proteins associated with folliculogenesis have been identified from different species. However, no dedicated public resource is currently available for folliculogenesis-related genes/proteins that are validated by experiments. Here, we are reporting a database ‘Follicle Online’ that provides the experimentally validated gene/protein map of the folliculogenesis in a number of species. Follicle Online is a web-based database system for storing and retrieving folliculogenesis-related experimental data. It provides detailed information for 580 genes/proteins (from 23 model organisms, including Homo sapiens, Mus musculus, Rattus norvegicus, Mesocricetus auratus, Bos Taurus, Drosophila and Xenopus laevis) that have been reported to be involved in folliculogenesis, POF (premature ovarian failure) and PCOS (polycystic ovary syndrome). The literature was manually curated from more than 43 000 published articles (till 1 March 2014). The Follicle Online database is implemented in PHP + MySQL + JavaScript and this user-friendly web application provides access to the stored data. In summary, we have developed a centralized database that provides users with comprehensive information about genes/proteins involved in folliculogenesis. This database can be accessed freely and all the stored data can be viewed without any registration. Database URL: http://mcg.ustc.edu.cn/sdap1/follicle/index.php PMID:25931457

  3. The RAVEN Toolbox and Its Use for Generating a Genome-scale Metabolic Model for Penicillium chrysogenum

    PubMed Central

    Agren, Rasmus; Liu, Liming; Shoaie, Saeed; Vongsangnak, Wanwipa; Nookaew, Intawat; Nielsen, Jens

    2013-01-01

    We present the RAVEN (Reconstruction, Analysis and Visualization of Metabolic Networks) Toolbox: a software suite that allows for semi-automated reconstruction of genome-scale models. It makes use of published models and/or the KEGG database, coupled with extensive gap-filling and quality control features. The software suite also contains methods for visualizing simulation results and omics data, as well as a range of methods for performing simulations and analyzing the results. The software is a useful tool for system-wide data analysis in a metabolic context and for streamlined reconstruction of metabolic networks based on protein homology. The RAVEN Toolbox workflow was applied in order to reconstruct a genome-scale metabolic model for the important microbial cell factory Penicillium chrysogenum Wisconsin54-1255. The model was validated in a bibliomic study of in total 440 references, and it comprises 1471 unique biochemical reactions and 1006 ORFs. It was then used to study the roles of ATP and NADPH in the biosynthesis of penicillin, and to identify potential metabolic engineering targets for maximization of penicillin production. PMID:23555215

  4. Gene Graphics: a genomic neighborhood data visualization web application.

    PubMed

    Harrison, Katherine J; Crécy-Lagard, Valérie de; Zallot, Rémi

    2018-04-15

    The examination of gene neighborhood is an integral part of comparative genomics but no tools to produce publication quality graphics of gene clusters are available. Gene Graphics is a straightforward web application for creating such visuals. Supported inputs include National Center for Biotechnology Information gene and protein identifiers with automatic fetching of neighboring information, GenBank files and data extracted from the SEED database. Gene representations can be customized for many parameters including gene and genome names, colors and sizes. Gene attributes can be copied and pasted for rapid and user-friendly customization of homologous genes between species. In addition to Portable Network Graphics and Scalable Vector Graphics, produced representations can be exported as Tagged Image File Format or Encapsulated PostScript, formats that are standard for publication. Hands-on tutorials with real life examples inspired from publications are available for training. Gene Graphics is freely available at https://katlabs.cc/genegraphics/ and source code is hosted at https://github.com/katlabs/genegraphics. katherinejh@ufl.edu or remizallot@ufl.edu. Supplementary data are available at Bioinformatics online.

  5. DBSecSys 2.0: a database of Burkholderia mallei and Burkholderia pseudomallei secretion systems.

    PubMed

    Memišević, Vesna; Kumar, Kamal; Zavaljevski, Nela; DeShazer, David; Wallqvist, Anders; Reifman, Jaques

    2016-09-20

    Burkholderia mallei and B. pseudomallei are the causative agents of glanders and melioidosis, respectively, diseases with high morbidity and mortality rates. B. mallei and B. pseudomallei are closely related genetically; B. mallei evolved from an ancestral strain of B. pseudomallei by genome reduction and adaptation to an obligate intracellular lifestyle. Although these two bacteria cause different diseases, they share multiple virulence factors, including bacterial secretion systems, which represent key components of bacterial pathogenicity. Despite recent progress, the secretion system proteins for B. mallei and B. pseudomallei, their pathogenic mechanisms of action, and host factors are not well characterized. We previously developed a manually curated database, DBSecSys, of bacterial secretion system proteins for B. mallei. Here, we report an expansion of the database with corresponding information about B. pseudomallei. DBSecSys 2.0 contains comprehensive literature-based and computationally derived information about B. mallei ATCC 23344 and literature-based and computationally derived information about B. pseudomallei K96243. The database contains updated information for 163 B. mallei proteins from the previous database and 61 additional B. mallei proteins, and new information for 281 B. pseudomallei proteins associated with 5 secretion systems, their 1,633 human- and murine-interacting targets, and 2,400 host-B. mallei interactions and 2,286 host-B. pseudomallei interactions. The database also includes information about 13 pathogenic mechanisms of action for B. mallei and B. pseudomallei secretion system proteins inferred from the available literature or computationally. Additionally, DBSecSys 2.0 provides details about 82 virulence attenuation experiments for 52 B. mallei secretion system proteins and 98 virulence attenuation experiments for 61 B. pseudomallei secretion system proteins. We updated the Web interface and data access layer to speed-up users' search of detailed information for orthologous proteins related to secretion systems of the two pathogens. The updates of DBSecSys 2.0 provide unique capabilities to access comprehensive information about secretion systems of B. mallei and B. pseudomallei. They enable studies and comparisons of corresponding proteins of these two closely related pathogens and their host-interacting partners. The database is available at http://dbsecsys.bhsai.org .

  6. Gene and protein nomenclature in public databases

    PubMed Central

    Fundel, Katrin; Zimmer, Ralf

    2006-01-01

    Background Frequently, several alternative names are in use for biological objects such as genes and proteins. Applications like manual literature search, automated text-mining, named entity identification, gene/protein annotation, and linking of knowledge from different information sources require the knowledge of all used names referring to a given gene or protein. Various organism-specific or general public databases aim at organizing knowledge about genes and proteins. These databases can be used for deriving gene and protein name dictionaries. So far, little is known about the differences between databases in terms of size, ambiguities and overlap. Results We compiled five gene and protein name dictionaries for each of the five model organisms (yeast, fly, mouse, rat, and human) from different organism-specific and general public databases. We analyzed the degree of ambiguity of gene and protein names within and between dictionaries, to a lexicon of common English words and domain-related non-gene terms, and we compared different data sources in terms of size of extracted dictionaries and overlap of synonyms between those. The study shows that the number of genes/proteins and synonyms covered in individual databases varies significantly for a given organism, and that the degree of ambiguity of synonyms varies significantly between different organisms. Furthermore, it shows that, despite considerable efforts of co-curation, the overlap of synonyms in different data sources is rather moderate and that the degree of ambiguity of gene names with common English words and domain-related non-gene terms varies depending on the considered organism. Conclusion In conclusion, these results indicate that the combination of data contained in different databases allows the generation of gene and protein name dictionaries that contain significantly more used names than dictionaries obtained from individual data sources. Furthermore, curation of combined dictionaries considerably increases size and decreases ambiguity. The entries of the curated synonym dictionary are available for manual querying, editing, and PubMed- or Google-search via the ProThesaurus-wiki. For automated querying via custom software, we offer a web service and an exemplary client application. PMID:16899134

  7. MultitaskProtDB: a database of multitasking proteins

    PubMed Central

    Hernández, Sergio; Ferragut, Gabriela; Amela, Isaac; Perez-Pons, JosepAntoni; Piñol, Jaume; Mozo-Villarias, Angel; Cedano, Juan; Querol, Enrique

    2014-01-01

    We have compiled MultitaskProtDB, available online at http://wallace.uab.es/multitask, to provide a repository where the many multitasking proteins found in the literature can be stored. Multitasking or moonlighting is the capability of some proteins to execute two or more biological functions. Usually, multitasking proteins are experimentally revealed by serendipity. This ability of proteins to perform multitasking functions helps us to understand one of the ways used by cells to perform many complex functions with a limited number of genes. Even so, the study of this phenomenon is complex because, among other things, there is no database of moonlighting proteins. The existence of such a tool facilitates the collection and dissemination of these important data. This work reports the database, MultitaskProtDB, which is designed as a friendly user web page containing >288 multitasking proteins with their NCBI and UniProt accession numbers, canonical and additional biological functions, monomeric/oligomeric states, PDB codes when available and bibliographic references. This database also serves to gain insight into some characteristics of multitasking proteins such as frequencies of the different pairs of functions, phylogenetic conservation and so forth. PMID:24253302

  8. SCOPe: Manual Curation and Artifact Removal in the Structural Classification of Proteins - extended Database.

    PubMed

    Chandonia, John-Marc; Fox, Naomi K; Brenner, Steven E

    2017-02-03

    SCOPe (Structural Classification of Proteins-extended, http://scop.berkeley.edu) is a database of relationships between protein structures that extends the Structural Classification of Proteins (SCOP) database. SCOP is an expert-curated ordering of domains from the majority of proteins of known structure in a hierarchy according to structural and evolutionary relationships. SCOPe classifies the majority of protein structures released since SCOP development concluded in 2009, using a combination of manual curation and highly precise automated tools, aiming to have the same accuracy as fully hand-curated SCOP releases. SCOPe also incorporates and updates the ASTRAL compendium, which provides several databases and tools to aid in the analysis of the sequences and structures of proteins classified in SCOPe. SCOPe continues high-quality manual classification of new superfamilies, a key feature of SCOP. Artifacts such as expression tags are now separated into their own class, in order to distinguish them from the homology-based annotations in the remainder of the SCOPe hierarchy. SCOPe 2.06 contains 77,439 Protein Data Bank entries, double the 38,221 structures classified in SCOP. Copyright © 2016 The Author(s). Published by Elsevier Ltd.. All rights reserved.

  9. ECOTOX Knowledgebase: New tools for data visualization and database interoperability -Poster

    EPA Science Inventory

    The ECOTOXicology knowledgebase (ECOTOX) is a comprehensive, curated database that summarizes toxicology data from single chemical exposure studies to terrestrial and aquatic organisms. The ECOTOX Knowledgebase provides risk assessors and researchers consistent information on tox...

  10. ECOTOX Knowledgebase: New tools for data visualization and database interoperability (poster)

    EPA Science Inventory

    The ECOTOXicology knowledgebase (ECOTOX) is a comprehensive, curated database that summarizes toxicology data from single chemical exposure studies to terrestrial and aquatic organisms. The ECOTOX Knowledgebase provides risk assessors and researchers consistent information on tox...

  11. Teaching the Structure of Immunoglobulins by Molecular Visualization and SDS-PAGE Analysis

    ERIC Educational Resources Information Center

    Rižner, Tea Lanišnik

    2014-01-01

    This laboratory class combines molecular visualization and laboratory experimentation to teach the structure of the immunoglobulins (Ig). In the first part of the class, the three-dimensional structures of the human IgG and IgM molecules available through the RCSB PDB database are visualized using freely available software. In the second part, IgG…

  12. PrionScan: an online database of predicted prion domains in complete proteomes.

    PubMed

    Espinosa Angarica, Vladimir; Angulo, Alfonso; Giner, Arturo; Losilla, Guillermo; Ventura, Salvador; Sancho, Javier

    2014-02-05

    Prions are a particular type of amyloids related to a large variety of important processes in cells, but also responsible for serious diseases in mammals and humans. The number of experimentally characterized prions is still low and corresponds to a handful of examples in microorganisms and mammals. Prion aggregation is mediated by specific protein domains with a remarkable compositional bias towards glutamine/asparagine and against charged residues and prolines. These compositional features have been used to predict new prion proteins in the genomes of different organisms. Despite these efforts, there are only a few available data sources containing prion predictions at a genomic scale. Here we present PrionScan, a new database of predicted prion-like domains in complete proteomes. We have previously developed a predictive methodology to identify and score prionogenic stretches in protein sequences. In the present work, we exploit this approach to scan all the protein sequences in public databases and compile a repository containing relevant information of proteins bearing prion-like domains. The database is updated regularly alongside UniprotKB and in its present version contains approximately 28000 predictions in proteins from different functional categories in more than 3200 organisms from all the taxonomic subdivisions. PrionScan can be used in two different ways: database query and analysis of protein sequences submitted by the users. In the first mode, simple queries allow to retrieve a detailed description of the properties of a defined protein. Queries can also be combined to generate more complex and specific searching patterns. In the second mode, users can submit and analyze their own sequences. It is expected that this database would provide relevant insights on prion functions and regulation from a genome-wide perspective, allowing researches performing cross-species prion biology studies. Our database might also be useful for guiding experimentalists in the identification of new candidates for further experimental characterization.

  13. Proteomics: Protein Identification Using Online Databases

    ERIC Educational Resources Information Center

    Eurich, Chris; Fields, Peter A.; Rice, Elizabeth

    2012-01-01

    Proteomics is an emerging area of systems biology that allows simultaneous study of thousands of proteins expressed in cells, tissues, or whole organisms. We have developed this activity to enable high school or college students to explore proteomic databases using mass spectrometry data files generated from yeast proteins in a college laboratory…

  14. Receptor Tyrosine Kinase MET Interactome and Neurodevelopmental Disorder Partners at the Developing Synapse.

    PubMed

    Xie, Zhihui; Li, Jing; Baker, Jonathan; Eagleson, Kathie L; Coba, Marcelo P; Levitt, Pat

    2016-12-15

    Atypical synapse development and plasticity are implicated in many neurodevelopmental disorders (NDDs). NDD-associated, high-confidence risk genes have been identified, yet little is known about functional relationships at the level of protein-protein interactions, which are the dominant molecular bases responsible for mediating circuit development. Proteomics in three independent developing neocortical synaptosomal preparations identified putative interacting proteins of the ligand-activated MET receptor tyrosine kinase, an autism risk gene that mediates synapse development. The candidates were translated into interactome networks and analyzed bioinformatically. Additionally, three independent quantitative proximity ligation assays in cultured neurons and four independent immunoprecipitation analyses of synaptosomes validated protein interactions. Approximately 11% (8/72) of MET-interacting proteins, including SHANK3, SYNGAP1, and GRIN2B, are associated with NDDs. Proteins in the MET interactome were translated into a novel MET interactome network based on human protein-protein interaction databases. High-confidence genes from different NDD datasets that encode synaptosomal proteins were analyzed for being enriched in MET interactome proteins. This was found for autism but not schizophrenia, bipolar disorder, major depressive disorder, or attention-deficit/hyperactivity disorder. There is correlated gene expression between MET and its interactive partners in developing human temporal and visual neocortices but not with highly expressed genes that are not in the interactome. Proximity ligation assays and biochemical analyses demonstrate that MET-protein partner interactions are dynamically regulated by receptor activation. The results provide a novel molecular framework for deciphering the functional relations of key regulators of synaptogenesis that contribute to both typical cortical development and to NDDs. Copyright © 2016 Society of Biological Psychiatry. Published by Elsevier Inc. All rights reserved.

  15. Named Entity Recognition in a Hungarian NL Based QA System

    NASA Astrophysics Data System (ADS)

    Tikkl, Domonkos; Szidarovszky, P. Ferenc; Kardkovacs, Zsolt T.; Magyar, Gábor

    In WoW project our purpose is to create a complex search interface with the following features: search in the deep web content of contracted partners' databases, processing Hungarian natural language (NL) questions and transforming them to SQL queries for database access, image search supported by a visual thesaurus that describes in a structural form the visual content of images (also in Hungarian). This paper primarily focuses on a particular problem of question processing task: the entity recognition. Before going into details we give a short overview of the project's aims.

  16. Instruments of scientific visual representation in atomic databases

    NASA Astrophysics Data System (ADS)

    Kazakov, V. V.; Kazakov, V. G.; Meshkov, O. I.

    2017-10-01

    Graphic tools of spectral data representation provided by operating information systems on atomic spectroscopy—ASD NIST, VAMDC, SPECTR-W3, and Electronic Structure of Atoms—for the support of scientific-research and human-resource development are presented. Such tools of visual representation of scientific data as those of the spectrogram and Grotrian diagram plotting are considered. The possibility of comparative analysis of the experimentally obtained spectra and reference spectra of atomic systems formed according to the database of a resource is described. The access techniques to the mentioned graphic tools are presented.

  17. A curated gluten protein sequence database to support development of proteomics methods for determination of gluten in gluten-free foods.

    PubMed

    Bromilow, Sophie; Gethings, Lee A; Buckley, Mike; Bromley, Mike; Shewry, Peter R; Langridge, James I; Clare Mills, E N

    2017-06-23

    The unique physiochemical properties of wheat gluten enable a diverse range of food products to be manufactured. However, gluten triggers coeliac disease, a condition which is treated using a gluten-free diet. Analytical methods are required to confirm if foods are gluten-free, but current immunoassay-based methods can unreliable and proteomic methods offer an alternative but require comprehensive and well annotated sequence databases which are lacking for gluten. A manually a curated database (GluPro V1.0) of gluten proteins, comprising 630 discrete unique full length protein sequences has been compiled. It is representative of the different types of gliadin and glutenin components found in gluten. An in silico comparison of their coeliac toxicity was undertaken by analysing the distribution of coeliac toxic motifs. This demonstrated that whilst the α-gliadin proteins contained more toxic motifs, these were distributed across all gluten protein sub-types. Comparison of annotations observed using a discovery proteomics dataset acquired using ion mobility MS/MS showed that more reliable identifications were obtained using the GluPro V1.0 database compared to the complete reviewed Viridiplantae database. This highlights the value of a curated sequence database specifically designed to support the proteomic workflows and the development of methods to detect and quantify gluten. We have constructed the first manually curated open-source wheat gluten protein sequence database (GluPro V1.0) in a FASTA format to support the application of proteomic methods for gluten protein detection and quantification. We have also analysed the manually verified sequences to give the first comprehensive overview of the distribution of sequences able to elicit a reaction in coeliac disease, the prevalent form of gluten intolerance. Provision of this database will improve the reliability of gluten protein identification by proteomic analysis, and aid the development of targeted mass spectrometry methods in line with Codex Alimentarius Commission requirements for foods designed to meet the needs of gluten intolerant individuals. Copyright © 2017. Published by Elsevier B.V.

  18. The Hype over Hyperbolic Browsers.

    ERIC Educational Resources Information Center

    Allen, Maryellen Mott

    2002-01-01

    Considers complaints about the usability in the human-computer interaction aspect of information retrieval and discusses information visualization, the Online Library of Information Visualization Environments, hyperbolic information structure, subject searching, real-world applications, relational databases and hyperbolic trees, and the future of…

  19. MultitaskProtDB-II: an update of a database of multitasking/moonlighting proteins

    PubMed Central

    Franco-Serrano, Luís; Hernández, Sergio; Calvo, Alejandra; Severi, María A; Ferragut, Gabriela; Pérez-Pons, JosepAntoni; Piñol, Jaume; Pich, Òscar; Mozo-Villarias, Ángel; Amela, Isaac

    2018-01-01

    Abstract Multitasking, or moonlighting, is the capability of some proteins to execute two or more biological functions. MultitaskProtDB-II is a database of multifunctional proteins that has been updated. In the previous version, the information contained was: NCBI and UniProt accession numbers, canonical and additional biological functions, organism, monomeric/oligomeric states, PDB codes and bibliographic references. In the present update, the number of entries has been increased from 288 to 694 moonlighting proteins. MultitaskProtDB-II is continually being curated and updated. The new database also contains the following information: GO descriptors for the canonical and moonlighting functions, three-dimensional structure (for those proteins lacking PDB structure, a model was made using Itasser and Phyre), the involvement of the proteins in human diseases (78% of human moonlighting proteins) and whether the protein is a target of a current drug (48% of human moonlighting proteins). These numbers highlight the importance of these proteins for the analysis and explanation of human diseases and target-directed drug design. Moreover, 25% of the proteins of the database are involved in virulence of pathogenic microorganisms, largely in the mechanism of adhesion to the host. This highlights their importance for the mechanism of microorganism infection and vaccine design. MultitaskProtDB-II is available at http://wallace.uab.es/multitaskII. PMID:29136215

  20. Microarray analysis reveals key genes and pathways in Tetralogy of Fallot

    PubMed Central

    He, Yue-E; Qiu, Hui-Xian; Jiang, Jian-Bing; Wu, Rong-Zhou; Xiang, Ru-Lian; Zhang, Yuan-Hai

    2017-01-01

    The aim of the present study was to identify key genes that may be involved in the pathogenesis of Tetralogy of Fallot (TOF) using bioinformatics methods. The GSE26125 microarray dataset, which includes cardiovascular tissue samples derived from 16 children with TOF and five healthy age-matched control infants, was downloaded from the Gene Expression Omnibus database. Differential expression analysis was performed between TOF and control samples to identify differentially expressed genes (DEGs) using Student's t-test, and the R/limma package, with a log2 fold-change of >2 and a false discovery rate of <0.01 set as thresholds. The biological functions of DEGs were analyzed using the ToppGene database. The ReactomeFIViz application was used to construct functional interaction (FI) networks, and the genes in each module were subjected to pathway enrichment analysis. The iRegulon plugin was used to identify transcription factors predicted to regulate the DEGs in the FI network, and the gene-transcription factor pairs were then visualized using Cytoscape software. A total of 878 DEGs were identified, including 848 upregulated genes and 30 downregulated genes. The gene FI network contained seven function modules, which were all comprised of upregulated genes. Genes enriched in Module 1 were enriched in the following three neurological disorder-associated signaling pathways: Parkinson's disease, Alzheimer's disease and Huntington's disease. Genes in Modules 0, 3 and 5 were dominantly enriched in pathways associated with ribosomes and protein translation. The Xbox binding protein 1 transcription factor was demonstrated to be involved in the regulation of genes encoding the subunits of cytoplasmic and mitochondrial ribosomes, as well as genes involved in neurodegenerative disorders. Therefore, dysfunction of genes involved in signaling pathways associated with neurodegenerative disorders, ribosome function and protein translation may contribute to the pathogenesis of TOF. PMID:28713939

  1. MODBASE, a database of annotated comparative protein structure models

    PubMed Central

    Pieper, Ursula; Eswar, Narayanan; Stuart, Ashley C.; Ilyin, Valentin A.; Sali, Andrej

    2002-01-01

    MODBASE (http://guitar.rockefeller.edu/modbase) is a relational database of annotated comparative protein structure models for all available protein sequences matched to at least one known protein structure. The models are calculated by MODPIPE, an automated modeling pipeline that relies on PSI-BLAST, IMPALA and MODELLER. MODBASE uses the MySQL relational database management system for flexible and efficient querying, and the MODVIEW Netscape plugin for viewing and manipulating multiple sequences and structures. It is updated regularly to reflect the growth of the protein sequence and structure databases, as well as improvements in the software for calculating the models. For ease of access, MODBASE is organized into different datasets. The largest dataset contains models for domains in 304 517 out of 539 171 unique protein sequences in the complete TrEMBL database (23 March 2001); only models based on significant alignments (PSI-BLAST E-value < 10–4) and models assessed to have the correct fold are included. Other datasets include models for target selection and structure-based annotation by the New York Structural Genomics Research Consortium, models for prediction of genes in the Drosophila melanogaster genome, models for structure determination of several ribosomal particles and models calculated by the MODWEB comparative modeling web server. PMID:11752309

  2. 3D-SURFER 2.0: web platform for real-time search and characterization of protein surfaces.

    PubMed

    Xiong, Yi; Esquivel-Rodriguez, Juan; Sael, Lee; Kihara, Daisuke

    2014-01-01

    The increasing number of uncharacterized protein structures necessitates the development of computational approaches for function annotation using the protein tertiary structures. Protein structure database search is the basis of any structure-based functional elucidation of proteins. 3D-SURFER is a web platform for real-time protein surface comparison of a given protein structure against the entire PDB using 3D Zernike descriptors. It can smoothly navigate the protein structure space in real-time from one query structure to another. A major new feature of Release 2.0 is the ability to compare the protein surface of a single chain, a single domain, or a single complex against databases of protein chains, domains, complexes, or a combination of all three in the latest PDB. Additionally, two types of protein structures can now be compared: all-atom-surface and backbone-atom-surface. The server can also accept a batch job for a large number of database searches. Pockets in protein surfaces can be identified by VisGrid and LIGSITE (csc) . The server is available at http://kiharalab.org/3d-surfer/.

  3. Identifying relevant data for a biological database: handcrafted rules versus machine learning.

    PubMed

    Sehgal, Aditya Kumar; Das, Sanmay; Noto, Keith; Saier, Milton H; Elkan, Charles

    2011-01-01

    With well over 1,000 specialized biological databases in use today, the task of automatically identifying novel, relevant data for such databases is increasingly important. In this paper, we describe practical machine learning approaches for identifying MEDLINE documents and Swiss-Prot/TrEMBL protein records, for incorporation into a specialized biological database of transport proteins named TCDB. We show that both learning approaches outperform rules created by hand by a human expert. As one of the first case studies involving two different approaches to updating a deployed database, both the methods compared and the results will be of interest to curators of many specialized databases.

  4. Database resources of the National Center for Biotechnology Information.

    PubMed

    2016-01-04

    The National Center for Biotechnology Information (NCBI) provides a large suite of online resources for biological information and data, including the GenBank(®) nucleic acid sequence database and the PubMed database of citations and abstracts for published life science journals. Additional NCBI resources focus on literature (PubMed Central (PMC), Bookshelf and PubReader), health (ClinVar, dbGaP, dbMHC, the Genetic Testing Registry, HIV-1/Human Protein Interaction Database and MedGen), genomes (BioProject, Assembly, Genome, BioSample, dbSNP, dbVar, Epigenomics, the Map Viewer, Nucleotide, Probe, RefSeq, Sequence Read Archive, the Taxonomy Browser and the Trace Archive), genes (Gene, Gene Expression Omnibus (GEO), HomoloGene, PopSet and UniGene), proteins (Protein, the Conserved Domain Database (CDD), COBALT, Conserved Domain Architecture Retrieval Tool (CDART), the Molecular Modeling Database (MMDB) and Protein Clusters) and chemicals (Biosystems and the PubChem suite of small molecule databases). The Entrez system provides search and retrieval operations for most of these databases. Augmenting many of the web applications are custom implementations of the BLAST program optimized to search specialized datasets. All of these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov. Published by Oxford University Press on behalf of Nucleic Acids Research 2015. This work is written by (a) US Government employee(s) and is in the public domain in the US.

  5. Database resources of the National Center for Biotechnology Information.

    PubMed

    2015-01-01

    The National Center for Biotechnology Information (NCBI) provides a large suite of online resources for biological information and data, including the GenBank(®) nucleic acid sequence database and the PubMed database of citations and abstracts for published life science journals. Additional NCBI resources focus on literature (Bookshelf, PubMed Central (PMC) and PubReader); medical genetics (ClinVar, dbMHC, the Genetic Testing Registry, HIV-1/Human Protein Interaction Database and MedGen); genes and genomics (BioProject, BioSample, dbSNP, dbVar, Epigenomics, Gene, Gene Expression Omnibus (GEO), Genome, HomoloGene, the Map Viewer, Nucleotide, PopSet, Probe, RefSeq, Sequence Read Archive, the Taxonomy Browser, Trace Archive and UniGene); and proteins and chemicals (Biosystems, COBALT, the Conserved Domain Database (CDD), the Conserved Domain Architecture Retrieval Tool (CDART), the Molecular Modeling Database (MMDB), Protein Clusters, Protein and the PubChem suite of small molecule databases). The Entrez system provides search and retrieval operations for many of these databases. Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of these resources can be accessed through the NCBI home page at http://www.ncbi.nlm.nih.gov. Published by Oxford University Press on behalf of Nucleic Acids Research 2014. This work is written by (a) US Government employee(s) and is in the public domain in the US.

  6. Histoplasma capsulatum proteome response to decreased iron availability

    PubMed Central

    Winters, Michael S; Spellman, Daniel S; Chan, Qilin; Gomez, Francisco J; Hernandez, Margarita; Catron, Brittany; Smulian, Alan G; Neubert, Thomas A; Deepe, George S

    2008-01-01

    Background A fundamental pathogenic feature of the fungus Histoplasma capsulatum is its ability to evade innate and adaptive immune defenses. Once ingested by macrophages the organism is faced with several hostile environmental conditions including iron limitation. H. capsulatum can establish a persistent state within the macrophage. A gap in knowledge exists because the identities and number of proteins regulated by the organism under host conditions has yet to be defined. Lack of such knowledge is an important problem because until these proteins are identified it is unlikely that they can be targeted as new and innovative treatment for histoplasmosis. Results To investigate the proteomic response by H. capsulatum to decreasing iron availability we have created H. capsulatum protein/genomic databases compatible with current mass spectrometric (MS) search engines. Databases were assembled from the H. capsulatum G217B strain genome using gene prediction programs and expressed sequence tag (EST) libraries. Searching these databases with MS data generated from two dimensional (2D) in-gel digestions of proteins resulted in over 50% more proteins identified compared to searching the publicly available fungal databases alone. Using 2D gel electrophoresis combined with statistical analysis we discovered 42 H. capsulatum proteins whose abundance was significantly modulated when iron concentrations were lowered. Altered proteins were identified by mass spectrometry and database searching to be involved in glycolysis, the tricarboxylic acid cycle, lysine metabolism, protein synthesis, and one protein sequence whose function was unknown. Conclusion We have created a bioinformatics platform for H. capsulatum and demonstrated the utility of a proteomic approach by identifying a shift in metabolism the organism utilizes to cope with the hostile conditions provided by the host. We have shown that enzyme transcripts regulated by other fungal pathogens in response to lowering iron availability are also regulated in H. capsulatum at the protein level. We also identified H. capsulatum proteins sensitive to iron level reductions which have yet to be connected to iron availability in other pathogens. These data also indicate the complexity of the response by H. capsulatum to nutritional deprivation. Finally, we demonstrate the importance of a strain specific gene/protein database for H. capsulatum proteomic analysis. PMID:19108728

  7. HBNG: Graph theory based visualization of hydrogen bond networks in protein structures.

    PubMed

    Tiwari, Abhishek; Tiwari, Vivek

    2007-07-09

    HBNG is a graph theory based tool for visualization of hydrogen bond network in 2D. Digraphs generated by HBNG facilitate visualization of cooperativity and anticooperativity chains and rings in protein structures. HBNG takes hydrogen bonds list files (output from HBAT, HBEXPLORE, HBPLUS and STRIDE) as input and generates a DOT language script and constructs digraphs using freeware AT and T Graphviz tool. HBNG is useful in the enumeration of favorable topologies of hydrogen bond networks in protein structures and determining the effect of cooperativity and anticooperativity on protein stability and folding. HBNG can be applied to protein structure comparison and in the identification of secondary structural regions in protein structures. Program is available from the authors for non-commercial purposes.

  8. A neotropical Miocene pollen database employing image-based search and semantic modeling1

    PubMed Central

    Han, Jing Ginger; Cao, Hongfei; Barb, Adrian; Punyasena, Surangi W.; Jaramillo, Carlos; Shyu, Chi-Ren

    2014-01-01

    • Premise of the study: Digital microscopic pollen images are being generated with increasing speed and volume, producing opportunities to develop new computational methods that increase the consistency and efficiency of pollen analysis and provide the palynological community a computational framework for information sharing and knowledge transfer. • Methods: Mathematical methods were used to assign trait semantics (abstract morphological representations) of the images of neotropical Miocene pollen and spores. Advanced database-indexing structures were built to compare and retrieve similar images based on their visual content. A Web-based system was developed to provide novel tools for automatic trait semantic annotation and image retrieval by trait semantics and visual content. • Results: Mathematical models that map visual features to trait semantics can be used to annotate images with morphology semantics and to search image databases with improved reliability and productivity. Images can also be searched by visual content, providing users with customized emphases on traits such as color, shape, and texture. • Discussion: Content- and semantic-based image searches provide a powerful computational platform for pollen and spore identification. The infrastructure outlined provides a framework for building a community-wide palynological resource, streamlining the process of manual identification, analysis, and species discovery. PMID:25202648

  9. ProfileGrids: a sequence alignment visualization paradigm that avoids the limitations of Sequence Logos.

    PubMed

    Roca, Alberto I

    2014-01-01

    The 2013 BioVis Contest provided an opportunity to evaluate different paradigms for visualizing protein multiple sequence alignments. Such data sets are becoming extremely large and thus taxing current visualization paradigms. Sequence Logos represent consensus sequences but have limitations for protein alignments. As an alternative, ProfileGrids are a new protein sequence alignment visualization paradigm that represents an alignment as a color-coded matrix of the residue frequency occurring at every homologous position in the aligned protein family. The JProfileGrid software program was used to analyze the BioVis contest data sets to generate figures for comparison with the Sequence Logo reference images. The ProfileGrid representation allows for the clear and effective analysis of protein multiple sequence alignments. This includes both a general overview of the conservation and diversity sequence patterns as well as the interactive ability to query the details of the protein residue distributions in the alignment. The JProfileGrid software is free and available from http://www.ProfileGrid.org.

  10. MOSAIC: an online database dedicated to the comparative genomics of bacterial strains at the intra-species level.

    PubMed

    Chiapello, Hélène; Gendrault, Annie; Caron, Christophe; Blum, Jérome; Petit, Marie-Agnès; El Karoui, Meriem

    2008-11-27

    The recent availability of complete sequences for numerous closely related bacterial genomes opens up new challenges in comparative genomics. Several methods have been developed to align complete genomes at the nucleotide level but their use and the biological interpretation of results are not straightforward. It is therefore necessary to develop new resources to access, analyze, and visualize genome comparisons. Here we present recent developments on MOSAIC, a generalist comparative bacterial genome database. This database provides the bacteriologist community with easy access to comparisons of complete bacterial genomes at the intra-species level. The strategy we developed for comparison allows us to define two types of regions in bacterial genomes: backbone segments (i.e., regions conserved in all compared strains) and variable segments (i.e., regions that are either specific to or variable in one of the aligned genomes). Definition of these segments at the nucleotide level allows precise comparative and evolutionary analyses of both coding and non-coding regions of bacterial genomes. Such work is easily performed using the MOSAIC Web interface, which allows browsing and graphical visualization of genome comparisons. The MOSAIC database now includes 493 pairwise comparisons and 35 multiple maximal comparisons representing 78 bacterial species. Genome conserved regions (backbones) and variable segments are presented in various formats for further analysis. A graphical interface allows visualization of aligned genomes and functional annotations. The MOSAIC database is available online at http://genome.jouy.inra.fr/mosaic.

  11. MIPS: analysis and annotation of proteins from whole genomes in 2005.

    PubMed

    Mewes, H W; Frishman, D; Mayer, K F X; Münsterkötter, M; Noubibou, O; Pagel, P; Rattei, T; Oesterheld, M; Ruepp, A; Stümpflen, V

    2006-01-01

    The Munich Information Center for Protein Sequences (MIPS at the GSF), Neuherberg, Germany, provides resources related to genome information. Manually curated databases for several reference organisms are maintained. Several of these databases are described elsewhere in this and other recent NAR database issues. In a complementary effort, a comprehensive set of >400 genomes automatically annotated with the PEDANT system are maintained. The main goal of our current work on creating and maintaining genome databases is to extend gene centered information to information on interactions within a generic comprehensive framework. We have concentrated our efforts along three lines (i) the development of suitable comprehensive data structures and database technology, communication and query tools to include a wide range of different types of information enabling the representation of complex information such as functional modules or networks Genome Research Environment System, (ii) the development of databases covering computable information such as the basic evolutionary relations among all genes, namely SIMAP, the sequence similarity matrix and the CABiNet network analysis framework and (iii) the compilation and manual annotation of information related to interactions such as protein-protein interactions or other types of relations (e.g. MPCDB, MPPI, CYGD). All databases described and the detailed descriptions of our projects can be accessed through the MIPS WWW server (http://mips.gsf.de).

  12. DaVIE: Database for the Visualization and Integration of Epigenetic data

    PubMed Central

    Fejes, Anthony P.; Jones, Meaghan J.; Kobor, Michael S.

    2014-01-01

    One of the challenges in the analysis of large data sets, particularly in a population-based setting, is the ability to perform comparisons across projects. This has to be done in such a way that the integrity of each individual project is maintained, while ensuring that the data are comparable across projects. These issues are beginning to be observed in human DNA methylation studies, as the Illumina 450k platform and next generation sequencing-based assays grow in popularity and decrease in price. This increase in productivity is enabling new insights into epigenetics, but also requires the development of pipelines and software capable of handling the large volumes of data. The specific problems inherent in creating a platform for the storage, comparison, integration, and visualization of DNA methylation data include data storage, algorithm efficiency and ability to interpret the results to derive biological meaning from them. Databases provide a ready-made solution to these issues, but as yet no tools exist that that leverage these advantages while providing an intuitive user interface for interpreting results in a genomic context. We have addressed this void by integrating a database to store DNA methylation data with a web interface to query and visualize the database and a set of libraries for more complex analysis. The resulting platform is called DaVIE: Database for the Visualization and Integration of Epigenetics data. DaVIE can use data culled from a variety of sources, and the web interface includes the ability to group samples by sub-type, compare multiple projects and visualize genomic features in relation to sites of interest. We have used DaVIE to identify patterns of DNA methylation in specific projects and across different projects, identify outlier samples, and cross-check differentially methylated CpG sites identified in specific projects across large numbers of samples. A demonstration server has been setup using GEO data at http://echelon.cmmt.ubc.ca/dbaccess/, with login “guest” and password “guest.” Groups may download and install their own version of the server following the instructions on the project's wiki. PMID:25278960

  13. Oceans of Data: In what ways can learning research inform the development of electronic interfaces and tools for use by students accessing large scientific databases?

    NASA Astrophysics Data System (ADS)

    Krumhansl, R. A.; Foster, J.; Peach, C. L.; Busey, A.; Baker, I.

    2012-12-01

    The practice of science and engineering is being revolutionized by the development of cyberinfrastructure for accessing near real-time and archived observatory data. Large cyberinfrastructure projects have the potential to transform the way science is taught in high school classrooms, making enormous quantities of scientific data available, giving students opportunities to analyze and draw conclusions from many kinds of complex data, and providing students with experiences using state-of-the-art resources and techniques for scientific investigations. However, online interfaces to scientific data are built by scientists for scientists, and their design can significantly impede broad use by novices. Knowledge relevant to the design of student interfaces to complex scientific databases is broadly dispersed among disciplines ranging from cognitive science to computer science and cartography and is not easily accessible to designers of educational interfaces. To inform efforts at bridging scientific cyberinfrastructure to the high school classroom, Education Development Center, Inc. and the Scripps Institution of Oceanography conducted an NSF-funded 2-year interdisciplinary review of literature and expert opinion pertinent to making interfaces to large scientific databases accessible to and usable by precollege learners and their teachers. Project findings are grounded in the fundamentals of Cognitive Load Theory, Visual Perception, Schemata formation and Universal Design for Learning. The Knowledge Status Report (KSR) presents cross-cutting and visualization-specific guidelines that highlight how interface design features can address/ ameliorate challenges novice high school students face as they navigate complex databases to find data, and construct and look for patterns in maps, graphs, animations and other data visualizations. The guidelines present ways to make scientific databases more broadly accessible by: 1) adjusting the cognitive load imposed by the user interface and visualizations so that it doesn't exceed the amount of information the learner can actively process; 2) drawing attention to important features and patterns; and 3) enabling customization of visualizations and tools to meet the needs of diverse learners.

  14. Visual Times of Maxima for Short Period Pulsating Stars II

    NASA Astrophysics Data System (ADS)

    Samolyk, G.

    2017-12-01

    This compilation contains 631 times of maxima of 8 short period pulsating stars (primarily RR Lyrae). These were reduced from a portion of the visual observations made from 1966 to 2014 that are included in the AAVSO International Database.

  15. A Web Tool for Generating High Quality Machine-readable Biological Pathways.

    PubMed

    Ramirez-Gaona, Miguel; Marcu, Ana; Pon, Allison; Grant, Jason; Wu, Anthony; Wishart, David S

    2017-02-08

    PathWhiz is a web server built to facilitate the creation of colorful, interactive, visually pleasing pathway diagrams that are rich in biological information. The pathways generated by this online application are machine-readable and fully compatible with essentially all web-browsers and computer operating systems. It uses a specially developed, web-enabled pathway drawing interface that permits the selection and placement of different combinations of pre-drawn biological or biochemical entities to depict reactions, interactions, transport processes and binding events. This palette of entities consists of chemical compounds, proteins, nucleic acids, cellular membranes, subcellular structures, tissues, and organs. All of the visual elements in it can be interactively adjusted and customized. Furthermore, because this tool is a web server, all pathways and pathway elements are publicly accessible. This kind of pathway "crowd sourcing" means that PathWhiz already contains a large and rapidly growing collection of previously drawn pathways and pathway elements. Here we describe a protocol for the quick and easy creation of new pathways and the alteration of existing pathways. To further facilitate pathway editing and creation, the tool contains replication and propagation functions. The replication function allows existing pathways to be used as templates to create or edit new pathways. The propagation function allows one to take an existing pathway and automatically propagate it across different species. Pathways created with this tool can be "re-styled" into different formats (KEGG-like or text-book like), colored with different backgrounds, exported to BioPAX, SBGN-ML, SBML, or PWML data exchange formats, and downloaded as PNG or SVG images. The pathways can easily be incorporated into online databases, integrated into presentations, posters or publications, or used exclusively for online visualization and exploration. This protocol has been successfully applied to generate over 2,000 pathway diagrams, which are now found in many online databases including HMDB, DrugBank, SMPDB, and ECMDB.

  16. Thermal Protection System Imagery Inspection Management System -TIIMS

    NASA Technical Reports Server (NTRS)

    Goza, Sharon; Melendrez, David L.; Henningan, Marsha; LaBasse, Daniel; Smith, Daniel J.

    2011-01-01

    TIIMS is used during the inspection phases of every mission to provide quick visual feedback, detailed inspection data, and determination to the mission management team. This system consists of a visual Web page interface, an SQL database, and a graphical image generator. These combine to allow a user to ascertain quickly the status of the inspection process, and current determination of any problem zones. The TIIMS system allows inspection engineers to enter their determinations into a database and to link pertinent images and video to those database entries. The database then assigns criteria to each zone and tile, and via query, sends the information to a graphical image generation program. Using the official TIPS database tile positions and sizes, the graphical image generation program creates images of the current status of the orbiter, coloring zones, and tiles based on a predefined key code. These images are then displayed on a Web page using customized JAVA scripts to display the appropriate zone of the orbiter based on the location of the user's cursor. The close-up graphic and database entry for that particular zone can then be seen by selecting the zone. This page contains links into the database to access the images used by the inspection engineer when they make the determination entered into the database. Status for the inspection zones changes as determinations are refined and shown by the appropriate color code.

  17. Influenza Research Database: an integrated bioinformatics resource for influenza research and surveillance

    PubMed Central

    Squires, R. Burke; Noronha, Jyothi; Hunt, Victoria; García‐Sastre, Adolfo; Macken, Catherine; Baumgarth, Nicole; Suarez, David; Pickett, Brett E.; Zhang, Yun; Larsen, Christopher N.; Ramsey, Alvin; Zhou, Liwei; Zaremba, Sam; Kumar, Sanjeev; Deitrich, Jon; Klem, Edward; Scheuermann, Richard H.

    2012-01-01

    Please cite this paper as: Squires et al. (2012) Influenza research database: an integrated bioinformatics resource for influenza research and surveillance. Influenza and Other Respiratory Viruses 6(6), 404–416. Background  The recent emergence of the 2009 pandemic influenza A/H1N1 virus has highlighted the value of free and open access to influenza virus genome sequence data integrated with information about other important virus characteristics. Design  The Influenza Research Database (IRD, http://www.fludb.org) is a free, open, publicly‐accessible resource funded by the U.S. National Institute of Allergy and Infectious Diseases through the Bioinformatics Resource Centers program. IRD provides a comprehensive, integrated database and analysis resource for influenza sequence, surveillance, and research data, including user‐friendly interfaces for data retrieval, visualization and comparative genomics analysis, together with personal log in‐protected ‘workbench’ spaces for saving data sets and analysis results. IRD integrates genomic, proteomic, immune epitope, and surveillance data from a variety of sources, including public databases, computational algorithms, external research groups, and the scientific literature. Results  To demonstrate the utility of the data and analysis tools available in IRD, two scientific use cases are presented. A comparison of hemagglutinin sequence conservation and epitope coverage information revealed highly conserved protein regions that can be recognized by the human adaptive immune system as possible targets for inducing cross‐protective immunity. Phylogenetic and geospatial analysis of sequences from wild bird surveillance samples revealed a possible evolutionary connection between influenza virus from Delaware Bay shorebirds and Alberta ducks. Conclusions  The IRD provides a wealth of integrated data and information about influenza virus to support research of the genetic determinants dictating virus pathogenicity, host range restriction and transmission, and to facilitate development of vaccines, diagnostics, and therapeutics. PMID:22260278

  18. Gene Unprediction with Spurio: A tool to identify spurious protein sequences.

    PubMed

    Höps, Wolfram; Jeffryes, Matt; Bateman, Alex

    2018-01-01

    We now have access to the sequences of tens of millions of proteins. These protein sequences are essential for modern molecular biology and computational biology. The vast majority of protein sequences are derived from gene prediction tools and have no experimental supporting evidence for their translation.  Despite the increasing accuracy of gene prediction tools there likely exists a large number of spurious protein predictions in the sequence databases.  We have developed the Spurio tool to help identify spurious protein predictions in prokaryotes.  Spurio searches the query protein sequence against a prokaryotic nucleotide database using tblastn and identifies homologous sequences. The tblastn matches are used to score the query sequence's likelihood of being a spurious protein prediction using a Gaussian process model. The most informative feature is the appearance of stop codons within the presumed translation of homologous DNA sequences. Benchmarking shows that the Spurio tool is able to distinguish spurious from true proteins. However, transposon proteins are prone to be predicted as spurious because of the frequency of degraded homologs found in the DNA sequence databases. Our initial experiments suggest that less than 1% of the proteins in the UniProtKB sequence database are likely to be spurious and that Spurio is able to identify over 60 times more spurious proteins than the AntiFam resource. The Spurio software and source code is available under an MIT license at the following URL: https://bitbucket.org/bateman-group/spurio.

  19. The Relationship between Perceived Computer Competence and the Employment Outcomes of Transition-Aged Youths with Visual Impairments

    ERIC Educational Resources Information Center

    Zhou, Li; Smith, Derrick W.; Parker, Amy T.; Griffin-Shirley, Nora

    2013-01-01

    Introduction: The study reported here explored the relationship between the self-perceived computer competence and employment outcomes of transition-aged youths with visual impairments. Methods: Data on 200 in-school youths and 190 out-of-school youths with a primary disability of visual impairment were retrieved from the database of the first…

  20. PICKLE 2.0: A human protein-protein interaction meta-database employing data integration via genetic information ontology

    PubMed Central

    Gioutlakis, Aris; Klapa, Maria I.

    2017-01-01

    It has been acknowledged that source databases recording experimentally supported human protein-protein interactions (PPIs) exhibit limited overlap. Thus, the reconstruction of a comprehensive PPI network requires appropriate integration of multiple heterogeneous primary datasets, presenting the PPIs at various genetic reference levels. Existing PPI meta-databases perform integration via normalization; namely, PPIs are merged after converted to a certain target level. Hence, the node set of the integrated network depends each time on the number and type of the combined datasets. Moreover, the irreversible a priori normalization process hinders the identification of normalization artifacts in the integrated network, which originate from the nonlinearity characterizing the genetic information flow. PICKLE (Protein InteraCtion KnowLedgebasE) 2.0 implements a new architecture for this recently introduced human PPI meta-database. Its main novel feature over the existing meta-databases is its approach to primary PPI dataset integration via genetic information ontology. Building upon the PICKLE principles of using the reviewed human complete proteome (RHCP) of UniProtKB/Swiss-Prot as the reference protein interactor set, and filtering out protein interactions with low probability of being direct based on the available evidence, PICKLE 2.0 first assembles the RHCP genetic information ontology network by connecting the corresponding genes, nucleotide sequences (mRNAs) and proteins (UniProt entries) and then integrates PPI datasets by superimposing them on the ontology network without any a priori transformations. Importantly, this process allows the resulting heterogeneous integrated network to be reversibly normalized to any level of genetic reference without loss of the original information, the latter being used for identification of normalization biases, and enables the appraisal of potential false positive interactions through PPI source database cross-checking. The PICKLE web-based interface (www.pickle.gr) allows for the simultaneous query of multiple entities and provides integrated human PPI networks at either the protein (UniProt) or the gene level, at three PPI filtering modes. PMID:29023571

  1. Morphinome Database - The database of proteins altered by morphine administration - An update.

    PubMed

    Bodzon-Kulakowska, Anna; Padrtova, Tereza; Drabik, Anna; Ner-Kluza, Joanna; Antolak, Anna; Kulakowski, Konrad; Suder, Piotr

    2018-04-13

    Morphine is considered a gold standard in pain treatment. Nevertheless, its use could be associated with severe side effects, including drug addiction. Thus, it is very important to understand the molecular mechanism of morphine action in order to develop new methods of pain therapy, or at least to attenuate the side effects of opioids usage. Proteomics allows for the indication of proteins involved in certain biological processes, but the number of items identified in a single study is usually overwhelming. Thus, researchers face the difficult problem of choosing the proteins which are really important for the investigated processes and worth further studies. Therefore, based on the 29 published articles, we created a database of proteins regulated by morphine administration - The Morphinome Database (addiction-proteomics.org). This web tool allows for indicating proteins that were identified during different proteomics studies. Moreover, the collection and organization of such a vast amount of data allows us to find the same proteins that were identified in various studies and to create their ranking, based on the frequency of their identification. STRING and KEGG databases indicated metabolic pathways which those molecules are involved in. This means that those molecular pathways seem to be strongly affected by morphine administration and could be important targets for further investigations. The data about proteins identified by different proteomics studies of molecular changes caused by morphine administration (29 published articles) were gathered in the Morphinome Database. Unification of those data allowed for the identification of proteins that were indicated several times by distinct proteomics studies, which means that they seem to be very well verified and important for the entire process. Those proteins might be now considered promising aims for more detailed studies of their role in the molecular mechanism of morphine action. Copyright © 2018. Published by Elsevier B.V.

  2. Data Mining.

    ERIC Educational Resources Information Center

    Benoit, Gerald

    2002-01-01

    Discusses data mining (DM) and knowledge discovery in databases (KDD), taking the view that KDD is the larger view of the entire process, with DM emphasizing the cleaning, warehousing, mining, and visualization of knowledge discovery in databases. Highlights include algorithms; users; the Internet; text mining; and information extraction.…

  3. The 2014 Nucleic Acids Research Database Issue and an updated NAR online Molecular Biology Database Collection.

    PubMed

    Fernández-Suárez, Xosé M; Rigden, Daniel J; Galperin, Michael Y

    2014-01-01

    The 2014 Nucleic Acids Research Database Issue includes descriptions of 58 new molecular biology databases and recent updates to 123 databases previously featured in NAR or other journals. For convenience, the issue is now divided into eight sections that reflect major subject categories. Among the highlights of this issue are six databases of the transcription factor binding sites in various organisms and updates on such popular databases as CAZy, Database of Genomic Variants (DGV), dbGaP, DrugBank, KEGG, miRBase, Pfam, Reactome, SEED, TCDB and UniProt. There is a strong block of structural databases, which includes, among others, the new RNA Bricks database, updates on PDBe, PDBsum, ArchDB, Gene3D, ModBase, Nucleic Acid Database and the recently revived iPfam database. An update on the NCBI's MMDB describes VAST+, an improved tool for protein structure comparison. Two articles highlight the development of the Structural Classification of Proteins (SCOP) database: one describes SCOPe, which automates assignment of new structures to the existing SCOP hierarchy; the other one describes the first version of SCOP2, with its more flexible approach to classifying protein structures. This issue also includes a collection of articles on bacterial taxonomy and metagenomics, which includes updates on the List of Prokaryotic Names with Standing in Nomenclature (LPSN), Ribosomal Database Project (RDP), the Silva/LTP project and several new metagenomics resources. The NAR online Molecular Biology Database Collection, http://www.oxfordjournals.org/nar/database/c/, has been expanded to 1552 databases. The entire Database Issue is freely available online on the Nucleic Acids Research website (http://nar.oxfordjournals.org/).

  4. AIM: a comprehensive Arabidopsis interactome module database and related interologs in plants.

    PubMed

    Wang, Yi; Thilmony, Roger; Zhao, Yunjun; Chen, Guoping; Gu, Yong Q

    2014-01-01

    Systems biology analysis of protein modules is important for understanding the functional relationships between proteins in the interactome. Here, we present a comprehensive database named AIM for Arabidopsis (Arabidopsis thaliana) interactome modules. The database contains almost 250,000 modules that were generated using multiple analysis methods and integration of microarray expression data. All the modules in AIM are well annotated using multiple gene function knowledge databases. AIM provides a user-friendly interface for different types of searches and offers a powerful graphical viewer for displaying module networks linked to the enrichment annotation terms. Both interactive Venn diagram and power graph viewer are integrated into the database for easy comparison of modules. In addition, predicted interologs from other plant species (homologous proteins from different species that share a conserved interaction module) are available for each Arabidopsis module. AIM is a powerful systems biology platform for obtaining valuable insights into the function of proteins in Arabidopsis and other plants using the modules of the Arabidopsis interactome. Database URL:http://probes.pw.usda.gov/AIM Published by Oxford University Press 2014. This work is written by US Government employees and is in the public domain in the US.

  5. Rhizoma Dioscoreae extract protects against alveolar bone loss by regulating the cell cycle: A predictive study based on the protein‑protein interaction network.

    PubMed

    Zhang, Zhi-Guo; Song, Chang-Heng; Zhang, Fang-Zhen; Chen, Yan-Jing; Xiang, Li-Hua; Xiao, Gary Guishan; Ju, Da-Hong

    2016-06-01

    Rhizoma Dioscoreae extract (RDE) exhibits a protective effect on alveolar bone loss in ovariectomized (OVX) rats. The aim of this study was to predict the pathways or targets that are regulated by RDE, by re‑assessing our previously reported data and conducting a protein‑protein interaction (PPI) network analysis. In total, 383 differentially expressed genes (≥3‑fold) between alveolar bone samples from the RDE and OVX group rats were identified, and a PPI network was constructed based on these genes. Furthermore, four molecular clusters (A‑D) in the PPI network with the smallest P‑values were detected by molecular complex detection (MCODE) algorithm. Using Database for Annotation, Visualization and Integrated Discovery (DAVID) and Ingenuity Pathway Analysis (IPA) tools, two molecular clusters (A and B) were enriched for biological process in Gene Ontology (GO). Only cluster A was associated with biological pathways in the IPA database. GO and pathway analysis results showed that cluster A, associated with cell cycle regulation, was the most important molecular cluster in the PPI network. In addition, cyclin‑dependent kinase 1 (CDK1) may be a key molecule achieving the cell‑cycle‑regulatory function of cluster A. From the PPI network analysis, it was predicted that delayed cell cycle progression in excessive alveolar bone remodeling via downregulation of CDK1 may be another mechanism underling the anti‑osteopenic effect of RDE on alveolar bone.

  6. Identification of the active compounds and significant pathways of yinchenhao decoction based on network pharmacology

    PubMed Central

    Huang, Jihan; Cheung, Fan; Tan, Hor-Yue; Hong, Ming; Wang, Ning; Yang, Juan; Feng, Yibin; Zheng, Qingshan

    2017-01-01

    Yinchenhao decoction (YCHD) is a traditional Chinese medicine formulation, which has been widely used for the treatment of jaundice for 2,000 years. Currently, YCHD is used to treat various liver disorders and metabolic diseases, however its chemical/pharmacologic profiles remain to be elucidated. The present study identified the active compounds and significant pathways of YCHD based on network pharmacology. All of the chemical ingredients of YCHD were retrieved from the Traditional Chinese Medicine Systems Pharmacology database. Absorption, distribution, metabolism and excretion screening with oral bioavailability (OB) screening, drug-likeness (DL) and intestinal epithelial permeability (Caco-2) evaluation were applied to discover the bioactive compounds in YCHD. Following this, target prediction, pathway identification and network construction were employed to clarify the mechanism of action of YCHD. Following OB screening, and evaluation of DL and Caco-2, 34 compounds in YCHD were identified as potential active ingredients, of which 30 compounds were associated with 217 protein targets. A total of 31 significant pathways were obtained by performing enrichment analyses of 217 proteins using the JEPETTO 3.x plugin, and 16 classes of gene-associated diseases were revealed by performing enrichment analyses using Database for Annotation, Visualization and Integrated Discovery v6.7. The present study identified potential active compounds and significant pathways in YCHD. In addition, the mechanism of action of YCHD in the treatment of various diseases through multiple pathways was clarified. PMID:28791364

  7. ELISA-BASE: An Integrated Bioinformatics Tool for Analyzing and Tracking ELISA Microarray Data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    White, Amanda M.; Collett, James L.; Seurynck-Servoss, Shannon L.

    ELISA-BASE is an open-source database for capturing, organizing and analyzing protein enzyme-linked immunosorbent assay (ELISA) microarray data. ELISA-BASE is an extension of the BioArray Soft-ware Environment (BASE) database system, which was developed for DNA microarrays. In order to make BASE suitable for protein microarray experiments, we developed several plugins for importing and analyzing quantitative ELISA microarray data. Most notably, our Protein Microarray Analysis Tool (ProMAT) for processing quantita-tive ELISA data is now available as a plugin to the database.

  8. Evaluation of Software for Introducing Protein Structure: Visualization and Simulation

    ERIC Educational Resources Information Center

    White, Brian; Kahriman, Azmin; Luberice, Lois; Idleh, Farhia

    2010-01-01

    Communicating an understanding of the forces and factors that determine a protein's structure is an important goal of many biology and biochemistry courses at a variety of levels. Many educators use computer software that allows visualization of these complex molecules for this purpose. Although visualization is in wide use and has been associated…

  9. LONI visualization environment.

    PubMed

    Dinov, Ivo D; Valentino, Daniel; Shin, Bae Cheol; Konstantinidis, Fotios; Hu, Guogang; MacKenzie-Graham, Allan; Lee, Erh-Fang; Shattuck, David; Ma, Jeff; Schwartz, Craig; Toga, Arthur W

    2006-06-01

    Over the past decade, the use of informatics to solve complex neuroscientific problems has increased dramatically. Many of these research endeavors involve examining large amounts of imaging, behavioral, genetic, neurobiological, and neuropsychiatric data. Superimposing, processing, visualizing, or interpreting such a complex cohort of datasets frequently becomes a challenge. We developed a new software environment that allows investigators to integrate multimodal imaging data, hierarchical brain ontology systems, on-line genetic and phylogenic databases, and 3D virtual data reconstruction models. The Laboratory of Neuro Imaging visualization environment (LONI Viz) consists of the following components: a sectional viewer for imaging data, an interactive 3D display for surface and volume rendering of imaging data, a brain ontology viewer, and an external database query system. The synchronization of all components according to stereotaxic coordinates, region name, hierarchical ontology, and genetic labels is achieved via a comprehensive BrainMapper functionality, which directly maps between position, structure name, database, and functional connectivity information. This environment is freely available, portable, and extensible, and may prove very useful for neurobiologists, neurogenetisists, brain mappers, and for other clinical, pedagogical, and research endeavors.

  10. An integrated metagenome and -proteome analysis of the microbial community residing in a biogas production plant.

    PubMed

    Ortseifen, Vera; Stolze, Yvonne; Maus, Irena; Sczyrba, Alexander; Bremges, Andreas; Albaum, Stefan P; Jaenicke, Sebastian; Fracowiak, Jochen; Pühler, Alfred; Schlüter, Andreas

    2016-08-10

    To study the metaproteome of a biogas-producing microbial community, fermentation samples were taken from an agricultural biogas plant for microbial cell and protein extraction and corresponding metagenome analyses. Based on metagenome sequence data, taxonomic community profiling was performed to elucidate the composition of bacterial and archaeal sub-communities. The community's cytosolic metaproteome was represented in a 2D-PAGE approach. Metaproteome databases for protein identification were compiled based on the assembled metagenome sequence dataset for the biogas plant analyzed and non-corresponding biogas metagenomes. Protein identification results revealed that the corresponding biogas protein database facilitated the highest identification rate followed by other biogas-specific databases, whereas common public databases yielded insufficient identification rates. Proteins of the biogas microbiome identified as highly abundant were assigned to the pathways involved in methanogenesis, transport and carbon metabolism. Moreover, the integrated metagenome/-proteome approach enabled the examination of genetic-context information for genes encoding identified proteins by studying neighboring genes on the corresponding contig. Exemplarily, this approach led to the identification of a Methanoculleus sp. contig encoding 16 methanogenesis-related gene products, three of which were also detected as abundant proteins within the community's metaproteome. Thus, metagenome contigs provide additional information on the genetic environment of identified abundant proteins. Copyright © 2016 Elsevier B.V. All rights reserved.

  11. The 24th annual Nucleic Acids Research database issue: a look back and upcoming changes

    PubMed Central

    Rigden, Daniel J

    2017-01-01

    Abstract This year's Database Issue of Nucleic Acids Research contains 152 papers that include descriptions of 54 new databases and update papers on 98 databases, of which 16 have not been previously featured in NAR. As always, these databases cover a broad range of molecular biology subjects, including genome structure, gene expression and its regulation, proteins, protein domains, and protein–protein interactions. Following the recent trend, an increasing number of new and established databases deal with the issues of human health, from cancer-causing mutations to drugs and drug targets. In accordance with this trend, three recently compiled databases that have been selected by NAR reviewers and editors as ‘breakthrough’ contributions, denovo-db, the Monarch Initiative, and Open Targets, cover human de novo gene variants, disease-related phenotypes in model organisms, and a bioinformatics platform for therapeutic target identification and validation, respectively. We expect these databases to attract the attention of numerous researchers working in various areas of genetics and genomics. Looking back at the past 12 years, we present here the ‘golden set’ of databases that have consistently served as authoritative, comprehensive, and convenient data resources widely used by the entire community and offer some lessons on what makes a successful database. The Database Issue is freely available online at the https://academic.oup.com/nar web site. An updated version of the NAR Molecular Biology Database Collection is available at http://www.oxfordjournals.org/nar/database/a/. PMID:28053160

  12. PASS2: an automated database of protein alignments organised as structural superfamilies.

    PubMed

    Bhaduri, Anirban; Pugalenthi, Ganesan; Sowdhamini, Ramanathan

    2004-04-02

    The functional selection and three-dimensional structural constraints of proteins in nature often relates to the retention of significant sequence similarity between proteins of similar fold and function despite poor sequence identity. Organization of structure-based sequence alignments for distantly related proteins, provides a map of the conserved and critical regions of the protein universe that is useful for the analysis of folding principles, for the evolutionary unification of protein families and for maximizing the information return from experimental structure determination. The Protein Alignment organised as Structural Superfamily (PASS2) database represents continuously updated, structural alignments for evolutionary related, sequentially distant proteins. An automated and updated version of PASS2 is, in direct correspondence with SCOP 1.63, consisting of sequences having identity below 40% among themselves. Protein domains have been grouped into 628 multi-member superfamilies and 566 single member superfamilies. Structure-based sequence alignments for the superfamilies have been obtained using COMPARER, while initial equivalencies have been derived from a preliminary superposition using LSQMAN or STAMP 4.0. The final sequence alignments have been annotated for structural features using JOY4.0. The database is supplemented with sequence relatives belonging to different genomes, conserved spatially interacting and structural motifs, probabilistic hidden markov models of superfamilies based on the alignments and useful links to other databases. Probabilistic models and sensitive position specific profiles obtained from reliable superfamily alignments aid annotation of remote homologues and are useful tools in structural and functional genomics. PASS2 presents the phylogeny of its members both based on sequence and structural dissimilarities. Clustering of members allows us to understand diversification of the family members. The search engine has been improved for simpler browsing of the database. The database resolves alignments among the structural domains consisting of evolutionarily diverged set of sequences. Availability of reliable sequence alignments of distantly related proteins despite poor sequence identity and single-member superfamilies permit better sampling of structures in libraries for fold recognition of new sequences and for the understanding of protein structure-function relationships of individual superfamilies. PASS2 is accessible at http://www.ncbs.res.in/~faculty/mini/campass/pass2.html

  13. Cazymes Analysis Toolkit (CAT): Webservice for searching and analyzing carbohydrateactive enzymes in a newly sequenced organism using CAZy database

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Karpinets, Tatiana V; Park, Byung; Syed, Mustafa H

    2010-01-01

    The Carbohydrate-Active Enzyme (CAZy) database provides a rich set of manually annotated enzymes that degrade, modify, or create glycosidic bonds. Despite rich and invaluable information stored in the database, software tools utilizing this information for annotation of newly sequenced genomes by CAZy families are limited. We have employed two annotation approaches to fill the gap between manually curated high-quality protein sequences collected in the CAZy database and the growing number of other protein sequences produced by genome or metagenome sequencing projects. The first approach is based on a similarity search against the entire non-redundant sequences of the CAZy database. Themore » second approach performs annotation using links or correspondences between the CAZy families and protein family domains. The links were discovered using the association rule learning algorithm applied to sequences from the CAZy database. The approaches complement each other and in combination achieved high specificity and sensitivity when cross-evaluated with the manually curated genomes of Clostridium thermocellum ATCC 27405 and Saccharophagus degradans 2-40. The capability of the proposed framework to predict the function of unknown protein domains (DUF) and of hypothetical proteins in the genome of Neurospora crassa is demonstrated. The framework is implemented as a Web service, the CAZymes Analysis Toolkit (CAT), and is available at http://cricket.ornl.gov/cgi-bin/cat.cgi.« less

  14. CAZymes Analysis Toolkit (CAT): web service for searching and analyzing carbohydrate-active enzymes in a newly sequenced organism using CAZy database.

    PubMed

    Park, Byung H; Karpinets, Tatiana V; Syed, Mustafa H; Leuze, Michael R; Uberbacher, Edward C

    2010-12-01

    The Carbohydrate-Active Enzyme (CAZy) database provides a rich set of manually annotated enzymes that degrade, modify, or create glycosidic bonds. Despite rich and invaluable information stored in the database, software tools utilizing this information for annotation of newly sequenced genomes by CAZy families are limited. We have employed two annotation approaches to fill the gap between manually curated high-quality protein sequences collected in the CAZy database and the growing number of other protein sequences produced by genome or metagenome sequencing projects. The first approach is based on a similarity search against the entire nonredundant sequences of the CAZy database. The second approach performs annotation using links or correspondences between the CAZy families and protein family domains. The links were discovered using the association rule learning algorithm applied to sequences from the CAZy database. The approaches complement each other and in combination achieved high specificity and sensitivity when cross-evaluated with the manually curated genomes of Clostridium thermocellum ATCC 27405 and Saccharophagus degradans 2-40. The capability of the proposed framework to predict the function of unknown protein domains and of hypothetical proteins in the genome of Neurospora crassa is demonstrated. The framework is implemented as a Web service, the CAZymes Analysis Toolkit, and is available at http://cricket.ornl.gov/cgi-bin/cat.cgi.

  15. BiKEGG: a COBRA toolbox extension for bridging the BiGG and KEGG databases.

    PubMed

    Jamialahmadi, Oveis; Motamedian, Ehsan; Hashemi-Najafabadi, Sameereh

    2016-10-18

    Development of an interface tool between the Biochemical, Genetic and Genomic (BiGG) and KEGG databases is necessary for simultaneous access to the features of both databases. For this purpose, we present the BiKEGG toolbox, an open source COBRA toolbox extension providing a set of functions to infer the reaction correspondences between the KEGG reaction identifiers and those in the BiGG knowledgebase using a combination of manual verification and computational methods. Inferred reaction correspondences using this approach are supported by evidence from the literature, which provides a higher number of reconciled reactions between these two databases compared to the MetaNetX and MetRxn databases. This set of equivalent reactions is then used to automatically superimpose the predicted fluxes using COBRA methods on classical KEGG pathway maps or to create a customized metabolic map based on the KEGG global metabolic pathway, and to find the corresponding reactions in BiGG based on the genome annotation of an organism in the KEGG database. Customized metabolic maps can be created for a set of pathways of interest, for the whole KEGG global map or exclusively for all pathways for which there exists at least one flux carrying reaction. This flexibility in visualization enables BiKEGG to indicate reaction directionality as well as to visualize the reaction fluxes for different static or dynamic conditions in an animated manner. BiKEGG allows the user to export (1) the output visualized metabolic maps to various standard image formats or save them as a video or animated GIF file, and (2) the equivalent reactions for an organism as an Excel spreadsheet.

  16. PaperBLAST: Text Mining Papers for Information about Homologs.

    PubMed

    Price, Morgan N; Arkin, Adam P

    2017-01-01

    Large-scale genome sequencing has identified millions of protein-coding genes whose function is unknown. Many of these proteins are similar to characterized proteins from other organisms, but much of this information is missing from annotation databases and is hidden in the scientific literature. To make this information accessible, PaperBLAST uses EuropePMC to search the full text of scientific articles for references to genes. PaperBLAST also takes advantage of curated resources (Swiss-Prot, GeneRIF, and EcoCyc) that link protein sequences to scientific articles. PaperBLAST's database includes over 700,000 scientific articles that mention over 400,000 different proteins. Given a protein of interest, PaperBLAST quickly finds similar proteins that are discussed in the literature and presents snippets of text from relevant articles or from the curators. PaperBLAST is available at http://papers.genomics.lbl.gov/. IMPORTANCE With the recent explosion of genome sequencing data, there are now millions of uncharacterized proteins. If a scientist becomes interested in one of these proteins, it can be very difficult to find information as to its likely function. Often a protein whose sequence is similar, and which is likely to have a similar function, has been studied already, but this information is not available in any database. To help find articles about similar proteins, PaperBLAST searches the full text of scientific articles for protein identifiers or gene identifiers, and it links these articles to protein sequences. Then, given a protein of interest, it can quickly find similar proteins in its database by using standard software (BLAST), and it can show snippets of text from relevant papers. We hope that PaperBLAST will make it easier for biologists to predict proteins' functions.

  17. When a domain isn’t a domain, and why it’s important to properly filter proteins in databases

    PubMed Central

    Towse, Clare-Louise; Daggett, Valerie

    2013-01-01

    Summary Membership in a protein domain database does not a domain make; a feature we realized when generating a consensus view of protein fold space with our Consensus Domain Dictionary (CDD). This dictionary was used to select representative structures for characterization of the protein dynameome: the Dynameomics initiative. Through this endeavor we rejected a surprising 40% of the 1695 folds in the CDD as being non-autonomous folding units. Although some of this was due to the challenges of grouping similar fold topologies, the dissonance between the cataloguing and structural qualification of protein domains remains surprising. Another potential factor is previously overlooked intrinsic disorder; predicted estimates suggest 40% of proteins to have either local or global disorder. One thing is clear, filtering a structural database and ensuring a consistent definition for protein domains is crucial, and caution is prescribed when generalizations of globular domains are drawn from unfiltered protein domain datasets. PMID:23108912

  18. GARNET--gene set analysis with exploration of annotation relations.

    PubMed

    Rho, Kyoohyoung; Kim, Bumjin; Jang, Youngjun; Lee, Sanghyun; Bae, Taejeong; Seo, Jihae; Seo, Chaehwa; Lee, Jihyun; Kang, Hyunjung; Yu, Ungsik; Kim, Sunghoon; Lee, Sanghyuk; Kim, Wan Kyu

    2011-02-15

    Gene set analysis is a powerful method of deducing biological meaning for an a priori defined set of genes. Numerous tools have been developed to test statistical enrichment or depletion in specific pathways or gene ontology (GO) terms. Major difficulties towards biological interpretation are integrating diverse types of annotation categories and exploring the relationships between annotation terms of similar information. GARNET (Gene Annotation Relationship NEtwork Tools) is an integrative platform for gene set analysis with many novel features. It includes tools for retrieval of genes from annotation database, statistical analysis & visualization of annotation relationships, and managing gene sets. In an effort to allow access to a full spectrum of amassed biological knowledge, we have integrated a variety of annotation data that include the GO, domain, disease, drug, chromosomal location, and custom-defined annotations. Diverse types of molecular networks (pathways, transcription and microRNA regulations, protein-protein interaction) are also included. The pair-wise relationship between annotation gene sets was calculated using kappa statistics. GARNET consists of three modules--gene set manager, gene set analysis and gene set retrieval, which are tightly integrated to provide virtually automatic analysis for gene sets. A dedicated viewer for annotation network has been developed to facilitate exploration of the related annotations. GARNET (gene annotation relationship network tools) is an integrative platform for diverse types of gene set analysis, where complex relationships among gene annotations can be easily explored with an intuitive network visualization tool (http://garnet.isysbio.org/ or http://ercsb.ewha.ac.kr/garnet/).

  19. GeNNet: an integrated platform for unifying scientific workflows and graph databases for transcriptome data analysis

    PubMed Central

    Gadelha, Luiz; Ribeiro-Alves, Marcelo; Porto, Fábio

    2017-01-01

    There are many steps in analyzing transcriptome data, from the acquisition of raw data to the selection of a subset of representative genes that explain a scientific hypothesis. The data produced can be represented as networks of interactions among genes and these may additionally be integrated with other biological databases, such as Protein-Protein Interactions, transcription factors and gene annotation. However, the results of these analyses remain fragmented, imposing difficulties, either for posterior inspection of results, or for meta-analysis by the incorporation of new related data. Integrating databases and tools into scientific workflows, orchestrating their execution, and managing the resulting data and its respective metadata are challenging tasks. Additionally, a great amount of effort is equally required to run in-silico experiments to structure and compose the information as needed for analysis. Different programs may need to be applied and different files are produced during the experiment cycle. In this context, the availability of a platform supporting experiment execution is paramount. We present GeNNet, an integrated transcriptome analysis platform that unifies scientific workflows with graph databases for selecting relevant genes according to the evaluated biological systems. It includes GeNNet-Wf, a scientific workflow that pre-loads biological data, pre-processes raw microarray data and conducts a series of analyses including normalization, differential expression inference, clusterization and gene set enrichment analysis. A user-friendly web interface, GeNNet-Web, allows for setting parameters, executing, and visualizing the results of GeNNet-Wf executions. To demonstrate the features of GeNNet, we performed case studies with data retrieved from GEO, particularly using a single-factor experiment in different analysis scenarios. As a result, we obtained differentially expressed genes for which biological functions were analyzed. The results are integrated into GeNNet-DB, a database about genes, clusters, experiments and their properties and relationships. The resulting graph database is explored with queries that demonstrate the expressiveness of this data model for reasoning about gene interaction networks. GeNNet is the first platform to integrate the analytical process of transcriptome data with graph databases. It provides a comprehensive set of tools that would otherwise be challenging for non-expert users to install and use. Developers can add new functionality to components of GeNNet. The derived data allows for testing previous hypotheses about an experiment and exploring new ones through the interactive graph database environment. It enables the analysis of different data on humans, rhesus, mice and rat coming from Affymetrix platforms. GeNNet is available as an open source platform at https://github.com/raquele/GeNNet and can be retrieved as a software container with the command docker pull quelopes/gennet. PMID:28695067

  20. GeNNet: an integrated platform for unifying scientific workflows and graph databases for transcriptome data analysis.

    PubMed

    Costa, Raquel L; Gadelha, Luiz; Ribeiro-Alves, Marcelo; Porto, Fábio

    2017-01-01

    There are many steps in analyzing transcriptome data, from the acquisition of raw data to the selection of a subset of representative genes that explain a scientific hypothesis. The data produced can be represented as networks of interactions among genes and these may additionally be integrated with other biological databases, such as Protein-Protein Interactions, transcription factors and gene annotation. However, the results of these analyses remain fragmented, imposing difficulties, either for posterior inspection of results, or for meta-analysis by the incorporation of new related data. Integrating databases and tools into scientific workflows, orchestrating their execution, and managing the resulting data and its respective metadata are challenging tasks. Additionally, a great amount of effort is equally required to run in-silico experiments to structure and compose the information as needed for analysis. Different programs may need to be applied and different files are produced during the experiment cycle. In this context, the availability of a platform supporting experiment execution is paramount. We present GeNNet, an integrated transcriptome analysis platform that unifies scientific workflows with graph databases for selecting relevant genes according to the evaluated biological systems. It includes GeNNet-Wf, a scientific workflow that pre-loads biological data, pre-processes raw microarray data and conducts a series of analyses including normalization, differential expression inference, clusterization and gene set enrichment analysis. A user-friendly web interface, GeNNet-Web, allows for setting parameters, executing, and visualizing the results of GeNNet-Wf executions. To demonstrate the features of GeNNet, we performed case studies with data retrieved from GEO, particularly using a single-factor experiment in different analysis scenarios. As a result, we obtained differentially expressed genes for which biological functions were analyzed. The results are integrated into GeNNet-DB, a database about genes, clusters, experiments and their properties and relationships. The resulting graph database is explored with queries that demonstrate the expressiveness of this data model for reasoning about gene interaction networks. GeNNet is the first platform to integrate the analytical process of transcriptome data with graph databases. It provides a comprehensive set of tools that would otherwise be challenging for non-expert users to install and use. Developers can add new functionality to components of GeNNet. The derived data allows for testing previous hypotheses about an experiment and exploring new ones through the interactive graph database environment. It enables the analysis of different data on humans, rhesus, mice and rat coming from Affymetrix platforms. GeNNet is available as an open source platform at https://github.com/raquele/GeNNet and can be retrieved as a software container with the command docker pull quelopes/gennet.

Top