Science.gov

Sample records for kegg pathways database

  1. EGENES: Transcriptome-Based Plant Database of Genes with Metabolic Pathway Information and Expressed Sequence Tag Indices in KEGG

    PubMed Central

    Masoudi-Nejad, Ali; Goto, Susumu; Jauregui, Ruy; Ito, Masumi; Kawashima, Shuichi; Moriya, Yuki; Endo, Takashi R.; Kanehisa, Minoru

    2007-01-01

    EGENES is a knowledge-based database for efficient analysis of plant expressed sequence tags (ESTs) that was recently added to the KEGG suite of databases. It links plant genomic information with higher order functional information in a single database. It also provides gene indices for each genome. The genomic information in EGENES is a collection of EST contigs constructed from assembly of ESTs. Due to the extremely large genomes of plant species, the bulk collection of data such as ESTs is a quick way to capture a complete repertoire of genes expressed in an organism. Using ESTs for reconstructing metabolic pathways is a new expansion in KEGG and provides researchers with a new resource for species in which only EST sequences are available. Functional annotation in EGENES is a process of linking a set of genes/transcripts in each genome with a network of interacting molecules in the cell. EGENES is a multispecies, integrated resource consisting of genomic, chemical, and network information containing a complete set of building blocks (genes and molecules) and wiring diagrams (biological pathways) to represent cellular functions. Using EGENES, genome-based pathway annotation and EST-based annotation can now be compared and mutually validated. The ultimate goals of EGENES will be to: bring new plant species into KEGG by clustering and annotating ESTs; abstract knowledge and principles from large-scale plant EST data; and improve computational prediction of systems of higher complexity. EGENES will be updated at least once a year. EGENES is publicly available and is accessible by the following link or by KEGG's navigation system (http://www.genome.jp/kegg-bin/create_kegg_menu?category=plants_egenes). PMID:17468225

  2. Drug-Path: a database for drug-induced pathways

    PubMed Central

    Zeng, Hui; Cui, Qinghua

    2015-01-01

    Some databases for drug-associated pathways have been built and are publicly available. However, the pathways curated in most of these databases are drug-action or drug-metabolism pathways. In recent years, high-throughput technologies such as microarray and RNA-sequencing have produced many drug-induced gene expression profiles. Interestingly, drug-induced gene expression profiles frequently show distinct patterns, indicating that drugs normally induce the activation or repression of distinct pathways. Therefore, these pathways contribute to the study of drug mechanisms and drug repurposing. Here, we present Drug-Path, a database of drug-induced pathways, which was generated by KEGG pathway enrichment analysis of drug-induced upregulated and downregulated genes based on drug-induced gene expression datasets in Connectivity Map. Drug-Path provides user-friendly interfaces to retrieve, visualize and download the drug-induced pathway data in the database. In addition, the genes deregulated by a given drug are highlighted in the pathways. All data were organized using SQLite. The web site was implemented using Django, a Python web framework. Finally, we believe that this database will be useful for related research. Database URL: http://www.cuilab.cn/drugpath PMID:26130661
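
The abstract above does not say which statistic Drug-Path used for its KEGG pathway enrichment analysis. A common choice is the one-sided hypergeometric test, sketched here as an assumption rather than as the authors' method. The function names, the gene identifiers and the pathway dictionary layout are all hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_in_pathway, n_selected, n_overlap):
    """Upper-tail hypergeometric p-value P(X >= n_overlap).

    n_background: genes in the background set
    n_in_pathway: background genes annotated to the pathway
    n_selected:   genes in the query list (e.g. upregulated genes)
    n_overlap:    query genes that are also in the pathway
    """
    max_k = min(n_in_pathway, n_selected)
    total = comb(n_background, n_selected)
    tail = sum(
        comb(n_in_pathway, k) * comb(n_background - n_in_pathway, n_selected - k)
        for k in range(n_overlap, max_k + 1)
    )
    return tail / total


def enrich(gene_list, pathways, background):
    """Rank pathways by enrichment p-value for one gene list.

    pathways: dict mapping a pathway name to an iterable of gene IDs
    (a hypothetical layout, not a KEGG file format).
    """
    background = set(background)
    genes = set(gene_list) & background
    results = []
    for name, members in pathways.items():
        members = set(members) & background
        overlap = len(genes & members)
        p = hypergeom_enrichment_p(
            len(background), len(members), len(genes), overlap
        )
        results.append((name, overlap, p))
    # Smallest p-value first
    return sorted(results, key=lambda r: r[2])
```

In a setting like the one described, `enrich` would be called once with the upregulated genes and once with the downregulated genes for each drug, and the resulting p-values would normally be corrected for multiple testing before pathways are reported.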

  3. Drug-Path: a database for drug-induced pathways.

    PubMed

    Zeng, Hui; Qiu, Chengxiang; Cui, Qinghua

    2015-01-01

    Some databases for drug-associated pathways have been built and are publicly available. However, the pathways curated in most of these databases are drug-action or drug-metabolism pathways. In recent years, high-throughput technologies such as microarray and RNA-sequencing have produced many drug-induced gene expression profiles. Interestingly, drug-induced gene expression profiles frequently show distinct patterns, indicating that drugs normally induce the activation or repression of distinct pathways. Therefore, these pathways contribute to the study of drug mechanisms and drug repurposing. Here, we present Drug-Path, a database of drug-induced pathways, which was generated by KEGG pathway enrichment analysis of drug-induced upregulated and downregulated genes based on drug-induced gene expression datasets in Connectivity Map. Drug-Path provides user-friendly interfaces to retrieve, visualize and download the drug-induced pathway data in the database. In addition, the genes deregulated by a given drug are highlighted in the pathways. All data were organized using SQLite. The web site was implemented using Django, a Python web framework. Finally, we believe that this database will be useful for related research. PMID:26130661

  4. Pathway Interaction Database (PID)

    Cancer.gov

    The National Cancer Institute (NCI) in collaboration with Nature Publishing Group has established the Pathway Interaction Database (PID) in order to provide a highly structured, curated collection of information about known biomolecular interactions and key cellular processes assembled into signaling pathways.

  5. MPW: the metabolic pathways database.

    SciTech Connect

    Selkov, E., Jr.; Grechkin, Y.; Mikhailova, N.; Selkov, E.; Mathematics and Computer Science; Russian Academy of Sciences

    1998-01-01

    The Metabolic Pathways Database (MPW) (www.biobase.com/emphome.html/homepage.html.pags/pathways.html), a derivative of EMP (www.biobase.com/EMP), plays a fundamental role in the technology of metabolic reconstructions from sequenced genomes under the PUMA (www.mcs.anl.gov/home/compbio/PUMA/Production/ReconstructedMetabolism/reconstruction.html), WIT (www.mcs.anl.gov/home/compbio/WIT/wit.html) and WIT2 (beauty.isdn.msc.anl.gov/WIT2.pub/CGI/user.cgi) systems. In October 1997, it included some 2800 pathway diagrams covering primary and secondary metabolism, membrane transport, signal transduction pathways, intracellular traffic, translation and transcription. In the current public release of MPW (beauty.isdn.mcs.anl.gov/MPW), the encoding is based on the logical structure of the pathways and is represented by the objects commonly used in electronic circuit design. This facilitates drawing and editing the diagrams and makes possible automation of the basic simulation operations such as deriving stoichiometric matrices, rate laws, and, ultimately, dynamic models of metabolic pathways. Individual pathway diagrams, automatically derived from the original ASCII records, are stored as SGML instances supplemented by relational indices. An auxiliary database of compound names and structures, encoded in the SMILES format, is maintained to unambiguously connect the pathways to the chemical structures of their intermediates.
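
As an illustration of what "deriving stoichiometric matrices" from a pathway encoding involves, the sketch below builds the matrix for a small set of reactions. It is a minimal example under assumed inputs: the dictionary layout and the function name are hypothetical and do not reflect MPW's record format.

```python
def stoichiometric_matrix(reactions):
    """Build a stoichiometric matrix from a reaction dictionary.

    reactions: dict mapping a reaction name to a pair
    (substrates, products), each a dict of compound -> coefficient.
    Returns (compounds, reaction_names, matrix), where matrix[i][j] is
    the net coefficient of compound i in reaction j (negative when the
    compound is consumed, positive when it is produced).
    """
    compounds = sorted(
        {c for subs, prods in reactions.values() for c in (*subs, *prods)}
    )
    names = list(reactions)
    row = {c: i for i, c in enumerate(compounds)}
    matrix = [[0] * len(names) for _ in compounds]
    for j, name in enumerate(names):
        subs, prods = reactions[name]
        for compound, coeff in subs.items():
            matrix[row[compound]][j] -= coeff
        for compound, coeff in prods.items():
            matrix[row[compound]][j] += coeff
    return compounds, names, matrix


# Two steps of upper glycolysis as toy input
example = {
    "hexokinase": ({"glucose": 1, "ATP": 1}, {"G6P": 1, "ADP": 1}),
    "PGI": ({"G6P": 1}, {"F6P": 1}),
}
compounds, names, S = stoichiometric_matrix(example)
```

Each column of `S` is one reaction and each row one compound, so multiplying `S` by a vector of reaction rates gives the net rate of change of every compound, which is the starting point for the dynamic models the abstract mentions.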

  6. MetaPathways: a modular pipeline for constructing pathway/genome databases from environmental sequence information

    PubMed Central

    2013-01-01

    Background: A central challenge to understanding the ecological and biogeochemical roles of microorganisms in natural and human-engineered ecosystems is the reconstruction of metabolic interaction networks from environmental sequence information. The dominant paradigm in metabolic reconstruction is to assign functional annotations using BLAST. Functional annotations are then projected onto symbolic representations of metabolism in the form of KEGG pathways or SEED subsystems. Results: Here we present MetaPathways, an open source pipeline for pathway inference that uses the PathoLogic algorithm to map functional annotations onto the MetaCyc collection of reactions and pathways, and construct environmental Pathway/Genome Databases (ePGDBs) compatible with the editing and navigation features of Pathway Tools. The pipeline accepts assembled or unassembled nucleotide sequences, performs quality assessment and control, predicts and annotates noncoding genes and open reading frames, and produces inputs to PathoLogic. In addition to constructing ePGDBs, MetaPathways uses MLTreeMap to build phylogenetic trees for selected taxonomic anchor and functional gene markers, converts General Feature Format (GFF) files into concatenated GenBank files for ePGDB construction based on third-party annotations, and generates useful file formats including Sequin files for direct GenBank submission and gene feature tables summarizing annotations, MLTreeMap trees, and ePGDB pathway coverage summaries for statistical comparisons. Conclusions: MetaPathways provides users with a modular annotation and analysis pipeline for predicting metabolic interaction networks from environmental sequence information using an alternative to KEGG pathways and SEED subsystems mapping. It is extensible to genomic and transcriptomic datasets from a wide range of sequencing platforms, and generates useful data products for microbial community structure and function analysis. The MetaPathways software package

  7. SMPDB: The Small Molecule Pathway Database.

    PubMed

    Frolkis, Alex; Knox, Craig; Lim, Emilia; Jewison, Timothy; Law, Vivian; Hau, David D; Liu, Phillip; Gautam, Bijaya; Ly, Son; Guo, An Chi; Xia, Jianguo; Liang, Yongjie; Shrivastava, Savita; Wishart, David S

    2010-01-01

    The Small Molecule Pathway Database (SMPDB) is an interactive, visual database containing more than 350 small-molecule pathways found in humans. More than 2/3 of these pathways (>280) are not found in any other pathway database. SMPDB is designed specifically to support pathway elucidation and pathway discovery in clinical metabolomics, transcriptomics, proteomics and systems biology. SMPDB provides exquisitely detailed, hyperlinked diagrams of human metabolic pathways, metabolic disease pathways, metabolite signaling pathways and drug-action pathways. All SMPDB pathways include information on the relevant organs, organelles, subcellular compartments, protein cofactors, protein locations, metabolite locations, chemical structures and protein quaternary structures. Each small molecule is hyperlinked to detailed descriptions contained in the Human Metabolome Database (HMDB) or DrugBank and each protein or enzyme complex is hyperlinked to UniProt. All SMPDB pathways are accompanied with detailed descriptions, providing an overview of the pathway, condition or processes depicted in each diagram. The database is easily browsed and supports full text searching. Users may query SMPDB with lists of metabolite names, drug names, genes/protein names, SwissProt IDs, GenBank IDs, Affymetrix IDs or Agilent microarray IDs. These queries will produce lists of matching pathways and highlight the matching molecules on each of the pathway diagrams. Gene, metabolite and protein concentration data can also be visualized through SMPDB's mapping interface. All of SMPDB's images, image maps, descriptions and tables are downloadable. SMPDB is available at: http://www.smpdb.ca. PMID:19948758

  8. Towards rule-based metabolic databases: a requirement analysis based on KEGG.

    PubMed

    Richter, Stephan; Fetzer, Ingo; Thullner, Martin; Centler, Florian; Dittrich, Peter

    2015-01-01

    Knowledge of metabolic processes is collected in easily accessible online databases which are increasing rapidly in content and detail. Using these databases for the automatic construction of metabolic network models requires high accuracy and consistency. In this two-part study we evaluate current accuracy and consistency problems using the KEGG database as a prominent example and propose design principles for dealing with such problems. In the first part, we present our computational approach for classifying inconsistencies and provide an overview of the classes of inconsistencies we identified. We detected inconsistencies both for database entries referring to substances and entries referring to reactions. In the second part, we present strategies to deal with the detected problem classes. We especially propose a rule-based database approach which allows for the inclusion of parameterised molecular species and parameterised reactions. Detailed case-studies and a comparison of explicit networks from KEGG with their anticipated rule-based representation underline the applicability and scalability of this approach. PMID:26547981

  9. HPD: an online integrated human pathway database enabling systems biology studies

    PubMed Central

    2009-01-01

    Background: Pathway-oriented experimental and computational studies have led to a significant accumulation of biological knowledge concerning three major types of biological pathway events: molecular signaling events, gene regulation events, and metabolic reaction events. A pathway consists of a series of molecular pathway events that link molecular entities such as proteins, genes, and metabolites. There are approximately 300 biological pathway resources as of April 2009 according to the Pathguide database; however, these pathway databases generally have poor coverage or poor quality, and are difficult to integrate, due to syntactic-level and semantic-level data incompatibilities. Results: We developed the Human Pathway Database (HPD) by integrating heterogeneous human pathway data that are either curated at the NCI Pathway Interaction Database (PID), Reactome, BioCarta, KEGG or indexed from the Protein Lounge Web sites. Integration of pathway data at syntactic, semantic, and schematic levels was based on a unified pathway data model and data warehousing-based integration techniques. HPD provides a comprehensive online view that connects human proteins, genes, RNA transcripts, enzymes, signaling events, metabolic reaction events, and gene regulatory events. At the time of this writing HPD includes 999 human pathways and more than 59,341 human molecular entities. The HPD software provides both a user-friendly Web interface for online use and a robust relational database backend for advanced pathway querying. This pathway tool enables users to 1) search for human pathways from different resources by simply entering genes/proteins involved in pathways or words appearing in pathway names, 2) analyze pathway-protein association, 3) study pathway-pathway similarity, and 4) build integrated pathway networks. We demonstrated the usage and characteristics of the new HPD through three breast cancer case studies. Conclusion: HPD (http://bio.informatics.iupui.edu/HPD) is a new

  10. Therapeutic target database update 2016: enriched resource for bench to clinical drug target and targeted pathway information.

    PubMed

    Yang, Hong; Qin, Chu; Li, Ying Hong; Tao, Lin; Zhou, Jin; Yu, Chun Yan; Xu, Feng; Chen, Zhe; Zhu, Feng; Chen, Yu Zong

    2016-01-01

    Extensive drug discovery efforts have yielded many approved and candidate drugs targeting various targets in different biological pathways. Several freely accessible databases provide drug, target and drug-targeted pathway information to facilitate drug discovery efforts, but coverage of clinical trial drugs and drug-targeted pathways is insufficient. Here, we describe an update of the Therapeutic Target Database (TTD) previously featured in NAR. The updated contents include: (i) significantly increased coverage of the clinical trial targets and drugs (1.6 and 2.3 times that of the previous release, respectively), (ii) cross-links of most TTD target and drug entries to the corresponding pathway entries of KEGG, MetaCyc/BioCyc, NetPath, PANTHER pathway, Pathway Interaction Database (PID), PathWhiz, Reactome and WikiPathways, (iii) convenient access to the multiple targets and drugs cross-linked to each of these pathway entries and (iv) the recently emerged approved and investigative drugs. This update makes TTD a more useful resource to complement other databases in facilitating drug discovery efforts. TTD is accessible at http://bidd.nus.edu.sg/group/ttd/ttd.asp. PMID:26578601

  11. Analysis of Tumor Suppressor Genes Based on Gene Ontology and the KEGG Pathway

    PubMed Central

    Kong, Xiangyin; Huang, Tao; Cai, Yu-Dong

    2014-01-01

    Cancer is a serious disease that causes many deaths every year. We urgently need to design effective treatments to cure this disease. Tumor suppressor genes (TSGs) are a type of gene that can protect cells from becoming cancerous. In view of this, correct identification of TSGs is an alternative method for identifying effective cancer therapies. In this study, we performed gene ontology (GO) and pathway enrichment analysis of the TSGs and non-TSGs. Some popular feature selection methods, including minimum redundancy maximum relevance (mRMR) and incremental feature selection (IFS), were employed to analyze the enrichment features. Accordingly, some GO terms and KEGG pathways, such as biological adhesion, cell cycle control, genomic stability maintenance and cell death regulation, were extracted, which are important factors for identifying TSGs. We hope these findings can help in building effective prediction methods for identifying TSGs and thereby, promoting the discovery of effective cancer treatments. PMID:25207935

  12. KaPPA-View4: a metabolic pathway database for representation and analysis of correlation networks of gene co-expression and metabolite co-accumulation and omics data.

    PubMed

    Sakurai, Nozomu; Ara, Takeshi; Ogata, Yoshiyuki; Sano, Ryosuke; Ohno, Takashi; Sugiyama, Kenjiro; Hiruta, Atsushi; Yamazaki, Kiyoshi; Yano, Kentaro; Aoki, Koh; Aharoni, Asaph; Hamada, Kazuki; Yokoyama, Koji; Kawamura, Shingo; Otsuka, Hirofumi; Tokimatsu, Toshiaki; Kanehisa, Minoru; Suzuki, Hideyuki; Saito, Kazuki; Shibata, Daisuke

    2011-01-01

    Correlations of gene-to-gene co-expression and metabolite-to-metabolite co-accumulation calculated from large amounts of transcriptome and metabolome data are useful for uncovering unknown functions of genes, functional diversities of gene family members and regulatory mechanisms of metabolic pathway flows. Many databases and tools are available to interpret quantitative transcriptome and metabolome data, but only a few connect correlation data to biological knowledge and can be used to find its biological significance. We report here a new metabolic pathway database, KaPPA-View4 (http://kpv.kazusa.or.jp/kpv4/), which is able to overlay gene-to-gene and/or metabolite-to-metabolite relationships as curves on a metabolic pathway map, or on a combination of up to four maps. This representation would help to discover, for example, novel functions of a transcription factor that regulates genes on a metabolic pathway. Pathway maps of the Kyoto Encyclopedia of Genes and Genomes (KEGG) and maps generated from their gene classifications are available at KaPPA-View4 KEGG version (http://kpv.kazusa.or.jp/kpv4-kegg/). At present, gene co-expression data from the databases ATTED-II, COXPRESdb, CoP and MiBASE for human, mouse, rat, Arabidopsis, rice, tomato and other plants are available. PMID:21097783

  13. Co-LncRNA: investigating the lncRNA combinatorial effects in GO annotations and KEGG pathways based on human RNA-Seq data

    PubMed Central

    Zhao, Zheng; Bai, Jing; Wu, Aiwei; Wang, Yuan; Zhang, Jinwen; Wang, Zishan; Li, Yongsheng; Xu, Juan; Li, Xia

    2015-01-01

    Long non-coding RNAs (lncRNAs) are emerging as key regulators of diverse biological processes and diseases. However, the combinatorial effects of these molecules in a specific biological function are poorly understood. Identifying co-expressed protein-coding genes of lncRNAs would provide ample insight into lncRNA functions. To facilitate such an effort, we have developed Co-LncRNA, which is a web-based computational tool that allows users to identify GO annotations and KEGG pathways that may be affected by co-expressed protein-coding genes of a single or multiple lncRNAs. LncRNA co-expressed protein-coding genes were first identified in publicly available human RNA-Seq datasets, including 241 datasets across 6560 total individuals representing 28 tissue types/cell lines. Then, the lncRNA combinatorial effects in given GO annotations or KEGG pathways are taken into account by the simultaneous analysis of multiple lncRNAs in user-selected individual or multiple datasets, which is realized by enrichment analysis. In addition, this software provides a graphical overview of pathways that are modulated by lncRNAs, as well as a specific tool to display the relevant networks between lncRNAs and their co-expressed protein-coding genes. Co-LncRNA also supports users in uploading their own lncRNA and protein-coding gene expression profiles to investigate the lncRNA combinatorial effects. It will be continuously updated with more human RNA-Seq datasets on an annual basis. Taken together, Co-LncRNA provides a web-based application for investigating lncRNA combinatorial effects, which could shed light on their biological roles and could be a valuable resource for this community. Database URL: http://www.bio-bigdata.com/Co-LncRNA/ PMID:26363020

  14. CyanoPhyChe: A Database for Physico-Chemical Properties, Structure and Biochemical Pathway Information of Cyanobacterial Proteins

    PubMed Central

    Arun, P. V. Parvati Sai; Bakku, Ranjith Kumar; Subhashini, Mranu; Singh, Pankaj; Prabhu, N. Prakash; Suzuki, Iwane; Prakash, Jogadhenu S. S.

    2012-01-01

    CyanoPhyChe is a user-friendly database that one can browse for physico-chemical properties, structure and biochemical pathway information of cyanobacterial proteins. We downloaded all the protein sequences from the cyanobacterial genome database for calculating the physico-chemical properties, such as molecular weight, net charge of protein, isoelectric point, molar extinction coefficient, canonical variable for solubility, grand average hydropathy, aliphatic index, and number of charged residues. Based on the physico-chemical properties, we provide the polarity, structural stability and probability of a protein entering into an inclusion body (PEPIB). We used the data generated on physico-chemical properties, structure and biochemical pathway information of all cyanobacterial proteins to construct CyanoPhyChe. The data can be used for optimizing methods of expression and characterization of cyanobacterial proteins. Moreover, the ‘Search’ and data export options provided will be useful for proteome analysis. Secondary structure was predicted for all the cyanobacterial proteins using the PSIPRED tool, and the data generated are made accessible to researchers working on cyanobacteria. In addition, external links are provided to biological databases such as PDB and KEGG for molecular structure and biochemical pathway information, respectively. External links are also provided to different cyanobacterial databases. CyanoPhyChe can be accessed from the following URL: http://bif.uohyd.ac.in/cpc. PMID:23185330
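
Three of the sequence-derived properties named above can be computed directly from a protein sequence. The sketch below uses the standard Kyte-Doolittle hydropathy scale for grand average hydropathy and Ikai's formula for the aliphatic index. The abstract does not state which implementation CyanoPhyChe used, so the function names and the choice of D, E, K and R as the charged residues are assumptions.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq):
    """Grand average hydropathy: mean Kyte-Doolittle value per residue.

    Expects an upper-case one-letter sequence; a non-standard residue
    raises KeyError.
    """
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq):
    """Aliphatic index: X(Ala) + 2.9*X(Val) + 3.9*(X(Ile) + X(Leu)),
    where X is the mole percent of each residue."""
    n = len(seq)

    def mole_percent(aa):
        return 100.0 * seq.count(aa) / n

    return (
        mole_percent("A")
        + 2.9 * mole_percent("V")
        + 3.9 * (mole_percent("I") + mole_percent("L"))
    )


def count_charged(seq):
    """Number of Asp, Glu, Lys and Arg residues (assumed definition)."""
    return sum(seq.count(aa) for aa in "DEKR")
```

A positive `gravy` value indicates a hydrophobic protein on average and a negative value a hydrophilic one. A higher aliphatic index is usually read as a sign of greater thermal stability.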

  15. Redundancy control in pathway databases (ReCiPa): an application for improving gene-set enrichment analysis in Omics studies and "Big data" biology.

    PubMed

    Vivar, Juan C; Pemu, Priscilla; McPherson, Ruth; Ghosh, Sujoy

    2013-08-01

    Unparalleled technological advances have fueled an explosive growth in the scope and scale of biological data and have propelled life sciences into the realm of "Big Data" that cannot be managed or analyzed by conventional approaches. Big Data in the life sciences are driven primarily via a diverse collection of 'omics'-based technologies, including genomics, proteomics, metabolomics, transcriptomics, metagenomics, and lipidomics. Gene-set enrichment analysis is a powerful approach for interrogating large 'omics' datasets, leading to the identification of biological mechanisms associated with observed outcomes. While several factors influence the results from such analysis, the impact from the contents of pathway databases is often under-appreciated. Pathway databases often contain variously named pathways that overlap with one another to varying degrees. Ignoring such redundancies during pathway analysis can lead to the designation of several pathways as being significant due to high content-similarity, rather than truly independent biological mechanisms. Statistically, such dependencies also result in correlated p values and overdispersion, leading to biased results. We investigated the level of redundancies in multiple pathway databases and observed large discrepancies in the nature and extent of pathway overlap. This prompted us to develop the application, ReCiPa (Redundancy Control in Pathway Databases), to control redundancies in pathway databases based on user-defined thresholds. Analysis of genomic and genetic datasets, using ReCiPa-generated overlap-controlled versions of KEGG and Reactome pathways, led to a reduction in redundancy among the top-scoring gene-sets and allowed for the inclusion of additional gene-sets representing possibly novel biological mechanisms. Using obesity as an example, bioinformatic analysis further demonstrated that gene-sets identified from overlap-controlled pathway databases show stronger evidence of prior association
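
The idea of controlling pathway redundancy with a user-defined threshold can be illustrated with a simple greedy merge. This is a sketch of the general approach only. The abstract does not specify ReCiPa's overlap measure or merging rule, so the overlap definition (shared genes divided by the size of one set), the merge order and all names below are assumptions.

```python
def overlap_fraction(a, b):
    """Fraction of the genes in set `a` that are also in set `b`."""
    return len(a & b) / len(a) if a else 0.0


def merge_redundant(pathways, threshold):
    """Greedily merge pathways whose overlap reaches `threshold`.

    pathways: dict mapping a pathway name to an iterable of gene IDs.
    Two pathways are merged when the overlap fraction in either
    direction is >= threshold; the merged set is named "A+B".
    The scan restarts after each merge until no pair qualifies.
    """
    sets = {name: set(genes) for name, genes in pathways.items()}
    merged = True
    while merged:
        merged = False
        names = sorted(sets)
        for i, a in enumerate(names):
            for b in names[i + 1:]:
                score = max(
                    overlap_fraction(sets[a], sets[b]),
                    overlap_fraction(sets[b], sets[a]),
                )
                if score >= threshold:
                    sets[a + "+" + b] = sets.pop(a) | sets.pop(b)
                    merged = True
                    break
            if merged:
                break
    return sets
```

Lowering `threshold` merges more aggressively and leaves fewer, larger gene sets. The merged collection can then be passed to an enrichment analysis in place of the original pathway database.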

  16. BlastKOALA and GhostKOALA: KEGG Tools for Functional Characterization of Genome and Metagenome Sequences.

    PubMed

    Kanehisa, Minoru; Sato, Yoko; Morishima, Kanae

    2016-02-22

    BlastKOALA and GhostKOALA are automatic annotation servers for genome and metagenome sequences, which perform KO (KEGG Orthology) assignments to characterize individual gene functions and reconstruct KEGG pathways, BRITE hierarchies and KEGG modules to infer high-level functions of the organism or the ecosystem. Both servers are made freely available at the KEGG Web site (http://www.kegg.jp/blastkoala/). In BlastKOALA, the KO assignment is performed by a modified version of the internally used KOALA algorithm after the BLAST search against a non-redundant dataset of pangenome sequences at the species, genus or family level, which is generated from the KEGG GENES database by retaining the KO content of each taxonomic category. In GhostKOALA, which utilizes more rapid GHOSTX for database search and is suitable for metagenome annotation, the pangenome dataset is supplemented with Cd-hit clusters including those for viral genes. The result files may be downloaded and manipulated for further KEGG Mapper analysis, such as comparative pathway analysis using multiple BlastKOALA results. PMID:26585406

  17. Available pathways database (APD): an essential resource for combinatorial biology.

    PubMed

    Pirrung, M C; Silva, C M; Jaeger, J

    2000-10-01

    A relational database, the Available Pathways Database (APD), has been constructed of microbial natural products, their producing strains, and their biosynthetic pathways. The database allows the ready selection of donor strains for combinatorial biology experiments. It provides the same type of resource for combinatorial biology as the Available Chemicals Directory (ACD) does for combinatorial chemical library generation. Its cataloging ability can also provide insight into novel aspects of biosynthetic routes. In particular, no 10-unit Type I polyketides were found in the compilation of this edition of the APD (Version I). PMID:11076562

  18. PMAP: databases for analyzing proteolytic events and pathways.

    PubMed

    Igarashi, Yoshinobu; Heureux, Emily; Doctor, Kutbuddin S; Talwar, Priti; Gramatikova, Svetlana; Gramatikoff, Kosi; Zhang, Ying; Blinov, Michael; Ibragimova, Salmaz S; Boyd, Sarah; Ratnikov, Boris; Cieplak, Piotr; Godzik, Adam; Smith, Jeffrey W; Osterman, Andrei L; Eroshkin, Alexey M

    2009-01-01

    The Proteolysis MAP (PMAP, http://www.proteolysis.org) is a user-friendly website intended to aid the scientific community in reasoning about proteolytic networks and pathways. PMAP is comprised of five databases, linked together in one environment. The foundation databases, ProteaseDB and SubstrateDB, are driven by an automated annotation pipeline that generates dynamic 'Molecule Pages', rich in molecular information. PMAP also contains two community annotated databases focused on function; CutDB has information on more than 5000 proteolytic events, and ProfileDB is dedicated to information of the substrate recognition specificity of proteases. Together, the content within these four databases will ultimately feed PathwayDB, which will be comprised of known pathways whose function can be dynamically modeled in a rule-based manner, and hypothetical pathways suggested by semi-automated culling of the literature. A Protease Toolkit is also available for the analysis of proteases and proteolysis. Here, we describe how the databases of PMAP can be used to foster understanding of proteolytic pathways, and equally as significant, to reason about proteolysis. PMID:18842634

  19. PMAP: databases for analyzing proteolytic events and pathways

    PubMed Central

    Igarashi, Yoshinobu; Heureux, Emily; Doctor, Kutbuddin S.; Talwar, Priti; Gramatikova, Svetlana; Gramatikoff, Kosi; Zhang, Ying; Blinov, Michael; Ibragimova, Salmaz S.; Boyd, Sarah; Ratnikov, Boris; Cieplak, Piotr; Godzik, Adam; Smith, Jeffrey W.; Osterman, Andrei L.; Eroshkin, Alexey M.

    2009-01-01

    The Proteolysis MAP (PMAP, http://www.proteolysis.org) is a user-friendly website intended to aid the scientific community in reasoning about proteolytic networks and pathways. PMAP is comprised of five databases, linked together in one environment. The foundation databases, ProteaseDB and SubstrateDB, are driven by an automated annotation pipeline that generates dynamic ‘Molecule Pages’, rich in molecular information. PMAP also contains two community annotated databases focused on function; CutDB has information on more than 5000 proteolytic events, and ProfileDB is dedicated to information of the substrate recognition specificity of proteases. Together, the content within these four databases will ultimately feed PathwayDB, which will be comprised of known pathways whose function can be dynamically modeled in a rule-based manner, and hypothetical pathways suggested by semi-automated culling of the literature. A Protease Toolkit is also available for the analysis of proteases and proteolysis. Here, we describe how the databases of PMAP can be used to foster understanding of proteolytic pathways, and equally as significant, to reason about proteolysis. PMID:18842634

  20. Reactome: a database of reactions, pathways and biological processes.

    PubMed

    Croft, David; O'Kelly, Gavin; Wu, Guanming; Haw, Robin; Gillespie, Marc; Matthews, Lisa; Caudy, Michael; Garapati, Phani; Gopinath, Gopal; Jassal, Bijay; Jupe, Steven; Kalatskaya, Irina; Mahajan, Shahana; May, Bruce; Ndegwa, Nelson; Schmidt, Esther; Shamovsky, Veronica; Yung, Christina; Birney, Ewan; Hermjakob, Henning; D'Eustachio, Peter; Stein, Lincoln

    2011-01-01

    Reactome (http://www.reactome.org) is a collaboration among groups at the Ontario Institute for Cancer Research, Cold Spring Harbor Laboratory, New York University School of Medicine and The European Bioinformatics Institute, to develop an open source curated bioinformatics database of human pathways and reactions. Recently, we developed a new web site with improved tools for pathway browsing and data analysis. The Pathway Browser is a Systems Biology Graphical Notation (SBGN)-based visualization system that supports zooming, scrolling and event highlighting. It exploits PSICQUIC web services to overlay our curated pathways with molecular interaction data from the Reactome Functional Interaction Network and external interaction databases such as IntAct, BioGRID, ChEMBL, iRefIndex, MINT and STRING. Our Pathway and Expression Analysis tools enable ID mapping, pathway assignment and overrepresentation analysis of user-supplied data sets. To support pathway annotation and analysis in other species, we continue to make orthology-based inferences of pathways in non-human species, applying Ensembl Compara to identify orthologs of curated human proteins in each of 20 other species. The resulting inferred pathway sets can be browsed and analyzed with our Species Comparison tool. Collaborations are also underway to create manually curated data sets on the Reactome framework for chicken, Drosophila and rice. PMID:21067998

  1. The metabolic pathway collection from EMP: the enzymes and metabolic pathways database.

    PubMed

    Selkov, E; Basmanova, S; Gaasterland, T; Goryanin, I; Gretchkin, Y; Maltsev, N; Nenashev, V; Overbeek, R; Panyushkina, E; Pronevitch, L; Selkov, E; Yunus, I

    1996-01-01

    The Enzymes and Metabolic Pathways database (EMP) is an encoding of the contents of over 10 000 original publications on the topics of enzymology and metabolism. This large body of information has been transformed into a queryable database. An extraction of over 1800 pictorial representations of metabolic pathways from this collection is freely available on the World Wide Web. We believe that this collection will play an important role in the interpretation of genetic sequence data, as well as offering a meaningful framework for the integration of many other forms of biological data. PMID:8594593

  2. DEOP: a database on osmoprotectants and associated pathways

    PubMed Central

    Bougouffa, Salim; Radovanovic, Aleksandar; Essack, Magbubah; Bajic, Vladimir B.

    2014-01-01

    Microorganisms are known to counteract salt stress through salt influx or by the accumulation of osmoprotectants (also called compatible solutes). Understanding the pathways that synthesize and/or break down these osmoprotectants is of interest to studies of crop halotolerance and to biotechnology applications that use microbes as cell factories for production of biomass or commercial chemicals. To facilitate the exploration of osmoprotectants, we have developed the first online resource, ‘Dragon Explorer of Osmoprotection associated Pathways’ (DEOP), that gathers and presents curated information about osmoprotectants, complemented by information about reactions and pathways that use or affect them. A combined total of 141 compounds were confirmed osmoprotectants, which were matched to 1883 reactions and 834 pathways. DEOP can also be used to map genes or microbial genomes to potential osmoprotection-associated pathways, and thus link genes and genomes to other associated osmoprotection information. Moreover, DEOP provides a text-mining utility to search deeper into the scientific literature for supporting evidence or for new associations of osmoprotectants to pathways, reactions, enzymes, genes or organisms. Two case studies are provided to demonstrate the usefulness of DEOP. The system can be accessed at http://www.cbrc.kaust.edu.sa/deop/. PMID:25326239

  3. LeishCyc: a biochemical pathways database for Leishmania major

    PubMed Central

    Doyle, Maria A; MacRae, James I; De Souza, David P; Saunders, Eleanor C; McConville, Malcolm J; Likić, Vladimir A

    2009-01-01

    Background Leishmania spp. are sandfly-transmitted protozoan parasites that cause a spectrum of diseases in more than 12 million people worldwide. Much research is now focusing on how these parasites adapt to the distinct nutrient environments they encounter in the digestive tract of the sandfly vector and the phagolysosome compartment of mammalian macrophages. While data mining and annotation of the genomes of three Leishmania species have provided an initial inventory of predicted metabolic components and associated pathways, resources for integrating this information into metabolic networks and incorporating data from transcript, protein, and metabolite profiling studies are currently lacking. The development of a reliable, expertly curated, and widely available model of Leishmania metabolic networks is required to facilitate systems analysis, as well as discovery and prioritization of new drug targets for this important human pathogen. Description The LeishCyc database was initially built from the genome sequence of Leishmania major (v5.2), based on the annotation published by the Wellcome Trust Sanger Institute. LeishCyc was manually curated to remove errors, correct automated predictions, and add information from the literature. The ongoing curation is based on public sources, literature searches, and our own experimental and bioinformatics studies. In a number of instances we have improved on the original genome annotation, and, in some ambiguous cases, collected relevant information from the literature in order to help clarify gene or protein annotation in the future. All genes in LeishCyc are linked to the corresponding entry in GeneDB (Wellcome Trust Sanger Institute). Conclusion The LeishCyc database describes Leishmania major genes, gene products, metabolites, their relationships and biochemical organization into metabolic pathways. LeishCyc provides a systematic approach to organizing the evolving information about Leishmania biochemical networks and is

  4. The Saccharomyces Genome Database: Exploring Biochemical Pathways and Mutant Phenotypes.

    PubMed

    Cherry, J Michael

    2015-12-01

    Many biochemical processes, and the proteins and cofactors involved, have been defined for the eukaryote Saccharomyces cerevisiae. This understanding has been largely derived through the awesome power of yeast genetics. The proteins responsible for the reactions that build complex molecules and generate energy for the cell have been integrated into web-based tools that provide classical views of pathways. The Yeast Pathways in the Saccharomyces Genome Database (SGD) is, however, the only database created from manually curated literature annotations. In this protocol, gene function is explored using phenotype annotations to enable hypotheses to be formulated about a gene's action. A common use of the SGD is to understand more about a gene that was identified via a phenotypic screen or found to interact with a gene/protein of interest. There are still many genes that do not yet have an experimentally defined function and so the information currently available can be used to speculate about their potential function. Typically, computational annotations based on sequence similarity are used to predict gene function. In addition, annotations are sometimes available for phenotypes of mutations in the gene of interest. Integrated results for a few example genes will be explored in this protocol. This will be instructive for the exploration of details that aid the analysis of experimental results and the establishment of connections within the yeast literature. PMID:26631123

  5. Genic and Intergenic SSR Database Generation, SNPs Determination and Pathway Annotations, in Date Palm (Phoenix dactylifera L.)

    PubMed Central

    2016-01-01

    The present investigation used bioinformatics tools to identify and characterize simple sequence repeats (SSRs) within the third version of the date palm genome and to develop a new SSR primer database. In addition, single nucleotide polymorphisms (SNPs) located within the SSR flanking regions were identified. Moreover, the pathways for the sequences assigned by SSR primers, the biological functions and gene interaction were determined. A total of 172,075 SSR motifs were identified in the date palm genome sequence, with a frequency of 450.97 SSRs per Mb. Of these, 130,014 SSRs (75.6%) were located within the intergenic regions, with a frequency of 499 SSRs per Mb, while only 42,061 SSRs (24.4%) were located within the genic regions, with a frequency of 347.5 SSRs per Mb. A total of 111,403 SSR primer pairs were designed, representing 291.9 SSR primers per Mb. Of the 111,403, only 31,380 SSR primers were in the genic regions, while 80,023 primers were in the intergenic regions. A total of 250,507 SNPs were identified in 84,172 SSR flanking regions, which represents 75.55% of the total SSR flanking regions. Out of 12,274 genes, only 463 genes comprising 896 SSR primers were mapped onto 111 pathways using the KEGG database. The most abundant enzymes were identified in the pathway related to the biosynthesis of antibiotics. We tested 1031 SSR primers using both publicly available date palm genome sequences as templates in the in silico PCR reactions. For in vitro validation, 31 SSR primers among those used in the in silico PCR were synthesized and tested for their ability to detect polymorphism among six Egyptian date palm cultivars. All tested primers successfully amplified products, but only 18 primers detected polymorphic amplicons among the studied date palm cultivars. PMID:27434138

  6. Genic and Intergenic SSR Database Generation, SNPs Determination and Pathway Annotations, in Date Palm (Phoenix dactylifera L.).

    PubMed

    Mokhtar, Morad M; Adawy, Sami S; El-Assal, Salah El-Din S; Hussein, Ebtissam H A

    2016-01-01

    The present investigation used bioinformatics tools to identify and characterize simple sequence repeats (SSRs) within the third version of the date palm genome and to develop a new SSR primer database. In addition, single nucleotide polymorphisms (SNPs) located within the SSR flanking regions were identified. Moreover, the pathways for the sequences assigned by SSR primers, the biological functions and gene interaction were determined. A total of 172,075 SSR motifs were identified in the date palm genome sequence, with a frequency of 450.97 SSRs per Mb. Of these, 130,014 SSRs (75.6%) were located within the intergenic regions, with a frequency of 499 SSRs per Mb, while only 42,061 SSRs (24.4%) were located within the genic regions, with a frequency of 347.5 SSRs per Mb. A total of 111,403 SSR primer pairs were designed, representing 291.9 SSR primers per Mb. Of the 111,403, only 31,380 SSR primers were in the genic regions, while 80,023 primers were in the intergenic regions. A total of 250,507 SNPs were identified in 84,172 SSR flanking regions, which represents 75.55% of the total SSR flanking regions. Out of 12,274 genes, only 463 genes comprising 896 SSR primers were mapped onto 111 pathways using the KEGG database. The most abundant enzymes were identified in the pathway related to the biosynthesis of antibiotics. We tested 1031 SSR primers using both publicly available date palm genome sequences as templates in the in silico PCR reactions. For in vitro validation, 31 SSR primers among those used in the in silico PCR were synthesized and tested for their ability to detect polymorphism among six Egyptian date palm cultivars. All tested primers successfully amplified products, but only 18 primers detected polymorphic amplicons among the studied date palm cultivars. PMID:27434138
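
The counts and densities quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below uses only figures stated in the abstract; the helper names are hypothetical, and the "implied length" is just count divided by density, not a genome size reported by the authors.

```python
# Figures quoted in the abstract above.
TOTAL_SSR = 172_075
INTERGENIC_SSR = 130_014
GENIC_SSR = 42_061
SSR_PER_MB = 450.97
INTERGENIC_PER_MB = 499.0
GENIC_PER_MB = 347.5
GENIC_PRIMERS = 31_380
INTERGENIC_PRIMERS = 80_023
TOTAL_PRIMERS = 111_403

def percent(part, whole):
    """Share of `part` in `whole`, as a percentage."""
    return 100.0 * part / whole

def implied_length_mb(count, per_mb):
    """Sequence length (Mb) implied by a count and a per-Mb density."""
    return count / per_mb

genome_mb = implied_length_mb(TOTAL_SSR, SSR_PER_MB)
intergenic_mb = implied_length_mb(INTERGENIC_SSR, INTERGENIC_PER_MB)
genic_mb = implied_length_mb(GENIC_SSR, GENIC_PER_MB)

print(f"intergenic share: {percent(INTERGENIC_SSR, TOTAL_SSR):.1f}%")
print(f"genic share:      {percent(GENIC_SSR, TOTAL_SSR):.1f}%")
print(f"implied total length:     {genome_mb:.1f} Mb")
print(f"intergenic + genic length: {intergenic_mb + genic_mb:.1f} Mb")
```

The two implied lengths agree to within rounding (roughly 382 Mb), and the genic and intergenic counts sum exactly to the reported totals for both SSRs and primers.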

  7. The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of pathway/genome databases

    PubMed Central

    Caspi, Ron; Altman, Tomer; Dreher, Kate; Fulcher, Carol A.; Subhraveti, Pallavi; Keseler, Ingrid M.; Kothari, Anamika; Krummenacker, Markus; Latendresse, Mario; Mueller, Lukas A.; Ong, Quang; Paley, Suzanne; Pujar, Anuradha; Shearer, Alexander G.; Travers, Michael; Weerasinghe, Deepika; Zhang, Peifen; Karp, Peter D.

    2012-01-01

    The MetaCyc database (http://metacyc.org/) provides a comprehensive and freely accessible resource for metabolic pathways and enzymes from all domains of life. The pathways in MetaCyc are experimentally determined, small-molecule metabolic pathways and are curated from the primary scientific literature. MetaCyc contains more than 1800 pathways derived from more than 30 000 publications, and is the largest curated collection of metabolic pathways currently available. Most reactions in MetaCyc pathways are linked to one or more well-characterized enzymes, and both pathways and enzymes are annotated with reviews, evidence codes and literature citations. BioCyc (http://biocyc.org/) is a collection of more than 1700 organism-specific Pathway/Genome Databases (PGDBs). Each BioCyc PGDB contains the full genome and predicted metabolic network of one organism. The network, which is predicted by the Pathway Tools software using MetaCyc as a reference database, consists of metabolites, enzymes, reactions and metabolic pathways. BioCyc PGDBs contain additional features, including predicted operons, transport systems and pathway-hole fillers. The BioCyc website and Pathway Tools software offer many tools for querying and analysis of PGDBs, including Omics Viewers and comparative analysis. New developments include a zoomable web interface for diagrams; flux-balance analysis model generation from PGDBs; web services; and a new tool called Web Groups. PMID:22102576

  8. KEGG orthology-based annotation of the predicted proteome of Acropora digitifera: ZoophyteBase - an open access and searchable database of a coral genome

    PubMed Central

    2013-01-01

    Background Contemporary coral reef research has firmly established that a genomic approach is urgently needed to better understand the effects of anthropogenic environmental stress and global climate change on coral holobiont interactions. Here we present KEGG orthology-based annotation of the complete genome sequence of the scleractinian coral Acropora digitifera and provide the first comprehensive view of the genome of a reef-building coral by applying advanced bioinformatics. Description Sequences from the KEGG database of protein function were used to construct hidden Markov models. These models were used to search the predicted proteome of A. digitifera to establish complete genomic annotation. The annotated dataset is published in ZoophyteBase, an open access format with different options for searching the data. A particularly useful feature is the ability to use a Google-like search engine that links query words to protein attributes. We present features of the annotation that underpin the molecular structure of key processes of coral physiology that include (1) regulatory proteins of symbiosis, (2) planula and early developmental proteins, (3) neural messengers, receptors and sensory proteins, (4) calcification and Ca2+-signalling proteins, (5) plant-derived proteins, (6) proteins of nitrogen metabolism, (7) DNA repair proteins, (8) stress response proteins, (9) antioxidant and redox-protective proteins, (10) proteins of cellular apoptosis, (11) microbial symbioses and pathogenicity proteins, (12) proteins of viral pathogenicity, (13) toxins and venom, (14) proteins of the chemical defensome and (15) coral epigenetics. Conclusions We advocate that providing annotation in an open-access searchable database available to the public domain will give an unprecedented foundation to interrogate the fundamental molecular structure and interactions of coral symbiosis and allow critical questions to be addressed at the genomic level based on combined aspects of

  9. The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of pathway/genome databases

    PubMed Central

    Caspi, Ron; Billington, Richard; Ferrer, Luciana; Foerster, Hartmut; Fulcher, Carol A.; Keseler, Ingrid M.; Kothari, Anamika; Krummenacker, Markus; Latendresse, Mario; Mueller, Lukas A.; Ong, Quang; Paley, Suzanne; Subhraveti, Pallavi; Weaver, Daniel S.; Karp, Peter D.

    2016-01-01

    The MetaCyc database (MetaCyc.org) is a freely accessible comprehensive database describing metabolic pathways and enzymes from all domains of life. The majority of MetaCyc pathways are small-molecule metabolic pathways that have been experimentally determined. MetaCyc contains more than 2400 pathways derived from >46 000 publications, and is the largest curated collection of metabolic pathways. BioCyc (BioCyc.org) is a collection of 5700 organism-specific Pathway/Genome Databases (PGDBs), each containing the full genome and predicted metabolic network of one organism, including metabolites, enzymes, reactions, metabolic pathways, predicted operons, transport systems, and pathway-hole fillers. The BioCyc website offers a variety of tools for querying and analyzing PGDBs, including Omics Viewers and tools for comparative analysis. This article provides an update of new developments in MetaCyc and BioCyc during the last two years, including addition of Gibbs free energy values for compounds and reactions; redesign of the primary gene/protein page; addition of a tool for creating diagrams containing multiple linked pathways; several new search capabilities, including searching for genes based on sequence patterns, searching for databases based on an organism's phenotypes, and a cross-organism search; and a metabolite identifier translation service. PMID:26527732

  10. Database Constraints Applied to Metabolic Pathway Reconstruction Tools

    PubMed Central

    Vilaplana, Jordi; Solsona, Francesc; Teixido, Ivan; Usié, Anabel; Karathia, Hiren; Alves, Rui; Mateo, Jordi

    2014-01-01

    Our group developed two biological applications, Biblio-MetReS and Homol-MetReS, accessing the same database of organisms with annotated genes. Biblio-MetReS is a data-mining application that facilitates the reconstruction of molecular networks based on automated text-mining analysis of published scientific literature. Homol-MetReS allows functional (re)annotation of proteomes, to properly identify both the individual proteins involved in the process(es) of interest and their function. It also enables the sets of proteins involved in the process(es) in different organisms to be compared directly. The efficiency of these biological applications is directly related to the design of the shared database. We classified and analyzed the different kinds of access to the database. Based on this study, we tried to adjust and tune the configurable parameters of the database server to reach the best performance of the communication data link to/from the database system. Different database technologies were analyzed. We started the study with a public relational SQL database, MySQL. Then, the same database was implemented by a MapReduce-based database named HBase. The results indicated that the standard configuration of MySQL gives an acceptable performance for low or medium size databases. Nevertheless, tuning database parameters can greatly improve the performance and lead to very competitive runtimes. PMID:25202745

  11. Database constraints applied to metabolic pathway reconstruction tools.

    PubMed

    Vilaplana, Jordi; Solsona, Francesc; Teixido, Ivan; Usié, Anabel; Karathia, Hiren; Alves, Rui; Mateo, Jordi

    2014-01-01

    Our group developed two biological applications, Biblio-MetReS and Homol-MetReS, accessing the same database of organisms with annotated genes. Biblio-MetReS is a data-mining application that facilitates the reconstruction of molecular networks based on automated text-mining analysis of published scientific literature. Homol-MetReS allows functional (re)annotation of proteomes, to properly identify both the individual proteins involved in the process(es) of interest and their function. It also enables the sets of proteins involved in the process(es) in different organisms to be compared directly. The efficiency of these biological applications is directly related to the design of the shared database. We classified and analyzed the different kinds of access to the database. Based on this study, we tried to adjust and tune the configurable parameters of the database server to reach the best performance of the communication data link to/from the database system. Different database technologies were analyzed. We started the study with a public relational SQL database, MySQL. Then, the same database was implemented by a MapReduce-based database named HBase. The results indicated that the standard configuration of MySQL gives an acceptable performance for low or medium size databases. Nevertheless, tuning database parameters can greatly improve the performance and lead to very competitive runtimes. PMID:25202745

  12. Data mining in the MetaCyc family of pathway databases.

    PubMed

    Karp, Peter D; Paley, Suzanne; Altman, Tomer

    2013-01-01

    Pathway databases collect the bioreactions and molecular interactions that define the processes of life. The MetaCyc family of pathway databases consists of thousands of databases that were derived through computational inference of metabolic pathways from the MetaCyc pathway/genome database (PGDB). In some cases, these DBs underwent subsequent manual curation. Curated pathway DBs are now available for most of the major model organisms. Databases in the MetaCyc family are managed using the Pathway Tools software. This chapter presents methods for performing data mining on the MetaCyc family of pathway DBs. We discuss the major data access mechanisms for the family, which include data files in multiple formats; application programming interfaces (APIs) for the Lisp, Java, and Perl languages; and web services. We present an overview of the Pathway Tools schema, an understanding of which is needed to query the DBs. The chapter also presents several interactive data mining tools within Pathway Tools for performing omics data analysis. PMID:23192547

  13. Data Mining in the MetaCyc Family of Pathway Databases

    PubMed Central

    Karp, Peter D.; Paley, Suzanne; Altman, Tomer

    2013-01-01

    Pathway databases collect the bioreactions and molecular interactions that define the processes of life. The MetaCyc family of pathway databases consists of thousands of databases that were derived through computational inference of metabolic pathways from the MetaCyc Pathway/Genome Database (PGDB). In some cases these DBs underwent subsequent manual curation. Curated pathway DBs are now available for most of the major model organisms. Databases in the MetaCyc family are managed using the Pathway Tools software. This chapter presents methods for performing data mining on the MetaCyc family of pathway DBs. We discuss the major data access mechanisms for the family, which include data files in multiple formats; application programming interfaces (APIs) for the Lisp, Java, and Perl languages; and web services. We present an overview of the Pathway Tools schema, an understanding of which is needed to query the DBs. The chapter also presents several interactive data mining tools within Pathway Tools for performing omics data analysis. PMID:23192547

  14. Human chromosome 21/Down syndrome gene function and pathway database.

    PubMed

    Nikolaienko, Oleksii; Nguyen, Cao; Crnic, Linda S; Cios, Krzysztof J; Gardiner, Katheleen

    2005-12-30

    Down syndrome, trisomy of human chromosome 21, is the most common genetic cause of intellectual disability. Correlating the increased expression, due to gene dosage, of the >300 genes encoded by chromosome 21 with specific phenotypic features is a goal that becomes more feasible with the increasing availability of large scale functional, expression and evolutionary data. These data are dispersed among diverse databases, and the variety of formats and locations, plus their often rapid growth, makes access and assimilation a daunting task. To aid the Down syndrome and chromosome 21 community, and researchers interested in the study of any chromosome 21 gene or ortholog, we are developing a comprehensive chromosome 21-specific database with the goals of (i) data consolidation, (ii) accuracy and completeness through expert curation, and (iii) facilitation of novel hypothesis generation. Here we describe the current status of data collection and the immediate future plans for this first human chromosome-specific database. PMID:16310977

  15. CathaCyc, a metabolic pathway database built from Catharanthus roseus RNA-Seq data.

    PubMed

    Van Moerkercke, Alex; Fabris, Michele; Pollier, Jacob; Baart, Gino J E; Rombauts, Stephane; Hasnain, Ghulam; Rischer, Heiko; Memelink, Johan; Oksman-Caldentey, Kirsi-Marja; Goossens, Alain

    2013-05-01

    The medicinal plant Madagascar periwinkle (Catharanthus roseus) synthesizes numerous terpenoid indole alkaloids (TIAs), such as the anticancer drugs vinblastine and vincristine. The TIA pathway operates in a complex metabolic network that steers plant growth and survival. Pathway databases and metabolic networks reconstructed from 'omics' sequence data can help to discover missing enzymes, study metabolic pathway evolution and, ultimately, engineer metabolic pathways. To date, such databases have mainly been built for model plant species with sequenced genomes. Although genome sequence data are not available for most medicinal plant species, next-generation sequencing is now extensively employed to create comprehensive medicinal plant transcriptome sequence resources. Here we report on the construction of CathaCyc, a detailed metabolic pathway database, from C. roseus RNA-Seq data sets. CathaCyc (version 1.0) contains 390 pathways with 1,347 assigned enzymes and spans primary and secondary metabolism. Curation of the pathways linked with the synthesis of TIAs and triterpenoids, their primary metabolic precursors, and their elicitors, the jasmonate hormones, demonstrated that RNA-Seq resources are suitable for the construction of pathway databases. CathaCyc is accessible online (http://www.cathacyc.org) and offers a range of tools for the visualization and analysis of metabolic networks and 'omics' data. Overlay with expression data from publicly available RNA-Seq resources demonstrated that two well-characterized C. roseus terpenoid pathways, those of TIAs and triterpenoids, are subject to distinct regulation by both developmental and environmental cues. We anticipate that databases such as CathaCyc will become key to the study and exploitation of the metabolism of medicinal plants. PMID:23493402

  16. Data, information, knowledge and principle: back to metabolism in KEGG.

    PubMed

    Kanehisa, Minoru; Goto, Susumu; Sato, Yoko; Kawashima, Masayuki; Furumichi, Miho; Tanabe, Mao

    2014-01-01

    In the hierarchy of data, information and knowledge, computational methods play a major role in the initial processing of data to extract information, but on their own they become less effective at compiling knowledge from information. The Kyoto Encyclopedia of Genes and Genomes (KEGG) resource (http://www.kegg.jp/ or http://www.genome.jp/kegg/) has been developed as a reference knowledge base to assist this latter process. In particular, the KEGG pathway maps are widely used for biological interpretation of genome sequences and other high-throughput data. The link from genomes to pathways is made through the KEGG Orthology system, a collection of manually defined ortholog groups identified by K numbers. To better automate this interpretation process, the KEGG modules defined by Boolean expressions of K numbers have been expanded and improved. Once genes in a genome are annotated with K numbers, the KEGG modules can be computationally evaluated, revealing metabolic capacities and other phenotypic features. The reaction modules, which represent chemical units of reactions, have been used to analyze design principles of metabolic networks and also to improve the definition of K numbers and associated annotations. For translational bioinformatics, the KEGG MEDICUS resource has been developed by integrating drug labels (package inserts) used in society. PMID:24214961
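
The idea of evaluating a module as a Boolean expression over K numbers can be sketched as follows. The nested-tuple encoding, the function name, and the identifiers `K_A` to `K_D` are assumptions made for illustration; they are placeholders, not real K numbers, and this is not KEGG's own module notation or software.

```python
def module_is_complete(expr, annotated):
    """Evaluate a Boolean expression of K numbers against a genome.

    expr:      a K-number string, or a tuple ("and" | "or", sub-expressions...)
    annotated: set of K numbers assigned to genes in the genome
    """
    if isinstance(expr, str):
        # Leaf: the ortholog group is present if some gene carries it.
        return expr in annotated
    op, *args = expr
    if op == "and":
        return all(module_is_complete(a, annotated) for a in args)
    if op == "or":
        return any(module_is_complete(a, annotated) for a in args)
    raise ValueError(f"unknown operator: {op!r}")

# Hypothetical module: step A, then either B or C, then D.
module = ("and", "K_A", ("or", "K_B", "K_C"), "K_D")

print(module_is_complete(module, {"K_A", "K_C", "K_D"}))  # alternative step satisfied
print(module_is_complete(module, {"K_A", "K_D"}))         # middle step missing
```

Separating the expression from the annotation set means the same module definition can be tested against many genomes without re-parsing.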

  17. Algorithms for effective querying of compound graph-based pathway databases

    PubMed Central

    2009-01-01

    Background Graph-based pathway ontologies and databases are widely used to represent data about cellular processes. This representation makes it possible to programmatically integrate cellular networks and to investigate them using the well-understood concepts of graph theory in order to predict their structural and dynamic properties. An extension of this graph representation, namely hierarchically structured or compound graphs, in which a member of a biological network may recursively contain a sub-network of a somehow logically similar group of biological objects, provides many additional benefits for analysis of biological pathways, including reduction of complexity by decomposition into distinct components or modules. In this regard, it is essential to effectively query such integrated large compound networks to extract the sub-networks of interest with the help of efficient algorithms and software tools. Results Towards this goal, we developed a querying framework, along with a number of graph-theoretic algorithms from simple neighborhood queries to shortest paths to feedback loops, that is applicable to all sorts of graph-based pathway databases, from PPIs (protein-protein interactions) to metabolic and signaling pathways. The framework is unique in that it can account for compound or nested structures and ubiquitous entities present in the pathway data. In addition, the queries may be related to each other through "AND" and "OR" operators, and can be recursively organized into a tree, in which the result of one query might be a source and/or target for another, to form more complex queries. The algorithms were implemented within the querying component of a new version of the software tool PATIKAweb (Pathway Analysis Tool for Integration and Knowledge Acquisition) and have proven useful for answering a number of biologically significant questions for large graph-based pathway databases. Conclusion The PATIKA Project Web site is http
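
Two of the query types named in this abstract, neighborhood and shortest path, can be sketched with breadth-first search on a plain directed graph. This is a simplified illustration only: it does not handle the compound (nested) structures or ubiquitous entities the abstract describes, it is not the PATIKAweb implementation, and the graph and node names are hypothetical.

```python
from collections import deque

def neighborhood(graph, sources, k):
    """Return all nodes within k edges of any source (sources included)."""
    dist = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        node = queue.popleft()
        if dist[node] == k:
            continue  # do not expand beyond the distance limit
        for nxt in graph.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return set(dist)

def shortest_path(graph, source, target):
    """Return one shortest directed path as a list of nodes, or None."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:  # walk back to the source
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in graph.get(node, ()):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None

# Hypothetical pathway graph as an adjacency list.
graph = {"A": ["B"], "B": ["C", "D"], "C": ["E"], "D": ["E"], "E": []}

print(sorted(neighborhood(graph, ["A"], 2)))
print(shortest_path(graph, "A", "E"))
```

Because each query takes node sets in and returns node sets or paths, the result of one query can be passed as the source of another, which mirrors the query composition the abstract mentions.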

  18. A New Methodology to Associate SNPs with Human Diseases According to Their Pathway Related Context

    PubMed Central

    Bakir-Gungor, Burcu; Sezerman, Osman Ugur

    2011-01-01

    Genome-wide association studies (GWAS) with hundreds of thousands of single nucleotide polymorphisms (SNPs) are popular strategies to reveal the genetic basis of human complex diseases. Despite many successes of GWAS, it is well recognized that new analytical approaches have to be integrated to achieve their full potential. Starting with a list of SNPs, found to be associated with disease in GWAS, here we propose a novel methodology to devise functionally important KEGG pathways through the identification of genes within these pathways, where these genes are obtained from SNP analysis. Our methodology is based on functionalization of important SNPs to identify affected genes and disease related pathways. We have tested our methodology on WTCCC Rheumatoid Arthritis (RA) dataset and identified: i) previously known RA related KEGG pathways (e.g., Toll-like receptor signaling, Jak-STAT signaling, Antigen processing, Leukocyte transendothelial migration and MAPK signaling pathways); ii) additional KEGG pathways (e.g., Pathways in cancer, Neurotrophin signaling, Chemokine signaling pathways) as associated with RA. Furthermore, these newly found pathways included genes which are targets of RA-specific drugs. Even though GWAS analysis identifies 14 out of 83 of those drug target genes, newly found functionally important KEGG pathways led to the discovery of 25 out of 83 genes, known to be used as drug targets for the treatment of RA. Among the previously known pathways, we identified additional genes associated with RA (e.g. Antigen processing and presentation, Tight junction). Importantly, within these pathways, the associations between some of these additionally found genes, such as HLA-C, HLA-G, PRKCQ, PRKCZ, TAP1, TAP2 and RA were verified by either OMIM database or by literature retrieved from the NCBI PubMed module. With the whole-genome sequencing on the horizon, we show that the full potential of GWAS can be achieved by integrating pathway and network

  19. Pathway databases and tools for their exploitation: benefits, current limitations and challenges

    PubMed Central

    Bauer-Mehren, Anna; Furlong, Laura I; Sanz, Ferran

    2009-01-01

    In past years, comprehensive representations of cell signalling pathways have been developed by manual curation from literature, which requires huge effort and would benefit from information stored in databases and from automatic retrieval and integration methods. Once a reconstruction of the network of interactions is achieved, analysis of its structural features and its dynamic behaviour can take place. Mathematical modelling techniques are used to simulate the complex behaviour of cell signalling networks, which ultimately sheds light on the mechanisms leading to complex diseases or helps in the identification of drug targets. A variety of databases containing information on cell signalling pathways have been developed in conjunction with methodologies to access and analyse the data. In principle, the scenario is prepared to make the most of this information for the analysis of the dynamics of signalling pathways. However, are the knowledge repositories of signalling pathways ready to realize the systems biology promise? In this article we aim to initiate this discussion and to provide some insights on this issue. PMID:19638971

  20. FragariaCyc: A Metabolic Pathway Database for Woodland Strawberry Fragaria vesca.

    PubMed

    Naithani, Sushma; Partipilo, Christina M; Raja, Rajani; Elser, Justin L; Jaiswal, Pankaj

    2016-01-01

    FragariaCyc is a strawberry-specific cellular metabolic network based on the annotated genome sequence of Fragaria vesca L. ssp. vesca, accession Hawaii 4. It was built on the Pathway-Tools platform using MetaCyc as the reference. Experimental evidence from published literature was used for supporting/editing existing entities and for the addition of new pathways, enzymes, reactions, compounds, and small molecules in the database. To date, FragariaCyc comprises 66 super-pathways, 488 unique pathways, 2348 metabolic reactions, 3507 enzymes, and 2134 compounds. In addition to searching and browsing FragariaCyc, researchers can compare pathways across various plant metabolic networks and analyze their data using the Omics Viewer tool. We view FragariaCyc as a resource for the community of researchers working with strawberry and related fruit crops. It can help in understanding the regulation of the overall metabolism of the strawberry plant during development and in response to diseases and abiotic stresses. FragariaCyc is available online at http://pathways.cgrb.oregonstate.edu. PMID:26973684

  1. FragariaCyc: A Metabolic Pathway Database for Woodland Strawberry Fragaria vesca

    PubMed Central

    Naithani, Sushma; Partipilo, Christina M.; Raja, Rajani; Elser, Justin L.; Jaiswal, Pankaj

    2016-01-01

    FragariaCyc is a strawberry-specific cellular metabolic network based on the annotated genome sequence of Fragaria vesca L. ssp. vesca, accession Hawaii 4. It was built on the Pathway-Tools platform using MetaCyc as the reference. Experimental evidence from published literature was used for supporting/editing existing entities and for the addition of new pathways, enzymes, reactions, compounds, and small molecules in the database. To date, FragariaCyc comprises 66 super-pathways, 488 unique pathways, 2348 metabolic reactions, 3507 enzymes, and 2134 compounds. In addition to searching and browsing FragariaCyc, researchers can compare pathways across various plant metabolic networks and analyze their data using the Omics Viewer tool. We view FragariaCyc as a resource for the community of researchers working with strawberry and related fruit crops. It can help in understanding the regulation of the overall metabolism of the strawberry plant during development and in response to diseases and abiotic stresses. FragariaCyc is available online at http://pathways.cgrb.oregonstate.edu. PMID:26973684

  2. Comprehensive analysis of the N-glycan biosynthetic pathway using bioinformatics to generate UniCorn: A theoretical N-glycan structure database.

    PubMed

    Akune, Yukie; Lin, Chi-Hung; Abrahams, Jodie L; Zhang, Jingyu; Packer, Nicolle H; Aoki-Kinoshita, Kiyoko F; Campbell, Matthew P

    2016-08-01

    Glycan structures attached to proteins are comprised of diverse monosaccharide sequences and linkages that are produced from precursor nucleotide-sugars by a series of glycosyltransferases. Databases of these structures are an essential resource for the interpretation of analytical data and the development of bioinformatics tools. However, with no template to predict what structures are possible the human glycan structure databases are incomplete and rely heavily on the curation of published, experimentally determined, glycan structure data. In this work, a library of 45 human glycosyltransferases was used to generate a theoretical database of N-glycan structures comprised of 15 or less monosaccharide residues. Enzyme specificities were sourced from major online databases including Kyoto Encyclopedia of Genes and Genomes (KEGG) Glycan, Consortium for Functional Glycomics (CFG), Carbohydrate-Active enZymes (CAZy), GlycoGene DataBase (GGDB) and BRENDA. Based on the known activities, more than 1.1 million theoretical structures and 4.7 million synthetic reactions were generated and stored in our database called UniCorn. Furthermore, we analyzed the differences between the predicted glycan structures in UniCorn and those contained in UniCarbKB (www.unicarbkb.org), a database which stores experimentally described glycan structures reported in the literature, and demonstrate that UniCorn can be used to aid in the assignment of ambiguous structures whilst also serving as a discovery database. PMID:27318307

  3. DemaDb: an integrated dematiaceous fungal genomes database

    PubMed Central

    Kuan, Chee Sian; Yew, Su Mei; Chan, Chai Ling; Toh, Yue Fen; Lee, Kok Wei; Cheong, Wei-Hien; Yee, Wai-Yan; Hoh, Chee-Choong; Yap, Soon-Joo; Ng, Kee Peng

    2016-01-01

    Many species of dematiaceous fungi are associated with allergic reactions and potentially fatal diseases in humans, especially in tropical climates. Over the past 10 years, we have isolated more than 400 dematiaceous fungi from various clinical samples. In this study, DemaDb, an integrated database, was designed to support the integration and analysis of dematiaceous fungal genomes. A total of 92 072 putative genes and 6527 pathways that were identified in eight dematiaceous fungi (Bipolaris papendorfii UM 226, Daldinia eschscholtzii UM 1400, D. eschscholtzii UM 1020, Pyrenochaeta unguis-hominis UM 256, Ochroconis mirabilis UM 578, Cladosporium sphaerospermum UM 843, Herpotrichiellaceae sp. UM 238 and Pleosporales sp. UM 1110) were deposited in DemaDb. DemaDb includes functional annotations for all predicted gene models in all genomes, such as Gene Ontology, EuKaryotic Orthologous Groups, Kyoto Encyclopedia of Genes and Genomes (KEGG), Pfam and InterProScan. All predicted protein models were further functionally annotated to Carbohydrate-Active enzymes, peptidases, secondary metabolites and virulence factors. DemaDb Genome Browser enables users to browse and visualize entire genomes with annotation data including gene prediction, structure, orientation and custom feature tracks. The Pathway Browser based on the KEGG pathway database allows users to look into molecular interaction and reaction networks for all KEGG annotated genes. The availability of downloadable files containing assembly, nucleic acid, as well as protein data allows the direct retrieval for further downstream works. DemaDb is a useful resource for the fungal research community, especially those involved in genome-scale analysis, functional genomics, genetics and disease studies of dematiaceous fungi. Database URL: http://fungaldb.um.edu.my PMID:26980516

  4. DemaDb: an integrated dematiaceous fungal genomes database.

    PubMed

    Kuan, Chee Sian; Yew, Su Mei; Chan, Chai Ling; Toh, Yue Fen; Lee, Kok Wei; Cheong, Wei-Hien; Yee, Wai-Yan; Hoh, Chee-Choong; Yap, Soon-Joo; Ng, Kee Peng

    2016-01-01

    Many species of dematiaceous fungi are associated with allergic reactions and potentially fatal diseases in humans, especially in tropical climates. Over the past 10 years, we have isolated more than 400 dematiaceous fungi from various clinical samples. In this study, DemaDb, an integrated database, was designed to support the integration and analysis of dematiaceous fungal genomes. A total of 92 072 putative genes and 6527 pathways that were identified in eight dematiaceous fungi (Bipolaris papendorfii UM 226, Daldinia eschscholtzii UM 1400, D. eschscholtzii UM 1020, Pyrenochaeta unguis-hominis UM 256, Ochroconis mirabilis UM 578, Cladosporium sphaerospermum UM 843, Herpotrichiellaceae sp. UM 238 and Pleosporales sp. UM 1110) were deposited in DemaDb. DemaDb includes functional annotations for all predicted gene models in all genomes, such as Gene Ontology, EuKaryotic Orthologous Groups, Kyoto Encyclopedia of Genes and Genomes (KEGG), Pfam and InterProScan. All predicted protein models were further functionally annotated to Carbohydrate-Active enzymes, peptidases, secondary metabolites and virulence factors. DemaDb Genome Browser enables users to browse and visualize entire genomes with annotation data including gene prediction, structure, orientation and custom feature tracks. The Pathway Browser based on the KEGG pathway database allows users to look into molecular interaction and reaction networks for all KEGG annotated genes. The availability of downloadable files containing assembly, nucleic acid, as well as protein data allows the direct retrieval for further downstream works. DemaDb is a useful resource for the fungal research community, especially those involved in genome-scale analysis, functional genomics, genetics and disease studies of dematiaceous fungi. Database URL: http://fungaldb.um.edu.my. PMID:26980516

  5. Multiomics in grape berry skin revealed specific induction of the stilbene synthetic pathway by ultraviolet-C irradiation.

    PubMed

    Suzuki, Mami; Nakabayashi, Ryo; Ogata, Yoshiyuki; Sakurai, Nozomu; Tokimatsu, Toshiaki; Goto, Susumu; Suzuki, Makoto; Jasinski, Michal; Martinoia, Enrico; Otagaki, Shungo; Matsumoto, Shogo; Saito, Kazuki; Shiratake, Katsuhiro

    2015-05-01

    Grape (Vitis vinifera) accumulates various polyphenolic compounds, which protect against environmental stresses, including ultraviolet-C (UV-C) light and pathogens. In this study, we looked at the transcriptome and metabolome in grape berry skin after UV-C irradiation, which demonstrated the effectiveness of omics approaches to clarify important traits of grape. We performed transcriptome analysis using a genome-wide microarray, which revealed 238 genes up-regulated more than 5-fold by UV-C light. Enrichment analysis of Gene Ontology terms showed that genes encoding stilbene synthase, a key enzyme for resveratrol synthesis, were enriched in the up-regulated genes. We performed metabolome analysis using liquid chromatography-quadrupole time-of-flight mass spectrometry, and 2,012 metabolite peaks, including unidentified peaks, were detected. Principal component analysis using the peaks showed that only one metabolite peak, identified as resveratrol, was highly induced by UV-C light. We updated the metabolic pathway map of grape in the Kyoto Encyclopedia of Genes and Genomes (KEGG) database and in the KaPPA-View 4 KEGG system, then projected the transcriptome and metabolome data on a metabolic pathway map. The map showed specific induction of the resveratrol synthetic pathway by UV-C light. Our results showed that multiomics is a powerful tool to elucidate the accumulation mechanisms of secondary metabolites, and updated systems, such as KEGG and KaPPA-View 4 KEGG for grape, can support such studies. PMID:25761715

  6. Multiomics in Grape Berry Skin Revealed Specific Induction of the Stilbene Synthetic Pathway by Ultraviolet-C Irradiation

    PubMed Central

    Suzuki, Mami; Nakabayashi, Ryo; Ogata, Yoshiyuki; Sakurai, Nozomu; Tokimatsu, Toshiaki; Goto, Susumu; Suzuki, Makoto; Jasinski, Michal; Martinoia, Enrico; Otagaki, Shungo; Matsumoto, Shogo; Saito, Kazuki; Shiratake, Katsuhiro

    2015-01-01

    Grape (Vitis vinifera) accumulates various polyphenolic compounds, which protect against environmental stresses, including ultraviolet-C (UV-C) light and pathogens. In this study, we looked at the transcriptome and metabolome in grape berry skin after UV-C irradiation, which demonstrated the effectiveness of omics approaches to clarify important traits of grape. We performed transcriptome analysis using a genome-wide microarray, which revealed 238 genes up-regulated more than 5-fold by UV-C light. Enrichment analysis of Gene Ontology terms showed that genes encoding stilbene synthase, a key enzyme for resveratrol synthesis, were enriched in the up-regulated genes. We performed metabolome analysis using liquid chromatography-quadrupole time-of-flight mass spectrometry, and 2,012 metabolite peaks, including unidentified peaks, were detected. Principal component analysis using the peaks showed that only one metabolite peak, identified as resveratrol, was highly induced by UV-C light. We updated the metabolic pathway map of grape in the Kyoto Encyclopedia of Genes and Genomes (KEGG) database and in the KaPPA-View 4 KEGG system, then projected the transcriptome and metabolome data on a metabolic pathway map. The map showed specific induction of the resveratrol synthetic pathway by UV-C light. Our results showed that multiomics is a powerful tool to elucidate the accumulation mechanisms of secondary metabolites, and updated systems, such as KEGG and KaPPA-View 4 KEGG for grape, can support such studies. PMID:25761715

  7. Enhancing a Pathway-Genome Database (PGDB) to Capture Subcellular Localization of Metabolites and Enzymes: The Nucleotide-Sugar Biosynthetic Pathways of Populus trichocarpa

    SciTech Connect

    Nag, A.; Karpinets, T. V.; Chang, C. H.; Bar-Peled, M.

    2012-01-01

    Understanding how cellular metabolism works and is regulated requires that the underlying biochemical pathways be adequately represented and integrated with large metabolomic data sets to establish a robust network model. Genetically engineering energy crops to be less recalcitrant to saccharification requires detailed knowledge of plant polysaccharide structures and a thorough understanding of the metabolic pathways involved in forming and regulating cell-wall synthesis. Nucleotide-sugars are building blocks for synthesis of cell wall polysaccharides. The biosynthesis of nucleotide-sugars is catalyzed by a multitude of enzymes that reside in different subcellular organelles, and precise representation of these pathways requires accurate capture of this biological compartmentalization. The lack of simple localization cues in genomic sequence data and annotations however leads to missing compartmentalization information for eukaryotes in automatically generated databases, such as the Pathway-Genome Databases (PGDBs) of the SRI Pathway Tools software that drives much biochemical knowledge representation on the internet. In this report, we provide an informal mechanism using the existing Pathway Tools framework to integrate protein and metabolite sub-cellular localization data with the existing representation of the nucleotide-sugar metabolic pathways in a prototype PGDB for Populus trichocarpa. The enhanced pathway representations have been successfully used to map SNP abundance data to individual nucleotide-sugar biosynthetic genes in the PGDB. The manually curated pathway representations are more conducive to the construction of a computational platform that will allow the simulation of natural and engineered nucleotide-sugar precursor fluxes into specific recalcitrant polysaccharide(s).

  8. Pathway Network Analyses for Autism Reveal Multisystem Involvement, Major Overlaps with Other Diseases and Convergence upon MAPK and Calcium Signaling.

    PubMed

    Wen, Ya; Alshikho, Mohamad J; Herbert, Martha R

    2016-01-01

    We used established databases in standard ways to systematically characterize gene ontologies, pathways and functional linkages in the large set of genes now associated with autism spectrum disorders (ASDs). These conditions are particularly challenging--they lack clear pathognomonic biological markers, they involve great heterogeneity across multiple levels (genes, systemic biological and brain characteristics, and nuances of behavioral manifestations)-and yet everyone with this diagnosis meets the same defining behavioral criteria. Using the human gene list from Simons Foundation Autism Research Initiative (SFARI) we performed gene set enrichment analysis with the Kyoto Encyclopedia of Genes and Genomes (KEGG) Pathway Database, and then derived a pathway network from pathway-pathway functional interactions again in reference to KEGG. Through identifying the GO (Gene Ontology) groups in which SFARI genes were enriched, mapping the coherence between pathways and GO groups, and ranking the relative strengths of representation of pathway network components, we 1) identified 10 disease-associated and 30 function-associated pathways 2) revealed calcium signaling pathway and neuroactive ligand-receptor interaction as the most enriched, statistically significant pathways from the enrichment analysis, 3) showed calcium signaling pathways and MAPK signaling pathway to be interactive hubs with other pathways and also to be involved with pervasively present biological processes, 4) found convergent indications that the process "calcium-PRC (protein kinase C)-Ras-Raf-MAPK/ERK" is likely a major contributor to ASD pathophysiology, and 5) noted that perturbations associated with KEGG's category of environmental information processing were common. These findings support the idea that ASD-associated genes may contribute not only to core features of ASD themselves but also to vulnerability to other chronic and systemic problems potentially including cancer, metabolic conditions

  9. Retroviral insertions in the VISION database identify molecular pathways in mouse lymphoid leukemia and lymphoma

    PubMed Central

    Weiser, Keith C.; Liu, Bin; Hansen, Gwenn M.; Skapura, Darlene; Hentges, Kathryn E.; Yarlagadda, Sujatha; Morse III, Herbert C.

    2007-01-01

    AKXD recombinant inbred (RI) strains develop a variety of leukemias and lymphomas due to somatically acquired insertions of retroviral DNA into the genome of hematopoietic cells that can mutate cellular proto-oncogenes and tumor suppressor genes. We generated a new set of tumors from nine AKXD RI strains selected for their propensity to develop B-cell tumors, the most common type of human hematopoietic cancers. We employed a PCR technique called viral insertion site amplification (VISA) to rapidly isolate genomic sequence at the site of provirus insertion. Here we describe 550 VISA sequence tags (VSTs) that identify 74 common insertion sites (CISs), of which 21 have not been identified previously. Several suspected proto-oncogenes and tumor suppressor genes lie near CISs, providing supportive evidence for their roles in cancer. Furthermore, numerous previously uncharacterized genes lie near CISs, providing a pool of candidate disease genes for future research. Pathway analysis of candidate genes identified several signaling pathways as common and powerful routes to blood cancer, including Notch, E-protein, NFκB, and Ras signaling. Misregulation of several Notch signaling genes was confirmed by quantitative RT-PCR. Our data suggest that analyses of insertional mutagenesis on a single genetic background are biased toward the identification of cooperating mutations. This tumor collection represents the most comprehensive study of the genetics of B-cell leukemia and lymphoma development in mice. We have deposited the VST sequences, CISs in a genome viewer, histopathology, and molecular tumor typing data in a public web database called VISION (Viral Insertion Sites Identifying Oncogenes), which is located at http://www.mouse-genome.bcm.tmc.edu/vision. PMID:17926094

  10. The pathway ontology – updates and applications

    PubMed Central

    2014-01-01

    Background: The Pathway Ontology (PW), developed at the Rat Genome Database (RGD), covers all types of biological pathways, including altered and disease pathways, and captures the relationships between them within the hierarchical structure of a directed acyclic graph. The ontology allows for the standardized annotation of rat, and of human and mouse genes to pathway terms. It also constitutes a vehicle for easy navigation between gene and ontology report pages, between reports and interactive pathway diagrams, between pathways directly connected within a diagram and between those that are globally related in pathway suites and suite networks. Surveys of the literature and the development of the Pathway and Disease Portals are important sources for the ongoing development of the ontology. User requests and mapping of pathways in other databases to terms in the ontology further contribute to increasing its content. Recently built automated pipelines use the mapped terms to make available the annotations generated by other groups. Results: The two released pipelines, the Pathway Interaction Database (PID) Annotation Import Pipeline and the Kyoto Encyclopedia of Genes and Genomes (KEGG) Annotation Import Pipeline, make available over 7,400 and 31,000 pathway gene annotations, respectively. Building the PID pipeline led to the addition of new terms within the signaling node, also augmented by the release of the RGD “Immune and Inflammatory Disease Portal” at that time. Building the KEGG pipeline led to a substantial increase in the number of disease pathway terms, such as those within the ‘infectious disease pathway’ parent term category. The ‘drug pathway’ node has also seen increases in the number of terms as well as a restructuring of the node. Literature surveys, disease portal deployments and user requests have contributed and continue to contribute additional new terms across the ontology. Since first presented, the content of PW has increased by

  11. In silico database screening of potential targets and pathways of compounds contained in plants used for psoriasis vulgaris.

    PubMed

    May, Brian H; Deng, Shiqiang; Zhang, Anthony L; Lu, Chuanjian; Xue, Charlie C L

    2015-09-01

    Reviews and meta-analyses of clinical trials identified plants used as traditional medicines (TMs) that show promise for psoriasis. These include Rehmannia glutinosa, Camptotheca acuminata, Indigo naturalis and Salvia miltiorrhiza. Compounds contained in these TMs have shown activities of relevance to psoriasis in experimental models. To further investigate the likely mechanisms of action of the multiple compounds in these TMs, we undertook a computer-based in silico investigation of the proteins known to be regulated by these compounds and their associated biological pathways. The proteins reportedly regulated by compounds in these four TMs were identified using the HIT (Herbal Ingredients' Targets) database. The resultant data were entered into the PANTHER (Protein ANnotation THrough Evolutionary Relationship) database to identify the pathways in which the proteins could be involved. The study identified 237 compounds in the TMs and these retrieved 287 proteins from HIT. These proteins identified 59 pathways in PANTHER with most proteins being located in the Apoptosis, Angiogenesis, Inflammation mediated by chemokine and cytokine, Gonadotropin releasing hormone receptor, and/or Interleukin signaling pathways. All four TMs contained compounds that had regulating effects on Apoptosis regulator BAX, Apoptosis regulator Bcl-2, Caspase-3, Tumor necrosis factor (TNF) or Prostaglandin G/H synthase 2 (COX2). The main proteins and pathways are primarily related to inflammation, proliferation and angiogenesis which are all processes involved in psoriasis. Experimental studies have reported that certain compounds from these TMs can regulate the expression of proteins involved in each of these pathways. PMID:26142738

  12. A novel method to identify pathways associated with renal cell carcinoma based on a gene co-expression network.

    PubMed

    Ruan, Xiyun; Li, Hongyun; Liu, Bo; Chen, Jie; Zhang, Shibao; Sun, Zeqiang; Liu, Shuangqing; Sun, Fahai; Liu, Qingyong

    2015-08-01

    The aim of the present study was to develop a novel method for identifying pathways associated with renal cell carcinoma (RCC) based on a gene co-expression network. A framework was established where a co-expression network was derived from the database as well as various co-expression approaches. First, the backbone of the network based on differentially expressed (DE) genes between RCC patients and normal controls was constructed by the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) database. The differentially co-expressed links were detected by Pearson's correlation, the empirical Bayesian (EB) approach and Weighted Gene Co-expression Network Analysis (WGCNA). The co-expressed gene pairs were merged by a rank-based algorithm. We obtained 842; 371; 2,883; and 1,595 co-expressed gene pairs from the co-expression networks of the STRING database, Pearson's correlation, the EB method and WGCNA, respectively. Two hundred and eighty-one differentially co-expressed (DC) gene pairs were obtained from the merged network using this novel method. Pathway enrichment analysis based on the Kyoto Encyclopedia of Genes and Genomes (KEGG) database and the network enrichment analysis (NEA) method were performed to verify the feasibility of the merged method. Results of the KEGG and NEA pathway analyses showed that the network was associated with RCC. The suggested method was computationally efficient in identifying pathways associated with RCC and has been identified as a useful complement to traditional co-expression analysis. PMID:26058425
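
    As a rough illustration of the Pearson's correlation step mentioned in this abstract, the sketch below scores every gene pair in a small expression table and keeps pairs whose absolute correlation passes a cutoff. The gene names, expression values, and the 0.9 threshold are hypothetical, and the sketch covers plain co-expression within one condition only; it is not the study's differential co-expression procedure or its rank-based merging step.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def coexpressed_pairs(expr, threshold=0.9):
    """Return (gene1, gene2, r) for gene pairs with |r| >= threshold.

    expr maps a gene name to its expression values across samples.
    The threshold is an illustrative assumption, not a value from the study.
    """
    pairs = []
    for g1, g2 in combinations(sorted(expr), 2):
        r = pearson(expr[g1], expr[g2])
        if abs(r) >= threshold:
            pairs.append((g1, g2, r))
    return pairs


# Hypothetical expression values for three genes across four samples.
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 1.0, 3.0, 2.0],
}
pairs = coexpressed_pairs(expr)
```

    In this toy table only the geneA/geneB pair passes the cutoff, since geneB is an exact multiple of geneA while geneC is only weakly anti-correlated with both.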

  13. iPathCons and iPathDB: an improved insect pathway construction tool and the database

    PubMed Central

    Zhang, Zan; Yin, Chuanlin; Liu, Ying; Jie, Wencai; Lei, Wenjie; Li, Fei

    2014-01-01

    Insects are one of the most successful animal groups on earth. Some insects, such as the silkworm and honeybee, are beneficial to humans, whereas others are notorious pests of crops. At present, the genomes of 38 insects have been sequenced and made publicly available. In addition, the transcriptomes of dozens of insects have been sequenced. As gene data rapidly accumulate, constructing the pathway of molecular interactions becomes increasingly important for entomological research. Here, we developed an improved tool, iPathCons, for knowledge-based construction of pathways from the transcriptomes or the official gene sets of genomes. Considering the high evolution diversity in insects, iPathCons uses a voting system for Kyoto Encyclopedia of Genes and Genomes Orthology assignment. Both stand-alone software and a web server of iPathCons are provided. Using iPathCons, we constructed the pathways of molecular interactions of 52 insects, including 37 genome-sequenced and 15 transcriptome-sequenced ones. These pathways are available in the iPathDB, which provides searches, web server, data downloads, etc. This database will be highly useful for the insect research community. Database URL: http://ento.njau.edu.cn/ipath/ PMID:25388589
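
    The abstract does not describe how the iPathCons voting system works. As one possible reading, the sketch below assigns a KEGG Orthology (KO) identifier to a gene only when a strict majority of its reference hits agree. The function name, the majority rule, the `min_fraction` parameter, and the KO identifiers in the example are all illustrative assumptions, not the tool's actual algorithm.

```python
from collections import Counter


def vote_ko(hits, min_fraction=0.5):
    """Pick the KO id backed by more than min_fraction of all hits.

    hits: one entry per reference hit for a query gene; each entry is a
    KO identifier string, or None when that hit carries no KO annotation.
    Returns the winning KO id, or None when no id has enough support.
    """
    votes = Counter(ko for ko in hits if ko is not None)
    if not votes:
        return None
    ko, count = votes.most_common(1)[0]
    # Unannotated hits count against the winner because the denominator
    # is the total number of hits, not the number of annotated hits.
    return ko if count / len(hits) > min_fraction else None


# Hypothetical hits for one query gene: three agree, one disagrees.
assignment = vote_ko(["K00844", "K00844", "K00844", "K12407"])
```

    Requiring a strict majority means a tie between two KO identifiers yields no assignment, which is a conservative choice when reference annotations disagree.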

  14. PathCase-SB architecture and database design

    PubMed Central

    2011-01-01

    Background: Integration of metabolic pathways resources and regulatory metabolic network models, and deploying new tools on the integrated platform can help perform more effective and more efficient systems biology research on understanding the regulation in metabolic networks. Therefore, the tasks of (a) integrating under a single database environment regulatory metabolic networks and existing models, and (b) building tools to help with modeling and analysis are desirable and intellectually challenging computational tasks. Description: PathCase Systems Biology (PathCase-SB) is built and released. The PathCase-SB database provides data and API for multiple user interfaces and software tools. The current PathCase-SB system provides a database-enabled framework and web-based computational tools towards facilitating the development of kinetic models for biological systems. PathCase-SB aims to integrate data of selected biological data sources on the web (currently, BioModels database and KEGG), and to provide more powerful and/or new capabilities via the new web-based integrative framework. This paper describes architecture and database design issues encountered in PathCase-SB's design and implementation, and presents the current design of PathCase-SB's architecture and database. Conclusions: PathCase-SB architecture and database provide a highly extensible and scalable environment with easy and fast (real-time) access to the data in the database. PathCase-SB itself is already being used by researchers across the world. PMID:22070889

  15. Pathway Network Analyses for Autism Reveal Multisystem Involvement, Major Overlaps with Other Diseases and Convergence upon MAPK and Calcium Signaling

    PubMed Central

    Wen, Ya; Alshikho, Mohamad J.; Herbert, Martha R.

    2016-01-01

    We used established databases in standard ways to systematically characterize gene ontologies, pathways and functional linkages in the large set of genes now associated with autism spectrum disorders (ASDs). These conditions are particularly challenging—they lack clear pathognomonic biological markers, they involve great heterogeneity across multiple levels (genes, systemic biological and brain characteristics, and nuances of behavioral manifestations)—and yet everyone with this diagnosis meets the same defining behavioral criteria. Using the human gene list from Simons Foundation Autism Research Initiative (SFARI) we performed gene set enrichment analysis with the Kyoto Encyclopedia of Genes and Genomes (KEGG) Pathway Database, and then derived a pathway network from pathway-pathway functional interactions again in reference to KEGG. Through identifying the GO (Gene Ontology) groups in which SFARI genes were enriched, mapping the coherence between pathways and GO groups, and ranking the relative strengths of representation of pathway network components, we 1) identified 10 disease-associated and 30 function-associated pathways 2) revealed calcium signaling pathway and neuroactive ligand-receptor interaction as the most enriched, statistically significant pathways from the enrichment analysis, 3) showed calcium signaling pathways and MAPK signaling pathway to be interactive hubs with other pathways and also to be involved with pervasively present biological processes, 4) found convergent indications that the process “calcium-PRC (protein kinase C)-Ras-Raf-MAPK/ERK” is likely a major contributor to ASD pathophysiology, and 5) noted that perturbations associated with KEGG’s category of environmental information processing were common. These findings support the idea that ASD-associated genes may contribute not only to core features of ASD themselves but also to vulnerability to other chronic and systemic problems potentially including cancer, metabolic

  16. WholePathwayScope: a comprehensive pathway-based analysis tool for high-throughput data

    PubMed Central

    Yi, Ming; Horton, Jay D; Cohen, Jonathan C; Hobbs, Helen H; Stephens, Robert M

    2006-01-01

    Background: Analysis of High Throughput (HTP) Data such as microarray and proteomics data has provided a powerful methodology to study patterns of gene regulation at genome scale. A major unresolved problem in the post-genomic era is to assemble the large amounts of data generated into a meaningful biological context. We have developed a comprehensive software tool, WholePathwayScope (WPS), for deriving biological insights from analysis of HTP data. Result: WPS extracts gene lists with shared biological themes through color cue templates. WPS statistically evaluates global functional category enrichment of gene lists and pathway-level pattern enrichment of data. WPS incorporates well-known biological pathways from KEGG (Kyoto Encyclopedia of Genes and Genomes) and Biocarta, GO (Gene Ontology) terms as well as user-defined pathways or relevant gene clusters or groups, and explores gene-term relationships within the derived gene-term association networks (GTANs). WPS simultaneously compares multiple datasets within biological contexts either as pathways or as association networks. WPS also integrates Genetic Association Database and Partial MedGene Database for disease-association information. We have used this program to analyze and compare microarray and proteomics datasets derived from a variety of biological systems. Application examples demonstrated the capacity of WPS to significantly facilitate the analysis of HTP data for integrative discovery. Conclusion: This tool represents a pathway-based platform for discovery integration to maximize analysis power. The tool is freely available at . PMID:16423281

  17. Using Bioinformatic Approaches to Identify Pathways Targeted by Human Leukemogens

    PubMed Central

    Thomas, Reuben; Phuong, Jimmy; McHale, Cliona M.; Zhang, Luoping

    2012-01-01

    We have applied bioinformatic approaches to identify pathways common to chemical leukemogens and to determine whether leukemogens could be distinguished from non-leukemogenic carcinogens. From all known and probable carcinogens classified by IARC and NTP, we identified 35 carcinogens that were associated with leukemia risk in human studies and 16 non-leukemogenic carcinogens. Using data on gene/protein targets available in the Comparative Toxicogenomics Database (CTD) for 29 of the leukemogens and 11 of the non-leukemogenic carcinogens, we analyzed for enrichment of all 250 human biochemical pathways in the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. The top pathways targeted by the leukemogens included metabolism of xenobiotics by cytochrome P450, glutathione metabolism, neurotrophin signaling pathway, apoptosis, MAPK signaling, Toll-like receptor signaling and various cancer pathways. The 29 leukemogens formed 18 distinct clusters comprising 1 to 3 chemicals that did not correlate with known mechanism of action or with structural similarity as determined by 2D Tanimoto coefficients in the PubChem database. Unsupervised clustering and one-class support vector machines, based on the pathway data, were unable to distinguish the 29 leukemogens from 11 non-leukemogenic known and probable IARC carcinogens. However, using two-class random forests to estimate leukemogen and non-leukemogen patterns, we estimated a 76% chance of distinguishing a random leukemogen/non-leukemogen pair from each other. PMID:22851955
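
    The structural-similarity measure named in this abstract, the Tanimoto coefficient, can be sketched as follows. This is a minimal illustration, assuming each chemical is represented by the set of "on" bit positions of a 2D fingerprint; the function name and the toy fingerprints are hypothetical and are not taken from PubChem or from the authors' analysis.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto (Jaccard) coefficient between two sets of 'on' fingerprint bits."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        # Convention chosen for this sketch: two empty fingerprints are identical.
        return 1.0
    return len(a & b) / len(union)


# Toy fingerprints (hypothetical bit positions, not real PubChem data).
chemical_x = {1, 2, 3}
chemical_y = {2, 3, 4}
similarity = tanimoto(chemical_x, chemical_y)
```

    A value of 1 means the two fingerprints share every bit and 0 means they share none; any clustering threshold applied on top of it is a separate choice.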

  18. TrypanoCyc: a community-led biochemical pathways database for Trypanosoma brucei

    PubMed Central

    Shameer, Sanu; Logan-Klumpler, Flora J.; Vinson, Florence; Cottret, Ludovic; Merlet, Benjamin; Achcar, Fiona; Boshart, Michael; Berriman, Matthew; Breitling, Rainer; Bringaud, Frédéric; Bütikofer, Peter; Cattanach, Amy M.; Bannerman-Chukualim, Bridget; Creek, Darren J.; Crouch, Kathryn; de Koning, Harry P.; Denise, Hubert; Ebikeme, Charles; Fairlamb, Alan H.; Ferguson, Michael A. J.; Ginger, Michael L.; Hertz-Fowler, Christiane; Kerkhoven, Eduard J.; Mäser, Pascal; Michels, Paul A. M.; Nayak, Archana; Nes, David W.; Nolan, Derek P.; Olsen, Christian; Silva-Franco, Fatima; Smith, Terry K.; Taylor, Martin C.; Tielens, Aloysius G. M.; Urbaniak, Michael D.; van Hellemond, Jaap J.; Vincent, Isabel M.; Wilkinson, Shane R.; Wyllie, Susan; Opperdoes, Fred R.; Barrett, Michael P.; Jourdan, Fabien

    2015-01-01

    The metabolic network of a cell represents the catabolic and anabolic reactions that interconvert small molecules (metabolites) through the activity of enzymes, transporters and non-catalyzed chemical reactions. Our understanding of individual metabolic networks is increasing as we learn more about the enzymes that are active in particular cells under particular conditions and as technologies advance to allow detailed measurements of the cellular metabolome. Metabolic network databases are of increasing importance in allowing us to contextualise data sets emerging from transcriptomic, proteomic and metabolomic experiments. Here we present a dynamic database, TrypanoCyc (http://www.metexplore.fr/trypanocyc/), which describes the generic and condition-specific metabolic network of Trypanosoma brucei, a parasitic protozoan responsible for human and animal African trypanosomiasis. In addition to enabling navigation through the BioCyc-based TrypanoCyc interface, we have also implemented a network-based representation of the information through MetExplore, yielding a novel environment in which to visualise the metabolism of this important parasite. PMID:25300491

  19. TrypanoCyc: a community-led biochemical pathways database for Trypanosoma brucei.

    PubMed

    Shameer, Sanu; Logan-Klumpler, Flora J; Vinson, Florence; Cottret, Ludovic; Merlet, Benjamin; Achcar, Fiona; Boshart, Michael; Berriman, Matthew; Breitling, Rainer; Bringaud, Frédéric; Bütikofer, Peter; Cattanach, Amy M; Bannerman-Chukualim, Bridget; Creek, Darren J; Crouch, Kathryn; de Koning, Harry P; Denise, Hubert; Ebikeme, Charles; Fairlamb, Alan H; Ferguson, Michael A J; Ginger, Michael L; Hertz-Fowler, Christiane; Kerkhoven, Eduard J; Mäser, Pascal; Michels, Paul A M; Nayak, Archana; Nes, David W; Nolan, Derek P; Olsen, Christian; Silva-Franco, Fatima; Smith, Terry K; Taylor, Martin C; Tielens, Aloysius G M; Urbaniak, Michael D; van Hellemond, Jaap J; Vincent, Isabel M; Wilkinson, Shane R; Wyllie, Susan; Opperdoes, Fred R; Barrett, Michael P; Jourdan, Fabien

    2015-01-01

    The metabolic network of a cell represents the catabolic and anabolic reactions that interconvert small molecules (metabolites) through the activity of enzymes, transporters and non-catalyzed chemical reactions. Our understanding of individual metabolic networks is increasing as we learn more about the enzymes that are active in particular cells under particular conditions and as technologies advance to allow detailed measurements of the cellular metabolome. Metabolic network databases are of increasing importance in allowing us to contextualise data sets emerging from transcriptomic, proteomic and metabolomic experiments. Here we present a dynamic database, TrypanoCyc (http://www.metexplore.fr/trypanocyc/), which describes the generic and condition-specific metabolic network of Trypanosoma brucei, a parasitic protozoan responsible for human and animal African trypanosomiasis. In addition to enabling navigation through the BioCyc-based TrypanoCyc interface, we have also implemented a network-based representation of the information through MetExplore, yielding a novel environment in which to visualise the metabolism of this important parasite. PMID:25300491

  20. PathwAX: a web server for network crosstalk based pathway annotation.

    PubMed

    Ogris, Christoph; Helleday, Thomas; Sonnhammer, Erik L L

    2016-07-01

    Pathway annotation of gene lists is often used to functionally analyse biomolecular data such as gene expression in order to establish which processes are activated in a given experiment. Databases such as KEGG or GO represent collections of how genes are known to be organized in pathways, and the challenge is to compare a given gene list with the known pathways such that all true relations are identified. Most tools apply statistical measures to the gene overlap between the gene list and pathway. It is however problematic to avoid false negatives and false positives when only using the gene overlap. The pathwAX web server (http://pathwAX.sbc.su.se/) applies a different approach which is based on network crosstalk. It uses the comprehensive network FunCoup to analyse network crosstalk between a query gene list and KEGG pathways. PathwAX runs the BinoX algorithm, which employs Monte-Carlo sampling of randomized networks and estimates a binomial distribution, for estimating the statistical significance of the crosstalk. This results in substantially higher accuracy than gene overlap methods. The system was optimized for speed and allows interactive web usage. We illustrate the usage and output of pathwAX. PMID:27151197
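
    The abstract states only that BinoX estimates a binomial distribution from randomized networks. As a hedged sketch of the last step of such a test, and not the BinoX implementation itself, the upper-tail probability of seeing at least k crosstalk links can be computed as below. The inputs n (number of possible links) and p (link probability, assumed here to have been estimated from the randomized networks) are hypothetical parameters of this illustration.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p).

    k: observed number of links between the query gene list and a pathway
    n: number of possible links (assumed input)
    p: per-link probability under the null (assumed to come from
       randomized networks; how it is estimated is not shown here)
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Toy numbers for illustration only.
p_value = binomial_upper_tail(2, 3, 0.5)
```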

  1. SubtiWiki–a database for the model organism Bacillus subtilis that links pathway, interaction and expression information.

    PubMed

    Michna, Raphael H; Commichau, Fabian M; Tödter, Dominik; Zschiedrich, Christopher P; Stülke, Jörg

    2014-01-01

    Genome annotation and access to information from large-scale experimental approaches at the genome level are essential to improve our understanding of living cells and organisms. This is even more the case for model organisms that are the basis to study pathogens and technologically important species. We have generated SubtiWiki, a database for the Gram-positive model bacterium Bacillus subtilis (http://subtiwiki.uni-goettingen.de/). In addition to the established companion modules of SubtiWiki, SubtiPathways and SubtInteract, we have now created SubtiExpress, a third module, to visualize genome scale transcription data that are of unprecedented quality and density. Today, SubtiWiki is one of the most complete collections of knowledge on a living organism in one single resource. PMID:24178028

  2. SubtiWiki–a database for the model organism Bacillus subtilis that links pathway, interaction and expression information

    PubMed Central

    Michna, Raphael H.; Commichau, Fabian M.; Tödter, Dominik; Zschiedrich, Christopher P.; Stülke, Jörg

    2014-01-01

    Genome annotation and access to information from large-scale experimental approaches at the genome level are essential to improve our understanding of living cells and organisms. This is even more the case for model organisms that are the basis to study pathogens and technologically important species. We have generated SubtiWiki, a database for the Gram-positive model bacterium Bacillus subtilis (http://subtiwiki.uni-goettingen.de/). In addition to the established companion modules of SubtiWiki, SubtiPathways and SubtInteract, we have now created SubtiExpress, a third module, to visualize genome scale transcription data that are of unprecedented quality and density. Today, SubtiWiki is one of the most complete collections of knowledge on a living organism in one single resource. PMID:24178028

  3. Genetics of Late-Onset Alzheimer's Disease: Update from the Alzgene Database and Analysis of Shared Pathways

    PubMed Central

    Olgiati, Paolo; Politis, Antonis M.; Papadimitriou, George N.; De Ronchi, Diana; Serretti, Alessandro

    2011-01-01

    The genetics of late-onset Alzheimer's disease (LOAD) has taken impressive steps forwards in the last few years. To date, more than six-hundred genes have been linked to the disorder. However, only a minority of them are supported by a sufficient level of evidence. This review focused on such genes and analyzed shared biological pathways. Genetic markers were selected from a web-based collection (Alzgene). For each SNP in the database, it was possible to perform a meta-analysis. The quality of studies was assessed using criteria such as size of research samples, heterogeneity across studies, and protection from publication bias. This produced a list of 15 top-rated genes: APOE, CLU, PICALM, EXOC3L2, BIN1, CR1, SORL1, TNK1, IL8, LDLR, CST3, CHRNB2, SORCS1, TNF, and CCR2. A systematic analysis of gene ontology terms associated with each marker showed that most genes were implicated in cholesterol metabolism, intracellular transport of beta-amyloid precursor, and autophagy of damaged organelles. Moreover, the impact of these genes on complement cascade and cytokine production highlights the role of inflammatory response in AD pathogenesis. Gene-gene and gene-environment interactions are prominent issues in AD genetics, but they are not specifically featured in the Alzgene database. PMID:22191060

  4. Development of a chemical structure comparison method for integrated analysis of chemical and genomic information in the metabolic pathways.

    PubMed

    Hattori, Masahiro; Okuno, Yasushi; Goto, Susumu; Kanehisa, Minoru

    2003-10-01

    Cellular functions result from intricate networks of molecular interactions, which involve not only proteins and nucleic acids but also small chemical compounds. Here we present an efficient algorithm for comparing two chemical structures of compounds, where the chemical structure is treated as a graph consisting of atoms as nodes and covalent bonds as edges. On the basis of the concept of functional groups, 68 atom types (node types) are defined for carbon, nitrogen, oxygen, and other atomic species with different environments, which has enabled detection of biochemically meaningful features. Maximal common subgraphs of two graphs can be found by searching for maximal cliques in the association graph, and we have introduced heuristics to accelerate the clique finding and to detect optimal local matches (simply connected common subgraphs). Our procedure was applied to the comparison and clustering of 9383 compounds, mostly metabolic compounds, in the KEGG/LIGAND database. The largest clusters of similar compounds were related to carbohydrates, and the clusters corresponded well to the categorization of pathways as represented by the KEGG pathway map numbers. When each pathway map was examined in more detail, finer clusters could be identified corresponding to subpathways or pathway modules containing continuous sets of reaction steps. Furthermore, it was found that the pathway modules identified by similar compound structures sometimes overlap with the pathway modules identified by genomic contexts, namely, by operon structures of enzyme genes. PMID:14505407
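
    The core idea described above, finding common subgraphs of two structures by searching for cliques in their association graph, can be sketched as follows. This is a simplified illustration and not the authors' algorithm: it uses plain element symbols instead of the 68 atom types, ignores bond types, uses an unoptimized Bron-Kerbosch search without the heuristics mentioned in the abstract, and does not require the common subgraph to be connected. All function names are hypothetical.

```python
from itertools import combinations


def association_graph(labels1, edges1, labels2, edges2):
    """Build the association graph of two labeled graphs.

    Nodes are pairs (u, v) of atoms with the same label. Two pairs are
    connected when they use distinct atoms in both graphs and agree on
    whether those atoms are bonded.
    """
    e1 = {frozenset(e) for e in edges1}
    e2 = {frozenset(e) for e in edges2}
    nodes = [(u, v) for u in labels1 for v in labels2 if labels1[u] == labels2[v]]
    adj = {n: set() for n in nodes}
    for (u1, v1), (u2, v2) in combinations(nodes, 2):
        if u1 == u2 or v1 == v2:
            continue
        if (frozenset((u1, u2)) in e1) == (frozenset((v1, v2)) in e2):
            adj[(u1, v1)].add((u2, v2))
            adj[(u2, v2)].add((u1, v1))
    return adj


def max_clique(adj):
    """Return one largest clique using basic Bron-Kerbosch (no pivoting)."""
    best = []

    def expand(r, p, x):
        nonlocal best
        if not p and not x:
            if len(r) > len(best):
                best = list(r)
            return
        for n in list(p):
            expand(r + [n], p & adj[n], x & adj[n])
            p = p - {n}
            x = x | {n}

    expand([], set(adj), set())
    return best


# Toy example: heavy atoms of ethanol (C-C-O) versus methanol (C-O).
ethanol = ({0: "C", 1: "C", 2: "O"}, [(0, 1), (1, 2)])
methanol = ({0: "C", 1: "O"}, [(0, 1)])
mapping = sorted(max_clique(association_graph(*ethanol, *methanol)))
```

    Each pair in the resulting clique maps an atom of the first structure to an atom of the second, so the clique size is the size of the common substructure found.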

  5. SoyFN: a knowledge database of soybean functional networks.

    PubMed

    Xu, Yungang; Guo, Maozu; Liu, Xiaoyan; Wang, Chunyu; Liu, Yang

    2014-01-01

    Many databases for soybean genomic analysis have been built and made publicly available, but few of them contain knowledge specifically targeting the omics-level gene-gene, gene-microRNA (miRNA) and miRNA-miRNA interactions. Here, we present SoyFN, a knowledge database of soybean functional gene networks and miRNA functional networks. SoyFN provides user-friendly interfaces to retrieve, visualize, analyze and download the functional networks of soybean genes and miRNAs. In addition, it incorporates much information about KEGG pathways, gene ontology annotations and 3'-UTR sequences as well as many useful tools including SoySearch, ID mapping, Genome Browser, eFP Browser and promoter motif scan. SoyFN is a schema-free database that can be accessed as a Web service from any modern programming language using a simple Hypertext Transfer Protocol call. The Web site is implemented in Java, JavaScript, PHP, HTML and Apache, with all major browsers supported. We anticipate that this database will be useful for members of research communities both in soybean experimental science and bioinformatics. Database URL: http://nclab.hit.edu.cn/SoyFN. PMID:24618044

  6. dEMBF: A Comprehensive Database of Enzymes of Microalgal Biofuel Feedstock

    PubMed Central

    Misra, Namrata; Panda, Prasanna Kumar; Parida, Bikram Kumar; Mishra, Barada Kanta

    2016-01-01

    Microalgae have attracted wide attention as one of the most versatile renewable feedstocks for production of biofuel. To develop genetically engineered high lipid yielding algal strains, a thorough understanding of the lipid biosynthetic pathway and the underpinning enzymes is essential. In this work, we have systematically mined the genomes of fifteen diverse algal species belonging to Chlorophyta, Heterokontophyta, Rhodophyta, and Haptophyta, to identify and annotate the putative enzymes of lipid metabolic pathway. Consequently, we have also developed a database, dEMBF (Database of Enzymes of Microalgal Biofuel Feedstock), which catalogues the complete list of identified enzymes along with their computed annotation details including length, hydrophobicity, amino acid composition, subcellular location, gene ontology, KEGG pathway, orthologous group, Pfam domain, intron-exon organization, transmembrane topology, and secondary/tertiary structural data. Furthermore, to facilitate functional and evolutionary study of these enzymes, a collection of built-in applications for BLAST search, motif identification, sequence and phylogenetic analysis have been seamlessly integrated into the database. dEMBF is the first database that brings together all enzymes responsible for lipid synthesis from available algal genomes, and provides an integrative platform for enzyme inquiry and analysis. This database will be extremely useful for algal biofuel research. It can be accessed at http://bbprof.immt.res.in/embf. PMID:26727469

  7. Extending pathways based on gene lists using InterPro domain signatures

    PubMed Central

    Hahne, Florian; Mehrle, Alexander; Arlt, Dorit; Poustka, Annemarie; Wiemann, Stefan; Beissbarth, Tim

    2008-01-01

    Background: High-throughput technologies like functional screens and gene expression analysis produce extended lists of candidate genes. Gene-Set Enrichment Analysis is a commonly used and well established technique to test for the statistically significant over-representation of particular pathways. A shortcoming of this method, however, is that most genes that are investigated in the experiments have very sparse functional or pathway annotation and therefore cannot be the target of such an analysis. The approach presented here aims to assign lists of genes with limited annotation to previously described functional gene collections or pathways. This works by comparing InterPro domain signatures of the candidate gene lists with domain signatures of gene sets derived from known classifications, e.g. KEGG pathways. Results: In order to validate our approach, we designed a simulation study. Based on all pathways available in the KEGG database, we create test gene lists by randomly selecting pathway genes, removing these genes from the known pathways and adding variable amounts of noise in the form of genes not annotated to the pathway. We show that we can recover pathway memberships based on the simulated gene lists with high accuracy. We further demonstrate the applicability of our approach on a biological example. Conclusion: Results based on simulation and data analysis show that domain-based pathway enrichment analysis is a very sensitive method to test for enrichment of pathways in sparsely annotated lists of genes. An R-based software package, domainsignatures, to routinely perform this analysis on the results of high-throughput screening, is available via Bioconductor. PMID:18177498
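
    The over-representation testing that this abstract takes as its starting point is commonly done with a hypergeometric tail probability. The sketch below shows that standard calculation only; it is not the domain-signature method of the domainsignatures package, and the function and parameter names are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, population, in_pathway, list_size):
    """P(X >= k) when drawing list_size genes without replacement.

    population: number of annotated genes in the background
    in_pathway: number of background genes annotated to the pathway
    list_size:  number of genes in the candidate list
    k:          number of candidate genes that fall in the pathway
    """
    total = comb(population, list_size)
    upper = min(in_pathway, list_size)
    hits = sum(
        comb(in_pathway, i) * comb(population - in_pathway, list_size - i)
        for i in range(k, upper + 1)
    )
    return hits / total


# Toy numbers: 10 background genes, 3 in the pathway, a list of 2 genes,
# at least 1 of which is in the pathway.
p_value = hypergeom_upper_tail(1, 10, 3, 2)
```

    A small tail probability suggests that the pathway is over-represented in the list; correction for testing many pathways is a separate step not shown here.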

  8. PathPPI: an integrated dataset of human pathways and protein-protein interactions.

    PubMed

    Tang, HaiLin; Zhong, Fan; Liu, Wei; He, FuChu; Xie, HongWei

    2015-06-01

    Integration of pathway and protein-protein interaction (PPI) data can provide more information that could lead to new biological insights. PPIs are usually represented by a simple binary model, whereas pathways are represented by more complicated models. We developed a series of rules for transforming protein interactions from the pathway model to the binary model, and the protein interactions from seven pathway databases, including PID, BioCarta, Reactome, NetPath, INOH, SPIKE and KEGG, were transformed based on these rules. These pathway-derived binary protein interactions were integrated with PPIs from five other PPI databases, including HPRD, IntAct, BioGRID, MINT and DIP, to develop an integrated dataset (named PathPPI). More detailed interaction type and modification information on protein interactions can be preserved in PathPPI than in other existing datasets. Comparison analysis results indicate that most of the interaction overlap values (O_AB) among these pathway databases were less than 5%, and these databases must be used conjunctively. The PathPPI data are provided at http://proteomeview.hupo.org.cn/PathPPI/PathPPI.html. PMID:25591449

  9. Pedigree-based random effect tests to screen gene pathways.

    PubMed

    Almeida, Marcio; Peralta, Juan M; Farook, Vidya; Puppala, Sobha; Kent, John W; Duggirala, Ravindranath; Blangero, John

    2014-01-01

    The new generation of sequencing platforms opens new horizons in the genetics field. It is possible to exhaustively assay all genetic variants in an individual and search for phenotypic associations. The whole genome sequencing approach, when applied to a large human sample like the San Antonio Family Study, detects a very large number (>25 million) of single nucleotide variants along with other more complex variants. The analytical challenges imposed by this number of variants are formidable, suggesting that methods are needed to reduce the overall number of statistical tests. In this study, we develop a single degree-of-freedom test of variants in a gene pathway employing a random effect model that uses an empirical pathway-specific genetic relationship matrix as the focal covariance kernel. The empirical pathway-specific genetic relationship uses all variants (or a chosen subset) from gene members of a given biological pathway. Using SOLAR's pedigree-based variance components modeling, which also allows for arbitrary fixed effects, such as principal components, to deal with latent population structure, we employ a likelihood ratio test of the pathway-specific genetic relationship matrix model. We examine all gene pathways in the KEGG database using our method in the first replicate of the Genetic Analysis Workshop 18 simulation of systolic blood pressure. Our random effect approach was able to detect true association signals in causal gene pathways. Those pathways could easily be further dissected by the independent analysis of all markers. PMID:25519354

  10. Genomic Gene Clustering Analysis of Pathways in Eukaryotes

    PubMed Central

    Lee, Jennifer M.; Sonnhammer, Erik L.L.

    2003-01-01

    Genomic clustering of genes in a pathway is commonly found in prokaryotes due to transcriptional operons, but these are not present in most eukaryotes. Yet, there might be clustering, to a lesser extent, of pathway members in eukaryotic genomes that assists coregulation of a set of functionally cooperating genes. We analyzed five sequenced eukaryotic genomes for clustering of genes assigned to the same pathway in the KEGG database. Between 98% and 30% of the analyzed pathways in a genome were found to exhibit significantly higher clustering levels than expected by chance. In descending order by the level of clustering, the genomes studied were Saccharomyces cerevisiae, Homo sapiens, Caenorhabditis elegans, Arabidopsis thaliana, and Drosophila melanogaster. Surprisingly, there is not much agreement between genomes in terms of which pathways are most clustered. Only seven of 69 pathways found in all species were significantly clustered in all five of them. This species-specific pattern of pathway clustering may reflect adaptations or evolutionary events unique to a particular lineage. We note that although operons are common in C. elegans, only 58% of the pathways showed significant clustering, which is less than in human. Virtually all pathways in S. cerevisiae showed significant clustering. PMID:12695325

  11. Pathway modeling of microarray data: A case study of pathway activity changes in the testis following in utero exposure to dibutyl phthalate (DBP)

    SciTech Connect

    Ovacik, Meric A.; Sen, Banalata; Euling, Susan Y.; Gaido, Kevin W.; Ierapetritou, Marianthi G.; Androulakis, Ioannis P.

    2013-09-15

    Pathway activity level analysis, the approach pursued in this study, focuses on all genes that are known to be members of metabolic and signaling pathways as defined by the KEGG database. The pathway activity level analysis entails singular value decomposition (SVD) of the expression data of the genes constituting a given pathway. We explore an extension of the pathway activity methodology for application to time-course microarray data. We show that pathway analysis enhances our ability to detect biologically relevant changes in pathway activity using synthetic data. As a case study, we apply the pathway activity level formulation coupled with significance analysis to microarray data from two different rat testes exposed in utero to dibutyl phthalate (DBP). In utero DBP exposure in the rat results in developmental toxicity of a number of male reproductive organs, including the testes. One well-characterized mode of action for DBP and the male reproductive developmental effects is the repression of expression of genes involved in cholesterol transport, steroid biosynthesis and testosterone synthesis, leading to decreased fetal testicular testosterone. Previous analyses of DBP testes microarray data focused on either individual gene expression changes or changes in the expression of specific genes that are hypothesized, or known, to be important in testicular development and testosterone synthesis. However, a pathway analysis may inform whether there are additional affected pathways that could inform additional modes of action linked to DBP developmental toxicity. We show that pathway activity analysis may be considered for a more comprehensive analysis of microarray data.
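
    The SVD step described above can be illustrated with a small sketch. It assumes, as one plausible reading of the abstract, that the activity profile of a pathway across samples is taken from the leading right singular vector of the genes-by-samples expression matrix; the authors' exact formulation and significance analysis are not reproduced. To stay within the standard library, the vector is obtained by power iteration rather than a full SVD, and the function name and toy matrix are hypothetical.

```python
import math


def leading_right_singular_vector(matrix, iterations=200):
    """Approximate the first right singular vector of a genes x samples matrix.

    Uses power iteration on A^T A, starting from a uniform vector. The start
    vector must not be orthogonal to the true leading vector for this to work.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    v = [1.0 / math.sqrt(n_cols)] * n_cols
    for _ in range(iterations):
        # u = A v (one value per gene)
        u = [sum(a * b for a, b in zip(row, v)) for row in matrix]
        # w = A^T u (one value per sample)
        w = [sum(matrix[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return v
        v = [x / norm for x in w]
    return v


# Toy expression matrix: 2 genes (rows) x 2 samples (columns).
expression = [[1.0, 2.0], [2.0, 4.0]]
activity = leading_right_singular_vector(expression)
```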

  12. Pandora, a PAthway and Network DiscOveRy Approach based on common biological evidence

    PubMed Central

    Zhang, Kelvin Xi; Ouellette, B. F. Francis

    2010-01-01

    Motivation: Many biological phenomena involve extensive interactions between many of the biological pathways present in cells. However, extraction of all the inherent biological pathways remains a major challenge in systems biology. With the advent of high-throughput functional genomic techniques, it is now possible to infer biological pathways and pathway organization in a systematic way by integrating disparate biological information. Results: Here, we propose a novel integrated approach that uses network topology to predict biological pathways. We integrated four types of biological evidence (protein–protein interaction, genetic interaction, domain–domain interaction and semantic similarity of Gene Ontology terms) to generate a functionally associated network. This network was then used to develop a new pathway finding algorithm to predict biological pathways in yeast. Our approach discovered 195 biological pathways and 31 functionally redundant pathway pairs in yeast. By comparing our identified pathways to three public pathway databases (KEGG, BioCyc and Reactome), we observed that our approach achieves a maximum positive predictive value of 12.8% and improves on other predictive approaches. This study allows us to reconstruct biological pathways and delineates cellular machinery in a systematic view. Availability: The method has been implemented in Perl and is available for downloading from http://www.oicr.on.ca/research/ouellette/pandora. It is distributed under the terms of GPL (http://opensource.org/licenses/gpl-2.0.php) Contact: francis@oicr.on.ca Supplementary information: Supplementary data are available at Bioinformatics online. PMID:20031970

  13. ESEA: Discovering the Dysregulated Pathways based on Edge Set Enrichment Analysis

    PubMed Central

    Han, Junwei; Shi, Xinrui; Zhang, Yunpeng; Xu, Yanjun; Jiang, Ying; Zhang, Chunlong; Feng, Li; Yang, Haixiu; Shang, Desi; Sun, Zeguo; Su, Fei; Li, Chunquan; Li, Xia

    2015-01-01

    Pathway analyses are playing an increasingly important role in understanding biological mechanism, cellular function and disease states. Current pathway-identification methods generally focus on only the changes of gene expression levels; however, the biological relationships among genes are also the fundamental components of pathways, and the dysregulated relationships may also alter the pathway activities. We propose a powerful computational method, Edge Set Enrichment Analysis (ESEA), for the identification of dysregulated pathways. This provides a novel way of pathway analysis by investigating the changes of biological relationships of pathways in the context of gene expression data. Simulation studies illustrate the power and performance of ESEA under various simulated conditions. Using real datasets from p53 mutation, Type 2 diabetes and lung cancer, we validate effectiveness of ESEA in identifying dysregulated pathways. We further compare our results with five other pathway enrichment analysis methods. With these analyses, we show that ESEA is able to help uncover dysregulated biological pathways underlying complex traits and human diseases via specific use of the dysregulated biological relationships. We develop a freely available R-based tool of ESEA. Currently, ESEA can support pathway analysis of the seven public databases (KEGG; Reactome; Biocarta; NCI; SPIKE; HumanCyc; Panther). PMID:26267116

  14. A novel method to identify pathways associated with renal cell carcinoma based on a gene co-expression network

    PubMed Central

    RUAN, XIYUN; LI, HONGYUN; LIU, BO; CHEN, JIE; ZHANG, SHIBAO; SUN, ZEQIANG; LIU, SHUANGQING; SUN, FAHAI; LIU, QINGYONG

    2015-01-01

    The aim of the present study was to develop a novel method for identifying pathways associated with renal cell carcinoma (RCC) based on a gene co-expression network. A framework was established where a co-expression network was derived from the database as well as various co-expression approaches. First, the backbone of the network based on differentially expressed (DE) genes between RCC patients and normal controls was constructed by the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) database. The differentially co-expressed links were detected by Pearson’s correlation, the empirical Bayesian (EB) approach and Weighted Gene Co-expression Network Analysis (WGCNA). The co-expressed gene pairs were merged by a rank-based algorithm. We obtained 842, 371, 2,883 and 1,595 co-expressed gene pairs from the co-expression networks of the STRING database, Pearson’s correlation, the EB method and WGCNA, respectively. Two hundred and eighty-one differentially co-expressed (DC) gene pairs were obtained from the merged network using this novel method. Pathway enrichment analysis based on the Kyoto Encyclopedia of Genes and Genomes (KEGG) database and the network enrichment analysis (NEA) method were performed to verify the feasibility of the merged method. Results of the KEGG and NEA pathway analyses showed that the network was associated with RCC. The suggested method was computationally efficient in identifying pathways associated with RCC and has been identified as a useful complement to traditional co-expression analysis. PMID:26058425
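
    Of the co-expression measures listed above, Pearson's correlation is the simplest to illustrate. The sketch below computes it for one gene pair and, as an assumed and simplified notion of a differentially co-expressed link, takes the difference between the correlation in patients and in controls. The EB and WGCNA steps and the rank-based merging are not shown, and the function names and toy values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0.0 or sd_y == 0.0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / (sd_x * sd_y)


def correlation_change(gene_a_case, gene_b_case, gene_a_ctrl, gene_b_ctrl):
    """Difference in co-expression of one gene pair between two conditions."""
    return pearson(gene_a_case, gene_b_case) - pearson(gene_a_ctrl, gene_b_ctrl)


# Toy expression values (not real data): positively correlated in cases,
# negatively correlated in controls.
change = correlation_change([1, 2, 3], [2, 4, 6], [1, 2, 3], [3, 2, 1])
```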

  15. Crosstalk analysis of pathways in breast cancer using a network model based on overlapping differentially expressed genes

    PubMed Central

    SUN, YONG; YUAN, KAI; ZHANG, PENG; MA, RONG; ZHANG, QI-WEN; TIAN, XING-SONG

    2015-01-01

    Multiple signal transduction pathways can affect each other considerably through crosstalk. However, the presence and extent of this phenomenon have not been rigorously studied. The aim of the present study was to identify strong and normal interactions between pathways in breast cancer and determine the main pathway. Five sets of breast cancer data were downloaded from the high-throughput Gene Expression Omnibus (GEO) and analyzed to identify differentially expressed (DE) genes using the Rank Product (RankProd) method. A list of pathways with differential expression was obtained by gene set enrichment analysis (GSEA) of the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. The DE genes that overlapped between pathways were identified and a crosstalk network diagram based on the overlap of DE genes was constructed. A total of 1,464 DE genes and 26 pathways were identified. In addition, the number of DE genes that overlapped between specific pathways were determined, and the greatest degree of overlap was between the extracellular matrix (ECM)-receptor interaction and Focal adhesion pathways, which had 22 overlapping DE genes. Weighted pathway analysis of the crosstalk between pathways identified that Pathways in cancer was the main pathway in breast cancer. PMID:26622386
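
    The overlap-based crosstalk network described above can be sketched as a count of differentially expressed (DE) genes shared by each pair of pathways. This is a minimal illustration under the assumption that an edge is drawn whenever the overlap reaches a chosen threshold; the weighting scheme used by the authors is not reproduced, and the function name, threshold parameter and toy gene identifiers are hypothetical.

```python
from itertools import combinations


def crosstalk_edges(pathway_genes, de_genes, min_overlap=1):
    """Map each pathway pair to the number of DE genes the two pathways share."""
    de = set(de_genes)
    de_by_pathway = {name: set(genes) & de for name, genes in pathway_genes.items()}
    edges = {}
    for a, b in combinations(sorted(de_by_pathway), 2):
        shared = de_by_pathway[a] & de_by_pathway[b]
        if len(shared) >= min_overlap:
            edges[(a, b)] = len(shared)
    return edges


# Toy memberships with placeholder gene identifiers (not real KEGG content).
pathways = {
    "ECM-receptor interaction": {"g1", "g2", "g3"},
    "Focal adhesion": {"g2", "g3", "g4"},
    "Pathways in cancer": {"g5"},
}
network = crosstalk_edges(pathways, de_genes={"g2", "g3", "g5"})
```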

  16. Twenty years of the MEROPS database of proteolytic enzymes, their substrates and inhibitors.

    PubMed

    Rawlings, Neil D; Barrett, Alan J; Finn, Robert

    2016-01-01

    The MEROPS database (http://merops.sanger.ac.uk) is an integrated source of information about peptidases, their substrates and inhibitors, which are of great relevance to biology, medicine and biotechnology. The hierarchical classification of the database is as follows: homologous sets of sequences are grouped into a protein species; protein species are grouped into a family; families are grouped into clans. There is a type example for each protein species (known as a 'holotype'), family and clan, and each protein species, family and clan has its own unique identifier. Pages to show the involvement of peptidases and peptidase inhibitors in biological pathways have been created. Each page shows the peptidases and peptidase inhibitors involved in the pathway, along with the known substrate cleavages and peptidase-inhibitor interactions, and a link to the KEGG database of biological pathways. Links have also been established with the IUPHAR Guide to Pharmacology. A new service has been set up to allow the submission of identified substrate cleavages so that conservation of the cleavage site can be assessed. This should help establish whether or not a cleavage site is physiologically relevant on the basis that such a cleavage site is likely to be conserved. PMID:26527717

  17. What's My Substrate? Computational Function Assignment of Candida parapsilosis ADH5 by Genome Database Search, Virtual Screening, and QM/MM Calculations.

    PubMed

    Dhoke, Gaurao V; Ensari, Yunus; Davari, Mehdi D; Ruff, Anna Joëlle; Schwaneberg, Ulrich; Bocola, Marco

    2016-07-25

    Zinc-dependent medium chain reductase from Candida parapsilosis can be used in the reduction of carbonyl compounds to pharmacologically important chiral secondary alcohols. To date, cpADH5 has been named inconsistently in the literature (CPCR2/RCR/SADH), and its natural substrate is not known. In this study, we utilized a substrate-docking-based virtual screening method combined with searches of the KEGG, MetaCyc pathway, and Candida genome databases to discover natural substrates of cpADH5. The virtual screening of 7834 carbonyl compounds from the ZINC database provided 94 aldehydes or methyl/ethyl ketones as putative carbonyl substrates. Of these, 52 carbonyl substrates of cpADH5 with a catalytically active docking pose were identified by employing a mechanism-based substrate docking protocol. Comparison of the virtual screening results with the KEGG and MetaCyc database searches and the Candida genome pathway analysis suggests that cpADH5 might be involved in the Ehrlich pathway (reduction of fusel aldehydes in leucine, isoleucine, and valine degradation). Our QM/MM calculations and experimental activity measurements affirmed that butyraldehyde substrates are the potential natural substrates of cpADH5, suggesting a carbonyl reductase role for this enzyme in butyraldehyde reduction in aliphatic amino acid degradation pathways. Phylogenetic tree analysis of known ADHs from Candida albicans shows that cpADH5 is close to caADH5. We therefore propose, according to the experimental substrate identification and sequence similarity, the common name butyraldehyde dehydrogenase cpADH5 for Candida parapsilosis CPCR2/RCR/SADH. PMID:27387009

  18. KENeV: A web-application for the automated reconstruction and visualization of the enriched metabolic and signaling super-pathways deriving from genomic experiments

    PubMed Central

    Pilalis, Eleftherios; Koutsandreas, Theodoros; Valavanis, Ioannis; Athanasiadis, Emmanouil; Spyrou, George; Chatziioannou, Aristotelis

    2015-01-01

    Gene expression analysis, using high throughput genomic technologies, has become an indispensable step for the meaningful interpretation of the underlying molecular complexity, which shapes the phenotypic manifestation of the investigated biological mechanism. The modularity of the cellular response to different experimental conditions can be comprehended through the exploitation of molecular pathway databases, which offer a controlled, curated background for statistical enrichment analysis. Existing tools enable pathway analysis, visualization, or pathway merging, but none integrates a fully automated workflow that combines all of the above-mentioned modules and is intended for non-programmer users. We introduce an online web application, named KEGG Enriched Network Visualizer (KENeV), which enables a fully automated workflow starting from a list of differentially expressed genes and deriving the enriched KEGG metabolic and signaling pathways, merged into two respective, non-redundant super-networks. The final networks can be downloaded as SBML files, for further analysis, or instantly visualized through an interactive visualization module. In conclusion, KENeV (available online at http://www.grissom.gr/kenev) provides an integrative tool, suitable for users with no programming experience, for the functional interpretation, at both the metabolic and signaling level, of differentially expressed gene subsets deriving from genomic experiments. PMID:26925206

  19. KENeV: A web-application for the automated reconstruction and visualization of the enriched metabolic and signaling super-pathways deriving from genomic experiments.

    PubMed

    Pilalis, Eleftherios; Koutsandreas, Theodoros; Valavanis, Ioannis; Athanasiadis, Emmanouil; Spyrou, George; Chatziioannou, Aristotelis

    2015-01-01

    Gene expression analysis, using high throughput genomic technologies, has become an indispensable step for the meaningful interpretation of the underlying molecular complexity, which shapes the phenotypic manifestation of the investigated biological mechanism. The modularity of the cellular response to different experimental conditions can be comprehended through the exploitation of molecular pathway databases, which offer a controlled, curated background for statistical enrichment analysis. Existing tools enable pathway analysis, visualization, or pathway merging, but none integrates a fully automated workflow that combines all of the above-mentioned modules and is intended for non-programmer users. We introduce an online web application, named KEGG Enriched Network Visualizer (KENeV), which enables a fully automated workflow starting from a list of differentially expressed genes and deriving the enriched KEGG metabolic and signaling pathways, merged into two respective, non-redundant super-networks. The final networks can be downloaded as SBML files, for further analysis, or instantly visualized through an interactive visualization module. In conclusion, KENeV (available online at http://www.grissom.gr/kenev) provides an integrative tool, suitable for users with no programming experience, for the functional interpretation, at both the metabolic and signaling level, of differentially expressed gene subsets deriving from genomic experiments. PMID:26925206

  20. Identification of key pathways and genes in colorectal cancer using bioinformatics analysis.

    PubMed

    Liang, Bin; Li, Chunning; Zhao, Jianying

    2016-10-01

    Colorectal cancer (CRC) is the most common malignant tumor of digestive system. The aim of this study was to identify gene signatures during CRC and uncover their potential mechanisms. The gene expression profiles of GSE21815 were downloaded from GEO database. The GSE21815 dataset contained 141 samples, including 132 CRC and 9 normal colon epitheliums. The gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes pathway (KEGG) enrichment analyses were performed, and protein-protein interaction (PPI) network of the differentially expressed genes (DEGs) was constructed by Cytoscape software. In total, 3500 DEGs were identified in CRC, including 1370 up-regulated genes and 2130 down-regulated genes. GO analysis results showed that up-regulated DEGs were significantly enriched in biological processes (BP), including cell cycle, cell division, and cell proliferation; the down-regulated DEGs were significantly enriched in biological processes, including immune response, intracellular signaling cascade and defense response. KEGG pathway analysis showed the up-regulated DEGs were enriched in cell cycle and DNA replication, while the down-regulated DEGs were enriched in drug metabolism, metabolism of xenobiotics by cytochrome P450, and retinol metabolism pathways. The top 10 hub genes, GNG2, AGT, SAA1, ADCY5, LPAR1, NMU, IL8, CXCL12, GNAI1, and CCR2 were identified from the PPI network, and sub-networks revealed these genes were involved in significant pathways, including G protein-coupled receptors signaling pathway, gastrin-CREB signaling pathway via PKC and MAPK, and extracellular matrix organization. In conclusion, the present study indicated that the identified DEGs and hub genes promote our understanding of the molecular mechanisms underlying the development of CRC, and might be used as molecular targets and diagnostic biomarkers for the treatment of CRC. PMID:27581154
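
    Pathway enrichment of a DEG list, as mentioned in this record, is commonly scored with a one-sided hypergeometric test. The sketch below shows that general technique only; the abstract does not state which test or tool the authors used, and the function name and toy numbers are assumptions made for illustration.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_in_pathway, n_selected, n_overlap):
    """P(X >= n_overlap) when n_selected genes are drawn without replacement
    from n_background genes, of which n_in_pathway belong to the pathway."""
    upper = min(n_in_pathway, n_selected)
    total = comb(n_background, n_selected)
    # Sum the upper tail of the hypergeometric distribution.
    tail = sum(
        comb(n_in_pathway, k) * comb(n_background - n_in_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Toy numbers (hypothetical): 10 background genes, 4 in the pathway,
# 3 DEGs selected, 2 of which fall in the pathway.
p_value = hypergeom_enrichment_p(10, 4, 3, 2)
```

    In practice the p-values for all tested pathways would also need a multiple-testing correction, which is omitted here.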

  1. Genetic analysis of biological pathway data through genomic randomization

    PubMed Central

    Yaspan, Brian L.; Bush, William S.; Torstenson, Eric S.; Ma, Deqiong; Pericak-Vance, Margaret A.; Ritchie, Marylyn D.; Sutcliffe, James S.; Haines, Jonathan L.

    2011-01-01

    Genome Wide Association Studies (GWAS) are a standard approach for large-scale common variation characterization and for identification of single loci predisposing to disease. However, due to issues of moderate sample sizes and particularly multiple testing correction, many variants of smaller effect size are not detected within a single allele analysis framework. Thus, small main effects and potential epistatic effects are not consistently observed in GWAS using standard analytical approaches that consider only single SNP alleles. Here we propose unique methodology that aggregates variants of interest (for example, genes in a biological pathway) using GWAS results. Multiple testing and type I error concerns are minimized using empirical genomic randomization to estimate significance. Randomization corrects for common pathway-based analysis biases such as SNP coverage and density, linkage disequilibrium, gene size and pathway size. PARIS (Pathway Analysis by Randomization Incorporating Structure) applies this randomization and in doing so directly accounts for linkage disequilibrium effects. PARIS is independent of association analysis method and is thus applicable to GWAS datasets of all study designs. Using the KEGG database as an example, we apply PARIS to the publicly available Autism Genetic Resource Exchange (AGRE) GWA dataset, revealing pathways with a significant enrichment of positive association results. PMID:21279722
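
    The idea of estimating significance by randomization can be illustrated with a generic empirical p-value. This is a deliberately simplified sketch and not the PARIS implementation: PARIS is described as matching the randomized collections on genomic structure (SNP coverage, linkage disequilibrium, gene and pathway size), whereas the hypothetical `randomization_test` below draws features uniformly at random.

```python
import random


def empirical_p(observed, null_scores):
    """Fraction of null scores >= observed, with the usual +1 correction."""
    hits = sum(1 for score in null_scores if score >= observed)
    return (hits + 1) / (len(null_scores) + 1)


def randomization_test(pathway_features, all_features, is_significant,
                       n_perm=1000, seed=0):
    """Compare a pathway's count of significant features with random sets
    of the same size (uniform sampling; no structure matching)."""
    rng = random.Random(seed)
    observed = sum(is_significant[f] for f in pathway_features)
    null = [
        sum(is_significant[f]
            for f in rng.sample(all_features, len(pathway_features)))
        for _ in range(n_perm)
    ]
    return observed, empirical_p(observed, null)


# Toy data (hypothetical): 100 features, the first 10 are "significant".
all_features = list(range(100))
is_significant = {f: f < 10 for f in all_features}
observed, p = randomization_test([0, 1, 2, 3, 50], all_features, is_significant)
```

    Because the significance is read off an empirical null distribution, no separate per-variant multiple-testing correction enters the pathway-level p-value.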

  2. Bayesian Joint Selection of Genes and Pathways: Applications in Multiple Myeloma Genomics

    PubMed Central

    Zhang, Lin; Morris, Jeffrey S; Zhang, Jiexin; Orlowski, Robert Z; Baladandayuthapani, Veerabhadran

    2014-01-01

    It is well-established that the development of a disease, especially cancer, is a complex process that results from the joint effects of multiple genes involved in various molecular signaling pathways. In this article, we propose methods to discover genes and molecular pathways significantly associated with clinical outcomes in cancer samples. We exploit the natural hierarchal structure of genes related to a given pathway as a group of interacting genes to conduct selection of both pathways and genes. We posit the problem in a hierarchical structured variable selection (HSVS) framework to analyze the corresponding gene expression data. HSVS methods conduct simultaneous variable selection at the pathway (group level) and the gene (within-group) level. To adapt to the overlapping group structure present in the pathway–gene hierarchy of the data, we developed an overlap-HSVS method that introduces latent partial effect variables that partition the marginal effect of the covariates and corresponding weights for a proportional shrinkage of the partial effects. Combining gene expression data with prior pathway information from the KEGG databases, we identified several gene–pathway combinations that are significantly associated with clinical outcomes of multiple myeloma. Biological discoveries support this relationship for the pathways and the corresponding genes we identified. PMID:25520554

  3. Bioinformatics Annotation of Human Y Chromosome-Encoded Protein Pathways and Interactions.

    PubMed

    Rengaraj, Deivendran; Kwon, Woo-Sung; Pang, Myung-Geol

    2015-09-01

    We performed a comprehensive analysis of human Y chromosome-encoded proteins, their pathways, and their interactions using bioinformatics tools. From the NCBI annotation release 107 of the human genome, we retrieved a total of 66 proteins encoded on the Y chromosome. Most of the retrieved proteins were also matched with the proteins listed in the core databases of the Human Proteome Project including neXtProt, PeptideAtlas, and the Human Protein Atlas. When we examined the pathways of human Y-encoded proteins through the KEGG database and Pathway Studio software, many of the proteins fell into categories related to cell signaling pathways. Using the STRING program, we found a total of 49 human Y-encoded proteins showing strong/medium interaction with each other. Using the Pathway Studio software, we found that a total of 16 proteins interacted with proteins encoded on other chromosomes. In particular, the SRY protein interacted with 17 proteins encoded on other chromosomes. Additionally, we aligned the sequences of human Y-encoded proteins with the sequences of chimpanzee and mouse Y-encoded proteins using the NCBI BLAST program. This analysis resulted in a significant number of orthologous proteins between human, chimpanzee, and mouse. Collectively, our findings provide the scientific community with additional information on the human Y chromosome-encoded proteins. PMID:26279084

  4. Comparative Pathway Analyzer--a web server for comparative analysis, clustering and visualization of metabolic networks in multiple organisms.

    PubMed

    Oehm, Sebastian; Gilbert, David; Tauch, Andreas; Stoye, Jens; Goesmann, Alexander

    2008-07-01

    In order to understand the phenotype of any living system, it is essential to not only investigate its genes, but also the specific metabolic pathway variant of the organism of interest, ideally in comparison with other organisms. The Comparative Pathway Analyzer, CPA, calculates and displays the differences in metabolic reaction content between two sets of organisms. Because results are highly dependent on the distribution of organisms into these two sets and the appropriate definition of these sets often is not easy, we provide hierarchical clustering methods for the identification of significant groupings. CPA also visualizes the reaction content of several organisms simultaneously allowing easy comparison. Reaction annotation data and maps for visualizing the results are taken from the KEGG database. Additionally, users can upload their own annotation data. This website is free and open to all users and there is no login requirement. It is available at https://www.cebitec.uni-bielefeld.de/groups/brf/software/cpa/index.html. PMID:18539612

  5. [De novo transcriptomic analysis of Chlorella sorokiniana: Pathway description and gene discovery for lipid production].

    PubMed

    Li, Lin; Wang, Qinhong; Yang, Hailin; Wang, Wu

    2014-09-01

    [OBJECTIVE] The paucity of genomic information limits the metabolic engineering of the non-model microalga Chlorella sorokiniana. Our study aimed to elucidate the fatty acid, triacylglycerol and starch biosynthetic pathways in C. sorokiniana based on de novo transcriptomic analysis. [METHODS] We cultured C. sorokiniana with different nitrogen concentrations (KNO3: 8 g/L and 2 g/L), then sequenced the transcriptome using the Illumina HiSeq 2000 platform. We used Trinity to assemble the reads de novo into transcripts, aligned all the transcripts against the Nr, UniProtKB/Swiss-Prot and COG databases using the BLASTx algorithm to annotate and classify their functions, and assigned the transcripts to metabolic pathways by aligning them with the KEGG database. We then used RSEM to calculate FPKM values, and used them for a preliminary analysis of differential gene expression in the related pathways. [RESULTS] Over 49M high quality raw reads with a length of 100 bp were produced. We used Trinity to assemble these reads into 49885 transcripts with an N50 of 1941 bp, ranging from 300 bp to 14100 bp. 26479 transcripts were annotated through BLASTx similarity search, 2357 transcripts were assigned an EC number, and 207 metabolic pathways were assigned in total. Based on these analyses, we reconstructed the fatty acid, triacylglycerol and starch biosynthetic pathways in C. sorokiniana. We also made a preliminary identification of differential gene expression in the pathways. [CONCLUSION] Using RNA-seq technology, we reconstructed the metabolic pathways involved in fatty acid, triacylglycerol and starch biosynthesis in the non-model microalga C. sorokiniana without genomic data; these pathways are consistent with those in the model microalga Chlamydomonas reinhardtii. We also compared gene expression levels under different conditions. This information is very useful for the metabolic engineering of C. sorokiniana and other microalgae to enhance the production of lipids. PMID:25522590
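
    The N50 statistic quoted for the assembly has a simple definition: it is the length of the shortest transcript in the smallest set of longest transcripts that together cover at least half of the total assembled bases. A minimal sketch, using made-up lengths rather than the study's data:

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    # Walk from the longest sequence down until half the bases are covered.
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("lengths must be non-empty")


# Toy lengths (hypothetical): total 54, so the threshold is 27 bases.
toy_n50 = n50([2, 3, 4, 5, 6, 7, 8, 9, 10])
```

    For the toy list the three longest sequences (10, 9, 8) reach the 27-base threshold, so the N50 is 8.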

  6. A Bayesian Approach to Pathway Analysis by Integrating Gene–Gene Functional Directions and Microarray Data

    PubMed Central

    Zhao, Yifang; Chen, Ming-Hui; Pei, Baikang; Rowe, David; Shin, Dong-Guk; Xie, Wangang; Yu, Fang; Kuo, Lynn

    2012-01-01

    Many statistical methods have been developed to screen for differentially expressed genes associated with specific phenotypes in the microarray data. However, it remains a major challenge to synthesize the observed expression patterns with abundant biological knowledge for more complete understanding of the biological functions among genes. Various methods including clustering analysis on genes, neural network, Bayesian network and pathway analysis have been developed toward this goal. In most of these procedures, the activation and inhibition relationships among genes have hardly been utilized in the modeling steps. We propose two novel Bayesian models to integrate the microarray data with the putative pathway structures obtained from the KEGG database and the directional gene–gene interactions in the medical literature. We define the symmetric Kullback–Leibler divergence of a pathway, and use it to identify the pathway(s) most supported by the microarray data. Monte Carlo Markov Chain sampling algorithm is given for posterior computation in the hierarchical model. The proposed method is shown to select the most supported pathway in an illustrative example. Finally, we apply the methodology to a real microarray data set to understand the gene expression profile of osteoblast lineage at defined stages of differentiation. We observe that our method correctly identifies the pathways that are reported to play essential roles in modulating bone mass. PMID:23482678
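
    For orientation, the textbook symmetric Kullback-Leibler divergence between two discrete distributions is the sum of the two directed divergences. The sketch below shows only that standard quantity; the pathway-level definition used in the paper is not given in the abstract and may differ, and the helper names are assumptions.

```python
from math import log


def kl(p, q):
    """Directed KL divergence KL(p || q); assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def symmetric_kl(p, q):
    """Symmetric KL divergence: KL(p || q) + KL(q || p)."""
    return kl(p, q) + kl(q, p)


# Toy distributions (hypothetical), each summing to 1.
p = [0.9, 0.1]
q = [0.5, 0.5]
d = symmetric_kl(p, q)
```

    Unlike the directed divergence, this quantity does not depend on the order of its arguments, which makes it usable as a dissimilarity score between two candidate models.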

  7. [A novel biological pathway expansion method based on the knowledge of protein-protein interactions].

    PubMed

    Zhao, Xiaolei; Zuo, Xiaoyu; Qin, Jiheng; Liang, Yan; Zhang, Naizun; Luan, Yizhao; Rao, Shaoqi

    2014-04-01

    Biological pathways have been widely used in gene function studies; however, the current knowledge of biological pathways is per se incomplete and has to be further expanded. Bioinformatics prediction provides a cheap but effective way to expand pathways. Here, we propose a novel method for biological pathway prediction that integrates prior knowledge of protein-protein interactions and the Gene Ontology (GO) database. First, the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways to which the interacting neighbors of a target gene (at the level of protein-protein interaction) belong were chosen as the candidate pathways. Then, the pathways to which the target gene belongs were determined by testing whether the genes in the candidate pathways were enriched in the GO terms to which the target gene was annotated. The protein-protein interaction data obtained from the Human Protein Reference Database (HPRD) and Biological General Repository for Interaction Datasets (BioGRID) were respectively used to predict the pathway attribution(s) of the target gene. The results demonstrated that both the average accuracy (the ratio of the correctly predicted pathways to the total pathways to which all the target genes were annotated) and the relative accuracy (of the genes with at least one annotated pathway successfully predicted, the percentage of the genes with all the annotated pathways correctly predicted) for pathway predictions increased with the number of interacting neighbors. When the number of interacting neighbors reached 22, the average accuracy was 96.2% (HPRD) and 96.3% (BioGRID), respectively, and the relative accuracy was 93.3% (HPRD) and 84.1% (BioGRID), respectively. Further validation analysis of 89 genes whose pathway knowledge was updated in a new database release indicated that 50 genes were correctly predicted for at least one updated pathway, and 43 genes were accurately predicted for all the updated pathways, giving an

  8. Systems analysis of gene ontology and biological pathways involved in post-myocardial infarction responses

    PubMed Central

    2015-01-01

    Background Pathway analysis has been widely used to gain insight into essential mechanisms of the response to myocardial infarction (MI). Currently, there exist multiple pathway databases that organize molecular datasets and manually curate pathway maps for biological interpretation at varying forms of organization. However, inconsistencies among different databases in pathway descriptions, frequently due to conflicting results in the literature, can generate incorrect interpretations. Furthermore, although pathway analysis software provides detailed images of interactions among molecules, it does not exhibit how pathways interact with one another or with other biological processes under specific conditions. Methods We propose a novel method to standardize descriptions of enriched pathways for a set of genes/proteins using Gene Ontology terms. We used this method to examine the relationships among pathways and biological processes for a set of condition-specific genes/proteins, represented as a functional biological pathway-process network. We applied this algorithm to a set of 613 MI-specific proteins we previously identified. Results A total of 96 pathways from Biocarta, KEGG, and Reactome, and 448 Gene Ontology Biological Processes were enriched with these 613 proteins. The pathways were represented as Boolean functions of biological processes, delivering an interactive scheme to organize enriched information with an emphasis on involvement of biological processes in pathways. We extracted a network focusing on MI to demonstrate that tyrosine phosphorylation of Signal Transducer and Activator of Transcription (STAT) protein, positive regulation of collagen metabolic process, coagulation, and positive/negative regulation of blood coagulation have immediate impacts on the MI response. Conclusions Our method organized biological processes and pathways in an unbiased approach to provide an intuitive way to identify biological properties of pathways under specific

  9. ReprOlive: a database with linked data for the olive tree (Olea europaea L.) reproductive transcriptome.

    PubMed

    Carmona, Rosario; Zafra, Adoración; Seoane, Pedro; Castro, Antonio J; Guerrero-Fernández, Darío; Castillo-Castillo, Trinidad; Medina-García, Ana; Cánovas, Francisco M; Aldana-Montes, José F; Navas-Delgado, Ismael; Alché, Juan de Dios; Claros, M Gonzalo

    2015-01-01

    Plant reproductive transcriptomes have been analyzed in different species due to the agronomical and biotechnological importance of plant reproduction. Here we present an olive tree reproductive transcriptome database with samples from pollen and pistil at different developmental stages, and leaf and root as control vegetative tissues (http://reprolive.eez.csic.es). It was developed from 2,077,309 raw reads to 1,549 Sanger sequences. Using a pre-defined workflow based on open-source tools, sequences were pre-processed, assembled, mapped, and annotated with expression data, descriptions, GO terms, InterPro signatures, EC numbers, KEGG pathways, ORFs, and SSRs. Tentative transcripts (TTs) were also annotated with the corresponding orthologs in Arabidopsis thaliana from TAIR and RefSeq databases to enable Linked Data integration. It results in a reproductive transcriptome comprising 72,846 contigs with average length of 686 bp, of which 63,965 (87.8%) included at least one functional annotation, and 55,356 (75.9%) had an ortholog. A minimum of 23,568 different TTs was identified and 5,835 of them contain a complete ORF. The representative reproductive transcriptome can be reduced to 28,972 TTs for further gene expression studies. Partial transcriptomes from pollen, pistil, and vegetative tissues as control were also constructed. ReprOlive provides free access and download capability to these results. Retrieval mechanisms for sequences and transcript annotations are provided. Graphical localization of annotated enzymes into KEGG pathways is also possible. Finally, ReprOlive has included a semantic conceptualisation by means of a Resource Description Framework (RDF) allowing a Linked Data search for extracting the most updated information related to enzymes, interactions, allergens, structures, and reactive oxygen species. PMID:26322066

  10. ReprOlive: a database with linked data for the olive tree (Olea europaea L.) reproductive transcriptome

    PubMed Central

    Carmona, Rosario; Zafra, Adoración; Seoane, Pedro; Castro, Antonio J.; Guerrero-Fernández, Darío; Castillo-Castillo, Trinidad; Medina-García, Ana; Cánovas, Francisco M.; Aldana-Montes, José F.; Navas-Delgado, Ismael; Alché, Juan de Dios; Claros, M. Gonzalo

    2015-01-01

    Plant reproductive transcriptomes have been analyzed in different species due to the agronomical and biotechnological importance of plant reproduction. Here we present an olive tree reproductive transcriptome database with samples from pollen and pistil at different developmental stages, and leaf and root as control vegetative tissues (http://reprolive.eez.csic.es). It was developed from 2,077,309 raw reads to 1,549 Sanger sequences. Using a pre-defined workflow based on open-source tools, sequences were pre-processed, assembled, mapped, and annotated with expression data, descriptions, GO terms, InterPro signatures, EC numbers, KEGG pathways, ORFs, and SSRs. Tentative transcripts (TTs) were also annotated with the corresponding orthologs in Arabidopsis thaliana from TAIR and RefSeq databases to enable Linked Data integration. It results in a reproductive transcriptome comprising 72,846 contigs with average length of 686 bp, of which 63,965 (87.8%) included at least one functional annotation, and 55,356 (75.9%) had an ortholog. A minimum of 23,568 different TTs was identified and 5,835 of them contain a complete ORF. The representative reproductive transcriptome can be reduced to 28,972 TTs for further gene expression studies. Partial transcriptomes from pollen, pistil, and vegetative tissues as control were also constructed. ReprOlive provides free access and download capability to these results. Retrieval mechanisms for sequences and transcript annotations are provided. Graphical localization of annotated enzymes into KEGG pathways is also possible. Finally, ReprOlive has included a semantic conceptualisation by means of a Resource Description Framework (RDF) allowing a Linked Data search for extracting the most updated information related to enzymes, interactions, allergens, structures, and reactive oxygen species. PMID:26322066

  11. The relationship between inadvertent ingestion and dermal exposure pathways: a new integrated conceptual model and a database of dermal and oral transfer efficiencies.

    PubMed

    Gorman Ng, Melanie; Semple, Sean; Cherrie, John W; Christopher, Yvette; Northage, Christine; Tielemans, Erik; Veroughstraete, Violaine; Van Tongeren, Martie

    2012-11-01

    Occupational inadvertent ingestion exposure is ingestion exposure due to contact between the mouth and contaminated hands or objects. Although individuals are typically oblivious to their exposure by this route, it is a potentially significant source of occupational exposure for some substances. Due to the continual flux of saliva through the oral cavity and the non-specificity of biological monitoring to routes of exposure, direct measurement of exposure by the inadvertent ingestion route is challenging; predictive models may be required to assess exposure. The work described in this manuscript has been carried out as part of a project to develop a predictive model for estimating inadvertent ingestion exposure in the workplace. As inadvertent ingestion exposure mainly arises from hand-to-mouth contact, it is closely linked to dermal exposure. We present a new integrated conceptual model for dermal and inadvertent ingestion exposure that should help to increase our understanding of ingestion exposure and our ability to simultaneously estimate exposure by the dermal and ingestion routes. The conceptual model consists of eight compartments (source, air, surface contaminant layer, outer clothing contaminant layer, inner clothing contaminant layer, hands and arms layer, perioral layer, and oral cavity) and nine mass transport processes (emission, deposition, resuspension or evaporation, transfer, removal, redistribution, decontamination, penetration and/or permeation, and swallowing) that describe event-based movement of substances between compartments (e.g. emission, deposition, etc.). This conceptual model is intended to guide the development of predictive exposure models that estimate exposure from both the dermal and the inadvertent ingestion pathways. For exposure by these pathways the efficiency of transfer of materials between compartments (for example from surfaces to hands, or from hands to the mouth) are important determinants of exposure. A database of

  12. Analysis of Polygala tenuifolia Transcriptome and Description of Secondary Metabolite Biosynthetic Pathways by Illumina Sequencing.

    PubMed

    Tian, Hongling; Xu, Xiaoshuang; Zhang, Fusheng; Wang, Yaoqin; Guo, Shuhong; Qin, Xuemei; Du, Guanhua

    2015-01-01

    Radix polygalae, the dried roots of Polygala tenuifolia and P. sibirica, is one of the most well-known traditional Chinese medicinal plants. Radix polygalae contains various saponins, xanthones, and oligosaccharide esters, and these compounds are responsible for several pharmacological properties. To provide basic breeding information, enhance molecular biological analysis, and determine secondary metabolite biosynthetic pathways of P. tenuifolia, we applied Illumina sequencing technology and de novo assembly. We also applied this technique to gain an overview of the P. tenuifolia transcriptome from samples from different years. Using Illumina sequencing, approximately 67.2% of unique sequences were annotated by basic local alignment search tool similarity searches against public sequence databases. We classified the annotated unigenes by using Nr, Nt, GO, COG, and KEGG databases compared with NCBI. We also obtained many candidate CYP450s and UGTs by the analysis of genes in the secondary metabolite biosynthetic pathways, including the putative terpenoid backbone and phenylpropanoid biosynthesis pathways. With this transcriptome sequencing, future genetic and genomics studies related to the molecular mechanisms associated with the chemical composition of P. tenuifolia may be improved. Genes involved in the enrichment of secondary metabolite biosynthesis-related pathways could enhance the potential applications of P. tenuifolia in pharmaceutical industries. PMID:26543847

  13. Analysis of Polygala tenuifolia Transcriptome and Description of Secondary Metabolite Biosynthetic Pathways by Illumina Sequencing

    PubMed Central

    Tian, Hongling; Xu, Xiaoshuang; Zhang, Fusheng; Wang, Yaoqin; Guo, Shuhong; Qin, Xuemei; Du, Guanhua

    2015-01-01

    Radix polygalae, the dried roots of Polygala tenuifolia and P. sibirica, is one of the most well-known traditional Chinese medicinal plants. Radix polygalae contains various saponins, xanthones, and oligosaccharide esters, and these compounds are responsible for several pharmacological properties. To provide basic breeding information, enhance molecular biological analysis, and determine secondary metabolite biosynthetic pathways of P. tenuifolia, we applied Illumina sequencing technology and de novo assembly. We also applied this technique to gain an overview of the P. tenuifolia transcriptome from samples from different years. Using Illumina sequencing, approximately 67.2% of unique sequences were annotated by basic local alignment search tool similarity searches against public sequence databases. We classified the annotated unigenes by using Nr, Nt, GO, COG, and KEGG databases compared with NCBI. We also obtained many candidate CYP450s and UGTs by the analysis of genes in the secondary metabolite biosynthetic pathways, including the putative terpenoid backbone and phenylpropanoid biosynthesis pathways. With this transcriptome sequencing, future genetic and genomics studies related to the molecular mechanisms associated with the chemical composition of P. tenuifolia may be improved. Genes involved in the enrichment of secondary metabolite biosynthesis-related pathways could enhance the potential applications of P. tenuifolia in pharmaceutical industries. PMID:26543847

  14. Pathway-Based Genome-Wide Association Studies for Two Meat Production Traits in Simmental Cattle

    PubMed Central

    Fan, Huizhong; Wu, Yang; Zhou, Xiaojing; Xia, Jiangwei; Zhang, Wengang; Song, Yuxin; Liu, Fei; Chen, Yan; Zhang, Lupei; Gao, Xue; Gao, Huijiang; Li, Junya

    2015-01-01

    Most single nucleotide polymorphisms (SNPs) detected by genome-wide association studies (GWAS) explain only a small fraction of phenotypic variation. Pathway-based GWAS were proposed to improve the proportion of genes for some human complex traits that could be explained by enriching a mass of SNPs within genetic groups. However, few attempts have been made to describe the quantitative traits in domestic animals. In this study, we used a dataset with approximately 7,700,000 SNPs from 807 Simmental cattle and analyzed live weight and longissimus muscle area using a modified pathway-based GWAS method to orthogonalise the highly linked SNPs within each gene using principal component analysis (PCA). As a result, of the 262 biological pathways of cattle collected from the KEGG database, the gamma aminobutyric acid (GABA)ergic synapse pathway and the non-alcoholic fatty liver disease (NAFLD) pathway were significantly associated with the two traits analyzed. The GABAergic synapse pathway was biologically applicable to the traits analyzed because of its roles in feed intake and weight gain. The proposed method had high statistical power and a low false discovery rate, compared to those of the smallest P-value and SNP set enrichment analysis methods. PMID:26672757

  15. Disease co-morbidity and the human Wnt signaling pathway: a network-wise study.

    PubMed

    Nayak, Losiana; Tunga, Harinandan; De, Rajat K

    2013-06-01

    The human Wnt signaling pathway contains 57 genes communicating among themselves by 70 experimentally established associations, as given in the KEGG/PATHWAY database. It is responsible for a variety of crucial biological functions such as regulation of cell fate determination, proliferation, differentiation, migration, and apoptosis. Abnormal behavior of its members causes numerous types of human cancers, dramatic changes in bone mass density that lead to diseases such as osteoporosis-pseudo-glioma syndrome, Van-Buchem disease, skeletal malformation, autosomal dominant sclerosteosis, and osteoporosis type I syndromes. So far, single genes have been investigated for their disease-causing properties, and single diseases have been traced backwards to discover foul-play of the system pathways. Differential expression of the whole genome has been mapped by microarray. But how all the genes involved in a pathway affect each other in single/multiple disease state(s) and whether the presence of one disease state makes a person prone to another kind of disease(s) (i.e., co-morbidity among diseases associated with a certain important biological pathway) is still unknown. We have developed a human Wnt signaling pathway diseasome and analyzed it for finding answers to such questions. Data used in constructing the diseasome can be downloaded from the publicly accessible webserver http://www.isical.ac.in/-rajat/diseasome/index.php. PMID:23692364

  16. Data recovery and integration from public databases uncovers transformation-specific transcriptional downregulation of cAMP-PKA pathway-encoding genes

    PubMed Central

    Balestrieri, Chiara; Alberghina, Lilia; Vanoni, Marco; Chiaradonna, Ferdinando

    2009-01-01

    Background: The integration of data from multiple genome-wide assays is essential for understanding dynamic spatio-temporal interactions within cells. Such integration, which leads to a more complete view of cellular processes, offers the opportunity to rationalize better the high amount of "omics" data freely available in several public databases. In particular, integration of microarray-derived transcriptome data with other high-throughput analyses (genomic and mutational analysis, promoter analysis) may allow us to unravel transcriptional regulatory networks under a variety of physio-pathological situations, such as the alteration in the cross-talk between signal transduction pathways in transformed cells. Results: Here we sequentially apply web-based and statistical tools to a case study: the role of oncogenic activation of different signal transduction pathways in the transcriptional regulation of genes encoding proteins involved in the cAMP-PKA pathway. To this end, we first re-analyzed available genome-wide expression data for genes encoding proteins of the downstream branch of the PKA pathway in normal tissues and human tumor cell lines. Then, in order to identify mutation-dependent transcriptional signatures, we classified cancer cells as a function of their mutational state. The results of such procedure were used as a starting point to analyze the structure of PKA pathway-encoding genes promoters, leading to identification of specific combinations of transcription factor binding sites, which are neatly consistent with available experimental data and help to clarify the relation between gene expression, transcriptional factors and oncogenes in our case study. Conclusions: Genome-wide, large-scale "omics" experimental technologies give different, complementary perspectives on the structure and regulatory properties of complex systems. Even the relatively simple, integrated workflow presented here offers opportunities not only for filtering data noise

  17. Pathway modeling of microarray data: a case study of pathway activity changes in the testis following in utero exposure to dibutyl phthalate (DBP).

    PubMed

    Ovacik, Meric A; Sen, Banalata; Euling, Susan Y; Gaido, Kevin W; Ierapetritou, Marianthi G; Androulakis, Ioannis P

    2013-09-15

    Pathway activity level analysis, the approach pursued in this study, focuses on all genes that are known to be members of metabolic and signaling pathways as defined by the KEGG database. The pathway activity level analysis entails singular value decomposition (SVD) of the expression data of the genes constituting a given pathway. We explore an extension of the pathway activity methodology for application to time-course microarray data. We show that pathway analysis enhances our ability to detect biologically relevant changes in pathway activity using synthetic data. As a case study, we apply the pathway activity level formulation coupled with significance analysis to microarray data from two different rat testes exposed in utero to dibutyl phthalate (DBP). In utero DBP exposure in the rat results in developmental toxicity of a number of male reproductive organs, including the testes. One well-characterized mode of action for DBP and the male reproductive developmental effects is the repression of expression of genes involved in cholesterol transport, steroid biosynthesis and testosterone synthesis that lead to a decreased fetal testicular testosterone. Previous analyses of DBP testes microarray data focused on either individual gene expression changes or changes in the expression of specific genes that are hypothesized, or known, to be important in testicular development and testosterone synthesis. However, a pathway analysis may inform whether there are additional affected pathways that could inform additional modes of action linked to DBP developmental toxicity. We show that pathway activity analysis may be considered for a more comprehensive analysis of microarray data. PMID:20850466
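
    As a rough illustration of the SVD step mentioned in this abstract, the sketch below summarises one pathway's genes-by-samples expression matrix by its largest singular value and the matching right singular vector (one value per sample). It uses plain power iteration instead of a library SVD routine. The function name `pathway_activity`, the toy matrix, and the assumption that the expression values are already normalised are illustrative choices, not the authors' implementation.

```python
import math


def pathway_activity(expr, iters=200):
    """Return (sigma, activity) for a genes x samples matrix.

    sigma is the largest singular value; activity is the matching right
    singular vector (one entry per sample), found by power iteration on
    expr^T expr. Hypothetical helper for illustration only.
    """
    n_genes, n_samples = len(expr), len(expr[0])
    v = [1.0 / math.sqrt(n_samples)] * n_samples  # uniform start vector
    sigma = 0.0
    for _ in range(iters):
        # u = expr * v (one value per gene)
        u = [sum(row[j] * v[j] for j in range(n_samples)) for row in expr]
        # w = expr^T * u (one value per sample)
        w = [sum(expr[i][j] * u[i] for i in range(n_genes))
             for j in range(n_samples)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # all-zero matrix or degenerate start vector
            break
        v = [x / norm for x in w]
        sigma = math.sqrt(norm)  # ||A^T A v|| tends to sigma^2
    return sigma, v


# Toy rank-1 matrix: 2 genes (rows) x 2 samples (columns).
sigma, activity = pathway_activity([[1.0, 2.0], [2.0, 4.0]])
```

    Power iteration only recovers the leading component, which is enough for a single activity profile per pathway; a full decomposition would be needed to inspect how much variance the remaining components carry.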

  18. De Novo Transcriptomic Analysis of Peripheral Blood Lymphocytes from the Chinese Goose: Gene Discovery and Immune System Pathway Description

    PubMed Central

    Tariq, Mansoor; Chen, Rong; Yuan, Hongyu; Liu, Yanjie; Wu, Yanan; Wang, Junya; Xia, Chun

    2015-01-01

    Background: The Chinese goose is one of the most economically important poultry birds and is a natural reservoir for many avian viruses. However, the nature and regulation of the innate and adaptive immune systems of this waterfowl species are not completely understood due to limited information on the goose genome. Recently, transcriptome sequencing technology was applied in the genomic studies focused on novel gene discovery. Thus, this study described the transcriptome of the goose peripheral blood lymphocytes to identify immunity relevant genes. Principal Findings: De novo transcriptome assembly of the goose peripheral blood lymphocytes was sequenced by Illumina-Solexa technology. In total, 211,198 unigenes were assembled from the 69.36 million cleaned reads. The average length, N50 size and the maximum length of the assembled unigenes were 687 bp, 1,298 bp and 18,992 bp, respectively. A total of 36,854 unigenes showed similarity by BLAST search against the NCBI non-redundant (Nr) protein database. For functional classification, 163,161 unigenes were comprised of three Gene Ontology (GO) categories and 67 subcategories. A total of 15,334 unigenes were annotated into 25 eukaryotic orthologous groups (KOGs) categories. Kyoto Encyclopedia of Genes and Genomes (KEGG) database annotated 39,585 unigenes into six biological functional groups and 308 pathways. Among the 2,757 unigenes that participated in the 15 immune system KEGG pathways, 125 of the most important immune relevant genes were summarized and analyzed by STRING analysis to identify gene interactions and relationships. Moreover, 10 genes were confirmed by PCR and analyzed. Of these 125 unigenes, 109 unigenes, approximately 87%, were not previously identified in the goose. Conclusion: This de novo transcriptome analysis could provide important Chinese goose sequence information and highlights the value of new gene discovery, pathways investigation and immune system gene identification, and comparison with
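
    The N50 size quoted in this abstract is a standard assembly statistic: the length L such that sequences of length L or longer account for at least half of all assembled bases. A minimal sketch of that calculation follows; the function name `n50` and the toy lengths are assumptions for illustration and are unrelated to the goose data.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Walk the lengths from longest to shortest and return the length at
    which the running total first reaches half of the overall total.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("lengths must be non-empty")


# Toy unigene lengths in bp (hypothetical values).
toy_n50 = n50([2, 3, 4, 5, 6])
```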

  19. Transcriptome Analysis Reveals the Genetic Basis of the Resveratrol Biosynthesis Pathway in an Endophytic Fungus (Alternaria sp. MG1) Isolated from Vitis vinifera

    PubMed Central

    Che, Jinxin; Shi, Junling; Gao, Zhenhong; Zhang, Yan

    2016-01-01

    Alternaria sp. MG1, an endophytic fungus previously isolated from Merlot grape, produces resveratrol from glucose, showing similar metabolic flux to the phenylpropanoid biosynthesis pathway, currently found solely in plants. In order to identify the resveratrol biosynthesis pathway in this strain at the gene level, de novo transcriptome sequencing was conducted using Illumina paired-end sequencing. A total of 22,954,434 high-quality reads were assembled into contigs and 18,570 unigenes were identified. Among these unigenes, 14,153 were annotated in the NCBI non-redundant protein database and 5341 were annotated in the Swiss-Prot database. After KEGG mapping, 2701 unigenes were mapped onto 115 pathways. Eighty-four unigenes were annotated in major pathways from glucose to resveratrol, coding 20 enzymes for glycolysis, 10 for phenylalanine biosynthesis, 4 for phenylpropanoid biosynthesis, and 4 for stilbenoid biosynthesis. Chalcone synthase was identified for resveratrol biosynthesis in this strain, due to the absence of stilbene synthase. All the identified enzymes indicated a reasonable biosynthesis pathway from glucose to resveratrol via glycolysis, phenylalanine biosynthesis, phenylpropanoid biosynthesis, and stilbenoid pathways. These results provide essential evidence for the occurrence of resveratrol biosynthesis in Alternaria sp. MG1 at the gene level, facilitating further elucidation of the molecular mechanisms involved in this strain's secondary metabolism. PMID:27588016

  20. Transcriptome Analysis Reveals the Genetic Basis of the Resveratrol Biosynthesis Pathway in an Endophytic Fungus (Alternaria sp. MG1) Isolated from Vitis vinifera.

    PubMed

    Che, Jinxin; Shi, Junling; Gao, Zhenhong; Zhang, Yan

    2016-01-01

    Alternaria sp. MG1, an endophytic fungus previously isolated from Merlot grape, produces resveratrol from glucose, showing similar metabolic flux to the phenylpropanoid biosynthesis pathway, currently found solely in plants. In order to identify the resveratrol biosynthesis pathway in this strain at the gene level, de novo transcriptome sequencing was conducted using Illumina paired-end sequencing. A total of 22,954,434 high-quality reads were assembled into contigs and 18,570 unigenes were identified. Among these unigenes, 14,153 were annotated in the NCBI non-redundant protein database and 5341 were annotated in the Swiss-Prot database. After KEGG mapping, 2701 unigenes were mapped onto 115 pathways. Eighty-four unigenes were annotated in major pathways from glucose to resveratrol, coding 20 enzymes for glycolysis, 10 for phenylalanine biosynthesis, 4 for phenylpropanoid biosynthesis, and 4 for stilbenoid biosynthesis. Chalcone synthase was identified for resveratrol biosynthesis in this strain, due to the absence of stilbene synthase. All the identified enzymes indicated a reasonable biosynthesis pathway from glucose to resveratrol via glycolysis, phenylalanine biosynthesis, phenylpropanoid biosynthesis, and stilbenoid pathways. These results provide essential evidence for the occurrence of resveratrol biosynthesis in Alternaria sp. MG1 at the gene level, facilitating further elucidation of the molecular mechanisms involved in this strain's secondary metabolism. PMID:27588016

  1. enRoute: dynamic path extraction from biological pathway maps for exploring heterogeneous experimental datasets

    PubMed Central

    2013-01-01

    Jointly analyzing biological pathway maps and experimental data is critical for understanding how biological processes work in different conditions and why different samples exhibit certain characteristics. This joint analysis, however, poses a significant challenge for visualization. Current techniques are either well suited to visualize large amounts of pathway node attributes, or to represent the topology of the pathway well, but do not accomplish both at the same time. To address this we introduce enRoute, a technique that enables analysts to specify a path of interest in a pathway, extract this path into a separate, linked view, and show detailed experimental data associated with the nodes of this extracted path right next to it. This juxtaposition of the extracted path and the experimental data allows analysts to simultaneously investigate large amounts of potentially heterogeneous data, thereby solving the problem of joint analysis of topology and node attributes. As this approach does not modify the layout of pathway maps, it is compatible with arbitrary graph layouts, including those of hand-crafted, image-based pathway maps. We demonstrate the technique in context of pathways from the KEGG and the Wikipathways databases. We apply experimental data from two public databases, the Cancer Cell Line Encyclopedia (CCLE) and The Cancer Genome Atlas (TCGA) that both contain a wide variety of genomic datasets for a large number of samples. In addition, we make use of a smaller dataset of hepatocellular carcinoma and common xenograft models. To verify the utility of enRoute, domain experts conducted two case studies where they explore data from the CCLE and the hepatocellular carcinoma datasets in the context of relevant pathways. PMID:24564375

  2. Gene Ontology and KEGG Enrichment Analyses of Genes Related to Age-Related Macular Degeneration

    PubMed Central

    Zhang, Jian; Xing, ZhiHao; Ma, Mingming; Wang, Ning; Cai, Yu-Dong; Chen, Lei; Xu, Xun

    2014-01-01

    Identifying disease genes is one of the most important topics in biomedicine and may facilitate studies on the mechanisms underlying disease. Age-related macular degeneration (AMD) is a serious eye disease; it typically affects older adults and results in a loss of vision due to retina damage. In this study, we attempt to develop an effective method for distinguishing AMD-related genes. Gene ontology and KEGG enrichment analyses of known AMD-related genes were performed, and a classification system was established. In detail, each gene was encoded into a vector by extracting enrichment scores of the gene set, including it and its direct neighbors in STRING, and gene ontology terms or KEGG pathways. Then certain feature-selection methods, including minimum redundancy maximum relevance and incremental feature selection, were adopted to extract key features for the classification system. As a result, 720 GO terms and 11 KEGG pathways were deemed the most important factors for predicting AMD-related genes. PMID:25165703
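
    The abstract does not define its enrichment score. One common convention, assumed here, is the negative base-10 logarithm of a hypergeometric tail probability for the overlap between a gene set and a GO term or KEGG pathway. The sketch below implements that convention with the standard library only; the function names and the toy counts are hypothetical and may differ from the paper's actual scoring.

```python
import math


def hypergeom_pvalue(n_total, n_term, n_set, n_overlap):
    """P(X >= n_overlap) when drawing n_set genes from n_total genes,
    of which n_term are annotated to the term or pathway."""
    upper = min(n_term, n_set)
    tail = sum(
        math.comb(n_term, k) * math.comb(n_total - n_term, n_set - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / math.comb(n_total, n_set)


def enrichment_score(n_total, n_term, n_set, n_overlap):
    """Assumed definition: -log10 of the hypergeometric p-value."""
    return -math.log10(hypergeom_pvalue(n_total, n_term, n_set, n_overlap))


# Toy counts: 10 genes overall, 5 in the pathway, a 5-gene set, all 5 overlap.
toy_p = hypergeom_pvalue(10, 5, 5, 5)
toy_score = enrichment_score(10, 5, 5, 5)
```

    Computing one such score per GO term and per KEGG pathway would give each gene a fixed-length numeric vector of the kind the abstract describes, which feature-selection methods can then rank.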

  3. Characterization of Differentially Expressed Genes Involved in Pathways Associated with Gastric Cancer

    PubMed Central

    Li, Hao; Yu, Beiqin; Li, Jianfang; Su, Liping; Yan, Min; Zhang, Jun; Li, Chen; Zhu, Zhenggang; Liu, Bingya

    2015-01-01

    To explore the patterns of gene expression in gastric cancer, a total of 26 paired gastric cancer and noncancerous tissues from patients were enrolled for gene expression microarray analyses. Limma methods were applied to analyze the data, and genes were considered to be significantly differentially expressed if the False Discovery Rate (FDR) value was <0.01, P-value was <0.01 and the fold change (FC) was >2. Subsequently, Gene Ontology (GO) categories were used to analyze the main functions of the differentially expressed genes. According to the Kyoto Encyclopedia of Genes and Genomes (KEGG) database, we found pathways significantly associated with the differential genes. Gene-Act network and co-expression network were built respectively based on the relationships among the genes, proteins and compounds in the database. 2371 mRNAs and 350 lncRNAs considered as significantly differentially expressed genes were selected for the further analysis. The GO categories, pathway analyses and the Gene-Act network showed a consistent result that up-regulated genes were responsible for tumorigenesis, migration, angiogenesis and microenvironment formation, while down-regulated genes were involved in metabolism. The results of this study provide some novel findings on coding RNAs, lncRNAs, pathways and the co-expression network in gastric cancer which will be useful to guide further investigation and target therapy for this disease. PMID:25928635
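
    The three selection thresholds stated in this abstract (FDR < 0.01, P-value < 0.01, fold change > 2) can be expressed as a simple filter. The sketch below is illustrative only: the record type, the gene names, and the numbers are hypothetical, and treating a two-fold decrease the same as a two-fold increase is an assumption, since the abstract does not say how down-regulation was thresholded.

```python
from dataclasses import dataclass


@dataclass
class GeneResult:
    gene: str
    fold_change: float  # tumour / normal expression ratio, must be > 0
    p_value: float
    fdr: float


def is_differential(r, fc_cut=2.0, p_cut=0.01, fdr_cut=0.01):
    """Apply the FDR, P-value and fold-change cutoffs together."""
    # Assumption: a ratio of 0.25 counts as a 4-fold change (down-regulated).
    magnitude = max(r.fold_change, 1.0 / r.fold_change)
    return r.fdr < fdr_cut and r.p_value < p_cut and magnitude > fc_cut


# Hypothetical results, not taken from the study.
results = [
    GeneResult("GENE_A", 3.5, 0.001, 0.004),   # passes, up-regulated
    GeneResult("GENE_B", 1.4, 0.0005, 0.002),  # fails the fold-change cutoff
    GeneResult("GENE_C", 0.25, 0.002, 0.008),  # passes, down-regulated
    GeneResult("GENE_D", 5.0, 0.02, 0.06),     # fails P-value and FDR
]
selected = [r.gene for r in results if is_differential(r)]
```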

  4. UniPathway: a resource for the exploration and annotation of metabolic pathways.

    PubMed

    Morgat, Anne; Coissac, Eric; Coudert, Elisabeth; Axelsen, Kristian B; Keller, Guillaume; Bairoch, Amos; Bridge, Alan; Bougueleret, Lydie; Xenarios, Ioannis; Viari, Alain

    2012-01-01

    UniPathway (http://www.unipathway.org) is a fully manually curated resource for the representation and annotation of metabolic pathways. UniPathway provides explicit representations of enzyme-catalyzed and spontaneous chemical reactions, as well as a hierarchical representation of metabolic pathways. This hierarchy uses linear subpathways as the basic building block for the assembly of larger and more complex pathways, including species-specific pathway variants. All of the pathway data in UniPathway has been extensively cross-linked to existing pathway resources such as KEGG and MetaCyc, as well as sequence resources such as the UniProt KnowledgeBase (UniProtKB), for which UniPathway provides a controlled vocabulary for pathway annotation. We introduce here the basic concepts underlying the UniPathway resource, with the aim of allowing users to fully exploit the information provided by UniPathway. PMID:22102589

  5. UniPathway: a resource for the exploration and annotation of metabolic pathways

    PubMed Central

    Morgat, Anne; Coissac, Eric; Coudert, Elisabeth; Axelsen, Kristian B.; Keller, Guillaume; Bairoch, Amos; Bridge, Alan; Bougueleret, Lydie; Xenarios, Ioannis; Viari, Alain

    2012-01-01

    UniPathway (http://www.unipathway.org) is a fully manually curated resource for the representation and annotation of metabolic pathways. UniPathway provides explicit representations of enzyme-catalyzed and spontaneous chemical reactions, as well as a hierarchical representation of metabolic pathways. This hierarchy uses linear subpathways as the basic building block for the assembly of larger and more complex pathways, including species-specific pathway variants. All of the pathway data in UniPathway has been extensively cross-linked to existing pathway resources such as KEGG and MetaCyc, as well as sequence resources such as the UniProt KnowledgeBase (UniProtKB), for which UniPathway provides a controlled vocabulary for pathway annotation. We introduce here the basic concepts underlying the UniPathway resource, with the aim of allowing users to fully exploit the information provided by UniPathway. PMID:22102589

  6. The 2012 Nucleic Acids Research Database Issue and the online Molecular Biology Database Collection.

    PubMed

    Galperin, Michael Y; Fernández-Suárez, Xosé M

    2012-01-01

    The 19th annual Database Issue of Nucleic Acids Research features descriptions of 92 new online databases covering various areas of molecular biology and 100 papers describing recent updates to the databases previously described in NAR and other journals. The highlights of this issue include, among others, a description of neXtProt, a knowledgebase on human proteins; a detailed explanation of the principles behind the NCBI Taxonomy Database; NCBI and EBI papers on the recently launched BioSample databases that store sample information for a variety of database resources; descriptions of the recent developments in the Gene Ontology and UniProt Gene Ontology Annotation projects; updates on Pfam, SMART and InterPro domain databases; update papers on KEGG and TAIR, two universally acclaimed databases that face an uncertain future; and a separate section with 10 wiki-based databases, introduced in an accompanying editorial. The NAR online Molecular Biology Database Collection, available at http://www.oxfordjournals.org/nar/database/a/, has been updated and now lists 1380 databases. Brief machine-readable descriptions of the databases featured in this issue, according to the BioDBcore standards, will be provided at the http://biosharing.org/biodbcore web site. The full content of the Database Issue is freely available online on the Nucleic Acids Research web site (http://nar.oxfordjournals.org/). PMID:22144685

  7. Using the Reactome Database

    PubMed Central

    Haw, Robin

    2012-01-01

    There is considerable interest in the bioinformatics community in creating pathway databases. The Reactome project (a collaboration between the Ontario Institute for Cancer Research, Cold Spring Harbor Laboratory, New York University Medical Center and the European Bioinformatics Institute) is one such pathway database and collects structured information on all the biological pathways and processes in the human. It is an expert-authored and peer-reviewed, curated collection of well-documented molecular reactions that span the gamut from simple intermediate metabolism to signaling pathways and complex cellular events. This information is supplemented with likely orthologous molecular reactions in mouse, rat, zebrafish, worm and other model organisms. This unit describes how to use the Reactome database to learn the steps of a biological pathway; navigate and browse through the Reactome database; identify the pathways in which a molecule of interest is involved; use the Pathway and Expression analysis tools to search the database for and visualize possible connections within user-supplied experimental data set and Reactome pathways; and the Species Comparison tool to compare human and model organism pathways. PMID:22700314

  8. Integrated miRNA–risk gene–pathway pair network analysis provides prognostic biomarkers for gastric cancer

    PubMed Central

    Cai, Hui; Xu, Jiping; Han, Yifang; Lu, Zhengmao; Han, Ting; Ding, Yibo; Ma, Liye

    2016-01-01

    Purpose: This study aimed to identify molecular prognostic biomarkers for gastric cancer. Methods: mRNA and miRNA expression profiles of eligible gastric cancer and control samples were downloaded from Gene Expression Omnibus to screen the differentially expressed genes (DEGs) and differentially expressed miRNAs (DEmiRs), using MetaDE and limma packages, respectively. Target genes of the DEmiRs were also collected from both predictive and experimentally validated target databases of miRNAs. The overlapping genes between selected targets and DEGs were identified as risk genes, followed by functional enrichment analysis. Human pathways and their corresponding genes were downloaded from the Kyoto Encyclopedia of Genes and Genomes (KEGG) database for the expression analysis of each pathway in gastric cancer samples. Next, co-pathway pairs were selected according to the Pearson correlation coefficients. Finally, the co-pathway pairs, miRNA–target pairs, and risk gene–pathway pairs were merged into a complex interaction network, the most important nodes (miRNAs/target genes/co-pathway pairs) of which were selected by calculating their degrees. Results: In total, 1,260 DEGs and 144 DEmiRs were identified. There were 336 risk genes found in the 9,572 miRNA–target pairs. Judging from the pathway expression files, 45 co-pathway pairs were screened out. There were 1,389 interactive pairs and 480 nodes in the integrated network. Among all nodes in the network, focal adhesion/extracellular matrix–receptor interaction pathways, CALM2, miR-19b, and miR-181b were the hub nodes with higher degrees. Conclusion: CALM2, hsa-miR-19b, and hsa-miR-181b might be used as potential prognostic targets for gastric cancer. PMID:27284247
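
    The co-pathway selection step in this abstract relies on Pearson correlation between pathway expression profiles. A minimal sketch of that idea follows, assuming each pathway has already been reduced to one score per sample. The 0.8 cutoff, the function names, the third pathway label, and all numbers are hypothetical; the abstract does not give the threshold actually used.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def co_pathway_pairs(scores, threshold=0.8):
    """Return (pathway_a, pathway_b, r) for every pair with |r| >= threshold."""
    pairs = []
    for a, b in combinations(sorted(scores), 2):
        r = pearson(scores[a], scores[b])
        if abs(r) >= threshold:
            pairs.append((a, b, r))
    return pairs


# Hypothetical per-sample pathway scores (4 samples).
toy_scores = {
    "focal adhesion": [1.0, 2.0, 3.0, 4.0],
    "ECM-receptor interaction": [2.0, 4.0, 6.0, 8.5],
    "hypothetical pathway C": [3.0, 1.0, 4.0, 1.0],
}
toy_pairs = co_pathway_pairs(toy_scores)
```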

  9. A data-based exploration of the adverse outcome pathway for skin sensitization points to the necessary requirements for its prediction with alternative methods.

    PubMed

    Benigni, Romualdo; Bossa, Cecilia; Tcheremenskaia, Olga

    2016-07-01

    This paper presents new data-based analyses on the ability of alternative methods to predict the skin sensitization potential of chemicals. It appears that skin sensitization, as shown in humans and rodents, can be predicted with good accuracy both with in vitro assays and QSAR approaches. The accuracy is about the same: 85-90%. Given that every biological measure has inherent uncertainty, this performance is quite remarkable. Overall, there is a good correlation between human data and experimental in vivo systems, except for sensitizers of intermediate potency. This uncertainty/variability is probably the reason why alternative methods are quite efficient in predicting both strong and non-sensitizers, but not the intermediate potency sensitizers. A detailed analysis of the predictivity of the individual approaches shows that the biological in vitro assays have limited added value in respect to the in chemico/QSAR ones, and suggests that the primary interaction with proteins is the rate-limiting step of the entire process. This confirms evidence from other fields (e.g., carcinogenicity, QSAR) indicating that successful predictive models are based on the parameterization of a few mechanistic features/events, whereas the consideration of all events supposedly involved in a toxicity pathway contributes to increase the uncertainty of the predictions. PMID:27090483

  10. The FunGenES Database: A Genomics Resource for Mouse Embryonic Stem Cell Differentiation

    PubMed Central

    Adler, Priit; Aksoy, Irène; Anastassiadis, Konstantinos; Bader, Michael; Billon, Nathalie; Boeuf, Hélène; Bourillot, Pierre-Yves; Buchholz, Frank; Dani, Christian; Doss, Michael Xavier; Forrester, Lesley; Gitton, Murielle; Henrique, Domingos; Hescheler, Jürgen; Himmelbauer, Heinz; Hübner, Norbert; Karantzali, Efthimia; Kretsovali, Androniki; Lubitz, Sandra; Pradier, Laurent; Rai, Meena; Reimand, Jüri; Rolletschek, Alexandra; Sachinidis, Agapios; Savatier, Pierre; Stewart, Francis; Storm, Mike P.; Trouillas, Marina; Vilo, Jaak; Welham, Melanie J.; Winkler, Johannes; Wobus, Anna M.; Hatzopoulos, Antonis K.

    2009-01-01

    Embryonic stem (ES) cells have high self-renewal capacity and the potential to differentiate into a large variety of cell types. To investigate gene networks operating in pluripotent ES cells and their derivatives, the “Functional Genomics in Embryonic Stem Cells” consortium (FunGenES) has analyzed the transcriptome of mouse ES cells in eleven diverse settings representing sixty-seven experimental conditions. To better illustrate gene expression profiles in mouse ES cells, we have organized the results in an interactive database with a number of features and tools. Specifically, we have generated clusters of transcripts that behave the same way under the entire spectrum of the sixty-seven experimental conditions; we have assembled genes in groups according to their time of expression during successive days of ES cell differentiation; we have included expression profiles of specific gene classes such as transcription regulatory factors and Expressed Sequence Tags; transcripts have been arranged in “Expression Waves” and juxtaposed to genes with opposite or complementary expression patterns; we have designed search engines to display the expression profile of any transcript during ES cell differentiation; gene expression data have been organized in animated graphs of KEGG signaling and metabolic pathways; and finally, we have incorporated advanced functional annotations for individual genes or gene clusters of interest and links to microarray and genomic resources. The FunGenES database provides a comprehensive resource for studies into the biology of ES cells. PMID:19727443

  11. PADB : Published Association Database

    PubMed Central

    Rhee, Hwanseok; Lee, Jin-Sung

    2007-01-01

    Background: Although molecular pathway information and the International HapMap Project data can help biomedical researchers to investigate the aetiology of complex diseases more effectively, such information is missing or insufficient in current genetic association databases. In addition, only a few of the environmental risk factors are included as gene-environment interactions, and the risk measures of associations are not indexed in any association databases. Description: We have developed a published association database (PADB; ) that includes both the genetic associations and the environmental risk factors available in PubMed database. Each genetic risk factor is linked to a molecular pathway database and the HapMap database through human gene symbols identified in the abstracts. Risk measures such as odds ratios or hazard ratios are extracted automatically from the abstracts when available. Thus, users can review the association data sorted by the risk measures, and genetic associations can be grouped by human genes or molecular pathways. The search results can also be saved to tab-delimited text files for further sorting or analysis. Currently, PADB indexes more than 1,500,000 PubMed abstracts that include 3442 human genes, 461 molecular pathways and about 190,000 risk measures ranging from 0.00001 to 4878.9. Conclusion: PADB is a unique online database of published associations that will serve as a novel and powerful resource for reviewing and interpreting huge association data of complex human diseases. PMID:17877839

  12. Databases for Microbiologists

    PubMed Central

    2015-01-01

    Databases play an increasingly important role in biology. They archive, store, maintain, and share information on genes, genomes, expression data, protein sequences and structures, metabolites and reactions, interactions, and pathways. All these data are critically important to microbiologists. Furthermore, microbiology has its own databases that deal with model microorganisms, microbial diversity, physiology, and pathogenesis. Thousands of biological databases are currently available, and it becomes increasingly difficult to keep up with their development. The purpose of this minireview is to provide a brief survey of current databases that are of interest to microbiologists. PMID:26013493

  13. Path2Models: large-scale generation of computational models from biochemical pathway maps

    PubMed Central

    2013-01-01

    Background Systems biology projects and omics technologies have led to a growing number of biochemical pathway models and reconstructions. However, the majority of these models are still created de novo, based on literature mining and the manual processing of pathway data. Results To increase the efficiency of model creation, the Path2Models project has automatically generated mathematical models from pathway representations using a suite of freely available software. Data sources include KEGG, BioCarta, MetaCyc and SABIO-RK. Depending on the source data, three types of models are provided: kinetic, logical and constraint-based. Models from over 2 600 organisms are encoded consistently in SBML, and are made freely available through BioModels Database at http://www.ebi.ac.uk/biomodels-main/path2models. Each model contains the list of participants, their interactions, the relevant mathematical constructs, and initial parameter values. Most models are also available as easy-to-understand graphical SBGN maps. Conclusions To date, the project has resulted in more than 140 000 freely available models. Such a resource can tremendously accelerate the development of mathematical models by providing initial starting models for simulation and analysis, which can be subsequently curated and further parameterized. PMID:24180668

  14. Identification of hub genes and pathways associated with retinoblastoma based on co-expression network analysis.

    PubMed

    Wang, Q L; Chen, X; Zhang, M H; Shen, Q H; Qin, Z M

    2015-01-01

    The objective of this paper was to identify hub genes and pathways associated with retinoblastoma using centrality analysis of the co-expression network and pathway-enrichment analysis. The co-expression network of retinoblastoma was constructed by weighted gene co-expression network analysis (WGCNA) based on differentially expressed (DE) genes, and clusters were obtained through the molecular complex detection (MCODE) algorithm. Degree centrality analysis of the co-expression network was performed to explore hub genes present in retinoblastoma. Pathway-enrichment analysis was performed using the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. Validation of hub gene expression in retinoblastoma was performed by reverse transcription-polymerase chain reaction (RT-PCR) analysis. The co-expression network based on 221 DE genes between retinoblastoma and normal controls consisted of 210 nodes and 3965 edges, and 5 clusters of the network were evaluated. By centrality analysis of the co-expression network, 21 hub genes were identified, such as SNORD115-41, RASSF2, and SNORD115-44. According to RT-PCR analysis, 16 of the 21 hub genes were differentially expressed, including RASSF2 and CDCA7, and 5 were not differentially expressed in retinoblastoma compared to normal controls. Pathway analysis showed that genes in 2 clusters were enriched in 3 pathways: purine metabolism, p53 signaling pathway, and melanogenesis. In this study, we successfully identified 16 hub genes and 3 pathways associated with retinoblastoma, which may be potential biomarkers for early detection and therapy for retinoblastoma. PMID:26662407
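
    The degree centrality step mentioned in this abstract can be illustrated with a minimal sketch. It assumes an undirected, unweighted edge list and normalizes each node's degree by (n - 1); the function names and the toy gene labels are hypothetical and are not taken from the study, which built its network with WGCNA.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Return {node: degree / (n - 1)} for an undirected edge list."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-loops
            neighbors[a].add(b)
            neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}


def top_hubs(edges, k):
    """Return the k nodes with the highest degree centrality (ties broken by name)."""
    scores = degree_centrality(edges)
    return sorted(scores, key=lambda g: (-scores[g], g))[:k]


# Hypothetical toy network; gene labels are placeholders, not study data.
toy_edges = [("G1", "G2"), ("G1", "G3"), ("G1", "G4"), ("G2", "G3")]
print(top_hubs(toy_edges, 2))
```

    In this toy network G1 touches every other node, so it ranks first; a real analysis would apply the same ranking to the full co-expression network.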

  15. The 2014 Nucleic Acids Research Database Issue and an updated NAR online Molecular Biology Database Collection

    PubMed Central

    Fernández-Suárez, Xosé M.; Rigden, Daniel J.; Galperin, Michael Y.

    2014-01-01

    The 2014 Nucleic Acids Research Database Issue includes descriptions of 58 new molecular biology databases and recent updates to 123 databases previously featured in NAR or other journals. For convenience, the issue is now divided into eight sections that reflect major subject categories. Among the highlights of this issue are six databases of the transcription factor binding sites in various organisms and updates on such popular databases as CAZy, Database of Genomic Variants (DGV), dbGaP, DrugBank, KEGG, miRBase, Pfam, Reactome, SEED, TCDB and UniProt. There is a strong block of structural databases, which includes, among others, the new RNA Bricks database, updates on PDBe, PDBsum, ArchDB, Gene3D, ModBase, Nucleic Acid Database and the recently revived iPfam database. An update on the NCBI’s MMDB describes VAST+, an improved tool for protein structure comparison. Two articles highlight the development of the Structural Classification of Proteins (SCOP) database: one describes SCOPe, which automates assignment of new structures to the existing SCOP hierarchy; the other one describes the first version of SCOP2, with its more flexible approach to classifying protein structures. This issue also includes a collection of articles on bacterial taxonomy and metagenomics, which includes updates on the List of Prokaryotic Names with Standing in Nomenclature (LPSN), Ribosomal Database Project (RDP), the Silva/LTP project and several new metagenomics resources. The NAR online Molecular Biology Database Collection, http://www.oxfordjournals.org/nar/database/c/, has been expanded to 1552 databases. The entire Database Issue is freely available online on the Nucleic Acids Research website (http://nar.oxfordjournals.org/). PMID:24316579

  16. The 2014 Nucleic Acids Research Database Issue and an updated NAR online Molecular Biology Database Collection.

    PubMed

    Fernández-Suárez, Xosé M; Rigden, Daniel J; Galperin, Michael Y

    2014-01-01

    The 2014 Nucleic Acids Research Database Issue includes descriptions of 58 new molecular biology databases and recent updates to 123 databases previously featured in NAR or other journals. For convenience, the issue is now divided into eight sections that reflect major subject categories. Among the highlights of this issue are six databases of the transcription factor binding sites in various organisms and updates on such popular databases as CAZy, Database of Genomic Variants (DGV), dbGaP, DrugBank, KEGG, miRBase, Pfam, Reactome, SEED, TCDB and UniProt. There is a strong block of structural databases, which includes, among others, the new RNA Bricks database, updates on PDBe, PDBsum, ArchDB, Gene3D, ModBase, Nucleic Acid Database and the recently revived iPfam database. An update on the NCBI's MMDB describes VAST+, an improved tool for protein structure comparison. Two articles highlight the development of the Structural Classification of Proteins (SCOP) database: one describes SCOPe, which automates assignment of new structures to the existing SCOP hierarchy; the other one describes the first version of SCOP2, with its more flexible approach to classifying protein structures. This issue also includes a collection of articles on bacterial taxonomy and metagenomics, which includes updates on the List of Prokaryotic Names with Standing in Nomenclature (LPSN), Ribosomal Database Project (RDP), the Silva/LTP project and several new metagenomics resources. The NAR online Molecular Biology Database Collection, http://www.oxfordjournals.org/nar/database/c/, has been expanded to 1552 databases. The entire Database Issue is freely available online on the Nucleic Acids Research website (http://nar.oxfordjournals.org/). PMID:24316579

  17. Altered Molecular Expression of the TLR4/NF-κB Signaling Pathway in Mammary Tissue of Chinese Holstein Cattle with Mastitis

    PubMed Central

    Wu, Jie; Li, Lian; Sun, Yu; Huang, Shuai; Tang, Juan; Yu, Pan; Wang, Genlin

    2015-01-01

    Toll-like receptor 4 (TLR4) mediated activation of the nuclear transcription factor κB (NF-κB) signaling pathway by mastitis initiates expression of genes associated with inflammation and the innate immune response. In this study, the profile of mastitis-induced differential gene expression in the mammary tissue of Chinese Holstein cattle was investigated by Gene-Chip microarray and bioinformatics. The microarray results revealed that 79 genes associated with the TLR4/NF-κB signaling pathway were differentially expressed. Of these genes, 19 were up-regulated and 29 were down-regulated in mastitis tissue compared to normal, healthy tissue. Statistical analysis of transcript and protein level expression changes indicated that 10 genes changed significantly: TLR4, MyD88, IL-6, and IL-10 were up-regulated, while CD14, TNF-α, MD-2, IL-β, NF-κB, and IL-12 were down-regulated in mastitis tissue in comparison with normal tissue. Analyses using bioinformatics database resources, such as the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis and the Gene Ontology Consortium (GO) for term enrichment analysis, suggested that these differentially expressed genes implicate different regulatory pathways for immune function in the mammary gland. In conclusion, our study provides new evidence for better understanding the differential expression and mechanisms of the TLR4/NF-κB signaling pathway in Chinese Holstein cattle with mastitis. PMID:25706977

  18. EuDBase: An online resource for automated EST analysis pipeline (ESTFrontier) and database for red seaweed Eucheuma denticulatum.

    PubMed

    Hussein, Zeti Azura Mohamed; Loke, Kok Keong; Abidin, Rabiatul Adawiah Zainal; Othman, Roohaida

    2011-01-01

    Functional genomics has proven to be an efficient tool in identifying genes involved in various biological functions. However, the availability of functional resources for the commercially important seaweed Eucheuma denticulatum is still limited. EuDBase is the first seaweed online repository that provides integrated access to ESTs of Eucheuma denticulatum generated from samples collected from Kudat and Semporna in Sabah, Malaysia. The database stores 10,031 ESTs that are clustered and assembled into 2,275 unique transcripts (UT) and 955 singletons. Raw data were automatically processed using ESTFrontier, an in-house automated EST analysis pipeline. Data are stored in a MySQL database. The web interface is implemented in PHP and allows browsing and querying EuDBase through a search engine. Data are searchable via BLAST hit, domain search, Gene Ontology or KEGG Pathway. A user-friendly interface allows the identification of sequences using either a simple text query or a similarity search. EuDBase was developed to store, manage and analyze the E. denticulatum ESTs and to provide cumulative digital resources for the global scientific community. EuDBase is freely available from http://www.inbiosis.ukm.my/eudbase/. PMID:22102771

  19. EuDBase: An online resource for automated EST analysis pipeline (ESTFrontier) and database for red seaweed Eucheuma denticulatum

    PubMed Central

    Hussein, Zeti Azura Mohamed; Loke, Kok Keong; Abidin, Rabiatul Adawiah Zainal; Othman, Roohaida

    2011-01-01

    Functional genomics has proven to be an efficient tool in identifying genes involved in various biological functions. However, the availability of functional resources for the commercially important seaweed Eucheuma denticulatum is still limited. EuDBase is the first seaweed online repository that provides integrated access to ESTs of Eucheuma denticulatum generated from samples collected from Kudat and Semporna in Sabah, Malaysia. The database stores 10,031 ESTs that are clustered and assembled into 2,275 unique transcripts (UT) and 955 singletons. Raw data were automatically processed using ESTFrontier, an in-house automated EST analysis pipeline. Data are stored in a MySQL database. The web interface is implemented in PHP and allows browsing and querying EuDBase through a search engine. Data are searchable via BLAST hit, domain search, Gene Ontology or KEGG Pathway. A user-friendly interface allows the identification of sequences using either a simple text query or a similarity search. EuDBase was developed to store, manage and analyze the E. denticulatum ESTs and to provide cumulative digital resources for the global scientific community. EuDBase is freely available from http://www.inbiosis.ukm.my/eudbase/. PMID:22102771

  20. Enchytraeus albidus Microarray: Enrichment, Design, Annotation and Database (EnchyBASE)

    PubMed Central

    Novais, Sara C.; Arrais, Joel; Lopes, Pedro; Vandenbrouck, Tine; De Coen, Wim; Roelofs, Dick; Soares, Amadeu M. V. M.; Amorim, Mónica J. B.

    2012-01-01

    Enchytraeus albidus (Oligochaeta) is an ecologically relevant species used as a standard test organism for risk assessment. Effects of stressors in this species are commonly determined at the population level using reproduction and survival as endpoints. The assessment of transcriptomic responses can be very useful, e.g. to understand underlying mechanisms of toxicity with gene expression fingerprinting. The present paper addresses the following: 1) development of suppressive subtractive hybridization (SSH) libraries enriched for differentially expressed genes after metal and pesticide exposures; 2) sequencing and characterization of all generated cDNA inserts; 3) development of a publicly available genomic database on E. albidus. A total of 2100 Expressed Sequence Tags (ESTs) were isolated, sequenced and assembled into 1124 clusters (947 singletons and 177 contigs). From these sequences, 41% matched known proteins in GenBank (BLASTX, e-value ≤ 10^-5) and 37% had at least one Gene Ontology (GO) term assigned. In total, 5.5% of the sequences were assigned to a metabolic pathway, based on KEGG. With this new sequencing information, an Agilent custom oligonucleotide microarray was designed, representing a potential tool for transcriptomic studies. EnchyBASE (http://bioinformatics.ua.pt/enchybase/) was developed as a freely available web database containing genomic information on E. albidus and will be further extended in the near future for other enchytraeid species. The database so far includes all ESTs generated for E. albidus from three cDNA libraries. This information can be downloaded and applied in functional genomics and transcription studies. PMID:22558086

  1. Transcriptome Analysis and Discovery of Genes Involved in Immune Pathways from Hepatopancreas of Microbial Challenged Mitten Crab Eriocheir sinensis

    PubMed Central

    Li, Xihong; Cui, Zhaoxia; Liu, Yuan; Song, Chengwen; Shi, Guohui

    2013-01-01

    Background The Chinese mitten crab Eriocheir sinensis is an economically important crustacean that is seriously affected by various diseases, which calls for more information on immune-relevant genes at the genome level. Recently, high-throughput RNA sequencing (RNA-seq) technology has provided a powerful and efficient method for transcript analysis and immune gene discovery. Methods/Principal Findings A cDNA library from hepatopancreas of E. sinensis challenged by a mixture of three pathogen strains (Gram-positive bacteria Micrococcus luteus, Gram-negative bacteria Vibrio alginolyticus and the fungus Pichia pastoris; 10^8 cfu·mL^−1) was constructed and randomly sequenced using Illumina technology. In total, 39.76 million clean reads were assembled into 70,300 unigenes. After ruling out short-length and low-quality sequences, 52,074 non-redundant unigenes were compared to public databases for homology searching and 17,617 of them showed high similarity to sequences in the NCBI non-redundant protein (Nr) database. For function classification and pathway assignment, 18,734 (36.00%) unigenes were categorized into three Gene Ontology (GO) categories, 12,243 (23.51%) were classified into 25 Clusters of Orthologous Groups (COG), and 8,983 (17.25%) were assigned to six Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. Potentially, 24, 14, 47 and 132 unigenes were characterized to be involved in Toll, IMD, JAK-STAT and MAPK pathways, respectively. Conclusions/Significance This is the first systematic transcriptome analysis of components relating to innate immune pathways in E. sinensis. Functional genes and putative pathways identified here will contribute to a better understanding of the immune system and to the prevention of various diseases in crab. PMID:23874555

  2. De novo assembly of Eugenia uniflora L. transcriptome and identification of genes from the terpenoid biosynthesis pathway.

    PubMed

    Guzman, Frank; Kulcheski, Franceli Rodrigues; Turchetto-Zolet, Andreia Carina; Margis, Rogerio

    2014-12-01

    Pitanga (Eugenia uniflora L.) is a member of the Myrtaceae family and is of particular interest due to its medicinal properties that are attributed to specialized metabolites with known biological activities. Among these molecules, terpenoids are the most abundant in essential oils that are found in the leaves and represent compounds with potential pharmacological benefits. The terpene diversity observed in Myrtaceae is determined by the activity of different members of the terpene synthase and oxidosqualene cyclase families. Therefore, the aim of this study was to perform a de novo assembly of transcripts from E. uniflora leaves and to annotate them to identify the genes potentially involved in the terpenoid biosynthesis pathway and terpene diversity. In total, 72,742 unigenes with a mean length of 1048 bp were identified. Of these, 43,631 and 36,289 were annotated with the NCBI non-redundant protein and Swiss-Prot databases, respectively. The gene ontology categorized the sequences into 53 functional groups. A metabolic pathway analysis with KEGG revealed 8,625 unigenes assigned to 141 metabolic pathways and 40 unigenes predicted to be associated with the biosynthesis of terpenoids. Furthermore, we identified four putative full-length terpene synthase genes involved in sesquiterpene and monoterpene biosynthesis, and three putative full-length oxidosqualene cyclase genes involved in triterpene biosynthesis. The expression of these genes was validated in different E. uniflora tissues. PMID:25443850

  3. Analysis of schizophrenia and hepatocellular carcinoma genetic network with corresponding modularity and pathways: novel insights to the immune system

    PubMed Central

    2013-01-01

    Background Schizophrenic patients show lower incidences of cancer, suggesting that schizophrenia may be a protective factor against cancer. To study the genetic correlation between the two diseases, a specific PPI network was constructed with candidate genes of both schizophrenia and hepatocellular carcinoma. The network, designated schizophrenia-hepatocellular carcinoma network (SHCN), was analysed and cliques were identified as potential functional modules or complexes. The findings were compared with information from pathway databases such as KEGG, Reactome, PID and ConsensusPathDB. Results The functions of mediator genes from SHCN show that immune system and cell cycle regulation have important roles in the etiology of schizophrenia. For example, the over-expressed schizophrenia candidate genes SIRPB1, SYK and LCK are responsible for signal transduction in cytokine production; immune responses involving IL-2 and TREM-1/DAP12 pathways are relevant to the etiology of schizophrenia. Novel treatments were proposed by matching the target genes of FDA approved drugs with genes in potential protein complexes and pathways. It was found that Vitamin A, retinoic acid and a few other immune response agents modulated by RARA and LCK genes may be potential treatments for both schizophrenia and hepatocellular carcinoma. Conclusions This is the first study showing specific mediator genes in the SHCN which may suppress tumors. We also show that the interaction and modulation of schizophrenia-related proteins with cancer-related proteins point to the importance of the immune system in the etiology of schizophrenia. PMID:24564241

  4. Lysine Malonylome May Affect the Central Metabolism and Erythromycin Biosynthesis Pathway in Saccharopolyspora erythraea.

    PubMed

    Xu, Jun-Yu; Xu, Zhen; Zhou, Ying; Ye, Bang-Ce

    2016-05-01

    Lysine acylation is a dynamic, reversible post-translational modification that can regulate cellular and organismal metabolism in bacteria. Acetylome has been studied well in bacteria. However, to our knowledge, there are no proteomic data on the lysine malonylation in prokaryotes, especially in actinomycetes, which are the major producers of therapeutic antibiotics. In our study, the first malonylome of the erythromycin-producing Saccharopolyspora erythraea was described by using a high-resolution mass spectrometry-based proteomics approach and high-affinity antimalonyllysine antibodies. We identified 192 malonylated sites on 132 substrates. Malonylated proteins are enriched in many biological processes such as protein synthesis, glycolysis and gluconeogenesis, the TCA cycle, and the feeder metabolic pathways of erythromycin synthesis according to GO analysis and KEGG pathway analysis. A total of 238 S/T/Y/H-phosphorylated sites on 158 proteins were also identified in our study, which aimed to explore the potential cross-talk between acylation and phosphorylation. After that, site-specific mutations showed that malonylation is a negative regulatory modification on the enzymatic activity of the acetyl-CoA synthetase (Acs) and glutamine synthetase (Gs). Furthermore, we compared the malonylation levels of the two-growth state to explore the potential effect of malonylation on the erythromycin biosynthesis. These findings expand our current knowledge of the actinomycetes malonylome and supplement the acylproteome databases of the whole bacteria. PMID:27090497

  5. Systematic Pathway Enrichment Analysis of a Genome-Wide Association Study on Breast Cancer Survival Reveals an Influence of Genes Involved in Cell Adhesion and Calcium Signaling on the Patients’ Clinical Outcome

    PubMed Central

    Woltmann, Andrea; Chen, Bowang; Lascorz, Jesús; Johansson, Robert; Eyfjörd, Jorunn E.; Hamann, Ute; Manjer, Jonas; Enquist-Olsson, Kerstin; Henriksson, Roger; Herms, Stefan; Hoffmann, Per; Hemminki, Kari; Lenner, Per; Försti, Asta

    2014-01-01

    Genome-wide association studies (GWASs) may help to understand the effects of genetic polymorphisms on breast cancer (BC) progression and survival. However, they give only a focused view, which cannot capture the tremendous complexity of this disease. Therefore, we investigated data from a previously conducted GWAS on BC survival for enriched pathways by different enrichment analysis tools using the two main annotation databases Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG). The goal was to identify the functional categories (GO terms and KEGG pathways) that are consistently overrepresented in a statistically significant way in the list of genes generated from the single nucleotide polymorphism (SNP) data. The SNPs with allelic p-value cut-offs of 0.005 and 0.01 were annotated to the genes by excluding or including a 20 kb up- and downstream sequence of the genes and analyzed by six different tools. We identified eleven consistently enriched categories, the most significant ones relating to cell adhesion and calcium ion binding. Moreover, we investigated the similarity between our GWAS and the enrichment analyses of twelve published gene expression signatures for breast cancer prognosis. Five of them were commonly used and commercially available, five were based on different aspects of metastasis formation and two were developed from meta-analyses of published prognostic signatures. This comparison revealed similarities between our GWAS data and the general and the specific brain metastasis gene signatures as well as the Oncotype DX signature. As metastasis formation is a strong indicator of a patient’s prognosis, this result reflects the survival aspect of the conducted GWAS and supports cell adhesion and calcium signaling as important pathways in cancer progression. PMID:24886783
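
    Over-representation of a GO term or KEGG pathway in a gene list is commonly scored with a one-sided hypergeometric test. The sketch below shows that calculation only as an illustration; the abstract does not say which statistic each of the six tools used, and the function name and the toy numbers are assumptions.

```python
from math import comb


def hypergeom_enrichment_p(N, K, n, k):
    """One-sided p-value P(X >= k) for pathway over-representation.

    N: genes in the background, K: background genes in the pathway,
    n: genes in the study list, k: study-list genes in the pathway.
    """
    upper = min(K, n)
    total = comb(N, n)
    # Sum the hypergeometric probabilities for k, k + 1, ..., min(K, n) hits.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


# Hypothetical toy numbers, not taken from the study.
p = hypergeom_enrichment_p(N=10, K=3, n=2, k=1)
print(round(p, 4))
```

    A small p-value means the pathway contains more list genes than expected by chance; in practice the p-values for all tested categories would also be corrected for multiple testing.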

  6. Biofuel Database

    National Institute of Standards and Technology Data Gateway

    Biofuel Database (Web, free access)   This database brings together structural, biological, and thermodynamic data for enzymes that are either in current use or are being considered for use in the production of biofuels.

  7. Integrative Pathway Analysis of Metabolic Signature in Bladder Cancer: A Linkage to The Cancer Genome Atlas Project and Prediction of Survival

    PubMed Central

    von Rundstedt, Friedrich-Carl; Rajapakshe, Kimal; Ma, Jing; Arnold, James M.; Gohlke, Jie; Putluri, Vasanta; Krishnapuram, Rashmi; Piyarathna, D. Badrajee; Lotan, Yair; Gödde, Daniel; Roth, Stephan; Störkel, Stephan; Levitt, Jonathan M.; Michailidis, George; Sreekumar, Arun; Lerner, Seth P.; Coarfa, Cristian; Putluri, Nagireddy

    2016-01-01

    Purpose We used targeted mass spectrometry to study the metabolic fingerprint of urothelial cancer and determine whether the biochemical pathway analysis gene signature would have a predictive value in independent cohorts of patients with bladder cancer. Materials and Methods Pathologically evaluated, bladder derived tissues, including benign adjacent tissue from 14 patients and bladder cancer from 46, were analyzed by liquid chromatography based targeted mass spectrometry. Differential metabolites associated with tumor samples in comparison to benign tissue were identified by adjusting the p values for multiple testing at a false discovery rate threshold of 15%. Enrichment of pathways and processes associated with the metabolic signature were determined using the GO (Gene Ontology) Database and MSigDB (Molecular Signature Database). Integration of metabolite alterations with transcriptome data from TCGA (The Cancer Genome Atlas) was done to identify the molecular signature of 30 metabolic genes. Available outcome data from TCGA portal were used to determine the association with survival. Results We identified 145 metabolites, of which analysis revealed 31 differential metabolites when comparing benign and tumor tissue samples. Using the KEGG (Kyoto Encyclopedia of Genes and Genomes) Database we identified a total of 174 genes that correlated with the altered metabolic pathways involved. By integrating these genes with the transcriptomic data from the corresponding TCGA data set we identified a metabolic signature consisting of 30 genes. The signature was significant in its prediction of survival in 95 patients with a low signature score vs 282 with a high signature score (p = 0.0458). Conclusions Targeted mass spectrometry of bladder cancer is highly sensitive for detecting metabolic alterations. Applying transcriptome data allows for integration into larger data sets and identification of relevant metabolic pathways in bladder cancer progression. PMID:26802582
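
    The abstract reports p values adjusted for multiple testing at a false discovery rate threshold of 15% but does not name the procedure. As an illustration only, the sketch below uses the Benjamini-Hochberg step-up procedure, which is an assumption here; the function name and the example p values are hypothetical.

```python
def benjamini_hochberg(p_values, fdr=0.15):
    """Return a list of booleans, True where the hypothesis is rejected.

    Benjamini-Hochberg step-up rule: find the largest rank r (1-based, in
    ascending p-value order) with p_(r) <= (r / m) * fdr and reject the
    hypotheses ranked 1..r.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Hypothetical p values for four metabolites, not study data.
print(benjamini_hochberg([0.50, 0.01, 0.20, 0.04], fdr=0.15))
```

    The result keeps the input order, so each flag can be matched back to its metabolite.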

  8. Electronic Databases.

    ERIC Educational Resources Information Center

    Williams, Martha E.

    1985-01-01

    Presents examples of bibliographic, full-text, and numeric databases. Also discusses how to access these databases online, aids to online retrieval, and several issues and trends (including copyright and downloading, transborder data flow, use of optical disc/videodisc technology, and changing roles in database generation and processing). (JN)

  9. Database Administrator

    ERIC Educational Resources Information Center

    Moore, Pam

    2010-01-01

    The Internet and electronic commerce (e-commerce) generate lots of data. Data must be stored, organized, and managed. Database administrators, or DBAs, work with database software to find ways to do this. They identify user needs, set up computer databases, and test systems. They ensure that systems perform as they should and add people to the…

  10. LigandBox: A database for 3D structures of chemical compounds

    PubMed Central

    Kawabata, Takeshi; Sugihara, Yusuke; Fukunishi, Yoshifumi; Nakamura, Haruki

    2013-01-01

    A database for the 3D structures of available compounds is essential for the virtual screening by molecular docking. We have developed the LigandBox database (http://ligandbox.protein.osaka-u.ac.jp/ligandbox/) containing four million available compounds, collected from the catalogues of 37 commercial suppliers, and approved drugs and biochemical compounds taken from KEGG_DRUG, KEGG_COMPOUND and PDB databases. Each chemical compound in the database has several 3D conformers with hydrogen atoms and atomic charges, which are ready to be docked into receptors using docking programs. The 3D conformations were generated using our molecular simulation program package, myPresto. Various physical properties, such as aqueous solubility (LogS) and carcinogenicity have also been calculated to characterize the ADME-Tox properties of the compounds. The Web database provides two services for compound searches: a property/chemical ID search and a chemical structure search. The chemical structure search is performed by a descriptor search and a maximum common substructure (MCS) search combination, using our program kcombu. By specifying a query chemical structure, users can find similar compounds among the millions of compounds in the database within a few minutes. Our database is expected to assist a wide range of researchers, in the fields of medical science, chemical biology, and biochemistry, who are seeking to discover active chemical compounds by the virtual screening. PMID:27493549

  11. BIAdb: A curated database of benzylisoquinoline alkaloids

    PubMed Central

    2010-01-01

    Background Benzylisoquinoline is the structural backbone of many alkaloids with a wide variety of structures including papaverine, noscapine, codeine, morphine, apomorphine, berberine, protopine and tubocurarine. Many benzylisoquinoline alkaloids have been reported to show therapeutic properties and to act as novel medicines. Thus it is important to collect and compile benzylisoquinoline alkaloids in order to explore their usage in medicine. Description We extracted information about benzylisoquinoline alkaloids from various sources such as PubChem, KEGG and KNApSAcK, and by manual curation from the literature. This information was processed and compiled in order to create a comprehensive database of benzylisoquinoline alkaloids, called BIAdb. The current version of BIAdb contains information about 846 unique benzylisoquinoline alkaloids; multiple entries per compound in terms of source and function bring the total number of records to 2504. One of the major features of this database is that it provides data about 627 different plant species as sources of benzylisoquinoline and 114 different types of function performed by these compounds. A large number of online tools have been integrated, which help users explore the full potential of BIAdb. In order to provide additional information, we give external links to other resources/databases. One of the important features of this database is that it is tightly integrated with Drugpedia, which allows managing data in fixed/flexible format. Conclusions A database of benzylisoquinoline compounds has been created, which provides comprehensive information about benzylisoquinoline alkaloids. This database will be very useful for those who are working in the field of drug discovery based on natural products. This database will also serve researchers working in the field of synthetic biology, as producing medicinally important alkaloids by synthetic processes is one of the important challenges. This database is available from http

  12. CSGene: a literature-based database for cell senescence genes and its application to identify critical cell aging pathways and associated diseases

    PubMed Central

    Zhao, M; Chen, L; Qu, H

    2016-01-01

    Cell senescence is a cellular process in which normal diploid cells cease to replicate and is a major driving force for human cancers and aging-associated diseases. Recent studies on cell senescence have identified many new genetic components and pathways that control cell aging. However, there is no comprehensive resource for cell senescence that integrates various genetic studies and relationships with cell senescence, and the risk associated with complex diseases such as cancer is still unexplored. We have developed the first literature-based gene resource for exploring cell senescence genes, CSGene. We compiled 504 experimentally verified genes from public data resources and published literature. Pathway analyses highlighted the prominent roles of cell senescence genes in the control of rRNA gene transcription and unusual rDNA repeat that constitute a center for the stability of the whole genome. We also found a strong association of cell senescence with HIV-1 infection and viral carcinogenesis that are mainly related to promoter/enhancer binding and chromatin modification processes. Moreover, pan-cancer mutation and network analysis also identified common cell aging mechanisms in cancers and uncovered a highly modular network structure. These results highlight the utility of CSGene for elucidating the complex cellular events of cell senescence. PMID:26775705

  13. Statistical databases

    SciTech Connect

    Kogalovskii, M.R.

    1995-03-01

    This paper presents a review of problems related to statistical database systems, which are wide-spread in various fields of activity. Statistical databases (SDB) are referred to as databases that consist of data and are used for statistical analysis. Topics under consideration are: SDB peculiarities, properties of data models adequate for SDB requirements, metadata functions, null-value problems, SDB compromise protection problems, stored data compression techniques, and statistical data representation means. Also examined is whether the present Database Management Systems (DBMS) satisfy the SDB requirements. Some actual research directions in SDB systems are considered.

  14. Transcriptome-based discovery of pathways and genes related to resistance against Fusarium head blight in wheat landrace Wangshuibai

    PubMed Central

    2013-01-01

    Background Fusarium head blight (FHB), caused mainly by Fusarium graminearum (Fg) Schwabe (teleomorph: Gibberella zeae Schwble), brings serious damage to wheat production. Chinese wheat landrace Wangshuibai is one of the most important resistance sources in the world. Knowledge of the mechanism underlying its resistance to FHB is still limited. Results To get an overview of transcriptome characteristics of Wangshuibai during infection by Fg, high-throughput RNA sequencing based on next generation sequencing (NGS) technology (Illumina) was performed. In total, 165,499 unigenes were generated and assigned to known protein databases including NCBI non-redundant protein database (nr) (82,721, 50.0%), Gene Ontology (GO) (38,184, 23.1%), Swiss-Prot (50,702, 30.6%), Clusters of orthologous groups (COG) (51,566, 31.2%) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) (30,657, 18.5%), as determined by Blastx search. With another NGS based platform, a digital gene expression (DGE) system, gene expression in Wangshuibai and its FHB susceptible mutant NAUH117 was profiled and compared at two infection stages, 24 and 48 hours after inoculation with Fg, with the aim of identifying genes involved in FHB resistance. Conclusion Pathogen-related proteins such as PR5, PR14 and ABC transporter and the JA signaling pathway were crucial for FHB resistance, especially that mediated by Fhb1. The ET pathway and ROS/NO pathway were not activated in Wangshuibai and may not be pivotal in defense against FHB. Consistent with the fact that NAUH117 carries a chromosome fragment deletion, which led to its increased FHB susceptibility, in Wangshuibai, twenty out of eighty-nine genes showed changed expression patterns upon infection by Fg. The up-regulation of eight of them was confirmed by qRT-PCR, revealing that they may be candidate genes for Fhb1 and need further functional analysis to confirm their roles in FHB resistance. PMID:23514540

  15. Quantitative Proteogenomics and the Reconstruction of the Metabolic Pathway in Lactobacillus mucosae LM1.

    PubMed

    Pajarillo, Edward Alain B; Kim, Sang Hoon; Lee, Ji-Yoon; Valeriano, Valerie Diane V; Kang, Dae-Kyung

    2015-01-01

    Lactobacillus mucosae is a natural resident of the gastrointestinal tract of humans and animals and a potential probiotic bacterium. To understand the global protein expression profile and metabolic features of L. mucosae LM1 in the early stationary phase, the QExactive(TM) Hybrid Quadrupole-Orbitrap Mass Spectrometer was used. Characterization of the intracellular proteome identified 842 proteins, accounting for approximately 35% of the 2,404 protein-coding sequences in the complete genome of L. mucosae LM1. Proteome quantification using QExactive(TM) Orbitrap MS detected 19 highly abundant proteins (> 1.0% of the intracellular proteome), including CysK (cysteine synthase, 5.41%) and EF-Tu (elongation factor Tu, 4.91%), which are involved in cell survival against environmental stresses. Metabolic pathway annotation of LM1 proteome using the Kyoto Encyclopedia of Genes and Genomes (KEGG) database showed that half of the proteins expressed are important for basic metabolic and biosynthetic processes, and the other half might be structurally important or involved in basic cellular processes. In addition, glycogen biosynthesis was activated in the early stationary phase, which is important for energy storage and maintenance. The proteogenomic data presented in this study provide a suitable reference to understand the protein expression pattern of lactobacilli in standard conditions. PMID:26761899

  16. Quantitative Proteogenomics and the Reconstruction of the Metabolic Pathway in Lactobacillus mucosae LM1

    PubMed Central

    Lee, Ji-Yoon

    2015-01-01

    Lactobacillus mucosae is a natural resident of the gastrointestinal tract of humans and animals and a potential probiotic bacterium. To understand the global protein expression profile and metabolic features of L. mucosae LM1 in the early stationary phase, the QExactive(TM) Hybrid Quadrupole-Orbitrap Mass Spectrometer was used. Characterization of the intracellular proteome identified 842 proteins, accounting for approximately 35% of the 2,404 protein-coding sequences in the complete genome of L. mucosae LM1. Proteome quantification using QExactive(TM) Orbitrap MS detected 19 highly abundant proteins (> 1.0% of the intracellular proteome), including CysK (cysteine synthase, 5.41%) and EF-Tu (elongation factor Tu, 4.91%), which are involved in cell survival against environmental stresses. Metabolic pathway annotation of LM1 proteome using the Kyoto Encyclopedia of Genes and Genomes (KEGG) database showed that half of the proteins expressed are important for basic metabolic and biosynthetic processes, and the other half might be structurally important or involved in basic cellular processes. In addition, glycogen biosynthesis was activated in the early stationary phase, which is important for energy storage and maintenance. The proteogenomic data presented in this study provide a suitable reference to understand the protein expression pattern of lactobacilli in standard conditions. PMID:26761899

  17. Next Generation Sequencing and Transcriptome Analysis Predicts Biosynthetic Pathway of Sennosides from Senna (Cassia angustifolia Vahl.), a Non-Model Plant with Potent Laxative Properties

    PubMed Central

    Rama Reddy, Nagaraja Reddy; Mehta, Rucha Harishbhai; Soni, Palak Harendrabhai; Makasana, Jayanti; Gajbhiye, Narendra Athamaram; Ponnuchamy, Manivel; Kumar, Jitendra

    2015-01-01

    Senna (Cassia angustifolia Vahl.) is a medicinal plant used worldwide as a natural laxative. Its laxative properties are due to sennosides (anthraquinone glycosides), which are natural products. However, little genetic information is available for this species, especially concerning the biosynthetic pathways of sennosides. We present here the transcriptome sequencing of young and mature leaf tissue of Cassia angustifolia using the Illumina MiSeq platform, which resulted in a total of 6.34 Gb of raw nucleotide sequence. The sequence assembly resulted in 42230 and 37174 transcripts with an average length of 1119 bp and 1467 bp for young and mature leaf, respectively. The transcripts were annotated using NCBI BLAST with ‘green plant database (txid 33090)’, Swiss-Prot, Kyoto Encyclopedia of Genes & Genomes (KEGG), Cluster of Orthologous Gene (COG) and Gene Ontology (GO). Out of the total transcripts, 40138 (95.0%) and 36349 (97.7%) from young and mature leaf, respectively, were annotated by BLASTX against the green plant database of NCBI. We used InterProScan to assess protein similarity at the domain level; a total of 34031 (young leaf) and 32077 (mature leaf) transcripts were annotated against the Pfam domains. All transcripts from young and mature leaf were assigned to 191 KEGG pathways. There were 166 and 159 CDS, respectively, from young and mature leaf involved in metabolism of terpenoids and polyketides. Many CDS encoding enzymes leading to biosynthesis of sennosides were identified. A total of 10,763 CDS were differentially expressed in the young and mature leaf libraries, of which 2,343 (21.7%) CDS were up-regulated in young compared to mature leaf. Several differentially expressed genes were found to be functionally associated with sennoside biosynthesis. CDS encoding many CYPs and TF families were identified that have probable roles in metabolism of primary as well as secondary metabolites. We developed SSR markers for molecular breeding of senna. We have identified a set of putative genes involved in various

  18. Maize databases

    Technology Transfer Automated Retrieval System (TEKTRAN)

    This chapter is a succinct overview of maize data held in the species-specific database MaizeGDB (the Maize Genomics and Genetics Database), and selected multi-species data repositories, such as Gramene/Ensembl Plants, Phytozome, UniProt and the National Center for Biotechnology Information (NCBI), ...

  19. Database Manager

    ERIC Educational Resources Information Center

    Martin, Andrew

    2010-01-01

    It is normal practice today for organizations to store large quantities of records of related information as computer-based files or databases. Purposeful information is retrieved by performing queries on the data sets. The purpose of DATABASE MANAGER is to communicate to students the method by which the computer performs these queries. This…

  20. Changes in the Proteome of Langat-Infected Ixodes scapularis ISE6 Cells: Metabolic Pathways Associated with Flavivirus Infection

    PubMed Central

    Grabowski, Jeffrey M.; Perera, Rushika; Roumani, Ali M.; Hedrick, Victoria E.; Inerowicz, Halina D.; Hill, Catherine A.; Kuhn, Richard J.

    2016-01-01

    Background Ticks (Family Ixodidae) transmit a variety of disease causing agents to humans and animals. The tick-borne flaviviruses (TBFs; family Flaviviridae) are a complex of viruses, many of which cause encephalitis and hemorrhagic fever, and represent global threats to human health and biosecurity. Pathogenesis has been well studied in human and animal disease models. Equivalent analyses of tick-flavivirus interactions are limited and represent an area of study that could reveal novel approaches for TBF control. Methodology/Principal Findings High resolution LC-MS/MS was used to analyze the proteome of Ixodes scapularis (Lyme disease tick) embryonic ISE6 cells following infection with Langat virus (LGTV) and identify proteins associated with viral infection and replication. Maximal LGTV infection of cells and determination of peak release of infectious virus, was observed at 36 hours post infection (hpi). Proteins were extracted from ISE6 cells treated with LGTV and non-infectious (UV inactivated) LGTV at 36 hpi and analyzed by mass spectrometry. The Omics Discovery Pipeline (ODP) identified thousands of MS peaks. Protein homology searches against the I. scapularis IscaW1 genome assembly identified a total of 486 proteins that were subsequently assigned to putative functional pathways using searches against the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. 266 proteins were differentially expressed following LGTV infection relative to non-infected (mock) cells. Of these, 68 proteins exhibited increased expression and 198 proteins had decreased expression. The majority of the former were classified in the KEGG pathways: “translation”, “amino acid metabolism”, and “protein folding/sorting/degradation”. Finally, Trichostatin A and Oligomycin A increased and decreased LGTV replication in vitro in ISE6 cells, respectively. Conclusions/Significance Proteomic analyses revealed ISE6 proteins that were differentially expressed at the peak of LGTV

  1. De Novo Transcriptome Analysis of Wing Development-Related Signaling Pathways in Locusta migratoria Manilensis and Ostrinia furnacalis (Guenée)

    PubMed Central

    Chu, Yuan; Zhang, Long; Shen, Jie; An, Chunju

    2014-01-01

    Background Orthopteran migratory locust, Locusta migratoria, and lepidopteran Asian corn borer, Ostrinia furnacalis, are two types of insects undergoing incomplete and complete metamorphosis, respectively. Identification of candidate genes regulating wing development in these two insects would provide insights for further study of the molecular mechanisms controlling metamorphosis development. We have sequenced the transcriptome of O. furnacalis larvae previously. Here we sequenced and characterized the transcriptome of L. migratoria wing discs with special emphasis on wing development-related signaling pathways. Methodology/Principal Findings Illumina Hiseq2000 was used to sequence 8.38 Gb of the transcriptome from dissected nymphal wing discs. De novo assembly generated 91,907 unigenes with mean length of 610 nt. All unigenes were searched against five databases including Nt, Nr, Swiss-Prot, COG, and KEGG for annotations using the blastn or blastx algorithm with a cut-off E-value of 10^-5. A total of 23,359 (25.4%) unigenes have homologs within at least one database. Based on sequence similarity to homologs known to regulate Drosophila melanogaster wing development, we identified 50 and 46 potential wing development-related unigenes from the L. migratoria and O. furnacalis transcriptomes, respectively. The identified unigenes encode putative orthologs for nearly all components of the Hedgehog (Hh), Decapentaplegic (Dpp), Notch (N), and Wingless (Wg) signaling pathways, which are essential for growth and pattern formation during wing development. We investigated the expression profiles of the component genes involved in these signaling pathways in forewings and hind wings of L. migratoria and O. furnacalis. The results revealed that the tested genes had different expression patterns in the two insects. Conclusions/Significance This study provides the comprehensive sequence resource of the wing development-related signaling pathways of L. migratoria. The obtained data

  2. Genome databases

    SciTech Connect

    Courteau, J.

    1991-10-11

    Since the Genome Project began several years ago, a plethora of databases have been developed or are in the works. They range from the massive Genome Data Base at Johns Hopkins University, the central repository of all gene mapping information, to small databases focusing on single chromosomes or organisms. Some are publicly available, others are essentially private electronic lab notebooks. Still others limit access to a consortium of researchers working on, say, a single human chromosome. An increasing number incorporate sophisticated search and analytical software, while others operate as little more than data lists. In consultation with numerous experts in the field, a list has been compiled of some key genome-related databases. The list was not limited to map and sequence databases but also included the tools investigators use to interpret and elucidate genetic data, such as protein sequence and protein structure databases. Because a major goal of the Genome Project is to map and sequence the genomes of several experimental animals, including E. coli, yeast, fruit fly, nematode, and mouse, the available databases for those organisms are listed as well. The author also includes several databases that are still under development - including some ambitious efforts that go beyond data compilation to create what are being called electronic research communities, enabling many users, rather than just one or a few curators, to add or edit the data and tag it as raw or confirmed.

  3. BIOMARKERS DATABASE

    EPA Science Inventory

    This database was developed by assembling and evaluating the literature relevant to human biomarkers. It catalogues and evaluates the usefulness of biomarkers of exposure, susceptibility and effect which may be relevant for a longitudinal cohort study. In addition to describing ...

  4. Pathway analysis of genome-wide association study and transcriptome data highlights new biological pathways in colorectal cancer.

    PubMed

    Quan, Baoku; Qi, Xingsi; Yu, Zhihui; Jiang, Yongshuai; Liao, Mingzhi; Wang, Guangyu; Feng, Rennan; Zhang, Liangcai; Chen, Zugen; Jiang, Qinghua; Liu, Guiyou

    2015-04-01

    Colorectal cancer (CRC) is a common malignancy that meets the definition of a complex disease. Genome-wide association studies (GWAS) have identified several loci of weak predictive value in CRC; however, these do not fully explain the occurrence risk. Recently, gene set analysis has allowed enhanced interpretation of GWAS data in CRC, identifying a number of metabolic pathways as important for disease pathogenesis. Whether there are other important pathways involved in CRC, however, remains unclear. We present a systems analysis of KEGG pathways in CRC using (1) a human CRC GWAS dataset and (2) a human whole transcriptome CRC case-control expression dataset. Analysis of the GWAS dataset revealed significantly enriched KEGG pathways related to metabolism, immune system and diseases, cellular processes, environmental information processing, genetic information processing, and neurodegenerative diseases. Altered gene expression was confirmed in these pathways using the transcriptome dataset. Taken together, these findings not only confirm previous work in this area, but also highlight new biological pathways whose deregulation is critical for CRC. These results contribute to our understanding of disease-causing mechanisms and will prove useful for future genetic and functional studies in CRC. PMID:25362561

  5. Experiment Databases

    NASA Astrophysics Data System (ADS)

    Vanschoren, Joaquin; Blockeel, Hendrik

    Next to running machine learning algorithms based on inductive queries, much can be learned by immediately querying the combined results of many prior studies. Indeed, all around the globe, thousands of machine learning experiments are being executed on a daily basis, generating a constant stream of empirical information on machine learning techniques. While the information contained in these experiments might have many uses beyond their original intent, results are typically described very concisely in papers and discarded afterwards. If we properly store and organize these results in central databases, they can be immediately reused for further analysis, thus boosting future research. In this chapter, we propose the use of experiment databases: databases designed to collect all the necessary details of these experiments, and to intelligently organize them in online repositories to enable fast and thorough analysis of a myriad of collected results. They constitute an additional, queriable source of empirical meta-data based on principled descriptions of algorithm executions, without reimplementing the algorithms in an inductive database. As such, they engender a very dynamic, collaborative approach to experimentation, in which experiments can be freely shared, linked together, and immediately reused by researchers all over the world. They can be set up for personal use, to share results within a lab or to create open, community-wide repositories. Here, we provide a high-level overview of their design, and use an existing experiment database to answer various interesting research questions about machine learning algorithms and to verify a number of recent studies.

  6. The Comparative Toxicogenomics Database facilitates identification and understanding of chemical-gene-disease associations: arsenic as a case study

    PubMed Central

    Davis, Allan P; Murphy, Cynthia G; Rosenstein, Michael C; Wiegers, Thomas C; Mattingly, Carolyn J

    2008-01-01

    Background The etiology of many chronic diseases involves interactions between environmental factors and genes that modulate physiological processes. Understanding interactions between environmental chemicals and genes/proteins may provide insights into the mechanisms of chemical actions, disease susceptibility, toxicity, and therapeutic drug interactions. The Comparative Toxicogenomics Database (CTD) provides these insights by curating and integrating data describing relationships between chemicals, genes/proteins, and human diseases. To illustrate the scope and application of CTD, we present an analysis of curated data for the chemical arsenic. Arsenic represents a major global environmental health threat and is associated with many diseases. The mechanisms by which arsenic modulates these diseases are not well understood. Methods Curated interactions between arsenic compounds and genes were downloaded using export and batch query tools at CTD. The list of genes was analyzed for molecular interactions, Gene Ontology (GO) terms, KEGG pathway annotations, and inferred disease relationships. Results CTD contains curated data from the published literature describing 2,738 molecular interactions between 21 different arsenic compounds and 1,456 genes and proteins. Analysis of these genes and proteins provides insight into the biological functions and molecular networks that are affected by exposure to arsenic, including stress response, apoptosis, cell cycle, and specific protein signaling pathways. Integrating arsenic-gene data with gene-disease data yields a list of diseases that may be associated with arsenic exposure and genes that may explain this association. Conclusion CTD data integration and curation strategies yield insight into the actions of environmental chemicals and provide a basis for developing hypotheses about the molecular mechanisms underlying the etiology of environmental diseases. While many reports describe the molecular response to arsenic, CTD

  7. Transcriptome analysis of candidate genes and signaling pathways associated with light-induced brown film formation in Lentinula edodes.

    PubMed

    Tang, Li-Hua; Jian, Hua-Hua; Song, Chun-Yan; Bao, Da-Peng; Shang, Xiao-Dong; Wu, Da-Qiang; Tan, Qi; Zhang, Xue-Hong

    2013-06-01

    High-throughput Illumina RNA-seq was used for deep sequencing analysis of the transcriptome of poly(A)+ RNA from mycelium grown under three different conditions: 30 days darkness (sample 118), 80 days darkness (313W), and 30 days darkness followed by 50 days in the light (313C), in order to gain insight into the molecular mechanisms underlying the process of light-induced brown film (BF) formation in the edible mushroom, Lentinula edodes. Of the three growth conditions, BF formation occurred in 313C samples only. Approximately 159.23 million reads were obtained, trimmed, and de novo assembled into 31,511 contigs with an average length of 1,746 bp and an N50 of 2,480 bp. Based on sequence orientations determined by a BLASTX search against the NR, Swiss-Prot, COG, and KEGG databases, 24,246 (76.9%) contigs were assigned putative descriptions. Comparison of 313C/118 and 313C/313W expression profiles revealed 3,958 and 5,651 significantly differentially expressed contigs (DECs), respectively. Annotation using the COG database revealed that candidate genes for light-induced BF formation encoded proteins linked to light reception (e.g., WC-1, WC-2, phytochrome), light signal transduction pathways (e.g., two-component phosphorelay system, mitogen-activated protein kinase pathway), and pigment formation (e.g., polyketide synthase, O-methyltransferase, laccase, P450 monooxygenase, oxidoreductase). Several DECs were validated using quantitative real-time polymerase chain reaction. Our report is the first to identify genes associated with light-induced BF formation in L. edodes and represents a valuable resource for future genomic studies on this commercially important mushroom. PMID:23624682
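
    The N50 quoted in the record above is an assembly statistic: the contig length at which contigs of that length or longer account for at least half of the total assembled bases. The following is a minimal Python sketch of that calculation. The function name and the contig lengths are hypothetical illustrations, not the study's code or data.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (in bp)."""
    if not lengths:
        raise ValueError("no contig lengths given")
    # Walk contigs from longest to shortest, accumulating bases.
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running sum to half
        # of the assembly defines the N50.
        if running >= half_total:
            return length


# Hypothetical contig lengths, for illustration only.
example_contigs = [100, 200, 300, 400, 500]
print(n50(example_contigs))
```

    Because N50 weights long contigs heavily, it is usually read together with the mean contig length, as the abstract does.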

  8. De novo Transcriptome Analysis of Sinapis alba in Revealing the Glucosinolate and Phytochelatin Pathways

    PubMed Central

    Zhang, Xiaohui; Liu, Tongjin; Duan, Mengmeng; Song, Jiangping; Li, Xixiang

    2016-01-01

    Sinapis alba is an important condiment crop and can also be used as a phytoremediation plant. Though it has important economic and agronomic values, sequence data and genetic tools are still scarce for this plant. In the present study, a de novo transcriptome based on the transcripts of leaves, stems, and roots was assembled for S. alba for the first time. The transcriptome contains 47,972 unigenes with a mean length of 1185 nt and an N50 of 1672 nt. Among these unigenes, 46,535 (97%) unigenes were annotated by at least one of the following databases: NCBI non-redundant (Nr), Swiss-Prot, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway, Gene Ontology (GO), and Clusters of Orthologous Groups of proteins (COGs). The tissue expression pattern profiles revealed that 3489, 1361, and 8482 unigenes were predominantly expressed in the leaves, stems, and roots of S. alba, respectively. Genes predominantly expressed in the leaf were enriched in photosynthesis- and carbon fixation-related pathways. Genes predominantly expressed in the stem were enriched in not only pathways related to sugar, ether lipid, and amino acid metabolism but also plant hormone signal transduction and circadian rhythm pathways, while the root-dominant genes were enriched in pathways related to lignin and cellulose synthesis, involved in plant-pathogen interactions, and potentially responsible for heavy metal chelation and detoxification. Based on this transcriptome, 14,727 simple sequence repeats (SSRs) were identified, and 12,830 pairs of primers were developed for 2522 SSR-containing unigenes. Additionally, the glucosinolate (GSL) and phytochelatin metabolic pathways, which give the characteristic flavor and the heavy metal tolerance of this plant, were intensively analyzed. The genes of the aliphatic GSL pathway were predominantly expressed in roots. The absence of aliphatic GSLs in leaf tissues was due to the shutdown of BCAT4, MAM1, and CYP79F1 expression. Glutathione was extensively

  9. De novo Transcriptome Analysis of Sinapis alba in Revealing the Glucosinolate and Phytochelatin Pathways.

    PubMed

    Zhang, Xiaohui; Liu, Tongjin; Duan, Mengmeng; Song, Jiangping; Li, Xixiang

    2016-01-01

    Sinapis alba is an important condiment crop and can also be used as a phytoremediation plant. Though it has important economic and agronomic values, sequence data and genetic tools are still scarce for this plant. In the present study, a de novo transcriptome based on the transcripts of leaves, stems, and roots was assembled for S. alba for the first time. The transcriptome contains 47,972 unigenes with a mean length of 1185 nt and an N50 of 1672 nt. Among these unigenes, 46,535 (97%) unigenes were annotated by at least one of the following databases: NCBI non-redundant (Nr), Swiss-Prot, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway, Gene Ontology (GO), and Clusters of Orthologous Groups of proteins (COGs). The tissue expression pattern profiles revealed that 3489, 1361, and 8482 unigenes were predominantly expressed in the leaves, stems, and roots of S. alba, respectively. Genes predominantly expressed in the leaf were enriched in photosynthesis- and carbon fixation-related pathways. Genes predominantly expressed in the stem were enriched in not only pathways related to sugar, ether lipid, and amino acid metabolism but also plant hormone signal transduction and circadian rhythm pathways, while the root-dominant genes were enriched in pathways related to lignin and cellulose synthesis, involved in plant-pathogen interactions, and potentially responsible for heavy metal chelation and detoxification. Based on this transcriptome, 14,727 simple sequence repeats (SSRs) were identified, and 12,830 pairs of primers were developed for 2522 SSR-containing unigenes. Additionally, the glucosinolate (GSL) and phytochelatin metabolic pathways, which give the characteristic flavor and the heavy metal tolerance of this plant, were intensively analyzed. The genes of the aliphatic GSL pathway were predominantly expressed in roots. The absence of aliphatic GSLs in leaf tissues was due to the shutdown of BCAT4, MAM1, and CYP79F1 expression. Glutathione was extensively

  10. Solubility Database

    National Institute of Standards and Technology Data Gateway

    SRD 106 IUPAC-NIST Solubility Database (Web, free access) These solubilities are compiled from 18 volumes of the International Union for Pure and Applied Chemistry (IUPAC)-NIST Solubility Data Series. The database includes liquid-liquid, solid-liquid, and gas-liquid systems. Typical solvents and solutes include water, seawater, heavy water, inorganic compounds, and a variety of organic compounds such as hydrocarbons, halogenated hydrocarbons, alcohols, acids, esters and nitrogen compounds. There are over 67,500 solubility measurements and over 1800 references.

  11. De Novo Transcriptome Sequencing Reveals Important Molecular Networks and Metabolic Pathways of the Plant, Chlorophytum borivilianum

    PubMed Central

    Kalra, Shikha; Puniya, Bhanwar Lal; Kulshreshtha, Deepika; Kumar, Sunil; Kaur, Jagdeep; Ramachandran, Srinivasan; Singh, Kashmir

    2013-01-01

    Chlorophytum borivilianum, an endangered medicinal plant species, is highly recognized for its aphrodisiac properties provided by saponins present in the plant. The transcriptome information of this species is limited and only a few hundred expressed sequence tags (ESTs) are available in the public databases. To gain molecular insight into this plant, high throughput transcriptome sequencing of leaf RNA was carried out using Illumina's HiSeq 2000 sequencing platform. A total of 22,161,444 single end reads were retrieved after quality filtering. Available (e.g., De-Bruijn/Eulerian graph) and in-house developed bioinformatics tools were used for assembly and annotation of the transcriptome. A total of 101,141 assembled transcripts were obtained, with coverage size of 22.42 Mb and average length of 221 bp. Guanine-cytosine (GC) content was found to be 44%. Bioinformatics analysis, using non-redundant proteins, Gene Ontology (GO), Enzyme Commission (EC) and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases, extracted all the known enzymes involved in saponin and flavonoid biosynthesis. A few genes of alkaloid biosynthesis, along with anticancer and plant defense genes, were also discovered. Additionally, several cytochrome P450 (CYP450) and glycosyltransferase unique sequences were also found. We identified simple sequence repeat motifs in transcripts with an abundance of di-nucleotide simple sequence repeat (SSR; 43.1%) markers. Large scale expression profiling through Reads per Kilobase per Million mapped reads (RPKM) showed major genes involved in different metabolic pathways of the plant. Genes, expressed sequence tags (ESTs) and unique sequences from this study provide an important resource for the scientific community interested in the molecular genetics and functional genomics of C. borivilianum. PMID:24376689
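
    The RPKM measure named in the record above normalizes a transcript's read count by both transcript length (in kilobases) and sequencing depth (in millions of mapped reads). The following is a minimal Python sketch of the standard definition. The function name and the input numbers are hypothetical and are not taken from the study.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads Per Kilobase of transcript per Million mapped reads."""
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total reads must be positive")
    # 1e9 = 1e3 (bp per kilobase) * 1e6 (reads per million)
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


# Hypothetical values: 500 reads on a 2,000 bp transcript,
# from a library with 10 million mapped reads.
print(rpkm(500, 2000, 10_000_000))
```

    Dividing by length keeps long transcripts from looking more highly expressed simply because they collect more reads, and dividing by depth makes libraries of different sizes comparable.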

  12. MINEs: Open access databases of computationally predicted enzyme promiscuity products for untargeted metabolomics

    DOE PAGESBeta

    Jeffryes, James G.; Colastani, Ricardo L.; Elbadawi-Sidhu, Mona; Kind, Tobias; Niehaus, Thomas D.; Broadbelt, Linda J.; Hanson, Andrew D.; Fiehn, Oliver; Tyo, Keith E. J.; Henry, Christopher S.

    2015-08-28

    Metabolomics have proven difficult to execute in an untargeted and generalizable manner. Liquid chromatography–mass spectrometry (LC–MS) has made it possible to gather data on thousands of cellular metabolites. However, matching metabolites to their spectral features continues to be a bottleneck, meaning that much of the collected information remains uninterpreted and that new metabolites are seldom discovered in untargeted studies. These challenges require new approaches that consider compounds beyond those available in curated biochemistry databases. Here we present Metabolic In silico Network Expansions (MINEs), an extension of known metabolite databases to include molecules that have not been observed, but are likely to occur based on known metabolites and common biochemical reactions. We utilize an algorithm called the Biochemical Network Integrated Computational Explorer (BNICE) and expert-curated reaction rules based on the Enzyme Commission classification system to propose the novel chemical structures and reactions that comprise MINE databases. Starting from the Kyoto Encyclopedia of Genes and Genomes (KEGG) COMPOUND database, the MINE contains over 571,000 compounds, of which 93% are not present in the PubChem database. However, these MINE compounds have on average higher structural similarity to natural products than compounds from KEGG or PubChem. MINE databases were able to propose annotations for 98.6% of a set of 667 MassBank spectra, 14% more than KEGG alone and equivalent to PubChem while returning far fewer candidates per spectra than PubChem (46 vs. 1715 median candidates). Application of MINEs to LC–MS accurate mass data enabled the identity of an unknown peak to be confidently predicted. MINE databases are freely accessible for non-commercial use via user-friendly web-tools at http://minedatabase.mcs.anl.gov and developer-friendly APIs. MINEs improve metabolomics peak identification as compared to general chemical databases whose

  13. MINEs: Open access databases of computationally predicted enzyme promiscuity products for untargeted metabolomics

    SciTech Connect

    Jeffryes, James G.; Colastani, Ricardo L.; Elbadawi-Sidhu, Mona; Kind, Tobias; Niehaus, Thomas D.; Broadbelt, Linda J.; Hanson, Andrew D.; Fiehn, Oliver; Tyo, Keith E. J.; Henry, Christopher S.

    2015-08-28

    Metabolomics has proven difficult to execute in an untargeted and generalizable manner. Liquid chromatography–mass spectrometry (LC–MS) has made it possible to gather data on thousands of cellular metabolites. However, matching metabolites to their spectral features continues to be a bottleneck, meaning that much of the collected information remains uninterpreted and that new metabolites are seldom discovered in untargeted studies. These challenges require new approaches that consider compounds beyond those available in curated biochemistry databases. Here we present Metabolic In silico Network Expansions (MINEs), an extension of known metabolite databases to include molecules that have not been observed, but are likely to occur based on known metabolites and common biochemical reactions. We utilize an algorithm called the Biochemical Network Integrated Computational Explorer (BNICE) and expert-curated reaction rules based on the Enzyme Commission classification system to propose the novel chemical structures and reactions that comprise MINE databases. Starting from the Kyoto Encyclopedia of Genes and Genomes (KEGG) COMPOUND database, the MINE contains over 571,000 compounds, of which 93% are not present in the PubChem database. However, these MINE compounds have on average higher structural similarity to natural products than compounds from KEGG or PubChem. MINE databases were able to propose annotations for 98.6% of a set of 667 MassBank spectra, 14% more than KEGG alone and equivalent to PubChem while returning far fewer candidates per spectrum than PubChem (46 vs. 1715 median candidates). Application of MINEs to LC–MS accurate mass data enabled the identity of an unknown peak to be confidently predicted. MINE databases are freely accessible for non-commercial use via user-friendly web-tools at http://minedatabase.mcs.anl.gov and developer-friendly APIs. MINEs improve metabolomics peak identification as compared to general chemical databases whose results
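
    The accurate-mass annotation step mentioned in this abstract can be illustrated with a minimal sketch. It assumes a small in-memory candidate table and a 5 ppm tolerance; the function names, the table and the tolerance are illustrative assumptions and are not part of the MINE web tools or APIs.

```python
def ppm_error(observed_mass, theoretical_mass):
    """Signed mass error in parts per million."""
    return (observed_mass - theoretical_mass) / theoretical_mass * 1e6


def match_candidates(observed_mass, candidates, tol_ppm=5.0):
    """Return candidates whose mass lies within tol_ppm of the observed mass.

    `candidates` is a list of (name, monoisotopic_mass) pairs. Hits are
    sorted by absolute ppm error, smallest first.
    """
    hits = []
    for name, mass in candidates:
        err = ppm_error(observed_mass, mass)
        if abs(err) <= tol_ppm:
            hits.append((name, mass, err))
    return sorted(hits, key=lambda hit: abs(hit[2]))


# Hypothetical candidate table (neutral monoisotopic masses, in Da).
CANDIDATES = [
    ("glucose", 180.06339),
    ("fructose", 180.06339),
    ("caffeine", 194.08038),
]

if __name__ == "__main__":
    for name, mass, err in match_candidates(180.0634, CANDIDATES):
        print(f"{name}\t{mass:.5f}\t{err:+.2f} ppm")
```

    Isomers such as glucose and fructose share one exact mass, so a mass-only lookup returns both; this is why the number of candidates returned per spectrum matters when comparing databases.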

  14. Comparison of Metabolic Pathways in Escherichia coli by Using Genetic Algorithms

    PubMed Central

    Ortegon, Patricia; Poot-Hernández, Augusto C.; Perez-Rueda, Ernesto; Rodriguez-Vazquez, Katya

    2015-01-01

    In order to understand how cellular metabolism has taken its modern form, the conservation and variations between metabolic pathways were evaluated by using a genetic algorithm (GA). The GA approach considered information on the complete metabolism of the bacterium Escherichia coli K-12, as deposited in the KEGG database, and the enzymes belonging to a particular pathway were transformed into enzymatic step sequences by using the breadth-first search algorithm. These sequences represent contiguous enzymes linked to each other, based on their catalytic activities as they are encoded in the Enzyme Commission numbers. In a subsequent step, these sequences were compared using a GA in an all-against-all (pairwise comparisons) approach. Individual reactions were chosen based on their measure of fitness to act as parents of offspring, which constitute the new generation. The sequences compared were used to construct a similarity matrix (of fitness values) that was then clustered by using a k-medoids algorithm. A total of 34 clusters of conserved reactions were obtained, and their sequences were finally aligned with a multiple-sequence alignment GA optimized to align all the reaction sequences included in each group or cluster. From these comparisons, maps associated with the metabolism of similar compounds also contained similar enzymatic step sequences, reinforcing the Patchwork Model for the evolution of metabolism in E. coli K-12, an observation that can be extended to other organisms for which metabolic information is available. Finally, our mapping of these reactions is discussed, with illustrations from a particular case. PMID:25973143
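
    The conversion of a pathway into an enzymatic step sequence by breadth-first search can be sketched as follows. The abstract does not give the exact construction, so the toy graph, its edges and the function name are assumptions made for illustration only; nodes are labelled with Enzyme Commission numbers.

```python
from collections import deque


def bfs_step_sequence(graph, start):
    """Visit enzymes breadth-first from `start` and return them in visit order.

    `graph` maps an EC number to the EC numbers of the enzymes that act
    on its products (a directed adjacency list).
    """
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order


# Toy pathway fragment (hypothetical wiring, for illustration only).
TOY_PATHWAY = {
    "2.7.1.1": ["5.3.1.9"],
    "5.3.1.9": ["2.7.1.11"],
    "2.7.1.11": ["4.1.2.13"],
    "4.1.2.13": ["5.3.1.1", "1.2.1.12"],
    "5.3.1.1": ["1.2.1.12"],
}

if __name__ == "__main__":
    print(" -> ".join(bfs_step_sequence(TOY_PATHWAY, "2.7.1.1")))
```

    Sequences produced this way are plain lists of EC numbers, which is the form a pairwise comparison or alignment step would then consume.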

  15. Comparison of Metabolic Pathways in Escherichia coli by Using Genetic Algorithms.

    PubMed

    Ortegon, Patricia; Poot-Hernández, Augusto C; Perez-Rueda, Ernesto; Rodriguez-Vazquez, Katya

    2015-01-01

    In order to understand how cellular metabolism has taken its modern form, the conservation and variations between metabolic pathways were evaluated by using a genetic algorithm (GA). The GA approach considered information on the complete metabolism of the bacterium Escherichia coli K-12, as deposited in the KEGG database, and the enzymes belonging to a particular pathway were transformed into enzymatic step sequences by using the breadth-first search algorithm. These sequences represent contiguous enzymes linked to each other, based on their catalytic activities as they are encoded in the Enzyme Commission numbers. In a subsequent step, these sequences were compared using a GA in an all-against-all (pairwise comparisons) approach. Individual reactions were chosen based on their measure of fitness to act as parents of offspring, which constitute the new generation. The sequences compared were used to construct a similarity matrix (of fitness values) that was then clustered by using a k-medoids algorithm. A total of 34 clusters of conserved reactions were obtained, and their sequences were finally aligned with a multiple-sequence alignment GA optimized to align all the reaction sequences included in each group or cluster. From these comparisons, maps associated with the metabolism of similar compounds also contained similar enzymatic step sequences, reinforcing the Patchwork Model for the evolution of metabolism in E. coli K-12, an observation that can be extended to other organisms for which metabolic information is available. Finally, our mapping of these reactions is discussed, with illustrations from a particular case. PMID:25973143

  16. Drinking Water Treatability Database (Database)

    EPA Science Inventory

    The Drinking Water Treatability Database (TDB) will provide data taken from the literature on the control of contaminants in drinking water, and will be housed on an interactive, publicly-available USEPA web site. It can be used for identifying effective treatment processes, rec...

  17. Orchidstra: an integrated orchid functional genomics database.

    PubMed

    Su, Chun-lin; Chao, Ya-Ting; Yen, Shao-Hua; Chen, Chun-Yi; Chen, Wan-Chieh; Chang, Yao-Chien Alex; Shih, Ming-Che

    2013-02-01

    A specialized orchid database, named Orchidstra (URL: http://orchidstra.abrc.sinica.edu.tw), has been constructed to collect, annotate and share genomic information for orchid functional genomics studies. The Orchidaceae is a large family of Angiosperms that exhibits extraordinary biodiversity in terms of both the number of species and their distribution worldwide. Orchids exhibit many unique biological features; however, investigation of these traits is currently constrained due to the limited availability of genomic information. Transcriptome information for five orchid species and one commercial hybrid has been included in the Orchidstra database. Altogether, these comprise >380,000 non-redundant orchid transcript sequences, of which >110,000 are protein-coding genes. Sequences from the transcriptome shotgun assembly (TSA) were obtained either from output reads from next-generation sequencing technologies assembled into contigs, or from conventional cDNA library approaches. An annotation pipeline using Gene Ontology, KEGG and Pfam was built to assign gene descriptions and functional annotation to protein-coding genes. Deep sequencing of small RNA was also performed for Phalaenopsis aphrodite to search for microRNAs (miRNAs), extending the information archived for this species to miRNA annotation, precursors and putative target genes. The P. aphrodite transcriptome information was further used to design probes for an oligonucleotide microarray, and expression profiling analysis was carried out. The intensities of hybridized probes derived from microarray assays of various tissues were incorporated into the database as part of the functional evidence. In the future, the content of the Orchidstra database will be expanded with transcriptome data and genomic information from more orchid species. PMID:23324169

  18. Study on the Regulatory Mechanism of the Lipid Metabolism Pathways during Chicken Male Germ Cell Differentiation Based on RNA-Seq

    PubMed Central

    Zuo, Qisheng; Li, Dong; Zhang, Lei; Elsayed, Ahmed Kamel; Lian, Chao; Shi, Qingqing; Zhang, Zhentao; Zhu, Rui; Wang, Yinjie; Jin, Kai; Zhang, Yani; Li, Bichun

    2015-01-01

    Here, we explore the regulatory mechanism of lipid metabolic signaling pathways and related genes during differentiation of male germ cells in chickens, with the hope that better understanding of these pathways may improve in vitro induction. Fluorescence-activated cell sorting was used to obtain highly purified cultures of embryonic stem cells (ESCs), primitive germ cells (PGCs), and spermatogonial stem cells (SSCs). The total RNA was then extracted from each type of cell. High-throughput analysis methods (RNA-seq) were used to sequence the transcriptome of these cells. Gene Ontology (GO) analysis and the KEGG database were used to identify lipid metabolism pathways and related genes. Retinoic acid (RA), the end-product of the retinol metabolism pathway, induced in vitro differentiation of ESC into male germ cells. Quantitative real-time PCR (qRT-PCR) was used to detect changes in the expression of the genes involved in the retinol metabolic pathways. From the results of RNA-seq and the database analyses, we concluded that there are 328 genes in 27 lipid metabolic pathways continuously involved in lipid metabolism during the differentiation of ESC into SSC in vivo, including retinol metabolism. Alcohol dehydrogenase 5 (ADH5) and aldehyde dehydrogenase 1 family member A1 (ALDH1A1) are involved in RA synthesis in the cell. ADH5 was specifically expressed in PGC in our experiments and aldehyde dehydrogenase 1 family member A1 (ALDH1A1) persistently increased throughout development. CYP26b1, a member of the cytochrome P450 superfamily, is involved in the degradation of RA. Expression of CYP26b1, in contrast, decreased throughout development. Exogenous RA in the culture medium induced differentiation of ESC to SSC-like cells. The expression patterns of ADH5, ALDH1A1, and CYP26b1 were consistent with RNA-seq results. We conclude that the retinol metabolism pathway plays an important role in the process of chicken male germ cell differentiation. PMID:25658587

  19. Core Proteomic Analysis of Unique Metabolic Pathways of Salmonella enterica for the Identification of Potential Drug Targets

    PubMed Central

    2016-01-01

    Background: Infections caused by Salmonella enterica, a Gram-negative facultative anaerobic bacterium belonging to the family Enterobacteriaceae, are major threats to the health of humans and animals. The recent availability of complete genome data of pathogenic strains of S. enterica opens new avenues for the identification of drug targets and drug candidates. We have used the genomic and metabolic pathway data to identify pathways and proteins essential to the pathogen and absent from the host. Methods: We took the whole proteome sequence data of 42 strains of S. enterica and Homo sapiens along with KEGG-annotated metabolic pathway data, clustered protein sequences using CD-HIT, identified essential genes using the DEG database, discarded S. enterica homologs of human proteins in unique metabolic pathways (UMPs), and characterized hypothetical proteins with SVM-prot and InterProScan. Through this core proteomic analysis we have identified enzymes essential to the pathogen. Results: The identification of 73 enzymes common to 42 strains of S. enterica is the real strength of the current study. We propose all 73 unexplored enzymes as potential drug targets against infections caused by S. enterica. The study is comprehensive for S. enterica and simultaneously considered every possible pathogenic strain of S. enterica. This comprehensiveness makes the current study significant since, to the best of our knowledge, it is the first subtractive core proteomic analysis of unique metabolic pathways applied to any pathogen for the identification of drug targets. We applied extensive computational methods to shortlist a few potential drug targets according to druggability criteria, e.g. non-homologous to the human host, essential to the pathogen, and playing a significant role in essential metabolic pathways of the pathogen (i.e. S. enterica). In the current study, subtractive proteomics was applied through a novel approach, i.e. by considering only proteins
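
    The subtractive filtering described in the Methods can be expressed as plain set operations. This is a minimal sketch under stated assumptions: every identifier below is hypothetical, and clustering, homology search and essentiality lookup (CD-HIT, DEG and similar tools in the study) are assumed to have already produced the input sets.

```python
def core_targets(strain_proteomes, human_homologs, essential):
    """Candidate targets: present in every strain, essential, not human-like.

    strain_proteomes: dict mapping strain name -> iterable of protein
        cluster IDs found in that strain (must not be empty).
    human_homologs: cluster IDs with a homolog in the host proteome.
    essential: cluster IDs flagged as essential.
    """
    core = set.intersection(*(set(p) for p in strain_proteomes.values()))
    return sorted((core - set(human_homologs)) & set(essential))


# Hypothetical toy inputs.
STRAINS = {
    "strain_A": ["c1", "c2", "c3", "c4"],
    "strain_B": ["c1", "c2", "c4", "c5"],
    "strain_C": ["c1", "c2", "c4"],
}
HUMAN_HOMOLOGS = {"c2"}
ESSENTIAL = {"c1", "c2", "c3"}

if __name__ == "__main__":
    print(core_targets(STRAINS, HUMAN_HOMOLOGS, ESSENTIAL))
```

    Taking the intersection across strains first keeps only the core proteome, so every later filter applies to proteins shared by all strains.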

  20. Stackfile Database

    NASA Technical Reports Server (NTRS)

    deVarvalho, Robert; Desai, Shailen D.; Haines, Bruce J.; Kruizinga, Gerhard L.; Gilmer, Christopher

    2013-01-01

    This software provides storage, retrieval, and analysis functionality for managing satellite altimetry data. It improves the efficiency and analysis capabilities of existing database software with improved flexibility and documentation. It offers flexibility in the type of data that can be stored. There is efficient retrieval either across the spatial domain or the time domain. Built-in analysis tools are provided for frequently performed altimetry tasks. This software package is used for storing and manipulating satellite measurement data. It was developed with a focus on handling the requirements of repeat-track altimetry missions such as Topex and Jason. It was, however, designed to work with a wide variety of satellite measurement data (e.g., Gravity Recovery And Climate Experiment -- GRACE). The software consists of several command-line tools for importing, retrieving, and analyzing satellite measurement data.

  1. De novo assembly and transcriptome analysis of the rubber tree (Hevea brasiliensis) and SNP markers development for rubber biosynthesis pathways.

    PubMed

    Mantello, Camila Campos; Cardoso-Silva, Claudio Benicio; da Silva, Carla Cristina; de Souza, Livia Moura; Scaloppi Junior, Erivaldo José; de Souza Gonçalves, Paulo; Vicentini, Renato; de Souza, Anete Pereira

    2014-01-01

    Hevea brasiliensis (Willd. Ex Adr. Juss.) Muell.-Arg., which is native to the Amazon rainforest, is the primary source of natural rubber. The singular properties of natural rubber make it superior to and competitive with synthetic rubber for use in several applications. Here, we performed RNA sequencing (RNA-seq) of H. brasiliensis bark on the Illumina GAIIx platform, which generated 179,326,804 raw reads. A total of 50,384 contigs that were over 400 bp in size were obtained and subjected to further analyses. A similarity search against the non-redundant (nr) protein database returned 32,018 (63%) positive BLASTx hits. The transcriptome analysis was annotated using the clusters of orthologous groups (COG), gene ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), and Pfam databases. A search for putative molecular markers was performed to identify simple sequence repeats (SSRs) and single nucleotide polymorphisms (SNPs). In total, 17,927 SSRs and 404,114 SNPs were detected. Finally, we selected sequences that were identified as belonging to the mevalonate (MVA) and 2-C-methyl-D-erythritol 4-phosphate (MEP) pathways, which are involved in rubber biosynthesis, to validate the SNP markers. A total of 78 SNPs were validated in 36 genotypes of H. brasiliensis. This new dataset represents a powerful information source for rubber tree bark genes and will be an important tool for the development of microsatellites and SNP markers for use in future genetic analyses such as genetic linkage mapping, quantitative trait loci identification, investigations of linkage disequilibrium and marker-assisted selection. PMID:25048025

  2. De Novo Assembly and Transcriptome Analysis of the Rubber Tree (Hevea brasiliensis) and SNP Markers Development for Rubber Biosynthesis Pathways

    PubMed Central

    Mantello, Camila Campos; Cardoso-Silva, Claudio Benicio; da Silva, Carla Cristina; de Souza, Livia Moura; Scaloppi Junior, Erivaldo José; de Souza Gonçalves, Paulo; Vicentini, Renato; de Souza, Anete Pereira

    2014-01-01

    Hevea brasiliensis (Willd. Ex Adr. Juss.) Muell.-Arg., which is native to the Amazon rainforest, is the primary source of natural rubber. The singular properties of natural rubber make it superior to and competitive with synthetic rubber for use in several applications. Here, we performed RNA sequencing (RNA-seq) of H. brasiliensis bark on the Illumina GAIIx platform, which generated 179,326,804 raw reads. A total of 50,384 contigs that were over 400 bp in size were obtained and subjected to further analyses. A similarity search against the non-redundant (nr) protein database returned 32,018 (63%) positive BLASTx hits. The transcriptome analysis was annotated using the clusters of orthologous groups (COG), gene ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), and Pfam databases. A search for putative molecular markers was performed to identify simple sequence repeats (SSRs) and single nucleotide polymorphisms (SNPs). In total, 17,927 SSRs and 404,114 SNPs were detected. Finally, we selected sequences that were identified as belonging to the mevalonate (MVA) and 2-C-methyl-D-erythritol 4-phosphate (MEP) pathways, which are involved in rubber biosynthesis, to validate the SNP markers. A total of 78 SNPs were validated in 36 genotypes of H. brasiliensis. This new dataset represents a powerful information source for rubber tree bark genes and will be an important tool for the development of microsatellites and SNP markers for use in future genetic analyses such as genetic linkage mapping, quantitative trait loci identification, investigations of linkage disequilibrium and marker-assisted selection. PMID:25048025

  3. Detection of driver pathways using mutated gene network in cancer.

    PubMed

    Li, Feng; Gao, Lin; Ma, Xiaoke; Yang, Xiaofei

    2016-06-21

    Distinguishing driver pathways has been extensively studied because they are critical for understanding the development and molecular mechanisms of cancers. Most existing methods for driver pathways are based on high coverage as well as high mutual exclusivity, with the underlying assumption that mutations are exclusive. However, in many cases, mutated driver genes in the same pathways are not strictly mutually exclusive. Based on this observation, we propose an index for quantifying mutual exclusivity between gene pairs. Then, we construct a mutated gene network for detecting driver pathways by integrating the proposed index and coverage. The detection of driver pathways on the mutated gene network consists of two steps: raw pathways are obtained using a CPM method, and the final driver pathways are selected using a strict testing strategy. We apply this method to glioblastoma and breast cancers and find that our method is more accurate than state-of-the-art methods in terms of enrichment of KEGG pathways. Furthermore, the detected driver pathways intersect with well-known pathways with moderate exclusivity, which cannot be discovered using the existing algorithms. In conclusion, the proposed method provides an effective way to investigate driver pathways in cancers. PMID:27118146
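
    The two quantities this abstract combines, coverage and pairwise mutual exclusivity, can be illustrated with a minimal sketch. The abstract does not define the proposed index, so the exclusivity measure below (the share of samples mutated in exactly one of two genes, among samples mutated in either) is a common stand-in and an assumption, not the authors' formula. The sample assignments are made up.

```python
def coverage(mutations, genes, n_samples):
    """Fraction of samples carrying a mutation in at least one of `genes`."""
    covered = set().union(*(mutations[g] for g in genes))
    return len(covered) / n_samples


def pair_exclusivity(mutations, g1, g2):
    """Share of samples mutated in exactly one gene, among those mutated in either.

    1.0 means strictly mutually exclusive; 0.0 means the two genes are
    always co-mutated (or neither gene is mutated at all).
    """
    a, b = mutations[g1], mutations[g2]
    union = a | b
    if not union:
        return 0.0
    return len(a ^ b) / len(union)


# Hypothetical data: gene -> set of sample indices with a mutation.
MUTATIONS = {
    "TP53": {1, 2, 3},
    "MDM2": {4, 5},
    "EGFR": {3, 4},
}
N_SAMPLES = 6

if __name__ == "__main__":
    print(coverage(MUTATIONS, ["TP53", "MDM2"], N_SAMPLES))
    print(pair_exclusivity(MUTATIONS, "TP53", "MDM2"))
    print(pair_exclusivity(MUTATIONS, "TP53", "EGFR"))
```

    A graded score of this kind, rather than a strict yes/no test, is what allows gene pairs with moderate exclusivity to be weighted as edges in a gene network.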

  4. Database Marketplace 2002: The Database Universe.

    ERIC Educational Resources Information Center

    Tenopir, Carol; Baker, Gayle; Robinson, William

    2002-01-01

    Reviews the database industry over the past year, including new companies and services, company closures, popular database formats, popular access methods, and changes in existing products and services. Lists 33 firms and their database services; 33 firms and their database products; and 61 company profiles. (LRW)

  5. PathwayExplorer: web service for visualizing high-throughput expression data on biological pathways.

    PubMed

    Mlecnik, Bernhard; Scheideler, Marcel; Hackl, Hubert; Hartler, Jürgen; Sanchez-Cabo, Fatima; Trajanoski, Zlatko

    2005-07-01

    While generation of high-throughput expression data is becoming routine, the fast, easy, and systematic presentation and analysis of these data in a biological context is still an obstacle. To address this need, we have developed PathwayExplorer, which maps expression profiles of genes or proteins simultaneously onto major, currently available regulatory, metabolic and cellular pathways from KEGG, BioCarta and GenMAPP. PathwayExplorer is a platform-independent web server application with an optional standalone Java application using a SOAP (simple object access protocol) interface. Mapped pathways are ranked for the easy selection of the pathway of interest, displaying all available genes of this pathway with their expression profiles in a selectable and intuitive color code. Pathway maps produced can be downloaded as PNG, JPG or as high-resolution vector graphics SVG. The web service is freely available at https://pathwayexplorer.genome.tugraz.at; the standalone client can be downloaded at http://genome.tugraz.at. PMID:15980551

  6. A database of macromolecular motions.

    PubMed Central

    Gerstein, M; Krebs, W

    1998-01-01

    We describe a database of macromolecular motions meant to be of general use to the structural community. The database, which is accessible on the World Wide Web with an entry point at http://bioinfo.mbb.yale.edu/MolMovDB, attempts to systematize all instances of protein and nucleic acid movement for which there is at least some structural information. At present it contains >120 motions, most of which are of proteins. Protein motions are further classified hierarchically into a limited number of categories, first on the basis of size (distinguishing between fragment, domain and subunit motions) and then on the basis of packing. Our packing classification divides motions into various categories (shear, hinge, other) depending on whether or not they involve sliding over a continuously maintained and tightly packed interface. In addition, the database provides some indication about the evidence behind each motion (i.e. the type of experimental information or whether the motion is inferred based on structural similarity) and attempts to describe many aspects of a motion in terms of a standardized nomenclature (e.g. the maximum rotation, the residue selection of a fixed core, etc.). Currently, we use a standard relational design to implement the database. However, the complexity and heterogeneity of the information kept in the database makes it an ideal application for an object-relational approach, and we are moving it in this direction. Specifically, in terms of storing complex information, the database contains plausible representations for motion pathways, derived from restrained 3D interpolation between known endpoint conformations. These pathways can be viewed in a variety of movie formats, and the database is associated with a server that can automatically generate these movies from submitted coordinates. PMID:9722650

  7. Deep sequencing of the Camellia sinensis transcriptome revealed candidate genes for major metabolic pathways of tea-specific compounds

    SciTech Connect

    Shi, CY; Yang, H; Wei, CL; Yu, O; Zhang, ZZ; Sun, J; Wan, XC

    2011-01-01

    Tea is one of the most popular non-alcoholic beverages worldwide. However, the tea plant, Camellia sinensis, is difficult to culture in vitro, to transform, and has a large genome, rendering little genomic information available. Recent advances in large-scale RNA sequencing (RNA-seq) provide a fast, cost-effective, and reliable approach to generate large expression datasets for functional genomic analysis, which is especially suitable for non-model species with un-sequenced genomes. Using high-throughput Illumina RNA-seq, the transcriptome from poly(A)+ RNA of C. sinensis was analyzed at an unprecedented depth (2.59 gigabase pairs). Approximately 34.5 million reads were obtained, trimmed, and assembled into 127,094 unigenes, with an average length of 355 bp and an N50 of 506 bp, which consisted of 788 contig clusters and 126,306 singletons. This number of unigenes was 10-fold higher than existing C. sinensis sequences deposited in GenBank (as of August 2010). Sequence similarity analyses against six public databases (Uniprot, NR and COGs at NCBI, Pfam, InterPro and KEGG) found 55,088 unigenes that could be annotated with gene descriptions, conserved protein domains, or gene ontology terms. Some of the unigenes were assigned to putative metabolic pathways. Targeted searches using these annotations identified the majority of genes associated with several primary metabolic pathways and natural product pathways that are important to tea quality, such as flavonoid, theanine and caffeine biosynthesis pathways. Novel candidate genes of these secondary pathways were discovered. Comparisons with four previously prepared cDNA libraries revealed that this transcriptome dataset has both a high degree of consistency with previous EST data and an approximately 20-fold increase in coverage. Thirteen unigenes related to theanine and flavonoid synthesis were validated. Their expression patterns in different organs of the tea plant were analyzed by RT-PCR and quantitative real
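
    The N50 statistic quoted for the assembly follows the usual definition: the length L such that sequences of length L or longer contain at least half of all assembled bases. A minimal sketch with made-up lengths; the function name is an assumption.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Lengths are sorted from longest to shortest and accumulated until the
    running total reaches half of the overall total; the length at which
    that happens is the N50.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("lengths must be non-empty")


if __name__ == "__main__":
    # Illustrative lengths only, not data from the study.
    print(n50([100, 200, 300, 400]))
```

    Because N50 weights sequences by length, it can sit well above the mean length when an assembly has many short singletons, as with the 506 bp N50 against the 355 bp average reported here.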

  8. A Survey of Metabolic Databases Emphasizing the MetaCyc Family

    PubMed Central

    Karp, Peter D.; Caspi, Ron

    2012-01-01

    Thanks to the confluence of genome sequencing and bioinformatics, the number of metabolic databases has expanded from a handful in the mid 1990s to several thousand today. These databases lie within distinct families that have common ancestry and common attributes. The main families are the MetaCyc, KEGG, Reactome, Model SEED, and BiGG families. We survey these database families, as well as important individual metabolic databases, including multiple human metabolic databases. The MetaCyc family is described in particular detail. It contains well over 1,000 databases, including highly curated databases for Escherichia coli, Saccharomyces cerevisiae, Mus musculus, and Arabidopsis thaliana. These databases are available through a number of web sites that offer a range of software tools for querying and visualizing metabolic networks. These web sites also provide multiple tools for analysis of gene expression and metabolomics data, including visualization of those datasets on metabolic network diagrams, and overrepresentation analysis of gene sets and metabolite sets. PMID:21523460

  9. Pathway-Based Factor Analysis of Gene Expression Data Produces Highly Heritable Phenotypes That Associate with Age

    PubMed Central

    Anand Brown, Andrew; Ding, Zhihao; Viñuela, Ana; Glass, Dan; Parts, Leopold; Spector, Tim; Winn, John; Durbin, Richard

    2015-01-01

    Statistical factor analysis methods have previously been used to remove noise components from high-dimensional data prior to genetic association mapping and, in a guided fashion, to summarize biologically relevant sources of variation. Here, we show how the derived factors summarizing pathway expression can be used to analyze the relationships between expression, heritability, and aging. We used skin gene expression data from 647 twins from the MuTHER Consortium and applied factor analysis to concisely summarize patterns of gene expression to remove broad confounding influences and to produce concise pathway-level phenotypes. We derived 930 “pathway phenotypes” that summarized patterns of variation across 186 KEGG pathways (five phenotypes per pathway). We identified 69 significant associations of age with phenotype from 57 distinct KEGG pathways at a stringent Bonferroni threshold (P < 5.38 × 10^-5). These phenotypes are more heritable (h^2 = 0.32) than gene expression levels. On average, expression levels of 16% of genes within these pathways are associated with age. Several significant pathways relate to metabolizing sugars and fatty acids; others relate to insulin signaling. We have demonstrated that factor analysis methods combined with biological knowledge can produce more reliable phenotypes with less stochastic noise than the individual gene expression levels, which increases our power to discover biologically relevant associations. These phenotypes could also be applied to discover associations with other environmental factors. PMID:25758824
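
    The Bonferroni threshold quoted in this abstract can be reproduced from the numbers it gives, assuming a family-wise significance level of 0.05 (the abstract does not state the level; 0.05 is an assumption that happens to match the reported value).

```python
# Counts taken from the abstract.
n_pathways = 186
phenotypes_per_pathway = 5

# Assumed family-wise error rate (not stated in the abstract).
alpha = 0.05

# One test per pathway phenotype.
n_tests = n_pathways * phenotypes_per_pathway

# Bonferroni correction: divide alpha by the number of tests.
threshold = alpha / n_tests

print(n_tests)              # 930
print(f"{threshold:.2e}")   # 5.38e-05
```

    The product 186 × 5 gives the 930 pathway phenotypes, and 0.05 / 930 rounds to the stated threshold of about 5.38e-5.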

  10. Pathway-based factor analysis of gene expression data produces highly heritable phenotypes that associate with age.

    PubMed

    Anand Brown, Andrew; Ding, Zhihao; Viñuela, Ana; Glass, Dan; Parts, Leopold; Spector, Tim; Winn, John; Durbin, Richard

    2015-05-01

    Statistical factor analysis methods have previously been used to remove noise components from high-dimensional data prior to genetic association mapping and, in a guided fashion, to summarize biologically relevant sources of variation. Here, we show how the derived factors summarizing pathway expression can be used to analyze the relationships between expression, heritability, and aging. We used skin gene expression data from 647 twins from the MuTHER Consortium and applied factor analysis to concisely summarize patterns of gene expression to remove broad confounding influences and to produce concise pathway-level phenotypes. We derived 930 "pathway phenotypes" that summarized patterns of variation across 186 KEGG pathways (five phenotypes per pathway). We identified 69 significant associations of age with phenotype from 57 distinct KEGG pathways at a stringent Bonferroni threshold (P < 5.38 × 10^-5). These phenotypes are more heritable (h^2 = 0.32) than gene expression levels. On average, expression levels of 16% of genes within these pathways are associated with age. Several significant pathways relate to metabolizing sugars and fatty acids; others relate to insulin signaling. We have demonstrated that factor analysis methods combined with biological knowledge can produce more reliable phenotypes with less stochastic noise than the individual gene expression levels, which increases our power to discover biologically relevant associations. These phenotypes could also be applied to discover associations with other environmental factors. PMID:25758824

  11. Overlap in Bibliographic Databases.

    ERIC Educational Resources Information Center

    Hood, William W.; Wilson, Concepcion S.

    2003-01-01

    Examines the topic of Fuzzy Set Theory to determine the overlap of coverage in bibliographic databases. Highlights include examples of comparisons of database coverage; frequency distribution of the degree of overlap; records with maximum overlap; records unique to one database; intra-database duplicates; and overlap in the top ten databases.…

  12. The EXOSAT database system. Available databases.

    NASA Astrophysics Data System (ADS)

    Barron, C.

    1991-02-01

    This User's Guide describes the databases that are currently available by remote login to the EXOSAT/ESTEC site of the EXOSAT database system. This includes, wherever possible, the following: brief descriptions of each observatory, telescope and instrument; references to more complete observatory descriptions; a list of the contents of each database and how it was generated; and parameter descriptions.

  13. Integrative data mining of high-throughput in vitro screens, in vivo data, and disease information to identify Adverse Outcome Pathway (AOP) signatures: ToxCast high-throughput screening data and Comparative Toxicogenomics Database (CTD) as a case study.

    EPA Science Inventory

    The Adverse Outcome Pathway (AOP) framework provides a systematic way to describe linkages between molecular and cellular processes and organism or population level effects. The current AOP assembly methods, however, are inefficient. Our goal is to generate computationally-pr...

  14. Prolyl isomerase Pin1 regulated signaling pathway revealed by Pin1 +/+ and Pin1 -/- mouse embryonic fibroblast cells.

    PubMed

    Huang, Guo-Liang; Qiu, Jin-Hua; Li, Bin-Bin; Wu, Jing-Jing; Lu, Yan; Liu, Xing-Yan; He, Zhiwei

    2013-10-01

    Pin1 (peptidylprolyl cis/trans isomerase, NIMA-interacting 1) plays a key role in a number of diseases including cancer and Alzheimer disease. Previous studies have identified a wide range of phosphoproteins as Pin1 substrates. Related pathways were analyzed separately. The aim of this study was to provide a comprehensive picture of Pin1 regulation. A genome-wide mRNA expression microarray analysis was carried out using RNA isolated from Pin1 (+/+) and Pin1 (-/-) mouse embryonic fibroblast (MEF) cells. Signaling pathways regulated by Pin1 were analyzed using KEGG pathway and GO annotation. An expression pattern regulated by Pin1 was revealed. A total of 606 genes, 375 up-regulated and 231 down-regulated, were differentially expressed when comparing Pin1 +/+ to Pin1 -/- MEF cells. A total of 48 pathways were shown to be regulated by Pin1 expression in KEGG pathway analysis. In the GO annotation system, 19 biological processes, 15 cellular components, and 18 molecular functions were found to be regulated by Pin1 expression. Pathways related to the immune system and cancer showed the most significant association with Pin1 regulation. Pin1 is an important regulator of a wide range of signaling pathways related to the immune system and cancer. PMID:23563987

  15. Axonal guidance signaling pathway interacting with smoking in modifying the risk of pancreatic cancer: a gene- and pathway-based interaction analysis of GWAS data

    PubMed Central

    Li, Donghui

    2014-01-01

    Cigarette smoking is the best established modifiable risk factor for pancreatic cancer. Genetic factors that underlie smoking-related pancreatic cancer have previously not been examined at the genome-wide level. Taking advantage of the existing genome-wide association study (GWAS) genotype and risk factor data from the Pancreatic Cancer Case Control Consortium, we conducted a discovery study in 2028 cases and 2109 controls to examine gene–smoking interactions at the pathway/gene/single nucleotide polymorphism (SNP) level. Using the likelihood ratio test nested in logistic regression models and Ingenuity Pathway Analysis (IPA), we examined 172 KEGG (Kyoto Encyclopedia of Genes and Genomes) pathways, 3 manually curated gene sets, 3 nicotine dependency gene ontology pathways, 17 912 genes and 468 114 SNPs. None of the individual pathways/genes/SNPs showed significant interaction with smoking after adjusting for multiple comparisons. Six KEGG pathways showed nominal interactions (P < 0.05) with smoking, and the top two are the pancreatic secretion and salivary secretion pathways (major contributing genes: RAB8A, PLCB and CTRB1). Nine genes, i.e. ZBED2, EXO1, PSG2, SLC36A1, CLSTN1, MTHFSD, FAT2, IL10RB and ATXN2, had P interaction < 0.0005. Five intergenic region SNPs and two SNPs of the EVC and KCNIP4 genes had P interaction < 0.00003. In IPA analysis of genes with nominal interactions with smoking, axonal guidance signaling (P = 2.12 × 10^-7) and α-adrenergic signaling (P = 2.52 × 10^-5) genes were significantly overrepresented canonical pathways. Genes contributing to the axon guidance signaling pathway included the SLIT/ROBO signaling genes that were frequently altered in pancreatic cancer. These observations need to be confirmed in additional data sets. Once confirmed, they will open a new avenue for unveiling the etiology of smoking-associated pancreatic cancer. PMID:24419231
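    The likelihood ratio test for nested logistic regression models mentioned in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the full model differs from the reduced one by a single interaction term (one degree of freedom), and the log-likelihood values are hypothetical placeholders rather than results from the study.

```python
import math


def likelihood_ratio_test_1df(loglik_reduced: float, loglik_full: float) -> tuple[float, float]:
    """Return (statistic, p-value) for two nested models differing by one parameter.

    The statistic is 2 * (logL_full - logL_reduced); under the null hypothesis it
    follows a chi-square distribution with 1 degree of freedom (assumed here).
    """
    stat = max(2.0 * (loglik_full - loglik_reduced), 0.0)
    # Chi-square survival function for 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical log-likelihoods: model without vs. with a gene-by-smoking term.
stat, p = likelihood_ratio_test_1df(loglik_reduced=-2810.4, loglik_full=-2806.1)
print(f"LRT statistic = {stat:.2f}, p = {p:.4f}")
```

    A pathway-level test would typically involve several interaction terms and therefore more degrees of freedom, in which case a general chi-square survival function would be needed in place of the one-parameter shortcut used here.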

  16. In silico enzymatic synthesis of a 400,000 compound biochemical database for nontargeted metabolomics.

    PubMed

    Menikarachchi, Lochana C; Hill, Dennis W; Hamdalla, Mai A; Mandoiu, Ion I; Grant, David F

    2013-09-23

    Current methods of structure identification in mass-spectrometry-based nontargeted metabolomics rely on matching experimentally determined features of an unknown compound to those of candidate compounds contained in biochemical databases. A major limitation of this approach is the relatively small number of compounds currently included in these databases. If the correct structure is not present in a database, it cannot be identified, and if it cannot be identified, it cannot be included in a database. Thus, there is an urgent need to augment metabolomics databases with rationally designed biochemical structures using alternative means. Here we present the In Vivo/In Silico Metabolites Database (IIMDB), a database of in silico enzymatically synthesized metabolites, to partially address this problem. The database, which is available at http://metabolomics.pharm.uconn.edu/iimdb/, includes ~23,000 known compounds (mammalian metabolites, drugs, secondary plant metabolites, and glycerophospholipids) collected from existing biochemical databases plus more than 400,000 computationally generated human phase-I and phase-II metabolites of these known compounds. IIMDB features a user-friendly web interface and a programmer-friendly RESTful web service. Ninety-five percent of the computationally generated metabolites in IIMDB were not found in any existing database. However, 21,640 were identical to compounds already listed in PubChem, HMDB, KEGG, or HumanCyc. Furthermore, the vast majority of these in silico metabolites were scored as biological using BioSM, a software program that identifies biochemical structures in chemical structure space. These results suggest that in silico biochemical synthesis represents a viable approach for significantly augmenting biochemical databases for nontargeted metabolomics applications. PMID:23991755

  17. Databases: Beyond the Basics.

    ERIC Educational Resources Information Center

    Whittaker, Robert

    This presented paper offers an elementary description of database characteristics and then provides a survey of databases that may be useful to the teacher and researcher in Slavic and East European languages and literatures. The survey focuses on commercial databases that are available, usable, and needed. Individual databases discussed include:…

  18. Reflective Database Access Control

    ERIC Educational Resources Information Center

    Olson, Lars E.

    2009-01-01

    "Reflective Database Access Control" (RDBAC) is a model in which a database privilege is expressed as a database query itself, rather than as a static privilege contained in an access control list. RDBAC aids the management of database access controls by improving the expressiveness of policies. However, such policies introduce new interactions…

  19. Human Mitochondrial Protein Database

    National Institute of Standards and Technology Data Gateway

    SRD 131 Human Mitochondrial Protein Database (Web, free access)   The Human Mitochondrial Protein Database (HMPDb) provides comprehensive data on mitochondrial and human nuclear encoded proteins involved in mitochondrial biogenesis and function. This database consolidates information from SwissProt, LocusLink, Protein Data Bank (PDB), GenBank, Genome Database (GDB), Online Mendelian Inheritance in Man (OMIM), Human Mitochondrial Genome Database (mtDB), MITOMAP, Neuromuscular Disease Center and Human 2-D PAGE Databases. This database is intended as a tool not only to aid in studying the mitochondrion but in studying the associated diseases.

  20. YMDB: the Yeast Metabolome Database.

    PubMed

    Jewison, Timothy; Knox, Craig; Neveu, Vanessa; Djoumbou, Yannick; Guo, An Chi; Lee, Jacqueline; Liu, Philip; Mandal, Rupasri; Krishnamurthy, Ram; Sinelnikov, Igor; Wilson, Michael; Wishart, David S

    2012-01-01

    The Yeast Metabolome Database (YMDB, http://www.ymdb.ca) is a richly annotated 'metabolomic' database containing detailed information about the metabolome of Saccharomyces cerevisiae. Modeled closely after the Human Metabolome Database, the YMDB contains >2000 metabolites with links to 995 different genes/proteins, including enzymes and transporters. The information in YMDB has been gathered from hundreds of books, journal articles and electronic databases. In addition to its comprehensive literature-derived data, the YMDB also contains an extensive collection of experimental intracellular and extracellular metabolite concentration data compiled from detailed Mass Spectrometry (MS) and Nuclear Magnetic Resonance (NMR) metabolomic analyses performed in our lab. This is further supplemented with thousands of NMR and MS spectra collected on pure, reference yeast metabolites. Each metabolite entry in the YMDB contains an average of 80 separate data fields including comprehensive compound description, names and synonyms, structural information, physico-chemical data, reference NMR and MS spectra, intracellular/extracellular concentrations, growth conditions and substrates, pathway information, enzyme data, gene/protein sequence data, as well as numerous hyperlinks to images, references and other public databases. Extensive searching, relational querying and data browsing tools are also provided that support text, chemical structure, spectral, molecular weight and gene/protein sequence queries. Because of S. cerevisiae's importance as a model organism for biologists and as a biofactory for industry, we believe this kind of database could have considerable appeal not only to metabolomics researchers, but also to yeast biologists, systems biologists, the industrial fermentation industry, as well as the beer, wine and spirit industry. PMID:22064855

  1. Pathways with PathWhiz.

    PubMed

    Pon, Allison; Jewison, Timothy; Su, Yilu; Liang, Yongjie; Knox, Craig; Maciejewski, Adam; Wilson, Michael; Wishart, David S

    2015-07-01

    PathWhiz (http://smpdb.ca/pathwhiz) is a web server designed to create colourful, visually pleasing and biologically accurate pathway diagrams that are both machine-readable and interactive. As a web server, PathWhiz is accessible from almost any place and compatible with essentially any operating system. It also houses a public library of pathways and pathway components that can be easily viewed and expanded upon by its users. PathWhiz allows users to readily generate biologically complex pathways by using a specially designed drawing palette to quickly render metabolites (including automated structure generation), proteins (including quaternary structures, covalent modifications and cofactors), nucleic acids, membranes, subcellular structures, cells, tissues and organs. Both small-molecule and protein/gene pathways can be constructed by combining multiple pathway processes such as reactions, interactions, binding events and transport activities. PathWhiz's pathway replication and propagation functions allow for existing pathways to be used to create new pathways or for existing pathways to be automatically propagated across species. PathWhiz pathways can be saved in BioPAX, SBGN-ML and SBML data exchange formats, as well as PNG, PWML, HTML image map or SVG images that can be viewed offline or explored using PathWhiz's interactive viewer. PathWhiz has been used to generate over 700 pathway diagrams for a number of popular databases including HMDB, DrugBank and SMPDB. PMID:25934797

  2. Pathways with PathWhiz

    PubMed Central

    Pon, Allison; Jewison, Timothy; Su, Yilu; Liang, Yongjie; Knox, Craig; Maciejewski, Adam; Wilson, Michael; Wishart, David S.

    2015-01-01

    PathWhiz (http://smpdb.ca/pathwhiz) is a web server designed to create colourful, visually pleasing and biologically accurate pathway diagrams that are both machine-readable and interactive. As a web server, PathWhiz is accessible from almost any place and compatible with essentially any operating system. It also houses a public library of pathways and pathway components that can be easily viewed and expanded upon by its users. PathWhiz allows users to readily generate biologically complex pathways by using a specially designed drawing palette to quickly render metabolites (including automated structure generation), proteins (including quaternary structures, covalent modifications and cofactors), nucleic acids, membranes, subcellular structures, cells, tissues and organs. Both small-molecule and protein/gene pathways can be constructed by combining multiple pathway processes such as reactions, interactions, binding events and transport activities. PathWhiz's pathway replication and propagation functions allow for existing pathways to be used to create new pathways or for existing pathways to be automatically propagated across species. PathWhiz pathways can be saved in BioPAX, SBGN-ML and SBML data exchange formats, as well as PNG, PWML, HTML image map or SVG images that can be viewed offline or explored using PathWhiz's interactive viewer. PathWhiz has been used to generate over 700 pathway diagrams for a number of popular databases including HMDB, DrugBank and SMPDB. PMID:25934797

  3. MetaMapp: mapping and visualizing metabolomic data by integrating information from biochemical pathways and chemical and mass spectral similarity

    PubMed Central

    2012-01-01

    Background Exposure to environmental tobacco smoke (ETS) leads to higher rates of pulmonary diseases and infections in children. To study the biochemical changes that may precede lung diseases, metabolomic effects on fetal and maternal lungs and plasma from rats exposed to ETS were compared to filtered air control animals. Genome-reconstructed metabolic pathways may be used to map and interpret dysregulation in metabolic networks. However, mass spectrometry-based non-targeted metabolomics datasets often comprise many metabolites for which links to enzymatic reactions have not yet been reported. Hence, network visualizations that rely on current biochemical databases are incomplete and also fail to visualize novel, structurally unidentified metabolites. Results We present a novel approach to integrate biochemical pathway and chemical relationships to map all detected metabolites in network graphs (MetaMapp) using KEGG reactant pair database, Tanimoto chemical and NIST mass spectral similarity scores. In fetal and maternal lungs, and in maternal blood plasma from pregnant rats exposed to environmental tobacco smoke (ETS), 459 unique metabolites comprising 179 structurally identified compounds were detected by gas chromatography time of flight mass spectrometry (GC-TOF MS) and BinBase data processing. MetaMapp graphs in Cytoscape showed much clearer metabolic modularity and complete content visualization compared to conventional biochemical mapping approaches. Cytoscape visualization of differential statistics results using these graphs showed that overall, fetal lung metabolism was more impaired than lungs and blood metabolism in dams. Fetuses from ETS-exposed dams expressed lower lipid and nucleotide levels and higher amounts of energy metabolism intermediates than control animals, indicating lower biosynthetic rates of metabolites for cell division, structural proteins and lipids that are critical for lung development. Conclusions MetaMapp graphs efficiently
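    The Tanimoto chemical similarity score named in the abstract above can be sketched on toy data. This is a generic illustration of the coefficient, not MetaMapp's implementation: the fingerprints are hypothetical sets of "on" bit positions, and returning 0.0 for two empty fingerprints is a convention chosen for this sketch.

```python
def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto coefficient: shared on-bits divided by on-bits present in either fingerprint."""
    if not fp_a and not fp_b:
        return 0.0  # convention assumed here for two empty fingerprints
    return len(fp_a & fp_b) / len(fp_a | fp_b)


# Hypothetical fingerprints, each a set of on-bit positions.
fp_x = {1, 4, 7, 9}
fp_y = {1, 4, 8, 9, 12}

# Shared bits are {1, 4, 9}; the union has six bits.
score = tanimoto(fp_x, fp_y)
print(score)
```

    In a similarity network, an edge would be drawn between two compounds when the score exceeds a chosen cutoff; the abstract does not state the cutoff, so none is assumed here.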

  4. Plasma Metabolic Profile Determination in Young ST-segment Elevation Myocardial Infarction Patients with Ischemia and Reperfusion: Ultra-performance Liquid Chromatography and Mass Spectrometry for Pathway Analysis

    PubMed Central

    Huang, Lei; Li, Tong; Liu, Ying-Wu; Zhang, Lei; Dong, Zhi-Huan; Liu, Shu-Ye; Gao, Ying-Tang

    2016-01-01

    Background: This study aimed to establish a disease differentiation model for young ST-segment elevation myocardial infarction (STEMI) patients experiencing ischemia and reperfusion via an ultra-performance liquid chromatography and mass spectrometry (UPLC/MS) platform, searching for closely related characteristic metabolites and metabolic pathways to evaluate their predictive value for prognosis after discharge. Methods: Forty-seven consecutive STEMI patients (23 patients under 45 years of age, referred to here as “youth,” and 24 “elderly” patients) and 48 healthy control group members (24 youth, 24 elderly) were registered prospectively. The youth patients were required to provide a second blood draw during a follow-up visit one year after morbidity (n = 22, one lost). Characteristic metabolites and related metabolic pathways were screened via the UPLC/MS platform based on the Kyoto Encyclopedia of Genes and Genomes (KEGG) and the Human Metabolome Database. Receiver operating characteristic (ROC) curves were drawn to evaluate the predictive value of characteristic metabolites for prognosis after discharge. Results: We successfully established an orthogonal partial least squares discriminant analysis model (R2X = 71.2%, R2Y = 79.6%, and Q2 = 55.9%) and screened out 24 ions; the sphingolipid metabolism pathway showed the most drastic change. The ROC curve analysis showed that ceramide [Cer(d18:0/16:0), Cer(t18:0/12:0)] and sphinganine in the sphingolipid pathway have high sensitivity and specificity for the prognosis related to major adverse cardiovascular events after youth patients were discharged. The areas under the curve (AUC) were 0.671, 0.750, and 0.711, respectively. A follow-up validation one year after morbidity showed corresponding AUCs of 0.778, 0.833, and 0.806. Conclusions: By analyzing the plasma metabolism of myocardial infarction patients, we successfully established a model that can distinguish two different factors simultaneously: pathological
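    The area under the ROC curve reported in the abstract above has a simple rank-based interpretation, sketched here. This is a generic illustration with hypothetical metabolite intensities, not the study's data or software: the AUC is computed as the fraction of (event, non-event) pairs in which the event case has the higher value, with ties counted as one half.

```python
def roc_auc(scores_pos: list[float], scores_neg: list[float]) -> float:
    """AUC as the probability that a random positive case outscores a random negative case."""
    if not scores_pos or not scores_neg:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for pos in scores_pos:
        for neg in scores_neg:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5  # ties contribute half a win
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical marker levels for patients with and without an adverse event.
with_event = [0.9, 0.8, 0.4]
without_event = [0.7, 0.3, 0.2]

# Eight of the nine pairs are ordered correctly.
auc = roc_auc(with_event, without_event)
print(round(auc, 3))
```

    An AUC of 0.5 corresponds to a marker with no discriminating ability, and 1.0 to perfect separation of the two groups.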

  5. Network II Database

    Energy Science and Technology Software Center (ESTSC)

    1994-11-07

    The Oak Ridge National Laboratory (ORNL) Rail and Barge Network II Database is a representation of the rail and barge system of the United States. The network is derived from the Federal Railroad Administration (FRA) rail database.

  6. Physiological Information Database (PID)

    EPA Science Inventory

    EPA has developed a physiological information database (created using Microsoft ACCESS) intended to be used in PBPK modeling. The database contains physiological parameter values for humans from early childhood through senescence as well as similar data for laboratory animal spec...

  7. THE ECOTOX DATABASE

    EPA Science Inventory

    The database provides chemical-specific toxicity information for aquatic life, terrestrial plants, and terrestrial wildlife. ECOTOX is a comprehensive ecotoxicology database and is therefore essential for providing and supporting high quality models needed to estimate population...

  8. Household Products Database: Pesticides

    MedlinePlus

    ... Information is extracted from Consumer Product Information Database ©2001-2015 by DeLima Associates. All rights reserved. ...

  9. MPlus Database system

    SciTech Connect

    Not Available

    1989-01-20

    The MPlus Database program was developed to keep track of mail received. This system was developed by TRESP for the Department of Energy/Oak Ridge Operations. The MPlus Database program is a PC application, written in dBase III+ and compiled with Clipper into an executable file. The files needed to run the MPlus Database program can be installed on a Bernoulli or a hard drive. This paper discusses the use of this database.

  10. Aviation Safety Issues Database

    NASA Technical Reports Server (NTRS)

    Morello, Samuel A.; Ricks, Wendell R.

    2009-01-01

    The aviation safety issues database was instrumental in the refinement and substantiation of the National Aviation Safety Strategic Plan (NASSP). The issues database is a comprehensive set of issues from an extremely broad base of aviation functions, personnel, and vehicle categories, both nationally and internationally. Several aviation safety stakeholders such as the Commercial Aviation Safety Team (CAST) have already used the database. This broader interest was the impetus for making the database publicly accessible and writing this report.

  11. The BioPAX community standard for pathway

    SciTech Connect

    Syed, Mustafa H

    2010-01-01

    Biological Pathway Exchange (BioPAX) is a standard language to represent biological pathways at the molecular and cellular level and to facilitate the exchange of pathway data. The rapid growth of the volume of pathway data has spurred the development of databases and computational tools to aid interpretation; however, use of these data is hampered by the current fragmentation of pathway information across many databases with incompatible formats. BioPAX, which was created through a community process, solves this problem by making pathway data substantially easier to collect, index, interpret and share. BioPAX can represent metabolic and signaling pathways, molecular and genetic interactions and gene regulation networks. Using BioPAX, millions of interactions, organized into thousands of pathways, from many organisms are available from a growing number of databases. This large amount of pathway data in a computable form will support visualization, analysis and biological discovery.

  12. Mission and Assets Database

    NASA Technical Reports Server (NTRS)

    Baldwin, John; Zendejas, Silvino; Gutheinz, Sandy; Borden, Chester; Wang, Yeou-Fang

    2009-01-01

    Mission and Assets Database (MADB) Version 1.0 is an SQL database system with a Web user interface to centralize information. The database stores flight project support resource requirements, view periods, antenna information, schedule, and forecast results for use in mid-range and long-term planning of Deep Space Network (DSN) assets.

  13. Plant and Crop Databases

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Databases have become an integral part of all aspects of biological research, including basic and applied plant biology. The importance of databases continues to increase as the volume of data from direct and indirect genomics approaches expands. What is not always obvious to users of databases is t...

  14. The 2016 database issue of Nucleic Acids Research and an updated molecular biology database collection

    PubMed Central

    Rigden, Daniel J.; Fernández-Suárez, Xosé M.; Galperin, Michael Y.

    2016-01-01

    The 2016 Database Issue of Nucleic Acids Research starts with overviews of the resources provided by three major bioinformatics centers, the U.S. National Center for Biotechnology Information (NCBI), the European Bioinformatics Institute (EMBL-EBI) and Swiss Institute for Bioinformatics (SIB). Also included are descriptions of 62 new databases and updates on 95 databases that have been previously featured in NAR plus 17 previously described elsewhere. A number of papers in this issue deal with resources on nucleic acids, including various kinds of non-coding RNAs and their interactions, molecular dynamics simulations of nucleic acid structure, and two databases of super-enhancers. The protein database section features important updates on the EBI's Pfam, PDBe and PRIDE databases, as well as a variety of resources on pathways, metabolomics and metabolic modeling. This issue also includes updates on popular metagenomics resources, such as MG-RAST, EBI Metagenomics, and probeBASE, as well as a newly compiled Human Pan-Microbe Communities database. A significant fraction of the new and updated databases are dedicated to the genetic basis of disease, primarily cancer, and various aspects of drug research, including resources for patented drugs, their side effects, withdrawn drugs, and potential drug targets. A further six papers present updated databases of various antimicrobial and anticancer peptides. The entire Database Issue is freely available online on the Nucleic Acids Research website (http://nar.oxfordjournals.org/). The NAR online Molecular Biology Database Collection, http://www.oxfordjournals.org/nar/database/c/, has been updated with the addition of 88 new resources and removal of 23 obsolete websites, which brought the current listing to 1685 databases. PMID:26740669

  15. The 2016 database issue of Nucleic Acids Research and an updated molecular biology database collection.

    PubMed

    Rigden, Daniel J; Fernández-Suárez, Xosé M; Galperin, Michael Y

    2016-01-01

    The 2016 Database Issue of Nucleic Acids Research starts with overviews of the resources provided by three major bioinformatics centers, the U.S. National Center for Biotechnology Information (NCBI), the European Bioinformatics Institute (EMBL-EBI) and Swiss Institute for Bioinformatics (SIB). Also included are descriptions of 62 new databases and updates on 95 databases that have been previously featured in NAR plus 17 previously described elsewhere. A number of papers in this issue deal with resources on nucleic acids, including various kinds of non-coding RNAs and their interactions, molecular dynamics simulations of nucleic acid structure, and two databases of super-enhancers. The protein database section features important updates on the EBI's Pfam, PDBe and PRIDE databases, as well as a variety of resources on pathways, metabolomics and metabolic modeling. This issue also includes updates on popular metagenomics resources, such as MG-RAST, EBI Metagenomics, and probeBASE, as well as a newly compiled Human Pan-Microbe Communities database. A significant fraction of the new and updated databases are dedicated to the genetic basis of disease, primarily cancer, and various aspects of drug research, including resources for patented drugs, their side effects, withdrawn drugs, and potential drug targets. A further six papers present updated databases of various antimicrobial and anticancer peptides. The entire Database Issue is freely available online on the Nucleic Acids Research website (http://nar.oxfordjournals.org/). The NAR online Molecular Biology Database Collection, http://www.oxfordjournals.org/nar/database/c/, has been updated with the addition of 88 new resources and removal of 23 obsolete websites, which brought the current listing to 1685 databases. PMID:26740669

  16. SENTRA, a database of signal transduction proteins.

    SciTech Connect

    D'Souza, M.; Romine, M. F.; Maltsev, N.; Mathematics and Computer Science; PNNL

    2000-01-01

    SENTRA, available via URL http://wit.mcs.anl.gov/WIT2/Sentra/, is a database of proteins associated with microbial signal transduction. The database currently includes the classical two-component signal transduction pathway proteins and methyl-accepting chemotaxis proteins, but will be expanded to also include other classes of signal transduction systems that are modulated by phosphorylation or methylation reactions. Although the majority of database entries are from prokaryotic systems, eukaryotic proteins with bacterial-like signal transduction domains are also included. Currently SENTRA contains signal transduction proteins in 34 complete and almost completely sequenced prokaryotic genomes, as well as sequences from 243 organisms available in public databases (SWISS-PROT and EMBL). The analysis was carried out within the framework of the WIT2 system, which is designed and implemented to support genetic sequence analysis and comparative analysis of sequenced genomes.

  17. Visualization of multidimensional database

    NASA Astrophysics Data System (ADS)

    Lee, Chung

    2008-01-01

    The concept of multidimensional databases has been extensively researched and widely used in actual database applications. It plays an important role in contemporary information technology, but due to the complexity of its inner structure, database design is a complicated process and users have a hard time fully understanding and using the database. An effective visualization tool for higher-dimensional information systems helps database designers and users alike. Most visualization techniques focus on displaying dimensional data using spreadsheets and charts. This may be sufficient for databases having three or fewer dimensions, but for higher dimensions, various combinations of projection operations are needed and a full grasp of the total database architecture is very difficult. This study reviews existing visualization techniques for multidimensional databases and then proposes an alternate approach to visualize a database of any dimension by adopting the tool proposed by Kiviat for software engineering processes. In this diagramming method, each dimension is represented by one branch of concentric spikes. This paper documents a C++ based visualization tool with extensive use of the OpenGL graphics library and GUI functions. Detailed examples of actual databases demonstrate the feasibility and effectiveness in visualizing multidimensional databases.
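    The Kiviat-style layout described in the abstract above, with one spoke per dimension, can be sketched as a coordinate calculation. This is a minimal geometric illustration in Python rather than the paper's C++/OpenGL tool; the function name, the normalization by per-dimension maxima, and the sample values are all assumptions made for this sketch.

```python
import math


def kiviat_points(values: list[float], maxima: list[float]) -> list[tuple[float, float]]:
    """Place one vertex per dimension on evenly spaced spokes around the origin.

    Each value is scaled by its dimension's maximum, so the vertex lies at a
    radius between 0 and 1 along that dimension's spoke.
    """
    n = len(values)
    if n != len(maxima) or n < 3:
        raise ValueError("need at least three dimensions and one maximum per value")
    points = []
    for i, (value, maximum) in enumerate(zip(values, maxima)):
        angle = 2.0 * math.pi * i / n  # spokes are spaced evenly around the circle
        radius = value / maximum
        points.append((radius * math.cos(angle), radius * math.sin(angle)))
    return points


# Hypothetical four-dimensional record, each dimension scaled to a maximum of 10.
vertices = kiviat_points([5.0, 10.0, 2.5, 10.0], [10.0, 10.0, 10.0, 10.0])
for x, y in vertices:
    print(f"({x:.2f}, {y:.2f})")
```

    Connecting the vertices in order gives the polygon for one record; adding a dimension only adds a spoke, which is why the layout extends to any number of dimensions.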

  18. HMDB: the Human Metabolome Database.

    PubMed

    Wishart, David S; Tzur, Dan; Knox, Craig; Eisner, Roman; Guo, An Chi; Young, Nelson; Cheng, Dean; Jewell, Kevin; Arndt, David; Sawhney, Summit; Fung, Chris; Nikolai, Lisa; Lewis, Mike; Coutouly, Marie-Aude; Forsythe, Ian; Tang, Peter; Shrivastava, Savita; Jeroncic, Kevin; Stothard, Paul; Amegbey, Godwin; Block, David; Hau, David D; Wagner, James; Miniaci, Jessica; Clements, Melisa; Gebremedhin, Mulu; Guo, Natalie; Zhang, Ying; Duggan, Gavin E; Macinnis, Glen D; Weljie, Alim M; Dowlatabadi, Reza; Bamforth, Fiona; Clive, Derrick; Greiner, Russ; Li, Liang; Marrie, Tom; Sykes, Brian D; Vogel, Hans J; Querengesser, Lori

    2007-01-01

    The Human Metabolome Database (HMDB) is currently the most complete and comprehensive curated collection of human metabolite and human metabolism data in the world. It contains records for more than 2180 endogenous metabolites with information gathered from thousands of books, journal articles and electronic databases. In addition to its comprehensive literature-derived data, the HMDB also contains an extensive collection of experimental metabolite concentration data compiled from hundreds of mass spectra (MS) and Nuclear Magnetic Resonance (NMR) metabolomic analyses performed on urine, blood and cerebrospinal fluid samples. This is further supplemented with thousands of NMR and MS spectra collected on purified, reference metabolites. Each metabolite entry in the HMDB contains an average of 90 separate data fields including a comprehensive compound description, names and synonyms, structural information, physico-chemical data, reference NMR and MS spectra, biofluid concentrations, disease associations, pathway information, enzyme data, gene sequence data, SNP and mutation data as well as extensive links to images, references and other public databases. Extensive searching, relational querying and data browsing tools are also provided. The HMDB is designed to address the broad needs of biochemists, clinical chemists, physicians, medical geneticists, nutritionists and members of the metabolomics community. The HMDB is available at: www.hmdb.ca. PMID:17202168

  19. The EcoCyc Database

    PubMed Central

    Karp, Peter D.; Weaver, Daniel; Paley, Suzanne; Fulcher, Carol; Kubo, Aya; Kothari, Anamika; Krummenacker, Markus; Subhraveti, Pallavi; Weerasinghe, Deepika; Gama-Castro, Socorro; Huerta, Araceli M.; Muñiz-Rascado, Luis; Bonavides-Martinez, César; Weiss, Verena; Peralta-Gil, Martin; Santos-Zavaleta, Alberto; Schröder, Imke; Mackie, Amanda; Gunsalus, Robert; Collado-Vides, Julio; Keseler, Ingrid M.; Paulsen, Ian

    2014-01-01

    EcoCyc is a bioinformatics database available at EcoCyc.org that describes the genome and the biochemical machinery of Escherichia coli K-12 MG1655. The long-term goal of the project is to describe the complete molecular catalog of the E. coli cell, as well as the functions of each of its molecular parts, to facilitate a system-level understanding of E. coli. EcoCyc is an electronic reference source for E. coli biologists, and for biologists who work with related microorganisms. The database includes information pages on each E. coli gene, metabolite, reaction, operon, and metabolic pathway. The database also includes information on E. coli gene essentiality, and on nutrient conditions that do or do not support the growth of E. coli. The web site and downloadable software contain tools for analysis of high-throughput datasets. In addition, a steady-state metabolic flux model is generated from each new version of EcoCyc. The model can predict metabolic flux rates, nutrient uptake rates, and growth rates for different gene knockouts and nutrient conditions. This chapter provides a detailed description of the data content of EcoCyc, and of the procedures by which this content is generated. PMID:26442933

  20. HMDB: the Human Metabolome Database

    PubMed Central

    Wishart, David S.; Tzur, Dan; Knox, Craig; Eisner, Roman; Guo, An Chi; Young, Nelson; Cheng, Dean; Jewell, Kevin; Arndt, David; Sawhney, Summit; Fung, Chris; Nikolai, Lisa; Lewis, Mike; Coutouly, Marie-Aude; Forsythe, Ian; Tang, Peter; Shrivastava, Savita; Jeroncic, Kevin; Stothard, Paul; Amegbey, Godwin; Block, David; Hau, David D.; Wagner, James; Miniaci, Jessica; Clements, Melisa; Gebremedhin, Mulu; Guo, Natalie; Zhang, Ying; Duggan, Gavin E.; MacInnis, Glen D.; Weljie, Alim M.; Dowlatabadi, Reza; Bamforth, Fiona; Clive, Derrick; Greiner, Russ; Li, Liang; Marrie, Tom; Sykes, Brian D.; Vogel, Hans J.; Querengesser, Lori

    2007-01-01

    The Human Metabolome Database (HMDB) is currently the most complete and comprehensive curated collection of human metabolite and human metabolism data in the world. It contains records for more than 2180 endogenous metabolites with information gathered from thousands of books, journal articles and electronic databases. In addition to its comprehensive literature-derived data, the HMDB also contains an extensive collection of experimental metabolite concentration data compiled from hundreds of mass spectra (MS) and nuclear magnetic resonance (NMR) metabolomic analyses performed on urine, blood and cerebrospinal fluid samples. This is further supplemented with thousands of NMR and MS spectra collected on purified, reference metabolites. Each metabolite entry in the HMDB contains an average of 90 separate data fields including a comprehensive compound description, names and synonyms, structural information, physico-chemical data, reference NMR and MS spectra, biofluid concentrations, disease associations, pathway information, enzyme data, gene sequence data, SNP and mutation data as well as extensive links to images, references and other public databases. Extensive searching, relational querying and data browsing tools are also provided. The HMDB is designed to address the broad needs of biochemists, clinical chemists, physicians, medical geneticists, nutritionists and members of the metabolomics community. The HMDB is available at: www.hmdb.ca. PMID:17202168

  1. A signal transduction score flow algorithm for cyclic cellular pathway analysis, which combines transcriptome and ChIP-seq data.

    PubMed

    Isik, Zerrin; Ersahin, Tulin; Atalay, Volkan; Aykanat, Cevdet; Cetin-Atalay, Rengul

    2012-10-30

    Determination of cell signalling behaviour is crucial for understanding the physiological response to a specific stimulus or drug treatment. Current approaches for large-scale data analysis do not effectively incorporate critical topological information provided by the signalling network. We herein describe a novel model- and data-driven hybrid approach, or signal transduction score flow algorithm, which allows quantitative visualization of cyclic cell signalling pathways that lead to ultimate cell responses such as survival, migration or death. This score flow algorithm translates signalling pathways as a directed graph and maps experimental data, including negative and positive feedbacks, onto gene nodes as scores, which then computationally traverse the signalling pathway until a pre-defined biological target response is attained. Initially, experimental data-driven enrichment scores of the genes were computed in a pathway, then a heuristic approach was applied using the gene score partition as a solution for protein node stoichiometry during dynamic scoring of the pathway of interest. Incorporation of a score partition during the signal flow and cyclic feedback loops in the signalling pathway significantly improves the usefulness of this model, as compared to other approaches. Evaluation of the score flow algorithm using both transcriptome and ChIP-seq data-generated signalling pathways showed good correlation with expected cellular behaviour on both KEGG and manually generated pathways. Implementation of the algorithm as a Cytoscape plug-in allows interactive visualization and analysis of KEGG pathways as well as user-generated and curated Cytoscape pathways. Moreover, the algorithm accurately predicts gene-level and global impacts of single or multiple in silico gene knockouts. PMID:23042589

  2. An Introduction to Database Structure and Database Machines.

    ERIC Educational Resources Information Center

    Detweiler, Karen

    1984-01-01

    Enumerates principal management objectives of database management systems (data independence, quality, security, multiuser access, central control) and criteria for comparison (response time, size, flexibility, other features). Conventional database management systems, relational databases, and database machines used for backend processing are…

  3. FOAM (Functional Ontology Assignments for Metagenomes): a Hidden Markov Model (HMM) database with environmental focus.

    PubMed

    Prestat, Emmanuel; David, Maude M; Hultman, Jenni; Taş, Neslihan; Lamendella, Regina; Dvornik, Jill; Mackelprang, Rachel; Myrold, David D; Jumpponen, Ari; Tringe, Susannah G; Holman, Elizabeth; Mavromatis, Konstantinos; Jansson, Janet K

    2014-10-29

    A new functional gene database, FOAM (Functional Ontology Assignments for Metagenomes), was developed to screen environmental metagenomic sequence datasets. FOAM provides a new functional ontology dedicated to classify gene functions relevant to environmental microorganisms based on Hidden Markov Models (HMMs). Sets of aligned protein sequences (i.e. 'profiles') were tailored to a large group of target KEGG Orthologs (KOs) from which HMMs were trained. The alignments were checked and curated to make them specific to the targeted KO. Within this process, sequence profiles were enriched with the most abundant sequences available to maximize the yield of accurate classifier models. An associated functional ontology was built to describe the functional groups and hierarchy. FOAM allows the user to select the target search space before HMM-based comparison steps and to easily organize the results into different functional categories and subcategories. FOAM is publicly available at http://portal.nersc.gov/project/m1317/FOAM/. PMID:25260589

  4. FOAM (Functional Ontology Assignments for Metagenomes): A Hidden Markov Model (HMM) database with environmental focus

    SciTech Connect

    Prestat, Emmanuel; David, Maude M.; Hultman, Jenni; Taş, Neslihan; Lamendella, Regina; Dvornik, Jill; Mackelprang, Rachel; Myrold, David D.; Jumpponen, Ari; Tringe, Susannah G.; Holman, Elizabeth; Mavromatis, Konstantinos; Jansson, Janet K.

    2014-09-26

    A new functional gene database, FOAM (Functional Ontology Assignments for Metagenomes), was developed to screen environmental metagenomic sequence datasets. FOAM provides a new functional ontology dedicated to classify gene functions relevant to environmental microorganisms based on Hidden Markov Models (HMMs). Sets of aligned protein sequences (i.e. ‘profiles’) were tailored to a large group of target KEGG Orthologs (KOs) from which HMMs were trained. The alignments were checked and curated to make them specific to the targeted KO. Within this process, sequence profiles were enriched with the most abundant sequences available to maximize the yield of accurate classifier models. An associated functional ontology was built to describe the functional groups and hierarchy. FOAM allows the user to select the target search space before HMM-based comparison steps and to easily organize the results into different functional categories and subcategories. FOAM is publicly available at http://portal.nersc.gov/project/m1317/FOAM/.

  5. Pathway analysis of body mass index genome-wide association study highlights risk pathways in cardiovascular disease

    PubMed Central

    Zhao, Xin; Gu, Jinxia; Li, Ming; Xi, Jie; Sun, Wenyu; Song, Guangmin; Liu, Guiyou

    2015-01-01

    Cardiovascular disease (CVD) is a class of diseases that involve the heart or blood vessels. Body mass index (BMI) is reported to be a risk factor for CVD. Genome-wide association studies (GWAS) have recently provided rapid insights into the genetics of CVD and its risk factors. However, the specific mechanisms by which BMI influences CVD risk are largely unknown. We think that BMI may influence CVD risk through shared genetic pathways. To test this view, we conducted a pathway analysis of a BMI GWAS, which examined approximately 329,091 single nucleotide polymorphisms from 4763 samples. We identified 31 significant KEGG pathways. There is literature evidence supporting the involvement of GnRH signaling, vascular smooth muscle contraction, dilated cardiomyopathy, Gap junction, Wnt signaling, Calcium signaling and Chemokine signaling in CVD. Collectively, our study supports the potential role of the CVD risk pathways in BMI. BMI may influence CVD risk through the shared genetic pathways. We believe that our results may advance our understanding of BMI mechanisms in CVD. PMID:26264282

  6. An extended bioreaction database that significantly improves reconstruction and analysis of genome-scale metabolic networks.

    PubMed

    Stelzer, Michael; Sun, Jibin; Kamphans, Tom; Fekete, Sándor P; Zeng, An-Ping

    2011-11-01

    The bioreaction database established by Ma and Zeng (Bioinformatics, 2003, 19, 270-277) for in silico reconstruction of genome-scale metabolic networks has been widely used. Based on more recent information in the reference databases KEGG LIGAND and Brenda, we upgrade the bioreaction database in this work by almost doubling the number of reactions from 3565 to 6851. Over 70% of the reactions have been manually updated/revised in terms of reversibility, reactant pairs, currency metabolites and error correction. For the first time, 41 spontaneous sugar mutarotation reactions are introduced into the biochemical database. The upgrade significantly improves the reconstruction of genome-scale metabolic networks. Many gaps or missing biochemical links can be recovered, as exemplified with three model organisms: Homo sapiens, Aspergillus niger, and Escherichia coli. The topological parameters of the constructed networks were also largely affected; however, the overall network structure remains scale-free. Furthermore, we consider the problem of computing biologically feasible shortest paths in reconstructed metabolic networks. We show that these paths are hard to compute and present solutions to find such paths in networks of small and medium size. PMID:21952610

  7. 2010 Worldwide Gasification Database

    DOE Data Explorer

    The 2010 Worldwide Gasification Database describes the current world gasification industry and identifies near-term planned capacity additions. The database lists gasification projects and includes information (e.g., plant location, number and type of gasifiers, syngas capacity, feedstock, and products). The database reveals that the worldwide gasification capacity has continued to grow for the past several decades and is now at 70,817 megawatts thermal (MWth) of syngas output at 144 operating plants with a total of 412 gasifiers.

  8. ITS-90 Thermocouple Database

    National Institute of Standards and Technology Data Gateway

    SRD 60 NIST ITS-90 Thermocouple Database (Web, free access)   Web version of Standard Reference Database 60 and NIST Monograph 175. The database gives temperature -- electromotive force (emf) reference functions and tables for the letter-designated thermocouple types B, E, J, K, N, R, S and T. These reference functions have been adopted as standards by the American Society for Testing and Materials (ASTM) and the International Electrotechnical Commission (IEC).

  9. Backing up DMF Databases

    NASA Technical Reports Server (NTRS)

    Cardo, Nicholas P.; Woodrow, Thomas (Technical Monitor)

    1994-01-01

    A complete backup of the Cray Data Migration Facility (DMF) databases should include the data migration databases, all media-specific process (MSP) databases, and the journal file. It should be possible to accomplish the backup without impacting users or stopping DMF. The High Speed Processors group at the Numerical Aerodynamics Simulation (NAS) Facility at NASA Ames Research Center undertook the task of finding an effective and efficient way to back up all DMF databases. This has been accomplished by taking advantage of new features introduced in DMF 2.0 and adding a minor modification to the dmdaemon. This paper discusses the investigation and the changes necessary to implement these enhancements.

  10. Opening CEM vendor databases

    SciTech Connect

    Long, A.; Patel, D.

    1995-12-31

    CEM database performance requirements (i.e., voluminous data storage, rapid response times) often conflict with the concept of an open, accessible database. Utilities would like to use their CEM data for more purposes than simply submitting environmental reports. But in most cases, other uses are inhibited because today's sophisticated CEM systems incorporate databases that have forsaken openness and accessibility in favor of performance. Several options are available for CEM vendors wishing to move in the direction of open, accessible CEM databases.

  11. Veterans Administration Databases

    Cancer.gov

    The Veterans Administration Information Resource Center provides database and informatics experts, customer service, expert advice, information products, and web technology to VA researchers and others.

  12. Combinations of gene ontology and pathway characterize and predict prognosis genes for recurrence of gastric cancer after surgery.

    PubMed

    Fan, Haiyan; Guo, Zhanjun; Wang, Cuijv

    2015-09-01

    Gastric cancer (GC) is the second leading cause of death from cancer globally. The most common cause of GC is infection with Helicobacter pylori, but ∼11% of cases are caused by genetic factors. However, recurrences occur in approximately one-third of stage II GC patients, even if they are treated with adjuvant chemotherapy or chemoradiotherapy. This is potentially due to expression variation of genes; some candidate prognostic genes were identified in patients with high-risk recurrences. The objective of this study was to develop an effective computational method for meaningfully interpreting these GC-related genes and accurately predicting novel prognostic genes for high-risk recurrence patients. We employed properties of genes (gene ontology [GO] and KEGG pathway information) as features to characterize GC-related genes. We obtained an optimal set of features for interpreting these genes. By applying the minimum redundancy maximum relevance algorithm, we predicted the GC-related genes. With the same approach, we further predicted the genes for the prognosis of high-risk recurrence. We obtained 1104 GO terms and KEGG pathways and 530 GO terms and KEGG pathways, respectively, that characterized GC-related genes and recurrence-related genes well. Finally, three novel prognostic genes were predicted to help supplement genetic markers of high-risk GC patients for recurrence after surgery. In-depth text mining indicated that the results are quite consistent with previous knowledge. Survival analysis of patients confirmed the novel prognostic genes as markers. By analyzing the related genes, we developed a systematic method to interpret the possible underlying mechanism of GC. The novel prognostic genes facilitate the understanding and therapy of GC recurrences after surgery. PMID:26154702

  13. Implication of the immune system in Alzheimer's disease: evidence from genome-wide pathway analysis.

    PubMed

    Lambert, Jean-Charles; Grenier-Boley, Benjamin; Chouraki, Vincent; Heath, Simon; Zelenika, Diana; Fievet, Nathalie; Hannequin, Didier; Pasquier, Florence; Hanon, Olivier; Brice, Alexis; Epelbaum, Jacques; Berr, Claudine; Dartigues, Jean-Francois; Tzourio, Christophe; Campion, Dominique; Lathrop, Mark; Amouyel, Philippe

    2010-01-01

    The results of several genome-wide association studies (GWASs) in the field of Alzheimer's disease (AD) have recently been published. Although these studies reported in detail on single-nucleotide polymorphisms (SNPs) and the neighboring genes with the strongest evidence of association with AD, little attention was paid to the rest of the genome. However, complementary statistical and bio-informatics approaches now enable the extraction of pertinent information from other SNPs and/or genes which are only nominally associated with the disease risk. Two different tools (the ALIGATOR and GenGen/KEGG software packages) were used to analyze a large GWAS dataset containing 2,032 AD cases and 5,328 controls. Convergent outputs from the two gene set enrichment approaches suggested an immune system dysfunction in AD. Furthermore, although these statistical approaches did not adopt a priori hypotheses concerning a biological function's putative role in the disease process, genes associated with AD risk were overrepresented in the "Alzheimer's disease" KEGG pathway. In conclusion, a systematic search for biological pathways using GWAS data sets seems to support the primary causes already suspected but may specifically highlight the importance of the immune system in AD. PMID:20413860

  14. GOLD.db: genomics of lipid-associated disorders database

    PubMed Central

    Hackl, Hubert; Maurer, Michael; Mlecnik, Bernhard; Hartler, Jürgen; Stocker, Gernot; Miranda-Saavedra, Diego; Trajanoski, Zlatko

    2004-01-01

    Background: The GOLD.db (Genomics of Lipid-Associated Disorders Database) was developed to address the need for integrating disparate information on the function and properties of genes and their products that are particularly relevant to the biology, diagnosis, management, treatment, and prevention of lipid-associated disorders. Description: The GOLD.db provides a reference for pathways and information about the relevant genes and proteins in an efficiently organized way. The main focus was to provide biological pathways with image maps and visual pathway information for lipid metabolism and obesity-related research. The database also makes it possible to map gene expression data individually to each pathway. Gene expression under different experimental conditions can be viewed sequentially in the context of the pathway. Related large-scale gene expression data sets were provided and can be searched for specific genes to integrate information regarding their expression levels in different studies and conditions. Analytic and data mining tools, reagents, protocols, references, and links to relevant genomic resources were included in the database. Finally, the usability of the database was demonstrated using an example about the regulation of Pten mRNA during adipocyte differentiation in the context of relevant pathways. Conclusions: The GOLD.db will be a valuable tool that allows researchers to efficiently analyze patterns of gene expression and to display them in a variety of useful and informative ways, allowing outside researchers to perform queries pertaining to gene expression results in the context of biological processes and pathways. PMID:15588328

  15. DIP: The Database of Interacting Proteins

    DOE Data Explorer

    The DIP Database catalogs experimentally determined interactions between proteins. It combines information from a variety of sources to create a single, consistent set of protein-protein interactions. By interaction, the DIP Database creators mean that two amino acid chains were experimentally identified to bind to each other. The database lists such pairs to aid those studying a particular protein-protein interaction, those investigating entire regulatory and signaling pathways, and those studying the organisation and complexity of the protein interaction network at the cellular level. The data stored within the DIP database were curated both manually by expert curators and automatically using computational approaches that utilize the knowledge about the protein-protein interaction networks extracted from the most reliable, core subset of the DIP data. It is a relational database that can be searched by protein, sequence, motif, article information, and pathBLAST. The website also serves as an access point to a number of projects related to DIP, such as LiveDIP, The Database of Ligand-Receptor Partners (DLRP) and JDIP. Users have free and open access to DIP after login. [Taken from the DIP Guide and the DIP website] (Specialized Interface) (Registration Required)

  16. ECMDB: the E. coli Metabolome Database.

    PubMed

    Guo, An Chi; Jewison, Timothy; Wilson, Michael; Liu, Yifeng; Knox, Craig; Djoumbou, Yannick; Lo, Patrick; Mandal, Rupasri; Krishnamurthy, Ram; Wishart, David S

    2013-01-01

    The Escherichia coli Metabolome Database (ECMDB, http://www.ecmdb.ca) is a comprehensively annotated metabolomic database containing detailed information about the metabolome of E. coli (K-12). Modelled closely on the Human and Yeast Metabolome Databases, the ECMDB contains >2600 metabolites with links to ∼1500 different genes and proteins, including enzymes and transporters. The information in the ECMDB has been collected from dozens of textbooks, journal articles and electronic databases. Each metabolite entry in the ECMDB contains an average of 75 separate data fields, including comprehensive compound descriptions, names and synonyms, chemical taxonomy, compound structural and physicochemical data, bacterial growth conditions and substrates, reactions, pathway information, enzyme data, gene/protein sequence data and numerous hyperlinks to images, references and other public databases. The ECMDB also includes an extensive collection of intracellular metabolite concentration data compiled from our own work as well as other published metabolomic studies. This information is further supplemented with thousands of fully assigned reference nuclear magnetic resonance and mass spectrometry spectra obtained from pure E. coli metabolites that we (and others) have collected. Extensive searching, relational querying and data browsing tools are also provided that support text, chemical structure, spectral, molecular weight and gene/protein sequence queries. Because of E. coli's importance as a model organism for biologists and as a biofactory for industry, we believe this kind of database could have considerable appeal not only to metabolomics researchers but also to molecular biologists, systems biologists and individuals in the biotechnology industry. PMID:23109553

  17. National Vulnerability Database (NVD)

    National Institute of Standards and Technology Data Gateway

    National Vulnerability Database (NVD) (Web, free access)   NVD is a comprehensive cyber security vulnerability database that integrates all publicly available U.S. Government vulnerability resources and provides references to industry resources. It is based on and synchronized with the CVE vulnerability naming standard.

  18. HIV Structural Database

    National Institute of Standards and Technology Data Gateway

    SRD 102 HIV Structural Database (Web, free access)   The HIV Protease Structural Database is an archive of experimentally determined 3-D structures of Human Immunodeficiency Virus 1 (HIV-1), Human Immunodeficiency Virus 2 (HIV-2) and Simian Immunodeficiency Virus (SIV) Proteases and their complexes with inhibitors or products of substrate cleavage.

  19. Biological Macromolecule Crystallization Database

    National Institute of Standards and Technology Data Gateway

    SRD 21 Biological Macromolecule Crystallization Database (Web, free access)   The Biological Macromolecule Crystallization Database and NASA Archive for Protein Crystal Growth Data (BMCD) contains the conditions reported for the crystallization of proteins and nucleic acids used in X-ray structure determinations and archives the results of microgravity macromolecule crystallization studies.

  20. Assignment to database industry

    NASA Astrophysics Data System (ADS)

    Abe, Kohichiroh

    Various kinds of databases are considered to be an essential part of future large-scale systems. Information provision by databases alone is also expected to grow as the market matures. This paper discusses how these circumstances arose and how they will develop from now on.

  1. Dictionary as Database.

    ERIC Educational Resources Information Center

    Painter, Derrick

    1996-01-01

    Discussion of dictionaries as databases focuses on the digitizing of the Oxford English Dictionary (OED) and the use of Standard Generalized Mark-Up Language (SGML). Topics include the creation of a consortium to digitize the OED, document structure, relational databases, text forms, sequence, and discourse. (LRW)

  2. A Quality System Database

    NASA Technical Reports Server (NTRS)

    Snell, William H.; Turner, Anne M.; Gifford, Luther; Stites, William

    2010-01-01

    A quality system database (QSD), and software to administer the database, were developed to support recording of administrative nonconformance activities that involve requirements for documentation of corrective and/or preventive actions, which can include ISO 9000 internal quality audits and customer complaints.

  3. BioImaging Database

    Energy Science and Technology Software Center (ESTSC)

    2006-10-25

    The BioImaging Database (BID) is a relational database developed to store the data and meta-data for the 3D gene expression in early Drosophila embryo development on a cellular level. The schema was written to be used with the MySQL DBMS but with minor modifications can be used on any SQL-compliant relational DBMS.

  4. The intelligent database machine

    NASA Technical Reports Server (NTRS)

    Yancey, K. E.

    1985-01-01

    The IDM data base was compared with the data base crack to determine whether IDM 500 would better serve the needs of the MSFC data base management system than Oracle. The two were compared and the performance of the IDM was studied. Implementations that work best on which database are implicated. The choice is left to the database administrator.

  5. Build Your Own Database.

    ERIC Educational Resources Information Center

    Jacso, Peter; Lancaster, F. W.

    This book is intended to help librarians and others to produce databases of better value and quality, especially if they have had little previous experience in database construction. Drawing upon almost 40 years of experience in the field of information retrieval, this book emphasizes basic principles and approaches rather than in-depth and…

  6. Atomic Spectra Database (ASD)

    National Institute of Standards and Technology Data Gateway

    SRD 78 NIST Atomic Spectra Database (ASD) (Web, free access)   This database provides access and search capability for NIST critically evaluated data on atomic energy levels, wavelengths, and transition probabilities that are reasonably up-to-date. The NIST Atomic Spectroscopy Data Center has carried out these critical compilations.

  7. CDS - Database Administrator's Guide

    NASA Astrophysics Data System (ADS)

    Day, J. P.

    This guide aims to instruct the CDS database administrator in the CDS file system, the CDS index files, and the procedure for assimilating a new CDS tape into the database. It is assumed that the administrator has read SUN/79.

  8. Ionic Liquids Database- (ILThermo)

    National Institute of Standards and Technology Data Gateway

    SRD 147 Ionic Liquids Database- (ILThermo) (Web, free access)   IUPAC Ionic Liquids Database, ILThermo, is a free web research tool that allows users worldwide to access an up-to-date data collection from the publications on experimental investigations of thermodynamic and transport properties of ionic liquids as well as binary and ternary mixtures containing ionic liquids.

  9. Database Searching by Managers.

    ERIC Educational Resources Information Center

    Arnold, Stephen E.

    Managers and executives need the easy and quick access to business and management information that online databases can provide, but many have difficulty articulating their search needs to an intermediary. One possible solution would be to encourage managers and their immediate support staff members to search textual databases directly as they now…

  10. Morchella MLST database

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Welcome to the Morchella MLST database. This dedicated database was set up at the CBS-KNAW Biodiversity Center by Vincent Robert in February 2012, using BioloMICS software (Robert et al., 2011), to facilitate DNA sequence-based identifications of Morchella species via the Internet. The current datab...

  11. First Look: TRADEMARKSCAN Database.

    ERIC Educational Resources Information Center

    Fernald, Anne Conway; Davidson, Alan B.

    1984-01-01

    Describes database produced by Thomson and Thomson and available on Dialog which contains over 700,000 records representing all active federal trademark registrations and applications for registrations filed in United States Patent and Trademark Office. A typical record, special features, database applications, learning to use TRADEMARKSCAN, and…

  12. Knowledge Discovery in Databases.

    ERIC Educational Resources Information Center

    Norton, M. Jay

    1999-01-01

    Knowledge discovery in databases (KDD) revolves around the investigation and creation of knowledge, processes, algorithms, and mechanisms for retrieving knowledge from data collections. The article is an introductory overview of KDD. The rationale and environment of its development and applications are discussed. Issues related to database design…

  13. Database Reviews: Legal Information.

    ERIC Educational Resources Information Center

    Seiser, Virginia

    Detailed reviews of two legal information databases--"Laborlaw I" and "Legal Resource Index"--are presented in this paper. Each database review begins with a bibliographic entry listing the title; producer; vendor; cost per hour contact time; offline print cost per citation; time period covered; frequency of updates; and size of file. A detailed…

  14. Database in Artificial Intelligence.

    ERIC Educational Resources Information Center

    Wilkinson, Julia

    1986-01-01

    Describes a specialist bibliographic database of literature in the field of artificial intelligence created by the Turing Institute (Glasgow, Scotland) using the BRS/Search information retrieval software. The subscription method for end-users--i.e., annual fee entitles user to unlimited access to database, document provision, and printed awareness…

  15. Structural Ceramics Database

    National Institute of Standards and Technology Data Gateway

    SRD 30 NIST Structural Ceramics Database (Web, free access)   The NIST Structural Ceramics Database (WebSCD) provides evaluated materials property data for a wide range of advanced ceramics known variously as structural ceramics, engineering ceramics, and fine ceramics.

  16. Online Database Searching Workbook.

    ERIC Educational Resources Information Center

    Littlejohn, Alice C.; Parker, Joan M.

    Designed primarily for use by first-time searchers, this workbook provides an overview of online searching. Following a brief introduction which defines online searching, databases, and database producers, five steps in carrying out a successful search are described: (1) identifying the main concepts of the search statement; (2) selecting a…

  17. CPDB: Carcinogenic Potency Database.

    PubMed

    Fitzpatrick, Roberta Bronson

    2008-01-01

    The Carcinogenic Potency Database reports analyses of animal cancer tests on 1,547 chemicals. These tests are used in support of cancer risk assessments for humans. Results are searchable and are made available via the National Library of Medicine's (NLM) TOXNET system. This column will provide background information on the database, as well as present search basics. PMID:19042710

  18. A computational platform to maintain and migrate manual functional annotations for BioCyc databases

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Model organism databases are an important resource for information on biological pathways and genomic data. Such databases represent the accumulation of biological data, some of which has been manually curated from literature. An essential feature of these databases is the continuing data integratio...

  19. Cascadia Tsunami Deposit Database

    USGS Publications Warehouse

    Peters, Robert; Jaffe, Bruce; Gelfenbaum, Guy; Peterson, Curt

    2003-01-01

    The Cascadia Tsunami Deposit Database contains data on the location and sedimentological properties of tsunami deposits found along the Cascadia margin. Data have been compiled from 52 studies, documenting 59 sites from northern California to Vancouver Island, British Columbia that contain known or potential tsunami deposits. Bibliographical references are provided for all sites included in the database. Cascadia tsunami deposits are usually seen as anomalous sand layers in coastal marsh or lake sediments. The studies cited in the database use numerous criteria based on sedimentary characteristics to distinguish tsunami deposits from sand layers deposited by other processes, such as river flooding and storm surges. Several studies cited in the database contain evidence for more than one tsunami at a site. Data categories include age, thickness, layering, grainsize, and other sedimentological characteristics of Cascadia tsunami deposits. The database documents the variability observed in tsunami deposits found along the Cascadia margin.

  20. Protein sequence databases.

    PubMed

    Apweiler, Rolf; Bairoch, Amos; Wu, Cathy H

    2004-02-01

    A variety of protein sequence databases exist, ranging from simple sequence repositories, which store data with little or no manual intervention in the creation of the records, to expertly curated universal databases that cover all species and in which the original sequence data are enhanced by the manual addition of further information in each sequence record. As the focus of researchers moves from the genome to the proteins encoded by it, these databases will play an even more important role as central comprehensive resources of protein information. Several of the leading protein sequence databases are discussed here, with special emphasis on the databases now provided by the Universal Protein Knowledgebase (UniProt) consortium. PMID:15036160

  1. Human protein reference database--2006 update.

    PubMed

    Mishra, Gopa R; Suresh, M; Kumaran, K; Kannabiran, N; Suresh, Shubha; Bala, P; Shivakumar, K; Anuradha, N; Reddy, Raghunath; Raghavan, T Madhan; Menon, Shalini; Hanumanthu, G; Gupta, Malvika; Upendran, Sapna; Gupta, Shweta; Mahesh, M; Jacob, Bincy; Mathew, Pinky; Chatterjee, Pritam; Arun, K S; Sharma, Salil; Chandrika, K N; Deshpande, Nandan; Palvankar, Kshitish; Raghavnath, R; Krishnakanth, R; Karathia, Hiren; Rekha, B; Nayak, Rashmi; Vishnupriya, G; Kumar, H G Mohan; Nagini, M; Kumar, G S Sameer; Jose, Rojan; Deepthi, P; Mohan, S Sujatha; Gandhi, T K B; Harsha, H C; Deshpande, Krishna S; Sarker, Malabika; Prasad, T S Keshava; Pandey, Akhilesh

    2006-01-01

    Human Protein Reference Database (HPRD) (http://www.hprd.org) was developed to serve as a comprehensive collection of protein features, post-translational modifications (PTMs) and protein-protein interactions. Since the original report, this database has increased to >20 000 protein entries and has become the largest database for literature-derived protein-protein interactions (>30 000) and PTMs (>8000) for human proteins. We have also introduced several new features in HPRD including: (i) protein isoforms, (ii) enhanced search options, (iii) linking of pathway annotations and (iv) integration of a novel browser, GenProt Viewer (http://www.genprot.org), developed by us that allows integration of genomic and proteomic information. With the continued support and active participation by the biomedical community, we expect HPRD to become a unique source of curated information for the human proteome and spur biomedical discoveries based on integration of genomic, transcriptomic and proteomic data. PMID:16381900

  2. Dynameomics: A comprehensive database of protein dynamics

    PubMed Central

    van der Kamp, Marc W.; Schaeffer, Richard D.; Jonsson, Amanda L.; Scouras, Alexander D.; Simms, Andrew; Toofanny, Rudesh D.; Benson, Noah C.; Anderson, Peter C.; Merkley, Eric D.; Rysavy, Steve; Bromley, Denny; Beck, David A. C.; Daggett, Valerie

    2010-01-01

    Summary The dynamic behavior of proteins is important for an understanding of their function and folding. We have performed molecular dynamics simulations of the native state and unfolding pathways of over 1000 proteins, representing the majority of folds in globular proteins. These data are stored and organized using an innovative database approach, which can be mined to obtain both general and specific information about the dynamics and folding/unfolding of proteins, relevant subsets thereof, and individual proteins. Here we describe the project in general terms and the type of information contained in the database. Then we provide examples of mining the database for information relevant to protein folding, structure building, the effect of single-nucleotide polymorphisms, and drug design. The native state simulation data and corresponding analyses for the 100 most populated metafolds, together with related resources, are publicly accessible through www.dynameomics.org. PMID:20399180

  3. Hazard Analysis Database Report

    SciTech Connect

    GRAMS, W.H.

    2000-12-28

    The Hazard Analysis Database was developed in conjunction with the hazard analysis activities conducted in accordance with DOE-STD-3009-94, Preparation Guide for U.S. Department of Energy Nonreactor Nuclear Facility Safety Analysis Reports, for HNF-SD-WM-SAR-067, Tank Farms Final Safety Analysis Report (FSAR). The FSAR is part of the approved Authorization Basis (AB) for the River Protection Project (RPP). This document describes, identifies, and defines the contents and structure of the Tank Farms FSAR Hazard Analysis Database and documents the configuration control changes made to the database. The Hazard Analysis Database contains the collection of information generated during the initial hazard evaluations and the subsequent hazard and accident analysis activities. The Hazard Analysis Database supports the preparation of Chapters 3, 4, and 5 of the Tank Farms FSAR and the Unreviewed Safety Question (USQ) process and consists of two major, interrelated data sets: (1) Hazard Analysis Database: Data from the results of the hazard evaluations, and (2) Hazard Topography Database: Data from the system familiarization and hazard identification.

  4. ResPlan Database

    NASA Technical Reports Server (NTRS)

    Zellers, Michael L.

    2003-01-01

    The main project I was involved in was new application development for the existing CISO Database (ResPlan). This database application was developed in Microsoft Access. Initial meetings with Greg Follen, Linda McMillen, Griselle LaFontaine and others identified a few key weaknesses with the existing database. The central weakness was that, while the database correctly modeled the structure of Programs, Projects and Tasks, it did not capture any dynamic status information once the data was entered, and as such was of limited usefulness. After the initial meetings my goals were identified as follows: (1) enhance the ResPlan Database to include qualitative and quantitative status information about the Programs, Projects and Tasks; (2) train staff members about the ResPlan database from both the user perspective and the developer perspective; and (3) give consideration to a Web Interface for reporting. Initially, the thought was that there would not be adequate time to actually develop the Web Interface, but Greg wanted it understood that this was an eventual goal and as such should be a consideration throughout the development process.

  5. Hazard Analysis Database Report

    SciTech Connect

    GAULT, G.W.

    1999-10-13

    The Hazard Analysis Database was developed in conjunction with the hazard analysis activities conducted in accordance with DOE-STD-3009-94, Preparation Guide for US Department of Energy Nonreactor Nuclear Facility Safety Analysis Reports, for the Tank Waste Remediation System (TWRS) Final Safety Analysis Report (FSAR). The FSAR is part of the approved TWRS Authorization Basis (AB). This document describes, identifies, and defines the contents and structure of the TWRS FSAR Hazard Analysis Database and documents the configuration control changes made to the database. The TWRS Hazard Analysis Database contains the collection of information generated during the initial hazard evaluations and the subsequent hazard and accident analysis activities. The database supports the preparation of Chapters 3, 4, and 5 of the TWRS FSAR and the USQ process and consists of two major, interrelated data sets: (1) Hazard Evaluation Database--Data from the results of the hazard evaluations; and (2) Hazard Topography Database--Data from the system familiarization and hazard identification.

  6. Cancer Metabolomics and the Human Metabolome Database

    PubMed Central

    Wishart, David S.; Mandal, Rupasri; Stanislaus, Avalyn; Ramirez-Gaona, Miguel

    2016-01-01

    The application of metabolomics towards cancer research has led to a renewed appreciation of metabolism in cancer development and progression. It has also led to the discovery of metabolite cancer biomarkers and the identification of a number of novel cancer-causing metabolites. The rapid growth of metabolomics in cancer research is also leading to challenges. In particular, with so many cancer-associated metabolites being identified, it is often difficult to keep track of which compounds are associated with which cancers. It is also challenging to track down information on the specific pathways that particular metabolites, drugs or drug metabolites may be affecting. Even more frustrating are the difficulties associated with identifying metabolites from NMR or MS spectra. Fortunately, a number of metabolomics databases are emerging that are designed to address these challenges. One such database is the Human Metabolome Database (HMDB). The HMDB is currently the world’s largest and most comprehensive, organism-specific metabolomics database. It contains more than 40,000 metabolite entries, thousands of metabolite concentrations, >700 metabolic and disease-associated pathways, as well as information on dozens of cancer biomarkers. This review is intended to provide a brief summary of the HMDB and to offer some guidance on how it can be used in metabolomic studies of cancer. PMID:26950159

  7. Database for propagation models

    NASA Technical Reports Server (NTRS)

    Kantak, Anil V.

    1991-01-01

    A propagation researcher or a systems engineer who intends to use the results of a propagation experiment is generally faced with various database tasks such as the selection of the computer software, the hardware, and the writing of the programs to pass the data through the models of interest. This task is repeated every time a new experiment is conducted or the same experiment is carried out at a different location generating different data. Thus the users of this data have to spend a considerable portion of their time learning how to implement the computer hardware and the software towards the desired end. This situation may be facilitated considerably if an easily accessible propagation database is created that has all the accepted (standardized) propagation phenomena models approved by the propagation research community. Also, the handling of data will become easier for the user. Such a database construction can only stimulate the growth of the propagation research if it is available to all the researchers, so that the results of the experiment conducted by one researcher can be examined independently by another, without different hardware and software being used. The database may be made flexible so that the researchers need not be confined only to the contents of the database. Another way in which the database may help the researchers is by the fact that they will not have to document the software and hardware tools used in their research since the propagation research community will know the database already. The following sections show a possible database construction, as well as properties of the database for the propagation research.

  8. The Gaia Parameter Database

    NASA Astrophysics Data System (ADS)

    de Bruijne, J. H. J.; Lammers, U.; Perryman, M. A. C.

    2005-01-01

    The parallel development of many aspects of a complex mission like Gaia, which includes numerous participants in ESA, industrial companies, and a large and active scientific collaboration throughout Europe, makes keeping track of the many design changes, instrument and operational complexities, and numerical values for the data analysis a very challenging problem. A comprehensive, easily-accessible, up-to-date, and definitive compilation of a large range of numerical quantities is required, and the Gaia parameter database has been established to satisfy these needs. The database is a centralised repository containing, besides mathematical, physical, and astronomical constants, many satellite and subsystem design parameters. At the end of 2004, more than 1600 parameters had been included. Version control has been implemented, providing, next to a 'live' version with the most recent parameters, well-defined reference versions of the full database contents. The database can be queried or browsed using a regular Web browser (http://www.rssd.esa.int/Gaia/paramdb). Query results are formatted by default in HTML. Data can also be retrieved as Fortran-77, Fortran-90, Java, ANSI C, C++, or XML structures for direct inclusion into software codes in these languages. The idea is that all collaborating scientists can use the database parameters and values, once retrieved, directly linked to computational routines. An off-line access mode is also available, enabling users to automatically download the contents of the database. The database will be maintained actively, and significant extensions of the contents are planned. Consistent use in the future of the database by the Gaia community at large, including all industrial teams, will ensure correct numerical values throughout the complex software systems being built up as details of the Gaia design develop. The database is already being used for the telemetry simulation chain in ESTEC, and in the data simulations for GDAAS2.

  9. JICST Factual Database(2)

    NASA Astrophysics Data System (ADS)

    Araki, Keisuke

    A computer programme that builds atom-bond connection tables from nomenclature has been developed. Chemical substances are input together with their nomenclature and their various trivial names or experimental code numbers. Chemical structures are stored stereospecifically in the database and can be searched and displayed according to stereochemistry. Source data come from the laws and regulations of Japan, RTECS of the US, and so on. The database plays a central role within the integrated fact database service of JICST and makes interrelational retrieval possible.
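
    For readers unfamiliar with the term, an atom-bond connection table can be sketched as a list of atoms plus a set of bonds between atom indices. The example below is a minimal illustration under stated assumptions, not the JICST programme: the class and function names are hypothetical, the structure is built by hand rather than parsed from a chemical name, hydrogens are left implicit, and stereochemistry is not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class ConnectionTable:
    """Atoms stored by index; bonds stored as (low index, high index) -> bond order."""
    atoms: list = field(default_factory=list)
    bonds: dict = field(default_factory=dict)

    def add_atom(self, symbol):
        """Append an atom and return its index."""
        self.atoms.append(symbol)
        return len(self.atoms) - 1

    def add_bond(self, i, j, order=1):
        """Record a bond between two distinct, existing atoms."""
        if i == j:
            raise ValueError("an atom cannot be bonded to itself")
        if not (0 <= i < len(self.atoms) and 0 <= j < len(self.atoms)):
            raise IndexError("bond refers to an unknown atom index")
        self.bonds[(min(i, j), max(i, j))] = order

    def neighbors(self, i):
        """Return the sorted indices of atoms bonded to atom i."""
        return sorted(b if a == i else a for (a, b) in self.bonds if i in (a, b))


def ethanol():
    """Heavy-atom skeleton of ethanol: C-C-O."""
    table = ConnectionTable()
    c1 = table.add_atom("C")
    c2 = table.add_atom("C")
    o = table.add_atom("O")
    table.add_bond(c1, c2)
    table.add_bond(c2, o)
    return table


if __name__ == "__main__":
    t = ethanol()
    print(t.atoms, t.bonds, t.neighbors(1))
```

    A table of this kind is what makes structure-based retrieval possible: two different trivial names that resolve to the same atoms and bonds can be recognised as the same substance.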

  10. Databases for materials selection

    SciTech Connect

    1996-06-01

    The Cambridge Materials Selector (CMS2.0) materials database was developed by the Engineering Dept. at Cambridge University in the United Kingdom. This database makes it possible to select a material for a specific application from essentially all classes of materials. Genera, Predict, and Socrates software programs from CLI International, Houston, Texas, automate materials selection and corrosion problem-solving tasks. They are said to significantly reduce the time necessary to select a suitable material and/or to assess a corrosion problem and reach cost-effective solutions. This article describes both databases and tells how to use them.