Sample records for ncicb gforge caarray

  1. caCORE: a common infrastructure for cancer informatics.

    PubMed

    Covitz, Peter A; Hartel, Frank; Schaefer, Carl; De Coronado, Sherri; Fragoso, Gilberto; Sahni, Himanso; Gustafson, Scott; Buetow, Kenneth H

    2003-12-12

    Sites with substantive bioinformatics operations are challenged to build data processing and delivery infrastructure that provides reliable access and enables data integration. Locally generated data must be processed and stored such that relationships to external data sources can be presented. Consistency and comparability across data sets requires annotation with controlled vocabularies and, further, metadata standards for data representation. Programmatic access to the processed data should be supported to ensure the maximum possible value is extracted. Confronted with these challenges at the National Cancer Institute Center for Bioinformatics, we decided to develop a robust infrastructure for data management and integration that supports advanced biomedical applications. We have developed an interconnected set of software and services called caCORE. Enterprise Vocabulary Services (EVS) provide controlled vocabulary, dictionary and thesaurus services. The Cancer Data Standards Repository (caDSR) provides a metadata registry for common data elements. Cancer Bioinformatics Infrastructure Objects (caBIO) implements an object-oriented model of the biomedical domain and provides Java, Simple Object Access Protocol and HTTP-XML application programming interfaces. caCORE has been used to develop scientific applications that bring together data from distinct genomic and clinical science sources. caCORE downloads and web interfaces can be accessed from links on the caCORE web site (http://ncicb.nci.nih.gov/core). caBIO software is distributed under an open source license that permits unrestricted academic and commercial use. Vocabulary and metadata content in the EVS and caDSR, respectively, is similarly unrestricted, and is available through web applications and FTP downloads. http://ncicb.nci.nih.gov/core/publications contains links to the caBIO 1.0 class diagram and the caCORE 1.0 Technical Guide, which provide detailed information on the present caCORE architecture, data sources and APIs. Updated information appears on a regular basis on the caCORE web site (http://ncicb.nci.nih.gov/core).

  2. [caCORE: core architecture of bioinformation on cancer research in America].

    PubMed

    Gao, Qin; Zhang, Yan-lei; Xie, Zhi-yun; Zhang, Qi-peng; Hu, Zhang-zhi

    2006-04-18

    A critical factor in the advancement of biomedical research is the ease with which data can be integrated, redistributed and analyzed both within and across domains. This paper summarizes the Biomedical Information Core Infrastructure built by National Cancer Institute Center for Bioinformatics in America (NCICB). The main product from the Core Infrastructure is caCORE--cancer Common Ontologic Reference Environment, which is the infrastructure backbone supporting data management and application development at NCICB. The paper explains the structure and function of caCORE: (1) Enterprise Vocabulary Services (EVS). They provide controlled vocabulary, dictionary and thesaurus services, and EVS produces the NCI Thesaurus and the NCI Metathesaurus; (2) The Cancer Data Standards Repository (caDSR). It provides a metadata registry for common data elements. (3) Cancer Bioinformatics Infrastructure Objects (caBIO). They provide Java, Simple Object Access Protocol and HTTP-XML application programming interfaces. The vision for caCORE is to provide a common data management framework that will support the consistency, clarity, and comparability of biomedical research data and information. In addition to providing facilities for data management and redistribution, caCORE helps solve problems of data integration. All NCICB-developed caCORE components are distributed under open-source licenses that support unrestricted usage by both non-profit and commercial entities, and caCORE has laid the foundation for a number of scientific and clinical applications. Based on it, the paper expounds caCORE-base applications simply in several NCI projects, of which one is CMAP (Cancer Molecular Analysis Project), and the other is caBIG (Cancer Biomedical Informatics Grid). In the end, the paper also gives good prospects of caCORE, and while caCORE was born out of the needs of the cancer research community, it is intended to serve as a general resource. Cancer research has historically contributed to many areas beyond tumor biology. At the same time, the paper makes some suggestions about the study at the present time on biomedical informatics in China.

  3. Semantic web data warehousing for caGrid.

    PubMed

    McCusker, James P; Phillips, Joshua A; González Beltrán, Alejandra; Finkelstein, Anthony; Krauthammer, Michael

    2009-10-01

    The National Cancer Institute (NCI) is developing caGrid as a means for sharing cancer-related data and services. As more data sets become available on caGrid, we need effective ways of accessing and integrating this information. Although the data models exposed on caGrid are semantically well annotated, it is currently up to the caGrid client to infer relationships between the different models and their classes. In this paper, we present a Semantic Web-based data warehouse (Corvus) for creating relationships among caGrid models. This is accomplished through the transformation of semantically-annotated caBIG Unified Modeling Language (UML) information models into Web Ontology Language (OWL) ontologies that preserve those semantics. We demonstrate the validity of the approach by Semantic Extraction, Transformation and Loading (SETL) of data from two caGrid data sources, caTissue and caArray, as well as alignment and query of those sources in Corvus. We argue that semantic integration is necessary for integration of data from distributed web services and that Corvus is a useful way of accomplishing this. Our approach is generalizable and of broad utility to researchers facing similar integration challenges.

  4. caCORE version 3: Implementation of a model driven, service-oriented architecture for semantic interoperability.

    PubMed

    Komatsoulis, George A; Warzel, Denise B; Hartel, Francis W; Shanbhag, Krishnakant; Chilukuri, Ram; Fragoso, Gilberto; Coronado, Sherri de; Reeves, Dianne M; Hadfield, Jillaine B; Ludet, Christophe; Covitz, Peter A

    2008-02-01

    One of the requirements for a federated information system is interoperability, the ability of one computer system to access and use the resources of another system. This feature is particularly important in biomedical research systems, which need to coordinate a variety of disparate types of data. In order to meet this need, the National Cancer Institute Center for Bioinformatics (NCICB) has created the cancer Common Ontologic Representation Environment (caCORE), an interoperability infrastructure based on Model Driven Architecture. The caCORE infrastructure provides a mechanism to create interoperable biomedical information systems. Systems built using the caCORE paradigm address both aspects of interoperability: the ability to access data (syntactic interoperability) and understand the data once retrieved (semantic interoperability). This infrastructure consists of an integrated set of three major components: a controlled terminology service (Enterprise Vocabulary Services), a standards-based metadata repository (the cancer Data Standards Repository) and an information system with an Application Programming Interface (API) based on Domain Model Driven Architecture. This infrastructure is being leveraged to create a Semantic Service-Oriented Architecture (SSOA) for cancer research by the National Cancer Institute's cancer Biomedical Informatics Grid (caBIG).

  5. caCORE version 3: Implementation of a model driven, service-oriented architecture for semantic interoperability

    PubMed Central

    Komatsoulis, George A.; Warzel, Denise B.; Hartel, Frank W.; Shanbhag, Krishnakant; Chilukuri, Ram; Fragoso, Gilberto; de Coronado, Sherri; Reeves, Dianne M.; Hadfield, Jillaine B.; Ludet, Christophe; Covitz, Peter A.

    2008-01-01

    One of the requirements for a federated information system is interoperability, the ability of one computer system to access and use the resources of another system. This feature is particularly important in biomedical research systems, which need to coordinate a variety of disparate types of data. In order to meet this need, the National Cancer Institute Center for Bioinformatics (NCICB) has created the cancer Common Ontologic Representation Environment (caCORE), an interoperability infrastructure based on Model Driven Architecture. The caCORE infrastructure provides a mechanism to create interoperable biomedical information systems. Systems built using the caCORE paradigm address both aspects of interoperability: the ability to access data (syntactic interoperability) and understand the data once retrieved (semantic interoperability). This infrastructure consists of an integrated set of three major components: a controlled terminology service (Enterprise Vocabulary Services), a standards-based metadata repository (the cancer Data Standards Repository) and an information system with an Application Programming Interface (API) based on Domain Model Driven Architecture. This infrastructure is being leveraged to create a Semantic Service Oriented Architecture (SSOA) for cancer research by the National Cancer Institute’s cancer Biomedical Informatics Grid (caBIG™). PMID:17512259

  6. Semantic web data warehousing for caGrid

    PubMed Central

    McCusker, James P; Phillips, Joshua A; Beltrán, Alejandra González; Finkelstein, Anthony; Krauthammer, Michael

    2009-01-01

    The National Cancer Institute (NCI) is developing caGrid as a means for sharing cancer-related data and services. As more data sets become available on caGrid, we need effective ways of accessing and integrating this information. Although the data models exposed on caGrid are semantically well annotated, it is currently up to the caGrid client to infer relationships between the different models and their classes. In this paper, we present a Semantic Web-based data warehouse (Corvus) for creating relationships among caGrid models. This is accomplished through the transformation of semantically-annotated caBIG® Unified Modeling Language (UML) information models into Web Ontology Language (OWL) ontologies that preserve those semantics. We demonstrate the validity of the approach by Semantic Extraction, Transformation and Loading (SETL) of data from two caGrid data sources, caTissue and caArray, as well as alignment and query of those sources in Corvus. We argue that semantic integration is necessary for integration of data from distributed web services and that Corvus is a useful way of accomplishing this. Our approach is generalizable and of broad utility to researchers facing similar integration challenges. PMID:19796399

  7. The MGED Ontology: a resource for semantics-based description of microarray experiments.

    PubMed

    Whetzel, Patricia L; Parkinson, Helen; Causton, Helen C; Fan, Liju; Fostel, Jennifer; Fragoso, Gilberto; Game, Laurence; Heiskanen, Mervi; Morrison, Norman; Rocca-Serra, Philippe; Sansone, Susanna-Assunta; Taylor, Chris; White, Joseph; Stoeckert, Christian J

    2006-04-01

    The generation of large amounts of microarray data and the need to share these data bring challenges for both data management and annotation and highlights the need for standards. MIAME specifies the minimum information needed to describe a microarray experiment and the Microarray Gene Expression Object Model (MAGE-OM) and resulting MAGE-ML provide a mechanism to standardize data representation for data exchange, however a common terminology for data annotation is needed to support these standards. Here we describe the MGED Ontology (MO) developed by the Ontology Working Group of the Microarray Gene Expression Data (MGED) Society. The MO provides terms for annotating all aspects of a microarray experiment from the design of the experiment and array layout, through to the preparation of the biological sample and the protocols used to hybridize the RNA and analyze the data. The MO was developed to provide terms for annotating experiments in line with the MIAME guidelines, i.e. to provide the semantics to describe a microarray experiment according to the concepts specified in MIAME. The MO does not attempt to incorporate terms from existing ontologies, e.g. those that deal with anatomical parts or developmental stages terms, but provides a framework to reference terms in other ontologies and therefore facilitates the use of ontologies in microarray data annotation. The MGED Ontology version.1.2.0 is available as a file in both DAML and OWL formats at http://mged.sourceforge.net/ontologies/index.php. Release notes and annotation examples are provided. The MO is also provided via the NCICB's Enterprise Vocabulary System (http://nciterms.nci.nih.gov/NCIBrowser/Dictionary.do). Stoeckrt@pcbi.upenn.edu Supplementary data are available at Bioinformatics online.

  8. Telling metabolic stories to explore metabolomics data: a case study on the yeast response to cadmium exposure.

    PubMed

    Milreu, Paulo Vieira; Klein, Cecilia Coimbra; Cottret, Ludovic; Acuña, Vicente; Birmelé, Etienne; Borassi, Michele; Junot, Christophe; Marchetti-Spaccamela, Alberto; Marino, Andrea; Stougie, Leen; Jourdan, Fabien; Crescenzi, Pierluigi; Lacroix, Vincent; Sagot, Marie-France

    2014-01-01

    The increasing availability of metabolomics data enables to better understand the metabolic processes involved in the immediate response of an organism to environmental changes and stress. The data usually come in the form of a list of metabolites whose concentrations significantly changed under some conditions, and are thus not easy to interpret without being able to precisely visualize how such metabolites are interconnected. We present a method that enables to organize the data from any metabolomics experiment into metabolic stories. Each story corresponds to a possible scenario explaining the flow of matter between the metabolites of interest. These scenarios may then be ranked in different ways depending on which interpretation one wishes to emphasize for the causal link between two affected metabolites: enzyme activation, enzyme inhibition or domino effect on the concentration changes of substrates and products. Equally probable stories under any selected ranking scheme can be further grouped into a single anthology that summarizes, in a unique subnetwork, all equivalently plausible alternative stories. An anthology is simply a union of such stories. We detail an application of the method to the response of yeast to cadmium exposure. We use this system as a proof of concept for our method, and we show that we are able to find a story that reproduces very well the current knowledge about the yeast response to cadmium. We further show that this response is mostly based on enzyme activation. We also provide a framework for exploring the alternative pathways or side effects this local response is expected to have in the rest of the network. We discuss several interpretations for the changes we see, and we suggest hypotheses that could in principle be experimentally tested. Noticeably, our method requires simple input data and could be used in a wide variety of applications. The code for the method presented in this article is available at http://gobbolino.gforge.inria.fr.

  9. Telling metabolic stories to explore metabolomics data: a case study on the yeast response to cadmium exposure

    PubMed Central

    Milreu, Paulo Vieira; Klein, Cecilia Coimbra; Cottret, Ludovic; Acuña, Vicente; Birmelé, Etienne; Borassi, Michele; Junot, Christophe; Marchetti-Spaccamela, Alberto; Marino, Andrea; Stougie, Leen; Jourdan, Fabien; Crescenzi, Pierluigi; Lacroix, Vincent; Sagot, Marie-France

    2014-01-01

    Motivation: The increasing availability of metabolomics data enables to better understand the metabolic processes involved in the immediate response of an organism to environmental changes and stress. The data usually come in the form of a list of metabolites whose concentrations significantly changed under some conditions, and are thus not easy to interpret without being able to precisely visualize how such metabolites are interconnected. Results: We present a method that enables to organize the data from any metabolomics experiment into metabolic stories. Each story corresponds to a possible scenario explaining the flow of matter between the metabolites of interest. These scenarios may then be ranked in different ways depending on which interpretation one wishes to emphasize for the causal link between two affected metabolites: enzyme activation, enzyme inhibition or domino effect on the concentration changes of substrates and products. Equally probable stories under any selected ranking scheme can be further grouped into a single anthology that summarizes, in a unique subnetwork, all equivalently plausible alternative stories. An anthology is simply a union of such stories. We detail an application of the method to the response of yeast to cadmium exposure. We use this system as a proof of concept for our method, and we show that we are able to find a story that reproduces very well the current knowledge about the yeast response to cadmium. We further show that this response is mostly based on enzyme activation. We also provide a framework for exploring the alternative pathways or side effects this local response is expected to have in the rest of the network. We discuss several interpretations for the changes we see, and we suggest hypotheses that could in principle be experimentally tested. Noticeably, our method requires simple input data and could be used in a wide variety of applications. Availability and implementation: The code for the method presented in this article is available at http://gobbolino.gforge.inria.fr. Contact: pvmilreu@gmail.com; vincent.lacroix@univ-lyon1.fr; marie-france.sagot@inria.fr Supplementary information: Supplementary data are available at Bioinformatics online. PMID:24167155

  10. G-DOC Plus - an integrative bioinformatics platform for precision medicine.

    PubMed

    Bhuvaneshwar, Krithika; Belouali, Anas; Singh, Varun; Johnson, Robert M; Song, Lei; Alaoui, Adil; Harris, Michael A; Clarke, Robert; Weiner, Louis M; Gusev, Yuriy; Madhavan, Subha

    2016-04-30

    G-DOC Plus is a data integration and bioinformatics platform that uses cloud computing and other advanced computational tools to handle a variety of biomedical BIG DATA including gene expression arrays, NGS and medical images so that they can be analyzed in the full context of other omics and clinical information. G-DOC Plus currently holds data from over 10,000 patients selected from private and public resources including Gene Expression Omnibus (GEO), The Cancer Genome Atlas (TCGA) and the recently added datasets from REpository for Molecular BRAin Neoplasia DaTa (REMBRANDT), caArray studies of lung and colon cancer, ImmPort and the 1000 genomes data sets. The system allows researchers to explore clinical-omic data one sample at a time, as a cohort of samples; or at the level of population, providing the user with a comprehensive view of the data. G-DOC Plus tools have been leveraged in cancer and non-cancer studies for hypothesis generation and validation; biomarker discovery and multi-omics analysis, to explore somatic mutations and cancer MRI images; as well as for training and graduate education in bioinformatics, data and computational sciences. Several of these use cases are described in this paper to demonstrate its multifaceted usability. G-DOC Plus can be used to support a variety of user groups in multiple domains to enable hypothesis generation for precision medicine research. The long-term vision of G-DOC Plus is to extend this translational bioinformatics platform to stay current with emerging omics technologies and analysis methods to continue supporting novel hypothesis generation, analysis and validation for integrative biomedical research. By integrating several aspects of the disease and exposing various data elements, such as outpatient lab workup, pathology, radiology, current treatments, molecular signatures and expected outcomes over a web interface, G-DOC Plus will continue to strengthen precision medicine research. G-DOC Plus is available at: https://gdoc.georgetown.edu .

  11. Genomic pathway analysis reveals that EZH2 and HDAC4 represent mutually exclusive epigenetic pathways across human cancers

    PubMed Central

    2013-01-01

    Background Alterations in epigenetic marks, including methylation or acetylation, are common in human cancers. For many epigenetic pathways, however, direct measures of activity are unknown, making their role in various cancers difficult to assess. Gene expression signatures facilitate the examination of patterns of epigenetic pathway activation across and within human cancer types allowing better understanding of the relationships between these pathways. Methods We used Bayesian regression to generate gene expression signatures from normal epithelial cells before and after epigenetic pathway activation. Signatures were applied to datasets from TCGA, GEO, CaArray, ArrayExpress, and the cancer cell line encyclopedia. For TCGA data, signature results were correlated with copy number variation and DNA methylation changes. GSEA was used to identify biologic pathways related to the signatures. Results We developed and validated signatures reflecting downstream effects of enhancer of zeste homolog 2(EZH2), histone deacetylase(HDAC) 1, HDAC4, sirtuin 1(SIRT1), and DNA methyltransferase 2(DNMT2). By applying these signatures to data from cancer cell lines and tumors in large public repositories, we identify those cancers that have the highest and lowest activation of each of these pathways. Highest EZH2 activation is seen in neuroblastoma, hepatocellular carcinoma, small cell lung cancer, and melanoma, while highest HDAC activity is seen in pharyngeal cancer, kidney cancer, and pancreatic cancer. Across all datasets studied, activation of both EZH2 and HDAC4 is significantly underrepresented. Using breast cancer and glioblastoma as examples to examine intrinsic subtypes of particular cancers, EZH2 activation was highest in luminal breast cancers and proneural glioblastomas, while HDAC4 activation was highest in basal breast cancer and mesenchymal glioblastoma. EZH2 and HDAC4 activation are associated with particular chromosome abnormalities: EZH2 activation with aberrations in genes from the TGF and phosphatidylinositol pathways and HDAC4 activation with aberrations in inflammatory and chemokine related genes. Conclusion Gene expression patterns can reveal the activation level of epigenetic pathways. Epigenetic pathways define biologically relevant subsets of human cancers. EZH2 activation and HDAC4 activation correlate with growth factor signaling and inflammation, respectively, and represent two distinct states for cancer cells. This understanding may allow us to identify targetable drivers in these cancer subsets. PMID:24079712

Top